Two of the hottest topics of the last two years in the data management and analytics space are, without a doubt, the Data Fabric and the Data Mesh. There’s been a lot of confusion about what they are. Are they frameworks or products? Why do we need them? Which one will survive the infamous Gartner Hype Cycle? Let me try to make things a bit simpler for everyone in this series of blog posts. I realize that the minute I said “series,” it probably didn’t sound simple. But bear with me: I believe this can be summarized in a direct, non-self-promoting manner, so we can all understand both and figure out what we need or want to do.
Before we get too far, I’m going to give a spoiler alert…they are not in conflict with each other. This means you don’t necessarily have to choose one over the other. Yes, you heard that right. A Data Fabric is not, nor does it have to be, a hard alternative to the Data Mesh, and vice versa. The two have much more in common than most realize and can actually work in unison. But to understand why, we will need to cover some ground.
Definition: what is a Data Mesh?
- Decentralization
- Domain-Oriented
- Self-Service
- Data Platform Scalability
Where do you usually apply a Data Mesh?
A Data Mesh is applied in large organizations where multiple departments use many large databases, data lakes, and data warehouses. Think complex ETL/ELT, with hundreds or thousands of data engineers, data scientists, and data analysts.
What are the big concepts the Data Mesh promotes?
Generally speaking, the Data Mesh promotes two major concepts–data products and domain-based ownership–which I’ll briefly discuss at a high level.
Data Products or Data as a Product. Think of this as data outputs such as datasets, queries, and models. They’re not too different from, say, a data mart or cube, but more for the modern world where these “products” are easier and faster to generate and/or refresh. Consequently, data products don’t require armies of people to refresh and are thus much more easily consumable.
Domain, Domain, and Domain. The Data Mesh emphasizes knowing which domain, and which team, the data belongs to. Because data is organized by domain, it is easier and faster to find and access for self-service.
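To make those two concepts a bit more concrete, here is a minimal Python sketch. It is only an illustration under my own assumptions, not a reference implementation of a Data Mesh: the names `DataProduct` and `MeshCatalog`, the fields, and the example domains are all hypothetical. It shows each data product carrying its owning domain, and a consumer finding products by domain for self-service.

```python
from __future__ import annotations

from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class DataProduct:
    """A published data output (dataset, query, or model). Hypothetical shape."""
    name: str
    domain: str   # the business domain that owns and refreshes the product
    kind: str     # "dataset", "query", or "model"


class MeshCatalog:
    """Hypothetical registry: each domain publishes its own data products."""

    def __init__(self) -> None:
        self._by_domain = defaultdict(list)

    def publish(self, product: DataProduct) -> None:
        # Ownership stays with the domain named on the product itself.
        self._by_domain[product.domain].append(product)

    def products_for(self, domain: str) -> list[str]:
        # Self-service lookup: list the product names a domain has published.
        return [p.name for p in self._by_domain.get(domain, [])]


catalog = MeshCatalog()
catalog.publish(DataProduct("monthly_revenue", domain="sales", kind="dataset"))
catalog.publish(DataProduct("churn_score", domain="marketing", kind="model"))
catalog.publish(DataProduct("open_pipeline", domain="sales", kind="query"))

print(catalog.products_for("sales"))
```

The point of the sketch is simply that the domain is part of the product's identity, so finding data starts with asking which domain owns it.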
Now, what is a Data Fabric?
In short, the Data Fabric is a single product or framework that was previously known as the Modern Data Stack. There, I said it. The Modern Data Stack was a great idea in that it ushered in a new way to think about a decentralized or “best of breed” approach to architecting one’s data management environment.
However, after 2+ years, we’ve learned a painful and expensive lesson. In reality, only the biggest and richest companies can afford to buy and implement so many different products. Only these companies have the staff of hundreds or thousands to handle the complexity of piecing together so many different products.
But, what if you could combine those modern approaches in…ready for it…one PRODUCT? Yes, one product.
That is the whole idea behind what the Data Fabric really is. More specifically, the Data Fabric is an integrated data infrastructure that enables a seamless combination of:
- Data discovery
- Data access
- Data integration
- …and some AI or automation to help put it all together
The goal of the Data Fabric is to provide a unified and consistent view of data, simplifying data exploration, analysis, and collaboration for users.
That’s the product definition of a Data Fabric. In the framework definition, it’s a design that leverages various technologies and tools–such as data virtualization, data integration platforms, and data catalogs–to create a flexible, scalable, and adaptable data environment. In short, when defined as a framework, the Data Fabric is a simpler, more tightly integrated Modern Data Stack.
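As a rough illustration of discovery, access, and integration living behind one interface, here is a small Python sketch. Everything in it is an assumption made for the example: `InMemorySource` stands in for a real database, lake, or warehouse, and `DataFabric` with its `discover`, `read`, and `join` methods is a hypothetical facade, not any vendor's API. The AI or automation piece is deliberately left out.

```python
from __future__ import annotations


class InMemorySource:
    """Stand-in for a database, data lake, or warehouse (hypothetical)."""

    def __init__(self, name: str, tables: dict[str, list[dict]]) -> None:
        self.name = name
        self.tables = tables


class DataFabric:
    """Hypothetical facade offering one view over several sources."""

    def __init__(self, sources: list[InMemorySource]) -> None:
        # A tiny "catalog": maps each table name to the source that holds it.
        self._catalog = {t: src for src in sources for t in src.tables}

    def discover(self, keyword: str) -> list[str]:
        # Data discovery: find tables by name, regardless of where they live.
        return sorted(t for t in self._catalog if keyword in t)

    def read(self, table: str) -> list[dict]:
        # Data access: the caller does not need to know the underlying source.
        return self._catalog[table].tables[table]

    def join(self, left: str, right: str, key: str) -> list[dict]:
        # Data integration: inner join across sources.
        # Assumes `key` is unique in the right-hand table.
        index = {row[key]: row for row in self.read(right)}
        return [{**row, **index[row[key]]}
                for row in self.read(left) if row[key] in index]


crm = InMemorySource("crm", {"customers": [
    {"customer_id": 1, "name": "Acme"},
    {"customer_id": 2, "name": "Globex"},
]})
billing = InMemorySource("billing", {"customer_invoices": [
    {"customer_id": 1, "amount": 250},
]})

fabric = DataFabric([crm, billing])
found = fabric.discover("customer")
combined = fabric.join("customer_invoices", "customers", "customer_id")
print(found)
print(combined)
```

A real Data Fabric would replace the in-memory pieces with connectors, a proper catalog, and virtualization, but the shape is the same: one entry point, many sources behind it.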
With this definition established, the question is: does one build a Data Fabric, or buy one? I’ll begin to answer that question in my next post, when I discuss what makes up a Data Fabric, how its components differ from those of a Data Mesh, and why it matters.