✅ Organizing data so that it is understandable and easy to work with.
⏩ All software applications that rely on data have a data model. Depending on the goal the application serves, it needs a specific type of data model.
When people in analytics talk about data modelling, they usually dive straight into the technical aspects, such as the pros and cons of data vault versus star schema modelling. A broad definition gives some context and keeps you from getting stuck in the details.
Relational – While serving your clients, you go through a process. Every time you enter information into a system during that process, data is created. For example, you put client data in a CRM, orders in a sales system, invoices in a finance system, and so on. Operational systems that store this type of information usually use relational/transactional data models.
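As a rough illustration of a relational/transactional model, here is a minimal sketch of the client and order data mentioned above, using Python's built-in sqlite3 module. The table and column names (clients, orders, client_id, ...) and the helper place_order are hypothetical; real CRM and sales systems look different. Each fact is stored once, orders reference clients by key, and writing a new order is a small transaction that either fully succeeds or is rolled back.

```python
import sqlite3

def build_operational_db() -> sqlite3.Connection:
    """Create a tiny normalized (relational) schema in memory.

    Hypothetical tables: clients and orders. Each order points to
    exactly one client via a foreign key.
    """
    conn = sqlite3.connect(":memory:")
    conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces foreign keys only when enabled
    conn.executescript(
        """
        CREATE TABLE clients (
            client_id INTEGER PRIMARY KEY,
            name      TEXT NOT NULL,
            country   TEXT NOT NULL
        );
        CREATE TABLE orders (
            order_id   INTEGER PRIMARY KEY,
            client_id  INTEGER NOT NULL REFERENCES clients(client_id),
            order_date TEXT NOT NULL,
            amount     REAL NOT NULL
        );
        """
    )
    return conn

def place_order(conn, client_id, order_date, amount):
    """Record one order as a single transaction (hypothetical helper)."""
    with conn:  # commits on success, rolls back if an exception is raised
        cur = conn.execute(
            "INSERT INTO orders (client_id, order_date, amount) VALUES (?, ?, ?)",
            (client_id, order_date, amount),
        )
    return cur.lastrowid

conn = build_operational_db()
with conn:
    conn.execute("INSERT INTO clients VALUES (1, 'Acme', 'BE')")
order_id = place_order(conn, 1, "2024-01-15", 250.0)
print(order_id)  # 1
```

Trying to place an order for a client that does not exist raises sqlite3.IntegrityError, which is the kind of consistency guarantee operational systems rely on.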
Dimensional – If you want insights and analytics that cut across the data in your operational systems, you probably need a data warehouse. The models in a data warehouse are mostly dimensional data models, and the star schema is a specific type of dimensional model.
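To make the dimensional model more concrete, the sketch below builds a tiny star schema with Python's sqlite3 module: one fact table of sales surrounded by dimension tables for client and date. Every table, column, and value is made up for illustration. An analytical question then becomes a join from the fact table to its dimensions plus a GROUP BY, instead of a trip through each operational system.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    -- Dimension tables: descriptive context ("who", "when")
    CREATE TABLE dim_client (
        client_key INTEGER PRIMARY KEY,
        name       TEXT,
        country    TEXT
    );
    CREATE TABLE dim_date (
        date_key INTEGER PRIMARY KEY,  -- e.g. 20240115
        year     INTEGER,
        month    INTEGER
    );
    -- Fact table: one row per sale, a numeric measure plus keys to the dimensions
    CREATE TABLE fact_sales (
        client_key INTEGER REFERENCES dim_client(client_key),
        date_key   INTEGER REFERENCES dim_date(date_key),
        amount     REAL
    );
    INSERT INTO dim_client VALUES (1, 'Acme', 'BE'), (2, 'Globex', 'NL');
    INSERT INTO dim_date VALUES (20240115, 2024, 1), (20240203, 2024, 2);
    INSERT INTO fact_sales VALUES
        (1, 20240115, 250.0),
        (2, 20240115, 100.0),
        (1, 20240203, 75.0);
    """
)

def revenue_by_country_and_month(conn):
    """Aggregate the fact table, sliced by two dimensions."""
    return conn.execute(
        """
        SELECT c.country, d.year, d.month, SUM(f.amount) AS revenue
        FROM fact_sales AS f
        JOIN dim_client AS c ON c.client_key = f.client_key
        JOIN dim_date   AS d ON d.date_key   = f.date_key
        GROUP BY c.country, d.year, d.month
        ORDER BY c.country, d.year, d.month
        """
    ).fetchall()

for row in revenue_by_country_and_month(conn):
    print(row)
# ('BE', 2024, 1, 250.0)
# ('BE', 2024, 2, 75.0)
# ('NL', 2024, 1, 100.0)
```

Adding a new way to slice the numbers (say, product) means adding another dimension table and key, while the existing queries keep working.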
Graph – For storing information about connected entities (people, companies, traits, behaviors), a graph schema is usually the most flexible modelling option.
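A graph model stores entities as nodes and relationships as edges, so a question like "how is this person connected to that company?" becomes a traversal rather than a chain of joins. The sketch below is a deliberately minimal in-memory version in plain Python, not a graph database; the entities, relationship labels, and function names are invented for illustration.

```python
from collections import deque

# Hypothetical entities and labelled relationships (edges).
edges = [
    ("Alice", "WORKS_AT", "Acme"),
    ("Bob", "WORKS_AT", "Acme"),
    ("Bob", "KNOWS", "Carol"),
    ("Carol", "WORKS_AT", "Globex"),
]

def build_graph(edges):
    """Undirected adjacency list: node -> list of (relationship, neighbour)."""
    graph = {}
    for src, rel, dst in edges:
        graph.setdefault(src, []).append((rel, dst))
        graph.setdefault(dst, []).append((rel, src))
    return graph

def shortest_path(graph, start, goal):
    """Breadth-first search returning the node path from start to goal, or None."""
    queue = deque([[start]])
    seen = {start}
    while queue:
        path = queue.popleft()
        node = path[-1]
        if node == goal:
            return path
        for _rel, neighbour in graph.get(node, []):
            if neighbour not in seen:
                seen.add(neighbour)
                queue.append(path + [neighbour])
    return None

graph = build_graph(edges)
print(shortest_path(graph, "Alice", "Globex"))
# ['Alice', 'Acme', 'Bob', 'Carol', 'Globex']
```

Recording a new kind of relationship is just another tuple in edges, with no schema migration, which is part of why graphs are a flexible fit for connected data.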
Is data modelling dead? ☠️
Sometimes, data modelling is declared dead. Just store everything, then read and process it when needed. In my view, this approach offers a lot of flexibility, but sooner or later it will bite you. You may never have heard of data modelling, or at least not of this general way of looking at it, but it is everywhere.
Think about all those reports that live in Excel files across different departments. They are full of data, organized to make it understandable. And in fact, they do support a lot of important business decisions. There’s a funny saying that Excel is probably the biggest data warehouse on earth.
Or consider a dashboard where every single chart or table queries your data warehouse separately. You’re still organizing your data to make it understandable; you’re just doing it very inefficiently, in a way that’s neither scalable nor maintainable.
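The difference between ad hoc queries per chart and deliberate modelling can be sketched in a few lines: define the business logic once (here as a SQL view) and let every chart read from it. The raw_orders table, the sales view, and both chart functions are hypothetical, and sqlite3 is used only to keep the example self-contained.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE raw_orders (order_id INTEGER, country TEXT, status TEXT, amount REAL);
    INSERT INTO raw_orders VALUES
        (1, 'BE', 'paid', 250.0),
        (2, 'NL', 'paid', 100.0),
        (3, 'BE', 'cancelled', 80.0),
        (4, 'BE', 'paid', 75.0);

    -- Modelled once: the definition of a "valid sale" lives in one place.
    CREATE VIEW sales AS
        SELECT order_id, country, amount
        FROM raw_orders
        WHERE status = 'paid';
    """
)

def revenue_per_country(conn):
    """Chart 1 reads from the shared model instead of re-filtering raw data."""
    return conn.execute(
        "SELECT country, SUM(amount) FROM sales GROUP BY country ORDER BY country"
    ).fetchall()

def order_count(conn):
    """Chart 2 reuses the same definition, so the numbers stay consistent."""
    return conn.execute("SELECT COUNT(*) FROM sales").fetchone()[0]

print(revenue_per_country(conn))  # [('BE', 325.0), ('NL', 100.0)]
print(order_count(conn))          # 3
```

If the definition of a valid sale changes, only the view changes; with one hand-written query per chart, every query would need the same edit.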
So when people claim they don’t do data modelling, they mean they don’t do it with intention.
✔️ … you want an efficient and effective way to get value from your data.
✔️ … you don’t want to just store all data you can get your hands on.
✔️ … you don’t want to write a new query for every question you get from the business.
You’d better think about modelling your data right. That starts with a conceptual understanding of the data you need to support your goal, not with spinning up a database or data warehouse.
Your data is being modelled, whether you want it to be or not. So you’d better be deliberate about it.