10 Ways Data Modelling Can Go Wrong

This Week: Common model mistakes and what to do about them

Dear Reader…

Perhaps the most underrated skill of a Data Engineer is the ability to pull together data models that work for all the stakeholders in a data project. Your ability to craft and communicate data models at the conceptual, logical, and physical levels enables teams to excel as they build fit-for-purpose data projects. This week we take a look at some of the fundamentals of Data Modelling and how to improve your models to get better alignment amongst stakeholders.

Some Key Concepts

Data modelling is the process of creating a visual representation of an information system to communicate connections between data points and structures. It involves defining the structure of data, including entities, attributes, and relationships, to provide a clear and organised view of data elements and their business context.

Data Modelling is "organising and standardising data to facilitate believable and useful information and knowledge for humans and machines."

Joe Reis, co-author of Fundamentals of Data Engineering

Data modelling is a core data management discipline that helps organisations use their data effectively to meet business needs for information. It provides a blueprint for designing a new database or reengineering a legacy application, specifying the characteristics of the data elements included in applications and in the database or file system structures used to process, store, and manage that data. The key aspects of data modelling include:

  • Creating a Visual Representation: Data models are visual diagrams that illustrate data entities, their attributes, and the relationships between entities.

  • Defining Business Requirements: Data modelling involves gathering requirements from business stakeholders to ensure that the data model captures the business context of data and meets an organisation's information needs.

  • Specifying Data Structures: Data models define the structure of data, including entities, attributes, and relationships, to provide a clear and organised view of data elements.

  • Supporting Data Management: Data modelling helps establish common data definitions and internal data standards, often in connection with data governance programs, and plays a big role in data architecture processes.

Generally speaking, Data Models are broken into three categories, each serving a distinct purpose, from high-level conceptualisation to detailed physical implementation.

Conceptual Data Modelling

Conceptual data modelling is the first stage in the data modelling process. It provides a high-level, abstract representation of the data an organisation uses or intends to use in its business operations. This type of modelling focuses on identifying the key entities, their relationships, and the essential concepts that are integral to the business model. It is technology-agnostic and aims to create a shared understanding between technology and business teams, enabling clear communication and fostering debate on essential concepts.
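
To make the level of detail concrete, here is a minimal sketch in Python (the entity names are illustrative, not from any particular project) of what a conceptual model captures: business entities and the relationships between them, with no attributes, keys, or types.

```python
# A conceptual model names the business entities and how they relate, nothing
# more. No attributes, keys, data types, or platform details belong here yet.
entities = ["Customer", "Order", "Product"]

relationships = [
    ("Customer", "places", "Order"),    # one Customer places many Orders
    ("Order", "contains", "Product"),   # one Order contains many Products
]
```

In practice this would live on a whiteboard or in a diagram rather than in code; the point is the level of detail, not the medium.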

Logical Data Modelling

Logical data modelling is a more detailed and refined version of the conceptual model. It describes data elements in detail and is used to develop a visual understanding of data entities, attributes, keys, and relationships. This type of modelling is platform-independent and provides a blueprint for the semantic layer in data management systems. It includes the specific attributes of each entity, the relationships between entities, and the cardinality of those relationships, serving as a foundation for planning and implementing databases and data analysis datasets.
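
As an illustration, the same entities at the logical level might look like the sketch below, using Python dataclasses with hypothetical attributes. Note what has been added relative to the conceptual model (attributes, keys, cardinality) and what is still absent (storage types, indexes, engine specifics).

```python
from dataclasses import dataclass

# Logical model: attributes, keys, and cardinality, still platform-independent.

@dataclass
class Customer:
    customer_id: int   # primary key
    name: str
    email: str

@dataclass
class Order:
    order_id: int      # primary key
    customer_id: int   # foreign key to Customer; one Customer has many Orders
    status: str
```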

Physical Data Modelling

Physical data modelling represents the actual structure of a database, defining how data is stored, organised, and accessed at the physical level. It provides a detailed view of the database schema, specifying tables, columns, data types, relationships, indexes, and constraints. This type of modelling is technology-specific and is used to guide developers in creating and optimising the actual database, taking into account storage considerations, performance improvement, and specific features of the database management system.
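
At the physical level, the model commits to concrete types and constraints for a specific engine. The following is a minimal sketch using SQLAlchemy; the schema is illustrative rather than prescriptive.

```python
from sqlalchemy import (
    Column, DateTime, ForeignKey, Integer, MetaData, Numeric, String, Table,
)

metadata = MetaData()

# Physical model: concrete column types, lengths, nullability, and constraints.
customers = Table(
    "customers", metadata,
    Column("customer_id", Integer, primary_key=True),
    Column("email", String(255), nullable=False, unique=True),
)

orders = Table(
    "orders", metadata,
    Column("order_id", Integer, primary_key=True),
    Column("customer_id", Integer, ForeignKey("customers.customer_id"), nullable=False),
    Column("total", Numeric(10, 2), nullable=False),  # exact numeric type for money
    Column("created_at", DateTime, nullable=False),
)
```

Indexes, partitioning, and other storage decisions belong at this level too; mistakes 7 to 9 below return to them.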

10 Ways Data Models Can Go Astray

1. Starting Too Late in Conceptual Data Modelling

Mistake: Delaying conceptual data modelling can lead to misunderstandings and misalignments between business and technical teams. This often occurs when projects rush into detailed design without a clear understanding of the overall data requirements.

Implications: Starting too late can result in significant rework and delays in the project timeline. Without a clear conceptual model, teams may develop data structures that do not align with business needs, leading to costly revisions and potential data integrity issues. Furthermore, this can cause confusion among stakeholders, making it difficult to achieve consensus on critical data elements and relationships.

2. Insufficient Stakeholder Involvement in Conceptual Data Modelling

Mistake: Failing to engage all relevant stakeholders in the conceptual modelling process can result in a blueprint that does not accurately reflect business needs.

Implications: Insufficient stakeholder involvement can lead to a conceptual model that misses critical business requirements, resulting in a data structure that does not support business operations effectively. This can lead to data quality issues, as data may not be captured or stored in a way that truly supports the business. Moreover, it can cause cultural resistance and adoption challenges when the model is implemented.

3. Lack of Clarity in Conceptual Data Modelling

Mistake: Not clearly defining entities and relationships in the conceptual data model can lead to confusion and the risk of compound errors in subsequent development steps.

Implications: A lack of clarity in the conceptual model can result in misunderstandings and misinterpretations by developers and stakeholders, leading to incorrect data structures and relationships. This can cause data inconsistencies, redundancy, and integrity issues, making it difficult to maintain and scale the data system. Additionally, it can lead to prolonged development times and increased costs due to rework and corrections.

4. Inconsistent Naming Patterns in Logical Data Modelling

Mistake: Inconsistent naming conventions in logical data modelling can cause confusion and make models harder to understand and maintain.

Implications: Inconsistent naming patterns can lead to misunderstandings among developers and stakeholders, causing errors in data mapping and integration. This can result in inconsistencies and integrity issues, making it difficult to maintain and scale the system. Overall, it can cause delays in development and increase costs due to rework and corrections.
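
One lightweight defence is to pick a single convention up front and check it mechanically. A minimal sketch, assuming snake_case is the agreed convention (the column names are made up):

```python
import re

# The agreed convention: lower snake_case, e.g. "customer_id".
SNAKE_CASE = re.compile(r"^[a-z][a-z0-9]*(_[a-z0-9]+)*$")

def non_conforming(columns):
    """Return the column names that break the snake_case convention."""
    return [c for c in columns if not SNAKE_CASE.match(c)]

print(non_conforming(["customer_id", "CustName", "custAddr", "customer_phone"]))
# -> ['CustName', 'custAddr']
```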

5. Incorrect Granularity in Logical Data Modelling

Mistake: Failing to determine the appropriate level of detail in logical data modelling can lead to models that are either too simplistic or overly complex.

Implications: Incorrect granularity can result in data models that are not optimised for performance and usability. Models that are too simplistic may not capture essential information, leading to data quality issues and limitations in business decision-making. Conversely, overly complex models can be difficult to understand and maintain, leading to increased development times and costs.
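
A quick illustration of why grain matters (the figures are invented): a fine-grained model can always be rolled up, but a summary can never be split back into the detail it discarded.

```python
# Too coarse: one row per customer per month. The individual sales are gone.
monthly_sales = {("customer_42", "2024-01"): 1250.00}

# Finer grain: one row per order line. This can always be aggregated upward.
order_lines = [
    {"order_id": 1001, "product": "sku_7", "qty": 2, "amount": 50.00},
    {"order_id": 1002, "product": "sku_9", "qty": 1, "amount": 1200.00},
]

# Rolling up is easy; the reverse operation does not exist.
monthly_total = sum(line["amount"] for line in order_lines)  # 1250.0
```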

6. Not Considering Application Usage in Logical Data Modelling

Mistake: Ignoring how applications will use the data in the logical modelling stage can result in a data model that is not optimised for performance and usability.

Implications: Get this wrong and you end up with data models that do not support business operations effectively. This can manifest itself in data quality issues, as data may not be captured or stored in a way that supports accurate decision-making. Moreover, it can cause performance issues, as data retrieval and manipulation may not be optimised for application needs.
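
For example, if the application's dominant query is known during logical modelling, the model can be shaped around it. A small sketch (the entity and field names are hypothetical):

```python
from datetime import date

# Suppose the dominant access path is "recent orders for one customer".
# Keeping customer_id and order_date on the Order entity means this lookup
# needs no joins, and can later be backed by a single index at the physical level.
def recent_orders(orders: list[dict], customer_id: int, since: date) -> list[dict]:
    return [
        o for o in orders
        if o["customer_id"] == customer_id and o["order_date"] >= since
    ]
```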

7. Not Separating Keys from Indexes in Physical Data Modelling

Mistake: Confusing keys with indexes in physical data models can lead to inefficient operational performance.

Implications: Not clearly separating keys from indexes can lead to slow data retrieval times, increased storage requirements, and potential data integrity issues. Furthermore, it can cause difficulties in maintaining and scaling the data system.
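
The distinction is worth spelling out: a key is a logical statement about identity and uniqueness, while an index is a physical access structure chosen for performance. A minimal SQLAlchemy sketch (the schema is illustrative):

```python
from sqlalchemy import Column, Index, Integer, MetaData, String, Table

metadata = MetaData()

orders = Table(
    "orders", metadata,
    # Key: a logical fact about the data. Every order has exactly one identity.
    Column("order_id", Integer, primary_key=True),
    Column("customer_id", Integer, nullable=False),
    Column("status", String(20), nullable=False),
)

# Index: a physical performance choice. It can be added, changed, or dropped
# as query patterns evolve, without changing what the data means.
Index("ix_orders_customer_id", orders.c.customer_id)
```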

8. Rigidly Mapping UI to Data Fields in Physical Data Modelling

Mistake: Directly mapping user interface elements to data fields in physical data modelling can result in inflexible and inefficient data models.

Implications: Rigidly mapping UI to data fields can lead to data models that are not adaptable to changing business needs or UI requirements. This can result in data quality issues, as data may not be captured or stored in a way that supports business decision-making. Moreover, it can cause performance issues, as data retrieval and manipulation may not be optimised for application needs.
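
One common remedy is a thin mapping layer between the UI and the stored model, so the screen can change without forcing a schema change. A minimal sketch with hypothetical field names:

```python
from dataclasses import dataclass

@dataclass
class Person:
    # Stored model: structured parts, independent of any particular screen.
    given_name: str
    family_name: str

@dataclass
class ProfileForm:
    # What today's UI happens to display: a single combined field.
    full_name: str

def to_form(person: Person) -> ProfileForm:
    # The mapping layer absorbs UI changes; the schema does not have to.
    return ProfileForm(full_name=f"{person.given_name} {person.family_name}")
```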

9. Failing to Plan for Data Model Evolution in Physical Data Modelling

Mistake: Not considering future changes and growth in physical data modelling can lead to models that are difficult to maintain and scale.

Implications: Failing to plan for data model evolution can result in data structures that are not flexible or scalable, leading to significant rework and delays when changes are needed. This can cause data quality issues, as data may not be captured or stored in a way that supports evolving business needs. Moreover, it can lead to increased costs due to rework and corrections, and potential data integrity issues.
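
In practice, planning for evolution usually means preferring additive, backwards-compatible changes and managing them with a migration tool. A minimal sketch of an Alembic migration (the table and column names are illustrative):

```python
import sqlalchemy as sa
from alembic import op

def upgrade():
    # Additive and nullable: existing rows, readers, and writers keep working.
    op.add_column("customers", sa.Column("loyalty_tier", sa.String(10), nullable=True))

def downgrade():
    op.drop_column("customers", "loyalty_tier")
```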

10. Mixing Conceptual, Logical and Physical Models

Last, but probably most fundamental, is muddling up the three different types of model design. There is a temptation to use physical-level Entity Relationship diagramming tools to describe a conceptual model. The trouble is that an Entity Relationship diagram is already well on its way to being a logical model, meaning you are one step ahead of where you need to be.

Often the best way to enrol business stakeholders is to use a whiteboard or "sticky note" collaboration tool, which keeps the conceptual stage simple, much as an architect starts with sketches to test design ideas with clients. Only once a designer has moved through this phase will they draft a logical plan and then move on to a detailed specification. The key, as is the case with any complex undertaking, is to focus on the process before jumping into a build.

As Cagle puts it in his article, Seven Data Modelling Mistakes:

A conceptual model needs to be fluid, needs to allow for input, and most importantly needs to provide a basis for experimentation. This is where you put together scenarios and use cases, and see what breaks. Too often, this area of true design is given short-shrift because it's easier to start drawing boxes into a UML tool, freezing the design long before it's fully tested. Changing it at the logical model stage is harder, at the physical stage (where there are explicit links and dependencies) and in code much harder still.

For a more advanced perspective on common mistakes, check out this talk from Juha Korpela of Ellie.ai.

That’s a wrap for this week.
Thank you