Data lineage

Data lineage (or data traceability) provides full visibility of the data lifecycle inside and outside the organization, including any changes made.

What is data lineage?

As organizations become increasingly data-driven, they have to trust the data that they are working with. Data lineage (also known as data traceability) aims to build this trust by ensuring that there is a full picture of where particular data has come from, how it has been changed, processed, or enriched, where it has been used, who has used it, and where it will go in the future.

Companies need to be able to trace data upstream to its original source and downstream to the end of its lifecycle to ensure quality, good governance, and regulatory compliance. This helps them see how data is being reused, both inside and outside the organization.
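
One way to picture upstream and downstream tracing is as a walk over a directed graph in which each edge points from a source to whatever consumes it. The following is a minimal Python sketch under that assumption. The dataset names (crm_export, sales_dashboard, and so on) and the helper functions trace_downstream and trace_upstream are hypothetical and are not taken from any particular lineage product.

```python
from collections import deque

# Hypothetical lineage edges: each key feeds the datasets listed as its value.
FLOWS = {
    "crm_export": ["customer_clean"],
    "partner_feed": ["customer_clean"],
    "customer_clean": ["sales_dashboard", "billing_system"],
    "sales_dashboard": [],
    "billing_system": [],
}


def trace_downstream(flows, start):
    """Return every dataset reachable from `start` by following the flows."""
    seen = set()
    queue = deque([start])
    while queue:
        node = queue.popleft()
        for target in flows.get(node, []):
            if target not in seen:
                seen.add(target)
                queue.append(target)
    return seen


def trace_upstream(flows, start):
    """Return every dataset that `start` depends on, directly or indirectly."""
    # Invert the edges so the same traversal can walk back toward the sources.
    reverse = {}
    for source, targets in flows.items():
        for target in targets:
            reverse.setdefault(target, []).append(source)
    return trace_downstream(reverse, start)


if __name__ == "__main__":
    print(sorted(trace_upstream(FLOWS, "sales_dashboard")))
    print(sorted(trace_downstream(FLOWS, "crm_export")))
```

In this sketch, tracing upstream from sales_dashboard answers "where did this come from?", while tracing downstream from crm_export answers "what would be affected if this source changed?".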

Data lineage covers the full data lifecycle:

  • The origins of the data, and whether it is internal or external
  • The level of sensitivity of the data (for example, whether it contains personal customer information)
  • The systems it has flowed through
  • Any changes that have been made, including enrichment and standardization to meet governance requirements
  • Who it is shared with (internally and externally) and how it is used (such as for business intelligence, and within operational systems)
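
The items in this list can be thought of as the fields of a lineage record kept for each dataset. The sketch below is one possible shape for such a record in Python, not a standard schema. The class name LineageRecord, its field names, and its two helper methods are all hypothetical, and a single true/false flag stands in for what would usually be a richer sensitivity classification.

```python
from dataclasses import dataclass, field


@dataclass
class LineageRecord:
    """Hypothetical record of one dataset's lifecycle; field names are illustrative."""

    dataset: str
    origin: str                    # where the data was first created
    is_external: bool              # True if it came from outside the organization
    contains_personal_data: bool   # simple stand-in for a sensitivity level
    systems: list = field(default_factory=list)      # systems it has flowed through, in order
    changes: list = field(default_factory=list)      # enrichment / standardization steps
    shared_with: dict = field(default_factory=dict)  # recipient -> how they use the data

    def record_hop(self, system, change=None):
        """Note that the data moved into `system`, optionally with a change applied."""
        self.systems.append(system)
        if change is not None:
            self.changes.append(f"{system}: {change}")

    def external_recipients(self, internal_teams):
        """List recipients that are not in the given set of internal teams."""
        return sorted(r for r in self.shared_with if r not in internal_teams)


if __name__ == "__main__":
    record = LineageRecord(
        dataset="customers",
        origin="crm",
        is_external=False,
        contains_personal_data=True,
    )
    record.record_hop("staging")
    record.record_hop("warehouse", "standardized country codes")
    record.shared_with["bi_team"] = "business intelligence"
    record.shared_with["logistics_partner"] = "operational systems"
    print(record.systems)
    print(record.external_recipients({"bi_team"}))
```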

Data lineage solutions provide a visual representation of the data lifecycle, enabling data administrators to drill down into how data has been created, transformed, moved, and used throughout the organization and its wider external ecosystem.

What is the difference between data lineage and data traceability?

The terms data lineage and data traceability are often used interchangeably as there is no real difference between them. They both describe the same process of understanding the data lifecycle and providing full visibility across it.

A third term – data provenance – refers to the origin of the data, i.e. how and where it was created.

Data lineage/data traceability can be broken down into two areas:

  • Business lineage: Looking at how data has been changed from a business perspective. It provides a simplified view of where data comes from, the policies/processes/standards that were applied to it and how it has been used. This gives business users trust in the data when using it in, for example, decision making.
  • Technical lineage: A more in-depth view of how data moves and transforms between systems, tables and columns, which is normally only understandable by technical/IT users. It covers areas such as the applications data flows through, technical transformations, lookups and staging tables. While too complex for business users, it is vital to ensuring technical data quality and debugging errors in the data sharing process.
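
To make the contrast between the two views concrete, the sketch below starts from column-level (technical) lineage and collapses it into simpler table-to-table flows of the kind a business user might see. It is an illustration only: the system, table and column names, the "system.table.column" naming convention, and the functions table_of and business_view are assumptions. A real business lineage view would also carry the policies and standards applied to the data, which this example leaves out.

```python
# Hypothetical column-level (technical) lineage:
# (source_column, target_column, transformation).
COLUMN_LINEAGE = [
    ("crm.customers.email", "staging.customers.email", "lowercase"),
    ("crm.customers.country", "staging.customers.country_code", "lookup ISO code"),
    ("staging.customers.email", "warehouse.dim_customer.email", "copy"),
    ("staging.customers.country_code", "warehouse.dim_customer.country_code", "copy"),
]


def table_of(column):
    """Drop the final segment of 'system.table.column' to get 'system.table'."""
    return column.rsplit(".", 1)[0]


def business_view(column_lineage):
    """Collapse column-level edges into unique table-to-table flows, in first-seen order."""
    flows = []
    for source, target, _transformation in column_lineage:
        edge = (table_of(source), table_of(target))
        if edge not in flows:
            flows.append(edge)
    return flows


if __name__ == "__main__":
    for source_table, target_table in business_view(COLUMN_LINEAGE):
        print(f"{source_table} -> {target_table}")
```

The four column-level edges reduce to two table-level flows here, which reflects the trade-off described above: the simplified view is easier to read, but the transformation detail needed for debugging lives only in the technical view.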

Why is data lineage important?

Data lineage is vital to delivering confidence in the data that is used to power a business. Strong data lineage allows organizations to:

  • Trust that the data being used for business operations is accurate and high quality, so that any decisions based on it are valid. As companies increasingly introduce advanced analytics and AI that automate decision-making, traceability becomes even more critical.
  • Ensure data governance by tracking and monitoring how data is used (and by whom).
  • Support compliance by being able to prove that data meets both organizational policies and external privacy regulations, such as GDPR. This makes data lineage a key part of risk management when it comes to data.
  • Protect data by understanding the systems it flows through and who has access to it.
  • Enable debugging by highlighting errors that potentially impact data use and flow.
  • Manage technical migrations, such as to the cloud, by modeling data flows and the impact of any technology/system changes on downstream solutions.

What are the challenges to data lineage?

Organizations generate enormous amounts of data, and increasingly add to this with information from partners and their wider ecosystems. This brings five key challenges to data lineage:

  1. Volume and range: the number of different data sources continues to grow as organizations digitize and add more data-producing devices (such as IoT sensors) to their infrastructure. As a result, the amount of data an organization has to manage is growing rapidly, and all of it needs to be fully traceable across its lifecycle.
  2. Speed: data now moves at a much greater velocity within organizations. Whereas in the past weekly or monthly reporting was standard, users now need access to trusted data on a real-time basis.
  3. Compliance: regulators (and consumers) are increasingly focused on ensuring that information, particularly personal data, is used and protected in ways that comply with legislation such as the CCPA and GDPR. Traceability therefore becomes even more important, because it provides the audit trail that regulators may require.
  4. Complexity: all of these factors mean that organizations have a much more complex data environment to manage, again making traceability key.
  5. Collaboration: monitoring data across the organization and, more importantly, with external partners requires open collaboration between departments and organizations to break down silos.
