What is data management? - Practical Guide

1. What is data management?

How is data management defined?

Data management is the end-to-end process of collecting, processing, storing, sharing and using data across an organization and its ecosystem. Data management processes need to be secure, efficient, and compliant with regulations in order to be effective and to add value to the organization.

As data becomes central to organizational success, and as data volumes and complexity increase, data management is now a vital discipline for both the public and private sectors.

An enterprise data management plan has to cover:

  • The creation and collection of data across the organization
  • How and where data is stored, both on-premise and in the cloud
  • Processes to keep data available for use
  • Data privacy, security and backup
  • Compliance requirements, especially around regulations, retention and protecting customer information
  • Data integration, processing, enrichment and transformation
  • Data sharing, both internally through data marketplaces and externally
  • Monitoring performance and driving improvements

What is the difference between data management and master data management?

Data management refers to the entire discipline and practice of managing all data within an organization. By contrast, master data management (MDM) covers how master data is created, shared, updated and used. Master data is non-transactional data that provides context to transactional data by describing it and making it easier to understand and manage. For example, it covers entities such as product names, customer formats or how you describe your financial structures or offices. Master data does not refer to individual transactions but instead describes and helps categorize them. Companies use master data management tools to ensure their master data is consistent across the whole organization.

What data do organizations have to manage?

In an increasingly digital world, data now comes from an ever-growing number and variety of sources both within and outside the organization. More data is being produced, more quickly, from more sources, than ever before.

This data can be:

  • From internal business systems such as finance, sales and CRM
  • Unstructured data such as videos, audio files or word processing documents
  • From the systems of partners within your ecosystem
  • From new external sources such as social media and Internet of Things (IoT) sensors, smart devices or video cameras

As well as the range of sources, the volume of data is increasing exponentially, requiring a data management architecture that can scale to potentially handle terabytes of data (also known as big data management).

Why do organizations have to manage data?

Today, data is critical to business success. To gain value from it, organizations must manage it successfully and efficiently, using strong data management principles. Failing to overcome data management challenges means that the potential value in the data is lost, hurting competitiveness and hampering success.

Failure to successfully manage data leads to:

  • Little knowledge of what data an organization has, where it is stored and how it can potentially be used
  • Poor technical performance across the data stack, including slow data processing
  • Challenges to meeting compliance regulations
  • Reputational damage and cybersecurity risk due to data leaks/poor security
  • Data silos, where data from different systems is not integrated or made available to all
  • Duplication of data and increased storage/processing costs

2. What are the benefits of data management?

In a data-driven world, harnessing data is essential to competitiveness. By following data management best practices, organizations benefit from:

Better decision-making

Through access to more complete information, managers can make better, more informed decisions based on high-quality management data. By having a 360-degree view of business performance, organizations can anticipate potential issues and risks and take preventative action. Access to large volumes of high-quality, complete data from a full range of sources is central to taking advantage of machine learning (ML) and artificial intelligence (AI) algorithms. These automate and scale decision-making and enable organizations to predict future events and respond in real time.

Greater efficiency

Through immediate access to up-to-date data, employees are able to work more productively and effectively. They do not have to waste time searching for different datasets from other departments or systems or manually entering information from other sources. This means they can focus on using data to do their jobs more effectively, helping overall competitiveness and improving the employee experience, as time-consuming manual processes are eliminated. Insights from data can also be used to optimize performance across the business, further increasing efficiency gains.

Reduce costs

Volumes of data are increasing rapidly. All of this information needs to be stored, processed and protected, adding significantly to IT overheads and costs. An effective data management strategy is able to monitor and eliminate duplicate data from across the organization, breaking down data silos between departments. This reduces issues caused by data duplication and means that the organization requires less data processing or storage capacity. Having a single data management strategy can also remove administration time and free up staff for other roles.

Greater innovation

Data is the fuel for increased collaboration between teams, departments and partners. Bringing together data from different areas and systems (such as sales, HR and customer service) powers the creation of new, innovative uses for data, driving the creation of new products and services, new ways of working and better service to customers and citizens. It breaks down barriers between departments and enables knowledge to be shared more widely across the organization, as well as with partners within the wider ecosystem, citizens (in the case of the public sector) and other stakeholders.

Enable compliance

Data, particularly personally identifiable information (PII), is rightly subject to stringent regulations, such as GDPR and CCPA. Organizations must therefore implement data security management to keep information safe and secure, with clear processes and audit trails to demonstrate who has access to datasets and for what purposes. This requires an effective data management strategy to map all data and ensure it is protected and only used for specified purposes. This enables regulatory compliance and protects brand reputation by demonstrating a strong commitment to safeguarding customer and employee data.

Improved customer experience and greater transparency

Understanding and meeting customer needs is crucial to business competitiveness. Improving the customer experience is a continuous process that relies on access to holistic data from across the customer journey. Having a complete picture of customer data through data management enables companies to monitor and improve the experience they offer, and to personalize how they meet the needs of specific customers and groups.

Data is also essential to demonstrating transparency, whether around corporate activities (such as for ESG/CSR reporting) or within the public sector. Customers and citizens want to be provided with a complete picture of how the organizations they interact with are delivering on their objectives and meeting their needs – requiring strong management of data across the business.

3. Data management vs data governance

What is data governance?

Data governance is a key component of an effective data management strategy. It covers the policies and procedures around how you identify, organize, handle, manage, and use the data collected in your organization across its entire lifecycle.

The Data Governance Institute defines data governance as “A system of decision rights and accountabilities for information-related processes, executed according to agreed-upon models which describe who can take what actions with what information, and when, under what circumstances, using what methods.”

How is data governance different from data management? Essentially, governance sets the strategic principles and frameworks that are used to manage data, while data management solutions carry out the tactical work of managing that data.

The benefits of data governance

Data governance delivers both proactive and reactive benefits:

Proactive benefits

It creates new value from data by encouraging data sharing through breaking down silos between departments and providing staff with greater access to data. It also encourages the creation of a data culture by building a shared understanding and vocabulary around data.

Reactive benefits

Data governance is essential to ensuring that all data is secure, well protected, accurate, compliant with regulations, and accessed and used correctly. This manages and reduces risks around data.

Structuring a successful data governance program

Many data governance programs fail to deliver results, due to a perceived lack of business value and internal resistance from departments who see it as interference in their activities. This means that companies fail to gain the full benefits of their data management strategy.

What structure and competencies do you need for effective data governance?

Overcoming these challenges requires organizations to adopt the right structure for their program and to gain internal buy-in from both management and other teams. Putting the right structure in place ensures that data governance programs are linked to business objectives and have the broadest possible involvement from data owners from within the company.

Data governance teams must:

  • Be headed by a senior leader, such as the Chief Data Officer (CDO) or equivalent. They must be directly involved in the program and have overall responsibility for its success. They must champion the project internally.
  • Involve other senior management to set strategy and monitor performance through a Data Council. It is important that senior management are seen as highly visible supporters of the project.
  • Be managed day-to-day by a dedicated Data Governance Officer with a combination of technical, business and collaboration skills
  • Involve data owners and practitioners from across the organization, responsible for ongoing activities and the setting and enforcement of rules

What framework should you adopt for data governance?

The details of data governance frameworks will differ between organizations, depending on their specific maturity, needs and industry. However, they should all contain certain key elements to ensure data is effectively managed from end-to-end across your organization:

  • Clear aims, objectives, and responsibilities, focused on the business value governance provides
  • Detailed policies covering data discovery, availability, integrity, quality, security, usability, and access/sharing
  • A common way and vocabulary to describe data across the organization, for example to ensure that terms are used correctly between departments
  • A set of rules to cover how data is handled, standards for quality, metadata definitions, access rights and usage
  • An organizational model/structure and team to manage data and enforce governance
  • A full communication and training program for all users on what data governance means and why the program is important
  • Ongoing measurement of activities against business-focused KPIs

Read more about achieving data governance success in our blog.

4. The data management lifecycle – step by step

The data management process covers the entire end-to-end process, from first creating or collecting raw data all the way through to its usage and sharing across the organization and beyond. Organizations use a range of data management tools for this data lifecycle management.

Data collection

Data is created or collected from individual sources. These could be business systems (such as CRM, HR or sales), production systems (machines in factories), IoT sensors (for example collecting traffic or environmental data), or third-party data (provided by partners through APIs or other means). At this stage data should be checked to ensure it meets data quality standards and data governance policies.

Data integration

Data from multiple sources and different systems is combined into a single repository, such as a data warehouse or a data lake, breaking down data silos. This enables easier cross-referencing and the elimination of duplicate data. As it enables better reporting and predictive analysis, data integration simplifies the decision-making chain for organizations.
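
As a rough illustration of the integration step, the sketch below merges records from several systems into one record per customer. The source names ("crm", "billing"), the field names and the "first source to supply a field wins" rule are all assumptions made for the example; a real integration pipeline would apply its own matching and conflict rules.

```python
def integrate(sources, key):
    """Merge records from several systems into one record per key value.

    `sources` maps a (hypothetical) system name to its list of records.
    """
    merged = {}
    for source_name, rows in sources.items():
        for row in rows:
            record = merged.setdefault(row[key], {key: row[key], "sources": []})
            for field, value in row.items():
                # Assumed conflict rule: the first source to supply a field wins.
                record.setdefault(field, value)
            # Keep track of which systems contributed to the merged record.
            record["sources"].append(source_name)
    return list(merged.values())


sources = {
    "crm": [{"customer_id": 1, "name": "Ana"}],
    "billing": [
        {"customer_id": 1, "plan": "pro"},
        {"customer_id": 2, "plan": "free"},
    ],
}
customers = integrate(sources, "customer_id")
```

Here customer 1 ends up as a single record that combines the CRM name with the billing plan, rather than as two separate entries held in two silos.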

Data preparation

Data is prepared for use. It is first cleaned (data cleansing), which involves identifying and fixing incorrect, incomplete, duplicate, unneeded, or otherwise erroneous data in a dataset. Processors are then used to standardize formatting (for example ensuring all dates are in the same format) and to anonymize any personally identifiable information.
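
The preparation steps described above can be sketched in a few lines of Python. This is only a minimal illustration under stated assumptions: the field names ("email", "signup_date"), the list of accepted date formats and the salt value are invented for the example. Note also that a salted hash pseudonymizes an identifier rather than fully anonymizing it, so stronger techniques may be needed depending on your compliance requirements.

```python
import hashlib
from datetime import datetime

# Date formats assumed to appear in the raw data (illustrative only).
KNOWN_DATE_FORMATS = ("%Y-%m-%d", "%d/%m/%Y")


def standardize_date(value):
    """Return the date as YYYY-MM-DD, or None if it cannot be parsed."""
    for fmt in KNOWN_DATE_FORMATS:
        try:
            return datetime.strptime(value.strip(), fmt).date().isoformat()
        except ValueError:
            continue
    return None


def pseudonymize(value, salt):
    """Replace an identifier with a shortened, salted SHA-256 digest."""
    digest = hashlib.sha256((salt + value.lower()).encode("utf-8"))
    return digest.hexdigest()[:16]


def prepare(records, salt="example-salt"):
    """Drop incomplete and duplicate rows, standardize dates, mask emails."""
    seen = set()
    cleaned = []
    for row in records:
        email = (row.get("email") or "").strip()
        signup = standardize_date(row.get("signup_date") or "")
        if not email or signup is None:
            continue  # incomplete or unparseable record
        if email.lower() in seen:
            continue  # duplicate record
        seen.add(email.lower())
        cleaned.append(
            {"customer_id": pseudonymize(email, salt), "signup_date": signup}
        )
    return cleaned


raw = [
    {"email": "Ana@example.com", "signup_date": "2023-03-05"},
    {"email": "ana@example.com ", "signup_date": "05/03/2023"},  # duplicate
    {"email": "", "signup_date": "2023-01-01"},                  # incomplete
    {"email": "bo@example.com", "signup_date": "not a date"},    # bad date
    {"email": "cy@example.com", "signup_date": "07/04/2022"},
]
prepared = prepare(raw)
```

Of the five raw rows, only two survive, both with dates in a single format and with the email address replaced by an opaque identifier.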

Data enrichment

In the data enrichment/data transformation step, additional information is added to existing datasets to make them more usable and valuable. For example, organizations can add geographic, weather or other reference data to provide context to the data. The Opendatasoft Data hub contains over 30,000 datasets that can be used to enrich your own datasets.
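
To make the idea of enrichment concrete, here is a small sketch that adds geographic reference data to a list of store records by matching on postcode. The records, the reference table and the field names are hypothetical, and the function is a generic lookup join rather than the behavior of any particular platform.

```python
def enrich(records, reference, key):
    """Add reference attributes to each record that has a matching key.

    Records without a match are kept unchanged, so enrichment never
    drops data.
    """
    enriched = []
    for row in records:
        extra = reference.get(row.get(key), {})
        # Values already present in the record win over reference values.
        enriched.append({**extra, **row})
    return enriched


stores = [
    {"store": "A", "postcode": "75001"},
    {"store": "B", "postcode": "99999"},  # no reference entry available
]
geo_reference = {"75001": {"city": "Paris", "country": "France"}}

enriched_stores = enrich(stores, geo_reference, "postcode")
```

Store A gains a city and a country, while store B passes through untouched because no reference data exists for its postcode.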

Data sharing

If data remains in the hands of data analysts it does not unlock its full value. It has to be shared more widely amongst non-experts, whether business decision-makers, citizens or employees who require it for their day-to-day jobs. This area is often neglected in data management strategies, reducing ROI from data. What is needed is to democratize data so that it can be shared and used by everyone.

Data lineage

To improve data management it is vital to understand how data flows across your organization, and how and where it is used. Data lineage tools and dashboards provide this insight, delivering full traceability over data, better understanding the needs of users, demonstrating impact and performance against KPIs, while enabling data management strategies to be continually optimized and improved.
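
One simple way to picture lineage is as a graph that records, for each dataset, the datasets it was built from. The sketch below walks such a graph to list everything a given dataset depends on. The dataset names and the dictionary structure are assumptions for illustration; dedicated lineage tools capture this information automatically and in far more detail.

```python
def upstream(dataset, lineage):
    """Return every dataset that `dataset` is derived from, directly or indirectly."""
    found = set()
    stack = [dataset]
    while stack:
        for parent in lineage.get(stack.pop(), []):
            if parent not in found:
                found.add(parent)
                stack.append(parent)
    return found


# Hypothetical lineage: each dataset maps to the datasets it is built from.
lineage = {
    "sales_dashboard": ["sales_enriched"],
    "sales_enriched": ["crm_export", "weather_reference"],
    "crm_export": [],
}

dependencies = upstream("sales_dashboard", lineage)
```

Tracing the dashboard back to the CRM export and the weather reference data shows at a glance which sources would affect it if their quality or availability changed.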

5. The essential step of data democratization

We live in a data-driven world, and access to usable information in understandable formats is vital for businesses, employees, citizens and partners. Data cannot be left solely in the hands of experts, such as business intelligence analysts, or require specific skills to access it and make sense of it. That is driving the importance of data democratization, the process of seamlessly sharing data with all in ways that they can understand and make use of. There is more on data democratization and its importance in this blog.

The challenges to data democratization

Unfortunately too many organizations currently lack the data management capabilities to truly democratize their data.

Firstly, they have not yet created a data-centric culture within the organization where data is seen as a crucial resource for all, and where staff are confident in accessing, understanding and reusing data in their daily working lives. Overcoming this obstacle requires a focus on training and culture to build confidence and ensure your people are data-driven.

Secondly, data is simply not easily available in formats and locations that people can immediately access and use. Employees, citizens and partners struggle to find the data they require, as it is scattered across the organization in multiple silos and systems and is not easy to understand or work with.

Enabling data democratization

Solving this challenge requires organizations to centralize access to their data, creating a one stop shop that contains all available datasets, along with the tools to reuse this information. Depending on the audience for the data, there are three main types of data portal used to centralize and share data:

Internal data marketplace or portal

Creating an internal data marketplace is a powerful way of bringing together all datasets from across the organization and making them available to employees. An internal data portal has to have an engaging user interface, make it simple to find data and provide employees with confidence that data is accurate, high quality and meets their needs. To protect confidentiality and prevent misuse, internal data portals should have built-in access controls, ensuring that employees cannot view personal or other data that is not relevant to their role.
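
The access controls mentioned above can be thought of as a filter between the catalog and each employee. The following sketch assumes a very simple role-based model in which each dataset lists the roles allowed to view it; the dataset names, role names and catalog structure are hypothetical, and a production portal would normally rely on its own permission system.

```python
# Hypothetical catalog: each dataset lists the roles allowed to view it.
# An empty set means the dataset is open to every employee.
CATALOG = {
    "office_locations": set(),
    "sales_pipeline": {"sales", "management"},
    "payroll": {"hr"},
}


def visible_datasets(user_roles, catalog=CATALOG):
    """Return the names of the datasets a user may see, in alphabetical order."""
    roles = set(user_roles)
    return sorted(
        name
        for name, allowed in catalog.items()
        if not allowed or roles & allowed
    )


sales_view = visible_datasets(["sales"])
```

A member of the sales team sees the office locations and the sales pipeline, but the payroll dataset never appears in their search results.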

Partner data marketplace or portal

Solving business and societal challenges, such as around decarbonization, requires an ecosystem approach. Organizations need to work with their suppliers, partners and other stakeholders, collaborating through data. Creating a partner data marketplace enables this collaboration, providing datasets that can be enriched or used by partners, deepening relationships and driving innovation. Partner data portals also enable organizations to monetize their data, creating new services that they provide to partners and customers, either as datasets or through dashboards. These add value to their offerings and create new revenue streams.

Open data portal

An online portal available to everyone through the internet, an open data portal shares all datasets with the world. Originally created by public sector bodies to increase transparency by sharing information on their activities with citizens, open data portals are increasingly being rolled out by businesses. They recognize the benefits of being open to demonstrate how they are performing against key metrics, such as their ESG and CSR goals. An open data portal should be comprehensive and easy to use, with a strong search function.

Making it easy to reuse data

Simply making data available through a portal is not enough to deliver data democratization. It must be easy to:

  • Visualize the data, through a full suite of data visualization tools that can be used to create interactive maps, informative dashboards, compelling graphics and understandable, relatable data stories. These have to be easy to use by non-specialists, and include drag and drop/no code options that accelerate the creation of visualizations within the organization. Tools should provide real-time previews to demonstrate the impact of changes and be able to create 100% responsive visualizations that fit all screen sizes.
  • Download the data, in a full range of open formats, such as text files or .CSV spreadsheets as well as commonly used formats such as .doc and .XLS. For more advanced users looking to automate access to data, provide APIs for each dataset. This allows users to link data to their own systems and makes it easy to download bulk data for detailed analysis. APIs also enable AI tools to automatically access and download data, without requiring human intervention.
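
To show what automated access through an API might look like, here is a short sketch that downloads a dataset as CSV and turns it into a list of records. The endpoint pattern and domain are purely hypothetical placeholders, so check your portal's API documentation for the real URLs, parameters and authentication requirements.

```python
import csv
import io
from urllib.request import urlopen

# Hypothetical endpoint pattern; a real portal documents its own URLs.
EXPORT_URL = "https://data.example.org/api/datasets/{dataset_id}/export.csv"


def parse_csv(text):
    """Turn CSV text into a list of dictionaries keyed by column name."""
    return list(csv.DictReader(io.StringIO(text)))


def download_dataset(dataset_id, timeout=30):
    """Fetch a dataset from the (assumed) export endpoint and parse it."""
    url = EXPORT_URL.format(dataset_id=dataset_id)
    with urlopen(url, timeout=timeout) as response:
        return parse_csv(response.read().decode("utf-8"))


# Parsing works the same way on any CSV text, for example a saved export:
sample_rows = parse_csv("station,pm25\nA,12\nB,7\n")
```

Because every value arrives as text, a script like this would usually convert numeric columns before analysis. The same function can be scheduled to run regularly, which is what allows other systems and AI tools to stay up to date without manual downloads.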

6. Data management: how do you build the best technology stack?

Successful data management requires a combination of strategy, process and powerful data management technologies.

It is essential to create a flexible, scalable and secure data stack that covers the end-to-end data management process across the organization. This requires an overall data architecture and individual tools for data management within the stack.

Data architecture

How you structure your data architecture has a major impact on your data management platform. The data architecture sets out the infrastructure behind your data management program and is crucial to its success.

An organization’s IT architecture must allow it to share data and make it accessible to everyone. Data must be easily accessible, and not stuck in silos. However, if architectures are too centralized and concentrate activities within a central team, there is a risk that departments will not actively participate in data sharing, damaging your chances of success.

The Data mesh model

Increasingly organizations are looking to adopt data mesh architectures. These are designed to underpin data democratization by decentralizing responsibilities for particular data to those that are closest to them, but backed up by agreed, company-wide governance and metadata standards to ensure interoperability, with the architecture enabled by a shared self-service data infrastructure.

Rather than being a technology or a set of tools, data mesh provides a framework and guidelines to help organizations work with data in the most optimized way, focused around three building blocks:

  • Distributed data products owned by independent cross-functional teams which can contain both embedded data engineers and data product owners
  • Centralized governance to ensure interoperability, consistency and security
  • A common data infrastructure to host, enhance and share data

Data mesh makes it easier to find and share high-quality data and turn it into data products for internal or external use. To learn more about the architecture read our blog “What is data mesh and why is it vital to data democratization?”.

Security

Clearly, data needs to be protected at all times, from both external and internal threats. Your data management policy and architecture must have security at its core. Invest in data privacy management tools to keep information anonymous, and audit security at all stages of the data management process, from collection and storage to preparation and sharing. As part of security compliance, manage who has access to which datasets, providing an audit trail and preventing unauthorized usage.

Data solutions

The data stack will contain a variety of data management products. It is essential that these all work together seamlessly to ensure interoperability, seamless processing and protection for your data. The data stack normally includes:

  • Data creation: systems creating data, such as CRM solutions for customer data management
  • Data collection: solutions that bring data together to make them easier to analyze. Options include:
    • Data warehouse – a storage space solely for structured data. This makes it easier to manage, but makes exploiting data less flexible.
    • Data lake – a storage space that contains all of an organization’s data in its raw form. This gives flexibility in analysis but can be difficult to govern.
    • Data lakehouse – a hybrid model between a data lake and a data warehouse. This adds some structure to unstructured data to make it easier to find and use.
    • Both data warehouses and data lakes require specialist skills and a high level of resource to implement and maintain, as our blog comparing the two technologies explains.
  • Data management platforms (DMPs): Software platforms used to collect and manage internal and external data in order to target customers.
  • Business intelligence (BI) tools: Once data has been prepared, data analysts use business intelligence tools to run reports and queries on it. BI tools are designed to be used by experts, meaning that the vast majority of people do not have the skills or training to operate them. This limits data democratization.
  • Data visualization tools: Similar to BI tools, data visualization software allows users to turn tabular data into maps, dashboards and other graphical forms. These make it more compelling and understandable to users. However, as with BI tools, many data visualization platforms require expertise and training to be used correctly.

While organizations have invested heavily in their data stack in terms of both technology licenses and skilled experts to run their data management systems, they still struggle to get full value from their data. This is because their stack lacks a data experience layer that enables them to effectively share and democratize their data at scale.

A data experience tool sits within your data management software stack and includes key features that allow you to:

  • Connect to all data sources to bring data together, wherever it has been created or stored
  • Enhance data to improve quality by applying processors to standardize and enrich data with external datasets
  • Publish data through powerful features that safeguard quality and make management and administration simple and straightforward. The tool should automate key tasks, such as the creation and sharing of metadata to increase efficiency
  • Visualize data through easy-to-use no code tools that mean non-experts can quickly create compelling data visualizations
  • Share data in ways that promote greater reuse by all, from APIs for expert users, to downloads in standard file formats, to shareable links, all available on open data portals, internal data marketplaces and partner portals
  • Analyze how data is being used and shared through data lineage capabilities. This helps understand data flows within your organization and therefore improve data management.

7. Examples of customers that manage the data lifecycle with Opendatasoft

Schneider Electric – creating internal and external data experiences

Schneider Electric uses a range of solutions to manage data across its lifecycle, ensuring it meets governance and sharing requirements. This includes cloud storage through Microsoft Azure and its Databricks data lakehouse solution. These feed data into the Opendatasoft platform, where Schneider Electric creates data experiences which are shared through its Exchange data and services marketplace.

Ministry of Sport – data management to improve decision-making

The French Ministry of Sport collects a range of internal and external data, from government agencies including the Ministry of Health and the French National Statistics Authority (INSEE). These are all centralized within the Opendatasoft platform where they are prepared and enriched. Data is then shared through visualizations and reuses with political decision-makers, partners, businesses and the general public.  

Lamie mutuelle – centralizing data to enable sharing

Insurance company Lamie has made Opendatasoft the cornerstone of its data management strategy. All data, whether collected internally or externally by partner organizations, is centralized on the ODS platform. It is then shared via the platform in multiple formats, such as via dashboards for employees, with its Zoho CRM solution and other internal systems via APIs and with customers through its dedicated customer and partner portals.
