Dataset

A dataset is a collection of related data points, providing data in an understandable form to be shared and reused internally and externally.

What is a dataset?

A dataset (or data set) is a collection of related data points, stored in the same location, such as a table.

Each data point can be text, numbers, geographical information, or multimedia (such as an image or video).

For example, a simple, tabular dataset created by a retailer might include columns representing variables – the type of clothes, color, and stock levels. The rows then represent the values of each item, as shown in the example below:

Type    Color   Stock level
Shirt   Blue    4
Socks   Black   8
Hat     Green   2

Data within a dataset is described using a hierarchy, going from smallest to largest:

  • Data point: The smallest element of data, which cannot be further subdivided. “Shirt”, “Black” or “2” are all data points in the table above.
  • Data object: A collection of grouped, related data points that fit together. For example, “Blue shirt with 4 in stock” is a data object.
  • Dataset: All of the data within the table.

Each data point within the dataset can be accessed individually and all of them share the same theme – in the example above all data points relate to clothes inventory.
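The hierarchy can be sketched in a few lines of Python using the table above. This is only an illustration: the choice of a list of dictionaries and the names `COLUMNS`, `dataset`, `data_object` and `data_point` are assumptions made for the example, not part of any particular platform.

```python
# Column names (the variables) of the example table.
COLUMNS = ("type", "color", "stock_level")

# The dataset: all of the data in the table, one dict per row.
dataset = [
    {"type": "Shirt", "color": "Blue", "stock_level": 4},
    {"type": "Socks", "color": "Black", "stock_level": 8},
    {"type": "Hat", "color": "Green", "stock_level": 2},
]

# A data object: one row of grouped, related data points.
data_object = dataset[0]

# A data point: a single value, accessed by row index and column name.
data_point = dataset[1]["color"]

print(data_object)                  # {'type': 'Shirt', 'color': 'Blue', 'stock_level': 4}
print(data_point)                   # Black
print(len(dataset) * len(COLUMNS))  # 9 data points in total
```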

Different datasets might be related, with these relationships described through data schemas. In our example, a second dataset might include the date and price of a sale of one of the items of clothing in the first dataset. The data schema explains how the two datasets interrelate.
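As a rough sketch of how a schema links two datasets, the Python below joins a sales dataset to the inventory table. The sale dates and prices are invented for illustration, and the assumption that a sale refers to an inventory item by its type and color (the `KEY` tuple) is ours; a real schema would state its own linking fields.

```python
from datetime import date

# First dataset: the inventory table from the example above.
inventory = [
    {"type": "Shirt", "color": "Blue", "stock_level": 4},
    {"type": "Socks", "color": "Black", "stock_level": 8},
    {"type": "Hat", "color": "Green", "stock_level": 2},
]

# Hypothetical second dataset: dates and prices are invented for illustration.
sales = [
    {"type": "Shirt", "color": "Blue", "sale_date": date(2023, 1, 5), "price": 19.99},
    {"type": "Hat", "color": "Green", "sale_date": date(2023, 1, 6), "price": 12.50},
]

# Assumed schema: a sale points to an inventory item through (type, color).
KEY = ("type", "color")


def join_sales_to_inventory(sales_rows, inventory_rows):
    """Attach the current stock level to every sale whose item is in inventory."""
    index = {tuple(item[k] for k in KEY): item for item in inventory_rows}
    joined = []
    for sale in sales_rows:
        item = index.get(tuple(sale[k] for k in KEY))
        if item is not None:  # sales of unknown items are skipped
            joined.append({**sale, "stock_level": item["stock_level"]})
    return joined


for row in join_sales_to_inventory(sales, inventory):
    print(row["type"], row["sale_date"].isoformat(), row["price"], row["stock_level"])
```

Without the shared key, the two tables could not be combined reliably, which is why the relationship needs to be documented in the schema.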

How can you reuse a dataset?

Datasets are intended to be shared, whether internally or externally. They therefore require supporting elements and tools to allow their reuse.

Metadata

Metadata is the information that describes the dataset: license, creation and modification dates, producer, data model used, etc. It lets reusers judge the reliability and quality of the dataset. Some business sectors require specific metadata to meet interoperability needs.
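A metadata record can be pictured as a small structured object attached to the dataset. The sketch below is a minimal illustration in Python: the class `DatasetMetadata`, its field names, the helper `missing_fields` and every example value are hypothetical, and real portals follow their own metadata standards.

```python
from dataclasses import dataclass, asdict
from datetime import date


@dataclass(frozen=True)
class DatasetMetadata:
    """Descriptive information published alongside a dataset (illustrative fields)."""
    title: str
    license: str
    created: date
    modified: date
    producer: str
    data_model: str


def missing_fields(meta: DatasetMetadata) -> list[str]:
    """Return the names of fields that were left empty."""
    return [name for name, value in asdict(meta).items() if value in ("", None)]


# All values below are made up for the example.
meta = DatasetMetadata(
    title="Clothes inventory",
    license="",  # deliberately left empty
    created=date(2023, 1, 1),
    modified=date(2023, 1, 15),
    producer="Example Retailer",
    data_model="type, color, stock level",
)

print(missing_fields(meta))  # ['license']
```

A check of this kind is one simple way to flag datasets whose metadata is too incomplete for a reuser to trust.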

Data visualization

In its raw form, a dataset can be difficult to analyze. That’s why most datasets shared by organizations are accompanied by data visualizations, or at least by tools to create them. These can be simple views like maps or graphs, or more advanced formats like dashboards or data stories.
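To illustrate why even a very simple view helps, the sketch below turns the clothes inventory table into a text bar chart. The function `text_bar_chart` is a hypothetical helper written for this example; real platforms provide far richer charting tools.

```python
inventory = [
    {"type": "Shirt", "color": "Blue", "stock_level": 4},
    {"type": "Socks", "color": "Black", "stock_level": 8},
    {"type": "Hat", "color": "Green", "stock_level": 2},
]


def text_bar_chart(rows, label_key, value_key):
    """Render one bar of '#' characters per row, with labels padded to equal width."""
    width = max(len(str(row[label_key])) for row in rows)
    lines = []
    for row in rows:
        label = str(row[label_key]).ljust(width)
        value = row[value_key]
        lines.append(f"{label} | {'#' * value} {value}")
    return "\n".join(lines)


print(text_bar_chart(inventory, "type", "stock_level"))
# Shirt | #### 4
# Socks | ######## 8
# Hat   | ## 2
```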

APIs

APIs are essential when retrieving large datasets in real time, and are generally provided by the producer of the dataset. Once connected, they allow you to automate the retrieval of information that is always up to date.

What can datasets be used for?

Datasets are essential to creating value from data. Consequently, the number and size of the datasets that an organization has collected and made available, internally and externally, is a measure of how advanced its data sharing strategy is.

Internal uses to improve efficiency

  • by data experts: Datasets can be collected in data warehouses or data lakes and then queried and analyzed using business intelligence tools.
  • through self-service: Datasets can be made available through a central data catalog to everyone within the organization, supporting better decision-making and improved operations.
  • for training AI: Artificial intelligence algorithms learn the relationships between data points within datasets, which allows them to make more informed decisions. Training them therefore requires access to very large volumes of data, from one or more datasets.

External uses to increase transparency

  • through open data: Public open data portals typically contain a large number of datasets, grouped into specific areas or themes. For example, UK Power Networks’ open data portal contains 39 datasets. These vary in size – one contains a complete list of its electricity distribution pylons (containing over 47,000 data points), and another is a list of all local authorities in its distribution area (116 records).
  • for hackathons/competitions: Sharing datasets with the wider community not only increases transparency but also opens up opportunities for innovation. Releasing specific datasets for hackathons or competitions invites new ideas from inside and outside the organization.

External uses to create new services

  • with a specific ecosystem: Datasets can be shared externally, either with a specific partner or with a wider but closed ecosystem. Schneider Electric’s Exchange data marketplace shares 195 energy-related datasets with 540 users from 200 companies, which increases value for its partners and enables the company to launch new data services.
