Glossary

Structured and Unstructured Data

Structured and unstructured data are terms that describe the format and model of data. The distinction affects how data is collected, stored and analyzed.

What is structured and unstructured data?

Data is available in a wide range of formats and types. These can be broadly broken down into two groups: structured and unstructured data. Each has specific characteristics, advantages and uses.

What is structured data?

Structured data is quantitative data that follows and fits into specific, defined models and formats. Examples of structured data include customer records containing names and addresses, credit card numbers, stock information, geolocation, or numerical answers to surveys. It is normally made up of numbers and values.

Structured data is normally stored in relational databases, spreadsheets or data warehouses. It is produced by business systems such as CRM and ERP, or collected via structured web forms.

Structured data has these characteristics:

It has an identifiable structure built on a data model
It is organized in rows and columns, with fixed fields
It is stored in a tabular form, such as in databases
Users, both human and machine, can easily understand the meaning of the data and thus access and query it
Data points in the same class share the same attributes – for example a telephone number field will always be numerical and have a set number of digits
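
The characteristics above can be illustrated with a minimal sketch using Python's built-in sqlite3 module. The table name, column names and sample rows are hypothetical and chosen only for illustration. Because every row shares the same named fields, a query can target them directly.

```python
import sqlite3

# Hypothetical table and sample rows, used only for illustration.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE customers ("
    "id INTEGER PRIMARY KEY, name TEXT NOT NULL, phone TEXT NOT NULL)"
)
conn.executemany(
    "INSERT INTO customers (name, phone) VALUES (?, ?)",
    [("Ada Lovelace", "0123456789"), ("Alan Turing", "0987654321")],
)

# Every row has the same fixed fields, so the query can refer to them by name.
rows = conn.execute(
    "SELECT name FROM customers WHERE phone LIKE '01%' ORDER BY name"
).fetchall()
print(rows)
conn.close()
```

The fixed schema is what lets both a person and a program know in advance where each value lives and how to filter on it.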

What is unstructured data?

In contrast, unstructured data is qualitative data that does not follow a specific, pre-defined data model or is not organized in a pre-defined manner. Examples of unstructured data include text documents, email text, images, audio files, videos and free text answers within surveys. Essentially, this data is not designed primarily to be analyzed.

Unstructured data is normally collected in NoSQL databases, documents, image libraries, or data lakes. It is produced by tools such as word processors, cameras, sensors, and email programs.

Unstructured data has these characteristics:

It has no identifiable structure or data model
It has no obvious organization
It cannot be easily analyzed for meaning or trends by either machines or humans without special training or tools
Data points can vary widely – the same NoSQL database could contain video, audio, image and text files

What is semi-structured data?

As the name suggests, semi-structured data is a hybrid between structured and unstructured data. While it does not have a predefined data model, it uses metadata (such as tags and semantic markers) to enable cataloging, searching and analysis.

Examples of semi-structured data formats are JSON, CSV, and XML. Text on web pages, such as this one, is also semi-structured, as a hierarchy of headings (H1, H2, H3) has been applied. However, this structure does not extend to the body text on the page, which is unstructured.
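
As a rough illustration of how tags make semi-structured data searchable, the sketch below parses a small JSON record with Python's standard json module. The record and its keys are hypothetical. The keys describe each value, while the free-text comment inside the record remains unstructured.

```python
import json

# Hypothetical record; the keys act as tags that describe each value.
raw = (
    '{"customer": {"name": "Ada", "tags": ["vip", "newsletter"]}, '
    '"comment": "Delivery was quick, thanks!"}'
)
record = json.loads(raw)

# Tagged values can be located by key without a predefined table schema.
name = record["customer"]["name"]
tags = record["customer"]["tags"]

# The comment is easy to find, but its content is still free text.
comment = record["comment"]
print(name, tags, comment)
```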

Take the case of CSV files. They have some structure (values are separated by a delimiter, typically a comma), which makes them easier to organize and analyze. However, this structure does not follow the defined model you would have in, for example, a full spreadsheet file, where each row and column has defined attributes.
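
A short sketch of this limitation, using Python's standard csv module on a hypothetical comma-separated sample: the delimiter and header row give the file a shape, but nothing in the file enforces types or formats, so every value is read as plain text.

```python
import csv
import io

# Hypothetical sample: the second row uses a different date format
# and a word instead of a number, and the file format does not object.
raw = "name,signup_date,orders\nAda,2024-01-05,3\nAlan,05/01/2024,three\n"
rows = list(csv.DictReader(io.StringIO(raw)))

# Collect the Python type of every parsed value.
value_types = {type(value).__name__ for row in rows for value in row.values()}
print(rows[1]["orders"], value_types)
```

Any typing or validation has to be applied by whoever reads the file, which is why CSV sits between structured and unstructured data.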

Traditionally, structured data has been the type most generated and used within organizations, but analysts believe that unstructured data is now in the majority. IDC predicts that the amount of data in the world will grow from 33 zettabytes in 2018 to 175 zettabytes by 2025. One zettabyte is equivalent to a trillion gigabytes. Of this, 80% will be unstructured.
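
The figures above can be checked with a few lines of arithmetic. This sketch assumes decimal (SI) units and assumes that the 80% share applies to the 175 zettabyte forecast for 2025.

```python
# Decimal (SI) unit definitions.
BYTES_PER_GB = 10**9
BYTES_PER_ZB = 10**21

# Gigabytes in one zettabyte: 10**12, i.e. one trillion.
gb_per_zb = BYTES_PER_ZB // BYTES_PER_GB

# Growth from 33 ZB (2018) to 175 ZB (2025).
growth_factor = 175 / 33

# Assumed reading: 80% of the 2025 total is unstructured.
unstructured_zb = 175 * 80 // 100

print(gb_per_zb, round(growth_factor, 1), unstructured_zb)
```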

What are the advantages and disadvantages of structured data?

Structured data is designed to exist in a format that makes it easy to be captured, stored, accessed, organized and analyzed. It offers the following advantages and disadvantages:

Advantages:

It is easier for business users to understand and analyze without specific training, as it follows logical models
It can be quickly and easily consumed and analyzed through machine learning algorithms, automating insights
It scales to enable companies to easily store and access large volumes of information
Structured data takes up less storage space than similar amounts of unstructured data
There are many mature tools available to collect, store and analyze structured data

Disadvantages:

Structured data is designed with a specific, predefined structure. This means it can only be used for its intended purpose, limiting its flexibility and usability
As it is stored in systems with rigid schemas (such as data warehouses), any change in data requirements can mean that all affected structured data must be updated, which can take significant time and resources
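
A minimal sketch of what such a change involves, again using sqlite3 with a hypothetical table. When a new requirement adds a field, the schema must be altered and the existing rows backfilled. On two rows this is trivial; the cost grows with the volume of data and the number of dependent systems.

```python
import sqlite3

# Hypothetical table, used only for illustration.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE customers (id INTEGER PRIMARY KEY, name TEXT)")
conn.executemany("INSERT INTO customers (name) VALUES (?)", [("Ada",), ("Alan",)])

# New requirement: every customer record needs a country.
conn.execute("ALTER TABLE customers ADD COLUMN country TEXT")
count_null = "SELECT COUNT(*) FROM customers WHERE country IS NULL"
before = conn.execute(count_null).fetchone()[0]

# Existing rows have no value for the new column and must be backfilled.
conn.execute("UPDATE customers SET country = 'unknown' WHERE country IS NULL")
after = conn.execute(count_null).fetchone()[0]
print(before, after)
conn.close()
```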

What are the advantages and disadvantages of unstructured data?

Unstructured data is not designed to be analyzed and does not follow conventional data models or defined schema. It offers the following advantages and disadvantages:

Advantages:

It is more flexible and adaptable. As it is stored in its native format it remains undefined until it is needed (schema-on-read), allowing a wider range of analysis
It can be accumulated and collected faster as it does not need to fit a predefined model
It can be stored in more flexible data lakes, rather than rigid data warehouses
It can be analyzed using techniques such as natural language processing (NLP) and predictive analytics to uncover deeper insights (such as the sentiment in a piece of text, or when a piece of machinery is likely to fail)
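
The schema-on-read idea mentioned in the list above can be sketched as follows. Raw records are stored exactly as they arrive, and a structure is applied only when the data is read for a particular analysis. The records, field names and the helper read_with_schema are hypothetical. JSON lines are used here for brevity, although a data lake would also hold media files and free text.

```python
import json

# Raw records stored as-is, with no shared schema (hypothetical examples).
raw_events = [
    '{"user": "ada", "text": "Great service"}',
    '{"user": "alan", "rating": 4, "text": "Slow delivery"}',
    '{"sensor": "pump-1", "temperature_c": 81.5}',
]


def read_with_schema(lines, fields):
    """Project each raw record onto the requested fields at read time."""
    for line in lines:
        record = json.loads(line)
        # Keep only records that contain every requested field.
        if all(field in record for field in fields):
            yield {field: record[field] for field in fields}


# Two different analyses impose two different schemas on the same raw data.
reviews = list(read_with_schema(raw_events, ["user", "text"]))
readings = list(read_with_schema(raw_events, ["sensor", "temperature_c"]))
print(reviews)
print(readings)
```

The same stored data supports both questions because no structure was fixed when it was written.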

Disadvantages:

Analysis requires specialist data science skills and expertise, making it hard for non-specialist business users to gain insights manually from unstructured data
It can exist in multiple silos across the organization, particularly in systems and formats that are not designed to be easily accessible (such as email systems or PDF documents)
It requires specialist tools to create value, many of which are less mature than their structured data counterparts
As it is stored as media files or in NoSQL databases, unstructured data requires more storage space than structured data
