
Structured and Unstructured Data

Structured and unstructured data are terms that describe the format and model of data. The distinction affects how data is collected, stored and analyzed.

What is structured and unstructured data?

Data is available in a wide range of formats and types. These can be broadly divided into two groups – structured and unstructured data. Each has specific characteristics, advantages and uses.

What is structured data?

Structured data is quantitative data that follows and fits into specific, defined models and formats. Examples of structured data include customer records containing names and addresses, credit card numbers, stock information, geolocation, or numerical answers to surveys. It is normally made up of numbers and values.

Structured data is normally collected in relational databases, spreadsheets or data warehouses. It is produced by business systems such as CRM, ERP or collected via structured webforms.

Structured data has these characteristics:

  • It has an identifiable structure built on a data model
  • It is organized in rows and columns, with fixed fields
  • It is stored in a tabular form, such as in databases
  • Users, both human and machine, can easily understand the meaning of the data and thus access and query it
  • Data points in the same class share the same attributes – for example a telephone number field will always be numerical and have a set number of digits
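
To make these characteristics concrete, here is a minimal sketch using Python's built-in sqlite3 module. The customers table, its fields and the 10-character phone rule are hypothetical and chosen only for illustration; real systems will define their own models.

```python
import sqlite3

# Hypothetical customer table: every row shares the same fixed fields.
conn = sqlite3.connect(":memory:")
conn.execute(
    """
    CREATE TABLE customers (
        id INTEGER PRIMARY KEY,
        name TEXT NOT NULL,
        city TEXT NOT NULL,
        phone TEXT NOT NULL CHECK (length(phone) = 10)
    )
    """
)
conn.executemany(
    "INSERT INTO customers (name, city, phone) VALUES (?, ?, ?)",
    [
        ("Ada Martin", "Paris", "0123456789"),
        ("Luc Bernard", "Lyon", "0456789012"),
        ("Zoe Petit", "Paris", "0678901234"),
    ],
)

# Because the structure is known in advance, a query can address fields by name.
paris_customers = [
    row[0]
    for row in conn.execute(
        "SELECT name FROM customers WHERE city = ? ORDER BY name", ("Paris",)
    )
]

# A row that breaks the model (phone number too short) is rejected.
try:
    conn.execute(
        "INSERT INTO customers (name, city, phone) VALUES (?, ?, ?)",
        ("Bad Row", "Nice", "123"),
    )
    rejected = False
except sqlite3.IntegrityError:
    rejected = True
```

The fixed fields are what make the query possible, and the same rigidity is why a malformed row is refused rather than stored.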

What is unstructured data?

In contrast, unstructured data is qualitative data that does not follow a specific, pre-defined data model or is not organized in a pre-defined manner. Examples of unstructured data include text documents, email text, images, audio files, videos and free text answers within surveys. Essentially, this data is not designed primarily to be analyzed.

Unstructured data is normally collected in NoSQL databases, documents, image libraries, or data lakes. It is produced by tools such as word processors, cameras, sensors, and email programs.

Unstructured data has these characteristics:

  • It has no identifiable structure or data model
  • It has no obvious organization
  • It cannot be easily analyzed for meaning or trends by either machines or humans without special training or tools
  • Data points can vary widely – the same NoSQL database could contain video, audio, image and text files

What is semi-structured data?

As the name suggests, semi-structured data is a hybrid between structured and unstructured data. While it does not have a predefined data model, it uses metadata (such as tags and semantic markers) to enable cataloging, searching and analysis.

Examples of semi-structured data formats are JSON, CSV, and XML. Text on web pages, such as this one, is also semi-structured, as a hierarchy of headings (H1, H2, H3) has been applied. However, this hierarchy does not extend to the actual text on the page, which is unstructured.

CSV files, for example, have some structure (values are separated by a delimiter, usually a comma), which makes them easier to organize and analyze. However, this structure does not follow the defined model you would have in, for example, a full spreadsheet file, where each row and column has defined attributes.
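
The difference can be sketched with Python's standard csv and json modules. The sample records below are hypothetical: they only illustrate that CSV carries a delimiter and header but no types, while JSON keys act as tags and individual records may carry different fields.

```python
import csv
import io
import json

# Hypothetical CSV: the delimiter and header row give some structure,
# but every value arrives as a string with no declared type.
csv_text = "name,age\nAda,36\nLuc,unknown\n"
rows = list(csv.DictReader(io.StringIO(csv_text)))

# Hypothetical JSON: keys act as tags, and records may carry different fields.
json_text = '[{"name": "Ada", "age": 36}, {"name": "Luc", "tags": ["vip"]}]'
records = json.loads(json_text)

# All CSV values are plain strings, including the non-numeric "unknown".
csv_age_types = {type(row["age"]).__name__ for row in rows}

# The two JSON records do not share the same set of keys.
json_keys = [sorted(record.keys()) for record in records]
```

Neither file is rejected for being inconsistent, which is the practical difference from a table with fixed, typed fields.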

While structured data has traditionally been the most generated and used within organizations, analysts believe that unstructured data is now in the majority. IDC predicts that the amount of data in the world will grow from 33 zettabytes in 2018 to 175 zettabytes by 2025. One zettabyte is equivalent to a trillion gigabytes. Of this, 80% will be unstructured.
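
As a rough worked example, the figures quoted above can be turned into a growth factor and an unstructured volume. This assumes decimal units and that the 80% share applies to the 2025 total; the variable names are illustrative.

```python
# Figures quoted in the text (IDC prediction).
GIGABYTES_PER_ZETTABYTE = 10**12  # one zettabyte = a trillion gigabytes
total_2018_zb = 33
total_2025_zb = 175
unstructured_percent = 80  # assumed to apply to the 2025 total

# How many times larger the 2025 volume is than the 2018 volume.
growth_factor = total_2025_zb / total_2018_zb

# Unstructured volume in 2025, in zettabytes and in gigabytes.
unstructured_2025_zb = total_2025_zb * unstructured_percent // 100
unstructured_2025_gb = unstructured_2025_zb * GIGABYTES_PER_ZETTABYTE
```

Under these assumptions the total grows a little more than fivefold, and about 140 of the 175 zettabytes would be unstructured.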

What are the advantages and disadvantages of structured data?

Structured data is designed to exist in a format that makes it easy to be captured, stored, accessed, organized and analyzed. It offers the following advantages and disadvantages:

Advantages:

  • It is easier for business users to understand and analyze without specific training, as it follows logical models
  • It can be quickly and easily consumed and analyzed through machine learning algorithms, automating insights
  • It scales to enable companies to easily store and access large volumes of information.
  • Structured data takes up less storage space than similar amounts of unstructured data
  • There are many mature tools available to collect, store and analyze structured data

Disadvantages:

  • Structured data is designed with a specific, predefined structure. This makes it hard to use for anything other than its intended purpose, limiting its flexibility and usability
  • As it is stored in systems with rigid schemas (such as data warehouses), any change in data requirements means that the existing structured data must be updated to match, which can take significant time and resources

What are the advantages and disadvantages of unstructured data?

Unstructured data is not designed to be analyzed and does not follow conventional data models or defined schema. It offers the following advantages and disadvantages:

Advantages:

  • It is more flexible and adaptable. As it is stored in its native format it remains undefined until it is needed (schema-on-read), allowing a wider range of analysis
  • It can be accumulated and collected faster as it does not need to fit a predefined model
  • It can be stored in more flexible data lakes, rather than rigid data warehouses
  • It can be analyzed using techniques such as natural language processing (NLP) and machine learning to uncover deeper insights (such as the sentiment in a piece of text, or predictions of when a piece of machinery will fail)
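
The schema-on-read idea in the list above can be sketched as follows. The survey answers and the read_delivery_days helper are hypothetical, and a simple regular expression stands in for whatever extraction tool would really be used.

```python
import re

# Hypothetical raw survey answers, stored exactly as received (no schema on write).
raw_answers = [
    "Delivery took 3 days, great service!",
    "Waited 10 days. Not happy.",
    "No complaints.",
]

# Schema-on-read: decide at analysis time which field to extract.
DAYS_PATTERN = re.compile(r"(\d+)\s+days?")


def read_delivery_days(text):
    """Return the number of days mentioned in the text, or None if absent."""
    match = DAYS_PATTERN.search(text)
    return int(match.group(1)) if match else None


delivery_days = [read_delivery_days(answer) for answer in raw_answers]
```

The raw text is left untouched, so a different question (for example, which answers mention "service") could be asked of the same data later without re-collecting it.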

Disadvantages:

  • Analysis requires specialist data science skills and expertise, making it hard for non-specialist business users to gain insights manually from unstructured data
  • It can exist in multiple silos across the organization, particularly in systems and formats that are not designed to be easily accessible (such as email systems or PDF documents)
  • It requires specialist tools to create value, many of which are less mature than their structured data counterparts.
  • As it is stored as media files or in NoSQL databases, unstructured data requires more storage space than structured data.
