Open Data as Terraces

Open data is nothing more than a way to spread data. As there are plenty of different kinds of distribution networks, we can imagine several ways to grant easy access to data that are ready to be reused. We’ve been working on such a distribution architecture, and it looks like terraces.

Head Of Engineering, Opendatasoft

“A terrace is a piece of sloped plane that has been cut into a series of successively receding flat surfaces or platforms, which resemble steps, for the purposes of more effective farming”. Wikipedia

Among all the pros of terraces, there are three main characteristics that may, in my opinion, inspire us. These three points echo the issues that someone in charge of a data portal can face.

  • Better exposure. When it comes to data, reaching the public and re-users can be quite difficult. Google is a natural gateway, but its algorithms favor data coming from organizations that are good at SEO, not necessarily those that are good at releasing data.
  • Better irrigation. When you manage data, you know that everything has a cost and every step requires work. So anything that allows you to better manage resources and automate work is interesting.
  • Simplifies a cultivator’s work, well at least the ground is flat. If we want more open data, we have to make the lives of the people in charge easier.

Classical data distribution is pretty simple, generally coming in the form of a catalog of licensed datasets. Problems begin when you have a large number of datasets and multiple formats, and even more so when various organizations share their data in the same place. It looks like a lake on top of a mountain or a volcano. It’s a real passion for a lot of people: the trek is enjoyable, and what one sees at the top can be really interesting. But still, the top may be out of reach for most people.

Imagine someone who wants to compare his or her city with another one:

  • Step1: Find and go to the portal
  • Step2: search for the data
  • Step3: What format?!
  • Step4: OK, I know that Excel logo
  • Step5: open that file
  • Step6: How do I filter the data?
  • Step7: what’s my area ID??
  • Step20: reuse the data…
  • ಠ_ಠ

Others have tried to develop more advanced, user-friendly open data portals. When the data are indexed before distribution, filtering and searching become far more effective and the data are easier to find. Since they are indexed, you can let users try building maps or charts. They can truly see what the data look like before downloading them. That’s what we do at Opendatasoft, but we are not the only ones.

How we like to see the data distribution on our portals
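
To make the idea of indexing before distribution more concrete, here is a minimal sketch in Python. It is not how Opendatasoft's platform works internally; the function names, field names and sample records are all hypothetical and chosen only for illustration. The index is built once, so that a filter becomes a lookup instead of a scan over every row.

```python
from collections import defaultdict


def build_index(records, fields):
    """Build a (field, value) -> row positions index, once, before distribution."""
    index = defaultdict(list)
    for pos, record in enumerate(records):
        for field in fields:
            index[(field, record[field])].append(pos)
    return index


def filter_records(records, index, field, value):
    """Answer a filter with a dictionary lookup instead of scanning every row."""
    return [records[pos] for pos in index.get((field, value), [])]


# Hypothetical sample data, for illustration only.
records = [
    {"city": "A", "year": 2015, "trees_planted": 120},
    {"city": "B", "year": 2015, "trees_planted": 80},
    {"city": "A", "year": 2016, "trees_planted": 150},
]

index = build_index(records, fields=["city", "year"])
rows_for_city_a = filter_records(records, index, "city", "A")
print(len(rows_for_city_a))
```

A real portal would rely on a search engine rather than an in-memory dictionary, but the principle is the same: the work is done before the user asks.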

The open data movement is still very young, and we may not have found the perfect way to distribute data in every case.

So we decided to craft a new architecture for open data portals.

The whole architecture is based on the concept of sub-domains. Our customers are now able to generate sub-domains from their open data portal’s back office. These sub-domains can be shaped to match a company’s departments, to mirror administrative divisions, or to allow an industry to share data with various actors within its territory.

The sub-domains are real open data portals in their own right: they are full-featured, they have the same graphic charter as the principal portal, and they have their own administrators, their own data and their own ways to advertise the data.

All of this is quite interesting, but I think the key feature here is the ability to federate data in both directions. The principal portal can distribute datasets or parts of datasets (think ‘at the right granularity’) to each of the sub-portals. And if any of the sub-portals loads a new piece of data, the principal one can unify it. This is also a new way to decentralize and distribute data collection!

Architecture of the sub-domain way to deal with data distribution
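
The two-way federation described above can be sketched in a few lines of Python. This is a toy model under stated assumptions, not the actual Opendatasoft implementation: the `Portal` class, the `distribute` and `unify` functions, and the sample records are all hypothetical. Here each sub-portal is assumed to receive the records whose `district` value matches its name.

```python
from dataclasses import dataclass, field


@dataclass
class Portal:
    """A hypothetical portal: a name plus the records it holds."""
    name: str
    records: list = field(default_factory=list)

    def add(self, record):
        # Skip records the portal already holds, so syncing twice is harmless.
        if record not in self.records:
            self.records.append(record)


def distribute(principal, subs, key):
    """Principal -> sub-portals: each sub-portal receives only its own slice."""
    for sub in subs:
        for record in principal.records:
            if record[key] == sub.name:
                sub.add(record)


def unify(principal, subs):
    """Sub-portals -> principal: pull back whatever was loaded locally."""
    for sub in subs:
        for record in sub.records:
            principal.add(record)


# Hypothetical data, for illustration only.
region = Portal("region", [
    {"district": "north", "bike_racks": 12},
    {"district": "south", "bike_racks": 7},
])
north, south = Portal("north"), Portal("south")

distribute(region, [north, south], key="district")
south.add({"district": "south", "bike_racks": 9})  # loaded by the sub-portal itself
unify(region, [north, south])
print(len(region.records))
```

The point of the sketch is the symmetry: the same records can flow down at the right granularity and flow back up once a sub-portal has enriched them.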

Success follows a Power Law

A lot of things in life follow a Power Law, including the distribution of downloads among open datasets. This becomes even more true when we speak about information technologies: in an economy where anything can be commoditized in a few years, network effects and data-network effects are stronger and exacerbate the distribution.
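
To give a feel for how skewed such a distribution can be, here is a small Python sketch. It assumes, purely for illustration, that downloads follow a Zipf-like law where the dataset ranked r gets a share proportional to 1 / r^s; the exponent s = 1 and the catalog size of 1,000 datasets are assumptions, not measurements from any real portal.

```python
def zipf_shares(n_datasets, exponent=1.0):
    """Share of downloads per dataset, most popular first, under a Zipf-like law."""
    weights = [1 / rank ** exponent for rank in range(1, n_datasets + 1)]
    total = sum(weights)
    return [w / total for w in weights]


def top_share(shares, fraction):
    """Cumulated share of downloads captured by the top `fraction` of datasets."""
    k = max(1, int(len(shares) * fraction))
    return sum(shares[:k])


# Assumed catalog size and exponent, for illustration only.
shares = zipf_shares(1000)
print(f"Top 10% of datasets: {top_share(shares, 0.10):.0%} of downloads")
```

Under these assumptions, a small fraction of the catalog captures well over half of the downloads, and you cannot know in advance which datasets will end up at the top.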

That is one of the key factors in open data’s strength. As you don’t know who will have the best design to serve your citizens or re-users, you create the infrastructure so that anyone can use the data and develop their own solution.

By both creating new distribution channels and decentralizing the ability to leverage the data, you increase your chances of seeing new services created around your data, and you strengthen your position in the ecosystem.

Open data has always been about empowering people: citizens, the people administering the portal, and gradually more and more people within an organization consuming the data.
Releasing data empowers people. Giving certain people the opportunity to handle a data portal all by themselves, with access to all the necessary tools for data cleaning, ETL processes, APIs, and chart and map building, is a new way to build very strong links with your community or within your organization.

Make data even easier to discover

At Opendatasoft, our goal is to make the data move faster and faster. By developing a new way to distribute data, depending on the granularity of the data and involving more and more people, we do believe more data will find their re-users.

This article was first published on Medium
