Open Data as Terraces
Open Data is nothing more than a way to spread data. As there are plenty of different kinds of distribution networks, we can imagine several ways to grant easy access to data that are ready to be (legally) reused. At OpenDataSoft we’ve been working on such a distribution architecture, and it looks like terraces…
Open Data Terraces wat?!
“A terrace is a piece of sloped plane that has been cut into a series of successively receding flat surfaces or platforms, which resemble steps, for the purposes of more effective farming.” (Wikipedia)
Among all the pros of terraces, there are three main characteristics that may, in my opinion, inspire us. These three points echo the issues that someone in charge of a data portal can face.
- Better exposure. When it comes to data, reaching the public and the re-users can be quite difficult. Google is a natural gateway, but its algorithms favor organizations that are good at SEO, which are not necessarily the ones that are good at releasing data.
- Better irrigation. When you manage data, you know that everything has a cost and every step requires work. So anything that helps you manage resources better and automate work is interesting.
- Simpler work for the cultivator: at least the ground is flat. If we want more open data, we have to make life easier for the people in charge.
About Data Distribution
Classical data distribution is pretty simple: it generally comes in the form of a catalog of licensed datasets. Problems begin when you have a large number of datasets and multiple formats, and they get worse when various organizations share their data in the same place. I wrote about those portals some time ago. Such a portal looks like a lake on top of a mountain or a volcano. It’s a real passion for a lot of people: the trek is enjoyable, and what one sees at the top can be really interesting. But still, the top may be out of reach for most people.
Imagine someone who wants to compare his or her city with another one:
- Step 1: Find and go to the portal
- Step 2: Search for the data
- Step 3: What format?!
- Step 4: OK, I know that Excel logo
- Step 5: Open that file
- Step 6: How do I filter the data?
- Step 7: What’s my area ID??
- Step 20: Reuse the data…
Others have tried to develop more advanced, more user-friendly Open Data portals. When the data are indexed before distribution, filtering and searching become far more effective and the data are easier to find. Because the data are indexed, you can also let users build maps or charts, so they can truly see what the data look like before downloading them. That’s what we do at OpenDataSoft, but we are not the only ones.
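To make the idea of indexing before distribution more concrete, here is a minimal Python sketch. It is not how OpenDataSoft’s platform is implemented: the `RecordIndex` class, the field names and the sample records are all hypothetical and invented for illustration. It only shows why records that have been indexed ahead of time can be filtered by field value without the user downloading and opening a file first.

```python
from collections import defaultdict


class RecordIndex:
    """Toy in-memory index mapping (field, value) pairs to record positions."""

    def __init__(self, records):
        self.records = list(records)
        self._index = defaultdict(set)
        # Build the index once, before any user query arrives.
        for pos, record in enumerate(self.records):
            for field, value in record.items():
                self._index[(field, value)].add(pos)

    def filter(self, **criteria):
        """Return records matching every field=value criterion, in original order."""
        matches = set(range(len(self.records)))
        for field, value in criteria.items():
            # Intersect with the positions already known for this field value.
            matches &= self._index.get((field, value), set())
        return [self.records[pos] for pos in sorted(matches)]


# Hypothetical sample data, invented for illustration only.
SAMPLE_RECORDS = [
    {"city": "Paris", "year": 2015, "indicator": "bike_stations"},
    {"city": "Lyon", "year": 2015, "indicator": "bike_stations"},
    {"city": "Paris", "year": 2014, "indicator": "bike_stations"},
]

index = RecordIndex(SAMPLE_RECORDS)
paris_2015 = index.filter(city="Paris", year=2015)
```

A real portal would rely on a proper search engine rather than a Python dictionary, but the principle is the same: the work is done once at indexing time, so each filter is a cheap lookup.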
How we like to see the data distribution on our portals
Introducing sub data portals
The Open Data movement is still very young, and we may not yet have found the perfect way to distribute data in every case.
So we decided to craft a new architecture for Open Data portals.
The whole architecture is based on the concept of sub-domains. Our customers are now able to generate sub-domains from their Open Data portals’ back office. These sub-domains can be shaped to match a company’s departments, to mirror administrative divisions, or to let an industry share data with various actors within its territory.
The sub-domains are real Open Data portals in their own right: they are full-featured, they share the visual identity of the principal portal, and they have their own administrators, their own data and their own ways to advertise the data.
All of this is quite interesting, but I think the key feature here is the ability to federate data in both directions. The principal portal can distribute datasets or parts of datasets (think ‘at the right granularity’) to each of the sub-portals. And if any of the sub-portals loads a new piece of data, the principal one can unify it. This is also a new way to decentralize and distribute data collection!
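The two directions of federation described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the `Portal` class and the `distribute` and `unify` functions are hypothetical names, not part of any OpenDataSoft API, and the sketch assumes each sub-portal is named after the value of the field used to split the dataset.

```python
from dataclasses import dataclass, field


@dataclass
class Portal:
    """Hypothetical stand-in for a portal: a name plus named datasets (lists of records)."""
    name: str
    datasets: dict = field(default_factory=dict)


def distribute(principal, sub_portals, dataset_id, key):
    """Top-down: give each sub-portal only the records whose `key` equals its name."""
    for sub in sub_portals:
        sub.datasets[dataset_id] = [
            record for record in principal.datasets[dataset_id]
            if record[key] == sub.name
        ]


def unify(principal, sub_portals, dataset_id):
    """Bottom-up: merge a dataset loaded by sub-portals into the principal portal."""
    merged = []
    for sub in sub_portals:
        for record in sub.datasets.get(dataset_id, []):
            # Keep track of where each record came from.
            merged.append({**record, "source_portal": sub.name})
    principal.datasets[dataset_id] = merged


# Hypothetical data, invented for illustration only.
principal = Portal("region", {
    "schools": [
        {"department": "north", "name": "A"},
        {"department": "south", "name": "B"},
    ],
})
north, south = Portal("north"), Portal("south")

distribute(principal, [north, south], "schools", key="department")
north.datasets["events"] = [{"title": "Market day"}]
unify(principal, [north, south], "events")
```

After these calls, each sub-portal holds only its own slice of `schools`, and the principal portal holds the `events` collected by the sub-portals, tagged with their origin.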
Architecture of the sub-domain way to deal with data distribution
Why would I do that?
Success follows a Power Law
A lot of things in life follow a power law, including the distribution of downloads among open datasets. This becomes even more true when we speak about information technologies: in an economy where anything can be commoditized in a few years, network effects and data-network effects are stronger and make the distribution even more skewed.
That is one of the key factors in Open Data’s strength. As you don’t know who will come up with the best design to serve your citizens or re-users, you create the infrastructure so that anyone can use the data and develop their own solution.
By both creating new distribution channels and decentralizing the ability to leverage the data, you increase your chances of seeing new services created around your data, and you strengthen your position in the ecosystem.
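To give a feel for what a power law means for downloads, here is a small Python sketch. It uses no real download figures: it assumes, purely for illustration, that the number of downloads of a dataset is proportional to 1 / rank (a Zipf distribution with exponent 1), and the function names are hypothetical.

```python
def zipf_download_shares(n_datasets, exponent=1.0):
    """Share of total downloads per rank, assuming downloads proportional to 1 / rank**exponent."""
    weights = [1.0 / rank ** exponent for rank in range(1, n_datasets + 1)]
    total = sum(weights)
    return [weight / total for weight in weights]


def top_share(shares, top_n):
    """Combined share of downloads held by the `top_n` highest-ranked datasets."""
    return sum(shares[:top_n])


# Assumed scenario: a catalog of 100 datasets.
shares = zipf_download_shares(100)
head = top_share(shares, 10)
```

Under these assumed parameters, the ten highest-ranked datasets out of 100 account for a bit more than half of all downloads, which is why you cannot know in advance which dataset or which reuse will matter most.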
Ever more empowerment
Open Data has always been about empowering people: citizens, the people administering the portal, and gradually more and more people within an organization consuming the data.
Releasing data empowers people. Giving certain people the opportunity to run a data portal all by themselves, with access to all the necessary tools for data cleaning, ETL processes, APIs, and chart and map building, is a new way to build very strong links with your community or within your organization.
Make data even easier to discover
From our portal data.opendatasoft.com to our new Slack integration, our goal is to make the data move faster and faster. By developing a new way to distribute data, depending on the granularity of the data and involving more and more people, we do believe more data will find their re-users…
This article was first published on Medium