So, what are you going to do with my Open Data?
Most people find opening data scary because they never know how it might be used. In the following article, Nicolas Terpolilli, our Chief Data Officer, debunks the myths about what comes after you open your data.
Then you begin to realize that believing in people is not just a romantic myth. But here you see that the first requirement for communication and education is for people to have a reason for knowing. It is the creation of the instrument or the circumstances of power that provides the reason and makes knowledge essential.
Saul Alinsky, Rules for Radicals
Data and the democratization of skills
AI is replacing Big Data as the main trending topic in tech. This is a good thing, because we may now have the necessary distance to understand precisely what Big Data means. It is obviously not a matter of size or performance: what seemed big in 2010 may now feel small, and the volume of data we are able to handle will only keep growing. The same goes for performance. What has been remarkable about data in the last decade is the democratization of both the tools and the data sources. Not too long ago, a Fortune 500 company needed an entire Ivy League engineering team and millions of dollars to parallelize heavy computation and analyze expensive data. Now, almost anybody can find a lot of Open Data. Anybody can plug into the Twitter Streaming API. The “peace dividend of the smartphone war”, that is, the ever-falling price of smartphone components, gave birth to the Internet of Things: anybody can add a GPS and several other really cheap components to any “thing”, leading to tons of new data. Anybody has access to a huge amount of real-life data. On top of this, there is a complete ecosystem of tools, tutorials, libraries and books to analyze these data. You can run machine learning (or even deep learning if you prefer to stay trendy) in a few lines of code. And when you are ready for production, cloud computing allows you to deploy huge data infrastructures for a few hundred dollars. Even if Moore’s law is dead, that’s not really your problem.
Data democratization is really something huge. It has allowed a lot of companies to exist and better serve their customers. It has allowed city administrations to better serve their citizens, and even save some lives. I’m pretty sure more people alive today know how to run a linear regression than at any other point in human history.
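To make the “few lines of code” claim concrete, here is a minimal sketch of an ordinary least-squares linear regression in plain Python. The function name `fit_line` and the sample figures are invented for illustration; in practice most people would reach for an existing library instead.

```python
def fit_line(xs, ys):
    """Fit y = slope * x + intercept by ordinary least squares."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need at least two (x, y) pairs of equal length")
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Unnormalised covariance of x and y, and unnormalised variance of x.
    cov_xy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    if var_x == 0:
        raise ValueError("all x values are identical")
    slope = cov_xy / var_x
    intercept = mean_y - slope * mean_x
    return slope, intercept


# Made-up sample: number of bike-share stations vs. daily trips.
stations = [10, 20, 30, 40]
trips = [120, 220, 320, 420]
slope, intercept = fit_line(stations, trips)
print(f"trips ~ {slope:.1f} * stations + {intercept:.1f}")
```

The point is less the formula than the barrier to entry: nothing here needs more than a laptop and a free interpreter.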
Anybody can learn online, for free, how to handle data, how to understand it, how to store it, how to share it, how to visualize it and how to cluster it. Open Knowledge Foundation’s School of Data was designed following the idea that more and more data will be available and that data is power, hence the classes. At the same time, open or freely distributed tools like SQL/NoSQL databases, Elasticsearch, the whole Hadoop framework and parallelization tools, languages like R or Python and visualization libraries like d3.js are available. This is great. This is the Big Data revolution. But we should go much further. My grandparents took the time to master the mailbox of their 1990s Internet provider. It is so poorly designed that I’m ready to bet a well-thought-out design could guide them through basic data cleaning, cross analysis and map creation.
So, what are you going to do with my Open Data?
Back to Alinsky’s quote. In his book Rules for Radicals, he explains that asking people what kind of policies they would pursue if they were given $5,000,000 is totally dumb. Indeed, unless we really know what $5,000,000 means in reality AND we really have that money to spend on something real, our brains just can’t think it through in any depth.
“We should split the $5,000,000 among 20 schools to test different education methods for a few years. Since we are living in a Power Law world, we may discover a model that is much, much better than every other model, and that would be the time to find $100,000,000 to apply that model to every other school, including my daughter’s.” Said no student’s parent ever.
The Wire school-system-focused 4th season
If this is not the most common answer among parents, it may be the answer of an investor, an entrepreneur or maybe any wealthy family. Why? Simply because $5,000,000 means something to them. They are used to both the circumstances and the instruments of investing, hence the answer.
Things are exactly the same when dealing with Open Data. Every time we talk with somebody interested in opening datasets, he or she ends up asking us what the results of opening the data will be. That is normal, and we are used to it. There is no easy answer: it depends mostly on how you open the data.
When you expect people to re-use your open data, you also have to empower them with the right context and instruments that allow them to really be open-minded – in the original sense of the term – about the data.
- Basic Open Data (that is, easy-to-download datasets in common formats and licences) is not sufficient. If the data are really interesting, they may find an audience, but the result probably won’t be a complete success.
- Open Data that includes developer tools like APIs or Linked Data facilities (a SPARQL endpoint) is really empowering, but it empowers only developers. It is much better, because an empowered developer can create good services or apps and then, indirectly, empower more people. But that is not fully satisfactory.
- Open Data just thrown out there without any context and, above all, without any effort to federate a community around it, can have a tree-falling-in-the-middle-of-the-forest kind of effect.
Your opened data might be somewhere in there 🙁
Attempting to imagine how opened datasets will be used before opening them is a simple act of delusion. What’s more, if you are able to imagine their usage beforehand, the planning and the execution of your data opening have failed. The game of Open Data is about releasing new materials, maybe some linked tools, and organizing an ecosystem, a community, to give actors a reason to do something with it, independent of preconceived ideas.
The organization of the community may also vary over time. We observe a growing number of Open Data hackathons, for example. At first, a hackathon was a powerful tool to organize a community and to give some context to the data. But the last hackathons I have been to, both as a mentor and as an attendee, were full of regulars. When well organized, they are still positive and still contribute to the spread of knowledge around the data. But I do believe there are other ways to make people meet and guide them in data discovery.
Knowledge, ready for the taking
It is the creation of the instrument or the circumstances of power that provides the reason. So how do we do that? Once you’ve started actually releasing real open datasets, and have real data to play with, the circumstances part is almost done. You’ll still have to maintain users’ confidence (for example, by not breaking their apps by moving or removing the data). Continuing to release data and maintaining good relations with the community are important too. But still, by opening some good data you usually create, de facto, the circumstances for people to use them. The main issue, in my opinion, is that something is still missing from most Open Data approaches: the instruments.
At OpenDataSoft, we try to give people a few of the tools that let them become familiar with the data. On the publisher side, we provide a basic ETL infrastructure (we believe in Power Law, so it’s an 80%-of-the-issues ETL) with some easy-to-use data cleaning processors. The platform also allows users to geocode their data, or to join the data with pre-indexed datasets, so it becomes easy to enhance the data themselves. We allow data to be typed directly in the ETL system, because typing means describing the data and giving context to re-users. The typing we provide is still really basic, but we are working on a much broader, more semantic typing. We really want to give non-tech people the ability to work on and enhance the data, since they too know a lot about them. Once the data are open, on the data consumer side, we provide several other instruments: chart builders, map builders, open source visualization widgets, an HTML+CSS editor (so you can just copy-paste the widgets and design interactive dashboards) and, obviously, an API. Each tool allows data consumers to get an idea of what’s behind the data and of what they will be able to do with them. There is no need to download the data first, nor any need to have and master data tools; you just need to click Map, Analyze or API and you have it!
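As a rough illustration of why typing gives context, the sketch below guesses a type for each column of a tiny dataset. It is not OpenDataSoft’s implementation: the helper names (`infer_type`, `describe`), the type labels and the sample rows are all hypothetical.

```python
from datetime import date


def _parses(values, parser):
    """Return True if every value is accepted by `parser`."""
    try:
        for value in values:
            parser(value)
    except ValueError:
        return False
    return True


def infer_type(values):
    """Guess a column type from raw strings, ignoring empty cells."""
    cells = [v.strip() for v in values if v.strip()]
    if not cells:
        return "text"
    # Try the strictest type first and fall back to plain text.
    candidates = (("int", int), ("float", float), ("date", date.fromisoformat))
    for name, parser in candidates:
        if _parses(cells, parser):
            return name
    return "text"


def describe(rows):
    """Map each column name to its inferred type for a list of dict rows."""
    columns = list(rows[0].keys()) if rows else []
    return {col: infer_type([row[col] for row in rows]) for col in columns}


# Hypothetical rows, as they might come out of a CSV upload.
rows = [
    {"station": "Gare du Nord", "bikes": "12", "lat": "48.8809", "updated": "2015-03-01"},
    {"station": "Bastille", "bikes": "7", "lat": "48.8532", "updated": "2015-03-02"},
]
schema = describe(rows)
print(schema)
```

Once a column is known to be a number, a date or a coordinate, a chart or map builder can offer sensible defaults without the re-user having to guess what the raw strings mean.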
That’s maybe 80% of what good, ready-to-use instruments would provide. We work hard to enhance those existing tools, but the remaining 20% needs tools too. That is why we are working on projects like OpenDataInception. That’s also why we are working on developing a real data network with our customers’ open datasets. By giving data producers a way to feed sub-data-portals with data at the chosen granularity, we let them give much more context and target the most interested people for each piece of data. If I live in a small city in France, I may not want to find huge national datasets, download them, open them, and then filter them to find the data concerning my place. By crunching the data down to their finest granularity while keeping a unique source, data producers are now able to bring beautiful dashboards to the people concerned by that granularity. Because of that design, people are able to understand the data and their usefulness much more quickly. They are then able to get to the data source, and make something of the data.
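The idea of one source feeding many local views can be sketched in a few lines. This is only an illustration under assumptions: the CSV content, the column names and the `split_by` helper are invented, and a real data network would do far more than group rows.

```python
import csv
import io
from collections import defaultdict

# Invented national dataset; the figures are placeholders, not real statistics.
NATIONAL_CSV = """city,indicator,value
Nantes,schools,120
Nantes,libraries,25
Aurillac,schools,18
Aurillac,libraries,3
"""


def split_by(rows, key):
    """Group rows by the value of `key`, preserving the original row order."""
    groups = defaultdict(list)
    for row in rows:
        groups[row[key]].append(row)
    return dict(groups)


# One source of truth, read once...
rows = list(csv.DictReader(io.StringIO(NATIONAL_CSV)))
# ...then sliced per city, so each local view only sees its own records.
portals = split_by(rows, "city")
aurillac = portals["Aurillac"]

for row in aurillac:
    print(row["indicator"], row["value"])
```

A resident of the small city gets only the rows that concern them, while the producer still maintains a single dataset.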
There is still more to think about. Linked Data and 5-star data are maybe the best way to provide circumstances to data consumers. But the pool of people with the skills to enjoy them seems too small. We can hope people will learn, and we can teach them, but I think we have a duty to give them the tools to benefit from these approaches from the start.
Open Data is on a good path. Every day, new datasets are opened, and that is awesome. However, we still have work to do to make the data deliver their full potential. You can’t ask people to seriously imagine new services with your data without giving them both circumstances and instruments. So start opening data right now, learn fast and create real and honest connections with people. You’ll get real and honest results…
This article was first published on Medium