Today, the OpenDataSoft team is launching something really special; something for all you Open Data geeks out there.
When working on building a state-of-the-art data and API solution, we often hear this question:
Where can I find clean and usable data?
At the same time, here at OpenDataSoft, we need a lot of data too. We hunt for new data to create new possibilities and to entertain you with our Open Data Weekly series.
Over time, the idea of creating a unified resource gathering every open data portal in the world started to emerge. Opening it to everyone was baked right into the project’s conception. A couple of Dr. Evil gifs later, we started outlining the Open Data Inception project: a list of 2600+ Open Data portals around the world.
So, after the tasty map of French cheese, the state of Open Data in 2014 and SNCF Open Data, OpenDataSoft is thrilled to continue its Open Data Weekly series with this comprehensive list (and a map) of 2600+ Open Data portals in the world.
Gathering 2600+ Open Data Portals around the world
Our first step was to search for similar projects. We found OpenGeocode and DataPortals by the Open Knowledge Foundation; both were interesting, but not exactly what we envisioned. OpenGeocode was mostly focused on American portals (for example, no French portals are listed). In addition, neither was easy to understand: is there an API? Can I download an entire dataset? How can I select one area of a map?
We also found lists on Quora and StackExchange showing interesting data. However, we still had a problem because neither list was structured, nor were they easy to reuse without clicking the links every time.
We therefore decided to combine what we found with the Open Data portals we knew of, adding by hand those that were not already listed. The OpenDataSoft platform allows users to add different data sources to a single dataset. Thus, we added the data that we collected, as well as a link to an online table where we could add data by hand, keeping everything permanently synchronized with the main dataset.
When mixing different data sources into one dataset, it is important to find a common thread across all of the data. In our case, we limited ourselves to the name, the organization, the link to the portal, and a location. Most other information was difficult to find consistently, and we wanted a single consistent and useful list. Next, we used simple scripts written in Clojure to harmonize the different fields. For example, we capitalized textual fields and converted geographic data into one coordinate system.
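To give an idea of what such a harmonization step can look like, here is a minimal sketch. It is written in Python rather than Clojure, and it is not our actual script: the function names, the `field_map` argument and the source column names are all hypothetical. It also only normalizes how coordinates are written (a "lat, lon" string or a pair becomes a pair of floats); a real conversion between coordinate systems would need a projection library and is not shown.

```python
# Illustrative sketch only: all names here are assumptions, not the original scripts.

COMMON_FIELDS = ("name", "organization", "url", "location")


def capitalize_words(text):
    """Collapse extra whitespace and upper-case the first letter of each word."""
    return " ".join(word[:1].upper() + word[1:] for word in text.split())


def parse_location(value):
    """Return (lat, lon) as floats, or None if the value cannot be used."""
    if value is None:
        return None
    try:
        if isinstance(value, str):
            lat_text, lon_text = value.split(",")
            lat, lon = float(lat_text), float(lon_text)
        else:
            lat, lon = float(value[0]), float(value[1])
    except (ValueError, TypeError, IndexError):
        return None
    # Reject coordinates that are outside the valid latitude/longitude ranges.
    if not (-90 <= lat <= 90 and -180 <= lon <= 180):
        return None
    return (lat, lon)


def harmonize(record, field_map):
    """Map one source-specific record onto the four shared fields.

    field_map says which source column feeds each common field,
    e.g. {"name": "title", "url": "link"}.
    """
    row = {}
    for field in COMMON_FIELDS:
        source_column = field_map.get(field)
        row[field] = record.get(source_column) if source_column else None
    for field in ("name", "organization"):
        if isinstance(row[field], str):
            row[field] = capitalize_words(row[field])
    row["location"] = parse_location(row["location"])
    return row
```

Each source gets its own `field_map`, so every record ends up with the same four keys no matter where it came from, and missing values simply stay empty until someone fills them in.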
Cleaning and Enriching our Dataset
When collecting data from multiple sources, you seldom have a clean resource at first: there are typos, missing coordinates, duplicates, and various typologies.
We also wanted to get two things out of our data:
- A list of all Open Data portals classified by country that people could easily browse through and bookmark.
- An independent website showing a cool map on which Open Data portals could be geotagged. This would give a good sense of the density of Open Data portals.
On our original list, countries, cities and organizations were all placed on the same level. So we went through and created two columns to add standardized country names (in French and in English).
This almost immediately raised questions about our own geopolitical knowledge. Should we classify England, Wales and Northern Ireland in different rows or include them in the United Kingdom? What about the Isle of Man, which is a self-governing British Crown Dependency? In order to avoid unnecessary debate, we used the United Nations list of sovereign states.
Here is a table of all organizations and countries with Open Data portals that we could gather.
Our second task was to clean and fill in geographical coordinates for all of the Open Data portals on the list. We had just over 1000 portals already geotagged, so we added the remaining 600 by hand.
City portals were easy to map, but what about those of the United Nations or nation-wide portals? These were mapped, respectively, to the headquarters of their parent organization and to the capital of the country or region they cover. For example, if a portal was a civic initiative throughout Spain, we mapped it in Madrid. If the portal was Cantabria’s, we mapped it in Santander.
The last steps were to remove duplicates and push the new dataset onto our public portal.
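For the curious, duplicate removal on a list like this can be sketched in a few lines of Python. This is an illustration under our own assumptions, not the exact procedure we followed: it assumes each row is a dict with a `url` key, and it treats two portals as the same when their links differ only by scheme, a leading "www.", letter case or a trailing slash. The helper names are hypothetical.

```python
from urllib.parse import urlsplit


def url_key(url):
    """Build a comparison key that ignores scheme, 'www.', case and trailing slash."""
    parts = urlsplit(url.strip().lower())
    host = parts.netloc
    if host.startswith("www."):
        host = host[len("www."):]
    return host + parts.path.rstrip("/")


def remove_duplicates(rows):
    """Keep the first row seen for each distinct portal link, preserving order."""
    seen = set()
    unique = []
    for row in rows:
        key = url_key(row["url"])
        if key not in seen:
            seen.add(key)
            unique.append(row)
    return unique
```

Lower-casing the whole link is a deliberate shortcut: it can merge two portals whose paths differ only by case, so a stricter version would lower-case the host name only.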
Uploading the open dataset on our platform
In order to put our data on a map in seconds, we uploaded it onto our cloud-based data solution. The OpenDataSoft platform automatically recognized coordinates and mapped the Open Data portals. When looking at the map on a global scale, portals are grouped into clusters, allowing visitors to grasp the density of Open Data available in the area.
We simply customized the basemap and the markers through the admin interface. No code and no hassle were required.
Building opendatainception.io in minutes
One of the perks of the OpenDataSoft solution is its ability to generate widgets that are always connected to your data through APIs. OpenDataSoft provides a large open source widget library, which allows for easy and quick construction of effective dashboards.
The widget code we copied and pasted into our HTML page. Easy as A.B.C.
Our dataset and our map of all the Open Data portals from around the world were ready. The widgets then proved useful to build our website.
We created a responsive website displaying all of the Open Data portals by adding the map widget and the search box widget to effortlessly explore the data. The two widgets are connected: if you search for Paris, New York or the Isle of Man, the map will show you these results instantly.
No surprise: the US boasts almost 500 open data portals
A few things we learned along the way:
- More than 200 countries have dedicated space to Open Data, whether these portals are official, unofficial or civic initiatives.
- The United States boasts almost 500 Open Data portals, ranging from citywide portals to those of intergovernmental organizations, such as the UN. We have our favorite portal, of course: that of the City of Durham and Durham County.
What’s next? Improvements.
If our goal was to achieve a comprehensive list of all Open Data portals around the world, our work is by no means done.
Dead URLs, new portals created every day, portals we missed… Since we hope the list will help other people and professionals to get a unified and up-to-date resource, we’d be happy to receive feedback.
We will also add other sources that are not Open Data per se over time: data dumps, Github repositories, portals…
We forgot one of your portals? You found a dead link? Shoot us an email! We created a form just for that. Or, you can do it on Twitter.