OpenGrid, OpenDataSoft, & the future of Open Smart Cities, Part 2
In the first part of this post, I shared an overview of the City of Chicago’s OpenGrid project, as presented by Tom Schenk, the Chief Data Officer for the City of Chicago, at the Amazon Web Services (AWS) Public Sector Summit in Washington, DC on June 20-21, 2016. In this post, I want to take the time to discuss some of the lesser-known (and well-known) features of the OpenDataSoft platform that really make it, in my mind, the platform for the future of Open Smart Cities.
OpenGrid is an impressive project that integrates four core technologies – one commercial Open Data platform (Socrata), two large scale open source programs (OpenGrid.io and Plenar.io), and the deployment and management of an unstructured database (MongoDB) – in order to overcome the shortcomings of the available data platforms in the Open Data space. In particular, it allows real-time event data from IoT/smart city sensors to be published in a meaningful form that is usable by the public and developers.
As noted in Part 1, I was particularly struck by OpenGrid as it mirrors the solution I have discovered in OpenDataSoft, which overcomes the shortcomings of the available data platforms in the Open Data space, and does so in a way that is accessible to cities of all sizes.
Here is the story of my journey and that of my colleague Jason Hare as we sought a better platform to make Open Data (and internal data) more approachable and useful to everyone, a journey very much like the City of Chicago’s quest to achieve a very useful data portal. You can read more about Jason’s personal journey in his post.
My Journey to OpenDataSoft
People who know me are aware that for many years I have been simultaneously involved with many different ventures, non-profits, civic organizations, etc., all of which have some focus on technology, better uses of data, and civic engagement.
I joined Jason Hare’s company BaleFire Global (BfG) in early 2014. BfG specialized in deployments of Open Data programs and portals worldwide. Through BfG, we were not only involved in launching many Open Data programs, but we were also involved with international Open Data organizations like the Open Government Partnership (OGP) and the Open Data Institute (ODI). One common issue we faced at BfG with the available Open Data platforms was how user-unfriendly and lacking in functionality these platforms were.
Jason spent quite some time looking for anything better than the current Open Data platform offerings and finally identified a new player in the market: OpenDataSoft. At the OGP meeting in Dublin in July 2014, Jason was able to meet with the OpenDataSoft CTO, David Thoumas. Based on what Jason saw then, OpenDataSoft solved most (if not all) of the pain points that our current Open Data customers were facing with the available platforms. We had found what we were looking for!
Later in 2014, we again bumped into OpenDataSoft at the ODI as we were building out our “node” franchise of ODI for North Carolina. OpenDataSoft was participating in the ODI Business Incubator while we were training and meeting with the ODI for our node franchise.
In 2015, Jason had the opportunity to launch the first OpenDataSoft customer in the US, the City and County of Durham, NC. In mid-2015, to the shock of my close friends, I dropped several of the activities with which I was involved and joined OpenDataSoft “to focus on just one thing.”
Why OpenDataSoft can help build the future of Open Smart Cities
So what is it about the OpenDataSoft platform that made me want to focus on “just one thing”? OpenDataSoft has a very easy to use platform with many powerful features, a very responsive development team and a great product roadmap. I see in OpenDataSoft a company that can be a leader long-term in the Open Data and Data markets: an exciting place to be!
Here are some of the customer pain points with current Open Data platforms that I see OpenDataSoft easily solving:
Entering Data into the portal
One of the biggest obstacles for an Open Data customer is loading clean and useful data. Many datasets are in proprietary formats and it is a well-known problem that all datasets are in some way “dirty”. Many customers use expensive and complicated ETL (Extract, Transform and Load) tools to prepare data for upload to their portals. This is a time-consuming and expensive step in the process that significantly delays and slows the opening of data.
OpenDataSoft has many data connectors and harvesters to import data directly into a portal, along with preview and enrichment features that eliminate the need for separate ETL tools. The platform supports many standard sources and data formats, including tabular, time series, geospatial, image and many others. Data connectors can pull data from datasets in proprietary formats (the ArcGIS and Salesforce connectors are good examples of OpenDataSoft data connectors). See the “Publishing your Data” video with OpenDataSoft here.
The platform’s harvesters can be thought of as similar to web crawling bots; they will load all datasets from a root level of a data structure, be it a website, FTP server, data portal or API endpoint. With very little effort, large numbers of datasets that have already been collected on a data structure can all be loaded in one step, including data on other Open Data portal platforms like CKAN and Socrata. For an insight into the power of the OpenDataSoft harvesters, read Jason Hare’s experience with Chapel Hill, NC – “Sorry for Blowing Up Your Website!”
Note: Any dataset that is loaded into OpenDataSoft from a URL (HTTP(S), FTP(S), API endpoint, etc.) can be scheduled for automatic updates. Keeping datasets up-to-date on an Open Data portal can be very labor intensive, and that burden is behind many of the “zombie” portals you find. OpenDataSoft’s automated data loading capabilities minimize the ongoing demands on staff time.
Data Transformation and Data Cleaning
OpenDataSoft includes a wide range of data processors, including one which adds facets to data. A facet is a pre-indexed field. Instead of breaking large datasets into multiple files to make them more accessible (e.g., loading one file per fiscal year), facets allow you to keep all of the data in the same dataset. For example, selecting a year from a date facet will filter the dataset and immediately provide the equivalent of the single-year file described above. OpenDataSoft is scalable to hundreds of millions of rows, so there’s no need to worry about the dataset size or whether the user will be able to filter very large datasets down to a manageable size for download and independent analysis.
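To make the idea of a facet as a pre-indexed field more concrete, here is a minimal Python sketch. It is not OpenDataSoft code and says nothing about how the platform implements facets internally; the sample rows, the field names (`paid_on`, `amount`) and the helper functions are all made up for illustration. It only shows why indexing a field once lets you pull out one year of data without maintaining a separate file per year.

```python
from collections import defaultdict
from datetime import date

# Hypothetical sample rows; the field names are invented for this example.
rows = [
    {"id": 1, "paid_on": date(2014, 3, 1), "amount": 120.0},
    {"id": 2, "paid_on": date(2015, 7, 9), "amount": 75.5},
    {"id": 3, "paid_on": date(2015, 11, 30), "amount": 310.0},
]


def build_year_facet(rows, field):
    """Index row positions by the year of a date field, once, at load time."""
    facet = defaultdict(list)
    for position, row in enumerate(rows):
        facet[row[field].year].append(position)
    return facet


def refine(rows, facet, year):
    """Return the rows for one year by looking them up in the index."""
    return [rows[position] for position in facet.get(year, [])]


year_facet = build_year_facet(rows, "paid_on")
rows_2015 = refine(rows, year_facet, 2015)
print([row["id"] for row in rows_2015])  # prints [2, 3]
```

The whole dataset stays in one place, and the "single-year file" becomes a lookup rather than a separate upload.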
All datasets have dirty data resulting from keystroke errors, and you can use OpenDataSoft processors to remove obvious data errors from view of the users. Errors like dates beyond the time scale of the dataset and geospatial points on “null island” can be removed from the dataset (the best practice would be to fix this data in the system of record, but the enormity of the task usually makes this a very low priority). Fields with case issues, common misspellings, having some data abbreviated, etc. can all be normalized with other processors.
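The kind of cleanup described above can be sketched in a few lines of plain Python. This is an illustration of the logic only, not the platform's processors: the date bounds, the field names (`date`, `lat`, `lon`, `status`) and the abbreviation mapping are assumptions chosen for the example.

```python
from datetime import date

# Hypothetical bounds for the dataset's time span, chosen for this example.
EARLIEST = date(2000, 1, 1)
LATEST = date(2016, 12, 31)

# Illustrative mapping of abbreviations and misspellings to one spelling.
STATUS_FIXES = {"appr": "approved", "approvd": "approved"}


def is_plausible(record):
    """Reject dates outside the dataset's time span and points at (0, 0)."""
    in_range = EARLIEST <= record["date"] <= LATEST
    on_null_island = record["lat"] == 0 and record["lon"] == 0
    return in_range and not on_null_island


def normalize_status(value):
    """Trim whitespace, fold case, and apply the known spelling fixes."""
    cleaned = value.strip().lower()
    return STATUS_FIXES.get(cleaned, cleaned)


records = [
    {"date": date(2015, 5, 4), "lat": 35.99, "lon": -78.90, "status": "APPR"},
    {"date": date(2105, 5, 4), "lat": 35.99, "lon": -78.90, "status": "Approved"},
    {"date": date(2015, 6, 1), "lat": 0, "lon": 0, "status": "approvd"},
]

# Keep only plausible records and normalize the status field on the way out.
visible = [
    {**record, "status": normalize_status(record["status"])}
    for record in records
    if is_plausible(record)
]
```

Here the record dated 2105 and the record at null island are hidden from view, while the source records are left untouched for correction in the system of record.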
So what other data issues can you solve with OpenDataSoft processors? Here is a list I have come across, but you are only limited by your imagination:
- Have an address and need to geolocate? -> Geolocation processors
- Have an address split over many fields that you need to assemble for readability or geolocation? -> Concatenate Text Processor
- Have a field in another dataset you want in the dataset you are working on? -> Join Dataset Processor
- Need the day of the week, month of the year, etc. for analysis or periodicity in data? -> Date and Numerical Processors
- Have multiple pieces of information in a field (e.g., status and number)? -> Extract Text Processor, Split Text Processor
- Need to create ranges of data in a numerical field? -> Tertiary Numerical Processor
- Need to make a file with multiple columns of yes/no features useful? -> Transpose Fields Processor
See OpenDataSoft’s documentation and full list of processors.
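For readers who think in code, a few of the transformations in the list above can be approximated in plain Python. These functions are hypothetical stand-ins that mimic what such processors do conceptually; they are not the processors themselves, and the field names, the `OPEN-1042` format and the range boundaries are assumptions made for this sketch.

```python
import re
from datetime import date

WEEKDAYS = ["Monday", "Tuesday", "Wednesday", "Thursday",
            "Friday", "Saturday", "Sunday"]


def concatenate_address(row):
    """Join address parts that are split over several fields."""
    parts = [row["number"], row["street"], row["city"], row["state"]]
    return " ".join(str(part).strip() for part in parts if part)


def day_of_week(value):
    """Derive a weekday name from a date for periodicity analysis."""
    return WEEKDAYS[value.weekday()]


def split_status_and_number(value):
    """Split a combined field such as 'OPEN-1042' into its two parts."""
    match = re.fullmatch(r"([A-Za-z]+)-(\d+)", value)
    if match is None:
        return None, None
    return match.group(1), int(match.group(2))


def amount_range(amount):
    """Place a numerical value into one of three illustrative ranges."""
    if amount < 100:
        return "under 100"
    if amount < 1000:
        return "100 to 999"
    return "1000 or more"


sample = {"number": 101, "street": "City Hall Plaza", "city": "Durham", "state": "NC"}
print(concatenate_address(sample))           # prints 101 City Hall Plaza Durham NC
print(day_of_week(date(2016, 6, 20)))        # prints Monday
print(split_status_and_number("OPEN-1042"))  # prints ('OPEN', 1042)
print(amount_range(250))                     # prints 100 to 999
```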
OpenDataSoft makes data human-readable with many easy to use dataset visualization tools (tables, maps, charts (analyze), images, calendars, etc.), a platform-level charting tool to combine time series datasets, and a platform-level mapping tool to combine datasets with geospatial data.
This is a very important differentiator for the OpenDataSoft platform over other existing Open Data platforms. For a great example of how a data platform that is both machine-readable and human-readable can dramatically improve your open data program, read this case study about the Open Data project run by Keolis Rennes.
In 2014, Keolis switched from their expensive and complicated proprietary Open Data solution to the user-friendly and human-readable OpenDataSoft platform. Immediately, they were able to double the number of open datasets, and they now have 10 real-time datasets and 9 city-recognized transit apps, in a city of 200,000 people.
Website Configuration, Performance Dashboards, Data Stories
The OpenDataSoft front office is fully open source. I repeat, the OpenDataSoft front office is fully open source. Through HTML, CSS, the OpenDataSoft AngularJS Widget libraries, some AngularJS and Bootstrap, you are in full control of the editorial presentation of your site and data.
As a publisher, you are in full control of the color, fonts, graphics, spacing, layout, etc. of your website. No need to wait for a vendor to move items a few pixels, change text, etc.
Also as a publisher, you are in full control of the development and presentation of dashboards and data stories. There is no need to be limited to a few canned dashboards; you can build the dashboard that helps you solve the specific issues you are tracking.
The OpenDataSoft widgets can also interact on dashboards, allowing you to build very complex interactions between datasets. Look at the Restaurant Inspection dashboard made for Durham, NC which navigates through 3 datasets on the same dashboard!
That OpenDataSoft has an open source front office means that dashboards developed in one municipality can be redeployed in another with little effort. The Durham Restaurant Inspection Dashboard was easily copied to Wake County, NC with a few changes for field names. There is an open source community developing around OpenDataSoft and besides new and modified widgets submitted by customers, there is an open ‘cookbook’ of tips and tricks for deploying and customizing OpenDataSoft.
Moreover, the open source front office of OpenDataSoft is built using Bootstrap components, so all webpages, data stories and performance dashboards built on the OpenDataSoft platform are natively responsive to multiple screen formats.
Enabling App Development
The app developers I have been working with love the flexibility and power of the OpenDataSoft API. The apps they are developing do not need a middleware layer to interface with the OpenDataSoft portal. Through the API they are able to receive the specific data they need without having to process a larger data pull to extract what they need. Examples:
- Citygram, Code for Charlotte. No middleware is required to interface the OpenDataSoft Open Data to Citygram. At the 2015 Code for America Summit, in 1 day they were able to take an XML data feed from the Tulsa, OK website and publish it to Citygram using an OpenDataSoft portal.
- Open Source Community platform by Concursive. Concursive is able to programmatically create API calls to automatically generate 180,000 customized property pages populated with multiple datasets from OpenDataSoft Open Data portals.
See the webinar “Maximize your Open Data Initiatives with APIs” for more.
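As a rough illustration of asking the API for only the data an app needs, the sketch below builds a filtered query URL in Python without sending it. The endpoint path (`/api/records/1.0/search/`) and the parameter names (`dataset`, `rows`, `refine.<field>`) are my assumptions about the records search API and should be checked against the current API documentation; the domain, dataset name and field are hypothetical.

```python
from urllib.parse import urlencode


def build_search_url(domain, dataset, refine=None, rows=10):
    """Build a query URL that asks for a filtered slice of one dataset.

    The path and parameter names are assumptions for this sketch,
    not a confirmed description of the API.
    """
    params = [("dataset", dataset), ("rows", rows)]
    for field, value in (refine or {}).items():
        params.append((f"refine.{field}", value))
    return f"https://{domain}/api/records/1.0/search/?{urlencode(params)}"


# Hypothetical portal domain, dataset identifier and facet field.
url = build_search_url(
    "example.opendatasoft.com",
    "restaurant-inspections",
    refine={"city": "Durham"},
    rows=5,
)
print(url)
```

The point is that the filtering is expressed in the request itself, so the app receives the handful of records it asked for rather than downloading a full dataset and filtering it in a middleware layer.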
Real-Time Event Data Support
Unique among Open Data portals, OpenDataSoft’s platform was natively designed for real-time, geo-localizeable sensor data. In fact, OpenDataSoft’s first customer was not a government agency but a utility company. That customer, Veolia, used the platform to support smart water meter data publishing, sharing, and dashboard creation.
OpenDataSoft uses Elasticsearch in its back-office stack, which provides fast processing, making search and retrieval of massive real-time datasets possible. Whether you are looking at datasets of thousands of rows, or hundreds of millions of rows, the platform is very responsive. A dashboard made from datasets as large as 150 million rows can refresh all charts and graphs within an acceptably short 5-10 seconds. Your days of watching spinning logos as data is processed are over when you have OpenDataSoft!
Handling Internal Data
OpenDataSoft is built with a fine-tuned security access layer for users and groups, down to the dataset, data subset and field level. While OpenDataSoft was originally intended for the municipal open data market, our first (and still largest) customer was a utility company looking to break down data silos in the company. Today, about half of our customers are in the commercial market and depend on our security layer to protect their data.
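To picture what access control at the dataset, data subset and field level means in practice, here is a small conceptual model in Python. It is purely hypothetical: the `Permission` class, its attributes and the sample data are invented for this sketch and do not describe how OpenDataSoft stores or enforces permissions.

```python
from dataclasses import dataclass, field


@dataclass
class Permission:
    """One group's access to one dataset (hypothetical model)."""
    group: str
    dataset: str
    # Data subset: only rows matching every key/value pair are visible.
    row_filter: dict = field(default_factory=dict)
    # Field level: these fields are removed from every visible row.
    hidden_fields: frozenset = frozenset()


def visible_rows(rows, permission):
    """Apply the subset filter first, then strip the hidden fields."""
    result = []
    for row in rows:
        matches = all(row.get(key) == value
                      for key, value in permission.row_filter.items())
        if matches:
            result.append({key: value for key, value in row.items()
                           if key not in permission.hidden_fields})
    return result


# Invented sample data for a utility-style meter dataset.
meter_readings = [
    {"meter": "A1", "region": "north", "usage": 42, "account_holder": "J. Doe"},
    {"meter": "B7", "region": "south", "usage": 17, "account_holder": "R. Roe"},
]

north_analysts = Permission(
    group="north-analysts",
    dataset="meter-readings",
    row_filter={"region": "north"},
    hidden_fields=frozenset({"account_holder"}),
)

print(visible_rows(meter_readings, north_analysts))
# prints [{'meter': 'A1', 'region': 'north', 'usage': 42}]
```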
In summary, Jason and I are at OpenDataSoft today because we found in this single platform the answer to so many challenges we had struggled with for so long on the front lines of developing Open Data portals and applications.
We admire Chicago’s OpenGrid project, and love the city’s deep commitment to Open Data and Open Governance, and its continuous innovation in these spaces. However, we’re also very happy to share with cities of all sizes that there’s a great turnkey cloud solution that can deliver the capabilities lacking to date in Open Data platforms without having to integrate, master, or maintain multiple systems (in the case of OpenGrid: OpenGrid.io, Socrata or CKAN, MongoDB, and Plenar.io).
OpenDataSoft was the answer to our quest for easier data acquisition and maintenance; human-friendly data search, navigation and visualization; real-time event data support; easy API generation and maintenance; and more flexible cloud and security options. If you’re on a similar quest, sign up for a free account to try out OpenDataSoft.