Data Transform Tips and Tricks: how to anonymize location data with geomasking
Behold the second installment of our new series: Data Transform Tips & Tricks! Last time, we covered how to extract text from a single string. This time, let’s use OpenDataSoft’s wide swath of data transformation tools to properly anonymize location data with geomasking.
Sounds great, right? Let’s begin your training then!
The City and County of Durham asked: How can we anonymize location data in crime incident data sets?
(Our answer: There’s a processor for that!)
Crime incident data is one of the most frequently requested kinds of dataset. When the City and County of Durham were launching their open data portal, they wanted to publish crime incident data, but they were very concerned about releasing personal information about victims.
Since it is quite easy to reverse geocode coordinates into approximate addresses, publishing exact geographic data can breach a victim’s confidentiality.
Protecting personal data can be tricky. One of the first steps that can be taken is to anonymize location data, a technique called geomasking. Several methods can be applied. Unfortunately, most of these geomasking methods lack rigor in protecting personally identifiable data. For instance, some solutions only truncate the last two digits of the street address, making 130, Lourmel Street appear as 1XX, Lourmel Street.
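To make the weakness of that truncation approach concrete, here is a minimal Python sketch. The function name truncate_house_number is hypothetical and is not part of any OpenDataSoft tool; it simply masks the last two digits of a leading house number.

```python
import re


def truncate_house_number(address: str) -> str:
    """Mask the last two digits of a leading house number with 'X'.

    Hypothetical helper for illustration only; not an OpenDataSoft API.
    """
    match = re.match(r"\s*(\d+)", address)
    if match is None:
        # No leading house number: nothing to mask.
        return address
    number = match.group(1)
    # Keep every digit except the last two, then pad with X.
    masked = number[:-2] + "X" * min(2, len(number))
    return address[:match.start(1)] + masked + address[match.end(1):]


print(truncate_house_number("130, Lourmel Street"))  # 1XX, Lourmel Street
```

The street name survives untouched, so the incident is still pinned to a single street and at most a hundred house numbers, which is why this kind of masking offers limited protection.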
OpenDataSoft has a different method of geomasking called the OpenDataSoft Donut geomasking processor. This processor gives a random displacement within a donut defined by an outer circle, and a smaller internal circle within which displacement is not allowed. In effect, this sets a minimum and maximum level for the displacement. Masked locations are placed anywhere within the allowed area.
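The idea behind the donut can be sketched in a few lines of Python. This is an illustration of the general technique, not the code behind the OpenDataSoft processor: the function names, the parameters min_m and max_m, the choice to sample uniformly over the donut's area, and the flat-earth approximation for small distances are all assumptions made for this example.

```python
import math
import random

EARTH_RADIUS_M = 6_371_000.0  # mean Earth radius, in meters


def donut_mask(lat, lon, min_m, max_m, rng=None):
    """Displace a point by a random distance in [min_m, max_m] meters.

    Hypothetical sketch of donut geomasking; not the OpenDataSoft
    processor. Uses a local flat-earth approximation, which is only
    reasonable for displacements that are small relative to the Earth.
    """
    if not 0 <= min_m < max_m:
        raise ValueError("expected 0 <= min_m < max_m")
    rng = rng if rng is not None else random
    # Random direction around the original point.
    bearing = rng.uniform(0.0, 2.0 * math.pi)
    # Sampling the squared radius uniformly spreads points evenly over
    # the donut's area (an assumption about the desired distribution).
    distance = math.sqrt(rng.uniform(min_m ** 2, max_m ** 2))
    # Convert the metric offset into degrees of latitude and longitude.
    dlat = distance * math.cos(bearing) / EARTH_RADIUS_M
    dlon = distance * math.sin(bearing) / (
        EARTH_RADIUS_M * math.cos(math.radians(lat))
    )
    return lat + math.degrees(dlat), lon + math.degrees(dlon)


def haversine_m(lat1, lon1, lat2, lon2):
    """Great-circle distance in meters, used here to check the result."""
    p1, p2 = math.radians(lat1), math.radians(lat2)
    dphi = p2 - p1
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(p1) * math.cos(p2) * math.sin(dlmb / 2) ** 2
    return 2 * EARTH_RADIUS_M * math.asin(math.sqrt(a))
```

For example, donut_mask(35.994, -78.8986, 100, 500) returns a point between 100 and 500 meters from the illustrative coordinates given, never closer than the inner circle and never farther than the outer one.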
Studies show that the donut method provides a consistently higher level of privacy protection.
Further steps toward fully protecting personal data would involve looking for patterns and data points that could lead to indirect identification. This is something that would need to be dealt with on a case-by-case basis.
There are several other processors used to make crime incident data more useful and understandable. We will save these for other installments in your training. But if you want to get ahead of the game, check out the full list of processors in OpenDataSoft’s documentation.
You can check out the Durham Crime Police dataset here.
Bonus visualization of anonymized location data
Here is an interesting visualization of Crime Data using the OpenDataSoft Cartograph advanced mapping tool. This time we had some fun with data from the City of Chicago. Each layer on this heatmap shows different types of crime correlated geographically.
The dataset that powers this visualization has 4,716,835 records. Play around with the layers to get a taste of the OpenDataSoft platform’s responsiveness.
You are limited only by your imagination in how you use our processors to enhance your data. Let us know what processors you have used and why!