Data Transform Tips & Tricks: Extract text from a single string

The OpenDataSoft platform comes with a wide swath of data transformation tools and powerful processors to make your datasets more useful for viewing and analyzing. In order to help you become a Data Processing champion, we’ve decided to begin this blog series to highlight a different processor, how it works, and its role in making your data more useful. Today, let’s extract text from a single string!

Code for Cary asks: How can we extract text from a single string?

(Our answer: There’s a processor for that!)

At one of our Code for Cary Brigade meetings, we started looking at crime incident data from the Town of Cary. (Before the launch of the Town’s Open Data program, we received weekly Excel spreadsheets from the Town through a public records request.) Looking at the datasets, we found a field that combined a major and a minor crime classification. We then said to ourselves:

If only we could break this field into 2 separate ones, we could do an analysis of the major and minor classifications separately. 🤔

OpenDataSoft made this separation possible! We loaded these spreadsheets directly into our brigade Open Data portal supplied by OpenDataSoft. The Extract Text processor then allowed us to quickly break the string into 2 fields.

Here’s the step-by-step process to extract text from a single string:

  1. The Crime_Type field is broken into 2 fields.
  2. The processor takes the word to the left of the first “-” and puts it into a new field Crime_Category.
  3. The words to the right of the first “-” replace the contents of the Crime_Type field.
  4. We were then able to facet these 2 fields so we could rapidly sort and aggregate the data (a facet is another powerful feature that pre-indexes a field).

Here is the format of the processor’s regular expression, where each named group becomes a field: (?P<Crime_Category>[^-]+) - (?P<Crime_Type>.+)
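If you want to try a pattern of this kind outside the portal, here is a minimal Python sketch. It assumes the two named groups are called Crime_Category and Crime_Type, matching the field names in the steps above. The sample values are made up for illustration, and the handling of values with no separator is our own assumption, not a description of how the Extract Text processor behaves.

```python
import re
from collections import Counter

# Assumed pattern: the group names mirror the field names in the steps above.
PATTERN = re.compile(r"(?P<Crime_Category>[^-]+) - (?P<Crime_Type>.+)")


def split_crime_type(value):
    """Split 'MAJOR - minor' text at its first '-' into two named fields."""
    match = PATTERN.match(value)
    if match is None:
        # Assumption: values that do not fit the pattern are kept unchanged.
        return {"Crime_Category": None, "Crime_Type": value}
    return match.groupdict()


# Hypothetical sample values, not taken from the Town of Cary dataset.
samples = [
    "LARCENY - SHOPLIFTING",
    "LARCENY - FROM MOTOR VEHICLE",
    "ASSAULT - SIMPLE - PHYSICAL",
    "VANDALISM",
]

records = [split_crime_type(sample) for sample in samples]
for record in records:
    print(record)

# Rough stand-in for faceting: count how many records fall in each category.
category_counts = Counter(record["Crime_Category"] for record in records)
print(category_counts)
```

Because [^-]+ cannot cross a hyphen, the split always happens at the first “-”, so a value such as “ASSAULT - SIMPLE - PHYSICAL” keeps “SIMPLE - PHYSICAL” together in Crime_Type. The pattern also expects a space on each side of that first hyphen, so a value whose first hyphen has no surrounding spaces will not match.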

Extract Text from a single string

Yes, we made a gif to help you get the whole process. You’re welcome.

We then built a dashboard using the portal to allow people to see the types of analyses we had done. You can access the Code for Cary dashboard here.

There are several other processors used to make the crime data more useful and understandable. We will save these for future installments in your training. But if you want to get ahead of the game, check out the full list of processors in OpenDataSoft’s documentation.

You can check out the dataset here. You’ll notice that the dataset is no longer up to date since the Town has begun an Open Data program and the police incident data is now fully kept up to date there.

The only limit on how you use our processors to enhance your data is your imagination. Let us know what processors you have used and why!



