Opendatasoft boosts data enrichment, even when using the largest reference sources

Enriching your data is a key step in producing relevant insights and analyses that drive value. However, when it comes to enriching your data with massive reference sources such as national company databases, detailed weather data, or geographic and administrative boundary data, technical limitations often become a challenge.

This issue is now a thing of the past with Opendatasoft. Our data marketplace solution allows users to perform joins (linking one dataset to another in order to automatically enrich it with external information), even when these databases contain several tens of millions of records. All without compromising on performance. This breakthrough opens the way to deeper, wider analysis and exciting new perspectives.

To find out more, we spoke with Valentine Copin, Lead Product Manager in charge of the connectivity and data enrichment aspects of the Opendatasoft platform.

Valentine, this product evolution seems to be a real step forward. Can you explain what it actually means for Opendatasoft users?

Valentine:

Yes, this is an important advance, especially since, among the 50 processors that help our customers prepare and enrich their data, the join processor is the second most popular.

Until now, joins in Opendatasoft were limited to datasets of up to 100,000 records. This was sufficient for most cases, but with enormous reference datasets, the limit meant splitting them into subsets, processing them outside the platform, or even abandoning certain analyses altogether.

It is important to understand that joins without size limits like this are rare, given the technical complexity they require. Now our users can perform them directly in the Opendatasoft platform, even with databases containing several tens of millions of rows. The result: a seamless experience and richer, better data marketplaces.

Can you give us a concrete use case to illustrate how it works?

Valentine:

Of course! Many of our customers rely on enriching their data with information from national company databases, such as SIRENE in France. SIRENE itself is a very popular dataset, which we make available and keep continuously updated on our Data hub. So I'll give a concrete example based on this dataset, but it would work equally well with equivalent databases in other countries.

Let's say a customer has a dataset listing transactions between companies and wants to analyze economic flows between different regions within a country.

With the platform's new ability to perform joins without size limits, the customer can enrich this dataset with the SIRENE database (which contains more than 50 million rows), adding the location of the two companies involved in each transaction.

By matching the SIRET numbers of the companies in the dataset against SIRENE, the join adds the municipality where each entity is located. The result: granular, geographically contextualized analysis, carried out 100% in Opendatasoft in just a few clicks.
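To make the mechanics of this use case concrete, here is a minimal Python sketch of a lookup (hash) join, assuming the customer's transactions carry hypothetical `seller_siret` and `buyer_siret` columns and that the reference extract has hypothetical `siret` and `municipality` fields. It illustrates the general technique only; it is not Opendatasoft's join processor, and the sample rows are invented.

```python
from collections import defaultdict
from typing import Any, Dict, Iterable, List, Optional, Tuple

Row = Dict[str, Any]
FlowKey = Tuple[Optional[str], Optional[str]]


def build_index(reference: Iterable[Row], key: str) -> Dict[str, Row]:
    """Hash the reference table once so each lookup is O(1) on average."""
    return {row[key]: row for row in reference}


def enrich(transactions: Iterable[Row], index: Dict[str, Row]) -> List[Row]:
    """Left join: keep every transaction and add both companies' municipalities.

    Identifiers missing from the reference yield None instead of dropping the row.
    """
    enriched = []
    for tx in transactions:
        seller = index.get(tx["seller_siret"], {})
        buyer = index.get(tx["buyer_siret"], {})
        enriched.append({
            **tx,
            "seller_municipality": seller.get("municipality"),
            "buyer_municipality": buyer.get("municipality"),
        })
    return enriched


def flows(enriched: Iterable[Row]) -> Dict[FlowKey, float]:
    """Sum transaction amounts per (seller municipality, buyer municipality) pair."""
    totals: Dict[FlowKey, float] = defaultdict(float)
    for tx in enriched:
        totals[(tx["seller_municipality"], tx["buyer_municipality"])] += tx["amount"]
    return dict(totals)


# Hypothetical sample rows: identifiers and amounts are invented for illustration.
reference = [
    {"siret": "00000000000011", "municipality": "Lyon"},
    {"siret": "00000000000022", "municipality": "Lille"},
]
transactions = [
    {"seller_siret": "00000000000011", "buyer_siret": "00000000000022", "amount": 1200.0},
    {"seller_siret": "00000000000022", "buyer_siret": "00000000000099", "amount": 300.0},
]

enriched = enrich(transactions, build_index(reference, "siret"))
print(flows(enriched))  # {('Lyon', 'Lille'): 1200.0, ('Lille', None): 300.0}
```

The reference side is hashed once, so the join costs roughly one pass over each table rather than a comparison of every pair of rows. This sketch keeps the whole index in memory, which would be a poor fit for a 50-million-row table on a modest machine; it says nothing about how Opendatasoft's engine actually scales joins to that size.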

And how fast does it perform? Joining on 50 million rows sounds scary...

Valentine:

This is precisely where we can demonstrate the value this new development brings. We have optimized our engine to keep this type of operation smooth. This means that even at very high volumes, joins run with very reasonable response times.

And above all, it remains consistent with our guiding philosophy – to offer an intuitive, straightforward experience, even in the most complex areas. We want our users to be able to focus on analysis, not on technical constraints.

To sum up, what kind of projects is this new capability particularly useful for?

Valentine:

It’s ideal for all projects that require datasets to be enriched with reference data: regional analyses, fine segmentation, or cross-referencing of multiple sources.

This goes beyond public data. While enormous national databases such as SIRENE, BAN, or IRIS come to mind first, the feature also applies to enrichment with very large internal business data sources. For example, by associating internal customer data with the SIRENE database or the National Address Database (BAN), customers can better target and personalize their sales and marketing strategies. Similarly, by combining internal point-of-sale information with demographic data, they can make better decisions about where to site shops or branches to maximize revenue and reduce costs.

This product evolution allows us to further enrich our customers' datasets and open up new avenues for analysis, all with a seamless experience in the Opendatasoft platform.

Do you have any final advice for those who are still hesitating to enrich their data with Opendatasoft?

Valentine:

Enrichment is a powerful lever to unlock the full value of your data. And with this evolution, it becomes accessible to all, even on "XXL" volumes. Do you have massive datasets, or want to cross-reference your data with large repositories, but thought it was too complex? Try it now. You're in for a surprise!

👉 Contact our teams to enrich your datasets now with our turnkey data marketplace solution.