Data virtualization: securely share your data on your marketplace, without duplicating or moving it
Data virtualization transforms the way organizations share and use their data. It allows data from external sources to be explored and consumed securely, without the need for duplication. In this article, Coralie Lohéac, Lead Product Manager at Opendatasoft, explains how deploying data virtualization within a data marketplace opens up new perspectives for data sharing and value creation within organizations.

Hello Coralie. In a few words, how would you define data virtualization and the role it plays in a data marketplace?
Coralie: Data virtualization makes data accessible and discoverable within a data marketplace, while it remains stored in its original system. There’s no need to duplicate or move it. It ensures that data on the marketplace is secure, reliable, and up-to-date, while allowing business users, who are often less technical or unfamiliar with tools such as data lakes or data warehouses, to easily explore it through an intuitive interface designed for them.
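To make the mechanism concrete, here is a minimal sketch of the pass-through principle behind virtualization: the marketplace stores only metadata and a reference to the source system, and delegates each query to that system at request time, so no copy of the records is ever made. All names below are illustrative assumptions, not Opendatasoft's actual implementation.

```python
from dataclasses import dataclass
from typing import Any, Callable

@dataclass
class VirtualizedDataset:
    """A marketplace entry that references data living in its source system."""
    dataset_id: str
    source_name: str  # e.g. "snowflake" or "databricks"
    run_at_source: Callable[[str], list[dict[str, Any]]]  # delegates queries

    def query(self, sql: str) -> list[dict[str, Any]]:
        # No local copy: every request is executed against the source system,
        # so results are always as fresh as the system of record.
        return self.run_at_source(sql)

# Hypothetical executor; in practice this would be a warehouse driver or connector.
def warehouse_executor(sql: str) -> list[dict[str, Any]]:
    print(f"Delegating to the source warehouse: {sql}")
    return []  # rows would come back from the warehouse here

usage = VirtualizedDataset("platform-usage", "snowflake", warehouse_executor)
rows = usage.query("SELECT * FROM usage_events LIMIT 10")
```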
What led you to develop this new data sharing mechanism in the platform?
Coralie: Many of our customers were hesitant to import certain data assets into their data marketplace, especially when those assets came from departments that wanted to keep control and preserve a single source of truth, because importing meant duplicating the data. Data virtualization removes this constraint by leaving the data in its original location while making it accessible through the marketplace. It ensures reliability, consistency and security, while making data easy for everyone to access and manage.
Should you virtualize all your data in a data marketplace?
Coralie: Not necessarily. Most of our customers opt for a hybrid model on their data marketplace, combining virtualized data, duplicated data (prepared to be more understandable using the platform's processors), and metadata-only entries. Our goal is to give them every possible option to share as much data as possible. Then it's up to them to decide the best approach, as they obviously know and understand their own data estate and needs.
When can data virtualization be beneficial?
Coralie: Data virtualization in a data marketplace makes it possible to expand the range of data accessible to business users. As some data cannot be moved, virtualization still allows it to be shared while providing key safeguards and benefits:
- Security and reliability: Data stays at its source, under the direct control of the teams that manage it. It therefore retains its freshness, reliability, and status as a single source of truth. This is the most popular benefit for our customers.
- Better data discovery: With AI-powered features such as our multilingual semantic search engine and similar-data recommendations, users no longer miss data that's relevant to their needs.
- Extended range of users: Data becomes accessible to all audiences through consumption tools, from automatically generated data visualizations for business users to queries via an API console for technical experts (see the query sketch after this list).
- Usage management: Sharing virtualized data within a data marketplace makes it possible to track who accesses it, which queries are run, and how the data is being used. This enables organizations to understand usage and justify the impact and ROI of data sharing.
- Reduced environmental impact: By avoiding storage duplication, data virtualization reduces the organization’s carbon footprint. This is a key advantage for organizations committed to Corporate Social Responsibility (CSR)/Environmental, Social and Governance (ESG) strategies or that have to meet compliance requirements.
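As an illustration of the "extended range of users" benefit, a technical consumer can query a virtualized dataset exactly like any other marketplace dataset through the platform's Explore API. A minimal sketch follows; the domain, dataset id, and field names are placeholders, so check the exact endpoint and query syntax against your own portal's documentation.

```python
import requests

# Placeholder domain and dataset id, for illustration only.
DOMAIN = "example.opendatasoft.com"
DATASET_ID = "platform-usage"

url = f"https://{DOMAIN}/api/explore/v2.1/catalog/datasets/{DATASET_ID}/records"
params = {
    "select": "event_date, user_id, query_count",  # illustrative field names
    "where": "query_count > 100",                  # filter runs against live data
    "limit": 20,
}

response = requests.get(url, params=params, timeout=30)
response.raise_for_status()
for record in response.json().get("results", []):
    print(record)
```

Because the dataset is virtualized, the records returned reflect the current state of the source system rather than a stale copy.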
What criteria are used to decide whether to virtualize data?
Coralie: The decision to virtualize data is based on multiple factors, including:
- The volume of data: The larger the dataset, the more attractive virtualization becomes, because it avoids the costs of duplicate storage.
- Data sensitivity: Some data belongs to specific teams who want to retain ownership and guarantee its reliability. Virtualization preserves this single source of truth while sharing the data securely with a larger number of users on the data marketplace (see the decision sketch after this list).
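These two criteria can be pictured as a simple decision rule. The sketch below is a toy encoding of the trade-off, with an arbitrary size threshold; real decisions weigh more factors, such as freshness needs, query costs, and contractual constraints.

```python
from dataclasses import dataclass

@dataclass
class DatasetProfile:
    size_gb: float            # approximate size of the dataset
    owner_requires_sot: bool  # owning team insists on a single source of truth

def should_virtualize(profile: DatasetProfile, size_threshold_gb: float = 50.0) -> bool:
    """Toy rule encoding the two criteria above; the threshold is arbitrary."""
    if profile.owner_requires_sot:
        return True  # keep the data at its source and share a virtualized view
    return profile.size_gb >= size_threshold_gb  # large data: avoid duplicate storage

print(should_virtualize(DatasetProfile(size_gb=200.0, owner_requires_sot=False)))  # True
print(should_virtualize(DatasetProfile(size_gb=2.0, owner_requires_sot=False)))    # False
```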
Can better discoverability of virtualized data on a data marketplace add to costs?
Coralie: This is an important point to cover, as data lakes and data warehouses often charge by usage or via quotas, so an increase in access to, and discovery of, virtualized data via the marketplace would logically lead to higher costs. To anticipate this issue, we have integrated management tools such as configurable quotas into our platform, allowing customers to precisely control the consumption of virtualized data. Interestingly, when we talked to customers, we found that allowances and quotas purchased in advance are often underused. By making this data more visible in the marketplace, virtualization therefore helps organizations take full advantage of their existing quotas and maximize data value and resource utilization.
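The quota mechanism described here can be pictured as a gate in front of the source system: before a query is delegated, the consumer's usage counter is checked against a configured allowance. Below is a minimal sketch with hypothetical names, not the platform's actual implementation.

```python
from dataclasses import dataclass, field

class QuotaExceeded(Exception):
    """Raised when a consumer has used up their configured allowance."""

@dataclass
class QuotaGate:
    monthly_query_quota: int                      # allowance configured per consumer
    usage: dict[str, int] = field(default_factory=dict)

    def check_and_record(self, consumer_id: str) -> None:
        used = self.usage.get(consumer_id, 0)
        if used >= self.monthly_query_quota:
            raise QuotaExceeded(f"{consumer_id} reached {self.monthly_query_quota} queries")
        # Recording each access also feeds the usage analytics mentioned earlier.
        self.usage[consumer_id] = used + 1

gate = QuotaGate(monthly_query_quota=1000)
gate.check_and_record("product-manager-42")  # passes; usage counter becomes 1
```

In practice the counter would be persisted and reset per billing period; the point is that the same gate that enforces quotas also produces the usage data needed to track consumption against pre-purchased allowances.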
Can you give us a concrete example of how data virtualization delivers benefits in a data marketplace?
Coralie: Of course. At Opendatasoft, we have virtualized our platform's usage and adoption data in our own internal data marketplace. This data comes from our data lake and was previously accessible only to our data teams, who were the only ones with the technical training or relevant licenses. Now it is available in real time, via self-service, to our Product Managers (essentially our business users), without them needing to go through our data teams. This saves a considerable amount of time for everyone.
What developments can we expect in the near future? More connectors, more AI-based innovations...?
Coralie: In terms of connectivity, Opendatasoft already knows how to virtualize data from the main data lakes and data warehouses on the market, such as Snowflake, Databricks, Microsoft Azure and Denodo. Our priority now is to strengthen the contribution of agentic AI to our data marketplace solution, starting with the recent launch of our MCP server. This opens up new opportunities for our customers to use their data and create value, and it is a key step in the evolution of our platform, all still driven by the same mission: to make data accessible and useful to everyone.