Glossary
Data Warehouse
A data warehouse brings together data from multiple sources into a single, centralized, large repository for storage, analysis and reporting.
What is a Data Warehouse?
A data warehouse is a data management solution that brings together data from multiple sources into a single, centralized, large repository for storage, analysis and reporting. Data is either structured (such as database tables or spreadsheets), or semi-structured (for example XML files or webpages).
Data comes from sources such as transactional systems, relational databases and business applications (such as for sales, ERP, CRM, finance) and is added on a regular basis. Before entering the data warehouse it is normally cleaned and standardized to ensure consistency, accuracy and quality. While data warehouses were traditionally hosted on-premise, they can now be hosted in the cloud or on a specialist data warehouse appliance.
What is a Data Warehouse used for?
Data warehouses provide a centralized information solution and can store enormous historical volumes of data. This can then be used for business intelligence and reporting functions including:
- Data analytics
- Data mining
- Data visualization
- Artificial intelligence (AI) and machine learning
This enables better understanding of business performance, improved decision-making, and the ability to predict future trends.
How does a Data Warehouse differ from alternatives?
Data Warehouse vs Data Lake
A data warehouse contains structured data that has been cleansed and standardized to fit with specific models or use cases. By contrast, a data lake contains raw data which can be structured, semi-structured, or unstructured. As data is stored in a raw form, rather than being optimized for a specific purpose, it can be accessed for a variety of immediate or future uses, depending on the organization’s needs.
Data Warehouse vs database
A standard database collects transactional information from specific, ongoing business processes, such as sales. It therefore provides a limited view of overall operations, and is optimized to support the ongoing business process. By contrast, a data warehouse aggregates multiple sources of information (including databases) and is optimized to store very large volumes of data, making it available for fast querying and analysis.
Data Warehouse vs Data Mart
A data mart contains a limited subset of data – such as information relating to a specific department or business process. It is normally part of a data warehouse and is designed to make it easier and quicker to access smaller groups of information. Unlike a full data warehouse, a data mart is not comprehensive or designed to provide enterprise-wide data.
Data Warehouse vs Data Lakehouse
As the name suggests, a data lakehouse is a hybrid approach that combines the strengths of the data warehouse and data lake approaches in a unified platform. Data can be stored in its raw form (as in a data lake), but with the ability to use data processing and analytics as in a data warehouse.
How does a Data Warehouse work?
Before creating a data warehouse, companies need to define its architecture and schema, setting how data is stored and how it will be used.
Information is added using Extract, Transform, Load (ETL) processes, either on a one-off, real-time or regular batch timeframe. Once added, data can be used and analyzed through business intelligence tools by data analysts and data scientists.
A data warehouse has three tiers:
- Top tier: the front-end client that presents results through reporting, analysis, and data mining tools
- Middle tier: the analytics engine that is used to access and analyze data
- Bottom tier: the database server, where data is loaded and stored
What are the advantages and disadvantages of a Data Warehouse?
The advantages of a Data Warehouse
- Improved data quality, by cleaning and standardizing data from multiple sources
- Faster business intelligence and real-time decision making, based on analyzing all organizational data
- Ability to use advanced analytics, such as data mining, AI and machine learning
- Greater consistency of data across the organization, breaking down data silos
The disadvantages of a Data Warehouse
- Data warehouses are complex to create, manage and update
- There is a requirement for skilled experts to both manage the data warehouse and run reports and analytics. Normal business users cannot query the data without in-depth technical skills
- Creating and running a data warehouse is expensive, due to the need for specialist skills and technology
- Data warehouses cannot handle unstructured data, which is a growing part of data volumes
- Data warehouses are inflexible as their models and use cases have to be defined up-front
Learn more about the differences between data warehouses and data lakes and how to unlock value from your data in this Opendatasoft blog.
Learn more
Data access
How to break down organizational silos to engage everyone in your data project
Organizational silos prevent data sharing and collaboration, increasing risk and reducing efficiency and innovation. How can companies remove them and ensure that data flows seamlessly around the organization so that it can be used by every employee?