
Star schema

A star schema is one of the simplest schemas used to organize data in a database or data warehouse. Its diagram resembles a star: a central table surrounded by the tables it links to.

What is a Star Schema?

A data warehouse schema defines how tables are structured and how they relate to one another within a database or data warehouse. In other words, it determines the shape your data takes.

The star schema consists of a single central ‘fact table’, which records events or facts (typically as numeric measures plus foreign keys), surrounded by a single level of ‘dimension tables’ (also called lookup tables) that hold descriptive information about those facts or events. The data is denormalized into these dimension and fact tables: descriptive attributes sit directly in each dimension table rather than being split into further sub-tables.
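To make the structure concrete, here is a minimal sketch using Python's standard sqlite3 module with an in-memory database. The retail example, and every table and column name in it (fact_sales, dim_date, dim_product, dim_store, revenue and so on), are hypothetical and chosen only for illustration. It shows a fact table holding foreign keys and measures, with each dimension reachable through exactly one join.

```python
import sqlite3

# In-memory SQLite database, used purely for illustration.
conn = sqlite3.connect(":memory:")
cur = conn.cursor()

cur.executescript("""
-- Dimension tables (hypothetical): descriptive attributes.
CREATE TABLE dim_date (date_key INTEGER PRIMARY KEY, full_date TEXT, month TEXT, year INTEGER);
CREATE TABLE dim_product (product_key INTEGER PRIMARY KEY, product_name TEXT, category TEXT);
CREATE TABLE dim_store (store_key INTEGER PRIMARY KEY, store_name TEXT, city TEXT, region TEXT);

-- Fact table: one row per sale, holding foreign keys plus numeric measures.
CREATE TABLE fact_sales (
    date_key INTEGER REFERENCES dim_date(date_key),
    product_key INTEGER REFERENCES dim_product(product_key),
    store_key INTEGER REFERENCES dim_store(store_key),
    quantity INTEGER,
    revenue REAL
);
""")

cur.executemany("INSERT INTO dim_date VALUES (?, ?, ?, ?)", [
    (20240105, "2024-01-05", "2024-01", 2024),
    (20240210, "2024-02-10", "2024-02", 2024),
])
cur.executemany("INSERT INTO dim_product VALUES (?, ?, ?)", [
    (1, "Espresso beans", "Coffee"),
    (2, "Green tea", "Tea"),
])
cur.executemany("INSERT INTO dim_store VALUES (?, ?, ?, ?)", [
    (10, "Downtown", "Lyon", "South-East"),
    (11, "Harbour", "Marseille", "South-East"),
])
cur.executemany("INSERT INTO fact_sales VALUES (?, ?, ?, ?, ?)", [
    (20240105, 1, 10, 3, 30.0),
    (20240105, 2, 11, 5, 20.0),
    (20240210, 1, 11, 2, 20.0),
])

# Every dimension is exactly one join away from the fact table.
rows = cur.execute("""
SELECT d.month, p.category, s.region, f.quantity, f.revenue
FROM fact_sales f
JOIN dim_date d    ON f.date_key = d.date_key
JOIN dim_product p ON f.product_key = p.product_key
JOIN dim_store s   ON f.store_key = s.store_key
ORDER BY d.month, p.category
""").fetchall()

for row in rows:
    print(row)
```

Note that dim_store keeps city and region side by side in one table; that is the denormalization described above.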

What are the characteristics of the Star Schema?

The star schema has these features:

Denormalization: Star schemas denormalize the data. This means dimension tables contain redundant columns (for example, repeating a product's category on every product row), which makes querying and working with the data faster and easier.
Non-hierarchical structure: Star schemas have a single level of dimension tables, each connecting directly to the central fact table.
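Both characteristics can be seen in a single dimension table. The sketch below, again using Python's sqlite3 module with hypothetical table and column names, flattens an assumed product, category and department hierarchy into one denormalized dim_product table. Higher levels of the hierarchy are repeated on every row, and filtering on them needs no additional table.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical denormalized product dimension: the product -> category ->
# department hierarchy is flattened into a single table.
conn.execute("""
CREATE TABLE dim_product (
    product_key INTEGER PRIMARY KEY,
    product_name TEXT,
    category TEXT,
    department TEXT
)
""")
conn.executemany("INSERT INTO dim_product VALUES (?, ?, ?, ?)", [
    (1, "Espresso beans", "Coffee", "Beverages"),
    (2, "Filter coffee", "Coffee", "Beverages"),
    (3, "Green tea", "Tea", "Beverages"),
    (4, "Croissant", "Pastry", "Bakery"),
])

# Redundancy: each department name is stored once per product row.
repeats = conn.execute("""
SELECT department, COUNT(*) FROM dim_product
GROUP BY department ORDER BY department
""").fetchall()
print(repeats)  # [('Bakery', 1), ('Beverages', 3)]

# Single level: filtering on a higher hierarchy level needs no extra table.
beverages = conn.execute(
    "SELECT product_name FROM dim_product WHERE department = ? ORDER BY product_key",
    ("Beverages",),
).fetchall()
print(beverages)  # [('Espresso beans',), ('Filter coffee',), ('Green tea',)]
```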

What are the advantages and disadvantages of the Star Schema?

What are the advantages of the Star Schema?

It is simple to understand, design and implement.
It is well suited to running simple queries, and works well with OLAP cubes.
It delivers faster performance due to its reduced number of joins and efficient indexing of fact and dimension tables.
It delivers clear, intuitive analysis, with easily understandable relationships.
It scales well and can handle large volumes of data.

What are the disadvantages of the Star Schema?

Due to redundancy, it requires more storage space than normalized schema models, adding to costs.
Its denormalized structure does not enforce data integrity, which can affect data quality.
Denormalized dimension tables can be more difficult to maintain, as a single change may have to be applied in multiple places.
Complex queries can be harder to express and run.
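The integrity and maintenance problems above come from the same value being stored on many rows. The following sketch, using Python's sqlite3 module, shows the issue with a hypothetical dim_store table in which an assumed region_manager column is repeated for every store in a region. An update that touches only one row leaves two conflicting values; the correct update must rewrite every copy.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical denormalized store dimension: the region's manager is
# repeated on every store row in that region.
conn.execute("""
CREATE TABLE dim_store (
    store_key INTEGER PRIMARY KEY,
    store_name TEXT,
    region TEXT,
    region_manager TEXT
)
""")
conn.executemany("INSERT INTO dim_store VALUES (?, ?, ?, ?)", [
    (10, "Downtown", "South-East", "A. Martin"),
    (11, "Harbour", "South-East", "A. Martin"),
    (12, "Old Port", "South-East", "A. Martin"),
])

def managers_for(region):
    """Return the distinct manager values stored for a region."""
    return conn.execute(
        "SELECT DISTINCT region_manager FROM dim_store "
        "WHERE region = ? ORDER BY region_manager",
        (region,),
    ).fetchall()

# An update that only touches one row leaves the region inconsistent.
conn.execute("UPDATE dim_store SET region_manager = 'B. Durand' WHERE store_key = 10")
inconsistent = managers_for("South-East")
print(inconsistent)  # [('A. Martin',), ('B. Durand',)]

# The correct change has to rewrite every row that repeats the value.
cur = conn.execute(
    "UPDATE dim_store SET region_manager = 'B. Durand' WHERE region = 'South-East'"
)
print(cur.rowcount)  # 3
consistent = managers_for("South-East")
print(consistent)  # [('B. Durand',)]
```

Nothing in the schema prevents the inconsistent state; avoiding it is left to the loading and update processes.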

What are the uses of the Star Schema?

Star schemas can be applied to data warehouses, databases, data marts and similar systems, and are designed to simplify querying large data sets.

Analysts are able to create queries that filter and group data by one or more dimensions and then aggregate results at different levels of granularity.
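As an illustration of filtering, grouping and aggregating at different levels of granularity, the sketch below builds a tiny hypothetical star schema in an in-memory SQLite database through Python's sqlite3 module. It computes revenue per year (coarse) and monthly revenue filtered on a product category (finer, and constrained by a second dimension). All table names, column names and values are invented for the example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical minimal star schema: two dimensions and one fact table.
conn.executescript("""
CREATE TABLE dim_date (date_key INTEGER PRIMARY KEY, month TEXT, year INTEGER);
CREATE TABLE dim_product (product_key INTEGER PRIMARY KEY, category TEXT);
CREATE TABLE fact_sales (date_key INTEGER, product_key INTEGER, revenue REAL);
INSERT INTO dim_date VALUES (1, '2024-01', 2024), (2, '2024-02', 2024), (3, '2025-01', 2025);
INSERT INTO dim_product VALUES (1, 'Coffee'), (2, 'Tea');
INSERT INTO fact_sales VALUES (1, 1, 30.0), (1, 2, 20.0), (2, 1, 20.0), (3, 2, 15.0);
""")

# Coarse granularity: total revenue per year.
by_year = conn.execute("""
SELECT d.year, SUM(f.revenue)
FROM fact_sales f JOIN dim_date d ON f.date_key = d.date_key
GROUP BY d.year ORDER BY d.year
""").fetchall()

# Finer granularity, filtered on a second dimension: monthly Coffee revenue.
coffee_by_month = conn.execute("""
SELECT d.month, SUM(f.revenue)
FROM fact_sales f
JOIN dim_date d    ON f.date_key = d.date_key
JOIN dim_product p ON f.product_key = p.product_key
WHERE p.category = 'Coffee'
GROUP BY d.month ORDER BY d.month
""").fetchall()

print(by_year)          # [(2024, 70.0), (2025, 15.0)]
print(coffee_by_month)  # [('2024-01', 30.0), ('2024-02', 20.0)]
```

Changing the level of detail only means changing which dimension columns appear in GROUP BY; the fact table stays the same.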

By contrast, the star schema is poorly suited to online transaction processing (OLTP) applications. Because of its denormalized structure, frequent inserts, updates and deletes must be applied in several places and checked carefully to preserve data integrity, which slows performance.

What is the difference between the Star Schema and the Snowflake Schema?

Data warehouses are commonly built on either a star schema or a snowflake schema model. There are seven main differences between them:

Normalization: Star schemas are denormalized, with values repeated within a table. By contrast, the snowflake schema normalizes its dimensions, storing dimensional hierarchies in separate dimension tables.
Data redundancy: The star schema stores repeated data, leading to data redundancy, whereas the snowflake schema reduces redundancy through normalization.
Disk space: As it stores redundant data, the star schema uses more disk space than the snowflake schema.
Query complexity: Queries are simpler to write on a star schema, as it has only one level of dimension tables: each dimension is a single join away from the fact table, rather than at the end of a chain of joins.
Query performance: Star schema queries are typically faster than snowflake schema queries because they require fewer joins.
Data integrity: In the star schema, the same values are repeated across many rows of the dimension tables, so inserts, updates, or deletes that miss some copies can compromise data integrity. In contrast, the snowflake schema stores each piece of dimension data once, improving data integrity.
Setup and maintenance: Because they are simpler, star schemas are easier to design and set up than snowflake schemas. However, due to potential data integrity issues, star schemas are harder to maintain as new data is added to a data warehouse.
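The trade-off between the two models can be seen by modelling the same product data both ways. This sketch, using Python's sqlite3 module with hypothetical table names (star_product, sf_product, sf_category, fact_sales), stores the category name directly in the star version but in a separate normalized table in the snowflake version. Both queries return the same answer; the snowflake query simply needs a chain of two joins instead of one, while the star table repeats "Coffee" on two rows.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
-- Star design: the category name lives directly in the product dimension.
CREATE TABLE star_product (product_key INTEGER PRIMARY KEY, product_name TEXT, category_name TEXT);

-- Snowflake design: the category level is normalized into its own table.
CREATE TABLE sf_category (category_key INTEGER PRIMARY KEY, category_name TEXT);
CREATE TABLE sf_product (product_key INTEGER PRIMARY KEY, product_name TEXT,
                         category_key INTEGER REFERENCES sf_category(category_key));

-- One shared fact table keyed by product.
CREATE TABLE fact_sales (product_key INTEGER, revenue REAL);

INSERT INTO star_product VALUES (1, 'Espresso beans', 'Coffee'), (2, 'Filter coffee', 'Coffee'), (3, 'Green tea', 'Tea');
INSERT INTO sf_category VALUES (1, 'Coffee'), (2, 'Tea');
INSERT INTO sf_product VALUES (1, 'Espresso beans', 1), (2, 'Filter coffee', 1), (3, 'Green tea', 2);
INSERT INTO fact_sales VALUES (1, 30.0), (2, 10.0), (3, 20.0);
""")

# Star: one join from the fact table reaches the category.
star = conn.execute("""
SELECT p.category_name, SUM(f.revenue)
FROM fact_sales f
JOIN star_product p ON f.product_key = p.product_key
GROUP BY p.category_name ORDER BY p.category_name
""").fetchall()

# Snowflake: a chain of two joins (fact -> product -> category).
snowflake = conn.execute("""
SELECT c.category_name, SUM(f.revenue)
FROM fact_sales f
JOIN sf_product p  ON f.product_key = p.product_key
JOIN sf_category c ON p.category_key = c.category_key
GROUP BY c.category_name ORDER BY c.category_name
""").fetchall()

print(star)               # [('Coffee', 40.0), ('Tea', 20.0)]
print(star == snowflake)  # True
```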

What is a Starflake Schema?

As the name suggests, a starflake schema combines a star schema and a snowflake schema, aiming to bring together the benefits of both approaches. It is based on a snowflake schema in which only some of the dimension tables have been denormalized, while shared dimensional hierarchies are placed in separate, normalized tables called outriggers.
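A rough sketch of the outrigger idea follows, using Python's sqlite3 module. It assumes a hypothetical geography hierarchy that is shared by two otherwise denormalized dimensions, dim_store and dim_customer; all names and values are illustrative. Because the shared hierarchy is stored once, renaming a region is a one-row update that both dimensions see immediately.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
-- Outrigger: a shared geography hierarchy stored once, in normalized form.
CREATE TABLE dim_geography (geography_key INTEGER PRIMARY KEY, city TEXT, region TEXT);

-- Two denormalized dimensions that both reference the outrigger.
CREATE TABLE dim_store (store_key INTEGER PRIMARY KEY, store_name TEXT,
                        geography_key INTEGER REFERENCES dim_geography(geography_key));
CREATE TABLE dim_customer (customer_key INTEGER PRIMARY KEY, customer_name TEXT, segment TEXT,
                           geography_key INTEGER REFERENCES dim_geography(geography_key));

CREATE TABLE fact_sales (store_key INTEGER, customer_key INTEGER, revenue REAL);

INSERT INTO dim_geography VALUES (1, 'Lyon', 'South-East'), (2, 'Lille', 'North');
INSERT INTO dim_store VALUES (10, 'Downtown', 1), (11, 'Station', 2);
INSERT INTO dim_customer VALUES (100, 'Acme', 'Retail', 1), (101, 'Globex', 'Wholesale', 2);
INSERT INTO fact_sales VALUES (10, 100, 50.0), (11, 101, 25.0), (10, 101, 10.0);
""")

# Renaming a region is a one-row change in the outrigger...
conn.execute("UPDATE dim_geography SET region = 'Sud-Est' WHERE geography_key = 1")

# ...and both dimensions pick it up through their join to the outrigger.
def revenue_by_region(dim_table, key_column):
    # Table and column names come from the fixed schema above, not user input.
    sql = f"""
    SELECT g.region, SUM(f.revenue)
    FROM fact_sales f
    JOIN {dim_table} d   ON f.{key_column} = d.{key_column}
    JOIN dim_geography g ON d.geography_key = g.geography_key
    GROUP BY g.region ORDER BY g.region
    """
    return conn.execute(sql).fetchall()

by_store_region = revenue_by_region("dim_store", "store_key")
by_customer_region = revenue_by_region("dim_customer", "customer_key")
print(by_store_region)     # [('North', 25.0), ('Sud-Est', 60.0)]
print(by_customer_region)  # [('North', 35.0), ('Sud-Est', 50.0)]
```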
