
Box Plot

A box plot is a standardized, graphical way of summarizing the distribution of one or more groups of data so that they can be analyzed and compared.

What is a Box Plot graph/diagram?

A box plot is a standardized, graphical way of summarizing the distribution of one or more sets of data. For each group it displays five values (the minimum, first quartile, median, third quartile, and maximum) in a single shape. A box plot therefore makes it easy to see the spread and distribution of the data collected, and to compare these between groups.

Lines called whiskers extend from the box to show variability beyond the upper and lower quartiles, hence the alternative names box and whisker plot or box and whisker diagram. Outliers beyond the whiskers can be shown as individual data points on the graph.

The shape of the box plot shows how the data is distributed and whether it contains outliers. Because more than one box plot can be drawn per graph, it is also a useful way to compare different sets of data.

Box plots can be drawn with the boxes placed vertically (with groups on the horizontal axis) or horizontally (with groups on the vertical axis). Orientating boxes horizontally is helpful when there are a lot of groups to plot, or when the group names are long, as they don’t need to be abbreviated. Orientating boxes vertically works well when the grouping variable has a natural left-to-right order, such as units of time.

Why are Box Plots used?

Box plots are used to provide at-a-glance, high-level information about a group of data, showing its symmetry, skew, spread and any outliers. Viewers can easily see where the bulk of the data sits, and box plots are easier to read than a line chart when there is a great deal of variability in the dataset. Box plots also enable the comparison of multiple data groups on the same graph, using the same scale.

However, the simplicity of a box plot limits the level of detail it can show. Because each group is reduced to five values plus any outliers, it is not possible to view the detailed shape of a distribution or to spot specific peaks or troughs. Two data sets with very different shapes can produce the same box.
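This limitation can be illustrated with a short Python sketch. It is an illustration only: the function name `five_number_summary` and both data sets are hypothetical, and the quartiles are taken as the median of each half of the sorted data, which is one common convention among several.

```python
from statistics import median

def five_number_summary(values):
    """Return (minimum, Q1, median, Q3, maximum) for a list of numbers.

    Assumption: Q1 and Q3 are the medians of the lower and upper halves,
    leaving out the overall median when the count is odd.
    """
    data = sorted(values)
    half = len(data) // 2
    return (data[0], median(data[:half]), median(data),
            median(data[-half:]), data[-1])

# Hypothetical data: one group spread fairly evenly, one with two clusters.
spread_out = [0, 4, 6, 9, 10, 11, 14, 16, 20]
two_clusters = [0, 5, 5, 5, 10, 15, 15, 15, 20]

print(five_number_summary(spread_out))    # (0, 5.0, 10, 15.0, 20)
print(five_number_summary(two_clusters))  # (0, 5.0, 10, 15.0, 20)
```

Both groups would be drawn as identical boxes, even though the second has clear peaks at 5 and 15 that the first does not.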

How do you create Box Plot diagrams?

Creating a box plot is a standardized process:

Analyze your data

Arrange your data in numerical order, from the lowest to the highest. Then analyze it to find the five-number summary:

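  • The minimum (Q0 or 0th percentile): the lowest data point in the data set. If outliers are plotted separately, the lower whisker ends at the lowest data point that is not an outlier.
  • The maximum (Q4 or 100th percentile): the highest data point in the data set. If outliers are plotted separately, the upper whisker ends at the highest data point that is not an outlier.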
  • The median (Q2 or 50th percentile): the middle value in the data set
  • First quartile (Q1 or 25th percentile): also known as the lower quartile. This is the median of the lower half of the dataset.
  • Third quartile (Q3 or 75th percentile): also known as the upper quartile. This is the median of the upper half of the dataset.
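The five-number summary above can be sketched in a few lines of Python using only the standard library. This is a minimal sketch, not a definitive implementation: the function name `five_number_summary` and the sample values are hypothetical, outliers are not filtered, and it assumes the overall median is left out of both halves when the number of values is odd. Other tools may calculate quartiles by interpolation and return slightly different values.

```python
from statistics import median

def five_number_summary(values):
    """Return (minimum, Q1, median, Q3, maximum) for a list of numbers."""
    data = sorted(values)  # arrange from lowest to highest
    if len(data) < 2:
        raise ValueError("need at least two values")
    half = len(data) // 2
    lower = data[:half]   # lower half (excludes the median if count is odd)
    upper = data[-half:]  # upper half (excludes the median if count is odd)
    return (data[0], median(lower), median(data), median(upper), data[-1])

sample = [7, 15, 36, 39, 40, 41, 42, 43, 47, 49]  # hypothetical data
print(five_number_summary(sample))  # (7, 36, 40.5, 43, 49)
```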

Create your graph

Start drawing the graph by creating a relevant, labeled and scaled axis (either vertical or horizontal). Then, based on the five-number summary, draw a box that extends from the first quartile to the third quartile. This indicates the range of the central 50% of the data. Add a line across the box at the median. This line sits in the middle of the box only when the data is symmetric; it is closer to one edge when the data is skewed.

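After this, draw lines (or whiskers) from either side of the box to the minimum and maximum values, excluding any outliers. Finally, plot any outliers beyond the whiskers as individual dots or points. A common convention treats a value as an outlier when it lies more than 1.5 times the interquartile range (Q3 minus Q1) below the first quartile or above the third quartile.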
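The whisker and outlier step can also be sketched in Python. This is an illustrative sketch under stated assumptions: it uses the widely used 1.5 × IQR rule to flag outliers, which is one convention among several, takes quartiles as the median of each half of the sorted data, and uses a hypothetical function name `box_plot_stats` and hypothetical sample values.

```python
from statistics import median

def box_plot_stats(values, k=1.5):
    """Return the values needed to draw one box, its whiskers and outliers.

    Assumption: a point is an outlier if it lies more than k * IQR
    below Q1 or above Q3 (k = 1.5 is a common default, not the only one).
    """
    data = sorted(values)
    if len(data) < 2:
        raise ValueError("need at least two values")
    half = len(data) // 2
    q1 = median(data[:half])
    q3 = median(data[-half:])
    iqr = q3 - q1                      # interquartile range (box length)
    low_fence = q1 - k * iqr
    high_fence = q3 + k * iqr
    inside = [x for x in data if low_fence <= x <= high_fence]
    outliers = [x for x in data if x < low_fence or x > high_fence]
    return {
        "q1": q1,
        "median": median(data),
        "q3": q3,
        "whisker_low": inside[0],      # lowest point that is not an outlier
        "whisker_high": inside[-1],    # highest point that is not an outlier
        "outliers": outliers,
    }

sample = [2, 21, 23, 24, 25, 26, 27, 28, 30, 52]  # hypothetical data
stats = box_plot_stats(sample)
print(stats["q1"], stats["median"], stats["q3"])    # 23 25.5 28
print(stats["whisker_low"], stats["whisker_high"])  # 21 30
print(stats["outliers"])                            # [2, 52]
```

Here the box runs from 23 to 28, the whiskers reach 21 and 30, and the values 2 and 52 are plotted as separate points. Charting libraries may calculate quartiles by interpolation, so the boxes they draw can differ slightly from these values.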

 
