Data mining

What is data mining?

Data mining is the analysis of large volumes of data to find hidden patterns, anomalies, or correlations that can be used to predict future trends and opportunities. It often involves millions of records drawn from multiple data sources, and can be carried out with software under human control or be fully automated using artificial intelligence and machine learning.

Data mining differs from traditional data analysis because it uncovers hidden patterns in data rather than only answering predefined questions. These examples show the difference:

Data analysis question: What were my sales for last month?
Data mining question: What products do customers buy most often when they buy product X?
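
The two example questions can be contrasted on a toy set of transactions. This is a minimal sketch: the `transactions` list, its field layout, the product names and the helper functions `sales_for_month` and `co_purchased_with` are all hypothetical. The analysis question is a direct aggregation over a known field, while the mining question has to discover which products co-occur with product X in the same basket.

```python
from collections import Counter
from datetime import date

# Hypothetical transactions: (basket_id, sale_date, product, amount)
transactions = [
    (1, date(2024, 5, 3), "X", 20.0),
    (1, date(2024, 5, 3), "A", 5.0),
    (2, date(2024, 5, 10), "X", 20.0),
    (2, date(2024, 5, 10), "A", 5.0),
    (2, date(2024, 5, 10), "B", 8.0),
    (3, date(2024, 5, 21), "B", 8.0),
    (4, date(2024, 6, 2), "X", 20.0),
    (4, date(2024, 6, 2), "C", 3.0),
]

def sales_for_month(rows, year, month):
    """Data analysis: answer a set question by direct aggregation."""
    return sum(amount for _, d, _, amount in rows if d.year == year and d.month == month)

def co_purchased_with(rows, target):
    """Data mining: count products that appear in the same basket as `target`."""
    baskets = {}
    for basket_id, _, product, _ in rows:
        baskets.setdefault(basket_id, set()).add(product)
    counts = Counter()
    for items in baskets.values():
        if target in items:
            counts.update(items - {target})
    return counts.most_common()

print(sales_for_month(transactions, 2024, 5))  # prints 66.0
print(co_purchased_with(transactions, "X"))    # prints [('A', 2), ('B', 1), ('C', 1)]
```

The first function only needs a filter and a sum; the second has to group records into baskets before any pattern becomes visible, which is typical of mining questions.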

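The term data mining came into wide use in the 1990s. Related terms include:

Knowledge Discovery in Databases (KDD), particularly in the AI/machine learning community, although strictly speaking data mining is one step within the wider KDD process
Data dredging
Data fishing
Data snooping

In statistics, data dredging, data fishing and data snooping are usually pejorative: they describe searching data for patterns without a prior hypothesis, which can surface spurious correlations.

Why is data mining important?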

While data mining is not new, four key factors now make it vital:

Businesses now have access to enormous volumes of data, from an increasing range of internal and external sources. Extracting value from this mass of information is difficult because of the noise and complexity of all the data available to an organization.
Competition is increasing across markets as digital-first companies enter many sectors. Traditional businesses need to mine their data effectively to compete with these rivals.
Advances in computer processing power make it faster and easier to mine large volumes of data in a timely manner.
Techniques such as artificial intelligence and machine learning enable organizations to deploy data mining through models that predict future events and scenarios.

What are the uses and benefits of data mining?

Data mining has a range of uses across different industry sectors, including:

Understanding patterns in data to improve operational processes, lowering costs
Optimizing prices in areas as diverse as retail and insurance
Supporting better decision-making, either by humans or AI algorithms
Predicting customer and market behavior, enabling future activities and decisions to be optimized
Offering the right products/services to particular customer segments through personalization and recommendations
Predicting supply chain needs (such as how much of a product needs to be ordered/manufactured), avoiding inventory shortages or gluts
Predicting failures in manufacturing equipment, enabling preventative maintenance
Reducing risk by identifying fraud and compliance issues, particularly in financial services
Delivering better customer service through a more complete understanding of the entire customer journey
Organizing product displays in retail stores by understanding which products are often bought together

What is the data mining process?

Types of data mining

Data mining is usually divided into two main types:

Predictive data mining – analysis to predict future events or outcomes
Descriptive data mining – analysis to describe existing patterns in historical data
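
To make the distinction concrete, the sketch below applies both kinds of analysis to the same invented monthly sales series (the figures and the function names `describe` and `predict_next` are hypothetical). The descriptive part summarizes what already happened; the predictive part fits an ordinary least-squares trend line and extrapolates one month ahead. A straight-line trend is only one simple choice of predictive model, assumed here for brevity.

```python
from statistics import mean

# Hypothetical sales for six consecutive months
sales = [100.0, 110.0, 120.0, 130.0, 140.0, 150.0]

def describe(series):
    """Descriptive mining: summarize patterns in historical data."""
    return {"mean": mean(series), "min": min(series), "max": max(series)}

def predict_next(series):
    """Predictive mining: fit y = a + b*t by least squares, then forecast t = n."""
    n = len(series)
    t = range(n)
    t_mean, y_mean = mean(t), mean(series)
    b = sum((ti - t_mean) * (yi - y_mean) for ti, yi in zip(t, series)) / sum(
        (ti - t_mean) ** 2 for ti in t
    )
    a = y_mean - b * t_mean
    return a + b * n

print(describe(sales))      # prints {'mean': 125.0, 'min': 100.0, 'max': 150.0}
print(predict_next(sales))  # prints 160.0
```

Real predictive models would be validated on held-out data before their forecasts are trusted; this toy series is perfectly linear, so the fit is exact.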

Major stages of data mining

There are essentially three steps in the data mining process:

Pre-processing – data is collected (for example, in a data mart or data warehouse) and cleaned so that it meets data quality standards.
Data mining – the step in which the data is analyzed, using techniques such as:
- Anomaly detection: Identifying unusual records that should be checked or investigated.
- Association rule learning (dependency modeling): Searching for relationships between variables, such as products that are often bought together.
- Clustering: Discovering groups of similar records in the data, without predefined categories.
- Classification: Assigning new data to known categories, based on examples that have already been labeled.
- Regression: Modeling the relationship between variables, typically to estimate a numeric value from other variables.
- Summarization: Delivering a more compact representation of the data set, including visualization and report generation.
Results validation – verifying that data mining results, particularly those produced by AI algorithms, are accurate and can be applied more widely.
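
The three stages can be sketched end to end on a small data set. Everything here is a hypothetical illustration: the transaction amounts, the z-score threshold of 2, the "known anomaly" labels used for validation, and the helper names. Anomaly detection is done with a simple z-score rule, one of many possible techniques, and validation measures precision and recall against independently labeled cases.

```python
from statistics import mean, pstdev

# Hypothetical raw records: (transaction_id, amount); None marks a missing value
raw = [
    ("t1", 52.0), ("t2", 48.0), ("t3", None), ("t4", 50.0),
    ("t5", 51.0), ("t2", 48.0), ("t6", 49.0), ("t7", 500.0),
    ("t8", 47.0), ("t9", 53.0),
]

def preprocess(records):
    """Pre-processing: drop records with missing values and exact duplicates."""
    seen, clean = set(), []
    for rec in records:
        if rec[1] is None or rec in seen:
            continue
        seen.add(rec)
        clean.append(rec)
    return clean

def detect_anomalies(records, threshold=2.0):
    """Data mining (anomaly detection): flag amounts whose z-score exceeds the threshold."""
    amounts = [amount for _, amount in records]
    mu, sigma = mean(amounts), pstdev(amounts)
    return {tid for tid, amount in records if sigma > 0 and abs(amount - mu) / sigma > threshold}

def validate(flagged, known_anomalies):
    """Results validation: compare flagged records against labeled cases."""
    true_pos = len(flagged & known_anomalies)
    precision = true_pos / len(flagged) if flagged else 0.0
    recall = true_pos / len(known_anomalies) if known_anomalies else 0.0
    return precision, recall

clean = preprocess(raw)
flagged = detect_anomalies(clean)
print(len(clean), flagged, validate(flagged, known_anomalies={"t7"}))
# prints: 8 {'t7'} (1.0, 1.0)
```

With only eight clean records a single extreme value dominates the mean and standard deviation, so in practice robust statistics or larger samples would usually be preferred.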

Models for data mining

These steps are described in a number of process models, such as:

The Knowledge Discovery in Databases (KDD) process:

Selection
Pre-processing
Transformation
Data mining
Interpretation/evaluation
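
The KDD steps can be read as a pipeline of functions, each feeding the next. The sketch below is a toy, hypothetical instance: the store filter, basket data, thresholds and function names are invented, and the mining step computes support and confidence only for rules between item pairs rather than running a full association-rule algorithm such as Apriori. Support is the share of baskets containing both items; confidence is that count divided by the number of baskets containing the first item.

```python
from itertools import permutations

# Hypothetical raw data: (store, basket_items)
raw = [
    ("north", ["bread", "butter", "milk"]),
    ("north", ["bread", "butter"]),
    ("north", ["bread", "Bread", "jam"]),
    ("north", []),
    ("north", ["milk", "butter"]),
    ("south", ["bread", "butter"]),
]

def select(rows, store):
    """Selection: keep only the data relevant to the question (one store)."""
    return [items for s, items in rows if s == store]

def preprocess(baskets):
    """Pre-processing: normalize item names and drop empty baskets."""
    return [[item.lower() for item in b] for b in baskets if b]

def transform(baskets):
    """Transformation: represent each basket as a set of distinct items."""
    return [frozenset(b) for b in baskets]

def mine(baskets, min_support):
    """Data mining: support and confidence for pairwise rules a -> b."""
    n = len(baskets)
    items = sorted(set().union(*baskets))
    rules = []
    for a, b in permutations(items, 2):
        both = sum(1 for s in baskets if a in s and b in s)
        has_a = sum(1 for s in baskets if a in s)
        support = both / n
        if support >= min_support:
            rules.append((a, b, support, both / has_a))
    return rules

def interpret(rules, min_confidence):
    """Interpretation/evaluation: keep only rules confident enough to act on."""
    return [r for r in rules if r[3] >= min_confidence]

baskets = transform(preprocess(select(raw, "north")))
for a, b, support, confidence in interpret(mine(baskets, min_support=0.5), min_confidence=0.7):
    print(f"{a} -> {b}: support={support:.2f}, confidence={confidence:.2f}")
# prints: milk -> butter: support=0.50, confidence=1.00
```

Keeping each step as a separate function makes it easy to revisit one stage (for example, tightening pre-processing) without rewriting the rest, which matches the iterative way KDD is usually applied.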

The Cross-Industry Standard Process for Data Mining (CRISP-DM):

Business understanding
Data understanding
Data preparation
Modeling
Evaluation
Deployment

What are the challenges to effective data mining?

Organizations looking to effectively deploy data mining need to overcome five key obstacles:

Privacy and ethics: Consumers and regulators are increasingly focused on ensuring the privacy of personal information. Any mining of personal data must comply with regulations such as GDPR and CCPA. The data also needs to be used ethically, treating consumers and citizens fairly and with respect.
Skills: Data mining is a complex discipline, and requires skilled data scientists to run the process. These skills are often in short supply, pushing up costs.
Artificial intelligence: Handing over data mining decisions to artificial intelligence algorithms can lead to unforeseen consequences due to poor training and a lack of oversight, creating legal and reputational risks.
Complexity: With multiple, large sources of data involved, the entire data mining process is extremely complex. Findings are not guaranteed to be accurate, because poor data quality or problems with the underlying information can undermine them.
Technology: Collecting, storing and analyzing data requires a full technology infrastructure, from tools to storage facilities. This can be expensive to set up and maintain.
