Data mining

What is data mining?

Data mining is the analysis of huge volumes of data to find hidden patterns, anomalies, or correlations, predicting future trends and opportunities. Often involving millions of records, from multiple data sources, data mining can be carried out through software under human control, or be completely automated using artificial intelligence and machine learning.

Data mining differs from traditional data analysis as it uncovers hidden patterns in data, rather than necessarily answering set questions. These examples show the difference:

  • Data analysis question: What were my sales for last month?
  • Data mining question: What products do customers buy most often when they buy product X?
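The difference between the two questions can be sketched in a few lines of Python. This is a minimal illustration, not a production approach: the `orders` records, the field layout, and the function names `total_sales` and `bought_with` are all assumptions invented for the example.

```python
from collections import Counter

# Hypothetical order records: (order_id, month, product, amount).
orders = [
    (1, "2024-05", "X", 20.0),
    (1, "2024-05", "A", 5.0),
    (2, "2024-05", "X", 20.0),
    (2, "2024-05", "B", 7.5),
    (3, "2024-05", "X", 20.0),
    (3, "2024-05", "A", 5.0),
    (4, "2024-04", "A", 5.0),
]

def total_sales(records, month):
    """Data analysis: answer a set question with a direct aggregate."""
    return sum(amount for _, m, _, amount in records if m == month)

def bought_with(records, product):
    """Data mining: count which products share an order with `product`."""
    baskets = {}
    for order_id, _, item, _ in records:
        baskets.setdefault(order_id, set()).add(item)
    counts = Counter()
    for items in baskets.values():
        if product in items:
            counts.update(items - {product})
    return counts.most_common()

may_sales = total_sales(orders, "2024-05")
companions = bought_with(orders, "X")
```

The first function returns a single known quantity. The second returns a ranking that was not specified in advance, which is closer to what data mining sets out to find.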

The term data mining came into use in the 1990s. It is also referred to as:

  • Knowledge Discovery in Databases (KDD), particularly by the AI/machine learning community
  • Data dredging, data fishing, or data snooping (these three terms are usually used critically, to describe searching data for patterns without a prior hypothesis)

Why is data mining important?

While data mining is not new, four key factors now make it vital:

  1. Businesses now have access to enormous volumes of data, from an increasing range of internal and external sources. Finding value from this mass of information is difficult due to the noise and complexity of all the data available to an organization.
  2. Competition is increasing in different markets, with digital-first companies entering many sectors. Successfully mining data is crucial for traditional businesses to beat these rivals.
  3. Advances in computer processing power make it faster and easier to mine data effectively.
  4. Techniques such as artificial intelligence and machine learning enable organizations to deploy data mining through models that predict future events and scenarios, improving agility and foresight.

What are the uses and benefits of data mining?

Data mining has a range of uses across different industry sectors, including:

  • Understanding patterns in data to improve operational processes, lowering costs
  • Optimizing prices in areas as diverse as retail and insurance
  • Supporting better decision-making, either by humans or AI algorithms
  • Predicting customer and market behavior, enabling future activities and decisions to be optimized
  • Offering the right products/services to particular customer segments through personalization and recommendations
  • Predicting supply chain needs (such as how much of a product needs to be ordered/manufactured), avoiding inventory shortages or gluts
  • Predicting failures in manufacturing equipment, enabling preventative maintenance
  • Reducing risk by identifying fraud/compliance risks, particularly in financial services
  • Delivering better customer service through a more complete understanding of the entire customer journey
  • Organizing product display in retail stores, by understanding which products are often bought together

What is the data mining process?

Types of data mining

Data mining is organized into two main types:

  • Predictive data mining – analysis to predict future events/outcomes
  • Descriptive data mining – analysis to reveal existing patterns in historical data
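As a rough sketch of the two types, the Python below summarizes a short sales history (descriptive) and then extends a fitted straight line by one month (predictive). The `monthly_units` figures are invented, and a least-squares line is only one assumed choice of predictive model among many.

```python
from statistics import mean

# Hypothetical monthly unit sales, oldest first.
monthly_units = [100, 110, 120, 130, 140, 150]

def describe(values):
    """Descriptive: summarize what has already happened."""
    return {"mean": mean(values), "min": min(values), "max": max(values)}

def forecast_next(values):
    """Predictive: fit a least-squares line and extend it one step."""
    n = len(values)
    xs = list(range(n))
    x_bar, y_bar = mean(xs), mean(values)
    covariance = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, values))
    variance = sum((x - x_bar) ** 2 for x in xs)
    slope = covariance / variance
    intercept = y_bar - slope * x_bar
    return intercept + slope * n

summary = describe(monthly_units)
next_month = forecast_next(monthly_units)
```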

Major stages inside data mining

Essentially there are three steps within the data mining process:

  • Pre-processing – data is collected (such as within a data mart or data warehouse) and cleaned to ensure data quality standards are met.
  • Data mining – the actual step of analyzing data, using techniques such as:
    • Anomaly detection: Identifying anomalies in the data to be checked/investigated
    • Association rule learning (dependency modeling): Searching for relationships between variables.
    • Clustering: Discovering groups of similar records in the data, without using predefined categories.
    • Classification: Assigning new data to known categories.
    • Regression: Modeling the relationship among variables, so that one can be estimated from the others.
    • Summarization: Delivering a more compact representation of the data set, including visualization and report generation.
  • Results validation – verifying that data mining results, particularly those provided by AI algorithms, are accurate and can be applied on a wider scale.
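A few of the techniques above can be illustrated with deliberately small Python functions. This is a sketch under stated assumptions: the z-score rule, the nearest-neighbor classifier, and all the sample data are choices made for illustration, and real projects would normally rely on a dedicated library.

```python
from statistics import mean, pstdev

def find_anomalies(values, threshold=2.0):
    """Anomaly detection: flag values far from the mean (z-score rule)."""
    mu, sigma = mean(values), pstdev(values)
    if sigma == 0:
        return []
    return [v for v in values if abs(v - mu) / sigma > threshold]

def rule_stats(baskets, antecedent, consequent):
    """Association rule learning: support and confidence of A -> B."""
    with_a = [b for b in baskets if antecedent in b]
    with_both = [b for b in with_a if consequent in b]
    support = len(with_both) / len(baskets)
    confidence = len(with_both) / len(with_a) if with_a else 0.0
    return support, confidence

def classify(labeled, point):
    """Classification: give `point` the label of its nearest known example."""
    nearest = min(labeled, key=lambda pair: abs(pair[0] - point))
    return nearest[1]

def holdout_accuracy(train, test):
    """Results validation: score the classifier on data it has not seen."""
    hits = sum(classify(train, x) == label for x, label in test)
    return hits / len(test)

# Hypothetical sample data, invented for illustration.
payments = [10, 12, 11, 13, 12, 11, 10, 95]
baskets = [{"bread", "butter"}, {"bread", "butter", "jam"}, {"bread"}, {"milk"}]
train = [(1.0, "low"), (2.0, "low"), (8.0, "high"), (9.0, "high")]
test = [(1.2, "low"), (8.7, "high"), (3.0, "low")]

anomalies = find_anomalies(payments)
support, confidence = rule_stats(baskets, "bread", "butter")
accuracy = holdout_accuracy(train, test)
```

Scoring the classifier on records it was not built from is what makes the last step a validation rather than a restatement of the training data.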

Models for data mining

These steps are described in a number of models such as:

The Knowledge Discovery in Databases (KDD) process:

  • Selection
  • Pre-processing
  • Transformation
  • Data mining
  • Interpretation/evaluation.
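One way to picture the KDD stages is as a chain of small functions, each handing its output to the next. The sketch below is an assumption-laden illustration: the `raw` records, the function names, and the `minimum_gap` rule used for evaluation are all hypothetical, and the mining step is a simple summarization.

```python
from statistics import mean

# Hypothetical raw records; None marks a missing reading.
raw = [
    {"store": "north", "units": 40},
    {"store": "north", "units": None},
    {"store": "south", "units": 10},
    {"store": "north", "units": 60},
    {"store": "south", "units": 30},
]

def select(records):
    """Selection: keep only the fields relevant to the question."""
    return [(r["store"], r["units"]) for r in records]

def preprocess(rows):
    """Pre-processing: drop rows with missing values."""
    return [(store, units) for store, units in rows if units is not None]

def transform(rows):
    """Transformation: reshape into one list of values per store."""
    grouped = {}
    for store, units in rows:
        grouped.setdefault(store, []).append(units)
    return grouped

def mine(grouped):
    """Data mining: here, a simple summarization (mean units per store)."""
    return {store: mean(values) for store, values in grouped.items()}

def evaluate(pattern, minimum_gap=10):
    """Interpretation/evaluation: keep the finding only if the gap is large."""
    best, worst = max(pattern.values()), min(pattern.values())
    return (best - worst) >= minimum_gap

pattern = mine(transform(preprocess(select(raw))))
is_notable = evaluate(pattern)
```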

The Cross-Industry Standard Process for Data Mining (CRISP-DM):

  • Business understanding
  • Data understanding
  • Data preparation
  • Modeling
  • Evaluation
  • Deployment

What are the challenges to effective data mining?

Organizations looking to effectively deploy data mining need to overcome five key obstacles:

  • Privacy and ethics: Consumers and regulators are increasingly focused on ensuring the privacy of personal information. All data that is mined needs to be compliant with regulations such as GDPR and CCPA. It also needs to be used ethically, treating consumers and citizens fairly and with respect.
  • Skills: Data mining is a complex discipline, and requires skilled data scientists to run the process. These skills are often in short supply, pushing up costs.
  • Artificial intelligence: Handing over data mining decisions to artificial intelligence algorithms can lead to unforeseen consequences due to poor training and a lack of oversight, creating legal and reputational risks.
  • Complexity: With multiple, large sources of data involved, the entire data mining process is extremely complex. Poor data quality or problems with the underlying information mean there is no guarantee that findings will be accurate.
  • Technology: Collecting, storing and analyzing data requires a full technology infrastructure, from tools to storage facilities. This can be expensive to set up and maintain.