Data mining

What is data mining?

Data mining is the analysis of large volumes of data to find hidden patterns, anomalies, or correlations that can be used to predict future trends and opportunities. It often involves millions of records from multiple data sources, and can be carried out through software under human control or be completely automated using artificial intelligence and machine learning.

Data mining differs from traditional data analysis as it uncovers hidden patterns in data, rather than necessarily answering set questions. These examples show the difference:

  • Data analysis question: What were my sales for last month?
  • Data mining question: What products do customers buy most often when they buy product X?
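
As a rough illustration of the two questions above, the following Python sketch answers each one over a small set of order records. The `ORDERS` data, its record layout, and the function names are hypothetical and exist only for this example.

```python
from collections import Counter
from datetime import date

# Hypothetical order lines: (order_id, order_date, product, amount)
ORDERS = [
    (1, date(2024, 5, 3), "X", 20.0),
    (1, date(2024, 5, 3), "A", 5.0),
    (2, date(2024, 5, 9), "X", 20.0),
    (2, date(2024, 5, 9), "A", 5.0),
    (2, date(2024, 5, 9), "B", 7.5),
    (3, date(2024, 5, 17), "B", 7.5),
    (4, date(2024, 4, 28), "X", 20.0),
    (4, date(2024, 4, 28), "C", 12.0),
]


def sales_for_month(orders, year, month):
    """Data analysis: answer a set question with a direct aggregate."""
    return sum(
        amount
        for _, order_date, _, amount in orders
        if order_date.year == year and order_date.month == month
    )


def bought_with(orders, target):
    """Data mining: count which products share an order with `target`."""
    baskets = {}
    for order_id, _, product, _ in orders:
        baskets.setdefault(order_id, set()).add(product)

    counts = Counter()
    for items in baskets.values():
        if target in items:
            counts.update(items - {target})
    return counts.most_common()


print(sales_for_month(ORDERS, 2024, 5))  # total sales for May 2024
print(bought_with(ORDERS, "X"))          # products ranked by co-purchase with X
```

The first function returns a single figure for a question that was fixed in advance. The second surfaces a ranking that nobody specified beforehand, which is closer to what data mining aims for.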

The term data mining began to be used in the 1990s – it is also referred to as:

  • Knowledge Discovery in Databases (KDD), particularly by the AI/machine learning community
  • Data dredging, data fishing, or data snooping, although these terms are usually pejorative and describe trawling data for patterns without a prior hypothesis

Why is data mining important?

While data mining is not new, four key factors now make it vital:

  1. Businesses now have access to enormous volumes of data, from an increasing range of internal and external sources. Finding value from this mass of information is difficult due to the noise and complexity of all the data available to an organization.
  2. Competition is increasing in different markets, with digital-first companies entering many sectors. Successfully mining data is crucial for traditional businesses to beat these rivals.
  3. Advances in computer processing power make it easier and faster to mine data in a timely, effective manner.
  4. Techniques such as artificial intelligence and machine learning enable organizations to deploy data mining through models that predict future events and scenarios, improving agility and foresight.

What are the uses and benefits of data mining?

Data mining has a range of uses across different industry sectors, including:

  • Understanding patterns in data to improve operational processes, lowering costs
  • Optimizing prices in areas as diverse as retail and insurance
  • Supporting better decision-making, either by humans or AI algorithms
  • Predicting customer and market behavior, enabling future activities and decisions to be optimized
  • Offering the right products/services to particular customer segments through personalization and recommendations
  • Predicting supply chain needs (such as how much of a product needs to be ordered/manufactured), avoiding inventory shortages or gluts
  • Predicting failures in manufacturing equipment, enabling preventative maintenance
  • Reducing risk by identifying fraud/compliance risks, particularly in financial services
  • Delivering better customer service through a more complete understanding of the entire customer journey
  • Organizing product display in retail stores, by understanding which products are often bought together

What is the data mining process?

Types of data mining

Data mining is organized into two main types:

  • Predictive data mining – analysis to predict future events/outcomes
  • Descriptive data mining – analysis to demonstrate existing patterns in historical data.
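
The difference between the two types can be sketched in a few lines of Python. This is a minimal illustration, assuming a hypothetical series of monthly sales figures and a simple straight-line trend. Real predictive models are usually far richer, and the function names are invented for this example.

```python
from statistics import mean

# Hypothetical monthly unit sales, oldest first.
MONTHLY_SALES = [100.0, 110.0, 120.0, 130.0, 140.0]


def describe(values):
    """Descriptive: summarize what the historical data already shows."""
    return {"mean": mean(values), "min": min(values), "max": max(values)}


def fit_trend(values):
    """Fit y = slope * t + intercept by ordinary least squares, with t = 0, 1, 2, ..."""
    n = len(values)
    t_mean = (n - 1) / 2
    y_mean = mean(values)
    covariance = sum((t - t_mean) * (y - y_mean) for t, y in enumerate(values))
    variance = sum((t - t_mean) ** 2 for t in range(n))
    slope = covariance / variance
    return slope, y_mean - slope * t_mean


def predict_next(values):
    """Predictive: extrapolate the fitted trend one period ahead."""
    slope, intercept = fit_trend(values)
    return slope * len(values) + intercept


print(describe(MONTHLY_SALES))
print(predict_next(MONTHLY_SALES))
```

`describe` only reports on data that already exists, while `predict_next` uses the same history to estimate a value that has not yet been observed.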

Major stages inside data mining

Essentially there are three steps within the data mining process:

  • Pre-processing – data is collected (such as within a data mart or data warehouse) and cleaned to ensure data quality standards are met.
  • Data mining – the actual step of analyzing data, which uses techniques such as:
    • Anomaly detection: Identifying unusual records in the data to be checked/investigated.
    • Association rule learning (dependency modeling): Searching for relationships between variables.
    • Clustering: Discovering groups of similar records in the data without knowing the groups in advance.
    • Classification: Assigning new data to known categories.
    • Regression: Modeling the relationship between variables, typically to estimate a numeric value.
    • Summarization: Delivering a more compact representation of the data set, including visualization and report generation.
  • Results validation – verifying that data mining results, particularly those provided by AI algorithms, are accurate and can be applied on a wider scale.
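
The three steps above can be sketched end to end in Python, using anomaly detection as the mining technique. This is a minimal sketch under stated assumptions: the `READINGS` data, the z-score threshold of 2.0, and the set of known faults used for validation are all hypothetical, and a z-score test is only one of many ways to detect anomalies.

```python
from statistics import mean, stdev

# Hypothetical sensor readings: (record_id, value). None marks a missing value.
READINGS = [
    ("m1", 10.0), ("m2", 10.5), ("m3", None), ("m4", 9.8), ("m4", 9.8),
    ("m5", 10.2), ("m6", 9.9), ("m7", 10.1), ("m8", 30.0),
    ("m9", 10.0), ("m10", 10.3),
]


def preprocess(readings):
    """Pre-processing: drop missing values and exact duplicate records."""
    seen, clean = set(), []
    for record in readings:
        if record[1] is None or record in seen:
            continue
        seen.add(record)
        clean.append(record)
    return clean


def find_anomalies(records, threshold=2.0):
    """Data mining: flag values more than `threshold` standard deviations from the mean."""
    values = [value for _, value in records]
    mu, sigma = mean(values), stdev(values)
    if sigma == 0:
        return []
    return [(rid, value) for rid, value in records if abs(value - mu) / sigma > threshold]


def validate(flagged, known_faults):
    """Results validation: share of flagged records that match known faults (precision)."""
    if not flagged:
        return 0.0
    hits = sum(1 for rid, _ in flagged if rid in known_faults)
    return hits / len(flagged)


clean = preprocess(READINGS)
flagged = find_anomalies(clean)
print(flagged)
print(validate(flagged, known_faults={"m8"}))
```

Keeping each stage as its own function makes it easier to swap the mining technique, for example replacing the z-score test with clustering or classification, without touching the cleaning or validation code.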

Models for data mining

These steps are described in a number of models such as:

The Knowledge Discovery in Databases (KDD) process:

  • Selection
  • Pre-processing
  • Transformation
  • Data mining
  • Interpretation/evaluation.

The Cross-Industry Standard Process for Data Mining (CRISP-DM):

  • Business understanding
  • Data understanding
  • Data preparation
  • Modeling
  • Evaluation
  • Deployment

What are the challenges to effective data mining?

Organizations looking to effectively deploy data mining need to overcome five key obstacles:

  • Privacy and ethics: Consumers and regulators are increasingly focused on ensuring the privacy of personal information. Any mining of personal data needs to comply with regulations such as GDPR and CCPA. The data also needs to be used ethically, treating consumers and citizens fairly and with respect.
  • Skills: Data mining is a complex discipline, and requires skilled data scientists to run the process. These skills are often in short supply, pushing up costs.
  • Artificial intelligence: Handing over data mining decisions to artificial intelligence algorithms can lead to unforeseen consequences due to poor training and a lack of oversight, creating legal and reputational risks.
  • Complexity: With multiple, large sources of data involved, the entire data mining process is extremely complex. Poor data quality or flaws in the underlying information mean there is no guarantee that findings will be accurate.
  • Technology: Collecting, storing and analyzing data requires a full technology infrastructure, from tools to storage facilities. This can be expensive to set up and maintain.