
Data Mining

Data mining is the process of extracting and discovering patterns in large data sets, involving methods at the intersection of machine learning, statistics, and database systems. It is an interdisciplinary subfield of computer science and statistics, with the overall goal of extracting information (with intelligent methods) from a data set and transforming it into a comprehensible structure for further use.


Data mining is the analysis step of the “knowledge discovery in databases” (KDD) process. Aside from the raw analysis step, it also involves database and data management aspects, data pre-processing, model and inference considerations, interestingness metrics, complexity considerations, post-processing of discovered structures, visualization, and online updating.


The term “data mining” is a misnomer, because the goal is the extraction of patterns and knowledge from large amounts of data, not the extraction (mining) of data itself. It is also a buzzword, frequently applied to any form of large-scale data or information processing (collection, extraction, warehousing, analysis, and statistics) as well as to any application of computer decision support systems, including artificial intelligence.

Data mining involves six common classes of tasks (an illustrative code sketch for each follows the list):

Anomaly detection (outlier/change/deviation detection) – The identification of unusual data records that might be interesting, or of data errors that require further investigation.

Association rule learning (dependency modeling) – Searches for relationships between variables. For example, a supermarket might gather data on customer purchasing habits. Using association rule learning, the supermarket can determine which products are frequently bought together and use this information for marketing purposes. This is sometimes referred to as market basket analysis.

Clustering – The task of discovering groups and structures in the data that are in some way or another “similar”, without using known structures in the data.

Classification – The task of generalizing known structure to apply to new data. For example, an e-mail program might attempt to classify an e-mail as “legitimate” or as “spam”.

Regression – Attempts to find a function that models the data with the least error; that is, it estimates the relationships among data or datasets.

Summarization – Providing a more compact representation of the data set, including visualization and report generation.
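
A minimal sketch of anomaly detection, assuming an invented set of sensor readings and a simple z-score rule (real systems often use more robust methods):

    import numpy as np

    # Hypothetical sensor readings; most cluster around 10, one does not.
    readings = np.array([9.8, 10.1, 10.0, 9.9, 10.2, 34.7, 10.0, 9.7])

    # Flag values whose z-score exceeds a rule-of-thumb threshold.
    z_scores = (readings - readings.mean()) / readings.std()
    outliers = readings[np.abs(z_scores) > 2]
    print(outliers)  # -> [34.7]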
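
A market-basket sketch for association rule learning, assuming a handful of invented transactions and counting pair supports directly (production systems typically use algorithms such as Apriori or FP-growth):

    from itertools import combinations
    from collections import Counter

    # Hypothetical supermarket transactions.
    baskets = [
        {"bread", "butter", "milk"},
        {"bread", "butter"},
        {"beer", "chips"},
        {"bread", "milk"},
        {"bread", "butter", "chips"},
    ]

    # Support: how often each pair of items appears across all baskets.
    pair_counts = Counter()
    for basket in baskets:
        for pair in combinations(sorted(basket), 2):
            pair_counts[pair] += 1

    # Keep pairs bought together in at least 40% of baskets.
    min_support = 0.4 * len(baskets)
    frequent = {p: c for p, c in pair_counts.items() if c >= min_support}
    print(frequent)  # -> {('bread', 'butter'): 3, ('bread', 'milk'): 2}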
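
A clustering sketch using k-means from scikit-learn on invented 2-D points; the library choice and the two-cluster setup are assumptions for illustration:

    import numpy as np
    from sklearn.cluster import KMeans

    # Hypothetical 2-D points forming two loose groups.
    points = np.array([[1.0, 1.1], [1.2, 0.9], [0.8, 1.0],
                       [8.0, 8.2], [7.9, 8.1], [8.3, 7.8]])

    # k-means partitions the points into k "similar" groups
    # without any pre-existing labels.
    labels = KMeans(n_clusters=2, n_init=10, random_state=0).fit_predict(points)
    print(labels)  # e.g. [0 0 0 1 1 1] (cluster ids are arbitrary)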
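
A classification sketch for the spam example, assuming scikit-learn's naive Bayes classifier and a tiny invented training set:

    from sklearn.feature_extraction.text import CountVectorizer
    from sklearn.naive_bayes import MultinomialNB

    # Tiny invented set of labeled e-mails.
    emails = ["win a free prize now", "meeting agenda attached",
              "free money claim now", "lunch tomorrow?"]
    labels = ["spam", "legitimate", "spam", "legitimate"]

    # Turn each e-mail into word counts, then learn word/label statistics.
    vectorizer = CountVectorizer()
    features = vectorizer.fit_transform(emails)
    model = MultinomialNB().fit(features, labels)

    # Apply the generalized structure to a new, unseen e-mail.
    new_email = ["claim your free prize"]
    print(model.predict(vectorizer.transform(new_email)))  # -> ['spam']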
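
A regression sketch fitting a least-squares line with NumPy to invented points:

    import numpy as np

    # Invented observations roughly following y = 2x + 1 with noise.
    x = np.array([0.0, 1.0, 2.0, 3.0, 4.0])
    y = np.array([1.1, 2.9, 5.2, 6.8, 9.1])

    # Ordinary least squares: the line minimizing squared error.
    slope, intercept = np.polyfit(x, y, deg=1)
    print(f"y ≈ {slope:.2f}x + {intercept:.2f}")  # -> y ≈ 1.99x + 1.04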
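
A summarization sketch, assuming pandas and an invented sales table; describe() produces a compact statistical report for each numeric column:

    import pandas as pd

    # Invented sales records; describe() yields count, mean, spread,
    # and quartiles: a compact representation of the data set.
    sales = pd.DataFrame({"units": [12, 7, 30, 22, 15],
                          "price": [9.99, 14.50, 9.99, 12.00, 14.50]})
    print(sales.describe())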
