Introduction to Data Mining

Data mining is the nontrivial process of discovering valid, novel, potentially useful, and understandable patterns within a data set, as defined by Piatetsky-Shapiro in the journal "AI Magazine".

Put simply, data mining is the extraction of knowledge from data.

A series of processes is applied to the raw data in different phases. These processes are defined by an expert who understands the meaning of the data and has clear objectives. With them we can extract relationships among the data, find hidden patterns, and build models that describe this knowledge.

The phases of this knowledge discovery process are:

- Definition of the data mining task. What are the goals?

- Selection of data

- Preparation of data

- Application of data mining processes to the prepared data

- Evaluation and interpretation of the model obtained

- Integration of results into information systems

It is a continuous process that may consist of several iterations, where the results of one iteration feed the start of the next.
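
To make the phases and the iteration concrete, here is a minimal sketch in Python that uses only the standard library. Everything in it is an assumption made for illustration: the toy records in RAW, the function names (select, prepare, mine, evaluate, run), the accuracy goal, and the choice of a very simple one-attribute rule as the mining step. It is not the method of any particular tool.

```python
from collections import Counter, defaultdict

# Hypothetical toy records; None marks a missing value.
RAW = [
    {"outlook": "sunny", "windy": "no", "play": "yes"},
    {"outlook": "sunny", "windy": "yes", "play": "yes"},
    {"outlook": "rainy", "windy": "yes", "play": "no"},
    {"outlook": "rainy", "windy": "no", "play": "no"},
    {"outlook": "sunny", "windy": "no", "play": "yes"},
    {"outlook": "rainy", "windy": None, "play": "no"},
    {"outlook": "sunny", "windy": "yes", "play": "no"},
]


def select(rows, attributes, target):
    """Selection of data: keep only the attributes relevant to the task."""
    return [{k: r.get(k) for k in attributes + [target]} for r in rows]


def prepare(rows):
    """Preparation of data: drop records that contain missing values."""
    return [r for r in rows if all(v is not None for v in r.values())]


def mine(rows, attribute, target):
    """Mining: map each attribute value to its most frequent target value."""
    counts = defaultdict(Counter)
    for r in rows:
        counts[r[attribute]][r[target]] += 1
    return {value: c.most_common(1)[0][0] for value, c in counts.items()}


def evaluate(model, rows, attribute, target):
    """Evaluation: fraction of records the rule predicts correctly.

    For brevity this scores the rule on the same records it was built from;
    a real study would hold out separate test data.
    """
    hits = sum(model.get(r[attribute]) == r[target] for r in rows)
    return hits / len(rows)


def run(raw, candidates, target, goal):
    """Iterate over candidate attributes until the accuracy goal is met.

    The evaluation of one iteration decides whether another one starts,
    and the best result so far is carried forward.
    """
    best = (None, None, 0.0)
    for attribute in candidates:
        data = prepare(select(raw, [attribute], target))
        model = mine(data, attribute, target)
        accuracy = evaluate(model, data, attribute, target)
        if accuracy > best[2]:
            best = (attribute, model, accuracy)
        if accuracy >= goal:
            break  # goal reached; the model is ready to be integrated
    return best


if __name__ == "__main__":
    # Task definition: predict "play" with at least 80% accuracy.
    attribute, model, accuracy = run(RAW, ["windy", "outlook"], "play", goal=0.8)
    print(attribute, model, round(accuracy, 3))
```

In this sketch the task definition is the target attribute together with the accuracy goal, and integration would amount to storing the returned rule where another system can look it up.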

Of course, there are several specialized tools that support or enable each stage of this process. Two of the best known are SAS Enterprise Miner and SPSS Clementine. There are also free software projects such as WEKA, developed at the University of Waikato, that support data mining processes.