Data mining is the process of using algorithms to search large amounts of data for hidden information. It is usually associated with computer science and draws on many methods, including statistics, online analytical processing (OLAP), information retrieval, machine learning, expert systems (which rely on rules of thumb accumulated from past experience), and pattern recognition, to find the information hidden in large data sets.
Data mining is a hot topic in artificial intelligence and database research. Specifically, it refers to extracting hidden, previously unknown, and potentially valuable information from large amounts of data stored in databases.
Data mining is a decision support process. Drawing mainly on artificial intelligence, machine learning, pattern recognition, statistics, databases, and visualization technology, it analyzes enterprise data in a highly automated way, makes inductive inferences, and uncovers latent patterns that help decision makers adjust market strategies, reduce risk, and make sound decisions.
The knowledge discovery process consists of three stages: ① data preparation; ② data mining; ③ presentation and interpretation of results. Data mining can interact with users or with knowledge bases.
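The three stages can be illustrated with a small pipeline. This is a minimal sketch, not a method from the text: the function names and sample baskets are hypothetical, and the "mining" stage uses simple co-occurrence counting of item pairs (a simplified form of market-basket analysis) purely as an example technique.

```python
from collections import Counter
from itertools import combinations


def prepare(raw_transactions):
    """Stage 1: data preparation - normalize item names and drop empty baskets."""
    cleaned = []
    for basket in raw_transactions:
        items = {item.strip().lower() for item in basket if item.strip()}
        if items:
            cleaned.append(items)
    return cleaned


def mine(transactions, min_support=2):
    """Stage 2: data mining - keep item pairs that co-occur in at least min_support baskets."""
    pair_counts = Counter()
    for items in transactions:
        for pair in combinations(sorted(items), 2):
            pair_counts[pair] += 1
    return {pair: n for pair, n in pair_counts.items() if n >= min_support}


def interpret(frequent_pairs, n_transactions):
    """Stage 3: presentation - turn raw counts into readable statements, most frequent first."""
    lines = []
    for (a, b), n in sorted(frequent_pairs.items(), key=lambda kv: (-kv[1], kv[0])):
        lines.append(f"{a} + {b}: bought together in {n}/{n_transactions} baskets")
    return lines


raw = [["Bread", "Milk"], ["bread", "milk", "eggs"], ["Milk", "eggs"], [" "], ["bread", "milk"]]
baskets = prepare(raw)
for line in interpret(mine(baskets), len(baskets)):
    print(line)
# bread + milk: bought together in 3/4 baskets
# eggs + milk: bought together in 2/4 baskets
```

Keeping each stage in its own function mirrors the separation in the text: the preparation rules can change without touching the mining logic, and the interpretation step can be swapped for a report or chart.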
Data Mining Objects
The data can be structured, semi-structured, or even heterogeneous. Knowledge can be discovered by mathematical, non-mathematical, or inductive methods. The discovered knowledge can be used for information management, query optimization, decision support, and maintenance of the data itself. [4]
The object of data mining can be any type of data source: a relational database, which contains structured data, or a data warehouse, text, multimedia data, spatial data, time series data, or Web data, which can contain semi-structured or even heterogeneous data. [4]
Data Mining Steps
Before starting data mining, decide which steps to take, what to do at each step, and what goal each step must achieve. Only with a good plan can data mining proceed in an orderly way and succeed. Many software vendors and data mining consultancies provide process models that guide users step by step through data mining work, for example SPSS's 5A and SAS's SEMMA.
The main steps of a data mining process model are: defining the problem, building a data mining database, analyzing the data, preparing the data, building the model, evaluating the model, and implementation. Each step is described below:
(1) Define the problem. The first and most important requirement before starting knowledge discovery is to understand the data and the business problem. You must define your goal clearly, that is, decide exactly what you want to achieve. For example, if you want to improve the utilization of your email service, you might aim to "increase the usage rate among users" or to "increase the value of each use." The models built to solve these two problems are almost completely different, so you must decide which goal to pursue.
(2) Build a data mining database. This includes data collection, data description, selection, data quality assessment and data cleaning, merging and integration, building metadata, loading the data mining database, and maintaining the data mining database.
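Several of these steps (cleaning, merging, building metadata, loading) can be sketched with Python's built-in sqlite3 module. This is a minimal sketch under assumed inputs: the two "source systems", the table names, and the cleaning rules (missing region, negative amount) are hypothetical, and a real data mining database would usually live in a persistent store rather than in memory.

```python
import sqlite3

# Hypothetical source data from two systems: (id, name, region) and (id, amount).
crm_rows = [(1, "Alice", "north"), (2, "Bob", None), (3, "Carol", "south")]
billing_rows = [(1, 120.0), (2, 80.5), (3, -1.0)]  # -1.0 stands for a known bad value


def clean(crm, billing):
    """Quality assessment and cleaning: drop rows with a missing region or a negative amount."""
    valid_crm = {cid: (name, region) for cid, name, region in crm if region is not None}
    valid_billing = {cid: amount for cid, amount in billing if amount >= 0}
    return valid_crm, valid_billing


def build_mining_db(crm, billing):
    """Merge the cleaned sources on customer id and load them into an in-memory database."""
    valid_crm, valid_billing = clean(crm, billing)
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE customers (id INTEGER PRIMARY KEY, name TEXT, region TEXT, spend REAL)")
    # Simple metadata table recording which source each column came from.
    conn.execute("CREATE TABLE metadata (column_name TEXT, source TEXT)")
    conn.executemany(
        "INSERT INTO metadata VALUES (?, ?)",
        [("name", "crm"), ("region", "crm"), ("spend", "billing")],
    )
    # Integration: keep only customers that survived cleaning in both sources.
    merged = [
        (cid, name, region, valid_billing[cid])
        for cid, (name, region) in valid_crm.items()
        if cid in valid_billing
    ]
    conn.executemany("INSERT INTO customers VALUES (?, ?, ?, ?)", merged)
    conn.commit()
    return conn


conn = build_mining_db(crm_rows, billing_rows)
print(conn.execute("SELECT id, name, region, spend FROM customers ORDER BY id").fetchall())
# [(1, 'Alice', 'north', 120.0)]
```

Only Alice survives: Bob is dropped for a missing region, and Carol for an invalid billing amount.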
(3) Analyze the data. The purpose of analysis is to find the data fields that have the greatest influence on the predicted output and to decide whether derived fields need to be defined. If the data set contains hundreds or thousands of fields, browsing and analyzing the data is very time-consuming and tedious; in that case, choose a tool with a good interface and powerful features to help you.
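One simple way to shortlist influential fields is to rank them by the strength of their linear correlation with the target. This is a minimal sketch, assuming numeric fields and a numeric target; Pearson correlation is just one possible screening measure (it misses non-linear effects), and the field names and values are made up.

```python
import math


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length numeric lists."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        return 0.0  # a constant field carries no linear signal
    return cov / math.sqrt(var_x * var_y)


def rank_fields(rows, target):
    """Rank every non-target field by the absolute value of its correlation with the target."""
    fields = [f for f in rows[0] if f != target]
    ys = [row[target] for row in rows]
    scores = {f: abs(pearson([row[f] for row in rows], ys)) for f in fields}
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)


# Hypothetical customer data: predict "spend" from the other fields.
rows = [
    {"visits": 1, "age": 40, "constant": 5, "spend": 10},
    {"visits": 2, "age": 20, "constant": 5, "spend": 20},
    {"visits": 3, "age": 30, "constant": 5, "spend": 30},
    {"visits": 4, "age": 35, "constant": 5, "spend": 40},
]
for field, score in rank_fields(rows, "spend"):
    print(f"{field}: {score:.3f}")
```

Here "visits" tracks spend exactly, "age" barely relates to it, and "constant" carries no information, so the ranking suggests which fields deserve closer attention.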
(4) Prepare the data. This is the final data preparation step before building the model. It can be divided into four parts: selecting variables, selecting records, creating new variables, and transforming variables.
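The four parts of this step can be sketched on a list of records. This is a minimal illustration under assumptions: the fields, the rule "keep only customers with at least one visit", and the specific transformations (log(1 + x) for income, a 0/1 indicator for region) are hypothetical choices, not prescribed by the text.

```python
import math

# Hypothetical raw customer records; field names are illustrative only.
raw_rows = [
    {"id": 1, "income": 52000, "visits": 4, "spend": 200.0, "region": "north"},
    {"id": 2, "income": 0, "visits": 0, "spend": 0.0, "region": "south"},
    {"id": 3, "income": 87000, "visits": 10, "spend": 650.0, "region": "south"},
]


def prepare_data(rows):
    # 1. Select variables: keep only the fields the model will use (drop "id").
    keep = ("income", "visits", "spend", "region")
    selected = [{k: row[k] for k in keep} for row in rows]
    # 2. Select records: keep only customers with at least one visit.
    active = [row for row in selected if row["visits"] > 0]
    prepared = []
    for row in active:
        out = {"visits": row["visits"], "spend": row["spend"]}
        # 3. Create a new variable: average spend per visit.
        out["spend_per_visit"] = row["spend"] / row["visits"]
        # 4. Transform variables: compress income with log(1 + x), encode region as 0/1.
        out["log_income"] = math.log1p(row["income"])
        out["region_south"] = 1 if row["region"] == "south" else 0
        prepared.append(out)
    return prepared


for row in prepare_data(raw_rows):
    print(row)
```

Selecting records before creating "spend_per_visit" also guarantees that the division never sees zero visits.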
(5) Build the model. Model building is an iterative process: different models must be examined carefully to determine which is most useful for the business problem at hand. Training and testing a data mining model requires splitting the data into at least two parts: one portion is used to build (train) the model, and the remaining data is used to test and validate it. Sometimes a third data set, called the validation set, is also held back: because the test set can become influenced by the characteristics of the model, an independent data set is needed to verify the model's accuracy.
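The data split described above can be sketched as a shuffled three-way partition. This is a minimal sketch: the 60/20/20 proportions and the seed are assumptions chosen for illustration, and the function name is hypothetical.

```python
import random


def split_dataset(records, train_pct=60, test_pct=20, seed=42):
    """Shuffle records, then split them into training, test, and validation sets.

    Percentages are integers; the validation set receives whatever remains.
    """
    shuffled = list(records)
    random.Random(seed).shuffle(shuffled)  # fixed seed makes the split reproducible
    n = len(shuffled)
    n_train = n * train_pct // 100
    n_test = n * test_pct // 100
    train = shuffled[:n_train]
    test = shuffled[n_train:n_train + n_test]
    validation = shuffled[n_train + n_test:]
    return train, test, validation


train_set, test_set, validation_set = split_dataset(range(100))
print(len(train_set), len(test_set), len(validation_set))  # 60 20 20
```

Shuffling before splitting avoids carrying any ordering in the source data (for example, records sorted by date or region) into one of the sets. The naming follows the text above; some references call the tuning set "validation" and the final held-out set "test" instead.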
(6) Evaluate the model. After the model is built, its results must be evaluated and its value explained. Accuracy measured on the test set describes performance only on data similar to that used to build the model. In practice, you also need to understand the types of errors the model makes and the costs associated with them. Experience shows that a valid model is not necessarily a correct one, mainly because of the various assumptions implicit in model building, so it is important to test the model directly in the real world: apply it to a small area first, collect test data, and roll it out to a larger area only once the results are satisfactory.
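The point about error types and their costs can be made concrete with a binary classifier. This is a minimal sketch with made-up predictions: the two models and the costs (1 for a false alarm, 10 for a missed positive) are hypothetical, chosen to show that two models with the same accuracy can have very different costs.

```python
def confusion_counts(actual, predicted):
    """Count true/false positives and negatives for a binary classifier (1 = positive)."""
    counts = {"tp": 0, "fp": 0, "tn": 0, "fn": 0}
    for a, p in zip(actual, predicted):
        if p == 1 and a == 1:
            counts["tp"] += 1
        elif p == 1 and a == 0:
            counts["fp"] += 1
        elif p == 0 and a == 0:
            counts["tn"] += 1
        else:
            counts["fn"] += 1
    return counts


def accuracy(counts):
    """Share of all predictions that were correct."""
    return (counts["tp"] + counts["tn"]) / sum(counts.values())


def error_cost(counts, cost_fp, cost_fn):
    """Total cost of the errors, weighting each error type by its own cost."""
    return counts["fp"] * cost_fp + counts["fn"] * cost_fn


actual = [1, 0, 1, 1, 0, 0, 1, 0]
model_a = [1, 0, 0, 0, 0, 0, 1, 0]  # misses two positives
model_b = [1, 1, 1, 1, 1, 0, 1, 0]  # raises two false alarms

for name, preds in [("A", model_a), ("B", model_b)]:
    c = confusion_counts(actual, preds)
    print(name, accuracy(c), error_cost(c, cost_fp=1, cost_fn=10))
# A 0.75 20
# B 0.75 2
```

Both models are 75% accurate, but under these assumed costs model B is ten times cheaper, which test-set accuracy alone would not reveal.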
(7) Implementation. Once a model has been built and validated, it can be used in two main ways: to give analysts a reference, or to apply the model to different data sets.
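Applying a finished model to new data usually means loading its stored parameters and scoring each incoming record. This is a minimal sketch under assumptions: the model is a hypothetical linear score with a decision threshold, saved as JSON, and the feature names and weights are invented for illustration.

```python
import json

# Hypothetical model produced earlier: a linear score with a decision threshold.
model_json = json.dumps(
    {"weights": {"visits": 0.5, "spend_per_visit": 0.02}, "bias": -3.0, "threshold": 0.0}
)


def load_model(text):
    """Restore the stored model parameters from JSON."""
    return json.loads(text)


def score(model, record):
    """Linear score: bias plus the weighted sum of the record's features."""
    return model["bias"] + sum(w * record[f] for f, w in model["weights"].items())


def apply_model(model, records):
    """Label each record in a new data set as 1 (score above threshold) or 0."""
    return [1 if score(model, r) > model["threshold"] else 0 for r in records]


new_batch = [{"visits": 8, "spend_per_visit": 40.0}, {"visits": 2, "spend_per_visit": 10.0}]
model = load_model(model_json)
print(apply_model(model, new_batch))  # [1, 0]
```

The raw scores from score() can serve the first use (a reference for analysts), while apply_model() covers the second (labeling a different data set).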