Introduction to Data Mining
by Yen-Hsien Lee
Department of Information Management
College of Management
National Sun Yat-Sen University
March 4, 2003
Outline
What is Data Mining?
Data Mining Process
Properties of Data Mining Applications
Data Mining Techniques
What is Data Mining?
Data mining is the process of extracting previously unknown, valid, and actionable patterns, knowledge, or high-level information from large databases.
Data Mining Process
Selection: Data → Target Data
Preprocessing: Target Data → Preprocessed Data
Transformation: Preprocessed Data → Transformed Data
Mining: Transformed Data → Patterns
Interpretation/Evaluation: Patterns → Knowledge
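To make the five steps concrete, here is a minimal Python sketch that runs each stage on a tiny, made-up customer table. The records, the function names (select, preprocess, transform, mine, interpret), the choice of dropping incomplete records as the cleaning policy, and the simple frequency-count "mining" step are all illustrative assumptions, not part of the original process description.

```python
from collections import Counter

# Hypothetical raw records: (customer_id, age, region, product); None marks a missing value.
RAW = [
    (1, 23, "north", "laptop"),
    (2, 35, "south", "phone"),
    (3, None, "north", "laptop"),
    (4, 27, "north", "laptop"),
    (5, 41, "south", "phone"),
    (6, 29, "east", "tablet"),
]

def select(records, regions):
    """Selection: keep only the target data relevant to the business question."""
    return [r for r in records if r[2] in regions]

def preprocess(records):
    """Preprocessing: drop records with missing values (one simple cleaning policy)."""
    return [r for r in records if None not in r]

def transform(records):
    """Transformation: keep (age band, product) and discretize age into decades."""
    return [(f"{(age // 10) * 10}s", product) for _, age, _, product in records]

def mine(rows, min_count):
    """Mining: find (age band, product) combinations occurring at least min_count times."""
    counts = Counter(rows)
    return {pattern: n for pattern, n in counts.items() if n >= min_count}

def interpret(patterns, total):
    """Interpretation/evaluation: turn each pattern and its support into a readable statement."""
    return [f"{band} customers buy {product} (support {n / total:.2f})"
            for (band, product), n in sorted(patterns.items())]

target = select(RAW, {"north", "south"})
clean = preprocess(target)
rows = transform(clean)
knowledge = interpret(mine(rows, min_count=2), total=len(rows))
print(knowledge)
```

Each stage only consumes the output of the previous one, which mirrors the Data → Target Data → … → Knowledge chain; in practice the loop is iterative, and evaluation often sends the analyst back to an earlier step.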
Properties of Data Mining Applications
Business-question-driven process
Multiple data mining techniques potentially appropriate for a single data mining task
Hybrid approaches for better data mining results
Importance of data prospecting (selection) and cleaning (preprocessing)
Unavoidable post-processing of the discovered knowledge
etc.
Data Mining Techniques
Classification
– Process that establishes, from a set of instances in a database (called training examples), a description of each class in terms of its attributes, so that new instances can be assigned to a class.
Clustering Analysis
– Process of creating a partition of the data so that all members of each cluster are similar according to some metric (e.g., distance between objects).
Association Rule Analysis
– Discovery of association rules showing attribute-value conditions that occur frequently together in a given set of data.
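As one concrete instance of classification, the sketch below learns a single-attribute rule set (the simple technique often called OneR) from labeled training examples: for each value of one attribute it predicts the majority class, and it keeps the attribute whose rules misclassify the fewest training examples. OneR is swapped in here only because it is short; the training data and the helper names one_r and classify are hypothetical.

```python
from collections import Counter, defaultdict

# Hypothetical training examples: (attribute values, class label).
TRAINING = [
    ({"outlook": "sunny", "windy": "no"}, "play"),
    ({"outlook": "sunny", "windy": "yes"}, "stay"),
    ({"outlook": "sunny", "windy": "no"}, "play"),
    ({"outlook": "rainy", "windy": "yes"}, "stay"),
    ({"outlook": "rainy", "windy": "no"}, "stay"),
    ({"outlook": "overcast", "windy": "no"}, "play"),
    ({"outlook": "overcast", "windy": "yes"}, "play"),
]

def one_r(examples):
    """Pick the attribute whose value -> majority-class rules make the fewest training errors."""
    best = None
    for attr in examples[0][0]:
        # Count class labels separately for each value of this attribute.
        by_value = defaultdict(Counter)
        for features, label in examples:
            by_value[features[attr]][label] += 1
        # Each attribute value predicts its most frequent class.
        rules = {value: counts.most_common(1)[0][0] for value, counts in by_value.items()}
        # Errors = training examples whose label differs from the rule's prediction.
        errors = sum(sum(counts.values()) - counts[rules[value]]
                     for value, counts in by_value.items())
        if best is None or errors < best[2]:
            best = (attr, rules, errors)
    return best

def classify(model, features, default):
    """Apply the learned rule to a new instance; unseen attribute values get the default class."""
    attr, rules, _ = model
    return rules.get(features[attr], default)

model = one_r(TRAINING)
print(model)
print(classify(model, {"outlook": "rainy", "windy": "no"}, default="play"))
```

Real classifiers (decision trees, for example) combine many attributes, but the shape is the same: build class descriptions from training examples, then apply them to unseen instances.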
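Clustering can be illustrated with k-means, one widely used partitioning method that uses Euclidean distance as the similarity metric. This is a minimal sketch assuming 2-D points, a fixed k, and randomly chosen initial centroids; the points and the function name kmeans are made up for illustration.

```python
import math
import random

# Hypothetical 2-D objects forming two visually separate groups.
POINTS = [(1, 1), (1.5, 2), (1, 2), (8, 8), (9, 8.5), (8.5, 9)]

def kmeans(points, k, iterations=100, seed=0):
    """Partition points into k clusters, each point assigned to its nearest centroid."""
    rng = random.Random(seed)
    centroids = rng.sample(points, k)  # initial centroids: k distinct input points
    for _ in range(iterations):
        # Assignment step: put every point in the cluster of its nearest centroid.
        clusters = [[] for _ in range(k)]
        for p in points:
            nearest = min(range(k), key=lambda i: math.dist(p, centroids[i]))
            clusters[nearest].append(p)
        # Update step: move each centroid to the mean of its members (keep it if empty).
        new_centroids = [
            tuple(sum(coord) / len(cluster) for coord in zip(*cluster)) if cluster else centroids[i]
            for i, cluster in enumerate(clusters)
        ]
        if new_centroids == centroids:  # converged: assignments will no longer change
            break
        centroids = new_centroids
    return centroids, clusters

centroids, clusters = kmeans(POINTS, k=2)
print(centroids)
print(clusters)
```

The choice of metric drives the result: swapping math.dist for another distance changes which objects count as "similar", which is exactly the "according to some metric" clause in the definition.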
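Association rule analysis is commonly split into finding frequent itemsets and then generating rules from them. The sketch below assumes the usual support and confidence measures and an Apriori-style level-wise search; the basket data, thresholds, and function names are illustrative only.

```python
from itertools import combinations

# Hypothetical market-basket transactions.
TRANSACTIONS = [
    {"bread", "milk"},
    {"bread", "diapers", "beer", "eggs"},
    {"milk", "diapers", "beer", "cola"},
    {"bread", "milk", "diapers", "beer"},
    {"bread", "milk", "diapers", "cola"},
]

def frequent_itemsets(transactions, min_support):
    """Apriori-style level-wise search for itemsets with support >= min_support."""
    baskets = [frozenset(t) for t in transactions]

    def support(itemset):
        # Fraction of transactions that contain every item of the itemset.
        return sum(itemset <= b for b in baskets) / len(baskets)

    frequent = {}
    candidates = {frozenset([item]) for b in baskets for item in b}
    k = 1
    while candidates:
        level = {c: support(c) for c in candidates}
        level = {c: s for c, s in level.items() if s >= min_support}
        frequent.update(level)
        k += 1
        # Join: merge frequent (k-1)-itemsets into k-item candidates.
        candidates = {a | b for a in level for b in level if len(a | b) == k}
        # Prune: drop candidates that have an infrequent (k-1)-subset.
        candidates = {c for c in candidates
                      if all(frozenset(s) in level for s in combinations(c, k - 1))}
    return frequent

def association_rules(frequent, min_confidence):
    """Emit rules X -> Y with confidence support(X u Y) / support(X) >= min_confidence."""
    found = []
    for itemset, supp in frequent.items():
        for r in range(1, len(itemset)):
            for lhs in map(frozenset, combinations(itemset, r)):
                confidence = supp / frequent[lhs]  # every subset of a frequent set is frequent
                if confidence >= min_confidence:
                    found.append((set(lhs), set(itemset - lhs), supp, confidence))
    return found

freq = frequent_itemsets(TRANSACTIONS, min_support=0.6)
found = association_rules(freq, min_confidence=0.8)
print(found)
```

With these thresholds, {beer} → {diapers} is kept (confidence 1.0), while the reverse rule {diapers} → {beer} (confidence 0.75) is dropped, which shows that confidence is not symmetric.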
Data Mining Techniques (Cont’d)
Sequential Pattern Analysis
– Discovery of the sequential occurrence of items across ordered transactions over time.
Time-series Similarity Analysis
– Finding the sequences that are similar to a query sequence Q (called whole matching), or identifying the sequences that contain subsequences similar to Q (called subsequence matching).
Link Analysis
Text Mining
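For sequential pattern analysis, the basic building block is deciding whether a customer's time-ordered transactions contain a pattern, where each element of the pattern must be a subset of a strictly later transaction than the previous element. The sketch below counts the support of a pattern that way; the customer histories and function names are hypothetical, and a full miner (such as AprioriAll or GSP) would also generate candidate patterns, which is omitted here.

```python
# Hypothetical customer histories: each is a time-ordered list of transactions (itemsets).
CUSTOMERS = [
    [{"tv"}, {"dvd player"}, {"dvd", "speakers"}],
    [{"tv", "dvd player"}, {"dvd"}],
    [{"dvd player"}, {"tv"}],
    [{"radio"}],
]

def contains(sequence, pattern):
    """True if each pattern element is a subset of a later transaction than the previous match."""
    pos = 0
    for element in pattern:
        # Greedily take the earliest transaction that contains this element.
        while pos < len(sequence) and not set(element) <= set(sequence[pos]):
            pos += 1
        if pos == len(sequence):
            return False
        pos += 1  # the next element must come from a strictly later transaction
    return True

def sequence_support(customers, pattern):
    """Fraction of customers whose history contains the sequential pattern."""
    return sum(contains(seq, pattern) for seq in customers) / len(customers)

print(sequence_support(CUSTOMERS, [{"tv"}, {"dvd"}]))
print(sequence_support(CUSTOMERS, [{"tv"}, {"dvd player"}]))
```

Items bought in the same transaction do not form a sequence: the second customer bought the TV and the DVD player together, so that history does not support the pattern "TV, then DVD player".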
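Both matching modes can be sketched with a brute-force scan that uses Euclidean distance and a tolerance eps, which is one common similarity measure but not the only one. Whole matching compares Q against entire sequences of the same length; subsequence matching slides a window of length |Q| along each sequence. The database, eps, and function names are assumptions, and practical systems typically add dimensionality reduction and indexing to avoid scanning every window.

```python
import math

# Hypothetical sequence database keyed by sequence id.
DATABASE = {
    "a": [1, 2, 3.5],
    "b": [5, 5, 5],
    "c": [0, 1, 2, 3, 9],
}

def whole_matching(database, query, eps):
    """Ids of sequences with the same length as query and Euclidean distance <= eps."""
    return [sid for sid, seq in database.items()
            if len(seq) == len(query) and math.dist(seq, query) <= eps]

def subsequence_matching(database, query, eps):
    """(id, offset) of every length-len(query) window within Euclidean distance eps of query."""
    m = len(query)
    hits = []
    for sid, seq in database.items():
        for start in range(len(seq) - m + 1):
            if math.dist(seq[start:start + m], query) <= eps:
                hits.append((sid, start))
    return hits

Q = [1, 2, 3]
print(whole_matching(DATABASE, Q, eps=1.0))
print(subsequence_matching(DATABASE, Q, eps=1.0))
```

Here sequence "a" is a whole match, while "c" is too long to be compared as a whole but contains a matching window starting at offset 1.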