Introduction to Data Mining Group Members: Karim C. El-Khazen Pascal Suria Lin Gui Philsou Lee Xiaoting Niu.


1 Introduction to Data Mining
Group Members: Karim C. El-Khazen, Pascal Suria, Lin Gui, Philsou Lee, Xiaoting Niu

2
• Definition
• General Concept
• Foundations
• Evolution
• Applications
• Challenges
• Algorithms
    • Classical
    • Next Generations

3 What is Data Mining?
• Data mining is the process of non-trivial extraction of implicit, previously unknown, and potentially useful information from data stored in repositories, using pattern recognition technologies as well as statistical and mathematical methods.

4 Foundations
• Massive data collection
• Powerful multiprocessor computers
• Data mining algorithms

5 Evolution

6 Applications
• Industry
• Retail
• Health maintenance groups
• Telecommunications
• Credit cards
• Web mining
• Sports and entertainment solutions

7 Challenges
• Ability to handle different types of data
• Graceful degradation of data mining algorithms
• Valuable data mining results
• Representation of data mining requests and results
• Mining at different abstraction levels
• Mining information from different sources of data
• Protection of privacy and data security

8 Hierarchy of Choices and Decisions
• Business goal
• Collecting, cleaning and preparing data
• Prediction
• Model type and algorithms

9 Data Description
• Descriptions of data characteristics in elementary and aggregated form
• Summarization
• Visualization
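As a rough illustration of describing data in elementary and aggregated form, the sketch below summarizes a small list of records overall and per group. The records, the segment names, and the helper names `summarize` and `summarize_by_group` are invented for this example and do not come from the presentation.

```python
from collections import defaultdict
from statistics import mean, median

# Hypothetical purchase records: (customer segment, amount spent).
records = [
    ("retail", 20.0), ("retail", 35.0), ("telecom", 50.0),
    ("telecom", 70.0), ("retail", 5.0),
]

def summarize(values):
    """Return elementary summary statistics for a list of numbers."""
    return {
        "count": len(values),
        "min": min(values),
        "max": max(values),
        "mean": mean(values),
        "median": median(values),
    }

def summarize_by_group(rows):
    """Aggregate amounts per group, then summarize each group."""
    groups = defaultdict(list)
    for group, amount in rows:
        groups[group].append(amount)
    return {group: summarize(amounts) for group, amounts in groups.items()}

overall = summarize([amount for _, amount in records])  # elementary form
per_group = summarize_by_group(records)                 # aggregated form
```

The per-group dictionary is the kind of table a visualization step would typically plot next.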

10 Predictive Data Mining
• Predictive modeling is a term used to describe the process of mathematically or mentally representing a phenomenon or occurrence with a series of equations or relationships.

11 Prediction: Classification
• Classification predicts class membership
• Pre-classify (using classification algorithms)
• Test to determine the quality of the model
• Predict (using an effective classifier)
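The three steps above (build a model from pre-classified cases, test its quality, then predict) can be sketched as follows. The nearest-centroid classifier is only one simple choice made for this illustration, and the income figures and labels are hypothetical.

```python
from statistics import mean

# Hypothetical pre-classified cases: (income in thousands, class label).
train = [(20, "reject"), (25, "reject"), (30, "reject"),
         (60, "approve"), (70, "approve"), (80, "approve")]
# Hypothetical held-out cases used only to test the model.
test = [(22, "reject"), (75, "approve"), (65, "approve"), (28, "reject")]

def fit(cases):
    """Learn one centroid (mean feature value) per class."""
    by_class = {}
    for value, label in cases:
        by_class.setdefault(label, []).append(value)
    return {label: mean(values) for label, values in by_class.items()}

def predict(model, value):
    """Assign the class whose centroid is closest to the value."""
    return min(model, key=lambda label: abs(model[label] - value))

def accuracy(model, cases):
    """Fraction of held-out cases the model labels correctly."""
    hits = sum(predict(model, value) == label for value, label in cases)
    return hits / len(cases)

model = fit(train)               # step 1: build the model from pre-classified data
quality = accuracy(model, test)  # step 2: test the quality of the model
decision = predict(model, 55)    # step 3: predict the class of a new case
```

Keeping the test cases separate from the training cases is what makes the quality estimate meaningful.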

12 Prediction: Regression
• Regression takes a numerical dataset and develops a mathematical formula that fits the data.
• When you're ready to use the results to predict future behavior, you simply take your new data, plug it into the developed formula and you get a prediction!
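A minimal sketch of this idea, assuming the simplest case of a straight line fitted by ordinary least squares to one input variable. The data points and the function names `fit_line` and `predict` are made up for illustration.

```python
def fit_line(xs, ys):
    """Fit y = slope * x + intercept by ordinary least squares."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sum of squared deviations of x, and of cross deviations of x and y.
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept

def predict(slope, intercept, x):
    """Plug new data into the fitted formula."""
    return slope * x + intercept

# Hypothetical observations that happen to lie on y = 2x + 1.
xs = [1.0, 2.0, 3.0, 4.0]
ys = [3.0, 5.0, 7.0, 9.0]

slope, intercept = fit_line(xs, ys)
forecast = predict(slope, intercept, 5.0)
```

Real data rarely lies exactly on a line, so in practice the fitted formula gives an estimate rather than an exact value.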

13 Algorithms
• Classical Techniques
    • Statistics
    • Neighborhoods
    • Clustering
• Next Generations
    • Decision Tree
    • Neural Network
    • Rule Induction

14 Statistics
• Classical Statistics:
    • Concerned with the collection and description of data
    • Assumption: the data follow an underlying distribution pattern
    • Objective: find the best guess
• Data Mining:
    • Employs statistical methods
    • Needs to analyze huge amounts of data
    • Goes beyond traditional statistics

15 Neighborhoods
• Basic idea:
    • For a new problem, look for similar problems (neighbors) that have already been solved
• Key point: find the neighborhood
    • Calculate the distance: how close must a case be to count as a neighbor?
    • Which class does the new problem belong to?
• Large computational load:
    • New calculation for each new case
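The neighborhood idea is commonly implemented as k-nearest-neighbor classification. The sketch below assumes Euclidean distance and a majority vote among the k closest solved cases; the cases, the labels, and the choice k = 3 are hypothetical.

```python
from collections import Counter
from math import dist

# Hypothetical solved cases: (feature vector, known class).
solved = [
    ((1.0, 1.0), "low risk"), ((1.5, 2.0), "low risk"), ((2.0, 1.0), "low risk"),
    ((6.0, 6.0), "high risk"), ((7.0, 5.5), "high risk"), ((6.5, 7.0), "high risk"),
]

def k_nearest(cases, query, k):
    """Return the k solved cases closest to the query (Euclidean distance)."""
    return sorted(cases, key=lambda case: dist(case[0], query))[:k]

def classify(cases, query, k=3):
    """Assign the class that most of the k nearest neighbors belong to."""
    votes = Counter(label for _, label in k_nearest(cases, query, k))
    return votes.most_common(1)[0][0]

near_first_group = classify(solved, (2.0, 2.0))
near_second_group = classify(solved, (6.0, 5.0))
```

Every call to `classify` measures the distance to all solved cases again, which is the computational load mentioned on the slide.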

16 Clustering
• Elements are grouped together according to different characteristics
• Members of each cluster share similar values (homogeneous)
• Problem: controlling the number of clusters
    • Hierarchical clustering: flexibility
    • Non-hierarchical clustering: given by user
• Used most frequently for:
    • Consolidating data into a high-level view
    • Grouping records by likely behavior
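For the non-hierarchical case, where the user supplies the number of clusters, k-means is a widely used technique. The sketch below is a simplified one-dimensional version; the starting centers, the fixed iteration count, and the data points are assumptions made for illustration.

```python
def k_means(points, k, iterations=10):
    """Cluster 1-D points into k groups; k is chosen by the user."""
    ordered = sorted(points)
    # Assumed initialization: spread the starting centers across the sorted data.
    centers = [ordered[i * (len(ordered) - 1) // max(k - 1, 1)] for i in range(k)]
    clusters = [[] for _ in range(k)]
    for _ in range(iterations):
        # Assignment step: each point joins the cluster with the nearest center.
        clusters = [[] for _ in range(k)]
        for p in points:
            nearest = min(range(k), key=lambda i: abs(p - centers[i]))
            clusters[nearest].append(p)
        # Update step: move each center to the mean of its cluster.
        # A cluster that received no points keeps its previous center.
        centers = [sum(c) / len(c) if c else centers[i]
                   for i, c in enumerate(clusters)]
    return centers, clusters

# Hypothetical measurements with two visibly separate groups.
points = [1.0, 2.0, 3.0, 10.0, 11.0, 12.0]
centers, clusters = k_means(points, k=2)
```

Choosing a different k changes the grouping, which is why controlling the number of clusters is listed as the main problem.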

17 Decision Tree
• A way of representing a series of rules that lead to a class or value
• Structure:
    • Decision nodes, branches, leaves
• Example: A loan officer wants to determine the creditworthiness of applicants
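To make the structure concrete, the sketch below encodes a small tree for the loan example as nested dictionaries and reads it both as a classifier and as a list of rules. The attributes (`income`, `years_employed`), the thresholds, and the class names are invented here; they are not the tree from the original slides.

```python
# Hypothetical tree: each dict is a decision node, "below" and "at_or_above"
# are its branches, and plain strings are leaves holding a class.
tree = {
    "attribute": "income", "threshold": 40000,
    "below": "bad credit",
    "at_or_above": {
        "attribute": "years_employed", "threshold": 2,
        "below": "bad credit",
        "at_or_above": "good credit",
    },
}

def classify(node, applicant):
    """Follow branches from the root until a leaf (a class label) is reached."""
    while isinstance(node, dict):
        value = applicant[node["attribute"]]
        branch = "at_or_above" if value >= node["threshold"] else "below"
        node = node[branch]
    return node

def rules(node, conditions=()):
    """List every root-to-leaf path as an (IF conditions, THEN class) pair."""
    if not isinstance(node, dict):
        return [(list(conditions), node)]
    attr, cut = node["attribute"], node["threshold"]
    return (rules(node["below"], conditions + (f"{attr} < {cut}",))
            + rules(node["at_or_above"], conditions + (f"{attr} >= {cut}",)))

verdict = classify(tree, {"income": 52000, "years_employed": 5})
all_rules = rules(tree)
```

Each entry in `all_rules` corresponds to one path through the tree, which is the sense in which a tree represents a series of rules.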

18 Decision Tree (continued)
• Help to induce the tree and its rules to make predictions
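Inducing a tree means choosing its splits from data. As a deliberately reduced illustration, the sketch below induces a single split (a one-level tree) by trying every observed value as a threshold and keeping the one with the fewest misclassified cases. Practical tree learners usually score splits with measures such as entropy or the Gini index instead; the error count, the data, and the labels here are assumptions.

```python
def best_split(cases):
    """Return (threshold, errors) for the split with the fewest misclassifications.

    cases: list of (value, label) pairs with labels "good" or "bad".
    Rule tested at each threshold: value >= threshold -> "good", else "bad".
    """
    best = None
    for threshold in sorted({value for value, _ in cases}):
        errors = sum((value >= threshold) != (label == "good")
                     for value, label in cases)
        # Keep the first threshold that achieves the lowest error count.
        if best is None or errors < best[1]:
            best = (threshold, errors)
    return best

# Hypothetical applicants: (income in thousands, observed credit outcome).
applicants = [(20, "bad"), (30, "bad"), (45, "good"),
              (50, "bad"), (60, "good"), (80, "good")]

threshold, errors = best_split(applicants)
```

Repeating the same search inside each resulting branch is how a full tree would be grown from this building block.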

19 Neural Networks
• Efficiently model large and complex problems with hundreds of predictor variables
• Structure:
    • Input layer, hidden layer, output layer
    • Activation function between nodes
• Requires training and testing of relations
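The layered structure can be sketched as a forward pass through a tiny network with two inputs, two hidden nodes, and one output. The weights below are set by hand (an assumption for illustration) so that the network computes exclusive-or; in practice they would be learned during training.

```python
from math import exp

def sigmoid(z):
    """Activation function applied at every hidden and output node."""
    return 1.0 / (1.0 + exp(-z))

def layer(inputs, weights, biases):
    """One fully connected layer: weighted sum per node, then the activation."""
    return [sigmoid(sum(w * x for w, x in zip(node_weights, inputs)) + b)
            for node_weights, b in zip(weights, biases)]

# Hand-picked, hypothetical weights: hidden node 1 acts like OR,
# hidden node 2 like NAND, and the output node like AND of the two.
HIDDEN_W = [[20.0, 20.0], [-20.0, -20.0]]
HIDDEN_B = [-10.0, 30.0]
OUTPUT_W = [[20.0, 20.0]]
OUTPUT_B = [-30.0]

def forward(x1, x2):
    """Input layer -> hidden layer -> output layer."""
    hidden = layer([x1, x2], HIDDEN_W, HIDDEN_B)
    return layer(hidden, OUTPUT_W, OUTPUT_B)[0]

outputs = {(a, b): forward(a, b) for a in (0, 1) for b in (0, 1)}
```

Outputs close to 1 mean "true" and outputs close to 0 mean "false"; no single layer without the hidden nodes could represent this function.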

20 Neural Networks (continued)
• Example:

21 Rule Induction
• A method to derive a set of rules to classify cases
• For example, rule induction can be used to discover patterns relating decisions (e.g., credit card application)
• Rules may not cover all possible situations
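A minimal sketch of rule induction on a single attribute: for each attribute value seen often enough, derive a rule that predicts the majority decision. The application history, the `min_support` cutoff, and the function names are hypothetical; real rule learners handle many attributes and rank rules by accuracy and coverage.

```python
from collections import Counter, defaultdict

# Hypothetical credit card applications: (employment status, past decision).
history = [
    ("employed", "approve"), ("employed", "approve"), ("employed", "reject"),
    ("unemployed", "reject"), ("unemployed", "reject"), ("student", "reject"),
]

def induce_rules(cases, min_support=2):
    """Derive IF value THEN class rules for values seen at least min_support times."""
    outcomes = defaultdict(Counter)
    for value, decision in cases:
        outcomes[value][decision] += 1
    induced = {}
    for value, counts in outcomes.items():
        if sum(counts.values()) >= min_support:
            # The rule predicts the most frequent decision for this value.
            induced[value] = counts.most_common(1)[0][0]
    return induced

def apply_rules(induced, value):
    """Return the predicted class, or None when no rule covers the case."""
    return induced.get(value)

induced = induce_rules(history)
covered = apply_rules(induced, "employed")
uncovered = apply_rules(induced, "student")
```

The `None` result for the student case shows the last point on the slide: an induced rule set may leave some situations uncovered.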

22 Introduction to Data Mining

