University of Toronto 8/30/2015 Data Mining: The Art and Science of Obtaining Knowledge from Data. Dr. Saed Sayad.


1 University of Toronto 8/30/2015 Data Mining: The Art and Science of Obtaining Knowledge from Data. Dr. Saed Sayad

2 Agenda
• Explosion of data
• Introduction to data mining
• Examples of data mining in science and engineering
• Challenges and opportunities

3 Explosion of Data
• Data in the world doubles every 20 months!
• NASA’s Earth Observing System: 46 megabytes of data per second (about 4,000,000,000,000 bytes a day)
• FBI fingerprint image library: 200,000,000,000,000 bytes
• In-line image analysis for particle detection: 1 megabyte per second
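As a quick sanity check on the NASA figure above, the per-second rate can be converted to a daily volume. This is a minimal sketch that assumes decimal megabytes (1 MB = 1,000,000 bytes); the function and constant names are illustrative and do not come from the slides.

```python
# Convert a sustained data rate into a daily volume.
# Assumption: 1 megabyte = 1,000,000 bytes (decimal convention).
BYTES_PER_MEGABYTE = 1_000_000
SECONDS_PER_DAY = 24 * 60 * 60


def bytes_per_day(megabytes_per_second):
    """Return the number of bytes accumulated in one day at the given rate."""
    return megabytes_per_second * BYTES_PER_MEGABYTE * SECONDS_PER_DAY


nasa_daily = bytes_per_day(46)
print(f"{nasa_daily:,.0f} bytes per day")  # close to the 4 x 10^12 quoted on the slide
```

Under that assumption, 46 megabytes per second works out to a little under four trillion bytes a day, which agrees with the rounded figure on the slide.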

4 Explosion of Data (cont.)

5 Explosion of Data (cont.)

6 Explosion of Data (cont.)

7 Explosion of Data (cont.)

8 What do we need? Fast, accurate, and scalable data analysis techniques to extract useful knowledge. The answer is Data Mining.

9 What is Data Mining?
“Data Mining is the exploration and analysis of large or small quantities of data in order to discover meaningful patterns, trends and rules.”
[Diagram: Data, Data Mining, Knowledge]

10 [Diagram labels: Data Mining; AI, Machine Learning; Statistics; Database; Data Analysis; Data Warehouse; OLAP]

11 [Diagram labels: Data Mining; Data Analysis, Database; Statistics, Machine Learning, Data Warehouse, OLAP]

12 Database
 | Text Files | Relational Database | Multi-dimensional Database
Entities | File | Table | Cube
Attributes | Row and Col | Record, Field, Index | Dimension, Level, Measurement
Methods | Read, Write | Select, Insert, Update, Delete | Drill down, Drill up, Drill through
Language | - | SQL | MDX

13 Data Analysis
• Classification
• Regression
• Clustering
• Association
• Sequence Analysis

14 Data Analysis
[Diagram: Input Variables or Attributes (X1, X2) feed a Model (Linear Models or Decision Trees; weights W1, W2) that produces Output Variables or Targets (Y1, Y2)]
• Input variables: Numeric (age, income, …) or Categorical (gender, occupation, …)
• Output variables: Numeric or Categorical; Regression (0,1), Classification (good, bad)

15 Data Analysis (cont.)
• Clustering: [scatter plot of Age vs. Income]
• Association: transactions 1: chips, coke, chocolate; 2: gum, chips; 3: chips, coke; 4: … Probability (chips, coke)?
• Sequence Analysis: …ATCTTTAAGGGACTAAAATGCCATAAAAATCCATGGGAGAGACCCAAAAAA… [time series: X(t-1), X(t) against T]
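The association question on this slide, "Probability (chips, coke)?", can be estimated directly from the listed transactions. The sketch below uses only the three transactions that are spelled out (the fourth is elided on the slide); the function names `support` and `confidence` are the usual association-rule terms, not names taken from the presentation.

```python
# Market-basket transactions copied from the slide (transaction 4 is elided there).
transactions = [
    {"chips", "coke", "chocolate"},
    {"gum", "chips"},
    {"chips", "coke"},
]


def support(itemset, baskets):
    """Fraction of baskets that contain every item in itemset."""
    itemset = set(itemset)
    hits = sum(1 for basket in baskets if itemset <= basket)
    return hits / len(baskets)


def confidence(antecedent, consequent, baskets):
    """Estimate P(consequent | antecedent) from the baskets."""
    both = set(antecedent) | set(consequent)
    return support(both, baskets) / support(antecedent, baskets)


print(support({"chips", "coke"}, transactions))       # 2 of the 3 listed baskets
print(confidence({"coke"}, {"chips"}, transactions))  # every coke basket also has chips
```

With so few baskets these estimates are only illustrative; on real data the same counts would be taken over the full transaction log.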

16 Data Mining in Research Life Cycle
[Diagram labels: Questions, Needs; Search, Research, Experiment, Modeling, Report; Library, Data, Database, Data Analysis]

17 Data Mining: Modeling Steps
1. Problem Definition
2. Data Preparation
3. Exploration
4. Modeling
5. Evaluation
6. Deployment

18 Agenda
• Explosion of data
• Introduction to data mining
• Examples of data mining in science and engineering
• Challenges and opportunities

19 Examples of data mining in science & engineering
1. Data mining in Biomedical Engineering: “Robotic Arm Control Using Data Mining Techniques”
2. Data mining in Chemical Engineering: “Data Mining for In-line Image Monitoring of Extrusion Processing”

20 1. Problem Definition
“Control a robotic arm by means of EMG signals from biceps and triceps muscles.”
[Images: Supination, Pronation, Flexion, Extension]
Muscle contraction by motion:
Motion | Biceps | Triceps
Supination | H | H
Pronation | L | L
Flexion | H | L
Extension | L | H
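The muscle-contraction table on this slide amounts to a four-entry lookup. The sketch below encodes it, assuming that H and L mean high and low contraction and that a signal can be split into those levels with a single cutoff; the `threshold` value of 0.5 and the normalized signal range are hypothetical, since the slides do not say how the levels were defined.

```python
# Lookup table from the slide: (biceps level, triceps level) -> arm motion.
MOTION_BY_LEVELS = {
    ("H", "H"): "Supination",
    ("L", "L"): "Pronation",
    ("H", "L"): "Flexion",
    ("L", "H"): "Extension",
}


def level(signal, threshold):
    """Discretize a signal: 'H' at or above the threshold, 'L' below it."""
    return "H" if signal >= threshold else "L"


def motion(biceps, triceps, threshold=0.5):
    """Map a pair of EMG readings to a motion (threshold is a hypothetical cutoff)."""
    return MOTION_BY_LEVELS[(level(biceps, threshold), level(triceps, threshold))]


print(motion(0.9, 0.1))  # high biceps, low triceps
```

A fixed cutoff like this is the simplest possible baseline; a learned classifier replaces it when the signals are noisy or overlap.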

21 2. Data Preparation
• The dataset includes 80 records.
• There are two input variables: biceps signal and triceps signal.
• There is one output variable with four possible values: Supination, Pronation, Flexion and Extension.

22 3. Exploration
[Scatter plot: Triceps signal vs. Record#; legend: Flexion, Extension, Supination, Pronation]

23 3. Exploration (cont.)
[Scatter plot: Biceps signal vs. Record#; legend: Flexion, Extension, Supination, Pronation]

24 4. Modeling
Classification:
• OneR
• Decision Tree
• Naïve Bayesian
• K-Nearest Neighbors
• Neural Networks
• Linear Discriminant Analysis
• Support Vector Machines
• …
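To make one of the listed classifiers concrete, here is a minimal k-nearest-neighbors sketch for the two-input, four-class setup described earlier. The training points are made up for illustration and are not the 80-record dataset; the slides do not say which distance metric or value of k was used, so Euclidean distance and k = 3 are assumptions.

```python
import math
from collections import Counter

# Made-up (biceps, triceps) readings for illustration only; NOT the real dataset.
TRAINING = [
    ((0.9, 0.8), "Supination"),
    ((0.8, 0.9), "Supination"),
    ((0.1, 0.2), "Pronation"),
    ((0.2, 0.1), "Pronation"),
    ((0.9, 0.1), "Flexion"),
    ((0.8, 0.2), "Flexion"),
    ((0.1, 0.9), "Extension"),
    ((0.2, 0.8), "Extension"),
]


def knn_predict(point, training=TRAINING, k=3):
    """Majority vote among the k training records closest to point."""
    nearest = sorted(training, key=lambda rec: math.dist(rec[0], point))[:k]
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]


print(knn_predict((0.85, 0.15)))  # the two closest records are both Flexion
```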

25 6. Model Deployment
A neural network model was successfully implemented inside the robotic arm.

26 Examples of data mining in science & engineering
1. Data mining in Biomedical Engineering: “Robotic Arm Control Using Data Mining Techniques”
2. Data mining in Chemical Engineering: “Data Mining for In-line Image Monitoring of Extrusion Processing”

27 Plastics Extrusion [Image labels: Plastic pellets, Plastic melt]

28 Film Extrusion [Image labels: Extruder, Plastic Film, Defect due to particle contaminant]

29 In-Line Monitoring [Image labels: Transition Piece, Window Ports]

30 In-Line Monitoring [Diagram labels: Light Source, Extruder and Interface, Optical Assembly, Imaging Computer, Light]

31 Melt Without Contaminant Particles (WO)

32 Melt With Contaminant Particles (WP)

33 1. Problem Definition
Classify images into those with particles (WP) and those without particles (WO).
[Example images: WO, WP]

34 2. Data Preparation
• 2000 images
• 54 input variables, all numeric
• One output variable with two possible values: With Particle, Without Particle

35 2. Data Preparation (cont.)
• Pre-processed images to remove noise
• Dataset 1 with sharp images: 1350 images including 1257 without particles and 91 with particles
• Dataset 2 with sharp and blurry images: 2000 images including 1909 without particles or with blurry particles and 91 with particles
• 54 input variables, all numeric
• One output variable, with two possible values (WP and WO)
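Because the classes are heavily imbalanced, it helps to know how well a model would score by always predicting the larger class. The sketch below computes that majority-class baseline from the Dataset 2 counts given above; the function name is illustrative, and the baseline itself is a standard reference point rather than something reported in the presentation.

```python
def majority_baseline(class_counts):
    """Accuracy obtained by always predicting the most frequent class."""
    return max(class_counts.values()) / sum(class_counts.values())


# Dataset 2 counts from the slide: 1909 WO (including blurry particles), 91 WP.
dataset2 = {"WO": 1909, "WP": 91}
print(f"{majority_baseline(dataset2):.2%}")
```

Any classifier evaluated on this dataset should be read against that reference, since a high accuracy can still mean that few of the 91 particle images are detected.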

36 3. Exploration
Demo!

37 4. Modeling
Classification:
• OneR
• Decision Tree
• 3-Nearest Neighbors
• Naïve Bayesian
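OneR, the first classifier in this list, picks the single attribute whose value-to-class rule makes the fewest training errors. The sketch below is a minimal version for categorical attributes; numeric attributes such as the 54 image features would first need to be discretized, which is not shown. The toy rows and the attribute meanings are hypothetical and are not taken from the image dataset.

```python
from collections import Counter, defaultdict


def one_r(rows, labels):
    """Return (attribute index, value->class rule, training errors) for the best attribute."""
    best = None
    for attr in range(len(rows[0])):
        # Count class frequencies for each value of this attribute.
        counts = defaultdict(Counter)
        for row, label in zip(rows, labels):
            counts[row[attr]][label] += 1
        # Predict the majority class for each attribute value.
        rule = {value: c.most_common(1)[0][0] for value, c in counts.items()}
        errors = sum(rule[row[attr]] != label for row, label in zip(rows, labels))
        if best is None or errors < best[2]:
            best = (attr, rule, errors)
    return best


# Hypothetical, already-discretized features: (max pixel density, image sharpness).
rows = [("low", "sharp"), ("low", "blurry"), ("high", "sharp"),
        ("high", "blurry"), ("high", "sharp"), ("low", "sharp")]
labels = ["WP", "WP", "WO", "WO", "WO", "WP"]

attr, rule, errors = one_r(rows, labels)
print(attr, rule, errors)
```

In this toy data the first attribute separates the classes perfectly, so OneR selects it and ignores sharpness.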

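38 5. Evaluation (10-fold cross-validation)
Table columns: Dataset, Attrib., Class, One-R, C4.5, 3-NN, Bayes
Sharp Images: 54 attributes, 2 classes; values as listed: 99.9, 99.8, 95.8
Sharp + Blurry Images: 54 attributes, 2 classes; values as listed: 98.5, 97.8, 93.3
Sharp + Blurry Images: 54 attributes, 3 classes; values as listed: 87, 84, 79
Rule: If pixel_density_max < 142 then WP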
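The single rule shown on this slide and the 10-fold cross-validation used for evaluation can both be sketched in a few lines. The code below is an illustration under stated assumptions: the rule is applied exactly as written, with everything not below the threshold treated as WO, and the fold split is a simple interleaved partition without shuffling or stratification, which may differ from what the original tool did. The function names are hypothetical.

```python
def classify(pixel_density_max, threshold=142):
    """Apply the slide's rule: below the threshold means 'with particle' (WP)."""
    return "WP" if pixel_density_max < threshold else "WO"


def k_fold_indices(n_records, k=10):
    """Yield (train_indices, test_indices); each record is tested exactly once."""
    indices = list(range(n_records))
    for fold in range(k):
        test = indices[fold::k]  # every k-th record, offset by the fold number
        test_set = set(test)
        train = [i for i in indices if i not in test_set]
        yield train, test


# With 2000 images, each fold tests 200 images and trains on the other 1800.
for train, test in k_fold_indices(2000, k=10):
    assert len(test) == 200 and len(train) == 1800
```

In a real evaluation the threshold would be re-learned from each training split and scored on the matching test split, and the reported figure would be the average over the ten folds.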

39 6. Deploy Model
• A Visual Basic program will be developed to implement the model.

40 Agenda
• Explosion of data
• Introduction to data mining
• Examples of data mining in science & engineering
• Challenges and opportunities

41 Challenges and Opportunities
• Data mining is a ‘top ten’ emerging technology.
• High-paying jobs in the financial, medical and engineering fields.
• Faster, more accurate and more scalable techniques.
• Incremental, on-line and real-time learning algorithms.
• Parallel and distributed data processing techniques.

42 Data mining is an exciting and challenging field with the ability to solve many complex scientific and business problems. You can be part of the solution!

