Data Mining in Microarray Analysis

1 Data Mining in Microarray Analysis
Classification (Supervised Learning)
- Finding models (functions) that describe and distinguish classes or concepts for future prediction, e.g., predicting disease from gene expression profiles
- Similar to prediction, but predicts an unknown or missing categorical value rather than a numerical value
- Model representations: decision trees, classification rules, neural networks
Cluster Analysis (Unsupervised Learning)
- Class labels are unknown: group the data to form new classes, e.g., cluster genes to find distribution patterns
- Principle: maximize intra-class similarity and minimize inter-class similarity
- E.g., group genes based on their gene expression profiles
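Cluster analysis needs no class labels. As a minimal sketch of the principle above (high similarity within a cluster, low similarity between clusters), the code below groups hypothetical gene expression profiles with k-means, one common clustering algorithm. The slide does not name a specific method; the gene names, values, and the choice k = 2 are made up for illustration.

```python
import math

def euclidean(a, b):
    """Euclidean distance between two expression profiles."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def kmeans(profiles, k, max_iter=100):
    """Plain k-means: assign every profile to its nearest centroid, then move
    each centroid to the mean of its members; stop when assignments settle.
    Centroids start at the first k profiles (simple but naive seeding)."""
    centroids = [list(p) for p in profiles[:k]]
    assignment = None
    for _ in range(max_iter):
        new_assignment = [
            min(range(k), key=lambda c: euclidean(p, centroids[c]))
            for p in profiles
        ]
        if new_assignment == assignment:
            break  # converged: no profile changed cluster
        assignment = new_assignment
        for c in range(k):
            members = [p for p, a in zip(profiles, assignment) if a == c]
            if members:  # an empty cluster keeps its previous centroid
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return assignment, centroids

# Hypothetical expression levels of four genes across four conditions.
genes = {
    "geneA": [1.0, 1.2, 0.9, 1.1],
    "geneB": [5.0, 5.2, 4.8, 5.1],
    "geneC": [1.1, 0.9, 1.0, 1.2],
    "geneD": [4.9, 5.1, 5.0, 4.8],
}
labels, _ = kmeans(list(genes.values()), k=2)
for name, label in zip(genes, labels):
    print(name, "-> cluster", label)
```

Genes with similar profiles (geneA with geneC, geneB with geneD) end up in the same cluster. Note that k-means needs k up front, whereas clustering is described here as having an unknown number of classes, so k is typically chosen by trying several values.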

2 Supervised vs. Unsupervised Learning
Supervised: Classification
- known number of classes
- based on a training set
- used to classify future observations
Unsupervised: Clustering
- unknown number of classes
- no prior knowledge
- used to understand (explore) data

3 Supervised vs. Unsupervised Learning
[Figure: scatter plots of debt against income. Supervised Learning panel: points marked as two classes (* and o). Unsupervised Learning panel: debt against income.]

4 Classification
Training Set (data with known classes) -> Classification Technique -> Classifier
Data with unknown classes -> Classifier -> Class Assignment
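The pipeline on this slide (training set -> classification technique -> classifier -> class assignment) can be sketched with a nearest-centroid classifier, chosen here only because it is short; the slide does not prescribe a technique. The two-gene profiles and the class names "tumor"/"normal" are hypothetical.

```python
def train_nearest_centroid(samples, labels):
    """'Classification technique': learn one mean profile (centroid) per class."""
    sums, counts = {}, {}
    for x, y in zip(samples, labels):
        if y not in sums:
            sums[y], counts[y] = [0.0] * len(x), 0
        sums[y] = [s + v for s, v in zip(sums[y], x)]
        counts[y] += 1
    return {y: [s / counts[y] for s in sums[y]] for y in sums}

def classify(centroids, x):
    """'Classifier': assign x to the class with the nearest centroid
    (squared Euclidean distance)."""
    return min(
        centroids,
        key=lambda y: sum((a - b) ** 2 for a, b in zip(x, centroids[y])),
    )

# Training set: data with known classes (hypothetical two-gene profiles).
train_x = [[2.0, 8.0], [2.5, 7.5], [8.0, 2.0], [7.5, 2.5]]
train_y = ["tumor", "tumor", "normal", "normal"]
model = train_nearest_centroid(train_x, train_y)

# Data with unknown classes -> class assignment.
new_x = [[2.2, 7.9], [7.8, 2.1]]
assignments = [classify(model, x) for x in new_x]
print(assignments)  # ['tumor', 'normal']
```

Any other technique (decision tree, rule learner, neural network) fits the same two-step shape: build the model once from labelled data, then apply it to each unlabelled sample.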

5 Types of Classifiers
[Figure: scatter plots of debt against income, points marked * and o for the two classes.]
- Linear Classifier: the classes are separated by a straight line defined by a*income + b*debt; points on one side of the line fall in the "No loan!" region.
- Non-Linear Classifier: the decision boundary is not a straight line.
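The linear classifier separates the two classes with a line of the form a*income + b*debt = t. A minimal sketch of learning a, b and t from labelled examples uses the perceptron update rule. The perceptron itself, the threshold t, the +1/-1 encoding (+1 = loan, -1 = no loan), and the income/debt values are assumptions for illustration, not taken from the slide.

```python
def train_perceptron(points, labels, lr=1.0, max_epochs=100):
    """Learn a, b, t so that sign(a*income + b*debt - t) matches each label
    (+1 = loan, -1 = no loan). Stops after an epoch with no mistakes."""
    a = b = t = 0.0
    for _ in range(max_epochs):
        mistakes = 0
        for (income, debt), y in zip(points, labels):
            score = a * income + b * debt - t
            if y * score <= 0:  # misclassified or exactly on the line
                a += lr * y * income
                b += lr * y * debt
                t -= lr * y
                mistakes += 1
        if mistakes == 0:
            break
    return a, b, t

def predict(a, b, t, income, debt):
    """Which side of the line a*income + b*debt = t the point falls on."""
    return "loan" if a * income + b * debt - t > 0 else "no loan"

# Hypothetical (income, debt) pairs and their classes.
points = [(8, 1), (7, 2), (9, 3), (2, 6), (3, 8), (1, 5)]
labels = [+1, +1, +1, -1, -1, -1]
a, b, t = train_perceptron(points, labels)
print(a, b, t)  # 6.0 -5.0 0.0
print(predict(a, b, t, 8, 2), "/", predict(a, b, t, 2, 7))  # loan / no loan
```

The perceptron is only guaranteed to stop with zero mistakes when the classes are linearly separable, as in this toy data; data like the non-linear case needs a model whose boundary can bend.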

6 Predictive Modelling
Day  Outlook   Temperature  Humidity  Wind    Play Tennis
1    Sunny     Hot          High      Weak    No
2    Sunny     Hot          High      Strong  No
3    Overcast  Hot          High      Weak    Yes
4    Rain      Mild         High      Weak    Yes
5    Rain      Cool         Normal    Weak    Yes
6    Rain      Cool         Normal    Strong  No
7    Overcast  Cool         Normal    Strong  Yes
8    Sunny     Mild         High      Weak    No
9    Sunny     Cool         Normal    Weak    Yes
10   Rain      Mild         Normal    Weak    Yes
11   Sunny     Mild         Normal    Strong  Yes
12   Overcast  Mild         High      Strong  Yes
13   Overcast  Hot          Normal    Weak    Yes
14   Rain      Mild         High      Strong  No
- Predicts categorical class labels
- Classifies data: constructs a model from the training set and the values (class labels) of a classifying attribute
- Uses the model to classify new data
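The two steps above (construct a model from the training set and its class labels, then use it on new data) can be sketched with 1R ("one rule"), a very simple classification-rule learner chosen here for brevity. It is a swapped-in technique, not the one the slides use (slide 8 uses a decision tree), and the new day being classified is a hypothetical example.

```python
from collections import Counter

ATTRS = ["Outlook", "Temperature", "Humidity", "Wind"]
# Play Tennis training set (Days 1 to 14); last field is the class label.
DATA = [
    ("Sunny", "Hot", "High", "Weak", "No"),
    ("Sunny", "Hot", "High", "Strong", "No"),
    ("Overcast", "Hot", "High", "Weak", "Yes"),
    ("Rain", "Mild", "High", "Weak", "Yes"),
    ("Rain", "Cool", "Normal", "Weak", "Yes"),
    ("Rain", "Cool", "Normal", "Strong", "No"),
    ("Overcast", "Cool", "Normal", "Strong", "Yes"),
    ("Sunny", "Mild", "High", "Weak", "No"),
    ("Sunny", "Cool", "Normal", "Weak", "Yes"),
    ("Rain", "Mild", "Normal", "Weak", "Yes"),
    ("Sunny", "Mild", "Normal", "Strong", "Yes"),
    ("Overcast", "Mild", "High", "Strong", "Yes"),
    ("Overcast", "Hot", "Normal", "Weak", "Yes"),
    ("Rain", "Mild", "High", "Strong", "No"),
]

def one_rule(rows):
    """1R: for each attribute, map every value to its majority class and count
    the training errors; keep the attribute whose rule makes the fewest errors
    (ties go to the attribute listed first)."""
    best = None
    for i in range(len(ATTRS)):
        counts = {}
        for row in rows:
            counts.setdefault(row[i], Counter())[row[-1]] += 1
        rule = {value: c.most_common(1)[0][0] for value, c in counts.items()}
        errors = sum(sum(c.values()) - c[rule[value]] for value, c in counts.items())
        if best is None or errors < best[2]:
            best = (i, rule, errors)
    return best

# Learning: construct the model from the training set.
attr, rule, errors = one_rule(DATA)
print(ATTRS[attr], rule, errors)
# Outlook {'Sunny': 'No', 'Overcast': 'Yes', 'Rain': 'Yes'} 4

# Prediction: classify a new day described by the same attributes.
new_day = ("Sunny", "Cool", "High", "Strong")
print(rule[new_day[attr]])  # No
```

Humidity also makes only 4 errors on these 14 days, so Outlook is chosen here by the tie-breaking order rather than by a better fit.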

7 Classification
Training data -> Inductive Learning System -> Classifiers (derived hypotheses)
Data to be classified -> Classifier -> Decision on class assignment
- Task: determine which of a fixed set of classes an example belongs to
- Input: a training set of examples annotated with class values
- Output: induced hypotheses (model / concept description / classifiers)
- Learning: induce classifiers from the training data
- Prediction: use the hypothesis to classify any example described in the same manner

8 Decision Tree: Example
Tree for the Play Tennis training set (Days 1 to 14, slide 6):
Outlook
  Sunny    -> Humidity
                High   -> No
                Normal -> Yes
  Overcast -> Yes
  Rain     -> Wind
                Strong -> No
                Weak   -> Yes
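A common way to induce a tree like this is ID3, which places at each node the attribute with the highest information gain: Gain(S, A) = Entropy(S) - sum over values v of |S_v|/|S| * Entropy(S_v). The slides do not say which algorithm produced their tree; the sketch below is a minimal ID3 for categorical attributes, without pruning, and on the Play Tennis data it selects Outlook at the root, Humidity under Sunny and Wind under Rain, matching the tree above.

```python
import math
from collections import Counter

ATTRS = ["Outlook", "Temperature", "Humidity", "Wind"]
# Play Tennis training set (Days 1 to 14); last field is Play Tennis.
DATA = [
    ("Sunny", "Hot", "High", "Weak", "No"), ("Sunny", "Hot", "High", "Strong", "No"),
    ("Overcast", "Hot", "High", "Weak", "Yes"), ("Rain", "Mild", "High", "Weak", "Yes"),
    ("Rain", "Cool", "Normal", "Weak", "Yes"), ("Rain", "Cool", "Normal", "Strong", "No"),
    ("Overcast", "Cool", "Normal", "Strong", "Yes"), ("Sunny", "Mild", "High", "Weak", "No"),
    ("Sunny", "Cool", "Normal", "Weak", "Yes"), ("Rain", "Mild", "Normal", "Weak", "Yes"),
    ("Sunny", "Mild", "Normal", "Strong", "Yes"), ("Overcast", "Mild", "High", "Strong", "Yes"),
    ("Overcast", "Hot", "Normal", "Weak", "Yes"), ("Rain", "Mild", "High", "Strong", "No"),
]

def entropy(rows):
    """Entropy (in bits) of the class labels in rows."""
    n = len(rows)
    counts = Counter(row[-1] for row in rows)
    return -sum(c / n * math.log2(c / n) for c in counts.values())

def info_gain(rows, i):
    """Expected reduction in entropy from splitting rows on attribute i."""
    n = len(rows)
    remainder = 0.0
    for value in set(row[i] for row in rows):
        subset = [row for row in rows if row[i] == value]
        remainder += len(subset) / n * entropy(subset)
    return entropy(rows) - remainder

def id3(rows, attrs):
    """Return a class label (leaf) or (attribute name, index, {value: subtree})."""
    labels = [row[-1] for row in rows]
    if len(set(labels)) == 1:
        return labels[0]  # pure node -> leaf
    if not attrs:
        return Counter(labels).most_common(1)[0][0]  # no tests left -> majority leaf
    best = max(attrs, key=lambda i: info_gain(rows, i))
    rest = [i for i in attrs if i != best]
    branches = {}
    for value in sorted(set(row[best] for row in rows)):
        branches[value] = id3([r for r in rows if r[best] == value], rest)
    return (ATTRS[best], best, branches)

def classify(tree, example):
    """Follow the tests from the root to a leaf (KeyError on unseen values)."""
    while isinstance(tree, tuple):
        _, i, branches = tree
        tree = branches[example[i]]
    return tree

def show(tree, indent=""):
    """Print an internal node and its subtrees as indented text."""
    name, _, branches = tree
    for value, sub in branches.items():
        if isinstance(sub, tuple):
            print(f"{indent}{name} = {value} -> test {sub[0]}")
            show(sub, indent + "    ")
        else:
            print(f"{indent}{name} = {value} -> {sub}")

tree = id3(DATA, list(range(len(ATTRS))))
show(tree)
print(classify(tree, ("Sunny", "Cool", "High", "Strong")))  # No
```

At the root, Outlook has the largest gain (about 0.247 bits), ahead of Humidity, Wind and Temperature, which is why it is tested first.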

9 Classification: Relevant Gene Identification
Goal: identify the subset of genes that distinguishes between treatments, tissues, etc.
Method:
- Collect several samples grouped by treatment (e.g., Diseased vs. Healthy)
- Use genes as "features"
- Build a classifier to distinguish the treatments
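A simple way to pick out genes that distinguish two treatments is to score each gene (feature) on its own and rank the genes. The sketch below scores genes with Welch's two-sample t-statistic; that statistic, the gene names, and the expression values are assumptions for illustration, since the slide names no scoring method.

```python
import math
from statistics import mean, variance

def welch_t(group_a, group_b):
    """Welch's t-statistic for the difference in mean expression
    (uses sample variances; each group needs at least two values)."""
    va, vb = variance(group_a), variance(group_b)
    se = math.sqrt(va / len(group_a) + vb / len(group_b))
    return (mean(group_a) - mean(group_b)) / se

def rank_genes(expression, labels, positive="Diseased"):
    """Score every gene by |t| between the two groups; highest first."""
    scores = {}
    for gene, values in expression.items():
        a = [v for v, y in zip(values, labels) if y == positive]
        b = [v for v, y in zip(values, labels) if y != positive]
        scores[gene] = abs(welch_t(a, b))
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)

# Hypothetical samples grouped by treatment, and three genes measured on them.
labels = ["Diseased", "Diseased", "Diseased", "Healthy", "Healthy", "Healthy"]
expression = {
    "geneA": [8.1, 7.9, 8.4, 2.0, 2.3, 1.9],  # clearly higher when diseased
    "geneB": [5.0, 5.5, 4.8, 5.2, 4.9, 5.3],  # no real difference
    "geneC": [3.0, 3.4, 3.1, 2.1, 2.4, 2.2],  # moderate difference
}
ranking = rank_genes(expression, labels)
for gene, score in ranking:
    print(f"{gene}: |t| = {score:.2f}")
```

Keeping only the top-ranked genes and training a classifier on them is one simple form of the feature selection that the next slide calls for when there are thousands of genes.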

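10 Gene Expression Example
[Table: columns ID, G1, G2, G3, G4, Cancer; the expression values are not recoverable. Cancer labels of the listed rows, in order: No, No, Yes, Yes, Yes, No, Yes, Yes, Yes, Yes, Yes, No, Yes, No; 15 …]
Problem: with a large number of genes (~10000), feature selection/reduction techniques are needed.
[Figure: decision tree whose root tests G1 (<= 22 / > 22), with further tests on G3 (<= 12 / > 12) and G4 (<= 52 / > 52) and leaves Cancer = No / Yes.]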
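Numeric tests such as G1 > 22 can be chosen the way many decision-tree learners handle continuous attributes: sort the values, try a threshold between each pair of neighbouring values, and keep the one with the highest information gain. The sketch below assumes midpoint thresholds and uses hypothetical G1 values and Cancer labels, not the slide's data.

```python
import math
from collections import Counter

def entropy(labels):
    """Entropy (in bits) of a list of class labels."""
    n = len(labels)
    return -sum(c / n * math.log2(c / n) for c in Counter(labels).values())

def best_threshold(values, labels):
    """Return (threshold, gain) for the split value <= threshold vs. > threshold
    with the highest information gain. Candidates are midpoints between
    consecutive distinct sorted values."""
    pairs = sorted(zip(values, labels))
    base = entropy(labels)
    n = len(pairs)
    best = (None, 0.0)
    for k in range(1, n):
        lo, hi = pairs[k - 1][0], pairs[k][0]
        if lo == hi:
            continue  # no threshold can separate equal values
        t = (lo + hi) / 2
        left = [y for v, y in pairs if v <= t]
        right = [y for v, y in pairs if v > t]
        gain = base - len(left) / n * entropy(left) - len(right) / n * entropy(right)
        if gain > best[1]:
            best = (t, gain)
    return best

# Hypothetical expression values of one gene and the Cancer label per sample.
g1 = [10, 15, 18, 21, 25, 30, 35, 40]
cancer = ["No", "No", "No", "No", "Yes", "Yes", "Yes", "Yes"]
threshold, gain = best_threshold(g1, cancer)
print(threshold, gain)  # 23.0 1.0
```

With ~10000 genes this search runs once per gene at every node, which is one practical reason the slide recommends reducing the number of features first.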

