The UNIVERSITY of NORTH CAROLINA at CHAPEL HILL: Classification. COMP 790-90 Seminar, BCB 713 Module, Spring 2011.


The UNIVERSITY of NORTH CAROLINA at CHAPEL HILL Classification COMP 790-90 Seminar BCB 713 Module Spring 2011

COMP 790-90 Data Mining: Concepts, Algorithms, and Applications 2 Bayesian Classification: Why?
- Probabilistic learning: calculates explicit probabilities for hypotheses; among the most practical approaches to certain types of learning problems
- Incremental: each training example can incrementally increase or decrease the probability that a hypothesis is correct; prior knowledge can be combined with observed data
- Probabilistic prediction: predicts multiple hypotheses, weighted by their probabilities
- Standard: even when Bayesian methods are computationally intractable, they provide a standard of optimal decision making against which other methods can be measured

COMP 790-90 Data Mining: Concepts, Algorithms, and Applications 3 Bayes Theorem: Basics
- Let X be a data sample whose class label is unknown
- Let H be the hypothesis that X belongs to class C
- For classification problems, determine P(H|X): the probability that the hypothesis holds given the observed data sample X
- P(H): prior probability of hypothesis H, i.e. the initial probability before we observe any data; it reflects background knowledge
- P(X): probability that the sample data is observed
- P(X|H): probability of observing the sample X, given that the hypothesis holds

COMP 790-90 Data Mining: Concepts, Algorithms, and Applications 4 Bayes Theorem
- Given training data X, the posterior probability of a hypothesis H, P(H|X), follows Bayes theorem:
    P(H|X) = P(X|H) P(H) / P(X)
- Informally: posterior = likelihood x prior / evidence
- MAP (maximum a posteriori) hypothesis: h_MAP = argmax_h P(h|X) = argmax_h P(X|h) P(h)
- Practical difficulty: requires initial knowledge of many probabilities, and can carry significant computational cost

COMP 790-90 Data Mining: Concepts, Algorithms, and Applications 5 Naïve Bayes Classifier
- A simplifying assumption: attributes are conditionally independent given the class. The probability of observing, say, two attribute values y1 and y2 together, given that the class is C, is the product of their individual conditional probabilities:
    P(y1, y2 | C) = P(y1 | C) x P(y2 | C)
- No dependence relation between attributes is modeled
- This greatly reduces the computational cost: we only count the class distribution
- Once P(X|Ci) is known, assign X to the class with maximum P(X|Ci) x P(Ci)

COMP 790-90 Data Mining: Concepts, Algorithms, and Applications 6 Training dataset
Class: C1: buys_computer = "yes"; C2: buys_computer = "no"
Data sample X = (age<=30, income=medium, student=yes, credit_rating=fair)
The training table (the standard 14-tuple AllElectronics dataset from Han & Kamber):

  age      income   student  credit_rating  buys_computer
  <=30     high     no       fair           no
  <=30     high     no       excellent      no
  31…40    high     no       fair           yes
  >40      medium   no       fair           yes
  >40      low      yes      fair           yes
  >40      low      yes      excellent      no
  31…40    low      yes      excellent      yes
  <=30     medium   no       fair           no
  <=30     low      yes      fair           yes
  >40      medium   yes      fair           yes
  <=30     medium   yes      excellent      yes
  31…40    medium   no       excellent      yes
  31…40    high     yes      fair           yes
  >40      medium   no       excellent      no

COMP 790-90 Data Mining: Concepts, Algorithms, and Applications 7 Naïve Bayesian Classifier: Example
Compute P(X|Ci) for each class:
  P(age="<=30" | buys_computer="yes") = 2/9 = 0.222
  P(age="<=30" | buys_computer="no") = 3/5 = 0.6
  P(income="medium" | buys_computer="yes") = 4/9 = 0.444
  P(income="medium" | buys_computer="no") = 2/5 = 0.4
  P(student="yes" | buys_computer="yes") = 6/9 = 0.667
  P(student="yes" | buys_computer="no") = 1/5 = 0.2
  P(credit_rating="fair" | buys_computer="yes") = 6/9 = 0.667
  P(credit_rating="fair" | buys_computer="no") = 2/5 = 0.4
X = (age<=30, income=medium, student=yes, credit_rating=fair)
P(X|Ci):
  P(X|buys_computer="yes") = 0.222 x 0.444 x 0.667 x 0.667 = 0.044
  P(X|buys_computer="no") = 0.6 x 0.4 x 0.2 x 0.4 = 0.019
P(X|Ci) x P(Ci), with priors P(buys_computer="yes") = 9/14 = 0.643 and P(buys_computer="no") = 5/14 = 0.357:
  P(X|buys_computer="yes") x P(buys_computer="yes") = 0.044 x 0.643 = 0.028
  P(X|buys_computer="no") x P(buys_computer="no") = 0.019 x 0.357 = 0.007
X therefore belongs to class buys_computer = "yes"
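The counting above maps directly onto a few lines of code. Below is a minimal sketch in plain Python (our illustration; the slides provide no code) that recomputes P(X|Ci) x P(Ci) from the 14-tuple table and reproduces the 0.028 vs. 0.007 comparison; like the slide, it uses raw relative frequencies with no smoothing.

    # Naïve Bayes by direct counting (no smoothing), on the
    # buys_computer training set shown two slides earlier.
    # Tuples: (age, income, student, credit_rating, buys_computer)
    data = [
        ("<=30",  "high",   "no",  "fair",      "no"),
        ("<=30",  "high",   "no",  "excellent", "no"),
        ("31…40", "high",   "no",  "fair",      "yes"),
        (">40",   "medium", "no",  "fair",      "yes"),
        (">40",   "low",    "yes", "fair",      "yes"),
        (">40",   "low",    "yes", "excellent", "no"),
        ("31…40", "low",    "yes", "excellent", "yes"),
        ("<=30",  "medium", "no",  "fair",      "no"),
        ("<=30",  "low",    "yes", "fair",      "yes"),
        (">40",   "medium", "yes", "fair",      "yes"),
        ("<=30",  "medium", "yes", "excellent", "yes"),
        ("31…40", "medium", "no",  "excellent", "yes"),
        ("31…40", "high",   "yes", "fair",      "yes"),
        (">40",   "medium", "no",  "excellent", "no"),
    ]

    def classify(x):
        best_class, best_score = None, -1.0
        for c in ("yes", "no"):
            rows = [r for r in data if r[-1] == c]
            score = len(rows) / len(data)          # prior P(Ci)
            for i, value in enumerate(x):          # naïve step: multiply the
                score *= sum(1 for r in rows       # per-attribute conditionals
                             if r[i] == value) / len(rows)
            print(c, round(score, 3))              # yes: 0.028, no: 0.007
            if score > best_score:
                best_class, best_score = c, score
        return best_class

    # X = (age<=30, income=medium, student=yes, credit_rating=fair)
    print(classify(("<=30", "medium", "yes", "fair")))   # -> yes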

COMP 790-90 Data Mining: Concepts, Algorithms, and Applications 8 Naïve Bayesian Classifier: Comments
Advantages:
- Easy to implement
- Good results obtained in most cases
Disadvantages:
- The class conditional independence assumption costs accuracy, because in practice dependencies do exist among variables
- E.g., in hospital data a patient's profile (age, family history, etc.), symptoms (fever, cough, etc.), and diseases (lung cancer, diabetes, etc.) are all interdependent; such dependencies cannot be modeled by a naïve Bayesian classifier
How to deal with these dependencies? Bayesian belief networks.

COMP 790-90 Data Mining: Concepts, Algorithms, and Applications 9 Bayesian Networks
- A Bayesian belief network allows conditional independencies to hold among subsets of the variables
- It is a graphical model of causal relationships: it represents dependencies among the variables and gives a specification of the joint probability distribution
- Nodes are random variables; links denote dependency
- (Figure: a four-node example in which X and Y are the parents of Z, and Y is the parent of P; there is no direct dependency between Z and P)
- The graph has no loops or cycles
- The joint distribution factorizes over the graph: P(x1, ..., xn) = Π_i P(xi | Parents(Xi))

COMP 790-90 Data Mining: Concepts, Algorithms, and Applications 10 Bayesian Belief Network: An Example
(Figure: a six-node network over FamilyHistory, Smoker, LungCancer, Emphysema, PositiveXRay, and Dyspnea, in which FamilyHistory and Smoker are the parents of LungCancer)
The conditional probability table (CPT) for the variable LungCancer shows the conditional probability for each possible combination of its parents (values as in Han & Kamber's version of this example):

          (FH, S)   (FH, ~S)   (~FH, S)   (~FH, ~S)
  LC        0.8       0.5        0.7        0.1
  ~LC       0.2       0.5        0.3        0.9
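As a quick check of how the CPT is used, here is a small sketch (our illustration; the priors for FamilyHistory and Smoker are placeholders, since the slide does not give them) that evaluates the joint probability of one complete assignment over the FamilyHistory, Smoker, LungCancer fragment via the factorization from the previous slide:

    # Joint probability of one assignment over the FH, S, LC fragment,
    # via P(x1, ..., xn) = prod_i P(xi | Parents(Xi)).
    P_FH = {True: 0.5, False: 0.5}   # placeholder prior P(FamilyHistory)
    P_S  = {True: 0.5, False: 0.5}   # placeholder prior P(Smoker)
    P_LC = {(True, True): 0.8, (True, False): 0.5,    # P(LC | FH, S),
            (False, True): 0.7, (False, False): 0.1}  # from the CPT above

    def joint(fh, s, lc):
        p_lc = P_LC[(fh, s)] if lc else 1.0 - P_LC[(fh, s)]
        return P_FH[fh] * P_S[s] * p_lc

    print(joint(True, True, True))   # 0.5 * 0.5 * 0.8 = 0.2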

COMP 790-90 Data Mining: Concepts, Algorithms, and Applications 11 Learning Bayesian Networks
Several cases arise:
- Both the network structure and all variables are observable: learn only the CPTs (a counting sketch follows below)
- Network structure known, some variables hidden: use gradient descent methods, analogous to neural network learning
- Network structure unknown, all variables observable: search through the model space to reconstruct the graph topology
- Structure unknown and variables hidden: no good algorithms are known for this case
Reference: D. Heckerman, Bayesian Networks for Data Mining, Data Mining and Knowledge Discovery 1 (1997)
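For the first case (structure known, everything observed), CPT learning reduces to maximum-likelihood counting. A minimal sketch, with hypothetical variable names and data that are not from the slides:

    # Estimate P(child | parent1, parent2) by counting fully observed
    # records; this is CPT learning when the structure is known.
    from collections import Counter

    # hypothetical observations: (parent1, parent2, child)
    observations = [
        (True, True, True), (True, True, True), (True, True, False),
        (False, True, True), (False, False, False), (False, False, False),
    ]

    parent_counts = Counter((p1, p2) for p1, p2, _ in observations)
    true_counts   = Counter((p1, p2) for p1, p2, c in observations if c)

    for parents, n in parent_counts.items():
        print(parents, "P(child=True | parents) =", true_counts[parents] / n)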

COMP 790-90 Data Mining: Concepts, Algorithms, and Applications 12 SVM: Support Vector Machines
(Figure: the same training points separated by a small-margin and by a large-margin hyperplane; the support vectors are the points closest to the boundary, and SVM chooses the large-margin separator)

COMP 790-90 Data Mining: Concepts, Algorithms, and Applications 13 SVM (cont.): Linear Support Vector Machine
Given a set of points x_i with labels y_i in {-1, +1}, the SVM finds a hyperplane defined by the pair (w, b), where w is the normal to the plane and b is the bias (offset) term, such that
    y_i (w · x_i + b) >= 1 for all i
Here x is the feature vector, b the bias, y the class label, and 2/||w|| the margin.
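To make the pair (w, b) and the margin concrete, here is a brief scikit-learn sketch (the library choice and the toy data are ours, not from the slides):

    # Fit a hard-margin linear SVM on four separable points and read
    # off the hyperplane (w, b) and the margin 2/||w||.
    import numpy as np
    from sklearn.svm import SVC

    X = np.array([[2.0, 2.0], [3.0, 3.0], [-2.0, -2.0], [-3.0, -3.0]])
    y = np.array([1, 1, -1, -1])

    clf = SVC(kernel="linear", C=1e6)   # large C approximates a hard margin
    clf.fit(X, y)

    w, b = clf.coef_[0], clf.intercept_[0]
    print("w =", w, "b =", b)
    print("margin 2/||w|| =", 2.0 / np.linalg.norm(w))   # ~5.66 here
    print("support vectors:", clf.support_vectors_)      # (2,2) and (-2,-2)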

COMP 790-90 Data Mining: Concepts, Algorithms, and Applications 14 SVM (cont.)
(Equations shown as images only)

COMP 790-90 Data Mining: Concepts, Algorithms, and Applications 15 SVM (cont.)
What if the data is not linearly separable? Project the data into a high-dimensional space where it is linearly separable, then use a linear SVM there (this is done with kernels; see the next slide).
(Figure: a toy set of points, e.g. (0,0), (1,0), (0,1), with mixed + and - labels that no single line can separate)
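One standard way to see the projection (our illustration, not from the slides): with inputs encoded as x_i in {-1, +1}, XOR labels y = x1 x2 are not linearly separable in the plane, but under the explicit quadratic feature map

    \phi(x_1, x_2) = (x_1,\; x_2,\; x_1 x_2)

the third coordinate equals the label, so the hyperplane with w = (0, 0, 1), b = 0 separates the classes perfectly in feature space.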

COMP 790-90 Data Mining: Concepts, Algorithms, and Applications 16 Non-Linear SVM
Classification using the SVM (w, b): in the linear case a point x is assigned the class sign(w · x + b). In the non-linear case the inner product is replaced by a kernel, giving sign(Σ_i α_i y_i K(x_i, x) + b); the kernel K can be thought of as doing a dot product in some high-dimensional feature space.
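A companion sketch (again with scikit-learn as an assumed tool): an RBF-kernel SVM fits the XOR labels that defeat any linear separator in the original space.

    # The kernel trick in practice: an RBF-kernel SVM learns XOR,
    # which no hyperplane in the original 2-D space can represent.
    import numpy as np
    from sklearn.svm import SVC

    X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]], dtype=float)
    y = np.array([-1, 1, 1, -1])        # XOR labels

    clf = SVC(kernel="rbf", gamma=2.0, C=1e3)
    clf.fit(X, y)
    print(clf.predict(X))               # [-1  1  1 -1]: all four correct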

COMP 790-90 Data Mining: Concepts, Algorithms, and Applications 17 Example of Non-linear SVM
(Figures only)

COMP 790-90 Data Mining: Concepts, Algorithms, and Applications 18 Results
(Figures only)

COMP 790-90 Data Mining: Concepts, Algorithms, and Applications 19 SVM Related Links
- C. J. C. Burges. A Tutorial on Support Vector Machines for Pattern Recognition. Data Mining and Knowledge Discovery, 2(2), 1998.
- SVM-light: SVM software implemented in C
- Book: N. Cristianini and J. Shawe-Taylor. An Introduction to Support Vector Machines. Cambridge University Press, 2000.