THE HONG KONG UNIVERSITY OF SCIENCE & TECHNOLOGY CSIT 5220: Reasoning and Decision under Uncertainty L10: Model-Based Classification and Clustering Nevin.


1 THE HONG KONG UNIVERSITY OF SCIENCE & TECHNOLOGY
CSIT 5220: Reasoning and Decision under Uncertainty
L10: Model-Based Classification and Clustering
Nevin L. Zhang
Room 3504, phone: 2358-7015, Email: lzhang@cs.ust.hk

2 L10: Model-Based Classification and Clustering
- Probabilistic Models (PMs) for Classification
- PMs for Clustering

3 Classification
- The problem:
  - Given data:
  - Find mapping (A1, A2, …, An) → C
- Possible solutions
  - ANN
  - Decision tree (Quinlan)
  - …
  - (SVM: Continuous data)

4 Probabilistic Approach to Classification

5 Will Boss Play Tennis?

6 Will Boss Play Tennis?

11 Bayesian Networks for Classification
- The Naïve Bayes model often performs well in practice.
- Drawback of Naïve Bayes:
  - It assumes the attributes are mutually independent given the class variable.
  - This assumption is often violated, leading to double counting of evidence.
- Fixes:
  - General BN classifiers
  - Tree-augmented Naïve Bayes (TAN) models
  - …
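
The Naïve Bayes classifier discussed on this slide can be sketched in a few lines. This is a minimal illustration for categorical attributes, not the course's implementation: the function names `train_naive_bayes` and `predict`, the add-alpha smoothing, and the toy weather-style rows are all assumptions made for the example.

```python
from collections import Counter, defaultdict
import math

def train_naive_bayes(rows, labels, alpha=1.0):
    """Count-based estimates for P(C) and P(A_i | C) on categorical data.

    alpha is an add-alpha smoothing constant (an assumption of this sketch).
    """
    n_attrs = len(rows[0])
    class_counts = Counter(labels)
    # value_counts[i][c][v] = number of rows with class c and attribute i equal to v
    value_counts = [defaultdict(Counter) for _ in range(n_attrs)]
    domains = [set() for _ in range(n_attrs)]
    for row, c in zip(rows, labels):
        for i, v in enumerate(row):
            value_counts[i][c][v] += 1
            domains[i].add(v)
    return class_counts, value_counts, domains, alpha

def predict(model, row):
    """Return arg max_c of log P(c) + sum_i log P(a_i | c)."""
    class_counts, value_counts, domains, alpha = model
    total = sum(class_counts.values())
    best_class, best_score = None, -math.inf
    for c, n_c in class_counts.items():
        score = math.log(n_c / total)
        for i, v in enumerate(row):
            num = value_counts[i][c][v] + alpha
            den = n_c + alpha * len(domains[i])
            score += math.log(num / den)
        if score > best_score:
            best_class, best_score = c, score
    return best_class

# Hypothetical toy data: (outlook, temperature) -> play?
rows = [("sunny", "hot"), ("sunny", "mild"), ("rain", "mild"),
        ("rain", "cool"), ("overcast", "hot"), ("overcast", "cool")]
labels = ["no", "no", "yes", "yes", "yes", "yes"]
model = train_naive_bayes(rows, labels)
print(predict(model, ("sunny", "hot")))
```

Because each attribute contributes its own factor, two strongly correlated attributes contribute the same evidence twice, which is the double counting mentioned above.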

12 Bayesian Networks for Classification
- General BN classifier
  - Treat the class variable as just another variable.
  - Learn a BN.
  - Classify the next instance based on the values of the variables in the Markov blanket of the class variable.
  - Often performs poorly: the prediction depends only on the variables in the Markov boundary, so it does not use all of the available information.

13 Bayesian Networks for Classification
- Tree-Augmented Naïve Bayes (TAN) model
  - Capture dependence among attributes using a tree structure.
  - During learning:
    - First learn a tree among the attributes: use the Chow-Liu algorithm
    - This is a special structure learning problem and is easy
    - Add the class variable and estimate parameters
  - Classification
    - arg max_c P(C=c | A1=a1, …, An=an)
    - BN inference
    - Many other methods
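
As a worked form of the classification rule above, the joint distribution of a TAN model can be written out as follows. This is a sketch under one notational assumption: $\pi(i)$ denotes the attribute parent of $A_i$ in the learned tree, and the root attribute has no attribute parent, so its factor is just $P(A_i \mid C)$.

```latex
P(C, A_1, \dots, A_n) = P(C) \prod_{i=1}^{n} P\bigl(A_i \mid C, A_{\pi(i)}\bigr)

c^{*} = \arg\max_{c} \; P(C = c) \prod_{i=1}^{n} P\bigl(A_i = a_i \mid C = c, A_{\pi(i)} = a_{\pi(i)}\bigr)
```

The second line follows from the first because $P(C=c \mid a_1, \dots, a_n)$ is proportional to the joint probability when all attributes are observed.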

14 Chow-Liu Trees
- Task: Find a tree model over the observed variables that has maximum likelihood given the data.
- Maximized loglikelihood
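
For reference, the standard Chow-Liu decomposition of the maximized loglikelihood is given below. It is stated here as background and may differ in notation from the course's own derivation; $N$ is the number of data cases, $T$ is the set of tree edges, and $\hat{I}$ and $\hat{H}$ are mutual information and entropy computed from the empirical distribution $\hat{P}$.

```latex
\max_{\theta} \log P(D \mid T, \theta)
  = N \sum_{(i,j) \in T} \hat{I}(X_i; X_j) \;-\; N \sum_{i} \hat{H}(X_i)

\hat{I}(X_i; X_j) = \sum_{x_i, x_j} \hat{P}(x_i, x_j)
  \log \frac{\hat{P}(x_i, x_j)}{\hat{P}(x_i)\,\hat{P}(x_j)}
```

Only the first sum depends on the tree structure, so maximizing the likelihood over trees amounts to maximizing the total mutual information along the edges.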

21 Chow-Liu Trees
- Mutual Information
- Task is equivalent to finding the maximum spanning tree of the following weighted and undirected graph:
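
The edge weights of that graph can be computed from data as empirical mutual information. The sketch below assumes discrete variables stored as equal-length lists; the function names `mutual_information` and `pairwise_mi_weights` and the column layout are illustrative choices, not something taken from the slides.

```python
from collections import Counter
from itertools import combinations
import math

def mutual_information(xs, ys):
    """Empirical mutual information (in nats) between two discrete sequences."""
    n = len(xs)
    joint = Counter(zip(xs, ys))
    px = Counter(xs)
    py = Counter(ys)
    mi = 0.0
    for (x, y), n_xy in joint.items():
        # p(x, y) * log( p(x, y) / (p(x) * p(y)) ), written in terms of counts
        mi += (n_xy / n) * math.log(n_xy * n / (px[x] * py[y]))
    return mi

def pairwise_mi_weights(columns):
    """columns: dict mapping variable name -> list of values.

    Returns one (weight, u, v) triple per unordered pair of variables.
    """
    return [(mutual_information(columns[u], columns[v]), u, v)
            for u, v in combinations(columns, 2)]
```

Feeding these triples to a maximum spanning tree routine yields the tree structure.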

22 Maximum Spanning Trees

23 Illustration of Kruskal’s Algorithm
- http://www.cs.cmu.edu/~guestrin/Class/15781/recitations/r10/11152007chowliu.pdf
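
A minimal sketch of Kruskal's algorithm adapted to maximum spanning trees is shown below: edges are visited from heaviest to lightest, and an edge is kept only if it joins two components that are not yet connected. The function name `maximum_spanning_tree`, the `(weight, u, v)` edge format, and the example weights are assumptions made for illustration.

```python
def maximum_spanning_tree(nodes, weighted_edges):
    """Kruskal's algorithm on edges given as (weight, u, v) triples.

    Returns the chosen edges as (u, v, weight), heaviest first.
    """
    parent = {v: v for v in nodes}  # union-find forest

    def find(v):
        # Follow parent pointers to the component root, halving paths on the way.
        while parent[v] != v:
            parent[v] = parent[parent[v]]
            v = parent[v]
        return v

    tree = []
    for w, u, v in sorted(weighted_edges, key=lambda e: e[0], reverse=True):
        root_u, root_v = find(u), find(v)
        if root_u != root_v:          # edge joins two components: keep it
            parent[root_u] = root_v
            tree.append((u, v, w))
    return tree

# Hypothetical weights, e.g. pairwise mutual information values.
edges = [(0.9, "A", "B"), (0.8, "B", "C"), (0.7, "A", "C"),
         (0.3, "C", "D"), (0.1, "A", "D")]
print(maximum_spanning_tree(["A", "B", "C", "D"], edges))
```

In this example the edge A-C is skipped because A and C are already connected through B when it is considered.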

24 L10: Probabilistic Models (PMs) for Classification and Clustering
- Probabilistic Models (PMs) for Classification
- PMs for Clustering

33 A Medical Application
- In medical diagnosis, a gold standard sometimes exists
- Example: Lung Cancer
  - Symptoms:
    - Persistent cough, hemoptysis (coughing up blood), constant chest pain, shortness of breath, fatigue, etc.
  - Information for diagnosis: symptoms, medical history, smoking history, X-ray, sputum.
  - Gold standard:
    - Biopsy: the removal of a small sample of tissue for examination under a microscope by a pathologist

34 A Medical Application
- Sometimes a gold standard does not exist
- Example: Rheumatoid Arthritis (RA)
  - Symptoms: back pain, neck pain, joint pain, joint swelling, morning joint stiffness, etc.
  - Information for diagnosis:
    - Symptoms, medical history, physical exam
    - Lab tests, including a test for rheumatoid factor
    - (Rheumatoid factor is an antibody found in the blood of about 80 percent of adults with RA.)
  - No gold standard:
    - None of the symptoms or their combinations is a clear-cut indicator of RA
    - The presence or absence of rheumatoid factor does not by itself establish whether one has RA.

35 LC Analysis of Hannover Rheumatoid Arthritis Data
- Class specific probabilities
- Cluster 1: “disease” free
- Cluster 2: “back-pain type”
- Cluster 3: “Joint type”
- Cluster 4: “Severe type”
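
Assuming "LC" here stands for latent class, the model behind such an analysis can be sketched as a mixture in which the observed symptoms are independent given a hidden cluster variable. The symbols $Z$ (latent class), $K$ (number of clusters) and $x_j$ (observed symptom values) are notation introduced for this sketch.

```latex
P(x_1, \dots, x_n) = \sum_{k=1}^{K} P(Z = k) \prod_{j=1}^{n} P(x_j \mid Z = k)

P(Z = k \mid x_1, \dots, x_n)
  = \frac{P(Z = k) \prod_{j} P(x_j \mid Z = k)}
         {\sum_{k'=1}^{K} P(Z = k') \prod_{j} P(x_j \mid Z = k')}
```

The factors $P(x_j \mid Z = k)$ are the class specific probabilities listed above, and a patient would typically be assigned to the cluster with the highest posterior $P(Z = k \mid x_1, \dots, x_n)$.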

