
1 Presenter: Russell Greiner

2 Helping the world understand data … and make informed decisions. Single decision: determine the class label of an instance, the set of labels of a set of pixels, …, the value of a property of an instance, …

3 Need to know the "label" of an instance in order to determine the appropriate action: Predictor_Med(patient#2) = ? → "treatX is Ok". Unfortunately, Predictor(·) is not known a priori, but there are many examples of ⟨patient, treatX⟩ pairs.
   Example instance: Temp = 32, Press. = 90, Sore-Throat = N, …, Colour = Pale → Predictor → treatX: Ok

4 Machine learning provides algorithms for mapping a set of examples { ⟨patient, treatX⟩ } to the Predictor(·) function:

   Temp.  Press.  Sore-Throat  …  Colour  | treatX
   35     95      Y            …  Pale    | No
   22     110     N            …  Clear   | Ok
   10     87      N            …  Pale    | No
   :      :       :               :       | :

   Learner: { ⟨patient, treatX⟩ } → Predictor
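A minimal sketch of this Learner → Predictor mapping (the toy feature encoding, the scikit-learn learner, and the numbers are assumptions for illustration, not part of the slide):

```python
# Hedged sketch: learn a Predictor from a few <patient, treatX> examples.
# Any learner could stand in here; a decision tree is just a convenient choice.
from sklearn.tree import DecisionTreeClassifier

# Rows: [Temp., Press., Sore-Throat (1=Y, 0=N), Colour (0=Pale, 1=Clear)]
X = [[35, 95, 1, 0],
     [22, 110, 0, 1],
     [10, 87, 0, 0]]
y = ["No", "Ok", "No"]              # treatX label for each patient

learner = DecisionTreeClassifier(random_state=0)
predictor = learner.fit(X, y)       # Learner: examples -> Predictor(.)

# Predictor applied to a new patient, e.g. <32, 90, N, Pale>
print(predictor.predict([[32, 90, 0, 0]]))
```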

5 Need to learn the predictor (rather than program it in) when the predictor is …
- … not known
- … not expressible
- … changing
- … user dependent
(Same ⟨Temp., Press., Sore-Throat, …, Colour → treatX⟩ training table as above; Learner → Predictor.)

6 PI synergy: Greiner, Schuurmans, Holte, Sutton, Szepesvari, Goebel
- 5 postdocs
- 16 grad students (5 MSc, 11 PhD)
- 5 supporting technical staff
- + personnel for the Bioinformatics thrust

7 4 UofA CS profs, 1 UofAlberta Math/Stat.
Non-UofA collaborators: Google, Yahoo!, Electronic Arts, UofMontreal, UofWaterloo, UofNebraska, NICTA, NRC-IIT, …
+ Bioinformatics thrust collaborators

8 Grants: $225K CFI, $100K MITACS, $100K Google.
Hardware: 68-processor Opteron cluster (2 TB); 54-processor dual-core Opteron cluster (1.5 TB).
+ funds/data for the Bioinformatics thrust

9
- IJCAI 2005 – Distinguished Paper Prize
- UM 2003 – Best Student Paper Prize
- WebIC technology is the foundation for a start-up company
- Significant advances in extending SVMs to use unsupervised/semi-supervised data, and for structured data
- + Highlights from the Bioinformatics thrust

10 Simplifying assumptions re: training data:
- IID / unstructured
- lots of instances
- low dimensions
- complete features
- completely labeled
- balanced data is sufficient
(Same ⟨Temp., Press., Sore-Throat, …, Colour → treatX⟩ training table; Learner → Predictor.)

11 Segmenting Brain Tumors: extensions to Conditional Random Fields, …

12 [Data matrix: rows of numeric feature values with Y/N labels; m ≈ 1000's of instances, N ≈ 10's of features.]

13 Microarray, SNP chips, …: a gene-expression table (g1, g2, g3, …, gN → disease) with m ≈ 100 instances and N ≈ 20,000 features. Dimensionality Reduction … L2 Model: Component Discovery, BiCluster Coding.
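To make the dimensionality-reduction step concrete, here is a generic sketch using PCA on randomly generated data with the slide's shape (m ≈ 100, N ≈ 20,000); PCA is only a stand-in, not the L2 / bicluster methods the slide refers to:

```python
# Hedged illustration: shrink N ~ 20,000 gene features for m ~ 100 samples
# down to a handful of components before any downstream learning.
import numpy as np
from sklearn.decomposition import PCA

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 20000))    # m = 100 instances, N = 20,000 features

pca = PCA(n_components=10)           # 10 components is an arbitrary choice
X_reduced = pca.fit_transform(X)
print(X_reduced.shape)               # (100, 10)
```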

14 Budgeted Learning: the same (g1, …, gN → diseaseX) table, but with progressively fewer observed feature values, down to the diseaseX labels alone; the learner must choose which feature values to purchase.

15 Semi-Supervised Learning / Active Learning: the (g1, …, gN → treatX) table in which only a few of the treatX labels (Y, Y) are observed.

16 Cost Curves (analysis)
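For reference, the quantities usually plotted on a cost curve (this is the standard Drummond–Holte formulation; the notation below is an assumption, not copied from the slide): a classifier with false-negative rate FN and false-positive rate FP appears as a straight line over the probability-cost axis.

```latex
% Cost-curve axes (standard formulation; notation is an assumption):
% x-axis: probability-cost of the positive class, y-axis: normalized expected cost.
PC(+) \;=\; \frac{p(+)\,C(-|+)}{p(+)\,C(-|+) \;+\; p(-)\,C(+|-)}
\qquad
NE\bigl(PC(+)\bigr) \;=\; FN \cdot PC(+) \;+\; FP \cdot \bigl(1 - PC(+)\bigr)
```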

17 Robust SVM, Mixture Using Variance, Large Margin Bayes Net, Coordinated Classifiers, …

18 Beyond the simplifying assumptions (IID / unstructured, lots of instances, low dimensions, complete features, completely labeled, balanced data) and beyond simple learners (Poster #26):
- Structured Prediction: random fields, parsing, unsupervised M3N
- Dimensionality Reduction (L2 Model: Component Discovery)
- Budgeted Learning
- Semi-Supervised Learning: large-margin (SVM), probabilistic (CRF), graph-based transduction
- Active Learning
- Cost Curves
- Robust SVM, Coordinated Classifiers, Mixture Using Variance, Large Margin Bayes Net

19 Technical Details

20 Fully labeled training data: each row gives feature values for Person 1 and Person 2 together with a 0/1 Response (e.g. b, 0, 5, b → 1; b, 1, 3, a → 0; a, 1, 1, a → 0; b, 1, 1, a → 0; a, 0, 3, a → 1). Learner → Response Predictor.

21 Same data, but the Response labels are hidden (?): the user is able to PURCHASE labels, at some cost … for which instances? Learner → Response Predictor.

22 Now the Response labels are known (1, 0, 0, 0, 1) but the feature values are hidden (?); a few rows show already-purchased values. The user is able to PURCHASE values of features, at some cost … but which features, for which instances? Learner → Response Predictor.

23 Again the user is able to PURCHASE values of features, at some cost … but which features, for which instances? This is significantly different from ACTIVE learning: there are correlations between feature values.

24 [Plot: number of features purchased, for 10 tests at $1/test, budget = $40, Beta(10,1) prior.]

25 Budgeted learning results:
- Defined the framework: the ability to purchase individual feature values under a fixed LEARNING / CLASSIFICATION budget
- Theoretical results: NP-hard in general; standard algorithms do not even give approximation guarantees!
- Empirical results: avoid Round Robin; try clever algorithms such as Biased Robin and Randomized Single Feature Lookahead (a rough sketch of such a purchasing loop appears below)
[Lizotte, Madani, Greiner: UAI'03], [Madani, Lizotte, Greiner: UAI'04], [Kapoor, Greiner: ECML'05]
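A rough, hypothetical sketch of the kind of purchasing loop a Biased Robin policy follows (keep buying the same feature while it seems to help, otherwise move on); the stub helps() criterion, the uniform cost, and all names here are assumptions, not the papers' actual procedures:

```python
# Hedged sketch of a Biased-Robin style purchasing loop for budgeted learning.
import random

def biased_robin(features, budget, cost=1.0, helps=lambda f: random.random() < 0.5):
    """Spend `budget` buying individual feature values, one at a time.
    `helps(f)` is a stand-in for "did the last purchase of feature f improve
    the learner?"; a real policy would re-train / re-score the model here."""
    purchases, spent, i = [], 0.0, 0
    while spent + cost <= budget:
        f = features[i % len(features)]
        purchases.append(f)          # buy one value of feature f
        spent += cost
        if not helps(f):             # stay on f while it helps;
            i += 1                   # otherwise advance to the next feature
    return purchases

print(biased_robin(features=[f"X{j}" for j in range(10)], budget=40.0))
```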

26 The same purchasing problem at CLASSIFICATION time: for a new instance most feature values are unknown (?), and the Classifier must decide which values to purchase before predicting the Response. Learner → Classifier.

27
- Sample complexity of Budgeted Learning: how many (I_j, X_i) "probes" are required to PAC-learn?
- Develop policies with guarantees on learning performance
- More complex cost models … bundling tests, …
- Allow the learner to perform more powerful probes, e.g. purchase X_3 in an instance where X_7 = 0 and Y = 1
- More complex classifiers?

28 Learning a generative model when every value is unobserved (?). Goal: find Θ* = argmax_Θ P_Θ(D).
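Spelled out, the goal on this slide is maximum likelihood with unobserved values; marginalizing over the hidden entries gives the form below (the decomposition into observed/hidden parts is an added gloss, not taken from the slide):

```latex
% Maximum-likelihood objective when feature values are unobserved:
\Theta^{*} \;=\; \arg\max_{\Theta} P_{\Theta}(D)
\;=\; \arg\max_{\Theta} \prod_{i} \sum_{\mathbf{x}^{(i)}_{\mathrm{hid}}}
      P_{\Theta}\!\bigl(\mathbf{x}^{(i)}_{\mathrm{obs}},\, \mathbf{x}^{(i)}_{\mathrm{hid}}\bigr)
```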

29
- Structured Prediction (ongoing)
- Dimensionality Reduction (ongoing; RoBiC: Poster #8)
- Budgeted Learning (ongoing)
- Semi-Supervised Learning (ongoing)
- Active Learning (ongoing)
- Cost Curves (complete; Poster #26)
[Diagram: labeled training matrix M_Train and test matrix M_Test; find biclusters, use bicluster membership as features for the Learner / Classifier.]


31 Technical Details

32 Suppose there are many different classifiers (C1, C2, C3, C4) … For each instance, we want each classifier to "know what it knows" … and to shout LOUDEST when it knows best … "Loudness" ∝ 1/Variance! [Diagram: four 2-D plots of +/o instances, one per classifier C1–C4.]
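A toy rendering of the "shout loudest when it knows best" idea, combining per-classifier probability estimates with inverse-variance weights; this is an illustration of the slogan, not the exact MUV combination rule from the paper:

```python
# Hedged sketch: each classifier reports (probability, variance) for an instance;
# low-variance ("confident") classifiers dominate the weighted combination.
def mixture_using_variance(estimates):
    """estimates: list of (probability, variance) pairs, one per classifier."""
    weights = [1.0 / max(var, 1e-12) for _, var in estimates]   # loudness ~ 1/variance
    total = sum(weights)
    return sum(w * p for w, (p, _) in zip(weights, estimates)) / total

# C3 is very confident here (variance 0.01), so it dominates the prediction.
print(mixture_using_variance([(0.6, 0.20), (0.4, 0.25), (0.9, 0.01), (0.5, 0.30)]))
```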

33 Given a belief-net classifier with a fixed (correct) structure and parameters Θ estimated from a (random) data sample, the response to a query such as "P(+c | -e, +w)" is … asymptotically normal, with an (asymptotic) variance that is easy to compute … for simple structures (Naïve Bayes, TAN) … and for complete queries.

34 MUV significantly outperforms AdaBoost, even when using base-classifiers that AdaBoost generated! MUV(kNB, AdaBoost, js) is better than AdaBoost[NB] with p < 0.023.

35 Sound statistical foundation; a very effective classifier … across many real datasets. MUV(NB) better than AdaBoost(NB)! C. Lee, S. Wang and R. Greiner; ICML'06.

36
- Other structures (beyond NB, TAN)
- Beyond just tabular CP-tables for discrete variables: noisy-or, Gaussians
- Learn different base-classifiers from different subsets of features
- Scaling up to many, MANY features: overfitting characteristics?

37 Confidence of prediction? Fit each ⟨μ_j, σ_j²⟩ to a Beta(a_j, b_j); compute the area CDF_Beta(a_j, b_j)(0.5).
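Read literally, the confidence computation reduces to evaluating a Beta CDF at 0.5; a one-line sketch (SciPy and the particular Beta parameters are assumptions; the fitting of ⟨μ_j, σ_j²⟩ to (a_j, b_j) is omitted):

```python
# Hedged sketch: confidence of a prediction as the Beta CDF evaluated at 0.5,
# i.e. the estimated probability mass below 1/2 for the fitted posterior.
from scipy.stats import beta

a_j, b_j = 8.0, 3.0                       # hypothetical fitted Beta parameters
confidence_area = beta.cdf(0.5, a_j, b_j)
print(confidence_area)
```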

38 Two kinds of training data: a small LABELED table (Temp., BP, Sore Throat, …, Colour → diseaseX; e.g. 35, 95, Y, …, Pale → No; 22, 110, N, …, Clear → Yes) and a larger UNLABELED table with the same features but no diseaseX values. Learner → Classifier; Classifier(⟨Temp = 32, Press. = 90, Sore-Throat = N, …, Colour = Pale⟩) = diseaseX: No.

39 What to do with the unlabeled data?
- Ignore it: great if we have LOTS of labeled data
- Use the unlabeled data as is … "Semi-Supervised Learning" … based on large margin (SVM), graph, or probabilistic models
- Pay to get labels for SOME of the unlabeled data: "Active Learning"

40 Approach: find a labeling that would yield an optimal SVM classifier on the resulting training data. This is hard, but semi-definite relaxations can approximate the objective surprisingly well; the training procedures are computationally intensive, but produce high-quality generalization results.
L. Xu, J. Neufeld, B. Larson, D. Schuurmans. Maximum margin clustering. NIPS-04.
L. Xu and D. Schuurmans. Unsupervised and semi-supervised multi-class SVMs. AAAI-05.
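"Find a labeling that would yield an optimal SVM classifier" can be written as an outer search over labelings of the usual soft-margin SVM problem, with a class-balance constraint to rule out trivial labelings (this is the standard maximum-margin-clustering objective; the notation is an assumption):

```latex
% Maximum-margin clustering / unsupervised SVM objective (notation assumed):
\min_{\mathbf{y} \in \{-1,+1\}^{n}} \;\; \min_{\mathbf{w},\, b,\, \boldsymbol{\xi} \ge 0} \;\;
\tfrac{1}{2}\lVert \mathbf{w} \rVert^{2} + C \sum_{i=1}^{n} \xi_{i}
\quad \text{s.t.} \quad
y_{i}\bigl(\mathbf{w}^{\top}\phi(\mathbf{x}_{i}) + b\bigr) \ge 1 - \xi_{i},
\qquad
-\ell \;\le\; \sum_{i=1}^{n} y_{i} \;\le\; \ell
```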

41 Probabilistic model: P(y|x). Context: non-IID data (language modelling; segmenting brain tumours from MR images). Use unlabeled data as a regularizer. Future: other applications …
C-H. Lee, S. Wang, F. Jiao, D. Schuurmans and R. Greiner. Learning to Model Spatial Dependency: Semi-Supervised Discriminative Random Fields. NIPS'06.
F. Jiao, S. Wang, C. Lee, R. Greiner, and D. Schuurmans. Semi-supervised conditional random fields for improved sequence segmentation and labeling. COLING/ACL'06.
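"Use unlabeled data as a regularizer" is typically made concrete with an entropy term over the unlabeled set, roughly as below; this is the generic entropy-regularized objective, not a verbatim statement of the cited papers' formulations:

```latex
% Semi-supervised CRF-style objective (generic form; gamma trades off the
% labeled-data log-likelihood against an entropy regularizer on unlabeled data):
\max_{\theta} \;\; \sum_{(x,\,y) \in \mathcal{L}} \log P_{\theta}(y \mid x)
\;-\; \gamma \sum_{x \in \mathcal{U}} H\bigl(P_{\theta}(\,\cdot \mid x)\bigr)
```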

42 Pay for the label of the query x_i that maximizes the conditional mutual information about the unlabeled data. How to determine y_i? Take the EXPECTATION w.r.t. Y_i? Use an OPTIMISTIC guess w.r.t. Y_i?
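Written out, the selection rule amounts to querying the instance whose label would be most informative about the rest of the unlabeled pool (a generic form; the exact criterion in the IJCAI'07 paper may differ in detail):

```latex
% Query the unlabeled x_i whose label carries the most conditional mutual
% information about the labels of the remaining unlabeled instances:
x^{*} \;=\; \arg\max_{x_{i} \in \mathcal{U}} \;
I\bigl(Y_{i};\; Y_{\mathcal{U} \setminus \{i\}} \;\big|\; \mathcal{L}\bigr)
```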

43 Need optimism; need "on-line adjustment"; better than just MostUncertain, … [Result plots on the pima and breast datasets.]
Y. Guo and R. Greiner. Optimistic active learning using mutual information. IJCAI'07.

44 Understand WHY "optimism" works … + other applications of optimism. Extend the framework to deal with non-IID data, with labelers of different qualities, …

