
1 Learning Ensembles of First-Order Clauses for Recall-Precision Curves
Preliminary Thesis Proposal
Mark Goadrich, Department of Computer Sciences, University of Wisconsin – Madison, USA
17 Dec 2004

2 Talk Outline
- Background: Inductive Logic Programming, Evaluation Metrics, Biomedical Information Extraction
- Preliminary Work: Three Ensemble Approaches, Empirical Results
- Proposed Work: Extensions to Algorithms, Theoretical Results

3 Inductive Logic Programming
Machine Learning:
- Classify data into positive and negative categories
- Divide data into train and test sets
- Generate hypotheses on the train set, then measure performance on the test set
In ILP, data are:
- Objects: person, block, molecule, word, phrase, ...
- Relations between them: grandfather, has_bond, is_member, ...

4 Learning daughter(A,B)
Positive Examples: daughter(mary, ann), daughter(eve, tom)
Negative Examples: daughter(tom, ann), daughter(eve, ann), daughter(ian, tom), daughter(ian, ann), ...
Background Knowledge: mother(ann, mary), mother(ann, tom), father(tom, eve), father(tom, ian), female(ann), female(mary), female(eve), male(tom), male(ian)
[Figure: family tree; Ann is the mother of Mary and Tom, and Tom is the father of Eve and Ian]
Possible Clauses:
daughter(A,B) :- true.
daughter(A,B) :- female(A).
daughter(A,B) :- female(A), male(B).
daughter(A,B) :- female(A), father(B,A).
daughter(A,B) :- female(A), mother(B,A).
...
The correct theory combines the last two clauses: A is a daughter of B when A is female and B is A's mother or father. A runnable check follows.
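To make the coverage test concrete, here is a minimal Python sketch of this example (the set-based encoding is a hypothetical illustration; the real system reasons in Prolog):

```python
# Background knowledge as ground facts (from the slide).
mother = {("ann", "mary"), ("ann", "tom")}
father = {("tom", "eve"), ("tom", "ian")}
female = {"ann", "mary", "eve"}

positives = {("mary", "ann"), ("eve", "tom")}
negatives = {("tom", "ann"), ("eve", "ann"), ("ian", "tom"), ("ian", "ann")}

# The correct theory: daughter(A,B) :- female(A), mother(B,A).
#                     daughter(A,B) :- female(A), father(B,A).
def daughter(a, b):
    return a in female and ((b, a) in mother or (b, a) in father)

assert all(daughter(a, b) for a, b in positives)      # covers both positives
assert not any(daughter(a, b) for a, b in negatives)  # covers no negatives
```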

5 ILP Domains
- Object Learning: Trains, Carcinogenesis
- Link Learning: binary predicates

6 Link Learning
- Large skew toward negatives: with 500 relational objects, 5,000 positive links leave 245,000 negative links
- Enormous quantity of data: 4,285,199,774 web pages indexed by Google; PubMed includes over 15 million citations
- Difficult to measure success: an always-negative classifier is 98% accurate (245,000 of 250,000), and ROC curves look overly optimistic

7 Evaluation Metrics
Classification vs Correctness: each prediction is Positive or Negative (the classification) and True or False (its correctness), giving the confusion-matrix counts TP, FP, FN, TN.
Recall = TP / (TP + FN)
Precision = TP / (TP + FP)
True Positive Rate = TP / (TP + FN)
False Positive Rate = FP / (FP + TN)
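These definitions translate directly into code; a small sketch, reusing the skew numbers from slide 6 to show why accuracy misleads:

```python
def recall(tp, fn):
    """TP / (TP + FN); identical to the true positive rate."""
    return tp / (tp + fn)

def precision(tp, fp):
    return tp / (tp + fp)

def false_positive_rate(fp, tn):
    return fp / (fp + tn)

# Slide 6's skew: an always-negative classifier on 5,000 positives and
# 245,000 negatives is 98% accurate yet has recall 0.
tn, fn = 245_000, 5_000
print(tn / (tn + fn))   # 0.98 accuracy
print(recall(0, fn))    # 0.0 recall
```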

8 Evaluation Metrics
Area Under Recall-Precision Curve (AURPC):
- Cumulative measure over recall-precision space
- All curves standardized to cover the full recall range
- Average AURPC over 5 folds
[Figure: recall-precision space, with both axes running from 0 to 1.0]

9 AURPC Interpolation
Is convex interpolation valid in RP space? Precision interpolation is counterintuitive.
Example: 1,000 positives and 9,000 negatives.

TP   | FP   | TP Rate | FP Rate | Recall | Prec
500  | 500  | 0.50    | 0.06    | 0.50   | 0.50
750  | 4750 | 0.75    | 0.53    | 0.75   | 0.14
1000 | 9000 | 1.00    | 1.00    | 1.00   | 0.10

[Figures: example counts, RP curves, and ROC curves for this example]
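One way to see why a straight line fails in RP space: interpolate the underlying TP/FP counts between two operating points (as is natural for ROC) and recompute precision at each intermediate point. A sketch under that assumption, reproducing the table's midpoint:

```python
def interpolate_rp(tp_a, fp_a, tp_b, fp_b, total_pos, steps=2):
    """Step linearly through TP/FP counts between two operating points
    and yield the resulting (recall, precision) pairs."""
    for i in range(steps + 1):
        t = i / steps
        tp = tp_a + t * (tp_b - tp_a)
        fp = fp_a + t * (fp_b - fp_a)
        yield tp / total_pos, tp / (tp + fp)

# Slide's example: 1,000 positives, 9,000 negatives.
for r, p in interpolate_rp(500, 500, 1000, 9000, total_pos=1000):
    print(f"recall={r:.2f} precision={p:.2f}")
# recall=0.50 precision=0.50
# recall=0.75 precision=0.14
# recall=1.00 precision=0.10
```

The midpoint's precision (0.14) sits far below the 0.30 that a straight line between the endpoints would suggest, which is why convex interpolation in RP space overstates the area.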

10 AURPC Interpolation

11 Biomedical Information Extraction
[Image courtesy of the National Human Genome Research Institute]

12 Biomedical Information Extraction
Given: medical journal abstracts tagged with protein-localization relations
Do: construct a system to extract protein-localization phrases from unseen text
Example: "NPL3 encodes a nuclear protein with an RNA recognition motif and similarities to a family of proteins involved in RNA metabolism."

13 Biomedical Information Extraction
Hand-labeled dataset (Ray & Craven '01):
- 7,245 sentences from 871 abstracts
- Examples are phrase-phrase combinations: 1,810 positive and 279,154 negative
- 1.6 GB of background knowledge (structural, statistical, lexical, and ontological), with 200+ distinct background predicates in total

14 Biomedical Information Extraction
[Figure: parse of "NPL3 encodes a nuclear protein with ...", annotated with part-of-speech tags (verb, noun, article, adjective, preposition), phrase structure (sentence, verb phrase, noun phrases, prepositional phrase), and features such as alphanumeric and marked location]

15 Related Work
- Bagging in ILP (Dutra et al.)
- Boosting FOIL (Quinlan)
- Boosting ILP (Hoche)
- Structural HMM (Ray and Craven)
- WAWA-IE (Eliassi-Rad and Shavlik)
- Markov Logic Networks (Richardson and Domingos)
- ELCS (Bunescu et al.)

16 Talk Outline
- Background: Inductive Logic Programming, Evaluation Metrics, Biomedical Information Extraction
- Preliminary Work: Three Ensemble Approaches, Empirical Results
- Proposed Work: Extensions to Algorithms, Theoretical Results

17 Aleph - Background
- Seed Example: a positive example that our clause must cover
- Bottom Clause: all predicates that are true about the seed example
[Figure: clause lattice anchored at the seed example]

18 Aleph - Learning
Aleph learns theories of clauses (Srinivasan, v4, 2003):
- Pick a positive seed example and find its bottom clause
- Use heuristic search to find the best clause
- Pick a new seed from the uncovered positives and repeat until a threshold of positives is covered
A theory produces only one recall-precision point, and learning complete theories is time-consuming; ensembles, however, can produce a ranking. A sketch of the covering loop follows.
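A high-level Python sketch of this covering loop (the helpers pick_seed, build_bottom_clause, and heuristic_search are hypothetical stand-ins; Aleph itself is a Prolog system):

```python
def learn_theory(positives, coverage_threshold,
                 pick_seed, build_bottom_clause, heuristic_search):
    """Covering loop: repeatedly learn one clause from a seed, then drop
    the positives that clause covers, until enough are covered."""
    theory = []
    uncovered = set(positives)
    while len(uncovered) > (1 - coverage_threshold) * len(positives):
        seed = pick_seed(uncovered)          # a positive seed example
        bottom = build_bottom_clause(seed)   # all true predicates about seed
        clause = heuristic_search(bottom)    # best clause found in the lattice
        theory.append(clause)
        uncovered = {p for p in uncovered if not clause.covers(p)}
    return theory
```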

19 ILP Ensembles
Three approaches:
- Aleph ensembles of multiple theories
- Clause weighting of one theory
- Gleaner
Evaluation:
- Area Under Recall-Precision Curve (AURPC)
- Time = number of clauses considered

20 Aleph Ensembles
We construct ensembles of theories. Algorithm (Dutra et al., ILP 2002):
- Use K different initial seeds
- Learn K theories, each containing C clauses
- Rank examples by the number of theories that cover them (sketched below)
C must be balanced for good performance: small C leads to low recall, while large C leads to converging theories.
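A sketch of the ranking step (assuming hypothetical clause objects with a covers method): an example's score is the number of theories with at least one matching clause, and sweeping a vote threshold from K down to 1 then traces out a recall-precision curve.

```python
def rank_by_votes(examples, theories):
    """Rank examples by how many of the K theories classify them positive."""
    def votes(example):
        return sum(any(clause.covers(example) for clause in theory)
                   for theory in theories)
    return sorted(examples, key=votes, reverse=True)
```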

21 Aleph Ensembles (100 theories)

22 Clause Weighting
Single-theory ensemble: rank examples by how many clauses cover them, weighting clauses with tuning-set statistics:
- Ordered: rank by precision or by lowest false positive rate
- Average: over all matching clauses
- Cumulative: precision, diversity on negatives, F1 score, recall
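One plausible reading of the cumulative-precision scheme, as a sketch (the thesis's exact weighting functions may differ; clause_precision is a hypothetical map from clause to tuning-set precision):

```python
def cumulative_precision_score(example, theory, clause_precision):
    """Score an example by summing tuning-set precision over the clauses
    that cover it; 'average' weighting would divide by len(matching)."""
    matching = [c for c in theory if c.covers(example)]
    return sum(clause_precision[c] for c in matching)
```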

23 Clause Weighting

24 Gleaner
Goal: develop fast ensemble algorithms focused on recall-precision evaluation.
Definition of a gleaner: one who gathers grain left behind by reapers.
Key ideas of Gleaner:
- Keep a wide range of clauses
- Create separate theories for different recall ranges

25 Gleaner - Background
Rapid Random Restart (Zelezny et al., ILP 2002):
- Stochastic selection of an initial clause
- Time-limited local heuristic search
- Randomly choose a new initial clause and repeat
[Figure: clause lattice showing the seed and a randomly chosen initial clause]

26 Gleaner - Learning
- Create B bins over the recall range
- Generate clauses, recording the best clause per bin
- Repeat for K seeds
[Figure: recall-precision space divided into recall bins]
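A sketch of the per-bin bookkeeping (generate_clauses, clause.recall, and the scoring heuristic are hypothetical stand-ins for Gleaner's internals):

```python
B = 20  # recall bins, matching the experiments on slide 29

def gleaner_learn(seeds, generate_clauses, heuristic):
    """For each seed, keep the best clause seen in each recall bin."""
    best = {}  # (seed, bin index) -> best clause so far
    for seed in seeds:
        for clause in generate_clauses(seed):  # rapid-random-restart search
            b = min(int(clause.recall * B), B - 1)
            if (seed, b) not in best or heuristic(clause) > heuristic(best[seed, b]):
                best[seed, b] = clause
    return best  # K clauses per bin, one per seed
```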

27 Gleaner - Combining
Combine the K clauses in each bin: if at least L of the K clauses match an example, call it positive. How to choose L?
- L = 1 gives high recall, low precision
- L = K gives low recall, high precision
Our method: choose L such that the ensemble's recall matches bin b; bin b's precision should then be higher than that of any clause within it (sketched below). The result is a set of high-precision rule sets spanning the space of recall levels.
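A sketch of choosing L on a tuning set (hypothetical helper; the point is that L is picked so the bin's ensemble recall lands on its target):

```python
def choose_L(bin_clauses, tune_positives, target_recall):
    """Pick the vote threshold L in 1..K whose 'at least L of K clauses
    match' recall on the tuning positives is closest to the bin's target."""
    K = len(bin_clauses)

    def recall_at(L):
        hits = sum(1 for x in tune_positives
                   if sum(c.covers(x) for c in bin_clauses) >= L)
        return hits / len(tune_positives)

    return min(range(1, K + 1),
               key=lambda L: abs(recall_at(L) - target_recall))
```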

28 How to use Gleaner
- Generate the recall-precision curve
- The user selects a recall bin
- Return classifications with a precision confidence, e.g. Recall = 0.50, Precision = 0.70
[Figure: recall-precision curve with the selected bin highlighted]

29 Experimental Methodology
Five-fold cross-validation, varying the parameters:
- Gleaner (20 recall bins): seeds = {25, 50, 75, 100}; clauses = {1K, 10K, 25K, 50K, 100K, 250K, 500K}
- Aleph ensembles (0.75 minacc, 35,000 nodes): theories = {10, 25, 50, 75, 100}; clauses per theory = {1, 5, 10, 15, 20, 25, 50}
- Clause weighting (1 Aleph theory): clauses = {25, 50, 100, 271}

30 Empirical Results

31 Results: Testfold 5 at 1,000,000 clauses
[Figure: recall-precision curves comparing the Aleph ensembles and Gleaner]

32 Results: Testfold 5 at 1,000,000 clauses

33 Conclusions
- Gleaner focuses on recall and precision and keeps a wide spectrum of clauses
- Aleph ensembles: 'early stopping' is helpful
- Clause weighting: cumulative statistics are important
- AURPC is a useful metric for comparison, but its interpolation is unintuitive

34 Talk Outline
- Background: Inductive Logic Programming, Evaluation Metrics, Biomedical Information Extraction
- Preliminary Work: Three Ensemble Approaches, Empirical Results
- Proposed Work: Extensions to Algorithms, Theoretical Results

35 Proposed Work
- Improve Gleaner in high-recall areas: needs more emphasis on diverse clauses
- Search for clauses that optimize AURPC, using RankBoost with an AURPC heuristic
- Examine more ILP link-learning datasets, with a focus on information extraction
- Develop a better understanding of AURPC: its relationship with ROC curves and the F1 score

36 Gleaner - Precision Bins
- Create B bins
- Generate clauses, recording the best clause per bin
- Repeat for K seeds
[Figure: recall-precision space divided into precision bins]

37 Gleaner - Save Per Jump
- Rapid Random Restart makes jumps: every 1,000 clauses, it moves to a new part of the search space
- Saving the best clause per jump will increase diversity
[Figure: clause lattice showing the seed and successive initial clauses]

38 Gleaner - Negative Seeds
- High-recall clauses are found at the top of the lattice
- Perform breadth-first search
- Bias the search away from negative examples
[Figure: clause lattice with the seed example]

39 ROC vs. RP Curves

40 ROC vs RP Curves
What is the relationship between ROC curves and RP curves? Will optimizing one optimize the other?

41 Optimizing AURPC WARNING! SLIDE INCOMPLETE!

42 Acknowledgements
- USA NLM Grant 5T15LM007359-02
- USA NLM Grant 1R01LM07050-01
- USA DARPA Grant F30602-01-2-0571
- USA Air Force Grant F30602-01-2-0571
- Condor Group
- David Page
- Vitor Santos Costa, Ines Dutra
- Soumya Ray, Marios Skounakis, Mark Craven

