
1 Knowledge-Based Discovery: Using Semantics in Machine Learning
Bruce Buchanan, Joe Phillips
University of Pittsburgh (cs.pitt.edu)

2 Intelligent Systems Laboratory
Faculty: Bruce Buchanan (P.I.), John Aronis
Collaborators: John Rosenberg (Biol. Sci.), Greg Cooper (Medicine), Bob Ferrell (Genetics), Janyce Wiebe (CS), Lou Penrod (Rehab. Med.), Rich Simpson (Rehab. Sci.), Russ Altman (Stanford MIS)
Research Associates: Joe Phillips, Paul Hodor, Vanathi Gopalakrishnan, Wendy Chapman
Ph.D. Students: Gary Livingston, Dan Hennessy, Venkat Kolluri, Will Bridewell, Lili Ma
M.S. Students: Karl Gossett

3 GOALS
(A) Learn understandable & interesting rules from data.
(B) Construct an understandable & coherent model from the rules.
METHOD: Use background knowledge to search for:
– simple rules with familiar predicates
– interesting and novel rules
– coherent models

4 Rules or Models: Understandable | Interesting
Understandable:
– familiar syntax (conditional rules)
– syntactically simple
– semantically simple
– familiar predicates
Interesting:
– accurate predictions
– meaningful rules
– relevant to the question
– novel
– cost-effective
– coherent model

5 The RL Program
[Architecture diagram: explicit bias and training examples feed the RL learner, which produces RULES; HAMB combines the rules with a partial domain model into a MODEL; a performance program applies the model to new cases and outputs predictions.]
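To make the pipeline's shape concrete, here is a minimal Python sketch: a learner that takes training examples plus an explicit bias and emits rules, and a performance program that applies those rules to new cases. All names and the toy data are hypothetical; the real RL system is far richer (search over conjunctions, rule evaluation, HAMB model assembly).

```python
def learn_rules(examples, bias):
    """Enumerate one-term rules over the attributes allowed by the
    explicit bias; keep those that are always correct on the data."""
    rules = []
    for attr in bias["allowed_attributes"]:
        for value in {ex[attr] for ex in examples}:
            covered = [ex for ex in examples if ex[attr] == value]
            classes = {ex["class"] for ex in covered}
            if len(classes) == 1:                 # rule is never wrong here
                rules.append(({attr: value}, classes.pop()))
    return rules

def predict(rules, case):
    """Performance program: class of the first rule whose conditions match."""
    for conditions, cls in rules:
        if all(case.get(a) == v for a, v in conditions.items()):
            return cls
    return None

examples = [
    {"charge": "high", "oxygen": "many", "class": "SITE"},
    {"charge": "low",  "oxygen": "few",  "class": "NON-SITE"},
]
bias = {"allowed_attributes": ["charge", "oxygen"]}
print(predict(learn_rules(examples, bias),
              {"charge": "high", "oxygen": "many"}))  # -> SITE
```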

6 (A) Individual Rules
J. Phillips, Rehabilitation Medicine Data

7 Simple single rules
Syntactic simplicity: fewer terms on the LHS
– explicitly stated constraints (rules with no more than N terms)
– tagged attributes (e.g., a rule must have at least one control attribute to be interesting)
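Both constraints translate into a simple bias check; a sketch follows, where the control-attribute tags are assumptions modeled on the rehab example later in the deck.

```python
MAX_TERMS = 2                                    # "no more than N terms"
CONTROL_ATTRS = {"admit", "general_condition"}   # assumed control tags

def passes_bias(rule_lhs):
    """rule_lhs: dict mapping attribute -> test, e.g. {'age': '> 65'}."""
    if len(rule_lhs) > MAX_TERMS:
        return False                             # too many LHS terms
    return bool(CONTROL_ATTRS & rule_lhs.keys()) # needs >= 1 control attr

print(passes_bias({"admit": "= yes"}))              # True
print(passes_bias({"age": "> 65", "race": "= X"}))  # False: no control attr
print(passes_bias({"admit": "= yes", "age": "> 65", "sex": "= F"}))  # False: 3 terms
```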

8 Simple sets of rules
Syntactic simplicity: fewer, independent rules
– e.g., in physics: U(x) = U_gravity(x) + U_electronic(x) + U_magnetic(x)
– HAMB removes highly similar terms from the feature set (sketched below)
– independence breaks down when there is feedback, e.g., in medicine
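The deck does not spell out HAMB's similarity test, so the following is only a plausible sketch of the idea: drop one of every pair of features whose values agree on nearly all training examples.

```python
def prune_similar(examples, features, threshold=0.95):
    """Keep a feature only if it does not mimic an already-kept one."""
    kept = []
    for f in features:
        redundant = False
        for g in kept:
            agree = sum(ex[f] == ex[g] for ex in examples) / len(examples)
            if agree >= threshold:
                redundant = True      # f duplicates a kept feature
                break
        if not redundant:
            kept.append(f)
    return kept

data = [{"a": 1, "b": 1, "c": 0}, {"a": 0, "b": 0, "c": 0},
        {"a": 1, "b": 1, "c": 1}, {"a": 0, "b": 0, "c": 1}]
print(prune_similar(data, ["a", "b", "c"]))  # ['a', 'c'] ('b' duplicates 'a')
```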

9 Interestingness
Given, controlled and observed attributes
– explicitly state observed attributes as the interesting target
Temporal
– future (or distant-past) predictions are interesting
Influence diagram (e.g., Bayes net)
– strong but more indirect influences are interesting

10 Using typed-attribute background knowledge
Organize terms into given, controlled and observed
– e.g., in a medical domain: demographics, intervention and outcome
Benefits:
– categorization of rules by whether they use givens (default), controls (controllable) or both (conditionally controllable), as sketched below
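The categorization in the last bullet is easy to state directly in code; the attribute type tags below are assumptions drawn from the rehab example on the next slide.

```python
TYPE = {"age": "given", "race": "given", "sex": "given",
        "admit": "controlled", "general_condition": "controlled"}

def categorize(rule_lhs):
    """Classify a rule by the types of attributes on its LHS."""
    types = {TYPE[a] for a in rule_lhs}
    if types == {"given"}:
        return "default"                    # uses givens only
    if types == {"controlled"}:
        return "controllable"               # uses controls only
    return "conditionally controllable"     # mixes both

print(categorize({"age": "> 65"}))                      # default
print(categorize({"admit": "= yes"}))                   # controllable
print(categorize({"age": "> 65", "admit": "= yes"}))    # conditionally controllable
```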

11 Typed attribute example
Rehab. (RL; Phillips, Buchanan, Penrod), > 2000 records
– given: demographic (age, race, sex)
– controlled: medical (admit, general_condition, specific_condition)
– observed: temporal (time, rate), medical (absolute, normalize)

12 Example interestingness
Group rules by whether they predict by medical, demographic or both:
– by medical: Left_Body_Stroke => poor improvement (interesting, expected)
– by demographic: High_age => poor improvement (interesting, expected)
– by demographic: (Race=X) => poor improvement (interesting, NOT expected)

13 Using temporal background knowledge
Organize data by time
– utility may or may not extend to other metric spaces (e.g., space, mass)
Benefits:
– predictions parameterized by time: f(t)
– the future or distant past may be interesting
– cyclical patterns

14 Temporal example
Geophysics (Scienceomatic; Phillips 2000)
– subduction-zone discoveries of the form: d(q_after) = d(q_main) + m * [t(q_after) - t(q_main)] + b
– NOTE: this is not an accurate prediction!
– interesting, since quakes generally can't be predicted
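As a worked illustration of fitting the pattern above, here is the least-squares arithmetic for the slope m and intercept b; the aftershock depth/time differences are fabricated purely to show the computation.

```python
dt = [1.0, 2.0, 3.0, 4.0]    # t(q_after) - t(q_main), invented values
dd = [2.1, 3.9, 6.2, 7.8]    # d(q_after) - d(q_main), invented values

n = len(dt)
mean_t = sum(dt) / n
mean_d = sum(dd) / n
# Ordinary least squares for a line: m = cov(t, d) / var(t), b = intercept.
m = sum((t - mean_t) * (d - mean_d) for t, d in zip(dt, dd)) / \
    sum((t - mean_t) ** 2 for t in dt)
b = mean_d - m * mean_t
print(f"m = {m:.2f}, b = {b:.2f}")   # m = 1.94, b = 0.15
```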

15 Using influence-diagram background knowledge
This is future work!
Organize terms to follow a pre-existing influence diagram
– e.g., Bayesian nets, but conditional probabilities are not needed
Benefits:
– suggest hidden variables and new influences: f(x) => f(x,y)
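Since this slide is explicitly future work, the following is only a speculative sketch of the idea: flag attribute pairs that co-vary strongly in the data but have no edge in the given influence diagram, as candidates for new influences f(x) => f(x, y). The diagram and data are invented.

```python
edges = {("smoking", "cancer")}     # assumed pre-existing influence diagram

data = [{"smoking": 1, "cancer": 1, "coffee": 1},
        {"smoking": 0, "cancer": 0, "coffee": 0},
        {"smoking": 1, "cancer": 1, "coffee": 1},
        {"smoking": 0, "cancer": 0, "coffee": 0}]

def agreement(a, b):
    """Fraction of examples on which attributes a and b take equal values."""
    return sum(ex[a] == ex[b] for ex in data) / len(data)

attrs = ["smoking", "cancer", "coffee"]
for i, a in enumerate(attrs):
    for b in attrs[i + 1:]:
        if (a, b) not in edges and (b, a) not in edges and agreement(a, b) > 0.9:
            print(f"candidate new influence: {a} -- {b}")
# prints: smoking -- coffee, cancer -- coffee
```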

16 Interestingness summary
How different types of background knowledge help us achieve interestingness:
– explicitly stated: observed attributes
– implicitly stated: parameterized equations with interesting parameters
– learned: new influence factors

17 (B) Coherent Models
B. Buchanan, Protein Data

18 EXAMPLE: Predicting Ca++ Binding Sites (G. Livingston)
Given: 3-D descriptions of 16 sites in proteins that bind calcium ions & 100 other sites that do not.
Find: a model that allows predicting whether a proposed new site will bind Ca++ [in terms of a subset of 63 attributes].

19 Ca++ binding sites in proteins
SOME ATTRIBUTES:
ATOM-NAME-IS-C, ATOM-NAME-IS-O, CHARGE, CHARGE-WITH-HIS, HYDROPHOBICITY, MOBILITY, RESIDUE-CLASS1-IS-CHARGED, RESIDUE-CLASS1-IS-HYDROPHOBIC, RESIDUE-CLASS2-IS-ACIDIC, RESIDUE-CLASS2-IS-NONPOLAR, RESIDUE-CLASS2-IS-UNKNOWN, RESIDUE-NAME-IS-ASP, RESIDUE-NAME-IS-GLU, RESIDUE-NAME-IS-HOH, RESIDUE-NAME-IS-LEU, RESIDUE-NAME-IS-VAL, RING-SYSTEM, SECONDARY-STRUCTURE1-IS-4-HELIX, SECONDARY-STRUCTURE1-IS-BEND, SECONDARY-STRUCTURE1-IS-HET, SECONDARY-STRUCTURE1-IS-TURN, SECONDARY-STRUCTURE2-IS-BETA, SECONDARY-STRUCTURE2-IS-HET, VDW-VOLUME

20 Predicting Ca++ Binding Sites
Semantic types of attributes, e.g.:
– Physical: solvent accessibility, VDW volume, mobility
– Chemical: charge, heteroatom, oxygen, carbonyl, ASN
– Structural: helix, beta-turn, ring-system

21 Coherent Model = subset of locally acceptable rules that:
– explains as much of the data as possible
– uses entrenched predicates [Goodman]
– uses predicates of the same semantic type
– uses predicates of the same grain size
– uses classes AND their complements
– avoids rules that are "too similar": identical, subsuming, or semantically close (sketched below)
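A sketch of the last two tests, identical/subsuming rules plus the same-semantic-type preference; the rule representation (a frozenset of LHS terms with a predicted class) and the type table are assumptions.

```python
def subsumes(r1, r2):
    """r1 subsumes r2 if r1's conditions are a subset of r2's
    and both make the same prediction."""
    return r1[1] == r2[1] and r1[0] <= r2[0]

def select_coherent(rules, semantic_type):
    kept = []
    for r in rules:
        ok = all(r != k and not subsumes(k, r) for k in kept)
        # prefer rules whose predicates all share one semantic type
        if ok and len({semantic_type[t.split()[0]] for t in r[0]}) == 1:
            kept.append(r)
    return kept

semantic_type = {"charge": "chemical", "oxygens": "chemical",
                 "helix": "structural"}
rules = [(frozenset({"charge > 18.5"}), "SITE"),
         (frozenset({"charge > 18.5", "oxygens > 6.5"}), "SITE"),  # subsumed
         (frozenset({"charge > 18.5", "helix = yes"}), "SITE")]    # subsumed, mixed types
print(select_coherent(rules, semantic_type))   # keeps only the first rule
```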

22 EXAMPLE: predict Ca++ binding sites in proteins
158 rules found independently. E.g.:
R1: IF a site (a) has charge > 18.5 AND (b) no. of C=O > [threshold missing in source] THEN it binds calcium
R2: IF a site (a) has charge > 18.5 AND (b) no. of ASN > 15 THEN it binds calcium

23 Predicting Ca++ Binding Sites
[Semantic network of attributes (tree diagram): Heteroatoms branch into Sulfur, Oxygen, ..., Nitrogen; below these sit "Hydroxyl", Carbonyl, Amide and Amine; leaf groups include SH, OH and the residues ASP, GLU, ASN, GLN, ..., PRO, with CYS, SER, THR, TYR, ... at the bottom level.]

24 Ca++ binding sites in proteins
58 rules above threshold (threshold: at least 80% TP AND no more than 20% FP):
– 42 rules predict SITE
– 16 rules predict NON-SITE
Average accuracy for five 5-fold cross-validations = 100% for the redundant model with 58 rules.
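The stated threshold translates directly into a coverage check; the counts in the example calls are hypothetical, though they use the 16-positive / 100-negative split from the task.

```python
def above_threshold(tp, fn, fp, tn, min_tp=0.80, max_fp=0.20):
    tp_rate = tp / (tp + fn)   # fraction of positives covered
    fp_rate = fp / (fp + tn)   # fraction of negatives wrongly covered
    return tp_rate >= min_tp and fp_rate <= max_fp

print(above_threshold(tp=14, fn=2, fp=10, tn=90))  # True  (87.5% TP, 10% FP)
print(above_threshold(tp=10, fn=6, fp=30, tn=70))  # False (62.5% TP, 30% FP)
```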

25 Predicting Ca++ Binding Sites
Prefer complementary rules, e.g.:
– R59: IF, within 5 Å of a site, # oxygens > 6.5 THEN it binds calcium
– R101: IF, within 5 Å of a site, # oxygens <= 6.5 THEN it does NOT bind calcium
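Complementarity of a pair like R59/R101 can be checked mechanically: same attribute, same threshold, opposite test, opposite prediction. The four-tuple rule representation here is an assumption.

```python
def complementary(r1, r2):
    """True if the two rules split one threshold test between two classes."""
    (a1, op1, th1, cls1), (a2, op2, th2, cls2) = r1, r2
    return (a1 == a2 and th1 == th2 and cls1 != cls2
            and {op1, op2} == {">", "<="})

r59  = ("oxygens", ">",  6.5, "SITE")
r101 = ("oxygens", "<=", 6.5, "NON-SITE")
print(complementary(r59, r101))   # True
```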

26 5 Å Radius Model
Five perfect rules*:
– R1. #Oxygen LE 6.5 => NON-SITE
– R2. Hydrophobicity GT [threshold missing in source] => NON-SITE
– R3. #Oxygen GT 6.5 => SITE
– R4. Hydrophobicity LE [threshold missing in source] => SITE
– R5. #Carbonyl GT 4.5 & #Peptide LE [threshold missing in source] => SITE
* (100% of TPs and 0 FPs)
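Rendered as code, the five-rule model might look like the sketch below. The oxygen and carbonyl cutoffs (6.5, 4.5) come from the slides; the hydrophobicity and peptide cutoffs did not survive extraction, so H and P are explicitly hypothetical placeholders.

```python
H = 0.0   # HYPOTHETICAL hydrophobicity cutoff; not given in the source
P = 0.0   # HYPOTHETICAL peptide-count cutoff; not given in the source

def classify(site):
    """Apply the five rules in order; being 'perfect', they never
    disagreed on the training data, so first-match order is harmless."""
    if site["oxygen"] <= 6.5:           return "NON-SITE"   # R1
    if site["hydrophobicity"] > H:      return "NON-SITE"   # R2
    if site["oxygen"] > 6.5:            return "SITE"       # R3
    if site["hydrophobicity"] <= H:     return "SITE"       # R4
    if site["carbonyl"] > 4.5 and site["peptide"] <= P:
        return "SITE"                                       # R5
    return None

print(classify({"oxygen": 8.0, "hydrophobicity": -1.0,
                "carbonyl": 5.0, "peptide": -2.0}))   # -> SITE (via R3)
```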

27 Final Result: Ca++ binding sites in proteins
Model with 5 rules:
– same accuracy
– no unique predicates
– no subsumed or very similar rules
– more general rules for SITES (prior prob. < 0.01)
– more specific rules for NON-SITES (prior prob. > 0.99)

28 Predicting Ca++ Binding Sites: Attribute Hierarchies
RESIDUE CLASS 1:
– POLAR (ASN, CYS, GLN, HIS, SER, THR, TYR, TRP, GLY)
– CHARGED (ARG, ASP, GLU, LYS)
– HYDROPHOBIC (ALA, ILE, LEU, MET, PHE, PRO, VAL)
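These class lists translate directly into a lookup that maps a residue up the hierarchy, letting rules be stated at either grain size; only the table itself comes from the slide.

```python
RESIDUE_CLASS1 = {
    "POLAR": ["ASN", "CYS", "GLN", "HIS", "SER", "THR", "TYR", "TRP", "GLY"],
    "CHARGED": ["ARG", "ASP", "GLU", "LYS"],
    "HYDROPHOBIC": ["ALA", "ILE", "LEU", "MET", "PHE", "PRO", "VAL"],
}
# Invert the hierarchy: residue name -> its class-1 label.
RESIDUE_TO_CLASS = {res: cls for cls, members in RESIDUE_CLASS1.items()
                    for res in members}

print(RESIDUE_TO_CLASS["ASP"])   # CHARGED
print(RESIDUE_TO_CLASS["LEU"])   # HYDROPHOBIC
```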

29 Attribute Hierarchies
RESIDUE CLASS 2:
– POLAR (ASN, CYS, GLN, HIS, SER, THR, TYR, TRP, GLY)
– CHARGED: ACIDIC (ASP, GLU), BASIC (ARG, LYS)
– NONPOLAR (ALA, ILE, LEU, MET, PHE, PRO, VAL)
[TRP and HIS also appear separately in the diagram; their exact placement did not survive extraction.]

30 CONCLUSION
Induction systems can be augmented with semantic criteria to provide:
(A) interesting & understandable rules
– syntactically simple
– meaningful
(B) coherent models
– equally predictive
– closer to a theory

31 CONCLUSION
We have shown:
– how specific types of background knowledge might be incorporated in the rule-discovery process
– possible benefits of incorporating those types of knowledge: more coherent models, more understandable models, more accurate models

