
1 Thirty-Two Years of Knowledge-Based Machine Learning Jude Shavlik University of Wisconsin Not on cs540 final

2 Key Question of AI: How to Get Knowledge into Computers? Hand coding and supervised ML are the two extremes. How can we mix them? A small ML subcommunity has looked at ways to do so.

3 Learning from Labeled Examples. [Figure: positive and negative example figures; what is the category of a new example?] Concept: solid red circle in a (regular?) polygon. What about: figures on the left side of the page, or figures drawn before 5 pm on 2/2/89?

4 Fixed-Length Feature Vectors (commonly assumed data representation)

            Feature 1   Feature 2   ...   Feature N   Output Category
Example 1   0.0         circle      ...   red         true
Example 2   9.3         triangle    ...   red         false
Example 3   8.2         square      ...   blue        false
...
Example M   5.7         triangle    ...   green       true

We can handle more than Boolean outputs.
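A minimal sketch of what such a fixed-length feature-vector dataset looks like in code. The feature names here are purely illustrative; the slide's table only lists anonymous Feature 1..N:

```python
# Illustrative only: feature names are made up; the slide's table uses Feature 1..N.
examples = [
    {"feature_1": 0.0, "shape": "circle",   "color": "red",   "output": True},
    {"feature_1": 9.3, "shape": "triangle", "color": "red",   "output": False},
    {"feature_1": 8.2, "shape": "square",   "color": "blue",  "output": False},
    {"feature_1": 5.7, "shape": "triangle", "color": "green", "output": True},
]

# A supervised learner sees only these fixed-length rows plus their labels;
# categorical features are typically one-hot encoded before training.
```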

5 Two Underexplored Questions in ML: How can we go beyond teaching machines solely via I/O pairs? ('advice giving') How can we understand what an ML algorithm has discovered? ('rule extraction')

6 Outline: Explanation-Based Learning (1980s), Knowledge-Based Neural Nets (1990s), Knowledge-Based SVMs (2000s), Markov Logic Networks (2010s).

7 Explanation-Based Learning (EBL), My PhD Years, 1983-1987. The EBL Hypothesis: by understanding why an example is a member of a concept, one can learn the essential properties of that concept. The trade-off: we trade the need to collect many examples for the ability to 'explain' single examples (via a 'domain theory'); i.e., we assume a smarter learner.

8 Knowledge-Based Artificial Neural Networks, KBANN (1988-2001). [Diagram: insert an initial symbolic domain theory into an initial neural network, refine the network with training examples, then extract a final symbolic domain theory from the trained network.] (See also Mooney, Pazzani, Cohen, etc.)

9 What Inspired KBANN? Geoff Hinton was an invited speaker at ICML-88. I recall him saying something like “one can backprop through any function,” and I thought “what would it mean to backprop through a domain theory?”

10 Inserting Prior Knowledge into a Neural Network. [Diagram: a domain theory's supporting facts, intermediate conclusions, and final conclusions map to a neural network's input units, hidden units, and output units, respectively.]

11 Jumping Ahead a Bit. Notice that symbolic knowledge induces a graphical model, which is then numerically optimized. A similar perspective was later followed in Markov Logic Networks (MLNs); however, in MLNs the symbolic knowledge is expressed in first-order logic.

12 Mapping Rules to Nets. Rules: if A and B then Z; if B and ¬C then Z. [Figure: the two rules mapped onto a network over inputs A, B, C, and D, with link weights of 4 (or -4 for the negated antecedent) and corresponding bias weights.] This maps propositional rule sets to neural networks.
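A rough sketch of the rule-to-network mapping, assuming the usual KBANN recipe: each rule becomes a sigmoid unit whose antecedents get weight ±4 and whose bias is set so the unit fires only when the rule's body is satisfied, and the consequent is an OR over its rules. The specific bias values below are my choices, not the slide's exact numbers:

```python
import numpy as np

W = 4.0  # the link weight used in the slide's figure

def unit(inputs, weights, bias):
    # Sigmoid unit; with weights of +/-W it behaves like a soft AND/OR gate.
    return 1.0 / (1.0 + np.exp(-(np.dot(inputs, weights) + bias)))

def initial_network(A, B, C, D):
    # Rule 1: Z :- A, B.       AND unit over its antecedents.
    r1 = unit([A, B], [W, W], bias=-1.5 * W)
    # Rule 2: Z :- B, not C.   A negated antecedent gets weight -W.
    r2 = unit([B, C], [W, -W], bias=-0.5 * W)
    # Z fires if either rule fires (OR unit); D appears in no rule, so KBANN
    # would attach it with a near-zero weight (omitted here).
    return unit([r1, r2], [W, W], bias=-0.5 * W)

print(initial_network(A=1, B=1, C=0, D=0))   # both rules satisfied -> high output
print(initial_network(A=0, B=0, C=1, D=1))   # neither satisfied   -> low output
```

Backprop then refines these weights using the training examples, which is what "refining the domain theory" means in KBANN.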

13 Case Study: Learning to Recognize Genes (Towell, Shavlik & Noordewier, AAAI-90). [Figure: a domain theory over a DNA sequence, e.g. promoter :- contact, conformation. and contact :- minus_35, minus_10.] This halved the error rate of standard backprop. We compile rules to a more basic language, but here we compile for refinability.

14 Learning Curves (similar results on many tasks and with other advice-taking algorithms). [Plot: test-set errors vs. amount of training data for KBANN, a standard ANN, and the domain theory alone; annotations mark performance for a fixed amount of data and the data needed to reach a given error-rate spec.]

15 From Prior Knowledge to Advice (Maclin PhD, 1995). Originally the 'theory refinement' community assumed domain knowledge was available before learning starts (prior knowledge). When applying KBANN to reinforcement learning, we began to realize you should be able to provide domain knowledge to a machine learner whenever you think of something to say. This changes the metaphor: advising rather than commanding computers, with continual (i.e., lifelong) cooperation between human teacher and machine learner.

16 What Would You Like to Say to This Penguin? IF a Bee is (Near and West) and an Ice is (Near and North) THEN BEGIN Move East; Move North END

17 Some Advice for Soccer-Playing Robots: if distanceToGoal ≤ 10 and shotAngle ≥ 30 then prefer shoot over all other actions.

18 Some Sample Results. [Plot: learning curves with advice vs. without advice.]

19 Overcoming Bad Advice. [Plot: learning curves with bad advice vs. no advice.]

20 Rule Extraction. Initially Geoff Towell (PhD, 1991) viewed this as simplifying the trained neural network (M-of-N rules). Mark Craven (PhD, 1996) realized this is simply another learning task! I.e., learn what the neural network computes: collect I/O pairs from the trained neural network and give them to a decision-tree learner. This applies to SVMs, decision forests, etc.
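A simplified sketch of that "learn what the network computes" recipe using scikit-learn stand-ins. This is not Craven's TREPAN algorithm itself, just the collect-I/O-pairs-and-fit-a-tree idea on a synthetic dataset:

```python
from sklearn.datasets import make_classification
from sklearn.neural_network import MLPClassifier
from sklearn.tree import DecisionTreeClassifier, export_text

# 1. Train an opaque model (a stand-in for the trained/refined network).
X, y = make_classification(n_samples=2000, n_features=6, random_state=0)
net = MLPClassifier(hidden_layer_sizes=(20,), max_iter=2000, random_state=0).fit(X, y)

# 2. Collect I/O pairs from the network: relabel the inputs with its predictions.
y_net = net.predict(X)

# 3. Give those pairs to a symbolic learner; its rules describe the network.
tree = DecisionTreeClassifier(max_depth=3, random_state=0).fit(X, y_net)
print(export_text(tree, feature_names=[f"f{i}" for i in range(6)]))
```

The same recipe works for any black-box model, which is why the slide notes it applies to SVMs and decision forests as well.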

21 KBANN Recap. Use symbolic knowledge to make an initial guess at the concept description (standard neural-net approaches make a random guess). Use training examples to refine the initial guess ('early stopping' reduces overfitting). Nicely maps to incremental (aka online) machine learning. Valuable to show the user the learned model expressed in symbols rather than numbers.

22 Knowledge-Based Support Vector Machines (2001-2011). The question arose during the 2001 PhD defense of Tina Eliassi-Rad: how would you apply the KBANN idea using SVMs? This led to a collaboration with Olvi Mangasarian (who has worked on SVMs for over 50 years!).

23 Generalizing the Idea of a Training Example for SVMs. We can extend the SVM linear program to handle 'regions as training examples'.

24 Knowledge-Based Support Vector Regression. Add soft constraints to the linear program (so we need only follow the advice approximately): minimize ||w||_1 + C ||s||_1 + a penalty for violating advice, such that f(x) = y ± s, plus constraints that represent the advice. [Figure: input-output plot of example advice: 'in this region, y should exceed 4'.]
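A toy sketch of that optimization using cvxpy (my choice of tool). The advice region, its bounds, and the trick of enforcing the advice only at sampled points are simplifications; the actual KBSVR papers handle region constraints exactly inside the linear program:

```python
import cvxpy as cp
import numpy as np

# Toy 1-D regression data.
rng = np.random.default_rng(0)
X = rng.uniform(0, 10, size=(30, 1))
y = 0.5 * X[:, 0] + rng.normal(0, 0.3, size=30)

w, b = cp.Variable(1), cp.Variable()
s = cp.Variable(30)   # slack on fitting the training examples
z = cp.Variable(20)   # slack on following the advice

# Advice region "for x in [6, 10], y should exceed 4", enforced approximately
# at sampled points rather than via the paper's exact LP reformulation.
X_adv = np.linspace(6, 10, 20).reshape(-1, 1)

C, mu = 1.0, 1.0
problem = cp.Problem(
    cp.Minimize(cp.norm1(w) + C * cp.sum(s) + mu * cp.sum(z)),
    [
        cp.abs(X @ w + b - y) <= s,   # f(x) = y +/- s
        X_adv @ w + b >= 4 - z,       # follow the advice, up to slack z
        s >= 0, z >= 0,
    ],
)
problem.solve()
print("learned f(x) =", float(w.value[0]), "* x +", float(b.value))
```

The three objective terms are exactly the slide's trade-off: model simplicity, fit to the training examples, and fit to the advice.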

25 Sample Advice-Taking Results. Advice: if distanceToGoal ≤ 10 and shotAngle ≥ 30 then prefer shoot over all other actions, i.e., Q(shoot) > Q(pass) and Q(shoot) > Q(move). [Plot: learning curves with advice vs. standard RL on 2-vs-1 BreakAway, rewards +1/-1.] Maclin et al.: AAAI '05, '06, '07.

26 Automatically Creating Advice, an interesting approach to transfer learning (Lisa Torrey, PhD 2009): learn in task A, perform 'rule extraction', and give the result as advice for a related task B. Since advice is not assumed 100% correct, differences between tasks A and B are handled by the training examples for task B. So advice giving is done by the MACHINE!

27 Close-Transfer Scenarios 2-on-1 BreakAway 3-on-2 BreakAway 4-on-3 BreakAway

28 Distant-Transfer Scenarios 3-on-2 BreakAway 3-on-2 KeepAway 3-on-2 MoveDownfield

29 Some Results: Transfer to 3-on-2 BreakAway Torrey, Shavlik, Walker & Maclin: ECML 2006, ICML Workshop 2006

30 KBSVM Recap. We can view symbolic knowledge as a way to label regions of feature space (rather than solely labeling points). Maximize: model simplicity + fit to advice + fit to training examples. Note: this does not fit the view of "guess an initial model, then refine it using training examples".

31 Markov Logic Networks, 2009+ (and statistical-relational learning in general). My current favorite for combining symbolic knowledge and numeric learning. An MLN is a set of weighted FOPC sentences, e.g. with weight 3.2: ∀x,y,z parent(x, y) ∧ parent(z, y) → married(x, z). We have worked on speeding up MLN inference (via an RDBMS) plus learning MLN rule sets.
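A small illustration of MLN semantics for that rule, with made-up constants and a made-up possible world. The log of a world's unnormalized probability is the sum, over rules, of the rule's weight times its number of true groundings:

```python
from itertools import product

# The slide's weighted rule:  3.2  forall x,y,z  parent(x,y) ^ parent(z,y) -> married(x,z)
weight = 3.2
people = ["ann", "bob", "cal"]          # made-up constants

# A made-up possible world: which ground atoms are true.
parent  = {("ann", "cal"), ("bob", "cal")}
married = {("ann", "bob"), ("bob", "ann")}

def grounding_true(x, y, z):
    # A ground implication is true when its body is false or its head is true.
    body = (x, y) in parent and (z, y) in parent
    return (not body) or ((x, z) in married)

# One rule here, so the world's (log) score is weight * number of true groundings.
n_true = sum(grounding_true(x, y, z) for x, y, z in product(people, repeat=3))
print(n_true, "true groundings, world score =", weight * n_true)
```

Because the rule is soft, worlds that violate some groundings are merely less probable, not impossible, which is what lets imperfect symbolic knowledge coexist with data.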

32 Learning a Set of First-Order Regression Trees (each path to a leaf is an MLN rule), ICDM '11. [Diagram: iterate: compare the predicted probabilities from the current rules against the data to obtain gradients, learn a new regression tree from those gradients, and add it to the ruleset; the final ruleset is the sum of all learned trees.]
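A propositional stand-in for that boosting loop. The actual ICDM '11 method learns relational regression trees whose paths are MLN clauses; here ordinary scikit-learn regression trees on feature vectors play that role, showing how each round fits a tree to the gradients (y - p) and adds it to the ruleset:

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.tree import DecisionTreeRegressor

X, y = make_classification(n_samples=500, n_features=5, random_state=0)

# Functional gradient boosting of P(y=1 | x).
trees, F = [], np.zeros(len(y))
for _ in range(10):
    p = 1.0 / (1.0 + np.exp(-F))   # predicted probabilities from the current "rules"
    gradients = y - p              # where the current model over/under-shoots
    t = DecisionTreeRegressor(max_depth=3, random_state=0).fit(X, gradients)
    trees.append(t)
    F += t.predict(X)              # the new tree joins the ruleset

print("training accuracy:", np.mean(((1.0 / (1.0 + np.exp(-F))) > 0.5) == y))
```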

33 Some Results (advisedBy task)

Method   AUC-PR        CLL            Time
MLN-BT   0.94 ± 0.06   -0.52 ± 0.45   18.4 sec
MLN-BC   0.95 ± 0.05   -0.30 ± 0.06   33.3 sec
Alch-D   0.31 ± 0.10   -3.90 ± 0.41   7.1 hrs
Motif    0.43 ± 0.03   -3.23 ± 0.78   1.8 hrs
LHL      0.42 ± 0.10   -2.94 ± 0.31   37.2 sec

34 Differences from KBANN. Rules involve logical variables. During learning, we create new rules to correct errors in the initial rules. A recent follow-up also refines the initial rules (note that KBSVMs also do NOT refine rules, though we had one AAAI paper on that).

35 Wrapping Up. Symbolic knowledge can be refined/extended by neural networks, support-vector machines, and MLN rule and weight learning. A variety of views were taken: make an initial guess at the concept, then refine the weights; use advice to label a region in feature space; make an initial guess at the concept, then add weighted rules. Seeing what was learned: rule extraction. Applications in genetics, cancer, machine reading, robot learning, etc.

36 Some Suggestions. Allow humans to continually observe learning and provide symbolic knowledge at any time. Never assume symbolic knowledge is 100% correct. Allow the user to see what was learned in a symbolic representation, to facilitate additional advice. Put a graphic on every slide.

37 Thanks for Your Time. Questions? Papers, PowerPoint presentations, and some software are available online:
pages.cs.wisc.edu/~shavlik/mlrg/publications.html
hazy.cs.wisc.edu/hazy/tuffy/
pages.cs.wisc.edu/~tushar/rdnboost/i

