
1 Transfer Learning by Mapping and Revising Relational Knowledge Raymond J. Mooney University of Texas at Austin with acknowledgements to Lily Mihalkova and Tuyen Huynh

2 Transfer Learning Most machine learning methods learn each new task from scratch, failing to utilize previously learned knowledge. Transfer learning concerns using knowledge acquired in a previous source task to facilitate learning in a related target task. We usually assume that significant training data is available in the source domain but only limited training data is available in the target domain. By exploiting knowledge from the source, learning in the target can be: –More accurate: Learned knowledge makes better predictions. –Faster: Training time is reduced.

3 Transfer Learning Curves Transfer learning increases accuracy in the target domain. [Figure: predictive accuracy vs. amount of training data in the target domain, comparing a NO TRANSFER curve with a TRANSFER FROM SOURCE curve; annotations mark the jump start with no target data and the advantage from limited target data.]

4 Recent Work on Transfer Learning A recent DARPA program on Transfer Learning has led to significant research in the area. Some work focuses on feature-vector classification. –Hierarchical Bayes (Yu et al., 2005; Lawrence & Platt, 2004) –Informative Bayesian Priors (Raina et al., 2005) –Boosting for transfer learning (Dai et al., 2007) –Structural Correspondence Learning (Blitzer et al., 2007) Some work focuses on Reinforcement Learning. –Value-function transfer (Taylor & Stone, 2005; 2007) –Advice-based policy transfer (Torrey et al., 2005; 2007)

5 Similar Research Problems Multi-Task Learning (Caruana, 1997) –Learn multiple tasks simultaneously; each one helped by the others. Life-Long Learning (Thrun, 1996) –Transfer learning from a number of prior source problems, picking the correct source problems to use.

6 Logical Paradigm Represents knowledge and data in binary symbolic logic such as First Order Predicate Calculus. + Rich representation that handles arbitrary sets of objects, with properties, relations, quantifiers, etc. – Unable to handle uncertain knowledge and probabilistic reasoning.

7 Probabilistic Paradigm Represents knowledge and data as a fixed set of random variables with a joint probability distribution. + Handles uncertain knowledge and probabilistic reasoning. – Unable to handle arbitrary sets of objects, with properties, relations, quantifiers, etc.

8 Statistical Relational Learning (SRL) Most machine learning methods assume i.i.d. examples represented as fixed-length feature vectors. Many domains require learning and making inferences about unbounded sets of entities that are richly relationally connected. SRL methods attempt to integrate methods from predicate logic and probabilistic graphical models to handle such structured, multi-relational data.

9 Statistical Relational Learning [Figure: multi-relational data — Actor: pacino, brando, …; Director: coppola, …; Movie: (godFather, pacino), (godFather, brando), (godFather, coppola), (streetCar, brando), …; WorkedFor: (pacino, coppola), (brando, coppola), … — is fed to a learning algorithm that produces a probabilistic graphical model.]

10 Multi-Relational Data Challenges Examples cannot be effectively represented as feature vectors. Predictions for connected facts are not independent (e.g. WorkedFor(brando, coppola), Movie(godFather, brando)). –Data is not i.i.d. –Requires collective inference (classification) (Taskar et al., 2001) A single independent example (mega-example) often contains information about a large number of interconnected entities and can vary in length. –Leave-one-university-out testing (Craven et al., 1998)

11 TL and SRL and I.I.D. Standard Machine Learning assumes examples are Independent and Identically Distributed. SRL breaks the assumption that examples are independent. TL breaks the assumption that test examples are drawn from the same distribution as the training instances.

12 Multi-Relational Domains Domains about people –Academic departments (UW-CSE) –Movies (IMDB) Biochemical domains –Mutagenesis –Alzheimer drug design Linked text domains –WebKB –Cora

13 Relational Learning Methods Inductive Logic Programming (ILP) –Produces sets of first-order rules –Not appropriate for probabilistic reasoning Example rule: if a student wrote a paper with a professor, then the professor is the student's advisor. SRL models & learning algorithms –SLPs (Muggleton, 1996) –PRMs (Koller, 1999) –BLPs (Kersting & De Raedt, 2001) –RMNs (Taskar et al., 2002) –MLNs (Markov logic networks; Richardson & Domingos, 2006)
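
Written as a first-order clause over UW-CSE-style predicates, the example rule might look as follows (this particular clause is an illustration, not one quoted from the talk):

\[ \textit{Publication}(P, S) \wedge \textit{Publication}(P, F) \wedge \textit{Student}(S) \wedge \textit{Professor}(F) \Rightarrow \textit{AdvisedBy}(S, F) \]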

14 MLN Transfer (Mihalkova, Huynh, & Mooney, 2007) Given two multi-relational domains, such as IMDB (Director(A), Actor(A), Movie(T, A), WorkedFor(A, B), …) and UW-CSE (Professor(A), Student(A), Publication(T, A), AdvisedBy(A, B), …), transfer a Markov logic network learned in the Source to the Target by: –Mapping the Source predicates to the Target –Revising the mapped knowledge

15 First-Order Logic Basics Literal: A predicate (or its negation) applied to constants and/or variables. –Gliteral: Ground literal; WorkedFor(brando, coppola) –Vliteral: Variablized literal; ¬WorkedFor(A, B) We assume predicates have typed arguments. –For example, Movie(godFather, coppola) has arguments of type movieTitle and person.

16 First-Order Clauses Clause: A disjunction of literals. Can be rewritten as a set of rules:
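
The slide's example clause did not survive in this transcript; here is an illustrative one built from the predicates used later in the talk. The clause

\[ \neg\textit{Movie}(T,A) \;\vee\; \neg\textit{WorkedFor}(A,B) \;\vee\; \textit{Movie}(T,B) \]

can be rewritten, for example, as either of the rules

\[ \textit{Movie}(T,A) \wedge \textit{WorkedFor}(A,B) \Rightarrow \textit{Movie}(T,B) \qquad\text{or}\qquad \textit{Movie}(T,A) \wedge \neg\textit{Movie}(T,B) \Rightarrow \neg\textit{WorkedFor}(A,B). \]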

17 Representing the Data Actor(pacino), WorkedFor(pacino, coppola), Movie(godFather, pacino), Actor(brando), WorkedFor(brando, coppola), Movie(godFather, brando), Director(coppola), Movie(godFather, coppola) Makes a closed world assumption: –The gliterals listed are true; the rest are false

18 Markov Logic Networks (Richardson & Domingos, 2006) A set of first-order clauses, each assigned a weight. A larger weight indicates a stronger belief that the clause should hold. The clauses are called the structure of the MLN.
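
For example, a two-clause MLN over the movie predicates might look as follows (the clauses and weights are invented here purely for illustration):

\[ 1.5 \quad \textit{Movie}(T,A) \wedge \textit{WorkedFor}(A,B) \Rightarrow \textit{Movie}(T,B) \]
\[ 0.8 \quad \textit{Director}(A) \Rightarrow \neg\textit{Actor}(A) \]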

19 Markov Networks (Pearl, 1988) A concise representation of the joint probability distribution of a set of random variables using an undirected graph. [Figure: four connected variables — Reputation of author (R), Algorithm (A), Experiments (E), Quality of paper (Q) — with a fragment of their joint distribution: P(R=e, A=e, E=e, Q=e) = 0.7, P(R=e, A=e, E=e, Q=p) = 0.1, P(R=e, A=e, E=p, Q=e) = 0.03, ….] The same probability distribution can be represented as the product of a set of functions defined over the cliques of the graph.

20 Markov Network Equations General form: a product of potential functions over the cliques of the graph. Log-linear form: an exponentiated, weighted sum of features (weights w_i, features f_i), shown below.
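
The equations themselves were images on the slide; these are the standard forms they correspond to:

\[ P(X{=}x) \;=\; \frac{1}{Z} \prod_{k} \phi_k\big(x_{\{k\}}\big), \qquad Z \;=\; \sum_{x'} \prod_{k} \phi_k\big(x'_{\{k\}}\big) \]

\[ P(X{=}x) \;=\; \frac{1}{Z} \exp\Big(\sum_i w_i\, f_i(x)\Big) \]

where the \phi_k are potential functions over the cliques of the graph and the f_i are (typically binary) features with weights w_i.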

21 Ground Markov Network for an MLN MLNs are templates for constructing Markov networks for a given set of constants: –Include a node for each type-consistent grounding (a gliteral) of each predicate in the MLN. –Two nodes are connected by an edge if their corresponding gliterals appear together in any grounding of any clause in the MLN. –Include a feature for each grounding of each clause in the MLN with weight equal to the weight of the clause.
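
To make the construction concrete, here is a minimal Python sketch, assuming a toy domain and a single clause; it is not Alchemy's implementation, just the node/edge rules from the bullets above:

```python
# Illustrative sketch (not Alchemy's implementation) of turning an MLN's
# clauses into the nodes and edges of a ground Markov network.
from itertools import product

# Hypothetical toy domain: constants grouped by type.
constants = {
    "person": ["brando", "coppola"],
    "movieTitle": ["godFather"],
}

# Predicates with their argument types.
predicates = {
    "Actor": ["person"],
    "Director": ["person"],
    "Movie": ["movieTitle", "person"],
    "WorkedFor": ["person", "person"],
}

# One clause, written as a list of (predicate, argument-variables) literals;
# variable types are implied by the predicate signatures above.
clause = [("Movie", ("T", "A")), ("WorkedFor", ("A", "B")), ("Movie", ("T", "B"))]

# Nodes: one per type-consistent grounding (gliteral) of each predicate.
nodes = {
    (pred, args)
    for pred, types in predicates.items()
    for args in product(*(constants[t] for t in types))
}

def clause_variables(clause):
    """Collect each variable's type from the first predicate position it fills."""
    var_types = {}
    for pred, args in clause:
        for var, typ in zip(args, predicates[pred]):
            var_types.setdefault(var, typ)
    return var_types

# Edges: connect gliterals that appear together in some grounding of the clause.
edges = set()
var_types = clause_variables(clause)
variables = list(var_types)
for binding in product(*(constants[var_types[v]] for v in variables)):
    subst = dict(zip(variables, binding))
    gliterals = [(pred, tuple(subst[v] for v in args)) for pred, args in clause]
    for i in range(len(gliterals)):
        for j in range(i + 1, len(gliterals)):
            if gliterals[i] != gliterals[j]:
                edges.add(frozenset((gliterals[i], gliterals[j])))

print(len(nodes), "nodes,", len(edges), "edges")
```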

22 [Ground Markov network for the constants coppola, brando, godFather. Nodes: Movie(godFather, brando), Movie(godFather, coppola), WorkedFor(coppola, brando), WorkedFor(coppola, coppola), WorkedFor(brando, brando), WorkedFor(brando, coppola), Director(brando), Actor(brando), Director(coppola), Actor(coppola); edges connect gliterals that appear together in a clause grounding.]

23 MLN Equations Compare to log-linear Markov networks: the sum is over all clauses in the model, and each clause's count ranges over all groundings of that clause (see below).
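
The slide's equation is the standard MLN joint distribution from Richardson & Domingos (2006):

\[ P(X{=}x) \;=\; \frac{1}{Z} \exp\Big(\sum_i w_i\, n_i(x)\Big) \]

where the sum ranges over all clauses in the model and n_i(x) is the number of true groundings of clause i in world x. It has the same log-linear form as before, with one feature per clause grounding and all groundings of a clause sharing that clause's weight.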

24 MLN Equation Intuition A possible world (a truth assignment to all gliterals) becomes exponentially less likely as the total weight of all the grounded clauses it violates increases.

25 MLN Inference Given truth assignments for a given set of evidence gliterals, infer the probability that each member of a set of unknown query gliterals is true.

26 [Inference example on the ground network: the WorkedFor, Director, Actor, and Movie gliterals over coppola, brando, and godFather, with truth values (T/F) given for the evidence nodes and the remaining nodes treated as queries.]

27 MLN Inference Algorithms Gibbs Sampling (Richardson & Domingos, 2006) MC-SAT (Poon & Domingos, 2006)

28 MLN Learning Weight-learning (Richardson & Domingos, 2006; Lowd & Domingos, 2007) –Performed using optimization methods. Structure-learning (Kok & Domingos, 2005) –Proceeds in iterations of beam search, adding the best-performing clause after each iteration to the MLN. –Clauses are evaluated using WPLL score.

29 WPLL (Kok & Domingos, 2005) Weighted pseudo-log-likelihood: –Compute the likelihood of the data according to the model and take the log. –For each gliteral, condition on its Markov blanket for tractability. –Weight it so that predicates with greater arity do not dominate.
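
The formula itself was an image; the definition from Kok & Domingos (2005), with each predicate weighted by the inverse of its number of groundings, is:

\[ \mathrm{WPLL}(w, x) \;=\; \sum_{r=1}^{R} \frac{1}{g_r} \sum_{k=1}^{g_r} \log P_w\!\left(X_{r,k} = x_{r,k} \,\middle|\, MB_x(X_{r,k})\right) \]

where r ranges over the R first-order predicates, g_r is the number of groundings of predicate r, and MB_x(X_{r,k}) is the Markov blanket of the k-th gliteral of predicate r in world x.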

30 Alchemy Open-source package of MLN software provided by UW that includes: –Inference algorithms –Weight learning algorithms –Structure learning algorithm –Sample data sets All our software uses and extends Alchemy.

31 TAMAR (Transfer via Automatic Mapping And Revision) [Pipeline: the source clause from UW-CSE, Publication(T,A) ∧ AdvisedBy(A,B) → Publication(T,B), is mapped by M-TAMAR using the predicate mapping Publication → Movie, AdvisedBy → WorkedFor into Movie(T,A) ∧ WorkedFor(A,B) → Movie(T,B); R-TAMAR then revises it against the target (IMDB) data, e.g. to Movie(T,A) ∧ WorkedFor(A,B) ∧ Relative(A,B) → Movie(T,B).]

32 Predicate Mapping Each clause is mapped independently of the others. The algorithm considers all possible ways to map a clause such that: –Each predicate in the source clause is mapped to some target predicate. –Each argument type in the source is mapped to exactly one argument type in the target. Each mapped clause is evaluated by measuring its WPLL for the target data, and the most accurate mapping is kept.
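
A minimal sketch of this search, with toy predicate signatures and a stubbed-out scoring call (score_on_target_data is hypothetical, standing in for weight learning plus WPLL evaluation on the target data):

```python
# Sketch of enumerating legal predicate mappings for one source clause.
from itertools import product

source_signature = {"Publication": ["title", "person"], "AdvisedBy": ["person", "person"]}
target_signature = {"Movie": ["name", "person"], "WorkedFor": ["person", "person"],
                    "Gender": ["person", "gend"], "SameGender": ["gend", "gend"]}

# One source clause as a list of predicate names (argument variables omitted
# for brevity -- only the predicate/type mapping matters here).
source_clause = ["Publication", "AdvisedBy"]

def legal_mappings(clause, src_sig, tgt_sig):
    """Yield (predicate map, type map) pairs that are type-consistent."""
    for choice in product(tgt_sig, repeat=len(clause)):
        pred_map, type_map, ok = {}, {}, True
        for src_pred, tgt_pred in zip(clause, choice):
            src_types, tgt_types = src_sig[src_pred], tgt_sig[tgt_pred]
            if len(src_types) != len(tgt_types):
                ok = False
                break
            if pred_map.setdefault(src_pred, tgt_pred) != tgt_pred:
                ok = False       # same source predicate mapped two ways in one clause
                break
            for s_t, t_t in zip(src_types, tgt_types):
                if type_map.setdefault(s_t, t_t) != t_t:
                    ok = False   # a source type must map to exactly one target type
                    break
            if not ok:
                break
        if ok:
            yield pred_map, type_map

def score_on_target_data(pred_map):
    """Hypothetical stand-in for: translate the clause, learn weights, return WPLL."""
    return 0.0

best = max(legal_mappings(source_clause, source_signature, target_signature),
           key=lambda m: score_on_target_data(m[0]))
print(best)
```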

33 Predicate Mapping Example Publication(title, person) → Movie(name, person); AdvisedBy(person, person) → WorkedFor(person, person). Consistent type mapping: title → name, person → person

34 Predicate Mapping Example 2 Publication(title, person) → Gender(person, gend); AdvisedBy(person, person) → SameGender(gend, gend). Consistent type mapping: title → person, person → gend

35 TAMAR (Transfer via Automatic Mapping And Revision) [Recap of the pipeline: the UW-CSE source clause Publication(T,A) ∧ AdvisedBy(A,B) → Publication(T,B) is mapped by M-TAMAR to Movie(T,A) ∧ WorkedFor(A,B) → Movie(T,B), which R-TAMAR revises on the target (IMDB) data to Movie(T,A) ∧ WorkedFor(A,B) ∧ Relative(A,B) → Movie(T,B).]

36 Transfer Learning as Revision Regard the mapped source MLN as an approximate model for the target task that needs to be accurately and efficiently revised. Thus our general approach is similar to that taken by theory revision systems (Richards & Mooney, 1995). Revisions are proposed in a bottom-up fashion. [Diagram: the source MLN (top-down knowledge) and the target training data feed a data-driven, bottom-up revision proposer.]

37 R-TAMAR [Pipeline: the weighted clauses and the relational data go through self-diagnosis, which scores each clause by its change in WPLL and marks it Good, Too Long, or Too Short; a directed beam search then produces new candidate clauses from the marked ones, and new clause discovery contributes additional weighted clauses.]

38 R-TAMAR: Self-Diagnosis Use the mapped source MLN to make inferences in the target and observe the behavior of each clause: –Consider each predicate P in the domain in turn. –Use Gibbs sampling to infer truth values for the gliterals of P, using the remaining gliterals as evidence. –Bin the clauses containing gliterals of P based on whether they behave as desired (see the sketch below). Revisions are focused only on clauses in the “Bad” bins.
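
A rough sketch of the diagnosis loop; the inference call and the binning rules are stubbed out (gibbs_infer and bin_clause below are hypothetical stand-ins), and only the control flow from the bullets above is real:

```python
# Sketch of R-TAMAR-style self-diagnosis with stubbed inference and binning.
from collections import defaultdict

def gibbs_infer(clauses, query, evidence):
    """Hypothetical stub for Gibbs sampling over the query gliterals."""
    return {g: 0.5 for g in query}          # pretend every query gliteral gets P=0.5

def bin_clause(clause_preds, pred, inferred, data):
    """Hypothetical stub for the paper's binning rules."""
    return "relevant;bad" if pred in clause_preds else "irrelevant;good"

def self_diagnose(clauses, predicates, data):
    """clauses: {name: set of predicate names}; data: {gliteral: truth value},
    where a gliteral is a (predicate, args) tuple."""
    bins = defaultdict(list)
    for pred in predicates:
        query = [g for g in data if g[0] == pred]            # gliterals of this predicate
        evidence = {g: v for g, v in data.items() if g[0] != pred}
        inferred = gibbs_infer(clauses, query, evidence)
        for name, clause_preds in clauses.items():
            if pred in clause_preds:                         # clause mentions this predicate
                bins[name].append(bin_clause(clause_preds, pred, inferred, data))
    return bins

# Only clauses that landed in a "bad" bin are considered for revision.
def to_revise(bins):
    return [c for c, bs in bins.items() if any(b.endswith("bad") for b in bs)]

clauses = {"c1": {"Movie", "WorkedFor"}, "c2": {"Actor", "Director"}}
data = {("Actor", ("brando",)): True, ("WorkedFor", ("brando", "coppola")): True,
        ("Movie", ("godFather", "brando")): True, ("Director", ("coppola",)): True}
print(to_revise(self_diagnose(clauses, {"Actor", "Director", "Movie", "WorkedFor"}, data)))
```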

39 Self-Diagnosis: Clause Bins Data: Actor(brando), Director(coppola), Movie(godFather, brando), Movie(godFather, coppola), Movie(rainMaker, coppola), WorkedFor(brando, coppola). Current gliteral: Actor(brando). [The clause shown on the slide falls into the Relevant; Good bin.]

40 Self-Diagnosis: Clause Bins Data: Actor(brando), Director(coppola), Movie(godFather, brando), Movie(godFather, coppola), Movie(rainMaker, coppola), WorkedFor(brando, coppola). Current gliteral: Actor(brando). [The clauses shown on the slide illustrate the Relevant; Good and Relevant; Bad bins.]

41 Self-Diagnosis: Clause Bins Data: Actor(brando), Director(coppola), Movie(godFather, brando), Movie(godFather, coppola), Movie(rainMaker, coppola), WorkedFor(brando, coppola). Current gliteral: Actor(brando). [The clauses shown on the slide illustrate the Relevant; Good, Relevant; Bad, and Irrelevant; Good bins.]

42 Self-Diagnosis: Clause Bins Data: Actor(brando), Director(coppola), Movie(godFather, brando), Movie(godFather, coppola), Movie(rainMaker, coppola), WorkedFor(brando, coppola). Current gliteral: Actor(brando). [The clauses shown on the slide illustrate the Relevant; Good, Relevant; Bad, Irrelevant; Good, and Irrelevant; Bad bins; arrows mark which bins lead to lengthening and which to shortening.]

43 Structure Revisions Using directed beam search: –Literal deletions attempted only from clauses marked for shortening. –Literal additions attempted only for clauses marked for lengthening. Training is much faster since search space is constrained by: 1. Limiting the clauses considered for updates. 2. Restricting the type of updates allowed.

44 New Clause Discovery Uses Relational Pathfinding (Richards & Mooney, 1992). [Figure: a graph over the constants brando, coppola, godFather, rainMaker, with edges from the true gliterals Actor(brando), Director(coppola), Movie(godFather, brando), Movie(godFather, coppola), Movie(rainMaker, coppola), WorkedFor(brando, coppola); paths through the Movie and WorkedFor edges are variablized into new candidate clauses.]
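
A toy sketch of the idea behind relational pathfinding, under the simplifying assumption that every gliteral is binary: find a path of true gliterals connecting two constants, then variablize it into a candidate clause body (the real algorithm is considerably richer):

```python
# Toy relational pathfinding over a graph of constants.
from collections import deque

gliterals = [("WorkedFor", ("brando", "coppola")),
             ("Movie", ("godFather", "brando")),
             ("Movie", ("godFather", "coppola")),
             ("Movie", ("rainMaker", "coppola"))]

def find_path(start, goal, facts):
    """Breadth-first search over constants, treating each true gliteral as an edge."""
    frontier = deque([(start, [])])
    seen = {start}
    while frontier:
        node, path = frontier.popleft()
        if node == goal:
            return path
        for pred, (a, b) in facts:
            nxt = b if a == node else a if b == node else None
            if nxt is not None and nxt not in seen:
                seen.add(nxt)
                frontier.append((nxt, path + [(pred, (a, b))]))
    return None

def variablize(path):
    """Replace each constant along the path with a fresh variable."""
    var_of, names = {}, iter("ABCDEFGH")
    body = []
    for pred, args in path:
        new_args = []
        for c in args:
            if c not in var_of:
                var_of[c] = next(names)
            new_args.append(var_of[c])
        body.append((pred, tuple(new_args)))
    return body

path = find_path("brando", "rainMaker", gliterals)
print(path)              # [('WorkedFor', ('brando', 'coppola')), ('Movie', ('rainMaker', 'coppola'))]
print(variablize(path))  # [('WorkedFor', ('A', 'B')), ('Movie', ('C', 'B'))]
```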

45 Weight Revision [Pipeline recap: after M-TAMAR maps the UW-CSE clause Publication(T,A) ∧ AdvisedBy(A,B) → Publication(T,B) into Movie(T,A) ∧ WorkedFor(A,B) → Movie(T,B) and R-TAMAR revises its structure on the target (IMDB) data, MLN weight training assigns the final weights, e.g. 0.8 Movie(T,A) ∧ WorkedFor(A,B) ∧ Relative(A,B) → Movie(T,B).]

46 Experiments: Domains UW-CSE –Data about members of the UW CSE department –Predicates include Professor, Student, AdvisedBy, TaughtBy, Publication, etc. IMDB –Data about 20 movies –Predicates include Actor, Director, Movie, WorkedFor, Genre, etc. WebKB –Entity relations from the original WebKB domain (Craven et al. 1998) –Predicates include Faculty, Student, Project, CourseTA, etc.

47 Dataset Statistics Data is organized as mega-examples. Each mega-example contains information about a group of related entities. Mega-examples are independent and disconnected from each other.

Data Set  # Mega-Examples  # Constants  # Types  # Predicates  # True Gliterals  Total # Gliterals
IMDB      —                —            —        —             1,540             32,615
UW-CSE    5                —            —        —             2,673             678,899
WebKB     4                1,700        3        6             2,065             688,193

48 Manually Developed Source KB UW-KB is a hand-built knowledge base (set of clauses) for the UW-CSE domain. When used as a source domain, transfer learning is a form of theory refinement that also includes mapping to a new domain with a different representation.

49 Systems Compared TAMAR: Complete transfer system. ScrKD: Algorithm of Kok & Domingos (2005) learning from scratch. TrKD: Algorithm of Kok & Domingos (2005) performing transfer, using M-TAMAR to produce a mapping.

50 Methodology: Training & Testing Generated learning curves using leave-one-out CV –Each run keeps one mega-example for testing and trains on the remaining ones, provided one by one. –Curves are averages over all runs. Evaluated learned MLN by performing inference for all gliterals of each predicate in turn, providing the rest as evidence, and averaging the results.

51 Methodology: Metrics Following Kok & Domingos (2005): CLL: Conditional Log Likelihood –The log of the probability predicted by the model that a gliteral has the correct truth value given in the data. –Averaged over all test gliterals. AUC: Area under the precision-recall (PR) curve –Produce a PR curve by varying the probability threshold. –Compute the area under this curve.
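
For concreteness, here is a small sketch of both metrics computed from per-gliteral predicted probabilities, using scikit-learn for the PR curve (the arrays are invented for illustration):

```python
# Sketch: CLL and AUC-PR from predicted probabilities of test gliterals.
import numpy as np
from sklearn.metrics import precision_recall_curve, auc

y_true = np.array([1, 0, 1, 1, 0, 0, 1])                 # true truth values of test gliterals
p_true = np.array([0.9, 0.2, 0.7, 0.6, 0.4, 0.1, 0.8])   # P(gliteral is true) from the MLN

# CLL: average log-probability assigned to the *correct* truth value.
eps = 1e-12
p_correct = np.where(y_true == 1, p_true, 1.0 - p_true)
cll = np.mean(np.log(np.clip(p_correct, eps, 1.0)))

# AUC of the precision-recall curve, sweeping the probability threshold.
precision, recall, _ = precision_recall_curve(y_true, p_true)
auc_pr = auc(recall, precision)

print(f"CLL = {cll:.3f}, AUC-PR = {auc_pr:.3f}")
```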

52 Metrics to Summarize Curves Transfer Ratio (Cohen et al. 2007) –Gives an overall idea of the improvement achieved over learning from scratch.
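
The defining equation was an image on the slide; the usual definition, which appears to be what is meant here, is the ratio of the areas under the two learning curves:

\[ \text{transfer ratio} \;=\; \frac{\text{area under the learning curve with transfer}}{\text{area under the learning curve without transfer}} \]

so a value greater than 1 indicates positive transfer and a value below 1 indicates negative transfer.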

53 Transfer Scenarios Source/target pairs tested: –WebKB → IMDB –UW-CSE → IMDB –UW-KB → IMDB –WebKB → UW-CSE –IMDB → UW-CSE WebKB not used as a target since one mega-example is sufficient to learn an accurate theory for its limited predicate set.

54 [Results chart; regions labeled Positive Transfer and Negative Transfer.]

55 [Results chart; regions labeled Positive Transfer and Negative Transfer.]

56 Sample Learning Curve [Learning curves comparing ScrKD; TrKD with hand mapping; TAMAR with hand mapping; TrKD; and TAMAR.]

57 Number of seconds to find best mapping: [results table]

58 Future Research Issues More realistic application domains. Application to other SRL models (e.g. SLPs, BLPs). More flexible predicate mapping: –Allow argument ordering or arity to change. –Map one predicate to a conjunction of more than one predicate, e.g. AdvisedBy(X,Y) → Movie(M,X) ∧ Director(M,Y)

59 Multiple Source Transfer Transfer from multiple source problems to a given target problem. Determine which clauses to map and revise from different source MLNs.

60 Source Selection Select useful source domains from a large number of previously learned tasks. Ideally, picking source domain(s) is sub-linear in the number of previously learned tasks.

61 Conclusions Presented TAMAR, a complete transfer system for SRL that: –Maps relational knowledge in the source to the target domain. –Revises the mapped knowledge to further improve accuracy. Showed experimentally that TAMAR improves speed and accuracy over existing methods.

62 Questions? Related papers at:

63 Why MLNs? Inherit the expressivity of first-order logic –Can apply insights from ILP Inherit the flexibility of probabilistic graphical models –Can deal with noisy & uncertain environments Undirected models –Do not need to learn causal directions Subsume all other SRL models that are special cases of first-order logic or probabilistic graphical models [Richardson 04] Publicly available software package: Alchemy

64 Predicate Mapping Comments A particular source predicate can be mapped to different target predicates in different clauses. –This makes our approach context sensitive. Mapping clause by clause is also more scalable: –In the worst case, the number of mappings is exponential in the number of predicates. –The number of predicates in a clause is generally much smaller than the total number of predicates in a domain.

65 Relationship to Structure Mapping Engine (Falkenhainer et al., 1989) A system for mapping relations using analogy based on a psychological theory. Mappings are evaluated based only on the structural relational similarity between the two domains. Does not consider the accuracy of mapped knowledge in the target when determining the preferred mapping. Determines a single global mapping for a given source & target.

66 Summary of Methodology 1. Learn MLNs for each point on the learning curve 2. Perform inference over the learned models 3. Summarize inference results using 2 metrics, CLL and AUC, thus producing two learning curves 4. Summarize each learning curve using the transfer ratio and percentage improvement from one mega-example
