
1 Learning to “Read Between the Lines” using Bayesian Logic Programs. Sindhu Raghavan, Raymond Mooney, and Hyeonseo Ku. The University of Texas at Austin, July 2012

2 Information Extraction
Information extraction (IE) systems extract factual information that occurs in text [Cowie and Lehnert, 1996; Sarawagi, 2008]
Natural language text is typically “incomplete”
– Commonsense information is not explicitly stated
– Easily inferred facts are omitted from the text
Human readers use commonsense knowledge and “read between the lines” to infer implicit information
IE systems have no access to commonsense knowledge and hence cannot infer implicit information

3 Example
Natural language text: “Barack Obama is the President of the United States of America.”
Query: “Barack Obama is a citizen of what country?”
IE systems cannot answer this query since citizenship information is not explicitly stated!

4 Objective
Infer implicit facts from explicitly stated information
– Extract explicitly stated facts using an IE system
– Learn commonsense knowledge in the form of logical rules to deduce additional facts
– Employ models from statistical relational learning (SRL) that allow probabilities to be estimated using well-founded probabilistic graphical models

5 Related Work
Learning propositional rules [Nahm and Mooney, 2000]
– Learn propositional rules from the output of an IE system on computer-related job postings
– Perform logical deduction to infer new facts
– Purely logical deduction is brittle: cannot assign probabilities or confidence estimates to inferences

6 Related Work
Learning first-order rules
– Logical deduction using probabilistic rules [Carlson et al., 2010; Doppa et al., 2010]
  Modify existing rule learners like FOIL and FARMER to learn probabilistic rules
  Probabilities are not computed using well-founded probabilistic graphical models
– Approaches based on Markov Logic Networks (MLNs) [Domingos and Lowd, 2009] used to infer additional facts [Schoenmackers et al., 2010; Sorower et al., 2011]
  The grounding process can result in intractably large networks for large domains

7 Related Work
Learning for Textual Entailment [Lin and Pantel, 2001; Yates and Etzioni, 2007; Berant et al., 2011]
– Textual entailment rules have a single antecedent in the body of the rule
– Approaches from statistical relational learning have not been applied so far
– Do not use extractions from a traditional IE system to learn rules

8 Our Approach
Use an off-the-shelf IE system to extract facts
Learn commonsense knowledge from the extracted facts in the form of probabilistic first-order rules
Infer additional facts based on the learned rules using Bayesian Logic Programs (BLPs) [Kersting and De Raedt, 2001]

9 System Architecture
Pipeline: Training Documents → Information Extractor (IBM SIRE) → Extracted Facts → Inductive Logic Programming (LIME) → First-Order Logical Rules → BLP Weight Learner (a version of EM) → Bayesian Logic Program (BLP). At test time: Test Document → Extractions → BLP Inference Engine → Inferences with probabilities.
Example training text: “... Barack Obama is the current President of USA ... Obama was born on August 4, 1961, in Hawaii, USA ...”
Extracted facts: nationState(USA), Person(BarackObama), isLedBy(USA,BarackObama), hasBirthPlace(BarackObama,USA), hasCitizenship(BarackObama,USA)
Learned rules: nationState(B) ∧ isLedBy(B,A) ⇒ hasCitizenship(A,B); nationState(B) ∧ employs(B,A) ⇒ hasCitizenship(A,B)
BLP clauses with learned weights: hasCitizenship(A,B) | nationState(B), isLedBy(B,A) [0.9]; hasCitizenship(A,B) | nationState(B), employs(B,A) [0.6]
Test extractions: nationState(malaysian), Person(mahathir-mohamad), isLedBy(malaysian,mahathir-mohamad), employs(malaysian,mahathir-mohamad)
Inference: hasCitizenship(mahathir-mohamad, malaysian) with probability 0.75

10 Bayesian Logic Programs [Kersting and De Raedt, 2001]
Set of Bayesian clauses a | a1, a2, ..., an
– Definite clauses in first-order logic, universally quantified
– Head of the clause: a
– Body of the clause: a1, a2, ..., an
– Associated conditional probability table (CPT): P(head | body)
Bayesian predicates a, a1, a2, ..., an have finite domains
– Combining rule, such as noisy-or, for mapping multiple CPTs into a single CPT
Given a set of Bayesian clauses and a query, SLD resolution is used to construct ground Bayesian networks for probabilistic inference
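As an illustration of what a Bayesian clause carries (a minimal sketch; the class and field names below are hypothetical and not the actual BLP engine's data structures), the two citizenship clauses from the architecture slide can be written as a head literal, a list of body literals, and a noisy-or parameter:

```python
from dataclasses import dataclass

# Minimal, hypothetical representation of a Bayesian clause:
# head | body_1, ..., body_n with an associated noisy-or parameter.
# Literals are (predicate_name, args) tuples; uppercase args are logical variables.

Literal = tuple  # (predicate_name, args)

@dataclass
class BayesianClause:
    head: Literal
    body: list             # list of Literal
    noisy_or_param: float  # P(head | body) used by the noisy-or combining rule

# The two citizenship clauses from the running example.
rule1 = BayesianClause(
    head=("hasCitizenship", ("A", "B")),
    body=[("nationState", ("B",)), ("isLedBy", ("B", "A"))],
    noisy_or_param=0.9,
)
rule2 = BayesianClause(
    head=("hasCitizenship", ("A", "B")),
    body=[("nationState", ("B",)), ("employs", ("B", "A"))],
    noisy_or_param=0.6,
)
```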

11 Why BLPs?
Pure logical deduction is brittle and results in many undifferentiated inferences
Inference in BLPs is probabilistic, i.e., inferences are assigned probabilities
– Probabilities can be used to select only high-confidence inferences (see the sketch below)
The efficient grounding mechanism in BLPs enables our approach to scale
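For example (an illustrative snippet with a made-up competing inference and probability values, not output from the actual system), thresholding on the marginal probability keeps only the high-confidence inferences:

```python
# Keep only inferences whose marginal probability exceeds a confidence threshold.
# The facts and probabilities below are illustrative only.
inferences = [
    ("hasCitizenship(mahathir-mohamad, malaysian)", 0.75),
    ("hasMemberPerson(malaysian, mahathir-mohamad)", 0.40),
]

threshold = 0.7
high_confidence = [(fact, p) for fact, p in inferences if p >= threshold]
print(high_confidence)  # [('hasCitizenship(mahathir-mohamad, malaysian)', 0.75)]
```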

12 Inductive Logic Programming (ILP) for learning first-order rules
Inputs to the ILP rule learner:
– Target relation: hasCitizenship(X,Y)
– Positive instances: hasCitizenship(BarackObama,USA), hasCitizenship(GeorgeBush,USA), hasCitizenship(IndiraGandhi,India), ...
– Negative instances: hasCitizenship(BarackObama,India), hasCitizenship(GeorgeBush,India), hasCitizenship(IndiraGandhi,USA), ... (generated using the closed-world assumption; see the sketch below)
– KB: hasBirthPlace(BarackObama,USA), person(BarackObama), nationState(USA), nationState(India), ...
Output rules: nationState(Y) ∧ isLedBy(Y,X) ⇒ hasCitizenship(X,Y)
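A minimal sketch of the closed-world negative generation mentioned above (assuming a small typed domain of persons and countries; this is not the actual LIME input format): every person-country pair that is not a known positive instance of the target relation is treated as a negative instance.

```python
# Hypothetical illustration of closed-world negative generation for the
# target relation hasCitizenship(Person, Country): any pair not known to be
# positive is assumed negative.

persons = ["BarackObama", "GeorgeBush", "IndiraGandhi"]
countries = ["USA", "India"]

positives = {
    ("BarackObama", "USA"),
    ("GeorgeBush", "USA"),
    ("IndiraGandhi", "India"),
}

negatives = {
    (person, country)
    for person in persons
    for country in countries
    if (person, country) not in positives
}

print(sorted(negatives))
# [('BarackObama', 'India'), ('GeorgeBush', 'India'), ('IndiraGandhi', 'USA')]
```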

13 Inference using BLPs
Test document: “Malaysian Prime Minister Mahathir Mohamad Wednesday announced for the first time that he has appointed his deputy Abdullah Ahmad Badawi as his successor.”
Extracted facts: nationState(malaysian), Person(mahathir-mohamad), isLedBy(malaysian,mahathir-mohamad), employs(malaysian,mahathir-mohamad)
Learned rules: nationState(B) ∧ isLedBy(B,A) ⇒ hasCitizenship(A,B); nationState(B) ∧ employs(B,A) ⇒ hasCitizenship(A,B)

14 Logical Inference in BLPs
Rule 1: nationState(B) ∧ isLedBy(B,A) ⇒ hasCitizenship(A,B)
Matching facts: nationState(malaysian), isLedBy(malaysian,mahathir-mohamad)
Derived fact: hasCitizenship(mahathir-mohamad, malaysian)

15 Logical Inference in BLPs
Rule 2: nationState(B) ∧ employs(B,A) ⇒ hasCitizenship(A,B)
Matching facts: nationState(malaysian), employs(malaysian,mahathir-mohamad)
Derived fact: hasCitizenship(mahathir-mohamad, malaysian)
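The two slides above each match a rule body against the extracted ground facts. A toy sketch of that matching step (the actual BLP engine uses SLD resolution backward from a query; this simply shows which ground rule instances fire on the Mahathir Mohamad example):

```python
# Toy illustration: apply the two learned rules to the extracted facts by
# simple pattern matching and collect the ground derivations.

facts = {
    ("nationState", ("malaysian",)),
    ("Person", ("mahathir-mohamad",)),
    ("isLedBy", ("malaysian", "mahathir-mohamad")),
    ("employs", ("malaysian", "mahathir-mohamad")),
}

# nationState(B) ∧ isLedBy(B,A) ⇒ hasCitizenship(A,B)   (parameter 0.9)
# nationState(B) ∧ employs(B,A) ⇒ hasCitizenship(A,B)   (parameter 0.6)
rules = [("isLedBy", 0.9), ("employs", 0.6)]

derivations = []
for second_pred, param in rules:
    for pred, args in facts:
        if pred == second_pred and ("nationState", (args[0],)) in facts:
            country, person = args
            derivations.append((("hasCitizenship", (person, country)), param))

for fact, param in derivations:
    print(fact, param)
# ('hasCitizenship', ('mahathir-mohamad', 'malaysian')) 0.9
# ('hasCitizenship', ('mahathir-mohamad', 'malaysian')) 0.6
```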

16 Probabilistic inference in BLPs
Ground Bayesian network: the body of each ground rule (nationState(malaysian), isLedBy(malaysian, mahathir-mohamad); and nationState(malaysian), employs(malaysian, mahathir-mohamad)) feeds a logical-and node, and the two resulting nodes are combined through a noisy-or node to give the marginal probability of hasCitizenship(mahathir-mohamad, malaysian).
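A minimal numeric sketch of the noisy-or step at the top of this network, using the illustrative rule parameters 0.9 and 0.6 from the architecture slide (the marginal of 0.75 reported there also depends on the learned CPTs and priors over the evidence nodes, which are omitted here):

```python
# Noisy-or combination of the two ground rules that both derive
# hasCitizenship(mahathir-mohamad, malaysian). Assuming both rule bodies
# (the logical-and nodes) are true, P(head) = 1 - prod_r (1 - p_r).
rule_params = [0.9, 0.6]  # isLedBy rule, employs rule

p_not_head = 1.0
for p in rule_params:
    p_not_head *= (1.0 - p)

print(round(1.0 - p_not_head, 2))  # 0.96
```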

17 Sample rules learned
governmentOrganization(A) ∧ employs(A,B) ⇒ hasMember(A,B)
eventLocation(A,B) ∧ bombing(A) ⇒ thingPhysicallyDamage(A,B)
isLedBy(A,B) ⇒ hasMemberPerson(A,B)

18 Experimental Evaluation
Data
– DARPA's intelligence community (IC) data set from the Machine Reading Project (MRP)
– Consists of news articles on politics, terrorism, and other international events
– 10,000 documents in total
Perform 10-fold cross validation

19 Experimental Evaluation
Learning first-order rules using LIME [McCreath and Sharma, 1998]
– Learn rules for 13 target relations
– Learn rules using both positive and negative instances, and using only positive instances
– Include all unique rules learned from the different models
Learning BLP parameters
– Learn noisy-or parameters using Expectation Maximization (EM)
– Set priors to maximum likelihood estimates

20 Experimental Evaluation
Performance evaluation
– Manually evaluated inferred facts from 40 documents randomly selected from each test set
– Compute two precision scores:
  Unadjusted (UA) – does not account for the extractor's mistakes
  Adjusted (AD) – accounts for the extractor's mistakes
– Rank inferences using marginal probabilities and evaluate the top n

21 Experimental Evaluation
Systems compared:
– BLP Learned Weights: noisy-or parameters learned using online EM
– BLP Manual Weights: noisy-or parameters set to 0.9
– Logical Deduction
– MLN Learned Weights: weights learned using a generative online weight learner
– MLN Manual Weights: a weight of 10 assigned to all rules and MLE priors to all predicates

22 Unadjusted Precision (results chart)

23 Adjusted Precision (results chart)

24 Future Work
Improve the performance of weight learning for BLPs and MLNs
– Learn parameters on larger data sets
Improve the performance of MLNs
– Use the open-world assumption for learning
– Add constraints to prevent inference of facts like employs(a,a)
– Specialize predicates that do not have strictly defined types
Develop an online rule learner that can learn rules from uncertain training data

25 Conclusions
Efficient learning of probabilistic first-order rules that represent common sense knowledge using extractions from an IE system
Inference of implicitly stated facts with high precision using BLPs
Superior performance of BLPs over purely logical deduction and MLNs

26 Questions?

27 Backup

28 Results for Logical Deduction
Unadjusted (UA) precision: 29.73% (443/1490)
Adjusted (AD) precision: 35.24% (443/1257)
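A rough sketch of how the two scores relate (the interpretation of the adjusted denominator as removing the 1490 − 1257 = 233 inferences attributed to extraction errors is an assumption based on the counts above, not the exact evaluation script):

```python
def unadjusted_precision(correct, total):
    # UA: fraction of all inferences judged correct.
    return correct / total

def adjusted_precision(correct, total, extractor_errors):
    # AD: discount inferences that are wrong only because the extractor
    # supplied an incorrect fact (assumed interpretation).
    return correct / (total - extractor_errors)

correct, total = 443, 1490
extractor_errors = 1490 - 1257  # 233

print(round(100 * unadjusted_precision(correct, total), 2))                  # 29.73
print(round(100 * adjusted_precision(correct, total, extractor_errors), 2))  # 35.24
```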

29 Experimental Evaluation
Learning BLP parameters
– Use the logical-and model to combine evidence from the conjuncts in the body of a clause
– Use the noisy-or model to combine evidence from several ground rules that have the same head
– Learn noisy-or parameters using Expectation Maximization (EM)
– Set priors to maximum likelihood estimates
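A heavily simplified sketch of EM for noisy-or parameters (a toy, single-head illustration under the model P(head | fired rules) = 1 − ∏(1 − p_r); the real BLP weight learner operates over the full ground networks and is not shown here):

```python
# Toy EM for the noisy-or parameters of K rules sharing the same head predicate.
# Each training example is (active, y): 'active' lists the rules whose bodies
# were satisfied, and y says whether the head fact was actually true.
# Model: P(y = 1 | active) = 1 - prod_{r in active} (1 - p[r]).

def em_noisy_or(examples, num_rules, iters=50):
    p = [0.5] * num_rules                       # initial parameters
    for _ in range(iters):
        expected_success = [0.0] * num_rules    # E[rule r "caused" the head]
        body_fired = [0.0] * num_rules          # times rule r's body held
        for active, y in examples:
            prob_y1 = 1.0
            for r in active:
                prob_y1 *= (1.0 - p[r])
            prob_y1 = 1.0 - prob_y1
            for r in active:
                body_fired[r] += 1
                if y and prob_y1 > 0:
                    # E-step: P(rule r succeeded | head is true)
                    expected_success[r] += p[r] / prob_y1
                # if y == 0, no rule succeeded, so its expected success is 0
        # M-step: expected successes divided by the times the body held
        p = [expected_success[r] / body_fired[r] if body_fired[r] else p[r]
             for r in range(num_rules)]
    return p

# Synthetic data: rule 0's body usually leads to a true head, rule 1's rarely.
examples = ([([0], 1)] * 9 + [([0], 0)] * 1 +
            [([1], 1)] * 2 + [([1], 0)] * 8)
print([round(x, 2) for x in em_noisy_or(examples, 2)])  # [0.9, 0.2]
```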

