1 Statistical Relational Learning for Knowledge Extraction from the Web. Hoifung Poon, Dept. of Computer Science & Eng., University of Washington
2 "Drowning in Information, Starved for Knowledge" [figure: the Web]
3 Great Vision: Knowledge Extraction from Web Also need: Knowledge representation and reasoning Close the loop: Apply knowledge to extraction Machine reading [Etzioni et al., 2007] Craven et al., “Learning to Construct Knowledge Bases from the World Wide Web," Artificial Intelligence, 1999. 3
4 Machine Reading: Text → Knowledge
5 Rapidly Growing Interest AAAI-07 Spring Symposium on Machine Reading DARPA Machine Reading Program (2009-2014) NAACL-10 Workshop on Learning By Reading Etc. 5
6 Great Impact
Scientific inquiry and commercial applications: literature-based discovery, robot scientists; question answering, semantic search; drug design, medical diagnosis
Breach the knowledge acquisition bottleneck for AI and natural language understanding
Automatically semantify the Web
Etc.
7 This Talk
Statistical relational learning offers promising solutions to machine reading
Markov logic is a leading unifying framework
A success story: USP
Unsupervised, end-to-end machine reading
Extracts five times as many correct answers as the state of the art, with the highest accuracy (91%)
8 USP: Question-Answer Example
Q: What does IL-2 control?
A: The DEX-mediated IkappaBalpha induction
Interestingly, the DEX-mediated IkappaBalpha induction was completely inhibited by IL-2, but not IL-4, in Th1 cells, while the reverse profile was seen in Th2 cells.
9 Overview Machine reading: Challenges Statistical relational learning Markov logic USP: Unsupervised Semantic Parsing Research directions
10 Key Challenges Complexity Uncertainty Pipeline accumulates errors Supervision is scarce 10
11 Languages Are Structural
IL-4 induces CD11B
Involvement of p70(S6)-kinase activation in IL-10 up-regulation in human monocytes by gp41 ...
George Walker Bush was the 43rd President of the United States. ... Bush was the eldest son of President G. H. W. Bush and Barbara Bush. ... In November 1977, he met Laura Welch at a barbecue.
governments / lm$pxtm (Hebrew: according to their families)
12 Languages Are Structural
govern-ment-s / l-m$px-t-m (Hebrew: according to their families)
[parse tree: S → NP VP] IL-4 induces CD11B
Involvement of p70(S6)-kinase activation in IL-10 up-regulation in human monocytes by gp41 ...
[event structure: involvement, with Theme: up-regulation (Theme: IL-10, Site: human monocyte, Cause: gp41) and Cause: p70(S6)-kinase activation]
George Walker Bush was the 43rd President of the United States. ... Bush was the eldest son of President G. H. W. Bush and Barbara Bush. ... In November 1977, he met Laura Welch at a barbecue.
13 Knowledge Is Heterogeneous
Individuals, e.g.: Socrates is a man
Types, e.g.: Man is mortal
Inference rules, e.g.: Syllogism
Ontological relations, etc.
[ontology figure: HUMAN ISA MAMMAL; EYE ISPART FACE]
14 Complexity Can handle using first-order logic Trees, graphs, dependencies, hierarchies, etc. easily expressed Inference algorithms (satisfiability testing, theorem proving, etc.) But … logic is brittle with uncertainty
15 Languages Are Ambiguous
I saw the man with the telescope (two parses: who has the telescope?)
G. W. Bush ... Laura Bush ... Mrs. Bush ... Which one?
Here in London, Frances Deek is a retired teacher ... In the Israeli town ..., Karen London says ... Now London says ... London: PERSON or LOCATION?
Microsoft buys Powerset / Microsoft acquires Powerset / Powerset is acquired by Microsoft Corporation / The Redmond software giant buys Powerset / Microsoft's purchase of Powerset, ...
16 Knowledge Has Uncertainty We need to model correlations Our information is always incomplete Our predictions are uncertain
17 Uncertainty
Statistics provides the tools to handle this: mixture models, hidden Markov models, Bayesian networks, Markov random fields, maximum entropy models, conditional random fields, etc.
But ... statistical models assume i.i.d. (independently and identically distributed) data: objects → feature vectors
18 Pipeline is Suboptimal E.g., NLP pipeline: Tokenization Morphology Chunking Syntax … Accumulates and propagates errors Wanted: Joint inference Across all processing stages Among all interdependent objects 18
19 Supervision is Scarce Tons of text … but most is not annotated Labeling is expensive (Cf. Penn-Treebank) Need to leverage indirect supervision 19
20 Redundancy Key source of indirect supervision State-of-the-art systems depend on this E.g., TextRunner [Banko et al., 2007] But … Web is heterogeneous: Long tail Redundancy only present in head regime
21 Overview Machine reading: Challenges Statistical relational learning Markov logic USP: Unsupervised Semantic Parsing Research directions 21
22 Statistical Relational Learning Burgeoning field in machine learning Offers promising solutions for machine reading Unify statistical and logical approaches Replace pipeline with joint inference Principled framework to leverage both direct and indirect supervision 22
23 Machine Reading: A Vision Challenge: Long tail
24 Machine Reading: A Vision
25 Challenges in Applying Statistical Relational Learning Learning is much harder Inference becomes a crucial issue Greater complexity for user
26 Progress to Date Probabilistic logic [Nilsson, 1986] Statistics and beliefs [Halpern, 1990] Knowledge-based model construction [Wellman et al., 1992] Stochastic logic programs [Muggleton, 1996] Probabilistic relational models [Friedman et al., 1999] Relational Markov networks [Taskar et al., 2002] Markov logic [Domingos & Lowd, 2009] Etc.
27 Progress to Date Probabilistic logic [Nilsson, 1986] Statistics and beliefs [Halpern, 1990] Knowledge-based model construction [Wellman et al., 1992] Stochastic logic programs [Muggleton, 1996] Probabilistic relational models [Friedman et al., 1999] Relational Markov networks [Taskar et al., 2002] Markov logic [Domingos & Lowd, 2009] Etc. Leading unifying framework
28 Overview Machine reading Statistical relational learning Markov logic USP: Unsupervised Semantic Parsing Research directions 28
29 Markov Networks
Undirected graphical models
Log-linear model: P(x) = (1/Z) exp(sum_i w_i f_i(x)), where w_i is the weight of feature i and f_i(x) is feature i
[example network over Smoking, Cancer, Asthma, Cough]
30 First-Order Logic Constants, variables, functions, predicates E.g.: Anna, x, MotherOf(x), Friends(x,y) Grounding: Replace all variables by constants E.g.: Friends (Anna, Bob) World (model, interpretation): Assignment of truth values to all ground predicates 30
31 Markov Logic
Intuition: Soften logical constraints
Syntax: Weighted first-order formulas
Semantics: Feature templates for Markov networks
A Markov Logic Network (MLN) is a set of pairs (F_i, w_i), where F_i is a formula in first-order logic and w_i is a real number
P(x) = (1/Z) exp(sum_i w_i n_i(x)), where n_i(x) is the number of true groundings of F_i in x
32 Example: Friends & Smokers 32
35 Example: Friends & Smokers
Two constants: Anna (A) and Bob (B)
Ground atoms: Smokes(A), Smokes(B), Cancer(A), Cancer(B), Friends(A,A), Friends(A,B), Friends(B,A), Friends(B,B)
Probabilistic graphical models and first-order logic are special cases
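The grounded Friends & Smokers network above is small enough to handle by brute force. The sketch below is ours, not code from the talk: the two weighted formulas and their weights are illustrative stand-ins for the figures lost from these slides. It enumerates all 2^8 worlds, computes the partition function Z, and reads off a marginal.

```python
from itertools import product
import math

CONSTS = ["A", "B"]
ATOMS = ([f"Smokes({c})" for c in CONSTS] +
         [f"Cancer({c})" for c in CONSTS] +
         [f"Friends({x},{y})" for x in CONSTS for y in CONSTS])

# Illustrative weights (not stated on the slides) for the classic
# Friends & Smokers soft clauses.
W_SMOKE_CANCER = 1.5   # Smokes(x) => Cancer(x)
W_FRIENDS_ALIKE = 1.1  # Friends(x,y) => (Smokes(x) <=> Smokes(y))

def n_true_groundings(world):
    """Return (n1, n2): true groundings of each formula in a world."""
    n1 = sum(1 for x in CONSTS
             if not world[f"Smokes({x})"] or world[f"Cancer({x})"])
    n2 = sum(1 for x in CONSTS for y in CONSTS
             if not world[f"Friends({x},{y})"]
             or world[f"Smokes({x})"] == world[f"Smokes({y})"])
    return n1, n2

def weight(world):
    n1, n2 = n_true_groundings(world)
    return math.exp(W_SMOKE_CANCER * n1 + W_FRIENDS_ALIKE * n2)

WORLDS = [dict(zip(ATOMS, vals))
          for vals in product([False, True], repeat=len(ATOMS))]
Z = sum(weight(w) for w in WORLDS)

def marginal(atom):
    return sum(weight(w) for w in WORLDS if w[atom]) / Z

# The soft clause Smokes(x) => Cancer(x) favors worlds with Cancer(A),
# so this marginal exceeds 0.5.
print(round(marginal("Cancer(A)"), 3))
```

At this scale exhaustive enumeration is exact; the point of the MLN machinery on the following slides is to approximate the same quantities when enumeration is impossible.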
36 MLN Algorithms: The First Three Generations
Problem | First generation | Second generation | Third generation
MAP inference | Weighted satisfiability | Lazy inference | Cutting planes
Marginal inference | Gibbs sampling | MC-SAT | Lifted inference
Weight learning | Pseudo-likelihood | Voted perceptron | Scaled conj. gradient
Structure learning | Inductive logic progr. | ILP + PL (etc.) | Clustering + pathfinding
37 Efficient Inference
Logical or statistical inference alone is already hard
But ... approximate inference suffices to perform well in most cases
Combine ideas from both camps, e.g., MC-SAT = MCMC + SAT solver
Can also leverage sparsity in relational domains
More: Poon & Domingos, "Sound and Efficient Inference with Probabilistic and Deterministic Dependencies", in Proc. AAAI-2006.
More: Poon, Domingos & Sumner, "A General Method for Reducing the Complexity of Relational Inference and its Application to MCMC", in Proc. AAAI-2008.
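The MC-SAT step can itself be sketched on a toy ground network. Everything below is our illustration, not code from the talk: the two soft clauses and their weights are made up, and exhaustive enumeration stands in for the SAT-based uniform sampler used at scale. Each step keeps a currently satisfied clause with probability 1 - exp(-w), then samples uniformly among worlds satisfying the kept clauses.

```python
import math
import random
from itertools import product

random.seed(0)

# Toy ground MLN over two atoms P, Q: soft clauses as (weight, test).
# Weights are illustrative only.
CLAUSES = [
    (1.5, lambda s: s["P"] or not s["Q"]),   # Q => P
    (0.8, lambda s: s["Q"]),                 # Q
]
ATOMS = ["P", "Q"]
WORLDS = [dict(zip(ATOMS, v)) for v in product([False, True], repeat=2)]

def weight(s):
    return math.exp(sum(w for w, c in CLAUSES if c(s)))

Z = sum(weight(s) for s in WORLDS)
exact_pq = sum(weight(s) for s in WORLDS if s["Q"]) / Z  # exact marginal P(Q)

def mcsat_step(state):
    # Keep each satisfied clause with prob 1 - exp(-w); then sample
    # uniformly from worlds satisfying the kept set (enumeration here
    # plays the role of the SAT-based sampler).
    kept = [c for w, c in CLAUSES
            if c(state) and random.random() < 1 - math.exp(-w)]
    candidates = [s for s in WORLDS if all(c(s) for c in kept)]
    return random.choice(candidates)

state = WORLDS[0]
hits = 0
N = 20000
for _ in range(N):
    state = mcsat_step(state)
    hits += state["Q"]
est = hits / N
print(round(exact_pq, 3), round(est, 3))  # the estimate tracks the exact value
```

With only four worlds the exact marginal is computable directly, which makes it easy to check that the sampler converges to the right distribution.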
38 Weight Learning Probability model P(X) X: Observable in training data Maximize likelihood of observed data Regularization to prevent overfitting
39 Weight Learning
Gradient descent: d/dw_i log P(x) = n_i(x) - E[n_i(x)], i.e., (no. of times clause i is true in the data) minus (expected no. of times clause i is true according to the MLN)
Computing the expectation requires inference: use MC-SAT
Can also leverage second-order information [Lowd & Domingos, 2007]
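The gradient above can be exercised on a one-clause toy model where the expectation is computable exactly by enumeration (our construction, for illustration; at realistic scale the expectation comes from MC-SAT samples).

```python
import math
from itertools import product

# One clause "Q" over atoms P, Q; we fit its weight so the model's
# expected count matches an (assumed) empirical count of 0.9.
WORLDS = [dict(P=p, Q=q) for p, q in product([False, True], repeat=2)]

def n_i(world):
    """No. of true groundings of the clause 'Q' in a world (0 or 1)."""
    return 1 if world["Q"] else 0

def expected_n(w):
    """E[n_i] under the MLN with weight w, by exact enumeration."""
    scores = [math.exp(w * n_i(s)) for s in WORLDS]
    Z = sum(scores)
    return sum(sc * n_i(s) for sc, s in zip(scores, WORLDS)) / Z

data_count = 0.9          # assumed observed fraction; illustrative
w, lr = 0.0, 0.5
for _ in range(2000):
    w += lr * (data_count - expected_n(w))   # gradient of log-likelihood
print(round(expected_n(w), 3))  # → 0.9
```

At convergence the model's expected count equals the empirical count, which is exactly the fixed point of the gradient on this slide.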
40 Unsupervised Learning: How?
I.I.D. learning: Sophisticated model requires more labeled data
Statistical relational learning: Sophisticated model may require less labeled data
Ambiguities vary among objects; joint inference propagates information from unambiguous objects to ambiguous ones
One formula is worth a thousand labels
Small amount of domain knowledge → large-scale joint inference
41 Unsupervised Weight Learning Probability model P(X,Z) X: Observed in training data Z: Hidden variables E.g., clustering with mixture models Z : Cluster assignment X : Observed features Maximize likelihood of observed data by summing out hidden variables Z
42 Unsupervised Weight Learning
Gradient descent: d/dw_i log P(x) = E_{z|x}[n_i] - E_{x,z}[n_i], i.e., the expectation summed over z conditioned on the observed x, minus the expectation summed over both x and z
Use MC-SAT to compute both expectations
May also combine with contrastive estimation
More: Poon, Cherry, & Toutanova, "Unsupervised Morphological Segmentation with Log-Linear Models", in Proc. NAACL-2009. Best Paper Award
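The two expectations can be computed side by side on a tiny model with one hidden atom (our construction, illustration only; both enumerations would be replaced by MC-SAT at scale). A fixed prior clause on the hidden atom makes the gradient nonzero at w = 0, showing how the observed data pull the weight up.

```python
import math

def n(x, z):
    """True groundings of the learned clause 'X <=> Z' (0 or 1)."""
    return 1 if x == z else 0

def grad(w, observed_x):
    b = 1.0  # fixed weight for a prior clause "Z" (assumed, illustrative)
    states = [(x, z) for x in (False, True) for z in (False, True)]
    score = {s: math.exp(w * n(*s) + b * s[1]) for s in states}
    # E_{x,z}[n_i]: expectation over the full joint
    Z_all = sum(score.values())
    e_joint = sum(score[s] * n(*s) for s in states) / Z_all
    # E_{z|x}[n_i]: expectation over z with x clamped to the observation
    cond = {z: score[(observed_x, z)] for z in (False, True)}
    Z_c = sum(cond.values())
    e_cond = sum(cond[z] * n(observed_x, z) for z in (False, True)) / Z_c
    return e_cond - e_joint

# At w = 0 the learned clause is uninformative, but observing x = True
# still yields a positive gradient because the prior favors z = True:
print(round(grad(0.0, True), 3))  # → 0.231
```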
43 Markov Logic
Unified inference and learning algorithms
Can handle millions of variables, billions of features, tens of thousands of parameters
Easy-to-use software: Alchemy
Many successful applications, e.g.: information extraction, coreference resolution, semantic parsing, ontology induction
44 Pipeline → Joint Inference
Combine segmentation and entity resolution for information extraction
Extract complex and nested bio-events from PubMed abstracts
More: Poon & Domingos, "Joint Inference for Information Extraction", in Proc. AAAI-2007.
More: Poon & Vanderwende, "Joint Inference for Knowledge Extraction from Biomedical Literature", in Proc. NAACL-2010.
45 Unsupervised Learning: Example Coreference resolution: Accuracy comparable to previous supervised state of the art More: Poon & Domingos, “Joint Unsupervised Coreference Resolution with Markov Logic”, in Proc. EMNLP-2008. 45
46 Overview Machine reading: Challenges Statistical relational learning Markov logic USP: Unsupervised Semantic Parsing Research directions 46
47 Unsupervised Semantic Parsing
USP [Poon & Domingos, EMNLP-09]: first unsupervised approach for semantic parsing; an end-to-end machine reading system: read text, answer questions. Best Paper Award
OntoUSP = USP + Ontology Induction [Poon & Domingos, ACL-10]
Encoded in a few Markov logic formulas
48 Semantic Parsing
Goal: Microsoft buys Powerset → BUY(MICROSOFT, POWERSET)
Challenge: Microsoft buys Powerset / Microsoft acquires semantic search engine Powerset / Powerset is acquired by Microsoft Corporation / The Redmond software giant buys Powerset / Microsoft's purchase of Powerset, ...
49 Limitations of Existing Approaches
Manual grammar or supervised learning: applicable to restricted domains only
For general text: not clear what predicates and objects to use; hard to produce consistent meaning annotation
Also, often learn both syntax and semantics: fails to leverage advanced syntactic parsers and makes semantic parsing harder
50 USP: Key Idea # 1 Target predicates and objects can be learned Viewed as clusters of syntactic or lexical variations of the same meaning BUY(-,-) buys, acquires, ’s purchase of, … Cluster of various expressions for acquisition MICROSOFT Microsoft, the Redmond software giant, … Cluster of various mentions of Microsoft
51 USP: Key Idea # 2 Relational clustering Cluster relations with same objects USP Recursively cluster arbitrary expressions with similar subexpressions Microsoft buys Powerset Microsoft acquires semantic search engine Powerset Powerset is acquired by Microsoft Corporation The Redmond software giant buys Powerset Microsoft’s purchase of Powerset, …
52 USP: Key Idea # 2 Relational clustering Cluster relations with same objects USP Recursively cluster arbitrary expressions with similar subexpressions Microsoft buys Powerset Microsoft acquires semantic search engine Powerset Powerset is acquired by Microsoft Corporation The Redmond software giant buys Powerset Microsoft’s purchase of Powerset, … Cluster same forms at the atom level
53 USP: Key Idea # 2 Relational clustering Cluster relations with same objects USP Recursively cluster arbitrary expressions with similar subexpressions Microsoft buys Powerset Microsoft acquires semantic search engine Powerset Powerset is acquired by Microsoft Corporation The Redmond software giant buys Powerset Microsoft’s purchase of Powerset, … Cluster forms in composition with same forms
56 USP: Key Idea # 3 Start directly from syntactic analyses Focus on translating them to semantics Leverage rapid progress in syntactic parsing Much easier than learning both
57 Joint Inference in USP Forms canonical meaning representation by recursively clustering synonymous expressions Text Logical form in this representation Induces ISA hierarchy among clusters and applies hierarchical smoothing (shrinkage) 57
58 USP: System Overview
Input: Dependency trees for sentences
Converts dependency trees into quasi-logical forms (QLFs)
Starts with QLF clusters at the atom level
Recursively builds up clusters of larger forms
Output: Probability distribution over QLF clusters and their compositions; MAP semantic parses of sentences
59 Generating Quasi-Logical Forms
buys, Microsoft, Powerset (dependency tree with edges nsubj, dobj)
Convert each node into a unary atom
60 Generating Quasi-Logical Forms nsubjdobj n 1, n 2, n 3 are Skolem constants buys(n 1 ) Microsoft(n 2 ) Powerset(n 3 ) 60
61 Generating Quasi-Logical Forms nsubjdobj Convert each edge into a binary atom buys(n 1 ) Microsoft(n 2 ) Powerset(n 3 ) 61
62 Generating Quasi-Logical Forms Convert each edge into a binary atom buys(n 1 ) Microsoft(n 2 ) Powerset(n 3 ) nsubj(n 1,n 2 )dobj(n 1,n 3 ) 62
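The node-and-edge conversion on slides 59-62 is mechanical. A minimal sketch (function and variable names are ours, not USP's): each node becomes a unary atom over a Skolem constant, each labeled edge a binary atom.

```python
def to_qlf(nodes, edges):
    """Build a quasi-logical form from a dependency tree.

    nodes: {id: word}; edges: [(label, head_id, dep_id)].
    """
    skolem = {i: f"n{i}" for i in nodes}          # one Skolem constant per node
    unary = [f"{word}({skolem[i]})" for i, word in nodes.items()]
    binary = [f"{label}({skolem[h]},{skolem[d]})" for label, h, d in edges]
    return unary + binary

qlf = to_qlf({1: "buys", 2: "Microsoft", 3: "Powerset"},
             [("nsubj", 1, 2), ("dobj", 1, 3)])
print(qlf)
# → ['buys(n1)', 'Microsoft(n2)', 'Powerset(n3)', 'nsubj(n1,n2)', 'dobj(n1,n3)']
```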
63 A Semantic Parse buys(n 1 ) Microsoft(n 2 ) Powerset(n 3 ) nsubj(n 1,n 2 )dobj(n 1,n 3 ) Partition QLF into subformulas 63
64 A Semantic Parse buys(n 1 ) Microsoft(n 2 ) Powerset(n 3 ) nsubj(n 1,n 2 )dobj(n 1,n 3 ) Subformula Lambda form: Replace Skolem constant not in unary atom with a unique lambda variable 64
65 A Semantic Parse buys(n 1 ) Microsoft(n 2 ) Powerset(n 3 ) λx 2.nsubj(n 1,x 2 ) Subformula Lambda form: Replace Skolem constant not in unary atom with a unique lambda variable λx 3.dobj(n 1,x 3 ) 65
66 A Semantic Parse buys(n 1 ) Microsoft(n 2 ) Powerset(n 3 ) λx 2.nsubj(n 1,x 2 ) Core form: No lambda variable Argument form: One lambda variable λx 3.dobj(n 1,x 3 ) Core form Argument form 66
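The core-form / argument-form split on slides 64-66 can be sketched as well. The helper below is ours, for illustration only: it naively string-replaces Skolem constants (so e.g. n1 vs n10 would collide), abstracting every Skolem constant other than the subformula's head into a lambda variable; atoms with no lambda variable are core forms.

```python
import re

def lambda_form(atom, head):
    """Abstract non-head Skolem constants of an atom into lambda variables."""
    out = atom
    lams = []
    for a in re.findall(r"n\d+", atom):
        if a != head:
            v = "x" + a[1:]            # n2 -> x2, keeping indices aligned
            out = out.replace(a, v)    # naive textual substitution
            lams.append(v)
    return "".join(f"lambda {v}." for v in lams) + out

print(lambda_form("buys(n1)", "n1"))       # core form, no lambda variable
print(lambda_form("nsubj(n1,n2)", "n1"))   # argument form, one lambda variable
print(lambda_form("dobj(n1,n3)", "n1"))
```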
67 A Semantic Parse buys(n 1 ) Microsoft(n 2 ) Powerset(n 3 ) λx 2.nsubj(n 1,x 2 ) Assign subformula to object cluster λx 3.dobj(n 1,x 3 ) BUY MICROSOFT POWERSET 67
68 Object Cluster: BUY buys(n 1 ) Distribution over core forms 0.1 acquires(n 1 ) 0.2 …… One formula in MLN Learn weights for each pair of cluster and core form 68
69 Object Cluster: BUY buys(n 1 ) May contain variable number of property clusters 0.1 acquires(n 1 ) 0.2 …… BUYER BOUGHT PRICE …… 69
70 Property Cluster: BUYER λx 2.nsubj(n 1,x 2 ) Distributions over argument forms, clusters, and number 0.5 0.4 …… MICROSOFT 0.2 GOOGLE 0.1 …… Zero 0.1 One 0.8 …… λx 2.agent(n 1,x 2 ) 70 Three MLN formulas
71 Probabilistic Model 71 Exponential prior on number of parameters Cluster mixtures: Object Cluster: BUY buys0.1 acquires0.4 … …… Property Cluster: BUYER 0.5 0.4 … MICROSOFT0.2 GOOGLE0.1 … Zero 0.1 One 0.8 … nsubj agent 71
72 Probabilistic Model 72 Exponential prior on number of parameters Cluster mixtures with hierarchical smoothing: Object Cluster: BUY buys0.1 acquires0.4 … …… Property Cluster: BUYER 0.5 0.4 … MICROSOFT0.2 GOOGLE0.1 … Zero 0.1 One 0.8 … nsubj agent E.g., picking MICROSOFT as BUYER argument depends not only on BUY, but also on its ISA ancestors 72
73 Abstract Lambda Form buys(n 1 ) λx 2.nsubj(n 1,x 2 ) λx 3.dobj(n 1,x 3 ) BUYS (n 1 ) λx 2. BUYER (n 1,x 2 ) λx 3. BOUGHT (n 1,x 3 ) Final logical form is obtained via lambda reduction 73
74 Challenge: State Space Too Large
Number of potential clusterings is exponential in the number of tokens
But meaning units and clusters are often small
Use combinatorial search
75 Inference: Find MAP Parse
Initialize, then search; operator: lambda reduction
[example figure: parsing "IL-4 protein induces CD11B" with dependency edges nsubj, dobj, nn]
76 Learning: Greedily Maximize Posterior
Initialize with single-token clusters (e.g., enhances 1.0; induces 1.0; amino 1.0; acid 1.0)
Search operators: MERGE (e.g., merge the "induces" and "enhances" clusters into one: induces 0.2, enhances 0.8) and COMPOSE (e.g., compose "amino" and "acid" into "amino acid" 1.0)
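The two operators can be sketched in isolation (our toy code, not USP's; the scoring by posterior gain that decides when to apply them is omitted). MERGE pools two expression clusters into one distribution over forms; COMPOSE fuses a frequent adjacent token pair into a single meaning unit.

```python
from collections import Counter

def merge(c1, c2):
    """MERGE: pool two clusters' form counts into one distribution."""
    total = Counter(c1) + Counter(c2)
    n = sum(total.values())
    return {form: cnt / n for form, cnt in total.items()}

def compose(tokens, pair):
    """COMPOSE: fuse each occurrence of an adjacent pair into one unit."""
    out, i = [], 0
    while i < len(tokens):
        if tuple(tokens[i:i + 2]) == pair:
            out.append(" ".join(pair))
            i += 2
        else:
            out.append(tokens[i])
            i += 1
    return out

print(merge({"induces": 1}, {"enhances": 4}))
# → {'induces': 0.2, 'enhances': 0.8}
print(compose(["the", "amino", "acid", "sequence"], ("amino", "acid")))
# → ['the', 'amino acid', 'sequence']
```

The merged distribution reproduces the 0.2 / 0.8 split shown on the slide.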
77 Operator: ABSTRACT
INDUCE {induces 0.6, up-regulates 0.2, ...} and INHIBIT {inhibits 0.4, suppresses 0.2, ...}: MERGE them into REGULATE?
Instead, abstract both under an ISA parent: captures substantial similarities
78 Experiments
Apply to machine reading: Extract knowledge from text and answer questions
Evaluation: Number of answers and accuracy
GENIA dataset: 1,999 PubMed abstracts
Use simple factoid questions, e.g.: What does anti-STAT1 inhibit? What regulates MIP-1 alpha?
79 Total and Correct Answers
[chart comparing KW-SYN, TextRunner, RESOLVER, DIRT, and USP]
USP extracted five times as many correct answers as TextRunner
Highest precision of 91%
80 Qualitative Analysis Resolve many nontrivial variations Argument forms that mean the same, e.g., expression of X X expression X stimulates Y Y is stimulated with X Active vs. passive voices Synonymous expressions Etc. 80
81 Clusters And Compositions Clusters in core forms investigate, examine, evaluate, analyze, study, assay diminish, reduce, decrease, attenuate synthesis, production, secretion, release dramatically, substantially, significantly …… Compositions amino acid, t cell, immune response, transcription factor, initiation site, binding site … 81
82 Question-Answer Example Q: What does IL-2 control? A: The DEX-mediated IkappaBalpha induction Interestingly, the DEX-mediated IkappaBalpha induction was completely inhibited by IL-2, but not IL-4, in Th1 cells, while the reverse profile was seen in Th2 cells. 82
83 Overview Machine reading Statistical relational learning Markov logic USP: Unsupervised Semantic Parsing Research directions 83
84 Web-Scale Joint Inference
Challenge: Efficiently identify the relevant knowledge
Key: Induce and leverage an ontology
Ontology: capture essential properties and abstract away unimportant variations; upper-level nodes let inference skip irrelevant branches
Wanted: Combine probabilistic ontology induction (e.g., USP) with coarse-to-fine learning and inference [Felzenszwalb & McAllester, 2007; Petrov, Ph.D. Thesis]
85 Knowledge Reasoning
Most facts/rules are not explicitly stated: "dark matter" in the natural language universe
kale contains calcium + calcium prevents osteoporosis ⇒ kale prevents osteoporosis
Keys: Induce generic reasoning patterns; incorporate reasoning in extraction
Additional sources of indirect supervision
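The "dark matter" inference on this slide is a one-rule forward-chaining step. A minimal sketch (our rule and fact encoding, for illustration only):

```python
FACTS = {("contains", "kale", "calcium"),
         ("prevents", "calcium", "osteoporosis")}

def forward_chain(facts):
    """Apply the rule contains(x,y) & prevents(y,z) => prevents(x,z)
    until no new facts are derived."""
    derived = set(facts)
    changed = True
    while changed:
        changed = False
        new = {("prevents", x, z)
               for (r1, x, y1) in derived if r1 == "contains"
               for (r2, y2, z) in derived if r2 == "prevents" and y1 == y2}
        if not new <= derived:
            derived |= new
            changed = True
    return derived

print(("prevents", "kale", "osteoporosis") in forward_chain(FACTS))  # → True
```

In a Markov logic setting the rule would carry a weight, so the derived fact would be probable rather than certain; the hard-rule version above just shows the chaining pattern.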
86 Harness Social Computing Bootstrap online community Knowledge Base 86
87 Harness Social Computing Bootstrap online community Incorporate human & end tasks in the loop “Tell me everything about dicer applied to synapse …” 87 Knowledge Base
88 Harness Social Computing Bootstrap online community Incorporate human & end tasks in the loop “Your extraction from my paper is correct except for blah …” 88 Knowledge Base
89 Harness Social Computing Bootstrap online community Incorporate human & end tasks in the loop Form positive feedback loop 89 Knowledge Base
90 Acknowledgments Pedro Domingos, Colin Cherry, Kristina Toutanova, Lucy Vanderwende, Oren Etzioni, Dan Weld, Matt Richardson, Parag Singla, Stanley Kok, Daniel Lowd, Marc Sumner ARO, AFRL, ONR, DARPA, NSF 90
91 Summary Statistical relational learning offers promising solutions for machine reading Markov logic provides a language for this Syntax: Weighted first-order logical formulas Semantics: Feature templates of Markov nets Open-source software: Alchemy A success story: USP Three key research directions alchemy.cs.washington.edu alchemy.cs.washington.edu/papers/poon09 91