
1 Statistical Relational Learning for Knowledge Extraction from the Web
Hoifung Poon, Dept. of Computer Science & Eng., University of Washington

2 “Drowning in Information, Starved for Knowledge” [figure: WWW]

3 Great Vision: Knowledge Extraction from the Web
- Also need: knowledge representation and reasoning
- Close the loop: apply knowledge to extraction
- Machine reading [Etzioni et al., 2007]
- Craven et al., “Learning to Construct Knowledge Bases from the World Wide Web,” Artificial Intelligence, 1999.

4 Machine Reading: Text → Knowledge [figure]

5 Rapidly Growing Interest
- AAAI-07 Spring Symposium on Machine Reading
- DARPA Machine Reading Program (2009-2014)
- NAACL-10 Workshop on Learning By Reading
- Etc.

6 Great Impact
- Scientific inquiry and commercial applications:
  - Literature-based discovery, robot scientists
  - Question answering, semantic search
  - Drug design, medical diagnosis
- Breach the knowledge acquisition bottleneck for AI and natural language understanding
- Automatically semantify the Web
- Etc.

7 This Talk
- Statistical relational learning offers promising solutions to machine reading
- Markov logic is a leading unifying framework
- A success story: USP
  - Unsupervised, end-to-end machine reading
  - Extracts five times as many correct answers as the state of the art, with the highest accuracy of 91%

8 USP: Question-Answer Example
Q: What does IL-2 control?
A: The DEX-mediated IkappaBalpha induction
“Interestingly, the DEX-mediated IkappaBalpha induction was completely inhibited by IL-2, but not IL-4, in Th1 cells, while the reverse profile was seen in Th2 cells.”

9 Overview
- Machine reading: Challenges
- Statistical relational learning
- Markov logic
- USP: Unsupervised Semantic Parsing
- Research directions

10 Key Challenges
- Complexity
- Uncertainty
- Pipeline accumulates errors
- Supervision is scarce

11 Languages Are Structural
- IL-4 induces CD11B
- Involvement of p70(S6)-kinase activation in IL-10 up-regulation in human monocytes by gp41 …
- George Walker Bush was the 43rd President of the United States. … Bush was the eldest son of President G. H. W. Bush and Barbara Bush. … In November 1977, he met Laura Welch at a barbecue.
- governments / lm$pxtm (Hebrew: according to their families)

12 Languages Are Structural
- govern-ment-s / l-m$px-t-m (Hebrew: according to their families) [morphological segmentation]
- IL-4 induces CD11B [syntactic parse: S with NP and VP]
- Involvement of p70(S6)-kinase activation in IL-10 up-regulation in human monocytes by gp41 … [event graph over involvement and up-regulation, with Theme, Cause, and Site roles relating IL-10, human monocyte, gp41, and p70(S6)-kinase activation]
- George Walker Bush was the 43rd President of the United States. … Bush was the eldest son of President G. H. W. Bush and Barbara Bush. … In November 1977, he met Laura Welch at a barbecue. [coreference]

13 Knowledge Is Heterogeneous
- Individuals, e.g.: Socrates is a man
- Types, e.g.: Man is mortal
- Inference rules, e.g.: syllogism
- Ontological relations, etc. [figure: HUMAN ISA MAMMAL; EYE ISPART FACE]

14 Complexity
- Can handle using first-order logic
- Trees, graphs, dependencies, hierarchies, etc. easily expressed
- Inference algorithms (satisfiability testing, theorem proving, etc.)
- But … logic is brittle with uncertainty

15 Languages Are Ambiguous
- Syntax: I saw the man with the telescope [two parses: the prepositional phrase attaches to the man or to saw]
- Entity type: Here in London, Frances Deek is a retired teacher … In the Israeli town …, Karen London says … Now London says … London → PERSON or LOCATION?
- Coreference: G. W. Bush … Laura Bush … Mrs. Bush … Which one?
- Paraphrase: Microsoft buys Powerset / Microsoft acquires Powerset / Powerset is acquired by Microsoft Corporation / The Redmond software giant buys Powerset / Microsoft’s purchase of Powerset, …

16 Knowledge Has Uncertainty
- We need to model correlations
- Our information is always incomplete
- Our predictions are uncertain

17 Uncertainty
- Statistics provides the tools to handle this: mixture models, hidden Markov models, Bayesian networks, Markov random fields, maximum entropy models, conditional random fields, etc.
- But … statistical models assume i.i.d. (independently and identically distributed) data: objects → feature vectors

18 Pipeline is Suboptimal
- E.g., NLP pipeline: Tokenization → Morphology → Chunking → Syntax → …
- Accumulates and propagates errors
- Wanted: joint inference, across all processing stages and among all interdependent objects

19 Supervision is Scarce
- Tons of text … but most is not annotated
- Labeling is expensive (cf. the Penn Treebank)
- → Need to leverage indirect supervision

20 Redundancy
- Key source of indirect supervision
- State-of-the-art systems depend on this, e.g., TextRunner [Banko et al., 2007]
- But … the Web is heterogeneous: long tail; redundancy is only present in the head regime

21 Overview
- Machine reading: Challenges
- Statistical relational learning
- Markov logic
- USP: Unsupervised Semantic Parsing
- Research directions

22 Statistical Relational Learning
- Burgeoning field in machine learning
- Offers promising solutions for machine reading:
  - Unify statistical and logical approaches
  - Replace pipeline with joint inference
  - Principled framework to leverage both direct and indirect supervision

23 Machine Reading: A Vision [figure]. Challenge: the long tail

24 Machine Reading: A Vision [figure]

25 Challenges in Applying Statistical Relational Learning
- Learning is much harder
- Inference becomes a crucial issue
- Greater complexity for the user

26 Progress to Date
- Probabilistic logic [Nilsson, 1986]
- Statistics and beliefs [Halpern, 1990]
- Knowledge-based model construction [Wellman et al., 1992]
- Stochastic logic programs [Muggleton, 1996]
- Probabilistic relational models [Friedman et al., 1999]
- Relational Markov networks [Taskar et al., 2002]
- Markov logic [Domingos & Lowd, 2009]
- Etc.

27 Progress to Date (build: same list, highlighting Markov logic [Domingos & Lowd, 2009] as the leading unifying framework)

28 Overview
- Machine reading
- Statistical relational learning
- Markov logic
- USP: Unsupervised Semantic Parsing
- Research directions

29 Markov Networks
- Undirected graphical models [figure: graph over Smoking, Cancer, Asthma, Cough]
- Log-linear model: P(x) = (1/Z) exp( Σ_i w_i f_i(x) ), where w_i is the weight of feature i and f_i is feature i

30 First-Order Logic
- Constants, variables, functions, predicates. E.g.: Anna, x, MotherOf(x), Friends(x,y)
- Grounding: replace all variables by constants. E.g.: Friends(Anna, Bob)
- World (model, interpretation): assignment of truth values to all ground predicates

31 Markov Logic
- Intuition: soften logical constraints
- Syntax: weighted first-order formulas
- Semantics: feature templates for Markov networks
- A Markov Logic Network (MLN) is a set of pairs (F_i, w_i), where F_i is a formula in first-order logic and w_i is a real number
- P(x) = (1/Z) exp( Σ_i w_i n_i(x) ), where n_i(x) is the number of true groundings of F_i

32-34 Example: Friends & Smokers
[build slides showing the example’s weighted formulas: the English statements “Smoking causes cancer” and “Friends have similar smoking habits”, as first-order formulas ∀x Smokes(x) ⇒ Cancer(x) and ∀x,y Friends(x,y) ⇒ (Smokes(x) ⇔ Smokes(y)), then with attached weights, e.g., 1.5 and 1.1]

35 Example: Friends & Smokers
- Two constants: Anna (A) and Bob (B)
- Ground network over: Smokes(A), Smokes(B), Cancer(A), Cancer(B), Friends(A,A), Friends(A,B), Friends(B,A), Friends(B,B)
- Probabilistic graphical models and first-order logic are special cases
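To make the semantics concrete, here is a minimal Python sketch (illustrative only, not from the talk or from Alchemy; the two formulas and weights are the commonly cited Friends & Smokers settings) that grounds the MLN over the two constants and computes a conditional probability by brute-force enumeration:

```python
from itertools import product
from math import exp

PEOPLE = ["A", "B"]  # Anna and Bob

def n_smokes_implies_cancer(world):
    # n_1: number of true groundings of Smokes(x) => Cancer(x)
    return sum(1 for x in PEOPLE
               if (not world[("Smokes", x)]) or world[("Cancer", x)])

def n_friends_similar_smoking(world):
    # n_2: number of true groundings of Friends(x,y) => (Smokes(x) <=> Smokes(y))
    return sum(1 for x in PEOPLE for y in PEOPLE
               if (not world[("Friends", x, y)])
               or (world[("Smokes", x)] == world[("Smokes", y)]))

FORMULAS = [(1.5, n_smokes_implies_cancer), (1.1, n_friends_similar_smoking)]

ATOMS = ([("Smokes", x) for x in PEOPLE] + [("Cancer", x) for x in PEOPLE]
         + [("Friends", x, y) for x in PEOPLE for y in PEOPLE])

def weight(world):
    # Unnormalized weight of a world: exp(sum_i w_i * n_i(world))
    return exp(sum(w * n(world) for w, n in FORMULAS))

# Enumerate all 2^8 = 256 possible worlds over the ground atoms
WORLDS = [dict(zip(ATOMS, vals)) for vals in product([False, True], repeat=len(ATOMS))]

# P(Cancer(A) | Smokes(A), Friends(A,B)) as a ratio of summed world weights
num = sum(weight(w) for w in WORLDS
          if w[("Smokes", "A")] and w[("Friends", "A", "B")] and w[("Cancer", "A")])
den = sum(weight(w) for w in WORLDS
          if w[("Smokes", "A")] and w[("Friends", "A", "B")])
print("P(Cancer(A) | Smokes(A), Friends(A,B)) =", num / den)
```

Brute-force enumeration is exponential in the number of ground atoms, which is exactly why the inference algorithms on the next slides matter.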

36 MLN Algorithms: The First Three Generations

Problem            | First generation        | Second generation | Third generation
MAP inference      | Weighted satisfiability | Lazy inference    | Cutting planes
Marginal inference | Gibbs sampling          | MC-SAT            | Lifted inference
Weight learning    | Pseudo-likelihood       | Voted perceptron  | Scaled conj. gradient
Structure learning | Inductive logic progr.  | ILP + PL (etc.)   | Clustering + pathfinding

37 Efficient Inference
- Logical or statistical inference alone is already hard
- But … can do approximate inference; suffices to perform well in most cases
- Combine ideas from both camps. E.g., MC-SAT = MCMC + SAT solver
- Can also leverage sparsity in relational domains
More: Poon & Domingos, “Sound and Efficient Inference with Probabilistic and Deterministic Dependencies”, in Proc. AAAI-2006.
More: Poon, Domingos & Sumner, “A General Method for Reducing the Complexity of Relational Inference and its Application to MCMC”, in Proc. AAAI-2008.

38 Weight Learning
- Probability model P(X); X: observable in training data
- Maximize likelihood of observed data
- Regularization to prevent overfitting

39 Weight Learning
- Gradient descent: ∂/∂w_i log P(x) = n_i(x) − E[n_i(x)]
  (the number of times clause i is true in the data, minus the expected number of times clause i is true according to the MLN)
- Computing the expectation requires inference: use MC-SAT
- Can also leverage second-order information [Lowd & Domingos, 2007]
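A minimal sketch of that update in Python (illustrative, not Alchemy’s implementation; `sample_worlds` is a hypothetical stand-in for MC-SAT):

```python
def learn_weights(weights, formulas, data_world, sample_worlds, lr=0.1, steps=100):
    """Gradient ascent on the log-likelihood of one observed world.

    formulas[i](world) returns n_i(world), the number of true groundings
    of formula i; sample_worlds(weights) stands in for MC-SAT, returning
    worlds sampled from the current model.
    """
    for _ in range(steps):
        samples = sample_worlds(weights)
        for i, n_i in enumerate(formulas):
            observed = n_i(data_world)                              # n_i(x)
            expected = sum(n_i(w) for w in samples) / len(samples)  # E[n_i(x)]
            weights[i] += lr * (observed - expected)                # gradient step
    return weights
```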

40 Unsupervised Learning: How?
- I.I.D. learning: a more sophisticated model requires more labeled data
- Statistical relational learning: a more sophisticated model may require less labeled data
- Ambiguities vary among objects; joint inference propagates information from unambiguous objects to ambiguous ones
- One formula is worth a thousand labels
- Small amount of domain knowledge → large-scale joint inference

41 Unsupervised Weight Learning
- Probability model P(X, Z); X: observed in training data; Z: hidden variables
- E.g., clustering with mixture models: Z = cluster assignment, X = observed features
- Maximize likelihood of observed data by summing out the hidden variables Z

42 Unsupervised Weight Learning
- Gradient descent: ∂/∂w_i log P(x) = E_{z|x}[n_i] − E_{x,z}[n_i]
  (the expectation over z conditioned on the observed x, minus the expectation over both x and z)
- Use MC-SAT to compute both expectations
- May also combine with contrastive estimation
More: Poon, Cherry & Toutanova, “Unsupervised Morphological Segmentation with Log-Linear Models”, in Proc. NAACL-2009. Best Paper Award.
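The same sketch carries over to the hidden-variable case by replacing the observed count with a conditional expectation (again illustrative; both samplers are hypothetical stand-ins for MC-SAT):

```python
def learn_weights_unsup(weights, formulas, sample_given_x, sample_joint,
                        lr=0.1, steps=100):
    """Gradient ascent on log P(x) with hidden variables z summed out.

    sample_given_x(weights) samples z with x clamped to the training data;
    sample_joint(weights) samples both x and z from the current model.
    """
    for _ in range(steps):
        clamped = sample_given_x(weights)
        free = sample_joint(weights)
        for i, n_i in enumerate(formulas):
            e_cond = sum(n_i(w) for w in clamped) / len(clamped)  # E_{z|x}[n_i]
            e_joint = sum(n_i(w) for w in free) / len(free)       # E_{x,z}[n_i]
            weights[i] += lr * (e_cond - e_joint)
    return weights
```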

43 Markov Logic
- Unified inference and learning algorithms → can handle millions of variables, billions of features, tens of thousands of parameters
- Easy-to-use software: Alchemy
- Many successful applications, e.g.: information extraction, coreference resolution, semantic parsing, ontology induction

44 Pipeline → Joint Inference
- Combine segmentation and entity resolution for information extraction
- Extract complex and nested bio-events from PubMed abstracts
More: Poon & Domingos, “Joint Inference for Information Extraction”, in Proc. AAAI-2007.
More: Poon & Vanderwende, “Joint Inference for Knowledge Extraction from Biomedical Literature”, in Proc. NAACL-2010.

45 Unsupervised Learning: Example
- Coreference resolution: accuracy comparable to the previous supervised state of the art
More: Poon & Domingos, “Joint Unsupervised Coreference Resolution with Markov Logic”, in Proc. EMNLP-2008.

46 Overview
- Machine reading: Challenges
- Statistical relational learning
- Markov logic
- USP: Unsupervised Semantic Parsing
- Research directions

47 Unsupervised Semantic Parsing
- USP [Poon & Domingos, EMNLP-09]: first unsupervised approach for semantic parsing. Best Paper Award.
- End-to-end machine reading system: read text, answer questions
- OntoUSP = USP + ontology induction [Poon & Domingos, ACL-10]
- Encoded in a few Markov logic formulas

48 Semantic Parsing
- Goal: Microsoft buys Powerset → BUY(MICROSOFT, POWERSET)
- Challenge: Microsoft buys Powerset / Microsoft acquires semantic search engine Powerset / Powerset is acquired by Microsoft Corporation / The Redmond software giant buys Powerset / Microsoft’s purchase of Powerset, …

49 Limitations of Existing Approaches
- Manual grammar or supervised learning: applicable to restricted domains only
- For general text: not clear what predicates and objects to use; hard to produce consistent meaning annotation
- Also, often learn both syntax and semantics: fail to leverage advanced syntactic parsers; makes semantic parsing harder

50 USP: Key Idea #1
- Target predicates and objects can be learned: viewed as clusters of syntactic or lexical variations of the same meaning
- BUY(-,-) ↔ {buys, acquires, ’s purchase of, …}: cluster of various expressions for acquisition
- MICROSOFT ↔ {Microsoft, the Redmond software giant, …}: cluster of various mentions of Microsoft

51 USP: Key Idea #2
- Relational clustering: cluster relations with the same objects
- USP: recursively cluster arbitrary expressions with similar subexpressions
- Microsoft buys Powerset / Microsoft acquires semantic search engine Powerset / Powerset is acquired by Microsoft Corporation / The Redmond software giant buys Powerset / Microsoft’s purchase of Powerset, …

52 USP: Key Idea #2 (build): cluster same forms at the atom level

53-55 USP: Key Idea #2 (builds): cluster forms in composition with same forms

56 USP: Key Idea #3
- Start directly from syntactic analyses; focus on translating them to semantics
- Leverage rapid progress in syntactic parsing
- Much easier than learning both

57 Joint Inference in USP
- Forms a canonical meaning representation by recursively clustering synonymous expressions
- Text → logical form in this representation
- Induces an ISA hierarchy among clusters and applies hierarchical smoothing (shrinkage)

58 USP: System Overview
- Input: dependency trees for sentences
- Converts dependency trees into quasi-logical forms (QLFs)
- Starts with QLF clusters at the atom level; recursively builds up clusters of larger forms
- Output: probability distribution over QLF clusters and their compositions; MAP semantic parses of sentences

59 Generating Quasi-Logical Forms
- Dependency tree: buys --nsubj--> Microsoft, buys --dobj--> Powerset
- Convert each node into a unary atom

60 Generating Quasi-Logical Forms
- buys(n1), Microsoft(n2), Powerset(n3)
- n1, n2, n3 are Skolem constants

61 Generating Quasi-Logical Forms
- Convert each edge into a binary atom
- buys(n1), Microsoft(n2), Powerset(n3)

62 Generating Quasi-Logical Forms
- Convert each edge into a binary atom
- buys(n1), Microsoft(n2), Powerset(n3), nsubj(n1,n2), dobj(n1,n3)
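A minimal sketch of this conversion (illustrative Python; the tree encoding and string-based atom format are assumptions for exposition, not USP’s actual data structures):

```python
def dependency_tree_to_qlf(words, edges):
    """Convert a dependency tree into quasi-logical-form atoms.

    words: {node_id: word}, e.g. {1: "buys", 2: "Microsoft", 3: "Powerset"}
    edges: [(head_id, label, dependent_id)]
    Returns atoms over Skolem constants n1, n2, ...
    """
    # Each node becomes a unary atom: buys(n1), Microsoft(n2), ...
    unary = [f"{word}(n{node})" for node, word in sorted(words.items())]
    # Each edge becomes a binary atom: nsubj(n1,n2), dobj(n1,n3)
    binary = [f"{label}(n{head},n{dep})" for head, label, dep in edges]
    return unary + binary

print(dependency_tree_to_qlf(
    {1: "buys", 2: "Microsoft", 3: "Powerset"},
    [(1, "nsubj", 2), (1, "dobj", 3)]))
# ['buys(n1)', 'Microsoft(n2)', 'Powerset(n3)', 'nsubj(n1,n2)', 'dobj(n1,n3)']
```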

63 A Semantic Parse
- buys(n1), Microsoft(n2), Powerset(n3), nsubj(n1,n2), dobj(n1,n3)
- Partition the QLF into subformulas

64 A Semantic Parse
- Subformula → lambda form: replace each Skolem constant not in a unary atom with a unique lambda variable

65 A Semantic Parse
- buys(n1), Microsoft(n2), Powerset(n3), λx2.nsubj(n1,x2), λx3.dobj(n1,x3)

66 A Semantic Parse
- Core form: no lambda variable, e.g., buys(n1)
- Argument form: one lambda variable, e.g., λx2.nsubj(n1,x2), λx3.dobj(n1,x3)

67 A Semantic Parse
- Assign each subformula to an object cluster: buys(n1) → BUY; Microsoft(n2) → MICROSOFT; Powerset(n3) → POWERSET
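A sketch of the lambda-form step under the same assumed encoding (a simplification of USP’s rule as shown on these slides: the unary atom is the node’s core form, and each edge atom keeps the head’s Skolem constant while abstracting the dependent into a unique lambda variable):

```python
def qlf_lambda_forms(words, edges):
    """Derive core and argument lambda forms for each node's subformula."""
    forms = {}
    for node, word in words.items():
        core = f"{word}(n{node})"  # core form: no lambda variable
        # argument forms: one lambda variable each
        args = [f"λx{dep}.{label}(n{head},x{dep})"
                for head, label, dep in edges if head == node]
        forms[node] = {"core": core, "arguments": args}
    return forms

print(qlf_lambda_forms({1: "buys", 2: "Microsoft", 3: "Powerset"},
                       [(1, "nsubj", 2), (1, "dobj", 3)]))
# {1: {'core': 'buys(n1)', 'arguments': ['λx2.nsubj(n1,x2)', 'λx3.dobj(n1,x3)']},
#  2: {'core': 'Microsoft(n2)', 'arguments': []},
#  3: {'core': 'Powerset(n3)', 'arguments': []}}
```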

68 Object Cluster: BUY
- Distribution over core forms, e.g.: buys(n1) 0.1, acquires(n1) 0.2, …
- One formula in the MLN; learn a weight for each pair of cluster and core form

69 Object Cluster: BUY
- Core forms: buys(n1) 0.1, acquires(n1) 0.2, …
- May contain a variable number of property clusters: BUYER, BOUGHT, PRICE, …

70 Property Cluster: BUYER
- Distributions over argument forms, argument clusters, and argument number (three MLN formulas)
- Argument forms: λx2.nsubj(n1,x2) 0.5, λx2.agent(n1,x2) 0.4, …
- Argument clusters: MICROSOFT 0.2, GOOGLE 0.1, …
- Number: Zero 0.1, One 0.8, …

71 Probabilistic Model
- Exponential prior on the number of parameters
- Cluster mixtures:
  - Object cluster BUY: core forms {buys 0.1, acquires 0.4, …}
  - Property cluster BUYER: argument forms {nsubj 0.5, agent 0.4, …}; argument clusters {MICROSOFT 0.2, GOOGLE 0.1, …}; number {Zero 0.1, One 0.8, …}

72 Probabilistic Model
- Exponential prior on the number of parameters
- Cluster mixtures (as on slide 71) with hierarchical smoothing: e.g., picking MICROSOFT as the BUYER argument depends not only on BUY, but also on its ISA ancestors

73 Abstract Lambda Form
- buys(n1) ∧ λx2.nsubj(n1,x2) ∧ λx3.dobj(n1,x3)
  → BUY(n1) ∧ λx2.BUYER(n1,x2) ∧ λx3.BOUGHT(n1,x3)
- The final logical form is obtained via lambda reduction

74 Challenge: State Space Too Large
- The number of potential clusterings is exponential in the number of tokens
- Also, meaning units and clusters are often small
- → Use combinatorial search

75 Inference: Find MAP Parse
- Initialize, then apply the search operator: lambda reduction
[diagram: dependency tree over induces, protein, CD11B, and IL-4 (nsubj, dobj, nn edges); the builds compose IL-4 --nn--> protein into a single unit]

76 Learning: Greedily Maximize Posterior
- Initialize with atomic clusters, then apply search operators:
- MERGE: {induces 1.0} + {enhances 1.0} → {induces 0.2, enhances 0.8}
- COMPOSE: {amino 1.0} + {acid 1.0} → {amino acid 1.0}
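An illustrative sketch of the two operators on bag-of-forms clusters (the dict-of-counts representation is an assumption; USP actually scores each candidate operation by its gain in posterior, which is omitted here):

```python
def merge(cluster_a, cluster_b):
    """MERGE: pool two clusters' form counts into one distribution.

    Values are (pseudo-)counts; the slide's {induces 0.2, enhances 0.8}
    arises when "enhances" is four times as frequent as "induces".
    """
    counts = dict(cluster_a)
    for form, c in cluster_b.items():
        counts[form] = counts.get(form, 0) + c
    total = sum(counts.values())
    return {form: c / total for form, c in counts.items()}

def compose(cluster_a, cluster_b):
    """COMPOSE: form a new cluster for the joined multi-word expression."""
    joined = f"{max(cluster_a, key=cluster_a.get)} {max(cluster_b, key=cluster_b.get)}"
    return {joined: 1.0}

print(merge({"induces": 1.0}, {"enhances": 4.0}))  # {'induces': 0.2, 'enhances': 0.8}
print(compose({"amino": 1.0}, {"acid": 1.0}))      # {'amino acid': 1.0}
```

Greedy learning repeatedly applies whichever single operation most increases the posterior, until no operation improves it.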

77 Operator: Abstract
- Clusters INDUCE {induces 0.6, up-regulates 0.2, …} and INHIBIT {inhibits 0.4, suppresses 0.2, …}
- MERGE with REGULATE? Captures substantial similarities
[diagram: ISA links from INDUCE and INHIBIT to an abstract parent cluster]

78 Experiments
- Apply to machine reading: extract knowledge from text and answer questions
- Evaluation: number of answers and accuracy
- GENIA dataset: 1,999 PubMed abstracts
- Use simple factoid questions, e.g.: What does anti-STAT1 inhibit? What regulates MIP-1 alpha?

79 Total and Correct Answers
[chart comparing KW-SYN, TextRunner, RESOLVER, DIRT, and USP]
- USP extracted five times as many correct answers as TextRunner
- Highest precision of 91%

80 Qualitative Analysis
- Resolves many nontrivial variations:
- Argument forms that mean the same, e.g.: expression of X ↔ X expression; X stimulates Y ↔ Y is stimulated with X
- Active vs. passive voice
- Synonymous expressions
- Etc.

81 Clusters and Compositions
- Clusters in core forms: {investigate, examine, evaluate, analyze, study, assay}, {diminish, reduce, decrease, attenuate}, {synthesis, production, secretion, release}, {dramatically, substantially, significantly}, …
- Compositions: amino acid, t cell, immune response, transcription factor, initiation site, binding site, …

82 Question-Answer Example
Q: What does IL-2 control?
A: The DEX-mediated IkappaBalpha induction
“Interestingly, the DEX-mediated IkappaBalpha induction was completely inhibited by IL-2, but not IL-4, in Th1 cells, while the reverse profile was seen in Th2 cells.”

83 Overview
- Machine reading
- Statistical relational learning
- Markov logic
- USP: Unsupervised Semantic Parsing
- Research directions

84 Web-Scale Joint Inference
- Challenge: efficiently identify what is relevant
- Key: induce and leverage an ontology
- Ontology → capture essential properties and abstract away unimportant variations
- Upper-level nodes → skip irrelevant branches
- Wanted: combine probabilistic ontology induction (e.g., USP) with coarse-to-fine learning and inference [Felzenszwalb & McAllester, 2007; Petrov, Ph.D. Thesis]

85 Knowledge Reasoning
- Most facts/rules are not explicitly stated: “dark matter” in the natural language universe
- kale contains calcium ∧ calcium prevents osteoporosis ⇒ kale prevents osteoporosis
- Keys: induce generic reasoning patterns; incorporate reasoning in extraction; additional sources of indirect supervision
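A toy sketch of applying one such induced pattern by forward chaining (entirely illustrative; the rule and facts come from the slide’s example, and a real system would attach uncertainty to both):

```python
FACTS = {("contains", "kale", "calcium"),
         ("prevents", "calcium", "osteoporosis")}

def apply_transfer_rule(facts):
    """If X contains Y and Y prevents Z, infer X prevents Z."""
    inferred = {("prevents", x, z)
                for r1, x, y in facts if r1 == "contains"
                for r2, y2, z in facts if r2 == "prevents" and y2 == y}
    return facts | inferred

print(apply_transfer_rule(FACTS))
# adds ('prevents', 'kale', 'osteoporosis')
```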

86 Harness Social Computing
- Bootstrap an online community around the knowledge base [figure: Knowledge Base]

87 Harness Social Computing
- Bootstrap an online community
- Incorporate humans and end tasks in the loop: “Tell me everything about dicer applied to synapse …” [figure: Knowledge Base]

88 Harness Social Computing
- Bootstrap an online community
- Incorporate humans and end tasks in the loop: “Your extraction from my paper is correct except for blah …” [figure: Knowledge Base]

89 Harness Social Computing
- Bootstrap an online community
- Incorporate humans and end tasks in the loop
- Form a positive feedback loop [figure: Knowledge Base]

90 Acknowledgments
- Pedro Domingos, Colin Cherry, Kristina Toutanova, Lucy Vanderwende, Oren Etzioni, Dan Weld, Matt Richardson, Parag Singla, Stanley Kok, Daniel Lowd, Marc Sumner
- ARO, AFRL, ONR, DARPA, NSF

91 Summary
- Statistical relational learning offers promising solutions for machine reading
- Markov logic provides a language for this: syntax = weighted first-order logical formulas; semantics = feature templates of Markov networks
- Open-source software: Alchemy (alchemy.cs.washington.edu)
- A success story: USP (alchemy.cs.washington.edu/papers/poon09)
- Three key research directions

