SOFIE: A Self-Organizing Framework for Information Extraction

Presentation transcript:

Slide 1: SOFIE: A Self-Organizing Framework for Information Extraction. Fabian M. Suchanek, Mauro Sozio, Gerhard Weikum (Max-Planck-Institute for Informatics, Saarbrücken, Germany)

Slide 2: Ontologies
Diagram labels: Singer, Country, USA, Entity; relations bornInPlace, type, subclassOf.
Sources: Wikipedia: DBpedia, YAGO, KYLIN, ... Internet: ?
Example: "Elvis died in England"; birth-place: USA

Slide 3: Information Extraction
Goal: Extract ontological information from natural language documents.
Example: "Elvis died in England" yields diedInPlace with England.
Previous approaches: Espresso, DIPRE, LEILA, Snowball, TextRunner, Alice, and many more. These approaches:
- May deliver non-canonical relations: died in, perished in, was killed in, ...
- May deliver non-canonical entities: England, UK, Great Britain, ...
- May deliver inconsistent facts: diedInPlace(Elvis,England), diedInPlace(Elvis,Germany)

Slide 4: Pitfalls of Information Extraction
Web page: "Elvis died in England." "Louis XIV died in France."
Ontology (diagram labels): France, diedInPlace.
Rule: If a pattern occurs with two entities that stand in a relation, then the pattern maps to the relation. Example: "died in" = diedInPlace

Slide 5: Pitfalls of Information Extraction
Web page: "Elvis died in England." "Louis XIV died in France."
- If a pattern occurs with two entities that stand in a relation, then the pattern maps to the relation. Example: "died in" = diedInPlace
- If a meaningful pattern occurs with two entities, then the entities stand in the relation. Example: "Elvis", "England": diedInPlace

Slide 6: Pitfalls of Information Extraction
Web page: "Elvis died in England." "Louis XIV died in France."
- If a pattern occurs with two entities that stand in a relation, then the pattern maps to the relation. Example: "died in" = diedInPlace
- If a meaningful pattern occurs with two entities, then the entities stand in the relation. Example: "Elvis", "England": diedInPlace
Ontology: Taxidophobist ?

Slide 7: Pitfalls of Information Extraction
Web page: "Elvis died in England." "Louis XIV died in France."
- If a pattern occurs with two entities that stand in a relation, then the pattern maps to the relation. Example: "died in" = diedInPlace
- If a meaningful pattern occurs with two entities, then the entities stand in the relation. Example: "Elvis", "England": diedInPlace
Taxidophobist: Reasoning Problem

Slide 8: Pitfalls of Information Extraction
Web page: "Elvis died in England." "Louis XIV died in France."
- If a pattern occurs with two entities that stand in a relation, then the pattern maps to the relation. Example: "died in" = diedInPlace
- If a meaningful pattern occurs with two entities, then the entities stand in the relation.
Taxidophobist: Reasoning Problem
Disambiguation Problem

Slide 9: Pitfalls of Information Extraction
"Elvis died in England." "Louis XIV died in France."
- Reasoning Problem (Taxidophobist)
- Disambiguation Problem
- Pattern Matching Problem: "died in" = diedInPlace ?

Slide 10: Information Extraction as Formulas
Reasoning Problem (Taxidophobist):
type(Elvis,Taxidophobist).
type(X,Taxidophobist) & bornInPlace(X,Y) => diedInPlace(X,Z) [0.8]

Slide 11: Information Extraction as Formulas
"Elvis died in England." "Louis XIV died in France."
- Reasoning Problem:
  type(Elvis,Taxidophobist).
  type(X,Taxidophobist) & bornInPlace(X,Y) => diedInPlace(X,Z)
- Disambiguation Problem
- Pattern Matching Problem: "died in" = diedInPlace ?

Slide 12: Information Extraction as Formulas
Disambiguation Problem
Assumptions:
- Within one document, the same word always has the same meaning
- The ontology already knows all important meanings of proper names
[...] ElvisPresley). [0.7]

Slide 13: Information Extraction as Formulas
Assumptions:
- Within one document, the same word always has the same meaning
- The ontology already knows all important meanings of proper names
[...] ElvisPresley). [0.7]
Annotations of the formula:
- A word in context (wic). Here: the word "Elvis" in document D15
- One possible meaning of "Elvis" as given by the ontology
- Prior estimation for the likelihood of this meaning: |words(D15) ∩ rel(ElvisPresley)| / |words(D15)|
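The prior on this slide can be read as the share of the document's words that also occur among the words related to the candidate entity. The sketch below illustrates that reading only; it assumes the operator lost between `words(D15)` and `rel(ElvisPresley)` is a set intersection and that words are compared as sets. The function name and the sample words are hypothetical.

```python
def disambiguation_prior(doc_words, entity_related_words):
    """Share of the document's distinct words that also appear among the
    words related to the candidate entity (assumed reading of the slide)."""
    doc = set(doc_words)
    if not doc:
        return 0.0
    return len(doc & set(entity_related_words)) / len(doc)


# Hypothetical data: the words of document D15 and the words the ontology
# associates with the entity ElvisPresley.
words_d15 = ["elvis", "sang", "rock", "in", "memphis"]
rel_elvis_presley = {"rock", "memphis", "singer", "graceland"}

prior = disambiguation_prior(words_d15, rel_elvis_presley)
print(prior)  # 2 shared words out of 5 distinct document words
```

The resulting number would serve as the weight of the corresponding disambiguation formula, in the same role as the [0.7] on the slide.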

Slide 14: Information Extraction as Formulas
Assumptions:
- Within one document, the same word always has the same meaning
- The ontology already knows all important meanings of proper names
[...] ElvisPresley). [0.7]
possibleMeaning(X,Y) => means(X,Y)
means(X,Y) & Y ≠ Z => ¬means(X,Z)

Slide 15: Information Extraction as Formulas
"Elvis died in England." "Louis XIV died in France."
- Reasoning Problem:
  type(Elvis,Taxidophobist).
  type(X,Taxidophobist) & bornInPlace(X,Y) => diedInPlace(X,Z)
- Disambiguation Problem:
  [...] ElvisPresley). [0.7]
- Pattern Matching Problem: "died in" = diedInPlace ?

Slide 16: Information Extraction as Formulas
"Elvis died in England." "Louis XIV died in France."
Pattern Matching Problem: "died in" = diedInPlace ?
occurs("died in", [...] [14]
occurs(P,Wic1,Wic2) & means(Wic1,X) & means(Wic2,Y) & mapsTo(P,R) => R(X,Y)
occurs(P,Wic1,Wic2) & means(Wic1,X) & means(Wic2,Y) & R(X,Y) => mapsTo(P,R)
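To make the two pattern rules concrete, here is a minimal sketch that applies them as hard forward-chaining rules. This is a simplification for illustration: the slides treat the rules as weighted formulas that are solved jointly with the other constraints, not applied one after the other. The word-in-context identifiers such as `"Elvis@D2"`, the function names, and the sample data are hypothetical.

```python
def learn_patterns(occurrences, means, facts):
    """Second rule: a pattern that occurs with two entities standing in
    relation R maps to R."""
    maps_to = set()
    for pattern, wic1, wic2 in occurrences:
        x, y = means.get(wic1), means.get(wic2)
        if x is None or y is None:
            continue  # at least one word is not disambiguated
        for relation, a, b in facts:
            if (a, b) == (x, y):
                maps_to.add((pattern, relation))
    return maps_to


def apply_patterns(occurrences, means, maps_to):
    """First rule: a pattern that maps to R and occurs with two entities
    yields the fact R(X,Y)."""
    derived = set()
    for pattern, wic1, wic2 in occurrences:
        x, y = means.get(wic1), means.get(wic2)
        if x is None or y is None:
            continue
        for known_pattern, relation in maps_to:
            if known_pattern == pattern:
                derived.add((relation, x, y))
    return derived


# Hypothetical inputs modelled on the slide's two example sentences.
occurrences = [
    ("died in", "Louis XIV@D1", "France@D1"),
    ("died in", "Elvis@D2", "England@D2"),
]
means = {
    "Louis XIV@D1": "LouisXIV", "France@D1": "France",
    "Elvis@D2": "ElvisPresley", "England@D2": "England",
}
facts = {("diedInPlace", "LouisXIV", "France")}

maps_to = learn_patterns(occurrences, means, facts)
new_facts = apply_patterns(occurrences, means, maps_to) - facts
print(maps_to)    # the pattern "died in" is linked to diedInPlace
print(new_facts)  # the Elvis fact is derived from the learned pattern
```

Applied this way, the rules would accept "Elvis died in England" without question, which is why the slides combine them with disambiguation and reasoning constraints in one weighted problem.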

Slide 17: Information Extraction as Formulas
- Reasoning Problem:
  type(Elvis,Taxidophobist).
  type(X,Taxidophobist) & bornInPlace(X,Y) => diedInPlace(X,Z)
- Disambiguation Problem:
  [...] ElvisPresley). [0.7]
- Pattern Matching Problem:
  occurs("died in", [...] [14]
Find truth assignments to hypotheses so that the weight of satisfied formulas is maximized.
Hypotheses:
- [...] ElvisPresley) ?
- mapsTo("died In", diedInPlace) ?
- diedIn(ElvisPresley, England) ?

Slide 18: Weighted MAX SAT Problem
Find truth assignments to hypotheses so that the weight of satisfied formulas is maximized.
Problems:
- The Weighted MAX SAT Problem is NP-hard
- Our instance of the problem is huge
- The most popular linear approximation algorithm (Johnson's) does not work well with our type of formulas
Johnson's cannot approximate better than 2/3 on formulas such as:
bornInPlace(X,Y) => bornInPlace(X,Z)
A v B, A v C, B v C
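As an illustration of the problem statement, the following sketch solves a tiny Weighted MAX SAT instance by exhaustive search. It is not the algorithm used in the slides; it only shows what "maximize the weight of satisfied formulas" means, and why enumeration cannot scale, since the loop visits 2^n assignments for n hypotheses. The clause encoding and the example weights are assumptions made for this sketch.

```python
from itertools import product


def satisfied_weight(clauses, assignment):
    """Total weight of the clauses that have at least one true literal."""
    return sum(
        weight
        for weight, literals in clauses
        if any(assignment[var] == polarity for var, polarity in literals)
    )


def brute_force_max_sat(clauses):
    """Try every truth assignment and keep the one with the highest weight."""
    variables = sorted({var for _, lits in clauses for var, _ in lits})
    best_assignment, best_weight = None, float("-inf")
    for values in product([False, True], repeat=len(variables)):
        assignment = dict(zip(variables, values))
        weight = satisfied_weight(clauses, assignment)
        if weight > best_weight:
            best_assignment, best_weight = assignment, weight
    return best_assignment, best_weight


# Hypothetical instance. Each clause is (weight, [(variable, polarity), ...]);
# polarity False stands for a negated variable.
clauses = [
    (3, [("A", True)]),                # A
    (2, [("A", False), ("B", True)]),  # not A or B
    (4, [("B", False), ("C", False)]), # not B or not C
    (1, [("C", True)]),                # C
]

assignment, weight = brute_force_max_sat(clauses)
print(assignment, weight)
```

In this instance the four clauses cannot all be satisfied at once, so the best assignment gives up the lightest conflicting clause.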

Slide 19: FMS Algorithm
Formulas: A v B [w1], A v B [w2], B v C [w3], C [w4]
Hypotheses: A, B, C (each either true or false)
- The Functional MAX SAT Algorithm considers only unit clauses.
- The Functional MAX SAT Algorithm propagates Dominating Unit Clauses.
  Example: A v B [10], A [10], A [30]: A = true, because 30 > 10+10
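The dominating-unit-clause step can be sketched as follows. This is one possible reading of the slide's example, not the authors' implementation: it assumes that the negation signs were lost in the transcript, so that the example is "not A or B [10], not A [10], A [30]", and that a unit clause dominates when its weight exceeds the total weight of all clauses containing the opposite literal. All function names are hypothetical.

```python
def find_dominating_unit(clauses):
    """Return a literal whose unit-clause weight exceeds the total weight of
    all clauses containing its negation, or None if there is none."""
    unit_weight = {}
    for weight, literals in clauses:
        if len(literals) == 1:
            unit_weight[literals[0]] = unit_weight.get(literals[0], 0) + weight
    for (var, polarity), weight in unit_weight.items():
        opposing = sum(w for w, lits in clauses if (var, not polarity) in lits)
        if weight > opposing:
            return (var, polarity)
    return None


def propagate(clauses, literal):
    """Fix the literal to true: collect the weight of satisfied clauses and
    remove the now-false opposite literal from the remaining ones."""
    var, polarity = literal
    gained, remaining = 0, []
    for weight, literals in clauses:
        if literal in literals:
            gained += weight
            continue
        reduced = [lit for lit in literals if lit != (var, not polarity)]
        if reduced:  # clauses that became empty are lost
            remaining.append((weight, reduced))
    return remaining, gained


# Assumed reading of the slide's example; polarity False means negated.
clauses = [
    (10, [("A", False), ("B", True)]),  # not A or B [10]
    (10, [("A", False)]),               # not A      [10]
    (30, [("A", True)]),                # A          [30]
]

literal = find_dominating_unit(clauses)
remaining, gained = propagate(clauses, literal)
print(literal, gained, remaining)
```

Setting A to true is safe here because even if every clause containing "not A" ends up unsatisfied, the loss (at most 10+10) is smaller than the 30 gained.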

Slide 20: FMS Algorithm
FOR i=1 TO NEXT i
- Approximation guarantee
- Polynomial time
- Experiments show better performance in practice than Johnson's algorithm in our setting.

Slide 21: FMS Algorithm
FOR i=1 TO NEXT i
Input: "Elvis died in England"; r(X,Y) & s(Y) => t(X,Y)

Slide 22: FMS Algorithm
FOR i=1 TO NEXT i
Input: "Elvis died in England"; r(X,Y) & s(Y) => t(X,Y)
Output: type(Elvis,Taxidophobist)=1, diedIn(Elvis,England)=0
Diagram labels: St. Elvis, diedIn, England

Slide 23: FMS Algorithm
FOR i=1 TO NEXT i
Input: r(X,Y) & s(Y) => t(X,Y)
Diagram labels: St. Elvis, diedIn, England

Slide 24: Other Experiments
Corpus | Type | # Docs / Relations / Time | Precision
Wikipedia toy corpus | structured | 10032min | 100%
Wikipedia subcorpus | semi-structured | h | 94%
News article toy corpus | unstructured | 150124min | 91%
Biographies from Web | unstructured | h | 90%

Slide 25: Conclusion
SOFIE unifies the tasks of
- entity disambiguation
- pattern extraction
- semantic constraint reasoning
in a single framework, delivering
- canonicalized facts
- of high precision (experiments show 90% precision)
Caption: died in England... but is alive!

Slide 26: SOFIE rules!
occurs(P,WX,WY) /\ refersTo(WX,X) /\ refersTo(WY,Y) /\ R(X,Y) => expresses(P,R)
occurs(P,WX,WY) /\ expresses(P,R) /\ refersTo(WX,X) /\ refersTo(WY,Y) /\ range(R,D1) /\ domain(R,D2) /\ type(X,D1) /\ type(Y,D2) => R(X,Y)
R(X,Y) /\ R(X,Z) /\ type(R,function) => Y = Z
disambiguationPrior(W,X) => refersTo(W,X)
bornInYear(X,B) /\ diedInYear(X,D) => B<D
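The functional-relation rule, R(X,Y) /\ R(X,Z) /\ type(R,function) => Y = Z, can be illustrated with a small consistency check. The sketch below only detects violations among candidate facts; in the slides the rule is one weighted formula among many and conflicts are resolved by the MAX SAT solver. The function name and the sample facts are hypothetical.

```python
from collections import defaultdict


def functional_conflicts(facts, functional_relations):
    """Group the objects of each functional relation by subject and report
    every (relation, subject) pair that has more than one object."""
    objects = defaultdict(set)
    for relation, subject, obj in facts:
        if relation in functional_relations:
            objects[(relation, subject)].add(obj)
    return {key: objs for key, objs in objects.items() if len(objs) > 1}


# Hypothetical candidate facts, echoing the inconsistent pair on slide 3.
facts = [
    ("diedInPlace", "Elvis", "England"),
    ("diedInPlace", "Elvis", "Germany"),
    ("bornInPlace", "Elvis", "USA"),
]

conflicts = functional_conflicts(facts, {"diedInPlace", "bornInPlace"})
print(conflicts)  # only the diedInPlace facts about Elvis conflict
```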

Slide 27: SOFIE: Experiments
Corpus | Type | # Docs / Relations / Time | Precision | Recall
Wikipedia toy corpus | structured | 10038min | 100% | 80%
Wikipedia toy corpus | semi-structured, 50% infoboxes removed | 10038min | 100% | 57%
Wikipedia subcorpus | semi-structured | h | 94% | ?
News article toy corpus | unstructured | 150124min | 91% | 24%, 31%
Snowball | | | 56% | 31%
Biographies from Web | unstructured | h | 90% | ?

Slide 28: SOFIE: Large-Scale Experiment
Goal: Extract bornIn, bornOnDate, diedIn, diedOnDate, politicianOf
Corpus: 3700 biography documents downloaded from the Web
Runtime (summed over 5 batches):
- Parsing: 7:05h
- Hypothesis Generation: 6:15h
- Solving: 2:30h
- Total: 15:50h
Results (precision in %): bornIn, bornOnD, diedIn, diedOnD, polOf

Slide 29: SOFIE: Relation to Markov Logic
Diagram: probability P over the truth values (false, true) of bornIn(Nicholas, Patras).
Formulas: r(x,y) /\ s(x,z) => t(x,z) [w] ...
$P(X) \sim e^{\sum_i sat(i,X)\, w_i}$, where $sat(i,X)$ is the number of satisfied instances of the $i$-th formula and $w_i$ is the weight of the $i$-th formula.
$\max_X e^{\sum_i sat(i,X)\, w_i} \;\rightarrow\; \max_X \log\left(e^{\sum_i sat(i,X)\, w_i}\right) \;\rightarrow\; \max_X \sum_i sat(i,X)\, w_i$
The last expression is a Weighted MAX SAT problem.

Slide 30: Grounding
Rule: r(X,Y) & s(Y) => t(X,Y)
Clause form: { r(X,Y), s(Y), t(X,Y) }
Entities = {a,b}
Groundings:
- { r(a,a), s(a), t(a,a) }
- { r(a,b), s(b), t(a,b) }
- { r(b,a), s(a), t(b,a) }
- { r(b,b), s(b), t(b,b) }
Immutable, complete facts (e.g. pattern occurrences): r(a,a), r(a,b), r(b,a), r(b,b)

Slide 31: Grounding
Rule: r(X,Y) & s(Y) => t(X,Y)
Clause form: { r(X,Y), s(Y), t(X,Y) }
Resulting clause: { s(a), t(a,a) } [w]
Immutable, complete facts (e.g. pattern occurrences): r(a,a) [w], r(a,b), r(b,a), r(b,b)
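The grounding step can be sketched in a few lines. The example below grounds the rule r(X,Y) & s(Y) => t(X,Y) over a set of entities, assuming the standard clause form "not r(X,Y) or not s(Y) or t(X,Y)" and assuming that r is an immutable, complete relation: groundings whose r-atom is false are already satisfied and are dropped, and a true r-atom is removed from its clause. The function name and the literal encoding are hypothetical.

```python
from itertools import product


def ground_rule(entities, known_r):
    """Ground r(X,Y) & s(Y) => t(X,Y) over all entity pairs.

    known_r holds every true r-fact. Each returned clause is a list of
    (predicate, arguments, polarity) literals over the open hypotheses."""
    grounded = []
    for x, y in product(entities, repeat=2):
        if (x, y) not in known_r:
            continue  # r(x,y) is false, so the implication holds trivially
        grounded.append([("s", (y,), False), ("t", (x, y), True)])
    return grounded


entities = ["a", "b"]

# If only r(a,a) holds, a single clause over s(a) and t(a,a) remains.
clauses = ground_rule(entities, known_r={("a", "a")})
print(clauses)

# If every r-fact held, all four groundings would survive.
all_pairs = set(product(entities, repeat=2))
print(len(ground_rule(entities, known_r=all_pairs)))
```

Pruning against the known facts keeps the number of clauses tied to the number of observed pattern occurrences instead of growing with every combination of entities.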

Slide 32: Grounding
Clauses: { s(a), t(a,a) } [w1], { p(c,d), q(e), } [w2]
Find truth assignments to hypotheses so that the weight of satisfied formulas is maximized.
Hypotheses:
- [...] ElvisPresley) = true ?
- mapsTo("died In", diedInPlace) = true ?
- diedIn(ElvisPresley, England) = true ?