Published by Alexandra Ware. Modified over 4 years ago.

1
SOFIE: A Self-Organizing Framework for Information Extraction
Fabian M. Suchanek, Mauro Sozio, Gerhard Weikum (Max-Planck-Institute for Informatics, Saarbrücken, Germany)

2
Ontologies
[Figure: ontology graph with nodes Entity, Singer, Country, USA and edges type, subclassOf, bornInPlace]
Wikipedia (e.g. "birth-place: USA"): DBpedia, YAGO, KYLIN, ...
Internet (e.g. "Elvis died in England"): ?

3
Information Extraction
Goal: Extract ontological information from natural language documents
Example: "Elvis died in England" yields England as the diedInPlace of Elvis
Previous approaches: Espresso, DIPRE, LEILA, Snowball, TextRunner, Alice, and many more
- May deliver non-canonical relations: died in, perished in, was killed in, ...
- May deliver non-canonical entities: England, UK, Great Britain, ...
- May deliver inconsistent facts: diedInPlace(Elvis,England), diedInPlace(Elvis,Germany)

4
Pitfalls of Information Extraction
Web page: "Elvis died in England." "Louis XIV died in France."
Ontology: [Figure: a diedInPlace edge pointing to France]
If a pattern occurs with two entities that stand in a relation, then the pattern maps to the relation.
"died in" = diedInPlace

5
Pitfalls of Information Extraction
Web page: "Elvis died in England." "Louis XIV died in France."
Ontology: [Figure]
If a pattern occurs with two entities that stand in a relation, then the pattern maps to the relation.
"died in" = diedInPlace
If a meaningful pattern occurs with two entities, then the entities stand in the relation.
"Elvis" diedInPlace "England"

6
Pitfalls of Information Extraction
Web page: "Elvis died in England." "Louis XIV died in France."
Ontology: [Figure]
If a pattern occurs with two entities that stand in a relation, then the pattern maps to the relation.
"died in" = diedInPlace
If a meaningful pattern occurs with two entities, then the entities stand in the relation.
"Elvis" diedInPlace "England"
Taxidophobist ?

7
Pitfalls of Information Extraction
Web page: "Elvis died in England." "Louis XIV died in France."
If a pattern occurs with two entities that stand in a relation, then the pattern maps to the relation.
"died in" = diedInPlace
If a meaningful pattern occurs with two entities, then the entities stand in the relation.
"Elvis" diedInPlace "England"
Taxidophobist: Reasoning Problem

8
Pitfalls of Information Extraction
Web page: "Elvis died in England." "Louis XIV died in France."
If a pattern occurs with two entities that stand in a relation, then the pattern maps to the relation.
"died in" = diedInPlace
If a meaningful pattern occurs with two entities, then the entities stand in the relation.
Taxidophobist: Reasoning Problem
Disambiguation Problem

9
Pitfalls of Information Extraction
"Elvis died in England." "Louis XIV died in France."
Taxidophobist: Reasoning Problem
Disambiguation Problem
Pattern Matching Problem: "died in" = diedInPlace ?

10
Information Extraction as Formulas
Reasoning Problem (Taxidophobist):
type(Elvis,Taxidophobist).
type(X,Taxidophobist) & bornInPlace(X,Y) => diedInPlace(X,Z) [0.8]

11
Information Extraction as Formulas
"Elvis died in England." "Louis XIV died in France."
Pattern Matching Problem: "died in" = diedInPlace ?
Disambiguation Problem
Reasoning Problem:
type(Elvis,Taxidophobist).
type(X,Taxidophobist) & bornInPlace(X,Y) => diedInPlace(X,Z)

12
Information Extraction as Formulas
Disambiguation Problem
Assumptions:
- Within one document, the same word always has the same meaning
- The ontology already knows all important meanings of proper names
possibleMeaning(Elvis@D15, ElvisPresley). [0.7]

13
Information Extraction as Formulas
Assumptions:
- Within one document, the same word always has the same meaning
- The ontology already knows all important meanings of proper names
possibleMeaning(Elvis@D15, ElvisPresley). [0.7]
- Elvis@D15: a word in context (wic). Here: the word "Elvis" in document D15
- ElvisPresley: one possible meaning of "Elvis" as given by the ontology
- [0.7]: prior estimation for the likelihood of this meaning:
$\dfrac{|\,\text{words}(D15) \cap \text{rel}(ElvisPresley)\,|}{|\,\text{words}(D15)\,|}$
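The prior on this slide is a word-overlap ratio, which can be sketched in a few lines. This is only an illustration: the function name `disambiguation_prior` and both word lists are made up here, and `rel(ElvisPresley)` is assumed to be a bag of words that the ontology associates with the entity.

```python
def disambiguation_prior(document_words, entity_context_words):
    """Overlap ratio |words(D) ∩ rel(E)| / |words(D)| over distinct words."""
    doc = set(document_words)
    if not doc:
        # An empty document gives no evidence for any meaning.
        return 0.0
    return len(doc & set(entity_context_words)) / len(doc)


# Hypothetical data: 10 distinct words in document D15,
# 7 of which also occur in the entity's context.
words_d15 = ["elvis", "died", "in", "england", "guitar",
             "singer", "memphis", "rock", "king", "stage"]
rel_elvis_presley = {"singer", "guitar", "memphis", "rock",
                     "king", "stage", "died", "graceland"}

prior = disambiguation_prior(words_d15, rel_elvis_presley)
print(round(prior, 2))
```

With this made-up data the ratio is 7/10, which matches the weight [0.7] used in the example formula.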

14
Information Extraction as Formulas
Assumptions:
- Within one document, the same word always has the same meaning
- The ontology already knows all important meanings of proper names
possibleMeaning(Elvis@D15, ElvisPresley). [0.7]
possibleMeaning(X,Y) => means(X,Y)
means(X,Y) & Y ≠ Z => ¬means(X,Z)

15
Information Extraction as Formulas
"Elvis died in England." "Louis XIV died in France."
Pattern Matching Problem: "died in" = diedInPlace ?
Disambiguation Problem:
meaning(Elvis@D15, ElvisPresley). [0.7]
Reasoning Problem:
type(Elvis,Taxidophobist).
type(X,Taxidophobist) & bornInPlace(X,Y) => diedInPlace(X,Z)

16
Information Extraction as Formulas
"Elvis died in England." "Louis XIV died in France."
Pattern Matching Problem: "died in" = diedInPlace ?
occurs("died in", Elvis@D15, England@D15). [14]
occurs(P,Wic1,Wic2) & means(Wic1,X) & means(Wic2,Y) & mapsTo(P,R) => R(X,Y)
occurs(P,Wic1,Wic2) & means(Wic1,X) & means(Wic2,Y) & R(X,Y) => mapsTo(P,R)
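The two pattern rules can be read as "learn a pattern from a known fact" and "apply a learned pattern to produce a new fact". The sketch below applies them as hard, unweighted rules, which is a simplification: in the slides they are weighted formulas resolved jointly with everything else. The document id D7, the data, and the function names are hypothetical.

```python
# Pattern occurrences: (pattern, word-in-context 1, word-in-context 2).
occurs = [
    ("died in", "LouisXIV@D7", "France@D7"),
    ("died in", "Elvis@D15", "England@D15"),
]
# Assumed disambiguation results: word-in-context -> entity.
means = {
    "LouisXIV@D7": "LouisXIV", "France@D7": "France",
    "Elvis@D15": "ElvisPresley", "England@D15": "England",
}
# Facts already in the ontology: (relation, subject, object).
facts = {("diedInPlace", "LouisXIV", "France")}


def learn_patterns(occurs, means, facts):
    """occurs & means & means & R(X,Y) => mapsTo(P,R)."""
    maps_to = set()
    for pattern, wic1, wic2 in occurs:
        x, y = means.get(wic1), means.get(wic2)
        if x is None or y is None:
            continue  # skip occurrences with an unresolved word
        for relation, fx, fy in facts:
            if (fx, fy) == (x, y):
                maps_to.add((pattern, relation))
    return maps_to


def apply_patterns(occurs, means, maps_to):
    """occurs & means & means & mapsTo(P,R) => R(X,Y)."""
    derived = set()
    for pattern, wic1, wic2 in occurs:
        x, y = means.get(wic1), means.get(wic2)
        if x is None or y is None:
            continue
        for known_pattern, relation in maps_to:
            if known_pattern == pattern:
                derived.add((relation, x, y))
    return derived


maps_to = learn_patterns(occurs, means, facts)
new_facts = apply_patterns(occurs, means, maps_to) - facts
print(maps_to, new_facts)
```

Here the known fact about Louis XIV teaches the pattern "died in", which then proposes a diedInPlace hypothesis for Elvis and England.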

17
Information Extraction as Formulas
Reasoning Problem:
type(Elvis,Taxidophobist).
type(X,Taxidophobist) & bornInPlace(X,Y) => diedInPlace(X,Z)
Disambiguation Problem:
meaning(Elvis@D15, ElvisPresley). [0.7]
Pattern Matching Problem:
occurs("died in", Elvis@D15, England@D15). [14]
Find truth assignments to hypotheses so that the weight of satisfied formulas is maximized:
means(Elvis@D15, ElvisPresley) ?
mapsTo("died in", diedInPlace) ?
diedIn(ElvisPresley, England) ?

18
Weighted MAX SAT Problem
Find truth assignments to hypotheses so that the weight of satisfied formulas is maximized
Problems:
- The Weighted MAX SAT Problem is NP-hard
- Our instance of the problem is huge
- The most popular linear-time approximation algorithm (Johnson's) does not work well with our type of formulas
Johnson's cannot approximate better than 2/3:
bornInPlace(X,Y) => bornInPlace(X,Z)
A v B, A v C, B v C
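A minimal sketch of the objective itself, assuming a clause is a (weight, literals) pair and a literal is a (variable, polarity) pair. The three clauses and their weights are made up for illustration. The code enumerates all assignments, so it only works for toy instances; it is neither Johnson's algorithm nor the FMS algorithm.

```python
from itertools import product


def satisfied_weight(clauses, assignment):
    """Sum the weights of clauses with at least one true literal."""
    return sum(
        weight
        for weight, literals in clauses
        if any(assignment[var] == polarity for var, polarity in literals)
    )


def brute_force_max_sat(clauses):
    """Try every assignment; exponential in the number of variables."""
    variables = sorted({var for _, literals in clauses for var, _ in literals})
    best_score, best_assignment = None, None
    for values in product([False, True], repeat=len(variables)):
        assignment = dict(zip(variables, values))
        score = satisfied_weight(clauses, assignment)
        if best_score is None or score > best_score:
            best_score, best_assignment = score, assignment
    return best_score, best_assignment


# Hypothetical instance: (A v B) [3], (not A) [2], (not B) [4].
clauses = [
    (3, [("A", True), ("B", True)]),
    (2, [("A", False)]),
    (4, [("B", False)]),
]
best_score, best_assignment = brute_force_max_sat(clauses)
print(best_score, best_assignment)
```

With n hypotheses the loop visits 2^n assignments, which is why a huge instance needs an approximation algorithm.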

19
FMS Algorithm
Formulas: A v B [w1], A v B [w2], B v C [w3], C [w4]
Hypotheses: A, B, C (each = true or = false)
The Functional MAX SAT Algorithm considers only unit clauses.
The Functional MAX SAT Algorithm propagates Dominating Unit Clauses.
Example: A v B [10], A [10], A [30]: A = true, since 30 > 10+10
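The dominating-unit-clause test in the example (30 > 10+10) can be sketched as follows. Two things are assumptions here: negation marks are not visible in the slide text, so the two weight-10 clauses are taken to contain the negation of A; and "dominating" is taken to mean that the unit-clause weight of a literal strictly exceeds the total weight of all clauses containing the opposite literal. The function name is hypothetical.

```python
def dominating_unit_literals(clauses):
    """Return literals whose unit-clause weight beats all opposing clauses.

    A clause is (weight, literals); a literal is (variable, polarity).
    """
    unit_weight = {}   # literal -> total weight of unit clauses on it
    total_weight = {}  # literal -> total weight of all clauses containing it
    for weight, literals in clauses:
        for literal in set(literals):
            total_weight[literal] = total_weight.get(literal, 0) + weight
        if len(literals) == 1:
            literal = literals[0]
            unit_weight[literal] = unit_weight.get(literal, 0) + weight

    dominating = []
    for (var, polarity), weight in unit_weight.items():
        opposing = total_weight.get((var, not polarity), 0)
        if weight > opposing:
            dominating.append((var, polarity))
    return dominating


# Assumed reading of the slide example:
# (not A v B) [10], (not A) [10], (A) [30].
clauses = [
    (10, [("A", False), ("B", True)]),
    (10, [("A", False)]),
    (30, [("A", True)]),
]
print(dominating_unit_literals(clauses))
```

Under this reading, setting A = true loses at most 10+10 but gains 30, so the assignment can be fixed without looking at the rest of the instance.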

20
FMS Algorithm
[Figure: FMS Algorithm box labeled "FOR i=1 TO 42 ... NEXT i"]
- Approximation guarantee
- Polynomial time
- Experiments show better performance in practice than Johnson's algorithm in our setting.

21
FMS Algorithm
[Figure: the text "Elvis died in England" and the rule r(X,Y) & s(Y) => t(X,Y) next to the FMS Algorithm box "FOR i=1 TO 42 ... NEXT i"]

22
FMS Algorithm
[Figure: the text "Elvis died in England" and the rule r(X,Y) & s(Y) => t(X,Y) next to the FMS Algorithm box "FOR i=1 TO 42 ... NEXT i"; graph: St. Elvis diedIn England]
type(Elvis,Taxidophobist)=1
diedIn(Elvis,England)=0
means(Elvis@D15,Elvis)=0
means(Elvis@D15,...)=1

23
FMS Algorithm
[Figure: the rule r(X,Y) & s(Y) => t(X,Y) next to the FMS Algorithm box "FOR i=1 TO 42 ... NEXT i"; graph: St. Elvis diedIn England]

24
Other Experiments
Corpus | Type | # Docs | Relations | Time | Precision
Wikipedia toy corpus | structured | 100 | 3 | 2min | 100%
Wikipedia subcorpus | semi-structured | 2000 | 15 | 15h | 94%
News article toy corpus | unstructured | 150 | 1 | 24min | 91%
Biographies from Web | unstructured | 3440 | 5 | 15h | 90%

25
Conclusion
SOFIE unifies the tasks of
- entity disambiguation
- pattern extraction
- semantic constraint reasoning
in a single framework, delivering
- canonicalized facts
- of high precision (experiments show 90% precision)
died in England... but is alive!

26
SOFIE rules!
occurs(P,WX,WY) /\ refersTo(WX,X) /\ refersTo(WY,Y) /\ R(X,Y) => expresses(P,R)
occurs(P,WX,WY) /\ expresses(P,R) /\ refersTo(WX,X) /\ refersTo(WY,Y) /\ domain(R,D1) /\ range(R,D2) /\ type(X,D1) /\ type(Y,D2) => R(X,Y)
R(X,Y) /\ R(X,Z) /\ type(R,function) => Y = Z
disambiguationPrior(W,X) => refersTo(W,X)
bornInYear(X,B) /\ diedInYear(X,D) => B<D
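The functional rule R(X,Y) /\ R(X,Z) /\ type(R,function) => Y = Z is what lets the framework notice inconsistent candidates such as diedInPlace(Elvis,England) and diedInPlace(Elvis,Germany). A small sketch of that check alone, with a hypothetical function name and data, and without the weighting that decides which candidate wins:

```python
from collections import defaultdict


def functional_conflicts(facts, functional_relations):
    """Group objects by (relation, subject) and report groups with > 1 object."""
    objects = defaultdict(set)
    for relation, subject, obj in facts:
        if relation in functional_relations:
            objects[(relation, subject)].add(obj)
    return {key: sorted(values) for key, values in objects.items()
            if len(values) > 1}


# Hypothetical candidate facts: (relation, subject, object).
candidate_facts = [
    ("diedInPlace", "Elvis", "England"),
    ("diedInPlace", "Elvis", "Germany"),
    ("bornInPlace", "Elvis", "USA"),
]
conflicts = functional_conflicts(candidate_facts, {"diedInPlace", "bornInPlace"})
print(conflicts)
```

At most one object per conflicting group can be true, so the solver has to reject the others.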

27
SOFIE: Experiments
Corpus | Type | # Docs | Relations | Time | Precision | Recall
Wikipedia toy corpus | structured | 100 | 3 | 8min | 100% | 80%
Wikipedia toy corpus | semi-structured (50% infoboxes removed) | 100 | 3 | 8min | 100% | 57%
Wikipedia subcorpus | semi-structured | 2000 | 15 | 15h | 94% | ?
News article toy corpus | unstructured | 150 | 1 | 24min | 91% | 24%, 31%
(Snowball, same corpus) | | | | | 56% | 31%
Biographies from Web | unstructured | 3440 | 5 | 15h | 90% | ?

28
SOFIE: Large-Scale Experiment
Goal: Extract bornIn, bornOnDate, diedIn, diedOnDate, politicianOf
Corpus: 3700 biography documents downloaded from the Web
Runtime (summed over 5 batches):
Parsing | 7:05h
Hypothesis Generation | 6:15h
Solving | 2:30h
Total | 15:50h
Results (precision in %):
bornIn bornOnD diedIn diedOnD polOf
87 87 13 98 95 90

29
SOFIE: Relation to Markov Logic
[Figure: possible worlds X that set hypotheses such as bornIn(Nicholas, Patras) to true or false, each world with a probability P]
r(x,y) /\ s(x,z) => t(x,z) [w] ...
$P(X) \sim \prod_i e^{\,\mathrm{sat}(i,X)\, w_i}$
where sat(i,X) is the number of satisfied instances of the i-th formula and w_i is the weight of the i-th formula.
$\max_X \prod_i e^{\,\mathrm{sat}(i,X)\, w_i}$
~> $\max_X \log\bigl(\prod_i e^{\,\mathrm{sat}(i,X)\, w_i}\bigr)$
~> $\max_X \sum_i \mathrm{sat}(i,X)\, w_i$
~> Weighted MAX SAT problem
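The step from the product of exponentials to the weighted sum relies on the logarithm being monotone, so both objectives pick the same world. A small numeric check, with made-up weights and satisfaction counts for three hypothetical worlds X1 to X3:

```python
import math

# Hypothetical formula weights w_i.
weights = [1.5, 0.5, 2.0]
# Hypothetical sat(i, X): satisfied instances of formula i in world X.
sat_counts = {
    "X1": [1, 0, 2],
    "X2": [2, 1, 0],
    "X3": [0, 3, 1],
}


def unnormalized_probability(sat):
    """prod_i exp(sat_i * w_i), proportional to P(X)."""
    return math.prod(math.exp(s * w) for s, w in zip(sat, weights))


def weighted_sum(sat):
    """sum_i sat_i * w_i, the Weighted MAX SAT objective."""
    return sum(s * w for s, w in zip(sat, weights))


best_by_probability = max(
    sat_counts, key=lambda x: unnormalized_probability(sat_counts[x]))
best_by_sum = max(sat_counts, key=lambda x: weighted_sum(sat_counts[x]))
print(best_by_probability, best_by_sum)
```

The normalization constant of P(X) is the same for every world, so it does not affect which world is the maximizer.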

30
Grounding
r(X,Y) & s(Y) => t(X,Y)
As a clause: { ¬r(X,Y), ¬s(Y), t(X,Y) }
Entities = {a,b}
{ ¬r(a,a), ¬s(a), t(a,a) }
{ ¬r(a,b), ¬s(b), t(a,b) }
{ ¬r(b,a), ¬s(a), t(b,a) }
{ ¬r(b,b), ¬s(b), t(b,b) }
r(a,a), r(a,b), r(b,a), r(b,b): immutable, complete facts (e.g. pattern occurrences)

31
Grounding
r(X,Y) & s(Y) => t(X,Y)
As a clause: { ¬r(X,Y), ¬s(Y), t(X,Y) }
{ ¬s(a), t(a,a) } [w]
r(a,a) [w], r(a,b), r(b,a), r(b,b): immutable, complete facts (e.g. pattern occurrences)
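Grounding and the simplification on this slide can be sketched as follows. The sketch assumes that the rule is handled in clause form (not r, not s, t), and that among the immutable facts only r(a,a) is true while the other r atoms are false; the truth values are not readable from the slide text, so they are an assumption, as are the function names.

```python
from itertools import product


def ground_rule(entities):
    """Instantiate r(X,Y) & s(Y) => t(X,Y) as clauses over all entity pairs.

    A literal is (atom, polarity); polarity False means the atom is negated.
    """
    return [
        [(f"r({x},{y})", False), (f"s({y})", False), (f"t({x},{y})", True)]
        for x, y in product(entities, repeat=2)
    ]


def simplify(clauses, known):
    """Remove known atoms: drop satisfied clauses and false literals."""
    simplified = []
    for clause in clauses:
        remaining, satisfied = [], False
        for atom, polarity in clause:
            if atom not in known:
                remaining.append((atom, polarity))
            elif known[atom] == polarity:
                satisfied = True  # literal is true, clause needs no solving
                break
            # otherwise the literal is false and is simply dropped
        if not satisfied:
            simplified.append(remaining)
    return simplified


grounded = ground_rule(["a", "b"])
# Assumed truth values of the immutable r facts.
known_facts = {"r(a,a)": True, "r(a,b)": False, "r(b,a)": False, "r(b,b)": False}
remaining_clauses = simplify(grounded, known_facts)
print(len(grounded), remaining_clauses)
```

Under these assumptions the four ground clauses shrink to the single clause over s(a) and t(a,a), which is the only one left for the solver.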

32
Grounding
{ ¬s(a), t(a,a) } [w1]
{p(c,d), q(e), } [w2]
Find truth assignments to hypotheses so that the weight of satisfied formulas is maximized:
means(Elvis@D15, ElvisPresley) = true ?
mapsTo("died in", diedInPlace) = true ?
diedIn(ElvisPresley, England) = true ?
