

SOFIE: A Self-Organizing Framework for Information Extraction, Fabian M. Suchanek




1 SOFIE: A Self-Organizing Framework for Information Extraction
Fabian M. Suchanek, Mauro Sozio, Gerhard Weikum (Max-Planck-Institute for Informatics, Saarbrücken, Germany)

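2 Ontologies
[Figure: an ontology graph with the classes Entity, Singer and Country, the entity USA, and the relations type, subclassOf and bornInPlace.]
Ontologies such as DBpedia, YAGO, KYLIN, ... are built from Wikipedia, e.g. from infobox entries like "birth-place: USA". Can the rest of the Internet, with free text such as "Elvis died in England", be harvested too?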

3 Information Extraction
Goal: extract ontological information from natural-language documents, e.g. diedInPlace(Elvis, England) from "Elvis died in England".
Previous approaches: Espresso, DIPRE, LEILA, Snowball, TextRunner, Alice, and many more. These
- may deliver non-canonical relations: died in, perished in, was killed in, ...
- may deliver non-canonical entities: England, UK, Great Britain, ...
- may deliver inconsistent facts: diedInPlace(Elvis, England) and diedInPlace(Elvis, Germany)

4-9 Pitfalls of Information Extraction
Web page: "Elvis died in England." "Louis XIV died in France."
Ontology: diedInPlace(Louis XIV, France)
A naive extractor applies two rules:
- If a pattern occurs with two entities that stand in a relation, then the pattern maps to the relation: "died in" = diedInPlace.
- If a meaningful pattern occurs with two entities, then the entities stand in the relation: diedInPlace("Elvis", "England").
This runs into three problems:
- Reasoning problem: the ontology may know that Elvis is a Taxidophobist, and this knowledge can contradict the new fact.
- Disambiguation problem: the word "Elvis" need not refer to the entity Elvis.
- Pattern matching problem: is "died in" really = diedInPlace?
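The two naive rules can be sketched in a few lines of Python. This is a minimal illustration built on a toy ontology and corpus encoded as (subject, pattern, object) triples. The helper names `learn_patterns` and `apply_patterns` are hypothetical and are not part of SOFIE. The sketch shows how the naive rules accept diedInPlace(Elvis, England) without checking any of the three problems.

```python
# Hypothetical toy data: one ontology fact and two sentences from a web page.
ontology = {("diedInPlace", "Louis XIV", "France")}
corpus = [("Elvis", "died in", "England"), ("Louis XIV", "died in", "France")]

def learn_patterns(ontology, corpus):
    """Rule 1: a pattern seen with two entities that stand in relation R maps to R."""
    mapping = {}
    for x, pattern, y in corpus:
        for rel, a, b in ontology:
            if (a, b) == (x, y):
                mapping.setdefault(pattern, set()).add(rel)
    return mapping

def apply_patterns(mapping, corpus):
    """Rule 2: a meaningful pattern seen with two entities yields R(x, y)."""
    facts = set()
    for x, pattern, y in corpus:
        for rel in mapping.get(pattern, ()):
            facts.add((rel, x, y))
    return facts

mapping = learn_patterns(ontology, corpus)
new_facts = apply_patterns(mapping, corpus) - ontology
print(mapping)    # {'died in': {'diedInPlace'}}
print(new_facts)  # {('diedInPlace', 'Elvis', 'England')}
```

No check here asks whether "Elvis" really denotes the entity Elvis or whether the new fact is consistent with the ontology.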

10 Information Extraction as Formulas
Reasoning problem:
type(Elvis,Taxidophobist).
type(X,Taxidophobist) & bornInPlace(X,Y) => diedInPlace(X,Y) [0.8]


12-14 Information Extraction as Formulas: the Disambiguation Problem
Assumptions:
- In one document, the same word always has the same meaning.
- The ontology already knows all important meanings of proper names.
possibleMeaning(Elvis@D15, ElvisPresley). [0.7]
- Elvis@D15 is a word in context (wic): here, the word "Elvis" in document D15.
- ElvisPresley is one possible meaning of "Elvis", as given by the ontology.
- 0.7 is the prior estimate for the likelihood of this meaning, computed as |words(D15) ∩ rel(ElvisPresley)| / |words(D15)|, where rel(ElvisPresley) are the words related to ElvisPresley.
Rules:
possibleMeaning(X,Y) => means(X,Y)
means(X,Y) & Y ≠ Z => ¬means(X,Z)
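The prior estimate can be computed directly from the fraction above. The following is a minimal sketch that makes several assumptions. Words are taken from a naive lower-cased whitespace split. The sets playing the role of rel(E) are made up for illustration. ElvisCostello is a hypothetical second candidate meaning.

```python
from fractions import Fraction

def words(document: str) -> set:
    """Distinct lower-cased tokens of a document (naive whitespace split)."""
    return set(document.lower().split())

def meaning_prior(document: str, related_words: set) -> Fraction:
    """|words(D) ∩ rel(E)| / |words(D)| for one candidate meaning E."""
    doc_words = words(document)
    return Fraction(len(doc_words & related_words), len(doc_words))

# Hypothetical document D15 and hypothetical rel(E) word sets.
d15 = "Elvis the singer was born in Tupelo"
rel = {
    "ElvisPresley": {"elvis", "presley", "singer", "tupelo", "memphis"},
    "ElvisCostello": {"elvis", "costello", "singer", "london"},
}

priors = {entity: meaning_prior(d15, related) for entity, related in rel.items()}
print(priors)  # {'ElvisPresley': Fraction(3, 7), 'ElvisCostello': Fraction(2, 7)}
```

Each prior would then serve as the weight of a fact such as possibleMeaning(Elvis@D15, ElvisPresley). The rule means(X,Y) & Y ≠ Z => ¬means(X,Z) then lets at most one candidate win.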


16 Information Extraction as Formulas: the Pattern Matching Problem
"Elvis died in England." "Louis XIV died in France." Is "died in" = diedInPlace?
occurs("died in", Elvis@D15, England@D15). [14]
occurs(P,Wic1,Wic2) & means(Wic1,X) & means(Wic2,Y) & mapsTo(P,R) => R(X,Y)
occurs(P,Wic1,Wic2) & means(Wic1,X) & means(Wic2,Y) & R(X,Y) => mapsTo(P,R)
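To see which hypotheses the first pattern rule creates, the sketch below grounds it over a toy set of occurrences, candidate meanings and relations. All inputs are hypothetical. The [14] annotation is read here as the clause weight, which is an assumption. A literal is an (atom, positive) pair.

```python
from itertools import product

# Hypothetical toy inputs.
occurrences = {("died in", "Elvis@D15", "England@D15"): 14}  # [14] read as a weight
possible_meanings = {
    "Elvis@D15": ["ElvisPresley", "ElvisCostello"],
    "England@D15": ["England"],
}
relations = ["diedInPlace", "bornInPlace"]

def ground_pattern_rule(occurrences, possible_meanings, relations):
    """Ground occurs(P,W1,W2) & means(W1,X) & means(W2,Y) & mapsTo(P,R) => R(X,Y).

    occurs(...) is a given (true) fact, so its negation drops out; each grounding
    becomes the clause  not means(W1,X) v not means(W2,Y) v not mapsTo(P,R) v R(X,Y).
    """
    clauses = []
    for (pattern, wic1, wic2), weight in occurrences.items():
        for x, y, rel in product(possible_meanings[wic1], possible_meanings[wic2], relations):
            clause = (
                (("means", wic1, x), False),
                (("means", wic2, y), False),
                (("mapsTo", pattern, rel), False),
                ((rel, x, y), True),
            )
            clauses.append((clause, weight))
    return clauses

clauses = ground_pattern_rule(occurrences, possible_meanings, relations)
for clause, weight in clauses:
    print(weight, clause)
```

Two meanings for "Elvis", one for "England" and two relations give four weighted clauses. Every candidate reading therefore becomes a hypothesis instead of being accepted outright.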

17 Information Extraction as Formulas
Reasoning, disambiguation and pattern matching become one set of weighted formulas:
type(X,Taxidophobist) & bornInPlace(X,Y) => diedInPlace(X,Y)
type(Elvis,Taxidophobist).
possibleMeaning(Elvis@D15, ElvisPresley). [0.7]
occurs("died in", Elvis@D15, England@D15). [14]
Goal: find truth assignments to the hypotheses so that the total weight of the satisfied formulas is maximized:
means(Elvis@D15, ElvisPresley) ?
mapsTo("died in", diedInPlace) ?
diedInPlace(ElvisPresley, England) ?

18 Weighted MAX SAT Problem
Find truth assignments to the hypotheses so that the total weight of the satisfied formulas is maximized.
Problems:
- The Weighted MAX SAT problem is NP-hard.
- Our instance of the problem is huge.
- The most popular linear-time approximation algorithm (Johnson's) does not work well with our type of formulas. Johnson's algorithm cannot approximate better than 2/3. Consider the functional constraint bornInPlace(X,Y) & Y ≠ Z => ¬bornInPlace(X,Z), grounded for three candidate birth places A, B, C: ¬A v ¬B, ¬A v ¬C, ¬B v ¬C.
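To make the objective concrete, here is a brute-force Weighted MAX SAT solver. It tries all 2^n assignments, which is exactly why this approach is hopeless for an instance as large as SOFIE's. The clause encoding, the hypotheses A and B, and the weights are hypothetical. A stands for bornInPlace(Elvis, USA), and B stands for bornInPlace(Elvis, England).

```python
from itertools import product

# A clause is (list of (variable, positive) literals, weight).
clauses = [
    ([("A", True)], 3),                  # evidence for A
    ([("B", True)], 2),                  # weaker evidence for B
    ([("A", False), ("B", False)], 10),  # functional constraint: not both
]

def satisfied_weight(clauses, assignment):
    """Total weight of the clauses that have at least one true literal."""
    return sum(w for lits, w in clauses if any(assignment[v] == pos for v, pos in lits))

def brute_force_max_sat(clauses, variables):
    """Try all 2^n assignments; only feasible for tiny instances."""
    best_assignment, best_weight = None, float("-inf")
    for values in product([False, True], repeat=len(variables)):
        assignment = dict(zip(variables, values))
        weight = satisfied_weight(clauses, assignment)
        if weight > best_weight:
            best_assignment, best_weight = assignment, weight
    return best_assignment, best_weight

print(brute_force_max_sat(clauses, ["A", "B"]))  # ({'A': True, 'B': False}, 13)
```

Setting both A and B to true satisfies only 5 units of weight, because the functional clause is violated. The optimum keeps the better-supported birth place.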

19 FMS Algorithm
Formulas: A v B [w1], A v B [w2], B v C [w3], C [w4]
Hypotheses: A, B, C (each set to true or false)
The Functional MAX SAT (FMS) algorithm considers only unit clauses.
The FMS algorithm propagates dominating unit clauses. For example, given ¬A v B [10], ¬A [10] and A [30], it sets A = true, because 30 > 10 + 10.
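The propagation step can be sketched as follows. This is a sketch of the unit-propagation step only; what FMS does once no dominating unit clause is left is not shown. The domination test is a reading of the slide's example "30 > 10 + 10": a unit clause {x} [w] counts as dominating when w exceeds the total weight of all clauses that contain the opposite literal. The example clauses are taken to be ¬A v B [10], ¬A [10] and A [30]. The function names are hypothetical.

```python
def simplify(clauses, var, value):
    """Apply var := value: drop satisfied clauses and remove the now-false literal."""
    result = []
    for lits, weight in clauses:
        if (var, value) in lits:
            continue  # clause is satisfied
        rest = frozenset(lit for lit in lits if lit[0] != var)
        if rest:  # an empty clause is violated and can no longer be satisfied
            result.append((rest, weight))
    return result

def propagate_dominating_units(clauses):
    """Repeatedly fix the variable of a dominating unit clause.

    Returns the partial assignment and the clauses that remain afterwards.
    """
    clauses = [(frozenset(lits), w) for lits, w in clauses]
    assignment = {}
    while True:
        for lits, weight in clauses:
            if len(lits) != 1:
                continue
            (var, positive), = lits
            # Total weight of clauses that the opposite truth value would satisfy.
            opposing = sum(w for other, w in clauses if (var, not positive) in other)
            if weight > opposing:
                assignment[var] = positive
                clauses = simplify(clauses, var, positive)
                break
        else:
            return assignment, clauses

example = [
    ({("A", False), ("B", True)}, 10),  # not A v B [10]
    ({("A", False)}, 10),               # not A [10]
    ({("A", True)}, 30),                # A [30]
]
print(propagate_dominating_units(example))  # ({'A': True, 'B': True}, [])
```

Fixing A = true shrinks ¬A v B to the unit clause B [10], which has no opposition and is then propagated as well.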

20 FMS Algorithm
[Figure: the FMS algorithm, drawn as a loop "FOR i=1 TO 42... NEXT i".]
Properties: approximation guarantee; polynomial time.
In our setting, experiments show better performance in practice than Johnson's algorithm.


22 FMS Algorithm
[Figure: the sentence "Elvis died in England" and rules such as r(X,Y) & s(Y) => t(X,Y) go into the FMS algorithm ("FOR i=1 TO 42... NEXT i"). It returns truth values for the hypotheses: type(Elvis,Taxidophobist)=1, diedIn(Elvis,England)=0, means(Elvis@D15,Elvis)=0, means(Elvis@D15,...)=1. Output labels: England, diedIn, St. Elvis.]


24 Other Experiments
Corpus | Type | # Docs | Relations | Time | Precision
Wikipedia toy corpus | structured | 100 | 3 | 2 min | 100%
Wikipedia subcorpus | semi-structured | 2000 | 15 | 15 h | 94%
News article toy corpus | unstructured | 150 | 1 | 24 min | 91%
Biographies from Web | unstructured | 3440 | 5 | 15 h | 90%

25 Conclusion
SOFIE unifies the tasks of
- entity disambiguation
- pattern extraction
- semantic constraint reasoning
in a single framework, delivering
- canonicalized facts
- of high precision (experiments show 90% precision)
[Picture caption: "died in England...but is alive!"]

26 SOFIE rules!
occurs(P,WX,WY) /\ refersTo(WX,X) /\ refersTo(WY,Y) /\ R(X,Y) => expresses(P,R)
occurs(P,WX,WY) /\ expresses(P,R) /\ refersTo(WX,X) /\ refersTo(WY,Y) /\ domain(R,D1) /\ range(R,D2) /\ type(X,D1) /\ type(Y,D2) => R(X,Y)
R(X,Y) /\ R(X,Z) /\ type(R,function) => Y = Z
disambiguationPrior(W,X) => refersTo(W,X)
bornInYear(X,B) /\ diedInYear(X,D) => B < D
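The last two rules are consistency constraints. The sketch below is a checker that reports which candidate facts violate them. In SOFIE these rules are formulas inside the MAX SAT instance rather than a post-hoc filter, so this only illustrates what the constraints forbid. The fact set, the set of functional relations and the entity SomePerson are hypothetical; SomePerson is a deliberately inconsistent toy entry.

```python
from collections import defaultdict

# Hypothetical candidate facts as (relation, subject, object) triples.
facts = {
    ("diedInPlace", "Elvis", "England"),
    ("diedInPlace", "Elvis", "Germany"),
    ("bornInYear", "Elvis", 1935),
    ("diedInYear", "Elvis", 1977),
    ("bornInYear", "SomePerson", 1950),  # deliberately inconsistent toy entity
    ("diedInYear", "SomePerson", 1900),
}
functions = {"diedInPlace", "bornInYear", "diedInYear"}  # relations R with type(R, function)

def functional_violations(facts, functions):
    """R(X,Y) /\\ R(X,Z) /\\ type(R,function) => Y = Z: report (R, X) with several values."""
    values = defaultdict(set)
    for rel, x, y in facts:
        if rel in functions:
            values[(rel, x)].add(y)
    return {key: vals for key, vals in values.items() if len(vals) > 1}

def lifespan_violations(facts):
    """bornInYear(X,B) /\\ diedInYear(X,D) => B < D: report X that break it."""
    born = [(x, b) for rel, x, b in facts if rel == "bornInYear"]
    died = [(x, d) for rel, x, d in facts if rel == "diedInYear"]
    return {x for x, b in born for x2, d in died if x == x2 and not b < d}

print(functional_violations(facts, functions))  # reports ('diedInPlace', 'Elvis') with both places
print(lifespan_violations(facts))               # {'SomePerson'}
```

This is the kind of inconsistency (diedInPlace(Elvis, England) alongside diedInPlace(Elvis, Germany)) that earlier extraction approaches may deliver.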

27 SOFIE: Experiments
Corpus | Type | # Docs | Relations | Time | Precision | Recall
Wikipedia toy corpus | structured | 100 | 3 | 8 min | 100% | 80%
Wikipedia toy corpus | semi-structured (50% of infoboxes removed) | 100 | 3 | 8 min | 100% | 57%
Wikipedia subcorpus | semi-structured | 2000 | 15 | 15 h | 94% | ?
News article toy corpus | unstructured | 150 | 1 | 24 min | 91% | 24%, 31%
Snowball | | | | | 56% | 31%
Biographies from Web | unstructured | 3440 | 5 | 15 h | 90% | ?

28 SOFIE: Large-Scale Experiment
Goal: extract bornIn, bornOnDate, diedIn, diedOnDate, politicianOf
Corpus: 3700 biography documents downloaded from the Web
Runtime (summed over 5 batches): parsing 7:05 h, hypothesis generation 6:15 h, solving 2:30 h; total 15:50 h
Results (precision in %):
bornIn bornOnD diedIn diedOnD polOf
87 87 13 98 95 90

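29 SOFIE: Relation to Markov Logic
[Figure: probability P of bornIn(Nicholas, Patras) being false vs. true.]
Given weighted formulas such as r(x,y) /\ s(x,z) => t(x,z) [w], ..., Markov Logic assigns a possible world X the probability
$$P(X) \propto \exp\Big(\sum_i w_i \,\mathrm{sat}(i,X)\Big)$$
where sat(i,X) is the number of satisfied instances of the i-th formula and w_i is the weight of the i-th formula. Because log is monotonic, the most likely world is
$$\arg\max_X \exp\Big(\sum_i w_i \,\mathrm{sat}(i,X)\Big) = \arg\max_X \log \exp\Big(\sum_i w_i \,\mathrm{sat}(i,X)\Big) = \arg\max_X \sum_i w_i \,\mathrm{sat}(i,X),$$
which is a Weighted MAX SAT problem.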
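The equivalence can be checked numerically on a tiny example. The sketch enumerates every world, computes the normalized P(X) and confirms that the most probable world is also the Weighted MAX SAT optimum. The clauses and weights are hypothetical. For ground clauses, sat(i, X) is simply 0 or 1.

```python
import math
from itertools import product

# Hypothetical ground clauses: (list of (atom, positive) literals, weight w_i).
clauses = [
    ([("A", True)], 1.5),
    ([("B", True)], 0.5),
    ([("A", False), ("B", False)], 1.0),
]
atoms = ["A", "B"]

def weighted_sat(world):
    """sum_i w_i * sat(i, X), with sat(i, X) in {0, 1} for ground clauses."""
    return sum(w for lits, w in clauses if any(world[a] == pos for a, pos in lits))

worlds = [dict(zip(atoms, values)) for values in product([False, True], repeat=len(atoms))]
z = sum(math.exp(weighted_sat(x)) for x in worlds)        # normalizing constant
prob = [math.exp(weighted_sat(x)) / z for x in worlds]    # P(X) for each world

most_probable = worlds[max(range(len(worlds)), key=lambda i: prob[i])]
max_sat_optimum = max(worlds, key=weighted_sat)
print(most_probable, max_sat_optimum)  # {'A': True, 'B': False} {'A': True, 'B': False}
```

The normalizing constant z never has to be computed to find the best world, which is what makes the MAX SAT view attractive.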

30 Grounding
Rule: r(X,Y) & s(Y) => t(X,Y), i.e. the clause { ¬r(X,Y), ¬s(Y), t(X,Y) }
Entities = {a,b}. Grounding yields:
{ ¬r(a,a), ¬s(a), t(a,a) }
{ ¬r(a,b), ¬s(b), t(a,b) }
{ ¬r(b,a), ¬s(a), t(b,a) }
{ ¬r(b,b), ¬s(b), t(b,b) }
The atoms r(a,a), r(a,b), r(b,a), r(b,b) are immutable, complete facts (e.g. pattern occurrences).

31 Grounding
Because the r-facts are immutable and complete, the grounded clauses simplify. Suppose r(a,a) holds and r(a,b), r(b,a), r(b,b) do not. Then three groundings are already satisfied and drop out. In the remaining grounding the false literal ¬r(a,a) is removed, leaving the single clause
{ ¬s(a), t(a,a) } [w]

32 Grounding
The result is a set of weighted clauses, e.g. { ¬s(a), t(a,a) } [w1], { p(c,d), q(e), ... } [w2].
Find truth assignments to the hypotheses so that the total weight of the satisfied clauses is maximized:
means(Elvis@D15, ElvisPresley) = true ?
mapsTo("died in", diedInPlace) = true ?
diedInPlace(ElvisPresley, England) = true ?
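The grounding step for the example rule can be sketched as follows. The rule r(X,Y) & s(Y) => t(X,Y) is hard-coded, and r is treated as an immutable, complete fact relation. Groundings whose r-atom is false are skipped as already satisfied, and true r-atoms are dropped from the clause. The function name and the literal encoding ((predicate, args), positive) are hypothetical.

```python
from itertools import product

def ground_rule(entities, r_facts, weight):
    """Ground r(X,Y) & s(Y) => t(X,Y), i.e. the clause {not r(X,Y), not s(Y), t(X,Y)}.

    r is immutable and complete: r(x,y) is true iff (x, y) is in r_facts.
    """
    clauses = []
    for x, y in product(entities, repeat=2):
        if (x, y) not in r_facts:
            continue  # not r(x,y) is true, so this grounding is already satisfied
        # r(x,y) is true, so the literal not r(x,y) is false and is dropped
        clause = frozenset({(("s", (y,)), False), (("t", (x, y)), True)})
        clauses.append((clause, weight))
    return clauses

print(ground_rule(["a", "b"], {("a", "a")}, weight=1.0))
```

With Entities = {a,b} and only r(a,a) true, four candidate groundings shrink to the single clause { ¬s(a), t(a,a) } [w]. Immutable facts like this keep the MAX SAT instance small.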




