23.3 Information Extraction More complicated than an IR (Information Retrieval) system. Requires a limited notion of syntax and semantics.


1 23.3 Information Extraction More complicated than an IR (Information Retrieval) system. Requires a limited notion of syntax and semantics

2 Attribute-Based Systems Assume the entire text refers to a single object. Often use regular expressions to pull out values for attributes –Regular expression operators: [0-9], ?, +, *
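As a minimal sketch of the attribute-based idea, the snippet below pulls two attribute values out of a short text with regular expressions built from [0-9], ? and +. The attribute names, the patterns, and the sample advertisement are all made up for illustration; a real system would need many more patterns and some way to choose among competing matches.

```python
import re

# Hypothetical attribute patterns; names and formats are assumptions.
PATTERNS = {
    "price": re.compile(r"\$([0-9]+(?:\.[0-9][0-9])?)"),
    "memory_gb": re.compile(r"([0-9]+) ?GB"),
}

def extract_attributes(text):
    """Return the first match per attribute, assuming the text describes one object."""
    result = {}
    for name, pattern in PATTERNS.items():
        match = pattern.search(text)
        if match:
            result[name] = match.group(1)
    return result

ad = "Laptop with 16 GB of memory, now only $899.99 while stocks last."
print(extract_attributes(ad))
```

Because the whole text is assumed to describe a single object, the first match for each pattern is simply taken as that object's attribute value.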

3 Relational-Based Systems The text might refer to multiple objects. FASTUS uses cascaded finite state transducers to perform the following steps: –Tokenization –Complex Word Handling –Basic Group Handling (NG, VG, PR, CJ) –Complex Phrase Handling –Structure Merging
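The first three stages of such a cascade can be sketched as a chain of functions, each consuming the previous stage's output. This is only an illustration: plain Python functions stand in for finite state transducers, the lexicon and multiword list are invented, and the later stages (complex phrase handling and structure merging) are left out.

```python
import re

# Toy multiword list and lexicon: assumptions for illustration only.
MULTIWORDS = {("joint", "venture"): "joint_venture", ("set", "up"): "set_up"}
LEXICON = {
    "the": "DET", "a": "DET", "local": "ADJ",
    "company": "N", "firm": "N", "joint_venture": "N",
    "set_up": "V", "with": "P",
}

def tokenize(text):
    """Stage 1: split the character stream into lowercase word tokens."""
    return re.findall(r"[A-Za-z]+", text.lower())

def merge_complex_words(tokens):
    """Stage 2: join known two-word expressions into single tokens."""
    out, i = [], 0
    while i < len(tokens):
        pair = tuple(tokens[i:i + 2])
        if pair in MULTIWORDS:
            out.append(MULTIWORDS[pair])
            i += 2
        else:
            out.append(tokens[i])
            i += 1
    return out

def basic_groups(tokens):
    """Stage 3: collect DET/ADJ/N runs into NG; verbs become VG, prepositions PR."""
    groups, current = [], []
    for tok in tokens:
        tag = LEXICON.get(tok, "N")  # unknown words treated as nouns (assumption)
        if tag in ("DET", "ADJ", "N"):
            current.append(tok)
        else:
            if current:
                groups.append(("NG", current))
                current = []
            groups.append(("VG" if tag == "V" else "PR", [tok]))
    if current:
        groups.append(("NG", current))
    return groups

def cascade(text):
    """Run the stages in order, each feeding the next."""
    return basic_groups(merge_complex_words(tokenize(text)))

for group in cascade("The company set up a joint venture with a local firm."):
    print(group)
```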

4 23.4 Machine Translation –Rough translation (“gist”) –Restricted source (weather) –Pre-edited (Caterpillar English) –Literary (unsolved) Interlingua: a representation language that captures all meanings of an idea

5 Transfer System, Figure 23.5 –Lexical Rule, e.g. ENG[cat] → FR[chat] –Syntactic Rule, e.g. ENG[adj noun] → FR[noun adj] –Memory Based Rule, e.g. ENG[The cat comes] → FR[Le chat arrive]
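The three rule types can be illustrated with a toy transfer function. It tries a memory-based rule on the whole input first, then applies the syntactic adjective/noun swap, then translates word by word. The dictionaries and word lists are invented for this sketch, and it ignores agreement, morphology, and everything else a real transfer system must handle.

```python
# Toy rule tables: all entries are assumptions for illustration.
LEXICAL = {"the": "le", "cat": "chat", "black": "noir", "comes": "arrive"}
ADJECTIVES = {"black"}
NOUNS = {"cat"}
MEMORY = {("the", "cat", "comes"): ["le", "chat", "arrive"]}

def transfer(words):
    """Translate a list of lowercase English words into French words."""
    # Memory-based rule: a stored translation for the exact word sequence.
    key = tuple(words)
    if key in MEMORY:
        return list(MEMORY[key])

    # Syntactic rule: ENG[adj noun] -> FR[noun adj].
    reordered = list(words)
    i = 0
    while i < len(reordered) - 1:
        if reordered[i] in ADJECTIVES and reordered[i + 1] in NOUNS:
            reordered[i], reordered[i + 1] = reordered[i + 1], reordered[i]
            i += 2
        else:
            i += 1

    # Lexical rules: word-for-word substitution; unknown words pass through.
    return [LEXICAL.get(w, w) for w in reordered]

print(transfer(["the", "cat", "comes"]))
print(transfer(["the", "black", "cat"]))
```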

6 Statistical Machine Translation argmax_F P(F | E) = argmax_F P(E | F) * P(F) / P(E) = argmax_F P(E | F) * P(F) P(F), language model (e.g. bigram model) P(E | F), translation model, p. 856 –P(fertility = n | word_F), fertility model –P(word_E | word_F), word choice model –P(offset = o | pos, len_E, len_F), offset model
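The decision rule drops P(E) because it is the same for every candidate F, so only the product P(E | F) * P(F) needs to be compared. The sketch below scores two candidate French sentences for one fixed English sentence. All probabilities are made-up numbers, the translation model is collapsed into a single lookup table rather than split into fertility, word choice, and offset models, and a real decoder would search a far larger candidate space.

```python
# Toy bigram probabilities for the French language model P(F). Made-up values.
BIGRAM_P = {
    ("<s>", "le"): 0.5, ("le", "chat"): 0.2, ("chat", "arrive"): 0.1,
    ("<s>", "chat"): 0.01, ("chat", "le"): 0.001, ("le", "arrive"): 0.01,
}

# Toy translation model P(E | F) for the fixed English sentence "the cat comes".
# Both candidates contain the same words, so they score equally here.
TRANSLATION_P = {
    ("le", "chat", "arrive"): 0.3,
    ("chat", "le", "arrive"): 0.3,
}

def language_model(french, floor=1e-6):
    """P(F) under a bigram model; unseen bigrams get a small floor value."""
    p, prev = 1.0, "<s>"
    for word in french:
        p *= BIGRAM_P.get((prev, word), floor)
        prev = word
    return p

def best_translation(candidates):
    """argmax over F of P(E | F) * P(F); P(E) is constant and omitted."""
    return max(candidates, key=lambda f: TRANSLATION_P[f] * language_model(f))

print(best_translation(list(TRANSLATION_P)))
```

In this toy setup the translation model cannot tell the two word orders apart, so the language model alone decides which candidate wins.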

7 Learning Probabilities Given a French text and an English text: –Segment into sentences –Estimate F language model –Align sentences –Estimate fertility model –Estimate word choice model –Estimate offset model –Improve models using a technique such as EM
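To make the final step concrete, here is a minimal EM sketch for the word choice model alone, in the style of IBM Model 1. That specific model is an assumption of this example, not something the slides name. It takes sentence pairs that are already aligned, ignores the fertility and offset models, and uses a two-sentence toy corpus.

```python
from collections import defaultdict

def train_word_choice(pairs, iterations=10):
    """Estimate t[(e, f)], an approximation of P(word_E = e | word_F = f).

    pairs: list of (english_words, french_words) for aligned sentences.
    """
    e_vocab = {e for es, _ in pairs for e in es}
    # Start from a uniform distribution over English words.
    t = defaultdict(lambda: 1.0 / len(e_vocab))
    for _ in range(iterations):
        count = defaultdict(float)  # expected count of e generated by f
        total = defaultdict(float)  # expected count of f generating any word
        # E-step: split each English word's count across the French words
        # in the same sentence, in proportion to the current estimates.
        for es, fs in pairs:
            for e in es:
                norm = sum(t[(e, f)] for f in fs)
                for f in fs:
                    frac = t[(e, f)] / norm
                    count[(e, f)] += frac
                    total[f] += frac
        # M-step: renormalize the expected counts into probabilities.
        for (e, f), c in count.items():
            t[(e, f)] = c / total[f]
    return t

corpus = [
    (["the", "cat"], ["le", "chat"]),
    (["the", "dog"], ["le", "chien"]),
]
t = train_word_choice(corpus, iterations=20)
for pair in [("the", "le"), ("cat", "chat"), ("dog", "chien"), ("the", "chat")]:
    print(pair, round(t[pair], 3))
```

Because "le" co-occurs with "the" in both sentence pairs, each iteration shifts probability mass toward that pairing, which in turn frees "chat" and "chien" to explain "cat" and "dog".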

