
8/13/2004 NYCNLP (COLING 2004): Cross-lingual Information Extraction System Evaluation. Kiyoshi Sudo, Satoshi Sekine, Ralph Grishman, New York University.


1 8/13/2004 NYCNLP (COLING 2004)
Cross-lingual Information Extraction System Evaluation
Kiyoshi Sudo, Satoshi Sekine, Ralph Grishman
New York University

2 Outline
1. Introduction
2. Cross-lingual IE system
–Translation-based QDIE system
–Cross-lingual QDIE system
3. Experiment
4. Discussion
5. Conclusion

3 Information Extraction
Identifying entities in the source text and mapping them to a pre-defined table.
“A smiling Palestinian suicide bomber triggered a massive explosion in the heavily policed heart of downtown Jerusalem today, …”
Table (Terrorism Activity):
–Date: today
–Location: downtown Jerusalem
–Perpetrator: A … suicide bomber
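The filled table on this slide can be written down as a small record type. The sketch below is only illustrative: the class name `TerrorismEvent` and its field names are assumptions chosen to mirror the slot labels, and the values are copied from the slide, including its elided perpetrator string.

```python
from dataclasses import dataclass, asdict


@dataclass
class TerrorismEvent:
    """Hypothetical record for the 'Terrorism Activity' table on the slide."""
    date: str
    location: str
    perpetrator: str


# Slot values taken from the slide's example sentence.
event = TerrorismEvent(
    date="today",
    location="downtown Jerusalem",
    perpetrator="A … suicide bomber",
)

print(asdict(event))
```

An IE system in this setting would produce one such record per event mention, with each slot filled by an entity found in the text.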

4 Local Context
Local context provides useful information for identifying entities.
“A smiling Palestinian suicide bomber triggered a massive explosion in the heavily policed heart of downtown Jerusalem today, …”
–Date: today
–Location: downtown Jerusalem
–Perpetrator: A … suicide bomber

5 Extraction Patterns
Extraction patterns have been widely used as an effective means of extracting entities.
–Pre-defined template (Riloff 1993): (kidnapped in )
–Predicate-Argument (Yangarber et al. 2000): (, appoint, )
–Dependency Tree (Sudo et al. 2003): (trigger(OBJ: explosion) (ADV: )))
Because porting an IE system is costly, automatic pattern discovery techniques have become important.
–Application of bootstrapping methods (Riloff and Jones 1999, Yangarber et al. 2000)

6 Pattern Discovery (Sudo et al. 2003)
QDIE = query-driven information extraction
Preprocess the source documents (NE-tagging, dependency parsing), then, given a query (keywords, narrative):
(1) Get relevant documents with IR
(2) Score pattern candidates based on TF/IDF
(3) Use pattern matching
Pattern candidate: any subtree that contains at least one NE instance
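Step (2) can be sketched as follows. The slide only says that candidates are scored "based on TF/IDF", so the exact formula here is an assumption: term frequency is counted over the retrieved (relevant) documents and document frequency over the whole source collection. The function name `score_patterns`, the string encoding of patterns, and the toy documents are all hypothetical.

```python
import math
from collections import Counter


def score_patterns(all_docs, relevant_ids):
    """Rank pattern candidates with an assumed TF/IDF-style score.

    all_docs: list of documents, each a list of pattern-candidate strings.
    relevant_ids: indices of the documents returned by IR for the query.
    """
    n_docs = len(all_docs)

    # Document frequency over the whole collection.
    df = Counter()
    for doc in all_docs:
        df.update(set(doc))

    # Term frequency over the relevant documents only.
    tf = Counter()
    for i in relevant_ids:
        tf.update(all_docs[i])

    scores = {p: tf[p] * math.log(n_docs / df[p]) for p in tf}
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)


# Toy collection: each document is already reduced to its pattern candidates.
docs = [
    ["appoint(OBJ:PERSON)", "say(SBJ:PERSON)"],
    ["appoint(OBJ:PERSON)"],
    ["say(SBJ:PERSON)"],
    ["say(SBJ:PERSON)"],
]
ranked = score_patterns(docs, relevant_ids=[0, 1])
for pattern, score in ranked:
    print(f"{score:.3f}  {pattern}")
```

In this toy run the candidate that is concentrated in the relevant documents outranks the one spread across the collection, which is the behaviour a TF/IDF-style score is meant to give.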

7 Cross-lingual IE
Assume we have
–Machine Translation System
–Basic linguistic tools for source and target language: morphological analyzer, parser, NE-tagger, IR system
[Diagram: query, Japanese and English source documents, E-QDIE, J-QDIE, MT system]

8 Outline
1. Introduction
2. Cross-lingual IE system
–Translation-based QDIE system
–Cross-lingual QDIE system
3. Experiment
4. Discussion
5. Conclusion

9 Translation-based QDIE system
(1) Translate the source documents (Japanese → English)
(2) Use English QDIE system

10 Cross-lingual QDIE system
(1) Translate the user’s query (English → Japanese)
(2) Use Japanese QDIE system
(3) Translate the extracted table (Japanese → English)
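The two architectures differ only in where machine translation is applied, which a short sketch can make explicit. Everything named here is hypothetical: the MT and QDIE components are passed in as plain callables, and the toy stand-ins at the bottom exist only to show the data flow, not how the real systems behave.

```python
from typing import Callable, Dict, List

Table = Dict[str, List[str]]               # slot name -> extracted entity strings
Translate = Callable[[str], str]           # assumed MT interface
Qdie = Callable[[str, List[str]], Table]   # assumed QDIE interface


def translation_based_qdie(query_en: str, docs_ja: List[str],
                           mt_ja_to_en: Translate, qdie_en: Qdie) -> Table:
    """(1) Translate the source documents, (2) run the English QDIE system."""
    docs_en = [mt_ja_to_en(doc) for doc in docs_ja]
    return qdie_en(query_en, docs_en)


def cross_lingual_qdie(query_en: str, docs_ja: List[str],
                       mt_en_to_ja: Translate, mt_ja_to_en: Translate,
                       qdie_ja: Qdie) -> Table:
    """(1) Translate the query, (2) run the Japanese QDIE system,
    (3) translate the extracted table."""
    query_ja = mt_en_to_ja(query_en)
    table_ja = qdie_ja(query_ja, docs_ja)
    return {slot: [mt_ja_to_en(entity) for entity in entities]
            for slot, entities in table_ja.items()}


# Toy stand-ins that only tag their input, so the order of operations is visible.
def fake_ja_to_en(text: str) -> str:
    return f"en({text})"


def fake_en_to_ja(text: str) -> str:
    return f"ja({text})"


def fake_qdie(query: str, docs: List[str]) -> Table:
    return {"Person": [f"{docs[0]}|{query}"]}


result_translation = translation_based_qdie("q", ["d1"], fake_ja_to_en, fake_qdie)
result_cross = cross_lingual_qdie("q", ["d1"], fake_en_to_ja, fake_ja_to_en, fake_qdie)
print(result_translation)
print(result_cross)
```

In the translation-based pipeline the MT system processes whole documents before any analysis; in the cross-lingual pipeline it only touches the query and the extracted entities.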

11 Comparison of two systems
Translation-based QDIE
–No source-language-specific tools are necessary except the MT system.
–Tools for the E-QDIE system were customized for English (not for the output of an MT system).
Cross-lingual QDIE
–MT is applied only to short sentences or phrases (the query and the extracted entities).
–Tools for the J-QDIE system were customized for Japanese.

12 Experiment
Management Succession Extraction Task (simple version of MUC-6 task)
–Identify the entities involved in a succession event: Person, Post, Organization
Test documents
–100 articles (61 relevant, 39 irrelevant) collected from Yomiuri Newspaper 1999 (Japanese)
–Person (173/651), Post (210/626), Organization (111/709)
Source documents and tools
–130,000 articles from Yomiuri Newspaper 1998 (Japanese)
–MT system: “King of Translation” (IBM)
–NE tagger: (Sekine and Nobata 2004)
Extraction performance is measured by recall/precision of extracted entities.
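Recall and precision over extracted entities can be computed as in the sketch below. The slides do not state the matching criterion, so exact match on (slot, entity string) pairs is an assumption, and the gold and extracted sets are toy data, not results from the experiment.

```python
def precision_recall(extracted, gold):
    """Precision and recall of extracted (slot, entity) pairs, exact match assumed."""
    extracted, gold = set(extracted), set(gold)
    correct = len(extracted & gold)
    precision = correct / len(extracted) if extracted else 0.0
    recall = correct / len(gold) if gold else 0.0
    return precision, recall


# Toy example: 3 gold entities, 4 extracted, 2 of them correct.
gold = {
    ("Person", "Hiroyuki Okajima"),
    ("Post", "President"),
    ("Organization", "Okajima"),
}
extracted = {
    ("Person", "Hiroyuki Okajima"),
    ("Post", "President"),
    ("Post", "managing director"),
    ("Organization", "Kofu-shi"),
}

precision, recall = precision_recall(extracted, gold)
print(f"precision={precision:.2f} recall={recall:.2f}")
```

Sweeping the number of top-ranked patterns used for matching and recomputing these two numbers at each setting gives a recall/precision curve, from which a maximum recall can be read off.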

13 Cross-lingual QDIE does better
Maximum recall:
–Cross-lingual system: 60%
–Translation-based system: 41%

14 Translation-based QDIE suffers from NE recognition errors
NE tagger was customized for English (WSJ)
–Many of the Japanese NEs do not occur in WSJ.
[ Kansai Economic Federation ] ORG → [ Kansai ] LOC [ Economic Federation ] ORG
–Translation errors result in fewer and noisier pattern candidates.
Number of pattern candidates (Translation / Cross-lingual):
–Person: 4543 / 12096
–Post: 3924 / 14986
–Organization: 4014 / 11812
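To make the gap concrete, the counts on this slide can be turned into ratios. The short computation below assumes the two columns are pattern-candidate counts for the translation-based and cross-lingual systems, which is how the slide appears to use them; the variable names are illustrative.

```python
# (translation-based, cross-lingual) counts as listed on the slide.
candidates = {
    "Person": (4543, 12096),
    "Post": (3924, 14986),
    "Organization": (4014, 11812),
}

# Fraction of the cross-lingual candidate count that the translation-based system reaches.
ratios = {slot: translated / cross for slot, (translated, cross) in candidates.items()}

for slot, ratio in ratios.items():
    print(f"{slot}: {ratio:.0%}")
```

Under that reading, the translation-based system yields roughly a quarter to two fifths as many candidates as the cross-lingual system for each slot type.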

15 NE tagging by Cross-language Projection (inspired by Riloff et al. 2002)
–Used Giza++ (Och et al. 2003) to make word alignments between the original Japanese sentences and the MT-ed English sentences.
–Doubled the number of pattern candidates.
Japanese: 順天堂 大 の 水野 美邦 教授 (= Yoshikuni Mizuno, professor at Juntendo Univ.)
MT output: Professor Mizuno 美邦 of 順天堂 large
大 = abbreviation of 大学 (= Univ.); frequently mistranslated as “Large”
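The projection idea can be sketched on the slide's own example. This is a minimal illustration under several assumptions: the alignment pairs are written by hand rather than produced by Giza++, the Japanese NE tags are guessed for the example, and the rule "copy the source tag to every aligned target token" is one simple choice that the slides do not spell out. The function name `project_ne_tags` is hypothetical.

```python
def project_ne_tags(src_tags, alignment, tgt_len):
    """Copy NE tags from source tokens to aligned target tokens.

    src_tags: one tag per source token ("O" means not part of an NE).
    alignment: (source index, target index) pairs from a word aligner.
    tgt_len: number of target tokens.
    """
    tgt_tags = ["O"] * tgt_len
    for src_i, tgt_i in alignment:
        if src_tags[src_i] != "O":
            tgt_tags[tgt_i] = src_tags[src_i]
    return tgt_tags


ja_tokens = ["順天堂", "大", "の", "水野", "美邦", "教授"]
ja_tags = ["ORG", "ORG", "O", "PERSON", "PERSON", "O"]          # assumed tagger output
en_tokens = ["Professor", "Mizuno", "美邦", "of", "順天堂", "large"]  # MT output from the slide

# Hand-written alignment (Japanese index, English index), standing in for aligner output.
alignment = [(0, 4), (1, 5), (2, 3), (3, 1), (4, 2), (5, 0)]

en_tags = project_ne_tags(ja_tags, alignment, len(en_tokens))
for token, tag in zip(en_tokens, en_tags):
    print(f"{token}\t{tag}")
```

With projection, the mistranslated "large" and the untranslated 美邦 still receive ORG and PERSON tags from the Japanese side, so the English-side pattern extractor sees the entity boundaries that an English NE tagger would miss.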

16 Still Cross-lingual QDIE does better
Maximum recall:
–Cross-lingual system: 60%
–Translation-based system with NE projection: 52%
–Translation-based system: 41%

17 Problems in Translation
Incorrect dependency structure caused by MT errors.

18 Correct Translation: On the sixth, since the financial reports for the fiscal year that ended in February, 1999 will end in a deficit, "Okajima" (Marunouchi, Kofu-city), the leading department store in the prefecture, announced that six of the thirteen full-time directors, including President Hiroyuki Okajima (40), two executive directors and a managing director, submitted the resignation letter and will formally resign at the general meeting of shareholders of the company.

19 MT Output: From Muika the term settlement of accounts ended February, 99 having become the prospect of the first deficit settlement of accounts after the war etc., six of President Hiroyuki Okajima ( 40 ), two managing directors, one managing directors, the full-time directors that are 13 persons submitted the resignation report, “Okajima” of Marunouchi, Kofu-shi who is the major department store within the prefecture announced that he resigns formally by the fixed general meeting of shareholders of the company planned at the end of this month.

20 Problems in Translation
Structural difference
–Multiple translations of a single source-language expression make pattern discovery more difficult on MT output.
に就任する。 → “be appointed to”, “assume”, “be inaugurated as” (translation error)

21 Related Work
Riloff et al. 2002
–Showed how CLIE systems can be developed with IE learning tools, bitext alignment and an MT system.
–Conducted experiments on a relatively close language pair, English and French: “achieved roughly the same level of performance as the source-language IE system”
We expect the performance gap between translation-based IE and cross-lingual IE to be more pronounced with a more divergent language pair like Japanese and English.

22 Conclusion
We discussed the difficulty in cross-lingual information extraction caused by the translation of the source text.
Cross-lingual QDIE performs better
–Translation-based QDIE suffers from NE recognition errors.
–Structural errors and incorrect dependency analysis in MT output cause fewer and noisier pattern candidates.

23 Further Discussions
Linguistic tools necessary for QDIE systems are available for major languages.
Speculation from the TIDES Surprise Language Exercise: development of tools in a new language
–Machine Translation
–Cross-lingual Information Retrieval
–Named Entity tagger
–(dependency/shallow/full) parser: needs more work
Additional performance gain for Cross-lingual QDIE may be achieved by techniques for query translation + query expansion.


25 NE tagging by Cross-language Projection (inspired by Riloff et al. 2002)
–Used Giza++ (Och et al. 2003) to make word alignments between the original Japanese sentences and the MT-ed English sentences.
–Doubled the number of pattern candidates.
Japanese: 秋山社長が関西経済連合会の次期会長に就任する。
English: President Akiyama is inaugurated as the following chairman of Kansai Economic Federation.

