1. AQUAINT
Building an Initial Cross-lingual Question Answering System: English Question -> Chinese Collection
Ralph Weischedel, Ana Licuanan, Jinxi Xu
6 October 2004

2. AQUAINT Phase 2 Objectives
End-to-end system; multilingual data
Find appropriate information in a second language
Be organized to maximize performance
  – Analyze, then translate?
  – Translate, then analyze?
Focus on complex questions (e.g., definitional and biographical questions) rather than on factoid questions
Determine whether two statements across languages convey (to be addressed later)
  – the same information,
  – inconsistent information, or
  – novel/complementary information

3. Approach
Trained, language-independent algorithms for core NLP problems, e.g.,
  – passage retrieval,
  – name tagging,
  – parsing, and
  – co-reference
Plug-and-play architecture for alternative MT systems for question and document translation
Controlled experiments to measure and optimize QA performance
  – BBN's AQUA system for English as the monolingual baseline
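The plug-and-play idea can be pictured as a small interface that every MT system implements, so the QA pipeline can swap translators for questions or documents without other changes. This is a minimal sketch under that assumption: `Translator`, `DictionaryTranslator`, `IdentityTranslator`, and `translate_question` are hypothetical names, not AQUA's actual components, and the word-by-word lookup merely stands in for a real MT engine.

```python
from typing import Protocol


class Translator(Protocol):
    """Hypothetical interface that any plug-in MT system would implement."""

    def translate(self, text: str) -> str: ...


class DictionaryTranslator:
    """Toy word-by-word translator standing in for a real MT system."""

    def __init__(self, lexicon: dict[str, str]):
        self.lexicon = lexicon

    def translate(self, text: str) -> str:
        # Unknown words pass through unchanged.
        return " ".join(self.lexicon.get(w, w) for w in text.split())


class IdentityTranslator:
    """Pass-through 'translator', e.g. for the monolingual English baseline."""

    def translate(self, text: str) -> str:
        return text


def translate_question(question: str, mt: Translator) -> str:
    """The pipeline depends only on the Translator interface, not on a concrete MT system."""
    return mt.translate(question)
```

Because `Translator` is a structural `Protocol`, any class with a matching `translate` method plugs in without inheriting from it, which is one way to keep controlled MT comparisons cheap to set up.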

4. Mono-Lingual System (12/2003)
[Diagram. Labels: Question Classification; Question; Document Retrieval; Linguistic Processing & Extraction of Kernel Facts; Kernel Fact Ranking; Redundancy Removal; List of Responses; Proposition Finding; Co-reference; Relation Extraction; Name Annotation; Name Tagging; Question Profile; Treebank Parsing; Linguistically Motivated Components of SERIF; Hand-crafted Patterns; Surface Structure Matching; Background Model]
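The diagram names a Redundancy Removal stage between Kernel Fact Ranking and the final list of responses, but the slides do not say how it works. As a hedged sketch only, assuming a simple word-overlap test: walk the ranked responses in order and drop any response whose Jaccard word overlap with an already-kept response reaches a threshold. The Jaccard measure and the 0.5 threshold are assumptions for illustration, not BBN's method.

```python
def jaccard(a: str, b: str) -> float:
    """Word-set overlap between two responses (0 = disjoint, 1 = identical word sets)."""
    sa, sb = set(a.lower().split()), set(b.lower().split())
    if not sa and not sb:
        return 1.0
    return len(sa & sb) / len(sa | sb)


def remove_redundant(ranked: list[str], threshold: float = 0.5) -> list[str]:
    """Keep responses in rank order, skipping any that overlap too much with one already kept."""
    kept: list[str] = []
    for response in ranked:
        if all(jaccard(response, k) < threshold for k in kept):
            kept.append(response)
    return kept
```

Processing in rank order means the higher-ranked of two near-duplicates always survives, so redundancy removal never undoes the ranking stage.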

5. AQUA Cross-Lingual Architecture
Today: implemented for English questions against Chinese databases, via analysis in Chinese
Expansion later for
  – Arabic documents
  – merging of answers from English, Arabic, and Chinese sources (later)
[Diagram. Labels: English & Arabic Database; Answer Generation; Transliteration; User Interface; Document Processing; SERIF; Machine Translation; Chinese Text; Translated English Text; Chinese Extraction Output; Analysis during Indexing; Responding to Questions]

6. Outline
Transliterating names from English into the foreign language (part of question analysis)
Performance of the Chinese analysis components
Machine translation currently used
Example output

7. Problems with Dialects in Transliteration
Examples
  – George Bush: 乔治 布什 (PRC), 乔治 布希 (Taiwan)
  – Blair: 布莱尔 (PRC), 贝理雅 (Hong Kong)
Even within a single dialect, multiple transliterations can be in use
We currently use the PRC style of transliteration

8. Transliteration Algorithm
Given an English name E, the algorithm (Al-Onaizan, 2002) finds the Chinese transliteration C that maximizes P(E|C) * P(C)
  – The English name E is segmented into phonemes (character sequences)
  – Probabilities of phoneme mappings, P(E|C), are learned from human-transliterated names
  – The language model probability P(C) is compiled from a Chinese corpus
Transliteration training data
  – Person proper names
  – Mandarin training data
  – ~500k name pairs taken from the Chinese-English Name Entity Lists (LDC2003E01 v1.beta)
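The decision rule above can be sketched as follows, assuming toy tables: enumerate every segmentation of the lowercased English name into chunks found in a channel table, let each segmentation propose a Chinese string C, and keep the candidate with the highest log P(E|C) + log P(C). `CHANNEL`, `CHAR_LM`, and every number in them are invented for illustration; a character-unigram model stands in for the Chinese language model, and exhaustive enumeration stands in for whatever search the real system uses.

```python
import math

# Hypothetical toy tables: every entry and number below is invented for illustration.
# CHANNEL maps an English chunk to candidate Chinese chunks with P(English chunk | Chinese chunk).
CHANNEL = {
    "bu": [("布", 0.6)],
    "sh": [("什", 0.5), ("希", 0.4)],
}
# A character-unigram model standing in for the Chinese language model P(C).
CHAR_LM = {"布": 0.01, "什": 0.004, "希": 0.002}


def candidates(name):
    """Yield (chinese, log P(E|C)) for every segmentation of `name` into CHANNEL chunks."""
    if not name:
        yield "", 0.0
        return
    for end in range(1, len(name) + 1):
        for zh, p in CHANNEL.get(name[:end], []):
            for rest, logp in candidates(name[end:]):
                yield zh + rest, math.log(p) + logp


def log_lm(chinese):
    """log P(C) under the toy unigram model."""
    return sum(math.log(CHAR_LM[ch]) for ch in chinese)


def transliterate(name):
    """Return the C maximising log P(E|C) + log P(C); raises ValueError if no segmentation exists."""
    scored = [(logp + log_lm(zh), zh) for zh, logp in candidates(name.lower())]
    return max(scored)[1]
```

With these toy numbers, `transliterate("Bush")` returns the PRC-style 布什. Scoring in log space keeps long names from underflowing when many small probabilities are multiplied.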

9. Statistical Transliteration: Examples
Albright: 奥尔布赖特 (5.5 × 10^-4)
  – 奥尔 : al 0.1648
  – 布 : b 0.5292
  – 赖 : righ 0.0113
  – 特 : t 0.5657
Powell: 鲍威尔 (2.4 × 10^-4)
  – 鲍 : po 0.0069
  – 威尔 : well 0.0351
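The totals on this slide are consistent with simply multiplying the listed per-chunk mapping probabilities; that reading is an inference, since the slide does not say how the totals were computed. The check below multiplies the slide's own numbers: the products come to about 5.57 × 10^-4 for Albright and 2.42 × 10^-4 for Powell, which match the figures shown to the precision given (the Albright figure appears truncated rather than rounded).

```python
import math

# Per-chunk mapping probabilities copied from the slide: (English chunk, Chinese chunk, P)
ALBRIGHT = [("al", "奥尔", 0.1648), ("b", "布", 0.5292), ("righ", "赖", 0.0113), ("t", "特", 0.5657)]
POWELL = [("po", "鲍", 0.0069), ("well", "威尔", 0.0351)]


def score(alignment):
    """Return (Chinese string, product of the per-chunk mapping probabilities)."""
    chinese = "".join(zh for _, zh, _ in alignment)
    return chinese, math.prod(p for _, _, p in alignment)
```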

10. Current Chinese Component Performance
Component | Test set | Recall | Precision | F / value
Names | ACE evaluation, TDT4 data | 80% | 77% | 78.26 F
Descriptions | ACE evaluation, TDT4 data | 60% | 76% | 66.82 F
Entity mentions | ACE evaluation, TDT4 data | | | 78.7 (value)
Entities | ACE evaluation, TDT4 data | | | 72.4 (value)
Parsing | Chinese Treebank | 82.8% | 81.3% | 82.04 F
These results represent state-of-the-art performance.
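The F column is, presumably, the balanced F-measure, the harmonic mean of recall and precision. The parsing row reproduces exactly (82.04). Recomputing the Names and Descriptions rows from the rounded percentages shown gives about 78.47 and 67.06, slightly off the reported 78.26 and 66.82; this is presumably because the slide's recall and precision values are rounded.

```python
def f_measure(recall: float, precision: float) -> float:
    """Balanced F-measure: harmonic mean of recall and precision (same units as the inputs)."""
    return 2 * recall * precision / (recall + precision)
```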

11. Machine Translation
Statistical MT learns to translate new text from existing text that humans have translated
Translation model trained with GIZA++
  – Freely available at www.informatik.rwth-aachen.de/Colleagues/och/software/GIZA++.html
Language model trained using the CMU Language Modelling Toolkit v2
Translation done by USC/ISI's ReWrite decoder, version 1.0.0a
  – Downloaded from http://www.isi.edu/licensed-sw/rewrite-decoder/
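GIZA++ trains the IBM word-alignment models (Models 1 through 5, plus an HMM alignment model). To illustrate how a translation model can be learned from human translations, here is a toy EM trainer for the simplest of these, IBM Model 1 (without the NULL word), run on three made-up Chinese-English phrase pairs. This is a hedged sketch of the technique, not GIZA++'s code, settings, or data.

```python
from collections import defaultdict


def train_ibm_model1(pairs, iterations=10):
    """Toy IBM Model 1 (no NULL word): EM estimates of t(e|f), the probability
    that source word f is translated as target word e."""
    e_vocab = {e for _, es in pairs for e in es}
    t = defaultdict(lambda: 1.0 / len(e_vocab))  # uniform start
    for _ in range(iterations):
        count = defaultdict(float)  # expected count of (e, f) links
        total = defaultdict(float)  # expected count of f being linked to anything
        for fs, es in pairs:
            for e in es:
                # E-step: share e's unit count among the source words it could align to.
                z = sum(t[(e, f)] for f in fs)
                for f in fs:
                    c = t[(e, f)] / z
                    count[(e, f)] += c
                    total[f] += c
        # M-step: renormalise the expected counts into probabilities for each f.
        t = defaultdict(float, {(e, f): c / total[f] for (e, f), c in count.items()})
    return t


# Hypothetical toy parallel data (Chinese source, English target).
PAIRS = [
    (["美国", "总统"], ["us", "president"]),
    (["美国", "政府"], ["us", "government"]),
    (["法国", "政府"], ["france", "government"]),
]
```

Even though no word pair is ever given explicitly, EM gradually attributes "president" to 总统 and "government" to 政府, because the other words in those sentences are better explained elsewhere in the corpus.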

12. MT Training Data
Translation
  – ~315k sentence pairs (~11M Chinese characters)
  – Corpora: MTC-1 (Multiple Translation Chinese Corpus), Chinese-English Lexicon, Chinese Treebank, Hong Kong News, Hong Kong Hansards (proceedings of the Legislative Council of the HKSAR)
Language model
  – Trigram language model
  – ~60M English words
  – Corpora: TDT-4 (English portion), North American News Text Corpus
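A trigram language model predicts each word from the two words before it. The sketch below is a deliberately simplified version with add-one smoothing, trained on two invented sentences. This is an assumption for illustration only: the CMU toolkit named on the previous slide builds back-off n-gram models with discounting, which this sketch does not implement.

```python
import math
from collections import Counter


class TrigramLM:
    """Toy trigram language model with add-one smoothing (not the CMU toolkit's back-off method)."""

    def __init__(self, sentences):
        self.tri = Counter()   # counts of (u, v, w)
        self.bi = Counter()    # counts of the context (u, v)
        self.vocab = {"</s>"}  # words that can be predicted
        for words in sentences:
            self.vocab.update(words)
            toks = ["<s>", "<s>"] + list(words) + ["</s>"]
            for i in range(2, len(toks)):
                self.tri[(toks[i - 2], toks[i - 1], toks[i])] += 1
                self.bi[(toks[i - 2], toks[i - 1])] += 1

    def prob(self, w, u, v):
        """P(w | u v) with add-one smoothing over the vocabulary."""
        return (self.tri[(u, v, w)] + 1) / (self.bi[(u, v)] + len(self.vocab))

    def sentence_logprob(self, words):
        """Sum of log P(w_i | w_{i-2} w_{i-1}), including the end-of-sentence token."""
        toks = ["<s>", "<s>"] + list(words) + ["</s>"]
        return sum(math.log(self.prob(toks[i], toks[i - 2], toks[i - 1]))
                   for i in range(2, len(toks)))
```

In an MT decoder, a score like `sentence_logprob` rewards fluent English word orders; here a word order seen in training scores higher than the same words reversed.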

13. Steps to Improving MT
Model using GIZA++ & ReWrite
  – Take advantage of the full UN Parallel Corpus
  – Tune training and decoding parameters
Consider other MT systems

14. Example Answer Nuggets
Who is Colin Powell?
  – Nugget from copulas and appositives: 前任 美国 参谋长 连席 会议 主席 鲍威尔 "former Chairman of the Joint Chiefs of Staff"
Who is Kofi Annan?
  – Nugget from propositions: 他 建议 由 两 族 轮流 选派 总统, 即 希腊族人 每 担任 两 任 总统, 土耳其族人 担任 一任 总统 。 "He proposed that Greece and Turkey alternately hold the presidency (of Cyprus)"
  – Nugget from relations: 安理会 主席 "Chairman of the UN Security Council"

15. User Interface

16.
Former Chairman of the Joint Chiefs (app)
Soon-to-be secretary of state, retired general Powell (app)
A candidate acceptable to Republicans and Democrats (copula)
The most likely candidate (copula)
National Security Advisor to President Reagan (prop)
General Powell will become the first black to be Secretary of State in US History (prop)
Powell served as commander of US forces in South Korea from 1973 to 1974. (prop)

17.
U.N. Secretary-General (app)
Anan is in Jerusalem for a diplomatic mission … (app)
U.N. Secretary-General (app)
Became the first U.N. Secretary-General to make a statement at the refugee meeting. (prop)
Proposed that Greece and Turkey alternately hold the presidency of Cyprus (prop)

18. Concluding Comments
Initial question answering from a Chinese corpus implemented
Opportunities for improvement in all components, including
  – transliteration
  – machine translation
  – passage retrieval
  – answer finding and generation
Positive experience in transitioning English AQUA to
  – the AQUAINT testbed at MITRE
  – the Fairfield experiment
Baseline participant in the relationship pilot study
  – No work on answering relationship questions
Proposed pilot evaluation in spring 2005
First step toward the full goal of answer merging across
  – English
  – Arabic
  – Chinese
