
Statistical modelling of MT output corpora for Information Extraction.


1 Statistical modelling of MT output corpora for Information Extraction

2 Overview
Using MT output for IE
– Requirements and evaluation of usability
– S-score: measuring the degree of word significance for a text by contrasting text and corpus usages
Experiment set-up and MT evaluation metrics based on differences in S-scores
Results of MT evaluation for IE
– Comparison of MT systems
– Correlations with human evaluation measures of MT
– Issues of MT architecture and evaluation scores
Conclusions & Future work

3 Using MT for IE
Requirements for human use and for automatic processing are different:
– fluency is less important than adequacy
– stylistic errors are less important than factual errors, e.g. MT: *Bill Fisher ~ 'to send a bill to a fisher'
Frequency issues:
– low-frequency words carry the most important information (and require accurate disambiguation)
– some IE tasks use statistical models (whose statistics are expected to differ for MT output)

4 Frequency issues… disambiguation Examples:

5 Frequency issues: statistical modelling for IE
Research on adaptive IE: automatic template acquisition via statistical means
– find sentences containing statistically significant words
– build templates around such sentences
Template element fillers (e.g., NEs) often appear among statistically significant words
The distribution of word frequencies is expected to be different for MT: we check whether this is the case
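The first step of the template-acquisition idea above (find sentences containing statistically significant words) can be sketched as a simple filter. This is a hypothetical illustration, not the method of any particular adaptive-IE system: the function name `candidate_sentences`, the token-list input, and the `min_hits` cut-off are all assumptions.

```python
def candidate_sentences(sentences, significant, min_hits=1):
    """Return (index, hits) for sentences containing significant words.

    sentences: list of sentences, each a list of tokens.
    significant: set of words judged significant for this text
        (e.g. words with S > 1).
    min_hits: HYPOTHETICAL minimum number of significant tokens a
        sentence needs in order to be kept as a template candidate.
    """
    selected = []
    for index, tokens in enumerate(sentences):
        hits = [token for token in tokens if token in significant]
        if len(hits) >= min_hits:
            selected.append((index, hits))
    return selected
```

Sentences with more significant words (typically names and other template fillers) pass a stricter `min_hits`, which is one simple way to rank candidates before building templates around them.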

6 Measuring statistical significance
– S word[text]: the score of statistical significance for a particular word in a particular text;
– P word[text]: the relative frequency of the word in the text;
– P word[rest-corp]: the relative frequency of the same word in the rest of the corpus, excluding this text;
– N word[txt-not-found]: the proportion of texts in the corpus in which this word is not found (the number of texts where it is not found divided by the number of texts in the corpus);
– P word[all-corp]: the relative frequency of the word in the whole corpus, including this particular text
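The formula combining these quantities did not survive in this transcript. The sketch below assumes S word[text] = ln((P word[text] - P word[rest-corp]) × N word[txt-not-found] / P word[all-corp]), which we believe is the definition used in this line of work, but it should be checked against the original paper. Words for which the argument of ln is not positive get no score; in particular, a word found in every text has N = 0 and is never significant. The function name `s_scores` and the token-list input format are illustrative.

```python
import math
from collections import Counter


def s_scores(corpus):
    """Compute S-scores for every word of every text in a corpus.

    corpus: list of texts, each a list of tokens.
    Returns one dict per text mapping word -> S word[text].
    ASSUMED formula (not shown in the slides):
        S = ln((P[text] - P[rest-corp]) * N[txt-not-found] / P[all-corp])
    """
    n_texts = len(corpus)
    per_text = [Counter(tokens) for tokens in corpus]
    corpus_freq = Counter()  # word frequency in the whole corpus
    doc_freq = Counter()     # number of texts containing the word
    for counts in per_text:
        corpus_freq.update(counts)
        doc_freq.update(counts.keys())
    corpus_len = sum(corpus_freq.values())

    results = []
    for counts in per_text:
        text_len = sum(counts.values())
        rest_len = corpus_len - text_len
        scores = {}
        for word, freq in counts.items():
            p_text = freq / text_len
            # Relative frequency in the rest of the corpus, without this text.
            p_rest = (corpus_freq[word] - freq) / rest_len if rest_len else 0.0
            p_all = corpus_freq[word] / corpus_len
            n_not_found = (n_texts - doc_freq[word]) / n_texts
            value = (p_text - p_rest) * n_not_found / p_all
            if value > 0:  # ln is undefined otherwise: no score
                scores[word] = math.log(value)
        results.append(scores)
    return results
```

For a toy corpus of three short texts that all contain "the", the word "the" receives no score in any text, while a word such as "judge" that occurs only in the first text receives a score there.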

7 Intuitive appeal of significance scores Selecting words potentially important for IE: In the Marseille Facet of the Urba-Gracco Affair, Messrs. Emmanuelli, Laignel, Pezet, and Sanmarco Confronted by the Former Officials of the SP Research Department On Wednesday, February 9, the presiding judge of the Court of Criminal Appeals of Lyon, Henri Blondet, charged with investigating the Marseille facet of the Urba-Gracco affair, proceeded with an extensive confrontation among several Socialist deputies and former directors of Urba-Gracco. Ten persons, including Henri Emmanuelli and Andre Laignel, former treasurers of the SP, Michel Pezet, and Philippe Sanmarco, former deputies (SP) from the Bouches-du-Rhône, took part in a hearing which lasted more than seven hours

8 ...Intuitive appeal of significance scores Ordering words:
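The ordered word list from this slide was lost in extraction. The ordering step itself can be sketched as sorting a text's words by S-score and keeping those above the S > 1 threshold that the later slides use for significant words. The helper name `significant_words` is hypothetical, and ties are broken alphabetically only to make the output deterministic.

```python
def significant_words(scores, threshold=1.0):
    """Return (word, S) pairs with S above the threshold, most significant first.

    scores: dict mapping word -> S-score for one text.
    """
    ranked = sorted(scores.items(), key=lambda item: (-item[1], item[0]))
    return [(word, s) for word, s in ranked if s > threshold]
```

With invented scores such as {"emmanuelli": 4.2, "judge": 1.5, "hearing": 0.8}, only "emmanuelli" and "judge" are kept, in that order.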

9 Metric for usability of MT for IE
Suggestion: measuring differences in statistical significance between a human translation and MT output allows us to estimate the number of potential problems for IE
Question: do any human evaluation measures of MT correlate with differences in S-scores across MT systems?

10 Experiment setup
Available: 100 texts developed for the DARPA 94 MT evaluation exercise:
French originals
2 different human translations (reference and expert)
5 translations by MT systems (French into English):
– knowledge-based: Systran; Reverso; Metal; Globalink
– IBM statistical approach to MT: Candide
DARPA evaluation scores available for each system and for the human expert translation:
– Informativeness; Adequacy; Fluency
Calculating distances of combined S-scores between the human reference translation and the other translations (MT and the expert translation)

11 The distance scores
Based on comparing sets of words with S-score > 1:
– words significant in both texts, but with different significance scores
– words not present in the reference translation (overgenerated in MT)
– words present in the reference translation but not in MT (undergenerated in MT)
Computing distance scores:
– o-score for «avoiding overgeneration» (~ Precision)
– u-score for «avoiding undergeneration» (~ Recall)
– u&o combined score (calculated as F-measure)

12 Computing distance scores... Words that changed their significance Overgeneration score: Undergeneration score:

13 … Computing distance scores Scores for avoiding over- and under-generation Making scores compatible across texts (the number of significant words may be different):
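The exact o-score and u-score formulas did not survive in this transcript, so the sketch below is a HYPOTHETICAL stand-in, not the authors' definition. It only mirrors the structure described on the previous slides: it compares sets of words with S > 1, penalises changed significance, overgeneration and undergeneration, normalises by total S mass so that texts with different numbers of significant words are comparable, and combines the two scores as an F-measure. Only the F-measure combination comes from the slides; the function name `distance_scores` and the "smaller score counts as matched" rule are assumptions.

```python
def distance_scores(s_ref, s_mt, threshold=1.0):
    """HYPOTHETICAL S-weighted precision/recall over significant words.

    s_ref, s_mt: dicts word -> S-score for the reference and MT texts.
    Returns (o_score, u_score, uo_score), each in [0, 1].
    """
    ref = {w: s for w, s in s_ref.items() if s > threshold}
    mt = {w: s for w, s in s_mt.items() if s > threshold}
    common = ref.keys() & mt.keys()
    # Words significant in both texts earn only the smaller of their two
    # scores, so a change in significance lowers the match.
    matched = sum(min(ref[w], mt[w]) for w in common)
    mt_mass = sum(mt.values())    # also contains overgenerated words
    ref_mass = sum(ref.values())  # also contains undergenerated words
    # Normalising by total mass makes scores comparable across texts.
    o_score = matched / mt_mass if mt_mass else 0.0   # ~ precision
    u_score = matched / ref_mass if ref_mass else 0.0  # ~ recall
    total = o_score + u_score
    uo_score = 2 * o_score * u_score / total if total else 0.0  # F-measure
    return o_score, u_score, uo_score
```

In this sketch an MT text that adds a spurious significant word (for example "bill" from a mistranslated name) lowers only the o-score, while dropping a reference name lowers only the u-score.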

14 The resulting distance scores

15 DARPA Adequacy and scores

16 o-score & DARPA 94 Adequacy

17 DARPA Fluency and scores

18 u&o-score and DARPA 94 Fluency

19 Results and correlation of scores
The human expert translation scores higher than MT
The statistical MT system «Candide» is characteristically different
Strong positive correlation found for:
– o-score & DARPA adequacy
Weak positive correlation found for:
– u&o & DARPA fluency
No correlation was found between the u-score (high for statistical MT) and human MT evaluation measures
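The slides do not say which correlation coefficient was used. Assuming Pearson's r, with one data point per translation (each MT system plus the expert translation), pairing for example a system's o-score with its DARPA adequacy score, the computation can be sketched as follows. The function name `pearson_r` is illustrative.

```python
import math


def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length score lists."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two lists of equal length >= 2")
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation undefined for constant input")
    return cov / math.sqrt(var_x * var_y)
```

On Python 3.10 and later, `statistics.correlation(xs, ys)` computes the same coefficient. With only six data points, any correlation should be read with caution.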

20 Conclusions
The word-significance measure S:
– is useful in other areas (e.g., distinguishing lexical and morphological differences)
– with the threshold S > 1, distinguishes content and function words across different languages (checked for English, French and Russian)
Statistical modelling showed substantial differences between human translation and MT output corpora
Measures of contrastive frequencies of words in a particular text versus the rest of the corpus correlate with human evaluation of MT (adequacy scores)

21 Future work
Statistical modelling of Example-based MT
Investigating the actual performance of IE systems on different tasks using MT of different quality (with different "usability for IE" scores), and its correlation with the proposed MT evaluation measures
Establishing formal properties for intuitive judgements about translation quality (translation equivalence, adequacy and fluency in human translation and MT)

