Presenter: Jinhua Du ( 杜金华 ) Xi’an University of Technology 西安理工大学 NLP&CC, Chongqing, Nov. 17-19, 2013 Discriminative Latent Variable Based Classifier.


1 Presenter: Jinhua Du ( 杜金华 ) Xi’an University of Technology 西安理工大学 NLP&CC, Chongqing, Nov. 17-19, 2013 Discriminative Latent Variable Based Classifier for Translation Error Detection

2 jhdu@xaut.edu.cn Outline
1. Introduction
2. DPLVM for Translation Error Detection
3. Experiments and Analysis
4. Conclusions and Future Work

3 1. Introduction: Problem
1. In the localization industry, humans are always involved in post-editing MT output.
2. MT errors always increase the human cost of obtaining a reasonable translation.
3. Translation error detection, or word confidence estimation, can improve the working efficiency of post-editors to some extent.
Research question: how can the accuracy of translation error detection be improved?

4 1. Introduction: Related Work
• Blatz et al. (2004) combined a neural network and a naive Bayes classifier
• Ueffing and Ney (2003/2007) exhaustively explored various kinds of WPP features
• Specia et al. (2009/2011) worked on confidence estimation in the CAT field
• Xiong et al. (2010) used a MaxEnt-based classifier to predict translation errors

5 1. Introduction: Key Factors
• Classifiers: for the same feature set, different classifiers perform differently, so selecting or designing a proper classifier is important.
• Features: for a given classifier, different features reflect different characteristics of the problem, so selecting or designing the feature set is crucial.

6 1. Introduction: Our Work
• Feature set
• Comparison with SVM and MaxEnt
• Discriminative Latent Variable classifier

7 2. DPLVM Algorithm
• Conditions: a sequence of observations x = {x_1, x_2, …, x_m} and a sequence of labels y = {y_1, y_2, …, y_m}
• Assumption: a sequence of latent variables h = {h_1, h_2, …, h_m}
• Goal: to learn a mapping between x and y
• Definition: equation (1) [equation not preserved in this transcript]
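
The body of equation (1) is missing from the transcript. As a hedged reconstruction, the standard DPLVM definition (Morency et al., 2007; Sun and Tsujii, 2009) marginalizes over the latent sequence h. The parameter symbol \Theta is an assumption here, since the slide text does not name it:

```latex
P(\mathbf{y} \mid \mathbf{x}, \Theta)
  = \sum_{\mathbf{h}} P(\mathbf{y} \mid \mathbf{h}, \mathbf{x}, \Theta)\,
                      P(\mathbf{h} \mid \mathbf{x}, \Theta)
\tag{1}
```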

8 Simplified Algorithm
• Assumptions: the model is restricted to have disjoint sets of latent variables associated with each class label. Each h_j is a member of a set H_{y_j} of possible latent variables for the class label y_j, so sequences which have any h_j ∉ H_{y_j} will by definition have zero probability for that label sequence.
• Equation (1) can be rewritten as equation (2), where the remaining term is given by equation (3) [equations not preserved in this transcript].
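
Equations (2) and (3) are also missing from the transcript. Under the disjointness assumption stated on this slide, the usual DPLVM simplification sums only over latent sequences compatible with y and models P(h | x) as a log-linear distribution. The following is a sketch of that standard form, not a copy of the slide; the feature function f and the parameter vector \Theta are assumed names:

```latex
P(\mathbf{y} \mid \mathbf{x}, \Theta)
  = \sum_{\mathbf{h} \,\in\, H_{y_1} \times \cdots \times H_{y_m}}
    P(\mathbf{h} \mid \mathbf{x}, \Theta)
\tag{2}

P(\mathbf{h} \mid \mathbf{x}, \Theta)
  = \frac{\exp\bigl(\Theta \cdot \mathbf{f}(\mathbf{h}, \mathbf{x})\bigr)}
         {\sum_{\mathbf{h}'} \exp\bigl(\Theta \cdot \mathbf{f}(\mathbf{h}', \mathbf{x})\bigr)}
\tag{3}
```

The sum in (2) drops every latent sequence containing an h_j outside H_{y_j}, because such sequences contribute zero probability to the label sequence y.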

9 Parameter Estimation
• Decoding for test set: [equation not preserved in this transcript]
• Decoding algorithm: Sun and Tsujii (2009) propose a latent-dynamic inference (LDI) method based on A* search and dynamic programming.
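
To make the decoding step concrete, here is a minimal Python sketch. It assumes a first-order model with per-position node scores and latent-state transition scores, and it assumes decoding means choosing the label sequence with the highest P(y | x). It does not implement the LDI method of Sun and Tsujii (2009): it enumerates all label sequences by brute force, which is practical only for very short inputs. The names `LATENT`, `N_STATES`, `label_prob` and `decode` are hypothetical.

```python
import itertools
import math

# Hypothetical setup: two word labels, each owning a disjoint set of latent states.
LATENT = {"c": [0, 1], "i": [2, 3]}
N_STATES = 4


def log_sum_exp(values):
    """Numerically stable log(sum(exp(v))) for a non-empty list."""
    m = max(values)
    return m + math.log(sum(math.exp(v - m) for v in values))


def log_forward(node, trans, allowed):
    """Log of the total exp-score of latent paths with h_j restricted to allowed[j]."""
    alpha = {h: node[0][h] for h in allowed[0]}
    for j in range(1, len(node)):
        alpha = {
            h: node[j][h] + log_sum_exp([alpha[g] + trans[g][h] for g in alpha])
            for h in allowed[j]
        }
    return log_sum_exp(list(alpha.values()))


def label_prob(node, trans, labels):
    """P(y | x): mass of latent paths compatible with y over the mass of all paths."""
    every_state = [list(range(N_STATES))] * len(node)
    compatible = [LATENT[y] for y in labels]
    log_p = log_forward(node, trans, compatible) - log_forward(node, trans, every_state)
    return math.exp(log_p)


def decode(node, trans):
    """Brute-force argmax over label sequences (a stand-in for LDI, not LDI itself)."""
    candidates = itertools.product(LATENT, repeat=len(node))
    return max(candidates, key=lambda ys: label_prob(node, trans, ys))
```

Here `node[j][h]` and `trans[g][h]` stand for the log-linear scores that a trained model would supply. Because the latent sets are disjoint, the probabilities of all label sequences sum to one.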

10 DPLVM in Translation Error Detection Task
• Prerequisites: the types of errors can be classified; each class has a specific label; the classification task can be regarded as a labelling task.
• Two classes of word label:
C: correct (good words), label: c
I: incorrect (bad words), label: i

11 Feature Set
• Word Posterior Probabilities (WPP): fixed position based WPP; flexible position based WPP; word alignment based WPP
• Lexical features: part of speech (POS); word entity
• Syntactic features: word links from the LG parser
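
The slide only names the WPP features, so the following Python sketch shows one common reading of the fixed position based variant: the posterior mass of all N-best hypotheses that carry the same word at the same position, normalized by the total mass. The N-best format (token list plus log score) and the function name `fixed_position_wpp` are assumptions for illustration, not the presenter's implementation.

```python
import math


def fixed_position_wpp(nbest, hyp_index=0):
    """Fixed-position WPP for each word of one hypothesis in an N-best list.

    nbest: list of (tokens, log_score) pairs; log_score is an unnormalized
    model score in the log domain (hypothetical input format).
    """
    # Turn log scores into relative weights, shifting by the max for stability.
    max_log = max(score for _, score in nbest)
    weights = [math.exp(score - max_log) for _, score in nbest]
    total = sum(weights)

    target = nbest[hyp_index][0]
    wpps = []
    for pos, word in enumerate(target):
        # Mass of hypotheses that have the same word at exactly this position.
        mass = sum(
            weight
            for (tokens, _), weight in zip(nbest, weights)
            if pos < len(tokens) and tokens[pos] == word
        )
        wpps.append(mass / total)
    return wpps
```

The flexible position and word alignment based variants relax the exact-position match; they are not sketched here because the transcript gives no detail on them.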

12 Feature Representation

13 3. Experiments and Analysis
Experimental Settings: SMT system
• Language pair: Chinese-English
• Training set: NIST data set, 3.4m
• Devset: NIST MT 2006 current set
• Testset: NIST MT 2005 and 2008 sets
SMT Performance [table not preserved in this transcript]

14 Experimental Settings for Error Detection Task
Data Set and Data Annotation
• Devset: translations of NIST MT-08
• Testset: translations of NIST MT-05
• Annotation: TER is used to determine the true labels for words; the ratio of correct words (RCW) is 37.99% for MT-08 and 41.59% for MT-05
Evaluation Metrics [content not preserved in this transcript]
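
The annotation and scoring steps can be sketched in Python as follows. This is a simplification under stated assumptions: real TER also allows block shifts, whereas the sketch uses a plain word-level edit-distance alignment. CER is taken to be the fraction of wrongly tagged words, the usual definition in this line of work; the slide itself does not spell it out. The function names are hypothetical.

```python
def label_words(hyp, ref):
    """Label each hypothesis word 'c' (matched in the alignment) or 'i' (otherwise)."""
    n, m = len(hyp), len(ref)
    # dist[a][b] = edit distance between hyp[:a] and ref[:b]
    dist = [[0] * (m + 1) for _ in range(n + 1)]
    for a in range(n + 1):
        dist[a][0] = a
    for b in range(m + 1):
        dist[0][b] = b
    for a in range(1, n + 1):
        for b in range(1, m + 1):
            diag = dist[a - 1][b - 1] + (hyp[a - 1] != ref[b - 1])
            dist[a][b] = min(diag, dist[a - 1][b] + 1, dist[a][b - 1] + 1)

    # Trace back one optimal alignment; unmatched hypothesis words stay 'i'.
    labels = ["i"] * n
    a, b = n, m
    while a > 0 and b > 0:
        if hyp[a - 1] == ref[b - 1] and dist[a][b] == dist[a - 1][b - 1]:
            labels[a - 1] = "c"
            a, b = a - 1, b - 1
        elif dist[a][b] == dist[a - 1][b - 1] + 1:
            a, b = a - 1, b - 1  # substituted word
        elif dist[a][b] == dist[a - 1][b] + 1:
            a -= 1  # extra word in the hypothesis
        else:
            b -= 1  # reference word missing from the hypothesis
    return labels


def ratio_correct(labels):
    """Ratio of correct words (RCW) in a labelled hypothesis."""
    return labels.count("c") / len(labels)


def cer(predicted, gold):
    """Classification error rate: wrongly tagged words over all words."""
    wrong = sum(p != g for p, g in zip(predicted, gold))
    return wrong / len(gold)
```

A classifier's predicted labels for a hypothesis would be compared against the output of `label_words` with `cer`.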

15 Comparison (1) Classification Experiments Based on Individual Features

16 (2) Classification Experiment on Combined Features

17 Observations
• Named entities are prone to being wrongly classified.
• Prepositions, conjunctions, auxiliary verbs and articles are more easily misclassified.
• The proportion of notional words that are wrongly classified is relatively small.

18 4. Conclusions and Future Work: Conclusions
• Presents a new classifier, the DPLVM-based classifier, for translation error detection.
• Introduces three different kinds of WPP features and three linguistic features.
• Compares the MaxEnt classifier, the SVM classifier and the proposed DPLVM classifier.
• The proposed classifier performs best of the three in terms of CER.

19 4. Conclusions and Future Work: Future Work
• Introducing paraphrases to annotate the hypotheses.
• Introducing new useful features to further improve the detection capability.
• Performing experiments on more language pairs to verify the proposed method.

20 Thanks for your attention!

