Better Punctuation Prediction with Dynamic Conditional Random Fields Wei Lu and Hwee Tou Ng National University of Singapore.


1 Better Punctuation Prediction with Dynamic Conditional Random Fields Wei Lu and Hwee Tou Ng National University of Singapore

2 Talk Overview
- Background
- Related Work
- Approaches
  - Previous approach: Hidden Event Language Model
  - Previous approach: Linear-Chain CRF
  - This work: Factorial CRF
- Evaluation
- Conclusion

3 Punctuation Prediction
Automatically insert punctuation symbols into transcribed speech utterances. Widely studied in the speech processing community.
Example:
- Original speech utterance: you are quite welcome and by the way we may get other reservations so could you please call us as soon as you fix the date
- Punctuated (and cased) version: You are quite welcome. And by the way, we may get other reservations, so could you please call us as soon as you fix the date?

4 Our Task
- Processing prosodic features requires access to the raw speech data, which may be unavailable
- We tackle the problem from a text processing perspective
- Perform punctuation prediction for conversational speech texts without relying on prosodic features

5 Related Work
With prosodic features:
- Kim and Woodland (2001): a decision tree framework
- Christensen et al. (2001): finite state and multi-layer perceptron models
- Huang and Zweig (2002): a maximum entropy-based approach
- Liu et al. (2005): linear-chain conditional random fields
Without prosodic features:
- Beeferman et al. (1998): comma prediction with a trigram language model
- Gravano et al. (2009): an n-gram based approach

6 Related Work (continued)
One well-known approach that does not exploit prosodic features:
- Stolcke et al. (1998) presented a hidden event language model
- It treats boundary detection and punctuation insertion as an inter-word hidden event detection task
- Widely used in many recent spoken language translation tasks, either as a pre-processing (Wang et al., 2008) or post-processing (Kirchhoff and Yang, 2007) step

7 Hidden Event Language Model
HMM (Hidden Markov Model)-based approach:
- A joint distribution over words and inter-word events
- Observations are the words, and word/event pairs are hidden states
Implemented in the SRILM toolkit (Stolcke, 2002).
A variant of this approach relocates/duplicates the ending punctuation symbol to be closer to the indicative words, and works well for predicting English question marks:
- Original: where is the nearest bus stop ?
- With duplication: ? where is the nearest bus stop
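To make the hidden event idea concrete, here is a minimal sketch (not the SRILM implementation): punctuation symbols are kept as ordinary tokens in the training stream, a smoothed bigram model is estimated over the mixed word/event stream, and decoding greedily inserts the event that maximizes the local joint score at each word boundary. The class and method names are assumptions for illustration.

```python
import math
from collections import defaultdict

EVENTS = ["", ",", ".", "?"]  # "" means no punctuation at this boundary

class HiddenEventLM:
    def __init__(self):
        self.bigram = defaultdict(int)
        self.unigram = defaultdict(int)

    def train(self, punctuated_sentences):
        # Each training sentence keeps punctuation as ordinary tokens,
        # e.g. ["no", ",", "please", "do", "not", "."]
        for sent in punctuated_sentences:
            tokens = ["<s>"] + sent + ["</s>"]
            for a, b in zip(tokens, tokens[1:]):
                self.bigram[(a, b)] += 1
                self.unigram[a] += 1
            self.unigram[tokens[-1]] += 1

    def logp(self, a, b):
        # Add-one smoothed bigram log-probability log P(b | a)
        v = len(self.unigram) + 1
        return math.log((self.bigram[(a, b)] + 1) / (self.unigram[a] + v))

    def punctuate(self, words):
        # Greedy decoding: inserting event e after word w (with successor nxt)
        # replaces the factor P(nxt|w) with P(e|w) * P(nxt|e) in the joint
        # probability of the token stream; keep whichever is larger.
        out = []
        for i, w in enumerate(words):
            out.append(w)
            nxt = words[i + 1] if i + 1 < len(words) else "</s>"
            scores = {"": self.logp(w, nxt)}
            for e in EVENTS[1:]:
                scores[e] = self.logp(w, e) + self.logp(e, nxt)
            best = max(scores, key=scores.get)
            if best:
                out.append(best)
        return out
```

A real implementation (e.g. SRILM's) uses higher-order n-grams with proper smoothing and exact HMM decoding rather than this greedy local choice.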

8 Linear-Chain CRF
Linear-chain conditional random fields (L-CRF): an undirected graphical model used for sequence learning.
- Avoids the strong independence assumptions of the hidden event language model
- Capable of modeling dependencies with arbitrary, non-independent overlapping features
[Figure: a linear chain of word-layer tags Y1 ... Yn over the utterance tokens X1 ... Xn]

9 An Example L-CRF
A linear-chain CRF assigns a single tag to each word at each time step.
- Tags: NONE, COMMA, PERIOD, QMARK, EMARK
- Factorized features
Sentence: no, please do not. would you save your questions for the end of my talk, when i ask for them ?
Tagging:
  no/COMMA please/NONE do/NONE not/PERIOD would/NONE you/NONE ... my/NONE talk/COMMA when/NONE ... them/QMARK

10 Features for L-CRF
Feature factorization (Sutton et al., 2007):
- Product of a binary function on the assignment of the set of cliques at each time step, and a feature function defined solely on the observation sequence
- Feature functions: n-gram (n = 1, 2, 3) occurrences within 5 words of the current word
Example features for the word "do" in "no please do not would you ...":
  do@0, please@-1, would_you@[2,3], no_please_do@[-2,0]
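The n-gram observation features above can be sketched as a small extraction function. The exact window bounds and the offset notation are assumptions read off the slide's example ("do@0", "would_you@[2,3]", etc.).

```python
# Generate n-gram (n = 1..3) observation features around position i,
# annotated with offsets relative to the current word, in the style
# shown on the slide. The window size is an assumed parameter.
def ngram_features(words, i, max_n=3, window=2):
    feats = []
    for n in range(1, max_n + 1):
        for start in range(i - window, i + window + 1):
            end = start + n - 1
            if start < 0 or end >= len(words):
                continue  # n-gram would run off the utterance
            gram = "_".join(words[start:end + 1])
            if n == 1:
                feats.append(f"{gram}@{start - i}")
            else:
                feats.append(f"{gram}@[{start - i},{end - i}]")
    return feats
```

For the word "do" (position 2) in "no please do not would you save", this produces the four example features from the slide among others.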

11 Problems with L-CRF
Long-range dependencies between punctuation symbols and indicative words cannot be captured properly.
For example, in:
  no please do not would you save your questions for the end of my talk when i ask for them
it is hard for a linear-chain CRF to capture the long-range dependency between the ending question mark (?) and the initial phrase "would you".

12 Problems with L-CRF
What humans might do:
- Input: no please do not would you save your questions for the end of my talk when i ask for them
- Output: no, please do not. would you save your questions for the end of my talk, when i ask for them ?
Sentence-level punctuation symbols (. ? !) are associated with the complete sentence, and therefore should be assigned at the sentence level.

13 What Do We Want?
A model that jointly performs all three of the following tasks:
- Sentence boundary detection (or sentence segmentation)
- Sentence type identification
- Punctuation insertion

14 Factorial CRF
An instance of a dynamic CRF:
- A two-layer factorial CRF (F-CRF) jointly annotates an observation sequence with two label sequences
- Models the conditional probability of the label sequence pair given the observation sequence X
[Figure: two coupled chains over the utterance X1 ... Xn: sentence-layer tags Z1 ... Zn and word-layer tags Y1 ... Yn]
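Following the standard factorial CRF formulation (Sutton et al., 2007), the modeled distribution can be sketched as follows, with word-layer labels y, sentence-layer labels z, within-chain potentials Φ_w and Φ_s, between-chain potentials Φ_c, and partition function Z(x); the exact potential decomposition here is an assumption, not quoted from the paper:

```latex
p(\mathbf{y}, \mathbf{z} \mid \mathbf{x}) =
\frac{1}{Z(\mathbf{x})}
\prod_{t=1}^{n-1}
\Phi_w(y_t, y_{t+1}, \mathbf{x}, t)\,
\Phi_s(z_t, z_{t+1}, \mathbf{x}, t)
\prod_{t=1}^{n}
\Phi_c(y_t, z_t, \mathbf{x}, t)
```

The between-chain potentials Φ_c are what let evidence in the sentence layer (e.g. "this is a question sentence") influence the word-layer punctuation tags, and vice versa.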

15 Example of F-CRF
We propose two sets of tags for this joint task:
- Word-layer: NONE, COMMA, PERIOD, QMARK, EMARK
- Sentence-layer: DEBEG, DEIN, QNBEG, QNIN, EXBEG, EXIN (beginning/inside of declarative, question, and exclamatory sentences)
- An analogous feature factorization and the same feature functions as in the L-CRF are used
Example tagging:
  Sentence layer: DEBEG DEIN DEIN DEIN  QNBEG QNIN ... QNIN QNIN  QNIN ... QNIN
  Word layer:     COMMA NONE NONE PERIOD NONE  NONE ... NONE COMMA NONE ... QMARK
  Words:          no    please do  not   would you  ... my   talk  when ... them
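Training data for the two layers can be derived mechanically from a punctuated reference utterance. Here is a sketch of that construction; the tag names follow the slide, but the construction itself (and the function name) is an assumption about how such data would be prepared.

```python
# Derive the two F-CRF label layers from a punctuated token stream.
PUNCT2TAG = {",": "COMMA", ".": "PERIOD", "?": "QMARK", "!": "EMARK"}
END2TYPE = {".": "DE", "?": "QN", "!": "EX"}  # declarative / question / exclamatory

def make_layers(tokens):
    # Split on sentence-ending punctuation, then emit for each word a
    # word-layer tag (the punctuation that follows it, if any) and a
    # sentence-layer tag (type + BEG for the first word, IN for the rest).
    sentences, cur = [], []
    for tok in tokens:
        cur.append(tok)
        if tok in END2TYPE:
            sentences.append(cur)
            cur = []
    if cur:
        sentences.append(cur)

    words, word_tags, sent_tags = [], [], []
    for sent in sentences:
        end = sent[-1] if sent[-1] in END2TYPE else "."
        stype = END2TYPE[end]
        first = True
        for j, tok in enumerate(sent):
            if tok in PUNCT2TAG:
                continue  # punctuation is a label here, not a word
            words.append(tok)
            nxt = sent[j + 1] if j + 1 < len(sent) else None
            word_tags.append(PUNCT2TAG.get(nxt, "NONE"))
            sent_tags.append(stype + ("BEG" if first else "IN"))
            first = False
    return words, word_tags, sent_tags
```

Applied to the slide's example utterance, this reproduces the tag rows shown above (COMMA NONE NONE PERIOD ... QMARK on the word layer, DEBEG DEIN ... QNBEG QNIN ... on the sentence layer).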

16 Why Does it Work?
- The sentence-layer tags are used for sentence segmentation and sentence type identification
- The word-layer tags are used for punctuation insertion
- Knowledge learned from the sentence layer can guide the word-layer tagging process
- The two layers are jointly learned, providing evidence that influences each other's tagging process
Example:
  [no please do not] declarative sentence
  [would you save your questions for the end of my talk when i ask for them] question sentence (QNBEG QNIN ...), supporting the ending ?

17 Evaluation Datasets
IWSLT 2009 BTEC and CT datasets, each consisting of both English (EN) and Chinese (CN); 90% used for training, 10% for testing.

                                        BTEC              CT
                                      CN      EN      CN      EN
  Number of utterance pairs             19,972            10,061
  Percentage of declarative sentences 64%     65%     77%     81%
  Percentage of question sentences    36%     35%     22%     19%
  Multiple sentences per utterance    14%     17%     29%     39%
  Average words per utterance         8.59    9.46    10.18   14.33

18 Experimental Setup
Designed extensive experiments for the hidden event language model:
- Duplication vs. no duplication
- Single-pass vs. cascaded
- Trigram vs. 5-gram
Conducted the following experiments:
- Accuracy on correctly recognized (CRR) texts (F1 measure)
- Accuracy on automatically recognized (ASR) texts (F1 measure)
- Translation performance with punctuated ASR texts (BLEU metric)

19 Punctuation Prediction: Evaluation Metrics
  Precision = (# correctly predicted punctuation symbols) / (# predicted punctuation symbols)
  Recall    = (# correctly predicted punctuation symbols) / (# expected punctuation symbols)
  F1        = 2 / (1/Precision + 1/Recall)
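The three metrics above are straightforward to compute once each punctuation symbol is identified by its position and type; a minimal sketch (the pair representation is an assumption):

```python
# Precision, recall, and F1 over predicted vs. reference punctuation,
# where each symbol is a (position, type) pair.
def prf1(predicted, expected):
    correct = len(set(predicted) & set(expected))
    precision = correct / len(predicted) if predicted else 0.0
    recall = correct / len(expected) if expected else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if precision + recall else 0.0)
    return precision, recall, f1
```

Note that 2 / (1/P + 1/R) is algebraically the same as 2PR / (P + R), which is the form used in the code to avoid division by zero when P or R is 0.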

20 Punctuation Prediction Evaluation: Correctly Recognized Texts (I)

  BTEC        NO DUPLICATION                  USE DUPLICATION                 L-CRF   F-CRF
              Single Pass    Cascaded         Single Pass    Cascaded
  LM order    3      5       3      5         3      5       3      5
  CN Prec.    87.40  86.44   87.72  87.13     76.74  77.58   77.89  78.50    94.82   94.83
  CN Rec.     83.01  83.58   82.04  83.76     72.62  73.72   73.02  75.53    87.06   87.94
  CN F1       85.15  84.99   84.79  85.41     74.63  75.60   75.37  76.99    90.78   91.25
  EN Prec.    64.72  62.70   62.39  58.10     85.33  85.74   84.44  81.37    88.37   92.76
  EN Rec.     60.76  59.49   58.57  55.28     80.42  80.98   79.43  77.52    80.28   84.73
  EN F1       62.68  61.06   60.42  56.66     82.80  83.29   81.86  79.40    84.13   88.56

- The "duplication" trick for the hidden event language model is language specific
- Unlike English, indicative words can appear anywhere in a Chinese sentence

21 Punctuation Prediction Evaluation: Correctly Recognized Texts (II)

  CT          NO DUPLICATION                  USE DUPLICATION                 L-CRF   F-CRF
              Single Pass    Cascaded         Single Pass    Cascaded
  LM order    3      5       3      5         3      5       3      5
  CN Prec.    89.14  87.83   90.97  88.04     74.63  75.42   75.37  76.87    93.14   92.77
  CN Rec.     84.71  84.16   77.78  84.08     70.69  70.84   64.62  73.60    83.45   86.92
  CN F1       86.87  85.96   83.86  86.01     72.60  73.06   69.58  75.20    88.03   89.75
  EN Prec.    73.86  73.42   67.02  65.15     75.87  77.78   74.75  74.44    83.07   86.69
  EN Rec.     68.94  68.79   62.13  61.23     70.33  72.56   69.28  69.93    76.09   79.62
  EN F1       71.31  71.03   64.48  63.13     72.99  75.08   71.91  72.12    79.43   83.01

- Significant improvement over L-CRF (p < 0.01)
- Our approach is general: it requires minimal linguistic knowledge and consistently performs well across different languages

22 Punctuation Prediction Evaluation: Automatically Recognized Texts

  BTEC        NO DUPLICATION                  USE DUPLICATION                 L-CRF   F-CRF
              Single Pass    Cascaded         Single Pass    Cascaded
  LM order    3      5       3      5         3      5       3      5
  CN Prec.    85.96  84.80   86.48  85.12     66.86  68.76   68.00  68.75    92.81   93.82
  CN Rec.     81.87  82.78   83.15  82.78     63.92  66.12   65.38  66.48    85.16   89.01
  CN F1       83.86  83.78   84.78  83.94     65.36  67.41   66.67  67.60    88.83   91.35
  EN Prec.    62.38  59.29   56.86  54.22     85.23  87.29   84.49  81.32    90.67   93.72
  EN Rec.     64.17  60.99   58.76  56.21     88.22  89.65   87.58  84.55    88.22   92.68
  EN F1       63.27  60.13   57.79  55.20     86.70  88.45   86.00  82.90    89.43   93.19

- 504 Chinese utterances and 498 English utterances
- Recognition accuracy: 86% and 80% respectively
- Significant improvement (p < 0.01)

23 Punctuation Prediction Evaluation: Translation Performance

  BTEC        NO DUPLICATION                  USE DUPLICATION                 L-CRF   F-CRF
              Single Pass    Cascaded         Single Pass    Cascaded
  LM order    3      5       3      5         3      5       3      5
  CN -> EN    30.77  30.71   30.98  30.64     30.16  30.26   30.33  30.42    31.27   31.30
  EN -> CN    21.21  21.00   21.16  20.76     23.03  24.04   23.61  23.34    23.44   24.18

- This tells us how well the punctuated ASR outputs can be used for downstream NLP tasks
- Used the Berkeley aligner and Moses (lexicalized reordering)
- Averaged BLEU-4 scores over 10 MERT runs with random initial parameters

24 Conclusion
We propose a novel approach for punctuation prediction without relying on prosodic features:
- Jointly performs punctuation prediction, sentence boundary detection, and sentence type identification
- Performs better than the hidden event language model and a linear-chain CRF model
- A general approach that consistently works well across different languages
- Effective when incorporated into downstream NLP tasks

