
WP4-22. Final Evaluation of Subtitle Generator Vincent Vandeghinste, Pan Yi CCL – KULeuven.



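2 Example
Transcript: Het meest spectaculaire aan de daadwerkelijke start van de euro is dat er eigenlijk niets spectaculairs te melden valt.
("The most spectacular thing about the actual start of the euro is that there is really nothing spectacular to report.")
Subtitle: Het meest spectaculaire aan de start van de euro was dat er niets spectaculairs te melden valt.
("The most spectacular thing about the start of the euro was that there is nothing spectacular to report.")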
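The example can be summarized as a short list of word-level edits. The sketch below uses Python's standard `difflib.SequenceMatcher` on whitespace-separated tokens to list them; the function name `token_edits` is hypothetical, and this only illustrates what changed between transcript and subtitle, not how the subtitle generator itself compresses sentences.

```python
import difflib

TRANSCRIPT = ("Het meest spectaculaire aan de daadwerkelijke start van de euro "
              "is dat er eigenlijk niets spectaculairs te melden valt.")
SUBTITLE = ("Het meest spectaculaire aan de start van de euro "
            "was dat er niets spectaculairs te melden valt.")


def token_edits(source: str, target: str) -> list[tuple[str, list[str], list[str]]]:
    """Return (operation, source tokens, target tokens) for every non-matching span."""
    src, tgt = source.split(), target.split()
    matcher = difflib.SequenceMatcher(a=src, b=tgt, autojunk=False)
    return [(op, src[i1:i2], tgt[j1:j2])
            for op, i1, i2, j1, j2 in matcher.get_opcodes()
            if op != "equal"]


for op, removed, added in token_edits(TRANSCRIPT, SUBTITLE):
    print(op, removed, added)
```

For this pair it reports two deletions (daadwerkelijke, eigenlijk) and one replacement (is by was).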

3 Flow

4 Availability Calculator
– Pronunciation time of the input sentence => estimate the number of characters available in the subtitle
– If the pronunciation time is UNKNOWN, estimate it by:
  – counting the number of syllables
  – using the average speaking rate for Dutch
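A minimal sketch of the calculator, assuming the subtitle rate of 70 characters per 6 seconds given later in the deck and, as the default average syllable duration, the 186 ms measured over all CGN files with pauses excluded. The function names are hypothetical; the syllable count is taken as given here.

```python
import math
from typing import Optional

SUBTITLE_CHARS = 70      # subtitle rate from the slides: 70 characters ...
SUBTITLE_SECONDS = 6     # ... per 6 seconds
ASD_SECONDS = 0.186      # assumption: ASD for all CGN files, pauses excluded


def estimate_pronunciation_time(n_syllables: int, asd_seconds: float = ASD_SECONDS) -> float:
    """Estimated speaking time in seconds: number of syllables x average syllable duration."""
    return n_syllables * asd_seconds


def available_characters(pronunciation_seconds: Optional[float], n_syllables: int) -> int:
    """Characters available for the subtitle of one sentence, rounded down."""
    if pronunciation_seconds is None:  # pronunciation time UNKNOWN: estimate it
        pronunciation_seconds = estimate_pronunciation_time(n_syllables)
    # Multiply before dividing so that e.g. 6 s gives exactly 70 characters.
    return math.floor(pronunciation_seconds * SUBTITLE_CHARS / SUBTITLE_SECONDS)


print(available_characters(6.0, n_syllables=0))    # pronunciation time known
print(available_characters(None, n_syllables=30))  # estimated from 30 syllables
```

Rounding down keeps the estimate on the safe side: a subtitle never gets more characters than the rate allows.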

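5 Syllable Counter
– Rule-based
– Evaluated on the CGN lexicon (Corpus Gesproken Nederlands, the Spoken Dutch Corpus) combined with frequency lists: the estimated number of syllables is compared with the number of syllables in the phonetic transcriptions
– 99.63% of all words in CGN are estimated correctly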
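The slide does not list the rules, so the following is only a hypothetical heuristic in the same spirit: count one syllable per vowel nucleus, treating Dutch vowel digraphs as one nucleus and letting a diaeresis start a new one. The `accuracy` function mirrors the evaluation idea (estimated count versus count from phonetic transcriptions) on a tiny hand-made gold list that stands in for the CGN data.

```python
import re

# Hypothetical heuristic, not the rule set from the slides:
# every maximal run of vowel letters is one syllable nucleus, so digraphs such as
# "aa", "oe", "ui", "eu" and "ij" (j is not in the vowel class) count once.
# A vowel with a diaeresis (ë, ï, ...) starts a new nucleus, as in "beëindigen".
NUCLEUS = re.compile(r"[äëïöü][aeiouy]*|[aeiouy]+")


def count_syllables(word: str) -> int:
    return max(1, len(NUCLEUS.findall(word.lower())))


def accuracy(gold: dict[str, int]) -> float:
    """Share of words whose estimated syllable count equals the gold count."""
    correct = sum(count_syllables(word) == n for word, n in gold.items())
    return correct / len(gold)


# Hand-made stand-in for syllable counts taken from phonetic transcriptions.
gold = {"spectaculaire": 5, "daadwerkelijke": 5, "beëindigen": 4, "euro": 2, "piano": 3}
print(accuracy(gold))
```

Hiatus without a diaeresis (pi-a-no) is merged into one nucleus, which is the kind of case a real rule set has to handle to reach the reported accuracy.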

6 Average Syllable Duration (ASD)
Source | No pauses | Pauses included
Literature | 177 ms | n/a
All CGN files | 186 ms | 237 ms
One speaker | 185 ms | 239 ms
Read-aloud | 188 ms | 256 ms
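The two columns differ only in whether pause time is counted: pauses add duration but no syllables, so including them raises the average. A minimal sketch, assuming a hypothetical list of time-aligned segments in which a pause is a segment with zero syllables:

```python
from dataclasses import dataclass


@dataclass
class Segment:
    """Hypothetical time-aligned unit: a word (syllables > 0) or a pause (syllables == 0)."""
    start: float      # seconds
    end: float        # seconds
    syllables: int


def average_syllable_duration_ms(segments: list[Segment], include_pauses: bool) -> float:
    """ASD = total duration / total number of syllables, in milliseconds."""
    syllables = sum(s.syllables for s in segments)
    if syllables == 0:
        raise ValueError("no syllables in the input")
    duration = sum(s.end - s.start for s in segments
                   if include_pauses or s.syllables > 0)
    return 1000 * duration / syllables


speech = [Segment(0.0, 0.4, 2), Segment(0.4, 0.6, 0), Segment(0.6, 1.0, 2)]
print(average_syllable_duration_ms(speech, include_pauses=False))  # words only
print(average_syllable_duration_ms(speech, include_pauses=True))   # words and pauses
```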

7 Availability Calculator
– When the pronunciation time is not given, estimate it
– Subtitles: 70 chars / 6 sec ≈ 11.7 chars/sec
– If the number of characters > the number of available characters => compress the sentence
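The compression decision reduces to one comparison. A minimal sketch, assuming the 70 characters / 6 seconds rate from this slide and a pronunciation time that is already known or estimated; `needs_compression` is a hypothetical name.

```python
import math

SUBTITLE_CHARS = 70
SUBTITLE_SECONDS = 6
CHARS_PER_SECOND = SUBTITLE_CHARS / SUBTITLE_SECONDS   # about 11.7 characters per second


def needs_compression(sentence: str, pronunciation_seconds: float) -> bool:
    """True if the sentence has more characters than are available for it."""
    available = math.floor(pronunciation_seconds * SUBTITLE_CHARS / SUBTITLE_SECONDS)
    return len(sentence) > available


sentence = ("Het meest spectaculaire aan de daadwerkelijke start van de euro "
            "is dat er eigenlijk niets spectaculairs te melden valt.")
print(needs_compression(sentence, pronunciation_seconds=6.0))
```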

8 Sentence Compressor
– Parallel corpus
– Sentence analysis
– Sentence compression
– Evaluation

9 Parallel Corpus
– Sentence-aligned source (transcript) and target (subtitle) corpus:
  – Tagging
  – Chunking
  – SSUB detection
– Chunk alignment

10 Chunk Alignment
– Every 4-gram from the source chunk is compared with every 4-gram from the target chunk
– A = (m / (m + n)) · (L1 + L2) / 2
– If A > 0.315, the chunks are aligned
– F-value for NP/PP alignment is 95%
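The slide leaves m, n, L1 and L2 undefined, so the sketch below is explicitly assumption-laden: it uses character 4-grams, takes m as the number of source 4-grams that also occur in the target chunk and n as the number that do not, and leaves L1 and L2 as caller-supplied values. With L1 = L2 = 1 the score reduces to the plain 4-gram overlap ratio. Only the formula shape and the 0.315 threshold come from the slide.

```python
ALIGN_THRESHOLD = 0.315  # threshold from the slides


def char_ngrams(text: str, size: int = 4) -> list[str]:
    """All overlapping character n-grams of a chunk (assumption: character 4-grams)."""
    return [text[i:i + size] for i in range(len(text) - size + 1)]


def alignment_score(src_chunk: str, tgt_chunk: str, l1: float, l2: float) -> float:
    """A = m / (m + n) * (L1 + L2) / 2.

    Assumptions: m counts source 4-grams that also occur in the target chunk,
    n counts source 4-grams that do not; L1 and L2 are supplied by the caller
    because the slides do not define them.
    """
    src = char_ngrams(src_chunk)
    if not src:
        return 0.0
    tgt = set(char_ngrams(tgt_chunk))
    m = sum(1 for gram in src if gram in tgt)
    n = len(src) - m
    return m / (m + n) * (l1 + l2) / 2


def should_align(score: float) -> bool:
    return score > ALIGN_THRESHOLD


score = alignment_score("de daadwerkelijke start", "de start", l1=1.0, l2=1.0)
print(score, should_align(score))
```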

11 Sentence Analysis
– Tagging (TnT): accuracy = 96.2% (Oostdijk et al., 2002)
– Chunking:
Chunk type | Precision | Recall | F-value
NP | 94.36% | 93.91% | 94.13%
PP | 94.84% | 95.22% | 95.03%
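The F-values in the chunking table are consistent with the balanced F-measure, the harmonic mean of precision and recall. This short check recomputes the NP and PP rows from their precision and recall:

```python
def f_value(precision: float, recall: float) -> float:
    """Balanced F-measure (F1): harmonic mean of precision and recall."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


chunking = {"NP": (94.36, 93.91), "PP": (94.84, 95.22)}
for chunk_type, (p, r) in chunking.items():
    print(f"{chunk_type}: F = {f_value(p, r):.2f}%")
```

The same formula reproduces the SSUB-detection table to within 0.01 percentage points; small differences are expected because the published precision and recall are themselves rounded.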

12 Sentence Analysis (2)
– SSUB detection:
Type of S | Precision | Recall | F-value
OTI | 71.43% | 65.22% | 68.18%
RELP | 69.66% | 68.89% | 69.27%
SSUB | 56.83% | 60.77% | 58.74%

13 Sentence Compression
– Use of statistics
– Use of rules
– Word reduction
– Selection of the compressed sentence

14 Use of statistics

15 Use of rules
– To avoid generating ungrammatical sentences
– Rules of the type: "For every NP, never remove the head noun"
– Rules are applied recursively
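A minimal sketch of recursive rule application, assuming a hypothetical chunk tree and a rule set that contains only the rule quoted on the slide. Each subtree may be kept (in its own compressed variants) or dropped, unless a rule protects it; the alternatives of the children are then combined.

```python
from dataclasses import dataclass, field
from itertools import product
from typing import Optional


@dataclass
class Node:
    label: str                                  # e.g. "NP", "DET", "ADJ", "N"
    word: Optional[str] = None                  # set only for leaves
    children: list["Node"] = field(default_factory=list)
    is_head: bool = False


def removable(child: Node, parent: Node) -> bool:
    """Hypothetical one-rule rule set: in an NP, never remove the head noun."""
    return not (parent.label == "NP" and child.is_head)


def alternatives(node: Node) -> list[list[str]]:
    """Word sequences obtained by deleting removable subtrees, applied recursively."""
    if node.word is not None:
        return [[node.word]]
    options_per_child = []
    for child in node.children:
        options = alternatives(child)            # recurse into the child first
        if removable(child, node):
            options = options + [[]]             # extra option: drop the child
        options_per_child.append(options)
    return [sum(choice, []) for choice in product(*options_per_child)]


np_chunk = Node("NP", children=[
    Node("DET", "de"),
    Node("ADJ", "lange"),
    Node("N", "gevangenisstraf", is_head=True),
])
for words in alternatives(np_chunk):
    print(" ".join(words))
```

Every alternative keeps the head noun, but "lange gevangenisstraf" (no determiner) shows why a single rule is not enough and a fuller rule set is needed to keep the output grammatical.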

16 Word Reduction
– Example: replace gevangenisstraf ("prison sentence") by straf ("sentence, punishment")
– Counterexample: replace voetbal ("football") by bal ("ball")
– Uses the Wordbuilding module (WP2)
– Introduces many errors: does it improve accuracy at all?
– Better integration with the rest of the system should be possible
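Dutch compounds are right-headed, so the simplest reduction replaces a compound by its last part. The sketch below assumes a hypothetical table of compound splits standing in for the WP2 Wordbuilding module, plus a hypothetical blocklist showing one way to stop reductions like voetbal to bal that change the meaning.

```python
# Hypothetical compound splits; in the system described here this information
# comes from the Wordbuilding module (WP2).
COMPOUND_SPLITS = {
    "gevangenisstraf": ("gevangenis", "straf"),
    "voetbal": ("voet", "bal"),
}
# Hypothetical guard against reductions that change the meaning.
DO_NOT_REDUCE = {"voetbal"}


def reduce_word(word: str) -> str:
    """Replace a compound by its head (its rightmost part) when that is allowed."""
    key = word.lower()
    if key in DO_NOT_REDUCE or key not in COMPOUND_SPLITS:
        return word
    return COMPOUND_SPLITS[key][-1]


print(reduce_word("gevangenisstraf"), reduce_word("voetbal"))
```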

17 Selection of the Compressed Sentence
– All previous steps result in an ordered list of sentence alternatives:
  – supposedly grammatically correct
  – ordered by probability
  – the first (most probable) sentence whose length is smaller than the available number of characters is chosen
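The selection step can be sketched as a single pass over the ranked alternatives. The candidate sentences and probabilities below are hypothetical; returning `None` corresponds to the "No output" case in the evaluation.

```python
from typing import Optional


def select_compressed(alternatives: list[tuple[str, float]],
                      available_chars: int) -> Optional[str]:
    """Return the most probable alternative shorter than the budget, or None."""
    ranked = sorted(alternatives, key=lambda alt: alt[1], reverse=True)
    for sentence, _probability in ranked:
        if len(sentence) < available_chars:   # "smaller than", as on the slide
            return sentence
    return None                               # no alternative fits: "No output"


candidates = [("de lange gevangenisstraf", 0.5), ("de gevangenisstraf", 0.3),
              ("gevangenisstraf", 0.2)]
print(select_compressed(candidates, available_chars=20))
```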

18 Evaluation
Condition | A | B | C
ASD | 185 ms/syl | 192 ms/syl | 256 ms/syl
No output | 44.33% | 41.67% | 15.67%
Reduction rate | 39.93% | 37.65% | 16.93%
Interrater agreement | 86.2% | 86.9% | 91.7%
Accurate | 4.8% | 8.0% | 28.9%
± accurate | 28.1% | 26.3% | 22.1%
Reasonable (Accurate + ± accurate) | 32.9% | 34.3% | 51.0%
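The slide does not say how the reduction rate (Red rate) is computed. The sketch assumes it is the fraction of characters removed from the original sentence and applies that to the example from slide 2. It also checks that, in every condition, Reasonable equals Accurate plus ± accurate.

```python
def reduction_rate(original: str, compressed: str) -> float:
    """Assumed definition: fraction of the original characters that were removed."""
    return 1 - len(compressed) / len(original)


transcript = ("Het meest spectaculaire aan de daadwerkelijke start van de euro "
              "is dat er eigenlijk niets spectaculairs te melden valt.")
subtitle = ("Het meest spectaculaire aan de start van de euro "
            "was dat er niets spectaculairs te melden valt.")
print(f"{reduction_rate(transcript, subtitle):.2%}")

# Evaluation table rows: (accurate, roughly accurate, reasonable), in percent.
table = {"A": (4.8, 28.1, 32.9), "B": (8.0, 26.3, 34.3), "C": (28.9, 22.1, 51.0)}
for condition, (accurate, roughly_accurate, reasonable) in table.items():
    assert abs(accurate + roughly_accurate - reasonable) < 0.05, condition
```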

19 Subtitle Layout Generator
Actieve of gewezen voetballers zoals Ruud Gullit of Dennis Bergkamp moeten het stellen met nauwelijks anderhalf miljard.
becomes
Actieve of gewezen voetballers zoals Ruud Gullit of Dennis Bergkamp moeten het stellen met nauwelijks anderhalf miljard.
("Active or former footballers such as Ruud Gullit or Dennis Bergkamp have to make do with barely one and a half billion.")
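The slide does not describe the layout rules. As a hypothetical sketch, the 70-character budget from slide 7 can be read as two lines of at most 35 characters per subtitle frame: the text is wrapped at word boundaries with the standard `textwrap.wrap` and grouped into frames of two lines. Both limits are assumptions, and greedy wrapping does not balance line lengths.

```python
import textwrap

MAX_LINE_CHARS = 35       # assumption: 70 characters split over two lines
MAX_LINES_PER_FRAME = 2   # assumption


def layout(sentence: str) -> list[list[str]]:
    """Wrap a sentence into lines and group the lines into subtitle frames."""
    lines = textwrap.wrap(sentence, width=MAX_LINE_CHARS, break_long_words=False)
    return [lines[i:i + MAX_LINES_PER_FRAME]
            for i in range(0, len(lines), MAX_LINES_PER_FRAME)]


SENTENCE = ("Actieve of gewezen voetballers zoals Ruud Gullit of Dennis Bergkamp "
            "moeten het stellen met nauwelijks anderhalf miljard.")
for frame in layout(SENTENCE):
    print("\n".join(frame), end="\n\n")
```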

20 Conclusion
– The system's approach works very well:
  – if the sentence analysis is correct
  – if there are possible reductions (according to the rule set)
– Many No Output cases, where the system cannot reduce the sentence because:
  – the sentence cannot be reduced (even by humans)
  – the rule set is too strict, or the sentence analysis is wrong
  – the statistical information is not fine-grained enough
– Bad output:
  – wrong sentence analysis (CONJ)
  – wrong word reductions

21 Future
– Near future (within Atranos):
  – better integration of word reduction
  – combine the advantages of the CNTS approach and the CCL approach into one approach
– Far future (outside Atranos):
  – better sentence analysis: a full parse is needed
  – more fine-grained analysis of the parallel corpus

