Speech and Language Technologies in the Next Generation Localisation CSET Prof. Andy Way, School of Computing, DCU.

1 Speech and Language Technologies in the Next Generation Localisation CSET Prof. Andy Way, School of Computing, DCU

8 Overview of Presentation. Speech & Language Technologies in the NGL CSET: Facilitating Optimal Multilingual NGL Applications; Key Research Challenges; Novel Research Tracks; Typical LSP’s Translation Process; Key Integration Challenges; Concluding Remarks

9 ILT - Integrated Language Technologies. Next Generation Localisation: Systems Framework; Enterprise Localisation; Personalised Localisation; Unified Model; Digital Content Management; Integrated Language Technologies (Prof. Andy Way, ILT Area Coordinator)

10 ILT: Facilitating Optimal Multilingual NGL Applications. Machine Translation (Text Input, Text Output); Text Processing; e.g. bulk localisation

11 ILT: Facilitating Optimal Multilingual NGL Applications. Speech Technologies (Speech Input, Speech Output); Machine Translation (Text Input, Text Output); Text Processing; e.g. bulk localisation, e.g. personalisation

13 Machine Translation: Significance. For our industrial partners, the volume of material needing translation is increasing, while budgets remain the same. In the EU, there are now 23 official languages (506 language pairs), and the number is expanding. In the US, there is huge investment in translation between Arabic, Chinese and Urdu, and English. → Automation is the only option (especially for PL)
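
The figure of 506 is consistent with counting each translation direction separately: with n languages there are n × (n − 1) ordered source/target pairs. A minimal sketch of that arithmetic follows; the function name is illustrative and does not come from the slides.

```python
def directed_language_pairs(n_languages: int) -> int:
    """Number of ordered (source, target) pairs among n languages."""
    return n_languages * (n_languages - 1)


# 23 official EU languages, each translated into the other 22.
print(directed_language_pairs(23))  # 23 * 22 = 506
```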

14 MT: Key Research Challenges. Enhanced Translation Quality; Faster Translation Times; Scalability; Other Modalities (Speech, SMS etc.)

15 The State-of-the-Art. Source: [source-language text not reproduced]. Reference: The two sides highlighted the role of the World Trade Organization (WTO). Baseline: The two sides on the role of the WTO

16 Improving the State-of-the-Art. Our MT systems have knowledge of syntax: parts of speech (nouns, verbs etc.) and roles in sentences (subject, object etc.) → better translation quality. Source: [source-language text not reproduced]. Reference: The two sides highlighted the role of the World Trade Organization (WTO). Baseline: The two sides on the role of the WTO. Our System: The two sides reaffirmed the role of the WTO

17 The State-of-the-Art. Source: [source-language text not reproduced]. Reference: Mahmoud Abbas: The wall and settlements will not bring Israel security. Baseline: Mahmoud Abbas, the wall and settlements will provide security to Israel. Our System: Mahmoud Abbas, the wall and settlements will not provide security for Israel

18 Improving the State-of-the-Art. → better translation quality (especially where end-users are concerned). DCU Arabic → English system ranked first at international MT evaluation in Oct. 2007. Source: [source-language text not reproduced]. Reference: Mahmoud Abbas: The wall and settlements will not bring Israel security. Baseline: Mahmoud Abbas, the wall and settlements will provide security to Israel. Our System: Mahmoud Abbas, the wall and settlements will not provide security for Israel
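
To see why scoring such outputs automatically is delicate, the two translations of the Mahmoud Abbas sentence can be compared with the reference using a simple word-overlap measure. The sketch below uses clipped unigram precision, chosen here purely for illustration; the slides do not say which metric the evaluation used, and the function name is hypothetical.

```python
import re
from collections import Counter


def unigram_precision(hypothesis: str, reference: str) -> float:
    """Fraction of hypothesis words that also occur in the reference (clipped counts)."""
    hyp = Counter(re.findall(r"\w+", hypothesis.lower()))
    ref = Counter(re.findall(r"\w+", reference.lower()))
    # Each hypothesis word is credited at most as often as it appears in the reference.
    matched = sum(min(count, ref[word]) for word, count in hyp.items())
    total = sum(hyp.values())
    return matched / total if total else 0.0


reference = "Mahmoud Abbas: The wall and settlements will not bring Israel security"
baseline = "Mahmoud Abbas, the wall and settlements will provide security to Israel"
our_system = "Mahmoud Abbas, the wall and settlements will not provide security for Israel"

print(round(unigram_precision(baseline, reference), 3))
print(round(unigram_precision(our_system, reference), 3))
```

Under this measure the two outputs score almost the same (9 of 11 words against 10 of 12), although only one of them preserves the negation. That is one reason word-overlap scores need care when they are used to judge translation quality.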

20 MT Novel Research: Handling Different Types of Text. Translating patent applications, doctors’ prescriptions, or visa applications are different tasks, as the content is different … and so is the form. → Build a different MT system for each task, using our industrial partners’ documentation

23 Text Processing: Significance and Challenges. If texts are automatically annotated with: (1) syntactic information (e.g. subject, object), today’s MT systems can learn the syntax required for improved output quality and improved processing of multilingual queries (DCM); (2) text-type and genre information, our MT systems can better disambiguate text and improve translation quality; (3) localisation information (e.g. Andy Way), the workflows of our industrial partners (currently done manually) can be significantly improved (cf. LOC)

24 Speech Technology: Significance. Speech interfaces for eyes-busy, hands-busy scenarios. Speech recognition and synthesis systems which can deal with a potentially unlimited vocabulary, multiple (and non-native) speakers, and multiple languages, and which can be tightly integrated with MT. → localisation & personalisation; → volume & scalability; → access

26 Speech Technology: Challenges. themoreitsnows themoreitgoes; the more it snows the more it goes…; them ore its nows them ore it goes?; demoreisnows demoregoes

27 Speech Technology: Challenges. themoreitsnows themoreitgoes. Linguistic competence of native speaker: the more it snows the more it goes… “Rules” and vocabulary of system: them ore its nows them ore it goes? Performance of (native) speaker: demoreisnows demoregoes
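
The ambiguity on these slides can be reproduced with a small word-segmentation sketch: given an unsegmented string and a vocabulary, it enumerates every way of splitting the string into known words. The toy vocabulary is taken from the slide's own example (including "nows"); the function name and the exhaustive recursive search are assumptions made for illustration, not a description of the recognition engine.

```python
def segmentations(text: str, vocabulary: set[str]) -> list[list[str]]:
    """Return every split of `text` into a sequence of vocabulary words."""
    if not text:
        return [[]]  # one way to segment the empty string: no words
    results = []
    for end in range(1, len(text) + 1):
        prefix = text[:end]
        if prefix in vocabulary:
            # Extend each segmentation of the remainder with this prefix.
            for rest in segmentations(text[end:], vocabulary):
                results.append([prefix] + rest)
    return results


# Toy vocabulary built from the words on the slide.
vocabulary = {"the", "them", "more", "ore", "it", "its", "snows", "nows"}

for words in segmentations("themoreitsnows", vocabulary):
    print(" ".join(words))
```

With only a vocabulary to go on, "them ore its nows" is as valid as "the more it snows"; choosing between them requires additional linguistic knowledge.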

28 Speech Technology: Innovations. Robust & Novel Speech Recognition Engine which integrates explicit linguistic knowledge. themoreitsnows themoreitgoes. Linguistic competence of native speaker: the more it snows the more it goes… “Rules” and vocabulary of system: them ore its nows them ore it goes? Performance of (native) speaker: demoreisnows demoregoes

29 Innovations: Speech Recognition & MT. Robust & Novel Speech Recognition Engine which integrates explicit linguistic knowledge; tight coupling with MT Engines. themoreitsnows themoreitgoes. Linguistic competence of native speaker: the more it snows the more it goes… “Rules” and vocabulary of system: them ore its nows them ore it goes? Other languages: detverkarhavarite nstorstormhurmån; Jemehreschneit destomehres geht

30 Innovations: MT & Speech Synthesis. Robust & Novel Speech Synthesis Engine which integrates explicit linguistic knowledge; tight coupling with MT Engines. themoreitsnows themoreitgoes; detverkarhavarite nstorstormhurmån; Jemehreschneit destomehres geht

31 Typical LSP’s Translation Process. Incoming documents (segmented). Step 1: Translation Memory (Translation Memory DB) & Machine Translation, producing partially translated documents, with confidence rating for segments. Step 2: Post-editing & translation (in-house translators, freelance translators). Step 3: Documents validation & finalization. Cost classes: TM match score < 50 %: expensive; 50 % < TM match score < 70 %: medium; TM match score > 70 %: cheap. Requirement: minimal disruption of this process
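
The three cost classes on this slide amount to a threshold lookup on the TM match score. A minimal sketch follows; the function name is hypothetical, and because the slide leaves the exact boundaries open, scores of exactly 50 % and 70 % are assigned to "medium" here as an assumption.

```python
def cost_class(tm_match_score: float) -> str:
    """Map a TM match score (in percent) to the cost class named on the slide."""
    if not 0 <= tm_match_score <= 100:
        raise ValueError("TM match score must be between 0 and 100")
    if tm_match_score < 50:
        return "expensive"
    if tm_match_score <= 70:  # assumption: both boundary values count as medium
        return "medium"
    return "cheap"


for score in (35, 62, 88):
    print(score, cost_class(score))
```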

38 Key Integration Challenges: use MT to automatically upgrade some TM matches to a cheaper cost class, cf. Dynamic Translation Memory [Bicici and Dymetman, 2008]; link MT automatic evaluation metrics with post-editing cost; ensure that MT omissions are highlighted; enforce customer terminology; deal with markup, tags …; produce true-cased translations; integrate into pre-existing workflows!
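
As an illustration of the "markup, tags" item, one commonly used approach is to replace inline tags with placeholders before translation and restore them afterwards. The sketch below assumes simple angle-bracket tags and a translation function that leaves placeholders untouched; all names are hypothetical, `toy_translate` merely stands in for an MT engine, and nothing here is taken from the slides.

```python
import re
from typing import Callable

TAG_PATTERN = re.compile(r"<[^>]+>")
PLACEHOLDER_PATTERN = re.compile(r"__TAG(\d+)__")


def mask_tags(segment: str) -> tuple[str, list[str]]:
    """Replace each inline tag with a numbered placeholder and collect the tags."""
    tags: list[str] = []

    def replace(match: re.Match) -> str:
        tags.append(match.group(0))
        return f"__TAG{len(tags) - 1}__"

    return TAG_PATTERN.sub(replace, segment), tags


def unmask_tags(text: str, tags: list[str]) -> str:
    """Put the original tags back in place of their placeholders."""
    return PLACEHOLDER_PATTERN.sub(lambda m: tags[int(m.group(1))], text)


def translate_with_tags(segment: str, translate: Callable[[str], str]) -> str:
    """Translate a segment while shielding its markup from the MT engine."""
    masked, tags = mask_tags(segment)
    return unmask_tags(translate(masked), tags)


def toy_translate(text: str) -> str:
    """Stand-in for an MT engine: a fixed word substitution (assumption)."""
    for source, target in (("Click", "Cliquez sur"), ("Save", "Enregistrer"), ("now", "maintenant")):
        text = text.replace(source, target)
    return text


print(translate_with_tags("Click <b>Save</b> now", toy_translate))
```

A real integration would also need to handle placeholders that the engine drops or reorders, which this sketch does not attempt.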

44 Concluding Remarks: for ILT, ramp-up is almost complete, with over 30 new researchers in addition to pre-existing PIs, postdoctoral researchers and PhD students; large interest from industrial partners, both large and small; input from LOC, DCM and SF; significant role in CNGL demonstrators; research tools → industrial prototypes; well placed to succeed in going ‘beyond TMs’ …

45 Speech & Language Technologies in the NGL CSET. Thanks for listening! Questions? http://www.cngl.ie, away@computing.dcu.ie

