
1 A Multi-Perspective Evaluation of the NESPOLE! Speech-to-Speech Translation System
  Alon Lavie, Carnegie Mellon University
  Florian Metze, University of Karlsruhe
  Roldano Cattoni, ITC-irst
  Erica Costantini, University of Trieste

2 Outline (July 8, 2002, ACL-02 S2S-Translation Wshp)
  The NESPOLE! Project
  Approach and System Architecture
  Performance and Usability Challenges:
    – Distributed real-time performance over internet
    – Integration and use of multi-modal capabilities
    – End-to-end Translation performance
  Lessons learned and conclusions

3 Speech-to-speech translation for E-Commerce applications
  Partners: CMU, Univ of Karlsruhe, ITC-irst, UJF-CLIPS, AETHRA, APT-Trentino
  Builds on successful collaboration within C-STAR
  Improved limited-domain speech translation
  Experiment with multimodality and with MEMT
  Showcase-1: Travel and Tourism in Trentino, completed in Nov-2001, demonstrated
  Showcase-2: expanded travel + medical service

4 Speech-to-speech in E-commerce
  Replace current passive web E-commerce with live interaction capabilities
  Client starts via web, can easily connect to agent for specific information
  “Thin client”: very little special hardware and software on client PC: browser, MS Netmeeting, Shared Whiteboard

5 NESPOLE! User Interfaces

6 NESPOLE! Architecture

7 Distributed S2S Translation over the Internet

8 Network Traffic Impact

9 NESPOLE! Monitor

10 Aethra Whiteboard

11 Recent Developments: Apr-02
  Improved analysis and generation grammars (using old C-STAR data)
  Improved SR engines
  Packet-loss, video, and modem connection tests
  Data Collection for Showcase 2A
  Evaluation Scheme Experiment
  Paper and Demo at HLT-02
  Paper submissions to ACL-02, ICSLP-02, ESSLLI-02

12 IF Status Report
  Presented by Donna Gates

13 WP5: HLT Modules
  Data Collection for Showcase-2A completed in February-2002
  Status of transcriptions from all sites?
  CMU will maintain a data repository (Alon collecting all data CDs here)
  IF discussions and development have already started (Donna)
  Development Schedule?

14 WP7: Evaluation
  D9: Evaluation of Showcase-1 Report: draft circulated earlier this week
  Each site should verify that most up-to-date results are being reported
  Include detailed tables in the report?
  Majority vote – finalize a common procedure
  New evaluation experiments

15 Majority Vote Scheme
  Issue: did all sites use the same guidelines?
  What to do when there is no majority?
    – e.g. 4 graders assign P/P/K/K
  What to do when there is complete disagreement?
    – e.g. 3 graders assign P/K/B
  Need to recalculate scores from the previous evaluation?
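
The two open cases on the Majority Vote Scheme slide (a tie such as P/P/K/K and complete disagreement such as P/K/B) can be made concrete with a small sketch. It assumes that "majority" means a strict majority (more than half of the graders) and that P, K, and B stand for Perfect, OK, and Bad; neither assumption is stated on the slide. The function name `majority_grade` is hypothetical. Returning `None` only flags the unresolved cases; it does not decide how the project should handle them.

```python
from collections import Counter
from typing import Optional, Sequence


def majority_grade(grades: Sequence[str]) -> Optional[str]:
    """Return the grade chosen by more than half of the graders, else None.

    Assumption: a strict majority is required; ties and complete
    disagreement are reported as None so they can be handled separately.
    """
    if not grades:
        return None
    grade, count = Counter(grades).most_common(1)[0]
    # A strict majority needs more than half of all votes.
    return grade if count * 2 > len(grades) else None


print(majority_grade(["P", "P", "K"]))       # -> P    (clear majority)
print(majority_grade(["P", "P", "K", "K"]))  # -> None (tie, no majority)
print(majority_grade(["P", "K", "B"]))       # -> None (complete disagreement)
```

Whatever rule is chosen for the `None` cases (drop the item, add a grader, fall back to the lower grade) would need to be applied identically at every site for scores to be comparable.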

16 New Evaluation Experiments
  We are investigating three main issues:
    – Binary versus 3-way grading
    – Majority vote versus averaging of scores
    – Intercoder and Intracoder agreement
  Grading Experiment:
    – Four groups, three graders in each group
    – Each group grades two sets, two weeks apart
    – Sets are different but have a common large overlap
    – Groups differ in eval scheme used (binary/3-way)
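
The first two issues on the New Evaluation Experiments slide (binary versus 3-way grading, majority vote versus averaging) can be illustrated with a minimal sketch. It assumes that a 3-way grade of P (Perfect) or K (OK) collapses to "acceptable" and B (Bad) stays "bad"; the slides do not spell out this mapping. The grades below are made-up toy data, not project results, and the function names are hypothetical.

```python
from typing import Dict, List

# Assumed mapping from 3-way grades to binary: "A" = acceptable, "B" = bad.
TO_BINARY: Dict[str, str] = {"P": "A", "K": "A", "B": "B"}


def collapse(grades: List[str]) -> List[str]:
    """Map one utterance's 3-way grades onto the binary scheme."""
    return [TO_BINARY[g] for g in grades]


def average_score(utterances: List[List[str]]) -> float:
    """Fraction of all individual grades that are acceptable."""
    votes = [g for utt in utterances for g in collapse(utt)]
    return sum(g == "A" for g in votes) / len(votes)


def majority_score(utterances: List[List[str]]) -> float:
    """Fraction of utterances that a strict majority of graders accept."""
    accepted = 0
    for utt in utterances:
        binary = collapse(utt)
        if binary.count("A") * 2 > len(binary):
            accepted += 1
    return accepted / len(utterances)


# Toy data: four utterances, each graded by three graders.
toy = [["P", "K", "B"], ["B", "B", "K"], ["P", "P", "P"], ["K", "B", "B"]]
print(f"majority: {majority_score(toy):.3f}")
print(f"average:  {average_score(toy):.3f}")
```

On this toy set the two procedures give different numbers (2 of 4 utterances accepted by majority, 7 of 12 individual grades acceptable), which is the kind of gap the experiment is designed to measure.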

17 Planned Analysis of Data
  Compare results across grading schemes (binary vs. 3-way) on same set of data
  Compare majority scores with average scores
  Evaluate Intercoder agreement between graders (on same set and same scheme)
  Evaluate Intracoder agreement of same grader (on overlap data in the two sets, same grading scheme in both sessions)
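
The slides do not say which agreement statistic was planned. As one common choice, the sketch below computes raw percent agreement and Cohen's kappa for two aligned grade sequences. For intercoder agreement the two sequences would come from two graders on the same set; for intracoder agreement they would come from one grader's two sessions on the overlap data. The grades shown are toy data and the function names are hypothetical.

```python
from collections import Counter
from typing import Sequence


def percent_agreement(a: Sequence[str], b: Sequence[str]) -> float:
    """Fraction of items on which the two grade sequences match."""
    if len(a) != len(b) or not a:
        raise ValueError("sequences must be non-empty and of equal length")
    return sum(x == y for x, y in zip(a, b)) / len(a)


def cohen_kappa(a: Sequence[str], b: Sequence[str]) -> float:
    """Cohen's kappa: observed agreement corrected for chance agreement."""
    n = len(a)
    observed = percent_agreement(a, b)
    freq_a, freq_b = Counter(a), Counter(b)
    # Chance agreement from each coder's marginal label distribution.
    expected = sum(freq_a[label] * freq_b[label] for label in freq_a) / (n * n)
    if expected == 1.0:
        return 1.0  # both coders used one identical label throughout
    return (observed - expected) / (1 - expected)


# Toy data: two graders, six utterances, 3-way scheme.
grader_1 = ["P", "K", "B", "B", "P", "K"]
grader_2 = ["P", "K", "B", "K", "P", "B"]
print(f"agreement: {percent_agreement(grader_1, grader_2):.3f}")
print(f"kappa:     {cohen_kappa(grader_1, grader_2):.3f}")
```

Kappa is defined for pairs of coders; with three graders per group, one option is to report it for each pair, or to switch to a multi-rater statistic such as Fleiss' kappa.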

18 Preliminary Results
  Group (procedure)     W1 Acc   W1 Bad   W2 Acc   W2 Bad
  Gr1 (binary/3-way)    50.2     49.8     48.7     51.3
  Gr2 (3-way/binary)    52.4     47.6     48.8     51.2
  Gr3 (3-way/3-way)     53.8     46.2     54.9     45.1
  Gr4 (binary/binary)   49.0     51.0     50.0

19 Plans for Final Evaluations
  Improved end-to-end evaluations
  Additional component evaluations?
  Additional user studies?
  How do we evaluate user interfaces, communication effectiveness?

