LingTour http://www.get-telecom.fr/
The Lingtour Project, Groupe des Ecoles des Télécommunications
Outline
- Rationale for Lingtour
- Objectives
- Lingtour partners
- Technical developments
- Application architecture
Objectives: 3 scenarios
- Accessing information: the Virtual Guide
- Facilitating communication: the Communication Assistant
- Finding local information: the Orientation Assistant
Rationale for Lingtour
- A more user-friendly assistant
- Multimedia (text, speech, image, video)
- Multimodal access (text, speech, pen, visual I/O)
- Initially targeted at tourist applications
Accessing information: the Virtual Guide
- A convenient and rapid way to access useful information, locally or from a remote server:
  - hotels / restaurants (location, style, pricing)
  - travel (options, hours, fares)
  - city transportation (routes, times, fares, traffic)
  - places to go / visit (location, hours, fees, route)
- Multimodal: combining speech, text, and map/image browsing
- Interactive (dialogues, question refinement):
  - Zoomable User Interfaces (ZUIs) + 2-D control menus
  - tap and talk
  - Embodied Conversational Agents (ECAs)
Facilitating communication: the Communication Assistant
- Visual display to mediate the dialogue
- Translation assistant:
  - browsable sets of questions / answers focused on useful situations: taxi, hotel, haggling over…
  - browsable lexicon to help communication
  - speech training thanks to the included ASR and TTS
- Access to a remote server / operator for difficult tasks
- Multimodal: speech + text + sketching
- Interactive:
  - 2-D control menus
  - tap and talk
  - ECA + TTS for speech and gestural training
The Communication Assistant: modes of operation
- Tourist-to-local or local-to-tourist communication
- Speech / text / menu-selected input
  - menus for refinement / correction of ASR
- Translation
  - display and speech synthesis of the translation
- Pronunciation practice
  - from lexicon or virtual guide items
- Training modules
  - downloaded from a server
  - situation-specific (hotel, restaurant, taxi…)
Finding local information: the Orientation Assistant
- Collecting input from around the device to:
  - help localize the user
  - interpret the environment
- “Intelligent camera”: ability to refine pictures
- Integrated (Chinese) character recognition, possibly also operating on characters sketched on the display (?)
- Localization based on triangulation and/or picture interpretation; feasibility depends on the characteristics of the local network(s)
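The localization item above only names triangulation as an option. As a hedged illustration, the sketch below swaps in trilateration (position from measured distances rather than angles) to three reference points with known coordinates, for instance network base stations. The function name `trilaterate` and the example coordinates are hypothetical; a real radio-based fix would also need noise handling, e.g. least squares over more than three anchors.

```python
import math
from typing import Sequence, Tuple

Point = Tuple[float, float]


def trilaterate(anchors: Sequence[Point], distances: Sequence[float]) -> Point:
    """Estimate (x, y) from three anchors with known positions and measured distances.

    Hypothetical helper. Subtracting the first circle equation
    (x - xi)^2 + (y - yi)^2 = di^2 from the other two yields a 2x2 linear
    system, solved here with Cramer's rule.
    """
    if len(anchors) != 3 or len(distances) != 3:
        raise ValueError("exactly three anchors and three distances are required")
    (x0, y0), (x1, y1), (x2, y2) = anchors
    d0, d1, d2 = distances
    a11, a12 = 2 * (x1 - x0), 2 * (y1 - y0)
    a21, a22 = 2 * (x2 - x0), 2 * (y2 - y0)
    b1 = d0**2 - d1**2 + x1**2 - x0**2 + y1**2 - y0**2
    b2 = d0**2 - d2**2 + x2**2 - x0**2 + y2**2 - y0**2
    det = a11 * a22 - a12 * a21
    if abs(det) < 1e-12:
        raise ValueError("anchors are collinear; the position is ambiguous")
    x = (b1 * a22 - a12 * b2) / det
    y = (a11 * b2 - a21 * b1) / det
    return x, y


if __name__ == "__main__":
    # A user at (3, 4) seen from anchors at (0, 0), (10, 0) and (0, 10).
    print(trilaterate([(0, 0), (10, 0), (0, 10)], [5.0, math.sqrt(65), math.sqrt(45)]))
```

With exact distances the estimate is (3, 4) up to floating-point rounding; collinear anchors are rejected because they cannot disambiguate the two mirror-image solutions.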
Lingtour partners
TsingHua University
- Pr. Mao Yuhang: translation from Chinese to French and English
- Pr. Ding Xiaoqing: Chinese OCR, intelligent camera
- Pr. Wang Zuo-yin: ASR
CLIPS
- Christian Boitet: translation
- Mutsuko Tomokiyo: Multimedia-UNL
Paris 8 University
- Catherine Pélachaud: ECAs
INT
- Yang Ni: image refinement
- Bernadette Dorizzi: HCI
ENST-Paris
- Gérard Chollet and Shiuan-Sung Lin: multilingual SR
- Eric Lecolinet: ZUIs and 2-D control menus
- Laurence Likforman: OCR
- Jacques Prado and Alain Goyé: PDA-server communications
ENST-Bretagne
- Yannis Haralambous and Andre Thepaut: OCR
Technical developments
- Chinese character recognition
- “Intelligent” camera
- Text extraction
- Multilingual speech recognition
- Zoomable User Interfaces with 2-D control menus
- “Cultural” Embodied Conversational Agents
Chinese character recognition
Intelligent camera from TsingHua University: capture → recognition → translation
Extracting text from scene images: difficulties
- Complex color images
- Uncontrolled illumination
- Variations in size, font, orientation, texture
- Complex backgrounds, shadows
Text extraction
- Searching for character regions (text has uniform color):
  - multi-channel decomposition
  - connected component analysis
- Grouping of components:
  - alignment analysis (number of horizontally or vertically aligned components)
- Text identification (language-independent features: size, alignment, …)
- Detection rate: 84 %
- False alarm rate: 5.6 %
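The component-analysis and grouping steps above can be sketched as follows, under simplifying assumptions: a single binary mask stands in for the multi-channel decomposition, components are 4-connected, and alignment is reduced to horizontal rows of boxes whose vertical centres differ by at most `max_dy` pixels (vertically aligned text would use the same logic with rows and columns swapped). Function names and thresholds are hypothetical, not the project's.

```python
from collections import deque
from typing import List, Tuple

Box = Tuple[int, int, int, int]  # (top, left, bottom, right), inclusive pixel coordinates


def connected_components(mask: List[List[int]]) -> List[Box]:
    """Return bounding boxes of 4-connected foreground (non-zero) regions."""
    height = len(mask)
    width = len(mask[0]) if mask else 0
    seen = [[False] * width for _ in range(height)]
    boxes: List[Box] = []
    for r in range(height):
        for c in range(width):
            if not mask[r][c] or seen[r][c]:
                continue
            # Breadth-first flood fill from an unvisited foreground pixel.
            seen[r][c] = True
            queue = deque([(r, c)])
            top, left, bottom, right = r, c, r, c
            while queue:
                y, x = queue.popleft()
                top, bottom = min(top, y), max(bottom, y)
                left, right = min(left, x), max(right, x)
                for ny, nx in ((y - 1, x), (y + 1, x), (y, x - 1), (y, x + 1)):
                    if 0 <= ny < height and 0 <= nx < width and mask[ny][nx] and not seen[ny][nx]:
                        seen[ny][nx] = True
                        queue.append((ny, nx))
            boxes.append((top, left, bottom, right))
    return boxes


def aligned_groups(boxes: List[Box], max_dy: float = 2.0, min_count: int = 3) -> List[List[Box]]:
    """Group boxes into horizontal rows by vertical centre; keep rows with >= min_count boxes."""
    def centre(b: Box) -> float:
        return (b[0] + b[2]) / 2

    groups: List[List[Box]] = []
    for box in sorted(boxes, key=centre):
        # Join the current row if this box's centre is close to the row's last box.
        if groups and abs(centre(box) - centre(groups[-1][-1])) <= max_dy:
            groups[-1].append(box)
        else:
            groups.append([box])
    # Rows with enough aligned components are kept as text candidates, left to right.
    return [sorted(g, key=lambda b: b[1]) for g in groups if len(g) >= min_count]
```

Counting aligned components, rather than judging each component alone, is what lets isolated blobs (noise, texture) be discarded while short text lines survive.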
Automatic Speech Recognition in Multiple Languages
- Sharing acoustic models between languages, to simplify extension to other languages
- Combining phone models and adapting them from small amounts of data in new languages
- Adapting models to the user and to environmental conditions
Figure: shared acoustic models, plus language-specific models for Chinese and French
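One simple way to picture the shared/language-specific split is a phone inventory check: each phone of a new language is mapped (optionally through an alias table) onto the shared model set, and whatever cannot be mapped would need a language-specific model, possibly adapted from a small amount of data. The function, the phone labels and the alias table below are hypothetical placeholders, not the project's actual inventories.

```python
from typing import Dict, Optional, Set, Tuple


def split_inventory(
    phones: Set[str],
    shared_models: Set[str],
    aliases: Optional[Dict[str, str]] = None,
) -> Tuple[Dict[str, str], Set[str]]:
    """Map a language's phones onto shared acoustic models (hypothetical helper).

    Returns (mapping, unmapped): `mapping` sends each phone to the shared model
    it reuses; `unmapped` lists phones that would need a language-specific model.
    """
    aliases = aliases or {}
    mapping: Dict[str, str] = {}
    unmapped: Set[str] = set()
    for phone in sorted(phones):
        target = aliases.get(phone, phone)  # language-specific label -> shared label
        if target in shared_models:
            mapping[phone] = target
        else:
            unmapped.add(phone)
    return mapping, unmapped


# Placeholder inventories for illustration only.
shared = {"a", "i", "u", "m", "n", "s"}
new_language = {"a", "i", "y", "R", "s:"}
mapping, unmapped = split_inventory(new_language, shared, aliases={"s:": "s"})
```

Here "a", "i" and "s:" reuse shared models, while "y" and "R" fall into the language-specific set.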
Zoomable user interfaces with 2-D control menus
- Combine the selection and the control of an operation
- Integrate up to two scroll bars or spin-boxes
- Users keep their attention focused on the content
- Can have sub-menus
- Retain novice and expert modes, as marking menus do
- http://www.infres.enst.fr/net/zomit/cdi.html
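The way a single drag both selects an operation and then controls it along two axes can be sketched as a function over a recorded pointer path. This is a hypothetical illustration, not the code behind the page linked above: items are assumed to be laid out counter-clockwise starting from the right, the selection threshold and gain are arbitrary, and screen y is assumed to grow downwards.

```python
import math
from typing import List, Optional, Sequence, Tuple

Point = Tuple[float, float]


def interpret_drag(
    path: List[Point],
    items: Sequence[str],
    threshold: float = 20.0,
    gain: float = 0.1,
) -> Tuple[Optional[str], float, float]:
    """Return (selected item, horizontal value, vertical value) for a drag path.

    The item is chosen by the drag direction when the pointer first leaves a
    circle of radius `threshold`; the remaining movement is then mapped onto
    two continuous values, like two integrated scroll bars.
    """
    x0, y0 = path[0]
    sector = 2 * math.pi / len(items)
    for x, y in path:
        dx, dy = x - x0, y - y0
        if math.hypot(dx, dy) >= threshold:
            # Screen y grows downwards, so negate dy to get a conventional angle.
            angle = math.atan2(-dy, dx) % (2 * math.pi)
            # Centre each sector on its item's direction; clamp against float rounding.
            index = min(int(((angle + sector / 2) % (2 * math.pi)) // sector), len(items) - 1)
            x_end, y_end = path[-1]
            return items[index], gain * (x_end - x), gain * (y - y_end)
    return None, 0.0, 0.0  # short drag: no selection (e.g. stay in novice mode)
```

Because selection happens on crossing the threshold, an expert can make the whole gesture without waiting for the menu to appear, which is the marking-menu behaviour the slide refers to.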
Cultural Embodied Conversational Agents
- Behaviour adaptable to:
  - the cultural and social context
  - the user (tourist, journalist)
- Various forms / levels of complexity (2D, 3D, vector…) depending on the device (PDA, kiosk)
- Driven by a representation language based on the XML-XSD standard (UNL type), embedding the influence of a given culture, for example on:
  - the choice of communicative gesture (smile vs. head nod)
  - the duration of gaze…
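To make the idea of a representation language embedding cultural influence more concrete, here is a hypothetical sketch: an XML profile maps a communicative function to a gesture per culture, and the agent looks the gesture up at run time. The element and attribute names and the culture identifiers are invented for illustration and do not follow the actual UNL-based schema; the smile vs. head nod contrast comes from the slide, but which culture uses which is a placeholder.

```python
import xml.etree.ElementTree as ET
from typing import Optional

# Hypothetical behaviour profile; all names and identifiers are illustrative only.
PROFILES = """
<profiles>
  <culture id="culture-A">
    <behaviour function="acknowledge" gesture="smile"/>
  </culture>
  <culture id="culture-B">
    <behaviour function="acknowledge" gesture="head_nod"/>
  </culture>
</profiles>
"""


def choose_gesture(xml_text: str, culture: str, function: str) -> Optional[str]:
    """Return the gesture a given culture uses for a communicative function, if any."""
    root = ET.fromstring(xml_text.strip())
    for culture_el in root.findall("culture"):
        if culture_el.get("id") != culture:
            continue
        for behaviour in culture_el.findall("behaviour"):
            if behaviour.get("function") == function:
                return behaviour.get("gesture")
    return None  # no rule: the agent would fall back to a default behaviour
```

Keeping the cultural rules in data rather than code means the same agent can be re-targeted by swapping the profile; an XSD would then constrain which functions and gestures a profile may declare.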
Application architecture
Figure (architecture diagram), recoverable labels: UMTS (?) link to a server; information access; a word graph + a list of keywords; translation; speech synthesis
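The architecture mentions a word graph and a list of keywords passed to the server-side modules. Assuming a word graph is a directed acyclic graph whose edges carry a word and an additive log score (an assumption; the deck does not specify the format), the sketch below extracts the best-scoring word sequence by dynamic programming and spots which keywords occur anywhere in the graph. Names, scores and the example sentence are hypothetical.

```python
from typing import Dict, Iterable, List, Set, Tuple

Edge = Tuple[int, int, str, float]  # (from_node, to_node, word, log_score)


def best_path(edges: List[Edge], start: int, end: int) -> Tuple[float, List[str]]:
    """Highest-scoring word sequence from start to end.

    Assumes node ids are topologically ordered (from_node < to_node on every
    edge), so processing edges by increasing from_node is a valid DP order.
    """
    best: Dict[int, Tuple[float, List[str]]] = {start: (0.0, [])}
    for u, v, word, score in sorted(edges, key=lambda e: e[0]):
        if u not in best:
            continue  # u is unreachable from start
        candidate = (best[u][0] + score, best[u][1] + [word])
        if v not in best or candidate[0] > best[v][0]:
            best[v] = candidate
    if end not in best:
        raise ValueError("end node is not reachable from start node")
    return best[end]


def spot_keywords(edges: Iterable[Edge], keywords: Set[str]) -> Set[str]:
    """Keywords that appear on at least one edge of the word graph."""
    return {word for _, _, word, _ in edges if word in keywords}


# Hypothetical word graph for an utterance like "where is hotel".
EDGES = [
    (0, 1, "where", -1.0), (0, 1, "wear", -2.0),
    (1, 2, "is", -0.5),
    (2, 3, "hotel", -1.0), (2, 3, "total", -1.5),
]
```

Spotting keywords over the whole graph rather than only the best path keeps a keyword such as "hotel" available to information access even when recognition ranks a competing word higher.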