Speech based Drug Information System for Aged and Visually Impaired Persons
Géza Németh, Gábor Olaszy, Mátyás Bartalis, Géza Kiss, Csaba Zainkó, and Péter Mihajlik


Department of Telecommunications and Media Informatics, BME, Budapest, Hungary
{ nemeth, olaszy, bartalis, kgeza, zainko, mihajlik

1. System summary

Medicine Line (MLN) is an automatic Hungarian telephone information system operating in Hungary since December. It is primarily intended for visually handicapped persons and elderly people with normal speech functions. [1] The caller says the drug name, the chapter title, etc., which are processed by a specialized ASR module. Medicine Line reads the Patient Information Leaflets (PIL) chapter by chapter. In Hungary there are about 5000 different medicine types approved by the National Institute of Pharmacy. The output is given by a TTS synthesizer specialized to read drug names, Latin words and pharmaceutical texts correctly. The MLN system provides 24-hour access.

2. System components

3. User controls

Speed: normal/faster (selected at the beginning of the dialogue)
Read previous sentence
Read next sentence
Repeat the current sentence
Jump to the beginning of the chapter
Jump to the end of the chapter
Stop/continue

To try dial: (Hungarian only)

4. ASR for drug names

A general-purpose recognizer [2] was adapted to drug names, with the following features:

MFCC+BEQ acoustic features
GMM-HMM acoustic models trained on 20 hours of SpeechDat-like data (Hungarian only)
ML estimation of GMM parameters, up to 10 mixtures
Speaker-independent, decision-tree state-clustered, cross-word Hungarian triphone models
Hungarian grapheme-to-phoneme rules tailored to drug names, plus manual control of the output transcription
5 Kword vocabulary
0-gram language model
Off-line recognition network optimization using WFST algorithms
Real-time one-pass decoding of up to 32 channels

5. TTS for pharmaceutical texts

Reading medical terminology by TTS [3] required the following adaptations:

Pronunciation sub-module for non-Hungarian words, abbreviations and chemical expressions.
Original examples: N-hepa; hidroxi-propil-metil-cellulóz; 40 µg PGE; 800 mOsm/1; kallikrein inactivator unit; HMG-CoA; non-Hodgkin lymphoma

Special prosody module for pharmaceutical texts, with 34 new rules for pauses and 12 new rules for prosody. This module handles long and complicated sentences in PILs, text between brackets, long enumerations, etc.
Translated example: If you feel side effects, such as squeamishness of stomach, sweating, shaking, weakness, giddiness, dryness in the mouth, sleepiness, sleeplessness, costiveness, diarrhea, less appetite, nervousness, excitement, headache, sexual troubles, please ask your doctor to modify the dosage.

6. Automatic updater maintenance

An automatic updater helps the operator maintain the database by loading new drugs or leaflets and deleting old ones. A new entry can be checked automatically for proper recognition and pronunciation performance.

7. Evaluation

ASR test by phone for 1321 drug names
Speakers: 3 male and 3 female, aged between
Recognition rate after the first utterance: %

Complex system evaluation in 3 user groups:
Group A: under 25 years (15 persons)
Group B: years (33 persons, including 7 visually impaired)
Group C: over 60 years (12 persons)

The system evaluation results clearly show the difference between the young and the elderly generation:
Group A found the voice of the TTS less intelligible than Group C did.
Elderly people (C) mostly found the speed of the synthetic speech very good, but young people (A) regarded it as too slow.
Group C found the user friendliness acceptable, and Group A had an even better opinion.
Only question 4 was rated similarly by all groups.

8. Sample dialogue (translated)

User dials
Machine: Welcome to the Drug Information System. Please say the drug name after the beep!
User: Algopyrin
Machine: Algopyrin. If this is not the desired drug, please press #.
Machine: Three versions of this drug are available in our database: Algopyrin 1g/2ml injection (first); Algopyrin 500 mg pill (second); Algopyrin Complex pill (third). Please say the desired number.
User: Second
Machine: Algopyrin 500 mg pill. Please say one of the following chapter titles: What is the drug used for? Before use. How to use? Side effects. How to store?
User: Side effects.
Machine begins to read the chapter Side effects.

9. Future plans

Extend Medicine Line to other languages; looking for partners (e.g. FP7 call)
User behaviour analysis
Extension with other functions (e.g. coupling to home care systems)

Key references

[1] Henton, C., "Bitter Pills to Swallow. ASR and TTS have Drug Problems", Int. Journal of Speech Technology, 8, 2005, pp
[2] Fegyó, T., Mihajlik, P., Szarvas, M., Tatai, P., Tatai, G., "VOXenter - Intelligent voice enabled call center for Hungarian", In Proc. Eurospeech 03, pp
[3] Olaszy, G., Németh, G., Olaszi, P., Kiss, G., Zainkó, Cs., Gordos, G., "Profivox – a Hungarian TTS System for Telecommunications Applications", Int. Journal of Speech Technology, Vol 3-4, Kluwer Academic Publishers, 2000, pp

Acknowledgements

The project was realized together with the National Institute of Pharmacy, Budapest, Hungary. This work was supported by the Hungarian National Office for Research and Technology (GVOP project no ).
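
The user controls listed in Section 3 (read previous or next sentence, repeat, jump to the beginning or end of the chapter, stop/continue, and a speed chosen at the start of the dialogue) amount to a cursor moving over the sentences of one leaflet chapter. The Python sketch below is only an illustration of that idea, not the Medicine Line implementation: the class name ChapterReader, its method names, and the sample sentences are all hypothetical, and the poster does not say how commands are entered or how the ends of a chapter are handled (here the cursor simply stays on the first or last sentence).

```python
from dataclasses import dataclass
from typing import List


@dataclass
class ChapterReader:
    """Hypothetical sentence cursor for one leaflet chapter (illustration only)."""

    sentences: List[str]
    position: int = 0      # index of the sentence currently being read
    fast: bool = False     # speed: normal/faster, fixed at the start of the dialogue
    paused: bool = False   # stop/continue state

    def __post_init__(self) -> None:
        if not self.sentences:
            raise ValueError("a chapter needs at least one sentence")

    def repeat(self) -> str:
        """Repeat the current sentence."""
        return self.sentences[self.position]

    def next(self) -> str:
        """Read the next sentence; stay on the last one at the chapter end."""
        self.position = min(self.position + 1, len(self.sentences) - 1)
        return self.repeat()

    def previous(self) -> str:
        """Read the previous sentence; stay on the first one at the start."""
        self.position = max(self.position - 1, 0)
        return self.repeat()

    def jump_to_start(self) -> str:
        """Jump to the beginning of the chapter."""
        self.position = 0
        return self.repeat()

    def jump_to_end(self) -> str:
        """Jump to the end of the chapter."""
        self.position = len(self.sentences) - 1
        return self.repeat()

    def toggle_pause(self) -> bool:
        """Stop/continue: flip the paused flag and return the new state."""
        self.paused = not self.paused
        return self.paused


if __name__ == "__main__":
    # Made-up sentences standing in for a "Side effects" chapter.
    reader = ChapterReader(
        ["Side effects may occur.", "Ask your doctor.", "Store below 25 degrees."],
        fast=True,
    )
    print(reader.repeat())
    print(reader.next())
    print(reader.jump_to_end())
    print(reader.previous())
```

Clamping at the chapter boundaries is one simple design choice for a telephone interface, where an out-of-range command should never leave the caller in silence; a real system might instead announce that the end of the chapter has been reached.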