An Introduction to Mandarin Speech Recognition

John Steinberg & Dr. Joseph Picone
Department of Electrical and Computer Engineering, College of Engineering, Temple University, Philadelphia, Pennsylvania

Abstract

Speech recognition has recently gained a great deal of popularity in video gaming, robotics, speech prosthetics, and mobile technology. With China becoming an increasingly important player in the world economy, a new demand for Mandarin speech recognition has emerged. This poster will discuss:
- Applications of speech recognition
- Obstacles in English vs. Mandarin speech recognition
- Methods used to successfully implement Mandarin speech recognition systems
- Benchmarks in the history of Mandarin speech recognition

World Languages

[Figure: Language families across the world [6]]
[Figure: Language families by speaker population [6]]

- English speakers in the world: ~350 million [2]
- Estimated # of current English learners in China: million [3]
- Estimated # of native Mandarin speakers: 1+ billion

Mandarin Chinese

Tonal Language
- 4 distinct tones, 1 neutral tone

Character Based
- characters compose 80k-200k common words
- Act as morphemes
- Monosyllabic
- Have a single associated tone

Coarticulation
- Context can cause changes in tone
  - Bu4 + Dui4 = Bu2 Dui4
  - Ni3 + Hao3 = Ni2 Hao3

Highly Contextual
- Few syllables compared to English
  - English: ~10,000
  - Mandarin: ~1300
- High # of homophones: 7 instances of "ma" [1]

Challenges in Mandarin Speech Recognition

- A highly developed language model is required due to the highly contextual nature of Mandarin
- Tone modeling
- Coarticulation
- Large # of homophones
- Chinese text is unsegmented
- No standard lexicon
- Chinese sentence/word structure is very flexible
  - Ex: Beijing Da Xue -> Bei Da

Speech Recognition Applications

- Mobile Technology
- Auto/GPS
- National Intelligence
- Other Applications: Translators, Prostheses, Language Educ., Multimedia Search

Speech Recognition Systems

Modeling Methods
- Prosodic Features
  - Model rhythm, focus, and intonation of speech
  - Reduce the search space (i.e. reduce the number of homophones and constrain sentence flexibility)
- Language Models
  - Reduce the solution space by using probable combinations of phones, words, phrases, etc.

Prosodic Features
- Used for segmentation
- Measure pause duration, segment/syllable duration, and F0 reset

Language Models [1] [4]
- N-Grams: probabilities of the occurrence of sequences of N words
- Neural Networks: high dimensionality
- Random Forests: incorporate morphology into the language model

[Figure: A Timeline of Mandarin Speech Recognition (A Timeline of Benchmarks)]

Conclusions

- Mandarin speech recognition is a developing and increasingly important field.
- Mandarin speech recognition involves many challenges, including large numbers of homophones, word and sentence flexibility, a lack of a segmented lexicon, and coarticulation effects.
- Possible improvements to character error rate (CER) may be found in prosodic features and language models.

References

[1] Lee, C-H., "Advances in Chinese Spoken Language Processing," World Scientific Publishing Co., Singapore, 2007, pp 128
[2] Lin-Shan Lee; Keh-Jiann Chen; et al., "Golden Mandarin (II)-an intelligent Mandarin dictation machine for Chinese character input with adaptation/learning functions," ISSIPNN '94, vol.1, Apr 1994
[3] Ren-Yuan Lyu; Lee-Feng Chien; et al., "Golden Mandarin (III)-a user-adaptive prosodic-segment-based Mandarin dictation machine for Chinese language with very large vocabulary," ICASSP-95, vol.1, 9-12 May 1995
[4] "The History of Automatic Speech Recognition Evaluations at NIST,"
[5] Lee, L.S.; Tseng, C.Y.; Gu, H.Y.; et al., "Golden Mandarin (I)-A real-time Mandarin speech dictation machine for Chinese language with very large vocabulary," Speech and Audio Processing, IEEE Transactions on, vol.1, no.2, Apr 1993
[6] K. Kūriákī, A Grammar of Modern Indo-European, Asociación Cultural Dnghu, 2007
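
The two tone changes the poster gives as coarticulation examples (Bu4 + Dui4 = Bu2 Dui4 and Ni3 + Hao3 = Ni2 Hao3) can be written as a small rewrite rule over numbered pinyin. The following is a minimal sketch, not the authors' method: the function names are hypothetical, and the rules are applied pairwise on the original tones only, which is an assumption that ignores the prosodic grouping a real system would need for longer runs of third tones.

```python
def split_tone(syllable: str) -> tuple[str, int]:
    """Split numbered pinyin such as 'hao3' into ('hao', 3)."""
    return syllable[:-1].lower(), int(syllable[-1])


def apply_tone_sandhi(syllables: list[str]) -> list[str]:
    """Apply two simplified sandhi rules to a list of numbered-pinyin syllables.

    Rule 1 (from the poster): 'bu4' before a 4th-tone syllable becomes 'bu2'.
    Rule 2 (from the poster): a 3rd tone before another 3rd tone becomes a 2nd tone.
    Assumption: each decision looks only at the ORIGINAL tone of the next syllable.
    """
    parsed = [split_tone(s) for s in syllables]
    result = []
    for i, (base, tone) in enumerate(parsed):
        # Tone of the following syllable, or None at the end of the sequence.
        next_tone = parsed[i + 1][1] if i + 1 < len(parsed) else None
        if base == "bu" and tone == 4 and next_tone == 4:
            tone = 2
        elif tone == 3 and next_tone == 3:
            tone = 2
        result.append(f"{base}{tone}")
    return result


if __name__ == "__main__":
    print(apply_tone_sandhi(["bu4", "dui4"]))  # ['bu2', 'dui4']
    print(apply_tone_sandhi(["ni3", "hao3"]))  # ['ni2', 'hao3']
```

A recognizer that models tones has to account for this kind of context dependence, since the surface tone of a syllable may differ from the tone listed in the lexicon.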
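
The poster's claim that N-gram language models reduce the solution space for homophones can be illustrated with a toy bigram model that picks among the characters sharing the syllable "ma". This is a sketch under stated assumptions: the four training sentences, the homophone table, and all function names are invented for illustration, and add-one smoothing is one simple choice among many, not something the poster specifies.

```python
from collections import Counter

# Toy, hand-made training sentences (already segmented); purely illustrative.
CORPUS = [
    ["我", "骑", "马"],
    ["他", "骑", "马"],
    ["我", "想", "妈"],
    ["你", "好", "吗"],
]

# Hypothetical homophone table: toneless syllable -> candidate characters.
HOMOPHONES = {"ma": ["妈", "马", "吗"]}


def train_bigrams(corpus):
    """Count bigrams and their left contexts; return counts and vocabulary size."""
    context_counts, bigram_counts, vocab = Counter(), Counter(), set()
    for sentence in corpus:
        tokens = ["<s>"] + sentence  # sentence-start marker
        vocab.update(tokens)
        context_counts.update(tokens[:-1])
        bigram_counts.update(zip(tokens, tokens[1:]))
    return context_counts, bigram_counts, len(vocab)


def bigram_prob(prev, word, model):
    """P(word | prev) with add-one (Laplace) smoothing."""
    context_counts, bigram_counts, vocab_size = model
    return (bigram_counts[(prev, word)] + 1) / (context_counts[prev] + vocab_size)


def pick_homophone(prev, syllable, model):
    """Choose the candidate character most probable after the previous token."""
    return max(HOMOPHONES[syllable], key=lambda w: bigram_prob(prev, w, model))


if __name__ == "__main__":
    model = train_bigrams(CORPUS)
    print(pick_homophone("骑", "ma", model))  # 马 (horse), as in "ride a horse"
    print(pick_homophone("好", "ma", model))  # 吗 (question particle)
```

With only the acoustic syllable "ma", all three characters are equally plausible; the preceding word is what separates them, which is the sense in which the language model shrinks the search space.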