Speech Recognition Final Project Resources


Speech Recognition Final Project Resources
Professor: Dr. Veton Kepuska
Class: ECE5526 Speech Recognition
Student: Chih-Ti Shih

FTP Server Information
Host: 163.118.203.219
User ID: student
Password: student
Port: 21
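The corpora below live on this server, so fetching them means a plain FTP session. A minimal sketch using Python's standard `ftplib` with the credentials from the slide above (the server may of course no longer be reachable):

```python
from ftplib import FTP

# Server details as given on the slide above.
HOST = "163.118.203.219"
USER = "student"
PASSWORD = "student"
PORT = 21

def list_corpora(host=HOST, port=PORT, user=USER, password=PASSWORD):
    """Connect to the class FTP server and return the top-level listing."""
    ftp = FTP()
    ftp.connect(host, port)
    ftp.login(user, password)
    try:
        return ftp.nlst()   # names of corpus directories at the root
    finally:
        ftp.quit()
```

Calling `list_corpora()` should return directory names such as the corpus folders described in the following slides.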

Callhome English Speech Corpus The CALLHOME English corpus of telephone speech, produced by the Linguistic Data Consortium, consists of 120 unscripted telephone conversations between native speakers of English.

Callhome English Speech Corpus - directory
callhome/doc : documentation for Callhome English speech.
callhome/english : speech data files, divided into train, devtest and evltest.
0README.1st : corpus information file.

TIMIT Acoustic-Phonetic Continuous Speech Corpus The TIMIT corpus of read speech has been designed to provide speech data for the acquisition of acoustic-phonetic knowledge and for the development and evaluation of automatic speech recognition systems. TIMIT contains a total of 6300 sentences, 10 sentences spoken by each of 630 speakers from 8 major dialect regions of the United States.
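TIMIT distributes its time-aligned phonetic transcriptions as `.phn` files, one `<start sample> <end sample> <phone>` triple per line. A minimal parser for that format:

```python
def parse_phn(lines):
    """Parse TIMIT .phn transcription lines into (start, end, phone) tuples.

    Each non-empty line holds a start sample, an end sample, and a
    phone label, separated by whitespace.
    """
    segments = []
    for line in lines:
        line = line.strip()
        if not line:
            continue
        start, end, phone = line.split()
        segments.append((int(start), int(end), phone))
    return segments

# Example lines in the .phn format (sample offsets are illustrative).
sample = ["0 3050 h#", "3050 4559 sh", "4559 5723 ix"]
print(parse_phn(sample))
```

The same loop works for TIMIT's `.wrd` word-alignment files, which share the layout.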


FFM TIMIT The FFMTIMIT corpus contains the previously unreleased secondary microphone recordings of the TIMIT corpus. FFMTIMIT contains a total of 6130 sentences, 10 sentences spoken by each of 613 speakers from 8 major dialect regions of the United States.

FFM TIMIT – speaker information

FFM TIMIT – dialect information

FFM TIMIT - directory
FFM Timit/sphere/ : the NIST Speech Header Resources (SPHERE) software; SPHERE is a set of "C" library routines and programs for manipulating the NIST header structure used by the FFMTIMIT waveform files.
FFM Timit/ffmtimit/ : the FFMTIMIT corpus and its related documentation.

MOCHA - TIMIT The MOCHA TIMIT corpus includes 3 sets of 460 short sentences designed to include the main connected speech processes in English. The corpus includes Acoustic Speech Waveform, Laryngograph Waveform, Electromagnetic Articulograph and Electropalatograph Frames.

MOCHA TIMIT – File Format
Total of 3 sample sets: fsew0_v1.1.tar, maps0.tar and msak0_v1.1.tar. Each of them includes:
*.wav file : Acoustic Speech Waveform.
*.lar file : Laryngograph Waveform.
*.ema file : Electromagnetic Articulograph.
*.epg file : Electropalatograph Frames.
*.lab file : Label.

NYNEX PhoneBook PhoneBook is a phonetically rich, isolated-word, telephone-speech database. It was created because of:
The lack of available large-vocabulary isolated-word data.
The anticipated continued importance of isolated-word and keyword-spotting technology to speech-recognition-based applications over the telephone.
Findings that continuous-speech training data is inferior to isolated-word training data for isolated-word recognition.

NYNEX PhoneBook - information The core section of PhoneBook consists of a total of 93,667 isolated-word utterances, totalling 23 hours of speech. This breaks down to 7,979 distinct words, each said by an average of 11.7 talkers, with 1,358 talkers each saying up to 75 words. All data were collected in 8-bit mu-law digital form directly from a T1 telephone line. Talkers were adult native speakers of American English chosen to be demographically representative of the U.S.
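Since PhoneBook audio is stored as 8-bit mu-law, decoding to 16-bit linear PCM is a common first step. The standard G.711 mu-law expansion (this is the generic algorithm, not code shipped with the corpus) can be sketched as:

```python
def ulaw_to_linear(code: int) -> int:
    """Expand one 8-bit G.711 mu-law code to a 16-bit linear PCM sample."""
    code = ~code & 0xFF              # mu-law bytes are stored bit-inverted
    sign = code & 0x80
    exponent = (code >> 4) & 0x07    # 3-bit segment number
    mantissa = code & 0x0F           # 4-bit step within the segment
    magnitude = (((mantissa << 3) + 0x84) << exponent) - 0x84
    return -magnitude if sign else magnitude

# 0xFF encodes silence (0); 0x00 is the largest-magnitude negative sample.
print(ulaw_to_linear(0xFF), ulaw_to_linear(0x00))
```

Applying this byte-by-byte to a PhoneBook speech file yields signed 16-bit samples suitable for feature extraction.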

NYNEX PhoneBook – directory & files
Discs 1 and 2 contain the read isolated-word set; disc 3 contains the spontaneous utterance set.
fnl_rprt.doc : documentation describing corpus collection.
wav_file.lst : list of file-name paths to all speech files on the disc.
sphere/ : NIST SPHERE software package (source code).
read_sp/ : isolated-word speech files (discs 1 and 2).
spon_sp/ : spontaneous-phrase speech files (disc 3).
wordlist/ : complete set of data tables relating words,

ICSI Meeting Recorder Digits Corpus The ICSI (International Computer Science Institute) Meeting Recorder Digits Corpus contains non-segmented recordings of read connected digits and includes 2790 digit utterances. Directory: ICSI_Meeting_Recorder_Digits_Corpus/ ICSI project site: Link

CCW17 Corpus (WUW Corpus) Directory: CCW17/ Subdirectories and files: Calls/ : isolated-word utterances recorded in 8-bit u-law format. Ccw17.trans : file of utterance IDs with their locations and transcriptions.

WUW_Corpus The WUW corpus is used in the WUW project by Dr. Kepuska. Directory: WUW_Corpus Subdirectories and files: Calls/ : isolated-word utterances recorded in 8-bit u-law format. WUW.trans : utterance information and locations.

WUWII_Corpus The WUWII corpus is the second corpus used in the WUW project by Dr. Kepuska. Directory: WUWII_Corpus/ Subdirectories and files: Calls/ : isolated-word utterances recorded in 8-bit u-law format. WUWII.trans : utterance information and locations.
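The slides do not document the exact layout of the `.trans` files. Assuming each non-empty line holds a file ID followed by the rest of the line as its transcription/location information (a hypothetical layout; check the actual `Ccw17.trans`, `WUW.trans`, or `WUWII.trans` before relying on it), a loader might look like:

```python
def load_trans(lines):
    """Load a .trans listing into {file_id: rest_of_line}.

    ASSUMED format: '<file-id> <transcription/location text>' per line.
    This is a guess at the layout, not a documented specification.
    """
    table = {}
    for line in lines:
        line = line.strip()
        if not line:
            continue
        file_id, _, text = line.partition(" ")
        table[file_id] = text.strip()
    return table

# Hypothetical example line, for illustration only.
print(load_trans(["call_0001.ulaw operator", ""]))
```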

Speech Tools: Praat Praat is a program for speech analysis and synthesis. Introduction presentation by a current student, Dileep: Link Official site: Link Praat Lab: Link

Speech Tool: CMU Sphinx CMU Sphinx consists of the following elements: Decoders: Sphinx2, Sphinx3, Sphinx4 and PocketSphinx. Acoustic model training tool: SphinxTrain. Language model training tools: cmuclmtk (the CMU-Cambridge Statistical Language Modeling Toolkit) and SimpleLM.

Speech Tool: CMU Sphinx - resource Audio data: MicArray, AN4, Let’s go, CMU-SIN, PDA and RM1. Open Source Models: Communicator acoustic models, dialog system. WSJ1 acoustic models, dictation. HUB4 acoustic models, broadcast news. Dictionary: The CMU Pronouncing Dictionary
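The CMU Pronouncing Dictionary uses a plain-text format: comment lines start with `;;;`, and each entry line is a word followed by its ARPAbet phones, with alternate pronunciations marked by a `(n)` suffix on the word. A small loader for that format:

```python
def load_cmudict(lines):
    """Parse CMU Pronouncing Dictionary lines into {word: [pronunciations]}.

    Comment lines begin with ';;;'. Alternate pronunciations appear as
    WORD(2), WORD(3), ...; here they are folded under the base word.
    """
    entries = {}
    for line in lines:
        line = line.strip()
        if not line or line.startswith(";;;"):
            continue
        word, phones = line.split(None, 1)
        word = word.split("(")[0]          # fold READ(2) into READ
        entries.setdefault(word, []).append(phones.split())
    return entries

sample = [";;; example entries", "READ  R IY1 D", "READ(2)  R EH1 D"]
print(load_cmudict(sample))
```

Sphinx decoders consume the dictionary file directly; a loader like this is mainly useful for inspecting coverage of a project's vocabulary.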

Speech Tools: BootCaT LM toolkit BootCaT: Simple Utilities for Bootstrapping Corpora and Terms from the Web. Directory: Tool/BootCat/ Using BootCaT to create an LM from the Web: Link
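Once BootCaT has collected Web text, building an n-gram language model starts from raw n-gram counts. A toy bigram-counting sketch (illustration only, not part of BootCaT, which is its own set of scripts):

```python
from collections import Counter

def bigram_counts(sentences):
    """Count bigrams over tokenized sentences, with <s>/</s> boundary
    markers -- the raw material for a bigram language model."""
    counts = Counter()
    for tokens in sentences:
        padded = ["<s>"] + list(tokens) + ["</s>"]
        for a, b in zip(padded, padded[1:]):
            counts[(a, b)] += 1
    return counts

# Tiny illustrative corpus of pre-tokenized sentences.
corpus = [["the", "cat"], ["the", "dog"]]
print(bigram_counts(corpus))
```

Real toolkits such as cmuclmtk add smoothing and back-off on top of counts like these before writing out an ARPA-format model.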

Speech Tools: VoiceBox VoiceBox is a speech processing toolbox consisting of MATLAB routines. Directory: Tool/voicebox/ The toolbox includes audio file input/output, speech analysis, speech synthesis and signal processing tools. Documentation and function list: Link

Speech Recognition Final Project Resources END