Sphinx 3.4 Development Progress Report in February Arthur Chan, Jahanzeb Sherwani Carnegie Mellon University Mar 1, 2004.

This Presentation
- S3.4 development progress
  - Speed-up
  - Language model facilities
- CALO and S3.5 development
  - Which features should be there to make CALO better?
  - Schedule for the next three months

Review of Last Month's Progress
- Last month
  - Wrote a speed-up version of s3.
  - Completed some coding of the s3.4 speed-up task.
- This month
  - Backbone of the s3.4 speed-up functionality completed and tested.
  - Basic LM facilities completed and smoke-tested.

Current System Specifications (without Gaussian Selection)

                       Sphinx 3                          Sphinx 3.3
Communicator task      ERR 17.2%                         ERR 18.6%
Speed (P4-1G)          11xRT GMM, 3xRT search            6xRT GMM, 1xRT search
GMM computation        Not optimized (little code        Can apply sub-VQ-based
                       optimization)                     Gaussian selection
Lexicon                Flat                              Tree
Search                 Beam on search, no beam on GMM    Beam on search, beam on GMM
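The flat versus tree lexicon row above can be illustrated with a small sketch. It assumes a lexicon given as a mapping from words to phone sequences; the toy words, the phone symbols, and the helper names `build_lexical_tree` and `count_nodes` are hypothetical and are not taken from the Sphinx source. The point is only that a tree lexicon shares common phone prefixes, so the search has fewer nodes to evaluate than with one phone chain per word.

```python
def build_lexical_tree(lexicon):
    """Build a phone-prefix tree (trie) from a {word: [phones]} mapping."""
    root = {"children": {}, "words": []}
    for word, phones in lexicon.items():
        node = root
        for phone in phones:
            # Reuse the child for this phone if a previous word already created it.
            node = node["children"].setdefault(phone, {"children": {}, "words": []})
        node["words"].append(word)  # the word ends at this node
    return root


def count_nodes(node):
    """Count phone nodes below `node` (the root itself is not counted)."""
    return sum(1 + count_nodes(child) for child in node["children"].values())


# Hypothetical toy lexicon, for illustration only.
lexicon = {
    "start": ["S", "T", "AA", "R", "T"],
    "stop": ["S", "T", "AA", "P"],
    "star": ["S", "T", "AA", "R"],
}

flat_nodes = sum(len(phones) for phones in lexicon.values())  # one chain per word
tree_nodes = count_nodes(build_lexical_tree(lexicon))         # shared prefixes
print(flat_nodes, tree_nodes)
```

In this toy case the three words share the prefix S T AA, which is where the saving comes from; a real decoder would also attach HMM state and language model information to the nodes.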

Speed-up Facilities in s3.3
GMM Computation
- Frame-level: not implemented
- Senone-level: not implemented
- Gaussian-level: SVQ-based GMM selection, sub-vectors constrained to 3
- Component-level: SVQ code removed
Search
- Lexicon structure: tree
- Pruning: standard
- Heuristic search speed-up: not implemented

Speed-up Facilities in s3.4
GMM Computation
- Frame-level
  - (New) Naïve down-sampling
  - (New) Conditional down-sampling
- Senone-level
  - (New) CI-based GMM selection
- Gaussian-level
  - (New) VQ-based GMM selection
  - (New) Unconstrained number of sub-vectors in SVQ-based GMM selection
- Component-level
  - (New) SVQ code enabled
Search
- Lexicon structure: tree
- Pruning: (New) improved word-end pruning
- Heuristic search speed-up: (New) phoneme look-ahead
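As a rough illustration of the senone-level technique listed above, the sketch below shows one common form of CI-based GMM selection: context-independent (CI) senones are scored first, and a context-dependent (CD) senone is scored in full only if its CI parent is within a beam of the best CI score; otherwise it backs off to the CI score. The slides do not spell out the details, so the back-off rule, the function name `select_and_score`, and all scores are assumptions made for illustration.

```python
def select_and_score(ci_scores, cd_to_ci, score_cd, beam):
    """Score CD senones for one frame using CI-based selection.

    ci_scores: {ci_id: log score} already computed for this frame
    cd_to_ci:  {cd_id: ci_id} parent CI senone of each CD senone
    score_cd:  callable cd_id -> log score (the expensive full GMM evaluation)
    beam:      non-negative width in the log domain
    Returns (scores, number of CD senones evaluated in full).
    """
    best_ci = max(ci_scores.values())
    scores = {}
    computed = 0
    for cd, ci in cd_to_ci.items():
        if ci_scores[ci] >= best_ci - beam:
            scores[cd] = score_cd(cd)      # parent looks promising: full evaluation
            computed += 1
        else:
            scores[cd] = ci_scores[ci]     # assumed back-off: reuse the CI score
    return scores, computed


# Hypothetical scores for a single frame.
ci_scores = {"AA": -10.0, "B": -50.0, "T": -12.0}
cd_to_ci = {"AA_1": "AA", "AA_2": "AA", "B_1": "B", "T_1": "T"}
exact = {"AA_1": -9.0, "AA_2": -11.0, "B_1": -48.0, "T_1": -13.0}

scores, computed = select_and_score(ci_scores, cd_to_ci, exact.__getitem__, beam=5.0)
print(computed, scores)
```

A tighter beam evaluates fewer CD senones at the risk of more search errors, which is the speed/accuracy trade-off the table reports.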

S3.4 Speed Performance in Communicator Task

                   Sphinx 3.3                   Sphinx 3.4
Error rate         18.6%                        18.7%
Speed (P4-1G)      6xRT GMM, 1xRT search        1.2xRT GMM, 1.5xRT search
Speed (P4-2G)      1.6xRT GMM, 0.6xRT search    0.4xRT GMM, 0.9xRT search
Techniques used    -                            CI-based GMM selection, word-end pruning
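The overall effect of these figures can be worked out with a little arithmetic. The sketch below assumes that the total real-time factor is simply the GMM time plus the search time, ignoring the front end and any other cost not listed on the slide; the names `rtf`, `total_rtf`, and `speedup` are illustrative only.

```python
# (GMM xRT, search xRT) as reported on the slide.
rtf = {
    ("Sphinx 3.3", "P4-1G"): (6.0, 1.0),
    ("Sphinx 3.4", "P4-1G"): (1.2, 1.5),
    ("Sphinx 3.3", "P4-2G"): (1.6, 0.6),
    ("Sphinx 3.4", "P4-2G"): (0.4, 0.9),
}


def total_rtf(system, machine):
    """Assumed total: GMM time plus search time, in multiples of real time."""
    gmm, search = rtf[(system, machine)]
    return gmm + search


def speedup(machine):
    """Ratio of Sphinx 3.3 total time to Sphinx 3.4 total time."""
    return total_rtf("Sphinx 3.3", machine) / total_rtf("Sphinx 3.4", machine)


for machine in ("P4-1G", "P4-2G"):
    print(machine,
          round(total_rtf("Sphinx 3.3", machine), 2),
          round(total_rtf("Sphinx 3.4", machine), 2),
          round(speedup(machine), 2))
```

Under that assumption, the P4-1G total drops from about 7xRT to about 2.7xRT and the P4-2G total from about 2.2xRT to about 1.3xRT, with the GMM share shrinking while the search share grows.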

Issues in Speed Optimization
- Implementation issues
  - Beams applied on GMM make many techniques hard to implement.
  - Some facilities were hardwired for a specific purpose.
- Performance issues
  - Each technique reduced computation by % with <5% degradation. However, the gains did not add up.
    - The reduction in computation is bounded (usually a 75%-80% reduction is the maximum).
  - Overhead is huge in some techniques.
    - E.g. VQ-based Gaussian selection takes 0.25xRT.
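One way to see why individual savings do not add up is sketched below. It assumes, optimistically, that each technique removes a fixed fraction of whatever computation is still left (so reductions multiply rather than add), and that a technique's own overhead is paid on top. Both helper functions are hypothetical; only the 6xRT baseline, the 80% ceiling, and the 0.25xRT overhead are figures from the slides.

```python
def combined_reduction(reductions):
    """Overall reduction if each technique independently removes a fraction
    of the remaining work (an optimistic assumption)."""
    remaining = 1.0
    for r in reductions:
        remaining *= (1.0 - r)
    return 1.0 - remaining


def gmm_cost_after(baseline_xrt, reduction, overhead_xrt=0.0):
    """GMM cost in xRT after a fractional reduction plus a fixed overhead."""
    return baseline_xrt * (1.0 - reduction) + overhead_xrt


# Two techniques that each halve the work give 75% together, not 100%.
print(combined_reduction([0.5, 0.5]))

# A 6xRT GMM stage with an 80% reduction, with and without a 0.25xRT overhead.
print(gmm_cost_after(6.0, 0.80))
print(gmm_cost_after(6.0, 0.80, overhead_xrt=0.25))
```

In practice the techniques also overlap in which Gaussians they skip, so the real combined gain can be lower than this multiplicative estimate.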

Language Model Facilities
- S3.3 only accepts a single LM, without classes, in binary format.
- So far, S3.4 is able to accept multiple class-based LMs in binary format.
  - One major modification of the code
    - Affects 6-7 files.
  - Caveats:
    - Not a perfect implementation.
    - Text format is not yet supported. Backward compatibility is an issue.
    - Lack of test cases. Only lightly smoke-tested.
  - ~1 more week of work
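For readers unfamiliar with class-based LMs, the sketch below shows the standard factorization P(w | h) = P(class(w) | h) * P(w | class(w)) for a bigram model, plus a dictionary holding several named LMs to mimic switching between them. It is not the s3.4 implementation: the `ClassLM` class, the LM name "travel", and all probabilities are hypothetical.

```python
import math


class ClassLM:
    """Toy class-based bigram LM. Words outside any class act as their own class."""

    def __init__(self, word_to_class, class_bigram, word_given_class):
        self.word_to_class = word_to_class        # {word: class tag}
        self.class_bigram = class_bigram          # {(prev token, token): prob}
        self.word_given_class = word_given_class  # {(word, class tag): prob}

    def logprob(self, prev_word, word):
        prev_tok = self.word_to_class.get(prev_word, prev_word)
        tok = self.word_to_class.get(word, word)
        p_class = self.class_bigram[(prev_tok, tok)]
        # A word that is its own class has P(word | class) = 1.
        p_word = self.word_given_class.get((word, tok), 1.0)
        return math.log(p_class) + math.log(p_word)


# Hypothetical model: a [CITY] class with two members.
travel_lm = ClassLM(
    word_to_class={"boston": "[CITY]", "pittsburgh": "[CITY]"},
    class_bigram={("to", "[CITY]"): 0.2, ("fly", "to"): 0.5},
    word_given_class={("boston", "[CITY]"): 0.25, ("pittsburgh", "[CITY]"): 0.75},
)

# Several LMs kept side by side, selected by name.
lms = {"travel": travel_lm}
active = lms["travel"]
print(math.exp(active.logprob("to", "boston")))  # 0.2 * 0.25
print(math.exp(active.logprob("fly", "to")))     # plain word bigram
```

The appeal of the class layer is that adding a new city only requires a new entry in the class membership table, not retraining the n-gram itself.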

Problems with s3.4 (valid as of Feb 29th, 2004)
- Only accepts DMP files.
  - The text-format reader is very complex in Sphinx 2. A straight conversion is not clean.
- LMs are all loaded into memory.
  - We can work on this.
- Lexical trees are all built at the beginning.
  - We tried to avoid the overhead of rebuilding the tree for every utterance.

Summary of Sphinx 3.4 Development
- Derivative of s3.3 with
  - Speed optimization
  - Better LM facilities
- Algorithmic optimization is 90% completed.
  - Still need to reduce overhead.
  - Tree-based GMM selection is desirable.
  - Improvements for individual techniques.
- Got through the major hurdle of multiple LMs and class-based LMs.
  - Need more time to make it more stable.
- Expected internal release date: March 8, 2004

Sphinx 3.4 and CALO
- Which pieces are missing?
  - Sphinx 3.4's decoding is still not streamlined => continuous listening is not yet enabled.
  - Sphinx's speed may still not be ideal.
  - From s3 to s3.3, ~10% degradation.
  - Sphinx 3.4 doesn't learn from data yet.

Sphinx 3.5: What Should We Do in the Next 3 Months?
- Expected release time: May - June
- Interfaces
  - Streamlined front-end and decoding
  - (?) PortAudio-based audio routine
- Speed/Accuracy
  - Improved lexical tree search
  - Machine optimization of Gaussian computation
  - Combination of multiple recognizers
- Learning
  - Acoustic model adaptation
  - (?) Language model adaptation
  - (In Phoenix) Better semantic parsing
- Resource acquisition and load balancing

Highlight I: Speed/Accuracy
- Improved lexical tree search
  - The current implementation uses a single lexical tree.
  - It may be desirable to create tree copies.
- Machine optimization of Gaussian computation
  - SIMD (Single Instruction, Multiple Data)
  - Requires help from assembly-language experts (Jason/Thomas).
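To make the SIMD point concrete, the sketch below shows the kind of inner loop that dominates Gaussian computation: the log density of a diagonal-covariance Gaussian, with the per-dimension constants precomputed. It is written in plain Python for readability and is not Sphinx code; the function names are hypothetical. The loop body is a subtract, a multiply, and a multiply-accumulate applied identically to every feature dimension, which is the pattern SIMD instructions process several dimensions at a time.

```python
import math


def precompute(var):
    """Precompute the log normalizer and 1 / (2 * variance) for each dimension."""
    log_const = -0.5 * sum(math.log(2.0 * math.pi * v) for v in var)
    inv_2var = [1.0 / (2.0 * v) for v in var]
    return log_const, inv_2var


def log_gaussian(x, mean, log_const, inv_2var):
    """Log density of a diagonal-covariance Gaussian at feature vector x."""
    acc = 0.0
    for xi, mi, wi in zip(x, mean, inv_2var):
        d = xi - mi
        acc += d * d * wi   # same operation on every dimension: SIMD-friendly
    return log_const - acc


mean = [0.0, 0.0]
var = [1.0, 1.0]
log_const, inv_2var = precompute(var)
print(log_gaussian([1.0, 0.0], mean, log_const, inv_2var))
```

Because the constants depend only on the model, they can be computed once at load time, leaving only the loop above to be optimized per frame.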

Highlight II: Multiple Recognizer Combination and Resource Acquisition
- Research by Rong suggests that combining multiple recognizers can improve accuracy.
- Speed worsens by 100% if we run two recognizers.
- An interesting solution: computation can be shared with other machines in the meeting.
  - Inspired by routing implementations.
  - A very natural solution in a meeting scenario, because usually only one person will be speaking.
- Challenges: bandwidth and load balancing

Highlight III: Learning
- Acoustic model
  - Maximum Likelihood Linear Regression (MLLR)
  - Jahanzeb will be responsible for this.
- (?) Language model
  - How?
  - Cache-based LM?
- (?) Improved robust parsing
  - Better parsing based on previous command history
  - Phoenix's source code is not easy to trace.
  - Thomas Harris's implementation may be a good place to start.
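As background for the MLLR item, the sketch below applies an MLLR mean transform to Gaussian means: each adapted mean is A * mu + b, where the matrix A and bias b are shared by a group of Gaussians. Estimating A and b from adaptation data by maximum likelihood is the hard part and is not shown; here they are simply assumed to be given. The function name and all numbers are hypothetical.

```python
def mllr_adapt_mean(mean, A, b):
    """Return the adapted mean A * mean + b for one Gaussian.

    mean: list of d floats, A: d x d matrix as a list of rows, b: list of d floats.
    """
    return [
        sum(a_ij * m_j for a_ij, m_j in zip(row, mean)) + b_i
        for row, b_i in zip(A, b)
    ]


# Hypothetical transform shared by all Gaussians in one regression class.
A = [[1.0, 0.1],
     [0.0, 0.9]]
b = [0.5, -0.2]

means = [[2.0, 1.0], [0.0, 0.0]]
adapted = [mllr_adapt_mean(m, A, b) for m in means]
print(adapted)
```

Because one transform is shared by many Gaussians, a small amount of speaker data can shift the whole acoustic model, which is why MLLR suits quick adaptation.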

Arthur and Jahanzeb's Proposed Schedule
- Mar 1 – Mar 15
  - Arthur: Windows port + streamline S3.4 decoding
  - Jahanzeb: Regression test + adaptation milestone 1
- Mar 15 – Apr 1: Multiple recognizers experiments
- Apr 1 – Apr 15: Preparation for demo + if (we want) {write up paper for ICSLP}

Proposed Schedule (cont.)
- Apr 16 – May 7
  - Arthur: Search modification: tree copies implementation
  - Jahanzeb: Regression test + adaptation milestone 2
- May 7 – June 1: Sphinx 3.5 learning code development + s3.5 release (?)