CALO Recorder/Decoder Progress Report for Summer 2004 (July and August) Yitao Sun (Recorder/Decoder) Jason Cohen (Recorder/End-pointer) Thomas Quisel (Recorder)


1 CALO Recorder/Decoder Progress Report for Summer 2004 (July and August)
Yitao Sun (Recorder/Decoder), Jason Cohen (Recorder/End-pointer), Thomas Quisel (Recorder), Ziad Al Bawab (Recorder/End-pointer), Rong Zhang (ICSI Training), Arthur Chan (Recorder/End-Pointer/Decoder/Trainer)
Carnegie Mellon University, Aug 30, 2004

2 Summer Highlight
- This presentation (15 pages)
  - Review of June and Highlight (2 pages)
  - Recorder (3 pages)
  - Decoder (6 pages)
  - ICSI Training (1 page)
  - Trainer (1 page)
  - Documentation (1 page)
  - Conclusion (1 page)

3 Review of June
- Three goals we set in June
  1. Recorder/Classifier/Decoder integration
  2. Further improvement of ICSI training
  3. Speaker adaptation
- Summer highlight
  - We solved 2 (1 + 0.5 + 0.5) of the 3 problems
  - Plus more

4 Problems we faced in the Summer
- Summer is a nice season
- Many of us were on vacation or away
  - Alex: went to Spain for the last three weeks of July
  - Jason: left and went to Texas
  - ThomasQ, Yash, Moss: internships in other states
  - Ziad: back in Lebanon from Aug 1 to Aug 21
  - Mock: back in Thailand from Aug 1 to Aug 15
  - (Evandro): on vacation from Aug 1 to Aug 15
  - Arthur: broke down from Aug 12 to Aug 22
- Lack of manpower was a big problem.

5 Recorder (Integration)
- By Ziad/Yitao/Arthur
- Recorder + Classifier + Decoder code integration is completed
- The classifier and end-pointer are now modularized and incorporated into the CALO Recorder.
- The "FSM" of the end-pointer is now implemented
- Classifier + Decoder had a hard time
  - Trapped by a feature mismatch
  - Now fixed.
- Yitao also separated the classifier and decoder into separate threads.
- Outlook: before code check-in, we may need to fix speed-up problems (our weakness)
  - The 3 components are closely coupled
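As an illustration of what an end-pointer "FSM" typically looks like, here is a minimal sketch. It is not the CALO Recorder implementation: the two states, the function name `endpoint`, and the thresholds `start_frames` and `end_frames` are all assumptions made for this example. It takes per-frame speech/non-speech decisions (as a classifier might produce) and returns utterance boundaries.

```python
from enum import Enum


class State(Enum):
    SILENCE = 0
    SPEECH = 1


def endpoint(frames, start_frames=3, end_frames=5):
    """Hypothetical two-state end-pointer.

    frames: iterable of booleans, True when the classifier labels a frame as speech.
    start_frames: consecutive speech frames needed to open a segment (assumed value).
    end_frames: consecutive non-speech frames needed to close it (assumed value).
    Returns a list of (start, end) frame indices, end exclusive.
    """
    state = State.SILENCE
    run = 0            # length of the current run of "opposite" frames
    seg_start = None
    segments = []
    total = 0
    for i, is_speech in enumerate(frames):
        total = i + 1
        if state is State.SILENCE:
            run = run + 1 if is_speech else 0
            if run >= start_frames:
                seg_start = i - run + 1   # back-date to the first speech frame
                state = State.SPEECH
                run = 0
        else:
            run = run + 1 if not is_speech else 0
            if run >= end_frames:
                segments.append((seg_start, i - run + 1))  # drop the trailing silence
                state = State.SILENCE
                run = 0
    if state is State.SPEECH:
        # Input ended inside an utterance: close the segment at the last frame.
        segments.append((seg_start, total))
    return segments
```

Requiring several consecutive frames before switching state keeps a single misclassified frame from opening or closing an utterance.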

6 Recorder (Portability)
- By Jason/Arthur
- We are not yet "CP"
- On Windows, Cygwin, Linux and Mac OS X, our codebase in CVS compiles and links
- It now works on the following platforms:
  - Windows: fully functioning, with extra functions specific to Windows
  - Cygwin: small problems in the GUI; NTP works now
  - Mac OS X: fully functioning; we just need to fix some memory leaks and invalid memory reads/writes
- On Linux, the AD97 chipset still confuses the PortAudio library

7 Recorder Outlook in Q4
- What should we do?
  - Linux: focus on the Linux port
- Fix the PortAudio problem
- Fix the offline classifier
  - We can barely support more feature requests without Thomas.
- We need to implement a switch for processing routines.
- Reduce the boundary between release and development
  - After we fix the portability problem, it's time to move to SRI's CVS.
- Memory management can be an issue
  - We need to scan the code with memory-checking tools

8 Decoder (Live Mode APIs)
- More robust than in June
  - Fixed a couple of memory problems
- Now going through an in-depth code review
- Documented and commented
  - An advantage for our partners.

9 Decoder (Speed)
- We finally have an s3.x setup for ICSI
- A quick hack without careful tuning
- 0.6xRT on a 2G machine, with a relative 20% degradation (from 69% to 63%)
- Outlook: speed becomes an important Q4 goal again
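The speed figures can be unpacked with a little arithmetic. The sketch below assumes that 69% and 63% are accuracies, which the slide does not state. Under that assumption the quoted "relative 20%" matches the relative increase in error rate (31% to 37%), not the relative drop in accuracy. The helper names are made up for this example.

```python
def relative_change(before, after):
    """Relative change from `before` to `after`, as a fraction of `before`."""
    return (after - before) / before


def decode_seconds(audio_seconds, xrt):
    """Decoding time for a given real-time factor (0.6xRT means 0.6 s per 1 s of audio)."""
    return audio_seconds * xrt


# Assumption: the slide's 69% and 63% are accuracies.
acc_before, acc_after = 69.0, 63.0

# Relative drop in accuracy: 6 / 69, about 8.7%.
accuracy_drop = -relative_change(acc_before, acc_after)

# Relative rise in error rate: (37 - 31) / 31, about 19.4%, i.e. roughly the quoted 20%.
error_rise = relative_change(100.0 - acc_before, 100.0 - acc_after)

print(f"relative accuracy drop: {accuracy_drop:.1%}")
print(f"relative error increase: {error_rise:.1%}")
print(f"10 s of audio at 0.6xRT: {decode_seconds(10.0, 0.6):.1f} s")
```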

10 Decoder (Speaker Adaptation)
- Single-regression-class MLLR is now fully supported
- It produces exactly the same results as Sam-Joo's package
- Test cases are lacking for now
- Outlook: in Q4, we need to
  - Test the current package with more test cases.
  - If time allows, enable multiple regression classes and MAP.
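For readers unfamiliar with the term: with a single regression class, MLLR mean adaptation applies one shared affine transform, mu' = A * mu + b, to every Gaussian mean in the acoustic model. The sketch below shows only that application step in plain Python. The function name `apply_mllr` and the toy values are assumptions for illustration; it does not estimate A and b, and it does not reflect the Sphinx file formats or Sam-Joo's package.

```python
def apply_mllr(means, A, b):
    """Apply one shared affine transform mu' = A * mu + b to every mean vector.

    means: list of mean vectors (lists of floats), one per Gaussian.
    A: square matrix as a list of rows; b: bias vector.
    With a single regression class, the same (A, b) is used for all Gaussians.
    """
    adapted = []
    for mu in means:
        adapted.append([
            sum(a_ij * m_j for a_ij, m_j in zip(row, mu)) + b_i
            for row, b_i in zip(A, b)
        ])
    return adapted


# Toy 2-dimensional example with made-up numbers.
means = [[1.0, 2.0], [0.0, 1.0]]
A = [[2.0, 0.0], [0.0, 1.0]]
b = [0.5, -1.0]
adapted_means = apply_mllr(means, A, b)
```

Multiple regression classes, mentioned in the outlook, would differ only in that each group of Gaussians gets its own (A, b) pair.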

11 Decoder (s3.0/s3.x code merging)
- align, astar, allphone, dag and decode-anytopo are now in the s3.5 codebase
  - Thanks to Carl Quillen
- Merging is 80% complete
  - The code compiles, links and runs.
  - align and allphone are fixed. Small differences remain because s3.0 and s3.x themselves differ slightly.
  - astar/dag/decode-anytopo are in progress.
- 12k lines of code are saved
  - From s3.0 + s3.2 (63k) to s3.5 (51k)
- Only a slight increase in package size
  - From 0.3 M to 0.5 M

12 Decoder (s3.0/s3.x code merging) (cont.)
- As a consequence of the merging, it will be possible to use 3.x to
  - Generate alignments
  - Generate N-best lists
  - Do phoneme recognition
  - Search for the best path in the lattice
  - Do flat-lexicon search
- An interface for reading N-best lists is also available, but it is not exposed yet.
- Outlook: more code merging will happen in the next two quarters.

13 Decoder (Release)
- We need to provide our partners with a recognizer that has
  - State-of-the-art technology
  - High performance
- Sphinx 3.5 will be released at the beginning of September
- Work still needed:
  - Write two more chapters of documentation
  - Polish the live-mode APIs
  - Some small code clean-ups
- We will also announce a corresponding tag for SphinxTrain.
  - A simultaneous release of s3.5 + ST

14 ICSI Training (Phase III)
- By Rong
- Phases I and II were completed in May and June.
- Now in Phase III: tuning
  - We have already tuned parameters such as the number of senones and the number of mixtures.
  - Ziad and Arthur were too busy in the Summer
- Outlook: an area that was under-worked in the Summer. We need to do more in Q4.

15 Trainer (Clean-up)
- Unification of the front-end
  - Sphinx 2 / Sphinx 3 / SphinxTrain
  - Thanks to Evandro
  - No need to worry about code-level mismatch
- Unification of the command-line interface
  - 36 out of 37 tools now have a standard command-line interface.
  - All support the options -example and -help
  - See Appendix A.2 of Hieroglyph
- A comprehensive, formatted 94-page document can now be found on-line

16 Documentation
- Project Hieroglyph
  - An effort to build a comprehensive set of documentation on using Sphinx, SphinxTrain and the CMU LM Toolkit to build speech applications
- In the Summer
  - 1st draft of "Speaker Adaptation" (Chapter 9) is completed.
  - 1st draft of "SphinxTrain command line reference" (Chapter A.2) is completed.
  - 2nd draft of "License of Sphinx" is completed.
  - All can be found at www.cs.cmu.edu/~archan/sphinxDoc.html

17 Conclusion
- We have done something in the Summer
  - But with great pain
  - We need to put more stress on some weak areas in Q4
- Outlook in September and Q4
  - September: ICASSP 2005 and ICSLP 2004 preparation
  - October: Polish Speaker Adaptation
  - November: Complete dynamic LM addition/deletion
  - December: Search refinement, further speed-up.

