
From Main() to the search routine in Sphinx 3 (s3accurate) Arthur Chan July 8, 2004.


1 From Main() to the search routine in Sphinx 3 (s3accurate) Arthur Chan July 8, 2004

2 This presentation
  Design goals of Sphinx 3 (s3accurate) (4 pages)
  How hard is it to trace the Sphinx code? (2 pages)
  How to trace source code in general (1 page)
  A tour from the main function to the search routine (11 pages)
  Next time: details of s3accurate, or we could start on s3fast.

3 Design goals of a speech recognizer/trainer
  Speech recognition software can be built with different goals:
  Accuracy
  Speed
  Usability
  Code readability: is the code easy, or even entertaining, to read?

4 If we optimize only one design goal
  Optimizing accuracy: techniques that run at 1000 times real time (1000xRT) are used to maximize the recognition rate.
  Optimizing speed: cryptic coding leaves programmers unable to understand what is going on.
  Optimizing usability: offering users more than 1000 options in the application makes the code extremely complex.

5 Conflicts between goals
  Accuracy vs. speed
    Approximate search usually gives faster decoding but lower accuracy.
    Whole-sentence decoding makes sense for many techniques but causes a lot of delay.
  Usability vs. readability
    Making software very usable makes the code more complicated.
    Complicated code is generally hard to read.

6 Design goals of Sphinx 3 (s3accurate)
  Accuracy > Readability > Usability > Speed
  Sphinx 3, the flat-lexicon version, was mainly used for research purposes.
  It is very modular and easy to change for researchers with experience in C.
  It contains few difficult code optimizations.

7 How hard is it to trace the Sphinx code? (My view)
  Like many programs, Sphinx is just a set of commands that perform additions, subtractions, multiplications and divisions in a specific order.
  So the code is not difficult to understand once the underlying algorithm is understood.

8 Is there a lot to read?
  Number of lines of code in all .c and .h files: 36143
  Number of lines in the Harry Potter novels:
    Book 1: ~9000 lines
    Book 2: ~9000 lines
    Book 3: ~12000 lines
    Book 4: ~21000 lines
    Book 5: ~27000 lines
  So it is much shorter than the five Harry Potter novels listed above (about 78000 lines in total). Actually, it is not much to read.

9 How to read source code in general
  Some advice:
  Jot down notes on the dependencies in the code. (A README.tracing file can be found in s3fast and SphinxTrain.)
  Jot down notes on how certain parts of the code work.
  Useful but not necessary: an editor with program statistics and hyperlinks to function definitions, such as Microsoft Visual C++, emacs or vi.
  Grab a comfortable chair.
  Be patient.

10 A tour from the main function to the search routine: overview
  Physical layout of the code
  s3decode-anytopo
  Brief tour of the program
    Initialization
    Processing
    Post-processing

11 Physical layout of s3
  Getting the code from SourceForge:
    cvs -d:pserver:anonymous@cvs.sourceforge.net:/cvsroot/cmusphinx/ co archive_s3
  Contents of archive_s3:
    S3   : <<= THE ONE
    S3.0 : <- Why is it there?
    S3.2 : <- Legacy implementation of s3fast without the live-mode decoder
    S3.3 : <- Legacy implementation of s3fast with the live-mode decoder

12 s3
  config/  <- configuration for make
  doc/     <- documentation
  include/ <- header files
  src/     <- all the *.c files
  lib/     <- where the library will be
  bin/     <- where the binary will be

13 Inside src/
  libio/   : file I/O functions
  libutil/ : useful data structures
  libfbs/  : source for all the search code
  libfeat/ : code for feature extraction

14 Inside libfbs/
  Several files contain a main(), including:
  align-main.c        -> the entry point for s3align
  main.c              -> the entry point for s3decode-anytopo
  allphone-main.c     -> the entry point for s3allphone
  astar-main.c        -> the entry point for s3astar
  nbestrescore-main.c -> the entry point for s3nbestrescore
  dag.c               -> the entry point for s3dag

15 Logical structure of the code
  The forward search:
  1, Do the GMM computation for every senone.
  2, Do the search for one frame.
  3, Iterate until the end of the utterance.
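To make step 1 concrete, here is a minimal sketch of scoring every senone for one frame. It is not taken from the Sphinx 3 source: every name prefixed with toy_ or TOY_ is hypothetical, the feature dimension and mixture size are arbitrary, and the mixture sum is replaced by a max over components (a common approximation, assumed here only to keep the example free of logarithm calls).

```c
#include <assert.h>
#include <stddef.h>

#define TOY_DIM    2   /* feature dimension (assumption) */
#define TOY_N_COMP 2   /* Gaussians per senone (assumption) */

/* One diagonal-covariance Gaussian with precomputed log-domain constants. */
typedef struct {
    double log_weight;          /* log of the mixture weight */
    double log_norm;            /* log of the Gaussian normalizing constant */
    double mean[TOY_DIM];
    double inv_2var[TOY_DIM];   /* 1 / (2 * variance), per dimension */
} toy_gauss_t;

/* A senone modeled as a small Gaussian mixture. */
typedef struct {
    toy_gauss_t comp[TOY_N_COMP];
} toy_senone_t;

/* Log weight plus log density of one component for feature vector x. */
static double toy_gauss_logscore(const toy_gauss_t *g, const double *x)
{
    double s = g->log_weight + g->log_norm;
    for (size_t d = 0; d < TOY_DIM; d++) {
        double diff = x[d] - g->mean[d];
        s -= diff * diff * g->inv_2var[d];
    }
    return s;
}

/* Score every senone for one frame; out[i] receives the score of senone i.
 * The best component stands in for the full mixture sum (max approximation). */
void toy_score_all_senones(const toy_senone_t *sen, size_t n_senones,
                           const double *x, double *out)
{
    assert(sen != NULL && x != NULL && out != NULL);
    for (size_t i = 0; i < n_senones; i++) {
        double best = toy_gauss_logscore(&sen[i].comp[0], x);
        for (size_t c = 1; c < TOY_N_COMP; c++) {
            double s = toy_gauss_logscore(&sen[i].comp[c], x);
            if (s > best)
                best = s;
        }
        out[i] = best;
    }
}
```

The scores produced this way are what step 2 would consume when it advances the search by one frame.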

16 main.c
  Pseudocode of the top level
  1, Read command-line arguments
     cmd_ln_define
     cmdline_parse
  2, Initialization
     Initialize the log table lookup: logs3_init
     Initialize features: feat_init
     Initialize models: models_init
  3, Process all control files (this also does the decoding)
     process_ctlfiles
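The call order in this pseudocode can be sketched as a small C skeleton. This is only an illustration of the control flow: the toy_ functions are hypothetical stand-ins, not the real cmd_ln_define, cmdline_parse, logs3_init, feat_init, models_init or process_ctlfiles, whose actual signatures are not shown on the slide. Each stand-in simply records its name in a trace string.

```c
#include <assert.h>
#include <string.h>

/* Records the order of calls so the control flow can be inspected. */
static char toy_trace[128];

static void toy_step(const char *name)
{
    assert(strlen(toy_trace) + strlen(name) + 2 < sizeof(toy_trace));
    if (toy_trace[0] != '\0')
        strcat(toy_trace, ">");
    strcat(toy_trace, name);
}

/* Hypothetical stand-ins for the stages named on the slide. */
static int toy_parse_cmdline(int argc, char **argv)
{
    (void)argv;                       /* a real parser would read the options */
    toy_step("cmdline");
    return argc >= 1 ? 0 : -1;
}
static void toy_logs_init(void)      { toy_step("logs"); }
static void toy_feat_init(void)      { toy_step("feat"); }
static void toy_models_init(void)    { toy_step("models"); }
static int  toy_process_ctlfiles(void) { toy_step("ctl"); return 0; }

/* Top level: parse arguments, initialize, then process the control files. */
int toy_decoder_main(int argc, char **argv)
{
    toy_trace[0] = '\0';
    if (toy_parse_cmdline(argc, argv) != 0)
        return 1;                     /* stop before any initialization */
    toy_logs_init();
    toy_feat_init();
    toy_models_init();
    return toy_process_ctlfiles();
}

/* Accessor for the recorded call order. */
const char *toy_get_trace(void)
{
    return toy_trace;
}
```

A real main() would forward its own argc and argv to such a function; the point of the sketch is only that all initialization happens once, before any utterance is decoded.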

17 process_ctlfiles
  Pseudocode
  1, Read in more parameters
  2, For every cepstral file, run decoding (decode_utt)
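Step 2 is a plain loop over the entries of a control listing. The sketch below assumes, purely for illustration, that the listing is an in-memory string with one utterance ID per line; the real control-file format and the real decode_utt signature are not given on the slide, so toy_process_ctl, toy_decode_fn and toy_remember_last are hypothetical names.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define TOY_MAX_UTTID 64

/* Hypothetical per-utterance callback standing in for decode_utt. */
typedef void (*toy_decode_fn)(const char *uttid, void *user);

/* Walk a control listing (one utterance ID per line) and decode each entry.
 * Returns the number of utterances processed. */
size_t toy_process_ctl(const char *ctl, toy_decode_fn decode, void *user)
{
    assert(ctl != NULL && decode != NULL);
    size_t n = 0;
    while (*ctl != '\0') {
        size_t len = strcspn(ctl, "\n");        /* length of the current line */
        if (len > 0 && len < TOY_MAX_UTTID) {   /* skip blank or oversized lines */
            char uttid[TOY_MAX_UTTID];
            memcpy(uttid, ctl, len);
            uttid[len] = '\0';
            decode(uttid, user);
            n++;
        }
        ctl += len;
        if (*ctl == '\n')
            ctl++;                              /* step over the line break */
    }
    return n;
}

/* Example callback: remembers the last utterance ID it was given.
 * The user pointer must reference a buffer of at least TOY_MAX_UTTID bytes. */
void toy_remember_last(const char *uttid, void *user)
{
    strcpy((char *)user, uttid);
}
```

Passing the decoder as a callback keeps the loop independent of what decoding actually does, which is convenient when tracing the code one layer at a time.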

18 decode_utt
  Run the forward search (fwdvit)
  Dump the hypothesis/lattice/statistics to the output

19 fwd_vit
  1, Initialize the feature vector
  2, For every frame:
     a, Compute the feature
     b, Compute the Gaussian distributions
     c, Do the forward search for one frame
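The per-frame loop can be sketched as a tiny Viterbi search. This is a simplification under stated assumptions, not the Sphinx 3 implementation: it uses a single left-to-right HMM with three states, takes the senone scores as already computed (steps a and b), ignores transition probabilities, pruning, the lexicon and the language model, and keeps no backpointers. All toy_ and TOY_ names are hypothetical.

```c
#include <assert.h>
#include <stddef.h>

#define TOY_N_STATES 3
#define TOY_LOG_ZERO (-1.0e30)   /* stands in for log(0) */

/* Step c: advance every state score by one frame (Viterbi update).
 * Left-to-right topology: a state is reached from itself or its predecessor. */
void toy_search_one_frame(const double *prev, const double *senone_score,
                          double *cur)
{
    for (size_t s = 0; s < TOY_N_STATES; s++) {
        double best = prev[s];                  /* self-loop */
        if (s > 0 && prev[s - 1] > best)
            best = prev[s - 1];                 /* entry from the predecessor */
        cur[s] = best + senone_score[s];
    }
}

/* Frame loop: scores holds n_frames rows of TOY_N_STATES log scores.
 * Returns the best log score of a path that ends in the final state. */
double toy_forward_search(const double *scores, size_t n_frames)
{
    assert(scores != NULL && n_frames > 0);
    double prev[TOY_N_STATES], cur[TOY_N_STATES];

    /* The first frame must be spent in the initial state. */
    for (size_t s = 0; s < TOY_N_STATES; s++)
        prev[s] = (s == 0) ? scores[0] : TOY_LOG_ZERO;

    for (size_t t = 1; t < n_frames; t++) {
        toy_search_one_frame(prev, scores + t * TOY_N_STATES, cur);
        for (size_t s = 0; s < TOY_N_STATES; s++)
            prev[s] = cur[s];
    }
    return prev[TOY_N_STATES - 1];
}
```

Only the scores of the previous frame are needed to compute the current one, which is why the search can proceed frame by frame as the slide describes.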

20 We are almost done!
  We went through the key logic that leads from the main() function to the GMM computation and search.
  Next time: each component in more detail.

