Nuance Overview and Tutorial
David Hjelm, 2003-03-20
The Nuance system contains…
– Speech recognition, voice authentication and text-to-speech engines
– APIs for creating speech-recognition and text-to-speech clients
– A VoiceXML platform and development tools
– Utility programs, e.g. an SLM trainer for creating N-gram language models
Nuance ASR – supported languages
Arabic, Cantonese, Czech, Danish, Dutch, English (5 varieties), French (2 varieties), German (2 varieties), Greek, Hebrew, Italian, Japanese, Korean, Mandarin (2 varieties), Norwegian, Portuguese, Spanish (2 varieties), Swedish, Turkish
Nuance Vocalizer (TTS) – supported languages
Dutch, English (US & UK), French, German, Italian, Norwegian, Spanish, Swedish
Some Nuance programs and utilities
– nlm – license manager
– Xapp – for testing ASR
– resource-manager – distributes client requests
– recserver – ASR server
– vocalizer – TTS server
– compilation-server – compiles recognition grammars at runtime
– nuance-compile – compiles recognition grammars
– train-slm – trains statistical language models
– parse-tool – checks whether a recognition grammar accepts a string
– generate – checks what strings a recognition grammar can generate
Nuance license manager (nlm)
– The license manager must be running in order to run recserver, resource-manager, compilation-server, etc.

nlm

C:\Program2\Nuance\v8.0.0>nlm ntk8-800-a-x46-9d6108a3c3a1
SN: 800   HostLock: anyhost   Port: 8470   Checksum: c0683bc253e3
 1/ 1 server      800 899 1-jan-2038
 2/ 2 sp-chan     800 899 1-jan-2038
 2/ 2 v-chan      800 899 1-jan-2038
 1/ 1 tool        800 899 1-jan-2038
 1/ 1 vocalizer   800 899 1-jan-2038
Nuance License Manager ready.
Compiling recognition grammars (nuance-compile)
Speech recognition grammars are compiled into recognition packages by the command nuance-compile.
Grammars are written in Nuance's GSL (Grammar Specification Language) format. GSL is essentially a CFG plus operations for Nuance semantics.
GSL – the CFG part:
– Uppercase symbols are non-terminals
– Uppercase symbols preceded by a dot are start symbols (top-level grammars)
– Lowercase symbols are terminals
– A rule consists of an LHS and an RHS
– The LHS is a non-terminal (or a start symbol)
– The RHS is a GSL expression consisting of (non-start) symbols and the grammar operators ( ) [ ] ? * +
Compiling recognition grammars, contd.
Contents of file ConfirmDisconfirm.grammar:

.ConfirmDisconfirm [Confirm DisConfirm]
Confirm +( [yes yeah sure (you bet)] ?buddy )
DisConfirm ( [no nope (no way)] ?[idiot moron] )
.Testing *[ testing (one ?(two ?three)) ]

Compilation of ConfirmDisconfirm.grammar results in a recognition package with two top-level grammars: .ConfirmDisconfirm and .Testing
Compiling recognition grammars, contd.
nuance-compile <grammar file> <master package> [options] [parameters]

E:\david\korp>nuance-compile ConfirmDisconfirm.grammar English.America -auto_pron lm.Addresses=localhost

The result of the compilation is the directory ConfirmDisconfirm. The option -auto_pron tells the compiler to guess the pronunciation of out-of-dictionary words, if any. The parameter lm.Addresses=localhost says that the license manager is running on localhost.
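A later slide loads a second, Swedish package (KonfirmeraFoerkasta) onto the same recserver. As a hedged sketch, it would be compiled the same way against the Swedish master package; the grammar file name and the exact identifier of the Swedish master package used below are assumptions, not taken from these slides:

E:\david\korp>nuance-compile KonfirmeraFoerkasta.grammar Swedish -auto_pron lm.Addresses=localhost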
Nuance recognition server (recserver)
recserver takes a list of recognition packages as arguments, plus an optional list of Nuance parameters. The recognition packages have been compiled in advance (e.g. by nuance-compile). Any top-level grammar symbol in one of the packages can then be used for speech recognition:

E:\david\korp>recserver -package ConfirmDisconfirm -package KonfirmeraFoerkasta lm.Addresses=localhost
Nuance Recognition Server
All debug information is currently logged into ./logs/recserver_log_current
10,-,LOG,STATUS,0,2003/03/19 15:45:08.984,JONES,recserver,1456,RS,AppOutput,33,Loading recognition packages...
10,-,LOG,STATUS,0,2003/03/19 15:45:25.578,JONES,recserver,1456,RS,AppOutput,34,All recognition packages loaded.
10,-,LOG,STATUS,0,2003/03/19 15:45:25.578,JONES,recserver,1456,RS,AppOutput,53,Recserver ready to accept connections on port 8200
Grammar testing tool – Xapp
Xapp can be used for testing recognition grammars. Xapp opens a speech channel which can be used for speech recognition. Each speech channel instance loads exactly one recognition package. Before Xapp is started, a license manager and a recserver with the appropriate package loaded must be running.

Xapp -package <package> [parameters]

E:\david\korp>Xapp -package ConfirmDisconfirm lm.Addresses=localhost
Xapp contd.
Recognition clients
Recognition clients (e.g. Xapp) are the applications that read and write audio and send requests to the recognition server or the text-to-speech engine. An audio provider acts as a middleware layer between the client and the sound stream. Audio providers exist for
– native audio (PC: SoundBlaster sound card, Unix: /dev/audio)
– VoIP telephony (H323, SIP, RTP)
– telephony cards (Dialogic, Aculab, …)
The native audio provider claims exclusive rights to the sound card, so there can be only one native-audio client at a time. However, there can be as many VoIP clients as the license manager(s) grant, and as many telephony-card clients as permitted by the license manager and the hardware.
Recognition clients (two at once)
The parameter audio.Provider specifies the audio provider for the recognition client at start-up:

E:\david\korp>Xapp -package ConfirmDisconfirm lm.Addresses=localhost
E:\david\korp>Xapp -package KonfirmeraFoerkasta lm.Addresses=localhost audio.Provider=h323

Both ConfirmDisconfirm and KonfirmeraFoerkasta have been loaded onto the same recserver. ConfirmDisconfirm is compiled using the English.America master package and KonfirmeraFoerkasta using the Swedish master package. Each package can only contain grammars in one language, and each recognition client can only load one package at a time. Thus recservers can be multilingual, but recognition clients cannot.
Recognition clients contd.
Recognition clients are normally created through one of the APIs provided by Nuance. APIs exist for Java, C++ and C. The APIs provide methods for creating speech channel instances, answering calls, performing speech recognition and text-to-speech synthesis, setting parameters, compiling grammars at runtime, etc. The APIs do not provide methods for starting and stopping recognition servers, text-to-speech servers, etc.
vocalizer
Vocalizer is Nuance's TTS engine. Recognition clients call vocalizer for TTS services just like they call recserver for ASR services.

vocalizer [option value]* [parameters]
-text_type   sets the text type (string, default Plaintext)
-filter      sets the text filter (string, default none)
-language    sets the language (string, default USEnglish)
-gender      sets the gender (string, default MaleOrAny)
-voice       specifies the voice (string, no default)
-speed       the relative speed (0-100, default 50)
-pitch       the relative pitch (0-100, default 50)
-volume      the relative volume (0-100, default 50)
-encoder     the audio encoding used (string, default MuLaw)
-bufferms    number of milliseconds of buffering (number, default 750)
vocalizer contd.
vocalizer does not have to be started before the recognition client, but it must be running when the client calls a function where TTS is needed. The parameter tts.Port specifies the port on which the vocalizer process listens for requests. The default port is 32323:

C:\>vocalizer -language UsEnglish -gender Male -pitch 90
Starting vocalizer built on NUANCE v7.0.4.
Toolbox Server, version Mar 11 2002.
Copyright (c) 2001, Nuance Communications. All Rights Reserved.
MULTI-threaded server.
Server ready to accept connections on port 32323...
compilation-server
compilation-server is used for compiling dynamic grammars via one of the APIs.

E:\david\korp>compilation-server -package ConfirmDisconfirm lm.Addresses=localhost
Nuance Compilation Server
All debug information is currently logged into ./logs/compilation-server_log_current
10,-,LOG,STATUS,0,2003/03/20 11:39:59.750,JONES,compilation-server,1328,COMPILER,AppOutput,17,Initializing...

This compilation-server instance can be used for compiling dynamic grammars with the English.America master package, since ConfirmDisconfirm was compiled using that master package.
Nuance Java SpeechChannel API
The main class is NuanceSpeechChannel, which defines methods for speech recognition as well as inner classes for telephony control and TTS. The NuanceSpeechChannel constructor takes as an argument a NuanceConfig object, which defines the Nuance configuration. Example:

String[] params = {"-package", "ConfirmDisconfirm", "lm.Addresses=localhost"};
NuanceConfig nc = new NuanceConfig(params);
NuanceSpeechChannel nsc = new NuanceSpeechChannel(nc);
Nuance Java SpeechChannel API contd.
If the NuanceSpeechChannel is created with a telephony-based audio provider, a call must be answered prior to recognition. If a native audio provider is used, this is not necessary.

CoreTelephonyControl tel = nsc.getTelephonyControl();
if (tel != null) { // if native audio, getTelephonyControl returns null
    tel.waitForCall();
    tel.answerCall();
}

To play a prompt, use the method appendPrompt in class CorePromptPlayer, then use one of the methods play (synchronous) or startPlay (asynchronous) in NuanceSpeechChannel. To synthesize text, use appendTTS instead.

CorePromptPlayer pp = nsc.getPromptPlayer();
pp.appendTTS("I am a talking machine");
pp.play(false); // false specifies that DTMF input should not interrupt the prompt
Nuance Java SpeechChannel API contd.
The playAndRecognize method plays the appended prompts and performs speech recognition using the specified top-level grammar. The result of playAndRecognize is a RecResult object, which is basically a set of key-value pairs where the values can be complex:

RecResult rr = nsc.playAndRecognize(".ConfirmDisconfirm");

If a compilation server is running with a package compiled with the same master package as the NuanceSpeechChannel package, a grammar can be specified directly in the call to playAndRecognize:

RecResult rr = nsc.playAndRecognize("[yes no maybe]");

NuanceSpeechChannel also contains methods for inserting precompiled dynamic grammars into a grammar, and it can interact with a database to save and load grammars.
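Putting the pieces from the previous slides together, a minimal recognition client could look roughly like the sketch below. Only classes and methods shown on these slides are used; the import statements for the Nuance client classes and the exact exception types they throw are not shown in the deck, so they are omitted here and everything is funnelled through a catch-all.

// Minimal sketch of a recognition client, assembled from the calls on the previous slides.
// Requires a running license manager (nlm) and a recserver with the ConfirmDisconfirm package.
public class ConfirmClient {
    public static void main(String[] args) {
        try {
            // Configure and open a speech channel
            String[] params = {"-package", "ConfirmDisconfirm", "lm.Addresses=localhost"};
            NuanceConfig nc = new NuanceConfig(params);
            NuanceSpeechChannel nsc = new NuanceSpeechChannel(nc);

            // With a telephony-based audio provider a call must be waited for and answered;
            // with native audio getTelephonyControl() returns null and this step is skipped
            CoreTelephonyControl tel = nsc.getTelephonyControl();
            if (tel != null) {
                tel.waitForCall();
                tel.answerCall();
            }

            // Queue a synthesized prompt, then play it and recognize against the
            // .ConfirmDisconfirm top-level grammar in a single call
            CorePromptPlayer pp = nsc.getPromptPlayer();
            pp.appendTTS("Do you want to continue?");
            RecResult rr = nsc.playAndRecognize(".ConfirmDisconfirm");

            // RecResult is essentially a set of key-value pairs; printing it is the
            // simplest way to inspect what came back
            System.out.println(rr);
        } catch (Exception e) {
            e.printStackTrace();
        }
    }
}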
Nuance Java SpeechChannel events
The SpeechChannel throws events which can be captured and handled by EventListeners (a sketch of this pattern follows after the list):
– StartOfSpeechEvent – when the user starts speaking
– EndOfSpeechEvent – when the user has stopped speaking
– PartialResultEvent – encodes a partial result during recognition
– PlaybackStartedEvent – when the system starts playback
– PlaybackStoppedEvent – when the system stops playback
– CallConnectedEvent – someone called
– DTMFEvent – the user pressed a key on the telephone keypad; encodes the digit pressed
– HungupEvent – the user hung up the telephone
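The deck does not show the listener interfaces themselves. Purely as a self-contained illustration of the event-listener pattern described above (none of the class or method names below are taken from the Nuance API; only the event names in the list come from the slide), a client-side handler might be organised like this:

// Generic illustration of the listener pattern; these types are stand-ins, NOT the Nuance API.
import java.util.ArrayList;
import java.util.List;

public class EventDemo {
    // Stand-in for the event classes listed on the slide, identified here only by name
    static class SpeechEvent {
        final String name;
        SpeechEvent(String name) { this.name = name; }
    }

    // A listener receives every event thrown by the (simulated) speech channel
    interface SpeechEventListener { void onEvent(SpeechEvent e); }

    // A minimal event source: listeners register and are notified in order
    static class SimulatedChannel {
        private final List<SpeechEventListener> listeners = new ArrayList<>();
        void addListener(SpeechEventListener l) { listeners.add(l); }
        void fire(SpeechEvent e) { for (SpeechEventListener l : listeners) l.onEvent(e); }
    }

    public static void main(String[] args) {
        SimulatedChannel channel = new SimulatedChannel();
        // A handler that reacts to two of the event types named on the slide
        channel.addListener(e -> {
            if (e.name.equals("StartOfSpeechEvent")) System.out.println("user started speaking");
            else if (e.name.equals("HungupEvent")) System.out.println("caller hung up");
        });
        channel.fire(new SpeechEvent("StartOfSpeechEvent"));
        channel.fire(new SpeechEvent("HungupEvent"));
    }
}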
Nuance parameters
A large part of the Nuance functionality is controlled by parameters. Parameters can be set at initialization time on the command line, or at run-time via the APIs. Parameters control, among many other things, input and output volume, the name of the log file, whether barge-in is allowed, and the IP addresses of TTS and ASR servers. An exhaustive description of the parameters is given in the Nuance manual.
Some handy Nuance parameters…
– config.DebugLevel (1-6): amount of debugging info given by Nuance; 6 = max
– config.ServerHostName: host where the recognition client should try to connect to a recserver
– config.ServerPort: port where the recognition client should try to connect to a recserver
– client.TTSAddresses: list of addresses (host[:port] [,host[:port]]*) at which the recognition client should try to contact a TTS server
– client.NoSpeechTimeoutSecs: how many seconds (given as a float) the recognition client should wait before a recognize call returns if the user does not say anything; if set to 0.0 it waits forever
– client.TooMuchSpeechTimeoutSecs: maximum number of seconds of speech to allow after the user starts speaking; 0.0 disables the timeout
– rec.DoNBest (TRUE/FALSE): whether to return a list of top-probability recognition results instead of a single top-probability result
– rec.NumNBest: maximum number of results to return if rec.DoNBest=TRUE
– rec.ConfidenceRejectionThreshold (0-100): if a recognized utterance has a score below this threshold, it is marked as REJECTED
Some handy Nuance parameters contd.
– rec.GenPartialResults (TRUE/FALSE): whether to return intermediate results during the recognition process; partial results are given as asynchronous events/notifications
– rec.PartialResultSeconds: how often partial results should be generated
– audio.InputVolume (0-255): input volume
– audio.OutputVolume (0-255): output volume
– audio.Provider (native/dialogic/…): which audio provider to use; the audio providers themselves have different parameters…
– ep.EndSeconds: how long a pause after a user utterance is needed to trigger end-of-speech
– lm.Addresses (host[,host]*): which host(s) the clients should try in order to find a license manager with a spare license
…but there are lots more.
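Combining a few of the parameters above on the command line, in the style of the earlier Xapp invocations (the values chosen here are only illustrative, not recommendations):

E:\david\korp>Xapp -package ConfirmDisconfirm lm.Addresses=localhost config.DebugLevel=3 client.NoSpeechTimeoutSecs=5.0 rec.DoNBest=TRUE rec.NumNBest=3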