Speech Processing and Recognition © Florida Institute of Technology Access audio data in real time and apply to speech recognition Final Exam Project By.


1 Speech Processing and Recognition © Florida Institute of Technology
Access audio data in real time and apply to speech recognition
Final Exam Project
By Hesheng Li
Instructor: Dr. Kepuska
Department of Electrical and Computer Engineering

2 Overview
• Introduction
• Three models to access live audio data
• How to get audio data using the low-level API model
• Application in speech recognition
• Comparison and analysis
• Conclusion

3 Introduction
Why? How? Live audio data access has wide application.

4 Three models to access live audio data
• High-level digital audio API: MCI
• DirectSound
• Low-level digital audio API: WaveX

5 High-level digital audio API: MCI
• The Media Control Interface (MCI) provides standard commands for playing multimedia devices and recording multimedia resource files.
• There are two ways to send a command to a device:
  1. Command message interface
  2. Command string interface

6 Command message interface
• Passing binary values and structures to an audio device is referred to as using the "command message interface".
• The function mciSendCommand() sends commands using this approach.
• Example:
    /* waveParams.lpstrDeviceType must also be set, because
       MCI_OPEN_TYPE | MCI_OPEN_TYPE_ID are passed below */
    waveParams.lpstrElementName = "C:\\WINDOWS\\CHORD.WAV";
    mciSendCommand(0, MCI_OPEN,
                   MCI_WAIT | MCI_OPEN_ELEMENT | MCI_OPEN_TYPE | MCI_OPEN_TYPE_ID,
                   (DWORD_PTR)&waveParams);

7 Command string interface
• Passing strings to an audio device is referred to as using the "command string interface".
• The function mciSendString() sends commands using this approach.
• Example:
    mciSendString("open C:\\WINDOWS\\CHORD.WAV type waveaudio alias A_Chord", 0, 0, 0);
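Because the command string interface takes ordinary text, a program can assemble the command at run time before handing it to mciSendString(). The following is a minimal sketch in portable C, not the author's code: the helper name build_open_command and its parameters are hypothetical, and it only formats the string (it does not call MCI, so it compiles without the Windows headers). Paths that contain spaces would need quoting, which this sketch does not attempt.

```c
#include <assert.h>
#include <stdio.h>

/* Hypothetical helper: writes "open <path> type waveaudio alias <alias>"
 * into out. Returns 0 on success, -1 if the text does not fit in out. */
int build_open_command(char *out, size_t out_size,
                       const char *path, const char *alias)
{
    int n;

    assert(out != NULL && path != NULL && alias != NULL);

    n = snprintf(out, out_size, "open %s type waveaudio alias %s", path, alias);
    if (n < 0 || (size_t)n >= out_size) {
        return -1;  /* formatting error or truncated output */
    }
    return 0;
}
```

On Windows, the filled buffer would then be passed as the first argument of mciSendString(), in place of the string literal used on the slide.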

8 MCI
Some other commands:
Command message interface:
  1. Start recording with MCI_RECORD
  2. Write data to a wave file with MCI_SAVE
  3. Stop with MCI_STOP
  4. Play with MCI_PLAY
Command string interface:
  1. Play with "play %s %s %s"
  2. Stop with "stop %s %s %s"

9 DirectSound
• Like other components of DirectX, DirectSound allows you to use the hardware in the most efficient way.
Here are some other things that DirectSound makes easy:
• Querying hardware capabilities at run time to determine the best solution for any given personal computer configuration
• Using property sets so that new hardware capabilities can be exploited even when they are not directly supported by DirectSound
• Low-latency mixing of audio streams for rapid response
• Implementing three-dimensional (3-D) sound

10 DirectSound
• DirectSound playback is built on the IDirectSound Component Object Model (COM) interface and on the IDirectSoundBuffer interface for manipulating sound buffers.
• DirectSound capture is based on the IDirectSoundCapture and IDirectSoundCaptureBuffer COM interfaces.

11 Low-level digital audio API: WaveX
1. Open audio device
2. Prepare structure for recording
3. Start recording
4. Data processing
5. Release structure
6. Close audio device

12 Open audio device
There are several approaches you can take, depending on how fancy and flexible you want your program to be.
1. Pass the value WAVE_MAPPER to open the preferred audio input/output device.
2. Call a function to get the list of devices, then open the audio device you want.
3. Open the device with waveInOpen() or waveOutOpen().

13 Example
    /* Open the preferred input device; messages go to the window myWindow */
    result = waveInOpen(&outHandle, WAVE_MAPPER, &waveFormat,
                        (DWORD_PTR)myWindow, 0, CALLBACK_WINDOW);
    if (result)
    {
        printf("There was an error opening the preferred Digital Audio In device!\r\n");
    }

14 Example
    /* List the input devices (the input variants of the calls are used throughout) */
    iNumDevs = waveInGetNumDevs();
    for (i = 0; i < iNumDevs; i++)
    {
        if (!waveInGetDevCaps(i, &wic, sizeof(WAVEINCAPS)))
        {
            printf("Device ID #%u: %s\r\n", i, wic.szPname);
        }
    }
    /* deviceId: an ID chosen from the list above (valid IDs are 0 to iNumDevs - 1) */
    result = waveInOpen(&outHandle, deviceId, &waveFormat,
                        (DWORD_PTR)myWindow, 0, CALLBACK_WINDOW);

15 Structure WAVEFORMATEX
wFormatTag        Format type: PCM, mu-law, A-law
nChannels         Mono, stereo
nSamplesPerSec    Sample rate, e.g. 8000 Hz
nAvgBytesPerSec   Average data-transfer rate
nBlockAlign       Minimum atomic unit of data
wBitsPerSample    8 or 16 bits per sample
cbSize            Size of extra format information

16 Example
    WAVEFORMATEX waveFormat;
    /* Initialize the WAVEFORMATEX for 16-bit, 44.1 kHz, stereo */
    waveFormat.wFormatTag = WAVE_FORMAT_PCM;
    waveFormat.nChannels = 2;
    waveFormat.nSamplesPerSec = 44100;
    waveFormat.wBitsPerSample = 16;
    waveFormat.nBlockAlign = waveFormat.nChannels * (waveFormat.wBitsPerSample / 8);
    waveFormat.nAvgBytesPerSec = waveFormat.nSamplesPerSec * waveFormat.nBlockAlign;
    waveFormat.cbSize = 0;
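The two derived fields follow directly from the others: nBlockAlign is the size of one sample frame in bytes, and nAvgBytesPerSec is that size times the sample rate. The sketch below reproduces the arithmetic in portable C. The PcmFormat struct and make_pcm_format function are hypothetical stand-ins for WAVEFORMATEX, introduced only so the example compiles without <mmsystem.h>.

```c
#include <assert.h>
#include <stdint.h>

/* Stand-in for the PCM-relevant fields of WAVEFORMATEX (an assumption made
 * for illustration; the real struct lives in the Windows headers). */
typedef struct {
    uint16_t channels;           /* mirrors nChannels */
    uint32_t samples_per_sec;    /* mirrors nSamplesPerSec */
    uint16_t bits_per_sample;    /* mirrors wBitsPerSample */
    uint16_t block_align;        /* mirrors nBlockAlign */
    uint32_t avg_bytes_per_sec;  /* mirrors nAvgBytesPerSec */
} PcmFormat;

/* Fills in the derived fields from channel count, sample rate and bit depth. */
PcmFormat make_pcm_format(uint16_t channels, uint32_t samples_per_sec,
                          uint16_t bits_per_sample)
{
    PcmFormat f;

    assert(channels > 0 && bits_per_sample > 0 && bits_per_sample % 8 == 0);

    f.channels = channels;
    f.samples_per_sec = samples_per_sec;
    f.bits_per_sample = bits_per_sample;
    /* One frame holds one sample for every channel. */
    f.block_align = (uint16_t)(channels * (bits_per_sample / 8));
    /* Bytes per second = frames per second * bytes per frame. */
    f.avg_bytes_per_sec = samples_per_sec * f.block_align;
    return f;
}
```

For the 16-bit, 44.1 kHz stereo format on the slide, this gives a block alignment of 4 bytes and 176400 bytes per second.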

17 Recording engine
[Diagram labels: buffer1, buffer2, buffer3, buffer4; AddInBuffer(); waveInStart(); audio device; msg; callback function; data processing]

18 Recording engine
[Diagram labels: circular buffer (buffer2, buffer3, buffer4, buffer1); audio device; msg; callback function; data processing]
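The circular-buffer idea in the diagram can be illustrated with a small simulation: a fixed set of buffers is filled in rotation, and each one is handed back to the device after it has been processed. This is a sketch in portable C under stated assumptions, not the project's code. BufferRing and its functions are hypothetical names, and the comment marks where a real program would requeue the buffer with waveInAddBuffer().

```c
#include <assert.h>
#include <stddef.h>

#define NUM_BUFFERS 4  /* the slides show four buffers */

typedef struct {
    int next_to_fill;               /* index the device will fill next */
    int filled_count[NUM_BUFFERS];  /* how often each buffer was filled */
} BufferRing;

void ring_init(BufferRing *r)
{
    int i;

    assert(r != NULL);
    r->next_to_fill = 0;
    for (i = 0; i < NUM_BUFFERS; i++) {
        r->filled_count[i] = 0;
    }
}

/* Simulates one "buffer full" notification. Returns the index of the buffer
 * that was just filled and advances the ring to the next buffer. */
int ring_on_buffer_full(BufferRing *r)
{
    int done;

    assert(r != NULL);
    done = r->next_to_fill;
    r->filled_count[done]++;
    /* A real program would process the data here and then hand the same
     * buffer back to the device (waveInAddBuffer) so it can be reused. */
    r->next_to_fill = (done + 1) % NUM_BUFFERS;
    return done;
}
```

After four notifications the index wraps around, so the fifth notification reuses the first buffer; this is what allows recording to continue indefinitely with a fixed amount of memory.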

19 1+3+1
Three important methods:
• Prepare a buffer for wave-audio input. Function: waveInPrepareHeader()
• Send the buffer to the audio device; when the buffer is full, the application is notified. Function: waveInAddBuffer()
• Start recording. Function: waveInStart()

20 Example
    /* Prepare the header, queue the buffer, then start recording */
    if (MMSYSERR_NOERROR != waveInPrepareHeader(m_hWaveIn, &waveheader, sizeof(WAVEHDR)))
    {
        printf("prepare buffer failure!");
    }
    waveInAddBuffer(m_hWaveIn, &waveheader, sizeof(WAVEHDR));
    waveInStart(m_hWaveIn);

21 Message
• Window messages:
  MM_WIM_DATA: sent to a window when data is present in the input buffer and the buffer is being returned to the application.
  Other messages: MM_WIM_CLOSE, MM_WIM_OPEN, MM_WOM_CLOSE, MM_WOM_DONE, MM_WOM_OPEN
• Callback function messages:
  WIM_DATA: sent to the given callback function when data is present in the input buffer and the buffer is being returned to the application.
  Other messages: WIM_CLOSE, WIM_OPEN, WOM_CLOSE, WOM_DONE, WOM_OPEN

22 Message example
• Callback message:
    waveInOpen(&m_hWaveIn, WAVE_MAPPER, &m_Format,
               (DWORD_PTR)waveInProc, 0L, CALLBACK_FUNCTION);
    waveInProc(...)
    {
        switch (msg)
        {
        case WIM_OPEN:
            ...
            break;
        case WIM_DATA:
            ...
            break;
        case WIM_CLOSE:
            ...
        }
    }
• Window message:
    waveInOpen(&m_hWaveIn, WAVE_MAPPER, &m_Format,
               (DWORD_PTR)hWnd, 0L, CALLBACK_WINDOW);
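The switch statement in the callback is essentially a small state machine: open, receive data, close. The sketch below shows one way the elided case bodies might look, written in portable C so it can be compiled anywhere. Everything here is an assumption for illustration: MSG_OPEN, MSG_DATA and MSG_CLOSE are stand-ins for WIM_OPEN, WIM_DATA and WIM_CLOSE (a Windows build would use the constants from <mmsystem.h>), and RecorderState and on_wave_in_message are hypothetical names.

```c
#include <assert.h>
#include <stddef.h>

/* Stand-ins for the WIM_OPEN / WIM_DATA / WIM_CLOSE message codes
 * (assumed values, not the real Windows constants). */
enum { MSG_OPEN = 1, MSG_DATA = 2, MSG_CLOSE = 3 };

typedef struct {
    int is_open;                   /* nonzero between open and close */
    unsigned long bytes_received;  /* total bytes delivered so far */
} RecorderState;

/* Hypothetical callback body: updates the state for one device message.
 * bytes_recorded is only meaningful for MSG_DATA. */
void on_wave_in_message(RecorderState *s, int msg, size_t bytes_recorded)
{
    assert(s != NULL);

    switch (msg) {
    case MSG_OPEN:
        s->is_open = 1;
        break;
    case MSG_DATA:
        /* Ignore data that arrives while the device is not open. */
        if (s->is_open) {
            s->bytes_received += (unsigned long)bytes_recorded;
        }
        break;
    case MSG_CLOSE:
        s->is_open = 0;
        break;
    default:
        break;  /* unknown messages are ignored */
    }
}
```

Keeping the handler this small is deliberate: it only records what happened, which leaves the heavier data processing to run elsewhere.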

23 Application in real-time keyword recognition
To be continued...

24 Application in real-time keyword recognition
Practical problems when applying this model to speech recognition:
1. Asynchronism
2. Efficiency

25 Application in real-time keyword recognition
[Diagram labels: buffer1, buffer2, buffer3, buffer4, ..., buffer500; msg; callback function; CALL; data processing]
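One common way to address the asynchronism problem is to have the callback do nothing but note which buffer is ready, and let the recognizer pull buffers from a queue at its own pace. The sketch below shows such a queue in portable C. It is an assumption about how the decoupling could be done, not a description of the project's implementation: PendingQueue, its functions and QUEUE_CAPACITY are hypothetical. It is also single-threaded; since the device callback typically runs on a different thread from the processing code, a real program would need to protect the queue with a lock.

```c
#include <assert.h>
#include <stddef.h>

#define QUEUE_CAPACITY 8  /* arbitrary size chosen for this sketch */

typedef struct {
    int items[QUEUE_CAPACITY];  /* indices of buffers waiting for processing */
    int head;                   /* position of the oldest entry */
    int count;                  /* number of entries currently queued */
    int dropped;                /* buffers lost because the queue was full */
} PendingQueue;

void queue_init(PendingQueue *q)
{
    assert(q != NULL);
    q->head = 0;
    q->count = 0;
    q->dropped = 0;
}

/* Called from the callback side. Returns 1 if queued, 0 if the queue is full. */
int queue_push(PendingQueue *q, int buffer_index)
{
    assert(q != NULL);
    if (q->count == QUEUE_CAPACITY) {
        q->dropped++;  /* processing is falling behind the audio device */
        return 0;
    }
    q->items[(q->head + q->count) % QUEUE_CAPACITY] = buffer_index;
    q->count++;
    return 1;
}

/* Called from the processing side. Returns 1 and stores the oldest index,
 * or returns 0 if nothing is waiting. */
int queue_pop(PendingQueue *q, int *buffer_index)
{
    assert(q != NULL && buffer_index != NULL);
    if (q->count == 0) {
        return 0;
    }
    *buffer_index = q->items[q->head];
    q->head = (q->head + 1) % QUEUE_CAPACITY;
    q->count--;
    return 1;
}
```

The dropped counter also speaks to the efficiency problem: if it ever becomes nonzero, data processing is slower than real time and audio is being lost.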

26 Comparison and analysis
• MCI is the easiest model and very convenient, but it offers the least control ("file level").
• WaveX is more complicated, but allows flexible control of audio data ("buffer level").
• DirectSound is the most efficient method, but also the most complicated ("buffer level").

27 Conclusion
• Apply MCI to the audio document part of a "video conference".
• Apply WaveX to real-time speech recognition and also to "video conference".
• DirectSound is widely used in computer game design.

