Spik v1.0 Voice Commands Execution in a Windows Environment Dekel Abelson Eliran Dahan Instructor: Ari Todtfeld.

1 Spik v1.0 Voice Commands Execution in a Windows Environment Dekel Abelson Eliran Dahan Instructor: Ari Todtfeld

2 Objectives
Analysis and exploration of voice-recognition systems, their abilities and their limitations
Understanding the Windows architecture and programming concepts
Development and implementation of a tool that enables users to execute voice commands in a Windows environment, including building the tool's graphical user interface (GUI)
Learning the Microsoft Speech SDK 5.1 (Software Development Kit) and its speech engine

3 Project skills
C++ programming skills
XML (Extensible Markup Language) skills
Programming in a Windows environment, including Windows API (Application Programming Interface) calls

4 Brief history
1994: Dragon Systems releases “DragonDictate” for Windows 1.0, using discrete speech recognition technology
1996: IBM introduces “MedSpeak”, the first continuous speech recognition software
1997: Dragon Systems releases “NaturallySpeaking”, the first general-purpose continuous speech recognition program; two months later IBM releases its “ViaVoice”
2005: Thanks to improvements in PC processing speed and in the algorithms used, several speech recognition programs are now on the market

5 Voice recognition
Voice recognition follows these steps:
1. Spoken words enter a microphone
2. Audio is processed by the computer's sound card
3. The software discriminates between lower-frequency vowels and higher-frequency consonants and compares the results with phonemes, the smallest building blocks of speech. It then compares the results to groups of phonemes, and then to actual words, determining the most likely match
4. The sentence is transferred to a word processing application

6 Architecture
Voice command by the user
SAPI 5.1 (Speech Application Programming Interface)
Processing of the recognized commands by C++/XML code
Command execution using Windows API functions

7 GUI
Executable file: spik.exe
The GUI is a window that receives the voice commands from the user. It was built in C++ using the basic “Window” class.

8 SAPI 5.1
SAPI provides a high-level interface between the application and the speech engine
The TTS (Text-To-Speech) system synthesizes text strings and files into spoken audio
Speech recognizers convert human spoken audio into readable text strings

9 Processing
The main function contains the infinite loop that waits for messages to process
The main window procedure handles the messages sent to the window
Commands that have been identified by the speech engine are executed
Microsoft Speech Engine API functions
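
The loop-and-handler structure described on this slide can be sketched in portable C++. This is only a simulation under stated assumptions, not the Spik source: `Message`, `MsgKind`, `windowProc` and `runMessageLoop` are hypothetical names, and a pre-filled queue stands in for the Win32 message queue so the control flow can be followed without a window.

```cpp
#include <queue>
#include <string>
#include <vector>

// Hypothetical message kinds. The real program receives Win32 window
// messages; here only "a phrase was recognized" and "quit" are modelled.
enum class MsgKind { Recognition, Quit };

struct Message {
    MsgKind kind;
    std::string text;  // recognized phrase; empty for Quit
};

// Stand-in for the main window procedure: handles one message and records
// which command would be executed.
void windowProc(const Message& msg, std::vector<std::string>& executed) {
    if (msg.kind == MsgKind::Recognition) {
        executed.push_back(msg.text);
    }
}

// Stand-in for the loop in the main function: take messages one at a time
// and hand them to the window procedure until a Quit message arrives.
// A real message loop blocks while waiting; this one simply drains a queue.
std::vector<std::string> runMessageLoop(std::queue<Message> pending) {
    std::vector<std::string> executed;
    while (!pending.empty()) {
        Message msg = pending.front();
        pending.pop();
        if (msg.kind == MsgKind::Quit) {
            break;  // leave the loop; later messages are never handled
        }
        windowProc(msg, executed);
    }
    return executed;
}
```

Feeding the loop two recognition messages followed by a quit message returns the two phrases in arrival order; anything queued after the quit message is ignored.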

10 Commands Execution
The Windows API is a set of application programming interfaces available in the Microsoft Windows operating systems that enable developers to create software
The API consists of C functions implemented in dynamic-link libraries (DLLs), mainly the core DLLs kernel32.dll, user32.dll and gdi32.dll
Main API functions we have used:
CreateProcess(): runs executable files
WinExec(): runs a specified application
ShellExecute(): performs an operation (such as open) on a file or URL
ShowWindow(): sets the specified window's show state
SendMessage(): sends the specified message to a window or windows
keybd_event(): synthesizes a keystroke
PostMessage(): places (posts) a message in the message queue associated with the thread that created the specified window
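
As a rough illustration of how a recognized phrase might be turned into one of these API calls, the sketch below looks the phrase up in a small table and, on Windows only, passes the target to `ShellExecuteA`. The table contents, the Google URL, and the names `Action`, `resolve` and `execute` are assumptions made for this example; the slides do not show how Spik maps phrases to commands, and they list `CreateProcess()` for executables where this sketch uses `ShellExecuteA` for both cases to stay short.

```cpp
#include <algorithm>
#include <cctype>
#include <map>
#include <string>

#ifdef _WIN32
#include <windows.h>
#include <shellapi.h>  // ShellExecuteA; link with shell32
#endif

// Hypothetical action description for a recognized voice command.
enum class ActionKind { None, Launch, OpenUrl };

struct Action {
    ActionKind kind;
    std::string target;  // executable name or URL
};

// Lower-case the phrase so the lookup ignores capitalization.
std::string normalize(std::string phrase) {
    std::transform(phrase.begin(), phrase.end(), phrase.begin(),
                   [](unsigned char c) { return static_cast<char>(std::tolower(c)); });
    return phrase;
}

// Look up a phrase in an illustrative command table (assumed entries).
Action resolve(const std::string& phrase) {
    static const std::map<std::string, Action> table = {
        {"open calculator", {ActionKind::Launch, "calc.exe"}},
        {"notepad", {ActionKind::Launch, "notepad.exe"}},
        {"search google", {ActionKind::OpenUrl, "http://www.google.com"}},
    };
    auto it = table.find(normalize(phrase));
    if (it == table.end()) {
        return {ActionKind::None, ""};
    }
    return it->second;
}

// Carry out an action. On Windows this asks the shell to "open" the target;
// on other platforms it does nothing and reports failure.
bool execute(const Action& action) {
    if (action.kind == ActionKind::None) {
        return false;
    }
#ifdef _WIN32
    HINSTANCE result = ShellExecuteA(nullptr, "open", action.target.c_str(),
                                     nullptr, nullptr, SW_SHOWNORMAL);
    // ShellExecute reports success with a value greater than 32.
    return reinterpret_cast<INT_PTR>(result) > 32;
#else
    return false;
#endif
}
```

With this table, `execute(resolve("Open Calculator"))` would ask the shell to start calc.exe on a Windows machine, while an unknown phrase resolves to `ActionKind::None` and is ignored.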

11 The Code
C++ source code files
Header files of the program
Executable file of the program
XML-format text file used by the speech recognition engine
Text file containing strings used by the program
Compiled file used by the speech recognition engine
Header files of the speech recognition engine

12 Adaptation & Training
The speech recognition engine adapts itself to the user’s voice, vocabulary and speech style in order to improve recognition accuracy
After adaptation, recognition errors drop to about ¼ of their previous number
With further training, accuracy rises to around 95%

13 Voice command example
Calculator usage:
Say the voice command “Open Calculator” to run the calc.exe program
Say a simple exercise
Then say “Equal” or “Result” to show the solution

14 Voice command example
Run programs: notepad, command line
Internet usage: search google
Windows navigation: my documents, system properties, start menu, screen saver

15 Added value of the project
Advanced versions based on Spik v1.0 will be a helpful tool for physically challenged users, making it easier for them to use the computer and the web

16 Future Development
Advanced OS navigation in order to eliminate the use of the keyboard
Adding Speech-to-Text capabilities
Improved GUI to let users enter their own voice commands

17 Q&A

