
EEC-693/793 Applied Computer Vision with Depth Cameras


1 EEC-693/793 Applied Computer Vision with Depth Cameras
Lecture 12 Wenbing Zhao

2 Outline
Speech Recognition
How speech recognition works
Exploring Microsoft Speech API (SAPI)
Creating your own grammar and choices for the speech recognition engine
Draw What I Want – building a speech-enabled application

3 How Speech Recognition Works
The Kinect microphone array captures the audio stream and converts the analog audio into digital sound signals
The digital sound signals are sent to the speech recognition engine for recognition
The acoustic model of the speech recognition engine analyzes the audio and converts the sound into a number of basic speech elements, called phonemes
The language model then analyzes the content of the speech and matches the words by combining the phonemes against a built-in dictionary
The language model is context sensitive

4 How Speech Recognition Works

5 Types of Speech Recognition
Command mode: you say one command at a time for the speech recognition engine to recognize
Sentence mode / dictation mode: you say a sentence to perform an operation, e.g., "mirror the shape"
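Both modes are expressed as grammars in the Microsoft Speech API introduced on the following slides; a minimal sketch of the difference, using illustrative phrases that are not part of the lecture app:

using Microsoft.Speech.Recognition;

// Command mode: a grammar of single-word commands
var commandBuilder = new GrammarBuilder();
commandBuilder.Append(new Choices("draw", "clear", "quit"));
var commandGrammar = new Grammar(commandBuilder);

// Sentence mode: a grammar for a whole phrase such as "mirror the shape"
var sentenceBuilder = new GrammarBuilder();
sentenceBuilder.Append("mirror the shape");
var sentenceGrammar = new Grammar(sentenceBuilder);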

6 Microsoft Speech API Kinect SDK comes with the Microsoft Kinect speech recognition language pack

7 SpeechRecognitionEngine Class
The static InstalledRecognizers() method of the SpeechRecognitionEngine class returns the list of recognizers installed on the system, and we can filter them based on their properties (e.g., the "Kinect" flag and the culture)
The SpeechRecognitionEngine class accepts an audio stream from the Kinect sensor and processes it
The SpeechRecognitionEngine class raises a sequence of events as the audio stream is processed:
SpeechDetected is raised when the audio appears to be speech
SpeechHypothesized then fires one or more times as words are tentatively recognized
SpeechRecognized is raised when the recognizer matches the speech against a loaded grammar
If speech is detected but does not match a grammar, or the confidence level is very low, the SpeechRecognitionRejected event is raised instead
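A minimal sketch of subscribing to this event sequence (the lambda handlers are illustrative only; the full setup used in the lecture app appears on the later slides):

var engine = new SpeechRecognitionEngine(recognizerInfo.Id);
engine.SpeechDetected += (s, e) => Console.WriteLine("speech detected");
engine.SpeechHypothesized += (s, e) => Console.WriteLine("maybe: " + e.Result.Text);
engine.SpeechRecognized += (s, e) => Console.WriteLine("recognized: " + e.Result.Text);
engine.SpeechRecognitionRejected += (s, e) => Console.WriteLine("rejected");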

8 Steps for building speech-enabled apps
Enable the Kinect audio source
Start capturing the audio data stream
Identify the speech recognizer
Define the grammar for the speech recognizer
Start the speech recognizer
Attach the speech audio source to the recognizer
Register the event handlers for speech recognition
Handle the different events raised by the speech recognition engine

9 Identify the speech recognizer
private static RecognizerInfo GetKinectRecognizer()
{
    foreach (RecognizerInfo recognizer in SpeechRecognitionEngine.InstalledRecognizers())
    {
        string value;
        recognizer.AdditionalInfo.TryGetValue("Kinect", out value);
        if ("True".Equals(value, StringComparison.OrdinalIgnoreCase) &&
            "en-US".Equals(recognizer.Culture.Name, StringComparison.OrdinalIgnoreCase))
        {
            return recognizer;
        }
    }
    return null;
}

RecognizerInfo recognizerInfo = GetKinectRecognizer();

10 Define grammar for the speech recognizer
Using Choices and GrammarBuilder
Multiple sets of choices can be added to a GrammarBuilder
Creating a Grammar from the GrammarBuilder
Loading the Grammar into the speech recognizer

var grammarBuilder = new GrammarBuilder();
var colorObjects = new Choices();
colorObjects.Add("red");
colorObjects.Add("green");
colorObjects.Add("blue");
colorObjects.Add("yellow");
colorObjects.Add("gray");
// Add the color choices to the grammar builder
grammarBuilder.Append(colorObjects);
// Add another set of choices for the object shape
grammarBuilder.Append(new Choices("circle", "square", "triangle", "rectangle"));
// Create Grammar from GrammarBuilder
var grammar = new Grammar(grammarBuilder);

11 Define grammar for the speech recognizer
Can also build a grammar from an XML (SRGS) document

SrgsDocument grammarDoc = new SrgsDocument("mygrammar.xml");
Grammar grammar = new Grammar(grammarDoc);

<?xml version="1.0" encoding="UTF-8" ?>
<grammar version="1.0" xml:lang="en-US"
         xmlns="http://www.w3.org/2001/06/grammar"
         tag-format="semantics/1.0" root="color">
  <rule id="color" scope="public">
    <one-of>
      <item>red</item>
      <item>green</item>
      <item>blue</item>
    </one-of>
  </rule>
</grammar>

12 Define grammar for the speech recognizer
Multiple grammars can be loaded into the recognizer, as sketched below
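A minimal sketch of loading the two grammars built on the next slides (grammar and grammarClose) and telling them apart by name in the recognition result; the Name values are illustrative, not part of the lecture code:

grammar.Name = "DrawShape";
grammarClose.Name = "CloseApp";
speechEngine.LoadGrammar(grammar);
speechEngine.LoadGrammar(grammarClose);

// In the SpeechRecognized handler, e.Result.Grammar identifies which grammar matched:
// if (e.Result.Grammar.Name == "CloseApp") { this.Close(); }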

13 Define grammar for the speech recognizer
private void BuildGrammarforRecognizer(RecognizerInfo recognizerInfo)
{
    var grammarBuilder = new GrammarBuilder { Culture = recognizerInfo.Culture };
    // First say "draw"
    grammarBuilder.Append(new Choices("draw"));
    // Add the color choices
    var colorObjects = new Choices();
    colorObjects.Add("red");
    colorObjects.Add("green");
    colorObjects.Add("blue");
    colorObjects.Add("yellow");
    colorObjects.Add("gray");
    grammarBuilder.Append(colorObjects);
    // Add the shape choices
    grammarBuilder.Append(new Choices("circle", "square", "triangle", "rectangle"));
    // Create Grammar from GrammarBuilder
    var grammar = new Grammar(grammarBuilder);
    // Create another Grammar for "close the application"
    var newGrammarBuilder = new GrammarBuilder();
    newGrammarBuilder.Append("close the application");
    var grammarClose = new Grammar(newGrammarBuilder);
    // (continued on the next slide)

14 // Start the speech recognizer
    speechEngine = new SpeechRecognitionEngine(recognizerInfo.Id);
    speechEngine.LoadGrammar(grammar);      // loading grammar into recognizer
    speechEngine.LoadGrammar(grammarClose);

    // Attach the speech audio source to the recognizer
    int samplesPerSecond = 16000;
    int bitsPerSample = 16;
    int channels = 1;
    int averageBytesPerSecond = 32000;
    int blockAlign = 2;
    speechEngine.SetInputToAudioStream(audioStream,
        new SpeechAudioFormatInfo(EncodingFormat.Pcm, samplesPerSecond, bitsPerSample,
            channels, averageBytesPerSecond, blockAlign, null));

    // Register the event handlers for speech recognition
    speechEngine.SpeechRecognized += speechRecognized;
    speechEngine.SpeechHypothesized += speechHypothesized;
    speechEngine.SpeechRecognitionRejected += speechRecognitionRejected;
    speechEngine.RecognizeAsync(RecognizeMode.Multiple);
}

RecognizeAsync(RecognizeMode.Multiple): starts an asynchronous recognition operation that continues until it is stopped or cancelled; the recognizer matches the audio against its loaded and enabled speech recognition grammars
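The audio format values are consistent with the 16 kHz, 16-bit, mono PCM stream that the Kinect audio source produces: averageBytesPerSecond = 16000 samples/s × 2 bytes/sample × 1 channel = 32000, and blockAlign = 1 channel × 2 bytes/sample = 2.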

15 Handle the different events invoked by the speech recognition engine
private void speechRecognitionRejected(object sender, SpeechRecognitionRejectedEventArgs e)
{
}

private void speechHypothesized(object sender, SpeechHypothesizedEventArgs e)
{
    wordsTenative.Text = e.Result.Text;
}

private void speechRecognized(object sender, SpeechRecognizedEventArgs e)
{
    wordsRecognized.Text = e.Result.Text;
    confidenceTxt.Text = e.Result.Confidence.ToString();
    float confidenceThreshold = 0.6f;
    if (e.Result.Confidence > confidenceThreshold)
        CommandsParser(e);
}

16 private void CommandsParser(SpeechRecognizedEventArgs e)
{
    var result = e.Result;
    Color objectColor;
    Shape drawObject;
    System.Collections.ObjectModel.ReadOnlyCollection<RecognizedWordUnit> words = e.Result.Words;
    if (words[0].Text == "draw")
    {
        string colorObject = words[1].Text;
        switch (colorObject)
        {
            case "red": objectColor = Colors.Red; break;
            case "green": objectColor = Colors.Green; break;
            case "blue": objectColor = Colors.Blue; break;
            case "yellow": objectColor = Colors.Yellow; break;
            case "gray": objectColor = Colors.Gray; break;
            default: return;
        }

17 var shapeString = words[2].Text;
        switch (shapeString)
        {
            case "circle":
                drawObject = new Ellipse();
                drawObject.Width = 100;
                drawObject.Height = 100;
                break;
            case "square":
                drawObject = new Rectangle();
                drawObject.Width = 100;
                drawObject.Height = 100;
                break;
            case "rectangle":
                drawObject = new Rectangle();
                drawObject.Width = 100;
                drawObject.Height = 60;
                break;
            case "triangle":
                var polygon = new Polygon();
                polygon.Points.Add(new Point(0, 30));
                polygon.Points.Add(new Point(-60, -30));
                polygon.Points.Add(new Point(60, -30));
                drawObject = polygon;
                break;
            default:
                return;
        }

18 canvas1.Children.Clear();
        drawObject.SetValue(Canvas.LeftProperty, 80.0);
        drawObject.SetValue(Canvas.TopProperty, 80.0);
        drawObject.Fill = new SolidColorBrush(objectColor);
        canvas1.Children.Add(drawObject);
    }
    if (words[0].Text == "close" && words[1].Text == "the" && words[2].Text == "application")
    {
        this.Close();
    }
}

19 Build KinectAudio App
Create a new C# WPF project with the name DrawShapeFromSpeech
Add the Microsoft.Kinect reference
Add the Microsoft.Speech reference (not System.Speech!)
Design the GUI
Add the code

20 Add Microsoft.Speech assembly

21 GUI Design Canvas
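The GUI is shown on the slide as an image; a minimal XAML sketch using the element names the code-behind expects (the layout, sizes, and extra styling here are assumptions):

<Window x:Class="DrawShapeFromSpeech.MainWindow"
        xmlns="http://schemas.microsoft.com/winfx/2006/xaml/presentation"
        xmlns:x="http://schemas.microsoft.com/winfx/2006/xaml"
        Title="Draw What I Want" Height="480" Width="640">
    <Grid>
        <!-- Canvas where the spoken shape is drawn -->
        <Canvas Name="canvas1" Background="White" Margin="10,10,10,120" />
        <!-- Text fields updated by the speech event handlers -->
        <TextBlock Name="wordsTenative" Margin="10,0,10,90" Height="24" VerticalAlignment="Bottom" />
        <TextBlock Name="wordsRecognized" Margin="10,0,10,60" Height="24" VerticalAlignment="Bottom" />
        <TextBlock Name="confidenceTxt" Margin="10,0,10,30" Height="24" VerticalAlignment="Bottom" />
        <TextBlock Name="statusBar" Margin="10,0,10,5" Height="24" VerticalAlignment="Bottom" />
    </Grid>
</Window>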

22 Adding Code
Import namespaces
Add member variables
Register the WindowLoaded event handler programmatically

using Microsoft.Kinect;
using Microsoft.Speech.Recognition;
using Microsoft.Speech.AudioFormat;
using System.IO;

KinectSensor sensor;
Stream audioStream;
SpeechRecognitionEngine speechEngine;

public MainWindow()
{
    InitializeComponent();
    Loaded += new RoutedEventHandler(WindowLoaded);
}

23 Adding Code: WindowLoaded
private void WindowLoaded(object sender, RoutedEventArgs e)
{
    this.sensor = KinectSensor.KinectSensors[0];
    this.sensor.Start();
    audioStream = this.sensor.AudioSource.Start();
    RecognizerInfo recognizerInfo = GetKinectRecognizer();
    if (recognizerInfo == null)
    {
        MessageBox.Show("Could not find Kinect speech recognizer");
        return;
    }
    BuildGrammarforRecognizer(recognizerInfo); // provided earlier
    statusBar.Text = "Speech Recognizer is ready";
}
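The slides do not show shutdown code; a minimal cleanup sketch, assuming a Closing handler is registered in the constructor alongside Loaded (e.g., Closing += WindowClosing):

private void WindowClosing(object sender, System.ComponentModel.CancelEventArgs e)
{
    // Hypothetical cleanup, not part of the lecture code
    if (speechEngine != null)
    {
        speechEngine.RecognizeAsyncCancel();  // stop continuous recognition
        speechEngine.Dispose();
    }
    if (sensor != null)
    {
        sensor.AudioSource.Stop();            // stop the Kinect audio stream
        sensor.Stop();
    }
}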

24 Adding Code: code provided earlier
Add the event handler for speechHypothesized
Add the event handler for speechRecognized: CommandsParser() is invoked, which draws the shape spoken
You can close the app by saying: "close the application"
Add the event handler for speechRecognitionRejected (left empty), as sketched below
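The rejected handler is left empty on the slides; one possible implementation (an assumption, not part of the lecture code) is to surface the rejection in the status bar:

private void speechRecognitionRejected(object sender, SpeechRecognitionRejectedEventArgs e)
{
    // Hypothetical feedback: the last utterance did not match any loaded grammar
    statusBar.Text = "Speech not recognized, please try again";
}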

25 Challenge Tasks
For advanced students, improve the app in the following ways:
Enable both the color image and skeleton data streams
Display the color image frames (but not the skeleton)
Modify the grammar such that you can draw a particular shape at a particular joint location, e.g., "draw a red circle at the right eye" (a grammar sketch follows)
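A possible grammar extension for the last task; the joint names and the mapping to skeleton positions are assumptions for illustration, not part of the lecture code:

// Extend BuildGrammarforRecognizer: "draw" + color + shape + "at the" + joint
grammarBuilder.Append(new Choices("draw"));
grammarBuilder.Append(colorObjects);
grammarBuilder.Append(new Choices("circle", "square", "triangle", "rectangle"));
grammarBuilder.Append("at the");
grammarBuilder.Append(new Choices("head", "right hand", "left hand", "right knee", "left knee"));
// In CommandsParser, the words after "at the" name the joint; map that joint's
// skeleton position to canvas coordinates and draw there instead of the fixed (80, 80).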

