EEC-693/793 Applied Computer Vision with Depth Cameras


EEC-693/793 Applied Computer Vision with Depth Cameras Lecture 11 Wenbing Zhao wenbing@ieee.org

Outline
- Using Kinect's microphone array
- Verifying the Kinect audio configuration
- The Kinect SDK architecture for audio
- How Kinect processes audio signals
- Inside Kinect's microphone array
- Capturing and playing audio
- Processing audio data by suppressing and canceling noise
- Understanding sound source localization and beam forming

Verifying the Kinect audio configuration
- Navigate to Control Panel | Device Manager, look for the Kinect for Windows node, and find Kinect for Windows Audio Array Control
- Also check the Sound, video and game controllers node for the sound driver of the microphone array

Troubleshooting: Kinect USB Audio not recognized
- While installing the SDK, make sure the Kinect device is unplugged
- Once the installation is done, restart your system before using Kinect
- It is always recommended to assign a dedicated USB controller to the Kinect sensor

Using the Kinect microphone array with your computer
- Navigate to Control Panel | Sound, switch to the Recording tab, find Kinect's Microphone Array as a recording device, and set it as the default
- You can test it using the Windows sound recorder

The Kinect SDK architecture for Audio
- Captured audio passes through a DirectX Media Object (DMO) before it reaches your application
- The DMO provides the audio processing (noise suppression, acoustic echo cancellation, automatic gain control) that the KinectAudioSource class exposes

DirectX Media Object (DMO)
- The DMO controls Noise Suppression (NS), Acoustic Echo Cancellation (AEC), and Automatic Gain Control (AGC)
- The Kinect SDK exposes a set of APIs to control these features

Major focus areas of Kinect audio
- Human speech recognition: recognize the player's voice despite loud noise and echoes
- Identify speech within a dynamic area: while playing, the player could change position, or multiple players could be speaking from different directions

Why a Microphone Array?
- The logic behind placing microphones in different positions is to identify: the direction of the incoming sound, and the distance of the sound source (the origin of the sound)
- Because the microphones sit in different positions, the sound arrives at each of them at slightly different times
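The arrival-time idea can be illustrated with a small sketch. This is Python for brevity rather than the slide deck's C#, and the two-microphone setup, spacing, and far-field assumption are illustrative only, not Kinect's actual internals:

```python
import math

SPEED_OF_SOUND = 343.0  # m/s in air at roughly 20 degrees C

def source_angle_deg(delay_s, mic_spacing_m):
    """Estimate the direction of a far-field sound source from the
    arrival-time difference between two microphones.

    A source directly ahead (angle 0) reaches both microphones at the
    same time; the larger the delay, the further off-axis the source.
    """
    # For a far-field source, the path-length difference between the
    # microphones, as a fraction of their spacing, equals sin(angle)
    s = SPEED_OF_SOUND * delay_s / mic_spacing_m
    s = max(-1.0, min(1.0, s))  # clamp away floating-point noise
    return math.degrees(math.asin(s))
```

With more than two microphones (Kinect has four), several such pairwise estimates can be combined for a more robust direction and a rough distance estimate.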

Audio signal processing in Kinect
- Once the source and position of the sound are calculated, the audio-processing pipeline merges the signals of all microphones and produces a signal with high-quality sound
- Kinect can identify the sound source within a range of -50 to +50 degrees (NOT radians as stated in the textbook!!!)
- KinectAudioSource class: MaxSoundSourceAngle, MinSoundSourceAngle

Audio signal processing in Kinect
- The SDK fires the SoundSourceAngleChanged event if there is any change in the source angle
- The SoundSourceAngleChangedEventArgs class contains two properties: the current source angle and the confidence that the sound source angle is correct
- The sound source angle identifies the direction (not the location) of a sound source; the range of the angle is [-50, +50] degrees
- Confidence level range: 0.0 (no confidence) to 1.0 (full confidence); the KinectAudioSource class has the SoundSourceAngleConfidence property

Beam Forming
- Like the cone of light from a lighthouse where the light is the brightest, the audio capture hardware has an imaginary cone within which it captures audio signals best
- Audio waves that propagate through the length of the cone can be separated from audio waves that travel across the cone
- If you point the cone in the direction of the audio that your application is most interested in capturing, you can improve the ability to capture and separate that audio source from other competing audio sources
- Use the beam angle to set the direction of the imaginary cone to improve your ability to capture a specific audio source
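The "cone" can be pictured as delay-and-sum beamforming: each microphone's signal is shifted by a steering delay so that sound from the chosen direction lines up, then the channels are averaged. A toy Python sketch under simplifying assumptions (whole-sample delays only; a real pipeline uses fractional delays and filtering):

```python
def delay_and_sum(signals, delays):
    """Steer a 'beam': shift each channel by its steering delay
    (in whole samples) so the target direction aligns, then
    average the channels.

    signals: list of equal-rate sample lists, one per microphone
    delays:  per-channel steering delay in whole samples
    """
    # usable output length after shifting every channel
    n = min(len(sig) - d for sig, d in zip(signals, delays))
    return [
        sum(sig[d + i] for sig, d in zip(signals, delays)) / len(signals)
        for i in range(n)
    ]
```

Aligned channels add constructively, while off-axis sound stays misaligned and partially cancels, which is what separates waves traveling "through the length of the cone" from waves traveling across it.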

Beam Angle
- The beam angle identifies a preferred direction for the sensor to listen
- The angle is one of the following values (in degrees): {-50, -40, -30, -20, -10, 0, +10, +20, +30, +40, +50}
- The sign determines direction: a negative number indicates the audio source is on the right side of the sensor (left side of the user); a positive value indicates the audio source is on the left side of the sensor (right side of the user); 0 indicates the audio source is centered in front of the sensor

Sound Source vs. Beam Angle
- The sound source angle and the beam angle are both defined from the sensor location, in the x-z plane of the sensor; however, the two angles have completely different functionality
- The sound source angle identifies the direction (not the location) of a sound source; the range of the angle is [-50, +50] degrees
- The beam angle identifies a preferred direction for the sensor to listen; the angle is one of the following values (in degrees): {-50, -40, -30, -20, -10, 0, +10, +20, +30, +40, +50}
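Since the sound source angle is continuous while the beam angle only takes the 11 discrete values above, picking a beam to match a measured source reduces to snapping to the nearest multiple of 10 degrees. A hypothetical helper (not part of the SDK, Python for illustration):

```python
def nearest_beam_angle(source_angle_deg):
    """Snap a continuous sound source angle to the nearest legal
    beam angle in {-50, -40, ..., +40, +50} degrees."""
    snapped = 10 * round(source_angle_deg / 10)
    return max(-50, min(50, snapped))
```

This is essentially what an application does by hand in Manual beam angle mode when it wants the beam to follow the detected speaker.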

Sound Source vs. Beam Angle
- Both angles (beam and sound source) are updated continuously once the sensor has started streaming audio data (when the Start method is called)
- Use the sound source angle to decide which beam angle to choose if you want to capture a particular sound source

Programming Audio Source
- Start/stop audio source: the Kinect sensor must be in the running state in order to start the audio stream; if you try to start the audio stream while the sensor is not running, the application will throw an InvalidOperationException
- Register sound source angle and beam angle events
- Specify audio processing parameters
- Manually control the beam angle
- Recording audio

this.sensor.AudioSource.Start();
this.sensor.AudioSource.Stop();
this.sensor.AudioSource.SoundSourceAngleChanged += soundSourceAngleChanged;
this.sensor.AudioSource.BeamAngleChanged += beamAngleChanged;

Setting Audio Processing Parameters
- Audio data can be processed by interacting with the DMO via the KinectAudioSource class
- Echo cancellation: CancellationOnly, CancellationAndSuppression, None (default setting)
- Noise suppression (enabled by default)
- Automatic gain control: amplifies weak audio, disabled by default

this.sensor.AudioSource.EchoCancellationMode = EchoCancellationMode.CancellationOnly;
this.sensor.AudioSource.EchoCancellationSpeakerIndex = 0;
this.sensor.AudioSource.NoiseSuppression = true;
this.sensor.AudioSource.AutomaticGainControlEnabled = true;

Beam Angle Mode
- BeamAngleMode enumeration
- Automatic and Adaptive: the Kinect runtime sets the beam angle automatically
- Automatic: default setting; use this for a low-volume loudspeaker and/or isotropic background noise
- Adaptive: use this for a high-volume loudspeaker and/or higher noise levels
- Manual: the user sets the beam angle to point in the direction of the audio source of interest; the beam angle lies in the x-z plane, between the z-axis and the direction of the audio source; set the angle using the ManualBeamAngle property

this.sensor.AudioSource.BeamAngleMode = BeamAngleMode.Automatic;
this.sensor.AudioSource.BeamAngleMode = BeamAngleMode.Manual;
// Point the manual beam angle at the right-hand position
this.sensor.AudioSource.ManualBeamAngle = Math.Atan(rightHand.Position.X / rightHand.Position.Z) * (180 / Math.PI);
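The Math.Atan(x / z) expression on this slide converts a joint position in sensor space into a beam angle in degrees. The same arithmetic in Python (the joint coordinates in the usage note are made-up values):

```python
import math

def beam_angle_toward(x, z):
    """Angle (degrees) in the x-z plane from the sensor's z-axis to a
    point at (x, z), matching Math.Atan(x / z) * (180 / Math.PI)."""
    return math.atan(x / z) * (180.0 / math.pi)
```

For example, a right hand at x = 0.5 m, z = 2.0 m gives roughly 14 degrees, comfortably inside the manual beam's [-50, +50] degree range.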

Recording Audio
RecordAudio() is blocking

public void RecordAudio()
{
    // 10 seconds of mono 16 kHz, 16-bit (2 bytes/sample) audio
    int recordingLength = 10 * 2 * 16000;
    byte[] buffer = new byte[1024];
    bool startAudioStreamHere = false;
    using (FileStream fileStream = new FileStream(wavfilename, FileMode.Create))
    {
        WriteWavHeader(fileStream, recordingLength);
        if (audioStream == null)
        {
            startAudioStreamHere = true;
            audioStream = this.sensor.AudioSource.Start();
        }
        int count, totalCount = 0;
        while ((count = audioStream.Read(buffer, 0, buffer.Length)) > 0
               && totalCount < recordingLength)
        {
            fileStream.Write(buffer, 0, count);
            totalCount += count;
        }
    }
    if (startAudioStreamHere)
        this.sensor.AudioSource.Stop();
}

Recording Audio
To avoid blocking the UI, run RecordAudio on a separate thread

private void startBtn_Click(object sender, RoutedEventArgs e)
{
    var audioThread = new Thread(new ThreadStart(RecordAudio));
    audioThread.SetApartmentState(ApartmentState.MTA);
    audioThread.Start();
}

Recording Audio Helper Methods

static void WriteWavHeader(Stream stream, int dataLength)
{
    using (var memStream = new MemoryStream(64))
    {
        int cbFormat = 18; // sizeof(WAVEFORMATEX)
        WAVEFORMATEX format = new WAVEFORMATEX()
        {
            wFormatTag = 1, nChannels = 1, nSamplesPerSec = 16000,
            nAvgBytesPerSec = 32000, nBlockAlign = 2,
            wBitsPerSample = 16, cbSize = 0
        };
        using (var binarywriter = new BinaryWriter(memStream))
        {
            // RIFF header
            WriteString(memStream, "RIFF");
            binarywriter.Write(dataLength + cbFormat + 4);
            WriteString(memStream, "WAVE");
            WriteString(memStream, "fmt ");
            binarywriter.Write(cbFormat);
            // WAVEFORMATEX
            binarywriter.Write(format.wFormatTag);
            binarywriter.Write(format.nChannels);
            binarywriter.Write(format.nSamplesPerSec);
            binarywriter.Write(format.nAvgBytesPerSec);
            binarywriter.Write(format.nBlockAlign);
            binarywriter.Write(format.wBitsPerSample);
            binarywriter.Write(format.cbSize);
            // data header
            WriteString(memStream, "data");
            binarywriter.Write(dataLength);
            memStream.WriteTo(stream);
        }
    }
}

struct WAVEFORMATEX
{
    public ushort wFormatTag;
    public ushort nChannels;
    public uint nSamplesPerSec;
    public uint nAvgBytesPerSec;
    public ushort nBlockAlign;
    public ushort wBitsPerSample;
    public ushort cbSize;
}

static void WriteString(Stream stream, string s)
{
    byte[] bytes = Encoding.ASCII.GetBytes(s);
    stream.Write(bytes, 0, bytes.Length);
}
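The header layout that WriteWavHeader emits can be cross-checked with a few lines of Python using struct; the field values mirror the slide (16 kHz, mono, 16-bit PCM, and the 18-byte WAVEFORMATEX with cbSize = 0):

```python
import struct

def wav_header(data_length, rate=16000, channels=1, bits=16):
    """Build the same 46-byte header as the slide's WriteWavHeader:
    RIFF chunk, 18-byte WAVEFORMATEX 'fmt ' chunk, then 'data'."""
    block_align = channels * bits // 8
    byte_rate = rate * block_align
    cb_format = 18  # sizeof(WAVEFORMATEX)
    header = b"RIFF" + struct.pack("<I", data_length + cb_format + 4)
    header += b"WAVE" + b"fmt " + struct.pack("<I", cb_format)
    # wFormatTag=1 (PCM), nChannels, nSamplesPerSec, nAvgBytesPerSec,
    # nBlockAlign, wBitsPerSample, cbSize
    header += struct.pack("<HHIIHHH", 1, channels, rate,
                          byte_rate, block_align, bits, 0)
    header += b"data" + struct.pack("<I", data_length)
    return header
```

For the 10-second recording on the earlier slide, data_length is 10 * 2 * 16000 = 320000 bytes, giving a 46-byte header followed by the raw PCM samples.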

Build KinectAudio App
- Create a new C# WPF project with the name KinectAudio
- Add the Microsoft.Kinect reference
- Design the GUI
- Add code

GUI Design: Image and MediaElement controls

Adding Code
- Import namespaces
- Add member variables
- Register the WindowLoaded event handler programmatically

using Microsoft.Kinect;
using System.IO;
using System.Threading;

KinectSensor sensor;
WriteableBitmap colorBitmap;
byte[] colorPixels;
// Skeleton[] totalSkeleton = new Skeleton[6];
Stream audioStream;
string wavfilename = "c:\\temp\\kinectAudio.wav";

public MainWindow()
{
    InitializeComponent();
    Loaded += new RoutedEventHandler(WindowLoaded);
}

Adding Code: WindowLoaded

private void WindowLoaded(object sender, RoutedEventArgs e)
{
    this.sensor = KinectSensor.KinectSensors[0];
    this.sensor.SkeletonStream.Enable();
    this.sensor.ColorStream.Enable();
    this.sensor.AllFramesReady += allFramesReady;
    this.colorPixels = new byte[this.sensor.ColorStream.FramePixelDataLength];
    this.colorBitmap = new WriteableBitmap(this.sensor.ColorStream.FrameWidth,
        this.sensor.ColorStream.FrameHeight, 96.0, 96.0, PixelFormats.Bgr32, null);
    this.image1.Source = this.colorBitmap;
    this.sensor.AudioSource.SoundSourceAngleChanged += soundSourceAngleChanged;
    this.sensor.AudioSource.BeamAngleChanged += beamAngleChanged;
    // start the sensor.
    this.sensor.Start();
}

Adding Code: AudioStream Start/Stop
Start/stop via button clicks

private void startAudioStreamBtn_Click(object sender, RoutedEventArgs e)
{
    audioStream = this.sensor.AudioSource.Start();
}

private void stopAudioStreamBtn_Click(object sender, RoutedEventArgs e)
{
    this.sensor.AudioSource.Stop();
}

Adding Code: Audio Event Handlers

void beamAngleChanged(object sender, BeamAngleChangedEventArgs e)
{
    this.soundBeamAngle.Text = e.Angle.ToString();
}

void soundSourceAngleChanged(object sender, SoundSourceAngleChangedEventArgs e)
{
    this.soundSourceAngle.Text = e.Angle.ToString();
    this.confidenceLevel.Text = e.ConfidenceLevel.ToString();
}

Adding Code: AllFramesReady Event Handler

void allFramesReady(object sender, AllFramesReadyEventArgs e)
{
    using (ColorImageFrame imageFrame = e.OpenColorImageFrame())
    {
        if (null == imageFrame)
            return;
        imageFrame.CopyPixelDataTo(colorPixels);
        int stride = imageFrame.Width * imageFrame.BytesPerPixel;
        this.colorBitmap.WritePixels(
            new Int32Rect(0, 0, this.colorBitmap.PixelWidth, this.colorBitmap.PixelHeight),
            this.colorPixels, stride, 0);
    }
}

Recording/Replay Button Click Events

private void startBtn_Click(object sender, RoutedEventArgs e)
{
    var audioThread = new Thread(new ThreadStart(RecordAudio));
    audioThread.SetApartmentState(ApartmentState.MTA);
    audioThread.Start();
}

private void playBtn_Click(object sender, RoutedEventArgs e)
{
    if (!string.IsNullOrEmpty(wavfilename) && File.Exists(wavfilename))
    {
        kinectaudioPlayer.Source = new Uri(wavfilename, UriKind.RelativeOrAbsolute);
        kinectaudioPlayer.LoadedBehavior = MediaState.Play;
        kinectaudioPlayer.UnloadedBehavior = MediaState.Close;
    }
}

Handle Checkbox Events
Must register both checked and unchecked event handlers for each checkbox

private void noiseSuppression_Checked(object sender, RoutedEventArgs e)
{
    this.sensor.AudioSource.NoiseSuppression = true;
}

private void echoCancellation_Checked(object sender, RoutedEventArgs e)
{
    this.sensor.AudioSource.EchoCancellationMode = EchoCancellationMode.CancellationOnly;
    this.sensor.AudioSource.EchoCancellationSpeakerIndex = 0;
}

private void gainControl_Checked(object sender, RoutedEventArgs e)
{
    this.sensor.AudioSource.AutomaticGainControlEnabled = true;
}

Handle Checkbox Events

private void gainControl_Unchecked(object sender, RoutedEventArgs e)
{
    this.sensor.AudioSource.AutomaticGainControlEnabled = false;
}

private void echoCancellation_Unchecked(object sender, RoutedEventArgs e)
{
    this.sensor.AudioSource.EchoCancellationMode = EchoCancellationMode.None;
}

private void noiseSuppression_Unchecked(object sender, RoutedEventArgs e)
{
    this.sensor.AudioSource.NoiseSuppression = false;
}

Challenge Tasks
For advanced students, improve the KinectAudio app in the following ways:
- Add a ProgressBar for recording and replaying
- Draw a vertical line in the color image indicating the audio source angle, and another line indicating the beam angle
- Manually set the beam angle to point at a particular joint, such as the right hand
- Observe the sound source angle with the new beam angle