Machine Listening in Silicon


1 Machine Listening in Silicon
Part of: “Accelerated Perception & Machine Learning in Stochastic Silicon” project

2 Who?
UIUC students: M. Kim, J. Choi, A. Guzman-Rivera, G. Ko, S. Tsai, E. Kim
UIUC faculty: Paris Smaragdis, Rob Rutenbar, Naresh Shanbhag
Intel: Jeff Parkhurst
Ryszard Dyrga, Tomasz Szmelczynski – Intel Technology Poland
Georg Stemmer – Intel, Germany
Dan Wartski, Ohad Falik – Intel Audio Voice and Speech (AVS), Israel

3 Project overview
Motivating ideas:
- Make machines that can perceive
- Use stochastic hardware for stochastic software
- Discover new modes of computation
Machine Listening component:
- Perceive == Listen
- Escape the local optimum of Gaussian/MSE/ℓ2

4 Machine Listening?
Making systems that understand sound
- Think computer vision, but for sound
Broad range of fundamentals and applications
- Machine learning, DSP, psychoacoustics, music, …
- Speech, media analysis, surveying, monitoring, …
What can we gather from this?

5 Machine listening in the wild
Some of this work is already in place
- Mostly projects on recognition and detection
- More apps in medical, mechanical, geological, architectural, …
Examples: highlight discovery in videos, incident discovery in streets, surveillance for emergencies

6 And there’s more to come
The CrowdMic project
- “PhotoSynth for audio”: construct audio recordings from crowdsourced audio snippets
Collaborative audio devices
- Harnessing the power of untethered open mics
- E.g. a conf-call using all the phones and laptops in the room

7 The Challenge
Today is all about small form factors
- We all carry a couple of mics in our pockets, but we don’t carry the vector processors they need!
Can we come up with new, better systems?
- Which run on more efficient hardware?
- And perform just as well, or better?

8 The Testbed: Sound Mixtures
Sound has a pesky property: additivity
- We almost always observe sound mixtures
Models for sound analysis are “monophonic”
- Designed for isolated, clean sounds
- So we like to first extract and then process
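Additivity is easy to state but worth seeing. A toy sketch (the tones and sample rate below are invented for illustration): a microphone records only the sample-wise sum, which is neither clean source.

```python
import numpy as np

fs = 16000                                # assumed sample rate
t = np.arange(fs) / fs
s1 = np.sin(2 * np.pi * 440 * t)          # clean source 1: a 440 Hz tone
s2 = np.sin(2 * np.pi * 660 * t)          # clean source 2: a 660 Hz tone
mix = s1 + s2                             # what a microphone actually observes
# A "monophonic" model expects s1 or s2 alone; the mixture is neither,
# which is why we extract first and process second.
```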

9 Focusing on a single sound
There’s no shortage of methods (they all suck, by the way)
- But these are computationally some of the most demanding algorithms in audio processing
So we instead went with a different approach that would be a good fit for hardware
- i.e. Rob told me that he can do MRFs fast

10 A bit of background
We like to visualize sounds as spectrograms
- 2D representations of energy over time and frequency
For multiple mics we observe level differences
- These are known as ILDs (Interaural Level Differences)
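As a concrete sketch (the window size and hop are arbitrary choices, not the talk's settings), a magnitude spectrogram is just windowed FFTs stacked over time:

```python
import numpy as np

def spectrogram(x, n_fft=512, hop=256):
    """Magnitude spectrogram: energy over time and frequency."""
    win = np.hanning(n_fft)
    frames = [x[i:i + n_fft] * win
              for i in range(0, len(x) - n_fft + 1, hop)]
    # rows: frequency bins, columns: time frames
    return np.abs(np.fft.rfft(frames, axis=1)).T

fs = 16000
t = np.arange(fs) / fs
x = np.sin(2 * np.pi * 440 * t)       # one second of a 440 Hz tone
S = spectrogram(x)                    # shape: (n_fft // 2 + 1, n_frames)
```

The tone shows up as a bright horizontal ridge near bin 14 (440 Hz / 31.25 Hz-per-bin).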

11 Finding sources
For each spectrogram pixel we take an ILD
- And plot their histogram
- Each sound/location will produce a mode
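A toy version of that histogram (synthetic spectrograms; each pixel is dominated by one of two sources at ±6 dB, all numbers invented for illustration):

```python
import numpy as np

rng = np.random.default_rng(0)
F, T = 257, 100                            # freq bins x time frames (made up)
mask = rng.random((F, T)) < 0.5            # which source dominates each pixel
base = 0.1 + rng.random((F, T))            # shared magnitude envelope
left = np.where(mask, 2.0, 1.0) * base     # source A is 2x louder on the left
right = np.where(mask, 1.0, 2.0) * base    # source B is 2x louder on the right

ild = 20 * np.log10(left / right)          # per-pixel ILD in dB
hist, edges = np.histogram(ild, bins=41, range=(-10, 10))
# The histogram has two modes, near -6 dB and +6 dB: one per source/location.
```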

12 And we use these as labels
Assign each pixel to a source, et voilà
- But it looks a little ragged
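Turning ILDs into labels is then a nearest-mode assignment per pixel. A sketch with invented modes and noisy per-pixel ILDs:

```python
import numpy as np

rng = np.random.default_rng(1)
modes = np.array([-6.0, 6.0])              # histogram modes, one per source
# Hypothetical noisy per-pixel ILDs drawn around those modes:
true = rng.integers(0, 2, size=(257, 100))
ild = modes[true] + rng.normal(0.0, 1.0, size=(257, 100))

# Label each spectrogram pixel with its nearest mode: the raw (ragged) mask.
labels = np.abs(ild[..., None] - modes).argmin(axis=-1)
```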

13 Thus, a Markov Random Field
- Each pixel is a node that influences its neighbors
- Incorporates ILDs and smoothness constraints
- Makes my hardware friends happy
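In energy terms (a minimal sketch, not the paper's exact costs): each pixel pays a data cost for its label, plus a Potts penalty for every neighbor it disagrees with.

```python
import numpy as np

def mrf_energy(labels, data_cost, lam=1.0):
    """Energy of a binary pairwise MRF on the spectrogram grid.

    data_cost[f, t, k]: cost of assigning pixel (f, t) to source k
    Smoothness: Potts penalty lam for each disagreeing neighbor pair.
    """
    F, T = labels.shape
    e = data_cost[np.arange(F)[:, None], np.arange(T)[None, :], labels].sum()
    e += lam * (labels[1:, :] != labels[:-1, :]).sum()   # frequency neighbors
    e += lam * (labels[:, 1:] != labels[:, :-1]).sum()   # time neighbors
    return e

raw = np.array([[0, 1], [0, 0]])          # a lone "ragged" pixel
print(mrf_energy(raw, np.zeros((2, 2, 2)), lam=1.0))   # 2.0
```

Inference then searches for the labeling with the lowest total energy, trading ILD evidence against smoothness.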

14 The whole pipeline
[Diagram: LEFT and RIGHT spectrograms (freq/time) feed the observed ILDs into a binary, pairwise MRF; inference via sequential tree-reweighted message passing (TRW-S) yields a binary mask, i.e. which frequencies belong to which source at each time point, separating source0 from source1]
Result: a ~15 dB SIR boost
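TRW-S itself is intricate; as a stand-in, here is the simplest scheme that minimizes the same binary pairwise energy, iterated conditional modes (ICM), which is greedy rather than message passing:

```python
import numpy as np

def icm(data_cost, lam=1.0, iters=10):
    """Greedy MRF inference (ICM), a simple stand-in for TRW-S.

    Flips each pixel's label to whichever choice lowers its data
    cost plus the Potts disagreement with its 4 neighbors.
    """
    F, T, K = data_cost.shape
    labels = data_cost.argmin(axis=-1)          # start from the raw mask
    for _ in range(iters):
        changed = False
        for f in range(F):
            for t in range(T):
                best, best_cost = labels[f, t], np.inf
                for k in range(K):
                    c = data_cost[f, t, k]
                    for df, dt in ((1, 0), (-1, 0), (0, 1), (0, -1)):
                        nf, nt = f + df, t + dt
                        if 0 <= nf < F and 0 <= nt < T:
                            c += lam * (labels[nf, nt] != k)
                    if c < best_cost:
                        best, best_cost = k, c
                if best != labels[f, t]:
                    labels[f, t] = best
                    changed = True
        if not changed:
            break
    return labels
```

On a 3x3 grid where one isolated pixel weakly prefers the other label, the smoothness term votes it back, which is exactly the "de-ragging" the MRF buys.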

15 Reusing the same core
Oh, and we use this for stereo vision too
- Markov Random Field: nodes carry a data cost d_s(x_s), edges a smoothness cost between neighbors x_s and x_t
- MAP inference via sequential tree-reweighted message passing
- Output: a 3D depth map with per-pixel depth info
[Figure: MRF grid, objective vs. iteration curve, and the resulting depth map]

16 Performance Result: Single Frame
It’s also pretty fast
- Our work outperforms up-to-date GPU implementations
- Inference: sequential tree-reweighted message passing (TRW-S)
Tsukuba benchmark (384x288, 16):
- Real-time BP [Yang 2006]: NVIDIA GeForce 7900 GTX; iterations (4 scales) = (5,5,10,2); 80.8 msec
- Tile-based BP [Liang 2011]: NVIDIA GeForce 8800 GTS; (B, TI, TO) = (12, 20, 5); 97.3 msec
- Fast BP [Xiang 2012]: NVIDIA GeForce GTX 260; (3 scales) = (9,6,2); 61.4 msec
- Our work: GPU N/A; TO = 5; 26.10 msec
- Min. energy: 396,953 vs. 393,434

17 Error Resilient MRF Inference via ANT
And we made it error resilient
- ANT: Algorithmic Noise Tolerance
- Complexity overhead = 45%
- Power saving by ANT: estimated 42% at Vdd = 0.75 V
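The hardware scheme is beyond a slide, but the ANT idea can be caricatured in a few lines: run the exact block at an aggressive voltage where it occasionally produces large errors, run a cheap always-correct estimator alongside it, and fall back to the estimator whenever the two disagree too much. All of the numbers below (error rate, error magnitude, threshold) are invented for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)

def main_block(x):
    """Exact computation run at reduced Vdd: rare, large soft errors."""
    y = x * 3.0
    errors = rng.random(x.shape) < 0.05        # hypothetical 5% timing errors
    return np.where(errors, y + 64.0, y)       # large-magnitude bit flips

def estimator(x):
    """Low-complexity replica: always correct but coarse (here, quantized)."""
    return np.round(x * 3.0)

def ant(x, threshold=8.0):
    """Algorithmic Noise Tolerance: trust the main block unless its
    output deviates from the estimator by more than the threshold."""
    y, y_est = main_block(x), estimator(x)
    return np.where(np.abs(y - y_est) > threshold, y_est, y)
```

The corrected output is never worse than the estimator's quantization error, while most samples still get the exact result, so the voltage (and power) can be pushed down.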

18 Back to source separation again
ILDs suffer from front-back confusion and require some distance between the microphones
- So we also added Interaural Phase Differences (IPDs)
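What an IPD looks like for a single source (the geometry and mic spacing below are invented for illustration): the inter-mic delay produces a phase difference that grows linearly with frequency until it wraps, which is why circular statistics are needed to model it.

```python
import numpy as np

fs, n_fft = 16000, 512
c, d = 343.0, 0.15                    # speed of sound (m/s), 15 cm mic spacing
theta = np.deg2rad(40)                # hypothetical source direction
delay = d * np.sin(theta) / c         # inter-mic time delay in seconds

freqs = np.fft.rfftfreq(n_fft, 1 / fs)
# IPD per frequency bin, wrapped to (-pi, pi]: a circular quantity.
ipd = np.angle(np.exp(-1j * 2 * np.pi * freqs * delay))
```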

19 Why add IPDs?
They work best when ILDs fail
- E.g. when sensors are far apart
[Figure: separation results for mic spacings of 1 cm, 15 cm, and 30 cm, comparing the input against ILD-only, IPD-only, and joint models]

20 Adding one more element
Incorporated NMF-based denoisers Systems that learn by example what to separate
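A minimal sketch of the machinery behind such denoisers (multiplicative KL-divergence updates; the rank, iteration count, and data below are placeholders, not the talk's configuration): NMF learns a dictionary W of spectral bases from example spectra, and the activations H say how much of each basis is present per frame.

```python
import numpy as np

def nmf(V, K=8, iters=200, seed=0):
    """Multiplicative-update NMF (KL divergence): V ~= W @ H."""
    rng = np.random.default_rng(seed)
    F, T = V.shape
    W = rng.random((F, K)) + 1e-3      # spectral bases (freq x components)
    H = rng.random((K, T)) + 1e-3      # activations (components x time)
    for _ in range(iters):
        WH = W @ H + 1e-9
        H *= (W.T @ (V / WH)) / W.sum(axis=0, keepdims=True).T
        WH = W @ H + 1e-9
        W *= ((V / WH) @ H.T) / H.sum(axis=1, keepdims=True).T
    return W, H

# Example: fit a toy rank-2 "spectrogram"
rng = np.random.default_rng(2)
V = rng.random((16, 2)) @ rng.random((2, 20))
W, H = nmf(V, K=2, iters=500)
```

Training W on clean examples of the target sound is what lets the system "learn by example what to separate": at run time, only the activations are fit, and the W @ H reconstruction keeps the target.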

21 So what’s next?
Porting the whole system to hardware
- We haven’t ported the front-end yet
Evaluating the results with speech recognition
Extending this model to multiple devices
- As opposed to one device with multiple mics

22 Relevant publications
- Kim, Smaragdis, Ko, Rutenbar. Stereophonic Spectrogram Segmentation Using Markov Random Fields, in IEEE Workshop for Machine Learning in Signal Processing, 2012.
- Kim & Smaragdis. Manifold Preserving Hierarchical Topic Models for Quantization and Approximation, in International Conference on Machine Learning, 2013.
- Kim & Smaragdis. Single Channel Source Separation Using Smooth Nonnegative Matrix Factorization with Markov Random Fields, in IEEE Workshop for Machine Learning in Signal Processing, 2013.
- Kim & Smaragdis. Non-Negative Matrix Factorization for Irregularly-Spaced Transforms, in IEEE Workshop for Applications of Signal Processing in Audio and Acoustics, 2013.
- Traa & Smaragdis. Blind Multi-Channel Source Separation by Circular-Linear Statistical Modeling of Phase Differences, in IEEE International Conference on Acoustics, Speech and Signal Processing, 2013.
- Choi, Kim, Rutenbar, Shanbhag. Error Resilient MRF Message Passing Hardware for Stereo Matching via Algorithmic Noise Tolerance, in IEEE Workshop on Signal Processing Systems, 2013.
- Zhang, Ko, Choi, Tsai, Kim, Rivera, Rutenbar, Smaragdis, Park, Narayanan, Xin, Mutlu, Li, Zhao, Chen, Iyer. EMERALD: Characterization of Emerging Applications and Algorithms for Low-power Devices, in IEEE International Symposium on Performance Analysis of Systems and Software, 2013.
