Intel Labs
Self-Localizing Sensors and Actuators on Distributed Computing Platforms
Vikas Raykar, Igor Kozintsev, Rainer Lienhart
Motivation
Many multimedia applications are emerging which use multiple audio/video sensors and actuators:
- Microphones and cameras: distributed capture
- Speakers and displays: distributed rendering
- Other applications: number crunching
Applications
- Audio/video surveillance
- Hands-free voice communication
- Multi-channel speech enhancement
- Smart conference rooms
- Audio/image-based rendering
- Object localization and tracking
- Meeting recording
- Distributed audio/video capture
- Interactive audio-visual interfaces
- Multi-channel echo cancellation
- Speech recognition
- Source separation and dereverberation
Additional Motivation
- Current work has focused on setting up all the sensors and actuators on a single dedicated computing platform, which requires dedicated infrastructure in terms of sensors, multi-channel interface cards, and computing power.
- On the other hand, computing devices such as laptops, PDAs, tablets, cellular phones, and camcorders have become pervasive.
- The audio/video sensors on different laptops can therefore be used to form a distributed network of sensors.
Problem Formulation
Put all the distributed audio-visual I/O capabilities into a common time and space. In this paper:
- Focus on providing a common space by actively estimating the 3D positions of the sensors (microphones) and actuators (speakers).
- Account for the errors due to the lack of temporal synchronization among the various sensors and actuators (A/Ds and D/As) on distributed general-purpose computing platforms.
Our View of a Distributed Sensor Network
[Figure: distributed sensors and actuators placed in a common X-Y-Z coordinate frame]
Localization with Known Positions of Speakers
The measured distances are not exact, and there are more speakers than the minimum required, so the position is estimated in a least-squares sense.
If the Positions of the Speakers Are Unknown…
Consider M microphones and S speakers. What can we measure? The distance between each speaker and all microphones (Time Of Flight, TOF), obtained using a calibration signal, giving an M×S TOF matrix. Assuming the TOF is corrupted by AWGN, we can derive the ML estimate.
Nonlinear Least Squares
Find the microphone and speaker coordinates that minimize the sum of squared errors between the measured and predicted distances:
F = Σ_i Σ_j ( d̂_ij − ‖m_i − s_j‖ )²
where d̂_ij is the distance implied by the measured TOF, m_i is the position of microphone i, and s_j the position of speaker j.
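The least-squares objective can be sketched as follows; the function names, array shapes, and the speed-of-sound value are illustrative assumptions, not part of the original method description:

```python
import numpy as np

def tof_residuals(mics, spks, tof, c=342.0):
    """Residuals between measured and predicted times of flight.

    mics: (M, 3) microphone coordinates, spks: (S, 3) speaker coordinates,
    tof: (M, S) measured time-of-flight matrix, c: assumed speed of sound (m/s).
    """
    # Predicted TOF is the Euclidean distance divided by the speed of sound.
    dists = np.linalg.norm(mics[:, None, :] - spks[None, :, :], axis=2)
    return (tof - dists / c).ravel()

def ls_cost(mics, spks, tof, c=342.0):
    # Nonlinear least-squares objective: sum of squared TOF residuals.
    return np.sum(tof_residuals(mics, spks, tof, c) ** 2)
```

With noiseless TOF measurements the cost is zero at the true coordinates; with noisy measurements, the minimizer is the least-squares estimate.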
Maximum Likelihood (ML) Estimate
More rigorously, we can define a noise model and derive the ML estimate, i.e., maximize the likelihood. If the noise is i.i.d. Gaussian, the ML estimate is the same as the least-squares solution.
Reference Coordinate System
The solution is determined only up to a rigid transform, so we fix a reference frame (shown for 2D; similarly in 3D):
1. Fix the origin: (0, 0, 0)
2. Fix the X axis: (x1, 0, 0)
3. Fix the Y axis: (x2, y2, 0)
4. Fix the positive Z axis: x1, x2, y2 > 0
Which nodes to choose? Later…
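The four frame-fixing steps above can be sketched as a coordinate transform; the function name is a hypothetical helper, and the sketch assumes the first three points are not collinear:

```python
import numpy as np

def to_canonical_frame(pts):
    """Map a 3-D point set into the canonical frame:
    pts[0] -> origin, pts[1] -> positive x-axis, pts[2] -> x-y plane (y > 0)."""
    p = pts - pts[0]                      # 1. fix the origin at the first point
    ex = p[1] / np.linalg.norm(p[1])      # 2. x-axis along the second point
    ey = p[2] - np.dot(p[2], ex) * ex     # 3. y-axis in the plane of the third
    ey /= np.linalg.norm(ey)
    ez = np.cross(ex, ey)                 # 4. right-handed z-axis fixes the sign
    return p @ np.column_stack([ex, ey, ez])
```

Any rigidly transformed copy of the same geometry maps to the same canonical coordinates (up to the remaining reflection ambiguity resolved by step 4).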
On a synchronized platform, all is well…
However, on a distributed system…
PC Platform Overview
[Figure: block diagram of a PC platform — CPU, MCH (memory, FSB), ICH (hub), AGP, PCI slots, ATA, LAN, USB, AC'97]
The audio/video I/O devices sit behind the I/O bus, the MCH/ICH hubs, the operating system, and the multimedia/multistream applications.
Timing on a Distributed System
[Figure: timelines showing, for each platform, the time origin, the playback start time, the capture start time, the signal emitted by source j, and the signal received by microphone i]
Joint Estimation
Unknowns:
- Speaker emission start times: S
- Microphone capture start times: M − 1 (assume tm_1 = 0)
- Microphone and speaker coordinates: DM + DS − D(D+1)/2 in D dimensions (after fixing the reference frame)
Measurements: the M×S TOF matrix.
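The bookkeeping above can be checked with a small counting sketch; the function name is a hypothetical helper, not from the original:

```python
def joint_estimation_counts(M, S, D=3):
    """Count unknowns vs. measurements for the joint estimation problem.

    Unknowns: D*M + D*S coordinates minus D*(D+1)/2 fixed by the choice of
    reference frame, plus S emission start times, plus M - 1 capture start
    times (tm_1 is fixed to 0). Measurements: the M x S TOF matrix.
    """
    unknowns = D * M + D * S - D * (D + 1) // 2 + S + (M - 1)
    measurements = M * S
    return unknowns, measurements
```

The problem is solvable in a least-squares sense only when the measurement count is at least the unknown count, which dictates the minimum network size.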
Time Difference of Arrival (TDOA)
The formulation is the same as above, but with fewer parameters.
Nonlinear Least Squares: Levenberg-Marquardt Method
The cost is a multidimensional function; unless we have a good initial guess, the minimization may not converge to the global minimum. An approximate initial guess is required.
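A minimal sketch of the Levenberg-Marquardt refinement, using SciPy's `least_squares` with `method='lm'`; the problem sizes, the synthetic noiseless TOF data, and the speed-of-sound value are all illustrative assumptions:

```python
import numpy as np
from scipy.optimize import least_squares

# Hypothetical setup: recover microphone/speaker positions from a TOF matrix.
rng = np.random.default_rng(0)
M, S, c = 6, 6, 342.0
mics_true = rng.uniform(0.0, 3.0, (M, 3))
spks_true = rng.uniform(0.0, 3.0, (S, 3))
tof = np.linalg.norm(mics_true[:, None] - spks_true[None, :], axis=2) / c

def residuals(theta):
    # Unpack the parameter vector into microphone and speaker coordinates.
    mics = theta[:3 * M].reshape(M, 3)
    spks = theta[3 * M:].reshape(S, 3)
    pred = np.linalg.norm(mics[:, None] - spks[None, :], axis=2)
    return (tof * c - pred).ravel()   # residuals in meters

# LM needs an initial guess reasonably close to the minimum: here, the true
# positions perturbed by a few centimeters (in practice, the MDS output).
x0 = np.concatenate([mics_true.ravel(), spks_true.ravel()])
x0 += rng.normal(0.0, 0.05, x0.size)
sol = least_squares(residuals, x0, method='lm')
```

Starting from an arbitrary guess instead of `x0` illustrates the slide's point: the minimizer can get stuck far from the global minimum.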
Multidimensional Scaling
B is the dot-product matrix of the node coordinates X: symmetric, positive semidefinite, of rank 3 (for points in 3D). Given B, can you get X? Yes, via the singular value decomposition.
Clustering Approximation
Approximate the microphone and the speaker on each GPC as co-located at a single point, so that pairwise distances between GPCs i and j stand in for the individual sensor distances.
How to Get the Dot-Product Matrix from the Pairwise Distance Matrix
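The standard recipe (classical MDS) double-centers the squared distance matrix to obtain the dot-product matrix, then factors it. A sketch, with an illustrative function name:

```python
import numpy as np

def classical_mds(D, dim=3):
    """Recover coordinates from a pairwise distance matrix via classical MDS.

    Double-centering turns squared distances into the dot-product (Gram)
    matrix B = -0.5 * J @ D^2 @ J, with J = I - (1/n) 11^T. Since B is
    symmetric PSD of rank <= dim, an SVD of B yields coordinates that are
    unique up to rotation, reflection, and translation.
    """
    n = D.shape[0]
    J = np.eye(n) - np.ones((n, n)) / n       # centering matrix
    B = -0.5 * J @ (D ** 2) @ J               # dot-product matrix
    U, s, _ = np.linalg.svd(B)                # B = U diag(s) U^T (B symmetric)
    return U[:, :dim] * np.sqrt(s[:dim])      # coordinates in the top subspace
```

The recovered coordinates reproduce the input distances exactly when D comes from true Euclidean points; they can then be moved into the chosen reference frame.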
Centroid as the Origin
1. Take the centroid as the origin.
2. Slightly perturb each GPC location into two points to get the initial guess for the microphone and speaker coordinates.
3. Later, shift the solution back to our original reference frame.
Sample result in 2D
Algorithm
1. TOF matrix → clustering approximation → approximate distance matrix between the GPCs.
2. Dot-product matrix → MDS (choosing the dimension and coordinate system) → approximate GPC locations.
3. Perturb the GPC locations → approximate microphone and speaker locations, plus approximate tm and ts.
4. TDOA-based nonlinear minimization → final microphone and speaker locations and tm.
Cramer-Rao Bound
Gives a lower bound on the variance of any unbiased estimator. It does not depend on the estimator, just on the data and the noise model. Basically, it tells us to what extent the noise limits our performance: you cannot get a variance lower than the CR bound. The Jacobian is rank deficient, so we remove the known parameters.
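For Gaussian noise, the bound follows directly from the Jacobian of the measurement model. A sketch with a hypothetical helper name; the caller is assumed to have already dropped the columns for the known (fixed) parameters, as the slide notes:

```python
import numpy as np

def crb_diagonal(jacobian, sigma):
    """Cramer-Rao lower bound on the variance of each free parameter.

    For i.i.d. Gaussian noise with standard deviation sigma, the Fisher
    information matrix is F = J^T J / sigma^2 and the CRB is diag(F^{-1}).
    The Jacobian must contain only the free parameters' columns; otherwise
    F is singular (the rank deficiency mentioned on the slide).
    """
    F = jacobian.T @ jacobian / sigma ** 2    # Fisher information matrix
    return np.diag(np.linalg.inv(F))          # per-parameter variance bound
```

For a toy linear model y = a·x with unit-variance noise, the bound reduces to 1/Σx², matching the textbook result.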
Performance comparison
Dependence on number of nodes
Geometry matters
Experimental Setup
[Figure: 4 microphones and 4 speakers positioned in a room, X-Z view. Room length = 4.22 m, width = 2.55 m, height = 2.03 m]
Results: bias 0.08 cm, sigma 3.8 cm.
Summary
- General-purpose computers can be used for distributed array processing.
- It is possible to define a common time and space for a network of distributed sensors and actuators.
- For more information, please see our two papers in ACM MM in November, or contact us.
- Let us know if you would be interested in testing/using our time and space synchronization software for developing distributed algorithms on GPCs (available in November).
Backup
Calibration signal
Results (contd.)