
1 “SoundSense: Scalable Sound Sensing for People-Centric Applications on Mobile Phones” Authors: Hong Lu, Wei Pan, Nicholas D. Lane, Tanzeem Choudhury, and Andrew T. Campbell, Department of Computer Science, Dartmouth College. Published in: ACM Conference on Mobile Systems, Applications, and Services (MobiSys '09), June 22-25, 2009. Benjamin Stokes, Presenter -- 3/28/11. For CS 546: Intelligent Embedded Systems

2 SoundSense: Overview …a framework for real-time activity inference using audio, entirely on the phone (no servers). The microphone is often overlooked as an inference sensor (but heavily used for communication).

3 Design Goals 1. At scale – support a large number of users; avoid burdensome training requirements. 2. Across diverse audio environments (robust). 3. While respecting the device – do not interfere with the phone’s normal operation.

4 Noteworthy Constraints Privacy: people are sensitive to capturing phone calls – so limit server-side processing. Audio codecs: optimized for human voice, typically sampled at ~8 kHz, so frequencies above 4 kHz cannot be captured. Physics of audio: sounds are often layered and remixed – the loudest (highest-energy) sound dominates – TV audio vs. live audio are not easily differentiated.
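
(Aside, not from the slides: the 4 kHz ceiling is just the Nyquist limit implied by the 8 kHz sampling rate.)

$$ f_{\max} = \frac{f_s}{2} = \frac{8\ \mathrm{kHz}}{2} = 4\ \mathrm{kHz} $$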

5 What does it look like? http://www.youtube.com/watch?v=aAKplAaPAHE

6 Who is behind this? (the same people who made CenceMe)

7 Architecture

8 Pre-processing… drops frames that are too quiet or that have too much entropy (e.g., white noise). Non-overlapping frames, to minimize CPU burden. 64 ms frame length (speech recognition typically uses 25-46 ms). Once an event is recognized, up to 5 seconds of silence are still admitted, since the pause might be part of a conversation.
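
A minimal sketch of this frame-admission step (my illustration, not the authors' code; the energy and entropy thresholds are made up):

```python
# Split 8 kHz audio into non-overlapping 64 ms (512-sample) frames and drop
# frames that are too quiet (low RMS energy) or too noise-like (high spectral
# entropy). Thresholds below are purely illustrative.
import numpy as np

SAMPLE_RATE = 8000
FRAME_LEN = 512            # 64 ms at 8 kHz
ENERGY_THRESH = 0.01       # hypothetical RMS floor (samples normalized to [-1, 1])
ENTROPY_THRESH = 0.9       # hypothetical ceiling on normalized spectral entropy

def frames(signal: np.ndarray):
    """Yield non-overlapping 512-sample frames."""
    for start in range(0, len(signal) - FRAME_LEN + 1, FRAME_LEN):
        yield signal[start:start + FRAME_LEN]

def admit_frame(frame: np.ndarray) -> bool:
    """Return True if the frame looks like a real sound event."""
    rms = np.sqrt(np.mean(frame ** 2))
    if rms < ENERGY_THRESH:
        return False                           # too quiet
    spectrum = np.abs(np.fft.rfft(frame)) ** 2
    p = spectrum / (spectrum.sum() + 1e-12)
    entropy = -np.sum(p * np.log2(p + 1e-12)) / np.log2(len(p))
    return entropy < ENTROPY_THRESH            # reject white-noise-like frames
```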

9 Coarse Category Classification Feature extraction: 8 feature types. Decision tree classifier, with Markov models smoothing the output for each of the three sound event categories. Trained with 559 manually labeled sound clips (~1 GB).
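
A sketch of the coarse stage under stated assumptions (scikit-learn as the learner and pre-extracted per-frame feature vectors; the category names are the paper's voice / music / ambient sound; this is not the authors' implementation):

```python
# Train a depth-6 decision tree (the depth mentioned on slide 11) to map a
# frame's feature vector to one of the three coarse categories.
import numpy as np
from sklearn.tree import DecisionTreeClassifier

CATEGORIES = ["voice", "music", "ambient sound"]

def train_coarse_classifier(X: np.ndarray, y: np.ndarray) -> DecisionTreeClassifier:
    """X: (n_frames, n_features) extracted features; y: category indices."""
    clf = DecisionTreeClassifier(max_depth=6)
    clf.fit(X, y)
    return clf

def classify_frame(clf: DecisionTreeClassifier, features: np.ndarray) -> str:
    return CATEGORIES[int(clf.predict(features.reshape(1, -1))[0])]
```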

10 Stage 2: Fine-grained (by category) Unsupervised learning to discover significant sounds (each assigned to a “bin”). Additional classifiers, esp. Mel-Frequency Cepstral Coefficients (MFCC) – which mimic the human ear. SoundRank: determines whether a sound is “interesting” (heard for more than 40 minutes; duration matters). Allow users to label interesting sounds and hide private ones. Expunge old & uninteresting sounds.
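
For illustration, MFCC extraction for one frame might look like this (using librosa, which is my choice here, not necessarily what SoundSense used):

```python
# Compute an MFCC vector for a single 64 ms frame; MFCCs summarize the spectrum
# on the mel scale, which roughly matches how the human ear resolves frequency.
import numpy as np
import librosa

def frame_mfcc(frame: np.ndarray, sample_rate: int = 8000, n_mfcc: int = 13) -> np.ndarray:
    mfcc = librosa.feature.mfcc(y=frame.astype(float), sr=sample_rate,
                                n_mfcc=n_mfcc, n_fft=512, hop_length=512, n_mels=40)
    return mfcc.mean(axis=1)   # average the few sub-frames into one vector
```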

11 Implementation: Refining the Decision Tree Classifier (First Layer) (1) The learned tree has 17 nodes with a depth of 6 levels – see figure at right. (2) Continuous 8 kHz, 16-bit, mono audio samples; each frame = 512 samples; 3-frame buffer. (3) Jail-broken iPhones to allow background processing (presumably possible without jailbreaking now). (4) Power savings by reducing processing when silent to 1 in 10 frames (i.e., every 0.64 seconds).
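
The power-saving behaviour in (4) amounts to duty cycling; a hypothetical sketch (names and structure are mine, not the authors'):

```python
# While the environment is silent, fully process only every 10th 64 ms frame,
# so a new sound is still noticed within roughly 0.64 seconds.
def process_stream(frames, is_silent, classify):
    silent = True
    for i, frame in enumerate(frames):
        if silent and i % 10 != 0:
            continue                 # skip 9 of every 10 frames while silent
        if is_silent(frame):
            silent = True
            continue
        silent = False
        classify(frame)              # full feature extraction + classification
```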

12 Setting Buffer Sizes (1) Buffer for the Markov model – applied after the decision tree classifier (a buffer size of 5 is found to be optimal). (2) MFCC frame length (second stage; the classifier mimics the human ear).
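
As a simplified stand-in for that Markov buffer (not the paper's exact model), one can smooth over the last 5 frame-level decisions:

```python
# Keep a sliding buffer of the last 5 frame-level labels and emit the majority
# category, suppressing isolated single-frame misclassifications.
from collections import Counter, deque

class DecisionSmoother:
    def __init__(self, buffer_size: int = 5):
        self.buffer = deque(maxlen=buffer_size)

    def update(self, frame_label: str) -> str:
        self.buffer.append(frame_label)
        return Counter(self.buffer).most_common(1)[0][0]
```

Usage would be e.g. `smoother.update(classify_frame(clf, features))` for each admitted frame.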

13 Evaluation CPU total: SoundSense plus the other iPhone system software peaked below 60% CPU. Memory context: the iPhone allows 30 MB of memory per app.

14 Evaluation: Classification Performance Evaluation data kept separate from training data. Each clip annotated by hand with a label.

15 Gender Classifier…

16 Evaluation: Classification Now add the Markov model…

17 SoundRank – Top Events Training: users wear the iPhone around their neck (!!) for several days – about 1 hour of sound per day.
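
An illustrative ranking sketch in the spirit of SoundRank (the 40-minute threshold comes from slide 10; everything else is my simplification, not the paper's algorithm):

```python
# Track cumulative listening time per discovered sound "bin"; a bin becomes
# "interesting" once heard for more than 40 minutes, and bins are ranked by
# total duration.
from dataclasses import dataclass

INTERESTING_MINUTES = 40.0

@dataclass
class SoundBin:
    label: str
    total_minutes: float = 0.0

    def observe(self, minutes: float) -> None:
        self.total_minutes += minutes

    @property
    def interesting(self) -> bool:
        return self.total_minutes > INTERESTING_MINUTES

def top_events(bins):
    """Return interesting bins, most-heard first."""
    return sorted((b for b in bins if b.interesting),
                  key=lambda b: b.total_minutes, reverse=True)
```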

18 App: Audio Daily Diary Goal: users can find out how much time they spend doing different things. Implementation: sound is continuously sampled; data shown is from one participant over two weeks…
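
A hypothetical aggregation sketch for the diary idea (illustrative only):

```python
# Given a stream of (timestamp, label) classifications, tally how many seconds
# per day were spent in each sound category.
from collections import defaultdict

FRAME_SECONDS = 0.064   # each classification covers one 64 ms frame

def daily_diary(classified_frames):
    """classified_frames: iterable of (datetime, label) pairs."""
    diary = defaultdict(lambda: defaultdict(float))
    for ts, label in classified_frames:
        diary[ts.date()][label] += FRAME_SECONDS
    return diary         # {date: {label: seconds}}
```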

19 App: Audio Daily Diary

20 App: Music Detector Goal: crowdsource data collection on nearby music (participatory sensing using audio). Implementation: when music is detected, prompt users to take a photo for a community website.

21 Reflections
Overall, their framework proposal makes sense – mixing some pre-training with some automated learning.
Feature extraction uses 8 methods – with some literature in support, but not a fully convincing argument for why these 8 (and not 7 or 9, or which ones).
The two sample apps are well chosen.
The memory footprint is fine, but CPU consumption is unacceptable (this deserved more discussion); of course, this will improve with time.
Big question: when will this be integrated into the mobile infrastructure, e.g., in the operating system or on its own chip?
Better “ground truth” is needed, e.g., is the classifier picking up a car idling, or only a car in motion?
Compared to “post-mortems,” the apps offer few insights into the design trade-offs others will likely encounter.
Do differently? Add the ability for the user to intervene to compensate for missed opportunities, e.g., a doorbell sound.
They claim to contribute a “framework” with architecture and algorithms – but their code is private; is most of the value lost?

22 Find out more… SoundSense group online: http://metrosense.cs.dartmouth.edu/projects.html

23 Extension Opportunity: Test a voice-recognition game: http://commandthebridge.com/ (3pm this Saturday @ Zemeckis)

