Statistical automatic identification of microchiroptera from echolocation calls Lessons learned from human automatic speech recognition Mark D. Skowronski.


1 Statistical automatic identification of microchiroptera from echolocation calls Lessons learned from human automatic speech recognition Mark D. Skowronski Computational Neuro-Engineering Lab Electrical and Computer Engineering University of Florida Gainesville, FL, USA December 1, 2004

2 Overview Motivations for bat acoustic research Review bat call classification methods Contrast with 1970s human ASR –Machine learning vs. expert knowledge Experiments Conclusions and future work

3 Bat research motivations Bats are among: –the most diverse (25% of all mammal species), –the most endangered, –and the least studied mammals. Close relationship with insects –agricultural impact –disease vectors Acoustical research –non-invasive (compared to netting) –significant domain (echolocation)

4 More motivations Calls simple compared to human speech Same goals as human ASR –Detection –Feature extraction –Classification –Noise-robust performance Easier to design/develop models Domain between toy problems and ASR

5 Bat echolocation Ultrasonic, brief chirps (~active sonar) Determine range, velocity of nearby objects (clutter, prey, other bats) Tailored for task, environment Tadarida brasiliensis (Mexican free-tailed bat) Listen to 10x time-expanded search calls:

6 Echolocation calls Two characteristics –Frequency modulated (range information) –Constant frequency (velocity information) Features (holistic) –Freq. extrema –Duration –Shape –# harmonics –Call interval Mexican free-tailed calls, concatenated

7 Current classification methods Expert sonogram readers –Manual or automatic feature extraction Griffin 1958, Fenton and Bell 1981 –Comparison with exemplar sonograms –Decision trees Automatic classification –Discriminant function analysis By far the most popular method in literature Available in statistical software packages (SAS, SPSS) –Others Artificial neural networks, Parsons 2001 Spectrogram correlation, Pettersson Elektronik AB Parallels the 1970s acoustic-phonetic approach to human ASR.

8 Acoustic phonetics Bottom up paradigm –Frames, boundaries, groups, phonemes, words –Mimics techniques of expert spectrogram readers Manual or automatic feature extraction –Formants, voicing, duration, intensity, transitions Classification –Decision tree, discriminant functions, neural network, Gaussian mixture model, Viterbi path [Figure: spectrogram with phoneme labels DH AH F UH T B AO L G EY EM IH Z OW V ER]

9 Acoustic phonetics limitations Variability of conversational speech –Complex rules, difficult to train Boundaries difficult to define –Coarticulation, reduction Feature estimates brittle –Variable noise robustness Hard decisions, errors accumulate Shifted to machine learning paradigm of human ASR by 1980s: better able to account for variability of speech, noise.

10 Machine learning ASR Data-driven models –Non-parametric: dynamic time warp (DTW) –Parametric: hidden Markov model (HMM) Frame-based –Identical features from every frame –Expert information in feature extraction –Models account for feature, temporal variabilities Machine learning dominates state-of-the-art ASR.
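As a rough illustration of the non-parametric, frame-based approach named on this slide, the following is a minimal DTW sketch in Python. The function names (`dtw_distance`, `classify_by_template`), the Euclidean local cost, and the unconstrained step pattern are assumptions made for illustration; the talk does not specify them.

```python
import numpy as np


def _as_frames(x):
    """Return x as a 2-D array with one row per frame."""
    x = np.asarray(x, dtype=float)
    return x[:, None] if x.ndim == 1 else x


def dtw_distance(a, b):
    """Accumulated DTW cost between two frame sequences.

    a, b: sequences of per-frame feature vectors (or scalars).
    Local cost is the Euclidean distance between frames (an assumption).
    """
    a, b = _as_frames(a), _as_frames(b)
    n, m = len(a), len(b)
    # Pairwise frame-to-frame distances, shape (n, m).
    local = np.linalg.norm(a[:, None, :] - b[None, :, :], axis=2)
    acc = np.full((n + 1, m + 1), np.inf)
    acc[0, 0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            # Allowed predecessors: insertion, deletion, match.
            best_prev = min(acc[i - 1, j], acc[i, j - 1], acc[i - 1, j - 1])
            acc[i, j] = local[i - 1, j - 1] + best_prev
    return acc[n, m]


def classify_by_template(query, templates):
    """Return the label of the template with the smallest DTW cost.

    templates: list of (label, sequence) pairs (hypothetical layout).
    """
    return min(templates, key=lambda t: dtw_distance(query, t[1]))[0]
```

Every stored template must be warped against each query, which is consistent with DTW being cheap to "train" (just store exemplars) but slow at classification time.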

11 Data collection UF Bat House, home to 60,000 bats –Mexican free-tailed bat (vast majority) –Evening bat –Southeastern myotis Continuous recording –90 minutes around sunset –~20,000 calls Equipment: –B&K mic (4939), 100 kHz –B&K preamp (2670) –Custom amp/AA filter –NI 6036E 200kS/s A/D card –Laptop, Matlab –Portable

12 Experiment design Hand labels as ground truth –Narrowband spectrogram –436 calls (2% of data) in 3 hours (80x real time) –Four classes, a priori: 34, 40, 20, 6% –All experiments on hand-labeled data only –No hand-labeled calls excluded from experiments [Figure: panels labeled 1, 2, 3, 4]

13 Methods
Baseline, from the literature
 – Features
     Duration
     Zero crossing: Fmin, Fmax, Fmax_energy
     MUSIC super resolution frequency estimator
 – Classifier
     Discriminant function analysis, quadratic boundaries
DTW and HMM
 – Features
     Frequency (MUSIC), log energy, Δs (HMM only)
 – HMM
     5 states/model
     4 Gaussian mixtures/state, diagonal covariances
Tests
 – Leave one out
 – Repeated trials: 25% test data, 1000 trials
 – Test on train data (HMM only)
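The zero-crossing features listed above can be sketched roughly as follows. This is an illustrative Python version, not the code used in the talk: the function names, the linear interpolation of crossing times, and the synthetic test chirp (50 kHz down to 25 kHz over 5 ms) are all assumptions. Only the 200 kS/s sampling rate comes from the data collection slide.

```python
import numpy as np


def zero_crossing_freqs(x, fs):
    """Estimate one frequency per cycle from positive-going zero crossings.

    Returns (times, freqs): the mid-cycle time in seconds and the
    frequency in Hz for each full cycle found in x.
    """
    x = np.asarray(x, dtype=float)
    # Indices where the signal goes from negative to non-negative.
    idx = np.flatnonzero((x[:-1] < 0) & (x[1:] >= 0))
    # Linear interpolation gives a sub-sample crossing position.
    frac = -x[idx] / (x[idx + 1] - x[idx])
    t_cross = (idx + frac) / fs
    periods = np.diff(t_cross)
    return t_cross[:-1] + periods / 2, 1.0 / periods


def fmin_fmax(x, fs):
    """Frequency extrema (Hz) of a call from its zero-crossing track."""
    _, freqs = zero_crossing_freqs(x, fs)
    return freqs.min(), freqs.max()


def synthetic_chirp(fs=200_000, f_start=50e3, f_end=25e3, dur=0.005):
    """Hypothetical downward linear FM chirp used only for illustration."""
    t = np.arange(int(fs * dur)) / fs
    phase = 2 * np.pi * (f_start * t + 0.5 * (f_end - f_start) / dur * t**2)
    return np.sin(phase)


fs = 200_000
fmin, fmax = fmin_fmax(synthetic_chirp(fs), fs)
print(f"Fmin = {fmin / 1e3:.1f} kHz, Fmax = {fmax / 1e3:.1f} kHz")
```

On clean signals this estimator is very cheap, but a single spurious crossing caused by noise changes the extrema, which is one reason the slides call holistic feature estimates brittle.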

14 Results
Baseline, zero crossing
 – Leave one out: 72.5% correct
 – Repeated trials: 72.5 ± 4% (mean ± std)
Baseline, MUSIC
 – Leave one out: 79.1%
 – Repeated trials: 77.5 ± 4%
DTW
 – Leave one out: 74.5%
 – Repeated trials: 74.1 ± 4%
HMM
 – Test on train: 85.3%
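The two test protocols behind these numbers (leave one out, and repeated random splits with 25% of the data held out) can be sketched as below. This is only an illustration under stated assumptions: the nearest-class-mean classifier is a stand-in, not the discriminant function analysis, DTW, or HMM used in the talk, and `make_toy_data` generates synthetic points that have nothing to do with the bat recordings.

```python
import numpy as np


def nearest_mean_predict(X_train, y_train, X_test):
    """Stand-in classifier: assign each test point to the closest class mean."""
    classes = np.unique(y_train)
    means = np.stack([X_train[y_train == c].mean(axis=0) for c in classes])
    dists = np.linalg.norm(X_test[:, None, :] - means[None, :, :], axis=2)
    return classes[np.argmin(dists, axis=1)]


def repeated_trials(X, y, test_frac=0.25, n_trials=1000, seed=0):
    """Mean and std of accuracy over random train/test splits."""
    rng = np.random.default_rng(seed)
    n = len(y)
    n_test = int(round(test_frac * n))
    acc = np.empty(n_trials)
    for k in range(n_trials):
        perm = rng.permutation(n)
        test, train = perm[:n_test], perm[n_test:]
        pred = nearest_mean_predict(X[train], y[train], X[test])
        acc[k] = np.mean(pred == y[test])
    return acc.mean(), acc.std()


def leave_one_out(X, y):
    """Accuracy when each sample is tested against a model of all the others."""
    n = len(y)
    hits = 0
    for i in range(n):
        train = np.arange(n) != i
        pred = nearest_mean_predict(X[train], y[train], X[i:i + 1])
        hits += int(pred[0] == y[i])
    return hits / n


def make_toy_data(n_per_class=40, seed=0):
    """Synthetic, well-separated 2-D classes (illustration only)."""
    rng = np.random.default_rng(seed)
    X = np.vstack([rng.normal(0.0, 1.0, (n_per_class, 2)),
                   rng.normal(6.0, 1.0, (n_per_class, 2))])
    y = np.repeat([0, 1], n_per_class)
    return X, y


X, y = make_toy_data()
mean_acc, std_acc = repeated_trials(X, y, n_trials=100)
print(f"repeated trials: {100 * mean_acc:.1f} ± {100 * std_acc:.1f}%")
print(f"leave one out:   {100 * leave_one_out(X, y):.1f}%")
```

Leave one out yields a single accuracy figure, while repeated trials also give a spread, which is why only the repeated-trial rows report a standard deviation.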

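15 Confusion matrices
Rows: hand-labeled class (row totals match the a priori class proportions); columns: assigned class; last column: percent of the row classified correctly.

Baseline, zero crossing (overall 72.5%)
        1    2    3    4
   1  107   38    1    2   72.3%
   2   21  134   16    4   76.6%
   3   22    9   57    0   64.8%
   4    4    3    0   18   72.0%

Baseline, MUSIC (overall 79.1%)
        1    2    3    4
   1  110   36    1    1   74.3%
   2   12  149   12    2   85.1%
   3    4   18   66    0   75.0%
   4    3    2    0   20   80.0%

DTW (overall 74.5%)
        1    2    3    4
   1  115   29    0    4   77.7%
   2   32  131   11    1   74.9%
   3    5   20   63    0   71.6%
   4    5    4    0   16   64.0%

HMM (overall 85.3%)
        1    2    3    4
   1  118   25    0    5   79.7%
   2   10  154    5    6   88.0%
   3    1   12   75    0   85.2%
   4    0    0    0   25   100%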
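The percentages on this slide follow directly from the counts. The short Python check below recomputes them for the zero-crossing baseline and the HMM; the counts are read from the slide's digits, and treating rows as the hand-labeled class is an assumption supported by the row totals matching the 34, 40, 20, 6% priors.

```python
import numpy as np

# Counts read from the slide; rows assumed to be the hand-labeled class,
# columns the class assigned by the classifier.
zero_crossing = np.array([[107,  38,  1,  2],
                          [ 21, 134, 16,  4],
                          [ 22,   9, 57,  0],
                          [  4,   3,  0, 18]])

hmm = np.array([[118,  25,  0,  5],
                [ 10, 154,  5,  6],
                [  1,  12, 75,  0],
                [  0,   0,  0, 25]])


def per_class_accuracy(cm):
    """Fraction of each row (true class) that lands on the diagonal."""
    return np.diag(cm) / cm.sum(axis=1)


def overall_accuracy(cm):
    """Fraction of all calls that land on the diagonal."""
    return np.trace(cm) / cm.sum()


def class_priors(cm):
    """Share of each true class among all labeled calls."""
    return cm.sum(axis=1) / cm.sum()


for name, cm in [("zero crossing", zero_crossing), ("HMM", hmm)]:
    per_class = np.round(100 * per_class_accuracy(cm), 1)
    print(name, per_class, round(100 * overall_accuracy(cm), 1))
```

Both matrices sum to 436 calls, the number of hand-labeled calls given on the experiment design slide.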

16 Comments Experiments –Weakness: accuracy of class labels –No labeled calls excluded, realistic –HMM most accurate, but undertrained –MUSIC frequency estimate robust, but 1000x slower than zero-crossing analysis (ZCA) (20x real time) Machine learning –Expert information still necessary Feature extraction (dimensionality reduction) Model parameters –DTW: fast training, slow classification –HMM: slow training, fast classification (real time)

17 Future work Ultimate goal –Real-time portable system for species ID –Commercial product possibilities Feature extraction –Robust Broadband noise Echoes Unknown distance between bat and microphone –Chirp model, echo model –Faster frequency estimates –Match assumptions of classifiers

18 More future work Detection –Replace energy-based method with principled statistical methods using frame-based features Classification –Accurate class labels for training Netting Record from known bat roosts (preferred) –Pseudo-sinusoidal input Oscillator network Echo state network
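For context on what is being replaced, an energy-based detector can be sketched as below. The slide does not describe the detector actually used, so everything here is an assumption: the function name `detect_calls`, the 1 ms frame length, the median noise-floor estimate, and the 10 dB threshold.

```python
import numpy as np


def detect_calls(x, fs, frame_ms=1.0, threshold_db=10.0):
    """Return (start_s, end_s) pairs for runs of high-energy frames.

    A frame counts as active when its log energy exceeds the median
    frame energy (a crude noise-floor estimate) by threshold_db.
    """
    x = np.asarray(x, dtype=float)
    frame = max(1, int(fs * frame_ms / 1000))
    n_frames = len(x) // frame
    frames = x[:n_frames * frame].reshape(n_frames, frame)
    energy_db = 10 * np.log10(np.mean(frames ** 2, axis=1) + 1e-12)
    active = energy_db > np.median(energy_db) + threshold_db

    # Pad with False so every active run has a rising and a falling edge.
    padded = np.concatenate(([False], active, [False]))
    edges = np.flatnonzero(padded[1:] != padded[:-1])
    starts, ends = edges[::2], edges[1::2]
    return [(s * frame / fs, e * frame / fs) for s, e in zip(starts, ends)]


def toy_recording(fs=200_000, seed=0):
    """Synthetic 50 ms of weak noise with one 5 ms, 40 kHz burst at 20 ms."""
    rng = np.random.default_rng(seed)
    x = 0.01 * rng.standard_normal(int(0.050 * fs))
    n = np.arange(int(0.005 * fs))
    start = int(0.020 * fs)
    x[start:start + len(n)] += np.sin(2 * np.pi * 40_000 * n / fs)
    return x


fs = 200_000
print(detect_calls(toy_recording(fs), fs))
```

A fixed threshold of this kind makes a hard decision per frame and depends on a stable noise floor, which is the weakness a statistical, frame-based detector would aim to address.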

19 Information markskow@cnel.ufl.edu http://www.cnel.ufl.edu/~markskow

