Real-Time Recognition of Whale Calls Using SoundID
Neil J Boucher, SoundID, Australia
Michihiro Jinnai, Nagoya Women’s University, Japan
Who Are We?
SoundID (www.soundid.net)
- Founded 2002
- Engineers specialising in sound recognition
- Aim to deliver recognition of human-expert quality or better
- Designed to run in real time
- Designed to handle terabytes of data with ease
- Have our own patented methods
1-D Recognition Results: Monitoring a Dawn Chorus over 70 Minutes
Birds Are Easy, Whales Are Hard
The 1-D method works well for most bird calls, as just shown, and frogs and bats are also easy. Whales are harder because 1/f noise and other background noise “smear” the 1-D image. Worse still, most of this noise is in-band: it occupies the same frequencies as the whale calls themselves.
The “Bottom of the Coke Bottle” or “FFT-like” View of the Same Call at a Frame Width of 513 Points
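A 513-point frame is what a one-sided spectrum of a 1024-sample frame produces (1024 / 2 + 1 = 513 bins). The sketch below assumes that is what the frame width means; the slides do not state the transform size, the window, or the sample rate, so the Hann window, the 8 kHz rate and the name `frame_spectrum` are all hypothetical.

```python
import cmath
import math

# Assumption: the 513-point frame is the one-sided spectrum of a 1024-sample
# frame (1024 / 2 + 1 = 513 bins). The slides do not state the transform size.
FFT_SIZE = 1024


def frame_spectrum(samples):
    """Return the one-sided magnitude spectrum (len(samples) // 2 + 1 bins).

    A direct DFT is used for clarity; a production system would use an FFT.
    The Hann window is also an assumption, used to reduce edge leakage.
    """
    n = len(samples)
    window = [0.5 - 0.5 * math.cos(2 * math.pi * i / n) for i in range(n)]
    x = [s * w for s, w in zip(samples, window)]
    # Precompute the n complex roots of unity shared by every bin.
    twiddle = [cmath.exp(-2j * math.pi * m / n) for m in range(n)]
    spectrum = []
    for k in range(n // 2 + 1):
        acc = sum(x[t] * twiddle[(k * t) % n] for t in range(n))
        spectrum.append(abs(acc))
    return spectrum


if __name__ == "__main__":
    # Hypothetical test tone: 1 kHz sampled at 8 kHz falls exactly on
    # bin 1000 * 1024 / 8000 = 128.
    fs = 8000
    tone = [math.sin(2 * math.pi * 1000 * t / fs) for t in range(FFT_SIZE)]
    spectrum = frame_spectrum(tone)
    peak = max(range(len(spectrum)), key=spectrum.__getitem__)
    print(len(spectrum), peak)  # 513 128
```

Each such 513-value vector is one “shape” that can then be compared with another.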
Geometric Distance
Euclidean distance is the simple straight-line (first-order) distance between two points. Geometric Distance (GD), measured in angular degrees, is the angle between two vectors, which we use to describe the difference in their shape. It is a fourth-order measure, loosely related to the skew of a standard distribution.
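To make the contrast concrete, here is a minimal sketch that computes both distances, assuming GD is the ordinary angle between two vectors (the arccos of their normalised dot product). This is the textbook definition, not necessarily how SoundID's patented method computes it, and the function names are hypothetical.

```python
import math


def euclidean_distance(a, b):
    """Straight-line (first-order) distance between two points."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def geometric_distance(a, b):
    """Angle in degrees between vectors a and b: a measure of shape difference.

    Sketch: the standard vector angle, arccos of the normalised dot product.
    This is not necessarily how SoundID's patented method computes GD.
    """
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        raise ValueError("the angle is undefined for an all-zero vector")
    # Clamp so floating-point rounding cannot push acos outside [-1, 1].
    cos_theta = max(-1.0, min(1.0, dot / (norm_a * norm_b)))
    return math.degrees(math.acos(cos_theta))


quiet = [1.0, 3.0, 2.0, 0.5]        # hypothetical spectrum
loud = [2.0 * v for v in quiet]     # same shape, twice the level
print(round(euclidean_distance(quiet, loud), 3))  # 3.775
print(round(geometric_distance(quiet, loud), 3))  # 0.0
```

Because the angle ignores overall scale, a louder copy of the same call is far away in Euclidean terms but has a GD of zero, which is why an angle suits comparing shapes.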
The Measure of Geometric Distance
The GD between two n-dimensional shapes can be used as a measure of their similarity:
- Two identical shapes have a GD of zero.
- Two totally dissimilar shapes have a GD of 90 degrees.
- The spectra of two sounds that sound “similar” to the human ear can have a GD in the range 0 to about 6 degrees.
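Why 90 degrees is the ceiling can be worked out directly, assuming the shapes are magnitude spectra (every component non-negative) and GD is the ordinary angle between vectors:

```latex
\begin{aligned}
\mathrm{GD}(\mathbf{a},\mathbf{b}) &= \arccos\!\left(\frac{\mathbf{a}\cdot\mathbf{b}}{\lVert\mathbf{a}\rVert\,\lVert\mathbf{b}\rVert}\right) \\
a_i,\, b_i \ge 0 \;&\Rightarrow\; \mathbf{a}\cdot\mathbf{b} \ge 0 \;\Rightarrow\; 0^\circ \le \mathrm{GD} \le 90^\circ \\
\mathrm{GD} = 0^\circ &\iff \mathbf{b} = c\,\mathbf{a} \text{ for some } c > 0 \\
\mathrm{GD} = 90^\circ &\iff \mathbf{a}\cdot\mathbf{b} = 0 \iff a_i b_i = 0 \text{ for every } i
\end{aligned}
```

Under these assumptions, “totally dissimilar” means the two spectra share no energy in any frequency bin, and “identical shape” means one is a scaled copy of the other.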
Value of Geometric Distance
Once we have a library of reference sounds, we can compare it with a detected target sound. Each sound in the library is compared in turn with the target, and the one with the lowest GD is declared the closest match. If that lowest GD is small enough to count as a match (typically GD < 6 degrees), the target has been identified.
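The matching loop described above can be sketched as follows. It assumes GD is the standard angle between spectra and that the library is a simple label-to-spectrum mapping; `identify`, `library` and the example spectra are hypothetical, and only the 6-degree threshold comes from the slide.

```python
import math


def geometric_distance(a, b):
    """Angle in degrees between two spectra (standard vector angle)."""
    dot = sum(x * y for x, y in zip(a, b))
    norms = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return math.degrees(math.acos(max(-1.0, min(1.0, dot / norms))))


MATCH_THRESHOLD_DEG = 6.0  # "typically GD < 6 degrees", per the slide


def identify(target, library, threshold=MATCH_THRESHOLD_DEG):
    """Return (label, gd) of the closest reference, or (None, gd) if no match.

    `library` is a hypothetical dict mapping a label to a reference spectrum.
    """
    best_label, best_gd = None, math.inf
    for label, reference in library.items():
        gd = geometric_distance(target, reference)
        if gd < best_gd:
            best_label, best_gd = label, gd
    # Only a sufficiently small GD counts as an identification.
    if best_gd < threshold:
        return best_label, best_gd
    return None, best_gd


library = {
    "call_a": [0.0, 1.0, 4.0, 1.0, 0.0],
    "call_b": [4.0, 1.0, 0.0, 0.0, 1.0],
}
print(identify([0.0, 1.2, 3.8, 1.1, 0.1], library)[0])  # call_a
print(identify([1.0, 0.0, 0.0, 0.0, 0.0], library)[0])  # None
```

The second target is closest to `call_b`, but at roughly 19 degrees it is too far away to be declared a match, so it stays unidentified.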
More Is Better
The bigger the reference library, the better the chance that it contains a matching reference, and the greater the accuracy. Note also that the library is not restricted to sounds that are related in some way: any group of sounds can form part of it.
Comparison of Two Adjacent Calls. Note that no filter is used or needed!
The Same Calls Zoomed In for a Closer Look. Note the smaller GD. (Frequency × 3.3)
A Cluster Analysis at a GD Threshold of 6 Degrees. Note the 6 clusters!
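The slides do not say which clustering rule is used, so the sketch below substitutes plain single-linkage clustering: two sounds end up in the same cluster if they are joined by a chain of pairs whose GD is below 6 degrees. Union-find keeps the grouping step cheap. `cluster_by_gd` and the toy spectra are hypothetical.

```python
import math


def geometric_distance(a, b):
    """Angle in degrees between two spectra (standard vector angle)."""
    dot = sum(x * y for x, y in zip(a, b))
    norms = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return math.degrees(math.acos(max(-1.0, min(1.0, dot / norms))))


def cluster_by_gd(spectra, threshold_deg=6.0):
    """Group spectra so that any two linked by GD < threshold share a cluster.

    Assumption: single-linkage clustering via union-find; the slides do not
    specify SoundID's clustering rule.
    """
    parent = list(range(len(spectra)))

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # path halving keeps trees shallow
            i = parent[i]
        return i

    # Compare every pair once and merge the clusters of close pairs.
    for i in range(len(spectra)):
        for j in range(i + 1, len(spectra)):
            if geometric_distance(spectra[i], spectra[j]) < threshold_deg:
                parent[find(i)] = find(j)

    clusters = {}
    for i in range(len(spectra)):
        clusters.setdefault(find(i), []).append(i)
    return list(clusters.values())


sounds = [
    [1.0, 0.0, 0.0], [1.0, 0.05, 0.0],   # about 2.9 degrees apart
    [0.0, 1.0, 0.0], [0.0, 1.0, 0.05],   # about 2.9 degrees apart
    [0.0, 0.0, 1.0],                     # far from everything else
]
print(cluster_by_gd(sounds))  # [[0, 1], [2, 3], [4]]
```

A sound that links to nothing, like the last one here, forms a single-member cluster.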
Two Sweeps That Are Not Clustered (at 672 and 438 Seconds)
Panels: 672 s (frequency × 3.3); 438 s (frequency × 3.3)
Analysis of 524 Gunshot Sounds from File 20080126
Running a cluster analysis of 526 gunshots, each against every other, results in 275,625 comparisons. Grouping them into clusters at a GD threshold of 6 degrees, we find they fall into 64 groups. A few clusters are unique, with only one member, while others contain 120 or more members.
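The comparison count is simple arithmetic. As a sketch, assuming “each against every other” means every sound is compared with every sound including itself (n × n), note that 275,625 is exactly 525 × 525, so under that counting it corresponds to n = 525 rather than the 526 or 524 given on this slide. Because the angle between two vectors is symmetric, comparing each unordered pair only once would need n(n − 1)/2 comparisons, roughly half as many. The function names are hypothetical.

```python
def all_against_all(n):
    """Comparisons when every sound is run against every sound, itself included."""
    return n * n


def unique_pairs(n):
    """Comparisons when each unordered pair is compared only once."""
    return n * (n - 1) // 2


# Compare both counting schemes for the set sizes around the quoted figures.
for n in (524, 525, 526):
    print(n, all_against_all(n), unique_pairs(n))
```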
An Example of Two Very Different Gunshots Identified by the Cluster Analysis
Panels: 161500-58796 and 23450-863375
Progress to Date
The 2-D spectrum recognition module is complete and tested. It works well and can match a human expert for accuracy. Recognition has proved fast enough for real-time use. The detection module is still under development, with release scheduled for late 2013.