Audio Fingerprinting Wes Hatch MUMT-614 Mar.13, 2003.

What is Audio Fingerprinting?
A small, unknown segment of audio data (it can be as short as just a couple of seconds) is used to identify the original audio file from which it came.

Applications
Broadcast monitoring
  – playlist generation
  – royalty collection
  – ad verification
Connected Audio
  – general term for consumer applications
Other
  – Napster: use of fingerprinting systems to prohibit the transmission of copyrighted materials
  – Finding desired content efficiently in “an overwhelming amount of audio material”

“Benefits”
Automated search for illegal content on the Internet
  – examines the actual audio information rather than just tag information
For the consumer
  – makes the metadata of songs in a library consistent, allowing for easy organization
  – can guarantee that what is downloaded is actually what it claims to be
  – will allow the consumer to record signatures of sound and music on small handheld devices

Two principal components
Compute the fingerprint
Compare it to a database of previously computed fingerprints
  – A text example: “…in a box. I will not eat them with a fox. I…”

Details to worry about
Robustness (to noise, distortion)
Reliability
Fingerprint size (reduced dimensionality)
Granularity
Search speed and scalability
Computational efficiency
Resulting features must be informative about the audio content
Semantic or non-semantic features?
Hash table or vector representation?

Computing the fingerprint
Compare to hash functions…?
  – compare the computed hash value with the one stored in a database
Drawback
  – need to worry about perceptual similarity, not mathematical similarity
      PCM audio vs. MP3: both sound alike, but mathematically (i.e. in spectral content) they are quite different
  – perceptual similarity is not transitive
      so it is not possible to design a system which computes mathematically equal fingerprints for perceptually similar objects
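The drawback above can be illustrated with a small sketch. It assumes SHA-256 as a stand-in for an ordinary hash function and uses a made-up sample sequence; neither comes from the slides. Changing a single sample by one quantization step, a difference nobody could hear, yields a different digest.

```python
import hashlib
import struct

def pcm_bytes(samples):
    """Pack integer samples as 16-bit little-endian PCM."""
    return struct.pack("<%dh" % len(samples), *samples)

# Hypothetical test signal: a short repeating waveform.
original = [0, 1000, 2000, 1000, 0, -1000, -2000, -1000] * 100

# "Perceptually identical" copy: one sample differs by one step.
altered = list(original)
altered[5] += 1

h1 = hashlib.sha256(pcm_bytes(original)).hexdigest()
h2 = hashlib.sha256(pcm_bytes(altered)).hexdigest()

# Number of digest bits (out of 256) that differ between the two hashes.
diff_bits = bin(int(h1, 16) ^ int(h2, 16)).count("1")

print("digests equal:", h1 == h2)
print("differing bits:", diff_bits)
```

An exact-match lookup on such digests would miss the altered copy, which is why a fingerprint has to be compared by similarity rather than equality.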

Techniques (general)
Any ‘x’ number of seconds may be used to compute the fingerprint
Audio gets separated into frames
  – Features computed for each frame:
      Fourier coefficients
      MFCC, LPC
      Spectral flatness
      sharpness
“features mapped into a more compact representation by using …HMM, or quantization”
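As a rough sketch of the framing step, the code below splits a signal into overlapping frames and computes one of the listed features, spectral flatness (geometric mean of the power spectrum divided by its arithmetic mean), for each frame. The frame size, hop size, the naive DFT, and the two test signals are illustrative assumptions, not values from the slides.

```python
import cmath
import math
import random

def frames(signal, size, hop):
    """Split a signal into overlapping frames of `size` samples."""
    return [signal[i:i + size] for i in range(0, len(signal) - size + 1, hop)]

def power_spectrum(frame):
    """Naive DFT power spectrum for bins 0..N/2 (fine for short frames)."""
    n = len(frame)
    spectrum = []
    for k in range(n // 2 + 1):
        c = sum(x * cmath.exp(-2j * math.pi * k * t / n)
                for t, x in enumerate(frame))
        spectrum.append(abs(c) ** 2)
    return spectrum

def spectral_flatness(frame, eps=1e-12):
    """Geometric mean / arithmetic mean of the power spectrum."""
    p = [v + eps for v in power_spectrum(frame)]  # eps avoids log(0)
    geometric = math.exp(sum(math.log(v) for v in p) / len(p))
    arithmetic = sum(p) / len(p)
    return geometric / arithmetic

# Hypothetical inputs: a pure tone and seeded white noise.
rng = random.Random(0)
tone = [math.sin(2 * math.pi * 4 * t / 64) for t in range(256)]
noise = [rng.uniform(-1.0, 1.0) for _ in range(256)]

tone_frames = frames(tone, size=64, hop=32)
noise_frames = frames(noise, size=64, hop=32)

tone_flatness = [spectral_flatness(f) for f in tone_frames]
noise_flatness = [spectral_flatness(f) for f in noise_frames]

print(len(tone_frames), max(tone_flatness), min(noise_flatness))
```

A tonal frame concentrates its energy in a few bins and scores near zero, while a noisy frame spreads energy across bins and scores higher.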

Techniques (Haitsma, Kalker)
One 32-bit sub-fingerprint every 11.6 ms
  – A block consists of 256 sub-fingerprints
      corresponds to a granularity of only 3 seconds
  – Large overlap (31/32), so subsequent sub-fingerprints are similar and vary slowly in time
  – Worst-case scenario: the frame boundaries used during identification are 5.8 ms off from those in the database
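The figures on this slide can be checked with a little arithmetic. The sketch assumes that the 11.6 ms spacing is the hop between frames, that hop = frame length × (1 − overlap), and that the worst-case misalignment is half a hop; the slide implies these relations but does not state them.

```python
HOP_MS = 11.6            # one sub-fingerprint every 11.6 ms (from the slide)
SUBFPS_PER_BLOCK = 256   # sub-fingerprints per block (from the slide)
BITS_PER_SUBFP = 32      # bits per sub-fingerprint (from the slide)
OVERLAP = 31 / 32        # frame overlap factor (from the slide)

# Duration covered by one block: 256 * 11.6 ms, i.e. roughly 3 seconds.
block_seconds = SUBFPS_PER_BLOCK * HOP_MS / 1000

# Assumed relation: hop = frame_length * (1 - overlap).
frame_ms = HOP_MS / (1 - OVERLAP)

# Assumed worst case: query frames fall halfway between database frames.
worst_offset_ms = HOP_MS / 2

# Total size of one fingerprint block in bits.
block_bits = SUBFPS_PER_BLOCK * BITS_PER_SUBFP

print(block_seconds, frame_ms, worst_offset_ms, block_bits)
```

Under these assumptions each frame spans about 371 ms, a block covers about 2.97 s, and the 5.8 ms worst case is exactly half the 11.6 ms hop.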

Techniques (Haitsma, Kalker)
Data from each frame is sent through a filterbank
  – 33 filters, logarithmically spaced (to correspond roughly to the Bark scale) between 300 and 2000 Hz
  – phase is neglected (for perceptual reasons)
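One way to lay out such a filterbank is sketched below. It assumes non-overlapping bands whose edges form a geometric series, which is one reading of "logarithmically spaced"; the function name and the non-overlapping layout are assumptions, not details given on the slide.

```python
def log_band_edges(f_lo=300.0, f_hi=2000.0, n_bands=33):
    """Return n_bands + 1 edge frequencies spaced by a constant ratio."""
    ratio = (f_hi / f_lo) ** (1.0 / n_bands)
    return [f_lo * ratio ** i for i in range(n_bands + 1)]

edges = log_band_edges()

# Each band is the interval between two consecutive edges.
bands = list(zip(edges[:-1], edges[1:]))

print(len(bands), round(bands[0][1] - bands[0][0], 2),
      round(bands[-1][1] - bands[-1][0], 2))
```

Note that 33 bands leave 32 adjacent band pairs, which would be consistent with a 32-bit sub-fingerprint if one bit is derived per pair. That link is an inference here; the slides do not say how the bits are derived.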

System overview

Techniques (Burges, Platt)
Downsampled to kHz, split into frames with overlap of 2
  – MCLT (modulated complex lapped transform) is then applied to each frame.
  – A 128-sample log spectrum is generated by taking the log modulus of each MCLT coefficient

Techniques (Burges, Platt)
Use prior knowledge to define the form of the feature extractor
Features computed by a “linear, convolutional” neural network
Convert signal into a feature vector
  – uses Principal Component Analysis (PCA) to find a set of projections
  – generates a vector of 128 values for every 11.6 ms interval
      dimensionality-reduction method (i.e. lots of math)

Techniques (Burges, Platt)
3 layers of Oriented PCA (OPCA)
  – operates on a frame of 128 values
      layer 1: generates 10 values for each frame
      layer 2: takes 42 ‘layer 1 outputs’ and produces 20 values
      layer 3: takes 40 ‘layer 2 outputs’ and produces 64 values
      (11K inputs --> 64 outputs)

Searching the Database
Look for the most similar (not necessarily exact) fingerprint
  – 10,000 5-min. songs → 250 million sub-fingerprints
  – brute force takes in excess of 20 minutes on a very fast PC
      brute force computes the bit-error rate for every possible position in the database
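A minimal sketch of the brute-force search follows. It assumes the database is one flat list of 32-bit sub-fingerprints and that the query is a shorter list of the same kind; the random toy database, the 16-entry query length, and the function names are illustrative only. The first line also checks the database-size estimate, assuming one sub-fingerprint every 11.6 ms.

```python
import random

# 10,000 songs * 5 min * 60 s / 11.6 ms per sub-fingerprint ~ 2.6e8.
n_subfps = 10_000 * 5 * 60 / 0.0116

def bit_errors(a, b):
    """Count differing bits between two equal-length lists of 32-bit ints."""
    return sum(bin(x ^ y).count("1") for x, y in zip(a, b))

def brute_force_search(database, query):
    """Try every alignment; return (position, bit_error_rate) of the best."""
    n = len(query)
    total_bits = 32 * n
    best_pos, best_ber = -1, float("inf")
    for pos in range(len(database) - n + 1):
        ber = bit_errors(database[pos:pos + n], query) / total_bits
        if ber < best_ber:
            best_pos, best_ber = pos, ber
    return best_pos, best_ber

# Toy database of random sub-fingerprints (hypothetical data).
rng = random.Random(1)
db = [rng.getrandbits(32) for _ in range(500)]

# Query: a 16-entry excerpt starting at position 200, with 3 bits flipped
# to imitate distortion.
query = list(db[200:216])
query[3] ^= 0b101
query[9] ^= 1 << 31

found_pos, found_ber = brute_force_search(db, query)
print(round(n_subfps), found_pos, found_ber)
```

The cost grows with the product of database length and query length, which is why this approach becomes impractical at the scale quoted on the slide.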

Searching the Database
Make the assumption that at least 1 (of the 256) sub-fingerprints is error-free
  – then, use a hash table (as opposed to a more memory-intensive look-up table)
  – 800,000 times faster
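The idea can be sketched as follows: index every database position by its sub-fingerprint value, then evaluate the bit-error rate only at alignments suggested by exact matches of individual query sub-fingerprints. The Python dictionary standing in for the hash table, the `max_ber` threshold of 0.35, the toy data, and all function names are assumptions made for illustration.

```python
import random
from collections import defaultdict

def bit_errors(a, b):
    """Count differing bits between two equal-length lists of 32-bit ints."""
    return sum(bin(x ^ y).count("1") for x, y in zip(a, b))

def build_index(database):
    """Map each sub-fingerprint value to the positions where it occurs."""
    index = defaultdict(list)
    for pos, sub_fp in enumerate(database):
        index[sub_fp].append(pos)
    return index

def indexed_search(database, index, query, max_ber=0.35):
    """Check only alignments where some query sub-fingerprint matches exactly.

    Returns (start_position, bit_error_rate) or None if nothing qualifies.
    max_ber is an assumed acceptance threshold.
    """
    n = len(query)
    best = None
    for offset, sub_fp in enumerate(query):
        for pos in index.get(sub_fp, ()):
            start = pos - offset
            if start < 0 or start + n > len(database):
                continue
            ber = bit_errors(database[start:start + n], query) / (32 * n)
            if ber <= max_ber and (best is None or ber < best[1]):
                best = (start, ber)
    return best

# Toy database and a distorted 16-entry query (hypothetical data).
rng = random.Random(1)
db = [rng.getrandbits(32) for _ in range(500)]
query = list(db[200:216])
query[3] ^= 0b101      # two flipped bits in one sub-fingerprint
query[9] ^= 1 << 31    # one flipped bit in another

index = build_index(db)
result = indexed_search(db, index, query)
print(result)
```

If every sub-fingerprint in the query contains at least one bit error, no candidate alignment is generated and the search returns `None`, which is the price of the error-free assumption.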

Results
False-positive rate of 3.6x10-2 (Haitsma, Kalker)
On tests with a large (500,000) set of input traces
  – has a “low” false-positive and false-negative rate (Burges, Platt)
  – didn’t test on time compression, expansion
Can withstand distortions occurring from transmission over mobile phones.