Audio Retrieval David Kauchak cs458 Fall 2012. Administrative Assignment 4 Two parts Midterm Average:52.8 Median:52 High:57 In-class “quiz”: 11/13.

Slides:



Advertisements
Similar presentations
Adam Diel.  In 1981 IBM PC 150 introduced the first PC Speaker.  Each game had to write support for it (sound cards were impractical during this time)
Advertisements

Guerino Mazzola (Fall 2014 © ): Introduction to Music Technology IIIDigital Audio III.6 (Fr Oct 24) The MP3 algorithm with PAC.
CS335 Principles of Multimedia Systems Audio Hao Jiang Computer Science Department Boston College Oct. 11, 2007.
Digital Signal Processing
Using Multimedia on the Web Enhancing a Web Site with Sound, Video, and Applets.
CNIT 132 – Week 9 Multimedia. Working with Multimedia Bandwidth is a measure of the amount of data that can be sent through a communication pipeline each.
4.1Different Audio Attributes 4.2Common Audio File Formats 4.3Balancing between File Size and Audio Quality 4.4Making Audio Elements Fit Our Needs.
Dale & Lewis Chapter 3 Data Representation Analog and digital information The real world is continuous and finite, data on computers are finite  need.
Part A Multimedia Production Rico Yu. Part A Multimedia Production Ch.1 Text Ch.2 Graphics Ch.3 Sound Ch.4 Animations Ch.5 Video.
Chapter 4: Representation of data in computer systems: Sound OCR Computing for GCSE © Hodder Education 2011.
Franz de Leon, Kirk Martinez Web and Internet Science Group  School of Electronics and Computer Science  University of Southampton {fadl1d09,
Applications in Signal and Image Processing
I Power Higher Computing Multimedia technology Audio.
Content-based retrieval of audio Francois Thibault MUMT 614B McGill University.
Department of Computer Science University of California, San Diego
Outline Introduction Music Information Retrieval Classification Process Steps Pitch Histograms Multiple Pitch Detection Algorithm Musical Genre Classification.
Audio Retrieval David Kauchak cs160 Fall 2009 Thanks to Doug Turnbull for some of the slides.
1 Digital Audio Compression. 2 Formats  There are many different formats for storing and communicating digital audio:  CD audio  Wav  Aiff  Au 
Content-Based Classification, Search & Retrieval of Audio Erling Wold, Thom Blum, Douglas Keislar, James Wheaton Presented By: Adelle C. Knight.
1 Audio Compression Techniques MUMT 611, January 2005 Assignment 2 Paul Kolesnik.
Time and Frequency Representations Accompanying presentation Kenan Gençol presented in the course Signal Transformations instructed by Prof.Dr. Ömer Nezih.
LYU0103 Speech Recognition Techniques for Digital Video Library Supervisor : Prof Michael R. Lyu Students: Gao Zheng Hong Lei Mo.
Copyright Nov. 2002, George Tzanetakis Digital Music & Music Processing George Tzanetakis PostDoctoral Fellow Computer Science Department Carnegie Mellon.
Classification of Music According to Genres Using Neural Networks, Genetic Algorithms and Fuzzy Systems.
1 Music Classification Using SVM Ming-jen Wang Chia-Jiu Wang.
EE2F1 Speech & Audio Technology Sept. 26, 2002 SLIDE 1 THE UNIVERSITY OF BIRMINGHAM ELECTRONIC, ELECTRICAL & COMPUTER ENGINEERING Digital Systems & Vision.
Digital Audio Multimedia Systems (Module 1 Lesson 1)
Image Processing David Kauchak cs458 Fall 2012 Empirical Evaluation of Dissimilarity Measures for Color and Texture Jan Puzicha, Joachim M. Buhmann, Yossi.
Introduction to Sound Sounds are vibrations that travel though the air or some other medium A sound wave is an audible vibration that travels through.
Digital Sound and Video Chapter 10, Exploring the Digital Domain.
GCT731 Fall 2014 Topics in Music Technology - Music Information Retrieval Overview of MIR Systems Audio and Music Representations (Part 1) 1.
Introduction to Interactive Media 10: Audio in Interactive Digital Media.
COMP Representing Sound in a ComputerSound Course book - pages
Multiresolution STFT for Analysis and Processing of Audio
Media Representations - Audio
The Story of Wavelets.
AUDIO MEDIA 1 Created } “Borrowed” } Microphone MIDI keyboard CD’s & flash drives Internet Audio Sources 2.
Signal Digitization Analog vs Digital Signals An Analog Signal A Digital Signal What type of signal do we encounter in nature?
COSC 1P02 Introduction to Computer Science 4.1 Cosc 1P02 Week 4 Lecture slides “Programs are meant to be read by humans and only incidentally for computers.
Overview of Multimedia A multimedia presentation might contain: –Text –Animation –Digital Sound Effects –Voices –Video Clips –Photographic Stills –Music.
Creating Web Documents alt attribute Good and bad uses of ‘multimedia’ Sound files Homework: Discuss with me AND post announcement of Project II. Forms.
Chapter 15 Recording and Editing Sound. 2Practical PC 5 th Edition Chapter 15 Getting Started In this Chapter, you will learn: − How sound capability.
Wavelet transform Wavelet transform is a relatively new concept (about 10 more years old) First of all, why do we need a transform, or what is a transform.
ECE472/572 - Lecture 13 Wavelets and Multiresolution Processing 11/15/11 Reference: Wavelet Tutorial
Image Processing Architecture, © 2001, 2002, 2003 Oleh TretiakPage 1 ECE-C490 Image Processing Architecture MP-3 Compression Course Review Oleh Tretiak.
CS Spring 2009 CS 414 – Multimedia Systems Design Lecture 3 – Digital Audio Representation Klara Nahrstedt Spring 2009.
Sound in Multimedia Psychology of sound what do you use it for? what techniques for its communication exist? Science of sound why does it exist? how it.
Marwan Al-Namari 1 Digital Representations. Bits and Bytes Devices can only be in one of two states 0 or 1, yes or no, on or off, … Bit: a unit of data.
MMDB-8 J. Teuhola Audio databases About digital audio: Advent of digital audio CD in Order of magnitude improvement in overall sound quality.
CSCI-100 Introduction to Computing Hardware Part II.
Section 1 Section 2Section 3Section 4Chapter 2.
Chapter 1 Background 1. In this lecture, you will find answers to these questions Computers store and transmit information using digital data. What exactly.
Fourier and Wavelet Transformations Michael J. Watts
CS Spring 2014 CS 414 – Multimedia Systems Design Lecture 3 – Digital Audio Representation Klara Nahrstedt Spring 2014.
CS Spring 2010 CS 414 – Multimedia Systems Design Lecture 4 – Audio and Digital Image Representation Klara Nahrstedt Spring 2010.
Introduction to Digital Audio
III Digital Audio III.6 (Fr Oct 20) The MP3 algorithm with PAC.
Multimedia: Digitised Sound Data
Introduction to Digital Audio
Fourier and Wavelet Transformations
Multi-resolution analysis
Introduction to Digital Audio
Introduction to Digital Audio
Wavelet transform Wavelet transform is a relatively new concept (about 10 more years old) First of all, why do we need a transform, or what is a transform.
Introduction to Digital Audio
III Digital Audio III.6 (Mo Oct 22) The MP3 algorithm with PAC.
Govt. Polytechnic Dhangar(Fatehabad)
Introduction to Digital Audio
Digital Audio Application of Digital Audio - Selected Examples
Measuring the Similarity of Rhythmic Patterns
Presentation transcript:

Audio Retrieval David Kauchak cs458 Fall 2012

Administrative Assignment 4 Two parts Midterm Average:52.8 Median:52 High:57 In-class “quiz”: 11/13

Audio retrieval text retrieval corpus audio retrieval corpus

What do you want from an audio search engine? Name: You might know the name of the song or the artist Genre: You might try “Bebop,” “Latin Jazz,” or “Rock” Instrumentation: The tenor sax, guitar, and double bass are all featured in the song Emotion: The song has a “cool vibe” that is “upbeat“ with an “electric texture” Some other approaches to search: musicovery.com pandora.com (song similarity) Genius (collaborative filtering)

Current audio search engines What are they? What can you search by? How well do they work? How could they been improved? Challenges?

Text Index construction Documents to be indexed Friends, Romans, countrymen. indexer Inverted index friend roman countryman text preprocessing friend, roman, countrymen.

Audio Index construction Audio files to be indexed indexer Index audio preprocessing wav midi mp3 slow, jazzy, punk may be keyed off of text may be keyed off of audio features Today

Sound What is sound? A longitudinal compression wave traveling through some medium (often, air) Rate of the wave is the frequency You can think of sounds as a sum of sign waves

Sound How do people hear sound? The cochlea in the inner ear has hair cells that "wiggle" when certain frequency are encountered

Digital Encoding Like everything else for computers, we must represent audio signals digitally Encoding formats: WAV MIDI MP3 Others…

WAV Simple encoding Sample sound at some interval (e.g. 44 KHz). High sound quality Large file sizes

MIDI Musical Instrument Digital Interface MIDI is a language Sentences describe the channel, note, loudness, etc. 16 channels (each can be thought of and recorded as a separate instrument) Common for audio retrieval and classification applications

MP3 Common compression format 3-4 MB vs MB for uncompressed Perceptual noise shaping The human ear cannot hear certain sounds Some sounds are heard better than others The louder of two sounds will be heard Lossy or lossless? Lossy compression quality depends on the amount of compression like many compression algorithms, can have issues with randomness (e.g. clapping)

MP3 Example

Features Weight vectors - word frequency - count normalization - idf weighting - length normalization ?

Tools for Feature Extraction Fourier Transform (FT) Short Term Fourier Transform (STFT) Wavelets

Fourier Transform (FT) Time-domain Frequency-domain

Another FT Example Time Frequency

Problem? 

Problem with FT FT contains only frequency information No time information is retained Works fine for stationary signals Non-stationary or changing signals cause problems FT shows frequencies occurring at all times instead of specific times Ideas?

Short-Time Fourier Transform (STFT) Idea: Break up the signal into discrete windows Treat each signal within a window as a stationary signal Take FT over each part …

STFT Example amplitude frequency time

STFT Example

Problem: Resolution How do we pick the window size? Trade-offs? We can vary time and frequency accuracy Narrow window: good time resolution, poor frequency resolution Wide window: good frequency resolution, poor time resolution

Varying the resolution Ideas?

Wavelets Wave Wavelets

Wavelets Wavelets respond to signals that are similar

Wavelet response A wavelet responds to signals that are similar to the wavelet ?

Wavelet response Scale matters! ?

Wavelet Transform Idea: Take a wavelet and vary scale Check response of varying scales on signal

Wavelet Example: Scale 1

Wavelet Example: Scale 2

Wavelet Example: Scale 3

Wavelet Example Scale = 1/frequency Translation  Time

Discrete Wavelet Transform (DWT) Wavelets come in pairs (high pass and low pass filter) Split signal with filter and downsample

DWT cont. Continue this process on the low frequency portion of the signal

DWT Example signal low frequency high frequency

How did this solve the resolution problem? Higher frequency resolution at high frequencies Higher time frequency at low frequencies

Feature Extraction All these transforms help us understand how the frequencies changes over time Features extraction: Mel-frequency cepstral coefficients (MFCCs) Attempt to mimic human ear Surface features (texture, timbre, instrumentation) Capture frequency statistics of STFT Rhythm features (i.e the “beat”) Characteristics of low-frequency wavelets

rock, hip-hop, classical or jazz?

Music Classification Data Audio collected from radio, CDs and Web Speech vs. music Genres: classic, country, hiphop, jazz, rock 4-types of classical music 50 samples for each class, 30 sec. long Task is to predict the genre of the clip Approach Extract features Learn genre classifier

Music Classification Data Audio collected from radio, CDs and Web Speech vs. music Genres: classic, country, hiphop, jazz, rock 4-types of classical music 50 samples for each class, 30 sec. long Task is to predict the genre of the clip How well do you think we can do?

General Results Music vs. Speech GenresClassical Random50%16%25% Classifier86%62%76%

Results: Musical Genres ClassicCountryDiscoHiphopJazzRock Classic Country Disco Hiphop Jazz Rock Pseudo-confusion matrix

Results: Classical Confusion matrix ChoralOrchestralPianoString Choral Orchestral05325 Piano String017780

Thanks Robi Polikar for his old tutorial (