Extracting Melody Lines from Complex Audio
Jana Eggink
Supervisor: Guy J. Brown
University of Sheffield
{j.eggink

Slide 2/16: Task
Extract the melody line from an audio recording (example: flute)
Useful for: automatic music indexing and analysis, detection of copyright infringement, ‘query-by-humming’ systems...
No clear definition of what is perceived as a melody by humans
Working definition: F0s played by the solo instrument in accompanied sonatas and concertos
The solo instrument is not necessarily always the loudest F0
Therefore: include information about the instrument by which a specific F0 was produced

Slide 3/16: Task I: Identify Solo Instrument
Instrument sounds are harmonic; energy is concentrated in partials... which are least likely to be masked by other sounds
(System overview: audio signal → F0 and partials → features → recogniser → flute / clarinet / oboe / violin / cello)
Features based only on the frequency position and power of the lowest 15 partials
Statistical recogniser (GMMs) trained on monophonic music
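The recogniser is described here only at a high level (one statistical model per instrument, trained on monophonic music). Below is a minimal sketch of such a GMM classifier using scikit-learn; the function names, number of mixture components and covariance type are illustrative assumptions, not the configuration used in this work.

```python
# Minimal GMM-based instrument recogniser sketch (assumed configuration).
import numpy as np
from sklearn.mixture import GaussianMixture

def train_instrument_models(features_per_instrument, n_components=16):
    """Fit one GMM per instrument on frame-level feature vectors
    extracted from monophonic training recordings."""
    models = {}
    for name, feats in features_per_instrument.items():  # feats: (n_frames, n_dims)
        gmm = GaussianMixture(n_components=n_components, covariance_type='diag')
        gmm.fit(feats)
        models[name] = gmm
    return models

def classify(models, feats):
    """Label a recording with the instrument whose GMM gives the highest
    average frame log-likelihood."""
    scores = {name: gmm.score(feats) for name, gmm in models.items()}
    return max(scores, key=scores.get), scores
```

In this setup the per-file decision averages likelihoods over many frames, which matches the later observation that classification is reliable only when averaged over a whole sound file.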

Slide 4/16: Identify Solo Instrument: Features
Exact frequency position and normalised log-compressed power of the first 15 partials
(Illustration: partials plotted as frequency (Hz) against power (dB))
Frame-to-frame differences (deltas and delta-deltas) within tones of continuous F0
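As a rough illustration of this feature set (frequencies and normalised log-compressed powers of the first 15 partials, plus frame-to-frame deltas), a sketch follows. The peak-search width, the power floor and the normalisation to the strongest partial are assumptions made for the example, and partial_features/add_deltas are hypothetical helper names.

```python
# Sketch of partial-based features: frequency and log-power of 15 partials, plus deltas.
import numpy as np

def partial_features(spectrum, freqs, f0, n_partials=15, search_semitones=0.5):
    """spectrum: magnitude spectrum of one frame; freqs: bin frequencies (Hz)."""
    feat_freq, feat_pow = [], []
    for k in range(1, n_partials + 1):
        target = k * f0
        lo = target * 2 ** (-search_semitones / 12)
        hi = target * 2 ** (search_semitones / 12)
        idx = np.where((freqs >= lo) & (freqs <= hi))[0]
        if len(idx) == 0:
            feat_freq.append(target)
            feat_pow.append(-120.0)           # floor value when no peak is found
            continue
        peak = idx[np.argmax(spectrum[idx])]  # strongest bin near the harmonic
        feat_freq.append(freqs[peak])
        feat_pow.append(20 * np.log10(spectrum[peak] + 1e-12))
    pow_db = np.array(feat_pow)
    pow_db -= pow_db.max()                    # normalise log-power to the strongest partial
    return np.concatenate([feat_freq, pow_db])

def add_deltas(frames):
    """Append frame-to-frame differences (deltas and delta-deltas)."""
    d = np.diff(frames, axis=0, prepend=frames[:1])
    dd = np.diff(d, axis=0, prepend=d[:1])
    return np.hstack([frames, d, dd])
```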

Slide 5/16: Results I: Instrument Identification
(Confusion matrix of recognised instrument (response) against the true solo instrument (stimulus) for flute, clarinet, oboe, violin and cello)
Solo instrument with accompaniment (piano or orchestra), commercially available CDs, 90 examples, 2-3 min. each
Instrument: 86% correct

Slide 6/16: But...
Estimated F0s are not very accurate (as judged by manual inspection)
Overall instrument classification is very good, but only when averaged over a whole sound file; results are not very accurate on a note-by-note or frame-by-frame basis
More information is needed to find the melody!

Slide 7/16: Task II: Find Melody (assuming the solo instrument is known)
Extract multiple F0 candidates from the audio
Find the most likely ‘path’ through the time-frequency space of F0 candidates
Local knowledge: F0 strength (~loudness), F0 likelihood (absolute frequency given the instrument range), instrument likelihood (recogniser output), silence estimation (only accompaniment?)
Temporal knowledge: tone length, interval transitions
Include additional knowledge about instrument range, tone duration and likely interval transitions to pick the correct candidate
(System diagram: AUDIO → F0 candidates → knowledge-integrating path finding → MELODY)
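One simple way to realise the ‘extract multiple F0 candidates’ step, and the one the F0 results on slide 11 suggest is competitive with harmonic sieves, is to take the strongest spectral peaks inside the solo instrument's playing range. The sketch below assumes a precomputed magnitude spectrum per frame; f0_candidates and its parameters are illustrative, not the exact procedure used here.

```python
# Sketch: per-frame F0 candidates as the strongest spectral peaks in the instrument range.
import numpy as np
from scipy.signal import find_peaks

def f0_candidates(spectrum, freqs, fmin, fmax, n_candidates=3):
    """Return the frequencies of the n strongest peaks between fmin and fmax."""
    in_range = (freqs >= fmin) & (freqs <= fmax)
    peaks, props = find_peaks(spectrum * in_range, height=0)
    if len(peaks) == 0:
        return np.array([])
    strongest = peaks[np.argsort(props['peak_heights'])[::-1][:n_candidates]]
    return freqs[strongest]
```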

Slide 8/16: Knowledge Integration (Path Finding)
(Illustration: candidate F0s in the time-frequency plane)
Possible melody paths restricted by longer tones of continuous F0
All knowledge sources are normalised to equal mean and standard deviation
Knowledge sources are summed along the current path
N-best search for the most likely path
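A schematic rendering of this path finding, under assumptions of my own about the details: each knowledge source is normalised to zero mean and unit standard deviation, the sources are summed into a per-candidate score, and an N-best (beam) search keeps the highest-scoring paths through time. The transition score and the beam width stand in for the temporal knowledge (tone length, interval transitions) from the previous slide.

```python
# Sketch of knowledge integration and N-best path search over F0 candidates.
import numpy as np

def normalise(source):
    """Bring one knowledge source to zero mean and unit standard deviation."""
    return (source - np.mean(source)) / (np.std(source) + 1e-12)

def n_best_paths(knowledge_sources, transition_score, n_best=10):
    """knowledge_sources: list of (n_frames, n_candidates) score arrays.
    transition_score(prev_candidate, cand, frame): score for moving between
    candidates, e.g. favouring small interval transitions."""
    local = sum(normalise(s) for s in knowledge_sources)  # (n_frames, n_candidates)
    n_frames, n_cands = local.shape
    # Each hypothesis is (total_score, path); start with the frame-0 candidates.
    beam = [(local[0, c], [c]) for c in range(n_cands)]
    for t in range(1, n_frames):
        extended = []
        for score, path in beam:
            for c in range(n_cands):
                step = local[t, c] + transition_score(path[-1], c, t)
                extended.append((score + step, path + [c]))
        # Keep only the n_best highest-scoring partial paths.
        extended.sort(key=lambda h: h[0], reverse=True)
        beam = extended[:n_best]
    return beam  # best path first
```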

Slide 9/16: ‘Silence’ Estimation
The solo instrument is not always continuously playing
Use the likelihoods for the solo instrument along the estimated path
Present threshold: median of the likelihood values for the solo instrument (assuming the solo instrument is present at least 50% of the time)
Silent threshold: mean of the likelihood values over all instruments
Assign whole tones according to their proximity to the present/silent thresholds and the state of their neighbours
Impose a minimum length on ‘present’ sections
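The silence estimation could look roughly like the sketch below: two thresholds derived from the likelihoods (median of the solo-instrument values, mean over all instruments), tones labelled by which threshold they are closer to, and a minimum length imposed on ‘present’ sections. The neighbour-state rule is simplified away, so this is an approximation of the slide's procedure, not a reimplementation.

```python
# Simplified 'silence' estimation sketch with assumed decision rules.
import numpy as np

def label_presence(solo_loglik_per_tone, all_instr_loglik, min_present_run=3):
    """solo_loglik_per_tone: mean solo-instrument log-likelihood of each tone.
    all_instr_loglik: log-likelihoods over all instrument models."""
    present_thr = np.median(solo_loglik_per_tone)  # solo assumed present >= 50% of the time
    silent_thr = np.mean(all_instr_loglik)
    # Label each tone by which threshold its likelihood is closer to.
    labels = np.abs(solo_loglik_per_tone - present_thr) <= np.abs(solo_loglik_per_tone - silent_thr)
    # Impose a minimum length on 'present' sections.
    out = labels.copy()
    run_start, run_len = None, 0
    for i, flag in enumerate(labels):
        if flag:
            if run_start is None:
                run_start = i
            run_len += 1
        else:
            if run_start is not None and run_len < min_present_run:
                out[run_start:i] = False
            run_start, run_len = None, 0
    if run_start is not None and run_len < min_present_run:
        out[run_start:] = False
    return out
```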

Slide 10/16: Evaluation: Test Material
Realistic recordings do not provide information about the ‘true’ F0s; even scores are only an approximation
Use MIDI-generated audio
Real instrument samples, but only 3-4 per octave, provided by the sampler software
10 examples: for every solo instrument, one piece with piano accompaniment and one with orchestra
Solo instrument and accompaniment mixed at 0 dB SNR
Whole movements (or the first 3 minutes) to ensure sufficient presence of the solo instrument; a mixture of different styles and tempi
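For concreteness, mixing solo and accompaniment at 0 dB SNR means scaling one signal so that both have equal RMS power before summing; a small sketch follows. The function and its gain computation are my own illustration of that convention, not code from this work.

```python
# Sketch: mix two signals at a target SNR (0 dB means equal RMS power).
import numpy as np

def mix_at_snr(solo, accompaniment, snr_db=0.0):
    rms_solo = np.sqrt(np.mean(solo ** 2))
    rms_acc = np.sqrt(np.mean(accompaniment ** 2))
    # Gain that sets the desired solo-to-accompaniment power ratio.
    gain = (rms_solo / rms_acc) * 10 ** (-snr_db / 20)
    return solo + gain * accompaniment
```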

Slide 11/16: Results: F0 Estimation
Comparing F0s estimated using harmonic sieves (searching for prominent harmonic series) with simply picking the highest spectral peaks shows no advantage of the former

F0 candidates    flute   clarinet   oboe   violin   cello   average
1 (strongest)     70%      78%      48%     28%      38%      52%
3                 82%      94%      78%     61%      64%      76%
15                98%      99%      98%     94%      84%      95%
(based solely on sections where the solo instrument is present)

Very unexpected, but it might be caused by the very rich mixture of harmonically related tones; initial results show that other algorithms that search for harmonic series, e.g. YIN (autocorrelation based), do not do well either
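For readers unfamiliar with the term, a harmonic sieve scores each candidate F0 by how much spectral energy falls near its harmonic positions; a toy version is sketched below. The tolerance and scoring rule are assumptions, not the sieve actually evaluated in this comparison.

```python
# Toy harmonic sieve: score candidate F0s by spectral power near their harmonics.
import numpy as np

def harmonic_sieve_score(spectrum, freqs, f0, n_harmonics=10, tol=0.03):
    """Sum the strongest spectral power within +/- tol (relative) of each harmonic of f0."""
    score = 0.0
    for k in range(1, n_harmonics + 1):
        target = k * f0
        idx = np.where(np.abs(freqs - target) <= tol * target)[0]
        if len(idx):
            score += spectrum[idx].max()
    return score

def sieve_candidates(spectrum, freqs, f0_grid, n_candidates=3):
    scores = np.array([harmonic_sieve_score(spectrum, freqs, f0) for f0 in f0_grid])
    return f0_grid[np.argsort(scores)[::-1][:n_candidates]]
```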

Slide 12/16: Results: Instrument Identification
Solo instrument without accompaniment: all examples correct, except one oboe mistaken for a flute
Solo instrument with accompaniment: violin and cello still correct, but performance for the woodwinds approaching random, even with the true F0s provided
Possible reasons:
Sample-based music might be harder to identify, as it provides less instrument-specific variation such as vibrato
The mixing level might be unfavourable, with a worse SNR than in realistic recordings
Frequency regions that are dominated by the accompaniment might differ between realistic recordings and MIDI-based audio

Slide 13/16: Results: Melody Extraction
Baseline performance: only the strongest F0, no other knowledge

                 spurious tones   tones found   correct frames
strongest F0          321%            78%            40%
path                  135%            76%            51%
path+silence          117%            72%            54%

Number of correct frames improved by 14%, with the number of spurious tones reduced to nearly a third, leading to significantly smoother melody lines
Path finding, and especially silence estimation, are likely to suffer from the poor instrument identification performance with MIDI-based audio

Slide 14/16: Realistic Example
Beginning of Mozart’s Clarinet Concerto, taken from a CD recording; manually annotated F0s (gray) and estimated melody (black)
(Two plots of F0 (Hz) against time (frames): melody based on the strongest F0, and melody based on knowledge-integrating path finding)

Slide 15/16: Conclusions and Future Work
Audio generated from MIDI is not necessarily good test material!
Two short, manually annotated realistic examples: 10%-15% more correct frames than on the equivalent MIDI-based examples
Further work concentrating on realistic examples requires manual labelling, or automatic alignment of MIDI data to real recordings?!

Slide 16/16: The End
Any Questions?