2 Extracting Melody Lines from Complex Audio
Jana Eggink
Supervisor: Guy J. Brown
University of Sheffield
{j.eggink, g.brown}@dcs.shef.ac.uk

3 Melody Extraction from Complex Audio, 2 / 16, Jana Eggink, Sheffield, UK
Task
Extract the melody line from an audio recording. [Figure: flute]
Useful for: automatic music indexing and analysis, detection of copyright infringement, ‘query-by-humming’ systems...
There is no clear definition of what humans perceive as a melody.
Working definition: the F0s played by the solo instrument in accompanied sonatas and concertos.
The solo instrument does not necessarily always produce the loudest F0.
Therefore: include information about which instrument produced a specific F0.

4 Task I: Identify Solo Instrument
Instrument sounds are harmonic: their energy is concentrated in the F0 and partials, which are the parts least likely to be masked by other sounds.
Features are based only on the frequency position and power of the lowest 15 partials.
Statistical recogniser (GMMs) trained on monophonic music.
[Figure: audio signal → F0 and partials → features → recogniser → flute / clarinet / oboe / violin / cello]
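The slides only state that a GMM recogniser is trained on monophonic music. The sketch below is a minimal illustration, assuming diagonal covariances and a decision made by summing frame log-likelihoods over a whole file. The toy two-dimensional models in `MODELS` are hypothetical stand-ins for trained models over the 15-partial features, and training (e.g. by EM) is not shown.

```python
import math

# Each model is a list of (weight, means, variances) components of a
# diagonal-covariance GMM. These toy 2-D parameters are hypothetical.
MODELS = {
    "flute": [(0.5, [0.0, 0.0], [1.0, 1.0]), (0.5, [1.0, 1.0], [1.0, 1.0])],
    "cello": [(1.0, [5.0, 5.0], [1.0, 1.0])],
}


def log_gaussian_diag(x, means, variances):
    """Log density of vector x under a Gaussian with diagonal covariance."""
    return sum(-0.5 * (math.log(2 * math.pi * v) + (xi - m) ** 2 / v)
               for xi, m, v in zip(x, means, variances))


def gmm_log_likelihood(x, gmm):
    """log sum_k w_k N(x; mu_k, var_k), computed stably with log-sum-exp."""
    terms = [math.log(w) + log_gaussian_diag(x, m, v) for w, m, v in gmm]
    top = max(terms)
    return top + math.log(sum(math.exp(t - top) for t in terms))


def classify(frames, models=MODELS):
    """Sum frame log-likelihoods per instrument; return the best name and all scores."""
    scores = {name: sum(gmm_log_likelihood(f, gmm) for f in frames)
              for name, gmm in models.items()}
    return max(scores, key=scores.get), scores


best, scores = classify([[0.2, 0.1], [0.9, 1.1], [0.5, 0.4]])
print(best)  # flute
```

Summing over all frames of a file is what makes the decision robust here; a single frame gives a much noisier vote, which matches the later observation that results are not accurate frame by frame.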

5 Identify Solo Instrument: Features
Exact frequency position and normalised, log-compressed power of the first 15 partials.
[Figure: example partials with their frequency (Hz) and power (dB) values]
Frame-to-frame differences (deltas and delta-deltas) within tones of continuous F0.
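A minimal sketch of the per-frame features and their deltas, under stated assumptions: the "exact frequency position" is expressed as each partial's deviation in Hz from its ideal harmonic position k·F0, and "normalised" power means dB relative to the strongest partial in the frame. Partial detection itself is not shown; `frame_features`, `deltas` and `tone_dynamics` are hypothetical helper names.

```python
import math


def frame_features(f0, partials):
    """Static features for one frame.

    partials: [(freq_hz, linear_power), ...] for partials 1..N (N = 15 on the slide).
    Returns N frequency deviations from k*f0 (Hz) followed by N powers in dB
    relative to the strongest partial (both representations are assumptions).
    """
    freq_dev = [f - k * f0 for k, (f, _) in enumerate(partials, start=1)]
    power_db = [10 * math.log10(max(p, 1e-12)) for _, p in partials]  # log compression
    ref = max(power_db)
    return freq_dev + [p - ref for p in power_db]


def deltas(vectors):
    """Frame-to-frame differences; returns one vector fewer than the input."""
    return [[b - a for a, b in zip(prev, cur)] for prev, cur in zip(vectors, vectors[1:])]


def tone_dynamics(frames):
    """Deltas and delta-deltas, computed only within one tone of continuous F0."""
    d = deltas(frames)
    return d, deltas(d)


feats = frame_features(220.0, [(221.0, 1.0), (440.0, 0.1)])
print(feats)
```

Restricting deltas to a single tone avoids large spurious differences at note boundaries, where the partials belong to a different F0.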

6 Results I: Instrument Identification
Solo instrument with accompaniment (piano or orchestra), commercially available CDs, 90 examples, 2-3 min. each.
[Figure: confusion matrix, stimulus (true instrument) vs. response (recognised instrument), for cello, violin, oboe, clarinet and flute]
Instruments identified correctly: 86%.

7 But...
Estimated F0s are not very accurate (as judged by manual inspection).
Overall instrument classification is very good, but only when averaged over a whole sound file; results are not very accurate on a note-by-note or frame-by-frame basis.
More information is needed to find the melody!

8 Task II: Find Melody (assuming the solo instrument is known)
Extract multiple F0 candidates, then include additional knowledge about instrument range, tone duration and likely interval transitions to pick the correct candidate.
[Figure: AUDIO → F0 candidates → find the most likely ‘path’ through the time-frequency space of F0 candidates → silence estimation (only accompaniment?) → MELODY.
Local knowledge: F0 strength (~loudness); F0 likelihood (absolute frequency | instrument range); instrument likelihood (recogniser output).
Temporal knowledge: tone length; interval transitions.]
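To make the knowledge sources more concrete, here is a hedged sketch of three of them. The playing ranges in `RANGES`, the Gaussian-shaped range penalty and the Laplace-shaped interval model are all illustrative assumptions, not the models used in this work; tone-length knowledge is omitted.

```python
import math

# Illustrative, approximate playing ranges in Hz (not taken from the slides).
RANGES = {"flute": (262.0, 2093.0), "cello": (65.0, 988.0)}


def semitones(f_from, f_to):
    """Signed interval from f_from to f_to in semitones."""
    return 12 * math.log2(f_to / f_from)


def range_log_likelihood(f0, instrument, spread=2.0):
    """F0 likelihood given the instrument range: 0 inside the range and a
    Gaussian-shaped penalty (in semitones) outside it. Hypothetical shape."""
    lo, hi = RANGES[instrument]
    if lo <= f0 <= hi:
        return 0.0
    dist = semitones(f0, lo) if f0 < lo else semitones(hi, f0)
    return -0.5 * (dist / spread) ** 2


def interval_log_likelihood(f_prev, f_next, scale=3.0):
    """Interval-transition knowledge as a Laplace density over semitones:
    small steps are more likely than large leaps. Hypothetical model."""
    return -abs(semitones(f_prev, f_next)) / scale - math.log(2 * scale)


def candidate_sources(f0, strength_db, instrument_loglik, instrument):
    """Collect the local knowledge sources for one F0 candidate."""
    return {
        "strength": strength_db,                         # F0 strength (~loudness)
        "range": range_log_likelihood(f0, instrument),   # F0 | instrument range
        "instrument": instrument_loglik,                 # recogniser output
    }


print(candidate_sources(440.0, -6.0, -3.2, "flute"))
```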

9 Knowledge Integration (Path Finding)
[Figure: candidate melody paths in the time-frequency plane]
Possible melody paths are restricted by longer tones of continuous F0.
All knowledge sources are normalised to equal mean and standard deviation.
Knowledge sources are summed along the current path.
N-best search for the most likely path.
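The integration step could look roughly like the sketch below, which is an assumption-laden illustration rather than the authors' implementation: each knowledge source is z-scored (mean 0, standard deviation 1) over all candidates, each time step stands for one segment of continuous F0, and a beam of the `n` best partial paths is kept. For brevity the hypothetical transition score `octave_penalty` is used unnormalised.

```python
import math
from statistics import mean, pstdev


def zscore(values):
    """Normalise to mean 0 and standard deviation 1 (constant input maps to 0)."""
    m, s = mean(values), pstdev(values)
    return [(v - m) / s if s > 0 else 0.0 for v in values]


def normalise_sources(steps):
    """steps[t][i] is {source: raw value} for candidate i at step t.
    Each source is z-scored over all candidates; returns one total per candidate."""
    flat = [c for step in steps for c in step]
    totals = [0.0] * len(flat)
    for src in flat[0]:
        for j, z in enumerate(zscore([c[src] for c in flat])):
            totals[j] += z
    out, k = [], 0
    for step in steps:                      # restore the per-step nesting
        out.append(totals[k:k + len(step)])
        k += len(step)
    return out


def n_best_path(local, f0s, transition, n=5):
    """Beam search over candidate indices, keeping the n best partial paths.
    local[t][i]: summed normalised local score; f0s[t][i]: candidate F0 in Hz;
    transition(f_prev, f_next): score added when moving between steps."""
    beam = sorted(((s, [i]) for i, s in enumerate(local[0])),
                  key=lambda sp: sp[0], reverse=True)[:n]
    for t in range(1, len(local)):
        expanded = [(score + s + transition(f0s[t - 1][path[-1]], f0s[t][i]), path + [i])
                    for score, path in beam
                    for i, s in enumerate(local[t])]
        beam = sorted(expanded, key=lambda sp: sp[0], reverse=True)[:n]
    return beam[0]


def octave_penalty(f_prev, f_next):
    """Hypothetical transition score: -1 per octave of melodic leap."""
    return -abs(math.log2(f_next / f_prev))


steps = [[{"strength": 2.0}, {"strength": 0.0}],
         [{"strength": 0.0}, {"strength": 2.0}]]
f0s = [[440.0, 880.0], [440.0, 880.0]]
score, path = n_best_path(normalise_sources(steps), f0s, octave_penalty)
print(path, score)  # [0, 1] 1.0
```

Keeping several partial paths instead of only the locally best one lets a weaker candidate win later if it leads to a smoother continuation.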

10 ‘Silence’ Estimation
The solo instrument is not always playing continuously.
Use the likelihoods for the solo instrument along the estimated path.
Present threshold: median of the likelihood values for the solo instrument (assuming the solo instrument is present at least 50% of the time).
Silent threshold: mean of the likelihood values over all instruments.
Assign whole tones according to their proximity to the present/silent threshold and the state of their neighbours.
Impose a minimum length on ‘present’ sections.
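The two thresholds and the minimum-length rule can be sketched as follows. The tone likelihoods, tone durations in frames and `min_frames` are hypothetical inputs; the part of the rule that also considers the state of neighbouring tones is not specified in detail on the slide and is omitted here.

```python
from statistics import mean, median


def estimate_presence(likelihoods, solo, durations, min_frames=10):
    """Label each tone on the estimated path as solo-present (True) or silent (False).

    likelihoods: {instrument: [likelihood of each tone on the path]} (hypothetical input)
    durations:   tone lengths in frames; min_frames: minimum 'present' section length.
    """
    solo_vals = likelihoods[solo]
    present_thr = median(solo_vals)   # assumes the solo plays >= 50% of the time
    silent_thr = mean(v for vals in likelihoods.values() for v in vals)
    present = [abs(v - present_thr) <= abs(v - silent_thr) for v in solo_vals]

    # Minimum length: drop 'present' runs whose total duration is too short.
    i = 0
    while i < len(present):
        if not present[i]:
            i += 1
            continue
        j = i
        while j < len(present) and present[j]:
            j += 1
        if sum(durations[i:j]) < min_frames:
            present[i:j] = [False] * (j - i)
        i = j
    return present


lik = {"clarinet": [-1.0, -1.0, -9.0, -1.0, -1.0],
       "flute": [-5.0] * 5, "oboe": [-6.0] * 5}
print(estimate_presence(lik, "clarinet", [5] * 5, min_frames=8))  # [True, True, False, True, True]
```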

11 Evaluation: Test Material
Realistic recordings do not provide information about the ‘true’ F0s; even scores are only an approximation.
Use MIDI-generated audio: real instrument samples, but only 3-4 per octave, provided by the sampler software.
10 examples: for every solo instrument, one piece with piano accompaniment and one with orchestra.
Solo instrument and accompaniment mixed at 0 dB SNR.
Whole movements (or the first 3 minutes) to ensure sufficient presence of the solo instrument; a mixture of different styles and tempi.
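Mixing at 0 dB SNR means scaling the accompaniment so that its average power equals that of the solo part. A minimal sketch, assuming power is measured over the whole signal (the slide does not say whether passages without the solo were excluded); `mix_at_snr` is a hypothetical helper.

```python
import math


def power(x):
    """Mean power of a signal."""
    return sum(s * s for s in x) / len(x)


def mix_at_snr(solo, accomp, snr_db=0.0):
    """Scale the accompaniment so that 10*log10(P_solo / P_accomp) == snr_db, then add.
    Power is taken over the whole signal (assumption). Returns the mixture
    (truncated to the shorter input) and the gain applied to the accompaniment."""
    gain = math.sqrt(power(solo) / (power(accomp) * 10 ** (snr_db / 10)))
    return [s + gain * a for s, a in zip(solo, accomp)], gain


solo = [1.0, -1.0] * 4        # power 1.0
accomp = [0.5, -0.5] * 4      # power 0.25
mix, gain = mix_at_snr(solo, accomp)
print(gain)  # 2.0
```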

12 Results: F0 Estimation
Comparing F0s estimated with harmonic sieves (searching for prominent harmonic series) against simply picking the highest spectral peak shows no advantage for the former.
F0 candidates | flute | clarinet | oboe | violin | cello | average
1 (strongest) | 70% | 78% | 48% | 28% | 38% | 52%
3 | 82% | 94% | 78% | 61% | 64% | 76%
15 | 98% | 99% | 98% | 94% | 84% | 95%
(based solely on sections where the solo instrument is present)
Very unexpected, but this might be caused by the very rich mixture of harmonically related tones; initial results show that other algorithms that search for harmonic series, e.g. YIN (autocorrelation-based), do not do well either.
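The comparison can be illustrated with two candidate pickers over a magnitude spectrum and a hit-rate measure. Assumptions: the table's percentages are read as the fraction of frames whose true F0 lies within 50 cents of one of the N candidates, and the toy sieve weights harmonic h by 1/h so that a subharmonic of the true F0 does not tie with it. This is not the harmonic sieve or YIN implementation used in the work.

```python
import math


def peak_candidates(spectrum, bin_hz, n):
    """Frequencies of the n largest local maxima of a magnitude spectrum."""
    peaks = [k for k in range(1, len(spectrum) - 1)
             if spectrum[k] > spectrum[k - 1] and spectrum[k] >= spectrum[k + 1]]
    peaks.sort(key=lambda k: spectrum[k], reverse=True)
    return [k * bin_hz for k in peaks[:n]]


def sieve_candidates(spectrum, bin_hz, f0_grid, n, n_harmonics=10):
    """Toy harmonic sieve: score each trial F0 by the 1/h-weighted magnitudes
    at its first n_harmonics harmonic positions; return the n best trial F0s."""
    def score(f0):
        total = 0.0
        for h in range(1, n_harmonics + 1):
            k = round(h * f0 / bin_hz)
            if k < len(spectrum):
                total += spectrum[k] / h
        return total
    return sorted(f0_grid, key=score, reverse=True)[:n]


def hit_rate(candidate_lists, true_f0s, tol_cents=50.0):
    """Fraction of frames whose true F0 lies within tol_cents of any candidate."""
    hits = sum(any(abs(1200 * math.log2(c / t)) <= tol_cents for c in cands)
               for cands, t in zip(candidate_lists, true_f0s))
    return hits / len(true_f0s)


# One synthetic frame: harmonics of 200 Hz in 10 Hz bins.
bin_hz = 10.0
spectrum = [0.0] * 400
for h, mag in zip((1, 2, 3, 4), (1.0, 0.8, 0.6, 0.4)):
    spectrum[20 * h] = mag
print(peak_candidates(spectrum, bin_hz, 3))                                  # [200.0, 400.0, 600.0]
print(sieve_candidates(spectrum, bin_hz, [100.0, 150.0, 200.0, 300.0], 1))  # [200.0]
```

On a clean single-instrument frame both pickers agree; the slide's point is that in a dense mixture of harmonically related tones the extra modelling of the sieve did not pay off.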

13 Results: Instrument Identification
Solo instrument without accompaniment: all examples correct, except one oboe mistaken for a flute.
Solo instrument with accompaniment: violin and cello still correct, but performance for the woodwinds approaches chance level, even when the true F0s are provided.
Possible reasons:
Sample-based music might be harder to identify, as it provides less instrument-specific variation such as vibrato.
The mixing level might be unfavourable, with a worse SNR than in realistic recordings.
The frequency regions dominated by the accompaniment might differ between realistic recordings and MIDI-based audio.

14 Results: Melody Extraction
Baseline performance: only the strongest F0, no other knowledge.
Method | spurious tones | tones found | correct frames
strongest F0 (baseline) | 321% | 78% | 40%
path | 135% | 76% | 51%
path+silence | 117% | 72% | 54%
The number of correct frames improved by 14 percentage points (from 40% to 54%), with the number of spurious tones reduced to nearly a third, leading to significantly smoother melody lines.
Path finding and especially silence estimation are likely to suffer from the poor instrument identification performance with MIDI-based audio.
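The slides do not define the three measures. One plausible reading, used in the sketch below purely as an assumption: a tone is a run of frames (start, end, F0); an estimated tone matches a reference tone if they overlap in time and differ by at most 50 cents; "tones found" and "spurious tones" are both expressed relative to the number of reference tones (which is how spurious tones can exceed 100%); and a frame counts as correct when estimate and reference agree on silence or on F0. All function names are hypothetical.

```python
import math


def cents(f, ref):
    """Interval between two frequencies in cents."""
    return 1200 * math.log2(f / ref)


def tones_match(est, ref, tol_cents=50.0):
    """Tones are (start_frame, end_frame, f0_hz) with end_frame exclusive."""
    overlap = min(est[1], ref[1]) - max(est[0], ref[0])
    return overlap > 0 and abs(cents(est[2], ref[2])) <= tol_cents


def tone_metrics(est_tones, ref_tones, tol_cents=50.0):
    """'Tones found' and 'spurious tones', both relative to the number of reference
    tones (metric definitions are assumptions)."""
    found = sum(any(tones_match(e, r, tol_cents) for e in est_tones) for r in ref_tones)
    spurious = sum(not any(tones_match(e, r, tol_cents) for r in ref_tones) for e in est_tones)
    return {"tones_found": found / len(ref_tones), "spurious_tones": spurious / len(ref_tones)}


def correct_frames(est_f0, ref_f0, tol_cents=50.0):
    """Per-frame F0 lists (0.0 = no solo F0); a frame is correct if both are silent
    or both have an F0 within tol_cents of each other."""
    ok = 0
    for e, r in zip(est_f0, ref_f0):
        if e == 0.0 and r == 0.0:
            ok += 1
        elif e > 0.0 and r > 0.0 and abs(cents(e, r)) <= tol_cents:
            ok += 1
    return ok / len(ref_f0)


ref = [(0, 10, 440.0), (10, 20, 494.0)]
est = [(0, 10, 441.0), (10, 12, 880.0), (12, 20, 300.0)]
print(tone_metrics(est, ref))  # {'tones_found': 0.5, 'spurious_tones': 1.0}
```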

15 Realistic Example
[Figure: beginning of Mozart’s Clarinet Concerto, taken from a CD recording; manually annotated F0s (grey) and estimated melody (black), plotted as F0 (Hz) against time (frames). Panels: melody based on the strongest F0; melody based on knowledge-integrating path finding.]

16 Conclusions and Future Work
Audio generated from MIDI is not necessarily good test material!
Two short, manually annotated realistic examples: 10%-15% more correct frames than equivalent MIDI-based examples.
Further work concentrating on realistic examples requires manual labelling, or automatic alignment of MIDI data to real recordings?!

17 The End
Any Questions?

