

1

2 Automatic Music Transcription: Employing Hidden Markov Models to assist with Multiple Fundamental Pitch Estimation ASHWIN D. D’CRUZ Acting Supervisor: Prof. Thomas Bräunl Faculty: School of Electrical, Electronic and Computer Engineering Supervisor: Prof. Roberto Togneri

3 Automatic Music Transcription (AMT) is the process of taking an audio signal as input and providing a form of musical notation as output. Automatic Music Transcription

4 Genre classification [1] Computer accompaniment for music practice [2] Music communication and learning Applications [1] Nicolas Scaringella, Giorgio Zoia, and Daniel Mlynek. Automatic genre classification of music content: a survey. Signal Processing Magazine, IEEE, 23(2):133–141, 2006. [2] Roger B Dannenberg and Christopher Raphael. Music score alignment and computer accompaniment. Communications of the ACM, 49(8):38–43, 2006.

5 Note: Most basic unit of music Pitch: The perceived highness or lowness of a sound Chord: Collection of notes played together Triad: Specific subset of chords [1][2] Music terminology [1] MIREX 2014: Audio chord estimation, January 2014. URL http://www.music-ir.org/mirex/wiki/2014:Audio_Chord_Estimation. [2] J Stephen Downie. The music information retrieval evaluation exchange (2005–2007): A window into music information retrieval research. Acoustical Science and Technology, 29(4):247–255, 2008.

6 Harmonics: Overlapping harmonics of different notes can be mistaken for additional notes State Space: A piano has 88 keys. How can we search through all combinations? [1] Issues [1] Meinard Müller, Daniel PW Ellis, Anssi Klapuri, and Gaël Richard. Signal processing for music analysis. Selected Topics in Signal Processing, IEEE Journal of, 5(6):1088–1110, 2011.

7 Implement a Hidden Markov Model (HMM), with focus on the appropriate feature vector, to estimate the chord being played. Using a chord label and existing chord data, determine the multiple fundamental pitches in the chord. Aims

8 Framework [1] Stephen W Hainsworth and Malcolm D Macleod. The automated music transcription problem, 2003.

9 Isolated chords are a form of isolated sounds and can be modelled as such: Isolated Chord Model Image taken from Kristoffer Jensen. Envelope model of isolated musical sounds. In Proceedings of the 2nd COST G-6 Workshop on Digital Audio Effects (DAFx99), 1999.
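The envelope of an isolated musical sound can be sketched as a simple attack-decay-sustain-release curve. Below is a hedged, piecewise-linear illustration; Jensen's actual model uses curved segments fitted to real recordings, and the segment fractions here are purely illustrative:

```python
import numpy as np

def adsr_envelope(n, attack=0.1, decay=0.2, sustain_level=0.6, release=0.3):
    """Piecewise-linear attack-decay-sustain-release amplitude envelope.

    n: total number of samples; attack/decay/release are fractions of n.
    A simplified stand-in for an isolated-sound envelope model.
    """
    a, d, r = int(n * attack), int(n * decay), int(n * release)
    s = n - a - d - r  # remaining samples are the sustain segment
    return np.concatenate([
        np.linspace(0.0, 1.0, a, endpoint=False),            # attack: rise to peak
        np.linspace(1.0, sustain_level, d, endpoint=False),  # decay: fall to sustain
        np.full(s, sustain_level),                           # sustain: hold level
        np.linspace(sustain_level, 0.0, r),                  # release: fade to silence
    ])
```

Multiplying a synthesised chord waveform by such an envelope gives an isolated-chord test signal.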

10 Hidden Markov Models

11 Hidden Markov Models – More complex stages [1] Lawrence Rabiner. A tutorial on hidden markov models and selected applications in speech recognition. Proceedings of the IEEE, 77(2):257–286, 1989.
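One of the core HMM computations in Rabiner's tutorial (cited above) is the forward algorithm, which gives the likelihood of an observation sequence under a model. A minimal log-domain sketch (the log domain is an implementation choice to avoid numerical underflow, not something the slide specifies):

```python
import numpy as np

def forward_loglik(log_pi, log_A, log_B):
    """Forward algorithm: log P(observations | model) for an N-state HMM.

    log_pi: (N,) initial state log-probabilities
    log_A:  (N, N) transition log-probabilities, log_A[i, j] = log P(j | i)
    log_B:  (T, N) log-probability of each observation under each state
    """
    alpha = log_pi + log_B[0]                  # initialisation
    for t in range(1, len(log_B)):             # induction over time
        alpha = np.logaddexp.reduce(alpha[:, None] + log_A, axis=0) + log_B[t]
    return np.logaddexp.reduce(alpha)          # termination: sum over final states
```

Scoring a feature-vector sequence against each trained chord HMM this way, and taking the highest-scoring model, yields the chord estimate.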

12 Methodology – Mel Frequency Cepstral Coefficients (MFCCs) [1]
1. Frame signal
2. Calculate power spectrum of each frame
3. Apply Mel filterbank to each spectrum, summing energies per filter
4. Take logarithm of each filterbank energy
5. Take DCT of log filterbank energies
6. Keep lower DCT coefficients, discard the rest
[1] Vibha Tiwari. MFCC and its applications in speaker recognition. International Journal on Emerging Technologies, 1(1):19–22, 2010.

13 Methodology – Mel Frequency Cepstral Coefficients (MFCCs) Frames of 20–40 ms taken every 10 ms

14 Methodology – Mel Frequency Cepstral Coefficients (MFCCs) Mel spectrum coefficients [1, 2, 3, …, 26] Natural log coefficients [0, 0.6931, 1.0986, …, 3.2581] [1] Top left image taken from Aldebaro Klautau. The MFCC. Technical report, LISA - Laboratório de Imagens, Sinais e Áudio (CIC/UnB), 2005. [2] Bottom left image taken from James Lyons. Mel frequency cepstral coefficient (MFCC) tutorial, 2012. URL http://practicalcryptography.com/miscellaneous/machine-learning/guide-mel-frequency-cepstral-coefficients-mfccs/.

15 Methodology – Mel Frequency Cepstral Coefficients (MFCCs)
Discrete Cosine Transform:
- Similar to the Discrete Fourier Transform
- Basis functions are real-valued cosines
- Can be used here because the log filterbank energies are real-valued
- Keep coefficients 2–13, discard the rest
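The six MFCC steps above can be sketched end to end. The 26 filters and kept coefficients 2–13 follow the slides; the 25 ms frame length, 512-point FFT, and Hamming window are common defaults assumed here, not prescribed by the slides:

```python
import numpy as np

def mel_filterbank(n_filters, n_fft, sr):
    """Triangular filters spaced evenly on the Mel scale (step 3)."""
    hz_to_mel = lambda hz: 2595.0 * np.log10(1.0 + hz / 700.0)
    mel_to_hz = lambda mel: 700.0 * (10.0 ** (mel / 2595.0) - 1.0)
    mel_pts = np.linspace(0.0, hz_to_mel(sr / 2.0), n_filters + 2)
    bins = np.floor((n_fft + 1) * mel_to_hz(mel_pts) / sr).astype(int)
    fb = np.zeros((n_filters, n_fft // 2 + 1))
    for m in range(1, n_filters + 1):
        lo, c, hi = bins[m - 1], bins[m], bins[m + 1]
        for k in range(lo, c):
            fb[m - 1, k] = (k - lo) / max(c - lo, 1)   # rising slope
        for k in range(c, hi):
            fb[m - 1, k] = (hi - k) / max(hi - c, 1)   # falling slope
    return fb

def dct2(x):
    """DCT-II along the last axis (step 5)."""
    n = x.shape[-1]
    j = np.arange(n)
    basis = np.cos(np.pi * np.outer(np.arange(n), 2 * j + 1) / (2 * n))
    return x @ basis.T

def mfcc(signal, sr, n_fft=512, n_filters=26, keep=12):
    frame_len, hop = int(0.025 * sr), int(0.010 * sr)  # step 1: 25 ms every 10 ms
    frames = np.array([signal[i:i + frame_len] * np.hamming(frame_len)
                       for i in range(0, len(signal) - frame_len + 1, hop)])
    power = np.abs(np.fft.rfft(frames, n_fft, axis=1)) ** 2 / n_fft   # step 2
    energies = power @ mel_filterbank(n_filters, n_fft, sr).T         # step 3
    log_e = np.log(energies + 1e-10)                                  # step 4
    return dct2(log_e)[:, 1:1 + keep]             # steps 5-6: keep coefficients 2-13
```

For one second of audio at 8 kHz this yields 98 frames of 12 coefficients each, one static feature vector per frame.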

16

17 Methodology – Pitch Chromagrams [1]
1. Frame signal
2. Calculate power spectrum of each frame
3. Divide spectrum into octaves [2]
4. Sum pitch strength from each octave for each pitch class [2]
[1] Takuya Fujishima. Realtime chord recognition of musical sound: A system using common lisp music. In Proc. ICMC, volume 1999, pages 464–467, 1999. [2] Christian Schörkhuber and Anssi Klapuri. Constant-Q transform toolbox for music processing. In 7th Sound and Music Computing Conference, Barcelona, Spain, pages 3–64, 2010.
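Steps 3–4 fold spectral energy across octaves into the 12 pitch classes. A minimal single-frame sketch, assuming a C2 reference of about 65.4 Hz for the lowest octave (the slides' bands start at 64 Hz) and a plain FFT rather than the constant-Q transform of the cited toolbox:

```python
import numpy as np

NOTE_NAMES = ['C', 'C#', 'D', 'Eb', 'E', 'F', 'F#', 'G', 'G#', 'A', 'Bb', 'B']

def chromagram(frame, sr, fmin=65.406, n_octaves=6):
    """Fold the power spectrum of one frame into a 12-bin pitch chromagram.

    fmin is the frequency of the reference pitch class C (C2 here);
    energy from fmin up to fmin * 2**n_octaves is assigned to the nearest
    semitone and wrapped into a single octave.
    """
    spectrum = np.abs(np.fft.rfft(frame)) ** 2
    freqs = np.fft.rfftfreq(len(frame), 1.0 / sr)
    chroma = np.zeros(12)
    for f, p in zip(freqs, spectrum):
        if fmin <= f < fmin * 2 ** n_octaves:
            # distance in semitones from the C reference, folded to one octave
            pitch_class = int(round(12 * np.log2(f / fmin))) % 12
            chroma[pitch_class] += p
    return chroma
```

A pure 440 Hz tone, for example, concentrates its energy in the A bin of the resulting vector.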

18 Methodology – Pitch Chromagrams Octave bands: 64–125 Hz, 125–250 Hz, 250–530 Hz, 530–1000 Hz, 1000–2000 Hz, 2000–4200 Hz. Example pitch chromagram vector: [C, C#, D, Eb, E, F, F#, G, G#, A, Bb, B] = [83, 16, 36, 93, 63, 86, 72, 51, 12, 15, 17, 87]

19 Dynamic Feature Vectors So far, feature vectors have information about one particular frame only. Dynamic feature vectors allow information between frames to be captured as well. (Slide shows a worked example: columns of static coefficients alongside their delta coefficients, acceleration coefficients, and column averages.)
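Delta and acceleration coefficients can be approximated as first and second differences of the static features over time. A minimal sketch using numpy's gradient; production systems typically use a regression window over several frames instead:

```python
import numpy as np

def add_dynamics(feats):
    """Append delta and acceleration coefficients to a (T, d) static
    feature matrix, giving a (T, 3d) dynamic feature matrix.

    Deltas are finite differences along the time axis; acceleration
    coefficients are the differences of the deltas.
    """
    delta = np.gradient(feats, axis=0)   # first difference per coefficient
    accel = np.gradient(delta, axis=0)   # second difference per coefficient
    return np.hstack([feats, delta, accel])
```

A static 12-coefficient MFCC vector thus becomes a 36-dimensional dynamic feature vector.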

20 Gaussian Model (Slide shows two Gaussian probability density plots.)

21 Gaussian Mixture Models (Slide shows a multi-modal mixture density plot.)
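A Gaussian mixture density is a weighted sum of Gaussian components, and evaluating it in the log domain avoids numerical underflow for high-dimensional feature vectors. A sketch for diagonal covariances (the diagonal assumption is mine; the slides do not specify the covariance structure):

```python
import numpy as np

def gmm_logpdf(x, weights, means, variances):
    """Log-density of a diagonal-covariance Gaussian mixture at each row of x.

    weights: K mixture weights summing to 1
    means, variances: K vectors of per-dimension means and variances
    """
    x = np.atleast_2d(np.asarray(x, dtype=float))
    comps = []
    for w, m, v in zip(weights, means, variances):
        m, v = np.asarray(m, dtype=float), np.asarray(v, dtype=float)
        # log N(x; m, diag(v)) summed over dimensions
        ll = -0.5 * np.sum((x - m) ** 2 / v + np.log(2 * np.pi * v), axis=1)
        comps.append(np.log(w) + ll)
    # log sum_k w_k N(x; m_k, v_k), computed stably in the log domain
    return np.logaddexp.reduce(np.stack(comps), axis=0)
```

In the HMM, each hidden state's emission distribution is such a mixture evaluated on the feature vectors.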

22 Chord Estimation Example input: [2, 12, 13, 78, 9]
Chord Model | Internal Representation | Probability
C Major | (model plot) | 0.5
B Major | (model plot) | 0.6
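Chord estimation then reduces to scoring the observed feature frames under each chord's trained model and keeping the most probable label. A deliberately simplified sketch with a single diagonal Gaussian per chord standing in for the full HMM of the slides:

```python
import numpy as np

def estimate_chord(frames, models):
    """Return the chord label whose model best explains the frames.

    frames: (T, d) feature matrix
    models: {chord_name: (mean, variance)} with per-dimension mean and
            variance of one diagonal Gaussian (a toy stand-in for an HMM)
    """
    best_name, best_ll = None, -np.inf
    for name, (mean, var) in models.items():
        mean = np.asarray(mean, dtype=float)
        var = np.asarray(var, dtype=float)
        # total log-likelihood of all frames under this chord's Gaussian
        ll = -0.5 * np.sum((frames - mean) ** 2 / var + np.log(2 * np.pi * var))
        if ll > best_ll:
            best_name, best_ll = name, ll
    return best_name
```

With per-chord HMMs, the same argmax decision applies, with the forward-algorithm likelihood replacing the Gaussian score.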

23 Methodology – Database
Database (training and testing sets) was created via software:
1. MIDI files were created using Matlab [1].
2. FluidSynth [2] was used to convert MIDI files to WAV files.
3. Matlab was used to convert WAV files from stereo to mono.
Reverberation was added to sets to introduce realistic variety: Original, Small Room (SNR 6 dB), and Large Room (SNR 2 dB) versions (e.g. of the C Major chord family).
[1] http://kenschutte.com/midi [2] http://fluidsynth.elementsofsound.org/
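Step 3 above (stereo to mono) can equally be done outside Matlab; a hedged Python sketch for 16-bit stereo WAV files, averaging the two channels:

```python
import wave
import numpy as np

def stereo_to_mono(src_path, dst_path):
    """Convert a 16-bit stereo WAV file to mono by averaging channels."""
    with wave.open(src_path, 'rb') as w:
        assert w.getnchannels() == 2 and w.getsampwidth() == 2
        framerate = w.getframerate()
        # interleaved int16 samples: L, R, L, R, ...
        samples = np.frombuffer(w.readframes(w.getnframes()), dtype=np.int16)
    mono = samples.reshape(-1, 2).mean(axis=1).astype(np.int16)
    with wave.open(dst_path, 'wb') as w:
        w.setnchannels(1)
        w.setsampwidth(2)
        w.setframerate(framerate)
        w.writeframes(mono.tobytes())
```

Averaging (rather than picking one channel) preserves energy from both sides of the synthesised stereo image.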

24 Results – Mel Feature Vectors (chart shows accuracies of 100% and 98.96%)

25 Results – Mel Feature Vectors Dynamic with 7 states: 75.69%; Static with 10 states: 18.75%

26 Results – Time Measurements (chart: 5.541 s, 5.365 s)

27 Results – Pitch Chromagram Dynamic with 2 states: 52.08%; Static with 3 states: 31.94%

28 Comparing Results
Researcher | Accuracy | Method
Ueda et al. (2010) [1] | 79.81% | Discrete Fourier Transform of pitch chromagram vectors
Bello and Pickens (2005) [2] | 75.04% | Pitch chromagrams
Papadopoulos and Peeters (2007) [3] | 70.96% | Pitch chromagrams
[1] Yushi Ueda, Yuuki Uchiyama, Takuya Nishimoto, Nobutaka Ono, and Shigeki Sagayama. HMM-based approach for automatic chord detection using refined acoustic features. In Acoustics Speech and Signal Processing (ICASSP), 2010 IEEE International Conference on, pages 5518–5521. IEEE, 2010. [2] Juan Pablo Bello and Jeremy Pickens. A robust mid-level representation for harmonic content in music signals. In ISMIR, volume 5, pages 304–311, 2005. [3] Hélène Papadopoulos and Geoffroy Peeters. Large-scale study of chord estimation algorithms based on chroma representation and HMM. In Content-Based Multimedia Indexing, 2007. CBMI’07. International Workshop on, pages 53–60. IEEE, 2007.

29 Results – Training using more than 1 piano (chart: A vs A with 6 states, ABC vs C with 9 states, AB vs C with 10 states; accuracies shown: 99.7%, 10.07%, 100%) [1] Emmanouil Benetos, Simon Dixon, Dimitrios Giannoulis, Holger Kirchhoff, and Anssi Klapuri. Automatic music transcription: challenges and future directions. Journal of Intelligent Information Systems, 41(3):407–434, 2013.

30 Results – Violin and Guitar Guitar with 9 states: 96.91%; Violin with 10 states: 94.17%

31 Determining Exact Pitches using Coherence
Magnitude-squared coherence is an estimate between 0 and 1 of how well two signals correspond to each other at a given frequency. It is essentially a correlation coefficient in the frequency domain. (Slide shows a table of coherence measures between an unknown recording C?.wav and reference files C1.wav–C6.wav and further reference files.)
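The magnitude-squared coherence described above can be estimated Welch-style, averaging windowed FFTs over segments. A minimal numpy sketch of the estimate (the segment length and Hann window are conventional choices assumed here):

```python
import numpy as np

def msc(x, y, nperseg=256):
    """Welch-style magnitude-squared coherence per frequency bin:
    |Pxy|^2 / (Pxx * Pyy), with spectra averaged over segments.
    Values lie in [0, 1]; 1 means the signals are perfectly linearly
    related at that frequency."""
    n_seg = len(x) // nperseg
    win = np.hanning(nperseg)
    X = np.fft.rfft(win * x[:n_seg * nperseg].reshape(n_seg, nperseg), axis=1)
    Y = np.fft.rfft(win * y[:n_seg * nperseg].reshape(n_seg, nperseg), axis=1)
    Pxx = np.mean(np.abs(X) ** 2, axis=0)   # auto-spectrum of x
    Pyy = np.mean(np.abs(Y) ** 2, axis=0)   # auto-spectrum of y
    Pxy = np.mean(X * np.conj(Y), axis=0)   # cross-spectrum
    return np.abs(Pxy) ** 2 / (Pxx * Pyy + 1e-20)
```

Averaging this per-bin estimate gives a single similarity score between the unknown chord recording and each reference recording, as in the slide's table.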

32 Summary
MFCCs work well for this task, providing up to 100% chord recognition accuracy for test data with small amounts of noise and up to 75% for noisier test data. Pitch vectors do not provide results which are as promising, although achieving 52% accuracy is comparable to other systems. The system developed works on other instruments, achieving up to 96% accuracy for guitar and 94% for violin.
Process | Time taken (s) | Comments
Extract training data | 4.671 | Carried out once
Train chord models | 7.500 | Carried out once
Extract test data | 0.041 | Carried out each test
Estimate chord | 0.991 | Carried out each test
Determine pitches | 2.230 | Carried out each test

33 Onset detection system Instrument recognition Future Work

34 Information Visit www.ashwindcruz.com Email: amt@ashwindcruz.com

