
Slide 1: AUTOMATIC TRANSCRIPTION OF PIANO MUSIC
Sara Corfini
Language and Intelligence
University of Pisa, Department of Computer Science

Slide 2: INTRODUCTION
Transcribing recordings of piano music into a MIDI representation:
- MIDI provides a compact representation of musical data
- Score-following for computer-human interactive performance
- The "signal-to-score" problem
A hidden Markov model approach to piano music transcription:
- A "state of nature" can be realized through a wide range of data configurations
- Probabilistic data representation
- Automatically learning this probabilistic relationship is more flexible than hand-optimizing a particular model
- Rules describing musical structure can be more accurately represented as tendencies

Slide 3: THE MODEL
The acoustic signal is segmented into a sequence of frames ("snapshots" of sound).
For each frame a feature vector is computed, giving a sequence y_1, …, y_N.
Goal → assign a label to each frame describing its content.
A generative probabilistic framework (a hidden Markov model):
- outputs → the observed sequence of feature vectors y_1, …, y_N
- hidden variables → the labels
A hidden Markov model is composed of two processes, X = X_1, …, X_N and Y = Y_1, …, Y_N.
X is the hidden (or label) process and describes the way a sequence of frame labels can evolve (a Markov chain).
We do not observe the X process directly, but rather the feature vector data.
The likelihood of a given feature vector depends only on the corresponding label.
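To picture the segmentation step, here is a minimal sketch; the frame length, hop size, and sample rate are illustrative assumptions, not values from the paper:

```python
import numpy as np

def segment_into_frames(signal, frame_len=2048, hop=1024):
    """Split the acoustic signal into a sequence of (possibly overlapping)
    frames -- the "snapshots" of sound from which features are computed."""
    n_frames = 1 + max(0, (len(signal) - frame_len) // hop)
    return np.stack([signal[i * hop : i * hop + frame_len]
                     for i in range(n_frames)])

# Example: 3 seconds of synthetic audio at 44.1 kHz -> ~128 frames.
sr = 44100
signal = np.random.default_rng(0).normal(size=3 * sr)
frames = segment_into_frames(signal)
print(frames.shape)  # (n_frames, frame_len)
```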

Slide 4: THE LABEL PROCESS
GOAL → assign a label to each frame, where each label ∈ L.
Components of a label:
- the pitch configuration (chord)
- the "attack", "sustain", and "rest" portions of a chord
We define a random process (a Markov chain) X_1, …, X_N that takes values in the label set L.
The probability of the process occupying a certain state (label) in a given frame depends only on the preceding state (label):
    P(X_n = x_n | X_1^{n-1} = x_1^{n-1}) = p(x_n | x_{n-1})
where p(x'|x) is the transition probability matrix and X_1^n = (X_1, …, X_n).
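To make the label process concrete, a minimal sketch of a Markov chain over a toy label set; the labels and transition values are invented for illustration and are far smaller than the paper's actual chord state space:

```python
import numpy as np

labels = ["silence", "attack", "sustain", "rest"]
# p[i, j] = p(x' = labels[j] | x = labels[i]); each row sums to one.
p = np.array([[0.90, 0.10, 0.00, 0.00],
              [0.00, 0.50, 0.50, 0.00],
              [0.00, 0.05, 0.85, 0.10],
              [0.20, 0.30, 0.00, 0.50]])

rng = np.random.default_rng(0)
x = 0  # start in silence
path = [x]
for _ in range(20):
    x = rng.choice(len(labels), p=p[x])  # next state depends only on current
    path.append(int(x))
print([labels[i] for i in path])
```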

Slide 5: THE LABEL PROCESS
Markov model for a single chord.
Markov model for the recognition problem:
- the final state of each chord model is connected to the initial state of each chord model
- a silence model is constructed for the recorded space before and after the performance
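A hedged sketch of how such a topology might be assembled as an edge list, assuming each chord model is reduced to a three-state attack/sustain/rest chain; the chord names are placeholders and the real per-chord models are richer than this:

```python
def build_recognition_graph(chords):
    """Connect every chord model's final state to every chord model's
    initial state, with silence states bracketing the performance."""
    edges = []
    for c in chords:
        edges.append(((c, "attack"), (c, "sustain")))        # within-chord chain
        edges.append(((c, "sustain"), (c, "rest")))
        for c2 in chords:                                    # chord-to-chord links
            edges.append(((c, "rest"), (c2, "attack")))
        edges.append((("silence", "pre"), (c, "attack")))    # leading silence
        edges.append(((c, "rest"), ("silence", "post")))     # trailing silence
    edges.append((("silence", "pre"), ("silence", "pre")))   # remain silent
    edges.append((("silence", "post"), ("silence", "post")))
    return edges

graph = build_recognition_graph(["C4", "E4", "C4+E4"])
print(len(graph), "edges")
```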

Slide 6: THE OBSERVABLE PROCESS
Rather than observing the label process x_1, …, x_N, we observe the feature vector data y_1, …, y_N (probabilistically related to the labels).
Assumption of the HMM → each visited state X_n produces a feature vector Y_n from a distribution that is characteristic of that state.
Hence, given X_n, Y_n is conditionally independent of all other frame labels and all other feature vectors:
    p(y_n | x_1, …, x_N, y_1, …, y_{n-1}, y_{n+1}, …, y_N) = p(y_n | x_n)

Slide 7: THE OBSERVABLE PROCESS
We compute a vector of features for each frame: y = (y_1, …, y_K).
The components of this vector are conditionally independent given the state.
The states are tied → different states share the same feature distributions:
    p(y_k | x) = p(y_k | T_k(x))
where the tying function T_k(x) is constructed by hand. Hence we have
    p(y | x) = ∏_{k=1}^{K} p(y_k | T_k(x))
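As a minimal sketch of the tied output model, assuming Gaussian component distributions (the paper actually represents these distributions with decision trees, introduced on slide 12): the likelihood of a feature vector factors over its components, each depending on the state only through T_k. Labels, tying tables, and parameters below are invented:

```python
import math

# T[k][state] -> tie class of feature k (hand-constructed in the paper).
T = [{"silence": 0, "attack": 1, "sustain": 1},   # T_1: energy
     {"silence": 2, "attack": 0, "sustain": 1}]   # T_2: burstiness

# One Gaussian per (feature, tie class): params[k][tie] = (mean, std).
params = [{0: (0.0, 1.0), 1: (5.0, 2.0)},
          {0: (3.0, 1.0), 1: (1.0, 0.5), 2: (0.0, 0.3)}]

def log_gauss(y, mean, std):
    return -0.5 * math.log(2 * math.pi * std**2) - (y - mean)**2 / (2 * std**2)

def log_likelihood(y, state):
    """log p(y | x) = sum_k log p(y_k | T_k(x))."""
    return sum(log_gauss(y[k], *params[k][T[k][state]]) for k in range(len(y)))

print(log_likelihood([4.8, 0.9], "sustain"))
```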

Slide 8: THE OBSERVABLE PROCESS
The tying functions T_k(x) can be clarified by describing the computed features.
y_1 → measures the total energy in the signal (to distinguish between the times when the pianist plays and when there is silence)
- T_1(x) = 0 for the silence and rest states
- T_1(x) = 1 for the remaining states
- Two probability distributions: p(y_1 | T_1(x) = 0) and p(y_1 | T_1(x) = 1)
- Partition of the label set generated by T_1(x): {x ∈ L : T_1(x) = 0}, {x ∈ L : T_1(x) = 1}
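A plausible implementation of the energy feature; the log scale and the epsilon are assumptions, not details from the paper:

```python
import numpy as np

def frame_energy(frame):
    """Total energy of one frame (sum of squared samples), on a log
    scale; the epsilon guards against log(0) on digital silence."""
    return np.log(np.sum(np.asarray(frame, dtype=float) ** 2) + 1e-12)

rng = np.random.default_rng(0)
loud = rng.normal(0.0, 1.0, 2048)    # a frame where the pianist plays
quiet = rng.normal(0.0, 0.01, 2048)  # a near-silent frame
print(frame_energy(loud), frame_energy(quiet))  # clearly separated values
```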

Slide 9: THE OBSERVABLE PROCESS
y_2 → measures the local burstiness of the signal (to distinguish between note "attacks" and steady-state behaviour).
y_2 is itself a vector: it collects several measures of burstiness.
For this feature, the states can be partitioned into three groups:
- T_2(x) = 0 → states at the beginning of each note (high burstiness)
- T_2(x) = 1 → states corresponding to steady-state behaviour (relatively low burstiness)
- T_2(x) = 2 → silence states
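The slide leaves the burstiness measures unspecified. As one illustrative stand-in (not a measure taken from the paper), spectral flux — the positive change in spectral magnitude between consecutive frames — behaves in the way described: high at note attacks, low during steady-state sound:

```python
import numpy as np

def spectral_flux(prev_frame, frame):
    """Sum of positive magnitude changes between consecutive frames."""
    prev_mag = np.abs(np.fft.rfft(prev_frame))
    mag = np.abs(np.fft.rfft(frame))
    return np.sum(np.maximum(mag - prev_mag, 0.0))

rng = np.random.default_rng(1)
steady = rng.normal(0, 1.0, (2, 2048))          # two similar consecutive frames
attack = np.stack([rng.normal(0, 0.1, 2048),    # quiet frame ...
                   rng.normal(0, 1.0, 2048)])   # ... followed by a burst
print(spectral_flux(*steady), spectral_flux(*attack))
```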

Slide 10: THE OBSERVABLE PROCESS
y_3, …, y_K → concern the problem of distinguishing between the many possible pitch configurations.
Each of the features y_3, …, y_K is computed from a small frequency interval (window) of the Fourier-transformed frame data.
For each window we compute:
- the empirical mean → the location of the harmonic (when there is a single harmonic in the window)
- the empirical variance → to distinguish probabilistically between when there is a single harmonic (low variance) and when there is not (high variance)
The states can be partitioned as:
- T_k(x) = 0 → states in which no notes contain energy in the window
- T_k(x) = 1 → states having several harmonics in the window
- T_k(x) = t → states having a single harmonic at approximately the same frequency in the window
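A sketch of the per-window statistics, assuming a magnitude-weighted mean and variance over the FFT bins in the window; the window boundaries and frame length are illustrative:

```python
import numpy as np

def window_stats(frame, sr, lo_hz, hi_hz):
    """Empirical mean and variance of frequency within one window,
    weighting each FFT bin by its normalized magnitude."""
    mag = np.abs(np.fft.rfft(frame))
    freqs = np.fft.rfftfreq(len(frame), d=1.0 / sr)
    sel = (freqs >= lo_hz) & (freqs < hi_hz)
    w = mag[sel] / (mag[sel].sum() + 1e-12)
    mean = np.sum(w * freqs[sel])               # harmonic's location
    var = np.sum(w * (freqs[sel] - mean) ** 2)  # low iff one harmonic dominates
    return mean, var

sr = 44100
t = np.arange(2048) / sr
frame = np.sin(2 * np.pi * 440.0 * t)           # a single 440 Hz harmonic
print(window_stats(frame, sr, 400.0, 500.0))    # mean near 440, low variance
```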

Slide 11: TRAINING THE MODEL
Owing to the HMM formulation, the probability distributions can be trained in an unsupervised fashion.
An iterative procedure (the Baum-Welch algorithm) trains the model automatically from signal-score pairs.
When the score is known, we can build a model for the hidden process.
The algorithm:
- starts from a neutral place (we begin with uniformly distributed output distributions)
- iterates the process of finding a probabilistic correspondence between model states and data frames
- then retrains the probability distributions using this correspondence
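For concreteness, a compact Baum-Welch step for a generic discrete-output HMM; the paper's version trains decision-tree distributions over continuous features and exploits the known score, which this sketch does not attempt:

```python
import numpy as np

def forward_backward(A, B, pi, obs):
    """Scaled forward-backward pass for a discrete-output HMM."""
    N, S = len(obs), len(pi)
    alpha, beta, c = np.zeros((N, S)), np.zeros((N, S)), np.zeros(N)
    alpha[0] = pi * B[:, obs[0]]
    c[0] = alpha[0].sum(); alpha[0] /= c[0]
    for n in range(1, N):
        alpha[n] = (alpha[n - 1] @ A) * B[:, obs[n]]
        c[n] = alpha[n].sum(); alpha[n] /= c[n]
    beta[-1] = 1.0
    for n in range(N - 2, -1, -1):
        beta[n] = (A @ (B[:, obs[n + 1]] * beta[n + 1])) / c[n + 1]
    return alpha, beta, c

def baum_welch_step(A, B, pi, obs):
    """One EM iteration: E-step finds the probabilistic correspondence
    between states and frames; M-step retrains the distributions."""
    alpha, beta, c = forward_backward(A, B, pi, obs)
    N, S = len(obs), len(pi)
    gamma = alpha * beta
    gamma /= gamma.sum(axis=1, keepdims=True)
    xi = np.zeros((S, S))
    for n in range(N - 1):
        xi += alpha[n][:, None] * A * (B[:, obs[n + 1]] * beta[n + 1])[None, :] / c[n + 1]
    A_new = xi / xi.sum(axis=1, keepdims=True)
    B_new = np.zeros_like(B)
    for k in range(B.shape[1]):
        B_new[:, k] = gamma[np.array(obs) == k].sum(axis=0)
    B_new /= B_new.sum(axis=1, keepdims=True)
    return A_new, B_new, gamma[0]

rng = np.random.default_rng(0)
A = np.full((2, 2), 0.5)                                   # neutral start
B = np.abs(np.full((2, 3), 1 / 3) + rng.normal(0, 0.01, (2, 3)))
B /= B.sum(axis=1, keepdims=True)                          # break symmetry
pi = np.array([0.5, 0.5])
obs = [0, 0, 1, 2, 2, 1, 0, 0, 2, 2, 1, 0]
for _ in range(20):
    A, B, pi = baum_welch_step(A, B, pi, obs)
print(np.round(A, 2), np.round(B, 2))
```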

Slide 12: TRAINING THE MODEL
Output distributions on feature vectors are represented through decision trees.
For each distribution p(y_k | T_k(x)) we form a binary tree:
- each non-terminal node corresponds to a question y_{k,v} < c (where y_{k,v} is the v-th component of feature k)
- an observation y_k is associated with a terminal node by dropping it down the tree (evaluating the root question first)
- the process continues until the observation arrives at a terminal node, denoted Q_k(y_k)
As the training procedure evolves, the trees are re-estimated at each iteration to produce more informative probability distributions.
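A minimal sketch of dropping an observation down such a tree; the tree shape, thresholds, and leaf values are invented:

```python
class Node:
    def __init__(self, v=None, c=None, left=None, right=None, prob=None):
        self.v, self.c = v, c              # question: y[v] < c ?
        self.left, self.right = left, right
        self.prob = prob                   # set on terminal nodes only

def drop(tree, y):
    """Drop observation y down the tree to its terminal node's estimate."""
    node = tree
    while node.prob is None:
        node = node.left if y[node.v] < node.c else node.right
    return node.prob

tree = Node(v=0, c=0.5,
            left=Node(prob=0.7),
            right=Node(v=1, c=2.0, left=Node(prob=0.2), right=Node(prob=0.1)))
print(drop(tree, [0.3, 1.0]), drop(tree, [0.9, 3.0]))
```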

Slide 13: RECOGNITION
The traditional HMM approach to recognition seeks the most likely labelling of the frames, given the data, through dynamic programming.
This corresponds to finding the best path through the graph, where the reward for going from state x_{n-1} to x_n in the n-th iteration is given by
    p(x_n | x_{n-1}) p(y_n | x_n)
The Viterbi algorithm constructs the optimal paths of length n from the optimal paths of length n-1.
The computational complexity grows with the square of the size of the state space, which is completely intractable in this case:
the state space is on the order of 10^8 (even under restrictive assumptions on the possible collections of pitches and the number of notes in a chord).
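A standard log-domain Viterbi recognizer over a toy state space; it illustrates the extend-paths recursion described above, and also why the per-step cost, quadratic in the number of states, explodes for a state space of order 10^8:

```python
import numpy as np

def viterbi(log_A, log_emis, log_pi):
    """log_A: (S,S) transition log-probs; log_emis: (N,S) data terms
    log p(y_n | x); returns the most likely state sequence."""
    N, S = log_emis.shape
    delta = log_pi + log_emis[0]
    back = np.zeros((N, S), dtype=int)
    for n in range(1, N):
        scores = delta[:, None] + log_A      # extend all length-(n-1) paths
        back[n] = scores.argmax(axis=0)      # best predecessor per state
        delta = scores.max(axis=0) + log_emis[n]
    path = [int(delta.argmax())]
    for n in range(N - 1, 0, -1):            # backtrace
        path.append(int(back[n][path[-1]]))
    return path[::-1]

rng = np.random.default_rng(0)
log_A = np.log(np.array([[0.8, 0.2], [0.3, 0.7]]))
log_pi = np.log(np.array([0.6, 0.4]))
log_emis = np.log(rng.dirichlet([1, 1], size=8))  # toy per-frame likelihoods
print(viterbi(log_A, log_emis, log_pi))
```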

Slide 14: RECOGNITION
We use the data model constructed in the training phase to produce a condensed version of the state graph.
For each frame n we perform a greedy search that seeks a plausible collection of states x ∈ L for that frame.
This is accomplished by searching for states x giving large values of p(y_n | x). The search proceeds by:
- finding the most likely 1-note hypotheses
- then considering 2-note hypotheses, and so on
Each frame n is thus associated with a collection of candidate states A_n.
The states are blended by letting
    B_n = A_{n-1} ∪ A_n ∪ A_{n+1}
The graph is constructed by restricting the full graph to the B_n sets.
Disadvantage → if the true state at frame n is not captured by B_n, it cannot be recovered during recognition.
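A hedged sketch of the pruning idea under stated assumptions: the scorer below is a stand-in for p(y_n | x), the note alphabet is tiny, and the paper's exact search may differ:

```python
def greedy_candidates(score, notes, max_notes=3, beam=4):
    """Grow chord hypotheses one note at a time (1-note, then 2-note, ...),
    keeping those with large score(h) ~ log p(y_n | x); return a small A_n."""
    hyps, best = [()], []
    for _ in range(max_notes):
        grown = {tuple(sorted(set(h) | {n})) for h in hyps for n in notes}
        hyps = sorted(grown, key=score, reverse=True)[:beam]
        best += hyps
    return set(sorted(best, key=score, reverse=True)[:beam])

def blend(A_sets, n):
    """B_n: union of the candidate sets of neighbouring frames."""
    return set().union(*A_sets[max(0, n - 1): n + 2])

notes = [60, 64, 67, 72]                  # toy MIDI pitch alphabet
target = {60, 64}                         # pretend frame n contains C and E
score = lambda h: -len(set(h) ^ target)   # stand-in for log p(y_n | x)
A_sets = [greedy_candidates(score, notes) for _ in range(5)]
print(blend(A_sets, 2))
```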

Slide 15: EXPERIMENTS
The hidden Markov model was trained using data taken from various Mozart piano sonatas.
The results concern a performance of Sonata 18, K. 570.
Objective measure of performance → edit distance.
Recognition error rates:
- note error rate → 39% (184 substitutions, 241 deletions, 108 insertions)
- if two adjacent recognized chords have a pitch in common, it is assumed that the note is not rearticulated
- inability to distinguish between chord homonyms
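The evaluation metric is the standard edit (Levenshtein) distance between the recognized note sequence and the reference, from which substitution/deletion/insertion counts and a note error rate follow; a minimal implementation with illustrative note sequences:

```python
def edit_distance(ref, hyp):
    """Classic dynamic program: d[i][j] = distance between ref[:i], hyp[:j]."""
    d = [[0] * (len(hyp) + 1) for _ in range(len(ref) + 1)]
    for i in range(len(ref) + 1):
        d[i][0] = i
    for j in range(len(hyp) + 1):
        d[0][j] = j
    for i in range(1, len(ref) + 1):
        for j in range(1, len(hyp) + 1):
            d[i][j] = min(d[i-1][j] + 1,                         # deletion
                          d[i][j-1] + 1,                         # insertion
                          d[i-1][j-1] + (ref[i-1] != hyp[j-1]))  # substitution
    return d[-1][-1]

ref = ["C4", "E4", "G4", "C5"]
hyp = ["C4", "F4", "G4"]                   # one substitution, one deletion
print(edit_distance(ref, hyp) / len(ref))  # note error rate = 2/4 = 0.5
```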

Slide 16: CONCLUSION
Recognition results leave room for improvement.
Results may nevertheless be useful in a number of Music Information Retrieval applications tolerant of errorful representations.
The current system works with no knowledge of the plausibility of various chord sequences:
- a probabilistic model of the likelihood of chord sequences would help
The current system makes almost no effort to model the acoustic characteristics of the highly informative note onsets:
- a more sophisticated "attack" model would help in recognizing the many repeated notes that the system currently misses

Slide 17: REFERENCES
Christopher Raphael, "Automatic Transcription of Piano Music", in Proceedings of the 3rd Annual International Symposium on Music Information Retrieval (ISMIR 2002), Michael Fingerhut, Ed., IRCAM - Centre Pompidou, Paris, France, October 2002.