Stochastic Text Models for Music Categorization Carlos Pérez-Sancho, José M. Iñesta, David Rizo Pattern Recognition and Artificial Intelligence group Department of Software and Computing Systems University of Alicante, Spain
SSPR Outline
► Introduction
► Music encoding: melody, harmony
► Experiments: plain classification, classifier ensembles, hierarchical classification
► Conclusions
Introduction
► Premise: music content can be used to model musical style
► We use language modeling techniques to classify symbolic digital scores
► To do so, digital scores need to be encoded as sequences of symbols
Music encoding
► Two sources of information: melody and harmony
Melody encoding
► Polyphonic sequences are reduced to monophonic ones using the skyline algorithm
► A pitch interval and a duration ratio are computed for each pair of consecutive notes
► Each (interval, duration ratio) pair is encoded as a two-character ASCII symbol:
(2, ×½) → Bf    (1, ×1) → AZ    (-1, ×2) → aF    (-2, ×1) → bZ
(2, ×1) → BZ    (-4, ×1) → dZ    (-2, ×1) → bZ    (4, ×1) → DZ
Resulting input string: Bf AZ aF bZ BZ dZ bZ DZ
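The encoding step above can be sketched in code. This is an illustrative reconstruction only: the letter assignment (A, B, … for ascending intervals, lowercase for descending) and the ratio symbols Z, F, f are inferred from the slide's examples, not taken from the paper's full alphabet.

```python
# Sketch of the melody encoding (symbol table inferred from the slide's
# examples; the paper's actual alphabet may differ).
from fractions import Fraction

def encode_pair(interval, ratio):
    """Map a (pitch interval, duration ratio) pair to a two-character symbol."""
    # Interval magnitude 1..26 -> 'A'..'Z'; lowercase for descending intervals.
    letter = chr(ord('A') + abs(interval) - 1)
    if interval < 0:
        letter = letter.lower()
    # Duration ratio: 'Z' for x1, 'F' for x2, 'f' for x1/2 (assumed mapping).
    ratio_symbols = {Fraction(1): 'Z', Fraction(2): 'F', Fraction(1, 2): 'f'}
    return letter + ratio_symbols[Fraction(ratio)]

# The (interval, duration ratio) sequence from the slide's example.
pairs = [(2, Fraction(1, 2)), (1, 1), (-1, 2), (-2, 1),
         (2, 1), (-4, 1), (-2, 1), (4, 1)]
print(' '.join(encode_pair(i, r) for i, r in pairs))
# -> Bf AZ aF bZ BZ dZ bZ DZ
```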
Harmony encoding
► Chords are encoded as degrees relative to the tonality, for transposition invariance
► Only chord changes are encoded
Example (key of E flat): input string VIm V I
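A minimal sketch of this encoding, reproducing the slide's E-flat example. The pitch-class table, degree names, and the use of `groupby` to collapse repeated chords are illustrative choices, not necessarily the original implementation.

```python
# Sketch: chord roots rewritten as degrees relative to the key
# (transposition invariance), keeping only chord changes.
from itertools import groupby

PITCH_CLASSES = {'C': 0, 'Db': 1, 'D': 2, 'Eb': 3, 'E': 4, 'F': 5,
                 'Gb': 6, 'G': 7, 'Ab': 8, 'A': 9, 'Bb': 10, 'B': 11}
DEGREE_NAMES = {0: 'I', 2: 'II', 4: 'III', 5: 'IV', 7: 'V', 9: 'VI', 11: 'VII'}

def encode_progression(key, chords):
    """Encode (root, quality) chords as degree symbols relative to `key`."""
    tonic = PITCH_CLASSES[key]
    symbols = []
    for root, quality in chords:
        degree = (PITCH_CLASSES[root] - tonic) % 12
        suffix = 'm' if quality == 'minor' else ''
        symbols.append(DEGREE_NAMES[degree] + suffix)
    # Keep only chord changes: collapse consecutive duplicates.
    return [s for s, _ in groupby(symbols)]

# The slide's example, key of E flat.
print(encode_progression('Eb', [('C', 'minor'), ('C', 'minor'),
                                ('Bb', 'major'), ('Eb', 'major')]))
# -> ['VIm', 'V', 'I']
```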
Experiments
► Dataset: music from 3 genres and 9 sub-genres (around 60 hours of music)
► Classification techniques: Naïve Bayes; language modeling with n-grams, classifying by lowest perplexity
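The perplexity-based classifier can be illustrated with a toy bigram model per genre: each genre gets its own language model, and a new sequence is assigned to the genre whose model gives it the lowest perplexity. The add-one smoothing and the tiny training progressions here are assumptions for the sketch, not the actual experimental setup.

```python
# Illustrative perplexity-based genre classifier using bigram language models.
import math
from collections import Counter

class BigramModel:
    def __init__(self, sequences):
        self.unigrams = Counter()   # context counts
        self.bigrams = Counter()
        self.vocab = set()
        for seq in sequences:
            self.vocab.update(seq)
            self.unigrams.update(seq[:-1])
            self.bigrams.update(zip(seq, seq[1:]))

    def perplexity(self, seq):
        # Add-one (Laplace) smoothed bigram probabilities.
        V = len(self.vocab) + 1
        log_prob, n = 0.0, 0
        for a, b in zip(seq, seq[1:]):
            p = (self.bigrams[(a, b)] + 1) / (self.unigrams[a] + V)
            log_prob += math.log(p)
            n += 1
        return math.exp(-log_prob / n)

def classify(models, seq):
    # Lowest perplexity wins.
    return min(models, key=lambda genre: models[genre].perplexity(seq))

# Hypothetical one-progression training sets, just to exercise the classifier.
models = {
    'jazz': BigramModel([['IIm', 'V', 'I', 'VIm', 'IIm', 'V', 'I']]),
    'popular': BigramModel([['I', 'V', 'VIm', 'IV', 'I', 'V', 'VIm', 'IV']]),
}
print(classify(models, ['IIm', 'V', 'I']))   # -> jazz
```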
Experimental setup
► First step: plain classification on 80% of the dataset with 10-fold cross-validation
► Second step: hierarchical classification using classifier ensembles; ensemble weights are adjusted using the previous results, and the remaining 20% of the dataset is used for validation
Classification results
► The best classification rates in the 3-class problem were obtained using harmonic information
► When classifying sub-genres, melody usually performs better
► Naïve Bayes performs better most of the time
► No significant differences between context sizes in the n-gram models
[Table: classification rates for chords (harmony) and melody, using 2-grams, 3-grams, 4-grams, and Naïve Bayes]
Confusion matrix
► Misclassifications occur more frequently within broad domains
► We try to prevent intra-domain errors by using a hierarchical classifier
[Confusion matrix figure; values ×100%]
Hierarchical classification
► Harmony (chord progressions) is used at the first level
► Melody is used at the second level
► Instead of single classifiers, an ensemble of classifiers is used at each level to increase robustness
Classifier ensembles
► Decisions are made by weighted majority vote
► Two weighting schemes: linear best-worst weighting and quadratic best-worst weighting
► Each classifier's weight is a decreasing function of its number of errors
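A sketch of the voting scheme, assuming the linear weights interpolate between the best classifier (fewest errors, weight 1) and the worst (weight 0), with the quadratic variant squaring the linear weight; the paper's exact normalisation may differ.

```python
# Best-worst weighting and weighted majority vote for a classifier ensemble.
from collections import defaultdict

def best_worst_weights(errors, quadratic=False):
    """Weight classifiers from their error counts: best -> 1, worst -> 0."""
    best, worst = min(errors), max(errors)
    span = (worst - best) or 1   # avoid division by zero when all are equal
    weights = [(worst - e) / span for e in errors]
    return [w * w for w in weights] if quadratic else weights

def weighted_vote(predictions, weights):
    """Combine one label per classifier by weighted majority vote."""
    scores = defaultdict(float)
    for label, w in zip(predictions, weights):
        scores[label] += w
    return max(scores, key=scores.get)

errors = [3, 7, 5]                           # per-classifier errors on held-out data
w_lin = best_worst_weights(errors)           # [1.0, 0.0, 0.5]
w_quad = best_worst_weights(errors, True)    # [1.0, 0.0, 0.25]
print(weighted_vote(['jazz', 'popular', 'popular'], w_lin))   # -> jazz
```

Note that with linear weights the single best classifier can still be outvoted by several mediocre ones; the quadratic scheme concentrates more weight on the best classifiers.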
Hierarchical classification results
► Evaluated on the remaining 20% of the dataset
► Single classifiers, 3 classes: best rate 88.4 (melody, 2-grams)
► Classifier ensembles, 3 classes: 90.1
► Hierarchical classification, 9 classes:
  1st level (3 classes): harmony, 2-grams
  2nd level (3×3 classes): academic: melody, Naïve Bayes; jazz: melody, Naïve Bayes; popular: melody, 4-grams
Conclusions
► Harmony and melody are suitable features for music genre classification
► Harmony is better for classifying broad musical domains, while melody is better for distinguishing sub-genres
► Misclassifications occur more frequently within broad domains
► Hierarchical classification and classifier ensembles outperformed the best single classifiers