
1

2 Polyphonic music information retrieval based on multi-label cascade classification system, presented by Zbigniew W. Ras, University of North Carolina, Charlotte, NC, College of Computing and Informatics. www.kdd.uncc.edu http://www.mir.uncc.edu

3 Polyphonic music information retrieval based on multi-label cascade classification system. Student: Wenxin Jiang. Advisor: Dr. Zbigniew W. Ras.

4 The survey of MIR systems at http://mirsystems.info/ lists 43 MIR systems. Most are based on pitch estimation with melody and rhythm matching. This presentation focuses on timbre estimation.

5 Goal: design and implement a system for automatic indexing of music by instruments (an objective task) and emotions (a subjective task). Input: the MIRAI musical database (mostly MUMS), with music pieces played by 59 different music instruments. Outcome: a musical database with pieces indexed by instruments and emotions, represented as an FS-tree guaranteeing efficient storage and retrieval.

6 Automatic indexing of music. What is needed? A database of monophonic and polyphonic music signals and their descriptions in terms of new features (including temporal ones), in addition to the standard MPEG-7 features; these signals are labeled by instruments and emotions, forming additional features called decision features. Why is it needed? To build classifiers for automatic indexing of musical sound by instruments and emotions.

7 MIRAI – Cooperative Music Information Retrieval System based on Automatic Indexing. [Flowchart: a user query passes through a Query Adapter to the Indexed Audio Database; when the answer is empty, the query is adapted (over instruments and durations) until music objects are returned.]

8 Raw data – signal representation. PCM (Pulse Code Modulation) is the most straightforward mechanism for storing audio: analog audio is sampled, and the individual samples are stored sequentially as a binary file. Typical parameters: sampling rate 44.1 kHz at 16 bits per sample, i.e., 2,646,000 values per minute per channel.
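As a quick illustration of these numbers, here is a minimal Python sketch (not part of the original deck) that inspects a PCM WAV file with the standard library and reproduces the values-per-minute arithmetic; the file name is a placeholder:

```python
# Minimal PCM inspection sketch: 44,100 samples/s x 60 s = 2,646,000
# values per minute per channel, each stored as a 16-bit integer.
import wave

with wave.open("example.wav", "rb") as w:    # hypothetical input file
    rate = w.getframerate()                  # e.g. 44100 Hz
    bits = w.getsampwidth() * 8              # e.g. 16 bits per sample
    channels = w.getnchannels()
    print(rate, bits, rate * 60 * channels)  # values per minute
```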

9 The nature and types of raw data – challenges in applying KDD to MIR:

Data source | Organization | Volume | Type | Quality
Traditional data | Structured | Modest | Discrete, categorical | Clean
Audio data | Unstructured | Very large | Continuous, numeric | Noisy

10 Feature extraction: in traditional pattern recognition, feature extraction transforms the lower-level raw data form (amplitude values at each sample point) into manageable higher-level representations, stored in a feature database that feeds classification, clustering, and regression.

11 MPEG-7 features. [Pipeline: the signal is Hamming-windowed and passed through an STFT (NFFT FFT points) to obtain the power spectrum; from it are computed the Spectral Centroid, Fundamental Frequency, and Harmonic Peaks detection, and then the Instantaneous Harmonic Spectral Centroid, Deviation, Spread, and Variation; the signal envelope yields the Log Attack Time and Temporal Centroid.]
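To make the pipeline concrete, here is a hedged Python sketch of one branch of it: a Hamming-windowed STFT frame reduced to a spectral centroid. The MPEG-7 standard specifies exact formulas; the normalization and NFFT choice here are assumptions for illustration only:

```python
# Hamming window -> FFT power spectrum -> spectral centroid for one frame.
import numpy as np

def spectral_centroid(frame, sr, nfft=4096):
    windowed = frame * np.hamming(len(frame))
    power = np.abs(np.fft.rfft(windowed, n=nfft)) ** 2   # power spectrum
    freqs = np.fft.rfftfreq(nfft, d=1.0 / sr)
    return float(np.sum(freqs * power) / (np.sum(power) + 1e-12))

sr = 44100
t = np.arange(int(0.12 * sr)) / sr             # one 0.12 s frame
frame = np.sin(2 * np.pi * 261.6 * t)          # middle-C test tone
print(spectral_centroid(frame, sr))            # close to 261.6 Hz
```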

12 Derived database.

Other features and new features:

Feature | Durations | Sub-total | Total
Tristimulus parameters | 4 | 10 | 40
Spectrum Centroid/Spread II | 4 | 2 | 8
Flux | 4 | 1 | 4
Roll-off | 4 | 1 | 4
Zero crossing | 4 | 1 | 4
MFCC | 4 | 4x13 | 208
Spectrum Centroid/Spread I | 3 | 2 | 6
Harmonic parameters | 3 | 4 | 12
Flatness | 3 | 4x24 | 288
Durations | 3 | 1 | 3
Total | | | 577

Extended MPEG-7 features:

Feature | Count
Harmonic upper limit | 1
Harmonic ratio | 1
Basis functions | 190
Log attack time | 1
Temporal centroid | 1
Spectral centroid | 1
Spectrum Centroid/Spread I | 2
Harmonic parameters | 4
Flatness | 24x4
Total | 297

13

14

15 Hierarchical Classification Schema I

16 Schema II – Hornbostel/Sachs. [Tree: top level – Aerophone, Chordophone, Membranophone, Idiophone; aerophone branches shown – Free, Single Reed, Side, Lip Vibration, Whip; example leaves – Alto Flute, Flute, C Trumpet, French Horn, Tuba, Oboe, Bassoon.]

17 Schema III – Play Methods. [Tree: branches – Muted, Pizzicato, Bowed, Picked, Shaken, Blown, …; example leaves – Piccolo, Flute, Bassoon, Alto Flute, ….]

18 Database table (Hornbostel/Sachs and Play Method are the decision attributes; CA1 … CAn are the classification attributes):

Obj | CA1 … CAn | Hornbostel/Sachs | Play Method
1 | 0.22 … 0.28 | [Aerophone, Side, Alto Flute] | [Blown, Alto Flute]
2 | 0.31 … 0.77 | [Idiophone, Concussion, Bell] | [Concussive, Bell]
3 | 0.05 … 0.21 | [Chordophone, Composite, Cello] | [Bowed, Cello]
4 | 0.12 … 0.11 | [Chordophone, Composite, Violin] | [Martele, Violin]

19 Example.

X | a | b | c | d
x1 | a[1] | b[2] | c[1] | d[3]
x2 | a[1] | b[1] | c[1] | d[3,1]
x3 | a[1] | b[2] | c[2,2] | d[1]
x4 | a[2] | b[2] | c[2] | d[1]

a, b, c are classification attributes; d is the decision attribute. [Trees: c splits into c[1], c[2] at level I, with c[2] refined into c[2,1], c[2,2] at level II; d splits into d[1], d[2], d[3] at level I, with d[3] refined into d[3,1], d[3,2] at level II.]

20 Classification setup: 90% training, 10% testing, 10 folds. Hierarchical (Schema I) vs. non-hierarchical. Classifiers compared: J48 tree and Naïve Bayesian.
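A rough sketch of this protocol in Python (the deck used WEKA; scikit-learn stands in here, and the data are random placeholders, so the printed numbers are meaningless):

```python
# 10-fold cross-validation comparing a J48-like decision tree and naive Bayes.
import numpy as np
from sklearn.model_selection import cross_val_score
from sklearn.tree import DecisionTreeClassifier
from sklearn.naive_bayes import GaussianNB

rng = np.random.default_rng(0)
X = rng.normal(size=(300, 20))        # placeholder feature vectors
y = rng.integers(0, 5, size=300)      # placeholder instrument labels

for name, clf in [("J48-like tree", DecisionTreeClassifier()),
                  ("Naive Bayes", GaussianNB())]:
    scores = cross_val_score(clf, X, y, cv=10)   # 90%/10% splits, 10 folds
    print(name, scores.mean())
```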

21 Results of the non-hierarchical classification:

Features | J48 tree | Naïve Bayesian
All | 70.4923% | 68.5647%
MPEG7 | 65.7256% | 56.9824%

22 Results of the hierarchical classification (Schema I) with MPEG-7 features:

Node | J48 tree | Naïve Bayesian
Family | 86.434% | 64.7041%
No-pitch | 73.7299% | 66.2949%
Percussion | 85.2484% | 84.9379%
String | 72.4272% | 61.8447%
Wind | 67.8133% |

23 Results of the hierarchical classification (Schema I) with all features:

Node | J48 tree | Naïve Bayesian
Family | 91.726% | 72.6868%
No-pitch | 77.943% | 75.2169%
Percussion | 86.0465% | 88.3721%
String | 76.669% | 66.6021%
Woodwind | 75.761% | 78.0158%

24 Classification results (J48 tree), in % (blank cells were lost in the source):

Instrument | Accuracy (new features) | Recall (new features) | Accuracy (without) | Recall (without)
Con-clarinet | 100.0 | 60.0 | 83.3 | 100.0
Electric bass | 100.0 | 73.3 | 93.3 |
Flute | 100.0 | 50.0 | 60.0 | 75.0
Steel drums | 100.0 | 66.7 | 50.0 | 66.7
Tuba | 100.0 | | 85.7 |
Vibraphone | 87.5 | 93.3 | 78.6 | 73.3
Cello | 87.0 | 95.2 | 86.7 | 61.9
Violin | 84.0 | 77.8 | 66.7 | 59.3
Piccolo | 83.3 | 50.0 | 60.0 |
Marimba | 82.4 | 87.5 | 83.3 | 93.8
C trumpet | 81.3 | 76.5 | 87.5 | 82.4
Alto flute | 80.0 | | |
English horn | 80.0 | 57.1 | 42.9 |

25 Polyphonic sounds – how to handle? 1. Single-label classification based on sound separation. 2. Multi-labeled classifiers. Problem: information loss during the signal subtraction. [Sound separation flowchart: polyphonic sound → segmentation → get frame → feature extraction → classifier → get instrument → sound separation (subtraction).]

26 This presentation focuses on timbre estimation in polyphonic sounds and on designing multi-labeled classifiers. Timbre-relevant descriptors: Spectrum Centroid and Spread, Spectrum Flatness, Band Coefficients, Harmonic Peaks, Mel-frequency cepstral coefficients (MFCC), Tristimulus.

27 Feature extraction: the sub-pattern of a single instrument within a mixture.

28 Timbre estimation based on a multi-label classifier. [Pipeline: segmentation into 40 ms frames → feature extraction (acoustic descriptors) → classifier trained on the single-label database → per-frame candidate list with confidences, e.g., Candidate 1 – 70%, Candidate 2 – 50%, …, Candidate N – 10%.]
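A possible shape of this per-frame step, sketched in Python with scikit-learn; the classifier choice, features, and instrument names are placeholders, since the deck does not specify the implementation:

```python
# Per-frame multi-label step: rank all instruments by classifier confidence
# and keep every candidate above a threshold (0.4 appears later in the deck).
import numpy as np
from sklearn.neighbors import KNeighborsClassifier

def frame_candidates(clf, feats, threshold=0.4):
    probs = clf.predict_proba([feats])[0]
    ranked = sorted(zip(clf.classes_, probs), key=lambda p: -p[1])
    return [(name, p) for name, p in ranked if p >= threshold]

rng = np.random.default_rng(1)
names = np.array(["flute", "oboe", "cello", "tuba"])
X = rng.normal(size=(200, 13))                 # placeholder MFCC frames
y = names[rng.integers(0, 4, size=200)]        # single-label training data
clf = KNeighborsClassifier(n_neighbors=5).fit(X, y)
print(frame_candidates(clf, rng.normal(size=13)))
```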

29 Flowchart of the multi-label classification system: polyphonic sound → get frame → feature extraction → perform multiple classification → multiple labels; once all frames are estimated, a context-based voting process selects the final winners.
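The voting stage might look like the sketch below; the slides only say the voting is "based on context", so summing confidences per instrument across frames is an assumption made here for illustration:

```python
# Accumulate per-frame candidate confidences and pick the overall winners.
from collections import defaultdict

def vote(frame_candidate_lists, n_winners=2):
    totals = defaultdict(float)
    for candidates in frame_candidate_lists:      # one list per frame
        for instrument, confidence in candidates:
            totals[instrument] += confidence      # assumed aggregation rule
    return sorted(totals.items(), key=lambda kv: -kv[1])[:n_winners]

frames = [[("flute", 0.7), ("oboe", 0.5)],
          [("flute", 0.6), ("cello", 0.4)],
          [("oboe", 0.5), ("flute", 0.5)]]
print(vote(frames))   # [('flute', 1.8), ('oboe', 1.0)]
```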

30 Timbre estimation results based on different methods [instruments – 45; training data (TD) – 2917 single-instrument sounds from MUMS; testing on 308 mixed sounds randomly chosen from TD; window size – 1 s; frame size – 120 ms; hop size – 40 ms; MFCC extracted from each frame (following MPEG-7)]:

Experiment | Pitch-based | Sound separation | N (labels) max | Recall | Precision | F-score
1 | Yes | | 1 | 54.55% | 39.2% | 45.60%
2 | Yes | | 2 | 61.20% | 38.1% | 46.96%
3 | Yes | No | 2 | 64.28% | 44.8% | 52.81%
4 | Yes | No | 4 | 67.69% | 37.9% | 48.60%
5 | Yes | No | 8 | 68.3% | 36.9% | 47.91%

A threshold of 0.4 controls the total number of estimations for each index window.
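The F-score column is consistent with the standard harmonic mean of precision and recall, F = 2PR/(P + R); e.g., for experiment 1: 2 × 0.392 × 0.5455 / (0.392 + 0.5455) ≈ 45.6%.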

31 Polyphonic sounds. [Flowchart: polyphonic sound (window) → get frame → feature extraction → classifiers → multiple labels.] Compressed representations of the signal (Harmonic Peaks, Mel-frequency cepstral coefficients (MFCC), Spectral Flatness, …) remove irrelevant information such as inharmonic frequencies or partials. Violin and viola have similar MFCC patterns, as do double bass and guitar, which makes them difficult to distinguish in polyphonic sounds; more information from the raw signal is needed.

32 Short-term power spectrum – a low-level representation of the signal (calculated by STFT). The power-spectrum patterns of flute and trombone can both be seen in the mixture. Spectrum slice – 0.12 seconds long.
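A hedged sketch of how such a 0.12 s spectrum slice can be computed, using SciPy's STFT on a synthetic two-tone stand-in for the flute/trombone mixture; all parameters are assumptions:

```python
# Short-term power spectrum: one 0.12 s STFT slice of a mixture signal.
import numpy as np
from scipy.signal import stft

sr = 44100
t = np.arange(sr) / sr                                # 1 s of audio
mix = np.sin(2*np.pi*261.6*t) + 0.8*np.sin(2*np.pi*523.3*t)  # toy "mixture"

f, times, Z = stft(mix, fs=sr, nperseg=int(0.12 * sr))
power_slice = np.abs(Z[:, 0]) ** 2                    # first 0.12 s slice
print(f[np.argmax(power_slice)])                      # dominant partial (Hz)
```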

33 Experiment: middle-C instrument sounds (pitch C4 in MIDI notation, frequency 261.6 Hz). Training set: power spectra from 3323 frames, extracted by STFT from 26 single-instrument sounds: electric guitar, bassoon, oboe, B-flat clarinet, marimba, C trumpet, E-flat clarinet, tenor trombone, French horn, flute, viola, violin, English horn, vibraphone, accordion, electric bass, cello, tenor saxophone, B-flat trumpet, bass flute, double bass, alto flute, piano, Bach trumpet, tuba, and bass clarinet. Testing set: fifty-two audio files, each mixed (using Sound Forge) from two of these 26 single-instrument sounds. Classifiers: (1) KNN with Euclidean distance (spectrum-match-based classification); (2) decision tree (multi-label classification based on previously extracted features).

34 Timbre pattern match based on power spectrum:

Experiment | Description | Recall | Precision | F-score
1 | Feature-based + decision tree (n=2) | 64.28% | 44.8% | 52.81%
2 | Spectrum match + KNN (k=1; n=2) | 79.41% | 50.8% | 61.96%
3 | Spectrum match + KNN (k=5; n=2) | 82.43% | 45.8% | 58.88%
4 | Spectrum match + KNN (k=5; n=2), without percussion instruments | 87.1% | |

n – number of labels assigned to each frame; k – parameter for KNN.
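Experiments 2–4 can be pictured with the following sketch: each frame's raw power spectrum is the feature vector, and KNN with Euclidean distance finds the closest single-instrument training spectra. The data here are random placeholders; only the shapes (3323 training frames, 26 instruments) follow the slides:

```python
# Spectrum match + KNN: classify a frame by its nearest training spectra.
import numpy as np
from sklearn.neighbors import KNeighborsClassifier

rng = np.random.default_rng(2)
train_spectra = rng.random((3323, 2049))          # power-spectrum bins
train_labels = rng.integers(0, 26, size=3323)     # 26 middle-C instruments

knn = KNeighborsClassifier(n_neighbors=5, metric="euclidean")
knn.fit(train_spectra, train_labels)
print(knn.predict(rng.random((1, 2049))))         # nearest-spectrum label
```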

35 Hierarchical structure. [Diagram: instruments grouped by similarity, e.g., Flute with English Horn and Violin with Viola.]

36 Instrument granularity: classifiers are trained at each level of the hierarchical tree (Hornbostel/Sachs).

37 Modules of the cascade classifier for single-instrument estimation – Hornbostel/Sachs, pitch 3B. [Diagram: cascading the level classifiers multiplies their confidences, 96.02% × 98.94% = 95.00%, which exceeds the 91.80% achieved without the cascade.]

38 New experiment: middle-C instrument sounds (pitch C4 in MIDI notation, frequency 261.6 Hz). Training set: 2762 frames extracted from the following instrument sounds: electric guitar, bassoon, oboe, B-flat clarinet, marimba, C trumpet, E-flat clarinet, tenor trombone, French horn, flute, viola, violin, English horn, vibraphone, accordion, electric bass, cello, tenor saxophone, B-flat trumpet, bass flute, double bass, alto flute, piano, Bach trumpet, tuba, and bass clarinet. Classifiers (WEKA): (1) KNN with Euclidean distance (spectrum-match-based classification); (2) decision tree (classification based on previously extracted features). Confidence – the ratio of correctly classified instances to the total number of instances.

39 Classification on different feature groups:

Group | Feature description | KNN | Decision tree
A | 33 spectrum flatness band coefficients | 99.23% | 94.69%
B | 13 MFCC coefficients | 98.19% | 93.57%
C | 28 harmonic peaks | 86.60% | 91.29%
D | 38 spectrum projection coefficients | 47.45% | 31.81%
E | Log spectral centroid, spread, flux, roll-off, zero-crossing | 99.34% | 99.77%

40 Feature and classifier selection at each level of the cascade system (KNN + band coefficients is the most common combination):

Level 1: Node | Feature | Classifier
chordophone | band coefficients | KNN
aerophone | MFCC coefficients | KNN
idiophone | band coefficients | KNN

Level 2: Node | Feature | Classifier
chrd_composite | band coefficients | KNN
aero_double-reed | MFCC coefficients | KNN
aero_lip-vibrated | MFCC coefficients | KNN
aero_side | MFCC coefficients | KNN
aero_single-reed | band coefficients | decision tree
idio_struck | band coefficients | KNN

41 Classification on combinations of different feature groups. [Charts: classification based on KNN; classification based on decision tree.]

42 From these two experiments, we see that: 1) the KNN classifier works better with feature vectors such as spectral flatness coefficients, projection coefficients, and MFCC; 2) the decision tree works better with harmonic peaks and statistical features. Simply adding more features together does not improve the classifiers and sometimes even worsens classification results (e.g., adding harmonic peaks to other feature groups).

43 Feature and classifier selection at each level of the cascade system – Hornbostel/Sachs hierarchical tree. [Chart: feature and classifier selection at the top level.]

44 Feature and classifier selection at the second level.

45 Feature and classifier selection at the third level.

46 Feature and classifier selection.

Level 1: Node | Feature | Classifier
chordophone | flatness coefficients | KNN
aerophone | MFCC coefficients | KNN
idiophone | flatness coefficients | KNN

Level 2: Node | Feature | Classifier
chrd_composite | flatness coefficients | KNN
aero_double-reed | MFCC coefficients | KNN
aero_lip-vibrated | MFCC coefficients | KNN
aero_side | MFCC coefficients | KNN
aero_single-reed | flatness coefficients | decision tree
idio_struck | flatness coefficients | KNN

47 HIERARCHICAL STRUCTURE BUILT BY CLUSTERING ANALYSIS. Common methods for calculating the distance or similarity between clusters: single linkage (nearest neighbor), complete linkage (furthest neighbor), unweighted pair-group method using arithmetic averages (UPGMA), weighted pair-group method using arithmetic averages (WPGMA), unweighted pair-group method using the centroid average (UPGMC), weighted pair-group method using the centroid average (WPGMC), and Ward's method. Most common distance functions: Euclidean; Manhattan; Canberra (examines the sum of a series of fractional differences between the coordinates of a pair of objects); the Pearson correlation coefficient (PCC), which measures the degree of association between objects; and Spearman's rank correlation coefficient. Clustering algorithm: HCLUST (agglomerative hierarchical clustering), from the R package.
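The slides use R's hclust; a rough Python equivalent of one of the configurations (Ward linkage with Euclidean distance) is sketched below. The other linkage methods and metrics listed above swap in through the method and metric arguments (Ward in SciPy requires Euclidean); the data are placeholders:

```python
# Agglomerative hierarchical clustering of feature frames, hclust-style.
import numpy as np
from scipy.cluster.hierarchy import linkage, fcluster

rng = np.random.default_rng(3)
frames = rng.normal(size=(100, 33))   # placeholder flatness-coefficient frames

Z = linkage(frames, method="ward", metric="euclidean")
cluster_ids = fcluster(Z, t=8, criterion="maxclust")   # cut into 8 clusters
print(cluster_ids[:10])               # a cluster ID for every frame
```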

48 Testing datasets (MFCC, flatness coefficients, harmonic peaks): the middle-C pitch group contains 46 different musical sound objects. Each sound object is segmented into multiple 0.12 s frames, and each frame is stored as an instance in the testing dataset; there are 2884 frames in total. We extract the three different features (MFCC, flatness coefficients, and harmonic peaks) from these sound objects, and each feature produces one dataset of 2884 frames for clustering. Clustering: when the algorithm finishes the clustering process, a particular cluster ID is assigned to each frame.

49 Contingency table derived from the clustering result:

 | Cluster 1 | … | Cluster j | … | Cluster n
Instrument 1 | X_11 | … | X_1j | … | X_1n
… | … | … | … | … | …
Instrument i | X_i1 | … | X_ij | … | X_in
… | … | … | … | … | …
Instrument n | X_n1 | … | X_nj | … | X_nn

50 Evaluation results of the Hclust algorithm (the 14 results with the highest score among 126 experiments); w – number of clusters, α – average clustering accuracy over all instruments, score = α·w:

Feature | Method | Metric | α | w | Score
Flatness coefficients | ward | pearson | 87.3% | 37 | 32.30
Flatness coefficients | ward | euclidean | 85.8% | 37 | 31.74
Flatness coefficients | ward | manhattan | 85.6% | 36 | 30.83
MFCC | ward | kendall | 81.0% | 36 | 29.18
MFCC | ward | pearson | 83.0% | 35 | 29.05
Flatness coefficients | ward | kendall | 82.9% | 35 | 29.03
MFCC | ward | euclidean | 80.5% | 35 | 28.17
MFCC | ward | manhattan | 80.1% | 35 | 28.04
MFCC | ward | spearman | 81.3% | 34 | 27.63
Flatness coefficients | ward | spearman | 83.7% | 33 | 27.62
Flatness coefficients | ward | maximum | 86.1% | 32 | 27.56
MFCC | ward | maximum | 79.8% | 34 | 27.12
Flatness coefficients | mcquitty | euclidean | 88.9% | 30 | 26.67
MFCC | average | manhattan | 87.3% | 30 | 26.20
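A sketch of how α and score = α·w can be computed from the contingency table of slide 49. The slides do not spell out the per-instrument accuracy, so taking the fraction of an instrument's frames that fall into its best cluster is an assumption made here:

```python
# Evaluate a clustering from its instrument-by-cluster contingency table.
import numpy as np

def evaluate(contingency):
    # assumed per-instrument accuracy: share of frames in the best cluster
    per_instrument = contingency.max(axis=1) / contingency.sum(axis=1)
    alpha = per_instrument.mean()       # average clustering accuracy
    w = contingency.shape[1]            # number of clusters
    return alpha, w, alpha * w          # score = alpha * w

table = np.array([[40, 5, 5],           # toy counts: 3 instruments x 3 clusters
                  [2, 45, 3],
                  [1, 4, 45]])
print(evaluate(table))
```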

51 Clustering result from the Hclust algorithm with the Ward linkage method and the Pearson distance measure, using flatness coefficients as the selected feature: "ctrumpet" and "bachtrumpet" are clustered in the same group; "ctrumpet_harmonStemOut" is clustered in a group of its own instead of merging with "ctrumpet"; bassoon appears as the sibling of the regular French horn; "French horn muted" is clustered in a different group together with "English horn" and "oboe".

52 Comparison between non-cascade and cascade classification with different hierarchical schemas:

Experiment | Classification method | Description | Recall | Precision | F-score
1 | Non-cascade | Feature-based | 64.3% | 44.8% | 52.81%
2 | Non-cascade | Spectrum match | 79.4% | 50.8% | 61.96%
3 | Cascade | Hornbostel/Sachs | 75.0% | 43.5% | 55.06%
4 | Cascade | Play method | 77.8% | 53.6% | 63.47%
5 | Cascade | Machine learned | 87.5% | 62.3% | 72.78%

53 We evaluate the classification system on mixture sounds containing two single-instrument sounds. We also create 49 polyphonic sounds by randomly selecting three different single-instrument sounds and mixing them together. We then test these three-instrument mixtures with the five classification methods (experiments 2 to 6) described in the preceding two-instrument mixture experiments; single-label classification based on the sound-separation method is also tested on the mixtures (experiment 1). KNN (k=3) is used as the classifier in each experiment.

54 Classification results on three-instrument mixtures with different algorithms:

Exp | Classifier | Method | Recall | Precision | F-score
1 | Non-cascade | Single-label, based on sound separation | 31.48% | 43.06% | 36.37%
2 | Non-cascade | Feature-based multi-label classification | 69.44% | 58.64% | 63.59%
3 | Non-cascade | Spectrum-match multi-label classification | 85.51% | 55.04% | 66.97%
4 | Cascade (Hornbostel) | Multi-label classification | 64.49% | 63.10% | 63.79%
5 | Cascade (play method) | Multi-label classification | 66.67% | 55.25% | 60.43%
6 | Cascade (machine learned) | Multi-label classification | 63.77% | 69.67% | 66.59%

55 User entering a query: he is looking for a particular piece of music – Mozart, 40th Symphony. The user is not satisfied and enters a new query: "Yes, but I'm sad today; play the same song but make it sadder." The Action Rules system responds with a modified Mozart, 40th Symphony.

56 Action rule: in an information system, an action rule is defined as a term [(ω) ∧ (α → β)] → (φ → ψ), where ω is a conjunction of fixed condition features shared by both groups, (α → β) denotes the proposed changes in values of flexible features, and (φ → ψ) is the desired effect of the action.
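A hypothetical instantiation in the spirit of slide 55 (not from the original deck): ω = (piece = Mozart, 40th Symphony) is the fixed condition, (α → β) = (tempo: fast → slow) is a proposed change of a flexible feature, and (φ → ψ) = (perceived emotion: neutral → sad) is the desired effect; the rule keeps the piece fixed while changing flexible performance features to achieve the requested mood.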

57 "Action Rules Discovery without Pre-existing Classification Rules", Z.W. Ras, A. Dardzinska, Proceedings of the RSCTC 2008 Conference, Akron, Ohio, LNAI 5306, Springer, 2008, pp. 181-190. http://www.cs.uncc.edu/~ras/Papers/Ras-Aga-AKRON.pdf

58 Auto-indexing system for musical instruments and intelligent query answering system for music instruments: WWW.MIR.UNCC.EDU

