
Music Information Retrieval System based on Cascade Classifiers presented by Zbigniew W. Ras http://www.mir.uncc.edu CCI, UNC-Charlotte.


2 Music Information Retrieval System based on Cascade Classifiers, presented by Zbigniew W. Ras (CCI, UNC-Charlotte). www.kdd.uncc.edu, http://www.mir.uncc.edu. Research sponsored by NSF IIS-0414815, IIS-0968647.

3 Collaborators:
Alicja Wieczorkowska (Polish-Japanese Institute of IT, Warsaw, Poland)
Krzysztof Marasek (Polish-Japanese Institute of IT, Warsaw, Poland)
PhD students supported by two NSF grants:
Elzbieta Kubera (Maria Curie-Sklodowska University, Lublin, Poland)
Rory Lewis (University of Colorado at Colorado Springs, USA)
Wenxin Jiang (Fred Hutchinson Cancer Research Center in Seattle, USA)
Xin Zhang (University of North Carolina, Pembroke, USA)
Jacek Grekow (Bialystok University of Technology, Poland)
Amanda Cohen-Mostafavi (InfoBelt LCC, Charlotte, USA)

4 Goal: design and implement a system for automatic indexing of music by instruments. Outcome: a musical database indexed by instruments. MIRAI - Musical Database (mostly MUMS) [music pieces played by 57 different music instruments]

5 MIRAI - Musical Database [music pieces played by 57+ different music instruments (listed below) and described by over 910 attributes]
Alto Flute, Bach-trumpet, bass-clarinet, bassoon, bass-trombone, Bb trumpet, b-flat clarinet, cello, cello-bowed, cello-martele, cello-muted, cello-pizzicato, contrabassclarinet, contrabassoon, crotales, c-trumpet, ctrumpet-harmonStemOut, doublebass-bowed, doublebass-martele, doublebass-muted, doublebass-pizzicato, eflatclarinet, electric-bass, electric-guitar, englishhorn, flute, frenchhorn, frenchHorn-muted, glockenspiel, marimba-crescendo, marimba-singlestroke, oboe, piano-9ft, piano-hamburg, piccolo, piccolo-flutter, saxophone-soprano, saxophone-tenor, steeldrums, symphonic, tenor-trombone, tenor-trombone-muted, tuba, tubular-bells, vibraphone-bowed, vibraphone-hardmallet, viola-bowed, viola-martele, viola-muted, viola-natural, viola-pizzicato, violin-artificial, violin-bowed, violin-ensemble, violin-muted, violin-natural-harmonics, xylophone.

6 Automatic Indexing of Polyphonic Music: what is needed & where is the problem? What is needed: a database of monophonic and polyphonic music signals and their descriptions in terms of the standard MPEG7 features and new features (including temporal ones). These signals are labeled by instruments, and the labels form an additional feature called the decision feature. Why is it needed? To build classifiers for automatic indexing of musical sound by instruments.

7 Automatic Indexing of Music

8 MIRAI - Cooperative Music Information Retrieval System based on Automatic Indexing. Diagram components: User, Query, Query Adapter, Empty Answer?, Indexed Audio Database, Music Objects, Instruments, Durations.

9 Feature extraction pipeline (diagram): signal data (lower-level raw data) → sampling into frames (0.12 s frame size, 0.04 s hop size) → feature extraction per frame (MATLAB) → feature database (higher-level, manageable representations) → traditional pattern recognition (classification, clustering, regression).
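The framing step can be sketched as follows. This is a minimal illustration in Python rather than the MATLAB code the slide mentions; the function name `frame_signal` and the `sample_rate` parameter are assumptions introduced here, and only the 0.12 s frame size and 0.04 s hop size come from the slide.

```python
def frame_signal(samples, sample_rate, frame_sec=0.12, hop_sec=0.04):
    """Split a sequence of samples into overlapping fixed-length frames.

    frame_sec and hop_sec default to the slide's values; everything else
    (names, rounding, dropping the incomplete tail) is an assumption.
    """
    frame_len = round(frame_sec * sample_rate)
    hop_len = round(hop_sec * sample_rate)
    if frame_len <= 0 or hop_len <= 0:
        raise ValueError("frame and hop must span at least one sample")
    frames = []
    start = 0
    # Keep only complete frames; a trailing partial frame is dropped.
    while start + frame_len <= len(samples):
        frames.append(samples[start:start + frame_len])
        start += hop_len
    return frames


# Toy signal: 40 samples at a hypothetical 100 Hz sample rate,
# so each frame has 12 samples and consecutive frames start 4 samples apart.
toy_frames = frame_signal(list(range(40)), sample_rate=100)
```

Each frame would then be passed to the feature extraction stage independently.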

10 MPEG7 features (diagram). Processing blocks: signal, Hamming window, STFT (NFFT FFT points), power spectrum, signal envelope, fundamental frequency, harmonic peaks detection. Extracted features: spectral centroid, temporal centroid, log attack time, instantaneous harmonic spectral centroid, instantaneous harmonic spectral deviation, instantaneous harmonic spectral spread, instantaneous harmonic spectral variation.

11 Derived Database
MPEG7 features: Spectrum Centroid, Spectrum Spread, Spectrum Flatness, Spectrum Basis Functions, Spectrum Projection Functions, Log Attack Time, Harmonic Peaks, …
Non-MPEG7 features & new temporal features: Roll-Off, Flux, Mel frequency cepstral coefficients (MFCC), Tristimulus and similar parameters (contents of odd and even partials: Od, Ev), Mean frequency deviation for low partials, Changing ratios of spectrum spread, Changing ratios of spectrum centroid

12 New temporal features: $S'(i)$, $C'(i)$, $S''(i)$, $C''(i)$
$S'(i) = [S(i+1) - S(i)] / S(i)$ ; $C'(i) = [C(i+1) - C(i)] / C(i)$
where $S(i+1)$, $S(i)$ and $C(i+1)$, $C(i)$ are the spectrum spread and spectrum centroid of two consecutive frames: frame $i+1$ and frame $i$. The changing ratios of spectrum spread and spectrum centroid for two consecutive frames are considered as the first derivatives of the spectrum spread and spectrum centroid. Following the same method we calculate the second derivatives:
$S''(i) = [S'(i+1) - S'(i)] / S'(i)$ ; $C''(i) = [C'(i+1) - C'(i)] / C'(i)$
Remark: the sequence $[S(i), S(i+1), S(i+2), \dots, S(i+k)]$ can be approximated by a polynomial $p(x) = a_0 + a_1 x + a_2 x^2 + a_3 x^3 + \dots$ ; new features: $a_0, a_1, a_2, a_3, \dots$
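The changing ratios defined on this slide can be computed directly from a per-frame sequence. The sketch below is a minimal Python illustration; the function name `changing_ratios` and the toy spread values are invented for the example and are not data from the experiments.

```python
def changing_ratios(values):
    """Return [(v[i+1] - v[i]) / v[i]] for each pair of consecutive frames.

    Raises ZeroDivisionError if some v[i] is zero; the slide does not say
    how that case is handled, so no special treatment is attempted here.
    """
    return [(nxt - cur) / cur for cur, nxt in zip(values, values[1:])]


# Hypothetical spectrum spread values S(i) for four consecutive frames.
spread = [2.0, 3.0, 4.5, 9.0]
first = changing_ratios(spread)   # S'(i): one value fewer than S(i)
second = changing_ratios(first)   # S''(i): same formula applied to S'(i)
```

The same function applies unchanged to the spectrum centroid sequence C(i).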

13 Classification confidence with temporal features

Experiment | Features | Classifier | Confidence
1 | S, C | Decision Tree | 80.47%
2 | S, C, S', C' | Decision Tree | 83.68%
3 | S, C, S', C', S'', C'' | Decision Tree | 84.76%
4 | S, C | KNN | 80.31%
5 | S, C, S', C' | KNN | 84.07%
6 | S, C, S', C', S'', C'' | KNN | 85.51%

Experiment with WEKA: 19 instruments [flute, piano, violin, saxophone, vibraphone, trumpet, marimba, french-horn, viola, bassoon, clarinet, cello, trombone, accordion, guitar, tuba, english-horn, oboe, double-bass]. Decision tree: J48 with a 0.25 confidence factor for tree pruning and a minimum of 10 instances per leaf. KNN: 3 neighbors, with Euclidean distance used as the similarity function.
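For readers unfamiliar with the KNN setting used here (3 neighbors, Euclidean distance), the following is a minimal pure-Python sketch of the idea. It is not the WEKA implementation used in the experiments; the function name `knn_predict` and the two-dimensional toy feature vectors are assumptions made for illustration.

```python
import math
from collections import Counter


def knn_predict(train, query, k=3):
    """Majority vote among the k training items closest to the query.

    train is a list of (feature_vector, label) pairs; distance is Euclidean.
    """
    nearest = sorted(train, key=lambda item: math.dist(item[0], query))[:k]
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]


# Invented toy data: two features per sound, two instrument labels.
toy_train = [
    ((1.0, 1.0), "flute"),
    ((1.2, 0.9), "flute"),
    ((0.9, 1.1), "flute"),
    ((5.0, 5.0), "tuba"),
    ((5.1, 4.8), "tuba"),
]
```

In the experiments the feature vectors would be the S, C and derived temporal features listed in the table rather than these toy values.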

14 Confusion matrices: left is from Experiment 1, right is from Experiment 3. The correctly classified instances are highlighted in green and the incorrectly classified instances are highlighted in yellow

15 Figures: precision, recall, and F-score of the decision tree for each instrument.
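Per-instrument precision, recall, and F-score can be derived from a confusion matrix such as the ones on the previous slide. The sketch below assumes rows are true classes and columns are predicted classes; the function name `per_class_scores` and the 2x2 toy matrix are invented for illustration and are not the presentation's data.

```python
def per_class_scores(confusion, labels):
    """Return {label: (precision, recall, f_score)}.

    confusion[i][j] counts instances of true class i predicted as class j.
    """
    scores = {}
    for k, label in enumerate(labels):
        true_pos = confusion[k][k]
        predicted = sum(row[k] for row in confusion)  # column total
        actual = sum(confusion[k])                    # row total
        precision = true_pos / predicted if predicted else 0.0
        recall = true_pos / actual if actual else 0.0
        denom = precision + recall
        f_score = 2 * precision * recall / denom if denom else 0.0
        scores[label] = (precision, recall, f_score)
    return scores


# Invented counts: 10 flute sounds (8 correct), 10 oboe sounds (9 correct).
toy_scores = per_class_scores([[8, 2], [1, 9]], ["flute", "oboe"])
```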

16 Polyphonic sounds: how to handle? 1. Single-label classification based on sound separation. 2. Multi-label classifiers. 3. Training classifiers on polyphonic sounds? Sound separation flowchart (diagram): polyphonic sound, segmentation, get frame, feature extraction, classifier, get instrument, sound separation (subtraction). Problem: information loss during the signal subtraction.

17 Multi-label classifier [collection of N classifiers; N is the number of instruments] (diagram): a 1 second window is segmented into 22 frames (frame size 0.12 s, hop size 0.04 s); for each frame: get frame, feature extraction, N classifiers. Each frame yields a list of candidate instruments with confidences (Candidate 1 70%, Candidate 2 50%, ..., Candidate N 10%). Combined confidences shown for the window: 85%, 80%, 70%, 55%, 45%, 16%, 12%, ……
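The slide does not state how the per-frame candidate lists are combined into a result for the whole window. As one possible reading, the sketch below averages each instrument's confidence over the frames and keeps those at or above a threshold. Both the averaging rule and the `threshold` parameter are assumptions, as are the function name and the toy confidences.

```python
from collections import defaultdict


def aggregate_frame_confidences(frame_outputs, threshold=0.5):
    """Average per-frame confidences and keep instruments above a threshold.

    frame_outputs is a list of {instrument: confidence} dicts, one per frame.
    Returns (instrument, mean_confidence) pairs, highest confidence first.
    Averaging and thresholding are assumed here, not taken from the slide.
    """
    totals = defaultdict(float)
    for output in frame_outputs:
        for instrument, confidence in output.items():
            totals[instrument] += confidence
    count = len(frame_outputs)
    averaged = {inst: total / count for inst, total in totals.items()}
    ranked = sorted(averaged.items(), key=lambda pair: pair[1], reverse=True)
    return [(inst, conf) for inst, conf in ranked if conf >= threshold]


# Invented outputs for two frames of one window.
toy_window = [
    {"flute": 0.875, "oboe": 0.5, "tuba": 0.125},
    {"flute": 0.625, "oboe": 0.75, "tuba": 0.25},
]
toy_labels = aggregate_frame_confidences(toy_window)
```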

18 Schema I - Hornbostel Sachs (tree diagram). Families: Aerophone, Chordophone, Membranophone, Idiophone. Subfamilies shown: Free, Single Reed, Side, Lip Vibration, Whip. Instruments shown: Alto Flute, Flute, C Trumpet, French Horn, Tuba, Oboe, Bassoon.

19 Schema II - Play Methods (tree diagram). Play methods: Muted, Pizzicato, Bowed, Picked, Shaken, Blow, …… Instruments shown: Piccolo, Flute, Bassoon, Alto Flute.

20 Instrument granularity: classifiers are trained at each level of the Hornbostel/Sachs hierarchical tree. We do not include membranophones because instruments in this family usually do not produce harmonic sound, so they need special techniques to be identified.

21 Modules of cascade classifier for single instrument estimation (Hornbostel/Sachs), pitch 3B: 96.02% * 98.94% = 95.00% > 91.80%
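One way to read the figures on this slide is that a cascade classifies top-down through the hierarchy and multiplies the confidences along the path, and that the product of the two stage figures (96.02% and 98.94%) is compared with a single figure of 91.80%. The sketch below illustrates that reading only. The classifier functions are hypothetical stand-ins, the labels are invented, and which stage each percentage belongs to is an assumption.

```python
def cascade_predict(features, family_classifier, instrument_classifiers):
    """Classify top-down: family first, then instrument within that family.

    Each classifier is a callable returning (label, confidence). The overall
    confidence is taken as the product of the per-level confidences.
    """
    family, family_conf = family_classifier(features)
    instrument, instrument_conf = instrument_classifiers[family](features)
    return family, instrument, family_conf * instrument_conf


# Hypothetical stand-ins that always return a fixed answer. The two numbers
# are the factors shown on the slide; their assignment to levels is assumed.
def toy_family_classifier(features):
    return "aerophone", 0.9602


def toy_aerophone_classifier(features):
    return "flute", 0.9894


family, instrument, confidence = cascade_predict(
    [0.0], toy_family_classifier, {"aerophone": toy_aerophone_classifier}
)
```

Under this reading, the combined value stays above the 91.80% figure even though it is lower than either stage on its own.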

22 HIERARCHICAL STRUCTURE BUILT BY CLUSTERING ANALYSIS
Clustering algorithm: HCLUST (agglomerative hierarchical clustering), R package.
Seven common methods to calculate the distance or similarity between clusters:
single linkage (nearest neighbor),
complete linkage (furthest neighbor),
unweighted pair-group method using arithmetic averages (UPGMA),
weighted pair-group method using arithmetic averages (WPGMA),
unweighted pair-group method using the centroid average (UPGMC),
weighted pair-group method using the centroid average (WPGMC),
Ward's method.
Six most common distance functions:
Euclidean,
Manhattan,
Canberra (sums a series of fractional differences between the coordinates of a pair of objects),
Pearson correlation coefficient (PCC), which measures the degree of association between objects,
Spearman's rank correlation coefficient,
Kendall (counts the number of pairwise disagreements between two lists).
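Four of the distance functions named above can be written in a few lines. This is a plain Python sketch using the standard textbook definitions, not the R code used with HCLUST. Turning the Pearson correlation coefficient into a distance as 1 - r is one common convention and is an assumption here, as is skipping 0/0 terms in the Canberra sum.

```python
import math


def euclidean(x, y):
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(x, y)))


def manhattan(x, y):
    return sum(abs(a - b) for a, b in zip(x, y))


def canberra(x, y):
    # Sum of |a - b| / (|a| + |b|); terms where both coordinates are 0 are skipped.
    return sum(abs(a - b) / (abs(a) + abs(b)) for a, b in zip(x, y) if a or b)


def pearson_distance(x, y):
    # 1 - r, where r is the Pearson correlation coefficient.
    # Raises ZeroDivisionError if either vector has zero variance.
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return 1.0 - cov / math.sqrt(var_x * var_y)
```

With the 1 - r convention, perfectly correlated feature vectors are at distance 0 and perfectly anti-correlated ones at distance 2.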

23 Clustering result from the Hclust algorithm with Ward linkage method and Pearson distance measure; flatness coefficients are used as the selected feature. “ctrumpet” and “batchtrumpet” are clustered in the same group. “ctrumpet_harmonStemOut” is clustered in a single group of its own instead of merging with “ctrumpet”. Bassoon is considered a sibling of the regular French horn. “French horn muted” is clustered in a different group, together with “English Horn” and “Oboe”.

24 Looking for the optimal combination of classification method and data representation in polyphonic music

Exp# | Classifier | Method | Recall | Precision | F-Score
1 | Non-Cascade | Single-label based on sound separation | 31.48% | 43.06% | 36.37%
2 | Non-Cascade | Multi-label classification | 85.51% | 55.04% | 66.97%
3 | Cascade (Hornbostel) | Multi-label classification | 64.49% | 63.10% | 63.79%
4 | Cascade (Playmethod) | Multi-label classification | 66.67% | 55.25% | 60.43%
5 | Cascade (Machine Learned) | Multi-label classification | 63.77% | 69.67% | 66.59%

Testing data: 49 polyphonic sounds created by selecting three different single-instrument sounds from the training database and mixing them together. KNN (k=3) is used as the classifier for each experiment.
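The F-Score column is consistent with the usual harmonic mean of precision P and recall R. The slide does not state the formula, so this is an inference from the numbers; the worked check below uses the Experiment 1 row.

```latex
F = \frac{2 P R}{P + R}
\qquad\text{e.g. Exp. 1:}\qquad
F = \frac{2 \times 0.4306 \times 0.3148}{0.4306 + 0.3148}
  = \frac{0.27110576}{0.7454}
  \approx 0.3637 = 36.37\%
```

The other four rows reproduce their listed F-scores in the same way.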

25 Auto indexing system for musical instruments; intelligent query answering system for music instruments. WWW.MIR.UNCC.EDU


27 Action Rules System. User entering query: he is looking for a particular piece of music (Mozart, 40th Symphony). The user is not satisfied and enters a new query: "Yes, but I'm sad today, play the same song but make it sadder." Answer: modified Mozart, 40th Symphony.

28 Action Rule (Information System): an action rule is defined as a term [(ω) ∧ (α → β)] → (ϕ → ψ), where ω is the conjunction of fixed condition features shared by both groups, (α → β) describes the proposed changes in values of flexible features, and (ϕ → ψ) is the desired effect of the action.

29 Action Rules Discovery: meta-actions based decision system
$S(d) = (X, A \cup \{d\}, V)$, with $A = \{A_1, A_2, \dots, A_m\}$

Influence Matrix (rows $M_1, \dots, M_n$; columns $A_1, \dots, A_m$):

      | A_1  | A_2  | A_3  | A_4  | ….. | A_m
M_1   | E_11 | E_12 | E_13 | E_14 |     | E_1m
M_2   | E_21 | E_22 | E_23 | E_24 |     | E_2m
M_3   | E_31 | E_32 | E_33 | E_34 |     | E_3m
M_4   | E_41 | E_42 | E_43 | E_44 |     | E_4m
…..
M_n   | E_n1 | E_n2 | E_n3 | E_n4 |     | E_nm

$r = [(A_1, a_1 \to a_1') \wedge (A_2, a_2 \to a_2') \wedge (A_4, a_4 \to a_4')] \Rightarrow (d, d_1 \to d_1')$
Candidate action rule: if $E_{32} = [a_2 \to a_2']$, then $E_{31} = [a_1 \to a_1']$, $E_{34} = [a_4 \to a_4']$. Rule $r$ is supported & covered by $M_3$.

30 "Action Rules Discovery without pre-existing classification rules", Z.W. Ras, A. Dardzinska, Proceedings of RSCTC 2008 Conference, in Akron, Ohio, LNAI 5306, Springer, 2008, 181-190 http://www.cs.uncc.edu/~ras/Papers/Ras-Aga-AKRON.pdf

31 Since the window diminishes the signal at both edges, it leads to information loss due to the narrowing of the frequency spectrum. To preserve this information, consecutive analysis frames overlap in time. Empirical experiments show that the best overlap is two thirds of the window size.

32 Windowing Hamming window spectral leakage
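As a concrete illustration of the windowing step, the sketch below builds a Hamming window with the standard coefficients 0.54 and 0.46 and applies it to a frame. The function names are introduced here for the example. With the 0.12 s frame size and 0.04 s hop size given earlier in the presentation, consecutive frames share 0.08 s, which is two thirds of the window.

```python
import math


def hamming_window(length):
    """Hamming window: w(i) = 0.54 - 0.46 * cos(2 * pi * i / (length - 1))."""
    if length < 1:
        return []
    if length == 1:
        return [1.0]
    return [
        0.54 - 0.46 * math.cos(2 * math.pi * i / (length - 1))
        for i in range(length)
    ]


def apply_window(frame):
    """Multiply each sample by the matching window coefficient."""
    return [s * w for s, w in zip(frame, hamming_window(len(frame)))]


# A constant toy frame makes the tapering at both edges easy to see.
toy_window = hamming_window(5)
toy_tapered = apply_window([1.0] * 5)
```

The coefficients fall to about 0.08 at both edges and peak at 1.0 in the middle, which is why samples near a frame boundary need to be covered again by a neighboring, overlapping frame.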

