Term Project Fall 2004 Emotion Detection in Music

1 11-751 Term Project Fall 2004 Emotion Detection in Music
Vitor R. Carvalho & Chih-yu Chao

2 Problem Tackled
Using machine learning techniques to automatically detect emotion in music:
Define a good set of emotion categories
Select the feature set
Classification problem

3 Related Work
Music Information Retrieval Conferences (ISMIR):
Li & Mitsunori - ISMIR 2003
Liu, Lu & Zhang - ISMIR 2003
Feng & Zhuang - ISMIR 2003 and IEEE/WIC-03

4 Taxonomy of emotion classification
5-point Likert scale:
1: very sad
2: sad
3: neutral (neither happy nor sad)
4: happy
5: very happy
Easy, simple, and with many practical applications (search, personalization, etc.)

5 Data and Labeling Process
Music dataset: 201 popular songs from Brazil, Taiwan, Japan, Africa, and the United States
Two people manually labeled the data
The human voice expresses emotion, but the lyrics were not considered (no semantics)
One emotion per song (no segmentation)
Inter-annotator agreement

6 Songs Authors List
Aerosmith, African, Agalloch, Alanis Morissette, A-mei, Anathema, Angelique Kidjo, Beth Carvalho, Billy Gilman, Blossom Dearie, Bluem of Youth, Boyz II Men, Caetano Veloso, Cai Chun Jia, Cesaria Evora, Chen Guan Qian, Chen Yi Xun, Chico Buarque, Ciacia, Comadre Florzinha, Dave Matthews Band, David Huang, Djavan, Dog’s Eye View, Dream Theater, Dreams Come True, D’sound, Ed Motta, Edu Lobo, Elegy, For Real, Gal Costa, George Michael, Gilberto Gil, Goo Goo Dolls, Green Carnation, Hanson, Ian Moore, Ivan Lins, Jackopierce, Jamie Cullum, Jason Mraz, Jeff Buckley, Jiang Mei Qi, João Donato, John Mayer, John Pizzarelli, JS, Landy Wen, Lisa, Lisa Ono, Lizz Wright, Luna Sea, Maria Bethania, Marisa Monte, Matchbox 20, Matsu Takako, Mexericos, Misia, Natalie Imbruglia, Nina Simone, Nine Inch Nails, Nirvana, Norah Jones, Sticky Rice, Olodum, Opeth, Pink Floyd, Porcupine Tree, Radiohead, REM, Rick Price, Rosa Passos, Salif Keita, Sarah McLachlan, Shawn Colvin, Shawn Stockman, Shino, The Smiths, Staind, Sting, Yanzi, Tanya Chua, Terry Lin, The Badlees, Timbalada, Tom Jobim & Elis Regina, Toni Braxton, Train, Tribalistas, Tyrese, Faye Wang, Xiao Yuan You Hui, Yo-yo Ma & Rosa Passos, Zeca Baleiro, Zelia Duncan

7 Inter-annotator agreement
Pearson's correlation (r): ranges from -1 (total disagreement) to +1 (total agreement)
r = 0.643
Both annotators' average ratings are 3.23 (3 = neutral): a slight “happier” bias
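The agreement score above can be reproduced with a few lines of code. The following is a minimal sketch of Pearson's r for two annotators' ratings of the same songs; the function name and the six example ratings are hypothetical and are not the project's data.

```python
import math

def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length rating lists."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two equal-length lists with at least 2 items")
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sum of co-deviations and of squared deviations from each mean.
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("undefined when one annotator gives constant ratings")
    return cov / math.sqrt(var_x * var_y)

# Hypothetical 1-5 ratings for six songs (illustration only).
annotator_a = [1, 2, 3, 4, 5, 4]
annotator_b = [2, 2, 3, 5, 4, 4]
r = pearson_r(annotator_a, annotator_b)
print(f"r = {r:.3f}")
```

Pearson's r treats the Likert labels as numbers, so it rewards annotators who move in the same direction even when their exact labels differ.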

8 Feature Extraction Attempts
Which tool can extract useful features from music data?
ESPS - speech only, not music
Praat - speech only
MARSYAS - good features, but not stable
MARSYAS-0.2 !!!

9 Feature Sets in Marsyas
MARSYAS: written mostly by George Tzanetakis
In Marsyas-0.2, there are 4 sets of features:
STFT-based features: centroid, rolloff, flux, zeroCrossing, etc.
Spectral Flatness Measure (SFM) features
Spectral Crest Factor (SCF) features
Mel-Frequency Cepstral Coefficients (MFCC)
All features are calculated every 20 ms. The final features are their means and standard deviations, obtained over a window of 1 second, or 50 time frames.
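The windowing step can be illustrated as follows. This is a sketch only, not Marsyas code: it assumes non-overlapping windows, the population standard deviation, and that a trailing incomplete window is dropped, none of which the slides specify. The function name is hypothetical.

```python
import statistics

FRAMES_PER_WINDOW = 50  # 50 frames x 20 ms = 1 second, as stated on the slide

def summarize_windows(frames, frames_per_window=FRAMES_PER_WINDOW):
    """Turn per-frame feature vectors into per-window mean/std vectors.

    frames: list of equal-length lists, one feature vector per 20 ms frame.
    Returns one row per complete window: all means first, then all std devs.
    """
    rows = []
    last_start = len(frames) - frames_per_window
    for start in range(0, last_start + 1, frames_per_window):
        window = frames[start:start + frames_per_window]
        columns = list(zip(*window))  # one tuple of values per feature
        means = [statistics.fmean(col) for col in columns]
        stds = [statistics.pstdev(col) for col in columns]  # assumption: population std
        rows.append(means + stds)
    return rows

# 120 synthetic frames with 2 features each: two complete 1-second windows,
# and the last 20 frames are dropped.
frames = [[float(i), 1.0] for i in range(120)]
rows = summarize_windows(frames)
print(len(rows), rows[0])
```

Each output row has twice as many values as there are per-frame features, which is why the text feature files grow quickly for a collection of about 200 songs.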

10 Final Feature Representation
EleanorRigby.wav sad f1=0.2, f2=…, f3=…, …
EleanorRigby.wav sad f1=0.24, f2=…, f3=…, …
EleanorRigby.wav sad f1=0.79, f2=…, f3=…, …
girlFromIpanema.wav happy f1=0.21, f2=…, f3=…, …
girlFromIpanema.wav happy f1=0.64, f2=…, f3=…, …
girlFromIpanema.wav happy f1=0.99, f2=…, f3=…, …
girlFromIpanema.wav happy f1=0.49, f2=…, f3=…, …
girlFromIpanema.wav happy f1=0.93, f2=…, f3=…, …
NeMeQuittePas.wav verySad f1=0.82, f2=…, f3=…, …
NeMeQuittePas.wav verySad f1=0.14, f2=…, f3=…, …
NeMeQuittePas.wav verySad f1=0.999, f2=…, f3=…, …
5 seconds

11 Still on the Final Representation
The entire collection had to be converted to WAV format, with the following specifications: Hz PCM sampling, 16-bit, mono.
Final feature files were huge, reaching 81 MB of plain text (52,000 lines)

12 Experiments
2 types:
Binary classification: Happy versus Sad
Multi-class problem: 5-label classification
5-fold (or 2-fold) cross-validation
Majority vote to decide the final label
Minorthird classification package - CMU
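Because each song contributes one example per window, the per-window predictions have to be combined into one label per song. The sketch below shows one way a majority vote could do that; the function name and the example predictions are hypothetical, and the slides do not say how ties were broken (here the label seen first wins).

```python
from collections import Counter, defaultdict

def majority_vote_per_song(window_predictions):
    """Combine per-window predictions into one label per song.

    window_predictions: iterable of (song_id, predicted_label) pairs.
    Returns a dict mapping song_id to its most frequent predicted label.
    """
    votes = defaultdict(Counter)
    for song_id, label in window_predictions:
        votes[song_id][label] += 1
    # most_common(1) picks the highest count; equal counts keep first-seen order.
    return {song: counts.most_common(1)[0][0] for song, counts in votes.items()}

# Hypothetical classifier outputs for individual windows (illustration only).
predictions = [
    ("EleanorRigby.wav", "sad"),
    ("EleanorRigby.wav", "happy"),
    ("EleanorRigby.wav", "sad"),
    ("girlFromIpanema.wav", "happy"),
    ("girlFromIpanema.wav", "happy"),
]
final_labels = majority_vote_per_song(predictions)
print(final_labels)
```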

13 Results: Happy versus Sad
The StackLearner makes the final decision in two steps. In the second step, the examples are augmented with the decisions of the first-step classifier.
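The augmentation step can be pictured as appending the first classifier's decision to each feature vector before the second classifier sees it. The sketch below is not Minorthird code: the threshold rule stands in for a trained first-step classifier, and all names are hypothetical.

```python
def first_stage_predict(features):
    """Hypothetical stand-in for a trained first-step classifier."""
    # Assumption: a fixed threshold on the first feature, for illustration only.
    return 1.0 if features[0] >= 0.5 else 0.0

def augment_with_decisions(examples, predict):
    """Return copies of the feature vectors with the first-step decision appended."""
    return [list(features) + [predict(features)] for features in examples]

# Two hypothetical window-level feature vectors.
windows = [[0.21, 0.64], [0.82, 0.14]]
augmented = augment_with_decisions(windows, first_stage_predict)
print(augmented)
```

A second-step classifier would then be trained on the augmented vectors, so it can learn when to trust or override the first decision.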

14 Results: Happy versus Sad
What’s the most informative feature set? (Decision Tree Classifier, 5-fold cross-validation)

15 Results: 5-label classification

16 Results: 5-label classification
What’s the most informative feature set? (Maximum Entropy Classifier, 2-fold cross-validation)

17 Confusion Matrix

18 Lessons Learned
There are many software packages for voice processing, but only a few for music processing.
Using Marsyas was more complicated than expected (poor documentation, limited number of input formats, etc.).

19 Conclusion
New taxonomy for music classification
Labeled more than 200 songs
Reasonable/good inter-annotator agreement
Features extracted from every second of each song
Classification results (accuracy):
Over 86% in the Happy versus Sad experiments
Over 36% in the 5-label classification experiments

20 Future Work
Improve feature sets: melody, rhythm, chord, key
Song segmentation
“No data is like more data”
More careful choice of classifier
Better error measure

21 Questions?
