Popular Music Vocal Analysis

游陳叡, Department of Computer Science and Information Engineering, National Chung Cheng University, 2018

Steps for computing MFCCs:
1. Take the Fourier transform of (a windowed excerpt of) a signal.
2. Map the powers of the spectrum obtained above onto the mel scale, using triangular overlapping windows.
3. Take the logs of the powers at each of the mel frequencies.
4. Take the discrete cosine transform of the list of mel log powers, as if it were a signal.
5. The MFCCs are the amplitudes of the resulting spectrum.

Fast Fourier transform (FFT): A fast Fourier transform (FFT) is an efficient algorithm for computing the discrete Fourier transform of a signal sampled over a period of time (or space), which divides the signal into its frequency components. These components are single sinusoidal oscillations at distinct frequencies, each with its own amplitude and phase. We used these frequency components to determine whether the music contains vocals.

Results
Using MFCCs as features: We first trained a Decision Tree and a Support Vector Machine (SVM) for vocal detection, using Mel-Frequency Cepstral Coefficients (MFCCs) to capture the characteristics of the sound. For the SVM, we used two different kernels: an RBF (Gaussian) kernel and a linear kernel. When testing on the same song, the decision tree reached about 90% accuracy (Vocal 100%, non-vocal 85%), the SVM with a linear kernel about 100%, and the SVM with an RBF kernel only about 20%.
When testing on the same singer, the decision tree reached around 80% accuracy (Vocal 80%, non-vocal 70%), the SVM with a linear kernel about 60%, and the SVM with an RBF kernel only about 10% (Table 1).

Using FFT as features: When tested on the same data it was trained on, the decision tree reached 100% accuracy, as did the SVM with a linear kernel, but the SVM with an RBF kernel reached 0%. When testing on the same song, the decision tree reached about 85% accuracy and the SVM about 90%. When testing on the same singer, the decision tree reached around 70% and the SVM with a linear kernel about 50%.

Introduction
Music has become an important part of people's daily life. Through music analysis, we aim to separate the segments that contain human voices from those that contain only instrumental sounds. We divided the music into several parts and labeled each part as vocal or music (0 or 1) to serve as machine-learning data. We used Mel-frequency cepstral coefficients (MFCCs) and the Fourier transform (FFT) as features, and Support Vector Machines (SVM) and Decision Trees as the models. After training on the labeled data, we tested the models on unseen music files and measured their accuracy.

Conclusions
The experimental results show that MFCCs perform better than FFT, and we think MFCCs identify vocals more accurately. The FFT gives only the frequency response. If an instrument plays at frequencies similar to those of the vocals, the FFT values at those frequencies will be high, which can lead the SVM or decision tree to assign the wrong label. The SVM with an RBF kernel turned out not to be a good model for vocal analysis. The RBF kernel separates data in a high-dimensional (N-dimensional) space, whereas vocal detection here involves only two classes, vocal and music. We think this is the main reason the RBF kernel performed worst.
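The five MFCC steps listed near the top of the poster, together with the plain FFT feature they were compared against, can be sketched for a single audio frame as shown below. This is a minimal NumPy sketch under stated assumptions, not the authors' code. The following are all hypothetical choices made for illustration:

* the Hamming window,
* the common 2595·log10(1 + f/700) mel formula,
* 26 mel filters and 13 kept coefficients,
* the 16 kHz / 512-sample test frame,
* the helper names `mel_filterbank`, `dct_ii`, `mfcc_frame`, and `fft_frame`.

```python
import numpy as np


def hz_to_mel(f):
    # Common (HTK-style) mel formula; an assumption, the poster does not give one
    return 2595.0 * np.log10(1.0 + f / 700.0)


def mel_to_hz(m):
    return 700.0 * (10.0 ** (m / 2595.0) - 1.0)


def mel_filterbank(n_filters, n_fft, sample_rate):
    """Triangular, overlapping filters whose edges are equally spaced on the mel scale."""
    bin_freqs = np.fft.rfftfreq(n_fft, d=1.0 / sample_rate)  # frequency of each FFT bin
    mel_edges = np.linspace(hz_to_mel(0.0), hz_to_mel(sample_rate / 2.0), n_filters + 2)
    hz_edges = mel_to_hz(mel_edges)
    fbank = np.zeros((n_filters, len(bin_freqs)))
    for i in range(n_filters):
        left, centre, right = hz_edges[i], hz_edges[i + 1], hz_edges[i + 2]
        rising = (bin_freqs - left) / (centre - left)
        falling = (right - bin_freqs) / (right - centre)
        fbank[i] = np.maximum(0.0, np.minimum(rising, falling))  # triangle, zero outside
    return fbank


def dct_ii(x):
    """Orthonormal DCT-II of a 1-D array."""
    n = len(x)
    k = np.arange(n)[:, None]
    m = np.arange(n)[None, :]
    basis = np.cos(np.pi / n * (m + 0.5) * k)
    scale = np.full(n, np.sqrt(2.0 / n))
    scale[0] = np.sqrt(1.0 / n)
    return scale * (basis @ x)


def mfcc_frame(frame, sample_rate, n_filters=26, n_coeffs=13):
    n = len(frame)
    windowed = frame * np.hamming(n)                      # step 1: window the excerpt...
    power = np.abs(np.fft.rfft(windowed)) ** 2 / n        # ...and take its power spectrum
    mel_power = mel_filterbank(n_filters, n, sample_rate) @ power  # step 2: mel filters
    log_mel = np.log(mel_power + 1e-10)                   # step 3: log of each band power
    cepstrum = dct_ii(log_mel)                            # step 4: DCT of the log powers
    return cepstrum[:n_coeffs]                            # step 5: keep the first amplitudes


def fft_frame(frame):
    """Magnitude spectrum of a Hamming-windowed frame (assumed form of the 'FFT' feature)."""
    return np.abs(np.fft.rfft(frame * np.hamming(len(frame))))


if __name__ == "__main__":
    sr = 16000                               # assumed sampling rate
    t = np.arange(512) / sr                  # one 32 ms frame
    frame = np.sin(2 * np.pi * 440.0 * t)    # synthetic test tone, not real music
    print(mfcc_frame(frame, sr).shape, fft_frame(frame).shape)
```

With these assumed settings, `mfcc_frame` returns 13 numbers per frame while `fft_frame` returns 257, one per frequency bin. The mel filterbank and the DCT are what compress the spectrum into the smaller MFCC vector.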
Materials and methods
Support vector machines (SVM): Support vector machines are supervised learning models with associated learning algorithms that analyze data for classification and regression analysis. We divided the data into two categories, vocal and music. Given labeled training examples, an SVM training algorithm builds a model that assigns new examples to one category or the other. With a linear kernel this makes it a non-probabilistic binary linear classifier, while a kernel such as RBF allows a non-linear boundary. Training finds the decision boundary that maximizes the margin between the two classes, so that they are separated as cleanly as possible.

Decision Tree: A decision tree is a decision support tool that uses a tree-like model of decisions and their possible consequences, including chance event outcomes, resource costs, and utility. It is one way to display an algorithm that contains only conditional control statements. A decision tree is a flowchart-like structure in which each internal node represents a test on an attribute, each branch represents the outcome of the test, and each leaf node represents a class label. The paths from root to leaf represent classification rules.

Mel-frequency cepstrum (MFC): In sound processing, the mel-frequency cepstrum (MFC) is a representation of the short-term power spectrum of a sound, based on a linear cosine transform of a log power spectrum on a nonlinear mel scale of frequency. Mel-frequency cepstral coefficients (MFCCs) are the coefficients that collectively make up an MFC. They are derived from a type of cepstral representation of the audio clip. The difference between the cepstrum and the mel-frequency cepstrum is that in the MFC the frequency bands are equally spaced on the mel scale, which approximates the human auditory system's response more closely than the linearly spaced frequency bands used in the normal cepstrum.

Table 1: Classification accuracy by feature and test condition. Where two values are given, they are for the Vocal and Music segments respectively; otherwise a single value was reported.

MFCCs, same data:
  SVM with linear kernel: 100%
  Decision Tree: 81.333%
  SVM with RBF kernel: 19.333%

MFCCs, same singer (Vocal / Music):
  SVM with linear kernel: 59.999% / 62.333%
  Decision Tree: 79.5% / 71.333%
  SVM with RBF kernel: 19.333%

FFT, same data:
  SVM with linear kernel: 100%
  Decision Tree:
  SVM with RBF kernel: 0%

FFT, same singer (Vocal / Music):
  SVM with linear kernel: 50.5% / 49.333%
  Decision Tree: 69.5% / 66.333%
  SVM with RBF kernel: 0%

The RBF kernel gives the worst results and the decision tree the best. The data analyzed with MFCC features also give better results than FFT.
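The SVM training described under Materials and methods, and the per-class Vocal / Music accuracy reported in Table 1, can be sketched as shown below. This is a swapped-in technique, not the authors' implementation. It uses a minimal linear soft-margin SVM trained by stochastic subgradient descent on the L2-regularized hinge loss (the Pegasos algorithm).

The following are all assumptions:

* the vocal = 0 / music = 1 label mapping,
* the regularization strength `lam`,
* the epoch count,
* the helper names `train_linear_svm`, `predict`, and `per_class_accuracy`,
* the synthetic demo data.

In practice a library SVM, which solves the same margin-maximization problem and also supports the RBF kernel, would normally be used.

```python
import numpy as np

VOCAL, MUSIC = 0, 1  # hypothetical mapping of the poster's 0/1 labels


def train_linear_svm(X, labels, lam=0.01, epochs=20, seed=0):
    """Pegasos-style linear SVM: SGD on the L2-regularized hinge loss.

    A constant 1 is appended to every feature vector so the last weight acts
    as the bias (it is regularized along with the other weights).
    """
    y = np.where(labels == MUSIC, 1.0, -1.0)          # SVM targets must be +1 / -1
    Xb = np.hstack([X, np.ones((X.shape[0], 1))])
    w = np.zeros(Xb.shape[1])
    rng = np.random.default_rng(seed)
    t = 0
    for _ in range(epochs):
        for i in rng.permutation(len(y)):
            t += 1
            eta = 1.0 / (lam * t)                     # decreasing step size
            violates_margin = y[i] * (Xb[i] @ w) < 1.0  # checked with the current w
            w *= 1.0 - eta * lam                      # shrink: gradient of the L2 term
            if violates_margin:
                w += eta * y[i] * Xb[i]               # hinge-loss subgradient step
    return w


def predict(w, X):
    Xb = np.hstack([X, np.ones((X.shape[0], 1))])
    return np.where(Xb @ w >= 0.0, MUSIC, VOCAL)


def per_class_accuracy(true_labels, predicted):
    """Fraction of each class labeled correctly, like Table 1's Vocal / Music columns."""
    return {
        "Vocal": float(np.mean(predicted[true_labels == VOCAL] == VOCAL)),
        "Music": float(np.mean(predicted[true_labels == MUSIC] == MUSIC)),
    }


if __name__ == "__main__":
    # Synthetic stand-in for 13-dimensional MFCC frames; NOT the poster's data.
    rng = np.random.default_rng(1)
    X = np.vstack([rng.normal(-2.0, 1.0, (100, 13)), rng.normal(2.0, 1.0, (100, 13))])
    labels = np.array([VOCAL] * 100 + [MUSIC] * 100)
    w = train_linear_svm(X, labels)
    print(per_class_accuracy(labels, predict(w, X)))
```

Reporting accuracy per class rather than overall makes an imbalance visible. An example is a model that labels nearly everything as one class; such a model can still score high overall if that class dominates the test set.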