
1 Improving Musical Genre Classification with RBF Networks Douglas Turnbull Department of Computer Science and Engineering University of California, San Diego June 4, 2003

2 motivation:
goal: Improve automatic classification of music by genre.
previous work: A method proposed by Tzanetakis and Cook extracts high-level features from a large database of songs and then uses Gaussian Mixture Model (GMM) and K-nearest neighbor (KNN) classifiers to decide the genre of a novel song.
idea: Keep the existing audio feature extraction technology but improve the classification accuracy using Radial Basis Function (RBF) networks.

3 motivation:
secondary goal: Find techniques for improving RBF network performance.
previous work: The RBF network is a commonly used classifier in machine learning. We would like to explore ways to improve its ability to classify novel data.
ideas: Merge supervised and unsupervised initialization methods for the basis function parameters. Use feature subset selection methods to eliminate unnecessary features.

4 audio feature extraction:
[Pipeline diagram: music → digital signal → feature extraction (MARSYAS digital signal processing) → feature vector, e.g. …1001011001…]

5 MARSYAS:
MARSYAS extracts 30 features from each 30-second audio track, giving a feature vector x = (x_1, …, x_D) with dimension D = 30 for this application:
Timbral Texture (19): music-speech discrimination
Rhythmic Content (6): beat strength, amplitude, tempo analysis
Pitch Content (5): frequency of dominant chord, pitch intervals

6 radial basis functions:
A radial basis function measures how far an input vector x is from a prototype vector μ. We use Gaussians for our M basis functions:

Φ_j(x) = exp( −‖x − μ_j‖² / (2σ_j²) )

We will see three methods for initializing the parameters (μ, σ).
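The Gaussian basis function layer described above can be sketched in a few lines of NumPy (the function name and array layout are my own, not from the slides):

```python
import numpy as np

def rbf_activations(X, mu, sigma):
    """Gaussian radial basis functions.

    X:     (N, D) input vectors
    mu:    (M, D) prototype vectors
    sigma: (M,)   widths
    Returns an (N, M) matrix with
    Phi[n, j] = exp(-||x_n - mu_j||^2 / (2 * sigma_j^2)).
    """
    # Squared Euclidean distance between every input and every prototype.
    d2 = ((X[:, None, :] - mu[None, :, :]) ** 2).sum(axis=2)  # (N, M)
    return np.exp(-d2 / (2.0 * sigma[None, :] ** 2))
```

An input sitting exactly on a prototype gives an activation of 1, and the activation decays with distance at a rate set by σ_j.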

7 linear discriminant:
The output vector is a weighted sum of the basis functions:

y_k(x) = Σ_{j=1…M} w_kj Φ_j(x)

We find the optimal set of weights W by minimizing the sum-of-squares error function over a training set:

E = ½ Σ_n Σ_k ( y_k(xⁿ) − t_kⁿ )²

where the target value t_kⁿ is 1 if the n-th data point belongs to the k-th class, and 0 otherwise.
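Because the error is quadratic in W, the optimal weights have a closed-form least-squares solution. A minimal sketch (the helper name is mine; the slides do not specify the solver):

```python
import numpy as np

def fit_output_weights(Phi, T):
    """Solve for the output weights that minimize the sum-of-squares error.

    Phi: (N, M) design matrix of basis-function activations
    T:   (N, C) one-hot target matrix (t[n, k] = 1 iff point n is in class k)
    Returns W with shape (M, C); network outputs are then Y = Phi @ W.
    """
    # lstsq computes the pseudo-inverse solution of the normal equations,
    # which is numerically safer than inverting Phi.T @ Phi directly.
    W, *_ = np.linalg.lstsq(Phi, T, rcond=None)
    return W
```

A novel point is then assigned to the class whose output y_k is largest.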

8 a radial basis function network:
[Network diagram: inputs x = (x_1, …, x_D) feed basis functions Φ_1, …, Φ_M, which are combined through weights W (entries w_kj) into outputs y = (y_1, …, y_C), compared against targets t = (t_1, …, t_C).]

9 constructing RBF networks:
1. number of basis functions
   Too few make it hard to separate the data; too many can cause over-fitting. The right number depends on the initialization method.
2. initializing the basis function parameters (μ, σ)
   unsupervised: K-means clustering (KM)
   supervised: Maximum Likelihood for Gaussian (MLG), In-class K-means clustering (ICKM)
   or use the above methods together
3. improving the basis function parameters (μ, σ)
   Use gradient descent.
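As an illustration of the supervised initializers, here is a sketch of MLG under the assumption (mine, not spelled out in the slides) that it fits one spherical Gaussian per class, with μ the class mean and σ an averaged within-class standard deviation:

```python
import numpy as np

def init_mlg(X, y, n_classes):
    """Maximum Likelihood for Gaussian (MLG) initialization.

    One basis function per class: mu_c is the mean of the class's points,
    sigma_c is the per-dimension std averaged into a single spherical width.
    Returns (mus, sigmas) with shapes (n_classes, D) and (n_classes,).
    """
    mus, sigmas = [], []
    for c in range(n_classes):
        Xc = X[y == c]
        mus.append(Xc.mean(axis=0))
        # Small floor keeps sigma positive even for degenerate classes.
        sigmas.append(Xc.std(axis=0).mean() + 1e-8)
    return np.stack(mus), np.array(sigmas)
```

KM would instead run K-means over all points ignoring labels, and ICKM would run K-means separately within each class; combining the methods simply concatenates the resulting (μ, σ) sets.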

10 gradient descent on μ, σ:
We differentiate the error function E with respect to σ_j and μ_ji, then update σ_j and μ_ji by moving down the error surface:

σ_j ← σ_j − η_1 ∂E/∂σ_j
μ_ji ← μ_ji − η_2 ∂E/∂μ_ji

The learning-rate scale factors η_1 and η_2 decrease each epoch.
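For Gaussian basis functions and the sum-of-squares error, the required derivatives have a simple closed form. A sketch of one descent step with the output weights W held fixed (the function name and the pairing of η_1 with σ and η_2 with μ are my assumptions):

```python
import numpy as np

def rbf_grad_step(X, T, mu, sigma, W, eta1, eta2):
    """One gradient-descent step on (mu, sigma) for
    E = 0.5 * sum_n sum_k (y_k(x_n) - t_nk)^2, with W fixed.

    X: (N, D), T: (N, C), mu: (M, D), sigma: (M,), W: (M, C).
    """
    d = X[:, None, :] - mu[None, :, :]           # (N, M, D)
    d2 = (d ** 2).sum(axis=2)                    # (N, M)
    Phi = np.exp(-d2 / (2.0 * sigma ** 2))       # (N, M)
    Y = Phi @ W                                  # (N, C)
    delta = (Y - T) @ W.T                        # (N, M) = dE/dPhi
    # dPhi/dmu_j   = Phi * (x - mu_j) / sigma_j^2
    # dPhi/dsigma_j = Phi * ||x - mu_j||^2 / sigma_j^3
    grad_mu = ((delta * Phi)[:, :, None] * d).sum(axis=0) / sigma[:, None] ** 2
    grad_sigma = (delta * Phi * d2).sum(axis=0) / sigma ** 3
    return mu - eta2 * grad_mu, sigma - eta1 * grad_sigma
```

With a sufficiently small step size, each update decreases E; decaying η_1 and η_2 each epoch lets the parameters settle.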

11 constructing RBF networks:
1. number of basis functions
2. initializing the basis function parameters (μ, σ)
3. improving the basis function parameters (μ, σ)
4. feature subset selection
   There exist noisy and/or harmful features that hurt network performance. By isolating and removing these features, we can find better networks. We may also wish to sacrifice some accuracy to create a more robust network requiring less computation during training.
   Three heuristics for ranking features:
   Wrapper methods: Growing Set (GS) ranking, Two-Tuple (TT) ranking
   Filter method: Between-Class Variance (BCV) ranking

12 growing set (GS) ranking:
A greedy heuristic that adds the next best feature to a growing set of features. This method requires training about D²/2 RBF networks, where the first D networks use 1 feature, the next D−1 networks use 2 features, and so on.
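The greedy loop itself is independent of the classifier. A sketch, where `score` stands in for the expensive step the slides describe (training an RBF network on the candidate feature set and measuring its accuracy):

```python
def growing_set_ranking(n_features, score):
    """Greedy forward selection.

    At each step, add the feature that maximizes score(selected + [f]),
    where score is a caller-supplied function (e.g. cross-validated
    accuracy of an RBF network trained on those features).
    Trains D + (D-1) + ... + 1, roughly D^2/2, models in total.
    """
    remaining = set(range(n_features))
    ranking = []
    while remaining:
        best = max(remaining, key=lambda f: score(ranking + [f]))
        ranking.append(best)
        remaining.remove(best)
    return ranking
```

The ranking's prefixes are the candidate feature subsets; one then keeps the prefix whose network scored best.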

13 two-tuple (TT) ranking:
This greedy heuristic finds the classification accuracy of a network for every combination of two features. We select the pair of features that produces the best classification result. The next feature added is the one with the largest minimum accuracy when used with the already-selected features, and so on. This method also requires training about D²/2 RBF networks, but all of them are trained using only 2 features.
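One possible reading of this procedure in code (the slides leave the "largest minimum accuracy" rule slightly ambiguous; here I take the minimum over all already-selected features, and `pair_score` stands in for training a 2-feature RBF network):

```python
from itertools import combinations

def two_tuple_ranking(n_features, pair_score):
    """Two-tuple ranking.

    Evaluate pair_score(i, j) once for every pair of features (every
    network uses only 2 features), seed the ranking with the best pair,
    then repeatedly add the candidate whose worst pair-score against the
    already-selected features is largest.
    """
    score = {}
    for i, j in combinations(range(n_features), 2):
        score[i, j] = score[j, i] = pair_score(i, j)
    best_pair = max(combinations(range(n_features), 2),
                    key=lambda p: score[p])
    ranking = list(best_pair)
    remaining = set(range(n_features)) - set(ranking)
    while remaining:
        best = max(remaining,
                   key=lambda f: min(score[f, g] for g in ranking))
        ranking.append(best)
        remaining.remove(best)
    return ranking
```

Because every trained network uses only 2 features, each of the ~D²/2 trainings is much cheaper than in GS ranking.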

14 between-class variance (BCV) ranking:
This method ranks features by their between-class variance. The assumption is that if, for a particular feature, the class averages are far from the average over all of the data, that feature will be useful for separating novel data.
[Figure: feature distributions contrasting a bad feature f_bad with a good feature f_good.]
Unlike the previous two methods, it does not require training RBF networks, so it can be computed in a matter of seconds rather than minutes.
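A sketch of this filter score; the normalization by the overall variance is my assumption (the slides do not say how features on different scales are made comparable):

```python
import numpy as np

def bcv_ranking(X, y):
    """Rank features by between-class variance, best first.

    For each feature, compute the variance of the per-class means around
    the overall mean, normalized by the feature's overall variance.
    X: (N, D) data, y: (N,) integer class labels.
    """
    classes = np.unique(y)
    overall = X.mean(axis=0)                                  # (D,)
    class_means = np.stack([X[y == c].mean(axis=0)
                            for c in classes])                # (C, D)
    bcv = ((class_means - overall) ** 2).mean(axis=0) / (X.var(axis=0) + 1e-12)
    return np.argsort(bcv)[::-1]
```

A feature whose class means all coincide with the global mean scores near zero and lands at the bottom of the ranking.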

15 music classification with RBF networks:
experimental setup:
1000 30-second songs – 100 songs per genre
10 genres – classical, country, disco, hip hop, jazz, rock, blues, reggae, pop, metal
30 features extracted per song – timbral texture, rhythmic content, pitch content
10-fold cross validation
results:
a. comparison of initialization methods (KM, MLG, ICKM) with and without gradient descent
b. comparison of feature ranking methods (GS, TT, BCV)
c. table of best classification results

16 basis function initialization methods: MLG does as well as the other methods with fewer basis functions

17 feature ranking methods: Growing Set (GS) ranking outperforms the other methods

18 results table:
observations:
1. Using multiple initialization methods together produces better classification than using only one.
2. Gradient descent boosts performance.
3. Subsets of features produce better results than using all of the features.

19 comparison with previous results:
RBF networks: 70.9%* (std 0.063)
GMM with 3 Gaussians per class (Tzanetakis & Cook 2001): 61% (std 0.04)
Human classification in a similar experiment (Tzanetakis & Cook 2001): 70%
Support Vector Machine (SVM) (Li & Tzanetakis 2003): 69.1% (std 0.053)
Linear Discriminant Analysis (LDA) (Li & Tzanetakis 2003): 71.1% (std 0.073)
*(Obtained by constructing a network with MLG initialization using 26 features (Experiment J) and gradient descent for 100 epochs)

20 discussion: 1. create more flexible musical labels
It is not our opinion that music classification accuracy is limited to ~70%; rather, the data set used is the limiting factor. The next steps are to find a better system for labeling music and then to create a data set that uses the new labeling system. This involves working with experts such as musicologists. However, two initial ideas are:
1. non-mutually exclusive genres
2. a rating system based on the strength of the relationship between a song and each genre
These ideas are cognitively plausible in that we naturally classify music into a number of genres, streams, movements, and generations that are neither mutually exclusive nor always agreed upon. Both ideas can easily be handled by an RBF network by altering the target vectors.

21 discussion: 2. larger feature sets and feature subset selection
Borrowing from computer vision, one technique that has been successful is to automatically extract tens of thousands of features and then use feature subset selection to find a small set (~30) of good features.
Computer vision features:
select sub-images of different sizes and locations
alter resolution and scale factors
apply filters (e.g. Gabor filters)
Computer audition analogs:
select sound samples of different lengths and starting locations
alter pitches and tempos within the frequency domain
apply filters (e.g. comb filters)
Future work will involve extracting new features and improving existing feature subset selection algorithms.

22 The End

