Improving Musical Genre Classification with RBF Networks
Douglas Turnbull, Department of Computer Science and Engineering, University of California, San Diego
June 4, 2003

motivation:
goal: The goal of this project is to improve automatic classification of music by genre.
previous work: A method proposed by Tzanetakis and Cook extracts high-level features from a large database of songs and then uses Gaussian Mixture Model (GMM) and K-nearest-neighbor (KNN) classifiers to decide the genre of a novel song.
idea: Use the existing audio feature extraction technology but improve the classification accuracy using Radial Basis Function (RBF) networks.

motivation:
secondary goal: Find techniques for improving RBF network performance.
previous work: The RBF network is a commonly used classifier in machine learning. We would like to explore ways to improve its ability to classify novel data.
ideas: Merge supervised and unsupervised initialization methods for the basis function parameters. Use feature subset selection methods to eliminate unnecessary features.

audio feature extraction: [Pipeline diagram: music → digital signal → feature extraction (MARSYAS digital signal processing) → feature vector.]

MARSYAS: Extraction of 30 features from each 30-second audio track. Timbral Texture (19): music-speech discrimination features. Rhythmic Content (6): beat strength, amplitude, tempo analysis. Pitch Content (5): frequency of dominant chord, pitch intervals. For this application, the dimension D of our feature vector is 30.
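The talk's pipeline uses MARSYAS for extraction; as a rough analog for readers who want to experiment, here is a minimal sketch of a timbral-texture vector computed with librosa (an assumption for illustration, not the tool used in this work):

```python
# Sketch: a MARSYAS-like timbral-texture vector via librosa (assumed analog).
import numpy as np
import librosa

def timbral_features(path, duration=30.0):
    """Return a rough timbral-texture vector for one 30-second track."""
    y, sr = librosa.load(path, duration=duration, mono=True)
    mfcc = librosa.feature.mfcc(y=y, sr=sr, n_mfcc=5)         # short-term timbre
    centroid = librosa.feature.spectral_centroid(y=y, sr=sr)  # brightness
    rolloff = librosa.feature.spectral_rolloff(y=y, sr=sr)    # spectral shape
    zcr = librosa.feature.zero_crossing_rate(y)               # noisiness
    frames = np.vstack([mfcc, centroid, rolloff, zcr])        # per-frame rows
    # Summarize frame-level features by their mean and variance over the track.
    return np.concatenate([frames.mean(axis=1), frames.var(axis=1)])
```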

radial basis functions: A radial basis function measures how far an input vector x is from a prototype vector μ. We use Gaussians for our M basis functions Φ_1, …, Φ_M. We will see three methods for initializing the parameters (μ, σ).
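Written out (a standard reconstruction consistent with the description above), the j-th Gaussian basis function is

Φ_j(x) = exp( -||x - μ_j||^2 / (2 σ_j^2) )

so Φ_j(x) is near 1 when x is close to the prototype μ_j and decays toward 0 as x moves away.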

linear discriminant: The output vector is a weighted sum of the basis functions:

y_k(x) = Σ_{j=1}^{M} w_kj Φ_j(x)

We find the optimal set of weights W by minimizing the sum-of-squares error function over a training set:

E = (1/2) Σ_n Σ_k ( y_k(x^n) - t_k^n )^2

where the target value t_k^n is 1 if the n-th data point belongs to the k-th class, and 0 otherwise.
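A minimal sketch of that closed-form solve, assuming Phi is the N x M matrix of basis-function activations and T the N x C matrix of one-hot targets (both names are mine):

```python
# Sketch: fit the output weights W by minimizing ||Phi @ W - T||^2.
import numpy as np

def solve_weights(Phi, T):
    # Least-squares solution of Phi @ W = T; equivalent to the
    # pseudo-inverse solution W = pinv(Phi) @ T.
    W, *_ = np.linalg.lstsq(Phi, T, rcond=None)
    return W   # shape (M, C); network outputs are y = Phi @ W
```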

a radial basis function network: [Network diagram: inputs x_1 … x_D feed M basis functions Φ_1 … Φ_M, whose activations are combined through weights W into outputs y_1 … y_C, which are compared against targets t_1 … t_C.]

constructing RBF networks:
1. number of basis functions. Too few make it hard to separate the data; too many can cause over-fitting. The right number depends on the initialization method.
2. initializing the basis function parameters (μ, σ), using an unsupervised method (1. K-means clustering, KM), supervised methods (2. Maximum Likelihood for Gaussians, MLG; 3. In-class K-means clustering, ICKM), or the methods used together (sketched below).
3. improving the basis function parameters (μ, σ) with gradient descent.
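A minimal sketch of the three initialization schemes, assuming scikit-learn's KMeans; X is the N x D feature matrix, y the genre labels, and all function names are mine:

```python
# Sketch of the three (mu, sigma) initialization schemes.
import numpy as np
from sklearn.cluster import KMeans

def init_km(X, M):
    """Unsupervised: k-means over all training points."""
    km = KMeans(n_clusters=M).fit(X)
    mus = km.cluster_centers_
    sigmas = np.array([X[km.labels_ == j].std() for j in range(M)])
    return mus, sigmas

def init_mlg(X, y):
    """Supervised: one Gaussian per class, fit by maximum likelihood."""
    classes = np.unique(y)
    mus = np.array([X[y == c].mean(axis=0) for c in classes])
    sigmas = np.array([X[y == c].std() for c in classes])
    return mus, sigmas

def init_ickm(X, y, k_per_class):
    """Supervised: run k-means separately inside each class."""
    mus, sigmas = [], []
    for c in np.unique(y):
        Xc = X[y == c]
        km = KMeans(n_clusters=k_per_class).fit(Xc)
        mus.append(km.cluster_centers_)
        sigmas.extend(Xc[km.labels_ == j].std() for j in range(k_per_class))
    return np.vstack(mus), np.array(sigmas)
```

Combining methods amounts to concatenating the (mu, sigma) sets produced by two or more of these initializers.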

gradient descent on μ, σ: We differentiate our error function E with respect to σ_j and μ_ji, and then update σ_j and μ_ji by moving down the error surface:

μ_ji ← μ_ji - η_1 ∂E/∂μ_ji,  σ_j ← σ_j - η_2 ∂E/∂σ_j

The learning-rate scale factors, η_1 and η_2, decrease each epoch.
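A sketch of one descent epoch under the assumptions above (Gaussian bases, sum-of-squares error); the gradient expressions follow from those definitions by the chain rule, and all variable names are mine:

```python
# Sketch: one gradient-descent epoch on the basis-function parameters.
# X is N x D, T is N x C one-hot targets, W is M x C.
import numpy as np

def rbf_activations(X, mus, sigmas):
    # Phi[n, j] = exp(-||x_n - mu_j||^2 / (2 sigma_j^2))
    d2 = ((X[:, None, :] - mus[None, :, :]) ** 2).sum(axis=2)   # N x M
    return np.exp(-d2 / (2 * sigmas ** 2))

def gradient_step(X, T, mus, sigmas, W, eta1, eta2):
    Phi = rbf_activations(X, mus, sigmas)           # N x M
    err = Phi @ W - T                               # N x C, (y - t)
    G = (err @ W.T) * Phi                           # N x M, (dE/dPhi) * Phi
    diff = X[:, None, :] - mus[None, :, :]          # N x M x D, (x_n - mu_j)
    grad_mu = (G[:, :, None] * diff).sum(axis=0) / sigmas[:, None] ** 2
    grad_sigma = (G * (diff ** 2).sum(axis=2)).sum(axis=0) / sigmas ** 3
    mus -= eta1 * grad_mu                           # move down the error surface
    sigmas -= eta2 * grad_sigma
    return mus, sigmas
```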

constructing RBF networks:
1. number of basis functions
2. initializing the basis function parameters (μ, σ)
3. improving the basis function parameters (μ, σ)
4. feature subset selection. There exist noisy and/or harmful features that hurt network performance; by isolating and removing these features, we can find better networks. We may also wish to sacrifice some accuracy to create a more robust network that requires less computation during training. Three heuristics for ranking features: wrapper methods (Growing Set (GS) ranking, Two-Tuple (TT) ranking) and a filter method (Between-Class Variance (BCV) ranking).

growing set (GS) ranking: A greedy heuristic that adds the next best feature to a growing set of features. This method requires training about |D|^2/2 RBF networks, where the first D networks use 1 feature, the next D-1 networks use 2 features, and so on.
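A minimal sketch of the heuristic; accuracy(features) stands in for training and scoring an RBF network on that feature subset, not a real API:

```python
# Sketch: growing-set ranking by greedy forward selection.
def growing_set_ranking(all_features, accuracy):
    selected, remaining = [], set(all_features)
    while remaining:
        # Add whichever remaining feature helps the current set most.
        best = max(remaining, key=lambda f: accuracy(selected + [f]))
        selected.append(best)
        remaining.remove(best)
    return selected   # features in rank order, best first
```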

two-tuple (TT) ranking: This greedy heuristic finds the classification accuracy of a network trained on every combination of two features. We first select the two features that produce the best classification result. The next feature added is the one with the largest minimum accuracy when paired with the features already selected, and so on. This method also requires training about |D|^2/2 RBF networks, but all of them are trained using only 2 features.
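One reading of the heuristic as a sketch; pair_acc is assumed to be a symmetric D x D matrix of precomputed two-feature accuracies (the ~|D|^2/2 trainings):

```python
# Sketch: two-tuple ranking from a matrix of pairwise accuracies.
import numpy as np

def two_tuple_ranking(pair_acc):
    D = pair_acc.shape[0]
    # Start with the best-performing pair (ignore the diagonal).
    masked = np.where(np.eye(D, dtype=bool), -np.inf, pair_acc)
    i, j = np.unravel_index(np.argmax(masked), masked.shape)
    selected = [i, j]
    remaining = [f for f in range(D) if f not in selected]
    while remaining:
        # Next: the feature with the largest minimum pairwise accuracy
        # against the features already selected.
        best = max(remaining, key=lambda f: pair_acc[f, selected].min())
        selected.append(best)
        remaining.remove(best)
    return selected
```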

between-class variance (BCV) ranking: Unlike the previous two methods, this one does not require training RBF networks, so it can be computed in a matter of seconds rather than minutes. It ranks features by their between-class variance: the assumption is that if the class averages for a particular feature are far from the average across all of the data, that feature will be useful for separating novel data. [Figure: example distributions of a bad feature f_bad and a good feature f_good.]
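A minimal sketch of the score, assuming X is the N x D feature matrix and y the class labels; the talk does not specify the exact normalization, so this uses a standard between-to-total variance ratio:

```python
# Sketch: rank features by between-class variance.
import numpy as np

def bcv_ranking(X, y):
    overall_mean = X.mean(axis=0)
    between = np.zeros(X.shape[1])
    for c in np.unique(y):
        Xc = X[y == c]
        # Weighted squared distance of each class mean from the overall mean.
        between += len(Xc) * (Xc.mean(axis=0) - overall_mean) ** 2
    scores = between / (len(X) * X.var(axis=0))   # normalize by total variance
    return np.argsort(scores)[::-1]               # feature indices, best first
```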

music classification with RBF networks:
experimental setup: 1000 thirty-second songs (100 songs per genre); 10 genres: classical, country, disco, hip hop, jazz, rock, blues, reggae, pop, metal; 30 features extracted per song (timbral texture, rhythmic content, pitch content); 10-fold cross-validation.
results: a. comparison of initialization methods (KM, MLG, ICKM) with and without gradient descent; b. comparison of feature ranking methods (GS, TT, BCV); c. table of best classification results.
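A minimal sketch of that evaluation loop, assuming scikit-learn's StratifiedKFold for the splits; train_rbf and evaluate are placeholders for the training and scoring steps above, not a real API:

```python
# Sketch: 10-fold cross-validated accuracy for one network configuration.
import numpy as np
from sklearn.model_selection import StratifiedKFold

def cross_validated_accuracy(X, y, train_rbf, evaluate, n_splits=10):
    accuracies = []
    for train_idx, test_idx in StratifiedKFold(n_splits=n_splits).split(X, y):
        model = train_rbf(X[train_idx], y[train_idx])        # fit on 9 folds
        accuracies.append(evaluate(model, X[test_idx], y[test_idx]))
    return np.mean(accuracies), np.std(accuracies)           # mean, std over folds
```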

basis function initialization methods: MLG does as well as the other methods while using fewer basis functions.

feature ranking methods: Growing Set (GS) ranking outperforms the other methods

results table:
observations:
1. Using multiple initialization methods together produces better classification than using only one initialization method.
2. Gradient descent boosts performance.
3. Subsets of the features produce better results than using all of the features.

comparison with previous results:
RBF networks: 70.9%* (std 0.063)
GMM with 3 Gaussians per class (Tzanetakis & Cook 2001): 61% (std 0.04)
Human classification in a similar experiment (Tzanetakis & Cook 2001): 70%
Support Vector Machine (SVM) (Li & Tzanetakis 2003): 69.1% (std 0.053)
Linear Discriminant Analysis (LDA) (Li & Tzanetakis 2003): 71.1% (std 0.073)
*(Obtained by constructing a network with MLG initialization and 26 features (Experiment J), trained with gradient descent for 100 epochs.)

discussion: 1. creating more flexible musical labels
It is not our opinion that music classification is limited to ~70%; rather, the data set used is the limiting factor. The next steps are to find a better system for labeling music and then to create a data set that uses the new labeling system. This involves working with experts such as musicologists. However, two initial ideas are: 1. non-mutually-exclusive genres; 2. a rating system based on the strength of the relationship between a song and each genre. These ideas are cognitively plausible in that we naturally classify music into a number of genres, streams, movements, and generations that are neither mutually exclusive nor always agreed upon. Both ideas can easily be handled by an RBF network by altering the target vectors, as the sketch below illustrates.
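A toy illustration of the target-vector change; the four-genre ordering and all numbers are invented for this example:

```python
# Sketch: relaxing one-hot targets to non-exclusive or graded genre labels.
import numpy as np

# Mutually exclusive target: the track is "rock" and nothing else.
t_exclusive = np.array([0.0, 0.0, 1.0, 0.0])   # [classical, jazz, rock, pop]

# Idea 1, non-mutually-exclusive genres: the track is rock AND pop.
t_multilabel = np.array([0.0, 0.0, 1.0, 1.0])

# Idea 2, graded strength of the song/genre relationship.
t_graded = np.array([0.0, 0.1, 0.7, 0.5])

# The least-squares training of the weights W is unchanged: stack such
# rows into the target matrix T and solve as before.
```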

discussion: 2. larger feature sets and feature subset selection
Borrowing from computer vision, one technique that has been successful is to automatically extract tens of thousands of features and then use feature subset selection to find a small set (~30) of good features.
Computer vision features: select sub-images of different sizes and locations; alter resolution and scale factors; apply filters (e.g., Gabor filters).
Computer audition analogs: select sound samples of different lengths and starting locations; alter pitches and tempos within the frequency domain; apply filters (e.g., comb filters).
Future work will involve extracting new features and improving existing feature subset selection algorithms.

The End