Classification of Music According to Genres Using Neural Networks, Genetic Algorithms and Fuzzy Systems.


Progress Presentation by: Asma Amir Nazia Zaman

Detailed System Architecture

Sampling and DSP functions A .wav file is broken into twenty 0.5-second windows, and the DSP functions are called for each of the twenty windows. Each window is analyzed by seven DSP functions: Bandwidth, Power Spectral Density, Total Power (L-2 norm / L-infinity norm), Spectrogram Smoothness, High Pass Filter, Beat Detection, and Frequency Cutoff.
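The windowing step above can be sketched as follows; this is a minimal illustration (the function name and the synthetic test tone are my own, not from the slides):

```python
import numpy as np

def split_into_windows(signal, sample_rate, n_windows=20, window_sec=0.5):
    """Split a mono signal into fixed-length analysis windows."""
    win_len = int(sample_rate * window_sec)
    needed = n_windows * win_len
    if len(signal) < needed:
        raise ValueError("signal too short for requested windows")
    # Take the first n_windows * win_len samples and reshape into rows.
    return signal[:needed].reshape(n_windows, win_len)

# Example: 11 s of a 440 Hz tone at 8 kHz, split into twenty half-second windows.
sr = 8000
t = np.arange(11 * sr) / sr
x = np.sin(2 * np.pi * 440 * t)
windows = split_into_windows(x, sr)
print(windows.shape)  # (20, 4000)
```

Each row of `windows` would then be passed to the seven DSP functions.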

The DSP vector The value returned by each of these functions is averaged over all twenty windows to give a per-song mean, together with a standard deviation that tells us how these qualities change over time. That way, the DSP vector carries some measure of how each function's output varied with time. First, the neural network is trained with around 120 songs, 20 of each genre. After training, we give it songs it has never seen, and the output of the system is the genre classification the neural network determines.
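The mean-plus-standard-deviation summary described above can be sketched like this (the function name and the tiny two-window example are illustrative, not from the slides):

```python
import numpy as np

def dsp_vector(window_features):
    """window_features: (n_windows, n_features) array of per-window DSP outputs.
    Returns one song-level vector: per-feature means followed by per-feature
    standard deviations, so the vector captures both level and variation."""
    means = window_features.mean(axis=0)
    stds = window_features.std(axis=0)
    return np.concatenate([means, stds])

# Toy example: 2 windows, 2 features.
feats = np.array([[1.0, 10.0],
                  [3.0, 10.0]])
v = dsp_vector(feats)
print(v)  # [ 2. 10.  1.  0.]
```

A feature that is constant across windows (like the second column) contributes a zero standard deviation, which is itself informative for genres with steady texture.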

Bandwidth After taking the shifted FFT of windows of the music vector, we find the last frequency component above a certain cutoff threshold; this is the bandwidth of the signal. Because classical music is composed of harmonic instruments, its bandwidth will be smaller and it will have fewer frequency components. However, harder music like punk or rap has many non-sinusoidal drumbeats, which create more frequency components, so its bandwidth will be larger.
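A minimal sketch of this bandwidth measure, assuming a threshold expressed in dB relative to the spectral peak (the -40 dB default is my assumption; the slides do not give the threshold value):

```python
import numpy as np

def bandwidth(window, sample_rate, threshold_db=-40.0):
    """Highest frequency whose FFT magnitude exceeds a threshold
    relative to the spectral peak (hypothetical -40 dB default)."""
    spectrum = np.abs(np.fft.rfft(window))
    freqs = np.fft.rfftfreq(len(window), d=1.0 / sample_rate)
    mags_db = 20 * np.log10(spectrum / spectrum.max() + 1e-12)
    above = np.nonzero(mags_db > threshold_db)[0]
    return freqs[above[-1]]  # last frequency bin above the threshold

# A pure 440 Hz tone has essentially all its energy at one frequency,
# so its measured bandwidth collapses to that frequency.
sr = 8000
t = np.arange(sr) / sr
tone = np.sin(2 * np.pi * 440 * t)
print(round(bandwidth(tone, sr)))  # 440
```

A percussive, noise-like signal would push energy into many bins and return a much higher value, matching the punk/rap observation above.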

Bandwidth

Frequency Cut-off A more telling relationship emerges when the standard deviations of the frequency outputs are analyzed. It remains difficult to isolate any one genre, but the analysis does separate them into two main categories: (1) Classical, Punk, and Country; (2) Techno, Jazz, and Rap. Group one consists of the genres that retained only coefficients above the threshold, while the genres of group two consistently preserved at least 90 coefficients per sample. This wide gap paints a fairly clear picture of the differences between genres with respect to their cutoff frequencies.

Beat detection Beat detection emphasizes the sudden impulses of sound in a song. The signal is correlated with itself (autocorrelation) to find peaks, and the distance between those peaks is measured. This is done by breaking the signal into frequency bands, extracting the envelope of each band, differentiating to emphasize sudden changes in sound, and running the results through a filter to choose the highest-energy result as the tempo. The filter can only separate rap from all other genres effectively, because rap has the steadiest backbeat, consistent across the genre. Classical and jazz have too much variability, which makes sense considering that each piece is often long and divided into sections.
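A much-simplified sketch of this idea, collapsing the multi-band filter stage into a single envelope autocorrelation (the BPM search range and smoothing width are my assumptions, not values from the slides):

```python
import numpy as np

def estimate_tempo(signal, sample_rate):
    """Rough tempo estimate: rectify, smooth into an amplitude envelope,
    autocorrelate, and pick the strongest lag in a plausible beat range."""
    env = np.abs(signal)
    # Short moving average gives a smooth amplitude envelope.
    k = sample_rate // 50
    env = np.convolve(env, np.ones(k) / k, mode="same")
    env = env - env.mean()
    ac = np.correlate(env, env, mode="full")[len(env) - 1:]
    # Search lags corresponding to 60-180 BPM.
    lo = int(sample_rate * 60 / 180)
    hi = int(sample_rate * 60 / 60)
    lag = lo + np.argmax(ac[lo:hi])
    return 60.0 * sample_rate / lag

# Synthetic "drum track": one click every 0.5 s (120 BPM) at 2 kHz.
sr = 2000
x = np.zeros(4 * sr)
x[::sr // 2] = 1.0
print(round(estimate_tempo(x, sr)))  # 120
```

On a steady rap backbeat the autocorrelation peak is sharp and consistent; on a classical piece with tempo changes the peaks smear out, which is exactly the variability problem noted above.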

Beat detection

High Pass Filter When a high-pass filter is used, classical has, as expected, the smallest error of any genre tested, since it uses the lower part of the frequency spectrum. Conversely, punk and jazz have the highest error, a good indication that higher frequencies are being used. Techno, rap, and country fall somewhere between these two extremes; the filter has an especially hard time telling rap and country apart. Overall, while the filter cannot explicitly identify individual genres, it gives the user a starting point for separating two main groups.
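One plausible reading of this "error" measure is the fraction of signal energy that survives the high-pass filter; the sketch below uses an ideal FFT-domain filter and a hypothetical 1 kHz cutoff (neither the exact filter nor the cutoff is given in the slides):

```python
import numpy as np

def highpass_energy_fraction(signal, sample_rate, cutoff_hz=1000.0):
    """Fraction of total energy at or above the cutoff after an ideal
    FFT high-pass (a hypothetical stand-in for the slide's measure)."""
    power = np.abs(np.fft.rfft(signal)) ** 2
    freqs = np.fft.rfftfreq(len(signal), d=1.0 / sample_rate)
    return power[freqs >= cutoff_hz].sum() / power.sum()

sr = 8000
t = np.arange(sr) / sr
low = np.sin(2 * np.pi * 200 * t)    # "classical-like": low frequency
high = np.sin(2 * np.pi * 3000 * t)  # "punk-like": high frequency
print(round(highpass_energy_fraction(low, sr), 2))   # 0.0
print(round(highpass_energy_fraction(high, sr), 2))  # 1.0
```

The low-frequency signal loses essentially everything to the filter (small "error"), while the high-frequency signal passes almost untouched, mirroring the classical-versus-punk contrast.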

High Pass Filter

Power Spectral Density The time-domain signal is broken into windows and the squared norm of the FFT of each window is computed. The squared magnitudes of the FFT coefficients are averaged across windows and expressed in decibels. The result is a vector representing the power in the frequency domain: a measure of exactly which frequencies are present and at what magnitude.
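The averaged-periodogram computation described above can be sketched as (the 1024-sample window length is an illustrative choice, not from the slides):

```python
import numpy as np

def power_spectral_density(signal, sample_rate, window_len=1024):
    """Average the squared FFT magnitude over consecutive windows
    and express the result in decibels."""
    n = (len(signal) // window_len) * window_len
    windows = signal[:n].reshape(-1, window_len)
    power = (np.abs(np.fft.rfft(windows, axis=1)) ** 2).mean(axis=0)
    freqs = np.fft.rfftfreq(window_len, d=1.0 / sample_rate)
    return freqs, 10 * np.log10(power + 1e-12)

# A 1 kHz tone should produce a PSD peak at 1 kHz.
sr = 8000
t = np.arange(4 * sr) / sr
x = np.sin(2 * np.pi * 1000 * t)
freqs, psd_db = power_spectral_density(x, sr)
peak = freqs[np.argmax(psd_db)]
print(peak)  # 1000.0
```

Averaging over windows trades frequency resolution for a more stable estimate, which suits a per-song summary better than a single long FFT.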

Power Spectral Density

Total Power The power in a signal is the squared norm of its frequency components. It measures how many harmonics are present in the signal and how strong each one is. In our case, the music samples will have a wide range of total power: classical piano has low power with few harmonics, whereas punk has high power.
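By Parseval's theorem, the squared norm of the frequency components equals the signal's energy in the time domain, so total power can be sketched directly from the samples (the "quiet" and "loud" test signals are illustrative):

```python
import numpy as np

def total_power(signal):
    """Mean squared amplitude; by Parseval's theorem this equals the
    normalized squared norm of the FFT coefficients."""
    return np.mean(signal ** 2)

sr = 8000
t = np.arange(sr) / sr
quiet = 0.1 * np.sin(2 * np.pi * 440 * t)  # low power, single harmonic
# Square wave: full amplitude with many harmonics. The small phase
# offset avoids exact zero crossings landing on sample points.
loud = np.sign(np.sin(2 * np.pi * 440 * t + 0.1))
print(round(total_power(quiet), 3))  # 0.005
print(round(total_power(loud), 3))   # 1.0
```

The 200x power gap between the two signals is the kind of spread the slide expects between quiet classical piano and punk.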

Total Power

Use of the Extracted Features A vector is built from these features. The value along each dimension of the vector space is the output of a pattern classifier trained to measure a particular feature. For this, M neural networks can be trained to recognize membership in the M feature classes.

Once the neural network has recognized membership, it produces a resultant vector on which the classification is based. Each class is recognized as a shifted delta function: e.g., country music and jazz are each denoted by a vector with a 1 in their own position and 0s everywhere else.
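The shifted-delta (one-hot) encoding can be sketched as below; the genre ordering here is my own illustrative choice, since the slides do not fix one:

```python
# Hypothetical ordering of the six genres discussed in the slides.
GENRES = ["classical", "punk", "country", "techno", "jazz", "rap"]

def one_hot(genre):
    """Shifted-delta target vector: 1 at the genre's position, 0 elsewhere."""
    vec = [0] * len(GENRES)
    vec[GENRES.index(genre)] = 1
    return vec

print(one_hot("country"))  # [0, 0, 1, 0, 0, 0]
```

At classification time, the predicted genre is simply the position of the largest network output.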

Further Enhancement… Development towards the final algorithm. Incorporation of genetic algorithms and a fuzzy system, if possible.