1
An Auditory Classifier Employing a Wavelet Neural Network Implemented in a Digital Design
Thesis Defense
Jonathan Hughes
Department of Computer Engineering
Rochester Institute of Technology
Rochester, NY
August 11th, 2006

2
Overview
Introduction
Wavelets
Feature Extraction
Neural Networks
Wavelet Neural Networks
Results
Conclusions
Future Work
Acknowledgements
Questions
Demonstration

3
Introduction
Computer Systems
– Can replace or improve upon human operators
Auditory Processing
– Voice recognition
– Speaker recognition
– Multimedia indexing
– Sonar analysis

4
Introduction (cont.)
Problem
– Classify audio samples as either Voice or Music
– But how to classify a time series?
Wavelets
– The wavelet transform reveals details of a time series with both time and frequency localization
Feature Extraction
– Need to extract meaningful features from the wavelet coefficients
Artificial Neural Networks
– Excellent at classification tasks
– Can classify the audio samples from the extracted features
Wavelet Neural Network

5
Wavelets
Signal analysis
– Benefits from the Fourier Transform and, more recently, the Wavelet Transform
Fourier Transform (early 1800s)
– Superposition of sine and cosine functions
– Reveals frequency information of a time series
– Not very useful for localized, non-periodic signal analysis

6
Wavelets (cont.)
Wavelet Transform (early 1900s)
– Multi-resolution analysis
– Small details are represented, as well as the gross features, and all scales in between
– Uses a Mother Wavelet (Analyzing Wavelet) as the prototype function
– Approximates the target function through dilation and translation of the mother wavelet
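The dilation and translation mentioned above can be written in the standard continuous wavelet transform notation (this is the textbook formulation, not copied from the slides):

```latex
\psi_{a,b}(t) = \frac{1}{\sqrt{a}}\,\psi\!\left(\frac{t - b}{a}\right),
\qquad
W(a,b) = \int_{-\infty}^{\infty} x(t)\,\psi_{a,b}(t)\,dt
```

where a is the dilation (scale) parameter, b is the translation parameter, and W(a, b) is the wavelet coefficient of the signal x(t) at that scale and position.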

7
Wavelets (cont.)

8

9
Time-frequency components are generated by sets of scaling and wavelet functions
– Low- and high-pass filters
– Down-sampled high-pass filter outputs become the wavelet detail coefficients
– Down-sampled low-pass filter outputs are processed by the next level of filters
– The final-stage low-pass filter output is the wavelet approximation coefficient
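The filter-bank recursion above can be sketched in a few lines of Python. This is an illustrative Haar analysis step, not the thesis's fixed-point hardware design; the averaging normalization and function names are assumptions.

```python
def haar_analysis_level(samples):
    """One filter-bank level: split a block into down-sampled low-pass
    (approximation) and high-pass (detail) outputs, each half the length."""
    approx, detail = [], []
    for i in range(0, len(samples), 2):
        a, b = samples[i], samples[i + 1]
        approx.append((a + b) / 2)   # down-sampled low-pass output
        detail.append((a - b) / 2)   # down-sampled high-pass output (detail coefficients)
    return approx, detail

def haar_decompose(samples):
    """Iterate the filter bank: details are kept at each level, and the
    approximation feeds the next level, until one approximation remains."""
    coeffs = []
    approx = list(samples)
    while len(approx) > 1:
        approx, detail = haar_analysis_level(approx)
        coeffs = detail + coeffs     # coarser details first, finest last
    return approx + coeffs           # final approximation coefficient in front
```

For a block whose length is a power of two, the output has the same length as the input: one approximation coefficient followed by the detail coefficients of every level.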

10
Wavelets (cont.)
Digital Design Implementation
– Designed to accept uncompressed digital audio
– Sampled at 11.025 kHz
– 16 bits per sample
– Processed in blocks of 256 samples, or approximately 23 milliseconds
– Requires log2(256) = 8 levels of filters to transform the audio data into wavelet coefficients

11
Wavelets (cont.)

12

13
Feature Extraction
Data provided by the wavelet decomposition process
– Multi-resolution and time- and frequency-localized
– Not in an acceptable form for neural network processing
Feature extractor
– Generates meaningful features from the wavelet coefficients
– Based on the method of Stefan Pittner and Sagar V. Kamarthi

14
Feature Extraction (cont.)
Feature extraction process
– Requires initial setup steps that involve processing a training set of data
– Customized for the data space of interest
– Depends on the use of clusters to identify groups of wavelet coefficients
Cluster Generation

15
Feature Extraction (cont.)
Extracting the features
– For each cluster U, the feature u is calculated by taking the square root of the sum of the squares of each coefficient v in the cluster
– That is, the Euclidean norm of the cluster's coefficients
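The per-cluster feature described above is just the Euclidean norm of the coefficients assigned to the cluster; a minimal sketch (the function name is illustrative):

```python
import math

def cluster_feature(cluster_coeffs):
    """Feature u for a cluster U: the Euclidean norm
    sqrt(sum of v^2) over each coefficient v in the cluster."""
    return math.sqrt(sum(v * v for v in cluster_coeffs))
```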

16
Feature Extraction (cont.)
Digital Design Implementation
– Cluster generation resulted in 34 clusters
– The feature extractor module accepts wavelet coefficients as inputs
– Allocates them to the cluster processors according to the cluster boundaries

17
Feature Extraction (cont.)

18

19
Neural Networks
Neural networks can be applied to solve classification problems by means of a learning process
– A solution to the classification problem can be found without the need for complex, often slow and inaccurate, algorithms
There are many varieties of neural networks
– This work uses the multi-layer perceptron, which has its basis in neural biology

20
Neural Networks (cont.)
Multi-layer perceptron
– The basic building block is the neuron (perceptron)
– Synapses are the inputs to the neuron, each with its own weight to adjust the strength of the input
– An adder component combines the inputs of the neuron, multiplying each by its respective weight; this weighted sum is called the activation potential
– The activation function then applies a squashing function to the activation potential, limiting the permissible amplitude range of the output signal
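The neuron described above reduces to a weighted sum followed by a squashing function. A minimal sketch using the logistic sigmoid (a common choice; the thesis hardware instead uses look-up-based comparators for the activation function):

```python
import math

def sigmoid(x):
    """Squashing function: limits the output to the range (0, 1)."""
    return 1.0 / (1.0 + math.exp(-x))

def neuron(inputs, weights, bias=0.0):
    """Weight each synaptic input, sum to form the activation
    potential, then apply the squashing activation function."""
    potential = sum(x * w for x, w in zip(inputs, weights)) + bias
    return sigmoid(potential)
```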

21
Neural Networks (cont.)
Multi-layer perceptron (cont.)
– The number of layers and the number of neurons in each layer determine the number of decision regions that a multi-layer perceptron can define
– One or more hidden layers, plus an output layer (the first layer is also known as the input layer)
– The network must first be trained with a known training set

22
Neural Networks (cont.)
Multi-layer perceptron training
– Apply the known input to the input layer
– Forward-propagate the results through the other layers (weights remain constant)
– Results from the output layer are then collected and compared to the desired response
– An error signal is calculated

23
Neural Networks (cont.)
Multi-layer perceptron training (cont.)
– The error signal is then back-propagated through the neural network, against the direction of the synaptic connections
– Weights in the neural network are modified via back-propagation error correction
– Training continues until the weights of the neural network produce outputs that converge
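The thesis trains offline in software with a novel algorithm; for comparison, the classic back-propagation error-correction update for a sigmoid output neuron looks like this (a textbook sketch, not the thesis's algorithm; names and the learning rate are assumptions):

```python
def update_output_weights(weights, inputs, output, target, lr=0.1):
    """One error-correction step for an output neuron.
    delta scales the error by the sigmoid derivative y*(1 - y);
    each weight moves against the error gradient."""
    error = target - output
    delta = error * output * (1.0 - output)
    return [w + lr * delta * x for w, x in zip(weights, inputs)]
```

When the output already matches the target, the error signal is zero and the weights are left unchanged, which is what "outputs that converge" means in practice.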

24
Neural Networks (cont.)
Digital Design Implementation
– 2-layer multi-layer perceptron: 34 input neurons and 2 output neurons
– The 34 wavelet features are fully connected to the 34 input neurons
– These are in turn fully connected to the output neurons
– Each output neuron corresponds to one of the two result classes, voice and music

25
Neural Networks (cont.)
The weights in the neural network were designed to be uploaded
– Eliminates the need for training hardware in the design
– Training was instead performed in a software simulation model, using a novel algorithm
Main neural network module
– Contains 34 input neuron modules
– 2 output neuron modules
– Synchronization hardware
– A result generator module, which generates the VOICE, MUSIC, OTHER, and VALID signals
Neuron module
– Weight registers
– Multiplier
– Look-up-based comparators for the activation functions
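The result generator's decision logic is not spelled out on the slide; one plausible reading, given one output neuron per class and an OTHER signal for ambiguous cases, is sketched below. The threshold and the tie-handling rule are assumptions.

```python
def classify(voice_out, music_out, threshold=0.5):
    """Map the two output-neuron values to a result signal:
    VOICE or MUSIC when exactly one output fires, else OTHER."""
    voice = voice_out >= threshold
    music = music_out >= threshold
    if voice and not music:
        return "VOICE"
    if music and not voice:
        return "MUSIC"
    return "OTHER"   # neither fires, or both fire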

26
Neural Networks (cont.)

27

28
Wavelet Neural Networks
Applications of wavelet neural networks are nearly as varied as their possible configurations
– Function approximators
– Signal classifiers
They improve upon artificial neural networks
– ANNs have a limited ability to characterize local features of a time series
– A wavelet function is used to condition the inputs to the neural network, so that only vital information about the signal is processed by the network

29
Wavelet Neural Networks (cont.)
Digital Design Implementation
– Discrete Haar wavelet processor
– Feature extractor processor
– 2-layer multi-layer perceptron consisting of 34 input neurons and 2 output neurons
– Synchronization hardware
– Mode selection: upload weights, or classify input
– Imposed limitations to reduce hardware size: 4-bit data paths (where possible) and 8-bit weight registers (neural network)
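The size limitations above imply narrow fixed-point values throughout the design. A simple way to model that in software is a saturating signed clamp; the exact fixed-point format in the thesis is not given, so this two's-complement sketch is an assumption:

```python
def quantize(value, bits):
    """Clamp an integer to the signed n-bit two's-complement range,
    saturating at the extremes instead of wrapping."""
    lo = -(1 << (bits - 1))          # e.g. -128 for 8 bits
    hi = (1 << (bits - 1)) - 1       # e.g. +127 for 8 bits
    return max(lo, min(hi, int(value)))
```

Applying such a clamp to data paths (4-bit) and weights (8-bit) in the software model is one way to predict how much classification accuracy the reduced hardware gives up, which is exactly what the Results section measures.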

30
Wavelet Neural Networks (cont.)
Data Flow
– 256 samples of 16-bit digital audio are applied to the wavelet processor, which converts them to wavelet coefficients
– The 256 wavelet coefficients are applied to the feature extractor processor, resulting in 34 4-bit wavelet features
– The 34 wavelet features are applied to the neural network processor: they are fully connected to the 34 input neurons, which are in turn fully connected to the 2 output neurons
– Each output neuron corresponds to one of the two result classes, voice and music
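The whole data flow can be sketched end to end. Everything below is illustrative: the Haar normalization, the cluster boundaries (32 uniform dummy clusters rather than the 34 data-derived ones), and the weights are placeholders, not the trained values from the thesis.

```python
import math

def haar(samples):
    """Full Haar decomposition: details kept per level, approximation recursed."""
    coeffs, approx = [], list(samples)
    while len(approx) > 1:
        pairs = list(zip(approx[0::2], approx[1::2]))
        approx = [(a + b) / 2 for a, b in pairs]            # low-pass
        coeffs = [(a - b) / 2 for a, b in pairs] + coeffs   # high-pass details
    return approx + coeffs

def features(coeffs, boundaries):
    """One Euclidean-norm feature per cluster of coefficients."""
    return [math.sqrt(sum(v * v for v in coeffs[lo:hi]))
            for lo, hi in boundaries]

def network(feats, voice_w, music_w):
    """Two fully connected output neurons with a sigmoid squashing function."""
    act = lambda s: 1.0 / (1.0 + math.exp(-s))
    return (act(sum(f * w for f, w in zip(feats, voice_w))),
            act(sum(f * w for f, w in zip(feats, music_w))))

# Example run: one 256-sample block through the three stages.
samples = [math.sin(0.3 * t) for t in range(256)]
bounds = [(i, i + 8) for i in range(0, 256, 8)]   # 32 placeholder clusters
feats = features(haar(samples), bounds)
voice, music = network(feats, [0.01] * 32, [-0.01] * 32)
```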

31
Wavelet Neural Networks (cont.)

32
Results
The wavelet neural network was originally constructed as a software model
– To experiment with network parameters
– To determine neural network weight values
– To determine ideal results
– To provide a reference to verify correct hardware operation
To conserve hardware resources, two additional models were created
– 8-bit data paths, 8-bit weights (software simulation only)
– 4-bit data paths, 8-bit weights (digital design / simulation)

33
Results (cont.)
The wavelet neural network system was modeled in VHDL and synthesized with Synplicity Synplify
– Actel ProASICPlus APA600 synthesized library cells
– Target clock frequency in the kHz range
– 96,265 cells
– Estimated maximum clock frequency of 15.6 MHz

34
Results (cont.)
Ideal Model

                     Voice     Music     Other
Voice Training Set   98.76%    0.00%     1.24%
Voice Test Set       91.93%    2.88%     5.19%
Music Training Set   0.39%     98.26%    1.35%
Music Test Set       2.49%     90.23%    7.28%

35
Results (cont.)
8-bit weights, 8-bit data

                     Voice     Music     Other
Voice Training Set   88.59%    7.79%     3.61%
Voice Test Set       86.74%    10.95%    2.31%
Music Training Set   13.44%    80.85%    5.71%
Music Test Set       10.15%    86.40%    3.45%

36
Results (cont.)
8-bit weights, 4-bit data

                     Voice     Music     Other
Voice Training Set   80.23%    17.68%    2.09%
Voice Test Set       70.61%    28.82%    0.58%
Music Training Set   20.79%    77.47%    1.74%
Music Test Set       22.61%    76.44%    0.96%

37
Conclusions
– The trained digital-design wavelet neural network was effective in correctly classifying the test data sets
– The novel design of the wavelet transform processor produced an efficient hardware design that was also a high-performance pipeline
– The novel neural network training algorithm was effective in determining weight values that produced excellent classification results

38
Conclusions (cont.)
– The design of the hardware modules was straightforward to model in VHDL
– Synthesis was simple due to the low clock operating speed
– The ideal model of the wavelet neural network demonstrates what can be achieved with much larger hardware sizes
– A hardware implementation offers advantages over a purely software implementation

39
Future Work
– Implement the digital design in an FPGA, with supporting circuitry on a PCB
– Add training hardware to the design
– Explore the wavelet neural network configuration: network parameters, number of layers and neurons, wavelet filter type, and feature extraction method
– Increase the robustness of the classifier by training with a wider variety of audio samples

40
Future Work (cont.)
Future areas of possible research
– Speech recognition
– Speaker recognition
– Content-based music genre classification

41
Acknowledgements
Dr. Kenneth W. Hsu (Computer Engineering)
Dr. Pratapa V. Reddy (Computer Engineering)
Dr. Marcin Łukowiak (Computer Engineering)

42
Acknowledgements (cont.)
Dr. Roger Gaborski (Computer Science Department)
Dr. Albert Titus (University at Buffalo)
Anne DiFelice (Computer Engineering)
Pam Steinkirchner (Computer Engineering)
André Botha (Microsoft Corporation)
Paul Brown (Intel Corporation)
Stefan Pittner and Sagar V. Kamarthi (Northeastern University)
– For allowing the use of their research in feature extraction

43
Acknowledgements (cont.)
Heather Hughes
Mr. John Hughes
Mrs. Suellen Hughes
Zachary Hughes
Wendy Hughes

44
Questions

45
Demonstration

46

47
Demonstration Results
8-bit weights, 4-bit data

                     Voice     Music     Other
Voice Training Set   80.23%    17.68%    2.09%
Voice Test Set       70.61%    28.82%    0.58%
Music Training Set   20.79%    77.47%    1.74%
Music Test Set       22.61%    76.44%    0.96%
