The Beatbox Voice-to-Drum Synthesizer

ABSTRACT

The Beatbox is a real-time voice-to-drum synthesizer intended primarily for the entertainment of small children. It accepts speech input limited to a small dictionary of sounds the system is pre-trained to recognize. Each sound in the dictionary has a predetermined corresponding drumbeat, which is played back to the user. In this manner, someone with no knowledge of the drums can effectively "play" the instrument with his or her mouth.

MOTIVATION

Speech recognition is a key tool in the design of next-generation user-friendly computer applications. A major obstacle remaining in the way of this goal is the detection of stop consonants: sounds created by stopping the flow of air in the mouth and releasing it in a burst (e.g. b, d, g, k, p, and t). Telling stop consonants apart is a difficult problem because of their similarity. Building a system that distinguishes stop consonants may help bring robust speech recognition one step closer to reality.

SYSTEM FLOW DIAGRAM

Voice Input → Digital Sampling by sound card → Digital Signal Processing → Pattern Recognition using Hidden Markov Models → Audio Playback and Visual Feedback

DIGITAL SIGNAL PROCESSING UNIT

Input Signal → Windowing → Fast Fourier Transform → Critical Band Integration → Logarithmic Compression → Cepstrum

When a user's voice input triggers the underlying engine, it is converted into a digital signal and passed to the DSP unit. The signal is divided into short frames, roughly 25 ms long, and multiplied by the Hanning window before its FFT is taken. Our system models the ear as a filter bank, allowing the information contained in the signal to be compressed. Further redundancies are eliminated through cepstral analysis before the processed signal is handed over to the Pattern Recognition subsystem.
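The DSP pipeline above can be sketched in Python. This is an illustrative sketch, not the project's code: the ~25 ms window and Hanning taper come from the poster, while the sample rate, the number of filters, the linear filter spacing (a real critical-band front end would use a perceptual, e.g. Bark or mel, spacing), and the number of cepstral coefficients are assumptions.

```python
import numpy as np

def dsp_features(signal, sample_rate=8000, frame_ms=25, n_filters=20, n_ceps=12):
    """Front-end sketch: windowing -> FFT -> critical band integration
    -> logarithmic compression -> cepstrum (via a DCT)."""
    frame_len = int(sample_rate * frame_ms / 1000)   # ~25 ms frames
    n_frames = len(signal) // frame_len
    window = np.hanning(frame_len)                   # Hanning window
    features = []
    for i in range(n_frames):
        frame = signal[i * frame_len:(i + 1) * frame_len] * window
        spectrum = np.abs(np.fft.rfft(frame)) ** 2   # FFT -> power spectrum
        fbank = triangular_filterbank(n_filters, len(spectrum))
        energies = fbank @ spectrum                  # critical band integration
        log_e = np.log(energies + 1e-10)             # logarithmic compression
        features.append(dct2(log_e)[:n_ceps])        # cepstral coefficients
    return np.array(features)

def triangular_filterbank(n_filters, n_bins):
    """Triangular filters spaced linearly over the FFT bins (simplification)."""
    fb = np.zeros((n_filters, n_bins))
    edges = np.linspace(0, n_bins - 1, n_filters + 2).astype(int)
    for m in range(1, n_filters + 1):
        lo, c, hi = edges[m - 1], edges[m], edges[m + 1]
        for k in range(lo, c):
            fb[m - 1, k] = (k - lo) / max(c - lo, 1)   # rising edge
        for k in range(c, hi):
            fb[m - 1, k] = (hi - k) / max(hi - c, 1)   # falling edge
    return fb

def dct2(x):
    """Type-II DCT, decorrelating the log filter-bank energies."""
    n = len(x)
    k = np.arange(n)
    return np.array([np.sum(x * np.cos(np.pi * (k + 0.5) * j / n))
                     for j in range(n)])
```

For a one-second 440 Hz tone at 8 kHz this yields a 40 x 12 matrix of cepstral features, one row per 25 ms frame, ready to be passed to the Pattern Recognition subsystem.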
PATTERN RECOGNITION UNIT

Training loop: N samples of /k/ received from DSP unit → estimate parameters of the /k/ HMM → Viterbi algorithm infers N state sequences → iterate until convergence.

Training: The recognition/classification system is based on the theory of Hidden Markov Models (HMMs). Given the observation sequence, we infer the most likely underlying "hidden" state sequence using the Viterbi algorithm. We then iteratively re-estimate the parameters of the HMM until convergence is achieved.

Testing: Each dictionary sound has a pre-trained HMM corresponding to it. The input signal is scored against each HMM, and the input is classified in favor of the HMM with the highest estimated likelihood.

AUTHORS: Priyadarshini Routh, Aryeh Levine, Raphael Levy
ADVISOR: Prof. Lawrence Saul

The Beatbox system comprises three main components:
- The DSP unit accepts voice input, then cleans and analyzes the incoming signal.
- The Pattern Recognition subsystem uses frequency characteristics to probabilistically determine the most likely match for the input data.
- The Demonstration system is a GUI that controls the audio and visual feedback given to the user.

DEMONSTRATION

90% accuracy on dictionary sounds if trained on the same user; 80% accuracy if a pre-existing training set is used.

DSP FLOWCHART
[Figure: flowchart of the DSP unit]

HOW THE SYSTEM HEARS YOU!
[Figure: waveforms and spectrograms of /k/, /pff/, /t/]

TRAINING THE /K/ HMM
[Figure: training loop for the /k/ HMM]
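The Viterbi decoding and highest-likelihood classification described above can be sketched as follows. This is a minimal sketch, not the Beatbox implementation: it assumes a discrete (vector-quantized) observation alphabet and uses toy two-state models with made-up parameters, since the poster does not specify the actual model topologies or parameter values.

```python
import math

def viterbi(obs, start_p, trans_p, emit_p):
    """Most likely hidden state path for a discrete HMM (log domain).
    Returns (path, log-probability of that path)."""
    n_states = len(start_p)
    # delta[s] = best log-prob of any path ending in state s
    delta = [math.log(start_p[s]) + math.log(emit_p[s][obs[0]])
             for s in range(n_states)]
    back = []
    for o in obs[1:]:
        prev, delta, ptr = delta, [], []
        for s in range(n_states):
            best = max(range(n_states),
                       key=lambda r: prev[r] + math.log(trans_p[r][s]))
            delta.append(prev[best] + math.log(trans_p[best][s])
                         + math.log(emit_p[s][o]))
            ptr.append(best)
        back.append(ptr)
    # trace back the best path from the best final state
    path = [max(range(n_states), key=lambda s: delta[s])]
    for ptr in reversed(back):
        path.append(ptr[path[-1]])
    path.reverse()
    return path, max(delta)

def classify(obs, models):
    """Score obs against each sound's HMM; pick the highest likelihood."""
    scores = {name: viterbi(obs, *m)[1] for name, m in models.items()}
    return max(scores, key=scores.get)

# Toy two-state models (hypothetical parameters, for illustration only):
# each is (initial probs, transition matrix, emission matrix).
models = {
    "/k/": ([0.9, 0.1], [[0.7, 0.3], [0.2, 0.8]], [[0.9, 0.1], [0.2, 0.8]]),
    "/t/": ([0.1, 0.9], [[0.7, 0.3], [0.2, 0.8]], [[0.1, 0.9], [0.8, 0.2]]),
}
classify([0, 0, 1, 1], models)  # → '/k/'
```

In the real system the per-sound HMM parameters would first be re-estimated from the N training samples (the "iterate until convergence" loop above); here the parameters are simply fixed so that the decoding and classification steps can be shown in isolation.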