
Speech, Perception, & AI
Artificial Intelligence
CMSC 25000
February 13, 2003

Agenda
- Speech
  - AI & Perception
  - Speech applications
- Speech Recognition
  - Coping with uncertainty

AI & Perception
- Classic AI: disembodied reasoning, "think of an action"
  - Expert systems
  - Theorem provers
  - Full-knowledge planners
- Contemporary AI: situated reasoning, thinking & acting
  - Spoken language systems
  - Image registration
  - Behavior-based robots

Speech Applications
- Speech-to-speech translation
- Dictation systems
- Spoken language systems
- Adaptive & augmentative technologies, e.g. talking books
- Speaker identification & verification
- Ubiquitous computing

Speech Recognition
- Goal: given an acoustic signal, identify the sequence of words that produced it
  - Speech understanding goal: given an acoustic signal, identify the meaning intended by the speaker
- Issues:
  - Ambiguity: many possible pronunciations
  - Uncertainty: which signal, and which word/sense, produced this sound sequence

Decomposing Speech Recognition
- Q1: What speech sounds were uttered?
  - Human languages use phones, the basic sound units: b, m, k, ax, ey, ... (ARPAbet)
  - Distinctions are categorical to speakers, but acoustically continuous
  - Part of knowledge of a language: build a per-language inventory
  - Could we learn these?

Decomposing Speech Recognition
- Q2: What words produced these sounds?
  - Look the sound sequences up in a dictionary
  - Problem 1: homophones. Two words, same sounds: "too", "two"
  - Problem 2: segmentation. No "space" between words in continuous speech: "I scream" / "ice cream", "Wreck a nice beach" / "Recognize speech"
- Q3: What meaning produced these words?
  - NLP (but that's not all!)

Signal Processing
- Goal: convert impulses from the microphone into a representation that
  - is compact
  - encodes features relevant for speech recognition
- Compactness, step 1:
  - Sampling rate: how often we look at the data: 8 kHz, 16 kHz (44.1 kHz = CD quality)
  - Quantization factor: how much precision: 8-bit, 16-bit (encodings: mu-law, linear, ...)
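
The compactness numbers above are easy to check; a minimal sketch of the arithmetic in Python, using the rates named on the slide:

```python
# Telephone-quality speech: 8 kHz sampling, 8-bit samples.
sample_rate_hz = 8_000          # samples per second
bits_per_sample = 8

bytes_per_second = sample_rate_hz * bits_per_sample // 8
print(bytes_per_second)         # 8000 bytes for each second of raw audio

# A 10 ms analysis frame at this rate holds 80 samples, which is
# exactly the frame size quoted on the next slide.
samples_per_frame = int(sample_rate_hz * 0.010)
print(samples_per_frame)        # 80
```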

(A Little More) Signal Processing
- Compactness & feature identification
  - Capture mid-length speech phenomena
  - Typically overlapping "frames" of 10 ms (80 samples)
  - Vector of features: e.g. energy at some frequency
- Vector quantization:
  - n-feature vectors live in an n-dimensional space
  - Divide the space into m regions (e.g. 256)
  - All vectors in a region get the same label, e.g. C256
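
To make vector quantization concrete, here is a minimal sketch assuming a codebook has already been trained (e.g. by k-means); the codebook entries and frame features are made-up illustrative numbers, and the 2-dimensional feature space stands in for a real system's much larger one:

```python
import numpy as np

# Hypothetical trained codebook: m = 4 regions in a 2-dimensional
# feature space (a real system might use m = 256 regions).
codebook = np.array([
    [0.1, 0.9],
    [0.8, 0.2],
    [0.5, 0.5],
    [0.0, 0.0],
])

def quantize(frame_features):
    """Label a frame with the index of its nearest codebook entry."""
    distances = np.linalg.norm(codebook - frame_features, axis=1)
    return int(np.argmin(distances))

# Each 10 ms frame's feature vector becomes a single symbol, turning
# the continuous signal into a discrete observation sequence.
frames = np.array([[0.12, 0.85], [0.70, 0.30], [0.45, 0.55]])
print([f"C{quantize(f)}" for f in frames])   # ['C0', 'C1', 'C2']
```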

Speech Recognition Model
- Question: given the signal, what words?
- Problem: uncertainty
  - Capture of sound by the microphone, how phones produce sounds, which words make which phones, etc.
- Solution: probabilistic model
  - P(words|signal) = P(signal|words) P(words) / P(signal)
  - Idea: maximize P(signal|words) * P(words)
  - P(signal|words): acoustic model; P(words): language model
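
Since P(signal) is the same for every candidate word sequence, decoding reduces to comparing P(signal|words) * P(words), usually in log space. A minimal sketch of that decision rule, with made-up probabilities standing in for real acoustic and language models:

```python
import math

# Hypothetical hypotheses with illustrative model scores.
candidates = {
    "recognize speech":   {"acoustic": 1e-9, "lm": 1e-4},
    "wreck a nice beach": {"acoustic": 2e-9, "lm": 1e-7},
}

def score(h):
    # log P(signal|words) + log P(words): the quantity to maximize
    return math.log(h["acoustic"]) + math.log(h["lm"])

best = max(candidates, key=lambda w: score(candidates[w]))
print(best)   # "recognize speech": the language model breaks the tie
```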

Probabilistic Reasoning over Time
- Issue 1: we have only static models so far
  - Speech is a dynamic sequence, changing over time
  - Solution 1: develop a model for dynamics
- Issue 2: we have only discrete models so far
  - Speech is continuously changing
  - How do we make observations? Define states?
  - Solution 2: discretize
    - "Time slices": make time discrete
    - Observations and states are associated with times: O_t, Q_t

Modelling Processes over Time
- Issue: the new state depends on the preceding states (analyzing sequences)
- Problem 1: a possibly unbounded number of probability tables (observation + state + time)
  - Solution 1: assume a stationary process, i.e. the rules governing the process are the same at all times
- Problem 2: a possibly unbounded number of parents
  - Markov assumption: only consider a finite history
  - Common: first- or second-order Markov, i.e. depend on the last one or two states
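
Under these two assumptions the joint distribution over state sequences collapses to a product of one-step transitions, which is what keeps the number of parameters bounded:

```latex
P(Q_1, \ldots, Q_T)
  = P(Q_1)\prod_{t=2}^{T} P(Q_t \mid Q_1, \ldots, Q_{t-1})
  = P(Q_1)\prod_{t=2}^{T} P(Q_t \mid Q_{t-1})
```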

Language Model
- Idea: some utterances are more probable than others
- Standard solution: "n-gram" model
  - Typically a trigram: P(w_i | w_{i-1}, w_{i-2})
  - Collect training data
  - Smooth with bigrams & unigrams to handle sparseness
  - Take the product over the words in the utterance (as sketched below)
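
A minimal sketch of a trigram model with simple linear interpolation back to bigram and unigram estimates; the toy corpus and interpolation weights are illustrative assumptions, not values from the lecture:

```python
from collections import Counter

corpus = "i scream you scream we all scream for ice cream".split()

uni = Counter(corpus)
bi = Counter(zip(corpus, corpus[1:]))
tri = Counter(zip(corpus, corpus[1:], corpus[2:]))
total = len(corpus)

def p_word(w, prev1, prev2, lambdas=(0.6, 0.3, 0.1)):
    """P(w | prev2 prev1), smoothed by interpolating tri-, bi-, and unigrams."""
    l3, l2, l1 = lambdas
    p3 = tri[(prev2, prev1, w)] / bi[(prev2, prev1)] if bi[(prev2, prev1)] else 0.0
    p2 = bi[(prev1, w)] / uni[prev1] if uni[prev1] else 0.0
    p1 = uni[w] / total
    return l3 * p3 + l2 * p2 + l1 * p1

def p_utterance(words):
    """Product over the words in the utterance."""
    p = 1.0
    for i in range(2, len(words)):
        p *= p_word(words[i], words[i - 1], words[i - 2])
    return p

print(p_utterance("we all scream".split()))   # 0.93 on this toy corpus
```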

Acoustic Model
- P(signal|words)
  - words -> phones, then phones -> vector quantization
- Words -> phones
  - Pronunciation dictionary lookup
  - Multiple pronunciations? Use a probability distribution
    - Dialect variation: "tomato"
    - + Coarticulation
  - Take the product along the path

[Figure: pronunciation network for "tomato": t ow m (aa 0.5 | ey 0.5) t ow, with a coarticulation variant [ax] for the first vowel]
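
A probabilistic pronunciation dictionary is just a word-to-paths mapping; a minimal sketch using the slide's "tomato" example (the homophone entries and exact phone strings are illustrative assumptions):

```python
# Each word maps to (phone sequence, probability) pairs.
pronunciations = {
    "tomato": [
        (["t", "ow", "m", "ey", "t", "ow"], 0.5),   # "tomayto"
        (["t", "ow", "m", "aa", "t", "ow"], 0.5),   # "tomahto"
    ],
    "two": [(["t", "uw"], 1.0)],
    "too": [(["t", "uw"], 1.0)],    # homophone of "two"
}

def word_to_phones(word):
    """Yield each pronunciation with the probability of its path
    through the pronunciation network."""
    yield from pronunciations.get(word, [])

for phones, prob in word_to_phones("tomato"):
    print(" ".join(phones), prob)
```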

Acoustic Model
- P(signal|phones):
  - Problem: phones can be pronounced differently
    - Speaker differences, speaking rate, microphone
    - Phones may not even appear, depending on context
  - So the observation sequence is uncertain
- Solution: Hidden Markov Models
  - 1) Hidden => observations are uncertain
  - 2) Probability of word sequences => state transition probabilities
  - 3) 1st-order Markov => use one prior state

Hidden Markov Models (HMMs)
An HMM is:
- 1) A set of states Q = q_1, ..., q_k
- 2) A set of transition probabilities A = a_01, ..., a_mn, where a_ij is the probability of the transition q_i -> q_j
- 3) Observation probabilities B = b_i(o_t), the probability of observing o_t in state i
- 4) An initial probability distribution over states, pi_i: the probability of starting in state i
- 5) A set of accepting states
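
The five components map directly onto a small data structure; a minimal sketch in Python (the field names are mine, not from the lecture), reused by the sketches that follow:

```python
from dataclasses import dataclass, field

@dataclass
class HMM:
    states: list                 # Q = q_1, ..., q_k
    trans: dict                  # A: trans[(q_i, q_j)] = a_ij
    emit: dict                   # B: emit[q_i][o_t] = b_i(o_t)
    init: dict                   # pi: init[q_i] = P(start in q_i)
    accepting: set = field(default_factory=set)
```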

Acoustic Model
- 3-state phone model for [m]
  - Use a Hidden Markov Model (HMM)
  - Probability of a sequence: the sum of the probabilities of its paths

[Figure: HMM with states Onset -> Mid -> End -> Final, transition probabilities on the arcs, and per-state observation probabilities: Onset: C1 0.5, C2 0.2, C3 0.3; Mid: C3 0.2, C4 0.7, C5 0.1; End: C4 0.1, C6 0.5, C7 0.4]
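
Plugging the slide's numbers into the HMM structure above gives the phone model directly; the observation probabilities come from the slide's figure, while the transition probabilities were not legible in the transcript, so those values are illustrative placeholders:

```python
m_phone = HMM(
    states=["Onset", "Mid", "End", "Final"],
    trans={   # illustrative self-loop/advance probabilities only
        ("Onset", "Onset"): 0.3, ("Onset", "Mid"): 0.7,
        ("Mid", "Mid"): 0.9, ("Mid", "End"): 0.1,
        ("End", "End"): 0.4, ("End", "Final"): 0.6,
    },
    emit={    # observation probabilities from the slide's figure
        "Onset": {"C1": 0.5, "C2": 0.2, "C3": 0.3},
        "Mid":   {"C3": 0.2, "C4": 0.7, "C5": 0.1},
        "End":   {"C4": 0.1, "C6": 0.5, "C7": 0.4},
        "Final": {},
    },
    init={"Onset": 1.0},
    accepting={"Final"},
)
```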

Viterbi Algorithm
- Find the BEST word sequence given the signal
  - Best P(words|signal)
  - Take the HMM & the VQ sequence => word sequence (with its probability)
- Dynamic programming solution
  - Record the most probable path ending at each state i
  - Then the most probable path from i to the end: O(bMn)
  - Maximize: v[j,t] = max_i v[i,t-1] * a(q_i, q_j) * b_j(o_t)
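
A minimal Viterbi sketch over the HMM structure defined above, returning the single most probable state path for an observation sequence (it assumes the m_phone model from the previous sketch):

```python
def viterbi(hmm, observations):
    """Most probable state path for the observation sequence."""
    # v[s] = probability of the best path so far that ends in state s
    v = {s: hmm.init.get(s, 0.0) * hmm.emit[s].get(observations[0], 0.0)
         for s in hmm.states}
    backpointers = []
    for obs in observations[1:]:
        new_v, ptr = {}, {}
        for j in hmm.states:
            # maximize over predecessors i: v[i] * a_ij, then emit b_j(obs)
            best_i = max(hmm.states,
                         key=lambda i: v[i] * hmm.trans.get((i, j), 0.0))
            p = v[best_i] * hmm.trans.get((best_i, j), 0.0)
            new_v[j] = p * hmm.emit[j].get(obs, 0.0)
            ptr[j] = best_i
        v = new_v
        backpointers.append(ptr)
    # Backtrace from the most probable final state.
    state = max(v, key=v.get)
    path = [state]
    for ptr in reversed(backpointers):
        state = ptr[state]
        path.append(state)
    return list(reversed(path)), max(v.values())

path, prob = viterbi(m_phone, ["C1", "C4", "C6"])
print(path, prob)   # ['Onset', 'Mid', 'End'] 0.01225
```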

Learning HMMs
- Issue: where do the probabilities come from?
- Solution: learn them from data
  - Signal + word pairs
  - The Baum-Welch algorithm learns them efficiently
  - See CMSC 35100
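
Baum-Welch itself does not fit on a slide, but its core building block, the forward probability (the total probability of the observations summed over all state paths), is short; a minimal sketch over the same HMM structure as above:

```python
def forward(hmm, observations):
    """P(observations | model), summing over all state paths.
    Baum-Welch reuses these forward (alpha) values, together with
    symmetric backward (beta) values, to reestimate the transition
    and observation probabilities from data."""
    alpha = {s: hmm.init.get(s, 0.0) * hmm.emit[s].get(observations[0], 0.0)
             for s in hmm.states}
    for obs in observations[1:]:
        alpha = {j: sum(alpha[i] * hmm.trans.get((i, j), 0.0)
                        for i in hmm.states) * hmm.emit[j].get(obs, 0.0)
                 for j in hmm.states}
    return sum(alpha.values())

# Here only one state path is nonzero, so this matches the Viterbi probability.
print(forward(m_phone, ["C1", "C4", "C6"]))   # 0.01225
```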

Does it work?
- Yes:
  - 99% on isolated single digits
  - 95% on restricted short utterances (air travel)
  - 80+% on professional news broadcasts
- No:
  - 55% on conversational English
  - 35% on conversational Mandarin
  - ?? at noisy cocktail parties

Speech Recognition as Modern AI
- Draws on a wide range of AI techniques
  - Knowledge representation & manipulation: optimal search (Viterbi decoding)
  - Machine learning: Baum-Welch for HMMs; nearest neighbor & k-means clustering for signal identification
  - Probabilistic reasoning / Bayes rule: manages uncertainty in the signal, phone, and word mapping
- Enables real-world applications