1 CS 552/652 Speech Recognition with Hidden Markov Models
Winter 2011
Oregon Health & Science University
Center for Spoken Language Understanding
John-Paul Hosom
Lecture 3, January 10: Review of Probability & Statistics; Markov Models

2 Review of Probability and Statistics
Random Variables:
“variable” because different values are possible
“random” because the observed value depends on the outcome of some experiment
discrete random variables: the set of possible values is a discrete set
continuous random variables: the set of possible values is an interval of numbers
usually a capital letter is used to denote a random variable.

3 Review of Probability and Statistics
Probability Density Functions:
If X is a continuous random variable, then the p.d.f. of X is a function f(x) such that

    P(a ≤ X ≤ b) = ∫_a^b f(x) dx

so that the probability that X has a value between a and b is the area under the density function from a to b.
Note: f(x) ≥ 0 for all x; the area under the entire graph = 1.
Example 1: [figure: a density curve f(x) with the area between x = a and x = b shaded]
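As a quick illustration, a minimal Python sketch that checks both properties numerically, assuming an exponential density f(x) = e^(−x) for x ≥ 0 (chosen only for illustration; this is not the density used in the lecture):

    import numpy as np

    # Assumed example density: exponential, f(x) = exp(-x) for x >= 0
    def f(x):
        return np.exp(-x)

    dx = 0.0001
    x = np.arange(0.0, 50.0, dx)      # grid covering essentially all of the probability mass
    print(np.sum(f(x)) * dx)          # area under the entire graph: ~1.0

    a, b = 1.0, 2.0
    xab = np.arange(a, b, dx)
    print(np.sum(f(xab)) * dx)        # P(a <= X <= b): ~exp(-1) - exp(-2), about 0.23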

4 Review of Probability and Statistics
Probability Density Functions:
Example 2 (from Devore, p. 134): [figure: a density curve f(x) with the area between a = 0.25 and b = 0.75 shaded]
The probability that X is between 0.25 and 0.75 is the area under f(x) from 0.25 to 0.75.

5 Review of Probability and Statistics
Cumulative Distribution Functions:
the cumulative distribution function (c.d.f.) F(x) for a c.r.v. X is:

    F(x) = P(X ≤ x) = ∫_−∞^x f(y) dy

example: [figure: the density f(x) with the area to the left of b = 0.75 shaded; the c.d.f. of f(x) evaluated at b gives that area]
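Continuing the same assumed exponential density from the earlier sketch (again an illustrative assumption, not the lecture's example), the c.d.f. is F(x) = 1 − e^(−x), and P(a ≤ X ≤ b) = F(b) − F(a):

    import math

    # c.d.f. of the assumed exponential density
    def F(x):
        return 1.0 - math.exp(-x) if x >= 0 else 0.0

    a, b = 0.25, 0.75
    print(F(b) - F(a))    # P(0.25 <= X <= 0.75) = F(0.75) - F(0.25), about 0.31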

6 Review of Probability and Statistics
Expected Values:
the expected (mean) value of a c.r.v. X with p.d.f. f(x) is:

    E(X) = ∫_−∞^∞ x · f(x) dx

example 1 (discrete): E(X) = 2·0.05 + 3·0.10 + … + 9·0.05
Or, take 4 numbers, 45, 20, 12, and 31, from some population of numbers. The mean is (45+20+12+31)/4 = 27. The expected value is ¼×45 + ¼×20 + ¼×12 + ¼×31 = 27, since each of these 4 values is equally likely at probability 25%. So the mean and the expected value are the same.
example 2 (continuous): [worked example shown in the original figure]
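A small sketch of both expectations: the four-number discrete example from the slide, and a continuous case using the same assumed exponential density as above (an illustrative assumption):

    import numpy as np

    # Discrete case: four equally likely values (from the slide)
    values = np.array([45.0, 20.0, 12.0, 31.0])
    probs  = np.array([0.25, 0.25, 0.25, 0.25])
    print(np.sum(values * probs))        # E(X) = 27.0, same as the mean (45+20+12+31)/4

    # Continuous case (assumed density, not from the slides): f(x) = exp(-x)
    dx = 0.0001
    x = np.arange(0.0, 50.0, dx)
    print(np.sum(x * np.exp(-x)) * dx)   # E(X) = integral of x*f(x) dx, ~1.0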

7 Review of Probability and Statistics
The Normal (Gaussian) Distribution:
the p.d.f. of a Normal distribution is

    f(x) = (1 / (σ √(2π))) · exp( −(x − μ)² / (2σ²) )

where μ is the mean and σ is the standard deviation; σ² is called the variance.
[figure: a bell-shaped curve centered at μ, with width governed by σ]
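A direct transcription of this formula in Python (the values of μ and σ below are only illustrative):

    import math

    def normal_pdf(x, mu, sigma):
        # p.d.f. of a Normal distribution with mean mu and standard deviation sigma
        return (1.0 / (sigma * math.sqrt(2.0 * math.pi))) * \
               math.exp(-((x - mu) ** 2) / (2.0 * sigma ** 2))

    print(normal_pdf(0.0, mu=0.0, sigma=1.0))   # peak of the standard normal, ~0.3989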

8 Review of Probability and Statistics
The Normal (Gaussian) Distribution:
Any arbitrary p.d.f. can be approximated by summing N weighted Gaussians (a mixture of Gaussians):

    f(x) = Σ_{k=1}^{N} w_k · N(x; μ_k, σ_k²),   where Σ_k w_k = 1

[figure: six weighted Gaussian components w_1 … w_6 whose weighted sum forms the target density]
This is what is meant by a “Gaussian Mixture Model” (GMM)
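A sketch of a one-dimensional GMM density; the weights, means, and standard deviations here are made-up values for illustration (the weights must sum to 1 so the mixture is a valid p.d.f.):

    import math

    def normal_pdf(x, mu, sigma):
        return math.exp(-((x - mu) ** 2) / (2 * sigma ** 2)) / (sigma * math.sqrt(2 * math.pi))

    # Hypothetical 3-component mixture
    weights = [0.5, 0.3, 0.2]    # mixture weights, sum to 1
    means   = [-2.0, 0.0, 3.0]
    sigmas  = [1.0, 0.5, 2.0]

    def gmm_pdf(x):
        # f(x) = sum_k w_k * N(x; mu_k, sigma_k)
        return sum(w * normal_pdf(x, m, s) for w, m, s in zip(weights, means, sigmas))

    print(gmm_pdf(0.0))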

9 Review of Probability and Statistics
Conditional Probability:
the conditional probability of event A given that event B has occurred:

    P(A | B) = P(A ∩ B) / P(B),   for P(B) > 0

the multiplication rule:

    P(A ∩ B) = P(A | B) · P(B)

[figure: Venn diagram of the event space, with overlapping events A and B]

10 Review of Probability and Statistics
Conditional Probability: Example (from Devore, p. 52)
3 equally-popular airlines (1, 2, 3) fly from LA to NYC.
Probability of 1 being delayed: 40%
Probability of 2 being delayed: 50%
Probability of 3 being delayed: 70%
probability of selecting an airline = A, probability of delay = B (Late = B, Not Late = B′)
A_1 = Airline 1:  P(A_1) = 1/3   P(B | A_1) = 4/10   P(A_1 ∩ B) = 1/3 × 4/10 = 4/30   P(B′ | A_1) = 6/10
A_2 = Airline 2:  P(A_2) = 1/3   P(B | A_2) = 5/10   P(A_2 ∩ B) = 1/3 × 5/10 = 5/30   P(B′ | A_2) = 5/10
A_3 = Airline 3:  P(A_3) = 1/3   P(B | A_3) = 7/10   P(A_3 ∩ B) = 1/3 × 7/10 = 7/30   P(B′ | A_3) = 3/10

11 Review of Probability and Statistics
Conditional Probability: Example (from Devore, p. 52)
What is the probability of choosing airline 1 and being delayed on that airline?
What is the probability of being delayed?
Given that the flight was delayed, what is the probability that the airline is 1?
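All three answers follow from the numbers on the previous slide; a minimal Python sketch of the arithmetic (the probabilities are the slide's own values):

    # Values from the airline example
    p_airline = {1: 1/3, 2: 1/3, 3: 1/3}        # P(A_i): the airlines are equally popular
    p_delay_given = {1: 0.4, 2: 0.5, 3: 0.7}    # P(B | A_i): probability of a delay on airline i

    # P(choosing airline 1 AND being delayed): multiplication rule
    p_a1_and_b = p_airline[1] * p_delay_given[1]
    print(p_a1_and_b)                            # 4/30, about 0.133

    # P(being delayed): add the "delayed" branch of every airline
    p_b = sum(p_airline[i] * p_delay_given[i] for i in (1, 2, 3))
    print(p_b)                                   # 16/30, about 0.533

    # P(airline 1 | delayed)
    print(p_a1_and_b / p_b)                      # 4/16 = 0.25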

12 Review of Probability and Statistics
Law of Total Probability: for mutually exclusive and exhaustive events A_1, A_2, … A_n and any other event B:

    P(B) = Σ_{i=1}^{n} P(B | A_i) · P(A_i)

Bayes’ Rule: for mutually exclusive and exhaustive events A_1, A_2, … A_n and any other event B, with P(A_i) > 0 and P(B) > 0:

    P(A_k | B) = P(B | A_k) · P(A_k) / Σ_{i=1}^{n} P(B | A_i) · P(A_i)
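A direct transcription of both formulas in Python, checked against the airline example (the function names are only illustrative):

    def total_probability(priors, likelihoods):
        # P(B) = sum_i P(B | A_i) * P(A_i)
        return sum(p * l for p, l in zip(priors, likelihoods))

    def bayes(priors, likelihoods, k):
        # P(A_k | B) = P(B | A_k) * P(A_k) / P(B)
        return priors[k] * likelihoods[k] / total_probability(priors, likelihoods)

    priors      = [1/3, 1/3, 1/3]     # P(A_1), P(A_2), P(A_3)
    likelihoods = [0.4, 0.5, 0.7]     # P(B | A_1), P(B | A_2), P(B | A_3)
    print(total_probability(priors, likelihoods))   # P(B) = 16/30, about 0.533
    print(bayes(priors, likelihoods, 0))            # P(A_1 | B) = 0.25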

13 Review of Probability and Statistics
Independence:
events A and B are independent iff

    P(A | B) = P(A)

from the multiplication rule, P(A ∩ B) = P(A | B) · P(B); or, from Bayes’ rule, P(A | B) = P(B | A) · P(A) / P(B).
from the multiplication rule and the definition of independence, events A and B are independent iff

    P(A ∩ B) = P(A) · P(B)

14 What is a Markov Model?
A Markov Model (Markov Chain) is:
similar to a finite-state automaton, with probabilities of transitioning from one state to another
[figure: five states S_1 … S_5 connected by directed arcs labeled with transition probabilities]
transition from state to state at discrete time intervals
can only be in 1 state at any given time

15 What is a Markov Model?
Elements of a Markov Model (Chain):
clock: t = {1, 2, 3, … T}
N states: Q = {1, 2, 3, … N}; the single state j occupied at time t is referred to as q_t
N events: E = {e_1, e_2, e_3, …, e_N}
initial probabilities: π_j = P[q_1 = j],   1 ≤ j ≤ N
transition probabilities: a_ij = P[q_t = j | q_{t−1} = i],   1 ≤ i, j ≤ N
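These elements map directly onto a few plain data structures; a minimal Python sketch (the state names and probability values below are placeholders, not taken from the slides):

    # A Markov chain: states, one event per state, initial probs, transition probs
    states = ["S1", "S2", "S3"]                    # Q, with N = 3
    events = {"S1": "e1", "S2": "e2", "S3": "e3"}  # one event per state
    pi = {"S1": 0.5, "S2": 0.4, "S3": 0.1}         # pi_j = P[q_1 = j]
    A = {                                          # A[i][j] = a_ij = P[q_t = j | q_{t-1} = i]
        "S1": {"S1": 0.7, "S2": 0.25, "S3": 0.05},
        "S2": {"S1": 0.4, "S2": 0.5,  "S3": 0.1},
        "S3": {"S1": 0.2, "S2": 0.7,  "S3": 0.1},
    }
    # pi and every row of A must each sum to 1
    assert abs(sum(pi.values()) - 1.0) < 1e-12
    assert all(abs(sum(row.values()) - 1.0) < 1e-12 for row in A.values())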

16 What is a Markov Model?
Elements of a Markov Model (chain):
the (potentially) occupied state at time t is called q_t
a state can be referred to by its index, e.g. q_t = j
1 event corresponds to 1 state: at each time t, the occupied state outputs (“emits”) its corresponding event.
a Markov model is a generator of events.
each event is discrete and has a single output.
in a typical finite-state machine, actions occur at transitions, but in most Markov Models, actions occur at each state.

17 What is a Markov Model?
Transition Probabilities:
no assumptions (full probabilistic description of the system):

    P[q_t = j | q_{t−1} = i, q_{t−2} = k, …, q_1 = m]

usually we use a first-order Markov Model:

    P[q_t = j | q_{t−1} = i] = a_ij

first-order assumption: transition probabilities depend only on the previous state (and time)
a_ij obeys the usual rules:

    a_ij ≥ 0   and   Σ_j a_ij = 1 for every state i

(the sum of the probabilities leaving a state = 1; we must leave a state)

18 What is a Markov Model?
Transition Probabilities: example:
[figure: three states S_1, S_2, S_3 with the transition probabilities listed below]
a_11 = 0.0   a_12 = 0.5   a_13 = 0.5   a_1,Exit = 0.0   Σ = 1.0
a_21 = 0.0   a_22 = 0.7   a_23 = 0.3   a_2,Exit = 0.0   Σ = 1.0
a_31 = 0.0   a_32 = 0.0   a_33 = 0.0   a_3,Exit = 1.0   Σ = 1.0
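A quick sketch that checks the row-sum rule for exactly this table (the exit probability is included in each row):

    # Rows of the example above: [a_i1, a_i2, a_i3, a_iExit]
    A = [
        [0.0, 0.5, 0.5, 0.0],   # from S1
        [0.0, 0.7, 0.3, 0.0],   # from S2
        [0.0, 0.0, 0.0, 1.0],   # from S3 (always exits)
    ]
    for i, row in enumerate(A, start=1):
        assert abs(sum(row) - 1.0) < 1e-12   # probabilities leaving each state sum to 1
        print(f"S{i}: sum = {sum(row)}")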

19 What is a Markov Model?
Transition Probabilities: probability distribution function:
[figure: three states S_1, S_2, S_3; in this example the self-loop on S_2 is a_22 = 0.4, so the probability of leaving S_2 at each step is 0.6]
p(being in state S_2 exactly 1 time)  = 0.6
p(being in state S_2 exactly 2 times) = 0.4 · 0.6 = 0.24
p(being in state S_2 exactly 3 times) = 0.4 · 0.4 · 0.6 = 0.096
p(being in state S_2 exactly 4 times) = 0.4 · 0.4 · 0.4 · 0.6 = 0.0384
⇒ exponential decay (characteristic of Markov Models)

20 What is a Markov Model?
Transition Probabilities:
with a_22 = 0.9:
p(being in state S_2 exactly 1 time)  = 0.1
p(being in state S_2 exactly 2 times) = 0.9 · 0.1 = 0.09
p(being in state S_2 exactly 3 times) = 0.9 · 0.9 · 0.1 = 0.081
p(being in state S_2 exactly 5 times) = 0.9 · 0.9 · ... · 0.1 = 0.0656
[figure: curves of prob. of being in state vs. length of time in same state, for a_22 = 0.9, a_22 = 0.7, and a_22 = 0.5 (note: in the graph, there is no multiplication by a_23)]
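The enumerated values on these two slides follow the geometric pattern p(exactly d time steps in a state) = a_ii^(d−1) · (1 − a_ii); a small sketch that reproduces them:

    def p_exactly_d(a_ii, d):
        # stay for d-1 self-loops, then leave with probability (1 - a_ii)
        return (a_ii ** (d - 1)) * (1.0 - a_ii)

    for a_ii in (0.4, 0.9):
        print(a_ii, [round(p_exactly_d(a_ii, d), 4) for d in range(1, 6)])
    # a_22 = 0.4 -> 0.6, 0.24, 0.096, 0.0384, 0.0154   (exponential/geometric decay)
    # a_22 = 0.9 -> 0.1, 0.09, 0.081, 0.0729, 0.0656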

21 What is a Markov Model?
Transition Probabilities:
can construct a second-order Markov Model:

    P[q_t = j | q_{t−1} = i, q_{t−2} = k]

[figure: three states S_1, S_2, S_3 where each arc carries a different probability for each previous-previous state, e.g. q_{t−2} = S_1: 0.3, q_{t−2} = S_2: 0.15, q_{t−2} = S_3: 0.25, and so on]

22 What is a Markov Model?
Initial Probabilities:
probabilities of starting in each state at time 1, denoted by π_j:

    π_j = P[q_1 = j],   1 ≤ j ≤ N

23 What is a Markov Model?
Example 1: Single Fair Coin
[figure: two states S_1 and S_2, with every transition probability equal to 0.5]
S_1 corresponds to e_1 = Heads:   a_11 = 0.5   a_12 = 0.5
S_2 corresponds to e_2 = Tails:   a_21 = 0.5   a_22 = 0.5
Generate events: H T H H T H T T T H H
corresponds to state sequence S_1 S_2 S_1 S_1 S_2 S_1 S_2 S_2 S_2 S_1 S_1
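A short generator sketch for this kind of model (the uniform initial distribution is an assumption; the slide does not state one):

    import random

    states = ["S1", "S2"]
    emit = {"S1": "H", "S2": "T"}              # S1 emits Heads, S2 emits Tails
    pi = {"S1": 0.5, "S2": 0.5}                # assumed uniform start
    A = {"S1": {"S1": 0.5, "S2": 0.5},
         "S2": {"S1": 0.5, "S2": 0.5}}

    def generate(T):
        # generate T events; at each time step the occupied state emits its event
        q = random.choices(states, weights=[pi[s] for s in states])[0]
        out = [emit[q]]
        for _ in range(T - 1):
            q = random.choices(states, weights=[A[q][s] for s in states])[0]
            out.append(emit[q])
        return "".join(out)

    print(generate(11))    # e.g. "HTHHTHTTTHH"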

24 What is a Markov Model?
Example 2: Single Biased Coin (outcome depends on previous result)
[figure: two states S_1 and S_2 with the transition probabilities listed below]
S_1 corresponds to e_1 = Heads:   a_11 = 0.7   a_12 = 0.3
S_2 corresponds to e_2 = Tails:   a_21 = 0.4   a_22 = 0.6
Generate events: H H H T T T H H H T T H
corresponds to state sequence S_1 S_1 S_1 S_2 S_2 S_2 S_1 S_1 S_1 S_2 S_2 S_1

25 What is a Markov Model?
Example 3: Portland Winter Weather
[figure: three states S_1 (rain), S_2 (clouds), S_3 (sun) with transition probabilities between them; the numeric values are used on the next two slides]

26 What is a Markov Model?
Example 3: Portland Winter Weather (con’t)
S_1 = event 1 = rain      π_1 = 0.5
S_2 = event 2 = clouds    π_2 = 0.4
S_3 = event 3 = sun       π_3 = 0.1
A = {a_ij} = [3×3 transition-probability matrix shown in the original figure]
what is the probability of {rain, rain, rain, clouds, sun, clouds, rain}?
Obs. = {r, r, r, c, s, c, r}
S    = {S_1, S_1, S_1, S_2, S_3, S_2, S_1}
time = {1, 2, 3, 4, 5, 6, 7} (days)
P = P[S_1] P[S_1|S_1] P[S_1|S_1] P[S_2|S_1] P[S_3|S_2] P[S_2|S_3] P[S_1|S_2]
  = 0.5 · 0.7 · 0.7 · 0.25 · 0.1 · 0.7 · 0.4 ≈ 0.0017
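The probability of the whole sequence is the initial probability of its first state times the product of the transitions actually taken; a sketch using only the values that appear in the product above:

    # Initial probabilities and the transition entries used in this example
    pi = {"S1": 0.5, "S2": 0.4, "S3": 0.1}
    a = {("S1", "S1"): 0.7, ("S1", "S2"): 0.25, ("S2", "S3"): 0.1,
         ("S3", "S2"): 0.7, ("S2", "S1"): 0.4}

    seq = ["S1", "S1", "S1", "S2", "S3", "S2", "S1"]   # rain, rain, rain, clouds, sun, clouds, rain
    p = pi[seq[0]]
    for prev, cur in zip(seq, seq[1:]):
        p *= a[(prev, cur)]
    print(p)    # 0.5 * 0.7 * 0.7 * 0.25 * 0.1 * 0.7 * 0.4, about 0.0017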

27 What is a Markov Model?
Example 3: Portland Winter Weather (con’t)
S_1 = event 1 = rain      π_1 = 0.5
S_2 = event 2 = clouds    π_2 = 0.4
S_3 = event 3 = sunny     π_3 = 0.1
A = {a_ij} = [same 3×3 transition-probability matrix]
what is the probability of {sun, sun, sun, rain, clouds, sun, sun}?
Obs. = {s, s, s, r, c, s, s}
S    = {S_3, S_3, S_3, S_1, S_2, S_3, S_3}
time = {1, 2, 3, 4, 5, 6, 7} (days)
P = P[S_3] P[S_3|S_3] P[S_3|S_3] P[S_1|S_3] P[S_2|S_1] P[S_3|S_2] P[S_3|S_3]
  = 0.1 · 0.1 · 0.1 · 0.2 · 0.25 · 0.1 · 0.1 = 5.0×10⁻⁷

28 What is a Markov Model?
Example 4: Marbles in Jars (lazy person)
[figure: three jars of marbles (Jar 1, Jar 2, Jar 3), modeled as states S_1, S_2, S_3 with transition probabilities between them]
(assume unlimited number of marbles)

29 What is a Markov Model?
Example 4: Marbles in Jars (con’t)
S_1 = event 1 = black    π_1 = 0.33
S_2 = event 2 = white    π_2 = 0.33
S_3 = event 3 = grey     π_3 = 0.33
A = {a_ij} = [3×3 transition-probability matrix shown in the original figure]
what is the probability of {grey, white, white, black, black, grey}?
Obs. = {g, w, w, b, b, g}
S    = {S_3, S_2, S_2, S_1, S_1, S_3}
time = {1, 2, 3, 4, 5, 6}
P = P[S_3] P[S_2|S_3] P[S_2|S_2] P[S_1|S_2] P[S_1|S_1] P[S_3|S_1]
  = 0.33 · 0.3 · 0.6 · 0.2 · 0.6 · 0.1 ≈ 0.00071

30 What is a Markov Model?
Example 4A: Marbles in Jars
[figure: the same three jars modeled by two different Markov Models: the “lazy” model from Example 4 and a “random” model in which the probabilities are all roughly equal]
Same data, two different models... “lazy” vs. “random”

31 What is a Markov Model?
Example 4A: Marbles in Jars
What is the probability of {w, g, b, b, w} given each model (“lazy” and “random”)?
S = {S_2, S_3, S_1, S_1, S_2}    time = {1, 2, 3, 4, 5}
“lazy”:   P = P[S_2] P[S_3|S_2] P[S_1|S_3] P[S_1|S_1] P[S_2|S_1] = 0.33 · 0.2 · 0.1 · 0.6 · 0.3 ≈ 0.0012
“random”: P = 0.33 · 0.33 · 0.33 · 0.33 · 0.33 ≈ 0.0039
⇒ {w, g, b, b, w} has greater probability if generated by “random.”
⇒ the “random” model is more likely to have generated {w, g, b, b, w}.
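The comparison is just two products of the factors listed above; a minimal sketch:

    def sequence_prob(pi_first, transitions):
        # initial probability of the first state times the transition probabilities taken
        p = pi_first
        for t in transitions:
            p *= t
        return p

    # Factors exactly as listed on the slide for the state sequence S2, S3, S1, S1, S2
    p_lazy   = sequence_prob(0.33, [0.2, 0.1, 0.6, 0.3])
    p_random = sequence_prob(0.33, [0.33, 0.33, 0.33, 0.33])
    print(p_lazy, p_random)      # about 0.0012 vs about 0.0039
    print(p_random > p_lazy)     # True: "random" is more likely to have generated the data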

32 What is a Markov Model?
Notes:
Independence is assumed between events that are separated by more than one time frame, when computing the probability of a sequence of events (for a first-order model).
Given a list of observations, we can determine the exact state sequence that generated those observations. ⇒ the state sequence is not hidden.
Each state is associated with only one event (output).
Computing the probability, given a set of observations and a model, is straightforward.
Given multiple Markov Models and an observation sequence, it’s easy to determine the M.M. most likely to have generated the data.