Some Probability Theory and Computational Models: A Short Overview
Basic Probability Theory
  We will only use discrete probability spaces over boolean events
  A probability distribution maps a set of events to [0,1]
    – P(A) is the probability that A is true
    – The fraction of “worlds” in which A holds (the “possible worlds” interpretation)
Axioms
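One common formulation of the axioms for boolean events is sketched below. The notation (true, false, \lor, \land) is an assumption made here for illustration; other equivalent formulations exist.

```latex
\begin{align*}
  & 0 \le P(A) \le 1 \\
  & P(\text{true}) = 1, \qquad P(\text{false}) = 0 \\
  & P(A \lor B) = P(A) + P(B) - P(A \land B)
\end{align*}
```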
Conditional Probability and Independence
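The standard definitions, written in the P(A) notation used earlier, are given here for reference. The event names A and B are placeholders.

```latex
\begin{align*}
  & P(A \mid B) = \frac{P(A \land B)}{P(B)}, \qquad P(B) > 0 \\
  & A \text{ and } B \text{ are independent} \iff P(A \land B) = P(A)\,P(B) \\
  & \text{equivalently, when } P(B) > 0: \quad P(A \mid B) = P(A)
\end{align*}
```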
Bayes Rule
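Bayes rule follows from writing P(A \land B) in two ways using the definition of conditional probability. The standard statement, with the denominator expanded over A and its negation, is:

```latex
\begin{align*}
  P(A \mid B) &= \frac{P(B \mid A)\,P(A)}{P(B)} \\
  P(B) &= P(B \mid A)\,P(A) + P(B \mid \lnot A)\,P(\lnot A)
\end{align*}
```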
Example
  Consider two “language models”, one of French and one of English
  Assume that the probability of observing a word w is
    – 0.01 in English text
    – 0.05 in French text
  Assume the numbers of English and French texts are roughly equal
  Given that w is observed, what is the probability that the text is French?
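The example can be worked through with Bayes rule. The sketch below assumes that "roughly equal" means a prior of 0.5 for each language and that French and English are the only two possibilities; the function name is made up for illustration.

```python
def posterior_french(p_w_given_fr, p_w_given_en, prior_fr=0.5):
    """P(French | w) by Bayes rule, assuming only two languages exist."""
    prior_en = 1.0 - prior_fr
    # P(w) = P(w | French) P(French) + P(w | English) P(English)
    evidence = p_w_given_fr * prior_fr + p_w_given_en * prior_en
    return p_w_given_fr * prior_fr / evidence


p = posterior_french(0.05, 0.01)
print(round(p, 4))  # 0.8333, i.e. 5/6
```

With equal priors the answer depends only on the ratio of the two likelihoods: 0.05 / (0.05 + 0.01) = 5/6.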
Some Computational Models
  Finite State Machines
  Context Free Grammars
  Probabilistic Variants
Finite State Machines
  States and transitions
  Symbols on transitions
  Acceptors vs. generators
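A minimal sketch of a finite state machine used as an acceptor. The machine itself is a hypothetical example (it accepts strings over {a, b} containing an even number of a's); the state names and the table layout are illustrative choices.

```python
# Hypothetical machine: accepts strings over {a, b} with an even number of 'a's.
TRANSITIONS = {
    ("even", "a"): "odd",
    ("even", "b"): "even",
    ("odd", "a"): "even",
    ("odd", "b"): "odd",
}
START = "even"
ACCEPTING = {"even"}


def accepts(string):
    """Follow one transition per input symbol; accept if we end in an accepting state."""
    state = START
    for symbol in string:
        key = (state, symbol)
        if key not in TRANSITIONS:
            return False  # no transition for this symbol: reject
        state = TRANSITIONS[key]
    return state in ACCEPTING


print(accepts("abba"))  # True
print(accepts("ab"))    # False
```

Read as a generator, the same table would be walked by choosing transitions and emitting their symbols instead of consuming them.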
Markov Chains
  Finite State Machines with transitions governed by probabilistic events
    – In conjunction with, or instead of, external input
  Markovian property: every transition is independent of the past, given the present state
    – The probability of following a path is the product of the probabilities of its individual transitions
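The path-probability rule can be sketched as follows. The two-state chain and its numbers are invented for illustration, and the function computes the probability of a path conditioned on its first state.

```python
import math

# Hypothetical chain: P[s][t] is the probability of moving from state s to state t.
P = {
    "sunny": {"sunny": 0.8, "rainy": 0.2},
    "rainy": {"sunny": 0.4, "rainy": 0.6},
}


def path_probability(path):
    """Product of the transition probabilities along `path`, given that it starts in path[0]."""
    return math.prod(P[a].get(b, 0.0) for a, b in zip(path, path[1:]))


# 0.8 * 0.2 * 0.6, about 0.096
print(path_probability(["sunny", "sunny", "rainy", "rainy"]))
```

Multiplying is only valid because of the Markovian property: each factor depends on the current state alone.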
Context Free Grammars
  Context Free Grammars (CFGs) are a more natural model for natural language
  Syntax rules are very easy to formulate using CFGs
  Provably more expressive than Finite State Machines
    – E.g. can check for balanced parentheses
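The balanced-parentheses language can be described by the grammar S → ( S ) S | ε. The sketch below is a small recursive recognizer for that particular grammar, not a general CFG parser; the function names are illustrative, and the input is assumed to contain only "(" and ")".

```python
def parse_S(s, i=0):
    """Derive the longest possible prefix of s[i:] from S; return the index just past it."""
    # Rule S -> ( S ) S
    if i < len(s) and s[i] == "(":
        j = parse_S(s, i + 1)
        if j < len(s) and s[j] == ")":
            return parse_S(s, j + 1)
    # Rule S -> epsilon (also the fallback when no matching ')' exists)
    return i


def balanced(s):
    """True if the whole string is derivable from S."""
    return parse_S(s) == len(s)


print(balanced("(()())"))  # True
print(balanced("(()"))     # False
```

The recursion depth tracks the nesting depth, which is exactly the unbounded counting that a machine with finitely many states cannot do.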
Context Free Grammars
  Non-terminals
  Terminals
  Production rules
    – V → w, where V is a non-terminal and w is a sequence of terminals and non-terminals
Context Free Grammars
  Can be used as acceptors
  Can be used as a generative model
  Similarly to the case of Finite State Machines
  How long can a string generated by a CFG be?
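A sketch of a CFG used as a generator. The grammar S → a S b | a b is a hypothetical example, and `derive` is an illustrative helper that performs a leftmost derivation from a list of rule choices. Because S can be rewritten to a form that contains S again, there is no upper bound on the length of a generated string.

```python
# Hypothetical grammar: S -> a S b | a b
RULES = {"S": [["a", "S", "b"], ["a", "b"]]}


def derive(choices, start="S"):
    """Leftmost derivation: each entry of `choices` picks the rule for the leftmost non-terminal."""
    form = [start]
    for choice in choices:
        # Index of the leftmost non-terminal (raises StopIteration if none is left).
        idx = next(k for k, sym in enumerate(form) if sym in RULES)
        form[idx:idx + 1] = RULES[form[idx]][choice]
    return form


print("".join(derive([0, 0, 1])))            # aaabbb
print(len("".join(derive([0] * 9 + [1]))))   # 20
```

Applying the recursive rule n times before the terminating rule yields a string of length 2(n + 1).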
Stochastic Context Free Grammar
  Non-terminals
  Terminals
  Production rules, each associated with a probability
    – V → w, where V is a non-terminal and w is a sequence of terminals and non-terminals
    – A Markovian property is typically assumed: the rule chosen to expand a non-terminal is independent of the rest of the derivation
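Under the independence assumption, the probability of a derivation is the product of the probabilities of the rules it uses. The sketch below uses a hypothetical grammar and invented probabilities; it also checks the usual convention that the probabilities of all rules for one non-terminal sum to 1.

```python
import math

# Hypothetical SCFG: each non-terminal maps to (right-hand side, probability) pairs.
SCFG = {
    "S": [(("a", "S", "b"), 0.3), (("a", "b"), 0.7)],
}


def is_normalized(grammar):
    """Check that the rule probabilities of every non-terminal sum to 1."""
    return all(
        math.isclose(sum(p for _, p in rules), 1.0) for rules in grammar.values()
    )


def derivation_probability(grammar, steps):
    """`steps` lists the (non_terminal, rule_index) pairs used in a derivation."""
    return math.prod(grammar[nt][i][1] for nt, i in steps)


# Derivation of "aaabbb": S -> a S b twice, then S -> a b.
prob = derivation_probability(SCFG, [("S", 0), ("S", 0), ("S", 1)])
print(prob)  # 0.3 * 0.3 * 0.7, about 0.063
```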
Chomsky Normal Form
  Every rule is of one of two forms
    – V → V1 V2, where V, V1, V2 are non-terminals
    – V → t, where V is a non-terminal and t is a terminal
  Every (S)CFG can be written in this form (as long as it does not generate the empty string)
  Makes designing many algorithms easier
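One algorithm that relies on this form is the CYK recognizer, which fills a table of the non-terminals deriving each substring. The sketch below is a minimal version under stated assumptions: the grammar (for strings of n a's followed by n b's, n ≥ 1) and the names `BINARY`, `LEXICAL`, and `cyk_accepts` are made up for illustration.

```python
from itertools import product

# Hypothetical CNF grammar: S -> A B | A X,  X -> S B,  A -> a,  B -> b
BINARY = {("A", "B"): {"S"}, ("A", "X"): {"S"}, ("S", "B"): {"X"}}
LEXICAL = {"a": {"A"}, "b": {"B"}}


def cyk_accepts(tokens, start="S"):
    """Return True if `tokens` can be derived from `start`."""
    n = len(tokens)
    if n == 0:
        return False  # this grammar cannot derive the empty string
    # table[i][j] holds the non-terminals that derive tokens[i..j] (inclusive).
    table = [[set() for _ in range(n)] for _ in range(n)]
    for i, tok in enumerate(tokens):
        table[i][i] = set(LEXICAL.get(tok, ()))  # rules V -> t
    for span in range(2, n + 1):
        for i in range(n - span + 1):
            j = i + span - 1
            for k in range(i, j):  # split point: [i..k] and [k+1..j]
                for left, right in product(table[i][k], table[k + 1][j]):
                    table[i][j] |= BINARY.get((left, right), set())  # rules V -> V1 V2
    return start in table[0][n - 1]


print(cyk_accepts("aabb"))  # True
print(cyk_accepts("aab"))   # False
```

Since every right-hand side has exactly two non-terminals or one terminal, each table cell only needs to consider a single split point at a time, which gives the familiar cubic running time in the length of the input.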