# Introduction to Conditional Random Fields (John Osborne, Sept 4, 2009)



## Overview

- Useful Definitions
- Background
  - HMM
  - MEMM
- Conditional Random Fields
  - Statistical and Graph Definitions
- Computation (Training and Inference)
- Extensions
  - Bayesian Conditional Random Fields
  - Hierarchical Conditional Random Fields
  - Semi-CRFs
- Future Directions

## Useful Definitions

- Random Field (Wikipedia): in probability theory, let S = {X_1, ..., X_n}, with each X_i taking values in {0, 1, ..., G − 1}, be a set of random variables on the sample space Ω = {0, 1, ..., G − 1}^n. A probability measure π is a random field if π(ω) > 0 for all ω in Ω.
- Markov Process (a chain if the sequence is finite): a stochastic process with the Markov property
- Markov Property: the probability that a random variable assumes a value depends on the other random variables only through its immediate neighbors ("memoryless")
- Hidden Markov Model (HMM): a Markov model where the current state is unobserved
- Viterbi Algorithm: a dynamic programming technique for finding the most likely sequence of hidden states that explains the observations in an HMM, i.e. for determining the labels
- Potential Function == Feature Function: in a CRF, the potential function scores the compatibility of y_t, y_{t-1}, and w_t(X) (see the sketch below)
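To make the last definition concrete, here is a minimal sketch of linear-chain CRF feature functions in Python (illustrative only, not from the slides; the GENE/OTHER labels, the `window` helper, and the example weights are assumptions for the example):

```python
# Illustrative sketch of CRF feature/potential functions (not the presenter's code).
# Each feature scores the compatibility of the current label y_t, the previous
# label y_prev, and a local observation window w_t(X).

def window(X, t, size=1):
    """The observation window w_t(X): tokens from position t - size to t + size."""
    return X[max(0, t - size): t + size + 1]

def f_transition(y_prev, y_t, X, t):
    """Fires when a GENE label directly follows an OTHER label."""
    return 1.0 if (y_prev == "OTHER" and y_t == "GENE") else 0.0

def f_observation(y_prev, y_t, X, t):
    """Fires when some token in the window is capitalized and the label is GENE."""
    return 1.0 if (y_t == "GENE" and any(tok[0].isupper() for tok in window(X, t))) else 0.0

def potential(y_prev, y_t, X, t, weights):
    """Local potential psi(y_t, y_{t-1}, w_t(X)): a weighted sum of feature values."""
    features = [f_transition, f_observation]
    return sum(w * f(y_prev, y_t, X, t) for w, f in zip(weights, features))

# Score one position of a toy sequence.
X = ["the", "BRCA1", "gene"]
print(potential("OTHER", "GENE", X, 1, weights=[0.5, 1.2]))  # 0.5*1.0 + 1.2*1.0 = 1.7
```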

## Background

- Interest in CRFs arose from Richa's work with gene expression
- Current literature shows CRFs performing better on NLP tasks than other commonly used approaches such as Support Vector Machines (SVMs), neural networks, and HMMs
- Term coined by Lafferty in 2001
- Predecessors were HMMs and maximum entropy Markov models (MEMMs)

## HMM – Definition

- A Markov model where the current state is unobserved
- Generative model
- Examining all of the input X would be prohibitive, hence the Markov property: only the current element of the sequence is considered
- No multiple interacting features or long-range dependencies
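To spell out the generative/discriminative contrast referenced here (a standard textbook formulation, restated rather than taken from the slides), an HMM models the joint distribution of observations and labels, factored using the Markov property, while the CRF introduced later models only the conditional distribution:

```latex
\underbrace{\;p(X, Y) = \prod_{t} p(y_t \mid y_{t-1})\, p(x_t \mid y_t)\;}_{\text{HMM: generative (joint)}}
\qquad \text{versus} \qquad
\underbrace{\;p(Y \mid X)\;}_{\text{CRF: discriminative (conditional)}}
```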

## MEMMs

- McCallum et al., 2000
- Non-generative finite-state model based on a next-state classifier
- Directed graph
- P(Y|X) = ∏_t P(y_t | y_{t-1}, w_t(X)), where w_t(X) is a sliding window over the X sequence
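Each factor in this product is a locally normalized exponential model over the next state (the standard MEMM form, written here with generic feature functions f_k and weights λ_k rather than the slide's notation), which is exactly what gives rise to the label bias problem on the next slide:

```latex
P(y_t \mid y_{t-1}, w_t(X)) \;=\;
\frac{\exp\!\big(\sum_k \lambda_k f_k(y_t,\, y_{t-1},\, w_t(X))\big)}
     {\sum_{y'} \exp\!\big(\sum_k \lambda_k f_k(y',\, y_{t-1},\, w_t(X))\big)}
```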

## Label Bias Problem

- Transitions leaving a given state compete only against each other, rather than against all other transitions in the model
- Implies "conservation of score mass" (Bottou, 1991)
- Observations can be ignored, and Viterbi decoding cannot downgrade a branch
- CRFs solve this problem by using a single exponential model for the joint probability of the ENTIRE SEQUENCE OF LABELS given the observation sequence

## Big Picture Definition

- Wikipedia definition (Aug 2009): a conditional random field (CRF) is a type of discriminative probabilistic model most often used for the labeling or parsing of sequential data, such as natural language text or biological sequences
- A probabilistic model is a statistical model, in mathematical terms "a pair (Y, P) where Y is the set of possible observations and P the set of possible probability distributions on Y"
  - In statistical terms, the objective is to infer (or pick) the distinct element (probability distribution) in the set P given your observation Y
- A discriminative model models the conditional probability distribution P(y|x), which can predict y given x
  - It cannot do it the other way around (produce x from y), since it is not a generative model (capable of generating sample data given a model): it does not model a joint probability distribution
  - Similar to other discriminative models such as support vector machines and neural networks
- When analyzing sequential data, a conditional model specifies the probabilities of possible label sequences given an observation sequence

## CRF Graphical Definition

- Definition from Lafferty: an undirected graphical model
- Let G = (V, E) be a graph such that Y = (Y_v), v ∈ V, so that Y is indexed by the vertices of G. Then (X, Y) is a conditional random field when, conditioned on X, the random variables Y_v obey the Markov property with respect to the graph: p(Y_v | X, Y_w, w ≠ v) = p(Y_v | X, Y_w, w ~ v), where w ~ v means that w and v are neighbors in G
- (Figure: CRF undirected graph)
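For the linear-chain graphs used in sequence labeling, this definition specializes to a single exponential model with one global normalizer over entire label sequences (the standard form from the original paper, written with generic feature functions f_k and weights λ_k):

```latex
p(y \mid x) \;=\; \frac{1}{Z(x)} \exp\!\Big( \sum_{t} \sum_{k} \lambda_k f_k(y_t,\, y_{t-1},\, x,\, t) \Big),
\qquad
Z(x) \;=\; \sum_{y'} \exp\!\Big( \sum_{t} \sum_{k} \lambda_k f_k(y'_t,\, y'_{t-1},\, x,\, t) \Big)
```

Unlike the MEMM's per-transition normalizers, Z(x) sums over all possible label sequences, which is what removes the label bias problem.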

## Computation of CRF

- Training
  - Conditioning
  - Calculation of the feature function
  - P(Y|X) = (1/Z(X)) exp ∑_t ψ(y_t, y_{t-1}, w_t(X))
    - Z is the normalizing factor
    - The potential function is inside the parentheses
- Inference
  - Viterbi decoding (see the sketch below)
  - Approximate model averaging
  - Others?
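To make the inference step concrete, here is a minimal Viterbi decoding sketch for a linear-chain CRF (illustrative code, not the presenter's implementation; it assumes the local log-potentials ψ(y_{t-1}, y_t, X, t) are already available, for example from feature functions like those sketched earlier):

```python
# Illustrative Viterbi decoding for a linear-chain CRF (not the presenter's code).
def viterbi(X, labels, log_psi):
    """Return the most likely label sequence under the local log-potentials.

    X       : observation sequence
    labels  : list of possible labels
    log_psi : function(y_prev, y, X, t) -> log potential; y_prev is None at t = 0
    """
    T = len(X)
    # delta[t][y] = best log score of any labeling of positions 0..t that ends in y
    delta = [{y: log_psi(None, y, X, 0) for y in labels}]
    backptr = [{}]
    for t in range(1, T):
        delta.append({})
        backptr.append({})
        for y in labels:
            best_prev = max(labels, key=lambda yp: delta[t - 1][yp] + log_psi(yp, y, X, t))
            delta[t][y] = delta[t - 1][best_prev] + log_psi(best_prev, y, X, t)
            backptr[t][y] = best_prev
    # Trace back from the best final label.
    path = [max(labels, key=lambda y: delta[T - 1][y])]
    for t in range(T - 1, 0, -1):
        path.append(backptr[t][path[-1]])
    return list(reversed(path))

# Toy potential: prefer GENE on capitalized tokens, with a small bonus for staying in GENE.
def log_psi(y_prev, y, X, t):
    score = 1.0 if (y == "GENE") == X[t][0].isupper() else -1.0
    if y_prev == "GENE" and y == "GENE":
        score += 0.5
    return score

print(viterbi(["the", "BRCA1", "gene"], ["GENE", "OTHER"], log_psi))  # ['OTHER', 'GENE', 'OTHER']
```

Because decoding works with sums of log-potentials, the normalizer Z(X) cancels out and is not needed to find the best label sequence; it is only needed for training and for computing probabilities.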

## Training Approaches

CRFs use supervised learning, so they can be trained using:

- Maximum Likelihood (original paper)
  - Used an iterative scaling method; very slow
- Gradient Ascent
  - Also slow when done naively
- The Mallet implementation used the BFGS algorithm (http://en.wikipedia.org/wiki/BFGS)
  - Broyden-Fletcher-Goldfarb-Shanno: an approximate 2nd-order algorithm
- Stochastic Gradient Method (2006), accelerated via Stochastic Meta-Descent
- Gradient Tree Boosting (a variant of a 2001 method; http://jmlr.csail.mit.edu/papers/volume9/dietterich08a/dietterich08a.pdf)
  - Potential functions are sums of regression trees (decision trees with real-valued outputs)
  - Published in 2008; competitive with Mallet
- Bayesian (estimate the posterior probability)
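For a sense of what BFGS-family training looks like in practice, here is a minimal sketch using the later sklearn-crfsuite library with L-BFGS (this library is not mentioned in the slides, and the toy feature extraction and two-sentence training set are purely illustrative assumptions):

```python
# Minimal CRF training sketch with sklearn-crfsuite (pip install sklearn-crfsuite).
# Illustrative only: the features, labels, and data below are toy assumptions.
import sklearn_crfsuite

def token_features(sent, t):
    """Features for position t: the lowercased token plus a capitalization flag."""
    return {"word": sent[t].lower(), "is_upper": sent[t][0].isupper()}

def sent_features(sent):
    return [token_features(sent, t) for t in range(len(sent))]

train_sents = [["the", "BRCA1", "gene"], ["a", "TP53", "mutation"]]
train_labels = [["OTHER", "GENE", "OTHER"], ["OTHER", "GENE", "OTHER"]]

# L-BFGS optimization with L1/L2 regularization.
crf = sklearn_crfsuite.CRF(algorithm="lbfgs", c1=0.1, c2=0.1, max_iterations=100)
crf.fit([sent_features(s) for s in train_sents], train_labels)

print(crf.predict([sent_features(["the", "EGFR", "gene"])]))
```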

## Conditional Random Field Extensions: Semi-CRF

- Instead of assigning labels to each member of the sequence, labels are assigned to sub-sequences
- Advantage: "features for semi-CRFs can measure properties of segments, and transitions within a segment can be non-Markovian"
- http://www.cs.cmu.edu/~wcohen/postscript/semiCRF.pdf

## Bayesian CRF

- Qi et al., 2005: http://www.cs.purdue.edu/homes/alanqi/papers/Qi-Bayesian-CRF-AIstat05.pdf
- A replacement for the maximum likelihood (ML) training method of Lafferty
- Reduces over-fitting
- Uses the "Power EP" method

## Hierarchical CRF (HCRF)

- http://www.springerlink.com/content/r84055k2754464v5/
- http://www.cs.washington.edu/homes/fox/postscripts/places-isrr-05.pdf
- Applied to GPS motion data for surveillance and tracking, e.g. dividing a person's workday into labels such as work, travel, and sleep
- Less work has been done in this area

## Future Directions

- Less work has been done on conditional random fields in biology
  - PubMed hits: "Conditional Random Field" - 21; "Conditional Random Fields" - 43
  - CRF variants combined with promoter/regulatory elements show no hits
  - CRF and ontology show no hits
- Plan: implement a CRF in Java, apply it to biology problems, and try to find ways to extend it

## Useful Papers

- Original paper and a review paper: http://www.inference.phy.cam.ac.uk/hmw26/crf/
  - Review paper: http://www.inference.phy.cam.ac.uk/hmw26/papers/crf_intro.pdf
- Another review: http://www.cs.umass.edu/~mccallum/papers/crf-tutorial.pdf
- Review slides: http://www.cs.pitt.edu/~mrotaru/comp/nlp/Random%20Fields/Tutorial%20CRF%20Lafferty.pdf
- The boosting paper has a nice review: http://jmlr.csail.mit.edu/papers/volume9/dietterich08a/dietterich08a.pdf

