Conditional Random Fields (Probabilistic Graphical Models, Representation: Markov Networks)
Motivation: observed variables X, target variables Y
CRF Representation
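The equations for this slide did not survive extraction. As a reminder, the usual textbook form of a CRF is sketched below. The symbols (factors \phi_i with scopes D_i, unnormalized measure \tilde{P}_\Phi, partition function Z_\Phi) are assumed notation, not taken from the slide.

```latex
\tilde{P}_\Phi(X, Y) = \prod_{i=1}^{k} \phi_i(D_i), \qquad
Z_\Phi(X) = \sum_{Y} \tilde{P}_\Phi(X, Y), \qquad
P_\Phi(Y \mid X) = \frac{1}{Z_\Phi(X)} \, \tilde{P}_\Phi(X, Y)
```

The product of factors is the same as in a Gibbs distribution. The difference is that the normalizer Z_\Phi(X) sums over Y only, so there is a separate normalizing constant for each value of X.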
CRFs and Logistic Model: draw the structure, show the conditional distribution
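A minimal sketch of the connection to the logistic model, assuming binary X_i and Y and pairwise factors phi_i(X_i, Y) = exp(w_i * 1{X_i = 1, Y = 1}). The function names and the weights are hypothetical and chosen only for illustration. Normalizing over Y alone yields the sigmoid of the weighted sum.

```python
import itertools
import math


def crf_conditional(x, w):
    """P(Y=1 | x) for a CRF with factors phi_i(x_i, y) = exp(w_i * x_i * y)."""
    def unnormalized(y):
        # Product of factors = exp of the sum of the active weights.
        return math.exp(sum(w_i * x_i * y for w_i, x_i in zip(w, x)))

    # The partition function depends on x: it sums over Y only.
    z_x = unnormalized(0) + unnormalized(1)
    return unnormalized(1) / z_x


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


w = [0.5, -1.2, 2.0]  # hypothetical weights
max_gap = max(
    abs(crf_conditional(x, w) - sigmoid(sum(w_i * x_i for w_i, x_i in zip(w, x))))
    for x in itertools.product([0, 1], repeat=len(w))
)
print(max_gap)  # difference between the CRF conditional and the sigmoid
```

The unnormalized measure for Y=0 is always 1, so the ratio reduces to exp(s) / (1 + exp(s)) with s the sum of w_i x_i, which is the logistic function.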
CRFs for Language. Features: word capitalized, word in atlas or name list, previous word is “Mrs”, next word is “Times”, …
More CRFs for Language: different chains can use different features
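Features like those listed above can be sketched as a small function over a word sequence. The function name, the dictionary keys, and the example name list are hypothetical. In a real tagger each feature would be paired with a label and a learned weight.

```python
def word_features(words, i, name_list=frozenset()):
    """Binary features of position i, computed from the observed words only."""
    word = words[i]
    return {
        "capitalized": word[:1].isupper(),
        "in_name_list": word in name_list,
        "prev_is_Mrs": i > 0 and words[i - 1] == "Mrs",
        "next_is_Times": i + 1 < len(words) and words[i + 1] == "Times",
    }


sentence = ["Mrs", "Green", "reads", "the", "New", "York", "Times"]
names = frozenset({"Green"})  # hypothetical name list
feats_green = word_features(sentence, 1, names)
feats_york = word_features(sentence, 5, names)
```

Because the CRF conditions on the words rather than modeling them, these features may overlap and be strongly correlated without having to encode their dependencies in the model.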
Summary: A CRF is parameterized the same as a Gibbs distribution, but normalized differently. We don’t need to model the distribution over variables we don’t care about. This allows models with highly expressive features, without worrying about wrong independencies.
The Chain Rule for Bayesian Nets. [Figure: Bayesian network over Difficulty (D), Intelligence (I), Grade (G), Letter (L), and SAT (S), with a CPD table for each node.] P(D,I,G,S,L) = P(D) P(I) P(G | I,D) P(L | G) P(S | I)
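The factorization above can be sketched as a product of table lookups. The CPD entries below are assumptions for illustration: the slide's tables did not survive extraction intact, so these values should not be read as the slide's own.

```python
from itertools import product

# Hypothetical CPDs (illustrative values, not recovered from the slide).
P_D = {"d0": 0.6, "d1": 0.4}
P_I = {"i0": 0.7, "i1": 0.3}
P_G = {  # P(G | I, D)
    ("i0", "d0"): {"g1": 0.3, "g2": 0.4, "g3": 0.3},
    ("i0", "d1"): {"g1": 0.05, "g2": 0.25, "g3": 0.7},
    ("i1", "d0"): {"g1": 0.9, "g2": 0.08, "g3": 0.02},
    ("i1", "d1"): {"g1": 0.5, "g2": 0.3, "g3": 0.2},
}
P_L = {  # P(L | G)
    "g1": {"l0": 0.1, "l1": 0.9},
    "g2": {"l0": 0.4, "l1": 0.6},
    "g3": {"l0": 0.99, "l1": 0.01},
}
P_S = {  # P(S | I)
    "i0": {"s0": 0.95, "s1": 0.05},
    "i1": {"s0": 0.2, "s1": 0.8},
}


def joint(d, i, g, s, l):
    """P(D,I,G,S,L) = P(D) P(I) P(G | I,D) P(L | G) P(S | I)."""
    return P_D[d] * P_I[i] * P_G[(i, d)][g] * P_L[g][l] * P_S[i][s]


# Summing the joint over every assignment should give 1.
total = sum(
    joint(d, i, g, s, l)
    for d, i, g, s, l in product(P_D, P_I, ["g1", "g2", "g3"], ["s0", "s1"], ["l0", "l1"])
)
```

Each factor in the product is one node's CPD evaluated at that node's value and its parents' values, which is all the chain rule for Bayesian networks requires.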
Suppose θ is at a local minimum of a function J(θ). What will one iteration of gradient descent do? Options: Leave θ unchanged. Change θ in a random direction. Move θ towards the global minimum of J(θ). Decrease θ.
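The question can be checked numerically. The sketch below uses a hypothetical objective J(θ) = θ⁴/4 + θ³/3 − θ², chosen because it has a local (not global) minimum at θ = 1 and a global minimum at θ = −2. The function names and the learning rate are also assumptions.

```python
def J(theta):
    # Hypothetical objective: local minimum at theta = 1, global minimum at theta = -2.
    return theta**4 / 4 + theta**3 / 3 - theta**2


def grad_J(theta):
    # dJ/dtheta = theta^3 + theta^2 - 2*theta = theta * (theta - 1) * (theta + 2)
    return theta**3 + theta**2 - 2 * theta


def gradient_step(theta, alpha):
    """One iteration of gradient descent with learning rate alpha."""
    return theta - alpha * grad_J(theta)


at_local_min = gradient_step(1.0, alpha=0.1)  # gradient is zero here
nearby = gradient_step(0.5, alpha=0.1)        # gradient is negative here
```

At θ = 1 the derivative is zero, so the update subtracts nothing and θ is left unchanged, even though J(−2) is lower. Starting from θ = 0.5 instead, the step moves θ towards the nearby local minimum.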
Fig. A corresponds to α=0.01, Fig. B to α=0.1, Fig. C to α=1.