Carnegie Mellon
Evidence-Specific Structures for Rich Tractable CRFs
Anton Chechetka, Carlos Guestrin

Motivation

We want conditional models of a structured Query given Evidence, P(Query | Evidence):
- Webpage classification: webpage text + links → page types (professor, student, project, ...)
- Face recognition in image collections: collection of images + face similarities → face labels
- Collaborative filtering

Conditional Random Fields

A CRF defines
$P(Q \mid E) = \frac{1}{Z(E)} \exp\Big( \sum_i w_i f_i(Q, E) \Big)$
with features $f_i$, weights $w_i$, and normalization $Z(E)$. The features induce a structure over the query variables Q: a pairwise feature $f_{12}$ connects $Q_1$ and $Q_2$, $f_{34}$ connects $Q_3$ and $Q_4$, and so on. Features can be arbitrarily correlated, the objective is convex with a unique global optimum, and the gradient is intuitive ("feature minus expected feature"):
$\frac{\partial \log P(q \mid e)}{\partial w_i} = f_i(q, e) - \mathbb{E}_{P(Q \mid e)}\big[ f_i(Q, e) \big]$
but computing the expected feature requires inference in the induced model.

Model Structure Tradeoffs

Exact inference is #P-complete and approximate inference is NP-complete: hopeless in large dense models, easy for tree-structured models.

Dense models:
+ capture complex dependencies
+ natural extensions to relational settings
- arbitrarily bad inference quality
- arbitrarily bad parameter quality

Tree models:
+ efficient exact inference
+ efficient learning of optimal parameters
- simple dependencies only
- relational settings are not tree-structured

This work: keep efficient exact inference and optimal parameters, while enabling rich dependencies and relational extensions.

Our Approach: Evidence-Specific Structures

Intuition: edge importance depends on the evidence. For example, "battery is good" and "engine starts" are dependent in general, but given the specific evidence E = {gas tank is empty} the engine will not start regardless of the battery, so for this evidence value there is no dependence.

CRF with Evidence-Specific Structure: Formalism

fixed dense model × evidence-specific tree "mask" = evidence-specific model

Capture all potential dependencies in a dense model, then select tree structures, based on evidence, to capture the most important dependencies for each evidence value:
$P(Q \mid E, u, w) = \frac{1}{Z(E, u, w)} \exp\Big( \sum_i w_i f_i(Q, E)\, \mathbb{1}\big[ f_i \text{ conforms to } T(E, u) \big] \Big)$
where the $w_i f_i$ are the standard weighted features, the indicator supplies the evidence-specific structure, and $T(E, u)$ encodes the output of a structure selection algorithm with parameters u.

- Global perspective on structure selection: being a tree is a global property, so looking at one edge at a time is not enough; selecting an appropriate algorithm makes it easy to guarantee tree structure.
- The objective is still convex in w (but not in u).
- Efficient exact inference; efficient learning of the optimal parameters w.
- A much richer class of models than fixed trees (potential for capturing complex correlations).
- Structure selection is decoupled from feature design and weights (an arbitrarily dense model can serve as the basis).

Learning Good Evidence-Specific Trees

Directly generalize existing algorithms for the no-evidence case: pairwise marginals $P(Q_i, Q_j)$ plus the Chow-Liu algorithm yield the optimal tree for $P(Q)$; pairwise conditionals $P(Q_i, Q_j \mid E{=}e)$ plus the Chow-Liu algorithm yield a good tree for $P(Q \mid E{=}e)$.

Train stage: decompose the original high-dimensional problem over (E, Q) into low-dimensional pairwise problems over (E, Q1, Q2), (E, Q1, Q3), (E, Q3, Q4), ..., and learn pairwise conditional estimators; their parameters form u.

Test stage (evidence-specific Chow-Liu algorithm): instantiate the evidence in the pairwise estimators, compute mutual information values to use as edge weights, and return the maximum spanning tree (see the first sketch after the algorithm summary below).

Learning Optimal Feature Weights

Because T(e, u) is a tree, the convex objective and its gradient can be computed exactly, and L-BFGS or conjugate gradient finds the unique global optimum with respect to w exactly. The gradient is similar to standard CRFs, with sparsity that conforms to the evidence-specific structure: individual datapoints yield tree-sparse gradients (with different evidence-dependent sparsity patterns), while the overall dataset yields a dense but still tractable gradient. The structure-related parameters u are fixed from the tree-learning step (see the second sketch below).

Learning an ESS-CRF Model: Algorithm

1. Choose features f.
2. Choose a tree learning algorithm T(E, ·).
3. Learn u.
4. Select evidence-specific trees T(e_i, u) for every datapoint (E = e_i, Q = q_i) [u is fixed at this stage].
5. Given u and the trees T(e_i, u), learn w [L-BFGS, etc.].
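The test-stage tree selection admits a compact implementation. Below is a minimal sketch, assuming binary query variables and pairwise estimators exposed as callables returning an estimated joint P(Q_i, Q_j | E=e); the function name and interface are illustrative, not from the poster.

```python
import numpy as np
import networkx as nx

def evidence_specific_chow_liu(evidence, pairwise_estimators):
    """Evidence-specific Chow-Liu: instantiate the evidence in the pairwise
    conditional estimators, score each candidate edge by mutual information,
    and return the maximum spanning tree as the structure T(e, u)."""
    graph = nx.Graph()
    for (i, j), estimate in pairwise_estimators.items():
        p_ij = estimate(evidence)   # estimated P(Q_i, Q_j | E=e), shape (2, 2)
        p_i = p_ij.sum(axis=1)      # marginal P(Q_i | E=e)
        p_j = p_ij.sum(axis=0)      # marginal P(Q_j | E=e)
        # Mutual information I(Q_i; Q_j | E=e) serves as the edge weight.
        mi = sum(p_ij[a, b] * np.log(p_ij[a, b] / (p_i[a] * p_j[b]))
                 for a in range(2) for b in range(2) if p_ij[a, b] > 0)
        graph.add_edge(i, j, weight=mi)
    return list(nx.maximum_spanning_tree(graph).edges())
```

At train time, each entry of pairwise_estimators would be fit on (e, q_i, q_j) samples from the corresponding low-dimensional problem (e.g. a small logistic model); the collection of their parameters plays the role of u.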
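Step 5 can be sketched similarly. The helpers feature_vector and tree_inference below are assumptions standing in for masked feature extraction and exact belief propagation on the selected tree; only the overall shape (convex objective, "feature minus expected feature" gradient, L-BFGS) comes from the poster.

```python
import numpy as np
from scipy.optimize import minimize

def negative_log_likelihood(w, data, trees, feature_vector, tree_inference):
    """Exact objective and gradient for fixed evidence-specific trees.
    feature_vector(q, e, tree) -> features masked to T(e, u);
    tree_inference(w, e, tree) -> (log Z, expected features), computed
    exactly because each instantiated model is a tree."""
    nll, grad = 0.0, np.zeros_like(w)
    for (e, q), tree in zip(data, trees):
        f_obs = feature_vector(q, e, tree)         # tree-sparse per datapoint
        log_z, f_exp = tree_inference(w, e, tree)  # exact inference on a tree
        nll -= w @ f_obs - log_z                   # negative log-likelihood term
        grad -= f_obs - f_exp                      # "feature minus expected feature"
    return nll, grad

# Unique global optimum w.r.t. w, since the objective is convex:
# result = minimize(negative_log_likelihood, w0, jac=True, method="L-BFGS-B",
#                   args=(data, trees, feature_vector, tree_inference))
```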
Fewer Sources of Errors

Stage                     Dense CRFs                    ESS-CRFs (this work)
Structure selection       approximate                   approximate
Feature weight learning   approximate (no guarantees)   exact
Test-time inference       approximate (no guarantees)   exact

Relational Extensions

General approach:
1. Ground the model / features.
2. Use standard ESS-CRFs + parameter sharing.

Parameters are shared for both w and u: one weight per relation, not per grounding (see the sketch below). Parameter dimensionality is therefore independent of model size, which reduces overfitting. Structure selection happens only after grounding, so there is no need to worry about the structure being a tree at the relational level.
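A minimal sketch of the parameter-sharing idea, with hypothetical relation names and a toy scoring function; nothing here beyond the weight tying itself is from the poster.

```python
def shared_weight_score(groundings, feature, w):
    """Score a grounded model in which all edges of the same relation share
    one weight, so dim(w) grows with the number of relations, not groundings.
    groundings: relation name -> list of grounded edges (i, j)."""
    total = 0.0
    for relation, edges in groundings.items():
        for (i, j) in edges:
            total += w[relation] * feature(relation, i, j)  # same w per relation
    return total

# Example: thousands of grounded edges, but only two weights to learn.
w = {"links-to": 0.0, "similar-face": 0.0}
```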
Results

Face recognition [with Denver Dash and Matthai Philipose]: exploit face similarities to propagate labels in collections of images with a semi-supervised relational model; 250 to 1700 images, 4 to 24 unique people per collection. Compared against dense discriminative models: equal or better accuracy, about 100 times faster.

WebKB [data + features thanks to Ben Taskar]: webpage text + links → page type (student, project, ...); the same accuracy as dense models, about 10 times faster.

Acknowledgements: this work has been supported by NSF Career award IIS-0644225 and by ARO MURI W911NF0710287 and W911NF0810242.