HC-Search: Learning Heuristics and Cost Functions for Structured Prediction
Janardhan Rao (Jana) Doppa, Alan Fern, and Prasad Tadepalli

Structured Prediction

Given: a set of structured input-output pairs of the form (x, y). For example:
- Handwriting recognition: x is an image of a handwritten word; y is its letter sequence (e.g., "s t r u c t u r e d").
- Image labeling: x is an image; y is a labeling of its pixels.
Learn: a function F : X → Y to make predictions on new inputs.
Evaluation: against a loss function L(x, y, F(x)) ∈ ℝ⁺ (Hamming loss, F1 score, B³ score, ...).
Note that the output set Y is generally exponentially large.

Existing Approaches: CRFs, Structured SVMs, ...

These learn the parameters of linear models: φ(x, y) is an n-dimensional feature vector over input-output pairs, w is an n-dimensional parameter vector, and prediction is
F(x) = argmin_{y ∈ Y} w · φ(x, y).
Key challenge: "argmin" inference, i.e., finding the minimum-scoring output in an exponentially large set of possible outputs. This is computationally hard (NP-hard) for all but the simplest dependency structures of the features φ(x, y). Approximate inference algorithms (e.g., loopy belief propagation) are generally used in practice; this violates the exact-inference assumptions of most learning algorithms, and the resulting behavior is not well understood.

Our Approach

Specify a time-bounded search architecture for "argmin" inference, and learn to do as well as we can within that architecture. Assumption: we can learn to do well within the selected architecture.

HC-Search Framework

Key elements:
- A search space over structured outputs.
- A heuristic function H to make the search efficient.
- A cost function C to score the potential outputs.
- A time-bounded search procedure to find low-cost outputs.
Search-based inference: run a time-bounded search procedure guided by the heuristic function H toward low-cost parts of the space, and return the least-cost output (under C) uncovered by the search (see the inference sketch below).
Properties: anytime predictions, and minimal restrictions on the complexity of the heuristic function and cost function.

HC-Search: Learning

Loss decomposition:
- ε: overall expected loss.
- ε_H: expected loss due to not generating the optimal output.
- ε_{C|H}: expected loss due to not picking the best generated output.
Key idea: greedy stage-wise minimization guided by the loss decomposition.
- Step 1 (heuristic training): Ĥ = argmin_{H ∈ 𝓗} ε_H.
- Step 2 (cost-function training): Ĉ = argmin_{C ∈ 𝓒} ε_{C|Ĥ}.

Heuristic function learning: conduct searches on the training examples using the true loss function as a heuristic, which is generally a good way to produce high-quality outputs. Key idea: learn a heuristic function that imitates the observed search behavior under the loss function. Illustration: ranking examples for greedy search are fed to a rank learner, which produces the heuristic function H (see the ranking-example sketch below).

Cost function learning: Key idea: learn to rank the outputs generated by the learned heuristic function Ĥ according to their losses (see the cost-training sketch below).

Experimental Results

Setup: greedy search in the Limited Discrepancy Search (LDS) space [Doppa et al., 2012]. HC-Search outperforms state-of-the-art algorithms, including C-Search, our prior approach that learns a single function C that also serves as the heuristic [Doppa et al., 2012].
Loss decomposition results: the selection loss ε_{C|H} contributes more to the overall loss for both C-Search and HC-Search, and the improvement of HC-Search over C-Search is due to the improvement in the selection loss. This clearly shows the advantage of separating the roles of the heuristic and cost function.

Take-Home Message

Viewing structured prediction as a search problem is useful. When applying it to new problems: design a "high quality" search space (e.g., the LDS space), pick a suitable search procedure, and use the HC-Search "loss decomposition" to train and debug (see the decomposition sketch below).
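To make search-based inference concrete, here is a minimal sketch of a greedy HC-Search inference loop in Python. This is not the authors' implementation: the callables `successors`, `H`, and `C`, and the use of an expansion budget as the time bound, are illustrative assumptions.

```python
def hc_search_inference(x, initial_output, successors, H, C, time_bound):
    """Greedy HC-Search inference loop (sketch, not the authors' code).

    Greedy search guided by the heuristic H explores the space for
    `time_bound` expansions; the cost function C scores every output
    visited, and the least-cost one found so far is returned (anytime).
    """
    current = initial_output
    best, best_cost = current, C(x, current)
    for _ in range(time_bound):
        children = successors(x, current)   # neighbors in the search space
        if not children:
            break
        # H only steers the search toward promising regions...
        current = min(children, key=lambda y: H(x, y))
        cost = C(x, current)
        # ...while C alone decides which uncovered output is returned.
        if cost < best_cost:
            best, best_cost = current, cost
    return best
```

Because the best output so far is tracked at every step, the loop can be stopped at any time and still return a prediction, which is the anytime property noted above.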
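The heuristic-learning step can be pictured as follows: run greedy search with the true loss as the heuristic, and record at each step that the chosen successor should outrank its siblings. A minimal sketch, assuming the same hypothetical `successors` interface, a `loss` callable, and a fixed search depth (none of these names are from the paper):

```python
def heuristic_ranking_examples(x, y_true, initial_output, successors, loss, depth):
    """Generate pairwise ranking examples for heuristic training (sketch).

    Runs greedy search using the true loss as the heuristic; at each
    step the chosen (lowest-loss) successor must be ranked better than
    its siblings. Returns (better_output, worse_output) pairs.
    """
    examples = []
    current = initial_output
    for _ in range(depth):
        children = successors(x, current)
        if not children:
            break
        # The step the loss-guided search takes...
        best = min(children, key=lambda y: loss(x, y_true, y))
        # ...must be scored below every sibling by the learned H.
        examples.extend((best, other) for other in children if other is not best)
        current = best  # imitate the loss-guided trajectory
    return examples
```

The resulting (better, worse) pairs can then be handed to any rank learner over joint features φ(x, y); the learned scorer plays the role of H.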
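Cost-function training similarly reduces to ranking the outputs that search under the learned Ĥ actually generates. A sketch under the same hypothetical interfaces:

```python
def cost_ranking_examples(x, y_true, generated_outputs, loss):
    """Pairwise examples for cost-function training (sketch).

    C is trained to rank the best (lowest-loss) outputs among those
    generated by the learned heuristic below all worse ones, so that
    returning the argmin-cost output picks the best available one.
    """
    scored = [(loss(x, y_true, y), y) for y in generated_outputs]
    best_loss = min(l for l, _ in scored)
    bests = [y for l, y in scored if l == best_loss]
    rest = [y for l, y in scored if l > best_loss]
    return [(b, w) for b in bests for w in rest]
```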
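Finally, the loss decomposition used to train and debug can be estimated empirically. A sketch, again with hypothetical interfaces: `generate_outputs(x)` returns all outputs uncovered by search under the learned heuristic, and `select(x, outputs)` returns the argmin-cost output.

```python
def estimate_loss_decomposition(dataset, generate_outputs, select, loss):
    """Empirical HC-Search loss decomposition (sketch).

    eps_H  : mean loss of the best output the search generates
             (loss from not generating the optimal output).
    eps_CH : mean extra loss from C not selecting that best output.
    Their sum is the overall loss of the HC-Search predictor.
    """
    eps_H = eps_CH = 0.0
    for x, y_true in dataset:
        outputs = generate_outputs(x)                   # search under learned H
        best = min(loss(x, y_true, y) for y in outputs)
        selected = loss(x, y_true, select(x, outputs))  # argmin-cost pick
        eps_H += best
        eps_CH += selected - best
    n = len(dataset)
    return eps_H / n, eps_CH / n
```

If the first term dominates, the heuristic (or search space) is the bottleneck; if the second dominates, as the results above report for both C-Search and HC-Search, the cost function's selection is.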

