Incremental Integer Linear Programming for Non-projective Dependency Parsing, by Sebastian Riedel and James Clarke. Paper review by Anusha Buchireddygari.



Cutting Plane Algorithm for Structured Prediction The figure shows a convex region of feasible solutions defined by several constraints. The grid indicates where inside the polygon the feasible integer solutions lie. The dot represents the optimal solution of the linear programming relaxation, obtained by maximizing x1 + x2. Note that although it is not an integer solution, it is an upper bound for the optimal integer one.

Why this paper? Integer Linear Programming (ILP) had been applied to inference in sequential conditional random fields (Roth and Yih, 2004). An exponential number of constraints is required to prevent cycles occurring in the dependency graph, so modelling these constraints directly as an ILP produces a program too large to solve efficiently.

What does the paper solve? A method that extends the applicability of ILP to a more complex set of problems. Instead of adding all the constraints up front, the authors solve with only a fraction of them. The solution is then examined, and additional constraints are added if required. This procedure is repeated until all constraints are satisfied.

What’s the problem the authors picked? The authors applied their dependency parsing approach to Dutch, due to the language’s non-projective nature, taking the parser of McDonald et al. as the starting point for their model. Example sentence: “I will come at twelve and then you will get what you deserve”.

What’s Dependency Parsing? The task of attaching words to their arguments. In the dependency graph, “kom” is attached to “ik”: “kom” is the head of “ik”, and “ik” is its child. In a dependency tree every token must be the child of exactly one other node, either another token or the dummy root token, and the tree cannot contain cycles such as “en” -> “kom” -> “ik” -> “en”.
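A minimal sketch of these two tree conditions in Python (names are illustrative, not from the paper): every token's chain of heads must reach the dummy root without revisiting a token.

```python
def is_valid_tree(heads):
    """heads[i-1] is the head of token i; 0 denotes the dummy root.

    A valid dependency tree requires every token's chain of heads to
    reach the root without revisiting any token (i.e. no cycles).
    """
    for i in range(1, len(heads) + 1):
        seen = set()
        j = i
        while j != 0:
            if j in seen:          # revisited a token: a cycle
                return False
            seen.add(j)
            j = heads[j - 1]       # follow the head chain upward
    return True
```

For example, `[0, 1]` (token 2 headed by token 1, token 1 by the root) is a valid tree, while `[2, 1]` is the two-token cycle the slide warns about.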

What does the model look like? x is a sentence and y is a set of labelled dependencies. f(i, j, l) is a multi-dimensional feature vector of the edge from token i to token j with label l; the score of an edge is the dot product of a weight vector with this feature vector.

Constraints Decoding/inference in this model means maximizing the total score of the selected edges, subject to: T1: for every non-root token in x there exists exactly one head; the root token has no head. T2: there are no cycles. Together these correspond to the maximum spanning tree problem.
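As a sketch of what decoding under T1 and T2 computes, here is a brute-force search over all head assignments that keeps only valid trees, assuming an edge-factored score function. This is only to make the objective concrete; it is not the paper's solver and is exponential in sentence length.

```python
from itertools import product

def decode(n, score):
    """Exhaustively find the highest-scoring dependency tree.

    n: number of tokens; score(h, c) scores the edge head h -> child c,
    with h == 0 denoting the dummy root. Labels are omitted for brevity.
    """
    best, best_heads = float("-inf"), None
    for heads in product(range(0, n + 1), repeat=n):
        if any(h == c for c, h in enumerate(heads, 1)):
            continue                       # no self-loops
        ok = True                          # T2: every token must reach the root
        for i in range(1, n + 1):
            j, steps = i, 0
            while j != 0 and steps <= n:
                j, steps = heads[j - 1], steps + 1
            if j != 0:
                ok = False                 # chain never reached root: cycle
                break
        if ok:                             # T1 holds by construction: one head each
            s = sum(score(h, c) for c, h in enumerate(heads, 1))
            if s > best:
                best, best_heads = s, heads
    return best_heads
```
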

Linguistic constraints A1: heads are not allowed to have more than one outgoing edge labelled l, for all l in a set of labels U. For example, A1 enforces that there can be only one “subject” in the sentence. C1: in a symmetric coordination there is exactly one argument to the right of the conjunction and at least one argument on the left. C1 applies to “and”, “or”, “but”.

Some more linguistic constraints ● C2: In an asymmetric coordination there are no arguments to the left of the conjunction and at least two arguments to the right. ● An example is “both”, which has no arguments to its left. ● There are other such constraints, eight in total in the paper.

Process ● The function to be maximized is called the objective function Ox. ● The variables Vx: e(i, j, l) is 1 if the edge from i to j with label l exists, 0 otherwise. ● Base constraints ● T1
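A sketch of how the variables and the objective could be laid out, using plain Python dicts in place of a real ILP library (the function names are illustrative): one binary variable per candidate edge, with its score as the objective coefficient.

```python
def build_variables(n, labels, s):
    """One binary variable e(i, j, l) per candidate edge, mapped to its
    objective coefficient s(i, j, l). Token 0 is the dummy root, which
    can be a head but never a child."""
    return {(i, j, l): s(i, j, l)
            for i in range(0, n + 1)
            for j in range(1, n + 1) if i != j
            for l in labels}

def objective(coeffs, e):
    """Ox = sum over candidate edges of s(i, j, l) * e(i, j, l),
    where e maps a variable to its 0/1 assignment."""
    return sum(c * e.get(var, 0) for var, c in coeffs.items())
```
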

Constraints representation The ILP encodings of the linguistic constraints A1 and C1.

Incremental constraints The cycle constraint T2, which has exponentially many instantiations, is the one added incrementally rather than up front.
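When a cycle is found in a proposed solution, the incremental step adds a constraint forbidding all of that cycle's edges from being active at once. A sketch of generating one such constraint (labels omitted; names are illustrative):

```python
def cycle_constraint(cycle):
    """For a detected cycle c1 -> c2 -> ... -> ck -> c1, emit the
    constraint  sum of e(ci, c_next) <= k - 1,  which rules out this
    particular cycle while leaving every acyclic solution feasible."""
    edges = list(zip(cycle, cycle[1:] + cycle[:1]))
    return edges, len(cycle) - 1
```
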

Algorithm For a sentence x, Bx is the set of constraints added in advance, Ix are the constraints added incrementally, Ox is the objective function, and Vx is the set of variables, including their declarations.

What happens in the algorithm? solve(C, O, V) maximizes the objective function O with respect to the set of constraints C and variables V. violated(y, I) inspects the proposed solution y and returns all constraints in I that it violates.
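The loop itself fits in a few lines, assuming solve and violated behave as described on this slide (this mirrors the slide's description, not the paper's exact pseudocode):

```python
def incremental_ilp(Bx, Ix, Ox, Vx, solve, violated):
    constraints = set(Bx)                  # start from the base constraints only
    while True:
        y = solve(constraints, Ox, Vx)     # optimize under current constraints
        new = violated(y, Ix)              # which incremental constraints fail?
        if not new:
            return y                       # all constraints satisfied: done
        constraints |= set(new)            # add the violated ones and re-solve
```
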

Interesting result The number of iterations is at most polynomial in the number of variables. In practice the technique converges quickly: fewer than 20 iterations for 99% of the approximately 12,000 sentences, yielding average solve times of less than 0.5 seconds.

Experiment ● Questions: How much do the additional constraints help improve accuracy? How fast is the generic inference method in comparison with the Chu-Liu-Edmonds (CLE) algorithm? Can approximations be used to increase the speed of the method while remaining accurate? ● Data: Alpino Treebank, 13,300 sentences with an average length of 14.6 tokens. ● Environment: Intel Xeon at 3.8 GHz with 4 GB RAM; the mixed integer programming library lp_solve, with the code running in Java and calling a JNI wrapper around the lp_solve library. ● Feature sets: along with POS tags there are additional attributes like gender, number and case, and combined attributes of head and child.

Results Labelled accuracy is Nl / Nt, where Nl is the number of tokens with correct head and label and Nt is the total number of tokens. Unlabelled accuracy is Nu / Nt, where Nu is the number of tokens with the correct head.
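These two metrics in code form (a trivial sketch; gold and pred are per-token (head, label) pairs):

```python
def accuracies(gold, pred):
    """Return (labelled, unlabelled) accuracy.

    labelled: head and label both correct; unlabelled: head correct.
    """
    nt = len(gold)
    nl = sum(g == p for g, p in zip(gold, pred))          # Nl
    nu = sum(g[0] == p[0] for g, p in zip(gold, pred))    # Nu
    return nl / nt, nu / nt
```
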

Accuracy Results table: bl means the baseline without any linguistic constraints, which is compared to a system with the additional constraints (cnstr). A problem the system suffers from is a poor next-best solution due to inaccurate local score distributions.

Runtime efficiency Average solve time (ST) for sentences with respect to the number of tokens in each sentence. With the approximation, the total runtime of the system is competitive with CLE.
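The approximation referred to here (the q that also appears in the conclusion) restricts each token to its q highest-scoring candidate heads before solving. A sketch under that reading, with illustrative names:

```python
def prune_candidate_heads(scores, q):
    """scores: {child: {head: score}} for all candidate edges.

    Keep only each child's q best-scoring candidate heads, shrinking
    the ILP before it is solved.
    """
    return {child: dict(sorted(hs.items(), key=lambda kv: -kv[1])[:q])
            for child, hs in scores.items()}
```
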

Discussion Higher-order features could be handled by an extended set of variables and a modified objective function, which is likely to increase runtime. The method is fast enough for real-world applications: most time is spent on the first iteration, after which the algorithm uses its last state to search efficiently for solutions in the presence of new constraints. Adding all the cycle constraints up front would cause an exponential blow-up.

Conclusion A novel approach to inference with ILP, making it efficient enough for dependency parsing. It is slower than the baseline approach but parses large sentences with more than 50 tokens. Parsing time can be significantly reduced with a simple approximation: going from q = 10 to q = 5 only marginally degrades performance, from 85% to 84% in the approximation table shown earlier.

THANK YOU Questions?