Efficient Inference on Sequence Segmentation Models


1 Efficient Inference on Sequence Segmentation Models
Sunita Sarawagi, IIT Bombay

2 Sequence segmentation models
Flexible and accurate models for many applications:
- Speech segmentation on phonemes
- Syntactic chunking
- Protein/gene finding
- Information extraction with entity-level features, e.g.:
  - Whole-entity match with a database of entities
  - Length of entity between 3 and 8 words
  - Third or fourth token of the entity is a "-"
  - Last three tokens are digits
(From Keshet et al. '05, NIPS workshop)

3 Sequence Vs. Segmentation Models
[Figure: the citation "R. Fagin and J. Halpern Belief Awareness Reasoning" labelled with Author / Other / Title. In a sequence model each of the eight tokens gets its own label y1..y8, and features describe a single word such as "Fagin". In a segmentation model the output is a set of labelled segments with boundaries (l, u): (l1=1, u1=2), (l2=u2=3), (l3=4, u3=5), (l4=6, u4=8), and features describe the full entity, e.g. the degree of match of the entire entity with an entity in the database, or the similarity to the author column of the database.]
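To make the contrast concrete, here is a small sketch of the two kinds of features on the citation example above (the lexicon AUTHOR_DB and the feature names are illustrative placeholders, not from the slides):

```python
# Token-level vs. entity-level features on the citation example above.
# AUTHOR_DB and the feature names are illustrative placeholders.
AUTHOR_DB = {"r. fagin", "j. halpern"}

def token_features(tokens, i):
    # a sequence model scores one token at a time
    return {"is_initial": tokens[i].endswith(".")}

def entity_features(tokens, t, u):
    # a segmentation model can score the whole candidate entity tokens[t..u]
    entity = " ".join(tokens[t:u + 1]).lower()
    return {
        "matches_author_db": entity in AUTHOR_DB,   # whole-entity database match
        "length_3_to_8": 3 <= (u - t + 1) <= 8,     # entity-length feature from slide 2
    }

tokens = ["R.", "Fagin", "and", "J.", "Halpern", "Belief", "Awareness", "Reasoning"]
print(token_features(tokens, 1))      # {'is_initial': False}
print(entity_features(tokens, 0, 1))  # {'matches_author_db': True, 'length_3_to_8': False}
```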

4 Segmentation models
Input: sequence x = x1, x2, ..., xn; label set Y
Output: segmentation S = s1, s2, ..., sp, where sj = (start position, end position, label) = (tj, uj, yj)
Score: F(x, S) = Σj [ transition potential(y_{j-1}, yj, tj) + segment potential(tj, uj, yj) ], where
- Transition potentials score a segment starting at i with label y whose previous label is y'
- Segment potentials score a segment starting at i', ending at i, with label y; all positions from i' to i get the same label
Inference:
- Most likely segmentation (max-margin trainers)
- Marginals around segments (likelihood-based and exponentiated-gradient trainers)
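As a minimal sketch of the score F(x, S) under the definitions above (the names trans_potential and seg_potential are placeholders for whatever feature-based potentials a particular model defines; they are not from the slides):

```python
def segmentation_score(x, segmentation, trans_potential, seg_potential):
    """F(x, S): sum of transition and segment potentials over a segmentation.

    segmentation: list of (t, u, y) triples, 1-based and inclusive.
    trans_potential(prev_y, y, t, x): score for a y-segment starting at t
        after a segment labelled prev_y (prev_y is None at the start).
    seg_potential(t, u, y, x): score for labelling tokens t..u as one y-segment.
    """
    total, prev_y = 0.0, None
    for (t, u, y) in segmentation:
        total += trans_potential(prev_y, y, t, x)
        total += seg_potential(t, u, y, x)
        prev_y = y
    return total
```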

5 Inference: Marginal for a segment
Forward messages (L = maximum segment length), O(nL²):
α_i(y) = Σ_{d=1..L} Σ_{y'} α_{i-d}(y') · ψ(y', y, i-d+1, i)
(Matrix notation is used for the case L = n.)
Segment marginal:
μ(t, u, y) = (1/Z) Σ_{y'} α_{t-1}(y') · ψ(y', y, t, u) · β_u(y)
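As a concrete reference point, here is a minimal, unoptimized Python sketch of the recursion above. The signature psi(prev_y, y, t, u), which folds the transition and segment potentials into a single exponentiated value, is an assumption for the sketch; a production implementation would work in log space.

```python
from collections import defaultdict

def segment_marginals(n, labels, L, psi):
    """Sum-product sketch for a semi-Markov (segmentation) model.

    psi(prev_y, y, t, u): exponentiated potential of a segment covering
    tokens t..u (1-based, inclusive) with label y, following a segment
    labelled prev_y (prev_y is None at the start of the sequence).
    Returns (Z, marginals) where marginals maps (t, u, y) to a probability.
    """
    # Forward: alpha[i][y] = total weight of segmentations of tokens 1..i
    # whose last segment ends at i with label y.
    alpha = [defaultdict(float) for _ in range(n + 1)]
    for i in range(1, n + 1):
        for y in labels:
            for d in range(1, min(L, i) + 1):
                t = i - d + 1
                if t == 1:
                    alpha[i][y] += psi(None, y, 1, i)
                else:
                    for yp in labels:
                        alpha[i][y] += alpha[t - 1][yp] * psi(yp, y, t, i)
    Z = sum(alpha[n][y] for y in labels)

    # Backward: beta[i][y] = total weight of segmentations of tokens i+1..n
    # given the previous segment ended at i with label y (beta[n][y] = 1).
    beta = [defaultdict(float) for _ in range(n + 1)]
    for y in labels:
        beta[n][y] = 1.0
    for i in range(n - 1, 0, -1):
        for y in labels:
            for d in range(1, min(L, n - i) + 1):
                for yn in labels:
                    beta[i][y] += psi(y, yn, i + 1, i + d) * beta[i + d][yn]

    # Marginal of a segment (t, u, y): weight of everything before it,
    # times its own potential, times everything after it, divided by Z.
    marginals = {}
    for u in range(1, n + 1):
        for d in range(1, min(L, u) + 1):
            t = u - d + 1
            for y in labels:
                if t == 1:
                    inside = psi(None, y, 1, u)
                else:
                    inside = sum(alpha[t - 1][yp] * psi(yp, y, t, u) for yp in labels)
                marginals[(t, u, y)] = inside * beta[u][y] / Z
    return Z, marginals
```

Replacing the inner sums over d and y' with maximizations gives the corresponding most-likely-segmentation (max-product) pass mentioned on the later slides.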

6 Goal: speed up segmentation models
- Currently 3 to 8 times slower than sequence models
- Eliminate L, the hard limit on segment length
- Efficiently handle a mix of potentials spanning varying numbers of tokens
- Pay the penalty of segmentation models only for the longer entity-level features instead of for all of them
Empirical results on extraction tasks: segmentation models with a few entity features give higher accuracy at the same cost as sequence models

7 Succinct potentials
Key insight: compactly represent features on overlapping segments
Main challenge: inference algorithms on compact potentials whose cost is independent of the number of segments a potential applies to
Four kinds of potentials

8 Applications with mixed potentials
- Named entity recognition
- Speech segmentation on phonemes

9 Efficient Inference: forward pass
Sharing computation: split the potentials (ψ's) into two parts:
- a part common to all segments ending after i-1
- a part common to all segments starting before i-m
where m is the maximum gap between the boundaries of any ψ

10 Optimized forward pass
Two sets of modified forward messages, computed in O(nm²)
Two similar sets of backward messages
The same strategy applies to max-product inference
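The slides give only the outline of this optimization. As a simplified illustration (the additive split and the symbols T, a, b, A below are assumptions, not the paper's notation, and the full algorithm also handles general mixes of potentials and the boundary gap m), suppose every segment potential separates into a transition part, a start part, and an end part:

$$\psi(y', y, t, u) \;=\; e^{\,T(y', y) \,+\, a(t, y) \,+\, b(u, y)}.$$

The sum over segment lengths then drops out of the forward recursion, because the start parts can be accumulated incrementally (with α_0 reserved for the start of the sequence):

$$\alpha_u(y) \;=\; \sum_{t \le u} \sum_{y'} \alpha_{t-1}(y')\, \psi(y', y, t, u) \;=\; e^{b(u, y)} \sum_{y'} e^{T(y', y)}\, A_u(y, y'), \qquad A_u(y, y') \;=\; A_{u-1}(y, y') + \alpha_{u-1}(y')\, e^{a(u, y)},$$

so in this special case the work per position no longer depends on the segment length L.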

11 Marginals around potentials
Direct computation of each marginal is O(n²); reduced to O(1) by two tricks:
- decomposing potentials as in (a) and (b)
- sharing computations across adjacent potentials
(The optimized version is a bit more tricky than the direct one.)

12 Complexity and data structures
Complexity of computing marginals:
- Optimized: O(nm + H); Original: O(nL + G)
- H = number of features in succinct form; G = O(L²H) (in real data, |G| is many times |H|)
Achieved via incremental computation of ψ:
- a special data structure for storing ψ computes ψ_{i':i} in O(1) time from the previous ψ_{i':i-1}
- marginals μ are computed in sorted order: increasing start boundary, decreasing end boundary
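A minimal sketch of the kind of O(1) incremental update the slide alludes to, under a simplifying assumption: here the segment potential is just a sum of per-token scores, so ψ(i':i) follows from ψ(i':i-1) with a single addition. The data structure in the paper also handles succinct entity-level features, which this sketch does not model.

```python
def extend_segment_potentials(token_scores, start):
    """Yield (end, psi) for segments [start..end], end = start, start+1, ...

    Assumes the segment potential is a plain sum of per-token scores, so
    each extension of the end boundary costs O(1).
    """
    psi = 0.0
    for end in range(start, len(token_scores)):
        psi += token_scores[end]  # extend the segment's end boundary by one token
        yield end, psi

# Hypothetical usage: per-token scores could come from any token-level model.
scores = [0.5, -0.2, 1.1, 0.3]
for end, psi in extend_segment_potentials(scores, start=1):
    print(f"psi(1:{end}) = {psi:.2f}")
```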

13 Empirical evaluation
Tasks:
- Citations: Cora articles (L = 20)
- Address: Indian addresses (L = 7)
Features:
- Token-level: orthographic properties / lexicon match of words at the start, end, middle, left, and right of a segment
- Entity-level: TF-IDF match with a lexicon, entity length
Methods:
- Sequence-BCEU: sequence model with Begin-Continue-End-Unique labels
- Segment: original un-optimized algorithm
- Segment-Opt: optimized inference with compact potentials

14 Running time and Accuracy

15 Limit on segment length (L)
L (hard limit on segment length):
- Too small → reduced accuracy (90 → 81)
- Too large → increased running time (30 minutes → 1 hour)
m (maximum span of entity-level features):
- Reduced by half → accuracy still 3% higher than Sequence
- Too large → running time increases by only 30%

16 Concluding remarks
Segmentation models are natural, flexible, and accurate; their main limitation is that inference is expensive
Solved via a compact design of shared potentials and new efficient inference algorithms:
- pays the penalty of entity-level features only when needed
- running time comparable to sequence models
- no hard limit on segment length
Future work:
- features that are functions of distance from the segment boundary
- other models: 2-D segmentation?
Code: http://crf.sourceforge.net

