A Lightweight and High Performance Monolingual Word Aligner
Xuchen Yao, Benjamin Van Durme (Johns Hopkins), Chris Callison-Burch (UPenn) and Peter Clark (Vulcan)

2013-8-6, ACL 2013, Sofia

monolingual word alignment
Aligning one sentence pair from RTE2
Premise: Linda Johnson, who lives with her husband, Charles, and two cats in..., said Katrina has...
Hypothesis: Linda Johnson is married to Charles
alignment contributed by Brockett (2007)

monolingual vs. bilingual alignment
less training data (labeled or unlabeled), but more lexical resources
semantic relatedness: cued by distributional word similarities
the same grammar shared by source/target sentences

a discriminative model first proposed by Blunsom and Cohn (2006):
s, t: source (observation), target sentence
a: target word indices (0 to target length), state 0 is NULL state for deletion
f(): feature functions
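For reference, a linear-chain CRF over the symbols defined on this slide is usually written as below. This is a sketch under assumptions: the feature weights \lambda_k, the feature index k, and the normalizer Z(s, t) are standard CRF notation introduced here, not symbols taken from the slide.

```latex
p(\mathbf{a} \mid \mathbf{s}, \mathbf{t})
  = \frac{\exp\left( \sum_{i} \sum_{k} \lambda_k \, f_k(a_{i-1}, a_i, \mathbf{s}, \mathbf{t}) \right)}
         {Z(\mathbf{s}, \mathbf{t})}
```

Here i ranges over source word positions, a_i is the state (target index or NULL) chosen for source word i, and Z(s, t) sums the numerator over all possible state sequences so that the probabilities add up to one.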


desired Viterbi decoding path
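A decoding path of this kind can be found with standard Viterbi search over the states 0..n, where state 0 is NULL. The following is a minimal sketch, not jacana-align's code: the function name `viterbi_align` and the two score tables (`emission`, `transition`) are hypothetical stand-ins for the weighted feature sums a trained model would supply.

```python
from typing import List, Sequence


def viterbi_align(emission: Sequence[Sequence[float]],
                  transition: Sequence[Sequence[float]]) -> List[int]:
    """Return the highest-scoring state sequence, one state per source word.

    emission[i][j]   : score of aligning source word i to state j
    transition[p][j] : score of moving from state p to state j
    State 0 is the NULL state (source word deleted); states 1..n are
    target word positions. Both tables are assumed inputs.
    """
    n_src = len(emission)
    if n_src == 0:
        return []
    n_states = len(emission[0])

    # best[i][j]: best score of any path that ends in state j at source word i
    best = [list(emission[0])]
    back = [[0] * n_states]
    for i in range(1, n_src):
        row, ptr = [], []
        for j in range(n_states):
            prev = max(range(n_states),
                       key=lambda p: best[i - 1][p] + transition[p][j])
            row.append(best[i - 1][prev] + transition[prev][j] + emission[i][j])
            ptr.append(prev)
        best.append(row)
        back.append(ptr)

    # Trace back from the best final state.
    state = max(range(n_states), key=lambda j: best[-1][j])
    path = [state]
    for i in range(n_src - 1, 0, -1):
        state = back[i][state]
        path.append(state)
    path.reverse()
    return path


if __name__ == "__main__":
    # Three source words, two target words (states 1 and 2) plus NULL (state 0).
    emission = [[0, 2, 0], [0, 0, 2], [1, 0, 0]]
    transition = [[0, 0, 0], [0, 0, 0], [0, 0, 0]]
    print(viterbi_align(emission, transition))
```

In the toy input the third source word scores best against NULL, so it is decoded as a deletion while the first two words align to target positions 1 and 2.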

features
string similarity
– Jaro-Winkler, Dice-Sorensen, Hamming, Jaccard, Levenshtein, n-gram overlap and common prefix matching
POS tag matching
WordNet
– hypernym, hyponym, synonym, derived form, entailing, causing, members of, have member, substances of, have substances, parts of, have part
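To make the string similarity group concrete, here is a minimal sketch of four of the measures named above (Jaccard, Dice, Levenshtein, common prefix). It is illustrative only: the function names are hypothetical, and computing Jaccard and Dice over character bigrams is an assumption, not something the slide specifies.

```python
def char_ngrams(word: str, n: int = 2) -> set:
    """Set of character n-grams; a word shorter than n yields itself."""
    if len(word) < n:
        return {word} if word else set()
    return {word[i:i + n] for i in range(len(word) - n + 1)}


def jaccard(a: str, b: str, n: int = 2) -> float:
    """|X & Y| / |X | Y| over character n-gram sets."""
    x, y = char_ngrams(a, n), char_ngrams(b, n)
    union = x | y
    return len(x & y) / len(union) if union else 1.0


def dice(a: str, b: str, n: int = 2) -> float:
    """2 * |X & Y| / (|X| + |Y|) over character n-gram sets."""
    x, y = char_ngrams(a, n), char_ngrams(b, n)
    total = len(x) + len(y)
    return 2 * len(x & y) / total if total else 1.0


def levenshtein(a: str, b: str) -> int:
    """Edit distance with unit-cost insert, delete and substitute."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            curr.append(min(prev[j] + 1,                # delete from a
                            curr[j - 1] + 1,            # insert into a
                            prev[j - 1] + (ca != cb)))  # substitute
        prev = curr
    return prev[-1]


def common_prefix_len(a: str, b: str) -> int:
    """Number of leading characters the two words share."""
    n = 0
    for ca, cb in zip(a, b):
        if ca != cb:
            break
        n += 1
    return n


def string_features(src: str, tgt: str) -> dict:
    """Bundle the measures into one feature dictionary for a word pair."""
    return {
        "jaccard": jaccard(src, tgt),
        "dice": dice(src, tgt),
        "levenshtein": levenshtein(src, tgt),
        "common_prefix": common_prefix_len(src, tgt),
    }


if __name__ == "__main__":
    print(string_features("married", "marry"))
```

Each measure would feed the model as a separate real-valued feature for a candidate source/target word pair.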

features
positional
– offset difference between src/tgt word
context
– whether neighboring words are similar
– helps to align function words
distortion (Markov feature)
– how far apart are two aligned target words
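The three feature groups above could be computed roughly as follows. This is a sketch under stated assumptions rather than the aligner's actual definitions: the function names are hypothetical, the positional offset is taken over relative positions, neighbor similarity is approximated by case-insensitive equality, and distortion is left undefined whenever a NULL state is involved.

```python
from typing import Dict, Optional, Sequence


def positional_offset(i: int, src_len: int, j: int, tgt_len: int) -> float:
    """Difference between the relative positions of source word i and
    target word j (0-based indices; both sentences assumed non-empty)."""
    return abs(i / src_len - j / tgt_len)


def context_match(src: Sequence[str], i: int,
                  tgt: Sequence[str], j: int) -> Dict[str, bool]:
    """Whether the left and right neighbors of src[i] and tgt[j] match.
    Case-insensitive equality stands in for a real similarity measure."""
    def same(si: int, tj: int) -> bool:
        if 0 <= si < len(src) and 0 <= tj < len(tgt):
            return src[si].lower() == tgt[tj].lower()
        return False
    return {"left": same(i - 1, j - 1), "right": same(i + 1, j + 1)}


def distortion(prev_state: int, state: int) -> Optional[int]:
    """Distance between two consecutively aligned target positions.
    States are 1-based target indices; 0 is NULL, for which no distance
    is defined in this sketch."""
    if prev_state == 0 or state == 0:
        return None
    return abs(state - prev_state)


if __name__ == "__main__":
    src = ["Linda", "Johnson", "lives"]
    tgt = ["Linda", "Johnson", "is"]
    print(positional_offset(1, len(src), 1, len(tgt)))
    print(context_match(src, 1, tgt, 1))
    print(distortion(1, 3))
```

The distortion value depends on the previous state as well as the current one, which is why the slide calls it a Markov feature.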

Implementation: jacana-align
source code at http://code.google.com/p/jacana
lightweight: only uses a POS tagger and WordNet
written in Scala, optimized with L-BFGS
platform independent, compiles to a .jar file, fully interoperable with Java
high performance? -> evaluation

Baselines
GIZA++
Tree Edit Distance (with stem/WordNet matching)
MANLI
– MacCartney, B.; Galley, M. & Manning, C. D. A Phrase-Based Alignment Model for Natural Language Inference. EMNLP 2008
MANLI-constraint (decoding with ILP)
– Thadani, K. & McKeown, K. Optimal and syntactically-informed decoding for monolingual phrase-based alignment. ACL 2011

performance in F1
10.3%

performance in F1
0.8% 3.3%

performance in speed (seconds per sentence)
when sentences are more balanced, jacana-align is about 20x faster

corpus   sentence pair length   MANLI-approx.   MANLI-exact   jacana-align
RTE2     29/11                  1.67s           0.08s         0.025s
FUSION   27/27                  61.96s          2.45s         0.096s

performance in speed (seconds per sentence)
the speed of jacana-align is not as sensitive to sentence length increase

corpus   sentence pair length   MANLI-approx.   MANLI-exact   jacana-align
RTE2     29/11                  1.67s           0.08s         0.025s
FUSION   27/27                  61.96s          2.45s         0.096s

slowdown from RTE2 to FUSION: 30x (MANLI-exact), 4x (jacana-align)
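The 30x and 4x callouts on this slide appear to be RTE2-to-FUSION slowdown ratios, and they can be checked from the table's own numbers. The snippet below is just that arithmetic; the names `SECONDS`, `slowdown` and `speedup_over` are illustrative, and the timings are copied from the table.

```python
# Seconds per sentence pair, copied from the table on the slide.
SECONDS = {
    "RTE2":   {"MANLI-approx.": 1.67,  "MANLI-exact": 0.08, "jacana-align": 0.025},
    "FUSION": {"MANLI-approx.": 61.96, "MANLI-exact": 2.45, "jacana-align": 0.096},
}


def slowdown(system: str) -> float:
    """How many times slower a system is on FUSION than on RTE2."""
    return SECONDS["FUSION"][system] / SECONDS["RTE2"][system]


def speedup_over(baseline: str, corpus: str) -> float:
    """How many times faster jacana-align is than a baseline on one corpus."""
    return SECONDS[corpus][baseline] / SECONDS[corpus]["jacana-align"]


if __name__ == "__main__":
    for system in SECONDS["RTE2"]:
        print(f"{system}: {slowdown(system):.1f}x slower on FUSION than on RTE2")
    print("jacana-align vs. MANLI-exact on FUSION: "
          f"{speedup_over('MANLI-exact', 'FUSION'):.1f}x faster")
```

On these figures MANLI-exact slows down by a factor of about 30.6 and jacana-align by about 3.8, consistent with the rounded callouts. The FUSION speedup over MANLI-exact comes out near 25x, in the same range as the "about 20x" quoted on the previous slide.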

Conclusion
state-of-the-art monolingual word aligner
– in accuracy
– in speed
open source, use it and hack it!

thank you with a demo
