Sentence Compression Based on ILP Decoding Method Hongling Wang, Yonglei Zhang, Guodong Zhou NLP Lab, Soochow University.

Outline
–Introduction
–Related Work
–Sentence Compression based on ILP
–Experiments
–Conclusion

Introduction (1)
Definition of Sentence Compression
–It aims to shorten a sentence x = l_1, l_2, …, l_n into a substring y = c_1, c_2, …, c_m, where c_i ∈ {l_1, l_2, …, l_n}.
Example:
–Original sentence: 据法新社报道，有目击者称，以军23日空袭加沙地带中部，目前尚无伤亡报告。 (According to AFP, witnesses said the Israeli army carried out an air strike on the central Gaza Strip on the 23rd; there are no reports of casualties so far.)
–Target sentence: 目击者称以军空袭加沙地带中部 (Witnesses said the Israeli army carried out an air strike on the central Gaza Strip.)

Introduction (2)
Sentence compression has been widely used in:
–Summarization
–Automatic title generation
–Search engines
–Topic detection
–…

Related Work (1)
Mainstream solution: corpus-driven supervised learning
–Generative model: selects the optimal target sentence by estimating the joint probability P(x, y) of the original sentence x and the target sentence y.
–Discriminative model

Related Work (2)
Generative model
–Knight & Marcu (2002) were the first to apply the noisy-channel model to sentence compression.
–Shortcomings:
  the source model is trained on uncompressed sentences, i.e. on inaccurate data;
  the channel model requires aligned parse trees for both compressed and uncompressed sentences in the training set, which makes alignment difficult and the channel probability estimates unreliable.

Related Work (3)
Discriminative model
–McDonald (2006) used the Margin Infused Relaxed Algorithm (MIRA) to learn feature weights, ranked the candidate subtrees, and selected the highest-scoring tree as the optimal target sentence.
–Cohn & Lapata (2007, 2008, and 2009) formulated the compression problem as tree-to-tree rewriting using a synchronous grammar; each grammar rule is assigned a weight learned discriminatively within a large-margin model.
–Zhang et al. (2013) compressed sentences with a Structured SVM model, treating compression as a structured learning problem.

Our Method
Following Zhang et al. (2013), the sentence compression problem is treated as a structured learning problem:
–Learning a subtree of the original sentence's parse tree as its compressed sentence
–Formulating the search for the optimal subtree as an ILP decoding problem

The Framework of SC

Sentence Compression based on ILP
Linear objective function:
–y* = argmax_y  w · Φ(x, y)
where x is the syntactic tree of the original sentence, y is the target subtree, Φ(x, y) is the feature function (bigram and trimming features) mapping from x to y, and w is the feature weight vector.

Linear constraints
–A binary variable n_i for each non-terminal node: n_j ≤ n_i, where n_i is the parent node of n_j (a node can be retained only if its parent is retained).
–A binary variable w_i for each terminal node: w_i = n_j, where n_j is the POS node of word w_i (a word is kept exactly when its POS node is kept).
–A binary variable f_i for the i-th feature: f_i = 1 if the i-th feature fires, otherwise f_i = 0. According to how each feature value is defined, the corresponding linear constraints are added, e.g. f_i = 1 − w_i for a feature that fires when word w_i is dropped.
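To make the decoding step concrete, here is a minimal sketch of such an ILP in Python using the PuLP solver. The toy tree, word scores, and variable names are assumptions for illustration, not the paper's actual model; it only shows binary keep/drop variables, the parent-child and word/POS constraints above, and a linear objective.

```python
# Minimal ILP decoding sketch with PuLP (toy tree and weights, not the paper's system).
import pulp

# Toy parse tree: node -> parent (None for root); terminals map words to their POS node.
parent = {"IP": None, "NP": "IP", "VP": "IP", "PP": "VP"}
word_pos = {"witnesses": "NP", "said": "VP", "reportedly": "PP"}
weight = {"witnesses": 1.2, "said": 0.9, "reportedly": -0.5}  # assumed per-word scores

prob = pulp.LpProblem("sentence_compression", pulp.LpMaximize)

# Binary variable n_i for every tree node and w_i for every word.
n = {v: pulp.LpVariable(f"n_{v}", cat="Binary") for v in parent}
w = {t: pulp.LpVariable(f"w_{t}", cat="Binary") for t in word_pos}

# Objective: keep the words with the highest total score (stand-in for w . Phi(x, y)).
prob += pulp.lpSum(weight[t] * w[t] for t in word_pos)

# Constraint 1: a node may be kept only if its parent is kept (n_j <= n_i).
for child, par in parent.items():
    if par is not None:
        prob += n[child] <= n[par]

# Constraint 2: a word is kept exactly when its POS node is kept (w_i = n_j).
for t, pos in word_pos.items():
    prob += w[t] == n[pos]

prob.solve(pulp.PULP_CBC_CMD(msg=False))
kept = [t for t in word_pos if w[t].value() == 1]
print("compressed sentence:", " ".join(kept))
```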

Features – Word/POS features
–the POS bigrams of the remaining words, e.g. PosBigram(目击者 称) = NN&VV ("witnesses said")
–whether the dropped word is a stop word, e.g. IsStop(据) = 1 ("according to")
–whether the dropped word is the headword of the original sentence
–the number of remaining words
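As a rough illustration of the first feature above (not the paper's feature extractor; the tokens and tag set are made up), the remaining-word POS-bigram features could be collected like this:

```python
# Toy sketch of the remaining-word POS-bigram feature (hypothetical tokens/tags).
def pos_bigram_features(kept_tokens):
    """kept_tokens: list of (word, pos) pairs for the words kept in the compression."""
    feats = {}
    for (_, p1), (_, p2) in zip(kept_tokens, kept_tokens[1:]):
        key = f"PosBigram={p1}&{p2}"
        feats[key] = feats.get(key, 0) + 1
    return feats

print(pos_bigram_features([("witnesses", "NN"), ("said", "VV"), ("airstrike", "NN")]))
# {'PosBigram=NN&VV': 1, 'PosBigram=VV&NN': 1}
```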

Features – Syntax features
–the parent-child relation of each cut edge, e.g. del-Edge(PP) = IP-PP
–the number of cut edges
–the dependency relation between each dropped word and its head word, e.g. dep_type(有) = DEP
–the POS relation chain between each dropped word and its head word, e.g. dep_link(，) = PU-VMOD-VV
–whether the root of the dependency tree is deleted, e.g. del_ROOT(无) = 1
–whether each dropped word is a leaf of the dependency tree, e.g. del_Leaf(法新社) = 1 ("AFP")
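Similarly, a sketch of the cut-edge feature under the same caveats: given the (parent, child) labels of edges removed from the parse tree, emit one del-Edge feature per cut (the edge list here is hypothetical):

```python
# Toy sketch of the "cut edge" syntax feature: parent-child label of each deleted edge.
def del_edge_features(deleted_edges):
    """deleted_edges: list of (parent_label, child_label) pairs removed from the parse tree."""
    return [f"del-Edge={parent}-{child}" for parent, child in deleted_edges]

print(del_edge_features([("IP", "PP"), ("VP", "NP")]))
# ['del-Edge=IP-PP', 'del-Edge=VP-NP']
```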

Loss Function
Function 1: bigram loss-based function
–the proportion of the original sentence's remaining-word bigrams that are lost in the predicted compression
Function 2: word loss-based function
–the number of words deleted by mistake plus the number of words retained by mistake, comparing the predicted sentence with the gold target sentence
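One plausible reading of the word loss-based function, written as a small sketch (this is our interpretation of the slide, with made-up example tokens, not code from the paper):

```python
# Toy sketch of the word-based loss: wrongly deleted + wrongly retained words,
# measured against a gold compression (treats sentences as sets of tokens).
def word_loss(original, predicted, gold):
    predicted, gold = set(predicted), set(gold)
    wrongly_deleted = sum(1 for w in original if w in gold and w not in predicted)
    wrongly_kept = sum(1 for w in original if w not in gold and w in predicted)
    return wrongly_deleted + wrongly_kept

print(word_loss(["a", "b", "c", "d"], predicted=["a", "c"], gold=["a", "b"]))  # 2
```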

Evaluation
Manual evaluation
–Importance
–Grammaticality
Automatic evaluation
–compression ratio (CR), the length of the compressed sentence over the length of the original (roughly 0.7~1.0)
–BLEU score
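For illustration, the two automatic metrics can be computed per sentence roughly as follows (hypothetical English sentences; assumes the nltk package for BLEU):

```python
# Toy computation of compression ratio and BLEU for one sentence pair (hypothetical data).
from nltk.translate.bleu_score import sentence_bleu

original = "according to AFP witnesses said the army struck central Gaza".split()
system = "witnesses said the army struck central Gaza".split()
gold = "witnesses said the army struck central Gaza".split()

cr = len(system) / len(original)      # compression ratio: compressed length / original length
bleu = sentence_bleu([gold], system)  # BLEU of the system output against the gold compression
print(f"CR = {cr:.2f}, BLEU = {bleu:.2f}")
```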

Experimental settings
–Parallel corpus extracted from news documents
–Stanford Parser
–An alignment tool developed in-house
–Structured SVM

Experimental results
Compared with McDonald's decoding method, the system based on the ILP decoding method achieves comparable performance while using simpler and fewer features.

Conclusions
–The problem of sentence compression is formulated as finding an optimal subtree with an ILP decoding method.
–Compared with the work using McDonald's decoding method, the system achieves comparable performance under the same conditions while using only simpler and fewer features.