A Corpus-based Analysis for the Ordering of Clause Aggregation Operators. © Siemens Corporate Research, Inc.

Presentation transcript:

A Corpus-based Analysis for the Ordering of Clause Aggregation Operators
James Shaw
Multimedia/Video Technology Department
Siemens Corporate Research, Inc.
755 College Road East, Princeton, NJ
© Siemens Corporate Research, Inc.

Siemens Corporate Research, COLING

Introduction
- General goal: automatic generation of concise and fluent complex sentences.
- Input propositions:
  - Jones is a female patient.
  - Jones has hypertension.
  - Jones has peptic ulcer.
  - Jones is 80 years old.
  - Jones …
- Output sentence:
  - Ms. Jones is an 80-year-old hypertensive patient with peptic ulcer underwent coronary artery bypass grafting.
- Syntactic clause aggregation operators, i.e.,
  - Paratactic constructions: conjunction transformations
  - Hypotactic constructions: adjective, prepositional phrase, reduced relative clause, and relative clause transformations

Specific Goal
- What is the correct order in which to apply clause aggregation operators in a domain-independent natural language generation system?
- In our first implementation of the MAGIC system (McKeown97), the operators were applied in the following order:
  - Paratactic operators first (conjunction transformations)
  - Hypotactic operators
  - Paratactic operators again
- Why are the paratactic operators applied twice and the hypotactic operators only once?
- We cannot simply permute all the clause aggregation operators and find an optimal ordering.
- Instead of finding an optimal ordering, our goal is to find an ordering that performs well.

Why the ordering should be identified
- Clause aggregation operators are not commutative. Applying one of the operators to input propositions prevents the application of others.
- The ordering affects meaning:
  - Input propositions:
    - John drank cider
    - John ate oranges
    - (even though) John didn’t like fruits
  - Potential output sentences:
    - John drank cider and even though he didn’t like fruits, he ate oranges.
    - Even though John didn’t like fruits, he drank cider and ate oranges.

Related Work
- Syntactically simple expressions of embedding are to be preferred over more complex ones (Scott and de Souza90, Shaw98)
- Rhetorical Structure Theory (Mann and Thompson 1988)
- Cohesion analysis (Halliday and Hasan76)
- Similar to other works in clause aggregation (Moser and Moore95, Rösner and Stede92)
- Graphical tools to facilitate discourse annotation (O'Donnell00, Garside and Rayson97)
- Automating the discourse annotation (Marcu00)
- Ordering of applying the same operator, such as adjective transformation (Shaw99, Malouf00)

Methodology
1. Collect a corpus of sentences containing paratactic and hypotactic constructions.
2. De-aggregate those sentences into propositions.
3. Specify rhetorical relations between the propositions.
4. Specify a sequence of transformation operators that combines the de-aggregated propositions into the original sentences.
5. Evaluate how well the proposed operator ordering works by checking the sequence of transformations against our proposed ordering of clause aggregation operators.

Corpus Collection
- The corpus is taken from the medical domain and the Wall Street Journal.
- Only sentences containing the conjunctor “and” are selected, to increase the likelihood of encountering interactions between paratactic and hypotactic constructions.
- Due to the amount of effort needed to annotate complex sentences, only 100 sentences from each domain are annotated.

Corpus Annotation
- XML is used as the markup language.
- De-aggregation is done manually by the author.
- Each annotated sentence entry consists of 5 parts:
  - The original sentence.
  - A list of de-aggregated propositions after manual reconstruction of elided constituents. These propositions are enclosed in propset.
  - The rhetorical relations which link the de-aggregated propositions or propsets.
  - The sequence of transformations that can be used to reproduce the original sentence.
  - The annotator’s comments.
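The five-part entry described above could be sketched in code roughly as follows. This is only an illustration: the slides name propset and prop, but the other element names (`entry`, `sentence`, `relations`, `transformations`, `comment`), the `id` attribute, and the helper `build_entry` are hypothetical and do not reproduce the author's actual schema. The sentence and propositions are the ones used in the annotation example on the following slides.

```python
import xml.etree.ElementTree as ET


def build_entry(sentence, propositions, comment=""):
    """Build one annotated-sentence entry with the five parts listed on the slide.

    All tag names except propset/prop are assumptions made for illustration.
    """
    entry = ET.Element("entry")
    # Part 1: the original sentence.
    ET.SubElement(entry, "sentence").text = sentence
    # Part 2: de-aggregated propositions, enclosed in a propset.
    propset = ET.SubElement(entry, "propset")
    for i, text in enumerate(propositions, start=1):
        ET.SubElement(propset, "prop", id=f"p{i}").text = text
    # Parts 3 and 4: left empty here; an annotator would fill them in.
    ET.SubElement(entry, "relations")
    ET.SubElement(entry, "transformations")
    # Part 5: the annotator's comments.
    ET.SubElement(entry, "comment").text = comment
    return entry


entry = build_entry(
    "Local sports fans themselves, long known for their passive demeanor at "
    "games and propensity to leave early, don't resist the image.",
    [
        "Local sports fans don't resist the image.",
        "Local sports fans are long known for their passive demeanor at games.",
        "Local sports fans are long known for their propensity to leave early.",
    ],
)
print(ET.tostring(entry, encoding="unicode"))
```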

Annotation Section 1 & 2: The original sentence & the propositions
- Section 1 (Original sentence): “Local sports fans themselves, long known for their passive demeanor at games and propensity to leave early, don’t resist the image.”
- Section 2 (Propositions):
  - Local sports fans don't resist the image.
  - Local sports fans are long known for their passive demeanor at games.
  - Local sports fans are long known for their propensity to leave early.
- Section 3 (Rhetorical relations):

Annotated Section 3 & 4: Rhetorical relations and operator sequences
- Section 2 (Propositions):
  - Local sports fans don't resist the image.
  - Local sports fans are long known for their passive demeanor at games.
  - Local sports fans are long known for their propensity to leave early.
- Section 4 (Operator sequences):
  - Local sports fans are long known for their passive demeanor at games and local sports fans are long known for their propensity to leave early.
  - Local sports fans themselves, who are long known for their passive demeanor at games and propensity to leave early, don’t resist the image.

Annotated Section 5: Annotator’s comment
- Our proposed aggregation operator ordering:
  1. Adjective (conjunction optional)
  2. Prepositional phrase (conjunction optional)
  3. Reduced relative clause, including apposition (conjunction optional)
  4. Relative clause (conjunction optional)
  5. Transformations for other rhetorical relations (conjunction optional)
  6. Simple conjunction
  7. Complex conjunction
- Section 5 (annotator’s comment):
  - If the sequence of transformations does not differ from the proposed ordering, seqorder is assigned true.
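A minimal sketch of how the seqorder flag could be computed, assuming each annotated transformation is labeled with one of the seven operator classes above and that an optional conjunction folded into a hypotactic step is not listed as a separate step. The label strings, `PROPOSED_ORDER`, and the function `seqorder` are hypothetical names; the slides do not say how the check was actually carried out.

```python
# The seven operator classes, in the proposed order of application.
# The label strings are illustrative, not taken from the annotated corpus.
PROPOSED_ORDER = [
    "adjective",
    "prepositional_phrase",
    "reduced_relative_clause",
    "relative_clause",
    "other_rhetorical_relation",
    "simple_conjunction",
    "complex_conjunction",
]
RANK = {name: position for position, name in enumerate(PROPOSED_ORDER)}


def seqorder(transformations):
    """Return True if the operators never step back to an earlier class.

    Assumption: conformity means the ranks of the applied operators are
    non-decreasing. Repeating the same class is allowed.
    """
    ranks = [RANK[name] for name in transformations]
    return all(earlier <= later for earlier, later in zip(ranks, ranks[1:]))


print(seqorder(["adjective", "relative_clause", "simple_conjunction"]))
print(seqorder(["simple_conjunction", "adjective"]))
```

The first call follows the proposed ordering; the second applies a conjunction before an adjective transformation and so does not.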

The concept of Propset
- Issues
  - Simply de-aggregating propositions does not provide sufficient information to reproduce the original sentence.
    - John drank cider
    - John ate oranges
    - (even though) John didn’t like fruits
  - These propositions can be mapped to either of the following sentences:
    - John drank cider and even though he didn’t like fruits, he ate oranges.
    - Even though John didn’t like fruits, he drank cider and ate oranges.
  - The number of rhetorical relations can be greater than the number of propositions, up to for n propositions.

The Benefits of using Propset
- Propset allows annotators to do the following:
  - Group propositions that are more tightly related, e.g., a smoker quit 10 years ago.
  - Specify the scope of modifying propositions, as in the earlier example.
  - Simplify annotation for certain constructions, such as “say” and “believe”:
    [ propset
      [ prop John believed
        [ propset
          [ prop Tim invested in stock]
          [ prop Tim invested in real estate] ] ] ]
    instead of
    [ prop John believed Tim invested in stock]
    [ prop John believed Tim invested in real estate]
  - Minimize redundant specification of multiple modifying rhetorical relations
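The “believe” example can be made concrete with a small sketch of nested propsets. The classes `Prop` and `Propset`, the `embedded` field, and the function `expand` are hypothetical names introduced for illustration, under the assumption that a proposition may carry an embedded propset as its complement; they are not the author's data model.

```python
from __future__ import annotations

from dataclasses import dataclass, field
from typing import List, Optional, Union


@dataclass
class Prop:
    text: str
    # Optional embedded propset, e.g. the complement of "John believed".
    embedded: Optional[Propset] = None


@dataclass
class Propset:
    members: List[Union[Prop, Propset]] = field(default_factory=list)


def expand(node):
    """Flatten a nested propset into the redundant one-proposition-per-line form."""
    if isinstance(node, Propset):
        return [sentence for member in node.members for sentence in expand(member)]
    if node.embedded is None:
        return [node.text]
    # Distribute the embedding proposition over every embedded proposition.
    return [f"{node.text} {inner}" for inner in expand(node.embedded)]


nested = Propset([
    Prop("John believed", Propset([
        Prop("Tim invested in stock"),
        Prop("Tim invested in real estate"),
    ])),
])
print(expand(nested))
```

The nested form states “John believed” once, while the expanded form repeats it for every embedded proposition, which is the redundancy the propset notation avoids.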

Why minimize redundant rhetorical relations?
- The sentence: Even though John didn’t like fruits, he drank cider and ate oranges.
- Input propositions:
  - John drank cider.
  - John ate oranges.
  - (even though) John didn’t like fruits.
- Instead of
- The annotated relations are
- The number of rhetorical relations is always n-1, where n is the number of propositions.
- This simplifies the clause aggregation algorithm because one transformation maps to one rhetorical relation.
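The n-1 count can be checked with a short sketch, assuming every rhetorical relation links exactly two units, each of which is either a single proposition or a propset. The tuple encoding `(relation, left, right)`, the function `count`, and the relation label `concession` are hypothetical; the annotated relations shown on the original slide are not preserved in this transcript.

```python
def count(node):
    """Return (number of propositions, number of relations) in a nested tree.

    A leaf is a proposition string; an inner node is (relation, left, right).
    """
    if isinstance(node, str):
        return 1, 0
    _relation, left, right = node
    left_props, left_rels = count(left)
    right_props, right_rels = count(right)
    return left_props + right_props, left_rels + right_rels + 1


# "Even though John didn't like fruits, he drank cider and ate oranges."
tree = (
    "concession",  # hypothetical label for the "even though" relation
    "John didn't like fruits",
    ("joint", "John drank cider", "John ate oranges"),
)
props, rels = count(tree)
print(props, rels)
```

Because the modifying proposition attaches once to the propset that groups the other two, three propositions need only two relations rather than one per modified proposition.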

Results
- In our evaluation, we assume that operators applied earlier should result in constituents closer to the head than the constituents resulting from operators applied later.
- A 200-sentence corpus was de-aggregated manually (average sentence length is 23 words):
  - 763 propositions
  - 3.8 propositions per sentence
  - 2.6 transformations per sentence
  - 523 rhetorical relations, with 440 relations being either Elaboration, Joint, or Sequence.
- 20% of the annotated transformations cannot be handled by our operators.
- 195 out of 200 original sentences can be re-synthesized using the proposed ordering.
- Why such a good result?
  - The use of propset in the annotation removed many potential conflicts.
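The ratios behind these figures can be recomputed from the counts on the slide. The variable names below are illustrative; only the numbers come from the source.

```python
sentences = 200
propositions = 763
relations = 523
common_relations = 440  # Elaboration, Joint, or Sequence
resynthesized = 195

props_per_sentence = round(propositions / sentences, 1)
common_share = f"{common_relations / relations:.1%}"
resynthesis_rate = f"{resynthesized / sentences:.1%}"

print(props_per_sentence)  # 3.8
print(common_share)        # 84.1%
print(resynthesis_rate)    # 97.5%
```

So roughly 84% of the annotated relations fall into the three most common types, and 97.5% of the sentences are reproduced under the proposed ordering.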

What We Learned
- The use of propset is very useful for discourse annotation.
- The first application of the paratactic operator is a sub-step of the hypotactic operation: it combines satellite propositions that have similar syntactic structures and modify the same entity in the nucleus proposition.
- The correct ordering of operators is a 2-stage process:
  1. hypotactic operators (with the conjunction operator optional as a local optimization)
  2. paratactic operators
- Hypotactic operators are applied first because their operations are local in nature, e.g., “Bob is a reputable stock-broker [with deep pocket] who is interested in dot-coms.” In contrast, paratactic operators are not local. They are sensitive to the surface position of identical constituents across all the propositions being combined, a directional constraint (Ross70, Shaw98b).

Conclusion
- Even though researchers have studied rhetorical relations in conjunction with clause aggregation operations, the explicit use of propset in discourse annotation in such a context is new.
- We explained why some paratactic operators are applied before hypotactic operators while others are applied afterward.
- By imposing our proposed ordering onto de-aggregated propositions and trying to re-synthesize the original sentences, we identified an ordering of clause aggregation operators which works well on a human-written corpus.
- Such an ordering can be implemented and reused in domain-independent natural language generation systems to create complex sentences that are also concise and fluent.