Summer School on Natural Language Processing and Text Mining 2008 Natural Language Generation An Introductory Tour Anupam Basu Dept. of Computer Science.


Summer School on Natural Language Processing and Text Mining 2008 Natural Language Generation An Introductory Tour Anupam Basu Dept. of Computer Science & Engineering IIT Kharagpur

Language Technology: Natural Language Understanding (Text → Meaning), Natural Language Generation (Meaning → Text), Speech Recognition (Speech → Text), Speech Synthesis (Text → Speech)

What is NLG?
Thought / conceptualization of the world → Expression
• The block c is on block a
• The block a is under block c
• The block b is by the side of a
• The block b is on the right of a
• The block b has its top free
• The block b is alone
………

Conceptualization → Some intermediate form of representation
ON (C, A)
ON (A, TABLE)
ON (B, TABLE)
RIGHT_OF (B, A)
…….
What to say?
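A minimal sketch of going from this intermediate representation to text, in Python. The facts are stored as tuples; the sentence patterns and the names `PATTERNS`, `describe` and `verbalize` are illustrative assumptions, not part of the slides.

```python
# Blocks-world facts as (predicate, first argument, second argument).
FACTS = [
    ("ON", "c", "a"),
    ("ON", "a", "table"),
    ("ON", "b", "table"),
    ("RIGHT_OF", "b", "a"),
]

# Hypothetical sentence patterns, one per predicate.
PATTERNS = {
    "ON": "{0} is on {1}.",
    "RIGHT_OF": "{0} is on the right of {1}.",
}


def describe(entity):
    """Single-letter names are blocks; anything else gets a definite article."""
    return f"block {entity}" if len(entity) == 1 else f"the {entity}"


def verbalize(fact):
    """Turn one fact into one sentence, capitalizing the first letter."""
    predicate, first, second = fact
    sentence = PATTERNS[predicate].format(describe(first), describe(second))
    return sentence[0].upper() + sentence[1:]


for fact in FACTS:
    print(verbalize(fact))
```

This verbalizes every fact, which is exactly what the "What to say?" question warns against: choosing which facts to express is a separate step.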

Conceptualization C A B On Right_of Is_a Block Is_a What to say?

What to say ? How to say ? Natural language generation is the process of deliberately constructing a natural language text in order to meet specified communicative goals. [McDonald 1992]

Some of the Applications  Machine Translation  Question Answering  Dialogue Systems  Text Summarization  Report Generation

Thought / Concept → Expression
• Objective: produce understandable and appropriate texts in human languages
• Input: some underlying non-linguistic representation of information
• Knowledge sources required: knowledge of language and of the domain

Involved Expertise  Knowledge of Domain What to say Relevance  Knowledge of Language Lexicon, Grammar, Semantics  Strategic Rhetorical Knowledge How to achieve goals, text types, style  Sociolinguistic and Psychological Factors Habits and Constraints of the end user as an information processor

Asking for a pen
Situation: have(X, z), not have(Y, z)
Goal: want have(Y, z)
Conceptualization: ask(give(X, z, Y))
Expression: Could you please give me a pen?
Why? What? How?

Some Examples

Example System #1: FoG  Function: Produces textual weather reports in English and French  Input: Graphical/numerical weather depiction  User: Environment Canada (Canadian Weather Service)  Developer: CoGenTex  Status: Fielded, in operational use since 1992

FoG: Input

FoG: Output

Example System #2: STOP  Function: Produces a personalised smoking-cessation leaflet  Input: Questionnaire about smoking attitudes, beliefs, history  User: NHS (British Health Service)  Developer: University of Aberdeen  Status: Undergoing clinical evaluation to determine its effectiveness

STOP: Input

STOP: Output Dear Ms Cameron Thank you for taking the trouble to return the smoking questionnaire that we sent you. It appears from your answers that although you're not planning to stop smoking in the near future, you would like to stop if it was easy. You think it would be difficult to stop because smoking helps you cope with stress, it is something to do when you are bored, and smoking stops you putting on weight. However, you have reasons to be confident of success if you did try to stop, and there are ways of coping with the difficulties.

Approaches

Template-based generation
• Most common technique
• In simplest form, words fill in slots: “The train from <Source> to <Destination> will leave platform <number> at <time> hours”
• Most common sort of NLG found in commercial systems
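A small sketch of slot filling for the train announcement, using Python's standard `string.Template`. The slot names, the function `announce` and the station names in the usage line are assumptions made for illustration.

```python
from string import Template

# The announcement from the slide, with named slots.
ANNOUNCEMENT = Template(
    "The train from $source to $destination "
    "will leave platform $platform at $time hours."
)


def announce(source, destination, platform, time):
    """Fill every slot; substitute() raises KeyError if a slot is missing."""
    return ANNOUNCEMENT.substitute(
        source=source, destination=destination, platform=platform, time=time
    )


print(announce("Kharagpur", "Howrah", 3, "1430"))
```

The sketch also shows the limitation listed under Cons on the next slide: every announcement has the same shape, and a new message type needs a new template.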

Pros and Cons  Pros Conceptually simple No specialized knowledge needed Can be tailored to a domain with good performance  Cons Not general No variation in style – monotonous Not scalable

Modern Approaches  Rule Based approach  Machine Learning Approach

Some Critical Issues

Context Sensitivity in Connected Sentences  X-town was a blooming city. Yet, when the hooligans started to invade the place, __________. The place was not livable any more.  the place was abandoned by its population  the place was abandoned by them  the city was abandoned by its population  it was abandoned by its population  its population abandoned it……..

Referencing John is Jane’s friend. He loves to swim with his dog in the pool. It is really lovely. I am taking the Shatabdi Express tomorrow. It is a much better train than the Rajdhani Express. It has a nice restaurant car, while the other has nice seats.

Referencing John stole the book from Mary, but he was caught. John stole the book from Mary, but the fool was caught.

Aggregation
The dress was cheap. The dress was beautiful.
→ The dress was cheap and beautiful.
→ The dress was cheap yet beautiful.
I found the boy. The boy was lost.
→ I found the boy who was lost.
→ I found the lost boy.
Sita bought a story book. Geeta bought a story book.
→ ???? Sita and Geeta bought a story book.
→ ???? Sita bought a story book and Geeta also bought a story book.

Choice of words (Lexicalization) The bus was in time. The journey was fine. The seats were bad. The bus was in perfect time. The journey was fantastic. The seats were awful. The bus was in perfect time. The journey was fantastic. However, the seats were not that good.

General Architecture

Component Tasks in NLG  Content Planning === Macroplanner  Document Structuring  Sentence Planner === Microplanning Aggregation ; Lexicalization; Referring Expression Generation  Surface Form Realization Linguistic realization; Structure Realization

A Pipelined Architecture Document Planning Microplanning Surface Realization Document Plan Text Specification

An Example
Consider two assertions:
has (Hotel_Bliss, food (bad))
has (Hotel_Bliss, ambience (good))
Content Planning selects the information ordering:
• Hotel Bliss has bad food but its ambience is good
• Hotel Bliss has good ambience but its food is bad

has (Hotel_Bliss, food (bad))
Sentence Planning
• Choose syntactic templates
• Choose lexicon: bad or awful; food or cuisine; good or excellent
• Aggregate the two propositions
• Generate referring expressions: it or this restaurant
• Ordering: a big red ball OR a red big ball
[Diagram: syntactic template with nodes Have, Subj, Obj, Entity, Feature, Modifier]

Realization
• Correct verb inflection: Have → Has
• May require noun inflection (not in this case)
• Articles required? Where?
• Conversion into final string
• Capitalization and punctuation
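The three stages of the Hotel Bliss example can be sketched as three small Python functions. This is only an illustration under simplifying assumptions: exactly two assertions, a fixed "but" aggregation, and hypothetical function names (`plan_content`, `plan_sentence`, `realize`).

```python
# The two assertions as (entity, feature, value) triples.
ASSERTIONS = [("Hotel Bliss", "food", "bad"), ("Hotel Bliss", "ambience", "good")]


def plan_content(assertions, positive_last=True):
    """Content planning: pick an order (end on the good or on the bad point)."""
    return sorted(assertions, key=lambda a: (a[2] == "good") == positive_last)


def plan_sentence(ordered):
    """Sentence planning: aggregate with 'but', refer back with 'its'."""
    (entity, feat1, val1), (_, feat2, val2) = ordered
    return {
        "subject": entity,
        "verb": "have",
        "object": f"{val1} {feat1}",
        "cue": "but",
        "second": f"its {feat2} is {val2}",
    }


def realize(spec):
    """Realization: inflect the verb for a singular subject, add punctuation."""
    verb = {"have": "has"}[spec["verb"]]
    return f"{spec['subject']} {verb} {spec['object']} {spec['cue']} {spec['second']}."


print(realize(plan_sentence(plan_content(ASSERTIONS))))
print(realize(plan_sentence(plan_content(ASSERTIONS, positive_last=False))))
```

Changing only the content-planning decision yields the two orderings shown on the example slide, while sentence planning and realization stay the same.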

Content Planning
• What to say
  Data collection
  Making domain-specific inferences
  Content selection
  Proposition formulation
  • Each proposition → a clause
  Text structuring
  • Sequential ordering of propositions
  • Specifying rhetorical relations

Content Planning Approaches  Schema based (McKeown 1985) Specify what information, in which order The schema is traversed to generate discourse plan  Application of operators (similar to Rule Based approach) --- Hovy 93 The discourse plan is generated dynamically  Output is Content Plan Tree

Discourse Demograph Detailed view Summary NameAge Blood Sugar Care Group nodes

Content Plan  Plan Tree Generation  Ordering – of Group nodes  Propositions  Rhetorical relations between leaf nodes  Paragraph and sentence boundaries

Rhetorical Relations You should...I’m in...You can get...The show...It got a... MOTIVATION EVIDENCE ENABLEMENT

Rhetorical Relations Three basic rhetorical relationships:  SEQUENCE  ELABORATION  CONTRAST Others like  Justification  Inference

Nucleus and Satellites I love to collect classic cars My favourite car is Toyota Innova I drive my Maruti 800 Elaboration Contrast N

Target Text The month was cooler and drier than average, with the average number of rain days, but the total rain for the year so far is well below average. Although there was rain on every day for 8 days from 11th to 18th, rainfall amounts were mostly small.

Document Structuring in WeatherReporter The Message Set: MonthlyTempMsg ("cooler than average") MonthlyRainfallMsg ("drier than average") RainyDaysMsg ("average number of rain days") RainSoFarMsg ("well below average") RainSpellMsg ("8 days from 11th to 18th") RainAmountsMsg ("amounts mostly small")

Document Structuring in WeatherReporter
[Figure: document plan tree over the message set. Relation nodes: SEQUENCE, ELABORATION (twice), CONTRAST (twice). Leaves: MonthlyTmpMsg, MonthlyRainfallMsg, RainyDaysMsg, RainSoFarMsg, RainSpellMsg, RainAmountsMsg]

Some Common RST Relationships
• Elaboration: the satellite presents more details about the content of the nucleus
• Contrast: the nuclei present things that are similar in some respects but different in some other relevant way. Multinuclear: no distinction between N and S
• Purpose: S presents the goal of performing the activity presented in the nucleus
• Condition: S presents something that must occur before the situation presented in N can occur
• Result: N results from S
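One possible way to hold such relations in a program is a small binary tree, sketched below in Python. The class and function names are hypothetical, and the example tree in the usage lines is an assumed fragment, not the actual WeatherReporter plan.

```python
from dataclasses import dataclass
from typing import Union


@dataclass
class Message:
    """A leaf of the document plan: one message to be expressed."""
    text: str


@dataclass
class Relation:
    """An internal node: a rhetorical relation over two sub-plans."""
    name: str  # e.g. "SEQUENCE", "ELABORATION", "CONTRAST"
    left: Union["Relation", Message]
    right: Union["Relation", Message]


def leaves(node):
    """Message texts in left-to-right (presentation) order."""
    if isinstance(node, Message):
        return [node.text]
    return leaves(node.left) + leaves(node.right)


def relations(node):
    """Relation names in pre-order (parent before children)."""
    if isinstance(node, Message):
        return []
    return [node.name] + relations(node.left) + relations(node.right)


tree = Relation(
    "ELABORATION",
    Message("average number of rain days"),
    Relation("CONTRAST", Message("8 days from 11th to 18th"), Message("amounts mostly small")),
)
print(leaves(tree))
print(relations(tree))
```

A fuller representation would also record which child is the nucleus and which the satellite; multinuclear relations such as Contrast would mark both children as nuclei.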

Planning Approach Save Document The system saves the document Choose Save option Select Folder Type Filename Click Save Button A dialog box displayed Dialog box closed

Planning Operator
Name: Expand Purpose
Effect: (COMPETENT hearer (DO-ACTION ?action))
Constraints: (AND (get_all_substeps ?action ?subaction) (NOT (singular list ?subaction)))
Nucleus: (COMPETENT hearer (DO-SEQUENCE ?subaction))
Satellite: (((RST-PURPOSE (INFORM hearer (DO ?action)))))

Expand Subactions Effect: (COMPETENT hearer (DO-SEQUENCE ?actions)) Constraints: NIL Nucleus: (for each ?actions (RST-SEQUENCE (COMPETENT hearer (DO-ACTION ?actions)))) Satellites: NIL

Purpose Result Choose Save Dialog Box Opens Choose Folder Sequence

Discourse  To save a file 1. Choose save option from file menu A dialog box will appear 2. Choose the folder 3. Type the file name 4. Click the Save button The system will save the document
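The resulting discourse can be reproduced from a small plan structure, as in the Python sketch below. The dictionary layout and the function `render` are assumptions for illustration; they stand in for the plan tree that the operators above would build.

```python
# The expanded plan: a goal (PURPOSE) and a SEQUENCE of sub-actions,
# each with an optional RESULT.
PLAN = {
    "goal": "save a file",
    "steps": [
        ("Choose save option from file menu", "A dialog box will appear"),
        ("Choose the folder", None),
        ("Type the file name", None),
        ("Click the Save button", "The system will save the document"),
    ],
}


def render(plan):
    """Linearize the plan: purpose first, then numbered steps with results."""
    lines = [f"To {plan['goal']}"]
    for number, (action, result) in enumerate(plan["steps"], start=1):
        lines.append(f"{number}. {action}")
        if result:
            lines.append(f"   {result}")
    return lines


print("\n".join(render(PLAN)))
```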

Example Content Plan Tree

Rhetorical Relations – Difficult to infer
John abused the duck. The duck buzzed John.
1. John abused the duck that had buzzed him
2. The duck buzzed John who had abused it
3. The duck buzzed John and he abused it
4. John abused the duck and it buzzed him

On Clause Aggregation

Benefits of Aggregation  Conciseness Same information with fewer words  Cohesion We want a semantic unit – not a jumble of disconnected phrases  Fluency Less effort to read Unambiguous and acc. to communication conventions

Complex interactions  Aggregation adds to fluency The patient was admitted on Monday and released on Friday.  Someone ate apples. Someone ate oranges  Someone, who ate apples also ate oranges

Aggregation Operators
Category     | Operators                                    | Resources                        | Surface markers
Interpretive | Summarization, Inference                     | Common sense knowledge, Ontology |
Referential  | Ref. expr. generation, Quantified expression | Ontology, Discourse              | Each, all, both, some
Syntactic    | Paratactic, Hypotactic                       | Syntactic rules, Lexicon         | And, with, who, which
Lexical      | Paraphrasing                                 | Lexicon                          |

Interpretive John punched Mary Mary kicked John => John fought with Mary John kicked Mary Not always meaning preserving Note use of Ontology John kicked Mary + John punched Mary =/> John fights with Mary

Referential Aggregation  Reference Expression generation The patient is Mary [name]. The patient is female [gender] The patient is 80 years old [age]. The patient has hypertension [med.history] The patient is Mary. She is an 80 year old female. She has hypertension. How much info in one sentence?

Reference (Quantification)
• John is doing well. Mary is doing well. → All the patients are doing well.
  Note the use of background knowledge
• The patient’s left arm. The patient’s right arm. → Each arm
  Note the use of Ontology

Syntactic Aggregation  Paratactic: Entities are of equal syntactic status Ram likes Sita and Geeta Main operator is co-ordinating conjunction  Hypotactic: Unequal status NP modified by a PP Ram likes Sita, who is a nurse

Lexical Aggregation
• In hypotactic aggregation, the satellite propositions are modified.
• The Maths score was 99.8%. 99.8% is a record high score.
  → The Maths score was 99.8%, a record high score (apposition modification)
  → The Maths score was a record high score of 99.8%
• A dog used by police → A police dog
• Rise sharply → shoot
• Drop sharply → plunge

Rhetorical Relations and Hypotactics
Use of cue operators
RR: Concession. He was fine. He just had an accident. → Although he had an accident he was fine
RR: Evidence. My car is not Indian. My car is a Toyota. → My car is not Indian because it is a Toyota
RR: Elaboration. My car is not Indian. My car is expensive. → My expensive car is not Indian

Hypotactic Operators
• If propositions do not share any common entity, the operator can simply join them using a cue phrase.
  N: Tom is feeling cold. S: The window is open. [Cause]
  → Tom is feeling cold because the window is open.
• If the linked propositions share common entities, the internals of the linked propositions undergo modifications.
  N: The child stopped hunger. S: The child ate an apple. [Purpose]
  → To stop hunger, the child ate an apple.

Two stage transformation: RR: Evidence N: Tom was hungry S: Tom did not eat dinner Replace Tom in N by ‘he’ Apply Rule 1 Because Tom did not eat dinner, he was hungry
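The two cases can be sketched in Python as below. This is a string-level illustration only: the cue table `CUES`, the function names, and the explicit `shared` and `pronoun` arguments are assumptions, and a real system would operate on syntactic structures rather than strings.

```python
# Hypothetical cue phrases per rhetorical relation.
CUES = {"Cause": "because", "Evidence": "because"}


def join_with_cue(relation, nucleus, satellite):
    """No shared entity: nucleus + cue phrase + satellite."""
    lowered = satellite[0].lower() + satellite[1:]
    return f"{nucleus} {CUES[relation]} {lowered}."


def front_satellite(relation, nucleus, satellite, shared, pronoun):
    """Shared entity: pronominalize it in the nucleus, then front the satellite."""
    reduced = nucleus.replace(shared, pronoun, 1)
    return f"{CUES[relation].capitalize()} {satellite}, {reduced}."


print(join_with_cue("Cause", "Tom is feeling cold", "The window is open"))
print(front_satellite("Evidence", "Tom was hungry", "Tom did not eat dinner", "Tom", "he"))
```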

Corpus study to Rules
Example: RR: Purpose; N: Lift the cover; S: Install battery
Construction       | %    | Example
To-infinitive      | 59.6 | To install battery, lift the cover
For-nominalization | 7.5  | Lift the cover for battery installation
For-gerund         | 2.5  | Lift the cover for installing battery
By-purpose         | 10   | Install battery by lifting cover
So-that purpose    | 8.4  | Lift cover so battery can be installed

Syntactic constructions for realizing Elaboration relations
Construction      | Verbosity | M-direction | Example
R-Clause          | Short     | Before      | An apple which weighs 3 oz
Reduced R-Clause  | Shorter   | Before      | An apple weighing 3 oz
PP                | Shorter   | Before      | An apple in the basket
Apposition        | Shortest  | Before      | An apple, a small fruit
Prenominalization | Shortest  | After       | A 3 oz apple
Adjective         | Shortest  | After       | A dark red apple

Lexical Constraints
• Except for R-clause and Reduced R-clause, transforming a proposition into an apposition, an adjective or a PP requires that the satellite proposition be of a specific syntactic type (a noun, an adjective or a PP respectively).
  N: Jack is a runner. S: Jack is fast. → Jack is a fast runner
  "Fast" and "runner" have a possible qualifying relationship. Qualia Structure (Pustejovsky 91)

Constraints  Linear Ordering Paratactic  Years 1998,1999 and 2000  Not Years 1999, 1998 and 2000 Hypotactic  Uncommon orderings between premodifiers create disfluencies  A happy old man ---- An old happy man

Linear Ordering and Scope of Modifiers  Problem when multiple modifiers modify the same noun Decide the order Avoid ambiguity Ms. Jones is a patient of Dr. Smith, undergoing heart surgery Old men and women should board first Women and old men should board first

Linear Ordering of Modifiers
• A simplex NP is a maximal noun phrase that includes pre-modifiers such as determiners and possessives, but not post-nominals such as PPs and R-Cls.
• A POS tagger along with an FS grammar can be used to extract simplex NPs.
• A morphology module transforms plural nouns and comparative and superlative adjectives into their base forms for the frequency count.
• A regular expression filter removes concatenations of NPs:
  Takeover bid last week
  Metformin 500 milligrams

Three stages of subsequent analysis
• Direct Evidence: modifier sequences are transformed into ordered pairs
  Well known traditional brand name drug
  → Well known < traditional
  → Well known < brand name
  → traditional < brand name
• Three possibilities: A < B; B < A; B = A (no order)

• For n modifiers: nC2 = n(n-1)/2 ordered pairs
• Form a w × w matrix, where w is the number of distinct modifiers.
• Find Count[A,B] and Count[B,A]
• For a small corpus, a binomial distribution of one following the other is observed.
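A minimal Python sketch of the direct-evidence counts. It stores Count[A,B] sparsely in a `Counter` instead of a full w × w matrix, and it compares raw counts only; the binomial significance test mentioned on the slide is left out. The function names are hypothetical.

```python
from collections import Counter
from itertools import combinations


def count_pairs(sequences):
    """Count[A, B] = number of times modifier A occurs before modifier B."""
    counts = Counter()
    for modifiers in sequences:
        # combinations() keeps the original order, so each of the
        # n(n-1)/2 pairs comes out as (earlier, later).
        for first, second in combinations(modifiers, 2):
            counts[first, second] += 1
    return counts


def preferred_order(counts, a, b):
    """Compare Count[a, b] with Count[b, a]; equal counts mean no order."""
    ab, ba = counts[a, b], counts[b, a]
    if ab > ba:
        return f"{a} < {b}"
    if ba > ab:
        return f"{b} < {a}"
    return f"{a} = {b}"


counts = count_pairs([["well known", "traditional", "brand name"]])
print(preferred_order(counts, "brand name", "traditional"))
```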

• Transitivity, again from the corpus: A < B and B < C ⇒ A < C?
  Long, boring and strenuous stretch
  Long strenuous lecture
• Clustering: formation of equivalence classes of words with the same ordering with respect to other premodifiers
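If transitivity is accepted (the slide poses it as a question to be checked against the corpus), the extra orderings can be derived with a simple closure over the observed pairs. The function name below is an assumption for illustration.

```python
def transitive_closure(pairs):
    """Add (a, c) whenever (a, b) and (b, c) are present, until nothing changes."""
    closure = set(pairs)
    changed = True
    while changed:
        changed = False
        for a, b in list(closure):
            for c, d in list(closure):
                if b == c and a != d and (a, d) not in closure:
                    closure.add((a, d))
                    changed = True
    return closure


observed = {("long", "boring"), ("boring", "strenuous")}
print(sorted(transitive_closure(observed)))
```

Here the closure predicts "long" before "strenuous", which agrees with the attested phrase "long strenuous lecture".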

John is a 74 year old hypertensive diabetic white male patient with a swollen mass in the left groin John is a diabetic male white 74 year old hypertensive patient with a red swollen mass in the left groin

Other Constraints  For conjunctions John ate an apple and an orange (NP and NP) John ate in the morning and in the evening (PP and PP) X John ate an apple and in the evening (NP and PP)  Moral: Same syntactic category? John and a hammer broke the window ??? He is Nobel Prize winner and at the peak of his career.  Others: Adj phrase attachment, PP attachment etc.

Conjunctions

Three interesting types  John ate fish on Monday and rice on Tuesday (non-constituent coordination)  John ate fish and Bill rice (gapping)  Right node raising John caught and Mary killed the spider

A Naïve Algorithm
1. Group propositions and order them according to similarities
   1. I sold English books on Monday
   2. I sold Hindi books on Wednesday
   3. I sold onion on Monday
   4. I sold Bengali books on Monday
   ((1,3,4),2) OR ((1,4),3,2) OR …..

2. Identify recurring elements 3. Determine sentence boundary 4. Delete redundant elements
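The four steps can be sketched in Python for the book-selling example. The sketch assumes propositions are already split into (subject, verb, object, time) and implements only the ((1,3,4),2) grouping: propositions that share subject, verb and time are merged and the repeated elements are dropped. The names `PROPS` and `aggregate` are hypothetical.

```python
# Propositions as (subject, verb, object, time).
PROPS = [
    ("I", "sold", "English books", "on Monday"),
    ("I", "sold", "Hindi books", "on Wednesday"),
    ("I", "sold", "onion", "on Monday"),
    ("I", "sold", "Bengali books", "on Monday"),
]


def aggregate(props):
    """Group by the recurring elements, one sentence per group."""
    groups = {}
    for subject, verb, obj, time in props:
        # Steps 1-2: propositions sharing subject, verb and time go together.
        groups.setdefault((subject, verb, time), []).append(obj)
    sentences = []
    for (subject, verb, time), objects in groups.items():
        # Steps 3-4: one sentence per group; the shared elements appear once.
        if len(objects) > 1:
            coordinated = ", ".join(objects[:-1]) + " and " + objects[-1]
        else:
            coordinated = objects[0]
        sentences.append(f"{subject} {verb} {coordinated} {time}.")
    return sentences


for sentence in aggregate(PROPS):
    print(sentence)
```

The sketch applies no semantic constraints, so it would happily produce the odd coordinations that the slides go on to warn about.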

Still Funny Scenarios
• The baker baked. The bread baked. → The baker and the bread baked.
• I don’t drink. I don’t chew tobacco. → I don’t drink and chew tobacco.
What should the constraints be?

Morphological Synthesis  Inflections depending on tense, aspect, mood, case, gender, number, person and familiarity.  A typical Bengali verb has 63 different inflected forms (120 if we consider the causative derivations)  Exceptions

Synthesis Approach  Classification of words based on Syllable structure [19 classes for Bengali verbs]  Paradigm tables for each of the classes  Table-driven modification of the words  Exceptions treated separately.

Noun Morphology Synthesis
Different rules are used to inflect qualifiers and headwords.
The rule to inflect a proper noun as a headword in a particular SSU:
IF (headword type = proper noun AND the SSU to which the headword belongs = kAke AND the last character of root word = ‘a’)
THEN Rule1: headword = headword + “ke”
Example: rAma → rAmake
IF (Verb1 == verb2 AND the Conjunction = Ebong AND SSU2 to which the headword belongs = kakhana AND the last character of root word = ‘a’)
THEN Rule1: headword = headword – ’a’. Rule2: headword = headword + ’o’.
Example: Aaem gfkal bl /K/leClam ybL Aajo /Klb. Headword: Aaj + o
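The first rule can be written as a small Python function. This is a sketch of that single rule only; the function name and the choice to return the root unchanged when the rule does not apply are assumptions.

```python
def inflect_proper_noun(root, ssu):
    """Proper-noun headword in a kAke SSU whose root ends in 'a' takes 'ke'."""
    if ssu == "kAke" and root.endswith("a"):
        return root + "ke"
    # Any other case is outside this rule; leave the root untouched.
    return root


print(inflect_proper_noun("rAma", "kAke"))
```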

Verb Morphology Synthesis
Depends upon TAM option. Category Identification + Table lookup.
Category Identification: structure of root verb: X * VC * $, where X = any character, V = vowel, C = consonant and $ ∈ { Ø, a, A, oYA }.
V = a: ha [haoYA] (to happen); kara [karA] (to do); karA [karAno] (do, causative); saoYA [saoYAno] (undergo, causative)
V = A: khA [khAoYA] (to eat); jAna [jAnA] (to know); jAnA [jAnAno] (to inform); khAoYA [khAoYAno] (to feed)
V = i: di [deoYA] (to give); likha [lekhA] (to write); ni~NrA [ni~NrAno]
V = e: dekha [dekhA] (to see); dekhA [dekhAno] (to show); deoYA [deoYAno] (give, causative)
V = o: so [so;oYA] (to lie down); tola [tolA] (to pick); tolA [tolAno] (pick, causative); so;oYA [so;oYAno] (lie, causative)
V = u/au: ghumA [ghumAno] (to sleep)

Table Look Up
The Table Lookup Stage:
i) Pr → Present
ii) Pa → Past
iii) Sim → Simple
iv) Per → Perfect
v) Co → Continuous
vi) Ind → Indicative
vii) Neg → Negation

Questions?