ZERO PRONOUN RESOLUTION IN JAPANESE Jeffrey Shu Ling 575 Discourse and Dialogue.

Review of Zero Pronouns
- (Pro)nouns often dropped if pragmatically/semantically inferable from context
- General preference for names/nouns rather than pronouns in polite/formal speech (except 1st person and demonstrative pronouns)
- Grammatical markers removed with nouns
- Pronominal referent may or may not appear in discourse prior to appearance of zero pronoun

Overview of Objectives
- Find general syntactic rules that can be used to identify the presence of zero pronouns (ZPs)
- Find syntactic/semantic/pragmatic clues to identify referents for ZPs, or determine the appropriate pronoun/noun to insert if the referent is not explicitly found in the textual context
- Determine which clues take priority in case of conflict

Syntactic Identification of Zero Pronouns
- Determining presence of zero pronoun fairly straightforward for subject/topic, objects
  - Determine syntactic argument structure of verb
  - Identify if noun exists to fill arguments (simple task with grammatical markers)
    - Sometimes grammatical markers not used in casual speech
- Not so straightforward for other nouns (e.g. locatives)
  - Might not be important if unstated (e.g. “I’ll go to the store” vs. “I’ll go”). Usually has precedent.
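The identification steps on this slide can be sketched as a lookup of the verb's argument structure followed by a check for case particles. This is a minimal illustration only: the `VALENCY` lexicon, its entries, the `(noun, particle)` input format, and the rule that a topic-marked noun covers the subject first are all assumptions, not part of the presentation.

```python
# Hypothetical valency lexicon: verb -> case particles of its core arguments.
# The entries are illustrative assumptions, not data from the slides.
VALENCY = {
    "食べる": ["が", "を"],  # eat: subject, direct object
    "行く": ["が"],          # go: subject (the locative is treated as optional)
}

# The topic marker は can stand in for a subject or an object.
TOPIC = "は"


def find_zero_pronouns(clause, verb):
    """Return the case slots of `verb` that no overt noun fills.

    `clause` is a list of (noun, particle) pairs from an assumed
    upstream parser.
    """
    particles = [particle for _, particle in clause]
    missing = [slot for slot in VALENCY[verb] if slot not in particles]
    # Assumption: a topic-marked noun covers one otherwise missing slot,
    # and it covers the subject slot first.
    if TOPIC in particles and missing:
        missing.pop(0)
    return missing


# "りんごを食べた" (ate an apple): the subject is dropped.
print(find_zero_pronouns([("りんご", "を")], "食べる"))  # ['が']
```

Casual speech that drops the particles themselves would defeat this check, which matches the caveat on the slide.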

Findings: Some Generalizations from Semantics/Syntax
- In consecutive statements with ZPs, the topic/subject is often the same
- Many verbs have a semantic preference for certain types of nouns (e.g. animate vs. inanimate), which can narrow possibilities
- Imperatives are always 2nd person (unless speaker is talking to him/herself)
  - Rarely have a subject/topic
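A rough sketch of how the verb-preference and imperative generalizations could narrow a candidate list. The `Candidate` type, the `SUBJECT_WANTS_ANIMATE` table, and its entries are hypothetical; the sketch also ignores the self-addressed imperative exception noted above.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Candidate:
    text: str
    animate: bool


# Hypothetical selectional preferences: does the verb prefer an animate subject?
SUBJECT_WANTS_ANIMATE = {"think": True, "rust": False}


def filter_subject_candidates(verb, candidates, imperative=False):
    """Narrow the possible referents for a dropped subject."""
    # Imperatives are treated as 2nd person (self-addressed speech is ignored).
    if imperative:
        return [Candidate("you", True)]
    wanted = SUBJECT_WANTS_ANIMATE.get(verb)
    if wanted is None:
        # No known preference: keep every candidate.
        return list(candidates)
    kept = [c for c in candidates if c.animate == wanted]
    # A preference is not a hard constraint: if nothing fits, keep everything.
    return kept or list(candidates)
```

For example, with candidates "Tanaka" (animate) and "the car" (inanimate), the verb "think" keeps only "Tanaka", while an unknown verb keeps both.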

Findings: Conversations
- 1P (1st person) by far most commonly used pronoun
  - If ZP in statement is 1P, usually has an explicit precedent
  - If 2P, it may or may not have an explicit precedent
- Question/answer format generalizations
  - Subject/topic of question usually 2/3P
    - If 1P, often directly stated
  - Subject of answer almost always opposite pronoun of question (2/3P vs. 1P), same referent (i.e. same person)
- 3P both simplest and potentially most complicated
  - 3P personal pronouns (e.g. he/she) very rarely used
  - If 3P ZP, almost always preceded by explicit reference to name/noun
  - Gender indeterminate (possible problem for MT)
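The question/answer generalization can be sketched as a small mapping from the person of the question's subject to the expected person and referent of a dropped subject in the answer. The function name and the speaker arguments are hypothetical, and the handling of 3P (left unresolved here, to be filled from an explicit earlier name/noun) is an assumption the slide does not spell out.

```python
def resolve_answer_subject(question_subject_person, asker, answerer):
    """Guess (person, referent) for a dropped subject in an answer."""
    if question_subject_person == 2:
        # "Did (you) go?" -> "(I) went." Same referent: the answerer.
        return (1, answerer)
    if question_subject_person == 1:
        # "Am (I) late?" -> "(You) are." Same referent: the asker.
        return (2, asker)
    # Assumption: a third party stays 3P; the referent must come from
    # an explicit earlier name/noun, so it is left open here.
    return (3, None)
```

With Aiko asking Ken a 2P question, `resolve_answer_subject(2, "Aiko", "Ken")` returns `(1, "Ken")`.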

Findings: Domain Specific
- Formal situations usually dictate specific conventions to be followed
  - Representatives almost never use 1P singular or 3P personal pronouns
  - 2P ZPs can be inferred from domain context (a corporate statement would probably be addressing its customers or investors)

Findings: Domain Specific
- Reference articles (e.g. Wikipedia)
  - While ZPs are commonly used, the antecedent is usually found no more than a few sentences prior
  - ZPs in consecutive sentences usually have same referent
  - If no clear antecedent can be found, subject of article is often a reasonable assumption
  - 1st and 2nd person virtually non-existent, can be safely eliminated as possibilities
    - Exceptions: reader-addressed texts (e.g. reference guides)
- News articles follow similar conventions
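The reference-article heuristics above amount to a short backward search with a fallback. In this sketch the input format (one list of explicit noun mentions per sentence, from an assumed upstream extractor), the choice of the first mention in a sentence, and the default `window=3` for "a few sentences" are all illustrative assumptions.

```python
def resolve_in_article(sentences, zp_index, article_subject, window=3):
    """Pick a referent for a zero pronoun in sentence `zp_index`.

    `sentences[i]` lists the explicit noun mentions of sentence i.
    Sentences that contain only zero pronouns have empty lists, so the
    search skips them and consecutive ZPs end up sharing a referent.
    """
    start = max(0, zp_index - window)
    # Walk backwards over at most `window` preceding sentences.
    for i in range(zp_index - 1, start - 1, -1):
        if sentences[i]:
            # Assumption: the first mention is the sentence's subject/topic.
            return sentences[i][0]
    # No clear antecedent nearby: fall back to the subject of the article.
    return article_subject
```

For instance, with `sentences = [["Mount Fuji"], [], []]`, a ZP in the third sentence resolves to "Mount Fuji"; with `window=1` the search finds nothing and returns the article subject.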

Findings: Domain Specific
- Visual media (e.g. comics, TV)
  - Most problematic
  - Referent very often in visual context with no textual context
  - Heavy reliance on visuals results in not only (pro)noun dropping, but “anything” dropping (including verbs), losing syntactic information
  - Quite possibly impossible to resolve without human intervention
- Purely textual works (e.g. novels) usually have enough information

Priority
- Domain has highest priority in all situations
  - Rules for separate domains largely mutually exclusive
  - Failure to determine pronouns that are reasonable for the domain can lead to misinformation
    - Japanese particularly sensitive to social context
    - Generic pronoun insertion may be highly inappropriate
  - Simple domain information can be extremely valuable
- Unless ruled out by domain, general conversation rules may be applicable to many different media

Priority
- Pragmatics/semantics > syntax
  - Ex: Question/answer conventions are of greater relevance than the assumption that the subject of the previous sentence is the same as the subject of the current sentence
  - Ex: The semantic preference of verbs for certain types of nouns is of greater relevance than the grammatical role of a ZP (e.g. the naïve assumption that direct objects tend to be inanimate)
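One way to picture the ordering from the two Priority slides (domain first, then pragmatics/semantics, then syntax) is a conflict resolver over proposals from different rule types. The `PRIORITY` table, the source labels, and the `(source, referent)` format are hypothetical names chosen for illustration.

```python
# Lower number = higher priority, following the ordering on the slides.
PRIORITY = {"domain": 0, "pragmatics": 1, "semantics": 1, "syntax": 2}


def pick_referent(proposals):
    """Choose among conflicting (source, referent) proposals.

    The proposal from the highest-priority source wins; ties go to the
    proposal listed first.
    """
    if not proposals:
        return None
    return min(proposals, key=lambda p: PRIORITY[p[0]])[1]
```

So if a syntactic rule proposes "Tanaka" (same subject as the previous sentence) and a question/answer convention proposes "I", `pick_referent([("syntax", "Tanaka"), ("pragmatics", "I")])` returns "I".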

An Idea
- Instead of inserting “best guess” pronouns, provide a selection of best candidates in text for user to disambiguate
  - In current MT systems that insert generic pronouns, users have to “interpret” (guess) what is really meant anyway
  - Insertion of pronouns is never 100% certain
  - Some media (visual) require human intervention
  - Insertion of pronouns/nouns can lead to misinformation, faux pas, and sense of unreliability of system
  - It is much faster to pick out of a set of provided candidates rather than guess whether the pronoun is right or wrong, and go back to try to figure out what is going on
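The proposal could look something like the following in output: rather than committing to one pronoun, the system shows its top candidates inline. The scores, the bracket-and-slash rendering, and the `top_n` default are assumptions made for this sketch; the slides do not specify a format.

```python
def render_candidates(scored, top_n=3):
    """Render the best-scoring referent candidates as an inline choice.

    `scored` maps candidate text to a confidence score from an assumed
    upstream resolver.
    """
    ranked = sorted(scored.items(), key=lambda kv: kv[1], reverse=True)
    return "[" + " / ".join(name for name, _ in ranked[:top_n]) + "]"


choices = render_candidates({"I": 0.5, "he": 0.2, "Tanaka": 0.3}, top_n=2)
print(choices + " went to the store")  # [I / Tanaka] went to the store
```

The reader then picks from the bracketed set instead of second-guessing a single inserted pronoun.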