On the Nature of Word Classes in Chinese K.K. Luke Nanyang Technological University.

Debates on word classes
Started in the 1950s
– Gao Mingkai, Lü Shuxiang and others on the relative importance of morphological and syntactic criteria
Latest one in 2009 (Yuyanxue Luncong)
– Shen Jiaxuan, Zhan Weidong and others
A span of more than 50 years

The first debate
As a result of the first debate, it was generally agreed that the main criterion for identifying parts of speech should be syntactic function, not morphology.
Gabelentz (1881): The existence of grammatical categories is proved by the fact that Chinese words differ in their syntactic behaviour.
But few have dared to take this argument to its logical conclusion.

Reluctance to follow through
Famous example of chuban 'publish/publication'
– 'publish books'
– 'the publication of books'
– 'a ceremony to announce the publication of a book', i.e., a book launch
Reluctance to assign chuban to two or more classes

More recent debates
"Everybody agrees that the purpose of setting up word classes is for the convenience of doing grammatical analysis, but have word classes brought us more convenience or headache?" (Preface to the 2009 special issue of Yuyanxue Luncong)
Contributions informed by Linguistic Typology and Natural Language Processing (NLP)

Word classes for NLP
A degree of arbitrariness in assigning POS tags
No two corpora use the same POS tagset
– PKU corpus: 38 tags
– Academia Sinica Balanced Corpus: 43 tags
– Penn Treebank: 33 tags
No guarantee (or hope) that the POS tags used in NLP will correspond in any way to how words are represented and organised in speakers' minds.

Sources of the problem
Why can't scholars agree on word classes in Chinese?
– Structure of the language
– Psychology of the linguist

Structure of the language
Little inflectional morphology
– Word classes have little morphological marking
Flexibility
– chi: chifan 'eat (rice)'; xiaochi 'snack'
– zao: da zao 'early morning'; Zao! 'Morning!'; zao shuo 'Why didn't you say that earlier?'
– Jun jun, chen chen, fu fu, zi zi (Confucian Analects): 'Let the ruler be a ruler, the minister a minister, the father a father, the son a son'

Psychology of the linguist
Fear: that Chinese might be regarded as an inferior language without grammar if it turns out that words cannot be assigned to fixed word classes.
W. von Humboldt (1826): "The Chinese language seems rather to disdain than to neglect the denoting of grammatical categories."

Psychology of the linguist
Great reluctance to entertain the possibility that syntactic categories can be determined only by reference to constructions.
Li Jinxi (1950): Words do not belong to any syntactic classes until they enter into a construction.
Li Jinxi's view has been almost universally rejected for fear that it may lead to an unwanted conclusion: words don't belong to unique word classes in Chinese.

Unnecessary worries
But:
– First, no fixed word classes doesn't mean no word classes; and
– Second, no word classes doesn't mean no grammar.

One word, one class?
Great effort made to ensure 'one word, one class' and rule out the possibility of class overlapping (e.g., Lu Jianming 1994).
– When all members of a class can occur in a non-typical position (e.g., all verbs can take subject or complement position)
  laodong 'work': laodong guangrong, xihuan laodong
– Temporary shifts
  Tai junfa le! 'That's too warlord-like!'
– Different meanings
  suo 'to lock/a lock'; daibiao; baogao

Noun or Verb?
Possible cases of overlapping
– yanjiu 'research', diaocha 'survey', chuban 'publish'
Morphology no help
– chuban 'to publish/publication'
– yanjiu 'to research/a piece of research'
– Nothing in the form of these words tells us whether they are nouns or verbs.
Nominalization?
– General reluctance to treat chuban in zhe ben shu de chuban as overlapping or nominalization

Diaocha in PKU corpus
– (v) 'to survey languages'
– (vn) 'language survey'
– (vn) 'conduct a survey'
– (v/vn) 'large-scale survey'
– (v) 'through surveying X/through a survey of X'
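
The spread of tags above can be tallied in a few lines. This is only a sketch: the glosses and tags are copied from the slide, the Chinese examples themselves are not reproduced, and the choice to count an ambiguous "v/vn" once for each tag is an assumption made for illustration, not a convention of the PKU corpus.

```python
from collections import Counter

# (gloss, tag) pairs as listed on the slide for diaocha.
DIAOCHA_CONTEXTS = [
    ("to survey languages", "v"),
    ("language survey", "vn"),
    ("conduct a survey", "vn"),
    ("large-scale survey", "v/vn"),
    ("through surveying X / through a survey of X", "v"),
]


def tag_counts(contexts):
    """Count how often each tag is assigned; 'v/vn' counts once for each tag (assumption)."""
    counts = Counter()
    for _gloss, tag in contexts:
        for single_tag in tag.split("/"):
            counts[single_tag] += 1
    return counts


counts = tag_counts(DIAOCHA_CONTEXTS)
print(dict(counts))
```

On these five contexts the two tags come out evenly split, which is one way of seeing why a single fixed tag for diaocha is hard to justify from usage alone.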

Different views
Huang Changning
– Class overlapping
Guo Rui
– Verbs (priority treatment, youxian kaolü)
  Reasons: economy, psychological acceptance
Shen Jiaxuan
– Verbs as nouns

Huang Changning
X-bar theory: Head of an NP should be an N
No good reason why words like chuban and yanjiu should not be treated as nouns, just like any other word in the same syntactic position:
– zhe ben shu de chuban
– [NP X de N]

Guo Rui
Words like chuban and diaocha are verbs.
Distinction between lexical meaning and syntactic function
Unlike other languages, in Chinese verbs (and adjectives) can simply occupy the Head position of an NP without undergoing any licensing process, e.g., nominalization.
Overlapping is acceptable only if it's rare.

Shen Jiaxuan
A V N
Noun as a superclass
Relationship between N and V: 'constitution' as opposed to 'realization' (as in derivational morphology)
E.g., English realise (v) > realisation (n); cf. Chinese shixian (v/n)

A constructionist approach
I'm more in sympathy with Huang, i.e., N/V overlapping
However, I would add that words belong to different classes by virtue of their occurrence in different constructions.
This idea is adopted from William Croft's Radical Construction Grammar (RCG)
As Randy LaPolla has pointed out, RCG can be used to good effect in analysing Sino-Tibetan and Austronesian languages.

Radical Construction Grammar
Key reference: Croft (2001)
"RCG is a nonreductionist theory which begins with the larger units and defines the smaller ones in terms of their relation to the larger units." (2001: 47)
"Constructions, not categories and relations, are the basic primitive units of syntactic representation." (2001: 46)

RCG's conception of language
"The proper definition of speech community is a population of individual speakers who are communicatively isolated from other speakers. The communicative interaction of speakers defines another population: the population of utterances produced by the speakers in a speech community. A language is a population of utterances – not possible utterances, but actual utterances, just as the species is a population of actual organisms." (2001: 365)

POS in RCG
Noun and verb are not universal categories.
Word class labels are a convenient way of referring to classes of words in a particular language.
Nouns and verbs of different languages could (and usually do) have very different properties, e.g.,
– Have a look/smell (English)
– Hen junfa (Chinese), cf. ?very warlord

Constructions
Form-meaning pairing
Form: syntactic elements
Meaning: semantic components
Link between form and meaning: symbolic
The role of each element is determined with reference to the construction

Example
Wangmian 7-years-old at died father
'Wangmian's father died when he was seven'
The LOSS construction: X LOST Y
Verbs that can take second position come from a small collection.
'At the age of seven, Wangmian lost his father'

Solution
Role of chuban determined by place in a particular construction
– 'publish books' (VO)
– 'publication of books' (X de Y)
– 'book launch' (NN)
If the same word can occur in different constructions, then it will have different syntactic roles defined by those constructions.
Original question 'what's its word class?' is a misleading question.
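
The idea that a word's role is read off the construction, not stored with the word, can be sketched as follows. Everything here is illustrative: the Construction class, the slot labels and the role_of helper are hypothetical names, "chuban shu" is an assumed rendering of 'publish books', and "<noun>" is a placeholder for the unstated second noun of 'book launch' (the position of chuban in that compound is likewise assumed).

```python
from dataclasses import dataclass
from typing import Optional, Tuple


@dataclass(frozen=True)
class Construction:
    """A construction with named slots; the labels are illustrative only."""
    name: str
    slots: Tuple[str, ...]


# The three constructions named on the slide.
VO = Construction("VO", ("V", "O"))
X_DE_Y = Construction("X de Y", ("X", "de", "Y"))
NN = Construction("NN", ("N1", "N2"))


def role_of(word: str, construction: Construction,
            fillers: Tuple[str, ...]) -> Optional[str]:
    """Return the slot that `word` fills in one instance of `construction`."""
    if len(fillers) != len(construction.slots):
        raise ValueError("fillers must match the construction's slots")
    for slot, filler in zip(construction.slots, fillers):
        if filler == word:
            return slot
    return None  # the word does not occur in this instance


# Hypothetical instances (see the assumptions stated above).
instances = [
    (VO, ("chuban", "shu")),
    (X_DE_Y, ("shu", "de", "chuban")),
    (NN, ("chuban", "<noun>")),
]

for construction, fillers in instances:
    print(construction.name, "->", role_of("chuban", construction, fillers))
```

Note that no entry anywhere records a word class for chuban; the same form simply receives a different role from each construction it enters.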

Unnecessary worries
Scanty inflectional morphology
– Syntactic position more important
Flexibility
– Languages are probably all flexible, each in its own way
Having no word classes does not make Chinese an inferior language

Flexibility
English, for example, has a great deal of overlapping too
– Most simple nouns can be used as verbs
  Chair, table, butter, name, home, light, house, eye, plant
– Most simple verbs can be used as nouns (in the Have a X construction)
  Look, walk, say, think, try

Conclusion
Old paradigm
– Words can be assigned to a small number of word classes according to their syntactic behaviour, and they are stored in the mental lexicon with word class tags attached to them.
New paradigm
– Words have meaning potential. By virtue of their meaning potential they can enter into particular positions in particular constructions and play roles defined by those constructions.