1 Words and the Lexicon September 10th 2009 Lecture #3.

Slides:



Advertisements
Similar presentations
The Structure of Sentences Asian 401
Advertisements

Syntax Lecture 2: Categories and Subcategorisation.
Syntax and Context-Free Grammars Julia Hirschberg CS 4705 Slides with contributions from Owen Rambow, Kathy McKeown, Dan Jurafsky and James Martin.
Chapter 4 Syntax.
TOPIC: WORD CLASS Lesson 1. Noun A word that refers to a person (such as Mike or doctor), a place (such Dhaka or city), or a thing, a quality or an activity.
Statistical NLP: Lecture 3
Chapter 4 Basics of English Grammar
10 pt 15 pt 20 pt 25 pt 5 pt 10 pt 15 pt 20 pt 25 pt 5 pt 10 pt 15 pt 20 pt 25 pt 5 pt 10 pt 15 pt 20 pt 25 pt 5 pt 10 pt 15 pt 20 pt 25 pt 5 pt NounsVerbsPronounsPrepositionsAdverbs.
SYNTAX Introduction to Linguistics. BASIC IDEAS What is a sentence? A string of random words? If it is a sentence, does it have to be meaningful?
The Eight Parts of Speech
LING NLP 1 Introduction to Computational Linguistics Martha Palmer April 19, 2006.
Introduction to Computational Linguistics Lecture 2.
 Christel Kemke 2007/08 COMP 4060 Natural Language Processing Word Classes and English Grammar.
Stemming, tagging and chunking Text analysis short of parsing.
NLP and Speech 2004 English Grammar
1 Introduction to Computational Linguistics Eleni Miltsakaki AUTH Fall 2005-Lecture 2.
Nouns, Verbs, Adjectives, Adverbs.  Structure Classes: ◦ stable – changes little over time; about 1% of words in English; help put structure of sentences.
1 CSC 594 Topics in AI – Applied Natural Language Processing Fall 2009/ Outline of English Syntax.
Chapter 2 A rapid overview.
March 1, 2009 Dr. Muhammed Al-Mulhem 1 ICS 482 Natural Language Processing INTRODUCTION Muhammed Al-Mulhem March 1, 2009.
THE PARTS OF SYNTAX Don’t worry, it’s just a phrase ELL113 Week 4.
11 CS 388: Natural Language Processing: Syntactic Parsing Raymond J. Mooney University of Texas at Austin.
Syntax The number of words in a language is finite
Constituents  Sentence has internal structure  The structures are represented in our mind  Words in a sentence are grouped into units, and these units.
Chapter 4 Basics of English Grammar Business Communication Copyright 2010 South-Western Cengage Learning.
Chapter 4 Syntax Part II.
Lecture 12: 22/6/1435 Natural language processing Lecturer/ Kawther Abas 363CS – Artificial Intelligence.
1.Syntax: the rules of sentence formation; the component of the mental grammar that represent speakers’ knowledge of the structure of phrase and sentence.
Dr. Monira Al-Mohizea MORPHOLOGY & SYNTAX WEEK 12.
IV. SYNTAX. 1.1 What is syntax? Syntax is the study of how sentences are structured, or in other words, it tries to state what words can be combined with.
Natural Language Processing Lecture 6 : Revision.
CS : Language Technology for the Web/Natural Language Processing Pushpak Bhattacharyya CSE Dept., IIT Bombay Constituent Parsing and Algorithms (with.
LING 702 Structure Class 2 Instructor: Bede McCormack.
8 Parts of Speech Noun Pronoun Adjective Verb Adverb Preposition Conjunction Interjection.
For Wednesday Read chapter 23 Homework: –Chapter 22, exercises 1,4, 7, and 14.
Linguistic Essentials
Parsing with Context-Free Grammars for ASR Julia Hirschberg CS 4706 Slides with contributions from Owen Rambow, Kathy McKeown, Dan Jurafsky and James Martin.
CPE 480 Natural Language Processing Lecture 4: Syntax Adapted from Owen Rambow’s slides for CSc Fall 2006.
Natural Language Processing
$100 $200 $300 $400 $500 $100 $200 $300 $400 $500 $100 $200 $300 $400 $500 $100 $200 $300 $400 $500 $100 $200 $300 $400 $500 $100.
Parts of Speech Major source: Wikipedia. Adjectives An adjective is a word that modifies a noun or a pronoun, usually by describing it or making its meaning.
Words Introduction to English Language Phrases Professor Sabine Mendes Moura.
Grammar Boot Camp Parts of Speech Challenge
1 Introduction to Computational Linguistics Eleni Miltsakaki AUTH Spring 2006-Lecture 2.
Leonid Iomdin Institute for Information Transmission Problems, Russian Academy of Sciences
◦ Process of describing the structure of phrases and sentences Chapter 8 - Phrases and sentences: grammar1.
Linguistics Lecture-1: Words Pushpak Bhattacharyya, CSE Department, IIT Bombay 14 June, 2008.
1 Chapter 4 Syntax Part III. 2 The infinity of language pp The number of sentences in a language is infinite. 2. The length of sentences is.
An Introduction to Semantic Parts of Speech Rajat Kumar Mohanty rkm[AT]cse[DOT]iitb[DOT]ac[DOT]in Centre for Indian Language Technology Department of Computer.
Language and Cognition Colombo, June 2011 Day 2 Introduction to Linguistic Theory, Part 3.
Basic Syntactic Structures of English CSCI-GA.2590 – Lecture 2B Ralph Grishman NYU.
3.3 A More Detailed Look At Transformations Inversion (revised): Move Infl to C. Do Insertion: Insert interrogative do into an empty.
NATURAL LANGUAGE PROCESSING
Unit 1 Language Parts of Speech. Nouns A noun is a word that names a person, place, thing, or idea Common noun - general name Proper noun – specific name.
ENGLISH 5050: English Syntax and Morphology All quotations, unless otherwise noted, are from Chapter 2 of The Grammar Book, 2nd edition. Robert F. van.
Parts of speech. Eight parts of speech: Noun pronoun verb adjective adverb preposition conjunction and interjection.
Lecture 2: Categories and Subcategorisation
Different types of Grammer
The theory of word classes in modern grammar studies
Structure, Constituency & Movement
Beginning Syntax Linda Thomas
Statistical NLP: Lecture 3
Chapter 4 Basics of English Grammar
Parts of Speech Project
English parts of speech
Linguistic Essentials
Chapter 4 Basics of English Grammar
Structure of a Lexicon Debasri Chakrabarti 13-May-19.
Artificial Intelligence 2004 Speech & Natural Language Processing
Vocabulary/Lexis LEXIS: n., collective, uncountable
Presentation transcript:

1 Words and the Lexicon September 10th 2009 Lecture #3

2 What is a word? A sequence of characters demarcated by white space Not exactly –Spoken language –Chinese –It’s  it is –Other issues in tokenization (e.g., New York, rock ‘n’ roll)

3 Types versus Tokens A text will contain various words Some of these words may occur more than once “Chapter 2 introduced the regular expression, showing for example how a single search string could help a web search engine find both woodchuck and woodchucks.” 25 words (tokens)

4 Types versus Tokens A text will contain various words Some of these words may occur more than once “Chapter 2 introduced the regular expression, showing for example how a single search string could help a web search engine find both woodchuck and woodchucks.” 25 words (tokens) Some words (types) occur more than once (each a different token) –a – 2 –search – 2

5 Types versus Tokens Word Token: an occurrence of a word at a particular spatio-temporal location (e.g., a sequential position in a text, an utterance event at a time and space). Word Type: a more abstract notion also termed lexeme – we speak of two tokens belonging to the same type. Also, woodchuck and woodchucks are two grammatical forms of the same lexeme (woodchuck).

6 Lexical Knowledge Phonology: sounds rhythm, variants, homophones Semantics: meanings of (parts of) parts of words, synonyms Morphology: related word forms (e.g., plural) Syntax: how to use the word in a sentence Pragmatics: appropriate situations for using the word Orthography: how the word is written variants Etymology: history of the word, obsolete meanings

7 Parts of Speech/Word Classes Open Class Word Categories Nouns: person, place, or thing; proper vs. common, mass vs. count, number, gender, case Verbs: most referring to actions and processes; main verbs vs. auxiliaries; transitive (hit, keep) vs. intransitive (arrive, snore) Adjectives: terms that describe properties or qualities Adverbs: modify something; directional, locative, degree, manner, temporal

8 Parts of Speech/Word Classes Closed Class Word Categories Determiners: definite (the), indefinite (a), demonstrative (this) Prepositions: occur before a noun phrase, semantically they are relational Conjunctions: coordinating (and), subordinating (if, that) Auxiliary verbs: can, may, should, are, have Pronouns: personal (she), possessive (her), interrogative (who), relative (who), reflexive (himself) Particles: combine with a verb to form a phrasal verb; up, down, on, off, in, out Numerals: one, two, three, first, second, third

9 Tests for Word Classes Morphological (formal) tests: Look for closely related forms; E.g., only nouns can bear the plural affix Syntactic tests: What words co-occur (i.e., immediately precede and follow) with the word? E.g., can you say the word directly after the? Semantic (notational) tests: What kinds of things does the word denote? E.g., person, place, or thing?

10 Major Syntactic Constituents Noun Phrase (NP): referring expressions (the blue shoe) Verb Phrase (VP): verbs plus complements (marks the sixth consecutive monthly decline) Prepositional Phrase (PP): direction, location, time, manner, etc. (in three minutes) Adjectival Phrase (AdjP): modified or complemented adjectives (much sharper, content to stay) Complementizers (COMP): (that, whether)

11 Constituent Structure (parse tree) theboy ate peas with afork S NP DET N VP NP N PP P NP DET N V

12 Lexicon Used in NLP systems to associate information with words (either for parsing or generation) Information about a word is called a lexical entry In parsing, each word in the input is scanned, and then lexical lookup retrieves one or more entries from the lexicon. Some of the information may be dynamically computed (e.g., by exploiting various lexical regularities). NLP-specific lexicons are similar to, but typically richer than, a printed dictionary Some NLP systems have used machine-readable versions of printed dictionaries Good source of links:

13 NLP Lexicon: abandon Lexical entry with syntactic frame: (verb :orth “abandon” :subc ((np-pp :p-val (“to”)) (np))) –Words that take complements will have a subcategorization (:subc) feature. For example, the verb “abandon” can occur with a noun phrase followed by a prepositional phrase with the preposition “to” (e.g., “I abandoned him to the sea.”) or with just a noun phrase complement (“I abandoned the ship”).