
1 Human Language Technology: Sentence Grammar (November 2009)

2 Introduction
This lecture has several themes:
1. A crash course in sentence-level grammar (Jurafsky and Martin, 2nd ed., Chapter 12; the Internet Grammar of English, http://www.ucl.ac.uk/internet-grammar/).
2. Showing how different linguistic phenomena can be captured by grammar rules.
3. Dependency parsing.
4. Tagsets and treebanks.

3 Part 1: Grammar of English

4 Different Kinds of Rule
Morphological rules govern how words may be composed: re+invest+ing = reinvesting.
Syntactic rules govern how words and constituents combine to form grammatical sentences.
Semantic rules govern how meanings may be combined.

5 Syntax: Why?
Knowledge of syntax is needed in many applications:
– Parsing
– Grammar checkers
– Question answering / database access
– Information extraction
– Generation
– Translation
Full versus superficial analysis?

6 Levels of Grammar Organisation
Word classes: different parts of speech (POS).
Phrase classes: sequences of words inheriting the characteristics of certain word classes.
Clause classes: sequences of phrases containing at least one verb phrase.
On the basis of these one may define:
Grammatical relations: the roles played by constituents, e.g. subject, predicate, object.
Syntax-semantics interface: the mapping between syntactic structures and meaning.

7 Word Classes
Closed classes:
– determiners: the, a, an, four.
– pronouns: it, he, etc.
– prepositions: by, on, with.
– conjunctions: and, or, but.
Open classes:
– nouns refer to objects or concepts: cat, beauty, Coke.
– adjectives describe or qualify nouns: fried chickens.
– verbs describe what the noun does: John jumps.
– adverbs describe how it is done: John runs quickly.

8 Word Class Characteristics
Different word classes have characteristic subclasses and properties:
Class       Subclasses                  Properties
Noun        proper; mass; count         number; gender
Verb        transitive; intransitive    number; gender; person; tense
Adjective   dimension; age; colour      number; gender

9 Phrases
A phrase of several words can fill the same role in a sentence as a single word:
– Noun phrases refer to objects: four fried chickens.
– Verb phrases state what the noun phrase does: kicks the dog.
– Adjective phrases describe or qualify an object: sickly sweet.
– Adverbial phrases describe how actions are done: very carefully.
– Prepositional phrases add information to a verb phrase: on the table.

10 Phrases can be Complex, e.g. Noun Phrases
Proper name or pronoun: Monday; it
Specifier + noun: the day
Specifier + premodifier + noun: the first wet day
Specifier + premodifier + noun + postmodifier: the first wet day that I enjoyed in June

11 But they all fit the same context
{ Monday / It / The day / The first wet day / The first wet day that I enjoyed in June } was sunny.

12 Clauses
A clause is a combination of noun phrases and verb phrases.
Clauses can exist at the top level (main clause) or can be embedded (subordinate clause):
– A top-level clause is a sentence, e.g. The cat ate the mouse.
– An embedded clause is subordinate, e.g. John said that Sandy is sick.
Unlike phrases, whole sentences can be used to say something complete, e.g. to state a fact or ask a question.

13 Different Kinds of Sentences
Assertion: John ate the cat.
Yes/no question: Did John eat the cat?
Wh-question: What did John eat?
Command: Eat the cat, John!
NB: all these forms share the same underlying semantic proposition.

14 Part II: Context-Free Grammar Rules

15 Formal Grammar
A formal grammar consists of:
– Terminal symbols (T)
– Non-terminal symbols (NT, disjoint from T)
– A start symbol (a distinguished NT)
– Rewrite rules of the form α → β, where α and β are strings of symbols

16 Classes of Grammar
Type   Grammars            Languages                Machines
0      Phrase structure    Recursively enumerable   Turing machine (TM)
1      Context sensitive   Context sensitive        Linear bounded automaton (LBA)
2      Context free        Context free             Pushdown automaton (PDA)
3      Regular             Regular                  Finite state automaton (FSA)

17 Classes of Grammar: Learnability
Different classes of grammar result from various restrictions on the form of rules.

18 Restrictions on Rules
For all rules α → β:
Type 0 (unrestricted): no restrictions.
Type 1 (context sensitive): |α| ≤ |β|.
Type 2 (context free): α is a single NT symbol.
Type 3 (regular): every rule is of the form A → aB or A → a, where A, B ∈ NT and a ∈ T.
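To make the restrictions concrete, here is a small sketch (assuming a toy convention that nonterminals are uppercase strings and terminals lowercase; the helper names are invented for illustration) that classifies a rule by the most restrictive Chomsky type it satisfies:

```python
# Toy classifier: given a rule as (lhs, rhs) lists of symbols, return the
# most restrictive Chomsky type (3, 2, 1, or 0) the rule fits.
# Convention for this sketch: nonterminals are uppercase, terminals lowercase.

def is_nt(sym):
    return sym.isupper()

def rule_type(lhs, rhs):
    # Type 3 (regular): A -> a B  or  A -> a
    if (len(lhs) == 1 and is_nt(lhs[0])
            and 1 <= len(rhs) <= 2
            and not is_nt(rhs[0])
            and (len(rhs) == 1 or is_nt(rhs[1]))):
        return 3
    # Type 2 (context free): the left-hand side is a single nonterminal
    if len(lhs) == 1 and is_nt(lhs[0]):
        return 2
    # Type 1 (context sensitive): the right-hand side is at least as long
    if len(lhs) <= len(rhs):
        return 1
    return 0  # Type 0: unrestricted

print(rule_type(["A"], ["a", "B"]))      # 3: regular
print(rule_type(["NP"], ["ADJ", "N"]))   # 2: context free
print(rule_type(["A", "B"], ["A", "c"])) # 1: context sensitive
print(rule_type(["A", "B"], ["a"]))      # 0: unrestricted
```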

19 Which Class for NLP?
Type 3 (regular): good for morphology; cannot handle centre embedding of sentences, e.g. The man that John saw eating died.
Type 2 (context free): OK, but problems handling certain phenomena, e.g. agreement.
Type 1 (context sensitive): computational properties not well understood; too powerful.
Type 0 (Turing): too powerful.

20 Weak versus Strong
A grammar class that is too restrictive cannot characterise/discriminate exactly the structures of natural-language sentences.
A grammar class that is too general has the power to characterise/discriminate structures that don't exist in human languages.
More general → higher complexity → less efficient computation.

21 Example Grammar
Cabinet discusses police chief's case
French gunman kills four
s → np vp
np → n
np → adj n
np → n np
vp → v np

22 Classifying the Symbols
NT: symbols appearing on the left.
Start: the symbol appearing only on the left, from which every other symbol can be derived.
T: symbols appearing only on the right.
To include words we also need special rules such as:
n → [police]
n → [gunman]
n → [four]
These latter rules define the lexicon, or "dictionary interface".
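As a sketch of how this grammar and lexicon can be put to work, the following toy CKY-style recognizer (all names invented for illustration; the unary rule np → n is handled by a closure step) accepts the headline French gunman kills four:

```python
from collections import defaultdict

# Toy CKY recognizer for the example grammar of slides 21-22.
LEXICON = {"French": {"adj"}, "gunman": {"n"}, "kills": {"v"}, "four": {"n"}}
BINARY = {("np", "vp"): {"s"}, ("adj", "n"): {"np"},
          ("n", "np"): {"np"}, ("v", "np"): {"vp"}}
UNARY = {"n": {"np"}}  # np -> n

def close_unary(cats):
    """Add every category reachable from `cats` via unary rules."""
    changed = True
    while changed:
        changed = False
        for c in list(cats):
            for parent in UNARY.get(c, ()):
                if parent not in cats:
                    cats.add(parent)
                    changed = True
    return cats

def cky_recognise(words, start="s"):
    n = len(words)
    chart = defaultdict(set)
    for i, w in enumerate(words):           # fill in lexical categories
        chart[(i, i + 1)] = close_unary(set(LEXICON.get(w, ())))
    for span in range(2, n + 1):            # combine adjacent constituents
        for i in range(0, n - span + 1):
            j = i + span
            cell = set()
            for k in range(i + 1, j):
                for left in chart[(i, k)]:
                    for right in chart[(k, j)]:
                        cell |= BINARY.get((left, right), set())
            chart[(i, j)] = close_unary(cell)
    return start in chart[(0, n)]

print(cky_recognise("French gunman kills four".split()))  # True
```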

23 Grammar Induces Phrase Structure
(s (np (adj French) (n gunman)) (vp (v kills) (np (n four))))

24 Phrase Structure
Phrase structure (PS) includes information about:
– precedence between constituents
– dominance between constituents
PS constitutes a trace of the rule applications used to derive a sentence.
PS does not tell you the order in which the rules were used.

25 Procedural versus Declarative
A grammar induces a structure but does not tell you how to discover that structure: a grammar is declarative.
A parser is a procedure that, given a suitable representation of a grammar and a sentence, actually discovers the structure(s): a parser is procedural.

26 Handling Linguistic Phenomena
– Different sentence types
– Nested structures
– Agreement
– Multiwords
– Subcategories of verb

27 Different Sentence Types... Different Grammar Rules
Declaratives: John left. S → NP VP
Imperatives: Leave! S → VP
Yes-no questions: Did John leave? S → Aux NP VP
Wh-questions: When did John leave? S → Wh-word Aux NP VP

28 Recursively Nested Structures
Flights to Miami
Flights to Miami from Boston
Flights to Miami from Boston in April
Flights to Miami from Boston in April on Friday
Flights to Miami from Boston in April on Friday under $300
Flights to Miami from Boston in April on Friday under $300 with lunch

29 Recursive Rules
NP → N
NP → NP PP
PP → Preposition NP
Flights from Miami to Boston

30 Ambiguity
np → np pp
pp → prep np
(the man) (on the hill with a telescope by the sea)
(the man on the hill) (with a telescope by the sea)
(the man on the hill with a telescope) (by the sea)
etc.
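The number of distinct attachment structures grows quickly: with n trailing PPs, the recursive rules np → np pp and pp → prep np yield a number of analyses that follows the Catalan numbers. A quick check (a small arithmetic sketch; the function name is invented):

```python
from math import comb

def catalan(n):
    """n-th Catalan number: C(2n, n) / (n + 1)."""
    return comb(2 * n, n) // (n + 1)

# Number of attachment structures for an NP followed by n PPs.
for n in range(1, 6):
    print(n, catalan(n))
# n=2 gives 2, matching the two readings of
# "the man on the hill with a telescope".
```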

31 Handling Agreement
NP → Determiner N
Include: these days, this day. Exclude: this days, these day.
NP → NPSing
NP → NPPlur
NPSing → DetSing NSing
NPPlur → DetPlur NPlur
Agreement also covers gender and case as well as number.
Danger: proliferation of categories/rules.
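One alternative to multiplying atomic categories is to attach features to categories and enforce agreement with a constraint. A minimal sketch (the toy lexicon and function names are invented for illustration):

```python
# Each lexical entry carries a feature dictionary instead of being split
# into separate atomic categories (NPSing, NPPlur, ...).
DET = {"this": {"num": "sg"}, "these": {"num": "pl"}, "a": {"num": "sg"}}
NOUN = {"day": {"num": "sg"}, "days": {"num": "pl"}}

def np_ok(det, noun):
    """NP -> Det N, with the constraint that Det and N agree in number."""
    return DET[det]["num"] == NOUN[noun]["num"]

print(np_ok("these", "days"))  # True
print(np_ok("this", "days"))   # False
```

One rule plus one constraint covers what would otherwise need a rule per feature combination, which is the point of feature-based treatments of agreement.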

32 Handling Multiwords
John ran up the stairs
John rang up the doctor
John ran the stairs up*
John rang the doctor up
John rang the doctor who lives in Paris up

33 Ordinary CF Rules Don't Work
John rang up the doctor: VP → V NP, where V is the multiword rang up.
John rang the doctor up: VP → V NP particle_from_V; here the multiword has split into two parts.
The challenge is to express the relation between the parts.

34 Subcategorisation
Intransitive verb (no object): John disappeared / John disappeared the cat*
Transitive verb (one object): John opened the window / John opened*
Ditransitive verb (two objects): John gave Mary the book / John gave Mary*

35 Subcategorisation Rules
Intransitive verb (no object): VP → V
Transitive verb (one object): VP → V NP
Ditransitive verb (two objects): VP → V NP NP
If you take account of the category of items following the verb, there are about 40 different patterns like this in English.
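The same idea can be sketched in code: store each verb's subcategorisation frame in the lexicon and check the object count against it, rather than writing a separate VP rule per verb (the lexicon and names below are invented for illustration):

```python
# Toy lexicon: each verb records how many NP objects it licenses.
SUBCAT = {"disappeared": 0, "opened": 1, "gave": 2}

def vp_ok(verb, objects):
    """VP -> V NP* is well-formed only if the object count matches the frame."""
    return SUBCAT.get(verb) == len(objects)

print(vp_ok("gave", ["Mary", "the book"]))  # True
print(vp_ok("disappeared", ["the cat"]))    # False: *John disappeared the cat
print(vp_ok("opened", []))                  # False: *John opened
```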

36 Overgeneration
A grammar should generate only sentences in the language; it should exclude sentences not in the language.
s → n vp
vp → v
n → [John]
v → [snore]
v → [snores]
This grammar overgenerates: it accepts both John snores and *John snore.

37 Undergeneration
A grammar should generate all sentences in the language; there should not be sentences in the language that it fails to generate.
s → n vp
vp → v
n → [John]
n → [gold]
v → [found]
This grammar undergenerates: it has no rule VP → V NP, so John found gold cannot be derived.

38 Appropriate Structures
A grammar should assign linguistically plausible structures. For John ate a juicy hamburger, compare the flat analysis
(s (n John) (vp (v ate) (d a) (a juicy) (n hamburger)))
with the plausible one
(s (np (n John)) (vp (v ate) (np (d a) (a juicy) (n hamburger))))

39 Criteria for Evaluating Grammars
– Does it undergenerate?
– Does it overgenerate?
– Does it assign appropriate structures to the sentences it generates?
– Is it simple to understand? How many rules are there?
– Does it contain generalisations, or is it just a collection of special cases?
– How ambiguous is it?

40 Tagsets
The main parts of speech reflect naturally occurring data. Practical applications often make use of special tags which include additional information such as number and case. One of the most commonly used tagsets is the 45-tag Penn Treebank tagset, used for the Brown corpus.

41 Penn Treebank Tagset

42 POS Tagging
Input: The grand jury commented on a number of other topics
Output: The/DT grand/JJ jury/NN commented/VBD on/IN a/DT number/NN of/IN other/JJ topics/NNS ./.
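The word/TAG notation is easy to manipulate programmatically; for example, a minimal standard-library sketch that splits tagged output back into (word, tag) pairs:

```python
# Split slash-tagged text into (word, tag) pairs; rsplit keeps any "/"
# inside the word itself intact, and handles the "./."  punctuation token.
tagged = ("The/DT grand/JJ jury/NN commented/VBD on/IN a/DT "
          "number/NN of/IN other/JJ topics/NNS ./.")

pairs = [tok.rsplit("/", 1) for tok in tagged.split()]
print(pairs[:3])  # [['The', 'DT'], ['grand', 'JJ'], ['jury', 'NN']]

# Example query: pull out the nouns (tags starting with NN).
nouns = [w for w, t in pairs if t.startswith("NN")]
print(nouns)      # ['jury', 'number', 'topics']
```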

43 Treebanks
Treebanks are corpora in which each sentence has been paired with a parse tree (presumably the right one). These are generally created by:
– first parsing the collection with an automatic parser,
– and then having human annotators correct each parse as necessary.
This requires detailed annotation guidelines that provide:
– a POS tagset,
– a grammar, and
– instructions for how to deal with particular grammatical constructions.

44 Penn Treebank
The Penn Treebank is a widely used treebank maintained by the Linguistic Data Consortium. The Penn Treebank Project annotates naturally occurring text for linguistic structure. It contains skeletal parses showing rough syntactic and semantic information: a bank of linguistic trees. The most well-known part is the Wall Street Journal section, containing 1M words from the 1987-1989 Wall Street Journal.

45 Penn Treebank Example

46 Treebank Grammars
Treebanks implicitly define a grammar for the language covered in the treebank: simply take the local rules that make up the sub-trees in all the trees in the collection, and you have a grammar. It is not complete, but with a decent-sized corpus you will have a grammar with decent coverage.
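A sketch of that extraction step, using nested Python lists to stand in for treebank trees (lexical rules are skipped, and all names are invented for illustration):

```python
# A tree is [label, child, child, ...]; leaves are plain strings.
tree = ["S",
        ["NP", ["ADJ", "French"], ["N", "gunman"]],
        ["VP", ["V", "kills"], ["NP", ["N", "four"]]]]

def rules(t, out=None):
    """Collect label -> children-labels productions from every subtree.

    Pre-terminal (lexical) rules like N -> gunman are skipped in this sketch.
    """
    if out is None:
        out = set()
    if isinstance(t, list) and any(isinstance(c, list) for c in t[1:]):
        out.add((t[0], tuple(c[0] for c in t[1:])))
        for c in t[1:]:
            rules(c, out)
    return out

for lhs, rhs in sorted(rules(tree)):
    print(lhs, "->", " ".join(rhs))
# NP -> ADJ N
# NP -> N
# S -> NP VP
# VP -> V NP
```

Running this over every tree in a collection and pooling the productions is exactly the "read the grammar off the treebank" idea described above.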

47 Treebank Grammars
Such grammars tend to be very flat, because they tend to avoid recursion. For example, the Penn Treebank has 4500 different rules for VPs, among them many long, flat expansions (e.g. a VP expanding to a verb followed by several PPs).

48 Heads in Trees
Finding heads in treebank trees is a task that arises frequently in many applications, and is particularly important in statistical parsing. We can visualise this task by annotating each node of a parse tree with its head.

49 Lexically Decorated Tree

50 Head Finding
The standard way to do head finding is to use a simple set of tree-traversal rules specific to each non-terminal in the grammar.
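A minimal sketch of such rules: each non-terminal gets a priority list of child categories, and the first match is the head child. The lists below are illustrative toy entries, not the real head-rule tables used with the Penn Treebank:

```python
# Toy per-category head rules: candidate child labels in priority order.
HEAD_RULES = {
    "S":  ["VP", "S"],
    "VP": ["VBD", "VBZ", "VB", "VP"],
    "NP": ["NN", "NNS", "NNP", "NP"],
    "PP": ["IN"],
}

def head_child(label, children):
    """Return the index of the head child, scanning the priority list."""
    for cand in HEAD_RULES.get(label, []):
        for i, c in enumerate(children):
            if c == cand:
                return i
    return 0  # fallback in this sketch: take the first child

print(head_child("S", ["NP", "VP"]))         # 1: the VP heads the sentence
print(head_child("NP", ["DT", "JJ", "NN"]))  # 2: the noun heads the NP
```

Real head-finding tables also specify a scan direction (left-to-right or right-to-left) per category; that refinement is omitted here.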

51 Noun Phrases

52 Treebank Uses
Treebanks (and head finding) are particularly critical to the development of statistical parsers.
They are also valuable for corpus linguistics, e.g. investigating the empirical details of various constructions in a given language.

53 Dependency Grammars
In CFG-style phrase-structure grammars the main focus is on constituents, but it turns out you can get a lot done with just binary relations among the words in an utterance. In a dependency-grammar framework, a parse is a tree where:
– the nodes stand for the words in an utterance,
– the links between the words represent dependency relations between pairs of words,
– relations may be typed (labelled) or not.

54 Dependency Relations

55 Dependency Parse
They hid the letter on the shelf
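One plausible typed analysis of this sentence can be written as (dependent, head, relation) triples; the relation labels below are illustrative, not drawn from a specific annotation scheme, and the PP is attached to the verb here (the other reading attaches it to letter):

```python
# "They hid the letter on the shelf" as a list of dependency triples.
parse = [
    ("They",   "hid",    "subj"),
    ("hid",    "ROOT",   "root"),   # the root of the tree
    ("the",    "letter", "det"),
    ("letter", "hid",    "obj"),
    ("on",     "hid",    "prep"),   # verb attachment of the PP
    ("the",    "shelf",  "det"),
    ("shelf",  "on",     "pobj"),
]

# Tree property: every word has exactly one head, and exactly one word
# (the main verb) depends on the artificial ROOT node.
roots = [d for d, h, _ in parse if h == "ROOT"]
print(roots)  # ['hid']
```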

56 Dependency Parsing
The dependency approach has a number of advantages over full phrase-structure parsing:
– It deals well with free word order languages, where the constituent structure is quite fluid.
– Parsing is much faster than with CFG-based parsers.
– Dependency structure often captures the syntactic relations needed by later applications; CFG-based approaches often extract this same information from trees anyway.

57 Dependency Parsing
There are two modern approaches to dependency parsing:
– Optimisation-based approaches that search a space of trees for the tree that best matches some criteria.
– Shift-reduce approaches that greedily take actions based on the current word and state.
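The shift-reduce idea can be sketched in a few lines of arc-standard-style code. Here the action sequence is supplied by hand rather than predicted, to show only the transition mechanics; all names are invented for illustration:

```python
# Minimal arc-standard transition system: a stack, a buffer of word
# indices, and three actions that build (head, dependent) arcs.
def shift_reduce(words, actions):
    buffer = list(range(len(words)))  # word indices, left to right
    stack, arcs = [], []
    for act in actions:
        if act == "SHIFT":            # move next word onto the stack
            stack.append(buffer.pop(0))
        elif act == "LEFT":           # second-top is a dependent of top
            dep = stack.pop(-2)
            arcs.append((stack[-1], dep))
        elif act == "RIGHT":          # top is a dependent of second-top
            dep = stack.pop()
            arcs.append((stack[-1], dep))
    return arcs

words = ["They", "hid", "the", "letter"]
actions = ["SHIFT", "SHIFT", "LEFT",  # They <- hid
           "SHIFT", "SHIFT", "LEFT",  # the <- letter
           "RIGHT"]                   # hid -> letter
arcs = shift_reduce(words, actions)
print([(words[h], words[d]) for h, d in arcs])
# [('hid', 'They'), ('letter', 'the'), ('hid', 'letter')]
```

In a real parser the action at each step is chosen by a trained classifier over the current stack/buffer state; that greedy choice is what makes these parsers fast.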

58 Summary
Context-free grammars can be used to model various facts about the syntax of a language. When paired with parsers, such grammars constitute a critical component in many applications. Constituency is a key phenomenon easily captured with CFG rules, but certain linguistic phenomena pose significant problems. Dependency parsing is suitable for languages with free word order. Treebanks pair the sentences in a corpus with their corresponding trees.

