
1 Part II. Statistical NLP Advanced Artificial Intelligence Introduction and Grammar Models Wolfram Burgard, Luc De Raedt, Bernhard Nebel, Kristian Kersting Some slides taken from Helmut Schmid, Rada Mihalcea, Bonnie Dorr, Leila Kosseim, Peter Flach and others

2 Topic  Statistical Natural Language Processing: applies machine learning / statistics to natural language processing.  Learning: the ability to improve one's behaviour at a specific task over time; involves the analysis of data (statistics).  Follows parts of the book Foundations of Statistical Natural Language Processing (Manning and Schütze), MIT Press, 1999.

3 Contents  Motivation  Zipf’s law  Some natural language processing tasks  Non-probabilistic NLP models Regular grammars and finite state automata Context-Free Grammars Definite Clause Grammars  Motivation for statistical NLP  Overview of the rest of this part

4 Rationalism versus Empiricism  Rationalist: Noam Chomsky - innate language structures; AI: hand-coding of NLP; dominant view 1960-1985; cf. e.g. Steven Pinker's The Language Instinct (popular science book).  Empiricist: the ability to learn is innate; AI: language is learned from corpora; dominant 1920-1960 and becoming increasingly important.

5 Rationalism versus Empiricism  Noam Chomsky: But it must be recognized that the notion of “probability of a sentence” is an entirely useless one, under any known interpretation of this term  Fred Jelinek (IBM 1988) Every time a linguist leaves the room the recognition rate goes up. (Alternative: Every time I fire a linguist the recognizer improves)

6 This course  Empiricist approach Focus will be on probabilistic models for learning of natural language  No time to treat natural language in depth ! (though this would be quite useful and interesting) Deserves a full course by itself  Covered in more depth in Logic, Language and Learning (SS 05, prob. SS 06)

7 Ambiguity

8 NLP and Statistics  Statistical disambiguation:  Define a probability model for the data  Compute the probability of each alternative  Choose the most likely alternative

9 NLP and Statistics  Statistical methods deal with uncertainty: they predict the future behaviour of a system based on the behaviour observed in the past.  Statistical methods require training data. The data in statistical NLP are the corpora.

10 Corpora  Corpus: a text collection for linguistic purposes.  Tokens: how many running words are contained in Tom Sawyer? 71,370.  Types: how many different words are contained in Tom Sawyer? 8,018.  Hapax legomena: words appearing only once.
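A minimal sketch (not part of the slides) of how token, type, and hapax-legomenon counts like the Tom Sawyer figures above can be obtained. The regex tokenizer and the file name tom_sawyer.txt are assumptions for illustration; lowercasing and crude tokenization mean the exact numbers will differ slightly from the slide.

import re
from collections import Counter

def word_counts(text):
    # Crude tokenization on letter/apostrophe runs; a real corpus study needs a proper tokenizer.
    tokens = re.findall(r"[A-Za-z']+", text.lower())
    freq = Counter(tokens)
    n_tokens = len(tokens)                             # running words (~71,370 for Tom Sawyer)
    n_types = len(freq)                                # distinct words (~8,018 for Tom Sawyer)
    hapax = [w for w, c in freq.items() if c == 1]     # hapax legomena: words occurring exactly once
    return n_tokens, n_types, hapax, freq

text = open("tom_sawyer.txt", encoding="utf-8").read() # hypothetical file name
n_tokens, n_types, hapax, freq = word_counts(text)
print(n_tokens, n_types, len(hapax))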

11 Word Counts  The most frequent words are function words:

word  freq     word  freq
the   3332     in    906
and   2972     that  877
a     1775     he    877
to    1725     I     783
of    1440     his   772
was   1161     you   686
it    1027     Tom   679

12 Word Counts  How many words appear f times?

f        n_f
1        3993
2        1292
3        664
4        410
5        243
6        199
7        172
8        131
9        82
10       91
11-50    540
51-100   99
> 100    102

About half of the words occur just once. About half of the text consists of the 100 most common words.

13 Word Counts (Brown corpus)

14

15 Zipf's Law  Zipf's Law: f ~ 1/r (f*r = const) - minimize effort

word   f     r    f*r       word        f   r     f*r
the    3332  1    3332      turned      51  200   10200
and    2972  2    5944      you'll      30  300   9000
a      1775  3    5235      name        21  400   8400
he     877   10   8770      comes       16  500   8000
but    410   20   8400      group       13  600   7800
be     294   30   8820      lead        11  700   7700
there  222   40   8880      friends     10  800   8000
one    172   50   8600      begin       9   900   8100
about  158   60   9480      family      8   1000  8000
more   138   70   9660      brushed     4   2000  8000
never  124   80   9920      sins        2   3000  6000
Oh     116   90   10440     Could       2   4000  8000
two    104   100  10400     Applausive  1   8000  8000
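A small sketch (an assumption, not from the slides) of how a rank-frequency table like the one above can be reproduced from raw counts: rank words by frequency and check that the product f*r stays roughly constant, as Zipf's law predicts. The toy token list is invented just to exercise the function.

from collections import Counter

def zipf_check(tokens, ranks=(1, 2, 3, 10, 50, 100)):
    # Rank words by frequency (rank 1 = most frequent) and report f, r and f*r.
    ranked = Counter(tokens).most_common()
    rows = []
    for r in ranks:
        if r <= len(ranked):
            word, f = ranked[r - 1]
            rows.append((word, f, r, f * r))           # Zipf: f*r should be roughly constant
    return rows

toy = "the cat sat on the mat and the cat ran".split()
for word, f, r, fr in zipf_check(toy, ranks=(1, 2, 3)):
    print(word, f, r, fr)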

16 Conclusions  Overview of some probabilistic and machine learning methods for NLP.  Also very relevant to bioinformatics!  Analogy between parsing a sentence and parsing a biological string (DNA, protein, mRNA, …).

17 Language and sequences  Natural language processing Is concerned with the analysis of sequences of words / sentences Construction of language models  Two types of models Non-probabilistic Probabilistic

18 Key NLP Problem: Ambiguity  Human language is highly ambiguous at all levels:  acoustic level: recognize speech vs. wreck a nice beach  morphological level: saw: to see (past), saw (noun), to saw (present, infinitive)  syntactic level: I saw the man on the hill with a telescope  semantic level: One book has to be read by every student

19 Language Model  A formal model about language  Two types Non-probabilistic  Allows one to compute whether a certain sequence (sentence or part thereof) is possible  Often grammar based Probabilistic  Allows one to compute the probability of a certain sequence  Often extends grammars with probabilities

20 Example of bad language model

21 A bad language model

22

23 A good language model  Non-Probabilistic “I swear to tell the truth” is possible “I swerve to smell de soup” is impossible  Probabilistic P(I swear to tell the truth) ~.0001 P(I swerve to smell de soup) ~ 0

24 Why language models ?  Consider a Shannon Game Predicting the next word in the sequence  Statistical natural language ….  The cat is thrown out of the …  The large green …  Sue swallowed the large green …  …  Model at the sentence level
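The Shannon game can be sketched with a simple bigram model: count which word follows which in a training text and predict the most frequent continuation. This is a toy illustration of the idea, not the course's implementation; the miniature training corpus below is made up.

from collections import Counter, defaultdict

def train_bigrams(tokens):
    following = defaultdict(Counter)
    for prev, nxt in zip(tokens, tokens[1:]):
        following[prev][nxt] += 1            # how often nxt followed prev in the training text
    return following

def predict_next(following, prev):
    if not following[prev]:
        return None                          # unseen context: no prediction
    return following[prev].most_common(1)[0][0]

# Made-up miniature training text, just to exercise the functions.
corpus = "statistical natural language processing uses statistical natural language models".split()
model = train_bigrams(corpus)
print(predict_next(model, "natural"))        # -> language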

25 Applications  Spelling correction  Mobile phone texting  Speech recognition  Handwriting recognition  Disabled users  …

26 Spelling errors  They are leaving in about fifteen minuets to go to her house.  The study was conducted mainly be John Black.  Hopefully, all with continue smoothly in my absence.  Can they lave him my messages?  I need to notified the bank of….  He is trying to fine out.

27 Handwriting recognition  Assume a note is given to a bank teller, which the teller reads as I have a gub. (cf. Woody Allen)  NLP to the rescue …. gub is not a word gun, gum, Gus, and gull are words, but gun has a higher probability in the context of a bank
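A sketch of the disambiguation step in the bank-note example: among the real-word candidates (gun, gum, Gus, gull), pick the one with the highest probability in the given context. The probability values below are invented placeholders just to show the argmax; a real system would estimate them from a corpus, possibly combined with a noisy-channel model of the handwriting.

# Hypothetical probabilities P(word | context) for the note handed to a bank teller.
candidates = {"gun": 0.60, "gum": 0.25, "Gus": 0.10, "gull": 0.05}

best = max(candidates, key=candidates.get)   # statistical disambiguation: pick the most likely word
print(best)                                  # -> gun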

28 For Spell Checkers  Collect a list of commonly substituted words: piece/peace, whether/weather, their/there...  Example: "On Tuesday, the whether …"  "On Tuesday, the weather …"

29 Another dimension in language models  Do we mainly want to infer (probabilities of) legal sentences / sequences? (So far.)  Or do we want to infer properties of these sentences? E.g., parse tree, part-of-speech tagging - needed for understanding NL.  Let's look at some tasks.

30 Sequence Tagging  Part-of-speech tagging:  He drives with his bike - N V PR PN N (noun, verb, preposition, pronoun, noun)  Text extraction:  The job is that of a programmer - X X X X X X JobType  The seminar is taking place from 15.00 to 16.00 - X X X X X X Start End

31 Sequence Tagging  Predicting the secondary structure of proteins, mRNA, … X = A,F,A,R,L,M,M,A,… Y = he,he,st,st,st,he,st,he, …

32 Parsing  Given a sentence, find its parse tree  Important step in understanding NL

33 Parsing  In bioinformatics, parsing allows one to predict (elements of) structure from sequence.

34 Language models based on Grammars  Grammar types:  Regular grammars and Finite State Automata  Context-Free Grammars  Definite Clause Grammars - a particular type of Unification Based Grammar (Prolog)  Distinguish the lexicon from the grammar:  The lexicon (dictionary) contains information about words, e.g. word - possible tags (and possibly additional information); flies - V(erb) - N(oun)  The grammar encodes rules.

35 Grammars and parsing  Syntactic level best understood and formalized  Derivation of grammatical structure: parsing (more than just recognition)  Result of parsing mostly parse tree: showing the constituents of a sentence, e.g. verb or noun phrases  Syntax usually specified in terms of a grammar consisting of grammar rules

36 Regular Grammars and Finite State Automata  Lexical information - which words are of which category?  Det(erminer)  N(oun)  Vi (intransitive verb) - no argument  Pn (pronoun)  Vt (transitive verb) - takes an argument  Adj (adjective)  Now accept "The cat slept" - Det N Vi  As a regular grammar:  S -> [Det] S1   % [ ] : terminal  S1 -> [N] S2  S2 -> [Vi]  Lexicon:  the - Det  cat - N  slept - Vi  …
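The regular grammar above (S -> [Det] S1, S1 -> [N] S2, S2 -> [Vi]) corresponds to a small finite state automaton over word categories. Below is a minimal sketch of such a recognizer; the lexicon entries, state names, and dictionary encoding are choices made here for illustration, not part of the slides.

# Lexicon: word -> category (as on the slide; extend as needed).
LEXICON = {"the": "Det", "cat": "N", "slept": "Vi"}

# FSA over categories: (state, category) -> next state; reaching ACCEPT at the end means success.
TRANSITIONS = {("S", "Det"): "S1", ("S1", "N"): "S2", ("S2", "Vi"): "ACCEPT"}

def accepts(sentence):
    state = "S"
    for word in sentence.lower().split():
        cat = LEXICON.get(word)                  # unknown word -> no category
        state = TRANSITIONS.get((state, cat))    # no transition -> dead end
        if state is None:
            return False
    return state == "ACCEPT"

print(accepts("The cat slept"))      # True
print(accepts("The slept cat"))      # False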

37 Finite State Automaton  Sentences:  John smiles - Pn Vi  The cat disappeared - Det N Vi  These new shoes hurt - Det Adj N Vi  John liked the old cat - Pn Vt Det Adj N

38 Phrase structure  Parse tree for "the dog chased a cat into the garden":  [S [NP [D the] [N dog]] [VP [V chased] [NP [D a] [N cat]] [PP [P into] [NP [D the] [N garden]]]]]

39 Notation  S : sentence  D or Det : Determiner (e.g., articles)  N : noun  V : verb  P : preposition  NP : noun phrase  VP : verb phrase  PP : prepositional phrase

40 Context Free Grammar  S -> NP VP  NP -> D N  VP -> V NP  VP -> V NP PP  PP -> P NP  D -> [the]  D -> [a]  N -> [dog]  N -> [cat]  N -> [garden]  V -> [chased]  V -> [saw]  P -> [into]  Terminals ~ Lexicon

41 Phrase structure  Formalism of context-free grammars Nonterminal symbols: S, NP, VP,... Terminal symbols: dog, cat, saw, the,...  Recursion „The girl thought the dog chased the cat“ VP-> V, S N-> [girl] V-> [thought]

42 Top-down parsing  S -> NP VP  S -> Det N VP  S -> The N VP  S -> The dog VP  S -> The dog V NP  S -> The dog chased NP  S -> The dog chased Det N  S-> The dog chased the N  S-> The dog chased the cat
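The derivation above can be carried out mechanically by a top-down (recursive descent) recognizer: expand the leftmost nonterminal, try each rule in turn, and backtrack on failure. A minimal sketch for the CFG of slide 40; the encoding of rules as Python lists is an implementation choice made here, not the course's.

# Grammar of slide 40: nonterminal -> list of right-hand sides (categories or terminal words).
RULES = {
    "S":  [["NP", "VP"]],
    "NP": [["D", "N"]],
    "VP": [["V", "NP", "PP"], ["V", "NP"]],
    "PP": [["P", "NP"]],
    "D":  [["the"], ["a"]],
    "N":  [["dog"], ["cat"], ["garden"]],
    "V":  [["chased"], ["saw"]],
    "P":  [["into"]],
}

def parse(symbols, words):
    """Return True if the symbol sequence can derive exactly the given word list."""
    if not symbols:
        return not words                       # both exhausted -> success
    first, rest = symbols[0], symbols[1:]
    if first in RULES:                         # nonterminal: try each expansion (with backtracking)
        return any(parse(list(rhs) + rest, words) for rhs in RULES[first])
    # terminal: must match the next input word
    return bool(words) and words[0] == first and parse(rest, words[1:])

print(parse(["S"], "the dog chased the cat".split()))                 # True
print(parse(["S"], "the dog chased a cat into the garden".split()))   # True
print(parse(["S"], "dog the chased cat".split()))                     # False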

43 Context-free grammar S--> NP,VP. NP--> PN. %Proper noun NP--> Art, Adj, N. NP--> Art,N. VP--> VI. %intransitive verb VP--> VT, NP. %transitive verb Art--> [the]. Adj--> [lazy]. Adj--> [rapid]. PN--> [achilles]. N--> [turtle]. VI--> [sleeps]. VT--> [beats].

44 Parse tree  Parse tree for "the rapid turtle beats achilles":  [S [NP [Art the] [Adj rapid] [N turtle]] [VP [Vt beats] [NP [PN achilles]]]]

45 Definite Clause Grammars: Number Agreement  Non-terminals may have arguments:  S--> NP(N),VP(N).  NP(N)--> Art(N),N(N).  VP(N)--> VI(N).  Art(singular)--> [a].  Art(singular)--> [the].  Art(plural)--> [the].  N(singular)--> [turtle].  N(plural)--> [turtles].  VI(singular)--> [sleeps].  VI(plural)--> [sleep].

46 DCGs  Non-terminals may have arguments Variables (start with capital)  E.g. Number, Any Constants (start with lower case)  E.g. singular, plural Structured terms (start with lower case, and take arguments themselves)  E.g. vp(V,NP)  Parsing needs to be adapted Using unification

47 Unification in a nutshell (cf. AI course)  Substitutions E.g. {Num / singular } {T / vp(V,NP)}  Applying substitution Simultaneously replace variables by corresponding terms S(Num) {Num / singular } = S(singular)

48 Unification  Take two non-terminals with arguments and compute (most general) substitution that makes them identical, e.g., Art(singular) and Art(Num)  Gives { Num / singular } Art(singular) and Art(plural)  Fails Art(Num1) and Art(Num2)  {Num1 / Num2} PN(Num, accusative) and PN(singular, Case)  {Num/singular, Case/accusative}
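A minimal sketch of the unification step for the flat, single-level terms used here (Art(singular), Art(Num), PN(Num, accusative), ...). Variables are represented as capitalized strings, as in the slides; this handles only the cases shown on the slide, not full first-order unification with nested terms or occurs checks.

def is_var(x):
    return isinstance(x, str) and x[0].isupper()    # Num, Case, Num1 ... are variables

def unify_args(args1, args2):
    """Unify two argument tuples; return a substitution dict, or None on failure."""
    if len(args1) != len(args2):
        return None
    subst = {}
    for a, b in zip(args1, args2):
        a = subst.get(a, a)                 # apply bindings found so far
        b = subst.get(b, b)
        if a == b:
            continue
        if is_var(a):
            subst[a] = b
        elif is_var(b):
            subst[b] = a
        else:                               # two different constants: clash
            return None
    return subst

print(unify_args(("singular",), ("Num",)))                      # {'Num': 'singular'}
print(unify_args(("singular",), ("plural",)))                   # None (fails)
print(unify_args(("Num1",), ("Num2",)))                         # {'Num1': 'Num2'}
print(unify_args(("Num", "accusative"), ("singular", "Case")))  # {'Num': 'singular', 'Case': 'accusative'}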

49 Parsing with DCGs  Now require successful unification at each step  S -> NP(N), VP(N)  S -> Art(N), N(N), VP(N) {N/singular}  S -> a N(singular), VP(singular)  S -> a turtle VP(singular)  S -> a turtle sleeps  S-> a turtle sleep fails

50 Case Marking  PN(singular,nominative)--> [he];[she]  PN(singular,accusative)--> [him];[her]  PN(plural,nominative)--> [they]  PN(plural,accusative)--> [them]  S --> NP(Number,nominative), VP(Number)  VP(Number) --> V(Number), NP(Any,accusative)  NP(Number,Case) --> PN(Number,Case)  NP(Number,Any) --> Det, N(Number)  He sees her. She sees him. They see her. But not: Them see he.

51 DCGs  Are strictly more expressive than CFGs.  Can represent, for instance, the language AⁿBⁿCⁿ (which is not context-free):  S(N) -> A(N), B(N), C(N)  A(0) -> []  B(0) -> []  C(0) -> []  A(s(N)) -> A(N), [A]  B(s(N)) -> B(N), [B]  C(s(N)) -> C(N), [C]

52 Probabilistic Models  Traditional grammar models are very rigid: essentially a yes / no decision.  Probabilistic grammars:  Define a probability model for the data  Compute the probability of each alternative  Choose the most likely alternative  Illustrated on:  Shannon Game  Spelling correction  Parsing

53 Some probabilistic models  N-grams: predicting the next word  Artificial intelligence and machine ….  Statistical natural language ….  Probabilistic:  Regular grammars (Markov Models)  Hidden Markov Models  Conditional Random Fields  Context-free grammars  (Stochastic) Definite Clause Grammars

54 Illustration  Wall Street Journal Corpus  3,000,000 words  The correct parse tree for each sentence is known  Constructed by hand  Can be used to derive stochastic context-free grammars  SCFGs assign a probability to each parse tree  Compute the most probable parse tree
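Deriving a stochastic CFG from a treebank is conceptually simple: count how often each rule is used in the hand-built parse trees, turn the counts into conditional probabilities per left-hand side, and score a candidate parse tree by the product of its rule probabilities; the most probable parse is the candidate with the highest score. The sketch below uses made-up rule counts, not Penn Treebank figures, and ignores lexical rules for brevity.

from collections import Counter, defaultdict
import math

# Hypothetical rule counts extracted from a tiny, invented treebank.
rule_counts = Counter({
    ("S",  ("NP", "VP")): 100,
    ("VP", ("V", "NP")): 60,
    ("VP", ("V", "NP", "PP")): 40,
    ("NP", ("D", "N")): 120,
    ("NP", ("D", "N", "PP")): 30,
    ("PP", ("P", "NP")): 70,
})

# Maximum-likelihood estimate: P(lhs -> rhs) = count(lhs -> rhs) / count(lhs).
lhs_totals = defaultdict(int)
for (lhs, rhs), c in rule_counts.items():
    lhs_totals[lhs] += c
rule_prob = {(lhs, rhs): c / lhs_totals[lhs] for (lhs, rhs), c in rule_counts.items()}

def tree_log_prob(rules_used):
    """Log-probability of a parse tree, given the rules applied in it."""
    return sum(math.log(rule_prob[r]) for r in rules_used)

# Rules used in one candidate parse of "the dog chased the cat" (ignoring lexical rules).
parse_rules = [("S", ("NP", "VP")), ("NP", ("D", "N")),
               ("VP", ("V", "NP")), ("NP", ("D", "N"))]
print(tree_log_prob(parse_rules))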

55

56 Sequences are omnipresent  Therefore the techniques we will see also apply to:  Bioinformatics - DNA, proteins, mRNA, … can all be represented as strings  Robotics - sequences of actions, states, …  …

57 Rest of the Course  Limitations of traditional grammar models motivate probabilistic extensions of:  Regular grammars and Finite State Automata  All use principles of Part I on Graphical Models  Markov Models using n-grams  (Hidden) Markov Models  Conditional Random Fields - as an example of using undirected graphical models  Probabilistic Context Free Grammars  Probabilistic Definite Clause Grammars

