CSE6339 3.0 Introduction to Computational Linguistics Tuesdays, Thursdays 14:30-16:00 – South Ross 101 Fall Semester, 2011 Instructor: Nick Cercone - 3050.


1 CSE6339 3.0 Introduction to Computational Linguistics Tuesdays, Thursdays 14:30-16:00 – South Ross 101 Fall Semester, 2011 Instructor: Nick Cercone - 3050 CSEB - nick@cse.yorku.ca Final HPSGs: cleaning up and final aspects, semantics, overview of statistical NLP

2 HPSGs – An Overlooked Topic: Complements vs. Modifiers. Intuitive idea: complements introduce essential participants in the situation denoted; modifiers refine the description. This is a generally accepted distinction, but there are disputes over individual cases. Linguists rely on heuristics to decide how to analyze questionable cases (usually PPs).

3 HPSGs – Heuristics for Complements vs. Modifiers. Obligatory PPs are usually complements. Temporal and locative PPs are usually modifiers. An entailment test: if "X Ved (NP) PP" does not entail "X did something PP", then the PP is a complement. Examples: "Pat relied on Chris" does not entail "Pat did something on Chris"; "Pat put nuts in a cup" does not entail "Pat did something in a cup"; "Pat slept until noon" does entail "Pat did something until noon"; "Pat ate lunch at Bytes" does entail "Pat did something at Bytes".

4 HPSGs – Agreement. Two kinds so far (namely?). Both were initially handled via stipulation in the Head-Specifier Rule. But if we want to use this rule for categories that don’t have the AGR feature (such as PPs and APs, in English), we can’t build it into the rule.

5 HPSGs – The Specifier-Head Agreement Constraint (SHAC). Verbs and nouns must be specified as:

6 HPSGs – The Count/Mass Distinction. Partially semantically motivated: mass terms tend to refer to undifferentiated substances (air, butter, courtesy, information); count nouns tend to refer to individuatable entities (bird, cookie, insult, fact). But there are exceptions: succotash (mass) denotes a mix of corn and lima beans, so it’s not undifferentiated; furniture, footwear, cutlery, etc. refer to individuatable artifacts with mass terms; cabbage can be either count or mass, but many speakers get lettuce only as mass.

7 HPSGs – Semantics. The Linguist’s stance: building a precise model. Some statements are statements about how the model works: “[prep] and [AGR 3sing] cannot be combined because AGR is not a feature of the type prep.” Some statements are statements about how (we think) English or language in general works: “The determiners a and many only occur with count nouns, the determiner much only occurs with mass nouns, and the determiner the occurs with either.” Some are statements about how we code a particular linguistic fact within the model: “All count nouns are [SPR ].”
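
The determiner generalization quoted on this slide can be read as a compatibility check between two lexical features. The following is only a minimal sketch under assumptions: the dictionaries `NOUNS` and `DETERMINERS` and the function `compatible` are hypothetical names, the noun classifications follow the examples given on the count/mass slide, and number agreement (e.g. that "many" wants a plural noun) is deliberately ignored.

```python
# Hypothetical mini-lexicon: each noun lists the COUNT values it allows.
NOUNS = {
    "bird": {"count"},
    "cookie": {"count"},
    "butter": {"mass"},
    "furniture": {"mass"},
    "lettuce": {"mass"},            # mass only, for many speakers
    "cabbage": {"count", "mass"},   # can be either
}

# Each determiner lists the kind of noun it selects for.
DETERMINERS = {
    "a": {"count"},
    "many": {"count"},
    "much": {"mass"},
    "the": {"count", "mass"},
}


def compatible(det, noun):
    """True if the determiner and the noun share at least one COUNT value."""
    return bool(DETERMINERS[det] & NOUNS[noun])


print(compatible("much", "butter"))    # True
print(compatible("a", "furniture"))    # False
```

Encoding the fact as set intersection mirrors the idea that the constraint is stated once in the lexicon rather than once per determiner-noun pair.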

8 HPSGs – Semantics. The Linguist’s stance: A Vista on the Set of Possible English Sentences ... as a background against which linguistic elements (words, phrases) have a distribution; ... as an arena in which linguistic elements “behave” in certain ways.

9 HPSGs - Semantics. So far, our “grammar” has no semantic representations. We have, however, been relying on semantic intuitions in our argumentation, and discussing semantic contrasts where they line up (or don't) with syntactic ones. Examples? Structural ambiguity; S/NP parallelism; count/mass distinction; complements vs. modifiers.

10 HPSGs - Semantics. Aspects of meaning we won’t account for: Pragmatics. Fine-grained lexical semantics:

11 HPSGs - Semantics. Our Slice of a World of Meanings: “... the linguistic meaning of Chris saved Pat is a proposition that will be true just in case there is an actual situation that involves the saving of someone named Pat by someone named Chris.”

12 HPSGs - Semantics. Our Slice of a World of Meanings: what we are accounting for is the compositionality of sentence meaning. How the pieces fit together: semantic arguments and indices. How the meanings of the parts add up to the meaning of the whole: appending RESTR lists up the tree.

13 HPSGs – Semantics in Constraint-based grammar. Constraints as generalized truth conditions. Proposition: what must be the case for a proposition to be true. Directive: what must happen for a directive to be fulfilled. Question: the kind of situation the asker is asking about. Reference: the kind of entity the speaker is referring to. Syntax/semantics interface: constraints on how syntactic arguments are related to semantic ones, and on how semantic information is compiled from different parts of the sentence.

14 HPSGs – Semantics – Feature Geometry

15 HPSGs – Semantics – How the pieces fit together

16 HPSGs – Semantics – How the pieces fit together

17 HPSGs – Semantics – How the pieces fit together

18 HPSGs – Semantics (pieces together)

19 HPSGs – Semantics (more detailed view of same tree)

20 HPSGs – Semantics. To fill in the semantics for the S node, we need the Semantic Principles. The Semantic Inheritance Principle: in any headed phrase, the mother's MODE and INDEX are identical to those of the head daughter. The Semantic Compositionality Principle: in any well-formed phrase structure, the mother's RESTR value is the sum of the RESTR values of the daughters.
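
The two principles can be sketched as a single function that builds a mother's semantics from its daughters. This is an illustrative sketch, not the course's formalization: the class `Sem`, the function `build_mother`, the index names (`i`, `s1`), and the predications for "a", "cat", and "slept" are all assumptions chosen to match the "A cat slept" example used later in the slides.

```python
from dataclasses import dataclass, field


@dataclass
class Sem:
    mode: str                                   # e.g. "prop", "ref", "none"
    index: str                                  # e.g. "i" or "s1"
    restr: list = field(default_factory=list)   # list of predications


def build_mother(head, daughters):
    """Combine daughters (in surface order) under the given head daughter."""
    restr = []
    for d in daughters:
        restr.extend(d.restr)        # SCP: the mother's RESTR sums the daughters'
    # SIP: MODE and INDEX are shared with the head daughter
    return Sem(mode=head.mode, index=head.index, restr=restr)


# Hypothetical lexical semantics for "a", "cat", "slept"
a = Sem("none", "i", [{"RELN": "exist", "BV": "i"}])
cat = Sem("ref", "i", [{"RELN": "cat", "INST": "i"}])
slept = Sem("prop", "s1", [{"RELN": "sleep", "SIT": "s1", "SLEEPER": "i"}])

np = build_mother(cat, [a, cat])       # head-specifier phrase "a cat"
s = build_mother(slept, [np, slept])   # head-specifier phrase "a cat slept"

print(s.mode, s.index)                 # prop s1
print([p["RELN"] for p in s.restr])    # ['exist', 'cat', 'sleep']
```

Note that the sketch treats the verb word as the head daughter of S directly; in the grammar the head daughter would be a VP, which inherits the same MODE and INDEX from the verb.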

21 HPSGs – Semantics – semantic inheritance illustrated

22 HPSGs – Semantics - semantic compositionality illustrated

23 HPSGs – Semantics – what identifies indices

24 HPSGs – Semantics – summary. Words: contribute predications; ‘expose’ one index in those predications, for use by words or phrases; relate syntactic arguments to semantic arguments.

25 HPSGs – Semantics – summary. Grammar rules: identify feature structures (including the INDEX value) across daughters (Head Specifier Rule, Head Complement Rule, Head Modifier Rule).

26 HPSGs – Semantics – summary. Grammar rules: identify feature structures (including the INDEX value) across daughters; license trees which are subject to the semantic principles. SIP: ‘passes up’ MODE and INDEX from the head daughter.

27 HPSGs – Semantics – summary. Grammar rules: identify feature structures (including the INDEX value) across daughters; license trees which are subject to the semantic principles. SIP: ‘passes up’ MODE and INDEX from the head daughter. SCP: ‘gathers up’ predications (RESTR list) from all daughters.

28 HPSGs – other aspects of semantics. Tense, Quantification (only touched on here); Modification; Coordination; Structural Ambiguity.

29 HPSGs – what we are trying to do. Objectives: develop a theory of knowledge of language; represent linguistic information explicitly enough to distinguish well-formed from ill-formed expressions; be parsimonious, capturing linguistically significant generalizations. Why formalize? To formulate testable predictions; to check for consistency; to make it possible to get a computer to do it for us.

30 HPSGs – how we construct sentences. The components of our grammar: grammar rules; lexical entries; principles; type hierarchy (very preliminary, so far); initial symbol (S, for now). We combine constraints from these components. Question: what says we have to combine them?

31 HPSGs – an example: "A cat slept." Can we build this with our tools? Given the constraints our grammar puts on well-formed sentences, is this one?

32 HPSGs – lexical entry for “a”. Is this a fully specified description? What features are unspecified? How many word structures can this entry license?

33 HPSGs – lexical entry for “cat”. Which feature paths are abbreviated? Is this fully specified? What features are unspecified? How many word structures can this entry license?

34 HPSGs - Effect of Principles: the SHAC

35 HPSGs - Description of Word Structures for cat

36 HPSGs - Description of Word Structures for a

37 HPSGs - Building a Phrase

38 HPSGs - Constraints Contributed by Daughter Subtrees

39 HPSGs - Constraints Contributed by the Grammar Rule

40 HPSGs - A Constraint Involving the SHAC

41 HPSGs - Effects of the Valence Principle

42 HPSGs - Effects of the Head Feature Principle

43 HPSGs - Effects of the Semantic Inheritance Principle

44 HPSGs - Effects of the Semantic Compositionality Principle

45 HPSGs - Is the Mother Node Now Completely Specified?

46 HPSGs - Lexical Entry for slept

47 HPSGs - Another Head-Specifier Phrase

48 HPSGs - Is this description fully specified?

49 HPSGs - Does the top node satisfy the initial symbol?

50 HPSGs - RESTR of the S node

51 HPSGs – Another example

52 HPSGs - Head Features from Lexical Entries

53 HPSGs - Head Features from Lexical Entries, plus HFP

54 HPSGs - Valence Features: Lexicon, Rules, and the Valence Principle

55 HPSGs - Required Identities: Grammar Rules

56 HPSGs - Two Semantic Features: the Lexicon & SIP

57 HPSGs - RESTR Values and the SCP

58 HPSGs - An Ungrammatical Example. What’s wrong with this sentence? The Valence Principle, Head Specifier Rule.

59 HPSGs – Overview. Information movement in trees; exercise in critical thinking; SPR and COMPS; technical details (lexical entries, trees); analogies to other systems you might know, e.g., how is the type hierarchy like an ontology?

60 Statistical NLP – Introduction. NLP as we have examined it thus far can be contrasted with statistical NLP. For example, statistical parsing researchers assume that there is a continuum and that the only distinction to be drawn is between the correct parse and all the rest. The “parse” given by the parse tree on the right would support this continuum view. For statistical NLP researchers, there is no difference between parsing and syntactic disambiguation: it's parsing all the way!

61 Statistical NLP. Statistical NLP is normally taught in 2 parts. Part I lays out the mathematical and linguistic foundation that the other parts build on. These include concepts and techniques normally referred to throughout the course. Part II covers word-centered work in statistical NLP. There is a natural progression from simple to complex linguistic phenomena in collocations, n-gram models, word sense disambiguation, and lexical acquisition. This work is followed by techniques such as Markov models, tagging, probabilistic context-free grammars, and probabilistic parsing, which build on each other. Finally, other applications and techniques are introduced: statistical alignment and machine translation, clustering, information retrieval, and text categorization.

62 Statistical NLP – What we will discuss. 1. Information Retrieval and the Vector Space Model: typical IR system architecture, steps in document and query processing in IR, vector space model, tf-idf (term frequency, inverse document frequency) weights, term weighting formula, cosine similarity measure, term-by-document matrix, reducing the number of dimensions, Latent Semantic Analysis, IR evaluation.
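
A minimal sketch of tf-idf weighting and cosine similarity in the vector space model follows. It assumes one common weighting variant (raw term frequency times log(N/df)); the slides do not say which formula the course uses, and the function names and the three toy documents are hypothetical.

```python
import math
from collections import Counter


def tfidf_vectors(docs):
    """Return one sparse {term: weight} vector per document."""
    n = len(docs)
    tokenized = [d.lower().split() for d in docs]
    df = Counter()
    for toks in tokenized:
        df.update(set(toks))             # document frequency of each term
    vectors = []
    for toks in tokenized:
        tf = Counter(toks)
        vectors.append({t: tf[t] * math.log(n / df[t]) for t in tf})
    return vectors


def cosine(u, v):
    """Cosine of the angle between two sparse vectors."""
    dot = sum(w * v.get(t, 0.0) for t, w in u.items())
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    if norm_u == 0 or norm_v == 0:
        return 0.0
    return dot / (norm_u * norm_v)


docs = ["a cat slept", "a dog slept", "a cat ate fish"]
vecs = tfidf_vectors(docs)
print(cosine(vecs[0], vecs[1]))   # positive: the documents share "slept"
print(cosine(vecs[1], vecs[2]))   # 0.0: they share only "a", whose idf is 0
```

The term "a" occurs in every document, so its idf is log(3/3) = 0 and it contributes nothing to any similarity score, which is the intended effect of the idf factor.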

63 Statistical NLP – What we will discuss. 2. Text Classification: text classification and text clustering, types of text classification, evaluation measures in text classification, F-measure. Evaluation methods for classification: general issues (overfitting and underfitting); methods: 1. training error, 2. train and test, 3. n-fold cross-validation.
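
The evaluation ideas listed here can be sketched in a few lines. The functions `f_measure` and `n_fold_splits` are hypothetical names, the labels are toy data, and the fold assignment (round-robin by index) is just one simple choice among many.

```python
def f_measure(gold, predicted, positive, beta=1.0):
    """Return (precision, recall, F_beta) for one class treated as positive."""
    pairs = list(zip(gold, predicted))
    tp = sum(1 for g, p in pairs if g == positive and p == positive)
    fp = sum(1 for g, p in pairs if g != positive and p == positive)
    fn = sum(1 for g, p in pairs if g == positive and p != positive)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    if precision + recall == 0:
        return precision, recall, 0.0
    b2 = beta * beta
    f = (1 + b2) * precision * recall / (b2 * precision + recall)
    return precision, recall, f


def n_fold_splits(n_items, n_folds):
    """Yield (train_indices, test_indices), each fold used once as test set."""
    folds = [list(range(i, n_items, n_folds)) for i in range(n_folds)]
    for i, test in enumerate(folds):
        train = [j for k, fold in enumerate(folds) if k != i for j in fold]
        yield train, test


gold = ["spam", "spam", "ham", "ham", "spam"]
pred = ["spam", "ham", "ham", "spam", "spam"]
print(f_measure(gold, pred, positive="spam"))

for train, test in n_fold_splits(6, 3):
    print(train, test)
```

Measuring error only on the training items rewards overfitting; holding out each fold in turn gives an estimate on items the classifier has not seen.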

64 Statistical NLP – What we will discuss. 3. Parser Evaluation, Text Clustering and CNG Classification. Parser evaluation: PARSEVAL measures, labeled and unlabeled precision and recall, F-measure. Text clustering: task definition, the simple k-means method, hierarchical clustering, divisive and agglomerative clustering. Evaluation of clustering: inter-cluster similarity, cluster purity, use of entropy or information gain. CNG: the Common N-Grams classification method.
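
Two of the measures named above can be sketched briefly. This is a simplified illustration: constituents are assumed to be given as (label, start, end) triples, the functions `parseval` and `purity` are hypothetical names, and the conventions of the real PARSEVAL tooling (such as which brackets are ignored) are not modeled.

```python
from collections import Counter


def parseval(gold, predicted, labeled=True):
    """Return (precision, recall, F1) over (label, start, end) constituents."""
    if not labeled:
        gold = [(s, e) for _, s, e in gold]             # drop the labels
        predicted = [(s, e) for _, s, e in predicted]
    matched = sum((Counter(gold) & Counter(predicted)).values())
    precision = matched / len(predicted) if predicted else 0.0
    recall = matched / len(gold) if gold else 0.0
    total = precision + recall
    f1 = 2 * precision * recall / total if total else 0.0
    return precision, recall, f1


def purity(clusters):
    """Fraction of items that carry the majority gold label of their cluster."""
    total = sum(len(c) for c in clusters)
    majority = sum(max(Counter(c).values()) for c in clusters if c)
    return majority / total


gold_tree = [("S", 0, 3), ("NP", 0, 2), ("VP", 2, 3)]
pred_tree = [("S", 0, 3), ("NP", 0, 2), ("NP", 2, 3)]   # one wrong label
print(parseval(gold_tree, pred_tree, labeled=True))
print(parseval(gold_tree, pred_tree, labeled=False))
print(purity([["a", "a", "b"], ["b", "b"]]))
```

The predicted tree has the right bracketing but one wrong label, so the unlabeled scores are perfect while the labeled scores are not.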

65 Statistical NLP – What we will discuss. 4. Probabilistic Modeling and Joint Distribution Model: elements of probability theory, generative models, Bayesian inference. Probabilistic modeling: random variables, random configurations, computational tasks in probabilistic modeling, spam detection example, joint distribution model, drawbacks of the joint distribution model.
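
A joint distribution model stores one probability per full configuration of the random variables. The sketch below assumes a spam detection setting with three hypothetical binary variables (`caps`, `free`, `spam`) and made-up counts; none of these numbers come from the course.

```python
from fractions import Fraction

VARIABLES = ("caps", "free", "spam")

# Made-up counts of 100 messages, one entry per configuration (caps, free, spam).
COUNTS = {
    (1, 1, 1): 20, (1, 1, 0): 1,
    (1, 0, 1): 10, (1, 0, 0): 4,
    (0, 1, 1): 15, (0, 1, 0): 5,
    (0, 0, 1): 5,  (0, 0, 0): 40,
}
TOTAL = sum(COUNTS.values())
JOINT = {cfg: Fraction(c, TOTAL) for cfg, c in COUNTS.items()}


def prob(**fixed):
    """Marginal probability: sum the joint over all consistent configurations."""
    return sum(
        (p for cfg, p in JOINT.items()
         if all(cfg[VARIABLES.index(name)] == value for name, value in fixed.items())),
        Fraction(0),
    )


def conditional(query, given):
    """P(query | given) = P(query, given) / P(given)."""
    return prob(**query, **given) / prob(**given)


print(prob(spam=1))                            # 1/2
print(conditional({"spam": 1}, {"free": 1}))   # 35/41
```

The table needs 2^n entries for n binary variables, which is the main drawback of the joint distribution model: both storage and the data needed to estimate it grow exponentially.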

66 Statistical NLP – What we will discuss. 5. Fully Independent Model and Naive Bayes Model. Fully independent model: example, computational tasks, sum-product formula. Naive Bayes model: motivation, assumption, computational tasks, example, number of parameters, pros and cons. N-gram model, language modeling in speech recognition.
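
A Naive Bayes text classifier assumes that words are conditionally independent given the class. The following is a minimal sketch under assumptions: the training messages are invented, the function names are hypothetical, and add-one smoothing is used to avoid zero probabilities.

```python
import math
from collections import Counter, defaultdict


def train_nb(examples):
    """examples: list of (label, text). Returns the counts the model needs."""
    class_docs = Counter()
    word_counts = defaultdict(Counter)
    vocab = set()
    for label, text in examples:
        toks = text.lower().split()
        class_docs[label] += 1
        word_counts[label].update(toks)
        vocab.update(toks)
    return class_docs, word_counts, vocab


def classify(text, model):
    """Return the label maximizing log P(label) + sum of log P(word | label)."""
    class_docs, word_counts, vocab = model
    n_docs = sum(class_docs.values())
    scores = {}
    for label in class_docs:
        total = sum(word_counts[label].values())
        score = math.log(class_docs[label] / n_docs)
        for w in text.lower().split():
            if w in vocab:                      # unseen words are skipped
                score += math.log((word_counts[label][w] + 1) / (total + len(vocab)))
        scores[label] = score
    return max(scores, key=scores.get)


model = train_nb([
    ("spam", "free money now"),
    ("spam", "free offer"),
    ("ham", "meeting tomorrow"),
    ("ham", "project meeting now"),
])
print(classify("free money", model))         # spam
print(classify("meeting tomorrow", model))   # ham
```

Compared with a full joint table, the number of parameters grows linearly with the vocabulary size per class, at the cost of the independence assumption.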

67 Statistical NLP – What we will discuss. 6. N-gram Model: n-gram model assumption, graphical representation, use of log probabilities. Markov chain: stochastic process, Markov process, Markov chain. Perplexity and evaluation of n-gram models, text classification using language models.
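
A bigram model illustrates the n-gram assumption, log probabilities, and perplexity in one place. This is a sketch under assumptions: the three training sentences are toy data, the function names are hypothetical, and the optional parameter `k` adds add-k smoothing (k=1 is add-one), which goes slightly beyond what this topic lists.

```python
import math
from collections import Counter


def train_bigrams(sentences):
    """Collect history counts, bigram counts, and the vocabulary."""
    histories, bigrams, vocab = Counter(), Counter(), {"</s>"}
    for s in sentences:
        toks = ["<s>"] + s.split() + ["</s>"]
        histories.update(toks[:-1])
        bigrams.update(zip(toks, toks[1:]))
        vocab.update(s.split())
    return histories, bigrams, vocab


def log_prob(sentence, model, k=0.0):
    """Sum of log P(w_i | w_{i-1}); -inf if an unsmoothed bigram is unseen."""
    histories, bigrams, vocab = model
    toks = ["<s>"] + sentence.split() + ["</s>"]
    total = 0.0
    for h, w in zip(toks, toks[1:]):
        num = bigrams[(h, w)] + k
        den = histories[h] + k * len(vocab)
        if num == 0:
            return float("-inf")
        total += math.log(num / den)
    return total


def perplexity(sentence, model, k=0.0):
    """exp of the negative average log probability per predicted token."""
    n = len(sentence.split()) + 1        # words plus the end marker
    return math.exp(-log_prob(sentence, model, k) / n)


model = train_bigrams(["a cat slept", "a dog slept", "a cat ate"])
print(math.exp(log_prob("a cat slept", model)))   # about 1/3
print(perplexity("a cat slept", model))
print(log_prob("a fish slept", model))            # -inf without smoothing
```

Summing logs instead of multiplying probabilities avoids numerical underflow on long sentences, and the unseen bigram in the last call shows why smoothing is needed in practice.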

68 Statistical NLP – What we will discuss. 7. Hidden Markov Model. Smoothing: add-one (Laplace) smoothing, Witten-Bell smoothing. Hidden Markov Model: graphical representations, assumption, HMM POS example, Viterbi algorithm (use of dynamic programming in HMMs).
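
The Viterbi algorithm can be sketched for a tiny HMM part-of-speech tagger. All probabilities below are invented for illustration, and the tag set and the function name `viterbi` are assumptions; a practical implementation would also work in log space.

```python
def viterbi(words, tags, start_p, trans_p, emit_p):
    """Return (best tag sequence, its probability) by dynamic programming."""
    # best[i][t]: probability of the best path for words[:i+1] ending in tag t
    best = [{t: start_p[t] * emit_p[t].get(words[0], 0.0) for t in tags}]
    back = [{}]
    for i in range(1, len(words)):
        best.append({})
        back.append({})
        for t in tags:
            prev, score = max(
                ((p, best[i - 1][p] * trans_p[p][t]) for p in tags),
                key=lambda pair: pair[1],
            )
            best[i][t] = score * emit_p[t].get(words[i], 0.0)
            back[i][t] = prev
    last = max(tags, key=lambda t: best[-1][t])
    path = [last]
    for i in range(len(words) - 1, 0, -1):   # follow the back-pointers
        path.append(back[i][path[-1]])
    path.reverse()
    return path, best[-1][last]


TAGS = ["DET", "NOUN", "VERB"]
START = {"DET": 0.8, "NOUN": 0.1, "VERB": 0.1}
TRANS = {
    "DET": {"DET": 0.05, "NOUN": 0.9, "VERB": 0.05},
    "NOUN": {"DET": 0.1, "NOUN": 0.3, "VERB": 0.6},
    "VERB": {"DET": 0.5, "NOUN": 0.3, "VERB": 0.2},
}
EMIT = {
    "DET": {"a": 0.5},
    "NOUN": {"cat": 0.4, "slept": 0.01},
    "VERB": {"slept": 0.3, "cat": 0.01},
}

path, p = viterbi("a cat slept".split(), TAGS, START, TRANS, EMIT)
print(path)   # ['DET', 'NOUN', 'VERB']
```

Each cell reuses the best scores from the previous word, so the cost is linear in sentence length and quadratic in the number of tags, rather than exponential as in enumerating every tag sequence.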

69 Statistical NLP – What we will discuss. 8. Bayesian Networks: definition, example. Computational tasks in Bayesian Networks: evaluation, sampling, inference in Bayesian Networks by brute force; general inference in Bayesian Networks is NP-hard; efficient inference in Bayesian Networks.
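
Brute-force inference in a Bayesian network enumerates every full assignment, multiplies the conditional probability tables, and sums. The sketch below assumes a hypothetical three-node network (Rain and Sprinkler are both parents of Wet) with made-up probabilities; it is not an example from the course.

```python
from itertools import product

# Hypothetical conditional probability tables.
P_RAIN = {True: 0.2, False: 0.8}
P_SPRINKLER = {True: 0.1, False: 0.9}
P_WET = {                      # P(wet = True | rain, sprinkler)
    (True, True): 0.99,
    (True, False): 0.9,
    (False, True): 0.8,
    (False, False): 0.0,
}


def joint(rain, sprinkler, wet):
    """Evaluation task: probability of one full configuration."""
    p_wet = P_WET[(rain, sprinkler)]
    return P_RAIN[rain] * P_SPRINKLER[sprinkler] * (p_wet if wet else 1 - p_wet)


def p_rain_given_wet():
    """Inference by brute force: sum the joint over the unobserved variables."""
    numerator = sum(joint(True, s, True) for s in (True, False))
    denominator = sum(joint(r, s, True) for r, s in product((True, False), repeat=2))
    return numerator / denominator


print(p_rain_given_wet())   # about 0.74
```

The enumeration visits 2^n assignments for n binary variables, which is why brute force only works for small networks and why efficient inference methods matter.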

70 Other Concluding Remarks. ATOMYRIADES: Nature, it seems, is the popular name for milliards and milliards and milliards of particles playing their infinite game of billiards and billiards and billiards.

