Midterm Review CS4705 Natural Language Processing.

Statistical v. Symbolic Processing
–80/20 Rule
Regular Expressions
Finite State Automata
–Determinism v. non-determinism
–(Weighted) Finite State Transducers
Morphology
–Word Classes and p.o.s.
–Inflectional v. Derivational
–Affixation, infixation, concatenation
–Morphotactics

–Different languages, different morphologies
–Evidence from human performance
Morphological parsing
–Koskenniemi’s two-level morphology
–FSAs vs. FSTs
–Porter stemmer
Noisy channel model
–Bayesian inference
Spelling correction
–Bayesian approach

–Minimum Edit Distance (Levenshtein distance)
Dynamic Programming
N-grams
–Markov assumption
–Chain Rule
–Language Modeling
  Simple, Adaptive, Class-based (syntax-based)
Smoothing
–Add-one, Witten-Bell, Good-Turing
Back-off models
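
As a review aid for the minimum edit distance and dynamic programming items above, here is a minimal sketch of the standard table-filling computation. The function name `min_edit_distance` and the `sub_cost` parameter are illustrative choices, not taken from the course materials; insertions and deletions are assumed to cost 1.

```python
def min_edit_distance(source, target, sub_cost=1):
    """Minimum edit distance between two strings via dynamic programming.

    Insertions and deletions cost 1; substitutions cost `sub_cost`
    (1 gives plain Levenshtein distance, 2 is a common variant).
    """
    n, m = len(source), len(target)
    # dist[i][j] = cheapest way to turn source[:i] into target[:j]
    dist = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        dist[i][0] = i  # delete every character of source[:i]
    for j in range(1, m + 1):
        dist[0][j] = j  # insert every character of target[:j]
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            same = source[i - 1] == target[j - 1]
            dist[i][j] = min(
                dist[i - 1][j] + 1,                             # deletion
                dist[i][j - 1] + 1,                             # insertion
                dist[i - 1][j - 1] + (0 if same else sub_cost),  # substitution or match
            )
    return dist[n][m]
```

Each cell depends only on its three neighbours (left, above, diagonal), which is why the whole table can be filled in O(nm) time instead of enumerating every edit sequence.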

Creating and using n-gram LMs
–Corpora
–Maximum Likelihood Estimation
Syntax
–Chomsky’s view: Syntax is cognitive reality
–Parse Trees
  Dependency Structure
–Part-of-Speech Tagging
  Hand-Written Rules v. Statistical v. Hybrid
  Brill Tagging

–Types of Ambiguity
Context Free Grammars
–Top-down v. Bottom-up Derivations
  Left Corners
–Grammar Equivalence
–Normal Forms (CNF)
Probabilistic Parsing
–CYK parser
–Derivational Probability
–Lexicalization
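
To make the CNF and CYK items above concrete, the following is a small sketch of a CYK recognizer. It is the plain (non-probabilistic) version, so it only answers whether a sentence is derivable; a probabilistic parser would additionally store the best probability per cell. The toy grammar and the names `cyk_recognize`, `TOY_LEXICON`, and `TOY_BINARY` are invented for illustration.

```python
# Toy grammar in Chomsky Normal Form (hypothetical, for illustration only).
TOY_LEXICON = {          # A -> word rules, indexed by word
    "the": {"Det"},
    "dog": {"N"},
    "cat": {"N"},
    "chased": {"V"},
}
TOY_BINARY = {           # A -> B C rules, indexed by (B, C)
    ("Det", "N"): {"NP"},
    ("V", "NP"): {"VP"},
    ("NP", "VP"): {"S"},
}


def cyk_recognize(words, lexicon, binary, start="S"):
    """Return True if `words` can be derived from `start` under a CNF grammar."""
    n = len(words)
    # table[i][j] = set of nonterminals that span words[i:j]
    table = [[set() for _ in range(n + 1)] for _ in range(n + 1)]
    for i, word in enumerate(words):
        table[i][i + 1] = set(lexicon.get(word, ()))
    for span in range(2, n + 1):              # widths, shortest first
        for i in range(0, n - span + 1):
            j = i + span
            for k in range(i + 1, j):         # every split point
                for left in table[i][k]:
                    for right in table[k][j]:
                        table[i][j] |= binary.get((left, right), set())
    return start in table[0][n]
```

For example, `cyk_recognize("the dog chased the cat".split(), TOY_LEXICON, TOY_BINARY)` accepts the sentence, while a scrambled word order is rejected. The restriction to binary and lexical rules is what CNF buys: every constituent has exactly one split point to try.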

Machine Learning
–Dependent v. Independent variables
–Training v. Development Test v. Test sets
–Feature Vectors
–Metrics
  Accuracy
  Precision, Recall, F-Measure
–Gold Standards
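
The metrics listed above can be computed directly from a gold standard and a system's predictions. This is a minimal sketch, assuming both are equal-length lists of labels and that one label is treated as the positive class; the function names are illustrative, and the F-measure shown is the balanced F1.

```python
def accuracy(gold, predicted):
    """Fraction of items whose predicted label matches the gold label."""
    correct = sum(1 for g, p in zip(gold, predicted) if g == p)
    return correct / len(gold)


def precision_recall_f1(gold, predicted, positive):
    """Precision, recall, and balanced F-measure for one target label."""
    pairs = list(zip(gold, predicted))
    tp = sum(1 for g, p in pairs if p == positive and g == positive)
    fp = sum(1 for g, p in pairs if p == positive and g != positive)
    fn = sum(1 for g, p in pairs if p != positive and g == positive)
    precision = tp / (tp + fp) if tp + fp else 0.0  # of what we labeled positive, how much was right
    recall = tp / (tp + fn) if tp + fn else 0.0     # of the true positives, how many we found
    total = precision + recall
    f1 = 2 * precision * recall / total if total else 0.0
    return precision, recall, f1
```

Accuracy looks at all labels at once, while precision and recall are defined per class, which is why they are more informative when the class of interest is rare.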

