CS621/CS449 Artificial Intelligence Lecture Notes

Set 7: 08/10/2004, 15/10/2004

Outline
- Probability, Statistics & AI
- The Speech Recognition Problem
- Bayesian Decision Theory
- A Probabilistic Spell Checker

Probability, Statistics & AI
The importance of knowledge was understood from the 1950s research on Machine Translation. The field of AI is characterized by the need for representation and processing of enormous amounts of knowledge. Manual input of that knowledge is difficult, so we consider the possibility of acquiring it automatically, i.e., training. E.g., Web data → knowledge is not easy by any means!

Probabilistic Approaches?
Fundamental question: can probabilistic approaches be used? Consider the "Next-Word Guessing Program", a speech recognition subproblem: given w1, w2, ..., wn, we need to guess wn+1.

The Speech Recognition Problem
Text T1: "I like Artificial Intelligence and Cognitive Sciences. They excite my imagination."
Text T2: "I hate communal riots. They disrupt life and work."

Sample Word Frequencies
(The corpus T1 + T2 contains 20 word tokens in total.)

Word w         Frequency   P(w)
I              2           2/20
like           1           1/20
Artificial     1           1/20
Intelligence   1           1/20
and            2           2/20
Cognitive      1           1/20
Sciences       1           1/20
They           2           2/20
...            ...         ...
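The table above can be reproduced in a few lines of Python. This is a minimal sketch; the tokenization choice (split on whitespace, ignore sentence-final periods) is an illustrative assumption:

```python
from collections import Counter

t1 = "I like Artificial Intelligence and Cognitive Sciences . They excite my imagination ."
t2 = "I hate communal riots . They disrupt life and work ."

# Whitespace tokenization, dropping the period tokens.
words = [w for w in (t1 + " " + t2).split() if w != "."]

counts = Counter(words)   # unigram frequencies
total = len(words)        # 20 word tokens in this corpus
for w, c in counts.most_common():
    print(f"{w:<13} {c}   {c}/{total} = {c / total:.2f}")
```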

Guessing the Next Word
Word guesses or predictions = Next-Word(model of domain, speaker, hearer). Having seen w1 = "I" and taking the next word to be the word with the highest unigram probability, we get:

w1    w2
I     I / and / They ???

A highly unlikely sequence!

A Better Solution
Next word = the word with the highest conditional probability P(w2 | w1). In our corpus:

P(like | I) = 0.5
P(hate | I) = 0.5

But "disrupt", "work", and "excite" are also possibilities not covered by this rule ⇒ the language model is not adequate! Ideally, "I" is followed not only by "hate" and "like" but by verbs in general.
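A minimal sketch of this bigram estimate, using maximum-likelihood counts over the sentences of T1 and T2 (the sentence splitting and function names here are illustrative assumptions):

```python
from collections import Counter

corpus = [
    "I like Artificial Intelligence and Cognitive Sciences".split(),
    "They excite my imagination".split(),
    "I hate communal riots".split(),
    "They disrupt life and work".split(),
]

unigrams = Counter(w for sent in corpus for w in sent)
bigrams = Counter((s[i], s[i + 1]) for s in corpus for i in range(len(s) - 1))

def p_next(w2, w1):
    """Maximum-likelihood estimate of P(w2 | w1) = count(w1 w2) / count(w1)."""
    return bigrams[(w1, w2)] / unigrams[w1]

print(p_next("like", "I"))   # 0.5
print(p_next("hate", "I"))   # 0.5
```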

Bayesian Decision Theory
Bayes' Theorem: given the random variables A and B,

P(A | B) = P(B | A) · P(A) / P(B)

where P(A | B) is the posterior probability (the conditional probability of A given that B has occurred), P(A) is the prior probability, and P(B | A) is the likelihood of B given A.

Bayes' Theorem Derivation
From the definition of conditional probability,

P(A | B) = P(A, B) / P(B)   and   P(B | A) = P(A, B) / P(A).

Equating the two expressions for the joint probability P(A, B):

P(A | B) · P(B) = P(B | A) · P(A),

and dividing through by P(B) gives Bayes' theorem:

P(A | B) = P(B | A) · P(A) / P(B).

Example
It is known that in a population, 1 in 50,000 has meningitis and 1 in 20 has a stiff neck. It is also observed that 50% of meningitis patients have a stiff neck. A doctor observes that a patient has a stiff neck. What is the probability that the patient has meningitis?
Solution: record the event probabilities P(s) and P(m), where s = stiff neck and m = meningitis.

Probabilities
P(m) = P(an individual has meningitis) = 1/50000 = 0.00002 (prior)
P(s) = P(an individual has a stiff neck) = 1/20 = 0.05 (prior)
P(s | m) = 0.5 (likelihood)

The posterior probability is

P(m | s) = P(s | m) · P(m) / P(s) = (0.5 × 0.00002) / 0.05 = 0.0002.

It is most likely that the person does not have meningitis, since P(m | s) = 0.0002 is far smaller than P(¬m | s) ≈ 0.9998.
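The same computation as a quick Python sketch (variable names are ours):

```python
p_m = 1 / 50_000       # prior: P(meningitis)
p_s = 1 / 20           # prior: P(stiff neck)
p_s_given_m = 0.5      # likelihood: P(stiff neck | meningitis)

# Bayes' theorem: P(m | s) = P(s | m) * P(m) / P(s)
p_m_given_s = p_s_given_m * p_m / p_s
print(p_m_given_s)     # 0.0002
```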

Some Issues
An important question: is P(m | s) greater or less than P(s | m)? P(m | s) could also have been estimated directly as count(m, s) / count(s), i.e., the fraction of stiff-neck patients who have meningitis. Some questions:
- Which is more reliable to compute, P(s | m) or P(m | s)?
- For which is the evidence more sparse, P(s | m) or P(m | s)?
- Test of significance: the counts are always taken on a sample of the population. Which probability count has sufficient statistics?

A Probabilistic Spell Checker
Three broad problems:
1. Non-word error detection: Apple → Aple
2. Isolated-word error correction: Aple → Apple / Applet / Maple
3. Detection and correction from context (very difficult; needs full NLP): piece ↔ peace
E.g.:
"The piece for which he struggled long and hard was stolen at night one day."
"The peace for which he struggled long and hard was destroyed in a single day with the outbreak of sect rivalries."

Types of Errors
Isolated-word errors (errors that come from typing):
- Deletion: Apple → Aple
- Insertion: Apple → Applet
- Substitution: Apple → Apqle
- Transposition: Apple → Aplpe
Errors from OCR (Optical Character Recognition):
- Blank deletion: I watch cricket → I watchcricket
- Blank insertion: I watch cricket → I wa tch cricket
- Mangling: I watch cricket → I wdch cricket
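One common way to operationalize the four typing-error types is to enumerate every string one edit away from a word, in the style popularized by Peter Norvig's spelling-corrector essay. The sketch below is illustrative, not the lecture's own method:

```python
def edits1(word, alphabet="abcdefghijklmnopqrstuvwxyz"):
    """All strings one typing error away from `word`:
    deletions, insertions, substitutions, transpositions."""
    splits = [(word[:i], word[i:]) for i in range(len(word) + 1)]
    deletes = [L + R[1:] for L, R in splits if R]
    inserts = [L + c + R for L, R in splits for c in alphabet]
    substitutes = [L + c + R[1:] for L, R in splits if R for c in alphabet]
    transposes = [L + R[1] + R[0] + R[2:] for L, R in splits if len(R) > 1]
    return set(deletes + inserts + substitutes + transposes)

print("aple" in edits1("apple"))    # True: one deletion away
print("applet" in edits1("apple"))  # True: one insertion away
```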

Noisy Channel Model
The problem formulation for the spell checker is based on the noisy channel model: an intended source string s = (s1, s2, ..., sn) passes through a noisy channel and emerges as the observed string t = (t1, t2, ..., tm):

s → [noisy channel] → t

Given t, find the most probable s: find the ŝ for which P(s | t) is maximum, where s, t, and ŝ are strings. Is this a Bayesian problem?
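By Bayes' theorem, ŝ = argmax over s of P(s | t) = argmax over s of P(t | s) · P(s), since P(t) is constant for the observed t. A minimal sketch of this argmax, reusing edits1 from the previous sketch; the tiny dictionary and the uniform single-edit error model are illustrative assumptions, not the lecture's actual model:

```python
from collections import Counter

# Illustrative prior: unigram counts from some corpus (assumed data).
WORD_COUNTS = Counter({"apple": 50, "maple": 12, "ample": 8, "applet": 3})
TOTAL = sum(WORD_COUNTS.values())

def prior(s):
    """P(s): relative frequency of candidate word s."""
    return WORD_COUNTS[s] / TOTAL

def likelihood(t, s):
    """P(t | s): crude error model -- every single-edit corruption of s is
    equally plausible (a real checker would learn these weights from data)."""
    return 1.0 if t == s or t in edits1(s) else 0.0

def correct(t):
    """Return s-hat = argmax over s of P(t | s) * P(s)."""
    candidates = [s for s in WORD_COUNTS if likelihood(t, s) > 0] or [t]
    return max(candidates, key=lambda s: likelihood(t, s) * prior(s))

print(correct("aple"))   # 'apple': apple/maple/ample tie on the error
                         # model, so the prior P(s) decides
```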