CS344: Introduction to Artificial Intelligence Pushpak Bhattacharyya CSE Dept., IIT Bombay Lecture 26: Probabilistic Parsing

Example of Sentence Labeling: Parsing
[S1 [S [S [VP [VB Come] [NP [NNP July]]]] [, ,] [CC and] [S [NP [DT the] [JJ IIT] [NN campus]] [VP [AUX is] [ADJP [JJ abuzz] [PP [IN with] [NP [ADJP [JJ new] [CC and] [VBG returning]] [NNS students]]]]]] [. .]]]

Noisy Channel Modeling
Source sentence → Noisy Channel → Target parse
T* = argmax_T P(T|S)
   = argmax_T P(T) · P(S|T)
   = argmax_T P(T), since given the parse the sentence is completely determined and P(S|T) = 1

Corpus
A collection of text, called a corpus, is used for collecting various language data.
With annotation: more information, but manual-labor intensive.
Practice: label automatically, then correct manually.
The famous Brown Corpus contains 1 million tagged words.
Switchboard: a very famous corpus; 2,400 conversations, 543 speakers, many US dialects, annotated with orthography and phonetics.

Discriminative vs. Generative Model
W* = argmax_W P(W|SS)
Discriminative model: compute P(W|SS) directly.
Generative model: compute it from P(W) · P(SS|W).

Notion of Language Models

Language Models
N-grams: sequences of n consecutive words/characters.
Probabilistic/Stochastic Context Free Grammars:
- Simple probabilistic models capable of handling recursion
- A CFG with probabilities attached to rules
- Rule probabilities: how likely is it that a particular rewrite rule is used?
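
As a minimal illustration of the n-gram idea, here is a sketch of a maximum-likelihood bigram model in Python; the toy corpus and the function name bigram_lm are ours, not from the lecture.

from collections import Counter

def bigram_lm(sentences):
    # Estimate P(w2 | w1) by maximum likelihood from raw counts.
    unigrams, bigrams = Counter(), Counter()
    for sent in sentences:
        words = ["<s>"] + sent.split() + ["</s>"]
        unigrams.update(words[:-1])
        bigrams.update(zip(words[:-1], words[1:]))
    return {(w1, w2): c / unigrams[w1] for (w1, w2), c in bigrams.items()}

probs = bigram_lm(["the gunman sprayed the building",
                   "the gunman sprayed bullets"])
print(probs[("the", "gunman")])   # 2/3: "the" is followed by "gunman" 2 of 3 times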

PCFGs
Why PCFGs?
- Intuitive probabilistic models for tree-structured languages
- Algorithms are extensions of HMM algorithms
- Better than the n-gram model for language modeling

Formal Definition of PCFG
A PCFG consists of:
- A set of terminals {w_k}, k = 1, …, V, e.g. {w_k} = {child, teddy, bear, played, …}
- A set of non-terminals {N_i}, i = 1, …, n, e.g. {N_i} = {NP, VP, DT, …}
- A designated start symbol N_1
- A set of rules {N_i → ζ_j}, where ζ_j is a sequence of terminals and non-terminals, e.g. NP → DT NN
- A corresponding set of rule probabilities

Rule Probabilities
Rule probabilities are such that, for every non-terminal, the probabilities of its expansions sum to 1: ∀i Σ_j P(N_i → ζ_j) = 1.
E.g., P(NP → DT NN) = 0.2, P(NP → NN) = 0.5, P(NP → NP PP) = 0.3.
P(NP → DT NN) = 0.2 means that 20% of the parses in the training data use the rule NP → DT NN.
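
The sum-to-one constraint is easy to check mechanically; a small sketch using the NP probabilities above (the dictionary layout is our choice, not the lecture's):

# Expansion probabilities of the non-terminal NP from the example above.
np_rules = {("DT", "NN"): 0.2, ("NN",): 0.5, ("NP", "PP"): 0.3}
# For every non-terminal N_i, sum over j of P(N_i -> zeta_j) must be 1.
assert abs(sum(np_rules.values()) - 1.0) < 1e-9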

Probabilistic Context Free Grammars
S   → NP VP     1.0
NP  → DT NN     0.5
NP  → NNS       0.3
NP  → NP PP     0.2
PP  → P NP      1.0
VP  → VP PP     0.6
VP  → VBD NP    0.4
DT  → the       1.0
NN  → gunman    0.5
NN  → building  0.5
VBD → sprayed   1.0
NNS → bullets   1.0
P   → with      1.0
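
The same grammar can be written down and used for parsing with NLTK, assuming the library is installed; this is our sketch, not the lecture's code (note the lowercased sentence, since the grammar only generates 'the'):

import nltk

# The grammar above in NLTK's PCFG notation; rule probabilities in brackets.
grammar = nltk.PCFG.fromstring("""
    S   -> NP VP        [1.0]
    NP  -> DT NN        [0.5]
    NP  -> NNS          [0.3]
    NP  -> NP PP        [0.2]
    PP  -> P NP         [1.0]
    VP  -> VP PP        [0.6]
    VP  -> VBD NP       [0.4]
    DT  -> 'the'        [1.0]
    NN  -> 'gunman'     [0.5]
    NN  -> 'building'   [0.5]
    VBD -> 'sprayed'    [1.0]
    NNS -> 'bullets'    [1.0]
    P   -> 'with'       [1.0]
""")

# ViterbiParser returns the most probable parse together with its probability.
parser = nltk.ViterbiParser(grammar)
for tree in parser.parse("the gunman sprayed the building with bullets".split()):
    print(tree.prob())
    tree.pretty_print()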

Example Parse t_1
"The gunman sprayed the building with bullets."
Parse tree (PP attached to VP):
(S (NP (DT The) (NN gunman))
   (VP (VP (VBD sprayed)
           (NP (DT the) (NN building)))
       (PP (P with)
           (NP (NNS bullets)))))
P(t_1) = 1.0 × 0.5 × 1.0 × 0.5 × 0.6 × 0.4 × 1.0 × 0.5 × 1.0 × 0.5 × 1.0 × 1.0 × 0.3 × 1.0 = 0.0045
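
The probability of t_1 is just the product of the probabilities of every rule used in the tree; a quick check of the arithmetic (the factor order follows the slide, and the rule attached to each factor is our reading of the tree):

from math import prod

rule_probs = [1.0,  # S   -> NP VP
              0.5,  # NP  -> DT NN   (The gunman)
              1.0,  # DT  -> the
              0.5,  # NN  -> gunman
              0.6,  # VP  -> VP PP
              0.4,  # VP  -> VBD NP
              1.0,  # VBD -> sprayed
              0.5,  # NP  -> DT NN   (the building)
              1.0,  # DT  -> the
              0.5,  # NN  -> building
              1.0,  # PP  -> P NP
              1.0,  # P   -> with
              0.3,  # NP  -> NNS
              1.0]  # NNS -> bullets
print(prod(rule_probs))   # ~0.0045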

Is NLP Really Needed?

Post-1
POST----5 TITLE: "Wants to invest in IPO? Think again" | Here’s a sobering thought for those who believe in investing in IPOs. Listing gains — the return on the IPO scrip at the close of listing day over the allotment price — have been falling substantially in the past two years. Average listing gains have fallen from 38% in 2005 to as low as 2% in the first half of 2007. Of the 159 book-built initial public offerings (IPOs) in India between 2000 and 2007, two-thirds saw listing gains. However, these gains have eroded sharply in recent years. Experts say this trend can be attributed to the aggressive pricing strategy that investment bankers adopt before an IPO. “While the drop in average listing gains is not a good sign, it could be due to the fact that IPO issue managers are getting aggressive with pricing of the issues,” says Anand Rathi, chief economist, Sujan Hajra. While the listing gain was 38% in 2005 over 34 issues, it fell to 30% in 2006 over 61 issues and to 2% in 2007 till mid-April over 34 issues. The overall listing gain for 159 issues listed since 2000 has been 23%, according to an analysis by Anand Rathi Securities. Aggressive pricing means the scrip has often been priced at the high end of the pricing range, which would restrict the upward movement of the stock, leading to reduced listing gains for the investor. It also tends to suggest investors should not indiscriminately pump in money into IPOs. But some market experts point out that India fares better than other countries. “Internationally, there have been periods of negative returns and low positive returns in India should not be considered a bad thing.

Post-2
POST----7 TITLE: "[IIM-Jobs] ***** Bank: International Projects Group - Manager" | Please send your CV & cover letter to ***** Bank, through its International Banking Group (IBG), is expanding beyond the Indian market with an intent to become a significant player in the global marketplace. The exciting growth in the overseas markets is driven not only by India linked opportunities, but also by opportunities of impact that we see as a local player in these overseas markets and / or as a bank with global footprint. IBG comprises of Retail banking, Corporate banking & Treasury in 17 overseas markets we are present in. Technology is seen as key part of the business strategy, and critical to business innovation & capability scale up. The International Projects Group in IBG takes ownership of defining & delivering business critical IT projects, and directly impact business growth. Role: Manager – International Projects Group Purpose of the role: Define IT initiatives and manage IT projects to achieve business goals. The project domain will be retail, corporate & treasury. The incumbent will work with teams across functions (including internal technology teams & IT vendors for development/implementation) and locations to deliver significant & measurable impact to the business. Location: Mumbai (Short travel to overseas locations may be needed) Key Deliverables: Conceptualize IT initiatives, define business requirements

Sentiment Classification
Positive, negative, neutral: 3-class.
Sports, economics, literature: multi-class.
Steps: create a representation for the document, then classify the representation.
The most popular way of representing a document is a feature vector (indicator sequence), as sketched below.
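
A minimal sketch of the indicator-vector idea; the vocabulary and the helper name to_vector are illustrative only:

# One binary feature per vocabulary word: 1 if the word occurs in the
# document, 0 otherwise.
vocab = ["good", "bad", "excellent", "poor", "movie"]

def to_vector(document, vocab=vocab):
    words = set(document.lower().split())
    return [1 if w in words else 0 for w in vocab]

print(to_vector("An excellent movie"))   # [0, 0, 1, 0, 1]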

Established Techniques
- Naïve Bayes Classifier (NBC)
- Support Vector Machines (SVM)
- Neural Networks
- K-nearest-neighbor classifier
- Latent Semantic Indexing
- Decision Tree (ID3)
- Concept-based indexing

Successful Approaches
The following are successful approaches, as reported in the literature:
- NBC: simple to understand and implement
- SVM: complex; requires foundations of perceptrons

Mathematical Setting
We have a training set:
A: positive sentiment docs
B: negative sentiment docs
Let the classes of positive and negative documents be C+ and C−, respectively. Indicator/feature vectors are to be formed.
Given a new document D, label it positive if P(C+|D) > P(C−|D).

Prior Probability

Document   Vector   Classification
D1         V1       +
D2         V2       −
D3         V3       +
…          …        …
D4000      V4000

Let T = total number of documents, and let |+| = M, so |−| = T − M.
The prior probability is calculated without considering any features of the new document:
P(D being positive) = M/T

Apply Bayes' Theorem
Steps followed for the NBC algorithm:
1. Calculate the prior probabilities of the classes, P(C+) and P(C−).
2. Calculate the feature probabilities of the new document, P(D|C+) and P(D|C−).
The probability of a document D belonging to a class C is given by Bayes' Theorem:
P(C|D) = P(C) · P(D|C) / P(D)
The document belongs to C+ if P(C+) · P(D|C+) > P(C−) · P(D|C−); P(D) is common to both sides and cancels.
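
A sketch of this decision rule, with priors computed as M/T and (T − M)/T as on the previous slide; the counts and likelihood values are made up for illustration, and P(D) is never computed because it cancels:

def classify(prior_pos, prior_neg, lik_pos, lik_neg):
    # Label '+' iff P(C+) * P(D|C+) > P(C-) * P(D|C-).
    return "+" if prior_pos * lik_pos > prior_neg * lik_neg else "-"

T, M = 4000, 2500                            # total docs, positive docs (made up)
prior_pos, prior_neg = M / T, (T - M) / T    # M/T and (T - M)/T
print(classify(prior_pos, prior_neg, lik_pos=0.004, lik_neg=0.001))   # '+'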

Calculating P(D|C+)
Identify a set of features/indicators to represent a document and generate a feature vector V_D = <x_1, x_2, …, x_n>.
Hence, P(D|C+) = P(V_D|C+) = P(<x_1, x_2, …, x_n> | C+) = |<x_1, x_2, …, x_n>, C+| / |C+|.
Based on the assumption that all features are independently, identically distributed (IID):
P(<x_1, …, x_n> | C+) = P(x_1|C+) · P(x_2|C+) · P(x_3|C+) · … · P(x_n|C+) = ∏_{i=1}^{n} P(x_i|C+)
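
A sketch of this product of per-feature probabilities; the add-one (Laplace) smoothing is our addition, so that an unseen feature does not zero out the whole product, and the counts are illustrative:

from math import prod

def likelihood(feature_vector, feature_counts, class_count):
    # feature_counts[i]: no. of documents of this class in which feature i is 1.
    probs = []
    for x_i, count in zip(feature_vector, feature_counts):
        p = (count + 1) / (class_count + 2)   # smoothed P(x_i = 1 | C)
        probs.append(p if x_i == 1 else 1 - p)
    return prod(probs)   # prod over i of P(x_i | C)

print(likelihood([1, 0, 1], feature_counts=[30, 5, 20], class_count=40))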

Baseline Accuracy
Just with tokens as features: 80% accuracy, i.e. a 20% probability of a document being misclassified. On large sets this is significant.

To Improve Accuracy…
- Clean corpora
- POS-tag
- Concentrate on critical POS tags (e.g., adjectives)
- Remove 'objective' sentences ('of' ones)
- Do aggregation
- Use minimal to sophisticated NLP