Text classification. Day 35 (18-Nov-2009), LING 681.02 Computational Linguistics. Harry Howard, Tulane University.

Course organization

Learning to classify text NLPP §6

Classification
- What is it?
- Supervision
  - A classifier is supervised if it is built on training corpora containing the correct label for each input.
    - This usually means that the program can calculate an error when the predicted label does not match the correct label.
  - A classifier is unsupervised if it is built on training corpora that do not contain the correct label for each input.
    - There is no way to calculate an error.
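To make the contrast concrete, here is a minimal NLTK sketch; the toy feature dictionaries below are invented for illustration, not taken from the course. Because every item carries its correct label, we can train a classifier and count how often its predictions disagree with those labels.

```python
import nltk

# Toy labelled data (hypothetical): each item is a (feature dict, correct label) pair.
train = [({'last_letter': 'a'}, 'female'), ({'last_letter': 'k'}, 'male'),
         ({'last_letter': 'e'}, 'female'), ({'last_letter': 'o'}, 'male'),
         ({'last_letter': 'i'}, 'female'), ({'last_letter': 'r'}, 'male')]
test = [({'last_letter': 'a'}, 'female'), ({'last_letter': 's'}, 'male')]

# Supervised: the correct labels are available at training time...
classifier = nltk.NaiveBayesClassifier.train(train)

# ...so an error can be calculated whenever the predicted label differs from the correct one.
errors = sum(1 for feats, gold in test if classifier.classify(feats) != gold)
print('error rate:', errors / len(test))

# Unsupervised methods (e.g. clustering) see only the inputs, without labels,
# so no such error can be computed.
```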

Diagram of supervised classification (cf. the supervised-classification figure in NLPP §6.1)

Philosophical question
- Does supervised classification work for most of what you learned spontaneously as a child?
- No: life does not come neatly labelled.

Algorithm
1. Divide the corpus into three sets:
   - training set
   - test set
   - development (dev-test) set
2. Choose an initial set of features that will be used to classify the corpus.
   - The part of the program that looks for the features in the corpus is called a feature extractor.
3. Train the classifier on the training set.
4. Run it on the development set.
5. Refine the feature extractor based on any errors produced on the development set.
6. Run the improved classifier on the test set.
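Here is a minimal sketch of this workflow in NLTK; the toy corpus and the extract_features function are placeholders invented for illustration, not the course's own code.

```python
import random
import nltk

# Hypothetical toy corpus: strings labelled by whether they contain a digit.
items = ['item%d' % i for i in range(500)] + ['thing' * (i % 5 + 1) for i in range(500)]
labeled_data = [(s, 'digit' if any(c.isdigit() for c in s) else 'plain') for s in items]
random.shuffle(labeled_data)

def extract_features(text):
    """Feature extractor: the part of the program that looks for features in an input."""
    return {'last_char': text[-1], 'length': len(text)}

featuresets = [(extract_features(text), label) for (text, label) in labeled_data]

# Step 1: divide the corpus into training, development (dev-test), and test sets.
train_set, devtest_set, test_set = featuresets[:600], featuresets[600:800], featuresets[800:]

# Steps 3-4: train on the training set, then run on the development set.
classifier = nltk.NaiveBayesClassifier.train(train_set)
print('dev-test accuracy:', nltk.classify.accuracy(classifier, devtest_set))

# Step 5: collect dev-test errors to guide refinement of extract_features, then retrain.
errors = [(text, gold) for (text, gold) in labeled_data[600:800]
          if classifier.classify(extract_features(text)) != gold]
print(errors[:5])  # inspect a few misclassified items

# Step 6: only after refinement, run the (improved) classifier on the test set.
print('test accuracy:', nltk.classify.accuracy(classifier, test_set))
```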

Choosing the right features
- Use too few features, and the classifier will underfit the data.
  - It is too vague and makes too many mistakes.
- Use too many features, and the classifier will overfit the training data.
  - It is too specific and will not generalize to new examples.

Example: gender identification
- What would the features be?
  - Female names tend to end in a, e, or i.
  - Male names tend to end in k, o, r, s, or t.
- Explain how classification would work.
- NLTK code: see NLPP §6.
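A sketch along the lines of the NLTK book's gender example (NLPP §6.1), using only the last letter of the name as the feature; it assumes the names corpus has been fetched with nltk.download('names').

```python
import random
import nltk
from nltk.corpus import names  # requires nltk.download('names')

def gender_features(name):
    """A single feature: the final letter of the name (cf. the endings listed above)."""
    return {'last_letter': name[-1].lower()}

labeled_names = ([(n, 'male') for n in names.words('male.txt')] +
                 [(n, 'female') for n in names.words('female.txt')])
random.shuffle(labeled_names)

featuresets = [(gender_features(n), gender) for (n, gender) in labeled_names]
train_set, test_set = featuresets[500:], featuresets[:500]

classifier = nltk.NaiveBayesClassifier.train(train_set)
print(classifier.classify(gender_features('Neo')))    # predicted label for an unseen name
print(nltk.classify.accuracy(classifier, test_set))   # accuracy on held-out names
classifier.show_most_informative_features(5)          # which endings matter most
```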

More examples
- Classify movie reviews as positive or negative. How?
- Classify words by part of speech (POS). How?
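For the movie-review question, one possible answer, adapted from the document-classification example in NLPP §6, is to use the presence of the most frequent corpus words as features; it assumes the movie_reviews corpus has been fetched with nltk.download('movie_reviews').

```python
import random
import nltk
from nltk.corpus import movie_reviews  # requires nltk.download('movie_reviews')

# Each document is the list of words in one review, labelled 'pos' or 'neg'.
documents = [(list(movie_reviews.words(fileid)), category)
             for category in movie_reviews.categories()
             for fileid in movie_reviews.fileids(category)]
random.shuffle(documents)

# Features: does the review contain each of the 2000 most frequent corpus words?
all_words = nltk.FreqDist(w.lower() for w in movie_reviews.words())
word_features = [w for (w, _) in all_words.most_common(2000)]

def document_features(document):
    document_words = set(document)
    return {'contains(%s)' % w: (w in document_words) for w in word_features}

featuresets = [(document_features(d), c) for (d, c) in documents]
train_set, test_set = featuresets[100:], featuresets[:100]

classifier = nltk.NaiveBayesClassifier.train(train_set)
print(nltk.classify.accuracy(classifier, test_set))
classifier.show_most_informative_features(5)
```

For the POS question, the feature extractor would instead look at properties of the word itself, such as its suffixes, and, as the next slide suggests, at its context.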

Beyond the word
- Look at the word's context.
  - As we have seen, this is crucial to POS tagging.
- Classify instant messages (IMs) by the dialogue acts they instantiate.
  - What could some such acts be? Statement, emotion, yes-no question.
  - How?
- Recognizing textual entailment (RTE) is the task of determining whether a given piece of text T entails another text, called the hypothesis (H).
  - How?
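For the dialogue-act question, one concrete possibility is the NLTK book's example on the NPS Chat corpus, where every post already carries a dialogue-act label such as 'Statement', 'Emotion', or 'ynQuestion'; the sketch assumes the nps_chat and punkt resources have been fetched with nltk.download().

```python
import nltk
from nltk.corpus import nps_chat  # requires nltk.download('nps_chat') and nltk.download('punkt')

posts = nps_chat.xml_posts()[:10000]

def dialogue_act_features(post_text):
    """Bag-of-words features: which words the instant message contains."""
    return {'contains(%s)' % w.lower(): True for w in nltk.word_tokenize(post_text)}

# Each post is already labelled with its dialogue-act class.
featuresets = [(dialogue_act_features(post.text), post.get('class')) for post in posts]

size = int(len(featuresets) * 0.1)
train_set, test_set = featuresets[size:], featuresets[:size]

classifier = nltk.NaiveBayesClassifier.train(train_set)
print(nltk.classify.accuracy(classifier, test_set))
```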

RTE example
- T: Parviz Davudi was representing Iran at a meeting of the Shanghai Co-operation Organisation (SCO), the fledgling association that binds Russia, China and four former Soviet republics of central Asia together to fight terrorism.
- H: China is a member of SCO.
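This T/H pair is item 33 (Challenge 3, Pair 34) of the RTE-3 development data that ships with NLTK, and the book's RTEFeatureExtractor turns it into overlap features; the sketch assumes the rte corpus has been fetched with nltk.download('rte').

```python
import nltk
from nltk.corpus import rte  # requires nltk.download('rte')

def rte_features(rtepair):
    """Overlap and 'extra hypothesis material' features, as in NLPP §6.2."""
    extractor = nltk.RTEFeatureExtractor(rtepair)
    return {'word_overlap':   len(extractor.overlap('word')),
            'word_hyp_extra': len(extractor.hyp_extra('word')),
            'ne_overlap':     len(extractor.overlap('ne')),
            'ne_hyp_extra':   len(extractor.hyp_extra('ne'))}

# Index 33 of the RTE-3 development set is the Parviz Davudi / SCO pair shown above.
rtepair = rte.pairs(['rte3_dev.xml'])[33]
extractor = nltk.RTEFeatureExtractor(rtepair)
print(extractor.hyp_words)          # content words of the hypothesis
print(extractor.overlap('word'))    # hypothesis words also found in the text
print(extractor.overlap('ne'))      # named entities shared by text and hypothesis
print(extractor.hyp_extra('word'))  # hypothesis words missing from the text
print(rte_features(rtepair))
```

The intuition is that a large overlap, especially of named entities, together with little hypothesis-only material, makes entailment more likely.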

Next time
- Finish NLPP §6.
- Go on to NLPP §7: Extracting info from text.