The classification problem (Recap from LING570)
LING 572, Fei Xia, Dan Jinguji
Week 1: 1/10/08

Outline
– Probability theory
– The classification task
=> Both were covered in LING570 and are therefore part of the prerequisites.

Probability theory

Three types of probability
– Joint prob: P(x,y) = prob of x and y happening together
– Conditional prob: P(x|y) = prob of x given a specific value of y
– Marginal prob: P(x) = prob of x, summed over all possible values of y
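To make these three quantities concrete, here is a minimal Python sketch over a small made-up joint distribution; the variable names and probabilities are invented for illustration only.

    # Toy joint distribution P(x, y); all numbers are made up for illustration.
    joint = {
        ("rain", "cloudy"): 0.30,
        ("rain", "sunny"): 0.05,
        ("no_rain", "cloudy"): 0.25,
        ("no_rain", "sunny"): 0.40,
    }

    # Marginal prob: sum the joint prob over all values of y.
    def p_x(x):
        return sum(p for (xv, _), p in joint.items() if xv == x)

    # Conditional prob: joint prob divided by the marginal prob of y.
    def p_x_given_y(x, y):
        p_y = sum(p for (_, yv), p in joint.items() if yv == y)
        return joint[(x, y)] / p_y

    print(p_x("rain"))                    # 0.35
    print(p_x_given_y("rain", "cloudy"))  # 0.30 / 0.55 ≈ 0.545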

Common tricks (I): Marginal prob from joint prob
P(x) = Σ_y P(x,y)

Common tricks (II): Chain rule
P(x1, x2, ..., xn) = P(x1) P(x2|x1) P(x3|x1,x2) ... P(xn|x1, ..., xn-1)
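A small, hypothetical sketch of the chain rule applied to a word sequence; the conditional probabilities below are made up for the example.

    # Chain rule: multiply the conditional prob of each word given its full history.
    def sequence_prob(words, cond_prob):
        p = 1.0
        for i, w in enumerate(words):
            p *= cond_prob(w, tuple(words[:i]))   # P(w_i | w_1 .. w_{i-1})
        return p

    # Made-up conditional probabilities for a toy three-word sequence.
    toy = {("the", ()): 0.2, ("cat", ("the",)): 0.05, ("sat", ("the", "cat")): 0.1}
    print(sequence_prob(["the", "cat", "sat"], lambda w, h: toy[(w, h)]))  # 0.001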

Common tricks (III): Bayes rule
P(y|x) = P(x|y) P(y) / P(x)
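A worked, hypothetical example of Bayes rule, using trick (I) to obtain the denominator; all numbers are invented for illustration.

    # Bayes rule: P(y|x) = P(x|y) P(y) / P(x), with P(x) obtained by marginalization.
    p_y = 0.01                 # prior P(y)
    p_x_given_y = 0.90         # likelihood P(x|y)
    p_x_given_not_y = 0.05     # likelihood P(x|not y)

    p_x = p_x_given_y * p_y + p_x_given_not_y * (1 - p_y)   # marginal P(x)
    p_y_given_x = p_x_given_y * p_y / p_x
    print(round(p_y_given_x, 3))   # 0.154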

Common tricks (IV): Independence assumption
A and B are conditionally independent given C:
P(A|B,C) = P(A|C)
P(A,B|C) = P(A|C) P(B|C)
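The independence assumption is what lets a classifier score a whole feature set by multiplying per-feature probabilities (this is the assumption behind classifiers such as Naïve Bayes). The sketch below uses made-up class priors, feature names, and probabilities.

    import math

    # Under conditional independence, P(f1, f2 | c) = P(f1 | c) P(f2 | c),
    # so a class can be scored with a sum of per-feature log probabilities.
    prior = {"c1": 0.3, "c2": 0.7}
    p_feat_given_class = {
        "c1": {"contains_buy": 0.6, "contains_free": 0.7},
        "c2": {"contains_buy": 0.1, "contains_free": 0.2},
    }

    def log_score(c, features):
        return math.log(prior[c]) + sum(math.log(p_feat_given_class[c][f]) for f in features)

    for c in ("c1", "c2"):
        print(c, round(log_score(c, ["contains_buy", "contains_free"]), 3))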

Classification problem

Definition of the classification problem
Task:
– C = {c1, c2, ..., cm} is a finite set of pre-defined classes (a.k.a. labels, categories).
– Given an input x, decide on its category y.
Multi-label vs. single-label problem
– Single-label: each x is assigned exactly one class.
– Multi-label: an x could have multiple labels.
Multi-class vs. binary classification problem
– Binary: |C| = 2.
– Multi-class: |C| > 2.

Conversion to a single-label binary problem
Multi-label => single-label
– If the labels are unrelated, we can convert a multi-label problem into |C| binary problems: e.g., does x have label c1? Does it have label c2? ... Does it have label cm? (See the sketch below.)
Multi-class => binary problem
– We can convert a multi-class problem into several binary problems. We will discuss this in Week #6.
=> We will focus on the single-label binary classification problem.
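A minimal sketch of the multi-label-to-binary reduction: one "does x have label c?" classifier per label. The memorizing "classifier" below is only a stand-in for a real learner, and the data is invented.

    # One binary problem per label: train_binary answers "does x have label c?".
    def train_binary(instances, label):
        positives = {x for x, labels in instances if label in labels}
        return lambda x: x in positives          # stand-in for a learned binary classifier

    def train_one_per_label(instances, label_set):
        return {c: train_binary(instances, c) for c in label_set}

    def predict_labels(classifiers, x):
        return {c for c, clf in classifiers.items() if clf(x)}

    data = [("d1", {"sports"}), ("d2", {"sports", "politics"}), ("d3", {"politics"})]
    clfs = train_one_per_label(data, {"sports", "politics"})
    print(predict_labels(clfs, "d2"))            # {'sports', 'politics'}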

Examples of classification tasks
– Text classification
– Document filtering
– Language/author/speaker id
– WSD
– PP attachment
– Automatic essay grading
– ...

Sequence labeling tasks
– Tokenization / word segmentation
– POS tagging
– NE detection
– NP chunking
– Parsing
– Reference resolution
– ...
=> We can use classification algorithms + beam search
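A minimal sketch of how a classifier plus beam search can label a sequence: at each position, every partial tag history in the beam is extended with each possible tag, and only the top-k extensions are kept. The scoring function here is a made-up stand-in for a real classifier.

    # Beam search over tag sequences; score(history, token, tag) is a log score
    # that a classifier would assign to 'tag' at this position.
    def beam_search(tokens, tags, score, beam_size=3):
        beam = [([], 0.0)]                       # (tag history, cumulative score)
        for tok in tokens:
            candidates = [(hist + [t], s + score(hist, tok, t))
                          for hist, s in beam for t in tags]
            candidates.sort(key=lambda item: item[1], reverse=True)
            beam = candidates[:beam_size]
        return beam[0][0]                        # best-scoring tag sequence

    # Toy scorer that slightly prefers a noun after a determiner.
    def toy_score(hist, tok, tag):
        return 0.0 if (hist and hist[-1] == "D" and tag == "N") else -1.0

    print(beam_search(["the", "cat"], ["D", "N", "V"], toy_score, beam_size=2))  # ['D', 'N']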

Steps for solving a classification problem
– Split the data into training/test/validation sets
– Data preparation
– Training
– Decoding
– Postprocessing
– Evaluation

The three main steps
– Data preparation: represent the data as feature vectors.
– Training: a trainer takes the training data as input and outputs a classifier.
– Decoding: a decoder takes a classifier and the test data as input and outputs classification results.
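A minimal sketch of the trainer and decoder interfaces described above; the majority-class "learner" is just a placeholder, not one of the course's algorithms, and the data is invented.

    from collections import Counter

    # Trainer: training data in, classifier out.
    def trainer(training_data):
        # training_data: list of (feature_vector, label) pairs.
        majority = Counter(label for _, label in training_data).most_common(1)[0][0]
        def classifier(feature_vector):
            return [(majority, 1.0)]             # list of (class, score) pairs
        return classifier

    # Decoder: classifier + test data in, classification results out.
    def decoder(classifier, test_data):
        return [classifier(x) for x, _ in test_data]

    train = [({"w-1=the": 1}, "c1"), ({"w-1=a": 1}, "c1"), ({"w-1=book": 1}, "c2")]
    test = [({"w-1=the": 1}, None)]
    print(decoder(trainer(train), test))         # [[('c1', 1.0)]]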

Data
– An instance: an (x, y) pair
– Labeled data: y is known
– Unlabeled data: y is unknown
– Training/test data: a set of instances

Data preparation: creating the attribute-value table

         f1    f2    f3    ...   fK      Target
   d1    yes   1     no    ...   -1000   c2
   d2
   d3
   ...
   dn

Attribute-value table
– Each row corresponds to an instance.
– Each column corresponds to a feature.
– A feature type (a.k.a. a feature template): w-1
– A feature: w-1=book
– Binary feature vs. non-binary feature
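A short sketch of instantiating feature templates such as w-1 (the previous word) to turn a raw instance into a feature vector; the template set and example sentence are hypothetical.

    # Instantiate a few feature templates for the word at position i.
    def extract_features(words, i):
        feats = {}
        feats["w-1=" + (words[i - 1] if i > 0 else "<s>")] = 1   # binary feature from template w-1
        feats["w=" + words[i]] = 1                               # binary feature from template w
        feats["wordLen"] = len(words[i])                         # non-binary feature
        return feats

    print(extract_features(["read", "the", "book"], 2))
    # {'w-1=the': 1, 'w=book': 1, 'wordLen': 4}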

The training stage
Three types of learning:
– Supervised learning: the training data is labeled.
– Unsupervised learning: the training data is unlabeled.
– Semi-supervised learning: the training data consists of both labeled and unlabeled data.
We will focus on supervised learning in LING572.

The decoding stage
– A classifier is a function f: f(x) = {(ci, scorei)}.
– Given the test data, a classifier "fills out" a decision matrix whose rows are the classes c1, c2, c3, ..., whose columns are the test instances d1, d2, d3, ..., and whose cells hold the scores.
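A small sketch of "filling out" the decision matrix, assuming a classifier that returns (class, score) pairs; the stub classifier and its scores are invented.

    # Rows are classes, columns are test instances, cells are classifier scores.
    def fill_decision_matrix(classifier, test_instances, classes):
        matrix = {c: {} for c in classes}
        for name, x in test_instances:
            for c, score in classifier(x):
                matrix[c][name] = score
        return matrix

    def stub_classifier(x):
        return [("c1", 0.7), ("c2", 0.3)]        # stand-in scores

    print(fill_decision_matrix(stub_classifier, [("d1", {}), ("d2", {})], ["c1", "c2"]))
    # {'c1': {'d1': 0.7, 'd2': 0.7}, 'c2': {'d1': 0.3, 'd2': 0.3}}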

Important tasks (for you) in LING 572
– Understand various learning algorithms.
– Apply the algorithms to different tasks:
  – Convert the data into an attribute-value table:
    – Define feature types
    – Feature selection
    – Convert an instance into a feature vector
  – Choose an appropriate learning algorithm.

Summary
Important concepts in a classification task:
– Instance: an (x, y) pair; y may be unknown
– Labeled data, unlabeled data
– Training data, test data
– Feature, feature type/template
– Feature vector
– Attribute-value table
– Trainer, classifier
– Training stage, test stage