Semi-Supervised Natural Language Learning Reading Group
I set up a site at: ervised/
Cover other applications of semi-supervised learning? Volunteers?
Every week or bi-weekly? Time change? 1pm? Noon?

Unsupervised Word Sense Disambiguation Rivaling Supervised Methods
Author: David Yarowsky (1995)
Presented by: Andy Carlson

Word Sense Disambiguation
Determining which sense of a word is meant in a given sentence:
"Toyota is considering opening a plant in Detroit."
"The banana plant is grown all over the tropics for its fruit."
Different from sense induction: we assume we already know the distinct senses

Using unlabeled data
Two properties of language let us use unlabeled data:
One sense per collocation: nearby words provide strong and consistent clues
One sense per discourse: within a document, the sense of a word is highly consistent
We can base an iterative bootstrapping algorithm on these two properties

One sense per discourse
How accurate? How frequently does it apply?

Decision Lists
List of rules of the form "collocation => sense"
Example: life (within 2-10 words) => biological sense of plant
Rules are ordered by log-likelihood ratio
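To make the ranking concrete, here is a minimal Python sketch of one way such a decision list could be represented and ordered; the Rule fields, the example counts, and the smoothing constant alpha are illustrative assumptions, not details taken from the paper.

```python
import math
from dataclasses import dataclass

@dataclass
class Rule:
    collocation: str  # e.g. "life (within 2-10 words)"
    sense: str        # the sense this rule votes for
    count_a: int      # co-occurrences with sense A in the labeled data
    count_b: int      # co-occurrences with sense B in the labeled data

def log_likelihood_ratio(rule: Rule, alpha: float = 0.1) -> float:
    """Rank rules by |log(P(sense A | collocation) / P(sense B | collocation))|.
    alpha is a small smoothing constant (an assumption of this sketch)."""
    total = rule.count_a + rule.count_b + 2 * alpha
    p_a = (rule.count_a + alpha) / total
    p_b = (rule.count_b + alpha) / total
    return abs(math.log(p_a / p_b))

# Hypothetical counts for the running "plant" example.
rules = [
    Rule("life (within 2-10 words)", "living", count_a=95, count_b=2),
    Rule("manufacturing (within 2-10 words)", "factory", count_a=3, count_b=80),
]
decision_list = sorted(rules, key=log_likelihood_ratio, reverse=True)
```

The highest-ranked rule that matches a new example decides its sense, which is what makes the list robust: one very reliable collocation outweighs many weak ones.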

The algorithm – step 1
Find all occurrences of the given polysemous word
We follow examples for the word plant
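A small sketch of this step, assuming the corpus is given as a list of tokenized documents (every name here is illustrative):

```python
def find_occurrences(documents, target="plant", window=10):
    """Collect each occurrence of the target word together with its context window."""
    contexts = []
    for doc_id, tokens in enumerate(documents):
        for i, tok in enumerate(tokens):
            if tok.lower() == target:
                contexts.append({
                    "doc": doc_id,
                    "left": tokens[max(0, i - window):i],
                    "right": tokens[i + 1:i + 1 + window],
                    "sense": None,  # unlabeled until a later step
                })
    return contexts
```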

Step 2 – Initial Labeling
For each sense of the word, identify a small number of training examples
Strategies: dictionary words, human-labelling of most frequent collocates, or human-chosen collocates
Example: the words life and manufacturing are used as seed collocations
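Continuing the sketch, seed labeling might look like this; the seed-to-sense mapping follows the slide's example, while the helper itself is an assumption:

```python
SEEDS = {"life": "living", "manufacturing": "factory"}  # seed collocations from the slide

def seed_label(contexts):
    """Assign an initial sense wherever a seed word appears in the context window."""
    for ctx in contexts:
        window = [t.lower() for t in ctx["left"] + ctx["right"]]
        ctx["sense"] = next((SEEDS[w] for w in window if w in SEEDS), None)
    return contexts
```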

Labeled as ‘living’ plant

Unlabeled examples

Labeled as ‘factory’ plant

Sample initial state

Step 3a
Train the decision list based on the current labeling of the state space
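A hedged sketch of the training step, reusing Rule and log_likelihood_ratio from the decision-list sketch above; treating every context word as a candidate single-word collocation is a simplification of this sketch, not the paper's full collocation inventory:

```python
from collections import Counter

def train_decision_list(contexts, senses=("living", "factory")):
    """Count word/sense co-occurrences over the currently labeled examples,
    then rank one rule per word by log-likelihood ratio."""
    counts = {s: Counter() for s in senses}
    for ctx in contexts:
        if ctx["sense"] in counts:
            for w in ctx["left"] + ctx["right"]:
                counts[ctx["sense"]][w.lower()] += 1
    rules = []
    for w in set(counts[senses[0]]) | set(counts[senses[1]]):
        a, b = counts[senses[0]][w], counts[senses[1]][w]
        rules.append(Rule(w, senses[0] if a >= b else senses[1], a, b))
    return sorted(rules, key=log_likelihood_ratio, reverse=True)
```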

Step 3b
Apply the learned classifier to all examples
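In this simplified sketch every example that matches any rule gets labeled; the actual algorithm only accepts labels above a reliability threshold, which is omitted here:

```python
def apply_decision_list(contexts, decision_list):
    """Label each example with the sense of the first (highest-ranked) matching rule."""
    for ctx in contexts:
        window = {t.lower() for t in ctx["left"] + ctx["right"]}
        for rule in decision_list:
            if rule.collocation in window:
                ctx["sense"] = rule.sense
                break
    return contexts
```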

Step 3c
Optionally, apply the one-sense-per-discourse constraint
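A minimal sketch of the constraint, assuming the context records from the earlier steps; relabeling to the simple majority sense is one plausible reading of the heuristic:

```python
from collections import Counter

def one_sense_per_discourse(contexts):
    """Within each document, relabel every occurrence with that document's majority sense."""
    by_doc = {}
    for ctx in contexts:
        by_doc.setdefault(ctx["doc"], []).append(ctx)
    for doc_contexts in by_doc.values():
        tally = Counter(c["sense"] for c in doc_contexts if c["sense"])
        if tally:
            majority = tally.most_common(1)[0][0]
            for c in doc_contexts:
                c["sense"] = majority
    return contexts
```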

Step 3c (figure)

After steps 3b and 3c

Step 3d
Repeat step 3 iteratively
Details: grow the window size for collocations, and randomly perturb the class-inclusion threshold
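Putting the earlier sketches together, the loop might read as follows; the growing collocation window and the threshold perturbation mentioned on this slide are omitted for brevity, and a fixed iteration count stands in for a real convergence test:

```python
def yarowsky_bootstrap(documents, iterations=10):
    """End-to-end sketch: seed, then alternate training and relabeling."""
    contexts = seed_label(find_occurrences(documents))
    decision_list = []
    for _ in range(iterations):
        labeled = [c for c in contexts if c["sense"]]
        decision_list = train_decision_list(labeled)
        contexts = apply_decision_list(contexts, decision_list)
        contexts = one_sense_per_discourse(contexts)
    return contexts, decision_list
```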

Step 4
Stop. The algorithm converges to a stable residual set.

Sample final state

Final decision list

Results