Word classes
Chris Brew, The Ohio State University

Word senses
- The problem is that many words have several distinct meanings.
  - Bank: “the ground bordering a river”
  - Bank: “an establishment for custody of money”
- In word-sense disambiguation we try to find out which meaning goes with which instance.
- Perhaps surprisingly, this idea of word sense is seriously problematic.

A more realistic case
- Slide (n):
  - The slide of a trombone
  - A children's toy
  - The act of sliding
  - A landslide (and its metaphorical uses)
  - A loss in stock value
  - A kind of musical grace note
  - A specific kind of shot in curling
  - A head-first slide into third base
  - A fracture in a lode resulting in the dislocation or displacement of a portion of it

Dictionary definitions
- One could simply treat each dictionary definition as a sense, but if we also want the senses to match intuitions, we may have to compromise.
- Even if we do this, which dictionary should we use?

Word classes
- Two potentially conflicting notions:
  - Use word classes to predict the next word
  - Use word classes to capture semantic commonalities
- If we use distributional statistics to build classes, what will they be like?
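The first notion, predicting the next word from classes, is often written as a class-based bigram model in the style of Brown et al. The formula below is a sketch under the assumption (not stated on the slide) that every word w belongs to exactly one class c(w):

```latex
P(w_i \mid w_{i-1}) \approx P\bigl(c(w_i) \mid c(w_{i-1})\bigr)\, P\bigl(w_i \mid c(w_i)\bigr)
```

Classes chosen to make this approximation as good as possible need not coincide with classes that group words by meaning, which is one way the two notions can conflict.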

Distributional clustering
- Define the properties of a word that one cares about, and give them numerical values.
- Pull them together into a vector.
- Viewing the vector as a point in space, cluster the words to form classes.
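The first two steps above can be sketched in Python. This is a minimal illustration, assuming the "properties" are counts of the words seen within a fixed window; the function names, the toy corpus, and the window size are all hypothetical choices, not taken from the slides.

```python
from collections import Counter, defaultdict

def window_vectors(tokens, window=2):
    """Map each word to a Counter of the words seen within `window` positions."""
    vectors = defaultdict(Counter)
    for i, word in enumerate(tokens):
        lo = max(0, i - window)
        hi = min(len(tokens), i + window + 1)
        for j in range(lo, hi):
            if j != i:  # a word is not its own context
                vectors[word][tokens[j]] += 1
    return vectors

def to_dense(vectors, vocab):
    """Turn each Counter into a list of counts, one entry per vocabulary word."""
    return {w: [c[v] for v in vocab] for w, c in vectors.items()}

# Toy corpus (an assumption made for illustration only).
tokens = "the cat sat on the mat the dog sat on the rug".split()
vectors = window_vectors(tokens, window=1)
vocab = sorted(set(tokens))
dense = to_dense(vectors, vocab)
```

In this toy corpus "cat" and "dog" occur in identical contexts, so they receive identical vectors and would fall into the same class under any reasonable clustering.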

Dimensions of variation
- What goes into the vector
  - The most important influence
- How one measures distance between vectors
  - Options include cosine, KL-divergence, information radius
- Which algorithm to use
  - Exhaustive enumeration of all potential clusters is far too costly, so heuristics are needed.
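Two of the distance options named above can be written down directly. The sketch below assumes the vectors have been normalized into probability distributions, and it takes information radius to be D(p || m) + D(q || m) with m the average of p and q (the Jensen-Shannon divergence is half this quantity). The function names are illustrative.

```python
import math

def normalize(counts):
    """Scale non-negative counts so they sum to 1."""
    total = sum(counts)
    return [c / total for c in counts]

def kl_divergence(p, q):
    """D(p || q) in bits; infinite if q is zero somewhere p is not."""
    total = 0.0
    for pi, qi in zip(p, q):
        if pi > 0:
            if qi == 0:
                return math.inf
            total += pi * math.log2(pi / qi)
    return total

def information_radius(p, q):
    """D(p || m) + D(q || m), where m is the average of p and q."""
    m = [(pi + qi) / 2 for pi, qi in zip(p, q)]
    return kl_divergence(p, m) + kl_divergence(q, m)

p = normalize([1, 1, 0])
q = normalize([0, 1, 1])
```

KL-divergence is asymmetric and becomes infinite as soon as one word has a context the other lacks, which is common with sparse counts. Information radius is symmetric and always finite (at most 2 bits), which is one reason it may be preferred for clustering.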

Things to cluster by
- Next word (Brown et al.)
- Syntactic relations (Pereira, Tishby)
- Parallel corpora (Brown et al., Gale et al.)
- Words in window

Distance measures
- Euclidean distance
- Cosine distance (avoids over-dependence on length)
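A small sketch of why cosine distance avoids over-dependence on length. The three vectors are made-up count vectors: `rare` and `frequent` have the same profile but differ tenfold in raw counts, as a rare and a frequent word with similar behaviour might.

```python
import math

def euclidean(u, v):
    """Straight-line distance between two equal-length vectors."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))

def cosine_distance(u, v):
    """1 minus the cosine of the angle between u and v (both assumed non-zero)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return 1.0 - dot / (norm_u * norm_v)

rare = [1, 2, 0]
frequent = [10, 20, 0]  # same direction as `rare`, ten times the counts
other = [0, 1, 2]       # different profile, similar magnitude to `rare`
```

Euclidean distance places `rare` closer to `other` than to `frequent`, purely because of vector length. Cosine distance between `rare` and `frequent` is essentially zero, so words with the same distributional profile group together regardless of frequency.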

Algorithms
- Top-down tree construction (McMahon and Smith)
- Bottom-up tree construction (Brown et al.)
  - Guided by loss of mutual information (MI)
- Classical clustering algorithms
  - K-means
  - Hierarchical clustering
  - Ward's method
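As an illustration of the bottom-up idea, here is a minimal agglomerative clustering sketch. It uses average-link Euclidean distance as the merge criterion, which is a simplification chosen for brevity: it is not the mutual-information criterion of Brown et al., nor Ward's method. The toy vectors and function names are hypothetical.

```python
import math
from itertools import combinations

def euclidean(u, v):
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))

def average_link(c1, c2, vectors):
    """Mean pairwise distance between the members of two clusters."""
    total = sum(euclidean(vectors[a], vectors[b]) for a in c1 for b in c2)
    return total / (len(c1) * len(c2))

def agglomerate(vectors, k):
    """Start with one cluster per word; merge the closest pair until k remain."""
    clusters = [[w] for w in sorted(vectors)]
    while len(clusters) > k:
        # Greedy heuristic: pick the pair of clusters with the smallest distance.
        i, j = min(
            combinations(range(len(clusters)), 2),
            key=lambda pair: average_link(clusters[pair[0]], clusters[pair[1]], vectors),
        )
        merged = clusters[i] + clusters[j]
        clusters = [c for n, c in enumerate(clusters) if n not in (i, j)] + [merged]
    return clusters

# Made-up three-dimensional context vectors for four words.
toy = {
    "cat": [5, 1, 0],
    "dog": [4, 1, 0],
    "run": [0, 1, 5],
    "walk": [0, 2, 4],
}
classes = agglomerate(toy, k=2)
```

Recording the sequence of merges instead of stopping at k clusters would give the tree that the slide refers to. Each step examines every pair of clusters, so this naive version is only practical for small vocabularies.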

Where to learn more
- M&S ch 7 (v. good on background)
- Charniak ch 9 and 10 (v. good on algorithms)
- Manual for the R statistics system, especially the mva module
- Schulte im Walde's thesis, our joint papers