CSC 594 Topics in AI – Text Mining and Analytics, Fall 2015/16: 3. Word Association



What is Word Association?
Word association is a relation that exists between two words. There are two basic types of relations: paradigmatic and syntagmatic.
–Paradigmatic: A and B have a paradigmatic relation if they can be substituted for each other (i.e., A and B are in the same class), e.g., "cat" and "dog"; "Monday" and "Tuesday".
–Syntagmatic: A and B have a syntagmatic relation if they can be combined with each other (i.e., A and B are related semantically), e.g., "cat" and "scratch"; "car" and "drive".
These two basic and complementary relations can be generalized to describe relations of any items in a language.
(Coursera "Text Mining and Analytics", ChengXiang Zhai)

Why Mine Word Associations?
They are useful for improving the accuracy of many NLP tasks:
–POS tagging, parsing, entity recognition, acronym expansion
–Grammar learning
They are also directly useful for many applications in text retrieval and mining:
–Text retrieval (e.g., use word associations to suggest a variation of a query)
–Automatic construction of a topic map for browsing: words as nodes and associations as edges
–Comparing and summarizing opinions (e.g., which words are most strongly associated with "battery" in positive and negative reviews of the iPhone 6, respectively?)

Word Context (Coursera "Text Mining and Analytics", ChengXiang Zhai)

Word Co-occurrence (Coursera "Text Mining and Analytics", ChengXiang Zhai)

Mining Word Associations
Paradigmatic:
–Represent each word by its context
–Compute context similarity
–Words with high context similarity likely have a paradigmatic relation
Syntagmatic:
–Count how many times two words occur together in a context (e.g., a sentence or paragraph)
–Compare their co-occurrences with their individual occurrences
–Words with high co-occurrence but relatively low individual occurrences likely have a syntagmatic relation
Paradigmatically related words tend to have a syntagmatic relation with the same word → joint discovery of the two relations.
These ideas can be implemented in many different ways!
(Coursera "Text Mining and Analytics", ChengXiang Zhai)
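To make the paradigmatic recipe above concrete, here is a minimal sketch in Python: it builds a bag-of-words context for each word from a tiny made-up corpus and compares contexts with cosine similarity. The corpus, the window size of 2, and the raw-count weighting (rather than something like TF-IDF) are illustrative assumptions, not the settings used in the course.

    # Minimal sketch: paradigmatic relation discovery via context similarity.
    # The toy corpus, window size, and raw-count weighting are illustrative choices.
    import math
    from collections import Counter, defaultdict

    corpus = [
        "my cat eats fish on saturday",
        "my dog eats meat on sunday",
        "the cat scratched the sofa",
        "the dog chased the ball",
    ]
    window = 2  # words to the left/right that count as "context"

    # Build a bag-of-words context (a "pseudo document") for each word.
    contexts = defaultdict(Counter)
    for sentence in corpus:
        tokens = sentence.split()
        for i, w in enumerate(tokens):
            contexts[w].update(tokens[max(0, i - window):i] + tokens[i + 1:i + 1 + window])

    def cosine(c1, c2):
        # Cosine similarity between two sparse count vectors.
        dot = sum(c1[t] * c2[t] for t in c1 if t in c2)
        n1 = math.sqrt(sum(v * v for v in c1.values()))
        n2 = math.sqrt(sum(v * v for v in c2.values()))
        return dot / (n1 * n2) if n1 and n2 else 0.0

    # "cat" and "dog" share context words (my, eats, the), so their context
    # similarity is high -- a candidate paradigmatic pair.
    print(cosine(contexts["cat"], contexts["dog"]))   # 0.75
    print(cosine(contexts["cat"], contexts["ball"]))  # 0.5

In practice the contexts would come from a large corpus and would typically be weighted (e.g., TF-IDF) before comparison, as the next slides discuss.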

Word Context as "Pseudo Document" (Coursera "Text Mining and Analytics", ChengXiang Zhai)

Computing Similarity of Word Context (Coursera "Text Mining and Analytics", ChengXiang Zhai)


Syntagmatic Relation – Word Collocation
A syntagmatic relation shows up as word co-occurrence, commonly called collocation:
–If two words occur together in a context more often than by chance, they are in a syntagmatic relation (i.e., they are related words).
(Coursera "Text Mining and Analytics", ChengXiang Zhai)

Word Probability
Word probability: how likely is a given word to appear in a text/context?
(Coursera "Text Mining and Analytics", ChengXiang Zhai)

Binomial Distribution
Word (occurrence) probability is modeled by a binomial distribution: each text segment either contains the word or does not, and the number of segments containing it follows a binomial distribution.
(Coursera "Text Mining and Analytics", ChengXiang Zhai)
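As a hedged illustration of this model (my own sketch, not code from the course), the snippet below estimates a word's presence probability p from a handful of made-up text segments and then uses the binomial formula to score how likely the word is to appear in exactly k of n future segments.

    # Sketch: word occurrence as a Bernoulli/binomial variable.
    # The toy segments below are illustrative.
    from math import comb

    segments = [
        "the cat sat on the mat",
        "the dog barked",
        "a cat and a dog played",
        "the weather was nice",
    ]

    def p_word(word, segments):
        # Maximum-likelihood estimate: fraction of segments containing the word.
        return sum(1 for s in segments if word in s.split()) / len(segments)

    p = p_word("cat", segments)  # "cat" appears in 2 of 4 segments -> 0.5
    print(p)

    # Binomial model: probability of seeing the word in exactly k of n segments.
    n, k = 10, 3
    print(comb(n, k) * p**k * (1 - p)**(n - k))  # ~0.117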

Entropy as a Measure of Randomness
Entropy is a measure from Information Theory; it indicates how even (random) or skewed (pure) a distribution is -- a large entropy means the distribution is even/less skewed.
The entropy of a collection S, with respect to a target attribute that takes on c values, is calculated as

  Entropy(S) = − Σ_{i=1..c} p_i log2 p_i

where p_i is the proportion of S taking the i-th value. This is the average number of bits required to encode an instance in the dataset, and it ranges from 0 to log2(c).
For a boolean classification (c = 2), the entropy function reduces to

  Entropy(S) = −p log2 p − (1 − p) log2 (1 − p),

which takes on a value between 0 and 1 (inclusive), with its maximum of 1 at p = 0.5.
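The boolean formula is easy to verify numerically. Below is a small sketch (my own illustration, not course code) that computes the entropy of a binary word-occurrence variable for a few presence probabilities p; it peaks at 1 bit when p = 0.5 and approaches 0 for very common or very rare words.

    # Sketch: entropy of a binary word-occurrence variable.
    import math

    def entropy(probs):
        # H = -sum_i p_i * log2(p_i), with the convention 0 * log 0 = 0.
        return -sum(p * math.log2(p) for p in probs if p > 0)

    def word_entropy(p):
        # The word either occurs (probability p) or does not (probability 1 - p).
        return entropy([p, 1 - p])

    print(word_entropy(0.5))     # 1.0    -> maximum uncertainty
    print(word_entropy(0.9))     # ~0.47  -> very common word, low uncertainty
    print(word_entropy(0.0001))  # ~0.0015 -> very rare word, low uncertainty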

Entropy for Word Probability (Coursera "Text Mining and Analytics", ChengXiang Zhai)

Mutual Information (MI) as a Measure of Word Collocation
Mutual information is a concept from probability and information theory that measures two random variables' mutual dependence -- equivalently, the reduction of entropy: how much reduction in the entropy of X do we obtain by knowing Y? (More reduction means more predictability.)

  I(X; Y) = H(X) − H(X | Y) = Σ_{x,y} p(x, y) log2 [ p(x, y) / (p(x) p(y)) ]

Mutual Information (MI) and Word Collocation (Coursera "Text Mining and Analytics", ChengXiang Zhai)

Probabilities in MI (Coursera "Text Mining and Analytics", ChengXiang Zhai)

Estimation of Word Probability (Coursera "Text Mining and Analytics", ChengXiang Zhai)

Point-wise Mutual Information
Point-wise mutual information (PMI) is often used in place of MI. Whereas MI averages over all outcomes of the two random variables, PMI scores one specific pair of outcomes -- typically the event that both words are present:

  PMI(x, y) = log2 [ p(x, y) / (p(x) p(y)) ]
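To connect the formulas to collocation mining, here is a small sketch (my own illustration on a made-up corpus, not course code): each sentence is a segment, each word is a binary presence/absence variable, and MI and PMI are computed from the resulting counts.

    # Sketch: MI and PMI for word pairs from segment-level presence counts.
    # The toy corpus is illustrative; real estimates would also use smoothing.
    import math

    segments = [
        "the cat scratched the post",
        "my cat likes to scratch",
        "the dog chased the cat",
        "we drive the car to work",
        "she will drive home tonight",
        "the car would not start",
    ]

    def presence(word):
        return [1 if word in s.split() else 0 for s in segments]

    def mi(word1, word2):
        # I(X;Y) = sum over (x,y) of p(x,y) * log2(p(x,y) / (p(x)p(y))), with 0*log0 = 0.
        x, y, n = presence(word1), presence(word2), len(segments)
        total = 0.0
        for a in (0, 1):
            for b in (0, 1):
                p_xy = sum(1 for xi, yi in zip(x, y) if xi == a and yi == b) / n
                p_x = sum(1 for xi in x if xi == a) / n
                p_y = sum(1 for yi in y if yi == b) / n
                if p_xy > 0:
                    total += p_xy * math.log2(p_xy / (p_x * p_y))
        return total

    def pmi(word1, word2):
        # PMI of the single event "both words present in the same segment".
        x, y, n = presence(word1), presence(word2), len(segments)
        p_both = sum(1 for xi, yi in zip(x, y) if xi and yi) / n
        p1, p2 = sum(x) / n, sum(y) / n
        return math.log2(p_both / (p1 * p2)) if p_both > 0 else float("-inf")

    print(pmi("cat", "scratched"))  # 1.0 -> co-occur more often than chance
    print(pmi("cat", "drive"))      # -inf (never co-occur in this toy corpus)
    print(mi("car", "drive"))       # small positive value

In practice the probabilities would be smoothed (e.g., by adding small pseudo-counts) so that unseen co-occurrences do not produce degenerate values.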

Other Word Collocation Measures

Conditional Counts: Concept Linking
Centered term: a term that is chosen to investigate.
Concept-linked term: a term that co-occurs with the centered term.
In the example diagram, the centered term is diabetes, labeled "diabetes (63/63)": it occurred in 63 documents. The linked term insulin (and its stemmed variations), labeled "+insulin (14/58)", occurred in 58 documents, 14 of which also contained diabetes.

continued... The term diabetes occurs in 63 documents.

The term insulin and its variants occur in 58 documents, and 14 of those documents also contain the term diabetes.

Terms that are primary associates of insulin are secondary associates of diabetes.
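The concept-linking counts above are just conditional document counts, so they are straightforward to reproduce. The sketch below (an illustration on four made-up documents, not the collection behind the 63/58/14 figures) reports, for a centered term, each co-occurring term as a pair (co, doc): the number of documents containing the term that also contain the centered term, and the total number of documents containing the term.

    # Sketch: conditional counts behind a concept-linking display.
    # The toy documents are illustrative.
    from collections import defaultdict

    documents = [
        "diabetes is managed with insulin injections",
        "insulin resistance can precede diabetes",
        "the patient was prescribed insulin",
        "diet and exercise help control diabetes",
    ]

    def concept_links(centered, documents):
        doc_count = defaultdict(int)  # documents containing each term
        co_count = defaultdict(int)   # ...of which also contain the centered term
        for doc in documents:
            terms = set(doc.split())
            has_center = centered in terms
            for t in terms:
                doc_count[t] += 1
                if has_center:
                    co_count[t] += 1
        return {t: (co_count[t], doc_count[t]) for t in doc_count if t != centered}

    links = concept_links("diabetes", documents)
    print("insulin:", links["insulin"])  # (2, 3): insulin in 3 docs, 2 of them also mention diabetes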