CSKGOI'08 Commonsense Knowledge and Goal Oriented Interfaces

Implicit or Explicit Goals
- While some goals contained in search queries might be very explicit, other queries might contain more implicit goals.
- In terms of intentional explicitness, the query “car miami” differs significantly from the query “buy a used car in Miami”.

Benefit
- Different degrees of intentional explicitness in search queries appear relevant on at least two different levels.
- On a theoretical level, it can increase our knowledge about the levels of abstraction users employ when searching, for example the way users refine or generalize goals during search.

Benefit
- On a practical level, understanding different degrees of intentional explicitness in search queries could improve the ability of search engine vendors to tailor their search results.

Challenges
- First and foremost, all goals contained in search query logs are of a hypothetical nature, in the sense that verifying them is extremely hard.
- Second, query logs represent huge text corpora in terms of size, which renders manual elicitation of goals by experts practically impossible.

Challenges
- Third, search queries are significantly shorter than ordinary documents, the words used in them do not necessarily appear in lexica, and the text is not necessarily expressed in natural language but sometimes in an artificial query language.

Goal
- (1) Increase our understanding of the notion of different degrees of explicitness in intentional artifacts theoretically.
- (2) Explore related challenges, potentials, and implications empirically.

DEGREES OF EXPLICITNESS
- An explicit intentional query is a query that can be related to a specific goal in a recognizable, unambiguous way.
- Examples of explicit intentional queries, i.e. queries that express more precise goals, would be “buy a car”, “maximize adsense revenue” or “how to get revenge on neighbor within limits of law”.

DEGREES OF EXPLICITNESS
- We define an implicit intentional query as a query where it is difficult or extremely hard to elicit a specific goal from the intentional artifact.
- Examples include blank queries, or queries such as “car” or “travel”.

Reasons to distinguish between the two types
- First, explicit (“better”) intentional queries could be used to disambiguate or refine implicit intentional queries.
- Second, some users organize their search in a way that can be understood as a traversal of goal graphs, including iterative goal refinement and generalization.
- Third, our own recent research has indicated that only 1.69% to 3.01% of queries have a high degree of intentional explicitness; while this percentage is rather small, its significance is not yet known.

Experiment
- We distinguish between explicit and implicit intentional queries based on the following arbitrary criteria:
- (A) whether a query contains at least one verb.
- (B) whether the goal elicited from the intentional artifact conforms to our definition of a goal.

WHAT ARE GOALS?
- A condition or state of affairs in the world that some agent would like to achieve or avoid.

Experiment
- Our experimental classification approach consists of two parts:
- Part-of-speech (POS) tagging.
- Supervised learning of syntactical goal features.

Part-of-Speech Tagging
- Our experimental setup incorporated a fast and reasonably accurate part-of-speech tagger trained on a sample of the Penn Treebank corpus.
- We focused on tagging only queries with a query length greater than 2.
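The slides do not say which tagger implementation was used. The sketch below is an illustration only: NLTK's pre-trained Penn-Treebank-style tagger stands in for the tagger described above, queries of two or fewer words are skipped as in the slides, and criterion (A) from the experiment slide (the query contains at least one verb) is checked on the tag sequence. The function names are hypothetical.

```python
# Illustrative sketch only, not the authors' setup: NLTK's pre-trained
# Penn-Treebank-style tagger stands in for the tagger described in the
# slides (which was trained on a sample of the Penn Treebank).
import nltk

# Fetch the pre-trained tagger model on first use (the exact resource
# name can differ slightly between NLTK versions).
nltk.download("averaged_perceptron_tagger", quiet=True)

def tag_query(query):
    """POS-tag a query; queries of length <= 2 are skipped, as in the slides."""
    tokens = query.split()           # queries are short; whitespace splitting suffices here
    if len(tokens) <= 2:
        return None
    return nltk.pos_tag(tokens)      # Penn Treebank tags, e.g. VBG, DT, NN

def contains_verb(tagged):
    """Criterion (A): the query contains at least one verb tag (VB, VBD, VBG, ...)."""
    return any(tag.startswith("VB") for _, tag in tagged)

tagged = tag_query("buying a used car")
print(tagged)                        # e.g. [('buying', 'VBG'), ('a', 'DT'), ('used', 'VBN'), ('car', 'NN')]
print(contains_verb(tagged))         # True
```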

A sample of Penn Treebank tags

Supervised Learning of Goal Features
- We use part-of-speech n-grams instead of word n-grams as features.
- We introduced markers ($ $) at the beginning and at the end of each query to take the query boundaries into account.

Supervised Learning of Goal Features
- The query "buying/VBG a/DT car/NN" would be represented by the following POS trigrams:
- $ $ VBG, $ VBG DT, VBG DT NN, DT NN $, NN $ $
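As a small illustration of this feature extraction (a sketch consistent with the example above, not the authors' code), the padding-and-windowing step can be written as follows.

```python
# Sketch of the POS-trigram extraction described above: pad the tag
# sequence with two "$" boundary markers on each side and slide a
# window of size three across it.
def pos_trigrams(tags):
    padded = ["$", "$"] + list(tags) + ["$", "$"]
    return [" ".join(padded[i:i + 3]) for i in range(len(padded) - 2)]

print(pos_trigrams(["VBG", "DT", "NN"]))
# -> ['$ $ VBG', '$ VBG DT', 'VBG DT NN', 'DT NN $', 'NN $ $']
```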

Supervised Learning of Goal Features
- To obtain a training set, we drew a uniform random sample from the set of queries that contain at least one verb.
- Two of the authors labeled the instances in the sample consensually.
- This resulted in a training set consisting of 98 instances: 59 positives and 39 negatives.

Supervised Learning of Goal Features
- We trained a naive Bayes classifier on the feature vectors described above, using 10-fold cross-validation.
- In order to increase the performance of our classifier, we applied a chi-squared feature selection algorithm.
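The slides do not name a machine-learning toolkit, and the 98 labeled queries are not reproduced here. The sketch below shows how a pipeline of this kind (chi-squared feature selection followed by a naive Bayes classifier, evaluated with cross-validation) could be assembled with scikit-learn; the placeholder training data and the number of selected features (k=5) are assumptions made purely for illustration.

```python
# Hedged sketch of the classification step with scikit-learn; the tiny
# corpus below is a placeholder standing in for the 98 labeled queries.
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.feature_selection import SelectKBest, chi2
from sklearn.naive_bayes import MultinomialNB
from sklearn.pipeline import make_pipeline
from sklearn.model_selection import cross_val_score

# Each instance is the list of POS trigrams for one query;
# label 1 = the elicited goal conforms to the goal definition, 0 = it does not.
goal_like = ["$ $ VBG", "$ VBG DT", "VBG DT NN", "DT NN $", "NN $ $"]   # e.g. "buying a car"
non_goal  = ["$ $ NN", "$ NN NN", "NN NN $", "NN $ $"]                  # e.g. "car miami"
X = [goal_like] * 10 + [non_goal] * 10                                  # placeholder data only
y = [1] * 10 + [0] * 10

pipeline = make_pipeline(
    CountVectorizer(analyzer=lambda trigrams: trigrams),  # one feature per POS trigram
    SelectKBest(chi2, k=5),                               # chi-squared feature selection; k is an assumption
    MultinomialNB(),                                      # naive Bayes classifier
)

# The slides report 10-fold cross-validation on the 98 labeled queries.
scores = cross_val_score(pipeline, X, y, cv=10)
print(scores.mean())
```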

Most predictive features based on chi-squared feature selection

Statistical overview of the condensed dataset (based on a sample containing 500 random queries from this set)

Examples of correctly classified queries

Examples of incorrectly classified queries

Verb frequency histogram
- The top 10 stemmed verbs in the condensed dataset are: make, get, buy, wed, is, find, live, play, use, write.
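A verb histogram of this kind could be produced roughly as sketched below; the Porter stemmer and the tagged queries shown are illustrative assumptions, since the slides specify neither.

```python
# Illustrative sketch: collect tokens whose Penn Treebank tag starts with
# "VB", stem them, and count. Porter stemming and the example queries are
# assumptions; the slides do not specify the stemmer used.
from collections import Counter
from nltk.stem import PorterStemmer

tagged_queries = [
    [("buying", "VBG"), ("a", "DT"), ("car", "NN")],
    [("find", "VB"), ("cheap", "JJ"), ("flights", "NNS")],
    [("buy", "VB"), ("used", "JJ"), ("books", "NNS")],
]

stemmer = PorterStemmer()
verb_counts = Counter(
    stemmer.stem(token)
    for query in tagged_queries
    for token, tag in query
    if tag.startswith("VB")
)
print(verb_counts.most_common(10))   # e.g. [('buy', 2), ('find', 1)]
```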

CONCLUSIONS
- This paper introduced a novel perspective on analyzing search query logs: different degrees of intentional explicitness.
- This perspective can help search engine vendors better assess the level of service they can or should provide for different user queries.

Thank You Q & A