CIKM 2007: Recognition and Classification of Noun Phrases in Queries for Effective Retrieval


Wei Zhang (1), Shuang Liu (2), Clement Yu (1), Chaojing Sun (3), Fang Liu (4), Weiyi Meng (5)
(1) Department of Computer Science, University of Illinois at Chicago
(2) Ask.com
(3) Broadcom Corporation
(4) Microsoft
(5) Department of Computer Science, Binghamton University

Outline
- Motivation
- Our definitions of the phrases
- Proper noun and dictionary phrase recognition
- Simple and complex phrase recognition
- Experimental results

Motivation
- Terms in a query are semantically related, e.g., "John Smith"
- Recognize this relationship by partitioning the query terms into groups (phrases)
- Use the phrases in document retrieval by adding them to searching and ranking

Types of Noun Phrases
- Phrases that have fixed writing formats
  - Names of locations, people, companies, ...
  - Well-defined concepts, e.g., "computer science"
- Freely written phrases
  - Not formally defined, but used in real language

Four Types of Noun Phrases
- Proper Noun (PN): a noun phrase that names a specific person, place, or thing
  - The first letters of the content words are capitalized, e.g., "John Smith", "Atlantic Ocean"
- Dictionary Phrase (DP): a phrase that has a definition in a dictionary, excluding PNs
- The two types may overlap, e.g., "Atlantic Ocean"
- Neither can replace the other, e.g., "Lina's Pizza" (a PN only), "public transportation" (a DP only)

Four Types of Noun Phrases
- Simple Noun Phrase (SNP): a grammatically valid two-word noun phrase that is neither a PN nor a DP
  - E.g., "white car", "good hotel"
- Complex Noun Phrase (CNP): a grammatically valid noun phrase of three or more words that is neither a PN nor a DP
  - May contain PNs, DPs, or SNPs
  - E.g., "small white car", "city public transportation"

Noun Phrase Recognition
- General procedure: recognize PNs and dictionary phrases first, then simple and complex noun phrases
- For an n-word query, check candidates from longest to shortest:
  - the original query (1 candidate)
  - the 2 sub-phrases of n-1 words
  - ...
  - the n-1 sub-phrases of 2 words
  - n(n-1)/2 candidates in total
- E.g., "World Trade Organization" is checked first, then "World Trade" and "Trade Organization"
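
The candidate enumeration on the previous slide can be sketched in Python. This is a minimal illustration, assuming whitespace-tokenized queries; the function name phrase_candidates is hypothetical. Sub-phrases of length k number n-k+1, so summing over k = n down to 2 gives 1 + 2 + ... + (n-1) = n(n-1)/2.

```python
def phrase_candidates(words):
    """Return all contiguous sub-phrases of at least 2 words, longest first."""
    n = len(words)
    candidates = []
    for length in range(n, 1, -1):             # lengths n, n-1, ..., 2
        for start in range(n - length + 1):     # n - length + 1 sub-phrases of this length
            candidates.append(" ".join(words[start:start + length]))
    return candidates


query = "World Trade Organization".split()
print(phrase_candidates(query))
# ['World Trade Organization', 'World Trade', 'Trade Organization']
```

Emitting longer candidates first lets a recognizer accept a longer phrase before it examines the sub-phrases inside it.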

Noun Phrase Recognition
- Tools for phrase recognition
  - Dictionaries: Wikipedia, WordNet
  - A large text corpus (Google in the experiments)
  - Parsers (Minipar, Collins parser) and a POS tagger

PN and DP Recognition: Wikipedia
- Used for both proper nouns and dictionary phrases
- DP: an entry page for the phrase exists
- PN: the content words in the first instance of the phrase in the page's main text are capitalized
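
A minimal sketch of the Wikipedia rule, assuming the main text of the phrase's entry page has already been fetched (the lookup is not shown). The function name and the FUNCTION_WORDS list are hypothetical, since the slides do not say which words are non-content words; returning "DP" when the page never repeats the phrase is also an assumption.

```python
import re

# Hypothetical function-word list: the slides do not define "content words".
FUNCTION_WORDS = {"a", "an", "and", "at", "by", "for", "in", "of", "on", "or", "the", "to"}


def classify_by_wikipedia(phrase, main_text):
    """Return 'PN', 'DP', or None for a phrase, given its entry page's main text (None = no page)."""
    if main_text is None:
        return None  # no entry page: neither a PN nor a DP under this rule
    pattern = r"\b" + r"\s+".join(map(re.escape, phrase.split())) + r"\b"
    first = re.search(pattern, main_text, flags=re.IGNORECASE)  # first instance only
    if first is None:
        return "DP"  # assumption: the entry exists but its text never repeats the phrase
    content_words = [w for w in first.group(0).split() if w.lower() not in FUNCTION_WORDS]
    if content_words and all(w[0].isupper() for w in content_words):
        return "PN"  # every content word is capitalized as written in the article
    return "DP"


print(classify_by_wikipedia("atlantic ocean", "The Atlantic Ocean is the second-largest ocean."))
print(classify_by_wikipedia("public transportation", "Public transportation is a system of transport."))
```

Checking every content word, not just the first, keeps a sentence-initial capital ("Public transportation ...") from being mistaken for a proper noun.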

PN and DP Recognition: WordNet
- Used for both PN and DP recognition
- DP: the phrase is defined in the dictionary
- PN: the phrase has a hypernym of city, province, country, organization, geographic area, person, syndrome, region, building, or nation
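
The WordNet rule reduces to a hypernym lookup. The sketch below uses a small hand-made TOY_LEXICON (hypothetical, with illustrative hypernym chains that are not copied from WordNet) so it runs without downloading any data; a real system would build the chains from WordNet itself, for example through NLTK's WordNet corpus reader. The hypernym list is the one on the slide.

```python
# Hypernym categories listed on the slide.
PN_HYPERNYMS = {"city", "province", "country", "organization", "geographic area",
                "person", "syndrome", "region", "building", "nation"}

# Hypothetical stand-in for WordNet: one hypernym chain per sense, nearest first.
# These chains are illustrative only.
TOY_LEXICON = {
    "chicago": [["city", "municipality", "urban area"]],
    "public transportation": [["transport", "facility"]],
    "computer science": [["engineering", "discipline"]],
}


def classify_by_wordnet(phrase, lexicon=TOY_LEXICON):
    """Return 'PN', 'DP', or None (phrase not defined in the lexicon)."""
    senses = lexicon.get(phrase.lower())
    if senses is None:
        return None  # not defined: neither a DP nor a PN under this rule
    if any(PN_HYPERNYMS.intersection(chain) for chain in senses):
        return "PN"  # some sense has a PN-indicating hypernym
    return "DP"


print(classify_by_wordnet("Chicago"), classify_by_wordnet("public transportation"))
```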

PN and DP Recognition: Minipar
- Used for PN recognition only
- Criteria: (1) a "PN" label in the parse tree; (2) a semantic label of person, country, corpname, location, corpdesig, fname, gname, or date

PN and DP Recognition: Name Lists and Rules
- Lists of first names and last names, combined with these patterns:
  - First_initial Last_name
  - First_initial Middle_initial Last_name
  - First_name Middle_initial Last_name
  - First_name Last_name
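
The four name patterns can be checked directly against a tokenized candidate. This is a minimal sketch: FIRST_NAMES and LAST_NAMES are tiny hypothetical lists, and treating an initial as one capital letter with an optional period is an assumption.

```python
# Hypothetical, tiny name lists; a real system would load large first- and last-name lists.
FIRST_NAMES = {"john", "mary", "colin"}
LAST_NAMES = {"smith", "jones", "farrell"}


def is_initial(token):
    """A single capital letter, optionally followed by a period (e.g. "J" or "J.")."""
    return len(token.rstrip(".")) == 1 and token[0].isupper()


def is_person_name(tokens):
    """Match the four First/Middle/Last name patterns from the slide."""
    if len(tokens) < 2 or tokens[-1].lower() not in LAST_NAMES:
        return False
    first = tokens[0]
    first_ok = is_initial(first) or first.lower() in FIRST_NAMES
    if len(tokens) == 2:
        # First_initial Last_name, First_name Last_name
        return first_ok
    if len(tokens) == 3:
        # First_initial Middle_initial Last_name, First_name Middle_initial Last_name
        return first_ok and is_initial(tokens[1])
    return False


print(is_person_name("J. R. Smith".split()), is_person_name("Good Hotel".split()))
```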

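PN and DP Recognition: Text Corpus
- Used for less well-known PNs
- Require three instances in which the first letters of the content words are capitalized
- The instances must not be sub-phrases of a longer PN:
  - "if you choose windows by Vista Window Company, …" supports "Vista Window Company"
  - "if you choose windows by Super Vista Window Company, …" does not, because the phrase is part of the longer PN "Super Vista Window Company"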
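
The corpus test can be sketched as counting capitalized matches in text snippets while skipping matches that sit inside a longer capitalized name. This is a heuristic sketch under stated assumptions: the snippets stand in for search-engine results (retrieval not shown), the phrase is passed in its capitalized form, min_instances=3 reads the slide's "three instances" as a minimum, and "a capitalized neighbouring word means a longer PN" is a hypothetical approximation.

```python
import re


def corpus_supports_pn(phrase, snippets, min_instances=3):
    """True if `phrase` (given capitalized, matched case-sensitively) appears at least
    `min_instances` times without a capitalized neighbour on either side."""
    pattern = re.compile(r"\b" + r"\s+".join(map(re.escape, phrase.split())) + r"\b")
    count = 0
    for text in snippets:
        for m in pattern.finditer(text):
            before = text[:m.start()].split()
            after = text[m.end():].split()
            # Heuristic: a capitalized neighbour suggests a longer PN such as
            # "Super Vista Window Company". Sentence-initial neighbours can fool this.
            if before and before[-1][0].isupper():
                continue
            if after and after[0][0].isupper():
                continue
            count += 1
    return count >= min_instances


good = "if you choose windows by Vista Window Company, you get a warranty"
bad = "if you choose windows by Super Vista Window Company, you get a warranty"
print(corpus_supports_pn("Vista Window Company", [good] * 3))
print(corpus_supports_pn("Vista Window Company", [bad] * 3))
```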

PN and DP Recognition: Overlapped Phrases
- Search for all the words together
- Count the instances of each phrase in the returned documents
- E.g., "Native American Casino" yields the overlapping candidates "Native American" and "American Casino": compare Count("Native American") with Count("American Casino")
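
A sketch of the count comparison, assuming `docs` holds the documents returned by searching all the words together (the search itself is not shown). The slide only says "compare"; keeping the more frequent phrase, with ties going to the first argument, is an assumption. The same comparison applies to overlapping SNP/CNP candidates such as "sony dvd" vs. "dvd handycam".

```python
import re


def count_phrase(phrase, docs):
    """Count case-insensitive, whole-word occurrences of `phrase` across `docs`."""
    pattern = re.compile(r"\b" + r"\s+".join(map(re.escape, phrase.split())) + r"\b",
                         re.IGNORECASE)
    return sum(len(pattern.findall(d)) for d in docs)


def resolve_overlap(phrase_a, phrase_b, docs):
    """Keep whichever overlapping phrase has more instances (assumption: ties keep phrase_a)."""
    return phrase_a if count_phrase(phrase_a, docs) >= count_phrase(phrase_b, docs) else phrase_b


docs = [
    "Native American tribes operate the casino.",
    "The Native American Casino opened in May.",
    "Native American history",
]
print(resolve_overlap("Native American", "American Casino", docs))
```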

SNP and CNP Recognition
- Only check phrase candidates that
  - are not sub-phrases of a recognized PN/DP, and
  - do not overlap with a recognized PN/DP
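
The filter can be expressed on token spans. This is a minimal sketch with hypothetical names, where spans are (start, end) token indices with `end` exclusive. Candidates that fully contain a recognized PN/DP are kept, since the definitions allow a CNP to contain a PN or DP; candidates inside, equal to, or partially overlapping a recognized phrase are dropped.

```python
def phrase_spans(n):
    """All spans of at least 2 tokens, longest first."""
    return [(s, s + length) for length in range(n, 1, -1) for s in range(n - length + 1)]


def allowed_snp_cnp(cand, recognized):
    """Keep a candidate span only if it neither lies inside nor partially overlaps a PN/DP span."""
    cs, ce = cand
    for ps, pe in recognized:
        if (cs, ce) == (ps, pe):
            return False  # already recognized as a PN/DP
        if ce <= ps or pe <= cs:
            continue      # disjoint
        if cs <= ps and pe <= ce:
            continue      # candidate contains the whole PN/DP (allowed for a CNP)
        return False      # candidate is a sub-phrase, or overlaps partially
    return True


tokens = "city public transportation".split()
dp_spans = [(1, 3)]  # "public transportation" recognized as a DP
kept = [" ".join(tokens[s:e]) for s, e in phrase_spans(len(tokens))
        if allowed_snp_cnp((s, e), dp_spans)]
print(kept)
```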

SNP and CNP Recognition: Implicit Phrases
- Phrases joined by "and" / "or"
- "main and contributing factor" → "main factor", "contributing factor"
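
One way to recover implicit phrases is to attach the head of the right conjunct to the left conjunct. This is a hypothetical sketch, assuming the shared head is the last word of the phrase and that the right conjunct has at least two words; real coordination can be more ambiguous.

```python
def expand_coordination(phrase):
    """Expand 'A and B head' / 'A or B head' into ['A head', 'B head'].

    Assumption: the last word is the shared head; otherwise the phrase is returned unchanged.
    """
    words = phrase.split()
    for conj in ("and", "or"):
        if conj in words:
            i = words.index(conj)
            if 0 < i < len(words) - 2:       # left conjunct exists, right conjunct has 2+ words
                left, right = words[:i], words[i + 1:]
                head = right[-1]
                return [" ".join(left + [head]), " ".join(right)]
    return [phrase]


print(expand_coordination("main and contributing factor"))
# ['main factor', 'contributing factor']
```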

SNP and CNP Recognition: Head Word Replacement
- Replace the whole phrase by its head word
- The Collins parser labels noun phrases with their head words, e.g., Compact/JJ Best/JJS Sedan/NN is labeled NP/sedan (head word: sedan)
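
The replacement step itself is simple once a head word is known. In this sketch the head is taken to be the rightmost noun of a POS-tagged phrase, a common English heuristic that stands in for the Collins parser's own head rules; the function names are hypothetical.

```python
def head_word(tagged_np):
    """Rightmost noun (NN, NNS, NNP, NNPS) in a POS-tagged noun phrase.

    Assumption: heuristic stand-in for the Collins parser's head rules.
    """
    for word, tag in reversed(tagged_np):
        if tag.startswith("NN"):
            return word
    return tagged_np[-1][0]  # fallback: last word


def replace_phrase_by_head(tagged_query, start, end):
    """Replace the noun phrase tagged_query[start:end] by its head word; return plain tokens."""
    head = head_word(tagged_query[start:end])
    return ([w for w, _ in tagged_query[:start]] + [head]
            + [w for w, _ in tagged_query[end:]])


query = [("price", "NN"), ("of", "IN"), ("compact", "JJ"), ("best", "JJS"), ("sedan", "NN")]
print(replace_phrase_by_head(query, 2, 5))
# ['price', 'of', 'sedan']
```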

SNP and CNP Recognition: Phrase Verification
- Verify that a phrase is actually used in the real world
- For a CNP, this also means finding all of its words within a text window, e.g., "Colin Farrell wallpaper" also matches "wallpaper of Colin Farrell"

SNP and CNP Recognition: Overlapped Phrases
- For two overlapping potential SNPs/CNPs, search for all the words and compare the numbers of instances
- E.g., "sony dvd handycam" → "sony dvd" vs. "dvd handycam"

Document Retrieval Using Phrases
- Searching for a phrase in a document
  - PN/DP: exact match
  - SNP/CNP: all words within a text window
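
A sketch of the two matching modes. It assumes lowercase word tokenization and a window of 5 tokens; the window size and function names are hypothetical, since the slides do not give them. The same window test is what lets "Colin Farrell wallpaper" match "wallpaper of Colin Farrell" during phrase verification.

```python
import re


def tokenize(text):
    return re.findall(r"\w+", text.lower())


def phrase_in_document(phrase, phrase_type, doc, window=5):
    """Exact token-sequence match for PN/DP; all words within `window` tokens for SNP/CNP."""
    words = tokenize(phrase)
    tokens = tokenize(doc)
    if phrase_type in ("PN", "DP"):
        n = len(words)
        return any(tokens[i:i + n] == words for i in range(len(tokens) - n + 1))
    targets = set(words)
    # Slide a window of `window` consecutive tokens; word order inside it does not matter.
    for start in range(max(1, len(tokens) - window + 1)):
        if targets <= set(tokens[start:start + window]):
            return True
    return False


doc = "Free HD wallpaper of Colin Farrell for your desktop."
print(phrase_in_document("Colin Farrell wallpaper", "CNP", doc))  # window match
print(phrase_in_document("Colin Farrell wallpaper", "PN", doc))   # no exact match
```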

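Document Retrieval Using Phrases
- Sim(Query, Doc) = (phrase similarity, term similarity)
- Phrase similarity: $\mathrm{Sim}_P(P_i) = \mathrm{idf}(P_i)$ and $\mathrm{Sim}_P = \sum_i \mathrm{Sim}_P(P_i)$, summed over the query phrases $P_i$ found in the document
- Term similarity $T$: Okapi BM25
- Document ranking: $D_1$ is ranked higher than $D_2$ if $\mathrm{Sim}_{P,1} > \mathrm{Sim}_{P,2}$, or if $\mathrm{Sim}_{P,1} = \mathrm{Sim}_{P,2}$ and $T_1 > T_2$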
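
The ranking rule is a lexicographic sort on (phrase similarity, term similarity). This is a minimal sketch, assuming idf values for the query phrases and BM25 scores per document are computed elsewhere; the function names and sample values are hypothetical. `math.fsum` is used so the phrase sum does not depend on the order in which phrases are listed, which keeps the equality test for ties reliable.

```python
import math


def phrase_similarity(matched_phrases, phrase_idf):
    """Sim_P: sum of idf values of the query phrases found in the document."""
    return math.fsum(phrase_idf[p] for p in matched_phrases)


def rank_documents(docs, phrase_idf):
    """docs: list of (doc_id, matched_phrases, bm25_score); BM25 scores are precomputed."""
    keyed = [(phrase_similarity(phrases, phrase_idf), bm25, doc_id)
             for doc_id, phrases, bm25 in docs]
    # Sort by Sim_P first; only equal Sim_P values fall back to the term score T.
    keyed.sort(key=lambda k: (k[0], k[1]), reverse=True)
    return [doc_id for _, _, doc_id in keyed]


idf = {"world trade organization": 1.5, "trade policy": 0.5}
docs = [
    ("d1", ["world trade organization"], 10.0),
    ("d2", ["world trade organization", "trade policy"], 3.0),
    ("d3", ["world trade organization"], 12.0),
]
print(rank_documents(docs, idf))
# ['d2', 'd3', 'd1']
```

Here d2 wins on phrase similarity despite its low BM25 score, and d3 beats d1 only through the term-similarity tie-break.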

Experimental Results
- Phrase recognition experiments: tuned on TREC queries

Experimental Results
- Phrase recognition experiments: tested on Web queries

Experimental Results
- Performance of individual tools
  - Wikipedia performs better than WordNet and Minipar, showing the need for a complete dictionary
  - The Collins parser alone is not enough for SNP/CNP recognition, because it lacks real-world usage information

Experimental Results
- Document retrieval experiments on ad hoc TREC 6, 7, and 8 and robust TREC 12, 13, and 14
- Compared configurations:
  1. Retrieval without phrases
  2. Wikipedia for PN/DP and only the Collins parser for SNP/CNP
  3. Phrases from the full recognition algorithm
- From 1 to 2: 33% MAP increase and 44.27% GMAP increase
- From 2 to 3: 5.8% MAP increase and 12.58% GMAP increase

Conclusions
- Our algorithm can effectively recognize the four types of phrases in short Web queries
- The recognized phrases help improve retrieval effectiveness

Questions?