CIKM 2007: Recognition and Classification of Noun Phrases in Queries for Effective Retrieval. Wei Zhang, Shuang Liu, Clement Yu


1 Recognition and Classification of Noun Phrases in Queries for Effective Retrieval (CIKM 2007)
- Wei Zhang (1), wzhang@cs.uic.edu
- Shuang Liu (2), shuang.liu@ask.com
- Clement Yu (1), yu@cs.uic.edu
- Chaojing Sun (3), chaojing@gmail.com
- Fang Liu (4), fangliu@microsoft.com
- Weiyi Meng (5), meng@cs.binghamton.edu
Affiliations: (1) Department of Computer Science, University of Illinois at Chicago; (2) Ask.com; (3) Broadcom Corporation; (4) Microsoft; (5) Department of Computer Science, Binghamton University

2 Outline
- Motivation
- Our definitions of the phrases
- Proper noun and dictionary phrase recognition
- Simple and complex phrase recognition
- Experimental results

3 Motivation
- Terms in a query are related semantically, e.g. “John Smith”
- Recognize this relationship
  - Partition the query terms into groups (phrases)
- Document retrieval using phrases
  - Adding phrases into searching and ranking

4 Types of Noun Phrases
- Phrases that have fixed writing formats
  - Names of locations, people, companies, …
  - Well-defined concepts, e.g. “computer science”
- Freely written phrases
  - Not formally defined but used in real language

5 Four Types of Noun Phrases
- Proper Noun (PN)
  - A noun phrase that names a specific person, place or thing
  - First letters of the content words are capitalized
  - E.g. “John Smith”, “Atlantic Ocean”
- Dictionary Phrase (DP)
  - A phrase that has a definition in a dictionary, excluding PN
- These two types may overlap, e.g. “Atlantic Ocean”
- They cannot replace each other, e.g. “Lina’s Pizza”, “public transportation”

6 Four Types of Noun Phrases
- Simple Noun Phrase (SNP)
  - A grammatically valid noun phrase other than PN and DP
  - 2 words
  - E.g. “white car”, “good hotel”
- Complex Noun Phrase (CNP)
  - A grammatically valid noun phrase other than PN and DP
  - 3 or more words
  - May contain PN/DP/SNP
  - E.g. “small white car”, “city public transportation”

7 Noun Phrase Recognition
- General procedure
  - Recognize PN and dictionary phrases first
  - Then simple and complex noun phrases
- For an n-word query
  - Check the original query
  - Check the 2 (n-1)-term arrays
  - …
  - Check the (n-1) 2-term arrays
  - n*(n-1)/2 candidates in total
- E.g. “World Trade Organization”: the 2-term arrays are “World Trade” and “Trade Organization”
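
The candidate enumeration on this slide can be sketched in a few lines of Python. This is only an illustration, assuming whitespace tokenization; the function name `candidate_phrases` is hypothetical and not from the presentation.

```python
def candidate_phrases(query):
    """Return every contiguous multi-word sub-sequence of the query, longest first."""
    words = query.split()  # assumption: plain whitespace tokenization
    n = len(words)
    candidates = []
    for length in range(n, 1, -1):           # n, n-1, ..., 2 terms
        for start in range(n - length + 1):  # slide a window of that length
            candidates.append(" ".join(words[start:start + length]))
    return candidates


for phrase in candidate_phrases("World Trade Organization"):
    print(phrase)
```

For an n-word query the loop yields 1 + 2 + … + (n-1) = n*(n-1)/2 candidates, matching the count on the slide.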

8 Noun Phrase Recognition
- Tools for phrase recognition
  - Dictionaries (Wikipedia, WordNet)
  - Large text corpus (Google for experiments)
  - Parsers (Minipar, Collins parser) and POS tagger

9 PN and DP Recognition
- Wikipedia
  - For proper nouns and dictionary phrases
  - DP: existence of the entry page
  - PN: content words in the first instance of the phrase in the main text should be capitalized

10 PN and DP Recognition
- WordNet
  - For PN and DP recognition
  - DP: defined in a dictionary
  - PN: has a hypernym of city, province, country, organization, geographic area, person, syndrome, region, building, or nation

11 PN and DP Recognition
- Minipar
  - For PN recognition only
  - (1) “PN” label in the parse tree
  - (2) Semantic label of person, country, corpname, location, corpdesig, fname, gname, or date

12 PN and DP Recognition
- List of first names, last names and rules
  - First_initial last_name
  - First_initial mid_initial last_name
  - First_name middle_initial last_name
  - First_name last_name
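
The four name rules above can be illustrated with a small matcher. This is a sketch under stated assumptions: the name lists below are tiny hypothetical stand-ins for the real first-name and last-name lists, and an "initial" is taken to be a single letter with an optional period.

```python
import re

# Hypothetical sample lists; the presentation does not say which lists were used.
FIRST_NAMES = {"john", "mary"}
LAST_NAMES = {"smith", "jones"}


def _is_initial(token):
    """A single letter, optionally followed by a period (assumed format)."""
    return re.fullmatch(r"[A-Za-z]\.?", token) is not None


def _is_first(token):
    """First name from the list, or a first initial."""
    return _is_initial(token) or token.lower() in FIRST_NAMES


def looks_like_person_name(phrase):
    tokens = phrase.split()
    if len(tokens) == 2:    # First_initial last_name / First_name last_name
        first, last = tokens
        return _is_first(first) and last.lower() in LAST_NAMES
    if len(tokens) == 3:    # ... with a middle initial
        first, middle, last = tokens
        return (_is_first(first) and _is_initial(middle)
                and last.lower() in LAST_NAMES)
    return False


print(looks_like_person_name("J. Q. Smith"))
print(looks_like_person_name("white car"))
```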

13 PN and DP Recognition
- Text corpus
  - For less well-known PNs
  - Three instances, first letters of the content words capitalized
  - Not a sub-phrase of a longer PN
  - “if you choose windows by Vista Window Company, …”
  - “if you choose windows by Super Vista Window Company, …”
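
A rough sketch of the corpus check follows. Several details are assumptions rather than the authors' method: the threshold is read as "at least three instances", a capitalized neighbouring word is used as the signal that the match sits inside a longer PN, and the stopword list is a hypothetical placeholder. A sentence-initial capital just before the phrase would be misread as part of a longer PN in this simplified version.

```python
import re

STOPWORDS = {"of", "the", "and", "for", "in"}  # hypothetical non-content words


def _tokens(text):
    return re.findall(r"[A-Za-z]+(?:'[A-Za-z]+)?", text)


def count_pn_instances(phrase, snippets):
    """Count capitalized occurrences that are not embedded in a longer capitalized run."""
    target = [w.lower() for w in phrase.split()]
    k = len(target)
    count = 0
    for text in snippets:
        toks = _tokens(text)
        for i in range(len(toks) - k + 1):
            window = toks[i:i + k]
            if [t.lower() for t in window] != target:
                continue
            # Content words must start with a capital letter.
            if not all(t[0].isupper() for t in window if t.lower() not in STOPWORDS):
                continue
            before = toks[i - 1] if i > 0 else None
            after = toks[i + k] if i + k < len(toks) else None
            # A capitalized neighbour suggests a longer PN (assumed heuristic).
            if (before and before[0].isupper()) or (after and after[0].isupper()):
                continue
            count += 1
    return count


def is_corpus_pn(phrase, snippets, min_instances=3):
    return count_pn_instances(phrase, snippets) >= min_instances
```

With three snippets of the form "if you choose windows by Vista Window Company, …" the phrase "Vista Window Company" passes, while occurrences inside "Super Vista Window Company" are not counted.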

14 PN and DP Recognition
- Overlapped phrases
  - Search all words together
  - Count the instances of each phrase in the returned documents
  - E.g. “Native American Casino”: “Native American” and “American Casino”
  - Compare ( Count(“Native American”), Count(“American Casino”) )
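
The count comparison for overlapped phrases might look like the sketch below. The slide only says the counts are compared; picking the phrase with the higher count, breaking ties in favour of the first phrase, and using case-insensitive substring matching are all assumptions made for illustration.

```python
def count_occurrences(phrase, documents):
    """Case-insensitive substring count across the returned documents (a simplification)."""
    needle = phrase.lower()
    return sum(doc.lower().count(needle) for doc in documents)


def resolve_overlap(phrase_a, phrase_b, documents):
    """Keep the phrase that occurs more often; ties go to phrase_a (assumed rule)."""
    count_a = count_occurrences(phrase_a, documents)
    count_b = count_occurrences(phrase_b, documents)
    return phrase_a if count_a >= count_b else phrase_b


# Hypothetical documents standing in for search results for all three words.
docs = [
    "Native American tribes opened a casino",
    "The Native American casino is popular",
    "Native American history",
]
print(resolve_overlap("Native American", "American Casino", docs))
```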

15 SNP and CNP Recognition
- Only check the phrase candidates that
  - are not sub-phrases of a recognized PN/DP
  - do not overlap with a recognized PN/DP
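
One way to read this filter is in terms of token spans. The sketch below assumes that "overlap" means a partial overlap, so a candidate that fully contains a recognized PN/DP is still checked (a CNP may contain a PN/DP). The span representation and function names are hypothetical.

```python
def candidate_spans(n):
    """All contiguous token spans (start, end) with at least 2 tokens, longest first."""
    return [(start, start + length)
            for length in range(n, 1, -1)
            for start in range(n - length + 1)]


def is_checkable(candidate, recognized):
    """True if the candidate is neither inside nor partially overlapping a recognized span."""
    c_start, c_end = candidate
    for r_start, r_end in recognized:
        disjoint = c_end <= r_start or r_end <= c_start
        strictly_contains = (c_start <= r_start and r_end <= c_end
                             and (c_start, c_end) != (r_start, r_end))
        if not (disjoint or strictly_contains):
            return False
    return True


words = "city public transportation".split()
recognized = [(1, 3)]  # "public transportation", assumed already recognized as a DP
remaining = [" ".join(words[s:e])
             for s, e in candidate_spans(len(words))
             if is_checkable((s, e), recognized)]
print(remaining)
```

Here “city public” is dropped because it partially overlaps the dictionary phrase, and “public transportation” itself is dropped because it is already recognized.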

16 SNP and CNP Recognition
- Implicit phrases
  - “and” / “or”
  - “main and contributing factor” → “main factor”, “contributing factor”
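
The expansion of implicit phrases can be sketched as follows. This toy version only handles the pattern on the slide, "X and/or Y HEAD", and assumes a single modifier after the conjunction with everything after that modifier being the shared head; the presentation does not describe how the general case is handled.

```python
def expand_implicit(phrase):
    """Split 'X and/or Y HEAD' into 'X HEAD' and 'Y HEAD'; return [phrase] otherwise."""
    words = phrase.split()
    for i, word in enumerate(words):
        # Need at least one word before the conjunction and two after it.
        if word.lower() in ("and", "or") and 0 < i < len(words) - 2:
            left = words[:i]          # modifier(s) before the conjunction
            right = [words[i + 1]]    # single modifier after it (assumption)
            head = words[i + 2:]      # shared head
            return [" ".join(left + head), " ".join(right + head)]
    return [phrase]


print(expand_implicit("main and contributing factor"))
```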

17 SNP and CNP Recognition
- Head word replacement
  - Replace the whole phrase by its head word
- Collins parser
  - Label the noun phrases
  - Example parse: NP/sedan (head word) over Compact/JJ Best/JJS Sedan/NN

18 SNP and CNP Recognition
- Phrase verification
  - To verify that a phrase is used in the world
  - For CNP: it also means to find all the words in a text window
  - E.g. “Colin Farrell wallpaper” and “wallpaper of Colin Farrell”
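
The text-window check for complex noun phrases can be sketched like this. The window size is an assumption (the slide does not give one), and the tokenization is a simple regular expression.

```python
import re


def words_in_window(phrase, text, window=5):
    """True if all words of the phrase occur, in any order, within `window` consecutive tokens."""
    targets = {w.lower() for w in phrase.split()}
    tokens = [t.lower() for t in re.findall(r"\w+", text)]
    for start in range(len(tokens)):
        if targets <= set(tokens[start:start + window]):
            return True
    return False


print(words_in_window("Colin Farrell wallpaper",
                      "download a wallpaper of Colin Farrell here",
                      window=4))
```

Because word order is ignored inside the window, “wallpaper of Colin Farrell” counts as evidence for “Colin Farrell wallpaper”.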

19 SNP and CNP Recognition
- Overlapped phrases
  - Two potential SNP/CNP: search all words, compare the numbers of instances
  - “sony dvd handycam”: “sony dvd” and “dvd handycam”

20 Document Retrieval Using Phrases
- Search a phrase in a document
  - Exact match: PN/DP
  - Search all words in a text window: SNP/CNP

21 Document Retrieval Using Phrases
- Sim(Query, Doc) has two components: phrase similarity (Sim_P) and term similarity (Sim_T)
- Phrase similarity
  - Sim_P(P_i) = idf(P_i)
  - Sim_P = sum ( Sim_P(P_i) )
- Term similarity
  - Okapi/BM-25 similarity
- Document ranking
  - D1 is ranked higher than D2 if (Sim_P1 > Sim_P2) OR (Sim_P1 = Sim_P2 AND Sim_T1 > Sim_T2)
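
The ranking rule on this slide is a lexicographic comparison: phrase similarity decides first, and term similarity only breaks ties. A minimal sketch using Python tuple ordering is shown below. It assumes the sum runs over the query phrases found in the document and that the Okapi/BM-25 term score is computed elsewhere and passed in; the idf values and scores are made-up illustration data.

```python
def phrase_similarity(matched_phrases, idf):
    """Sum of idf values over the query phrases matched in a document (assumed scope of the sum)."""
    return sum(idf[p] for p in matched_phrases)


def rank(scored_docs):
    """Sort (doc_id, phrase_sim, term_sim) triples: phrase_sim first, term_sim as tie-breaker."""
    return sorted(scored_docs, key=lambda d: (d[1], d[2]), reverse=True)


# Hypothetical idf values and term scores, for illustration only.
idf = {"native american": 2.0, "casino hotel": 1.1}
docs = [
    ("D1", phrase_similarity(["native american"], idf), 0.5),
    ("D2", phrase_similarity(["native american"], idf), 0.9),
    ("D3", phrase_similarity(["native american", "casino hotel"], idf), 0.1),
]
print([doc_id for doc_id, _, _ in rank(docs)])
```

D3 ranks first despite its low term score because it matches more phrase weight; D2 beats D1 only through the term-similarity tie-breaker.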

22 Experimental Results
- Phrase recognition experiments
  - Tuned by using TREC queries

23 Experimental Results
- Phrase recognition experiments
  - Tested by using Web queries

24 Experimental Results
- Performance of individual tools
  - Wikipedia is better than WordNet and Minipar
    - Need for a complete dictionary
  - Collins parser alone is not enough for SNP/CNP recognition
    - Lack of real world usage information

25 Experimental Results
- Document retrieval experiments
  - Ad-hoc TREC 6, 7 and 8; robust TREC 12, 13 and 14
  - 1. Retrieval without using phrases
  - 2. Using Wikipedia for PN/DP and just the Collins parser for SNP/CNP
  - 3. Using phrases from the full recognition algorithm
- 33% MAP increase and 44.27% GMAP increase from 1 to 2
- 5.8% MAP increase and 12.58% GMAP increase from 2 to 3

26 Conclusions
- Our algorithm can effectively recognize the four types of phrases in short Web queries
- The recognized phrases help improve retrieval effectiveness

27 Questions?
- wzhang@cs.uic.edu
- http://www.cs.uic.edu/~wzhang/

