Contextual Ranking of Keywords Using Click Data Utku Irmak, Vadim von Brzeski, Reiner Kraft Yahoo! Inc ICDE 09’ Datamining session 2010. 04. 09. Summarized.


1 Contextual Ranking of Keywords Using Click Data
Utku Irmak, Vadim von Brzeski, Reiner Kraft (Yahoo! Inc.)
ICDE '09, Data Mining session
2010. 04. 09.
Summarized and presented by Park, Sung Eun, IDS Lab., Seoul National University

2 Contents
• Introduction
• Contextual Shortcuts
• Concept Ranking Method
• Feature Space
    Interestingness and Relevance of a Concept
• Evaluation
    Cross Validation Approach, Editorial Evaluation, Real World Results
• Conclusion
Copyright © 2008 by CEBT

3 Introduction
• Determining and ranking the key concepts in a document
• Goal
    Given a candidate set of entities, learn a ranking function that orders the entities by their interestingness and relevance
• Applications
    Contextual advertising systems
    Text summarization
    User-centric entity detection systems
    – Detect entities and concepts within text
    – Transform the detected entities into actionable objects such as "intelligent hyperlinks"

4 Contextual Shortcut

5 Contextual Shortcut
• A concept vector
    Concept: a piece of text that refers to an abstract thought or idea, e.g. "car insurance", "justice"
    Generating the concept vector
    – Term vector: TF/IDF weights from documents in Yahoo! Search
    – Unit vector: all units found in the document. Units are constructed from query logs by an iterative statistical approach that uses the frequencies of the distinct queries
    – Concept vector: the term vector and the unit vector are merged

6 Previous Concept Ranking Method
• AG(TF,Unit)
    1. A term appears in the term vector, but not in the unit vector
    – punish its term vector weight
    2. A term appears in the unit vector, but not in the term vector
    – add this term to the concept vector with its unit weight
    3. A term appears in both vectors
    – sum its term vector and unit vector weights

Document
Concept             AG(TF,Unit) Score   Ranking
President bush      1.1549              1
Iraq war            1.1833              2
Political parties   0.6147              3
…
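
The merge rules on this slide can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the three cases are read as "only in the term vector: penalized term weight", "only in the unit vector: unit weight", and "in both: sum of the two weights". The names `merge_vectors`, `rank_concepts`, and the `penalty` factor of 0.5 are hypothetical; the slide does not say how strongly a weight is punished.

```python
def merge_vectors(term_vec, unit_vec, penalty=0.5):
    """Merge a term vector and a unit vector into a concept vector.

    penalty is a hypothetical damping factor; the slide only says the
    term vector weight is "punished" and gives no value.
    """
    concept_vec = {}
    for term in set(term_vec) | set(unit_vec):
        in_term = term in term_vec
        in_unit = term in unit_vec
        if in_term and not in_unit:
            # Case 1: only in the term vector, so damp its weight.
            concept_vec[term] = term_vec[term] * penalty
        elif in_unit and not in_term:
            # Case 2: only in the unit vector, so keep the unit weight.
            concept_vec[term] = unit_vec[term]
        else:
            # Case 3 (assumed): in both vectors, so sum the two weights.
            concept_vec[term] = term_vec[term] + unit_vec[term]
    return concept_vec


def rank_concepts(concept_vec):
    """Return the concepts ordered by descending merged weight."""
    return sorted(concept_vec, key=concept_vec.get, reverse=True)
```

With these assumptions, a concept found in both vectors tends to outrank one found in only one of them, which matches the intent of rewarding agreement between the two sources.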

7 Proposed Concept Ranking Method
• Ranking function: SVM (Support Vector Machine)
    SVM light: an open source library for ranking SVM
• Interestingness: 9 features of a concept
• Relevance: pre-mined terms of the concept

[Diagram: the features and the pre-mined terms (Term 1, Term 2, …) of each concept yield its interestingness and relevance, which SVM light turns into a ranking]

Concept     Interestingness   Relevance   Ranking
Concept1    I1                R1          1
Concept2    I2                R2          2
Concept3    I3                R3          3
…           …                 …           …
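
As a rough illustration of how training data for a ranking SVM could be prepared, the sketch below writes examples in the SVM light ranking input format (`<target> qid:<id> <feature>:<value> ...`). The function name `to_svmlight_rank_lines` and the choice of target value and feature order are assumptions made for this example, not the authors' actual pipeline.

```python
def to_svmlight_rank_lines(doc_id, concepts):
    """Format the concepts of one document as SVM light ranking examples.

    concepts: list of (target, feature_values) pairs, where target is any
    number that orders the concepts within the document (for example a
    click-through rate) and feature_values is a list of numbers.
    Feature indices start at 1, as SVM light expects.
    """
    lines = []
    for target, features in concepts:
        feats = " ".join(f"{i}:{v:g}" for i, v in enumerate(features, start=1))
        # qid groups the concepts of one document so that preference
        # constraints are only formed between concepts of that document.
        lines.append(f"{target:g} qid:{doc_id} {feats}")
    return lines
```

Grouping by `qid` is the design point here: a concept's clicks are only compared with the clicks of other concepts detected in the same document.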

8 Interestingness of a concept

Category                     Feature                 Details
Search Engine Query Logs     Freq exact              # of queries received that are exactly the same as the concept
                             Freq phrase contained   # of queries that contain the concept as a phrase
                             Unit score              The score in the unit vector
Search Engine Result Pages   Search engine phrase    The number of pages returned when the concept is issued as a query
Text Based Features          Concept size            # of terms in the concept
                             Number of characters    # of characters in the concept
                             Subconcepts             # of subconcepts contained in the concept
Taxonomy                     High level type         If the concept exists in one of the editorially maintained lists, use it as a feature
Others                       Wiki word count         The length of the Wikipedia article for the concept

9 Relevance of a Concept in a Context
• A mining approach to obtain a good relevance scoring mechanism
• Use pre-mined keywords (relevant terms) for each concept
    The relevance of a concept can be computed based on the co-occurrence of its pre-mined keywords in the context.
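
The co-occurrence idea can be illustrated with a minimal sketch. It assumes single-word relevant terms and a plain sum of the weights of the terms found in the context; the slide's own formula did not survive in the transcript, so the function `relevance` and its scoring rule are illustrative assumptions only.

```python
import re


def relevance(context_text, relevant_terms):
    """Score one concept against a context.

    relevant_terms: dict mapping a pre-mined term to its weight.
    The score is the summed weight of the pre-mined terms that occur in
    the context (an assumed scoring rule, not the paper's formula).
    """
    tokens = set(re.findall(r"[a-z0-9]+", context_text.lower()))
    return sum(w for term, w in relevant_terms.items() if term in tokens)
```

A concept whose mined vocabulary overlaps heavily with the surrounding text scores high, while an ambiguous mention in an unrelated context scores close to zero.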

10 Relevance of a Concept in a Context
• Relevant term scoring
    1. Search engine snippets
    – Using the Yahoo! Developer Network API
    – Treat the returned snippets as a single document and compute score = tf*idf
    – Keep the top m = 100 terms based on the score
    2. Prisma query refinement tool
    – Prisma is a tool that helps users augment or replace their queries by providing feedback terms. It considers the top 50 documents in a large collection, based on factors such as the count and position of the terms, document rank, and occurrence of query terms within the input phrase.
    – Construct a single document from the concepts returned by Prisma for concept c_i and compute the score based on the tf*idf values

11 Relevance of a Concept in a Context
• Relevant term scoring
    3. Related query suggestions
    – Using the Yahoo! Developer Network API
    – 300 suggestions and the query frequencies of the suggestions
    – Let k be the number of times a term appears in the suggestion lists
[Formulas for the three sources: Snippet, Prisma, Query Suggestions]
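
The snippet-based scoring in step 1 can be sketched as follows: concatenate the snippets into one document, score each term by tf*idf, and keep the top m terms. The background statistics `doc_freq` and `num_docs` are hypothetical inputs, and the smoothed logarithmic idf is one common variant rather than the formula used by the authors.

```python
import math
import re
from collections import Counter


def top_terms(snippets, doc_freq, num_docs, m=100):
    """Return the m highest tf*idf terms of the concatenated snippets.

    doc_freq: dict mapping a term to the number of background documents
    that contain it; num_docs: size of that background collection.
    Both are assumed to be available from some corpus.
    """
    tokens = re.findall(r"[a-z0-9]+", " ".join(snippets).lower())
    tf = Counter(tokens)
    scores = {
        term: count * math.log(num_docs / (1 + doc_freq.get(term, 0)))
        for term, count in tf.items()
    }
    # Sort by descending score, breaking ties alphabetically.
    return sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))[:m]
```

The same routine could be applied to the single document built from Prisma feedback terms in step 2, since that step is also described as tf*idf scoring.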

12 Intuition of Query Suggestion and Prisma

13 Evaluation
• Cross Validation Approach
    Data
    – Randomly sampled news stories that were annotated by Contextual Shortcuts
    – The number of times these stories were viewed and the number of clicks received by each concept that was detected in the stories
    – 870 stories, 6420 concepts, and 16549 sample clicks
    Weighted Error Rate
    – [Formula not preserved in the transcript]
    – where click-through rate (CTR) = (the number of clicks) / (the number of views)
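
Because the weighted error rate formula is missing from the transcript, the following is only a sketch of one plausible reading: each pair of concepts in a document is weighted by the difference in their click-through rates, and the error rate is the weight of the mis-ordered pairs divided by the weight of all pairs. The function names and this pairwise definition are assumptions, not a reproduction of the slide.

```python
from itertools import combinations


def ctr(clicks, views):
    """Click-through rate = number of clicks / number of views."""
    return clicks / views if views else 0.0


def weighted_error_rate(items):
    """items: list of (predicted_score, observed_ctr) for one document.

    A pair counts as an error when the predicted order contradicts the
    CTR order; each pair is weighted by its CTR difference (assumed).
    """
    wrong = 0.0
    total = 0.0
    for (s1, c1), (s2, c2) in combinations(items, 2):
        weight = abs(c1 - c2)
        total += weight
        if (s1 - s2) * (c1 - c2) < 0:
            wrong += weight
    return wrong / total if total else 0.0
```

Under this reading, swapping two concepts with nearly equal CTR costs little, while ranking a rarely clicked concept above a heavily clicked one is penalized strongly.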

14 Evaluation
• NDCG (normalized discounted cumulative gain)
    – A valuable metric for applications that require high precision at the top ranks
    – Gives a score for a sorted list of k concepts on document i [formula not preserved in the transcript]
    – where score(j) = bucketNo(CTR(j)/100); bucketNo() returns a bucket number between 0 and 1000, considering all the CTR values observed in the system in increasing order
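
For readers unfamiliar with NDCG, here is a small sketch of the commonly used form, with gain 2^score - 1 and a log2 position discount. The slide's exact formula is not in the transcript, so this standard variant is an assumption, and the bucketing of CTR values into scores is left out.

```python
import math


def dcg(scores):
    """Discounted cumulative gain of scores listed in ranked order."""
    return sum(
        (2 ** s - 1) / math.log2(j + 1) for j, s in enumerate(scores, start=1)
    )


def ndcg(scores_in_ranked_order, k=None):
    """DCG of the top k positions divided by the DCG of the ideal order."""
    ranked = scores_in_ranked_order[:k]
    ideal = sorted(scores_in_ranked_order, reverse=True)[:k]
    ideal_dcg = dcg(ideal)
    return dcg(ranked) / ideal_dcg if ideal_dcg else 0.0
```

A ranking that already lists the concepts in decreasing score order gets an NDCG of 1, and mistakes near the top of the list reduce the value more than mistakes further down.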

15 Evaluation
• Interestingness features

16 Evaluation
• Relevance score

17 Evaluation
• Interestingness Features and Relevance Score

18 Evaluation
• Editorial Evaluation
    1. A processed set of documents is presented to the judges.
    2. A judge is asked to select a document from the pool.
    3. The judge is asked to read the document and rate each entity or concept highlighted in it in terms of its interestingness and relevance.

19 Contributions
• We propose to use implicit user feedback in the form of click data to determine the most interesting and relevant concepts in a context via a machine learning approach.
• We describe a feature space pertinent to the interestingness of a concept, and present algorithms to identify the relevance of a concept in a given context.
• We evaluate the proposed techniques extensively using click data, an editorial study, and an analysis of a production system. The results show significant improvements.
• We provide a detailed description of a framework that enables efficient implementation of the proposed techniques in a production system.

20 Discussion
• No theoretical basis is given for the feature selection assumptions: no references or underlying theory at all.
• The method depends on technology already developed in previous studies.
• Having access to such a valuable dataset is a huge advantage.

21 Q&A
Thank you

