Advisor: Koh Jia-Ling Nonhlanhla Shongwe 2010-09-28 EFFICIENT QUERY EXPANSION FOR ADVERTISEMENT SEARCH WANG.H, LIANG.Y, FU.L, XUE.G, YU.Y SIGIR’09.


Preview
- Introduction
- AdSearch
  - Bid phrase clustering
  - Index structure for efficient ad search
  - Query processing
- Experimental evaluation
- Conclusion

Introduction
- The Web has become an important venue for advertising, e.g. Google, Yahoo
- Mainly two kinds of advertising channels
  - Contextual advertising
  - Sponsored advertising
- Ranking is derived from
  - relevance to the user query
  - page content

Introduction (cont.)
- Ads are characterized by bid phrases
  - keywords the advertisers choose for their ads
- Syntactic approaches suffer from low recall
- Example
  - Query: “job training”
  - Ad: career college
  - The ad has no syntactic match with the query, so it is not proposed

Introduction (cont.)
- The problem is even worse because of
  - the shorter length of ads
  - the sparsity of the bid phrases
- Propose an efficient ad search solution
- Tackle these issues with query expansion

AdSearch Overview

AdSearch (cont.)
- Bid phrase clustering
  - Bipartite Graph Construction for Bid Phrases and Ads
  - Agglomerative Iterative Clustering

Bipartite Graph Construction for Bid Phrases and Ads
1. B = {A, B, C} (bid phrases)
2. A = {Ad0, Ad1, Ad2, Ad3, Ad4} (ads)
3. G = v_ba, v_bb, v_bc (one vertex per bid phrase)
4. G = v_a0, v_a1, v_a2, v_a3, v_a4 (one vertex per ad)
Corpus data C:
A = {Ad0, Ad3}
B = {Ad1, Ad2, Ad3}
C = {Ad2, Ad3, Ad4}
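The construction above can be sketched in a few lines of Python. This is only an illustration on the slide's toy corpus: the function name `build_bipartite_graph` and the edge representation are assumptions, not the authors' implementation.

```python
from collections import defaultdict

# Corpus data from the slide: bid phrase -> ads that use it.
CORPUS = {
    "A": {"Ad0", "Ad3"},
    "B": {"Ad1", "Ad2", "Ad3"},
    "C": {"Ad2", "Ad3", "Ad4"},
}


def build_bipartite_graph(corpus):
    """Return (edges, ad_to_phrases) for a bid-phrase/ad corpus.

    Hypothetical helper: an edge (phrase, ad) exists when the ad is
    associated with the bid phrase.
    """
    edges = set()
    ad_to_phrases = defaultdict(set)
    for phrase, ads in corpus.items():
        for ad in ads:
            edges.add((phrase, ad))
            ad_to_phrases[ad].add(phrase)
    return edges, dict(ad_to_phrases)


edges, ad_to_phrases = build_bipartite_graph(CORPUS)
```

Inverting the corpus this way gives the ad-side view (each ad with its bid phrases) that is needed when ads, not only bid phrases, are clustered.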

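Agglomerative Iterative Clustering
- Jaccard similarity: |X ∩ Y| / |X ∪ Y|
Corpus data C:
A = {Ad0, Ad3}
B = {Ad1, Ad2, Ad3}
C = {Ad2, Ad3, Ad4}
(A, B) = |{Ad3}| / |{Ad0, Ad1, Ad2, Ad3}| = 1/4
(B, C) = |{Ad2, Ad3}| / |{Ad1, Ad2, Ad3, Ad4}| = 2/4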
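The Jaccard values on this slide can be checked with a short Python sketch. The function name `jaccard` is chosen here for illustration; the corpus is the slide's toy data.

```python
def jaccard(x, y):
    """Jaccard similarity |x & y| / |x | y| of two collections."""
    x, y = set(x), set(y)
    union = x | y
    if not union:
        # Two empty sets: define the similarity as 0 to avoid dividing by zero.
        return 0.0
    return len(x & y) / len(union)


# Corpus data from the slide: bid phrase -> ads.
A = {"Ad0", "Ad3"}
B = {"Ad1", "Ad2", "Ad3"}
C = {"Ad2", "Ad3", "Ad4"}

sim_ab = jaccard(A, B)  # 1 shared ad out of 4 distinct ads
sim_bc = jaccard(B, C)  # 2 shared ads out of 4 distinct ads
```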

Agglomerative Iterative Clustering (cont.)
Corpus data C:
A = {Ad0, Ad3}
B = {Ad1, Ad2, Ad3}
C = {Ad2, Ad3, Ad4}
Bipartite graph: bid phrases A, B, C on one side; ads Ad0, Ad1, Ad2, Ad3, Ad4 on the other

Corpus data C:
A = {Ad0, Ad3}
B = {Ad1, Ad2, Ad3}
C = {Ad2, Ad3, Ad4}
Bipartite graph: bid phrases A, B, C; ads Ad0, Ad1, Ad2, Ad3, Ad4
Bid-phrase similarities:
(A, B) = 0.25
(A, C) = 0.25
(B, C) = 0.5
Bid phrases per ad:
Ad0 = {A}
Ad1 = {B}
Ad2 = {B, C}
Ad3 = {A, B, C}
Ad4 = {C}
Ad similarities:
(Ad0, Ad1) = 0
(Ad0, Ad2) = 0
(Ad0, Ad3) = 0.33
(Ad0, Ad4) = 0
(Ad1, Ad2) = 0.5
(Ad1, Ad3) = 0.33
(Ad1, Ad4) = 0
(Ad2, Ad3) = 0.66
(Ad2, Ad4) = 0.5
(Ad3, Ad4) = 0.33
Merge order (ads): (Ad2, Ad3), (Ad2, Ad4), (Ad1, Ad2), (Ad0, Ad3)
Merge order (bid phrases): B with C, then A
Resulting clusters:
Bid phrases: {A}, {B, C}
Ads: {Ad0}, {Ad1, Ad4}, {Ad2, Ad3}
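The bid-phrase merge order shown here (B with C first, then A) can be reproduced with a simplified greedy sketch. It is an assumption-laden illustration, not the paper's algorithm: it clusters only the bid-phrase side, compares clusters by the Jaccard similarity of their combined ad sets, and merges down to a single cluster. The name `agglomerate` is hypothetical.

```python
from itertools import combinations

# Corpus data from the slide: bid phrase -> ads.
CORPUS = {
    "A": {"Ad0", "Ad3"},
    "B": {"Ad1", "Ad2", "Ad3"},
    "C": {"Ad2", "Ad3", "Ad4"},
}


def jaccard(x, y):
    """Jaccard similarity of two sets (0.0 when both are empty)."""
    union = x | y
    return len(x & y) / len(union) if union else 0.0


def agglomerate(corpus):
    """Greedily merge the most similar pair of clusters until one remains.

    Returns the merge history as (left, right, similarity) tuples.
    Assumption: a cluster's ad set is the union of its members' ad sets.
    """
    clusters = {(name,): set(ads) for name, ads in corpus.items()}
    merges = []
    while len(clusters) > 1:
        left, right = max(
            combinations(sorted(clusters), 2),
            key=lambda pair: jaccard(clusters[pair[0]], clusters[pair[1]]),
        )
        similarity = jaccard(clusters[left], clusters[right])
        merged_ads = clusters.pop(left) | clusters.pop(right)
        clusters[tuple(sorted(left + right))] = merged_ads
        merges.append((left, right, similarity))
    return merges


merges = agglomerate(CORPUS)
```

On the toy corpus the first merge joins B and C at similarity 0.5, and A joins afterwards, matching the order on the slide.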

AdSearch (cont.)
- Index structure for efficient ad search
  - Mapping Clusters of Bid Phrases to Index Terms
  - Block-based Index Structure
  - Dictionaries

Mapping Clusters of Bid Phrases to Index Terms
Figure: clusters over the bid phrases A, B, C, D, E

Block-based Index Structure
Traditional inverted index: 3 inverted lists
- Index term = bid phrase
- List entries = ads
Block-based index: 1 inverted list
- Index term = 3 bid phrases
- List entries = ad and bid phrase
Query = B

Block-based Index Structure (cont.)
- Advantages over the traditional method
  - Similar bid phrases and their corresponding ads are placed together
  - Fewer merge operations are needed, and some can be avoided entirely
  - Expanding phrase B with phrases A and C is inefficient in the traditional method
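The difference between the two layouts can be illustrated with a small Python sketch. Everything here is a simplification for illustration: the function names and the in-memory list representation are assumptions, and the toy postings reuse the A, B, C example from the earlier slides.

```python
# Traditional layout: one posting list per bid phrase.
TRADITIONAL_INDEX = {
    "A": ["Ad0", "Ad3"],
    "B": ["Ad1", "Ad2", "Ad3"],
    "C": ["Ad2", "Ad3", "Ad4"],
}


def expand_traditional(index, phrases):
    """Fetch one posting list per phrase and merge them."""
    ads = set()
    lists_read = 0
    for phrase in phrases:
        ads |= set(index.get(phrase, []))
        lists_read += 1
    return sorted(ads), lists_read


def build_block(index, phrases):
    """Store similar bid phrases in one list of (ad, bid phrase) entries."""
    return sorted((ad, phrase) for phrase in phrases for ad in index[phrase])


def expand_block(block):
    """Scan a single block; entries are sorted by ad, so duplicates are adjacent."""
    ads = []
    for ad, _phrase in block:
        if not ads or ads[-1] != ad:
            ads.append(ad)
    return ads, 1


# Expanding query B with the related phrases A and C.
traditional_ads, traditional_lists = expand_traditional(TRADITIONAL_INDEX, ["B", "A", "C"])
block = build_block(TRADITIONAL_INDEX, ["A", "B", "C"])
block_ads, block_lists = expand_block(block)
```

Both layouts return the same ads, but the traditional one reads and merges three lists where the block layout scans one.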

Dictionaries
- Dictionary D
  - Records the mapping from each bid phrase to its corresponding artificial word
  - Used to locate the block that corresponds to a bid phrase

Bid phrase    Artificial word (path)
A             6:0
B             6_5:1
C             6_5:2
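A lookup in dictionary D might be sketched as follows. The split of an artificial word into a cluster path (before the colon) and a numeric suffix (after it) is read off the table; what the suffix denotes is not stated on the slide, so it is carried along uninterpreted. The function name `locate` is hypothetical.

```python
# Dictionary D from the slide: bid phrase -> artificial word.
DICTIONARY_D = {
    "A": "6:0",
    "B": "6_5:1",
    "C": "6_5:2",
}


def locate(phrase, dictionary):
    """Return (cluster_path, suffix) for a bid phrase, or None if unknown."""
    word = dictionary.get(phrase)
    if word is None:
        return None
    path, _, suffix = word.partition(":")
    return path, int(suffix)


location_b = locate("B", DICTIONARY_D)
location_d = locate("D", DICTIONARY_D)  # D is not a known bid phrase
```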

Dictionaries (cont.)
- Dictionary C (counter dictionary)
  - Records the number of distinct ads per cluster
Corpus data C:
A = {Ad0, Ad3}
B = {Ad1, Ad2, Ad3}
C = {Ad2, Ad3, Ad4}

Cluster path    Number of distinct ads
6               |{Ad0, Ad3}| = 2
6_5             |{Ad1, Ad2, Ad3, Ad4}| = 4

Stored entries: (6, 2), (6_5, 4)
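The counts in the table can be derived from dictionary D and the corpus. In this sketch the ads of all bid phrases sharing a cluster path are pooled and counted once each, which reproduces the slide's values; the function name `build_counter_dictionary` is an assumption.

```python
from collections import defaultdict

# Dictionary D and corpus data from the slides.
DICTIONARY_D = {"A": "6:0", "B": "6_5:1", "C": "6_5:2"}
CORPUS = {
    "A": {"Ad0", "Ad3"},
    "B": {"Ad1", "Ad2", "Ad3"},
    "C": {"Ad2", "Ad3", "Ad4"},
}


def build_counter_dictionary(dictionary_d, corpus):
    """Count the distinct ads of the bid phrases stored under each cluster path."""
    ads_by_path = defaultdict(set)
    for phrase, word in dictionary_d.items():
        path = word.split(":", 1)[0]
        ads_by_path[path] |= corpus[phrase]
    return {path: len(ads) for path, ads in ads_by_path.items()}


counter_c = build_counter_dictionary(DICTIONARY_D, CORPUS)
```

Ad2 and Ad3 belong to both B and C but are counted once for path 6_5, which is why the total is 4 rather than 6.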

AdSearch (cont.)
- Query processing
  - Finding Related Bid Phrases with Corresponding Ads
  - Ranking Top-k Relevant Ads

Finding Related Bid Phrases with Corresponding Ads
- The process to find related bid phrases
  - Input: user queries
  - Look up dictionary D to get the corresponding artificial words
  - Find the minimum clusters that contain enough ads
Query: ABD
Example: for the top 2 ads, M = 1.5 × 2 = 3

Bid phrase    Artificial word (path)
A             6:0
B             6_5:1
C             6_5:2

Cluster path    Number of distinct ads
6               |{Ad0, Ad3}| = 2
6_5             |{Ad1, Ad2, Ad3, Ad4}| = 4
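The lookup steps above can be sketched in Python. The slide gives only the threshold M = 1.5 × k and the two dictionaries, so the search strategy here is an assumption: start at the phrase's own cluster path and drop trailing path components until the counter dictionary reports at least M ads or the top of the path is reached. The names `ads_needed` and `select_clusters` are hypothetical.

```python
import math

# Dictionaries from the slides.
DICTIONARY_D = {"A": "6:0", "B": "6_5:1", "C": "6_5:2"}
COUNTER_C = {"6": 2, "6_5": 4}


def ads_needed(k, factor=1.5):
    """Number of candidate ads M to collect for a top-k request."""
    return math.ceil(k * factor)


def select_clusters(query_phrases, dictionary_d, counter_c, k, factor=1.5):
    """Pick, per known query phrase, the smallest cluster path with enough ads."""
    m = ads_needed(k, factor)
    selected = {}
    for phrase in query_phrases:
        word = dictionary_d.get(phrase)
        if word is None:
            continue  # phrase is not a known bid phrase, e.g. D
        parts = word.split(":", 1)[0].split("_")
        # Assumed strategy: widen the cluster while it holds too few ads.
        while len(parts) > 1 and counter_c.get("_".join(parts), 0) < m:
            parts.pop()
        selected[phrase] = "_".join(parts)
    return m, selected


m, selected = select_clusters(["A", "B", "D"], DICTIONARY_D, COUNTER_C, k=2)
```

For the query ABD with k = 2, B's cluster 6_5 already holds 4 ≥ 3 ads, A stays at the top-level path 6, and D is skipped because it has no dictionary entry.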

Finding Related Bid Phrases with Corresponding Ads (cont.)
- The process to find related bid phrases
  - Return the clusters; those containing at least one bid phrase are stored in one group
  - Perform a multi-way merge operation to get the final results

Ad             Ad1    Ad2     Ad3        Ad4
Bid phrases    A      B, C    A, B, C    C
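A multi-way merge over sorted posting lists can be sketched with Python's `heapq.merge`. The posting lists below are illustrative (the A, B, C lists from the earlier corpus slide), and `multiway_merge` is a hypothetical name; the system's actual on-disk merge is not described on the slide.

```python
import heapq


def multiway_merge(posting_lists):
    """Merge several sorted posting lists into one sorted list without duplicates."""
    merged = []
    for ad in heapq.merge(*posting_lists):
        # Equal ads come out adjacent, so comparing with the last item removes duplicates.
        if not merged or merged[-1] != ad:
            merged.append(ad)
    return merged


postings = [
    ["Ad0", "Ad3"],         # ads for bid phrase A
    ["Ad1", "Ad2", "Ad3"],  # ads for bid phrase B
    ["Ad2", "Ad3", "Ad4"],  # ads for bid phrase C
]
final_ads = multiway_merge(postings)
```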

Ranking Top-k Relevant Ads
- A procedure expands the user query with related bid phrases and gets a list of ads
- To get the top k, use a scoring function with these components:

Q               Query
B(x)            Set of bid phrases related to x
                Similarity between x and y
tfidf(y, ad)    Term frequency and inverse document frequency
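The slide lists the ingredients of the scoring function rather than its exact form. One plausible combination, assumed here purely for illustration, sums similarity(x, y) × tfidf(y, ad) over every query phrase x and every related phrase y in B(x). The tf-idf weights below are made-up placeholders, B(x) is assumed to include x itself, and the names `score` and `top_k` are hypothetical.

```python
import heapq


def score(query, ad, related, similarity, tfidf):
    """Assumed score: sum of similarity(x, y) * tfidf(y, ad) for x in Q, y in B(x)."""
    return sum(
        similarity[(x, y)] * tfidf.get((y, ad), 0.0)
        for x in query
        for y in related.get(x, ())
    )


def top_k(query, ads, related, similarity, tfidf, k):
    """Return the k highest-scoring ads, best first."""
    return heapq.nlargest(
        k, ads, key=lambda ad: score(query, ad, related, similarity, tfidf)
    )


# Toy inputs: B(B) = {B, A, C}, with the Jaccard values from the clustering slides.
RELATED = {"B": ["B", "A", "C"]}
SIMILARITY = {("B", "B"): 1.0, ("B", "A"): 0.25, ("B", "C"): 0.5}
# Placeholder weights (not from the slides): 1.0 wherever the ad uses the phrase.
TFIDF = {
    ("A", "Ad0"): 1.0, ("A", "Ad3"): 1.0,
    ("B", "Ad1"): 1.0, ("B", "Ad2"): 1.0, ("B", "Ad3"): 1.0,
    ("C", "Ad2"): 1.0, ("C", "Ad3"): 1.0, ("C", "Ad4"): 1.0,
}
ADS = ["Ad0", "Ad1", "Ad2", "Ad3", "Ad4"]

best = top_k(["B"], ADS, RELATED, SIMILARITY, TFIDF, k=2)
```

With these placeholder weights, Ad3 ranks first because it matches B and both expansion phrases, followed by Ad2.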

Experimental evaluation
- Both Chinese and English

Experimental evaluation (cont.)

Name: Description
CQS1 (Chinese) or EQS1 (English): 100 randomly sampled bid phrases, each associated with a few distinct ads
CQS2 (Chinese) or EQS2 (English): 100 selected pairs of bid phrases, where each pair can return ads associated with both of its bid phrases
CQS3 (Chinese) or EQS3 (English): Constructed similarly, with queries composed of 3 to 4 bid phrases
CQF (Chinese Frequent Query Set) and EQF (English Frequent Query Set): Built from 100 popular bid phrases

Experimental evaluation (cont.)
- Evaluation of the clustering step

Experimental evaluation (cont.)
- Efficiency evaluation
  - AdSearch was implemented with fixed and unfixed block sizes
  - The block size is defined as the fraction of distinct ads in the block relative to all ads
  - AdSearch(0.001): the value gives the number of distinct ads in each block. For example, for the Chinese data set, 524,868 × 0.001 ≈ 525
  - Inv: query expansion performed on top of the traditional inverted index

Experimental evaluation (cont.)
- Effectiveness evaluation
  - 50 queries were randomly selected
  - 10 people were invited to evaluate the ads returned by AdSearch and Baidu

Experimental evaluation (cont.)
- Effectiveness evaluation

Conclusion
- Introduced the AdSearch system, which consists of
  - Bid phrase clustering
    - Constructs a bipartite graph over the bid phrases and ads
    - Uses agglomerative iterative clustering to cluster similar ads
  - Index structure for efficient ad search
    - Uses a block-based index structure to index all ads and bid phrases
    - Uses dictionaries to record the mappings between bid phrases and ads
  - Query processing
    - Explained how ads are retrieved and ranked to get the top-k results

THANK YOU

Introduction (cont.)
Figure: for Q = “job training”, the sets All Docs, Relevant Docs (R), Relevant Ads, and Relevant Ads in the Ads set (Ra)