Published by Ὅμηρος Ευσταχηιος Μητσοτάκης. Modified over 6 years ago.
1
Web Information Retrieval (Web IR)
Handout #13: Ranking Based on User Behavior
Ali Mohammad Zareh Bidoki
ECE Department, Yazd University
Autumn 2011
2
Finding the Ranking Function
R = f(query, user behavior, web graph and content features)
How can we use user behavior?
Explicit feedback
Implicit feedback: click-through data from search engine logs
80% of user clicks are related to the query.
Click-through data is the best measure for incorporating user behavior, because in the end it is user satisfaction that matters.
3
Click-Through Data (by Joachims)
Click-through data is a triple (q, r, c):
q = the query
r = the ranked list of results
c = the set of clicked documents
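The triple can be held in a small record type. The following is a minimal sketch; the names `ClickRecord` and `click_positions` are hypothetical and chosen only for illustration.

```python
from dataclasses import dataclass
from typing import FrozenSet, List, Tuple


@dataclass(frozen=True)
class ClickRecord:
    """One click-through triple (q, r, c). Hypothetical helper type."""
    query: str                # q: the query text
    ranking: Tuple[str, ...]  # r: ranked list of document ids shown to the user
    clicked: FrozenSet[str]   # c: set of document ids the user clicked


def click_positions(record: ClickRecord) -> List[int]:
    """Return the 1-based ranks of the clicked documents, in ranking order."""
    return [rank for rank, doc in enumerate(record.ranking, start=1)
            if doc in record.clicked]


record = ClickRecord(
    query="web ir",
    ranking=("d1", "d2", "d3", "d4"),
    clicked=frozenset({"d1", "d3"}),
)
print(click_positions(record))  # ranks of the clicked documents
```

Storing r alongside c keeps the rank of each click available, which matters because clicks are biased toward the top of the list.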
4
Benefits of Using Click-Through Data
Democracy on the Web
Fills the gap between user needs and search results
User clicks are more valuable than page content (search engine precision is judged by users, not by page creators)
Increases the degree of relevance between queries and documents (by adding click metadata to documents)
5
Web Entities
[Figure: web entities and their relations: documents 1..n linked by the web graph, words 1..w, queries 1..q, and users 1..m.]
6
Document Expansion Using Click-Through Data
Google was the first to use anchor text as part of a document's content.
Anchor text is a view of a document from another document.
7
Long-Term Incremental Learning
D_i is the vector of a document at the i-th iteration.
Q is the vector of the query for which this document was clicked.
Alpha is the learning rate.
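The update formula itself is not preserved in this transcript. As an illustration only, the sketch below assumes a common convex-combination form, D_next = (1 - alpha) * D + alpha * Q, which may differ from the rule on the original slide. The function name `update_document_vector` and the sparse dict representation are also assumptions.

```python
def update_document_vector(doc_vec, query_vec, alpha=0.1):
    """Move a document vector a step of size alpha toward a query vector.

    ASSUMED update rule (not taken from the slide):
        D_next = (1 - alpha) * D + alpha * Q
    Vectors are sparse dicts mapping term -> weight.
    """
    terms = set(doc_vec) | set(query_vec)
    return {
        t: (1 - alpha) * doc_vec.get(t, 0.0) + alpha * query_vec.get(t, 0.0)
        for t in terms
    }


doc = {"search": 1.0, "engine": 1.0}
query = {"search": 1.0, "ranking": 1.0}  # query for which the document was clicked
doc = update_document_vector(doc, query, alpha=0.5)
print(doc)
```

Under this assumed rule, each click pulls the document toward the vocabulary of the query, and a small alpha keeps any single click from dominating.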
8
Naïve Method (NM)
A bipartite graph connects documents and queries.
M_ij is the number of clicks on document j for query i.
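The click matrix can be accumulated directly from a log of (query, clicked document) pairs. This is a minimal sketch; the log format and the name `build_click_matrix` are assumptions made for illustration.

```python
from collections import Counter


def build_click_matrix(click_log):
    """Count clicks per (query, document) pair.

    click_log: iterable of (query, clicked_doc) pairs, one entry per click
               (an assumed log format).
    Returns a Counter where M[(i, j)] is the number of clicks on
    document j for query i; missing pairs count as 0.
    """
    return Counter(click_log)


log = [("q1", "d1"), ("q1", "d1"), ("q1", "d2"), ("q2", "d2")]
M = build_click_matrix(log)
print(M[("q1", "d1")], M[("q2", "d1")])
```

A sparse mapping is used rather than a dense matrix because most query-document pairs never receive a click.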
9
Naïve Method (Cont.)
The weight between query q_j and document d_i:
The metadata for document i is:
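The two formulas are not preserved in this transcript. The sketch below shows one plausible reading, stated as an assumption: the weight of a query for a document is its share of that document's clicks, and the document's metadata is the weighted bag of terms from the queries that led to clicks on it. The name `document_metadata` is hypothetical.

```python
from collections import defaultdict


def document_metadata(M):
    """Build click metadata per document from click counts.

    M: dict {(query, doc): clicks}.
    ASSUMPTION: weight(query, doc) = clicks(query, doc) / total clicks on doc.
    Returns {doc: {term: accumulated weight}}.
    """
    clicks_per_doc = defaultdict(int)
    for (query, doc), n in M.items():
        clicks_per_doc[doc] += n

    metadata = defaultdict(lambda: defaultdict(float))
    for (query, doc), n in M.items():
        weight = n / clicks_per_doc[doc]  # assumed normalization
        for term in query.split():
            metadata[doc][term] += weight
    return {doc: dict(terms) for doc, terms in metadata.items()}


M = {("web search", "d1"): 3, ("search engine", "d1"): 1}
meta = document_metadata(M)
print(meta["d1"])
```

The resulting term weights can be appended to the document's own content, much as anchor text is.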
10
Co-Visited Method
If two pages are clicked for the same query, they are called co-visited.
The similarity between two documents d_i and d_j is (visited(d_i) is the number of clicks on d_i, and visited(d_i, d_j) is the number of queries for which both are clicked):
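The similarity formula is not preserved in this transcript. The sketch below assumes a Jaccard-style ratio built from the two quantities defined above, visited(d_i, d_j) / (visited(d_i) + visited(d_j) - visited(d_i, d_j)); both this form and the name `co_visited_similarity` are assumptions.

```python
def co_visited_similarity(M, di, dj):
    """Jaccard-style co-visited similarity (ASSUMED form, see lead-in).

    M: dict {(query, doc): clicks}.
    visited(d)      = number of clicks on d
    visited(di, dj) = number of queries for which both were clicked
    """
    visited_i = sum(n for (q, d), n in M.items() if d == di)
    visited_j = sum(n for (q, d), n in M.items() if d == dj)
    queries_i = {q for (q, d), n in M.items() if d == di and n > 0}
    queries_j = {q for (q, d), n in M.items() if d == dj and n > 0}
    both = len(queries_i & queries_j)
    denom = visited_i + visited_j - both
    return both / denom if denom else 0.0


M = {("q1", "d1"): 2, ("q1", "d2"): 1, ("q2", "d1"): 1, ("q3", "d2"): 1}
print(co_visited_similarity(M, "d1", "d2"))
```

Because the number of shared queries can never exceed the click count of either document, the ratio stays between 0 and 1.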
11
Co-Visited Disadvantages
It considers only document similarity (not query similarity).
Because users click mostly on the top 10 results, click data are sparse (about 1.5 queries per page).
As a result, the similarity is not precise.
12
Iterative Method (IM)
O(q): the set of pages clicked for query q
O_i(q): the i-th clicked page for q
I(d): the set of queries for which d was clicked
I_i(d): the i-th query for which d was clicked
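The iteration itself is not preserved in this transcript. One way to use O(q) and I(d) together is a SimRank-style mutual reinforcement, where two queries are similar if their clicked pages are similar, and two pages are similar if the queries that led to them are similar. The sketch below follows that idea; the decay constant `C`, the iteration count, and the exact update are assumptions, not the slide's method.

```python
def iterative_similarity(clicks, C=0.8, iterations=5):
    """SimRank-style query/document similarity (ASSUMED formulation).

    clicks: dict {query: set of clicked docs}, i.e. O(q).
    Returns (query_sim, doc_sim), each a dict keyed by ordered pairs.
    """
    # I(d): the set of queries for which d was clicked
    I = {}
    for q, ds in clicks.items():
        for d in ds:
            I.setdefault(d, set()).add(q)

    def step(nodes, neighbors, other_sim):
        # Average the other side's similarity over all neighbor pairs.
        sim = {}
        for a in nodes:
            for b in nodes:
                na, nb = neighbors[a], neighbors[b]
                if a == b:
                    sim[(a, b)] = 1.0
                elif na and nb:
                    total = sum(other_sim[(x, y)] for x in na for y in nb)
                    sim[(a, b)] = C * total / (len(na) * len(nb))
                else:
                    sim[(a, b)] = 0.0
        return sim

    queries, docs = list(clicks), list(I)
    sq = {(a, b): float(a == b) for a in queries for b in queries}
    sd = {(a, b): float(a == b) for a in docs for b in docs}
    for _ in range(iterations):
        # Both updates read the previous round's values.
        sq, sd = step(queries, clicks, sd), step(docs, I, sq)
    return sq, sd


clicks = {"q1": {"d1", "d2"}, "q2": {"d2"}}
query_sim, doc_sim = iterative_similarity(clicks)
print(query_sim[("q1", "q2")], doc_sim[("d1", "d2")])
```

Unlike the co-visited measure, this kind of iteration lets query similarity and document similarity inform each other, which helps when click data are sparse.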
13
Experimental Results
Experiments on a large real-world click-through log (MSN query log data) indicate that, on top-20 precision, the proposed algorithm achieves a relative improvement of 157% over the baseline search system, 17% over naïve query log mining, and 17% over the co-visited algorithm.