Evidence from Behavior LBSC 796/INFM 719R Douglas W. Oard Session 7, October 22, 2007.

Slides:



Advertisements
Similar presentations
Recommender Systems & Collaborative Filtering
Advertisements

Relevance Feedback and Query Expansion
Kevin Knight,
Web Mining Research: A Survey Authors: Raymond Kosala & Hendrik Blockeel Presenter: Ryan Patterson April 23rd 2014 CS332 Data Mining pg 01.
Personalization and Search Jaime Teevan Microsoft Research.
Recommender Systems Aalap Kohojkar Yang Liu Zhan Shi March 31, 2008.
Search Engines and Information Retrieval
K nearest neighbor and Rocchio algorithm
Information Retrieval Review
1 CS 430 / INFO 430 Information Retrieval Lecture 8 Query Refinement: Relevance Feedback Information Filtering.
Evidence from Behavior LBSC 796/CMSC 828o Douglas W. Oard Session 5, February 23, 2004.
Search Session 12 LBSC 690 Information Technology.
1 CS 430: Information Discovery Lecture 21 Web Search 3.
Recall: Query Reformulation Approaches 1. Relevance feedback based vector model (Rocchio …) probabilistic model (Robertson & Sparck Jones, Croft…) 2. Cluster.
Web Mining Research: A Survey
SIMS 202 Information Organization and Retrieval Prof. Marti Hearst and Prof. Ray Larson UC Berkeley SIMS Tues/Thurs 9:30-11:00am Fall 2000.
WebMiningResearchASurvey Web Mining Research: A Survey Raymond Kosala and Hendrik Blockeel ACM SIGKDD, July 2000 Presented by Shan Huang, 4/24/2007 Revised.
ISP 433/633 Week 7 Web IR. Web is a unique collection Largest repository of data Unedited Can be anything –Information type –Sources Changing –Growing.
Information Access Douglas W. Oard College of Information Studies and Institute for Advanced Computer Studies Design Understanding.
Sigir’99 Inside Internet Search Engines: Search Jan Pedersen and William Chang.
Web Characterization Week 11 LBSC 690 Information Technology.
Recognizing User Interest and Document Value from Reading and Organizing Activities in Document Triage Rajiv Badi, Soonil Bae, J. Michael Moore, Konstantinos.
Information Retrieval
The Internet 8th Edition Tutorial 1 Browser Basics.
Information Filtering LBSC 796/INFM 718R Douglas W. Oard Session 10, November 12, 2007.
August 1, 2003SIGIR Implicit Workshop Protecting the Privacy of Observable Behavior in Distributed Recommender Systems Douglas W. Oard University of Maryland.
Personalized Ontologies for Web Search and Caching Susan Gauch Information and Telecommunications Technology Center Electrical Engineering and Computer.
Overview of Web Data Mining and Applications Part I
Slide 3.1 Saunders, Lewis and Thornhill, Research Methods for Business Students, 5 th Edition, © Mark Saunders, Philip Lewis and Adrian Thornhill 2009.
Λ14 Διαδικτυακά Κοινωνικά Δίκτυα και Μέσα
1 Announcements Research Paper due today Research Talks –Nov. 29 (Monday) Kayatana and Lance –Dec. 1 (Wednesday) Mark and Jeremy –Dec. 3 (Friday) Joe and.
Search Engines and Information Retrieval Chapter 1.
XP New Perspectives on Browser and Basics Tutorial 1 1 Browser and Basics Tutorial 1.
User Browsing Graph: Structure, Evolution and Application Yiqun Liu, Yijiang Jin, Min Zhang, Shaoping Ma, Liyun Ru State Key Lab of Intelligent Technology.
Tutorial 1: Browser Basics.
Dr. Susan Gauch When is a rock not a rock? Conceptual Approaches to Personalized Search and Recommendations Nov. 8, 2011 TResNet.
Improving Web Search Ranking by Incorporating User Behavior Information Eugene Agichtein Eric Brill Susan Dumais Microsoft Research.
Evidence from Behavior INST 734 Doug Oard Module 7.
Support.ebsco.com EBSCOhost Basic Searching for Academic Libraries Tutorial.
Information Filtering LBSC 796/INFM 718R Douglas W. Oard Session 10, April 13, 2011.
WebMining Web Mining By- Pawan Singh Piyush Arora Pooja Mansharamani Pramod Singh Praveen Kumar 1.
| 1 › Gertjan van Noord2014 Search Engines Lecture 7: relevance feedback & query reformulation.
Implicit User Feedback Hongning Wang Explicit relevance feedback 2 Updated query Feedback Judgments: d 1 + d 2 - d 3 + … d k -... Query User judgment.
Personalized Search Xiao Liu
Information Filtering LBSC 878 Douglas W. Oard and Dagobert Soergel Week 10: April 12, 1999.
Introduction to Digital Libraries hussein suleman uct cs honours 2003.
Toward A Session-Based Search Engine Smitha Sriram, Xuehua Shen, ChengXiang Zhai Department of Computer Science University of Illinois, Urbana-Champaign.
CS349 – Link Analysis 1. Anchor text 2. Link analysis for ranking 2.1 Pagerank 2.2 Pagerank variants 2.3 HITS.
Personalization with user’s local data Personalizing Search via Automated Analysis of Interests and Activities 1 Sungjick Lee Department of Electrical.
Unit 2 Capstone (Research Option) Sources. Overview of the paper (1 of 2) ● The research paper is a double-spaced 25- to 30- page paper on the topic agreed.
COMP4210 Information Retrieval and Search Engines Lecture 9: Link Analysis.
Implicit User Feedback Hongning Wang Explicit relevance feedback 2 Updated query Feedback Judgments: d 1 + d 2 - d 3 + … d k -... Query User judgment.
Evidence from Behavior
Chapter 3 Critically reviewing the literature
Web Information Retrieval Prof. Alessandro Agostini 1 Context in Web Search Steve Lawrence Speaker: Antonella Delmestri IEEE Data Engineering Bulletin.
Relevance Feedback Hongning Wang
Why Decision Engine Bing Demos Search Interaction model Data-driven Research Problems Q & A.
Enhanced hypertext categorization using hyperlinks Soumen Chakrabarti (IBM Almaden) Byron Dom (IBM Almaden) Piotr Indyk (Stanford)
1 CS 430 / INFO 430 Information Retrieval Lecture 12 Query Refinement and Relevance Feedback.
Lecture 9: Query Expansion. This lecture Improving results For high recall. E.g., searching for aircraft doesn’t match with plane; nor thermodynamic with.
Recommender Systems & Collaborative Filtering
Search Engine Architecture
Lecture 12: Relevance Feedback & Query Expansion - II
Quality of a search engine
Lesson 14 Sharing Documents
Prof. Paolo Ferragina, Algoritmi per "Information Retrieval"
Prof. Paolo Ferragina, Algoritmi per "Information Retrieval"
CSA3212: User Adaptive Systems
Filtering and Recommendation
Presentation transcript:

Evidence from Behavior LBSC 796/INFM 719R Douglas W. Oard Session 7, October 22, 2007

Picture of Relevance Feedback x x x x o o o Revised query x non-relevant documents o relevant documents o o o x x x x x x x x x x x x  x x Initial query  x

Rocchio Formula q m = modified query vector; q 0 = original query vector; α,β,γ: weights (hand-chosen or set empirically); D r = set of known relevant doc vectors; D nr = set of known irrelevant doc vectors

Rocchio Example Original query Positive Feedback Negative feedback (+) (-) New query Typically,  < 

Using Positive Information Source: Jon Herlocker, SIGIR 1999

Using Negative Information Source: Jon Herlocker, SIGIR 1999

Rating-Based Recommendation Use ratings as to describe objects –Personal recommendations, peer review, … Beyond topicality: –Accuracy, coherence, depth, novelty, style, … Has been applied to many modalities –Books, Usenet news, movies, music, jokes, beer, …

Motivations to Provide Ratings Self-interest –Use the ratings to improve system’s user model Economic benefit –If a market for ratings is created Altruism

Explicit Feedback: Assumptions A1: User has sufficient knowledge for a reasonable initial query A2: Selected examples are representative A3: The user will try giving feedback A4: The user will keep giving feedback

A1: Good Initial Query? Two problems: –User may not have sufficient initial knowledge –Few or no relevant documents may be retrieved Examples: –Misspellings (Brittany Speers) –Cross-language information retrieval –Vocabulary mismatch (e.g., cosmonaut/astronaut)

A2: Representative Examples? There may be several clusters of relevant documents Examples: –Burma/Myanmar –Contradictory government policies –Opinions

A3: Will People Use It? Efficiency –Longer queries require more processing time Understandability –Harder to see why subsequent documents retrieved Risk –Users are reluctant to provide negative feedback

A4: Self-Interest Decreases! Number of Ratings Value of ratings Marginal value to rater Marginal value to community Few Lots Marginal cost

Solving the Cost vs. Value Problem Maximize the value –Provide for continuous user model adaptation Minimize the costs –Use implicit feedback rather than explicit ratings –Minimize privacy concerns through encryption –Build an efficient scalable architecture –Limit the scope to noncompetitive activities

Solution: Reduce the Marginal Cost Number of Ratings Marginal value to rater Marginal value to community Few Lots Marginal cost

View Listen Select Print Bookmark Save Purchase Delete Subscribe Copy / paste Quote Forward Reply Link Cite Mark up Tag Publish Organize Type Edit

View Listen Select Print Bookmark Save Purchase Delete Subscribe Copy / paste Quote Forward Reply Link Cite Mark up Tag Publish Organize Behavior Category Examine Retain Reference Annotate Create Type Edit

Minimum Scope SegmentObjectClass View Listen Select Print Bookmark Save Purchase Delete Subscribe Copy / paste Quote Forward Reply Link Cite Mark up Tag Publish Organize Behavior Category Examine Retain Reference Annotate Create Type Edit

Estimating Authority from Links Authority Hub

Anchor Text Indexing Sometimes yields unexpected results: –Google ranked Microsoft #1 for “evil empire” Armonk, NY-based computer giant IBM announced today Joe’s computer hardware links Compaq HP IBM Big Blue today announced record profits for the quarter Adapted from: Manning and Raghavan

Collecting Click Streams Browsing histories are easily captured –Make all links initially point to a central site Encode the desired URL as a parameter –Build a time-annotated transition graph for each user Cookies identify users (when they use the same machine) –Redirect the browser to the desired page Reading time is correlated with interest –Can be used to build individual profiles –Used to target advertising by doubleclick.com

No Interest Low Interest Moderate Interest High Interest Rating Reading Time (seconds) Full Text Articles (Telecommunications)

More Complete Observations User selects an article –Interpretation: Summary was interesting User quickly prints the article –Interpretation: They want to read it User selects a second article –Interpretation: another interesting summary User scrolls around in the article –Interpretation: Parts with high dwell time and/or repeated revisits are interesting User stops scrolling for an extended period –Interpretation: User was interrupted

No Interest No Interest Low Interest Moderate Interest High Interest Abstracts (Pharmaceuticals)

Recommending w/Implicit Feedback Estimate Rating User Model Ratings Server User Ratings Community Ratings Predicted Ratings User Observations User Ratings User Model Estimate Ratings Observations Server Predicted Observations Community Observations Predicted Ratings User Observations

Critical Issues Protecting privacy –What absolute assurances can we provide? –How can we make remaining risks understood? Scalable rating servers –Is a fully distributed architecture practical? Non-cooperative users –How can the effect of spamming be limited?

Gaining Access to Observations Observe public behavior –Hypertext linking, publication, citing, … Policy protection –EU: Privacy laws –US: Privacy policies + FTC enforcement Statistical assurance of privacy –Distributed architecture –Model and mitigate privacy risks

Search Engine Query Logs A: Southeast Asia (Dec 27, 2004) B: Indonesia (Mar 29, 2005) C; Pakistan (Oct 10, 2005) D; Hawaii (Oct 16, 2006) E: Indonesia (Aug 8, 2007) F: Peru (Aug 16, 2007)

Search Engine Query Logs

AOL User