PROBLEM BEING ATTEMPTED
Privacy-Enhancing Personalized Web Search
  Based on: User's Existing Private Data
    Browsing History
    E-Mails
    Recent Documents
  All Data Stored on User's Personal Computer
  Classify Interests into 2 Categories
    General Interests (Less Sensitive to Privacy)
    Specific Interests (More Sensitive to Privacy)
PROPOSED SOLUTION
Automatically build user profile from available source data
  Similar Terms: Two terms whose supporting document sets overlap heavily might indicate the same interest area
    Use Jaccard Similarity
  Parent-Child Terms: Specific terms often appear together with general terms, but the opposite is not true, e.g. Badminton and Sports
    Use Conditional Probabilities
Control the information sent to the search engine
  minDetail: Determines which part of the user profile is protected
  expRatio: Measures how much private information is exposed to the server
  minDetail and expRatio are inversely related
Wrapper to personalize results
  Use previously constructed profile
  Parameter α determines the amount of personalization
  PPRank = α PersonalRank + (1-α) SearchEngineRank
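The two term relationships on this slide can be illustrated with a minimal Python sketch. It assumes each term is represented by the set of document IDs in which it occurs. The function names, the thresholds (0.6 and 0.8), and the toy document sets are hypothetical choices for illustration, not values taken from the paper.

```python
def jaccard(docs_a, docs_b):
    """Jaccard similarity of two document-ID sets: |A & B| / |A | B|."""
    union = docs_a | docs_b
    return len(docs_a & docs_b) / len(union) if union else 0.0


def cond_prob(docs_a, docs_b):
    """Estimate P(a | b): share of documents containing b that also contain a."""
    return len(docs_a & docs_b) / len(docs_b) if docs_b else 0.0


def relate(docs_a, docs_b, sim_threshold=0.6, parent_threshold=0.8):
    """Classify a term pair; both thresholds are illustrative assumptions."""
    if jaccard(docs_a, docs_b) >= sim_threshold:
        return "similar"
    # The specific term almost always co-occurs with the general one,
    # but not the other way around.
    if cond_prob(docs_a, docs_b) >= parent_threshold:
        return "a is parent of b"
    if cond_prob(docs_b, docs_a) >= parent_threshold:
        return "b is parent of a"
    return "unrelated"


# Toy corpus: term -> IDs of the documents that mention it (made-up data).
support = {
    "sports": {1, 2, 3, 4, 5, 6},
    "badminton": {2, 3},
    "athletics": {1, 2, 3, 4, 5},
}

print(relate(support["sports"], support["badminton"]))  # a is parent of b
print(relate(support["sports"], support["athletics"]))  # similar
```

In the toy data every document mentioning "badminton" also mentions "sports", while only a third of the "sports" documents mention "badminton", which is the asymmetry the conditional-probability test looks for.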
CRITICISMS
Number of Returned Results Considered
  Only the top 50 results from the original search engine are considered, but the user's preferences could project a lower-ranked result into the top 10
Measure of Search Quality
  Average Precision is used, but the authors fail to explain how this ties in with the quality of the personalization in the search
Testing the Effect of Manual Privacy Settings
  The authors don't provide experiments that test how the system would behave if users, instead of using minDetail all the time, manually excluded some terms from their profile
Classification of Terms
  A specific term might be wrongly classified as general if it occurs often enough in the user's corpus, even though it is extremely revealing as far as privacy is concerned
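For reference, Average Precision over a single ranked result list can be sketched as below. This is the common textbook variant that normalizes by the number of relevant results actually retrieved; whether the paper uses exactly this variant is an assumption, and the function name and relevance lists are made up for illustration.

```python
def average_precision(relevant_flags):
    """Average Precision of one ranked list.

    relevant_flags[i] is True when the result at rank i + 1 is relevant.
    Precision is sampled at each relevant position and then averaged.
    """
    hits = 0
    precision_sum = 0.0
    for rank, is_relevant in enumerate(relevant_flags, start=1):
        if is_relevant:
            hits += 1
            precision_sum += hits / rank
    return precision_sum / hits if hits else 0.0


# Same two relevant results, placed at different ranks (made-up judgments).
early = average_precision([True, False, True])   # (1/1 + 2/3) / 2
late = average_precision([False, True, True])    # (1/2 + 2/3) / 2
print(early > late)  # True
```

The metric rewards lists that place relevant results earlier, so it captures ranking quality in general; it does not by itself say whether a result was relevant because of the user's profile.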
RELATIONS TO COURSE TOPICS
Personalized Search
  Query Disambiguation: The authors use a variant of the course example; searching for Rockets, a sports fan wants the basketball team, not links related to space exploration
  Relevance Feedback: This system seeks relevance feedback implicitly, through the user's browsing history and other personal information, rather than explicitly
Dealing with Unstructured Data
  Browsing history, e-mails and recent documents are all unstructured
  New system to classify terms into either similarity or parent-child relationships based on occurrences
Term Similarity
  Used to classify terms; uses Jaccard Similarity and Conditional Probability
Weighted Re-Ranking of Results
  Similar idea to PageRank, but not the same
  Get the results from the search engine for the given query
  Then re-rank the results based on the user's personal profile, using weighting factor α
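The weighted re-ranking can be sketched as follows, using the formula from the solution slide, PPRank = α PersonalRank + (1-α) SearchEngineRank. The sketch assumes both ranks are 1-based positions where lower is better, and that results missing from the profile ranking get the worst position; the function name, that fallback, and the result titles are hypothetical.

```python
def pp_rank(engine_order, personal_rank, alpha):
    """Re-rank search results by alpha * PersonalRank + (1 - alpha) * SearchEngineRank.

    engine_order:  results in the order returned by the search engine
    personal_rank: result -> 1-based rank under the user profile (assumed input)
    alpha:         0 keeps the engine order, 1 uses only the profile
    """
    if not 0.0 <= alpha <= 1.0:
        raise ValueError("alpha must be in [0, 1]")
    worst = len(engine_order)  # assumed fallback for results the profile does not rank
    scored = []
    for engine_rank, result in enumerate(engine_order, start=1):
        personal = personal_rank.get(result, worst)
        scored.append((alpha * personal + (1 - alpha) * engine_rank, result))
    # list.sort is stable, so ties keep the search engine's order.
    scored.sort(key=lambda pair: pair[0])
    return [result for _, result in scored]


# Query "Rockets" for a sports fan; titles and ranks are made up.
engine_order = ["Rocket launch history", "Houston Rockets (NBA)", "Rocket propulsion"]
personal_rank = {
    "Houston Rockets (NBA)": 1,
    "Rocket launch history": 2,
    "Rocket propulsion": 3,
}

print(pp_rank(engine_order, personal_rank, alpha=0.7)[0])  # Houston Rockets (NBA)
print(pp_rank(engine_order, personal_rank, alpha=0.0)[0])  # Rocket launch history
```

With α = 0.7 the basketball result scores 0.7·1 + 0.3·2 = 1.3 against 1.7 for the launch-history page, so it moves to the top; with α = 0 the original search engine order is returned unchanged.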