Using ODP Metadata to Personalize Search Presented by Lan Nie 09/21/2005, Lehigh University.


Introduction
- ODP metadata
  - 4 million sites, 590,000 categories
  - Tree structure
    - Categories: inner nodes
    - Pages: leaf nodes, high quality, representative
- Using ODP metadata to personalize search
  - 4 billion Web pages vs. 4 million ODP entries
  - Using ODP metadata for personalized search
  - Is biasing possible in the ODP context? Extend the ODP classifications automatically, by biasing, from the current 4 million pages to the 4 billion pages of the Web

Using ODP Metadata for Personalized Search
- User profile: several ODP topics selected by the user
- Personalized search
  - Send query Q to a search engine S (e.g., Google, ODP Search)
  - Res = URLs returned by S
  - For i = 1 to size(Res): Dist[i] = Distance(Res[i], Prof)
  - Re-sort Res based on Dist
- Representation
  - Both the user profile and a URL (50% in Google Directory) can be represented as a set of nodes in the directory tree
- Distance(Profile, URL)
  - Minimum distance between the two sets of nodes
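The re-ranking loop above can be sketched in a few lines of Python. This is a minimal sketch, not the presenter's implementation: the names `personalize`, `set_distance`, `topics_of`, and `node_distance` are hypothetical, the node-to-node distance is passed in as a parameter, and placing unannotated URLs last is an assumption the slide does not state.

```python
from typing import Callable, List, Set

Topic = str  # an ODP category path such as "Top/Computers/Internet"


def set_distance(profile: Set[Topic], url_topics: Set[Topic],
                 node_distance: Callable[[Topic, Topic], float]) -> float:
    """Minimum pairwise distance between two sets of directory nodes."""
    if not profile or not url_topics:
        # Assumption: a URL without ODP annotations is treated as infinitely far.
        return float("inf")
    return min(node_distance(p, t) for p in profile for t in url_topics)


def personalize(results: List[str], profile: Set[Topic],
                topics_of: Callable[[str], Set[Topic]],
                node_distance: Callable[[Topic, Topic], float]) -> List[str]:
    """Re-sort the URLs returned by the search engine by distance to the profile.

    sorted() is stable, so URLs at equal distance keep the engine's order.
    """
    return sorted(
        results,
        key=lambda url: set_distance(profile, topics_of(url), node_distance),
    )
```

Because the sort is stable, the search engine's original order acts as the tie-breaker among results that are equally close to the profile.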

- Naïve distances
  - Minimum tree distance
    - Intra-topic links
    - Subsumer
  - Graph shortest path
    - Inter-topic links
  - Combining with Google PageRank
    - Some Google results are not annotated
- Complex distance
  - The deeper the subsumer, the more closely related the nodes
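As a rough illustration of the tree-based distances, the sketch below treats each ODP category as a slash-separated path and finds the subsumer (deepest common ancestor) by comparing path prefixes. The function names are hypothetical. `depth_aware_distance` is only an illustrative stand-in for the idea that a deeper subsumer means closer nodes; it is not the formula from the slides, which is not given here.

```python
from typing import List


def split(path: str) -> List[str]:
    """Split a category path such as 'Top/Computers/Internet' into components."""
    return [part for part in path.split("/") if part]


def subsumer_depth(a: str, b: str) -> int:
    """Depth of the deepest common ancestor: the number of shared leading components."""
    depth = 0
    for x, y in zip(split(a), split(b)):
        if x != y:
            break
        depth += 1
    return depth


def tree_distance(a: str, b: str) -> int:
    """Number of tree edges from a up to the subsumer and down to b."""
    d = subsumer_depth(a, b)
    return (len(split(a)) - d) + (len(split(b)) - d)


def depth_aware_distance(a: str, b: str) -> float:
    """Hypothetical variant: the same edge count, discounted by subsumer depth."""
    return tree_distance(a, b) / (1 + subsumer_depth(a, b))
```

With this discount, two sibling categories deep in the tree come out closer than two siblings directly under the root, even though both pairs are two edges apart.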

Experimental Results

Extending ODP Annotations to the Web
- Manual annotation of the whole Web is impossible
- Biasing is an implicit way of extending annotations to the Web
- Is biasing possible in the ODP context?
  - Are ODP entries good biasing sets for obtaining relevant results, i.e., do they generate rankings that differ enough from the non-biased ranking?
- When does biasing make a difference?
  - Find the characteristics a biasing set must exhibit in order to obtain relevant results
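The slides do not spell out the mechanics of biasing. Assuming it means personalized (topic-sensitive) PageRank, in which random jumps land only on pages of the biasing set, a small power-iteration sketch looks as follows. The function name, the damping factor of 0.85, the iteration count, and the handling of dangling pages are all assumptions made for illustration.

```python
from typing import Dict, List, Set

Graph = Dict[str, List[str]]  # page -> pages it links to


def biased_pagerank(graph: Graph, biasing_set: Set[str],
                    damping: float = 0.85, iterations: int = 50) -> Dict[str, float]:
    """PageRank whose teleport vector is uniform over the biasing set."""
    pages = set(graph) | {t for targets in graph.values() for t in targets}
    bias = biasing_set & pages
    if not bias:
        raise ValueError("biasing set has no page in the graph")
    teleport = {p: (1.0 / len(bias) if p in bias else 0.0) for p in pages}
    rank = dict(teleport)
    for _ in range(iterations):
        # Every page starts the round with its share of the random-jump mass.
        nxt = {p: (1.0 - damping) * teleport[p] for p in pages}
        for p in pages:
            out = graph.get(p, [])
            if out:
                share = damping * rank[p] / len(out)
                for q in out:
                    nxt[q] += share
            else:
                # Assumption: a dangling page hands its mass back to the biasing set.
                for q in bias:
                    nxt[q] += damping * rank[p] * teleport[q]
        rank = nxt
    return rank
```

Passing every page as the biasing set gives the ordinary, non-biased ranking, so the two rankings compared in the experiments can come from the same routine.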

Experimental Setup
- Compare the similarity between the top 100 non-biased PageRank results and the biased results
- Similarity measures
  - OSim: degree of overlap between the top n elements of two ranked lists
  - KSim: degree of agreement on ordering between the two ranked lists
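The two measures can be sketched as below. This is one reading of the informal definitions on the slide, not a verified reproduction of the formulas used in the experiments: OSim is taken as the size of the top-n intersection divided by n, and KSim as the fraction of item pairs from the union of the two top-n lists that both lists order the same way. Items missing from a list are assumed to share a tied rank at its end, and a pair tied in either list is assumed not to count as agreement. Lists are assumed to contain no duplicates.

```python
from itertools import combinations
from typing import Dict, Iterable, Sequence


def osim(a: Sequence[str], b: Sequence[str], n: int) -> float:
    """Overlap of the top-n elements of two ranked lists."""
    return len(set(a[:n]) & set(b[:n])) / n


def _extended_ranks(top: Sequence[str], universe: Iterable[str]) -> Dict[str, int]:
    """Rank of every item in the universe; items absent from `top` tie at the end."""
    ranks = {item: i for i, item in enumerate(top)}
    tail = len(top)
    for item in universe:
        ranks.setdefault(item, tail)
    return ranks


def ksim(a: Sequence[str], b: Sequence[str], n: int) -> float:
    """Fraction of pairs from the union of both top-n lists ordered the same way."""
    top_a, top_b = list(a[:n]), list(b[:n])
    universe = set(top_a) | set(top_b)
    if len(universe) < 2:
        return 1.0
    ranks_a = _extended_ranks(top_a, universe)
    ranks_b = _extended_ranks(top_b, universe)
    agree = total = 0
    for u, v in combinations(universe, 2):
        total += 1
        # Same sign of the rank difference means both lists order the pair alike.
        if (ranks_a[u] - ranks_a[v]) * (ranks_b[u] - ranks_b[v]) > 0:
            agree += 1
    return agree / total
```

In the setup described above, `n` would be 100 and the two inputs would be the non-biased and the biased rankings.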

Choice of Biasing Sets
- Top [0-10]% PageRank pages
- Top [0-2]% PageRank pages
- Randomly selected pages
- Low-PageRank pages
The sum of scores within the set was varied between % and 10% of the total sum over all pages (TOT). Experiments were done on a crawl of 3 million pages and then applied to the Stanford WebBase crawl.
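To make TOT concrete, the sketch below computes the share of the total score held by a biasing set and builds candidate sets from a score table. The helper names and the greedy way of growing a set until it reaches a target share are assumptions for illustration; the slides do not say how the sets were actually assembled.

```python
from typing import Dict, Iterable, List, Set


def tot_share(scores: Dict[str, float], biasing_set: Set[str]) -> float:
    """Sum of scores inside the set as a fraction of the sum over all pages."""
    total = sum(scores.values())
    return sum(scores[p] for p in biasing_set if p in scores) / total


def top_fraction(scores: Dict[str, float], fraction: float) -> List[str]:
    """Pages in the top `fraction` of the ranking, highest score first."""
    ranked = sorted(scores, key=scores.get, reverse=True)
    k = max(1, int(len(ranked) * fraction))
    return ranked[:k]


def grow_to_share(candidates: Iterable[str], scores: Dict[str, float],
                  target_share: float) -> Set[str]:
    """Hypothetical helper: add candidates in order until the share reaches the target."""
    total = sum(scores.values())
    chosen: Set[str] = set()
    accumulated = 0.0
    for page in candidates:
        if accumulated / total >= target_share:
            break
        chosen.add(page)
        accumulated += scores[page]
    return chosen
```

Feeding `grow_to_share` candidates from `top_fraction`, from a random shuffle, or from the bottom of the ranking would give the kinds of biasing sets listed above at a chosen TOT.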

Biasing set consists of good pages

Biasing set consists of randomly selected pages

- According to the random model of biasing, every set with TOT below 0.015% is good for biasing.
- Results are not influenced by the crawl size (3-million-page crawl vs. 120-million-page WebBase crawl).
- Entries in ODP have TOT below 0.015%, so biasing is possible in the ODP context.

Conclusions
- A personalized search algorithm that ranks URLs by the distance between the user profile and the URL in the ODP taxonomy.
- Biasing on ODP entries takes effect, so it is feasible to extend the manual ODP classification to the Web.