Block-level Link Analysis Presented by Lan Nie 11/08/2005, Lehigh University.


1 Block-level Link Analysis Presented by Lan Nie 11/08/2005, Lehigh University

2 Introduction: A web page often contains multiple semantic regions: different parts of the page have different importance and topics, and links in different semantic blocks point to pages on different topics. As a result, PageRank may miscalculate the importance of a page, and topic drift may occur in HITS. Approach: split each page into semantic blocks and apply link analysis at the block level.

3

4 Vision-Based Page Segmentation: Construct a semantic tree for a page based on its layout structure: extract blocks from the HTML DOM tree, then organize the blocks into a semantic tree based on the separators between them. Each node is a block with a degree-of-coherence (DOC) value that indicates how coherent the content in the block is.

5 Block-Level Web Graph: P is the set of all pages and B is the set of all blocks. X is the page-to-block matrix (layout structure), built from the block importance function f, which favors blocks that are large and centered over blocks that are small and in the margins. Z is the block-to-page matrix (link structure), whose entries depend on the number of pages that block i links to.
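The two matrices can be sketched in a few lines of NumPy. This is a minimal illustration, not the paper's implementation: the function names and toy data are hypothetical, the choice to normalize each page's block importances to sum to 1 is an assumption, and so is the reading that a block linking to s pages gives each of them weight 1/s (the slide's formulas did not survive in the transcript).

```python
import numpy as np

def build_page_to_block(page_blocks, importance, n_pages, n_blocks):
    """X[p, b] = importance of block b inside page p.

    Assumption: each page's block importances are normalized to sum to 1.
    """
    X = np.zeros((n_pages, n_blocks))
    for p, blocks in page_blocks.items():
        for b in blocks:
            X[p, b] = importance[b]
        total = X[p].sum()
        if total > 0:
            X[p] /= total
    return X

def build_block_to_page(block_links, n_blocks, n_pages):
    """Z[b, p] = 1 / s_b if block b links to page p, else 0.

    Assumption: s_b is the number of distinct pages that block b links to.
    """
    Z = np.zeros((n_blocks, n_pages))
    for b, targets in block_links.items():
        targets = set(targets)
        for p in targets:
            Z[b, p] = 1.0 / len(targets)
    return Z

# Hypothetical toy web: 3 pages, 4 blocks.
page_blocks = {0: [0, 1], 1: [2], 2: [3]}        # page -> blocks it contains
importance = {0: 3.0, 1: 1.0, 2: 1.0, 3: 1.0}    # raw output of f per block
block_links = {0: [1], 1: [1, 2], 2: [0], 3: []}  # block -> pages it links to

X = build_page_to_block(page_blocks, importance, n_pages=3, n_blocks=4)
Z = build_block_to_page(block_links, n_blocks=4, n_pages=3)
```

In this toy example the large, centered block 0 carries three quarters of page 0's importance, while block 3 has no out-links and therefore an all-zero row in Z.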

6 W_P, the Page-to-Page Graph: a weighted adjacency matrix in which links in blocks with a high importance value receive more weight than links in blocks with a low importance value.
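One way to obtain such a weighted page-to-page matrix is to multiply the page-to-block matrix by the block-to-page matrix. The sketch below assumes that construction (the shapes are compatible: pages × blocks times blocks × pages) and uses made-up numbers; the slide's own formula is not preserved in the transcript.

```python
import numpy as np

# Hypothetical example: 3 pages, 4 blocks.
# Page 0 has an important block 0 (weight 0.8) and a minor block 1 (weight 0.2).
X = np.array([
    [0.8, 0.2, 0.0, 0.0],   # page 0 -> blocks 0 and 1
    [0.0, 0.0, 1.0, 0.0],   # page 1 -> block 2
    [0.0, 0.0, 0.0, 1.0],   # page 2 -> block 3
])

# Block 0 links to page 1, block 1 links to page 2, blocks 2 and 3 link to page 0.
Z = np.array([
    [0.0, 1.0, 0.0],
    [0.0, 0.0, 1.0],
    [1.0, 0.0, 0.0],
    [1.0, 0.0, 0.0],
])

# Assumed construction: W_P[i, j] sums, over the blocks of page i,
# (importance of the block in page i) * (weight of that block's link to page j).
W_P = X @ Z
```

Here the link from page 0 to page 1 gets weight 0.8 because it sits in the important block, while the link to page 2 gets only 0.2.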

7 W_B, the Block-to-Block Graph (not used in this paper): as an extension, the probability of jumping from block a to block b within a page is the DOC value of the smallest block containing both a and b.

8 Block-Level PageRank (BLPR): Apply PageRank to the weighted adjacency matrix W_P, in which each edge is weighted by the importance value of the block containing the link. Pages pointed to by advertisement hyperlinks may not receive a large score, since such links usually sit in less important blocks. Block-level PageRank can thus reflect the semantic structure of the web.
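A minimal power-iteration sketch of PageRank on a weighted adjacency matrix is shown below. The function name, the damping factor of 0.85, the uniform treatment of pages without out-links, and the toy matrices are all assumptions made for illustration, not details taken from the presentation.

```python
import numpy as np

def weighted_pagerank(W, damping=0.85, tol=1e-10, max_iter=200):
    """PageRank by power iteration on a weighted adjacency matrix W.

    W[i, j] is the weight of the link from page i to page j.
    Assumption: rows with no out-links jump uniformly to every page.
    """
    W = np.asarray(W, dtype=float)
    n = W.shape[0]
    row_sums = W.sum(axis=1, keepdims=True)
    safe_sums = np.where(row_sums > 0, row_sums, 1.0)
    # Row-normalize so each row is a probability distribution.
    M = np.where(row_sums > 0, W / safe_sums, 1.0 / n)
    rank = np.full(n, 1.0 / n)
    for _ in range(max_iter):
        new_rank = (1.0 - damping) / n + damping * (rank @ M)
        converged = np.abs(new_rank - rank).sum() < tol
        rank = new_rank
        if converged:
            break
    return rank

# Page 0 links to page 1 from an important block and to page 2 from a minor one.
W_block = np.array([[0.0, 0.8, 0.2], [1.0, 0.0, 0.0], [1.0, 0.0, 0.0]])
# The same graph with both links weighted equally (plain page-level view).
W_plain = np.array([[0.0, 0.5, 0.5], [1.0, 0.0, 0.0], [1.0, 0.0, 0.0]])

blpr = weighted_pagerank(W_block)
pr = weighted_pagerank(W_plain)
```

With equal weights, pages 1 and 2 tie; with block weights, page 2 (reached only through the minor block) falls behind page 1, which is the effect described for advertisement links.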

9 Block-Level HITS (BLHITS): Apply HITS to the block-to-page matrix Z. A page has only an authority score A, and a block has only a hub score H. Because different parts of a page are treated as separate hubs, the links they contain are treated differently.
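The hub and authority updates on a block-to-page matrix can be sketched as follows. This is an illustrative version of the standard HITS iteration with blocks as hubs and pages as authorities; the function name, the fixed iteration count, the L2 normalization, and the unweighted toy matrix are assumptions.

```python
import numpy as np

def block_level_hits(Z, n_iter=50):
    """HITS on a block-to-page matrix Z (blocks are hubs, pages are authorities).

    Z[b, p] > 0 means block b links to page p. Returns (hub, authority).
    """
    Z = np.asarray(Z, dtype=float)
    n_blocks, n_pages = Z.shape
    hub = np.ones(n_blocks)
    auth = np.ones(n_pages)
    for _ in range(n_iter):
        # A page is a good authority if good hub blocks point to it.
        auth = Z.T @ hub
        norm = np.linalg.norm(auth)
        if norm > 0:
            auth = auth / norm
        # A block is a good hub if it points to good authority pages.
        hub = Z @ auth
        norm = np.linalg.norm(hub)
        if norm > 0:
            hub = hub / norm
    return hub, auth

# Hypothetical toy data: blocks 0 and 1 both link to pages 0 and 1;
# block 2 links only to page 2.
Z = np.array([
    [1.0, 1.0, 0.0],
    [1.0, 1.0, 0.0],
    [0.0, 0.0, 1.0],
])
hub, auth = block_level_hits(Z)
```

Pages 0 and 1, which are co-cited by two blocks, end up with the highest authority scores, and blocks 0 and 1 with the highest hub scores.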

10 Main differences between BLHITS and HITS: Links run from blocks to pages rather than from pages to pages. The root set is made up of top-ranked blocks rather than top-ranked pages. When expanding the root set, only the out-links contained in a page's top-ranked blocks are considered, instead of all links. Content analysis is combined at the block level instead of the page level. Links are weighted by the importance value of the block divided by the maximum block importance value.
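The link-weighting rule and the restriction to top-ranked blocks can be illustrated with a short sketch. The function names, block labels, and values are hypothetical, and the maximum is assumed to be taken over whatever set of blocks is passed in (the slide does not say whether it is per page or global).

```python
def link_weights(block_importance):
    """Weight for links in each block: importance / maximum importance.

    Assumption: the maximum is taken over the blocks given in the argument.
    """
    max_importance = max(block_importance.values())
    if max_importance <= 0:
        raise ValueError("at least one block needs a positive importance value")
    return {block: value / max_importance
            for block, value in block_importance.items()}

def top_ranked_blocks(block_scores, k):
    """Return the k highest-scoring block ids, best first."""
    ranked = sorted(block_scores, key=block_scores.get, reverse=True)
    return ranked[:k]

# Hypothetical importance values for the blocks of one page.
importance = {"main": 0.6, "sidebar": 0.3, "footer": 0.1}
weights = link_weights(importance)
kept = top_ranked_blocks(importance, k=2)
```

Links in the most important block get weight 1, and only out-links from the blocks in `kept` would be followed when expanding the root set.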

11 Experiments: Dataset: TREC2003. Relevance weighting: BM2500. Comparisons: PR vs. BLPR and HITS vs. BLHITS. For HITS and BLHITS, the root set size is 200 and the in-link parameter d is 50, and Bharat and Henzinger's ideas are adopted: eliminate mutually reinforcing relationships between hosts, and combine connectivity and content analysis.

12 Results on PR & BLPR 1. First 15 pages in the .GOV dataset

13 2. Results on TREC2003: the relevance score (from BM2500) is combined with the importance score (from the ranking algorithm).
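The slide does not say how the two scores are combined. As one possible sketch, the example below assumes a simple linear combination after min-max scaling; the function names, the mixing parameter `alpha`, and the sample scores are hypothetical and not taken from the presentation.

```python
def min_max_scale(scores):
    """Scale a dict of scores to the range [0, 1]."""
    low, high = min(scores.values()), max(scores.values())
    if high == low:
        return {doc: 0.0 for doc in scores}
    return {doc: (value - low) / (high - low) for doc, value in scores.items()}

def combine_scores(relevance, importance, alpha=0.8):
    """Assumed combination: alpha * relevance + (1 - alpha) * importance."""
    rel = min_max_scale(relevance)
    imp = min_max_scale(importance)
    return {doc: alpha * rel[doc] + (1.0 - alpha) * imp[doc] for doc in rel}

# Hypothetical scores for three documents.
relevance = {"d1": 12.0, "d2": 10.0, "d3": 4.0}       # e.g. from a BM2500-style ranker
importance = {"d1": 0.01, "d2": 0.05, "d3": 0.02}     # e.g. from PR or BLPR

combined = combine_scores(relevance, importance, alpha=0.8)
ranking = sorted(combined, key=combined.get, reverse=True)
```

Varying `alpha` shifts the balance between query relevance and link-based importance.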

14 Results on HITS & BLHITS

15 Summary

