Context-Based Metrics For Evaluating Changes to Web Pages Thesis Defense By Suvendu Kumar Dash Texas A&M University.


1 Context-Based Metrics For Evaluating Changes to Web Pages. Thesis Defense by Suvendu Kumar Dash, Texas A&M University

2 Overview: The Walden’s Paths Project; The Path Manager; Context-Based Module; Results; Future Work; Questions

3 The Walden’s Paths Project Goals: Walden’s Paths is an application for K-12 educators to organize World-Wide Web material for their students’ use. Students using a path get a cohesive view of the material and may browse off the path freely, with the assurance that they can easily return to it.

4 Path Manager Goals: Path Manager is the module of the Walden’s Paths project that manages the ever-changing web pages. Previous Work: Detect content-based changes, presentation changes, and structural changes in the web pages.

5 Context-Based Module Goal: To give meaning to changes with respect to the path rather than the previous version of the page. Input: The output of the parser in Path Manager, which gives the terms present in the document. Output: Context-Based as well as Content-Based Metrics of Change.

6 Creation of Context-Based and Content-Based Metrics of Change [Diagram components: Parser; Signature File; Path File; Contextual Analyzer; Context-Based Metrics and Content-Based Metrics]

7 Implementation Steps:
  1. Find the Term Vector of each web page in the path by putting every word in the document, except stop words such as "a", "and", and "the", into a vector. The Page Term Vectors for all pages are saved in the Signature File of the path.
  2. Compute a Path Vector as a composition of the Term Vectors of all web pages in the path except the page whose change is being evaluated.
  3. Calculate the angle between the Path Vector and the Page Term Vector from their cosine similarity, then compare this angle with the one for the previous version of the page. The difference between the two angles is used to compute the degree of change to the web page.
  The algorithm was tested with existing paths to determine the change values that convey the different degrees of change for web pages.
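The steps above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the Path Manager's code: the slides do not say how the Term Vectors are "composed" into a Path Vector (summing term counts is assumed here), the stop-word list is abbreviated, and the sign convention (old angle minus new angle, so a negative value means the page moved away from the path) was chosen because it agrees with the signs in the results slides. All function names are hypothetical.

```python
import math
import re
from collections import Counter

# Assumed, abbreviated stop-word list; the slides only name "a, and, the, etc."
STOP_WORDS = {"a", "an", "and", "the", "of", "to", "in", "is", "on", "for"}


def term_vector(text):
    """Count every non-stop word in a page's text."""
    words = re.findall(r"[a-z0-9']+", text.lower())
    return Counter(w for w in words if w not in STOP_WORDS)


def path_vector(page_vectors, skip_index):
    """Sum the term vectors of all pages except the one being evaluated.

    Summation is an assumption; the slides only say "a composition".
    """
    total = Counter()
    for i, vec in enumerate(page_vectors):
        if i != skip_index:
            total.update(vec)
    return total


def angle_degrees(u, v):
    """Angle between two term vectors, derived from their cosine similarity."""
    dot = sum(u[t] * v[t] for t in u.keys() & v.keys())
    norm_u = math.sqrt(sum(c * c for c in u.values()))
    norm_v = math.sqrt(sum(c * c for c in v.values()))
    if norm_u == 0 or norm_v == 0:
        return 90.0  # treat an empty page as unrelated to everything
    cosine = min(1.0, dot / (norm_u * norm_v))  # clamp rounding error
    return math.degrees(math.acos(cosine))


def context_change(path_pages, index, new_text):
    """Old page-to-path angle minus new page-to-path angle (assumed sign)."""
    vectors = [term_vector(p) for p in path_pages]
    path = path_vector(vectors, index)
    old_angle = angle_degrees(path, vectors[index])
    new_angle = angle_degrees(path, term_vector(new_text))
    return old_angle - new_angle


if __name__ == "__main__":
    pages = [
        "lions hunt zebras on the savanna",
        "zebras and lions live on the savanna",
        "the savanna has lions",
    ]
    # Replacing the first page with off-topic text yields a negative value.
    print(context_change(pages, 0, "stock markets fell sharply today"))
```

Leaving the evaluated page out of the Path Vector keeps the page from being compared against itself, which is why `path_vector` takes the index to skip.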

8 Testing Steps:
  - Pages were collected from a particular category of the Yahoo! Directories, and paths were built on those pages.
  - One of the pages was changed to a page about elephants, and the Context-Based Metrics were evaluated.
  - The same page was changed to a CNN Financials page, and the Context-Based Metrics were evaluated.
  - The same page was then changed to a similar page (within the same context), and the Context-Based Metrics were evaluated.

9 Results (From 20 Collections/Paths)
Columns: (A) the whole page changed to a page about elephants; (B) the whole page changed to a CNN Financials page; (C) the whole page changed to a similar page.
Angle of the changed page to the original page (in degrees)
  Average: (A) 78.0996; (B) 81.85915; (C) 75.13672
  Range: (A) 30.77 to 88.15; (B) 77.025 to 87.726; (C) 35.091 to 84.5424
  Standard Deviation: (A) 15.65433; (B) 2.890859; (C) 10.75518
Proportional Algorithm, Degrees of change (High Level, Medium and Lowest): High Level
Angle of the page to the path (in degrees)
  Average: (A) -7.81489; (B) -9.07515; (C) 1.94304
  Range: (A) -23.19 to 1.639; (B) -45.03 to 0.876; (C) -15.18 to 14.3
  Standard Deviation: (A) 6.946088; (B) 10.57454; (C) 6.801181

10 Results (Path About Movies)
Columns: (A) the whole page changed to a page about elephants; (B) the whole page changed to a CNN Financials page; (C) the whole page changed to a NY Times Movies page (Similar Page).
Angle of the changed page to the original page (in degrees): (A) 87.025; (B) 83.065; (C) 80.023
Proportional Algorithm: High Level
Degrees of change (High Level, Medium and Lowest): (A) -15.622; (B) -5.6759; (C) 2.8108

11 Results (Path About Search Engines)
Columns: (A) the whole page changed to a page about elephants; (B) the whole page changed to a CNN Financials page; (C) the whole page changed to a page about the Internet (Similar Page).
Angle of the changed page to the original page (in degrees): (A) 85.746; (B) 78.537; (C) 73.729
Proportional Algorithm: High Level
Degrees of change (High Level, Medium and Lowest): (A) -18.737; (B) -7.425; (C) +0.932

12 Results (Path About Texas History)
Columns: (A) the whole page changed to a page about Mexican history; (B) the whole page changed to a page about elephants; (C) the whole page changed to a CNN Financials page; (D) the whole page changed to a page about Texas history (Similar Page).
Angle of the changed page to the original page (in degrees): (A) 82.52; (B) 85.51; (C) 80.92; (D) 77.374
Proportional Algorithm: High Level, High Level
Degrees of change (High Level, Medium and Lowest): (A) 1.38; (B) -6.66; (C) -3.19; (D) 5.35

13 Results (Path About Indian History)
Columns: (A) the whole page changed to a page about giraffes; (B) the whole page changed to a page about elephants; (C) the whole page changed to a CNN Financials page; (D) the whole page changed to a page about Indian history (Similar Page).
Angle of the changed page to the original page (in degrees): (A) 87.916; (B) 30.77; (C) 87.726; (D) 84.5424
Proportional Algorithm: High Level, High Level
Degrees of change (High Level, Medium and Lowest): (A) -18.821; (B) 1.639; (C) -18.371; (D) -3.012

14 Future Work: Give more weight to headings, bold text, etc. For this to work, the Parser in the Path Manager needs to be modified so that it can extract this information.
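One way to picture the proposed weighting is a parser that counts a term several times when it appears inside a heading or bold tag. The sketch below uses Python's standard `html.parser` module; it is not the Path Manager's parser, and the tag set, the weight of 3, the abbreviated stop-word list, and the names `WeightedTermParser` and `weighted_term_vector` are all assumptions made for illustration.

```python
import re
from collections import Counter
from html.parser import HTMLParser

# Assumed set of tags that mark emphasized text.
EMPHASIS_TAGS = {"h1", "h2", "h3", "h4", "h5", "h6", "b", "strong"}
# Assumed, abbreviated stop-word list.
STOP_WORDS = {"a", "an", "and", "the", "of", "to", "in", "is", "on", "for"}


class WeightedTermParser(HTMLParser):
    """Build a term vector in which emphasized terms count more."""

    def __init__(self, emphasis_weight=3):
        super().__init__()
        self.emphasis_weight = emphasis_weight  # hypothetical weight
        self.depth = 0  # how many emphasis tags are currently open
        self.terms = Counter()

    def handle_starttag(self, tag, attrs):
        if tag in EMPHASIS_TAGS:
            self.depth += 1

    def handle_endtag(self, tag):
        if tag in EMPHASIS_TAGS and self.depth > 0:
            self.depth -= 1

    def handle_data(self, data):
        weight = self.emphasis_weight if self.depth > 0 else 1
        for word in re.findall(r"[a-z0-9']+", data.lower()):
            if word not in STOP_WORDS:
                self.terms[word] += weight


def weighted_term_vector(html, emphasis_weight=3):
    """Return the weighted term counts for an HTML page."""
    parser = WeightedTermParser(emphasis_weight)
    parser.feed(html)
    parser.close()
    return parser.terms


if __name__ == "__main__":
    page = "<h1>Elephants</h1><p>Elephants eat <b>grass</b> and leaves</p>"
    print(weighted_term_vector(page))
```

The resulting vector can be used wherever an unweighted Term Vector would be, so the angle computation itself does not need to change.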

15 Results: Path about Elephants (with Headings given more weight)
Columns: (A) page with one paragraph changed; (B) the whole page changed to a page about giraffes; (C) the whole page changed to a CNN Financials page; (D) the whole page changed to a page about elephants (Similar Page).
Angle of the changed page to the original page (in degrees): (A) 20.59; (B) 80.668; (C) 81.041; (D) 75.5819
Angle of the changed page to the original page (in degrees) with more weight given to headings: (A) 8.366; (B) 86.37; (C) 86.52; (D) 4.32
Proportional Algorithm, Degrees of change (High Level, Medium and Lowest): Medium Level (Green), High Level
Angle of the page to the path (in degrees): (A) 0.913; (B) -4.658; (C) -7.202; (D) -1.837
Angle of the page to the path (in degrees) with more weight given to headings: (A) -0.254; (B) -52.707; (C) -54.179; (D) +0.756

16 Questions?

