Managing Distributed Collections: Evaluating Web Page Change, Movement, and Replacement Richard Furuta and Frank Shipman Center for the Study of Digital.


1 Managing Distributed Collections: Evaluating Web Page Change, Movement, and Replacement Richard Furuta and Frank Shipman Center for the Study of Digital Libraries and the Department of Computer Science Texas A&M University

2 Distributed Collections
The Web is continuously changing
– .gov and .edu pages change less frequently than .com pages (1999)
Collections are needed to “organize” the Web
– Bookmark lists
– Yahoo! directories
– Web portals (NSDL)
– Walden’s Paths
Collection managers cannot control changes

3 Changes to Items in Collections
Items in collections
– Play specific roles
– Are semantically related
   To each other
   To the collection
Change to an item may
– Change its relationship to the collection
   Less coherent with other items (default assumption)
   More coherent, or no change in relationship
– Affect the role it plays in the collection
   Less suitable (default assumption)
   More suitable, or no effect on the role

4 Research Focuses
Develop techniques to help collection managers cope with changes
– Change, migration, disappearance
Categories of Change
– Missing pages (migration and disappearance)
   Find exact matches
   Suggest similar pages
– Changed pages: characterizing change
   Quantity of change
   Nature of change
   Relevance to the collection
Implementation: Path Manager – A tool that helps collection managers cope with changes

5 Management of Distributed Collections
Detection of change is easy
Determination of
– Quantity of change is relatively easy
– Relevance of change is less easy
– Meaning of change is difficult
Approaches
– Human validation (Yahoo! surfers)
– Automatic detection of change (Path Manager)
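To illustrate why detection alone is the easy part, here is a minimal sketch that flags a change by comparing a stored digest of a page with a digest of its current bytes. The function names `fingerprint` and `has_changed` are hypothetical, and nothing in the slides says Path Manager detects change this way. A digest only answers "did anything change?" and says nothing about quantity, relevance, or meaning.

```python
import hashlib


def fingerprint(page_bytes: bytes) -> str:
    """Return a SHA-256 hex digest of the raw page bytes."""
    return hashlib.sha256(page_bytes).hexdigest()


def has_changed(stored_digest: str, current_bytes: bytes) -> bool:
    """True if the current page bytes differ from the stored snapshot."""
    return fingerprint(current_bytes) != stored_digest


if __name__ == "__main__":
    snapshot = b"<html><body>Walden's Paths</body></html>"
    digest = fingerprint(snapshot)
    # Any byte-level edit, however trivial, is reported as a change.
    print(has_changed(digest, snapshot))
    print(has_changed(digest, snapshot + b"\n"))
```

Because even a trailing newline trips this check, a manager still needs the later measures to decide whether a change matters.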

6 Path Manager – The tool
Collection-level overview
Page-level overview
Page details
Types of change
– Content changes (what)
– Presentation changes (how)
– Structural changes (linking)
– Behavioral changes (scripting – not addressed)
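The three addressed types of change can be illustrated with a rough sketch that splits a page into separate signals and reports which ones differ between two versions. This is an assumption-laden illustration, not Path Manager's method: visible text stands in for content, the sequence of tag names stands in for presentation, and the list of `href` targets stands in for structure. The names `PageFacets`, `facets`, and `changed_facets` are hypothetical.

```python
from html.parser import HTMLParser


class PageFacets(HTMLParser):
    """Collects text runs, tag names, and link targets from one page."""

    def __init__(self):
        super().__init__()
        self.text = []
        self.tags = []
        self.links = []

    def handle_starttag(self, tag, attrs):
        self.tags.append(tag)
        if tag == "a":
            href = dict(attrs).get("href")
            if href:
                self.links.append(href)

    def handle_data(self, data):
        # Simplification: text inside <script>/<style> is not filtered out.
        if data.strip():
            self.text.append(data.strip())


def facets(html: str) -> dict:
    parser = PageFacets()
    parser.feed(html)
    parser.close()
    return {
        "content": " ".join(parser.text),      # what the page says
        "presentation": tuple(parser.tags),    # crude proxy for how it looks
        "structure": tuple(parser.links),      # what it links to
    }


def changed_facets(old_html: str, new_html: str) -> list:
    """Return the names of the facets that differ between two versions."""
    old, new = facets(old_html), facets(new_html)
    return [name for name in old if old[name] != new[name]]
```

For example, swapping `<b>` for `<i>` around the same words would be reported as a presentation change only, while editing an `href` would be reported as a structural change.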

7 Collection-level Overview

8 Page-level Overview
[Screenshot; page status labels shown: Little change, Server unreachable, 404 error, No change, Drastic change]

9 Page Details
[Screenshot; panels shown: Page information, Modification details]

10 Content-based Metrics
Angle between original and replacing pages (in degrees)
Replaced with         Page about elephants    CNN Financials page
Average               78.1                    81.9
Range                 30.8 to 88.1            77.0 to 87.7
Standard deviation    15.65                   2.89
High angle of change for all cases
Change is change…
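As a rough illustration of an angle between two pages, the sketch below builds raw term-frequency vectors and measures the angle between them in degrees. The slides do not say how the vectors were built (tokenization, weighting, stop words), so the tokenizer and the unweighted counts here are assumptions, and `term_vector` and `angle_degrees` are hypothetical names.

```python
import math
import re
from collections import Counter


def term_vector(text: str) -> Counter:
    """Raw term counts over lowercase alphanumeric tokens (an assumption)."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def angle_degrees(a: Counter, b: Counter) -> float:
    """Angle between two term vectors: 0 = same direction, 90 = no shared terms."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 90.0  # treat an empty page as unrelated to everything
    cosine = max(-1.0, min(1.0, dot / (norm_a * norm_b)))  # guard rounding
    return math.degrees(math.acos(cosine))


if __name__ == "__main__":
    original = term_vector("African elephants live in herds")
    replacement = term_vector("Stock markets closed higher today")
    print(angle_degrees(original, replacement))
```

With non-negative counts the angle always falls between 0 and 90 degrees, which is why an unrelated replacement lands near the top of that range regardless of its topic.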

11 Context-based Change Detection
Context consists of
– Content from other pages in the path
– Annotations created by the author
– Additional metadata provided by the author
Distinguishes between edited and replaced pages

12 Evaluation
20 paths, pages selected from Yahoo! Directories
Each path consisted of 10 to 12 pages
Pages were randomly selected
– No Flash presentations or images
A page in each path was randomly selected for replacement
Each selected page was replaced by 3 pages
– CNN Financials (large change)
– Elephants (large change)
– A page from the same Yahoo! Directory (small change)

13 Results – Distribution of Context-based changes
Replacements resulting in moving towards and away from the context vector
                                                   More than -4    -4 to 2       More than 2
Replacement by a member of the Yahoo! Directory    1 (5%)          10 (50%)      9 (45%)
Replacement by non-member                          25 (62.5%)      15 (37.5%)    0 (0%)
Experimental thresholds
Negative angle = divergence from the collection
Distinction between similar and different pages
Managers can now focus on divergent pages
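One way to read the thresholds above is sketched below. It assumes the reported value is the page's angle to the context vector before replacement minus its angle after replacement, so a negative value means the new page sits farther from the collection. It also assumes "More than -4" means values below -4. Neither reading is confirmed by the slides, and the names `context_shift` and `bucket` are hypothetical.

```python
def context_shift(angle_before: float, angle_after: float) -> float:
    """Signed movement relative to the context vector, in degrees.

    Assumed convention: negative = the page moved away from the context.
    """
    return angle_before - angle_after


def bucket(shift: float, low: float = -4.0, high: float = 2.0) -> str:
    """Place a shift into the three ranges of the results table."""
    if shift < low:
        return "diverging"   # assumed meaning of the "More than -4" column
    if shift > high:
        return "converging"  # the "More than 2" column
    return "stable"          # the "-4 to 2" column


if __name__ == "__main__":
    # A page that was 70 degrees from the context and is now 80 degrees away.
    print(bucket(context_shift(70.0, 80.0)))
```

Under this reading, a manager would review only the pages labelled "diverging", the range that held 25 of the 40 non-member replacements and 1 of the 20 same-directory replacements.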

14 For more information on Walden’s Paths
http://www.csdl.tamu.edu/walden/
walden@csdl.tamu.edu
Principal Investigators:
Richard Furuta (furuta@csdl.tamu.edu)
Frank Shipman (shipman@csdl.tamu.edu)

