Managing Distributed Collections: Evaluating Web Page Change, Movement, and Replacement Richard Furuta and Frank Shipman Center for the Study of Digital.


Managing Distributed Collections: Evaluating Web Page Change, Movement, and Replacement
Richard Furuta and Frank Shipman
Center for the Study of Digital Libraries and the Department of Computer Science
Texas A&M University

Distributed Collections
The Web is continuously changing
– .gov and .edu pages change less frequently than .com pages (1999)
Collections are needed to “organize” the Web
– Bookmark lists
– Yahoo! directories
– Web portals (NSDL)
– Walden’s Paths
Collection managers cannot control changes

Changes to Items in Collections
Items in collections
– Play specific roles
– Are semantically related
    – To each other
    – To the collection
Change to an item may
– Change its relationship to the collection
    – Less coherent with other items (default assumption)
    – More coherent, or no change in relationship
– Affect the role it plays in the collection
    – Less suitable (default assumption)
    – More suitable, or no effect on the role

Research Focuses
Develop techniques to help collection managers cope with changes
– Change, migration, disappearance
Categories of Change
– Missing pages (migration and disappearance)
    – Find exact matches
    – Suggest similar pages
– Changed pages: characterizing change
    – Quantity of change
    – Nature of change
    – Relevance to the collection
Implementation: Path Manager, a tool that helps collection managers cope with changes

Management of Distributed Collections
Detection of change is easy
Determination of
– Quantity of change is relatively easy
– Relevance of change is less easy
– Meaning of change is difficult
Approaches
– Human validation (Yahoo! surfers)
– Automatic detection of change (Path Manager)

Path Manager – The tool
Collection-level overview
Page-level overview
Page details
Types of change
– Content changes (what)
– Presentation changes (how)
– Structural changes (linking)
– Behavioral changes (scripting; not addressed)

Collection-level Overview

Page-level Overview
– Little change
– Server unreachable
– 404 error
– No change
– Drastic change

Page Details
– Page information
– Modification details

Content-based Metrics
Angle between original and replacing pages (in degrees)

Replaced with         Page about elephants    CNN Financials page
Average
Range                 30.8 to                 to 87.7
Standard deviation

High angle of change for all cases
Change is change…
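The angle reported on this slide can be illustrated with a small sketch. It assumes each page is reduced to a bag-of-words term-frequency vector and that the angle is the arccosine of the cosine similarity; the tokenizer, the raw term-frequency weighting, and the function names `term_vector` and `angle_degrees` are illustrative assumptions, not details confirmed by the slides.

```python
import math
import re
from collections import Counter


def term_vector(text: str) -> Counter:
    """Bag-of-words term frequencies (assumption: lowercase alphabetic tokens)."""
    return Counter(re.findall(r"[a-z]+", text.lower()))


def angle_degrees(a: Counter, b: Counter) -> float:
    """Angle between two term-frequency vectors, in degrees."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 90.0  # assumption: an empty page is treated as orthogonal
    # Clamp to [-1, 1] to guard against floating-point drift before acos.
    cosine = max(-1.0, min(1.0, dot / (norm_a * norm_b)))
    return math.degrees(math.acos(cosine))
```

With non-negative term counts the angle lies between 0 (identical term distribution) and 90 degrees (no shared terms), so any replacement with an unrelated page yields a large angle regardless of how well the replacement fits the collection.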

Context-based Change Detection
Context consists of
– Content from other pages in the path
– Annotations created by the author
– Additional metadata provided by the author
Distinguishes between edited and replaced pages
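One way to read the context-based idea is sketched below. It assumes the context is a single term-frequency vector summed over the other pages in the path, the annotations, and the metadata, and that the change is the old page's angle to the context minus the new page's angle, so a negative value means the page moved away from the collection. That formula and the names `context_vector` and `context_change` are assumptions made for illustration, not the Path Manager implementation.

```python
import math
import re
from collections import Counter


def term_vector(text):
    """Bag-of-words term frequencies (assumption: lowercase alphabetic tokens)."""
    return Counter(re.findall(r"[a-z]+", text.lower()))


def angle_degrees(a, b):
    """Angle between two term-frequency vectors, in degrees."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norms = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(
        sum(v * v for v in b.values())
    )
    if norms == 0:
        return 90.0  # assumption: an empty vector is treated as orthogonal
    return math.degrees(math.acos(max(-1.0, min(1.0, dot / norms))))


def context_vector(other_pages, annotations=(), metadata=()):
    """Sum the term vectors of everything that surrounds the page in its path."""
    context = Counter()
    for text in (*other_pages, *annotations, *metadata):
        context.update(term_vector(text))
    return context


def context_change(old_text, new_text, context):
    """Signed change: positive = moved toward the context, negative = diverged."""
    old_angle = angle_degrees(term_vector(old_text), context)
    new_angle = angle_degrees(term_vector(new_text), context)
    return old_angle - new_angle
```

Under this reading, an edited page tends to stay at a similar angle to its context, while a replacement by an unrelated page swings toward 90 degrees and produces a clearly negative value.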

Evaluation
20 paths, pages selected from Yahoo! Directories
Each path consisted of 10 to 12 pages
Pages were randomly selected
– no flash presentations or images
A page in each path was randomly selected for replacement
Each selected page was replaced by 3 pages
– CNN Financials (large change)
– Elephants (large change)
– A page from the same Yahoo! Directory (small change)

Results – Distribution of Context-based changes
Replacements resulting in moving towards and away from the context vector

                                                   Less than -4    -4 to 2       More than 2
Replacement by a member of the Yahoo! Directory    1 (5%)          10 (50%)      9 (45%)
Replacement by non-member                          25 (62.5%)      15 (37.5%)    0 (0%)

Experimental thresholds
Negative angle = divergence from the collection
Distinction between similar and different pages
Managers can now focus on divergent pages
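The binning behind such a distribution can be sketched as follows, using the experimental thresholds of -4 and 2 from the slide. The bucket names, the handling of values that fall exactly on a threshold, and the function names `bucket` and `distribution` are assumptions made for illustration.

```python
from collections import Counter

# Experimental thresholds from the slide; boundary handling is an assumption.
DIVERGE_BELOW = -4.0
CONVERGE_ABOVE = 2.0


def bucket(change: float) -> str:
    """Place one signed context-based change into one of three bins."""
    if change < DIVERGE_BELOW:
        return "diverged"
    if change > CONVERGE_ABOVE:
        return "converged"
    return "similar"


def distribution(changes):
    """Return {bin: (count, percent)} for a list of signed changes."""
    counts = Counter(bucket(c) for c in changes)
    total = len(changes)
    return {
        name: (counts[name], 100.0 * counts[name] / total if total else 0.0)
        for name in ("diverged", "similar", "converged")
    }
```

A collection manager would then review only the pages in the "diverged" bin, which is where none of the same-directory replacements but most of the non-member replacements landed in the results.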

For more information on Walden’s Paths
Principal Investigators: Richard Furuta, Frank Shipman