Managing Change on the Web Luis Francisco-Revilla Frank M. Shipman Richard Furuta Unmil Karadkar Avital Arora Center for the Study of Digital Libraries.


Managing Change on the Web Luis Francisco-Revilla Frank M. Shipman Richard Furuta Unmil Karadkar Avital Arora Center for the Study of Digital Libraries Texas A&M University

What is this talk about?
A system approach to managing digital libraries whose collections consist of fluid resources with distributed location and ownership.

Modern paradigms of digital libraries
- Pointers rather than the resources
- Web-based collections
- NSDL
- Meta-documents
- High fluidity
- Changes vary in relevance
- Little system aid for assessing relevance of changes

This is a problem everybody has:
- Bookmark lists
- Yahoo! catalogues
- Search engine indices

Related work
- David Johnson, PhD dissertation, University of Washington
  - Document distance
  - Weighted, asymmetric
- Change monitoring systems
  - AIDE, URL Minder, WatzNew
  - Fine-grained yes/no detection
  - WebWatcher (evolving)
- "Interesting" identification
  - Syskill & Webert, Do-I-Care-Agent, Letizia
  - Personal, reader specific, profile-based

Motivation
- Managing the Walden's Paths collection
- Paths are meta-documents
  - Sequential arrangement of Web pages
  - Rhetorically coherent
  - Contextualized
- Distributed ownership
- Distributed authorship
- Continuous revision of the collection

Mechanisms for addressing the issue
- Caching the pages
- Caching strategies
- Some changes are desirable
- Fluid paths
- Ephemeral paths
- Rhetorical coherence

The real issue
- These mechanisms allowed only a limited reaction to changes
- Detecting changes is easy, but determining their relevance is difficult
- Humans are still required to determine the significance of changes
- Reacting to changes requires first assessing their relevance

The perception of change (overview)
- Observe how humans perceive changes to Web pages
- Inform and evaluate the approach and design
Questions
1. Do people view the same changes differently when given different amounts of time?
2. What kinds of changes are easily perceived?
3. Of what kinds of changes do users want to be notified?

Kinds of change
- Content changes (what)
- Presentation changes (how)
- Structural changes (linking)
- Behavioral changes

Results and implications
- Presentation changes were usually perceived as irrelevant
- The desire for notification and the perception of overall change increased with the degree of content change
- Time played a larger role in the perception of structural changes than of content changes
- As the degree of structural change increased, so did the desire for notification
- Links are useful metrics

Path Manager: the system
- Java based
- Paths or bookmark lists
- HTML pages
- Functional state of the document
  - Original
  - Valid
  - Last-time
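As an illustration only, the three document states named on this slide could be tracked with a small record per page. The sketch below is in Python rather than the Java of the Path Manager itself. The class name, the field names, and the reading of each state (original = content when the page joined the path, valid = content last approved by the maintainer, last-time = content seen on the previous check) are assumptions, not details taken from the system.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class PageRecord:
    """Three snapshots of one page (names and semantics are hypothetical)."""
    url: str
    original: str                    # assumed: content when the page was added
    valid: Optional[str] = None      # assumed: content last approved by the maintainer
    last_time: Optional[str] = None  # assumed: content seen on the previous check

    def __post_init__(self) -> None:
        # Before any check or approval, all three states coincide.
        if self.valid is None:
            self.valid = self.original
        if self.last_time is None:
            self.last_time = self.original

    def record_check(self, current: str) -> dict:
        """Compare fetched content with each stored state, then update last_time."""
        report = {
            "changed_since_original": current != self.original,
            "changed_since_valid": current != self.valid,
            "changed_since_last_time": current != self.last_time,
        }
        self.last_time = current
        return report

    def approve(self, current: str) -> None:
        """Accept the current content as the new valid state."""
        self.valid = current
```

Keeping the valid state separate from the last-time state lets a maintainer tell "changed again since my last visit" apart from "still differs from what I approved".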

Algorithms
- Variation of Johnson
  - Weighted sum of additions, deletions and modifications for each metric
  - Added metric for structure changes
  - Flexible
  - Asymmetric
  - Lacks normalization
- Proportional
  - Determines the proportion of modification for each metric
  - Simple
  - Symmetrical
  - Normalized
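The contrast between the two scoring styles can be sketched in a few lines of Python. This is not the Path Manager's code (which is Java based) and not Johnson's formula. The `MetricDelta` record, the per-metric weight triples, and averaging the proportions across metrics are all assumptions made for the example.

```python
from dataclasses import dataclass
from typing import Dict, Tuple


@dataclass
class MetricDelta:
    """Item counts for one metric, e.g. links or paragraphs (hypothetical)."""
    added: int
    deleted: int
    modified: int
    unchanged: int


def weighted_sum_score(deltas: Dict[str, MetricDelta],
                       weights: Dict[str, Tuple[float, float, float]]) -> float:
    """Weighted sum of additions, deletions and modifications per metric.

    Asymmetric when w_add != w_del, and not normalized: larger pages
    can produce larger scores.
    """
    total = 0.0
    for name, d in deltas.items():
        w_add, w_del, w_mod = weights[name]
        total += w_add * d.added + w_del * d.deleted + w_mod * d.modified
    return total


def proportional_score(deltas: Dict[str, MetricDelta]) -> float:
    """Mean fraction of changed items per metric, always within [0, 1].

    Symmetric: swapping the old and new versions swaps added and deleted,
    which leaves every fraction unchanged.
    """
    fractions = []
    for d in deltas.values():
        changed = d.added + d.deleted + d.modified
        total = changed + d.unchanged
        fractions.append(changed / total if total else 0.0)
    return sum(fractions) / len(fractions) if fractions else 0.0
```

With weights such as `(2.0, 1.0, 1.0)` for links, two added links score higher than two removed links under the weighted sum, while the proportional score is identical in both directions.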

Initial interface

Overall change relevance assessment

Document signatures
- Paragraph checksums
- Headlines
- Links
- Keywords
- Global checksum
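A signature with these five parts might be computed roughly as follows. This is a Python sketch using only the standard library, not the Path Manager's Java implementation. Several choices are assumptions: SHA-256 as the checksum, `<h1>` to `<h6>` as headlines, and the `<meta name="keywords">` tag as the keyword source. The slides do not say how keywords or headings are actually identified.

```python
import hashlib
from html.parser import HTMLParser

HEADING_TAGS = {"h1", "h2", "h3", "h4", "h5", "h6"}


def _checksum(text: str) -> str:
    # SHA-256 is an arbitrary choice; any stable hash would do here.
    return hashlib.sha256(text.encode("utf-8")).hexdigest()


class _SignatureParser(HTMLParser):
    """Collects paragraph text, heading text, link targets and meta keywords."""

    def __init__(self) -> None:
        super().__init__()
        self.paragraphs, self.headlines = [], []
        self.links, self.keywords = [], []
        self._open = None  # tag whose text is currently being captured
        self._buf = []

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "a" and attrs.get("href"):
            self.links.append(attrs["href"])
        elif tag == "meta" and (attrs.get("name") or "").lower() == "keywords":
            content = attrs.get("content") or ""
            self.keywords = [k.strip().lower() for k in content.split(",") if k.strip()]
        elif tag == "p" or tag in HEADING_TAGS:
            self._open, self._buf = tag, []

    def handle_data(self, data):
        if self._open is not None:
            self._buf.append(data)

    def handle_endtag(self, tag):
        if tag == self._open:
            text = " ".join("".join(self._buf).split())  # collapse whitespace
            if text:
                (self.paragraphs if tag == "p" else self.headlines).append(text)
            self._open = None


def document_signature(html: str) -> dict:
    """Build a five-part signature for one HTML page."""
    parser = _SignatureParser()
    parser.feed(html)
    parser.close()
    return {
        "paragraph_checksums": [_checksum(p) for p in parser.paragraphs],
        "headlines": parser.headlines,
        "links": parser.links,
        "keywords": parser.keywords,
        "global_checksum": _checksum(html),
    }
```

Comparing two signatures then localizes a change: a differing global checksum says that something changed, and the per-paragraph checksums, headlines and links indicate where.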

View of change metrics

Detailed view of page metrics

Path information

Web page retrieval and connectivity
- Potentially slow and unpredictable
- Parallel retrieval
  - Multi-threaded
  - Multiple attempts and retries
- Different states
  - Connection state
  - Retrieval state
  - Analysis state
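Multi-threaded retrieval with retries can be sketched with a thread pool. The Python below is a minimal illustration, not the Path Manager's Java code. The function names, the injected `fetch` callable, and the two-value state (`"retrieved"` or `"failed"`) are assumptions that simplify the connection, retrieval and analysis states listed on the slide.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Dict, Iterable, Optional, Tuple

Result = Tuple[str, Optional[str]]  # (state, content)


def fetch_with_retries(url: str, fetch: Callable[[str], str],
                       attempts: int = 3) -> Result:
    """Try one URL up to `attempts` times and report the final state."""
    for _ in range(attempts):
        try:
            return "retrieved", fetch(url)
        except OSError:
            # Network errors (including urllib's URLError) subclass OSError;
            # fall through and try again.
            continue
    return "failed", None


def fetch_all(urls: Iterable[str], fetch: Callable[[str], str],
              attempts: int = 3, workers: int = 8) -> Dict[str, Result]:
    """Fetch all pages in parallel so one slow server does not stall the rest."""
    urls = list(urls)
    with ThreadPoolExecutor(max_workers=workers) as pool:
        results = pool.map(lambda u: fetch_with_retries(u, fetch, attempts), urls)
        return dict(zip(urls, results))
```

Passing `fetch` as a parameter keeps the sketch testable without network access. In practice it could wrap something like `urllib.request.urlopen(url, timeout=10).read()`.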

Challenges and limitations
- Heuristic identification of document structure (i.e., headings)
- Indirection
- Behavior
- Dynamic pages

Conclusions
- Managing distributed collections of documents remains challenging and time consuming, and still requires human assistance
- The Path Manager supports the maintenance of collections of Web pages by
  - recognizing, evaluating and informing the user of relevant changes
  - keeping track of the original, valid and last-time states of Web pages
- The study indicated a desire for structural changes to be included in the determination of overall change

Contact information Luis Francisco-Revilla Frank M. Shipman, III Richard Furuta Unmil Karadkar Avital Arora