The Management of a Website’s Historical Resources David Chao College of Business San Francisco State University.

Introduction
An organization’s websites change constantly to reflect the dynamic nature of its environment, causing changes in website structure, content, and the supporting technologies.

Types of Change
Website structure:
–Causes web pages’ URLs to change
Website content:
–Changes to web pages: insertions, deletions, modifications
–Changes to content databases
Technology

What are a website’s historical resources?
Outdated URLs
Outdated web pages:
–Web page snapshots
–Content database snapshots
–Deleted web pages
Replaced technologies

The Objective of Managing Historical Resources
The major objective of managing historical resources is to satisfy users’ needs for historical information by enabling the website to recreate or retrieve web page snapshots.
–A web page snapshot is the state of a web page at a specific point in time.

Factors Affecting the State of a Web Page
Content factors:
–Web page code
–The state of the internal resources it references: images, style sheets, components, script files, databases, etc.
–The state of the external resources it references. External resources are files that the website does not manage but that can be referenced in creating the website’s contents.
Environment factors:
–Website host environment variables: system clock
–Web technologies implemented on the server side as well as on the client side

Levels of Web Page Snapshot
Level 1 snapshot: A level 1 snapshot (a web document snapshot) is the state of the web document code at snaptime.
–Creating level 1 snapshots enables a website to trace changes to the web document code over time.
Level 2 snapshot: A level 2 snapshot is a level 1 snapshot with the additional requirement that all the internal resources it references are at least level 1 snapshots at the same snaptime.
–Referencing database snapshots
Level 3 snapshot: A level 3 snapshot is a level 2 snapshot with the additional requirement that all the external resources it references are at least level 2 snapshots at the same snaptime.
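The level definitions can be read as a simple rule over the snapshot levels of the resources a page references. The sketch below is one possible reading, not part of the original scheme: the function name `page_level` and the convention that 0 means "no snapshot" are assumptions made for illustration.

```python
from typing import Iterable


def page_level(code_saved: bool,
               internal_levels: Iterable[int],
               external_levels: Iterable[int]) -> int:
    """Return the snapshot level (0-3) a page reaches at one snaptime.

    internal_levels / external_levels hold the snapshot level of each
    referenced resource at the same snaptime (0 = no snapshot; assumed).
    """
    if not code_saved:
        return 0  # no level 1 snapshot of the page code exists
    if not all(level >= 1 for level in internal_levels):
        return 1  # some internal resource has no snapshot at snaptime
    if not all(level >= 2 for level in external_levels):
        return 2  # some external resource is below level 2
    return 3
```

Under this reading, a page whose internal resources were all saved but whose external resources were not saved would rate as level 2.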

Enforcing Environment Factors
(1) Plus 0: If neither environment factor is enforced.
(2) Plus 1: If only the host variables are reset to the snapshot time.
(3) Plus 2: If only the web technologies are compatible with the technologies at the snapshot time.
(4) Plus 3: If both factors are enforced.
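The four "Plus" ratings behave like a two-bit code: one bit for the host variables and one for the web technologies. The following sketch makes that explicit and combines it with a content level. The function names and the "Level n Plus m" label format are assumptions for illustration only.

```python
def plus_code(host_variables_reset: bool, technologies_compatible: bool) -> int:
    """Map the two environment factors to Plus 0..3 as listed above."""
    return (1 if host_variables_reset else 0) + (2 if technologies_compatible else 0)


def state_label(level: int, plus: int) -> str:
    """Combine a content level (1-3) with a Plus rating (0-3); label format is assumed."""
    return f"Level {level} Plus {plus}"


# Every combination of content level and environment rating.
ALL_STATES = [state_label(level, plus)
              for level in (1, 2, 3)
              for plus in (0, 1, 2, 3)]
```

With three content levels and four environment ratings, this enumeration yields twelve combined states.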

Possible Levels of Snapshot States

Schemes for Tracking Changes
Scheme for tracking website structure changes and web page code changes:
–A logging and archiving scheme
Scheme for tracking content database changes

Design of a Logging and Archiving Scheme for Tracking Website Changes
The log, named TemporalURLLog, has five fields: URL, PublishDate, DocExpireDate, URLExpireDate, and NewURL.
Archived documents are saved in the Archive using URL + PublishDate as the file name.

Impacts of Website Changes on Historical Links and Archive
Time | Website Changes | Current Web Pages | Historical Links Generated | Snapshots in Archive
T0 | | P1, P2, P3 | None |
T1 | P1 renamed to P4; P5 is added | P2, P3, P4, P5 | P1+T0 |
T2 | P2 is deleted; P3 is modified | P3, P4, P5 | P2+T0, P3+T0 | P2+T0, P3+T0
T3 | P3, P4, P5 are modified; P1, P6 are added | P1, P3, P4, P5, P6 | P3+T2, P4+T1, P5+T1 | P3+T2, P4+T1, P5+T1
T4 | P3 is deleted; P4 is renamed to P8; P5 is renamed to P7; a new page P3 is added | P1, P3, P6, P7, P8 | P3+T3, P4+T3, P5+T3 | P3+T3

The contents of TemporalURLLog
URL | PublishDate | DocExpireDate | URLExpireDate | NewURL
P1 | T0 | Null | T1 | P4
P2 | T0 | T2 | T2 | Null
P3 | T0 | T2 | Null |
P4 | T1 | T3 | Null |
P4 | T3 | Null | T4 | P8
P5 | T1 | T3 | Null |
P3 | T2 | T3 | Null |
P3 | T3 | T4 | T4 | Null
P5 | T3 | Null | T4 | P7
P6 | T3 | Null | |
P1 | T3 | Null | |
P8 | T4 | Null | |
P7 | T4 | Null | |
P3 | T4 | Null | |
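The log contents can be reproduced by replaying the website changes from T0 to T4. The sketch below is a toy model under stated assumptions: the class `Site` and its method names are hypothetical, times T0..T4 are represented as the integers 0..4, and the update rules (a rename expires the URL and opens a row for the new URL; a modification expires the document, archives it, and opens a new row; a deletion expires both and archives the document) are inferred from the two tables rather than stated in the slides.

```python
class Site:
    """Toy model of the logging and archiving scheme (names are hypothetical)."""

    def __init__(self):
        # Each row: [URL, PublishDate, DocExpireDate, URLExpireDate, NewURL]
        self.log = []
        # Archived snapshot file names, written as "URL+T<PublishDate>"
        self.archive = []

    def _open_row(self, url):
        # The live row for a URL: neither the document nor the URL has expired.
        for row in self.log:
            if row[0] == url and row[2] is None and row[3] is None:
                return row
        raise KeyError(f"{url} is not a current page")

    def add(self, url, t):
        self.log.append([url, t, None, None, None])

    def modify(self, url, t):
        row = self._open_row(url)
        row[2] = t                                 # old document expires
        self.archive.append(f"{url}+T{row[1]}")    # archive the old version
        self.add(url, t)                           # same URL, new document

    def delete(self, url, t):
        row = self._open_row(url)
        row[2] = row[3] = t                        # document and URL both expire
        self.archive.append(f"{url}+T{row[1]}")

    def rename(self, url, new_url, t):
        row = self._open_row(url)
        row[3], row[4] = t, new_url                # URL expires, points to new URL
        self.add(new_url, t)


site = Site()
for url in ("P1", "P2", "P3"):                     # T0
    site.add(url, 0)
site.rename("P1", "P4", 1)                         # T1
site.add("P5", 1)
site.delete("P2", 2)                               # T2
site.modify("P3", 2)
for url in ("P3", "P4", "P5"):                     # T3
    site.modify(url, 3)
site.add("P1", 3)
site.add("P6", 3)
site.delete("P3", 4)                               # T4
site.rename("P4", "P8", 4)
site.rename("P5", "P7", 4)
site.add("P3", 4)

for row in site.log:
    print(row)
print(site.archive)
```

Replaying the changes this way yields the same fourteen log rows and the same six archived snapshots as the tables, although the rows appear in creation order rather than the order shown on the slide.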

Examples of Using the Log
Retrieve a snapshot of a current web page:
Retrieve a deleted page:
Retrieve the snapshot of a deleted web page:
–The snapshot of P3 at T2 is in the Archive: P3+T2.
Retrieve the current web page of an outdated URL:
–An old URL P5 has been renamed to P7. If users submit a request for P5, it can be traced to P7.
Retrieve the web page previously associated with a current link:
–A historical link P1 has been renamed (through P4) to P8, and a current link P1 points to a new web page. If the current web page associated with P1 is not what the users need, the request can be redirected to P8.
Determine whether an invalid URL ever existed:
–A URL P12 has never existed in the website.
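Two of these lookups, tracing an outdated URL and checking whether a URL ever existed, can be sketched directly over the TemporalURLLog rows. This is an illustrative sketch only: `LogEntry`, `resolve_current`, and `ever_existed` are hypothetical names, times T0..T4 are written as integers 0..4, and blank cells in the log are treated as `None`.

```python
from typing import NamedTuple, Optional


class LogEntry(NamedTuple):
    url: str
    publish: int
    doc_expire: Optional[int]
    url_expire: Optional[int]
    new_url: Optional[str]


LOG = [
    LogEntry("P1", 0, None, 1, "P4"),
    LogEntry("P2", 0, 2, 2, None),
    LogEntry("P3", 0, 2, None, None),
    LogEntry("P4", 1, 3, None, None),
    LogEntry("P4", 3, None, 4, "P8"),
    LogEntry("P5", 1, 3, None, None),
    LogEntry("P3", 2, 3, None, None),
    LogEntry("P3", 3, 4, 4, None),
    LogEntry("P5", 3, None, 4, "P7"),
    LogEntry("P6", 3, None, None, None),
    LogEntry("P1", 3, None, None, None),
    LogEntry("P8", 4, None, None, None),
    LogEntry("P7", 4, None, None, None),
    LogEntry("P3", 4, None, None, None),
]


def ever_existed(url: str) -> bool:
    """True if the URL appears anywhere in the log."""
    return any(entry.url == url for entry in LOG)


def resolve_current(url: str, as_of: int) -> Optional[str]:
    """Trace the page that lived at `url` at time `as_of` to its current URL.

    Returns None if that page was later deleted. `as_of` is assumed to be
    a time at which the URL was live.
    """
    t = as_of
    while True:
        # URL expirations of this URL that happened after time t.
        expiries = [e for e in LOG
                    if e.url == url and e.url_expire is not None and e.url_expire > t]
        if not expiries:
            return url                       # URL is still live
        first = min(expiries, key=lambda e: e.url_expire)
        if first.new_url is None:
            return None                      # URL expired without a successor: deleted
        url, t = first.new_url, first.url_expire


print(resolve_current("P5", 1))
print(resolve_current("P1", 0))
print(ever_existed("P12"))
```

The `as_of` argument matters because URLs can be reused: the P1 that users saw at T0 now lives at P8, while a request for P1 as of T3 refers to the new page that currently holds that URL.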

Tracking Changes to Content Databases
A web page may use content databases:
–(1) as a source for querying.
–(2) as storage for the contents of placeholders on a web page.

Database Snapshot Management
Defining snapshots:
CREATE SNAPSHOT snapshotname AS query AS OF snaptime
Refreshing snapshots:
REFRESH SNAPSHOT snapshotname AS OF new snaptime
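The statements above are a generic snapshot syntax. SQLite, for example, has no `CREATE SNAPSHOT` statement, so the sketch below only emulates the idea with Python's `sqlite3` module: it assumes a hypothetical history table `price_history` with `valid_from` and `valid_to` columns, and materializes the rows that were valid at a given snaptime into an ordinary table. All table, column, and function names here are assumptions for illustration.

```python
import sqlite3


def create_snapshot(conn, name, snaptime):
    """Materialize the rows of price_history that were valid at snaptime."""
    # Table names cannot be bound as SQL parameters, so validate the name.
    if not name.isidentifier():
        raise ValueError("snapshot name must be a plain identifier")
    conn.execute(f"DROP TABLE IF EXISTS {name}")
    conn.execute(f"CREATE TABLE {name} (item TEXT, price REAL)")
    conn.execute(
        f"INSERT INTO {name} (item, price) "
        "SELECT item, price FROM price_history "
        "WHERE valid_from <= ? AND (valid_to IS NULL OR valid_to > ?)",
        (snaptime, snaptime),
    )


# In this emulation, refreshing is the same operation with a new snaptime.
refresh_snapshot = create_snapshot

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE price_history "
    "(item TEXT, price REAL, valid_from INTEGER, valid_to INTEGER)"
)
conn.executemany(
    "INSERT INTO price_history VALUES (?, ?, ?, ?)",
    [
        ("widget", 10.0, 0, 2),     # widget cost 10.0 from time 0 until time 2
        ("widget", 12.0, 2, None),  # and 12.0 from time 2 onward
        ("gadget", 5.0, 1, None),
    ],
)

create_snapshot(conn, "prices_snap", 1)
at_t1 = conn.execute("SELECT item, price FROM prices_snap ORDER BY item").fetchall()

refresh_snapshot(conn, "prices_snap", 2)
at_t2 = conn.execute("SELECT item, price FROM prices_snap ORDER BY item").fetchall()
```

This approach only works when the content database keeps validity intervals for its rows. A database that overwrites rows in place would need a change log instead.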

Issues in Tracking Changes to Content Databases
Content databases may exist in many formats:
–XML, delimited text files, etc.
–Not all content databases are supported by a snapshot management system.
The website may not have authority over the management of the content databases.
A web page may retrieve data from many databases.
There is no single way to design content databases.

Tracking Content Database Changes Using a Log: An Example
Assumptions:
–One content database supports many web pages.
–Each page contains many placeholders.
Log design:
–PageID + PlaceHolderID + Content + Update Flag + Time Stamp
–PageID is (URL + page publish time)
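With a log of this shape, the placeholder contents of a page at a given snaptime can be rebuilt by replaying the log rows up to that time. The sketch below assumes that the Update Flag takes the values "I" (insert), "U" (update), and "D" (delete), which the slides do not specify. The names `ContentLogRow` and `placeholders_as_of` and the sample rows are likewise hypothetical.

```python
from typing import Dict, List, NamedTuple, Tuple


class ContentLogRow(NamedTuple):
    page_id: Tuple[str, int]   # (URL, page publish time)
    placeholder_id: str
    content: str
    update_flag: str           # assumed values: "I" insert, "U" update, "D" delete
    timestamp: int


def placeholders_as_of(log: List[ContentLogRow],
                       page_id: Tuple[str, int],
                       snaptime: int) -> Dict[str, str]:
    """Replay the log in time order and return placeholder contents at snaptime."""
    state: Dict[str, str] = {}
    for row in sorted(log, key=lambda r: r.timestamp):
        if row.page_id != page_id or row.timestamp > snaptime:
            continue
        if row.update_flag == "D":
            state.pop(row.placeholder_id, None)
        else:
            state[row.placeholder_id] = row.content
    return state


sample_log = [
    ContentLogRow(("P3", 0), "headline", "Grand opening", "I", 0),
    ContentLogRow(("P3", 0), "banner", "Free shipping", "I", 0),
    ContentLogRow(("P3", 0), "headline", "Summer sale", "U", 1),
    ContentLogRow(("P3", 0), "banner", "", "D", 2),
    ContentLogRow(("P5", 1), "headline", "Contact us", "I", 1),
]

print(placeholders_as_of(sample_log, ("P3", 0), 1))
```

Keying the log on (URL, publish time) rather than on the URL alone keeps the history of one page version separate from a later page that reuses the same URL.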

Working with the TemporalURLLog
Because a web page’s URL may change, the content database log needs the support of the TemporalURLLog to track URL changes.

Delivering Historical Resources to Users
A website consists of:
–(1) a current website where current web pages are published.
–(2) a historical website where historical resources are stored and accessed.
A typical web server serves requests for current web pages only and is inadequate for serving requests for historical information.

The Design of a Web Page Snapshot Management System

Summary
We developed a scheme to track changes to website structure, web pages, and files referenced by web pages, and a second scheme to track changes to content databases, so that the website is capable of creating Level 2 snapshots.
