My Website Was Lost, But Now It’s Found Frank McCown CS 110 – Intro to Computer Science April 23, 2007.



A Little About Me
– Smoky Hill High School
– B.S. in CS from Harding University
– Software Engineer at Lockheed Martin
– Instructor of CS at Harding Univ
– M.S. in CS from Univ of Arkansas at Little Rock
2004-present – CS Ph.D. student working with Michael Nelson
Fall 2007 – Assoc. Professor of CS at Harding Univ.

Frank McCown

Education
Ph.D. in Computer Science – Old Dominion Univ. (2007 expected)
M.S. in Computer Science – Univ of Arkansas in Little Rock (2002)
B.S. in Computer Science – Harding University (1996)

Work Experience
– Instructor of CS at Harding University (Searcy, AR)
– Software Eng for Lockheed Martin (Denver, CO)
1995 – Software Engineer Intern for Auto-trol (Denver, CO)

Honors
2007 – Outstanding Graduate Research Assistant
2006 – College of Sciences Dissertation Fellowship
2005 – Outstanding Graduate Assistant
2004 – Dominion Scholar

Industry vs. Academia
A 2000 survey by The Scientist magazine asked its readers: "Overall, which environment do you prefer?"
(Chart categories: No preference, Academia, Industry)
73% of survey respondents had held research positions in industry and academia.

Industry vs. Academia
Movement
– Academia to Industry is common
– Industry to Academia is very uncommon
Flexibility
Schedule
Focus
Compensation

Research Interests
Digital preservation
– Will we be able to see our websites 20 years from now?
Web crawling
– How can search engines and web archives duplicate/download our websites more efficiently and effectively?
Search engines
– How much and what content do commercial search engines index and cache?
– How synchronized are search engine APIs with what the general user sees?
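
The last question, how closely a search engine's API results match what a user sees in the web interface, can be made concrete with a simple overlap measure. The following is a minimal sketch, assuming each interface returns a ranked list of URLs for the same query. The function name `top_k_overlap` and the sample URLs are hypothetical, and Jaccard overlap is only one possible measure, not necessarily the one used in the studies mentioned in these slides.

```python
def top_k_overlap(api_results, web_results, k=10):
    """Jaccard overlap between the top-k URLs of two ranked result lists.

    Returns 1.0 when both top-k sets are identical and 0.0 when they
    share no URLs. Ranking order inside the top k is ignored.
    """
    api_top = set(api_results[:k])
    web_top = set(web_results[:k])
    if not api_top and not web_top:
        # Two empty result lists are treated as fully synchronized.
        return 1.0
    return len(api_top & web_top) / len(api_top | web_top)


# Hypothetical results for the same query from two interfaces.
api_results = ["a.com", "b.com", "c.com", "d.com"]
web_results = ["a.com", "c.com", "e.com", "f.com"]

print(top_k_overlap(api_results, web_results, k=4))
```

In this made-up example the two lists share 2 URLs out of 6 distinct ones, so the overlap is one third. A measure that also accounts for rank position would be stricter. This sketch only asks whether the same pages appear at all.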


How much of the Web is indexed? Estimates from “The Indexable Web is More than 11.5 billion pages” by Gulli and Signorini (WWW’05)

Web Infrastructure

Cached Image

Warrick
First developed in fall of 2005
Available for download
www2006.org – first lost website reconstructed (Nov 2005)
DCkickball.org – first website someone else reconstructed without our help (late Jan 2006)
– first website we reconstructed for someone else (mid Mar 2006)
Internet Archive officially endorses Warrick (mid Mar 2006)

Warrick-related Publications
Frank McCown, Norou Diawara, and Michael L. Nelson. Factors Affecting Website Reconstruction from the Web Infrastructure. JCDL June Vancouver, British Columbia, Canada.
Catherine C. Marshall, Frank McCown, and Michael L. Nelson. Evaluating Personal Archiving Strategies for Internet-based Information. IS&T Archiving May Arlington, Virginia.
Frank McCown and Michael L. Nelson. Characterization of Search Engine Caches. IS&T Archiving May Arlington, Virginia, USA.
Frank McCown, Joan A. Smith, Michael L. Nelson, and Johan Bollen. Lazy Preservation: Reconstructing Websites by Crawling the Crawlers. WIDM November Arlington, Virginia.
Frank McCown and Michael L. Nelson. Evaluation of Crawling Policies for a Web-Repository Crawler. HYPERTEXT August Odense, Denmark.

Search Engine APIs
Frank McCown and Michael L. Nelson. Poster: Search Engines and Their Public Interfaces: Which APIs are the Most Synchronized? WWW 2007.
Frank McCown and Michael L. Nelson. Agreeing to Disagree: Search Engines and their Public Interfaces. JCDL 2007.

Thank You Questions?