We go Way Back: Libraries & Community Web Archiving


1 We go Way Back: Libraries & Community Web Archiving
Presenters: Jacquelyn Oshman, New Brunswick Free Public Library; Diana Bowers-Smith, Brooklyn Public Library

2 Jacquelyn Oshman Senior Librarian at New Brunswick Free Public Library
Started in 2005. Specializes in Local History and Genealogy / Interlibrary Loan. No prior experience with web archiving.

3 Diana Bowers-Smith Archivist at Brooklyn Public Library since 2015
Primarily works with the Brooklyn Collection, BPL’s local history division. No prior web archiving experience.

4 What is Web Archiving? Web archiving is the process of evaluating, selecting, collecting, cataloging, providing access to, and preserving digital materials for researchers today and in the future. (Source: Library of Congress)

5 Why should we save websites?
How much room do you have for your vertical file or clipping collection?
How old is the most recent article in those collections?
How thin is your local daily newspaper getting?
Do you find most of your local reference answers on the computer?
How many community churches, organizations, and non-profits have a website but not a newsletter?
Do you have a large foreign-language-speaking population that you would like to reach?
Next: How many people know what the current NJLA website looks like?

6 Next: here is what it looked like in August 2000, courtesy of the Wayback Machine

How many of you remember this version of the NJLA website? This page was saved on August 16, 2000, and this page alone has been saved 288 times (as of December 2017). NEXT: This is another example of a site that should be saved for future research, which is only available on the Wayback Machine.

8 November 1, These are just some examples of how websites can change or be removed, and why important community information should be saved for future reference and research.

9 Community Webs Grant A two-year IMLS- and Internet Archive-funded program to provide education, applied training, cohort support, and web archiving services for public librarians to develop expertise in web archiving. Ten libraries were originally funded by IMLS, and 17 more were added with Kahle/Austin Foundation and Internet Archive funds. Together, the group will save 35 terabytes of web-based community heritage materials for long-term access. New Brunswick Free Public Library is the only New Jersey library in the cohort, but not the smallest!

10 Cohort Members
Athens Regional Library System, GA; Birmingham Public Library, AL; Brooklyn Public Library – Brooklyn Collection, NY; Buffalo & Erie County Public Library, NY; Cleveland Public Library, OH; Columbus Metropolitan Library, OH; County of Los Angeles Public Library, CA; DC Public Library, Washington, DC; Denver Public Library, CO; East Baton Rouge Parish Library, LA; Forbes Library, MA; Grand Rapids Public Library, MI; Henderson District Public Libraries, NV; Kansas City Public Library, MO; Lawrence Public Library, KS; Marshall Lyon County Library, MN; New Brunswick Free Public Library, NJ; Schomburg Center for Research in Black Culture (NYPL), NY; Patagonia Library, AZ; Pollard Memorial Library, MA; Queens Library, NY; San Diego Public Library, CA; San Francisco Public Library, CA; Sonoma County Public Library, CA; The Urbana Free Library, IL; West Hartford Public Library, CT; Westborough Public Library, MA (bold in the original slide denotes “lead libraries”)

11 Challenges to Web Archiving

12 Terminology used on a daily basis
Seed - The starting-point URL for a crawler and the access point to archived collections.
Crawl - A web archiving (or "capture") operation conducted by an automated agent, called a crawler, a robot, or a spider.
Scope - What the crawler will and will not capture; it can be expanded or limited.
Robots.txt - A file that a site owner can add to their site to keep crawlers from accessing all or parts of it.
Collection - A group of seed URLs curated around a common theme, topic, or domain.
Patch Crawl - A crawl to capture and patch in documents that may have been missing from your original crawl.
One of the more overwhelming parts of starting a web archive is learning the new terminology. Before you can start saving websites to your archive, it is vital that you know what all the terms mean.
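To make the robots.txt definition above concrete, here is a minimal sketch of how a polite crawler checks a seed URL against a site's rules, using Python's standard urllib.robotparser. The rules and the example.org URLs are hypothetical, not from any site in the presentation.

```python
# Sketch: checking seed URLs against robots.txt before crawling.
# The rules below are a hypothetical robots.txt a site owner might publish.
from urllib.robotparser import RobotFileParser

rules = """User-agent: *
Disallow: /private/
Allow: /
"""

parser = RobotFileParser()
parser.parse(rules.splitlines())

# A crawler that honors robots.txt checks each seed before fetching it.
print(parser.can_fetch("*", "https://example.org/news/"))      # True  (allowed)
print(parser.can_fetch("*", "https://example.org/private/x"))  # False (excluded)
```

This is also why a page can come back "Not in Archive": if the site's robots.txt disallows the path, the crawler skips it entirely.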

13 Getting Administration on board
Convincing your board and director that web archiving is worth doing
Should you create an entirely new policy or add to your collection development policy?
Creating guidelines for what you will be saving and how it fits with your policy
Teaching administrators the difference between born-digital content and digitized content
Getting good publicity so the community is aware of the project and on board to suggest sites
People may think, "Websites never go away!" or "We can always find something on the Internet."
Creating a new policy is time-consuming and will be under close scrutiny, versus adding one or two paragraphs to existing policies that might be passed quickly.
Will your collections fill in gaps in your physical collections? Will you be updating outdated collections? Will you be adding new collections based on local demand?
Web archiving means saving material "born" on the Internet, not e-mails or items that are scanned and saved to a computer. (Some newsletters are only e-mailed now and can't be saved this way.)

14 Permissions Should asking for permission be included in your policy?
Do you think it is necessary to ask permission to archive open websites? What are the ethical implications? Intellectual property rights? Much of this can be addressed by a "take-down policy," favored by most librarians in the cohort: add the content, and when asked, be willing to remove it (at least from public view).

15 Learning to Crawl… Be Patient
Very few crawls will succeed on the first try, or maybe even the second. Pay attention to details. The results may be disappointing. Get help from the experts. Just like a baby learning to crawl, you can expect the same outcomes when learning web archiving! (Hopefully they laugh a little.) Explain each step as it relates to archiving.

16 Problems we have encountered with crawls
Forgetting to run a "test crawl" first can result in incomplete pages and wasted space.
Files can be too large.
Pages show as "Not in Archive" because of robots.txt exclusions, scoping problems, or time limits, or because we didn't wait the required 24 hours before trying to view results. Browsers can also show incomplete pages.
Files too large: mention the TapInto New Brunswick site, which test-crawled at 1 GB and then ran again the following week at over 100 GB!

17 This is NBFPL’s account page. You can see how much data we have used, how much is left, and the names and sizes of the current collections. We have a total budget of 512 GB, and most websites will use between 1 and 3 GB per crawl. A crawl will save only NEW information on the page, not what has already been saved in previous crawls.

18 This is the patron’s view of the library account, which includes metadata. Clicking on one of the collections brings you to a list of seeds within the collection, and then to a calendar showing the dates the website was saved (see next slide).

19–23 [Screenshot slides with no transcript text; per the previous slide, they show the patron's view of collections, seed lists, and the calendar of capture dates.]

24 How do I get involved with Web Archiving?
“Archive-It is a subscription service of the Internet Archive. Subscriptions to AIT are paid annually and amounts vary depending on scope of collecting a new partner is looking to do. Our smallest account level is 3K/year, however we are somewhat flexible with pricing for smaller institutions who want to start WA programs so anyone interested should reach out!” - Maria Praetzellis
Other options: manually add URLs to the Wayback Machine, or use Webrecorder.
The Community Webs grant is currently a two-year project, with subscription services for five years. It is not yet known whether the grant will be offered again in the future.
Webrecorder is a web archiving service anyone can use for free to save web pages. Making a capture is as easy as browsing a page like you normally would: Webrecorder automatically archives the page, along with any additional content triggered by interactions. This open-source project is brought to you by Rhizome at the New Museum. (Address the differences between these methods.)
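For the "manually add URLs to the Wayback Machine" option, here is a minimal sketch of doing the same thing from a script via the Wayback Machine's public Save Page Now endpoint (https://web.archive.org/save/&lt;url&gt;). The helper name save_page_now_url and the example.org seed are illustrative; the sketch only builds the request URL and leaves the actual network request to the caller.

```python
# Sketch: building a Save Page Now request URL for a seed, assuming the
# public https://web.archive.org/save/<url> endpoint. Fetching the result
# (e.g. with urllib.request.urlopen) asks the Wayback Machine to capture
# the page. No network call is made here.
from urllib.parse import quote

SAVE_ENDPOINT = "https://web.archive.org/save/"

def save_page_now_url(seed: str) -> str:
    """Return the Save Page Now URL for a seed URL (illustrative helper)."""
    # Keep URL-structural characters intact; escape anything else.
    return SAVE_ENDPOINT + quote(seed, safe=":/?&=")

print(save_page_now_url("https://example.org/news"))
# https://web.archive.org/save/https://example.org/news
```

This one-at-a-time approach is the key difference from an Archive-It subscription, which runs scheduled, scoped crawls across whole collections of seeds.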

25 Thank you! Any Questions?

