
1 MSC photo:  It was taken some time in the late 1930s, but we don’t have an exact date.  The college was known as MSC from 1925 until 1955 when we became MSU.  That was also our centennial year.  The entrance is unknown.

2 Web MSU Ed Busch March 14, 2014

3 Overview What We Did What We Learned What Are We Doing Now Suggestions

4 What We Did Our Goal: To “preserve and make accessible” MSU web sites of enduring historical and research value. Almost every office and unit on campus has a web site with business information. Content that isn’t preserved anywhere else. Integral to mission of MSU. This goal is what is driving our web archiving. Many of our campus publications are only on the web now as PDFs or HTML.

5 What We Did Inventory of MSU related web sites (early 2011)
Top level domains = approx. 1,300 sites. External domains = approx. 190 sites, e.g. coachizzo.com or spartancash.com. Ran trial “snapshots” of msu.edu using Archive-It. Huge number of pages: for example, there were over 3.6 million PDF files just within msu.edu at that time. Numbers from “host master” at ATS Network Management Services (Doug Nelson). Probably more domains and pages now. Many units have started blogs using sites such as WordPress. Highlighted vocabulary differences between archivists and IT professionals. Many MSU-affiliated sites are outside the msu.edu domain. Much of the content on web sites is new; not available in print or other media formats. Many sites have password-protected content. Many sites have dynamic content and are updated frequently.
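The split between sites inside msu.edu and MSU-affiliated sites on external domains can be sketched in a few lines of Python. This is only an illustration of the sorting step, not the method used for the 2011 inventory; the function names and the sample URLs are assumptions made for the example.

```python
from urllib.parse import urlsplit


def is_msu_host(url, base="msu.edu"):
    """Return True if the URL's host is msu.edu or a subdomain of it."""
    host = (urlsplit(url).hostname or "").lower()
    return host == base or host.endswith("." + base)


def split_inventory(urls):
    """Sort candidate seed URLs into (inside msu.edu, external domains)."""
    internal, external = [], []
    for url in urls:
        (internal if is_msu_host(url) else external).append(url)
    return internal, external


# Illustrative URLs only; a real list would come from campus IT.
sample = [
    "https://archives.msu.edu/",
    "http://coachizzo.com/",
    "http://spartancash.com/",
]
internal, external = split_inventory(sample)
```

URLs need a scheme (http:// or https://) for the host to be parsed; bare names such as coachizzo.com would have to be normalized first.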

6 What We Did Used list of known MSU websites from IT
Created 3 large collections and 2 smaller special collections. Large: Administration and Services; Colleges, Schools, Research Centers & Institutes; and Student Organizations and Groups. Special: Topical Events Web Sites; Decommissioned MSU Web Sites. Added Landing Page to our web site. Updated Retention Schedule to include web sites MSU Publications. Created Web Site Collection Plan. Added Metadata at collection level. Identified crawl schedule. Now have over 700 seeds assigned to collections. Because of subscription constraints, have to keep some inactive. Our current retention schedule is online. A new retention schedule should be coming out in 2014. Draft of collection plan available online. Always test crawls first.

7

8 Archive-it.org

9 What We Learned Once you create a collection, you can’t split or combine easily. What’s the best collection creation strategy: to lump or not? I’ve started splitting collections into smaller collections by moving seeds. Archive-It is investigating adding a combine function. Pluses and minuses to lumping: leaning towards recommending smaller-sized clumps. What is useful metadata?

10 What We Learned Our New Collections
Michigan State University Libraries Collection MSU Administration and Services Collection MSU Alumni and Fan Sites Collection MSU Athletics Collection MSU Colleges, Schools, Research Centers & Institutes Collection MSU Employee Unions Collection MSU Related News Publications Collection MSU Social Media Collection MSU Sponsored Projects Collection MSU Student Organizations and Groups Collection MSU Topical Events and Subjects Web Sites Collection MSU Arts and Culture Collection

11 What We Learned Some sites are just difficult to crawl – recursive issues. Using regular expressions and constraints – Archive-It staff very helpful. Lots of test runs – takes time. Creating useful metadata: Archive-It provides 15 Dublin Core fields. Collection – title, creator, subjects, description, publisher, contributor, type, format, source, relation, coverage, rights, collector, language. Seed – title.
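As an illustration of the kind of regular expression that helps with recursive crawl problems, here is a minimal Python sketch that flags URLs whose path repeats the same segment three or more times in a row, a common sign of a crawler trap. The pattern and the function name are assumptions for this example; they are not taken from the slides, and Archive-It's own scoping rules have their own syntax, which should be checked with Archive-It staff.

```python
import re
from urllib.parse import urlsplit

# One path segment followed by at least two immediate repeats of itself,
# ending on a segment boundary (hypothetical pattern, for illustration).
REPEATED_SEGMENT = re.compile(r"(/[^/?#]+)\1{2,}(?=[/?#]|$)")


def looks_recursive(url):
    """Return True if the URL path repeats one segment 3+ times in a row."""
    return REPEATED_SEGMENT.search(urlsplit(url).path) is not None
```

A check like this could be run over the URL report from a test crawl to spot which seeds need a constraint before the scheduled crawl.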

12 What We Learned Web Archiving requires more staff time than expected
Websites are being created or modified every day. New functionality often causes problems in the next crawl. Run test crawls! Now have over 700 seeds assigned to collections. I have deactivated most of my scheduled runs so that I can do test crawls first. Maybe a useful feature would be to automatically do a test crawl x days before a scheduled crawl.
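The "test crawl x days before a scheduled crawl" idea is simple date arithmetic and can be tracked outside the crawling service. The sketch below is only that: the function name is hypothetical, and nothing here calls Archive-It or assumes it offers such a feature.

```python
from datetime import date, timedelta


def pre_crawl_test_date(scheduled, days_before):
    """Return the date on which to run a test crawl ahead of a scheduled crawl."""
    if days_before < 0:
        raise ValueError("days_before must not be negative")
    return scheduled - timedelta(days=days_before)


# Example: a crawl scheduled for 14 March 2014, tested one week earlier.
reminder = pre_crawl_test_date(date(2014, 3, 14), 7)
```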

13 What Are We Doing Now Quality check Social Media sites
Historical Collection sites. Can an old dog learn regular expressions? On-demand sync can be done by their staff. Get help from units to point out problems. To find on WorldCat, search by collection name and institution. Not sure how useful this will be for lumped collections.

14 Suggestions Plan: What do you need to capture? How much time do you have? Can you afford Archive-It or need to use a “free” tool? Start small. Get the word out to site creators.

15 Contact Ed Busch Electronic Records Archivist

16 Surveying class photo:  Taken in 1885, which was the beginning of the engineering course, the second major offered at the college.  The students are all juniors or seniors.  At this point in college history, there was no women’s course, so the women were taking the same courses as the men.

