Harvesting and displaying complicated sites using Archive-it – status of some of our tests, October 2014 – January 2015
January 2015
By Tue Hejlskov Larsen, netarchive.dk
Archive-it (AIT) setup, January 2015
Heritrix snapshot
Umbra: all seed URLs in AIT are crawled using Umbra and Heritrix.
Harvesting used "Only one page" from October 2014 to January 2015.
I followed the help instructions here (apologies if I have missed any of them; AIT updates the instructions from time to time):
Wayback browser used in proxy mode: Internet Explorer 9
dumper-internationalt: AIT can harvest the JS includes containing the articles
AIT video player; no comments; some images missing
With video playback in place (only with Firefox in proxy mode)
With tweets, images, video links
No mouse-down paging
Tiny URLs OK, e.g.
Using the AIT free-text search, I found posts/comments older than those shown; there are some locale problems…
With linked videos (not in place)
Images, posts and some comments
Posts in full-page view
History (mouse-down)
No view of comments
No view of previous comments
Using the free-text search, I found comments that could not be shown on the page
it.org/4897/ /
Images; mouse-down paging works twice; no provenance top bar; no full image; no "show more" button
Posts and images, with big images; no notes
With video (not in place)
ens-museum-for-kunst?projectId=art-project Images not in place; no zoom; no Street View
Comparison of display capabilities between Archive-it Wayback and NAS Wayback in proxy mode (AIT/NAS)
Complicated sites – some test examples:
iframes/JS with articles
video, comments, images, paging
tweets, images, paging, video, short links
posts/comments, images, paging
images, paging
posts/comments, images, videos, paging
video
for-kunst?projectId=art-project: street view, image list and zoom