Human Rights Web Archive Portal – Technical Summary, Columbia University Libraries


HRWA Statistics, through July 31, 2012
- ca. 500 web sites
- 26 million pages / documents
  - HTML pages = 24.5 million
  - Document files (e.g., doc) = 0.5 million
  - PDFs = 0.5 million
  - XML = 100,000
  - Presentations (e.g., ppt) = ca. 1,800
  - Spreadsheets (e.g., xls) = ca. 700
- ca. 65 languages
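As a quick consistency check on the figures above, the itemized counts can be tallied and compared with the reported total. This is only a sketch: the numbers are the rounded values from the slide, and the slide may not list every file type, so a small gap between the itemized sum and "26 million" is expected.

```python
# Counts as reported on the slide (rounded; "ca." values are approximate).
counts = {
    "HTML pages": 24_500_000,
    "Document files (e.g., doc)": 500_000,
    "PDFs": 500_000,
    "XML": 100_000,
    "Presentations (e.g., ppt)": 1_800,
    "Spreadsheets (e.g., xls)": 700,
}

total = sum(counts.values())


def share(name):
    """Return one category's percentage of the itemized total."""
    return 100 * counts[name] / total


for name, n in counts.items():
    print(f"{name:28s} {n:>12,d} {share(name):6.2f}%")
print(f"{'Itemized total':28s} {total:>12,d}")
```

The itemized categories sum to 25,602,500, which is consistent with the rounded figure of 26 million pages / documents, and HTML accounts for the large majority of the collection.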

HRWA Relevant Tech Terms
- Archive-It – the Internet Archive's web archiving service
- Solr (Lucene) – indexing tool
- Blacklight – discovery interface for Solr
- MySQL – used as an intermediate index database
- WARC (Web ARChive format) – storage format for archived web content
- Fedora – Columbia's preservation repository
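To make the role of the index concrete, here is a minimal sketch of how a discovery layer might form a search request against a Solr index. The host, core name (`hrwa`), and field name (`mimetype`) are hypothetical, since the slides do not describe the actual schema or deployment; only the `/select` handler and the `q`, `fq`, `rows`, and `wt` parameters are standard Solr. The code only builds the URL and does not contact a server.

```python
from urllib.parse import urlencode

# Hypothetical values: the slides give no host, core name, or schema.
SOLR_BASE = "http://localhost:8983/solr/hrwa"


def build_select_url(text, mimetype=None, rows=10):
    """Build a URL for Solr's standard /select request handler."""
    params = [("q", text), ("rows", rows), ("wt", "json")]
    if mimetype:
        # fq (filter query) narrows results without affecting relevance scores.
        params.append(("fq", f'mimetype:"{mimetype}"'))
    return f"{SOLR_BASE}/select?{urlencode(params)}"


url = build_select_url("human rights", mimetype="application/pdf")
print(url)
```

In a Blacklight application these requests are generated by the framework itself; the sketch only illustrates the kind of query that sits underneath a search box and a file-type facet.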

HRWA Challenges
Most challenging and innovative LDPD project to date.
- Most data in a single project (ca. 2 TB)
- Largest indexes
- Greatest number of servers for indexing / production
- Most complex data (WARC / Web)
- Most challenging end-user design requirements
- Most uncharted in terms of users, possible uses, possible value-added features, scoping, etc.
- Most cutting edge, most unanswered tech questions

HRWA More Information
- CUL/IS Behind the Scenes page
- CUL/IS Mellon Web Resources Wiki
- Archive-It: Columbia's Web Archive Collections
- Columbia's Human Rights Web Archive Portal