Márton Németh – László Drótos How to catalogue a web archive?

Presentation transcript:

Márton Németh – László Drótos
How to catalogue a web archive?
Some solutions for metadata management at the web harvesting pilot project of the National Széchényi Library, Hungary
INFINT 2018, Bratislava, October 23, 2018

questions:
- what is the subject of the description?
- what kind of metadata is needed?
- in what format?
- how can this data be produced?
- what can this data be used for?

Source: webverse.org

granularity:
- the living web is one single document (a huge, ever-changing, unlimited hypermedia)
- the archived web is a versioned (time-stamped) file depository
- collection method: selective / event-based / domain-wide harvest, automatic submission, deposit
- levels of description: collection, sub-collection, website, website unit, document, file
- the level to choose depends on user needs, scale of archiving and available staff

metadata types (website level):
- bibliographic: e.g. title (lots of variations), creator/contributor/publisher (uncertain roles), rights (unclear legal status), dates (what kind of dates?), subject/type (very mixed content) ...
- administrative: e.g. curator, nominator, urgency, permission request, harvesting schedule, quality assurance, access ...
- technical: e.g. original CMS, harvester software, harvest parameters, size of the downloaded content, storage, long-term preservation ...
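The three groups above can be pictured as one website-level record with three parts. The following is only a minimal sketch: the class and field names are hypothetical illustrations chosen from the examples on the slide, not the project's actual data model.

```python
from dataclasses import dataclass, field, asdict
from typing import List


@dataclass
class Bibliographic:
    # Hypothetical fields taken from the slide's examples.
    title: str
    title_variants: List[str] = field(default_factory=list)
    creators: List[str] = field(default_factory=list)
    rights: str = "unknown"  # legal status is often unclear
    subjects: List[str] = field(default_factory=list)


@dataclass
class Administrative:
    curator: str = ""
    harvest_schedule: str = ""
    access: str = "restricted"


@dataclass
class Technical:
    harvester: str = ""
    size_bytes: int = 0


@dataclass
class WebsiteRecord:
    url: str
    bibliographic: Bibliographic
    administrative: Administrative = field(default_factory=Administrative)
    technical: Technical = field(default_factory=Technical)


# Example values are invented for illustration only.
record = WebsiteRecord(
    url="http://example.org/",
    bibliographic=Bibliographic(
        title="Example blog",
        title_variants=["Example Blog | Home"],
    ),
    technical=Technical(harvester="hypothetical-crawler", size_bytes=1024),
)

# Nested plain dictionary, convenient for export to XML or JSON.
record_dict = asdict(record)
```

Keeping the three groups separate makes it easier to fill them from different sources, for example bibliographic data by a curator and technical data by the harvester.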

recommendations:
- ISO/TR 14873:2013 – Statistics and quality issues for web archiving (collection level indicators)
- Descriptive Metadata for Web Archiving / OCLC Web Archiving Metadata Working Group (mostly site-level bibliographic data fields, based on the Dublin Core schema)
- Metadata Application Profile for Description of Websites with Archived Versions / New York Art Resources Consortium (site-level, MARC/RDA)
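Since one of the recommendations above is based on Dublin Core, a short sketch of expressing site-level bibliographic fields as Dublin Core elements may be useful. It uses the standard Dublin Core element namespace, but the wrapping `record` element, the function name and the sample values are assumptions, and the selection of elements is not the OCLC profile itself.

```python
import xml.etree.ElementTree as ET

# Standard namespace of the Dublin Core Metadata Element Set 1.1.
DC_NS = "http://purl.org/dc/elements/1.1/"
ET.register_namespace("dc", DC_NS)

# Dublin Core elements matching the bibliographic examples on the slides.
DC_ELEMENTS = ("title", "creator", "publisher", "rights", "date", "subject", "type")


def to_dublin_core(site: dict) -> ET.Element:
    """Build a simple XML record; 'record' is a hypothetical wrapper element."""
    root = ET.Element("record")
    for name in DC_ELEMENTS:
        values = site.get(name, [])
        if isinstance(values, str):
            values = [values]  # allow a single string or a list of strings
        for value in values:
            child = ET.SubElement(root, f"{{{DC_NS}}}{name}")
            child.text = value
    return root


# Invented sample data for illustration.
site = {
    "title": "Example museum blog",
    "publisher": "Example Museum",
    "subject": ["local history", "museums"],
    "type": "blog",
}

dc_xml = ET.tostring(to_dublin_core(site), encoding="unicode")
print(dc_xml)
```

Repeatable elements such as `dc:subject` are simply emitted once per value, which fits the "very mixed content" of many websites.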

database plan for the Hungarian webarchive (website level)

our metadata records:
- a small publicly available demo collection
- XSD (XML Schema Definition) and XSLT (Extensible Stylesheet Language Transformations) files
- predefined lists (e.g. genre, type, topic, subtopic, change frequency, harvest frequency, quality level)
- namespace links (person and geographic names)
- related sites (on the living web and in the archive)
- site-level and subcollection-level XML records
- manual data entry with XML Notepad (as a temporary solution)
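With manual data entry, values for fields backed by predefined lists are easy to mistype. The sketch below shows one way such a check could look before a site-level XML record is written. The list contents, the element names and the `website` root element are hypothetical and do not reproduce the project's mia.xsd schema, which would normally perform this validation itself.

```python
import xml.etree.ElementTree as ET

# Hypothetical controlled vocabularies; the project's real lists are not shown in the slides.
PREDEFINED = {
    "genre": {"blog", "news portal", "institutional site"},
    "harvest_frequency": {"daily", "weekly", "monthly", "yearly"},
    "quality_level": {"good", "acceptable", "poor"},
}


def validate(record: dict) -> list:
    """Return a list of error messages; an empty list means the record passed."""
    errors = []
    for field_name, allowed in PREDEFINED.items():
        value = record.get(field_name)
        if value is None:
            errors.append(f"{field_name}: missing")
        elif value not in allowed:
            errors.append(f"{field_name}: '{value}' is not in the predefined list")
    return errors


def to_xml(record: dict) -> str:
    """Serialise a flat record as XML with one child element per field."""
    root = ET.Element("website")
    for key, value in record.items():
        ET.SubElement(root, key).text = str(value)
    return ET.tostring(root, encoding="unicode")


good = {
    "url": "http://example.org/",
    "genre": "blog",
    "harvest_frequency": "monthly",
    "quality_level": "good",
}
bad = {"url": "http://example.org/", "genre": "bolg", "harvest_frequency": "monthly"}

good_errors = validate(good)
bad_errors = validate(bad)
good_xml = to_xml(good)
```

In this sketch `bad_errors` reports both the misspelled genre and the missing quality level, while `good` passes and can be serialised.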

the mia.xsd file (website level)

a snippet from the public demo collection webpage
icons: archived | screenshot | linkgraph | Internet Archive | original | metadata

metadata of the Óbuda Museum’s blog (original XML and converted HTML format)

metadata of the demo collection (original XML and converted HTML format)

future plans:
- database and form-based data entry interface (as part of the new nation-wide library system)
- cooperation with other memory institutions (e.g. shared cataloging)
- automatic and semi-automatic metadata generation (mostly technical and administrative data)
- automatic entity identification and extraction from the full text (e.g. names, events, concepts)
- enriching metadata from external sources (e.g. DBpedia)
- incorporating metadata of important archived websites into the national bibliography
- full-text hit lists faceted by metadata
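The last plan, hit lists faceted by metadata, amounts to counting how many hits fall under each metadata value. A minimal sketch of that counting, assuming each hit carries its site-level metadata as a plain dictionary (the field names and sample hits are invented):

```python
from collections import Counter


def facet_counts(hits, facet_fields):
    """Count how many hits carry each value of the given metadata fields."""
    counts = {name: Counter() for name in facet_fields}
    for hit in hits:
        for name in facet_fields:
            value = hit.get(name)
            if value:  # skip hits without a value for this field
                counts[name][value] += 1
    return counts


# Invented hit list for illustration.
hits = [
    {"url": "http://example.org/a", "topic": "history", "genre": "blog"},
    {"url": "http://example.org/b", "topic": "history", "genre": "news portal"},
    {"url": "http://example.org/c", "topic": "art"},
]

facets = facet_counts(hits, ["topic", "genre"])
print(facets["topic"].most_common())
```

A search engine such as Solr computes these counts itself; the sketch only illustrates what the resulting facets contain.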

filtering options by topic, subtopic, genre and type (in-house developed, Solr-based full-text search engine)
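Filtering of this kind is typically expressed in Solr with filter queries. The sketch below builds such a request URL using Solr's standard `q`, `fq`, `facet` and `facet.field` parameters. The base URL, the core name `webarchive` and the field names are assumptions, since the slides do not describe the configuration of the in-house search engine.

```python
from urllib.parse import urlencode


def build_solr_query(base_url, text, filters, facet_fields=()):
    """Build a Solr select URL with one fq filter per metadata field."""
    params = [("q", text), ("wt", "json")]
    for field_name, value in filters.items():
        # Quote the value so multi-word values are matched as a phrase.
        params.append(("fq", f'{field_name}:"{value}"'))
    if facet_fields:
        params.append(("facet", "true"))
        params.extend(("facet.field", name) for name in facet_fields)
    return f"{base_url}/select?{urlencode(params)}"


# Hypothetical Solr location and field names.
url = build_solr_query(
    "http://localhost:8983/solr/webarchive",
    "museum",
    {"topic": "history", "genre": "blog"},
    facet_fields=["topic", "subtopic"],
)
print(url)
```

Values containing double quotes or backslashes would need escaping before being placed in `fq`; the sketch leaves that out for brevity.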

thank you for your attention!
project homepage: http://mekosztaly.oszk.hu/mia/
project description in English: http://netpreserve.org/about-us/members/orszagos-szechenyi-konyvtar/
demo web archive: http://mekosztaly.oszk.hu/mia/demo/
“404 not found” workshop (Budapest, November 15, 2018): http://mekosztaly.oszk.hu/mia/404_workshop.html
contact e-mail address: mia@mek.oszk.hu