Thomas Hickey Chief Scientist 2013 December 4 NISO/DCMI Webinar Cooperative Authority Control: Virtual International Authority File (VIAF)

Similar presentations
The worlds libraries. Connected. VIAF & ISNI Virtual International Authority File (VIAF) & International Standard Name Identifier (ISNI) by Titia van der.

Bibliographic Framework Initiative Approach for MARC Data as Linked Data Sally McCallum Library of Congress.
Linked Data, Discovery and Discoverability John McCullough Senior Product Manager, OCLC December 3, 2014 UCL Discovery and Discoverability.
ISNI Overview The Management of Scholarly Identity Baltimore, April 4th 2012 Beat Barblan Director, Identifier Services, Bowker.
Module 5a: Authority Control and Encoding Schemes IMT530: Organization of Information Resources Winter 2007 Michael Crandall.
VIAF for NAAC 2012 October Eric Childress OCLC Research.
The Virtual International Authority File Thomas Hickey ACIG 2009 July 12 ALA, Chicago IL.
RLG Programs Karen Smith-Yoshimura OCLC Research CEAL, Philadelphia 24 March 2010 Cooperative Identities Hub.
Authorities in a connected world Indiana Library Federation 2011 November 16 Thomas Hickey OCLC Chief Scientist.
The world’s libraries. Connected. VIAF and ISNI Interoperability Janifer Gatenby EMEA Program Manager Metadata OCLC VIAF Council Meeting Singapore
SLIDE 1 IS 257 – Fall 2007 Codes and Rules for Description: History 2 University of California, Berkeley School of Information IS 245: Organization.
IMT530 - Organization of Information Resources 1 Feedback Like exercises – But want more instructions and feedback on them – Wondering about grading on these.
1 Cataloging for School Librarians — It Matters! Margaret Maurer Head, Catalog and Metadata Kent State University Libraries and Media Services 2006 ILF.
National libraries and identity in the Semantic Web Gordon Dunsire BNE, Madrid, 14 Dec 2011.
Leveraging Names with Linked Data Karen Smith-Yoshimura Ralph LeVan 2010 RLG Partnership Annual Meeting Chicago, IL 9 June 2010.
Is Cataloging Dead: Advocacy for Bibliographic Control Randy Roeder and Rebecca Routh ILA/ACRL Spring Conference Davenport, Iowa March 3, 2008.
Anila Angjeli 1 APARSEN - Interoperability of PI workshop, iPRES, Lisbon, 5 September 2013 VIAF and Member of the Board of directors of ISNI-IA.
Session 4B – User Experience (The Catalogue and You) New display models of bibliographic data and resources: cataloguing/resource description and search.
AGent 2.0 Cataloging AGCat –Replaces WindowsCat/FullCat UDMM Interactive authority control Subject heading translation Bibliographic resources Cataloging.
8/28/97 Organization of Information in Collections Introduction to Description: Dublin Core and History University of California, Berkeley School of Information.
Matching names in parallel T. Hickey Access October.
Updated :02 Hong Kong University of Science & Technology Library XML Name Access Control Repository at the Hong Kong University of Science.
The world’s libraries. Connected. WorldShare platform & Management Services Integrate all of your collections: print, licensed & digital Chris Thewlis.
Interoperable Digitised Content “Discover, search, extract, link, associate, and view digitised content” Les Carr.
BEYOND THE OPAC: FUTURE DIRECTIONS FOR WEB-BASED CATALOGUES Martha M. Yee September 11, 2006 draft.
OCLC Online Computer Library Center Kathy Kie December 2007 OCLC Cataloging & Metadata Services an introduction.
@LorcanD Lorcan Dempsey, OCLC 11 October 2013 ARL Fall Forum: Mobilizing the research enterprise #ARLforum13 SHARE : Discovery:Focus on papers.
Jenn Riley Metadata Librarian IU Digital Library Program New Developments in Cataloging.
OCLC Research OCLC Online Computer Library Center Research & New Technologies Interest Group 24 October 2005 DeweyBrowser & Curiouser Diane Vizine-Goetz.
Library needs and workflows Diane Boehr Head of Cataloging National Library of Medicine, NIH, DHHS
OCLC Research: Selected projects Eric Childress Larry Olszewski Presentation for Dpto. Biblioteconomía y Documentación Universidad Carlos III de Madrid.
VIAF Update T. Hickey, OCLC Chief Scientist Strasbourg
A Future for the Library Catalogue T. Hickey ACRL/DVC Bryn Mawr 3 November 2006.
RDA Toolkit is an integrated, browser-based, online product that allow user to interact with a collection of cataloging-related documents and resources.
The Future of Cataloging Codes and Systems: IME ICC, FRBR, and RDA by Dr. Barbara B. Tillett Chief, Cataloging Policy & Support Office Library of Congress.
FRBR information exchange Thomas Hickey & Jenny Toves OCLC Research.
1 Making Changes to Personal Name and Corporate Body Authority Records Module 7. Making Changes to Existing Name and Work/Expression Authority Records.
1 CS 502: Computing Methods for Digital Libraries Lecture 19 Interoperability Z39.50.
Cooperative Identities Hub Karen Smith-Yoshimura ALA Authority Control Interest Group, The Future Is Now: Global Authority Control ALA Annual, Chicago.
Evolving MARC 21 for the future Rebecca Guenther CCS Forum, ALA Annual July 10, 2009.
Information for Scotland 8, 16 Nov 2001 The potential of CORC Gordon Dunsire presented at Information for Scotland 8, 16 November 2001, Edinburgh.
9/26/2007 OCLC Orientation & Services 1 What is OCLC?
The physical parts of a computer are called hardware.
Introduction to the Semantic Web and Linked Data Module 1 - Unit 2 The Semantic Web and Linked Data Concepts 1-1 Library of Congress BIBFRAME Pilot Training.
Intellectual Works and their Manifestations Representation of Information Objects IR Systems & Information objects Spring January, 2006 Bharat.
VIAF Update Thomas Hickey Chief Scientist OCLC Research Singapore, 2013.
Future of Cataloguing: how RDA positions us for the future for RDA Workshop June, 2010.
Renee Register Senior Product Manager OCLC Cataloging and Metadata Services Sandy Piver OCLC Publisher Services Consultant OCLC Services for the Publisher.
The Cataloging Department  Creates and maintains the libraries’ online catalog of both physical and virtual collections  Describes, classifies, and.
EuroCRIS strategic membership meeting Barcelona – 9-11 November 2015 Role of ISNI in research information management Titia van der Werf-Davelaar Senior.
Metadata Services for Publishers Bruce A. Miller Publisher Services Executive April 27, 2010.
Thomas Hickey Chief Scientist, OCLC Research 2015 August VIAF Council State of VIAF VI AF.
| Barbara Pfeifer | VIAF workshop Strasbourg | VIAF partners: Deutsche Nationalbibliothek (DNB) Barbara Pfeifer.
Jacquie Samples Duke University Libraries MARC Formats Interest Group January 8, 2011 Will RDA Mean the Death of MARC?
The ___ is a global network of computer networks Internet.
Catalogs, MARC and other metadata Kathryn Lybarger March 25, 2009.
ISNI and VIAF Transforming ways of trustfully consolidating identities Anila ANGJELI – Bibliothèque nationale de France & ISNI-IA ISNI
Web Services Overview Thomas Hickey. 2 What are Web Services? Machine-to-machine communication Run over standard Web protocols –XML syntax, HTTP packaging.
Respect My Authoritay! Mary S. Konkel, College of DuPage Illinois Library Association Conference 9/26/2008
Theory, Tools, History: A Brief Introduction August 17, 2016.
WHAT DOES THE FUTURE HOLD? Ann Ellis Dec. 18, 2000
Enhancing VIAF with WorldCat
Thomas Hickey Chief Scientist OCLC Research Singapore, 2013
Getting started With Linked Data.
A Future for the Library Catalogue
WorldCat: Broad Web visibility for our collection
Name authority control in an evolving landscape
Onboarding Webinar 13 April 2019 Presented by and.
FRBR and FRAD as Implemented in RDA
AUC’s Role In Facilitating Access To Knowledge In The Arab World
Presentation transcript:

Outline
• Background and Philosophy
• Visible VIAF
• Challenges
• New directions
• Relationship with other identifiers
• Coping with ambiguity

Why do we like authorities?
1. To enable a person to find a book of which either (A) the author (B) the title (C) the subject is known.
2. To show what the library has (D) by a given author (E) on a given subject (F) in a given kind of literature.
3. To assist in the choice of a book (G) as to its edition (bibliographically) (H) as to its character (literary or topical).
Charles A. Cutter: Rules for a Printed Dictionary Catalogue, 1876

What do authority files control?
Names!
  – Persons
  – Corporations
  – Places
  – Uniform Titles
  – Families
  – Trademarks
  – Concepts

But we also control:
• Collective authors
• Pseudonyms
• Imaginary characters
• Deities, saints, angels
• Whales, horses, dinosaurs
• Buildings
• Ships, telescopes, space ships, missiles
• Kings, Popes, Presidents
• Cities, lakes, mountains

A changing world
• Libraries
  – Local library
  – Library consortia
  – National cooperation
  – Within languages
  – Global
• Technology
  – Handwritten
  – Typed
  – Printed
  – Online
  – Pervasive
EVERYBODY WANTS TO CHANGE THE WORLD BUT NOBODY WANTS TO CHANGE

A world of linked data

Challenges to libraries
• Reflect these links in our catalogs
  – RDA
• Link to external resources
• Have non-library resources link to us
  – Promote our links
• Be integrated in our users' workflow

Library data is
• Trusted
• Understood
• Reasonably interoperable
• Complex
Within the community, linked data is of limited help

Shareable metadata
• Public
• Simple
• Supply data rather than APIs
  – Avoid idiosyncratic protocols: Z39.50, MARC-21, ISO

Brief history of VIAF
• 1998: VIAF proof-of-concept project launched
• VIAF Consortium formed (Berlin): Library of Congress, Die Deutsche Bibliothek, OCLC Research
• BnF joins
• 2011: After considering multiple options, consensus to transition VIAF to an OCLC service
• 2012: VIAF becomes an OCLC service; VIAF Council holds 1st meeting (Helsinki)
• 4 Principals + 18 Contributors in 18 countries

VIAF’s Goals
• Reduce cost of authority control
• Increase the utility of library authority files
• Provide links between equivalent names
• Make the information Web friendly
  – Open API
  – Bulk downloads
  – Open Linked Data

Applications
• FRBR matching
• Better matching of non-English metadata
• Uniform identifier across all languages
• Authority control for cataloging
• Better regionalization of catalogs
• Minimize differences across languages of cataloging
• More intelligent linking and searching

VIAF authority record counts

Web interface and usage

VIAF Use

Usage
• Browser usage for past year
  – 953,020 visitors
  – 1,531,493
  – 5,448,910 pages
• API usage
  – Went from 90% of usage to 98%
  – Peaks at ~20/second
  – ~5 million searches/week
• Downloads
  – ~150/week for links, 150 for clusters
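Here is a rough consistency check on the API figures. The arithmetic is ours and does not appear on the slide. About 5 million searches a week averages out to a little over 8 searches per second, so the ~20/second peaks run at roughly 2.4 times the average load. The inputs are the slide's approximate values, and everything else is derived from them.

```python
# Back-of-the-envelope check on the API figures quoted on the slide.
# Inputs are the slide's approximate values; the outputs are derived, not reported.
SEARCHES_PER_WEEK = 5_000_000   # "~5 million searches/week"
PEAK_PER_SECOND = 20            # "Peaks at ~20/second"

SECONDS_PER_WEEK = 7 * 24 * 60 * 60  # 604,800 seconds


def average_rate(per_week: float) -> float:
    """Average requests per second implied by a weekly total."""
    return per_week / SECONDS_PER_WEEK


avg = average_rate(SEARCHES_PER_WEEK)
ratio = PEAK_PER_SECOND / avg

print(f"average ≈ {avg:.1f} searches/second")
print(f"peak/average ≈ {ratio:.1f}x")
```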

Building VIAF

Enhancing authorities
(Diagram: Bibliographic Record, Derived Authority Record, Processed Authority)

Record Flow
• 37 million authority records
• 30 million links between authorities
(Diagram: SWNL Bib & Authority, BnF Bib & Authority and LC Bib & Authority feeding into VIAF)

Machine access to VIAF

Background
• VIAF is available in bulk downloads
• All online interaction with VIAF is RESTful
  – Using SRU
• -international-authority-file-viaf/using-api

Bulk downloads
• Go to
• Variety of formats
  – Just links
  – RDF (XML and N-Triples)
  – MARC-21
  – Native XML clusters
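N-Triples puts one complete triple on each line, so the RDF download can be processed as a stream. The sketch below collects equivalence links from such lines. Its parser is deliberately simplified: it only matches triples whose subject and object are both IRIs, and it skips literals and blank nodes. The predicate `http://schema.org/sameAs` and the `example.org` IRIs are assumptions for illustration. Check the actual dump for the predicates VIAF uses.

```python
import re
from collections import defaultdict
from typing import Dict, Iterable, Set

# Simplified N-Triples pattern: <subject> <predicate> <object> .
# Only IRI objects match; lines whose object is a literal or blank node are skipped.
TRIPLE = re.compile(r'^<([^>]*)>\s+<([^>]*)>\s+<([^>]*)>\s*\.\s*$')

# Assumption: the predicate used for equivalence links in the dump.
SAME_AS = "http://schema.org/sameAs"


def same_as_links(lines: Iterable[str], predicate: str = SAME_AS) -> Dict[str, Set[str]]:
    """Map each subject IRI to the set of IRIs it is declared equivalent to."""
    links: Dict[str, Set[str]] = defaultdict(set)
    for line in lines:
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # blank lines and comments are legal in N-Triples
        m = TRIPLE.match(line)
        if m and m.group(2) == predicate:
            links[m.group(1)].add(m.group(3))
    return dict(links)


sample = [
    '<http://example.org/viaf/1> <http://schema.org/sameAs> <http://example.org/lc/n1> .',
    '<http://example.org/viaf/1> <http://schema.org/sameAs> <http://example.org/bnf/b1> .',
    '<http://example.org/viaf/1> <http://schema.org/name> "A. Author" .',
]
print(same_as_links(sample))
```

Because the function consumes any iterable of lines, an open file handle can be passed directly, and a large dump never has to be held in memory.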

SRU
• Search/Retrieve via URL
  – +all+dempsey&sortKeys=holdingscount
  – +all+cervantes+and+local.sources+any+%22bnc+bne%22&sortKeys=holdingscount

SRU Tricks
• RSS feed
  – pt=application/rss%2bxml
• Exact with truncation
  – %22cervantes*%22&sortKeys=holdingscount
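The URLs on the SRU slides survive only as fragments. They still show the shape of an SRU request: a CQL query with spaces encoded as `+` and quotes as `%22`, plus `sortKeys=holdingscount`. The sketch below assembles such URLs. Several pieces are assumptions filled in for illustration, because the slides' own prefixes are cut off and only `pt=` survives of the RSS parameter:

- the base endpoint `https://viaf.org/viaf/search`
- the `local.names` index
- the `exact` relation in the truncation example
- the `httpAccept` parameter name

The rest comes from the slides: `local.sources`, the `bnc bne` source codes, the quoted `cervantes*` term and the RSS media type.

```python
from typing import Optional
from urllib.parse import urlencode

# Assumption: SRU endpoint (the slides' URL prefixes are truncated).
VIAF_SRU_BASE = "https://viaf.org/viaf/search"


def sru_url(cql: str, sort_keys: str = "holdingscount",
            http_accept: Optional[str] = None) -> str:
    """Build an SRU searchRetrieve URL for a CQL query.

    urlencode() quotes values with quote_plus, so spaces become '+' and
    double quotes become %22; safe="*" leaves truncation asterisks as-is.
    """
    params = {"query": cql, "sortKeys": sort_keys}
    if http_accept is not None:
        params["httpAccept"] = http_accept  # assumed parameter name ("...pt=" on the slide)
    return f"{VIAF_SRU_BASE}?{urlencode(params, safe='*')}"


# Names matching "dempsey", most widely held first ('local.names' is assumed).
print(sru_url("local.names all dempsey"))
# Limit to records from the BNC and BNE source files (local.sources is on the slide).
print(sru_url('local.names all cervantes and local.sources any "bnc bne"'))
# "Exact with truncation": a quoted term ending in '*' (the 'exact' relation is assumed).
print(sru_url('local.names exact "cervantes*"'))
# The same search requested as an RSS feed.
print(sru_url("local.names all dempsey", http_accept="application/rss+xml"))
```

The query strings produced for the dempsey and cervantes searches end exactly as the slide fragments do. Keeping the asterisk literal mirrors the slide, though a percent-encoded `%2A` would normally decode to the same query.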

URL Patterns

New Directions for VIAF
• Non-library sources
• Information from WorldCat
• Integration with WorldCat

VIAFbot – The Wikipedia Connection
VIAFbot 25/
• OCLC Wikipedian in residence Max Klein
• Automatic comparison of VIAF and Wikipedia references
• Initially English, then German
• Now working with Wikidata
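The "automatic comparison of VIAF and Wikipedia references" can be pictured as reconciling two mappings. One maps each article to the VIAF ID it cites. The other maps each VIAF cluster to the article it links to. The sketch below sorts articles into agreements, conflicts and one-sided links. The function, the input layout and the sample titles and IDs are hypothetical, and the slide does not describe VIAFbot's actual logic.

```python
from typing import Dict, List


def compare_links(wiki_to_viaf: Dict[str, str],
                  viaf_to_wiki: Dict[str, str]) -> Dict[str, List[str]]:
    """Classify articles by whether Wikipedia's VIAF citation and VIAF's link agree.

    wiki_to_viaf: article title -> VIAF ID cited in that article (hypothetical input)
    viaf_to_wiki: VIAF ID -> article title that VIAF links to (hypothetical input)
    """
    report: Dict[str, List[str]] = {
        "agree": [], "conflict": [], "only_wikipedia": [], "only_viaf": []}
    for article, viaf_id in wiki_to_viaf.items():
        back = viaf_to_wiki.get(viaf_id)
        if back is None:
            report["only_wikipedia"].append(article)  # VIAF has no link back yet
        elif back == article:
            report["agree"].append(article)
        else:
            report["conflict"].append(article)  # the two sides disagree; needs review
    for article in viaf_to_wiki.values():
        if article not in wiki_to_viaf:
            report["only_viaf"].append(article)  # article cites no VIAF ID yet
    return report


wiki = {"Article A": "v1", "Article B": "v2", "Article C": "v3"}
viaf = {"v1": "Article A", "v2": "Article Z", "v4": "Article D"}
print(compare_links(wiki, viaf))
```

The `only_viaf` bucket lists articles that could receive a VIAF citation. The `only_wikipedia` bucket lists clusters that could gain a Wikipedia link.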

Wikidata

VIAF ↔ Wikidata Linking
• Benefits VIAF
• Enhancing Wikipedia language coverage
• 14,000+ new labels/aliases added

VIAF – in the Web of Bibliographic Data
(Diagram labels: Worldcat.org/oclc/, The Hidden Face of Eve, Nawal El Saadawi (three forms), author, sameAs, The Sex customs, about, VIAF)

Other non-library sources
• ISNI – International Standard Name Identifier
• Perseus Digital Library
• Syriac project names
• Fihrist Arabic names

Information from WorldCat

Multilingual Bibliographic Structure Project
• The majority of WorldCat is about non-English works
• Much of the metadata is non-English
  – Hybrid records
  – Parallel records
• FRBR work-level algorithm plus GLIMIR manifestation/expression level
  – Identify 3 levels of FRBR
• Can't we do something with these?

Approach
• Process at work-level when possible
• Extract most reliable information
• Use that to extract less reliable
• Find
  – Languages, original language
  – Translators
  – Titles (by language)
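The sketch below shows the "most reliable first" idea on a single work cluster. It assumes each record carries:

- its language
- a title
- an optional explicitly stated original language (for example from a MARC 041 $h; where such data comes from is an assumption)
- an optional translator

Explicit statements are trusted first. Only when none exist does the sketch fall back to a weaker, hypothetical signal: records that name no translator are probably in the original language. The field names and the fallback rule are illustrative only.

```python
from collections import Counter, defaultdict
from dataclasses import dataclass
from typing import Dict, List, Optional, Set


@dataclass
class Record:
    """One bibliographic record in a work-level cluster (hypothetical layout)."""
    language: str
    title: str
    original_language: Optional[str] = None  # explicit statement, e.g. MARC 041 $h (assumed)
    translator: Optional[str] = None


def original_language(records: List[Record]) -> Optional[str]:
    """Use the most reliable evidence first, then fall back to weaker evidence."""
    stated = Counter(r.original_language for r in records if r.original_language)
    if stated:
        return stated.most_common(1)[0][0]
    # Weaker signal (assumption): records naming no translator are likely untranslated.
    untranslated = Counter(r.language for r in records if r.translator is None)
    return untranslated.most_common(1)[0][0] if untranslated else None


def titles_by_language(records: List[Record]) -> Dict[str, str]:
    """Most frequent title form for each language in the cluster."""
    by_lang: Dict[str, Counter] = defaultdict(Counter)
    for r in records:
        by_lang[r.language][r.title] += 1
    return {lang: counts.most_common(1)[0][0] for lang, counts in by_lang.items()}


def translators(records: List[Record], original: Optional[str]) -> Set[str]:
    """Translators named on records that are not in the original language."""
    return {r.translator for r in records if r.translator and r.language != original}


work = [
    Record("eng", "Title E", original_language="ara", translator="Translator X"),
    Record("eng", "Title E"),
    Record("fre", "Title F", translator="Translator Y"),
    Record("ara", "Title A"),
]
orig = original_language(work)
print(orig, titles_by_language(work), translators(work, orig))
```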

Benefits
• Localize metadata to various languages
  – Easier cataloging
  – Better cataloging: merge, fix
  – Better displays to fit the user
• Linking of translations
  – Appropriate language
  – Use all appropriate data!
• Better FRBR groupings

Records for VIAF
• Translated works
  – Work and expression records
  – More information about languages, translators
  – Better links between work/expression records

Other possibilities
• Variant forms of names
• More titles
• Coauthors
• FAST subject headings

Identifier relationships

ISNI International Standard Name Identifier
• Draft ISO standard: … aspires to provide a means to uniquely identify creators, including authors, composers, artists, cartographers and performers, among others. Such an authoritative identifier will serve to provide a link for occurrences of the identity across databases on the web
• Driven by rights-holders
  – Publishers
  – Rights agencies representing authors, artists
• Active disambiguation program

ORCID
• Started with Thomson Reuters' ResearcherID
• Most 'social'
• Claiming IDs
• Interactive verification of associated works
• Pulling together several current initiatives
• Driven by STM, university communities
• Primarily interested in researchers
• Large number of participants
• Mostly concerned with present and future names

Cooperation Challenges
• What data can be shared?
• How to fund the efforts?
• Established by different types of institutions:
  – Libraries, Standards Organization, STM Publishers
• Different
  – Technologies
  – Time scales
• What does the name represent?
  – People, personas, organizations
• Who is in charge?

Commonalities
• All centered in not-for-profits
• All interested in data exchange
• All interested in global systems
• All have an understanding of the problem
• Personal author disambiguation and identification
  – Central to their operations

Coping with Ambiguity
1,520 headings found for smith, john

The problem
• Two names in single source for same identity
• Mixed identities
• Different granularity
  – Pseudonyms
  – Presidents, Kings
• Chains of matches
• VIAF has ~½ million ambiguous groups

Goal
• 99+% sure of pair-wise assertions
  – Includes all pairs of records in resulting clusters

Another common issue

Harvest and ingest
• Coping with
  – Duplicate identifiers
  – Deletes
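The sketch below applies one harvested batch to a local copy of a source file under two assumed rules. When the same identifier appears more than once in a batch, the last occurrence wins. Deletions remove the record and are reported, so that clusters containing it can be rebuilt. The record format, the use of `None` to mark a delete and the function name are hypothetical. The slide names the problems but not VIAF's actual handling.

```python
from typing import Dict, Iterable, Optional, Set, Tuple

# One harvested item: (record ID, payload). A payload of None marks a delete
# (assumed convention; real harvest formats signal deletes in their own way).
Item = Tuple[str, Optional[str]]


def apply_batch(store: Dict[str, str], batch: Iterable[Item]) -> Tuple[Set[str], Set[str]]:
    """Apply one harvested batch to a local copy of a source file.

    Returns (IDs added or updated, IDs deleted) so dependent clusters can be rebuilt.
    """
    latest: Dict[str, Optional[str]] = {}
    for record_id, payload in batch:
        latest[record_id] = payload  # duplicate identifiers: the last occurrence wins
    changed: Set[str] = set()
    deleted: Set[str] = set()
    for record_id, payload in latest.items():
        if payload is None:
            if record_id in store:  # deletes for unknown IDs are ignored
                del store[record_id]
                deleted.add(record_id)
        else:
            store[record_id] = payload
            changed.add(record_id)
    return changed, deleted


store = {"a": "A v1", "b": "B v1"}
batch = [("a", "A v2"), ("c", "C v1"), ("a", "A v3"), ("b", None), ("zz", None)]
print(apply_batch(store, batch), store)
```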

Matching Authorities to Bibs
• Sometimes an identifier is available
• Often ambiguity with just names
  – Multiple possibilities
  – May mix identities

Cross references within sources
• Strings can be ambiguous
• Links not necessarily resolvable

Enhance the authority records
• Pull information from bibs, authority notes
• Cope with
  – Mistagged fields
  – Ambiguous dates
  – Errors in pulling titles, etc.

Pair-wise matching between sources
• Two dozen types of matches
  – Ranked by reliability/strength
• Major problems
  – Missing information
  – Mixed identities
• Can override the matching
  – xA

Duplicates within sources
• Rely primarily on
  – String similarity
  – Complexity of the preferred form
• Also look for multiple links from other sources
• Lonely names
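The sketch below shows one way the two main signals might combine. A pair of headings from the same source is flagged only if the headings are nearly identical and also distinctive enough for that similarity to mean something. Two identical "Smith, John" headings may well be different people. `difflib.SequenceMatcher` stands in for whatever string-similarity measure VIAF actually uses. The complexity score, the thresholds and the function names are hypothetical.

```python
import re
from difflib import SequenceMatcher


def normalize(heading: str) -> str:
    """Lowercase, turn punctuation into spaces and collapse whitespace."""
    return " ".join(re.sub(r"[^\w\s]", " ", heading.lower()).split())


def complexity(heading: str) -> int:
    """Crude distinctiveness score: word count plus a bonus for each 4-digit year."""
    years = re.findall(r"\b\d{4}\b", heading)
    return len(normalize(heading).split()) + 2 * len(years)


def likely_duplicates(a: str, b: str, min_similarity: float = 0.9,
                      min_complexity: int = 5) -> bool:
    """Flag two headings from the same source as probable duplicates."""
    if min(complexity(a), complexity(b)) < min_complexity:
        return False  # too simple: identical strings may still be different people
    similarity = SequenceMatcher(None, normalize(a), normalize(b)).ratio()
    return similarity >= min_similarity


print(likely_duplicates("Smith, John", "Smith, John"))
print(likely_duplicates("Cervantes Saavedra, Miguel de, 1547-1616",
                        "Cervantes Saavedra, Migel de, 1547-1616"))
```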

Pulling together groups
• Only keep strongest links between records in different sources
  – A record in source A may match several records in source B
  – E.g. keep a double-date match over a coauthor match
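The sketch below illustrates the pruning step. It assumes each candidate link between a record in source A and one in source B is tagged with a match type from a ranked list. Only the two example types come from the slide: a double-date match outranks a coauthor match. The other type names, the ranking values and the data layout are hypothetical.

```python
from collections import defaultdict
from typing import Dict, List, Tuple

# Hypothetical ranking (higher = more reliable). The slide only says double-date beats coauthor.
MATCH_RANK = {
    "shared_identifier": 5,
    "double_date": 4,
    "single_date": 3,
    "title": 2,
    "coauthor": 1,
}

Link = Tuple[str, str, str]  # (record in source A, record in source B, match type)


def strongest_links(links: List[Link]) -> List[Link]:
    """For each record in A, keep only its link(s) of the best-ranked match type into B."""
    by_record: Dict[str, List[Link]] = defaultdict(list)
    for link in links:
        by_record[link[0]].append(link)
    kept: List[Link] = []
    for candidates in by_record.values():
        best = max(MATCH_RANK[match] for _, _, match in candidates)
        # Ties at the best rank are all kept; later cluster checks must resolve them.
        kept.extend(link for link in candidates if MATCH_RANK[link[2]] == best)
    return kept


links = [
    ("A:1", "B:7", "coauthor"),
    ("A:1", "B:9", "double_date"),
    ("A:2", "B:3", "title"),
]
print(strongest_links(links))  # [('A:1', 'B:9', 'double_date'), ('A:2', 'B:3', 'title')]
```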

Generate coherent clusters
• Look for cliques
• Merge subgraphs
  – Strength of the best link between the pair
  – Number of links between the pair
  – A metric based on
      – Strength of the match
      – Title closeness
      – Node type (corporate, personal, etc.)
      – Name closeness
  – Whether the nodes are personal names or not

Coherent clusters
Avoid
• Date conflicts
• Incompatible names
• Names that are cross references to each other
• Names that differ only in a number
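The clustering steps can be approximated with a greedy "merge only into cliques" rule. Links are processed strongest first. Two groups are joined only if every pair across them is directly linked and no pair conflicts (here, by differing dates). This enforces the goal that every pair of records in a cluster is individually supported, so a chain of matches (A~B, B~C) cannot pull A and C together on its own. It is a simplified stand-in, not VIAF's algorithm. The record keys, the link strengths and the date-conflict test are hypothetical.

```python
from itertools import product
from typing import Callable, Dict, FrozenSet, List, Set, Tuple

Edge = Tuple[str, str, float]  # (record, record, link strength); strengths are hypothetical


def clique_clusters(nodes: List[str], edges: List[Edge],
                    conflict: Callable[[str, str], bool]) -> List[Set[str]]:
    """Greedily merge records, strongest links first, keeping every cluster a clique."""
    linked: Set[FrozenSet[str]] = {frozenset((a, b)) for a, b, _ in edges}
    cluster_of: Dict[str, Set[str]] = {n: {n} for n in nodes}
    for a, b, _ in sorted(edges, key=lambda e: e[2], reverse=True):
        ca, cb = cluster_of[a], cluster_of[b]
        if ca is cb:
            continue  # already in the same cluster
        pairs = list(product(ca, cb))
        # Merge only if every cross pair is directly linked and none of them conflicts.
        if all(frozenset(p) in linked for p in pairs) and not any(conflict(x, y) for x, y in pairs):
            merged = ca | cb
            for n in merged:
                cluster_of[n] = merged
    # Each cluster object is shared by its members; keep one copy of each.
    return list({id(c): c for c in cluster_of.values()}.values())


year = {"LC:1": 1931, "BNF:2": 1931, "DNB:3": 1931, "X:4": 1931, "Y:5": 1950}


def date_conflict(x: str, y: str) -> bool:
    return year[x] != year[y]


edges = [("LC:1", "BNF:2", 0.9), ("BNF:2", "DNB:3", 0.8), ("LC:1", "DNB:3", 0.7),
         ("DNB:3", "X:4", 0.6), ("X:4", "Y:5", 0.5)]
print(clique_clusters(list(year), edges, date_conflict))
```

In this example the DNB:3 to X:4 link is rejected because X:4 has no direct link to LC:1 or BNF:2. The X:4 to Y:5 link is rejected because of the date conflict.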

Assign VIAF IDs
• Minimize moves of source records
• Redirect unused VIAF IDs if possible
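The sketch below assigns IDs under the slide's two aims. Each new cluster keeps the previous ID held by most of its source records, so few records change IDs. Each old ID is reused at most once. Old IDs left unused are redirected to the cluster that absorbed most of their records. The greedy largest-overlap-first order, the `new-N` minting scheme and the data layout are assumptions for illustration. A greedy pass does not guarantee the minimum number of moves.

```python
from collections import Counter
from typing import Dict, List, Set, Tuple


def assign_ids(clusters: List[Set[str]],
               old_id_of: Dict[str, str]) -> Tuple[List[str], Dict[str, str]]:
    """Choose an ID for each new cluster and redirects for old IDs no longer used.

    clusters:  new clusters, each a set of source-record keys
    old_id_of: source-record key -> cluster ID it had in the previous run
    """
    # (overlap, cluster index, old ID): how many members would keep their old ID.
    candidates = []
    for i, members in enumerate(clusters):
        counts = Counter(old_id_of[r] for r in members if r in old_id_of)
        candidates.extend((n, i, old) for old, n in counts.items())
    candidates.sort(key=lambda c: (-c[0], c[1], c[2]))  # greedy: biggest overlaps first

    ids = [""] * len(clusters)
    used: Set[str] = set()
    for _, i, old in candidates:
        if not ids[i] and old not in used:  # each old ID is reused at most once
            ids[i] = old
            used.add(old)
    minted = 0
    for i, current in enumerate(ids):
        if not current:
            minted += 1
            ids[i] = f"new-{minted}"  # hypothetical minting scheme

    # Redirect each unused old ID to the cluster now holding most of its records.
    redirects: Dict[str, str] = {}
    for old in sorted(set(old_id_of.values()) - used):
        holders = Counter(ids[i] for i, members in enumerate(clusters)
                          for r in members if old_id_of.get(r) == old)
        if holders:
            redirects[old] = holders.most_common(1)[0][0]
    return ids, redirects


previous = {"a": "V1", "b": "V1", "c": "V1", "d": "V2", "e": "V2"}
print(assign_ids([{"a", "b", "c", "d", "e"}], previous))  # (['V1'], {'V2': 'V1'})
```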

Create links between clusters
• Cross references
• Uniform titles
• Coauthors
• Other bibliographic titles
In general, link only if unambiguous
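The "link only if unambiguous" rule can be sketched in two steps. First, resolve a cross-reference or coauthor string against an index of normalized headings. Then create a link only when exactly one cluster matches. The index layout, the normalization and the cluster IDs are hypothetical.

```python
from collections import defaultdict
from typing import Dict, List, Optional, Set, Tuple


def normalize(heading: str) -> str:
    """Case- and punctuation-insensitive key for a heading (hypothetical normalization)."""
    return " ".join("".join(ch if ch.isalnum() else " " for ch in heading.lower()).split())


def build_index(headings: List[Tuple[str, str]]) -> Dict[str, Set[str]]:
    """Map normalized heading -> set of cluster IDs that use it."""
    index: Dict[str, Set[str]] = defaultdict(set)
    for cluster_id, heading in headings:
        index[normalize(heading)].add(cluster_id)
    return index


def resolve(reference: str, index: Dict[str, Set[str]]) -> Optional[str]:
    """Return the single matching cluster ID, or None if missing or ambiguous."""
    matches = index.get(normalize(reference), set())
    return next(iter(matches)) if len(matches) == 1 else None


index = build_index([
    ("c1", "Smith, John"),
    ("c2", "Smith, John"),
    ("c3", "Saadawi, Nawal"),
])
print(resolve("Saadawi, Nawal.", index))  # c3
print(resolve("Smith, John", index))      # None (two clusters share the heading)
```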

Lonely names

©2013 OCLC. This work is licensed under a Creative Commons Attribution 3.0 Unported License. Suggested attribution: “This work uses content from ‘Cooperative Authority Control: Virtual International Authority File (VIAF)’ © OCLC, used under a Creative Commons Attribution license:
Thank You!