VIAF and ISNI Synchronisation


1 VIAF and ISNI Synchronisation
VIAF Global Council - Lyon, France, 15 August 2014
Janifer Gatenby, EMEA Program Manager Metadata

2 Bridging domains (cross-domain)
Domains: text rights; trade sources; music rights; archives and museums; encyclopaedias; libraries; researchers and professionals; granting organisations; professional societies; article databases; theses databases.
ISNI ingests data from these multiple domains and makes links between them.

3 ISNI Status at July 2014
- 8.01 million assigned ISNIs (up from 1 million two years ago)
- 15.4 million links; ISNI available as linked data
- The ORCID registration process accesses ISNI
- New members: Harvard University, La Trobe University and COPYRUS (Russia)
- The Linked Content Coalition names ISNI as its #1 strategy
Sector | Databases | Assigned | Links
Research | 12 | 836,142 | 1,845,165
Text rights | 7 | 129,816 | 692,580
Music | 5 | 315,918 | 450,717
Libraries & trade | 4 | 6.8 million | 12,356,010
Organisations | 3 | 446,237 | 109,204
ORCID is a self-registration ID system for researchers. During registration, researchers can access ISNI to import their ISNI and metadata. Harvard University Library provides services to publishers for creating author profiles and includes ISNI in those services and software. La Trobe University has uploaded data from its institutional registry, which is maintained by the library. COPYRUS is the Russian rights management society and a member of IFRRO.
Linked Content Coalition (LCC) Forum members: Associated Newspapers, Axel Springer, Coordination of European Picture Agencies Stock, Press and Heritage (CEPIC), Common Rights, Copyright Clearance Centre, Copyright Licensing Agency, Criteria Media Exchange, Danish Producers Association, DDEX, Digimarc, EditEUR, MovieLabs, Microgen, EMI Music Publishing, Europa Distribution, European Magazine Media Association (EMMA), European Newspaper Publishers Association (ENPA), European Publishers Council (EPC), European Visual Artists (EVA), European Writers Council (EWC), Federation of European Publishers (FEP), Organizações Globo, Gruppo Espresso, Hachette Livre, International DOI Foundation, International Federation of the Phonographic Industry (IFPI), International Federation of Reproduction Rights Organisations (IFRRO), International Press Telecommunications Council (IPTC), International Publishers Association (IPA), IPR License, ITV, Journaux Francophones Belges, Laurence Kaye Solicitors, Microsoft, News International, Newspaper Licensing Agency (NLA), Pearson, PLS, PLUS Coalition, Reed Elsevier, Rightscom, RTL Group, International Association of Scientific, Technical & Medical Publishers (STM), Universitat de Lleida, Unidad Editorial, Vivere Consulting.
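As a quick arithmetic check on the per-sector figures on this slide, the database and link columns can be summed. This small sketch only re-adds the numbers transcribed from the table; it assumes the sector rows are meant to add up to the headline link total.

```python
# Per-sector figures transcribed from the slide: sector -> (databases, links)
sectors = {
    "Research": (12, 1_845_165),
    "Text rights": (7, 692_580),
    "Music": (5, 450_717),
    "Libraries & trade": (4, 12_356_010),
    "Organisations": (3, 109_204),
}

total_databases = sum(dbs for dbs, _ in sectors.values())
total_links = sum(links for _, links in sectors.values())

print(f"{total_databases} databases, {total_links:,} links")
# prints: 31 databases, 15,453,676 links
```

The link total of 15,453,676 agrees with the headline "15.4 million links" if the slide truncated rather than rounded.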

4 Current ISNI Sources: 30… and growing
GENERAL SOURCES: Bowker Books in Print (BOWKER); The European Library, 48 national libraries (TEL); Virtual International Authority File, 33 libraries (VIAF)
MUSIC: American Musicological Society (AMS); British Library Sound Archive (BLSA); International Performers' Database Association (IPDA); MusicBrainz (MUBZ)
RIGHTS MANAGEMENT: Access Copyright, Canada (ACCE); Authors' Licensing and Collecting Society, UK (ALCS); Centrum Dienstverlening Auteurs- en aanverwante Rechten, Netherlands (CEDA); Centro Español de Derechos Reprográficos (CEDR); Irish Copyright Licensing Agency (ICLA); Prolitteris, Switzerland (PROL); VG WORT, Germany (VGWO)
RESEARCHERS AND PROFESSIONALS: American Musicological Society (AMS); Authors Guild (AGLD); British Library Theses (BRTH); Digital Author Identifier, Netherlands (DAI); Jisc Names Project, UK (JNAM); La Trobe University (AU:VLU); Modern Language Association (MLA); OCLC Theses (OCLCT); ORCID and DataCite Interoperability Network (ODIN); AuthorClaim and RePEc (OPENL); ProQuest Theses (PROQ); Scholar Universe, ProQuest (SCHU); Electronic tables of content (ZETO)
ORGANISATIONS: American Chemical Society (ACS); Boekenbank, Belgium (BOEK); Bowker Publishers (BOWP); Publishers Licensing Society, UK (PLS); Ringgold (RING)
These are the current ISNI sources, with the codes that appear in the ISNI database.

5 VIAF and ISNI are Complementary
VIAF scope: persons; organisations; works / uniform titles; expressions; meetings; geographic names. All data is public.
ISNI scope: persons, plus musicians and researchers; organisations (excluding sparse and undifferentiated records). Includes private data.
ISNI's scope overlaps with VIAF's but is not identical. For persons, ISNI includes all of VIAF (except sparse and undifferentiated records) plus many persons involved in music and research who are not present in VIAF. Also, unlike VIAF, ISNI holds private data that may be used for matching but is not displayed or diffused publicly. Such data includes dates of birth (actors in particular do not like their dates of birth publicised because it limits the parts they are offered). Rights management associations are also not permitted to reveal the relationships between real persons and pseudonyms. Witness the recent case of J.K. Rowling, who published crime novels under a pseudonym and was irked when her cover was revealed by her lawyers.

6 VIAF and ISNI are Complementary
VIAF role: ingest authority records from the world's major national and research libraries; make clusters; expose and diffuse.
ISNI role: create permanent IDs (by batch and on demand); diffuse those IDs to libraries, trade, rights management, professional societies and educational institutions.
ISNI's role differs from VIAF's. ISNI creates permanent IDs and is required to keep each ID as stable as possible; where an ID changes, ISNI must diffuse the correction. ISNI diffuses across domains: libraries, trade, rights management, professional societies and education.

7 VIAF and ISNI are Complementary
VIAF system: harvester; clustering mechanism (re-clustered monthly); 5 web interface languages; download in multiple formats; linked data and SRU; 1 million personal visitors p.a.
ISNI system: batch load; online request API; web site (English only); allows end-user input; member input and correction; 16+ indexes; SRU and linked data; Quality Team monitoring and correcting; diffusion, including corrections.
ISNI includes an online request and maintenance capability.
- Improved data quality and confidence: anomaly reports (7,000 date anomalies, more than 50% of which represent real errors); merge, split and data error reports (c. 5,000)
- Matching improvements: dates, common surnames, longest name form, weightings, new elements
- Detection of UNIMARC conversion errors: parallel main names, name variant conversion, related names conversion, missed data
- Pseudonyms
- Feedback: record links (c. 70,000)
- More widely diffused linked data
- Proposal for interoperation: joint notification, shared maintenance

8 Synchronisation ISNI to VIAF
2012: ISNI / VIAF identifiers. 2013: full records; ISNI becomes a VIAF source. 2014: ISNI records; verification mark.
In 2012, ISNIs were sent to VIAF. In 2013, the decision was taken to include ISNI as a source in VIAF, so ISNI started sending VIAF full records for all assigned records that contained a VIAF code, including all restricted data from non-VIAF sources. In 2014, in addition to records containing a VIAF code, records containing an ISNI code are now sent to VIAF, including those created by the pseudonym programs or created manually by the ISNI Quality Team. Records that have been edited manually by the ISNI Quality Team contain a verification mark so that VIAF can use it in its clustering process.
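The 2014 selection rule can be read as a simple filter over ISNI records. The sketch below is an illustration under stated assumptions, not ISNI's code: the `IsniRecord` class, its field names and the use of `"VIAF"` and `"ISNI"` as source codes are hypothetical, and the verification mark is modelled as a boolean that travels with each record.

```python
from dataclasses import dataclass, field


@dataclass
class IsniRecord:
    """Hypothetical, heavily simplified ISNI record."""
    isni: str
    assigned: bool
    source_codes: set[str] = field(default_factory=set)
    manually_verified: bool = False  # set when the Quality Team edits the record


def records_for_viaf(records):
    """Yield (record, verification_mark) pairs to send to VIAF.

    Assumed reading of the slide: assigned records with a VIAF source
    (sent since 2013) plus records with an ISNI source (added in 2014).
    """
    for rec in records:
        has_viaf_source = rec.assigned and "VIAF" in rec.source_codes
        has_isni_source = "ISNI" in rec.source_codes
        if has_viaf_source or has_isni_source:
            yield rec, rec.manually_verified


records = [
    IsniRecord("isni-a", True, {"VIAF", "BOWKER"}),
    IsniRecord("isni-b", False, {"VIAF"}),  # not assigned: not sent
    IsniRecord("isni-c", True, {"ISNI"}, manually_verified=True),
    IsniRecord("isni-d", True, {"MUBZ"}),  # no VIAF or ISNI source
]
for rec, verified in records_for_viaf(records):
    print(rec.isni, "verified" if verified else "unverified")
```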

9 VIAF ingest into ISNI
VIAF provides a full file each month. ISNI compares the previous and current files and creates separate files for processing:
- Deletes (VIAF cluster ID in the old file but not the new): if the record is assigned or has other sources, its source becomes ISNI
- Contents changed: sources added or deleted
- New (VIAF cluster ID in the new file but not the old)
- Re-matches VIAF deletes
- VIAF cluster movement reports for BL and BnF
This slide outlines how ISNI processes its monthly files from VIAF.
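The comparison of two monthly VIAF files amounts to set differences on cluster IDs plus a per-cluster comparison of contents. A minimal sketch, assuming each file has already been parsed into a dictionary from VIAF cluster ID to the set of source record identifiers in that cluster; this representation and the identifiers are hypothetical, and real files carry far more data, so "contents changed" is reduced here to sources added or deleted.

```python
def diff_monthly_files(previous, current):
    """Compare two monthly VIAF snapshots.

    previous, current: dict mapping VIAF cluster ID -> frozenset of source
    record IDs (a hypothetical, simplified representation).
    Returns (deletes, new, changed).
    """
    deletes = sorted(previous.keys() - current.keys())  # in old, not in new
    new = sorted(current.keys() - previous.keys())  # in new, not in old
    changed = {}
    for cluster_id in sorted(previous.keys() & current.keys()):
        added = current[cluster_id] - previous[cluster_id]
        removed = previous[cluster_id] - current[cluster_id]
        if added or removed:
            changed[cluster_id] = (sorted(added), sorted(removed))
    return deletes, new, changed


previous = {
    "101": frozenset({"BNF|a", "DNB|b"}),
    "102": frozenset({"LC|c"}),
    "103": frozenset({"BL|d"}),
}
current = {
    "101": frozenset({"BNF|a"}),
    "103": frozenset({"BL|d"}),
    "104": frozenset({"NLA|e"}),
}
print(diff_monthly_files(previous, current))
# prints: (['102'], ['104'], {'101': ([], ['DNB|b'])})
```

Each of the three outputs would then feed its own processing file; for a deleted cluster, the slide's rule (if the ISNI record is assigned or has other sources, its source becomes ISNI) would be applied downstream.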

10 Maintaining Clusters

11 Mixed identities
Diagram: cluster error (records from Source 1 and Source 2 clustered together) and source error (a single Source 1 record mixing identities).
There are two types of error in a mixed identity. The clustering software can make an error by wrongly clustering records from two sources that each represent a different person with the same name. Alternatively, a single source record may mix identities by listing titles of works that belong to more than one identity with the same name.

12 End User Note
Dear Sir / Madam, The ISNI refers to "Marco Antonio Casanova", Professor at the Catholic University of Rio de Janeiro. I am not the author of "Fragmentos póstumos. - Nietzsche uma introdução filosófica" or "Segunda consideração intempestiva da utilidade e desvantagem da história para a vida". The author of these works is "Marco Antonio dos Santos Casa Nova". You may confirm this information by consulting our CVs at the Brazilian Research Council: Marco Antonio Casanova (me): Marco Antonio dos Santos Casa Nova (the other author):
This is a typical input from an end user of the ISNI database. Requests arrive at an average of 2-3 a day. Almost all are of very high quality, like this one, and most (to our surprise) include an email address so that we can respond with the action taken. ISNI also undertakes to notify all sources whenever an error is fixed.

13 Correction – Source Error
Reply to end user: Thank you for using the ISNI database and suggesting improvements to your record. There is now another ISNI record for Marco Antonio dos Santos Casa Nova (ISNI ). I have corrected your record, removed the erroneous titles and added a link to your online CV (Lattes database). If you have any further queries, please let me know.
Email to source: I am part of the ISNI Quality Team (experts from the British Library and Bibliothèque nationale de France in charge of the quality of the ISNI database). We perform manual checks and corrections in the ISNI database, such as splits, merges/deduplications and data corrections. The ISNI Quality Team received a request from an end user about ISNI records and , VIAF and their related authority record XXX, which mix 2 identities (see the snapshot below): 1/ Marco Antonio Casanova (ISNI ) 2/ Nova, Marco Antonio dos Santos Casa (ISNI ), philosopher, and author of "Segunda consideração intempestiva da utilidade e desvantagem da história para a vida". I hope this information will be useful.
Diagram: one Source 1 record split into two records, each with its own ISNI.
End-user requests are stored in non-displayable fields in the ISNI record. Each evening, new fields generate alerts to the ISNI Quality Team (QT). The QT then decides on the appropriate action, i.e. making links viewable, merging records, splitting records and generating notifications to all sources involved. In this slide, the QT has determined that a single VIAF source record is causing a mixed identity and notifies the source by email. The resulting split records are marked as having been verified manually. This signals to VIAF that they should be treated as special-status records by the VIAF clustering program.

14 Correction – Cluster Error
Diagram: two separate records, each pairing a source with its own ISNI.
ISNI marks its two records as verified and sends them to VIAF. These records are given the same status as XA records in VIAF clustering. No two XA records may occur in the same cluster.
In this case, the QT has determined that the identity must be split because of a VIAF (or ISNI) cluster error. Two separate records are made, each with a verification mark. Sources are notified of the cluster change as appropriate.
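The rule that no two XA records may share a cluster can be expressed as a check over cluster assignments. A hedged sketch, assuming a mapping from record ID to VIAF cluster ID and a set of record IDs that carry XA status (both hypothetical inputs); it only reports clusters that would violate the rule and does not reproduce VIAF's clustering logic.

```python
from collections import defaultdict


def xa_conflicts(cluster_of, xa_records):
    """Return {cluster_id: [record_ids]} for clusters holding 2+ XA records.

    cluster_of: dict record ID -> VIAF cluster ID (hypothetical input)
    xa_records: set of record IDs with XA status
    """
    xa_by_cluster = defaultdict(list)
    for record_id, cluster_id in cluster_of.items():
        if record_id in xa_records:
            xa_by_cluster[cluster_id].append(record_id)
    return {
        cluster_id: sorted(ids)
        for cluster_id, ids in xa_by_cluster.items()
        if len(ids) > 1
    }


cluster_of = {"ISNI|A": "v1", "ISNI|B": "v1", "BNF|1": "v1", "ISNI|C": "v2"}
print(xa_conflicts(cluster_of, {"ISNI|A", "ISNI|B", "ISNI|C"}))
# prints: {'v1': ['ISNI|A', 'ISNI|B']}
```

A non-empty result means the clustering step would have to separate those records, which is the behaviour the verification mark is meant to enforce.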

15 End User Note
It seems 2 ISNIs have been assigned to the French singer Laïka Fatien (born 1968 in Paris): ISNI and ISNI X. I think the last one can be deleted.

16 Correction – Merged duplicate
Reply to end user: Thank you for using the ISNI database and providing us with information about the duplicate records for Laïka Fatien. There is now just one record on the ISNI database for this identity – ISNI: If you have any further queries, please let me know.
Notification to VIAF via the ISNI record: the ISNI record contains a verification note (i.e. treat as XA) and 2 VIAF cluster identifiers.
Diagram: VIAF A + VIAF B = one ISNI record linked to both VIAF A and VIAF B.
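The merge signal on this slide is carried by the ISNI record itself: a verified record that holds two VIAF cluster identifiers. A minimal sketch, assuming an in-memory mapping from ISNI to its VIAF cluster IDs and a set of manually verified ISNIs, shows how such records could be turned into duplicate-cluster notices. The notice's dictionary keys are hypothetical, and skipping unverified records is an assumption (an unverified record with two clusters might itself be a mixed identity).

```python
def duplicate_cluster_notices(viaf_clusters_by_isni, verified):
    """Turn verified ISNI records that hold 2+ VIAF cluster IDs into notices.

    viaf_clusters_by_isni: dict ISNI -> set of VIAF cluster IDs (hypothetical)
    verified: set of ISNIs carrying the manual verification note
    """
    notices = []
    for isni, clusters in sorted(viaf_clusters_by_isni.items()):
        if isni in verified and len(clusters) > 1:
            notices.append({
                "isni": isni,
                "viaf_clusters": sorted(clusters),  # clusters to be merged
                "treat_as": "XA",
            })
    return notices


links = {"isni-1": {"222", "111"}, "isni-2": {"333"}, "isni-3": {"444", "555"}}
print(duplicate_cluster_notices(links, verified={"isni-1", "isni-2"}))
```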

17 ISNI Quality Team
- Samples data regularly: c. 2% of VIAF clusters have mixed identities; duplicate clusters are more frequent, nearer 5%
- Makes corrections at cluster level: merges, splits, error notifications
- Access to cataloguing client / macros
- Makes system recommendations
- Gives approval for single-source assignment
- Responds to end-user input
- Sends emails to sources for error correction (12 VIAF sources currently participating)
The ISNI Quality Team plays an essential role in the life of the ISNI database. Not only does it respond to end-user input, it also proactively tests the database, looking for sets of records to re-process and making recommendations for improvements to the algorithms. So far the QT has been able to keep up with the input from end users.

18 ISNI System Notification (Push process)
When events occur on records in the ISNI database, all sources concerned are notified. The notification takes the form of a regular monthly XML report.
Notification fields for matches ("someone else has matched your data", with details): recipient source code (028C $2); source of incoming record; date/time of match; matching data string; matching data type (name and dates, name and title, partial name, date, title); matching score; total evaluation score; date/time stamp of notification.
Notification fields for errors ("you probably need to take action"): type of error (merge, duplicate, dataError or split); recipient source code; recipient local identifier; date/time stamp of field creation; data field contents; data field identifier (e.g. 021A = title); should be; correct ISNI; explanatory text.
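To make the error-notification fields concrete, the sketch below serialises one notification with Python's standard `xml.etree.ElementTree`. The slide lists the fields but not the XML schema, so every element and attribute name here, and the example values, are hypothetical stand-ins; only the four error types are taken from the slide.

```python
import xml.etree.ElementTree as ET

# Error types as listed on the slide.
ERROR_TYPES = {"merge", "duplicate", "dataError", "split"}


def error_notification(recipient_source, local_id, error_type, field_id,
                       field_contents, should_be, correct_isni, explanation,
                       created):
    """Serialise one error notification as an XML string.

    All element and attribute names are hypothetical; the slide lists the
    fields but not the schema.
    """
    if error_type not in ERROR_TYPES:
        raise ValueError(f"unknown error type: {error_type!r}")
    note = ET.Element("errorNotification", type=error_type)
    for tag, value in [
        ("recipientSourceCode", recipient_source),
        ("recipientLocalIdentifier", local_id),
        ("fieldCreated", created),  # date/time stamp of field creation
        ("dataFieldIdentifier", field_id),  # e.g. "021A" = title
        ("dataFieldContents", field_contents),
        ("shouldBe", should_be),
        ("correctIsni", correct_isni),
        ("explanatoryText", explanation),
    ]:
        ET.SubElement(note, tag).text = value
    return ET.tostring(note, encoding="unicode")


xml_text = error_notification(
    recipient_source="SRC", local_id="12345", error_type="split",  # hypothetical IDs
    field_id="021A", field_contents="Segunda consideração intempestiva",
    should_be="", correct_isni="", explanation="Title belongs to another identity",
    created="2014-07-01T20:00:00",
)
print(xml_text)
```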

19 ISNI Assignment Agency
Matching, merging and splitting infrastructure; correction of errors; sampling and anomaly checks, e.g. date anomalies, unlikely mixture of sources; pseudonym splitting; re-importing and re-matching; diagnostic indexes and reports; enrichment, e.g. Wikipedia, Dewey; notification system.
The ISNI assignment system also plays a vital role in maintaining the quality of the ISNI database. When errors are found, they are fixed by program wherever possible. For example, we found that one source always gave a full date even when only the year, or the month and year, were known, so that there was an unusual peak in the index for e.g. and etc. The matching algorithm was adjusted to mistrust such dates, and the affected records were found and re-matched with the new algorithm. The matching algorithm is continually being refined.
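The slide does not say how the faulty full dates were detected. One generic approach, offered only as a hedged sketch and not as the ISNI Assignment Agency's algorithm, is to count how often each calendar day occurs among one source's full dates and flag days far above an even spread. The `factor` and `min_dates` thresholds are hypothetical, and dividing by 366 is a deliberate simplification.

```python
from collections import Counter, defaultdict
from datetime import date


def suspicious_day_peaks(dated_records: list[tuple[str, date]],
                         factor: float = 10.0, min_dates: int = 100):
    """Flag (source, "MM-DD", count) where one calendar day is over-represented.

    dated_records: (source_code, full_date) pairs.
    factor, min_dates: hypothetical tuning knobs, not ISNI settings.
    """
    by_source = defaultdict(Counter)
    for source, d in dated_records:
        by_source[source][(d.month, d.day)] += 1

    flagged = []
    for source, counts in sorted(by_source.items()):
        total = sum(counts.values())
        if total < min_dates:
            continue  # too few dates to judge this source
        expected = total / 366  # count per day if dates were spread evenly
        for (month, day), n in sorted(counts.items()):
            if n > factor * expected:
                flagged.append((source, f"{month:02d}-{day:02d}", n))
    return flagged
```

Dates from a flagged source and day could then be treated as year-only when matching, which is one way to "mistrust" them.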

20 VIAF ISNI Interoperability Task Force
Met in Paris, April 2014. Representatives from: Bibliothèque nationale de France; Biblioteca Nacional de España; British Library; Deutsche Nationalbibliothek; Sudoc; OCLC (VIAF system); OCLC Leiden (ISNI Assignment Agency).

21 Recommendations to VIAF at OCLC
- Use profession and other disambiguating data
- Investigate making an anomaly report
- Investigate changing the clustering rules to flag a record with a mixed identity and prevent it from entering clusters where 2 or more sources have established separate identities
- Investigate changing the clustering rules to prevent duplicate clusters
- Provide deprecated VIAF IDs in the distributed data
- Treat records from ISNI that are flagged as manual as XA records
- Include ISNI in RDF
- Remove "test" from the ISNI icon
- Only show one name form for ISNI in the wheel
- Investigate why SUDOC titles are not appearing
Action has been taken on most of these recommendations.

22 Recommendations to ISNI at OCLC
- Flag manual merges and splits (joint specification to be made)
- Indicate to VIAF that a VIAF source needs to be split from a VIAF cluster (joint specification to be made)
- Keep up to date with VIAF
- Produce anomaly reports
- Produce notifications to VIAF sources
- [Provide only one ISNI record per VIAF cluster ID; make split-off records ISNI source]
- [Provide records with ISNI source to VIAF]
Action has been taken on most of these recommendations.

23 Recommendations to VIAF Council
- Mark undifferentiated authorities or consider not supplying them to VIAF
- Include nationality, particularly for own national identities
- Use VIAF in authority control and select the VIAF cluster ID
- Also use ISNI
- If a mixed identity is found in VIAF or ISNI, use either the public interface or (preferably) the member interface of ISNI to request resolution by the ISNI Quality Team
All manual corrections made in ISNI will come to VIAF as records with XA status to ensure merges or splits.

24 Become Involved: Jointly, let's maintain clusters
It is important to fix mixed identities and duplicates at the cluster level. A record containing a mixed identity can match into a good cluster and pollute it, resulting in incorrect diffusion, especially of linked data. A mixed-identity record in isolation can cause duplicate clusters. Duplicates in a source file threaten to create duplicate clusters in VIAF.

25 The ISNI Quality Team
Board members: British Library and Bibliothèque nationale de France (representing CENL)
Seeking associate members (KB, Netherlands, in process):
- Control own identities
- Access to client maintenance software
- Access to restricted data
- Provide back-up for end-user responses
The ISNI Quality Team currently receives alerts at a rate of 2 to 3 per day. End-user input is stored in hidden fields, from which the system generates an alert. The number is currently manageable, but the QT wants to be ready to scale up because the volume has been increasing steadily. Associate QT members would be primarily responsible for controlling the identities in their own sphere, but would be on standby for peaks in end-user requests generally.

26 ISNI Members
- View the whole database (but not restricted fields)
- Access to the compare screen; can merge
- Reports on request: ISNIs (simple or enhanced report); cluster movement report; diagnostic reports; statistics and links
ISNI members have privileged access to the database, but without the obligations of Associate QT members.

27 ISNI Database: Member view
Screenshots: public view and member view.
ISNI members have access to detailed indexes, both via the web interface and via the SRU API.
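For members scripting against the SRU API, a searchRetrieve request is an HTTP GET carrying standard SRU parameters. This is a hedged sketch: the base URL below is a placeholder rather than ISNI's real endpoint, and the CQL index name in the example is hypothetical; the actual endpoint, indexes and record schemas would come from the ISNI member documentation.

```python
from urllib.parse import urlencode

# Placeholder only: not ISNI's real SRU endpoint.
BASE_URL = "https://sru.example.org/isni"


def sru_search_url(cql_query, max_records=10, start_record=1,
                   record_schema=None, base_url=BASE_URL):
    """Build an SRU 1.1 searchRetrieve URL for a CQL query."""
    params = {
        "operation": "searchRetrieve",
        "version": "1.1",
        "query": cql_query,
        "startRecord": start_record,
        "maximumRecords": max_records,
    }
    if record_schema is not None:
        params["recordSchema"] = record_schema
    return f"{base_url}?{urlencode(params)}"


# "name" is a hypothetical index; substitute the member view's real index names.
url = sru_search_url('name = "Fatien"')
print(url)
```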

28 Public view: only assigned records are shown

29 Member view: additional data displayed (if not private)
Related identities; related persons; related organisations; nationality; gender; keyword or key phrase; Dewey classification; publisher; dates active; associated countries; provisional records, including links to possible matches, if applicable.

30 Private data Dates Personal Affiliations Titles of works
These can be masked from the public and from the member view. However, most sources allow titles to be seen by other members to facilitate merging.
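The masking rule can be illustrated as a per-view field filter. A minimal sketch, assuming a record is a flat dictionary, the contributing source declares which of its fields are private, and the source may let members see titles; the field names, the example title and the two view labels are hypothetical.

```python
def visible_fields(record, private_fields, view, titles_visible_to_members=True):
    """Return the part of a record that a given view may display.

    record: dict field name -> value (hypothetical flat record)
    private_fields: fields the contributing source has marked private
    view: "public" or "member"
    """
    if view not in ("public", "member"):
        raise ValueError(f"unknown view: {view!r}")
    hidden = set(private_fields)
    if view == "member" and titles_visible_to_members:
        hidden.discard("titles")  # most sources let members see titles
    return {name: value for name, value in record.items() if name not in hidden}


record = {"name": "Fatien, Laïka", "birth_year": "1968", "titles": ["Example title"]}
private = {"birth_year", "titles"}
print(visible_fields(record, private, "public"))  # name only
print(visible_fields(record, private, "member"))  # name and titles
```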

31 Do not merge anything that looks suspicious:
Report it in a general note and the QT will review it, e.g. "This is not the same person" or "This title belongs to ...".
Slide by Pauline Chougnet, BnF.
This slide shows the compare screen available to ISNI members, who can compare records to make merge decisions. The ISNI Quality Team has produced a slide set giving merge guidelines.

32 ISNI Statistics: basic statistics, cross matches, VIAF matches
Basic statistics:
- Provisional: records that were loaded by batch, did not match, and whose name is also not unique. Members can enrich provisional records manually via the web interface to make them assigned.
- Suspect: records marked, either manually or by anomaly detection, as possibly having mixed identities.
- "Unique": assigned records with a single source, whose name is unique in the database in both its full and abbreviated forms.
- "Possible": provisional records that the programs have marked as matching another record, but with a score below the acceptable merge threshold. Members can use the compare screen to make manual merges.
Cross matches indicate the number of records in which your source and another source co-occur. Curious cases could point to mixed identities.
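Cross matches are a co-occurrence count. A minimal sketch, assuming each ISNI record is represented only by the set of its source codes (codes as listed on slide 4; the sample records are made up):

```python
from collections import Counter


def cross_matches(records, my_source):
    """Count, for each other source, the records it shares with my_source.

    records: iterable of sets of source codes, one set per ISNI record
    """
    counts = Counter()
    for sources in records:
        if my_source in sources:
            counts.update(set(sources) - {my_source})
    return counts


records = [{"ALCS", "VIAF"}, {"ALCS", "MUBZ"}, {"ALCS", "VIAF", "MUBZ"}, {"VIAF"}]
print(cross_matches(records, "ALCS").most_common())
```

An unexpected pairing with a high count, for example a theses source co-occurring often with a music source, would be the kind of curious case worth sampling for mixed identities.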

33 La Trobe University: 1,864 VIAF links
This slide is just an example of the links generated by La Trobe University's load into ISNI. It gives the university a new view of the impact of its researchers.
Linked data: isni.org/isni/
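Each ISNI is 16 characters, and the last is a check character computed with the ISO/IEC 7064 MOD 11-2 algorithm specified by ISO 27729, which is why some ISNIs end in X. The sketch below validates an ISNI and builds a linked-data URI, assuming the URI is formed by appending the 16 characters without spaces to the isni.org/isni/ prefix shown above (the `http` scheme is an assumption). The sample identifiers are ORCID's documented examples; ORCID iDs are drawn from the ISNI range and use the same check.

```python
ISNI_URI_PREFIX = "http://isni.org/isni/"  # assumed form of the linked-data URI


def isni_check_char(first15):
    """ISO/IEC 7064 MOD 11-2 check character for the first 15 digits."""
    if len(first15) != 15 or any(c not in "0123456789" for c in first15):
        raise ValueError("expected 15 ASCII digits")
    total = 0
    for digit in first15:
        total = (total + int(digit)) * 2
    result = (12 - total % 11) % 11
    return "X" if result == 10 else str(result)


def compact(isni):
    """Drop the spaces or hyphens used in display forms."""
    return isni.replace(" ", "").replace("-", "").upper()


def is_valid_isni(isni):
    c = compact(isni)
    if len(c) != 16 or any(ch not in "0123456789" for ch in c[:15]):
        return False
    return isni_check_char(c[:15]) == c[15]


def isni_uri(isni):
    c = compact(isni)
    if not is_valid_isni(c):
        raise ValueError(f"invalid ISNI: {isni!r}")
    return ISNI_URI_PREFIX + c


print(is_valid_isni("0000 0002 1825 0097"))  # True
print(isni_uri("0000-0002-1694-233x"))  # http://isni.org/isni/000000021694233X
```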

34 Janifer Gatenby EMEA Program Manager Metadata

