InterParty 1 InterParty: Common Metadata and Linking Public Identities A presentation to the final InterParty Seminar The Hague 13 June 2003 Robina Clayphan.

Presentation transcript:

InterParty 1 InterParty: Common Metadata and Linking Public Identities A presentation to the final InterParty Seminar The Hague 13 June 2003 Robina Clayphan The British Library

InterParty 2 Outline Common Metadata for Public Identities - background and issues Proposed Common Metadata set InterParty Links Proposed Link Record

InterParty 3 Common functionality and metadata InterParty will be a network of InterParty members (IPMs) who have databases containing party metadata All potential members share a common need for accurate metadata to support the identification of parties All member databases have identification of parties as a common function Sharing access to party metadata between databases can substantially reduce costs of data creation and improve data quality Creating links will add value to the metadata held in separate systems

InterParty 4 What common metadata is required? … because different people use the same name... We need sufficient metadata to allow “disambiguation” between parties with shared or similar attributes [Diagram labels: “Is known as John Williams”; “Is not the same person as”]

InterParty 5 What common metadata is required? … and the same person uses different names … in different contexts –e.g. Iain Banks and Iain M Banks … to hide their identity –use of pseudonyms … sometimes it’s simply a matter of language –Mao Tse Tung or Mao Zedong? We need sufficient metadata to allow “collocation” of the same party with different attributes [Diagram labels: “Is also known as John Williams”; “Is also known as a member of the group called ‘Sky’”]

InterParty 6 What common metadata is required? How much is sufficient? The answer is contextual –Metadata about people (as about anything else) is essentially unbounded –A unique identifier may be enough (if you trust its source) For InterParty, we need enough metadata to compare parties in different databases and make a decision: the same person?

InterParty 7 The InterParty approach InterParty is concerned with “public identities” not “persons” InterParty “Common Metadata” is a subset of what may be publicly known [Diagram labels: Person; “Personal” metadata: Sensitive, Not publicly known, Publicly known; Public Identity]

InterParty 8 The nature of Public Identities One person usually has only one public identity But some people have more than one, with different attributes For example, a person may write under a pseudonym [Diagram labels: Person; Public Identity; Public Identity]

InterParty 9 The nature of Public Identities Relationships between real persons and public identities are out of scope [Diagram labels: Person; Public Identity]

InterParty 10 InterParty and Public Identities InterParty is concerned with Public Identities in different namespaces Within the InterParty network, each Public Identity will require a Public Identity Identifier or “PIDI”. This is a combination of a Namespace and a Unique Identifier within that namespace [Diagram: Public Identity with PIDI Namespace A : Brian Green and PIDI Namespace B : 876X5]
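
The PIDI described on this slide can be sketched as a small value type. This is only an illustration: the class name, field names and the colon-separated display form are assumptions, since the slides do not define a serialisation.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class PIDI:
    """Public Identity Identifier: a namespace plus an identifier unique within it."""
    namespace: str
    identifier: str

    def __str__(self) -> str:
        # Hypothetical display form; not taken from the InterParty documents.
        return f"{self.namespace}:{self.identifier}"


# The two PIDIs shown on the slide.
a = PIDI("Namespace A", "Brian Green")
b = PIDI("Namespace B", "876X5")

# The same local identifier in another namespace is a different PIDI,
# which is why the namespace is part of the key.
same_local_id = PIDI("Namespace C", "876X5")
```

Making the type frozen (immutable and hashable) reflects the requirement that a PIDI be persistent, and lets PIDIs be used as dictionary keys or set members when links are stored.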

InterParty 11 Common Metadata and Public Identities Metadata that IPMs are willing and able to share over the network Information that is in the public domain Excluding information that is private or sensitive

InterParty 12 Common Metadata Designed to be a practicable set of elements: –To enable disambiguation –To enable the creation of Links asserting a relationship between Public Identity records in IPM databases –That IPMs will be willing and able to provide (it is not expected that all IPMs will be able to provide all the elements) Feasibility is the biggest issue –Certainly for the demonstrator we will not be able to achieve what we might see as “the ideal solution”

InterParty 13 Common Metadata “standards” Need to define appropriate rules and format conventions –The more standardised the Common Metadata (in terms, for example, of controlled “values”) the higher its value, but the higher its cost To what extent will the “common metadata” need to adhere to common forms of semantic or syntactical expression? –Manual links: only to a limited extent, if its function is primarily for human interpretation –Automated links: algorithmically-based linking would require more standardised “common metadata”

InterParty 14 Metadata questionnaire Unique identifier? Persistent? Standard & variant name forms? Party types? Corporate? Personal? Pseudonyms - single or multiple public Ids? Other standard identifiers? Dates of birth/death? Dates of incorporation? Period of activity? Address/contact details? Works? Roles? Author? Composer? Artist? Associations? Other distinguishing metadata? Nationality? Citizenship?

InterParty 15 Limitations to common metadata Is the element there? Is it held in a discrete field that could be mapped to a common metadata set? Many elements are missing from individual data sets, e.g. –addresses, dates, works, roles, etc. Where data is held there is variable practice in how it is held, e.g. –works data contained within broader Notes fields –data held outside the “party file” as links within a given IPM’s working context, e.g. library authority file links to bibliographic files Options for automated, algorithm-based linking are limited

InterParty 16 Common metadata - conclusions There are unique party records to be linked There are sufficient metadata elements in different systems to support judgements about links Different databases bring different strengths - with potential for enrichment and more accurate identification –examples to be shown in demonstrator Proposal for a reduced metadata set based on areas of greatest commonality Assumption that the system is primarily a manual search/edit facility to support accurate identification - with Links themselves built via usage

InterParty 17 Proposed Common Metadata Set PIDI –An identifier assigned to a public identity by an IPM that is unique within the domain of that IPM –The unique ID comprises Namespace/identifier to ensure it is unique on the network –Must be persistent, though the associated metadata will typically change Standard Name –The standard, preferred or usual name by which a Public Identity is known Party Type –Nature of the public identity –Categories: personal, corporate, unknown Variant Names –different forms of names belonging to the public identity Related Ids –may contain names of other public identities linked to the public identity within the IPM’s own system, e.g. pseudonyms Date of birth –usually 4 digit year of birth Elements defined for the demonstrator system

InterParty 18 Works –Works with which the Public Identity is associated, represented by title –Accompanied by date & role of Public Identity if known Address/contact details –permissible only where Party Type = Corporate Notes –includes other identifiers, associations, roles, other metadata that may include works, dates, etc. where not recorded in a discrete field for display InterParty Links themselves –Access to the Links is key element of Common Metadata Currently, the only mandatory elements are expected to be the PIDI and a Name BUT more data will be essential to support the task of identification Proposed Common Metadata Set Elements defined for the demonstrator system
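
The proposed element set on these two slides can be sketched as a record type with a few checks. This is a minimal illustration, not the demonstrator's schema: the class, field and function names are hypothetical, and the PIDI is held as a plain string for brevity.

```python
from dataclasses import dataclass, field
from typing import List, Optional

# Party Type categories listed on the slide.
PARTY_TYPES = {"personal", "corporate", "unknown"}


@dataclass
class CommonMetadata:
    pidi: str                              # mandatory
    standard_name: str                     # mandatory
    party_type: str = "unknown"
    variant_names: List[str] = field(default_factory=list)
    related_ids: List[str] = field(default_factory=list)
    date_of_birth: Optional[str] = None    # usually a 4-digit year
    works: List[str] = field(default_factory=list)
    address: Optional[str] = None          # corporate parties only
    notes: Optional[str] = None


def validate(record: CommonMetadata) -> List[str]:
    """Return a list of problems; an empty list means the record passes."""
    problems = []
    if not record.pidi:
        problems.append("PIDI is mandatory")
    if not record.standard_name:
        problems.append("a name is mandatory")
    if record.party_type not in PARTY_TYPES:
        problems.append("unknown party type: " + record.party_type)
    if record.address and record.party_type != "corporate":
        problems.append("address is permissible only for corporate parties")
    return problems
```

Most fields default to empty because the slides expect that not every IPM will be able to supply every element.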

InterParty 19 InterParty Links

InterParty 20 InterParty’s added value proposition - InterParty Links All metadata is ultimately about expressing relationships that someone claims to exist –e.g. Book A ‘has’ Author B All the participating databases in InterParty will express such relationships internally –Sometimes the same relationships, sometimes different relationships InterParty will create value to the extent that it enables new relationships to be expressed between databases... –e.g. Person X in Namespace A ‘is the same as’ Person Y in Namespace B … and recorded as an “InterParty Link”

InterParty 21 Establishing new relationships The establishment of InterParty Links will require effort and judgement –hence the need for enough metadata to make a decision about relationships These Links need to be recorded and made available to others to be of real value –therefore InterParty will need a facility to record the decision and store it as a Link record. For the demonstrator this new Link data will be held centrally as part of the “aggregated metadata” [Diagram label: “Is the same person”]

InterParty 22 InterParty Links An InterParty Link is the assertion of a relationship between two PIDIs [Diagram: PIDI Namespace A : Brian Green “is” PIDI Namespace B : 876X5] Links may only be made by the owner of one of the PIDIs… –…and endorsed or disputed by the owner of the other PIDI The assertion of a relationship between the two PIDIs is held in a single Link record For the purposes of the demonstrator project, the relationships expressed in a Link will be restricted to “is”, “is complex” and “is not”

InterParty 23 Types of Relationship PIDI 1 “is” PIDI 2 –It is asserted that PIDI 1 and PIDI 2 have a functional and reciprocal equivalence for the purposes of InterParty PIDI 1 “is not” PIDI 2 –It is asserted that PIDI 1 does not have a functional equivalence with PIDI 2 despite appearances PIDI 1 “has a complex relationship with” PIDI 2 –It is asserted that PIDI 1 has a partial equivalence or complex relationship with PIDI 2 that is not necessarily reciprocal

InterParty 24 What is a “Complex” assertion? IPMs hold records for parties and names in different ways - there may not be easy one-to-one relationships between databases –IPM A assigns a single PIDI for Ruth Rendell, with a note that Barbara Vine is a pseudonym of Ruth Rendell –IPM B assigns separate PIDIs for both Ruth Rendell and Barbara Vine (with or without an internal assertion between them) –The “complex” relationship must be used There are numerous other circumstances where IPMs may take a different approach to identification –It is not proposed to define all the relationships covered by “Complex” any more precisely for the demonstrator

InterParty 25 Establishing a Link The status of a link will relate to how it is established and to what degree the two IPM owners have been involved. There are 4 status types “Proposed” –The relationship has been asserted by one IPM owner only “Authorised” –Concurring assertions have been made by both IPM owners “Disputed” –Assertions have been made by both IPM owners but they do not concur “Inferred” –Generated automatically based on inference from “is” relationships only
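
The three owner-driven statuses above can be derived from the two owners' assertions. The sketch below assumes each assertion is one of the strings "is", "is not" or "is complex", with None meaning that owner has not yet asserted anything; the function name is hypothetical. “Inferred” is not covered because it is generated automatically rather than from owner input.

```python
from typing import Optional


def link_status(first: Optional[str], second: Optional[str]) -> str:
    """Derive a Link status from the assertions of the two PIDI owners."""
    made = [a for a in (first, second) if a is not None]
    if not made:
        raise ValueError("a Link needs at least one owner assertion")
    if len(made) == 1:
        # Only one IPM owner has asserted the relationship.
        return "Proposed"
    # Both owners have asserted: they either concur or they do not.
    return "Authorised" if first == second else "Disputed"
```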

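InterParty 26 Inferred Links IF one PIDI “is” a second PIDI AND the second PIDI “is” a third PIDI THEN the first PIDI “is” the third PIDI: an inferred relationship [Diagram PIDIs: PIDI NSA: ; PIDI NSB: Brian Green; PIDI NSC: 876X54]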
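
One way to generate inferred links is to group PIDIs that are connected by chains of “is” links and report every pair in a group that has no direct link. The sketch below uses a union-find structure, which is an implementation choice and not something stated in the slides. The function name is hypothetical, and so is the identifier "NSA:1001", because the slide does not show the Namespace A identifier.

```python
from itertools import combinations


def inferred_links(is_links):
    """Return pairs joined through chains of 'is' links but not linked directly."""
    parent = {}

    def find(x):
        # Locate the representative of x's group, shortening paths on the way.
        parent.setdefault(x, x)
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    for a, b in is_links:
        parent[find(a)] = find(b)

    direct = {frozenset(pair) for pair in is_links}
    groups = {}
    for node in list(parent):
        groups.setdefault(find(node), []).append(node)

    result = set()
    for members in groups.values():
        for a, b in combinations(sorted(members), 2):
            if frozenset((a, b)) not in direct:
                result.add((a, b))
    return result


links = [("NSA:1001", "NSB:Brian Green"), ("NSB:Brian Green", "NSC:876X54")]
new_links = inferred_links(links)
```

Only “is” links are passed in, matching the rule that inference is based on “is” relationships only; whether Proposed links should count as input alongside Authorised ones is not specified in the slides.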

InterParty 27 Outline of a Link Record Link ID – a unique identifier for the Link Record PIDI 1 and PIDI 2 –the Identifiers being linked Link relationship –the relationship asserted (is, is not, is complex) Link status –Proposed, Authorised, Disputed, Inferred Link method –Manual or automatic Link timestamp –when the record was created or last updated Elements defined for the Link Record

InterParty 28 Owner Assertion composite -a group of elements which record each Owner IPM’s assertion about the Link, including -Owner ID -PIDI owned -Owner assertion – used to set up or amend Link Relationship -Assertion comment – notes field -Asserted by – name of individual -Assertion timestamp Comment composite -A group of elements to allow other IPMs to add further notes or comments to the record without directly affecting status of the assertion Elements defined for the Link Record Outline of a Link Record
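
The Link Record outlined on these two slides might be modelled as follows. This is a sketch under assumptions: all class and field names are hypothetical, PIDIs are held as plain strings, and the sample values ("link-0001", "IPM-A", "A. Cataloguer") are invented for illustration.

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone
from typing import List


def _now() -> datetime:
    return datetime.now(timezone.utc)


@dataclass
class OwnerAssertion:
    owner_id: str
    pidi_owned: str
    assertion: str        # "is", "is not" or "is complex"
    asserted_by: str      # name of the individual
    comment: str = ""     # notes field
    timestamp: datetime = field(default_factory=_now)


@dataclass
class LinkRecord:
    link_id: str
    pidi_1: str
    pidi_2: str
    relationship: str     # "is", "is not" or "is complex"
    status: str           # Proposed, Authorised, Disputed or Inferred
    method: str           # "manual" or "automatic"
    owner_assertions: List[OwnerAssertion] = field(default_factory=list)
    comments: List[str] = field(default_factory=list)   # notes from other IPMs
    timestamp: datetime = field(default_factory=_now)


# A link proposed by the owner of the Namespace A PIDI only.
record = LinkRecord(
    link_id="link-0001",
    pidi_1="Namespace A:Brian Green",
    pidi_2="Namespace B:876X5",
    relationship="is",
    status="Proposed",
    method="manual",
    owner_assertions=[
        OwnerAssertion("IPM-A", "Namespace A:Brian Green", "is", "A. Cataloguer")
    ],
)
```

Keeping comments from other IPMs in a separate list mirrors the Comment composite, which lets notes be added without touching the owners' assertions or the Link status.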

InterParty 29 The InterParty network [Diagram labels: A, B, C; “Resolution Service”; Metadata User; “Common metadata” A, B, C; “InterParty Link records”]

InterParty 30 To summarise... A limited set of discrete metadata fields The fields are mostly optional to allow for variable practices among IPMs The system will be built on human judgements and interpretation of the metadata found in searches across the network Adding links will make connections that will mutually enrich metadata in any two systems by associating the different strengths of the different systems Adding links will be a simple procedure –illustrated by the demonstrator

InterParty 31 Thank you