Creator Element Authority Control. Garbage In, Garbage Out: Input Standards and Metadata Scheme is only half of the equation Consistency is key Controlled.


1 Creator Element Authority Control

2 Garbage In, Garbage Out: Input Standards and Metadata Scheme is only half of the equation Consistency is key Controlled vocabulary for all –Subjects –Names –Common descriptive terms

3 Access Points - Purposes To identify (e.g., an entity known to the user) To collocate (i.e., bring together related entities/works) To aid in evaluating or selecting (e.g., Has this author written something newer on the subject? Which of several works with the same title do I want? What level of subject treatment is needed –a whole work on the subject? a chapter? A paragraph?) To locate the image, etc.

4 Access Points for Names and Titles - Purposes To facilitate the retrieval of names and titles that are imperfectly remembered To facilitate the retrieval of names and titles that are expressed differently in different information packages To facilitate the retrieval of names and titles that have changed over time To collocate expressions and manifestations of works To collocate works that are related to other works

5 Access Points for Names and Titles – How Accomplished Name and Title Authority Control –All access points (whether main or added entries) need to be under authority control so that: persons or entities with the same name can be distinguished from each other; all names used by a person or body, or all manifestations of a name of a person or body, will be brought together; all differing titles of the same work can be brought together –Therefore, current practice dictates either the establishment of a “heading” for each name or title as an access point or the provision of pointers to draw different representations of names or titles together –Headings are kept track of in authority files; RDF provides a model for linking entities
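
The three goals on this slide (distinguish, bring variants together, point to one heading) can be illustrated with a toy authority file. This is only a sketch under stated assumptions: the `AuthorityRecord` class, the `normalize` rule, and the in-memory index are hypothetical, and the sample headings are borrowed from the Smiraglia and Taylor examples used elsewhere in this deck. Real authority files carry far more evidence than this.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional, Set


@dataclass
class AuthorityRecord:
    heading: str                                      # authorized form of the name
    variants: Set[str] = field(default_factory=set)   # recognized variant forms


def normalize(name: str) -> str:
    """Case-fold and collapse whitespace so trivially different strings compare equal."""
    return " ".join(name.casefold().split())


def build_index(records: List[AuthorityRecord]) -> Dict[str, str]:
    """Map every heading and variant (normalized) to its authorized heading."""
    index: Dict[str, str] = {}
    for record in records:
        for form in {record.heading, *record.variants}:
            index[normalize(form)] = record.heading
    return index


def authorized_form(name: str, index: Dict[str, str]) -> Optional[str]:
    """Return the authorized heading for a name, or None if it is not under control."""
    return index.get(normalize(name))


# Hypothetical two-record authority file.
records = [
    AuthorityRecord("Smiraglia, Richard P., 1952-", {"Smiraglia, R.P.", "Smiraglia, Richard"}),
    AuthorityRecord("Taylor, Arlene G., 1941-", {"Dowell, Arlene Taylor, 1941-"}),
]
index = build_index(records)

print(authorized_form("smiraglia, r.p.", index))
print(authorized_form("Dowell, Arlene Taylor, 1941-", index))
```

A name that appears in no record returns `None`, which is the point at which a cataloger would have to do authority work rather than rely on the file.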

6 Name Authority Standards LCNAF (Library of Congress Name Authority File) – constructed according to principles set out in AACR2R Getty Vocabulary tools (artist names; geographic names) – VRA Core Categories calls for use of the Getty vocabulary ISAAR(CPF) – International Standard Archival Authority Record for Corporate Bodies, Persons and Families EAC – Encoded Archival Context (for describing creators of archival collections) DCMI Agents – creators, contributors, and publishers – to be used in Dublin Core records

7 DCMI Agents: Working Definitions Agent: A person (author, publisher, sculptor, editor, director, etc.) or a group (organization, corporation, library, orchestra, country, federation, etc.) or an automaton (weather recording device, software translation program, etc.) that has a role in the lifecycle of a resource. Agent Record: A collection of elements describing an agent. Agent Authority Record: An agent record that includes the particular name that is preferred (considered authoritative) within a particular community (e.g., libraries).

8 Controlled Subject Terminology - Purposes To provide subject access to information packages in a catalog or index To collocate surrogate records for information packages of a like nature To provide suggested synonyms and syndetic structure to aid a user in subject searching To save the users’ time

9 Controlled Subject Terminology – How Accomplished Conceptual analysis – describe aboutness in natural language Translate that analysis into the framework of the controlled vocabulary system (e.g., use of single concept terms vs. use of phrases, compound concepts, and precoordinated subdivisions) Use controlled vocabulary system rules to create controlled subject access points to be added to metadata records

10 Controlled Vocabularies Subject heading lists –LCSH (Library of Congress Subject Headings) –FAST (Faceted Access to Subject Terminology) –Sears List of Subject Headings –MeSH (Medical Subject Headings) Thesauri –AAT (Art & Architecture Thesaurus) –Thesaurus of ERIC Descriptors –Many more...

11 Names: What Do We Need Enable the user to retrieve all relevant items associated with a person or group Enable the user to retrieve all relevant items associated with a name regardless of the fullness or spelling of the name of the person or group Enable names to be browsed by either last name or first name but displayed in natural order

12 Names: Existing Tools ANAC (Automated Name Authority Control system) Perseus project developed its own named entity extractor optimized for Civil War–era names. Uses MADS Stanford Natural Language Processor Tools

13 Authorities Authority control governs usage of a controlled vocabulary. It is managed through authority files, which consist of authority records, each of which records a term and its variants as well as evidence. Authority records are created through authority work, which is usually bibliographic detective work.

14 Authorities Each authority record exists to control a term, known in library cataloging as a “heading” The only “entity” is the controlled heading The relationships are among the heading and variant forms of the heading Everything else in the authority record is evidentiary or used for file control

15 Role of Authority Work Authority work, in which terms and names are verified and validated, is a critical part of documentation practice. The concept originated in the library cataloging domain in the days of manual card catalogs and indexes when strict consistency was necessary for minimal access. Today authority work has extended to other information management communities and its processes and procedures have benefited greatly from computerization. The development and application of standard controlled vocabularies is a significant outcome of authority work.

16 Authority Work Characteristics Authority files are compilations of authorized terms or headings used by a single organization or consortium in cataloging, indexing, or documentation Authority control is a system of procedures that maintains consistent information in database records.

17 Authority Work Characteristics An authority file is a controlled vocabulary, but not all controlled vocabularies are authority files. Authority files are an integral part of most automated information systems but you will find differing levels of implementation depending on the system. Authority work procedures may be automated, but the intellectual processes needed to create quality authority files are still best accomplished by humans.

18 Attributing Works in the Anglo-American Cataloguing Rules A work may be attributed to an individual creator, it may be attributed to a corporate emanator, or it may be entered under its title. Individuals: chiefly responsible for the creation of intellectual (artistic, etc.) content (21.1A1). Responsibility may be shared or mixed … Corporate body: an organization with a name that acts as an entity … and causes a work of collective thought or activity to emanate … (21.1B2). Governments, churches, universities, corporations, conferences, etc.

19 A “Heading” Contains, but is Not Equal to, A “Name” A heading includes: –The authorized form of name (title, etc.) –Manipulated in various ways (inverted, for instance) –Qualifiers to make it unique The name is Richard P. Smiraglia The heading is Smiraglia, Richard P., 1952-
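
The difference between a name and a heading can be sketched in a few lines. This is a simplified illustration, not a rendering of the cataloging rules: the function names `personal_heading` and `natural_order` are hypothetical, and the sketch handles only inversion and a date qualifier.

```python
from typing import Optional


def personal_heading(given: str, family: str,
                     birth: Optional[int] = None,
                     death: Optional[int] = None) -> str:
    """Invert the name and append dates as a qualifier when any are known."""
    heading = f"{family}, {given}"
    if birth is not None or death is not None:
        born = "" if birth is None else str(birth)
        died = "" if death is None else str(death)
        heading = f"{heading}, {born}-{died}"
    return heading


def natural_order(heading: str) -> str:
    """Recover a display form from an inverted personal heading."""
    parts = heading.split(", ")
    if len(parts) < 2:
        return heading  # nothing to invert
    family, given = parts[0], parts[1]
    return f"{given} {family}"


heading = personal_heading("Richard P.", "Smiraglia", birth=1952)
print(heading)                 # Smiraglia, Richard P., 1952-
print(natural_order(heading))  # Richard P. Smiraglia
```

Keeping the inverted heading for browsing and deriving the natural-order form for display is one way to meet both needs without storing the name twice.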

20 Constituting Headings: Personal Names The name of the creator as found in his published works. If more than one name, choose the latest. If more than one form, choose that found most often most recently. If all else fails, choose the fullest form. Add dates and middle names to resolve conflicts.

21 Constituting Headings: Corporate Names The name of the corporate body as found in its published works. If more than one name use all. If more than one form, choose the one found most often in its works. Add terms as qualifiers to resolve conflicts. –Who (Musical group) –Apollo (Spaceship)

22 Constituting Headings: Subordinate Entry Government or Corporate Entities with generic names or names implying subordination “Department” “Division” “Bureau” “Committee” etc. Entered under the name of the intermediate unit with a distinctive name. –California. Employment Data and Research Division. –NOT: California. Employment Development Department. Employment Data and Research Division.

23 Authority Control Maintains consistency of usage of names of individuals, corporate bodies, and titles of works. Always: –Smiraglia, Richard P., 1952- –Not Smiraglia, R.P. –Not Smiraglia, Richard Always: –Taylor, Arlene G., 1941- –Not Dowell, Arlene Taylor, 1941-

24 Authority Records Authority control works through the use of authority records Authority records record: –Authority work—the actual decision-making process of the cataloger –Variant forms found along the way –References in the catalog from recognized variant forms

25 A new model of “authority file” The authority records of creators are meant to include a much more complex set of information than traditional bibliographic authority records, exactly because they are devoted to implementing the model of separate description of archives and creators Dates of existence, history and geography, functions, occupations, and activities … political, social, cultural context in which the creator worked

26 From a Data Modeling Standpoint …. Thus the only entity in an authority record is the authorized heading (or “term”) Its variants are attributes, but could also be seen as equivalents The rest is functional: –Notes (Evidentiary and Non—two types) –Usage –Control [Diagram: AF and BF] A flat file model: Headings in the Authority File (AF) govern usage in the Bibliographic File (BF). One “Dickens” in the AF governs all “Dickens” in the BF. Usage is inferential.

27 Online, new models emerged 1. Online flat-file models simply used the authority file as an occasional filter. All headings from the bibliographic file were run against it periodically for validation. 2. An ER model separated the headings from their representations in bibliographic records. This reduced redundancy dramatically. Every heading is stored only in the authority file, and copied as needed into the displays arranged from the bibliographic file. All “Dickens” resides only in the AF, with links from the BF. [Diagram: AF linked to BF]
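
The entity-relationship arrangement in point 2 can be sketched as follows. The record structure, the `af-1` identifier, and the two sample titles are assumptions made for illustration; the slide itself does not prescribe a storage format.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class BibRecord:
    title: str
    heading_id: str  # link into the authority file, not a copy of the heading


# The heading text lives only here (hypothetical identifier "af-1").
authority_file: Dict[str, str] = {"af-1": "Dickens, Charles"}

bib_file: List[BibRecord] = [
    BibRecord("Bleak House", "af-1"),
    BibRecord("Hard Times", "af-1"),
]


def display(record: BibRecord, af: Dict[str, str]) -> str:
    """Assemble a display line by copying the heading in at display time."""
    return f"{af[record.heading_id]}. {record.title}"


before = [display(r, authority_file) for r in bib_file]

# Revising the heading once in the AF changes every display built from the BF.
authority_file["af-1"] = "Dickens, Charles, 1812-1870"
after = [display(r, authority_file) for r in bib_file]

print(before)
print(after)
```

In a flat-file model the same revision would require finding and rewriting every bibliographic record that carries the old string.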

28 Authority Control Traditional Functions –Ensures that access points are unique and consistent in content and form –Provides a network of linkages for variant and related headings in the catalog –Improves precision & recall for database searches

29 Reasons for Authority Control Success AC operates within a well-defined and bounded universe—the library catalog Creation of access points based on principles & standardized practices that guide the process Authority work is aided by reference to authoritative lists Performed by highly trained individuals –Part of library culture –Understand cause and effect in the information retrieval process

30 Functions of the Authority File Document decisions Serve as reference tool Control forms of access points Support access to bibliographic file Link bibliographic and authority files

31 Users Authority record creators and reference librarians Library patrons

32 Users and Tasks Users Authority record creators and reference librarians Library patrons User tasks Find –Find an entity or set of entities corresponding to stated criteria Identify –Identify an entity Contextualize –Place a person, corporate body, work, etc. in context Justify –Document the authority record creator’s reason for choosing the name or form of name on which an access point is based

33 Traditional Authority Control in Libraries Which names do we control? –Names of authors and some contributors of published books –Composers of sheet music –Names of corporate bodies responsible for official publications –Names associated with resources catalogued since 1981 –Names associated with audio or audio visual resources, where possible Which names do we exclude? –Names of authors of journal articles or chapters of published books –Contributors whose names fall towards the end of the alphabet or whose contribution we regard as insignificant –Names associated with archival or manuscript material –Names derived from older catalogues –Names associated with most Web Resources –Names in the content management system / institutional repository

34 Expectations There is a gap between ambition and delivery Only some names on some types of resources are controlled User expectations are changing Silos: –Libraries / Archives / Repositories / Museums –National practices –Institutional practices –Variance over time Is partial authority control acceptable to users? If not, will it be acceptable to administrators?

35 Workflows Current workflows are not scalable Retrospective Cataloger driven Decision making –Is A. Rose PhD the same person as Dr. Alex Rose, University of London? –What other information is available? –Is it sufficient to match or disambiguate the identities? –Is there a website / contact details?

36 Rethinking the Process Capture information about the person, family or corporate body at the time the resource is created Devolve responsibility to authors, publishers, researchers and academics Libraries and bibliographic agencies focus on quality control, complex relationships and conflict resolution. Capture information in a way that is machine intelligible. –Identification of entities not disambiguation of headings

37 VIAF: The Virtual International Authority File Match & Link Authority Files –Reduce costs –Increase utility –Retrospective alignment of bibliographic data Prototype: http://viaf.org/ Linked Data: http://outgoing.typepad.com/outgoing/2009/09/viaf-as-linked-data.html OCLC; Bibliothèque nationale de France; Bibliotheca Alexandrina (Egypt); National Library of the Czech Republic; Deutsche Nationalbibliothek; National Library of Israel; Library of Congress/NACO; National Library of Sweden; Vatican Library

38 It’s not just about libraries… FOAF: Friend of a Friend –Social networking metadata –Granularity of parts of a name –http://xmlns.com/foaf/spec/ EAC-CPF: Encoded Archival Context – Corporate Bodies, Persons, and Families –Communication standard for exchange of authority records –ISAAR (CPF) –Draft Standard http://eac.staatsbibliothek-berlin.de/

39 Thoughts Controlling names remains important in the context of linked data and the Semantic Web Identification and collocation of variants is more important than establishing a preferred form Current techniques are not scalable Automation and participation are the way forward Web services for identification No simple solution Extension of the collaborative model

40 FRAD Functional Requirements for Authority Data IFLA Division of Bibliographic Control working group 1999- April 2007 draft for world-wide review Approved March 2009

41 FRAD Entities Name by which bibliographic entities are known (in the “real” world) Identifier assigned to those entities Controlled access point based on those names or identifiers These are the heart of the authority data

42 Name A character or group of words and/or characters by which an entity is known The basic name or term itself As found in the “real” world

43 Definition: Identifier A number, code, word, phrase, logo, device, etc. that is uniquely associated with an entity, and serves to differentiate that entity from other entities within the domain in which the identifier is assigned Not only bibliographic identifiers

44 Definition: Controlled Access Point A name, term, code, etc. under which a bibliographic or authority record or reference will be found Includes established or authorized headings and variant headings or references

45 Basic FRAD Model BIBLIOGRAPHIC ENTITIES known by NAMES and / or IDENTIFIERS basis for CONTROLLED ACCESS POINTS
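
The basic model on this slide can be written down as a handful of data classes. This is a loose sketch, not the FRAD specification: the class and field names are hypothetical, the identifier scheme and value are invented, and access points based on identifiers are left out for brevity. The two name forms come from the Morel example later in the deck.

```python
from dataclasses import dataclass, field
from typing import List, Union


@dataclass(frozen=True)
class Name:
    value: str


@dataclass(frozen=True)
class Identifier:
    scheme: str
    value: str


@dataclass(frozen=True)
class ControlledAccessPoint:
    form: str
    based_on: Union[Name, Identifier]  # an access point rests on a name or an identifier
    authorized: bool                   # False for variant forms or references


@dataclass
class BibliographicEntity:
    kind: str  # e.g. "person", "corporate body", "work"
    names: List[Name] = field(default_factory=list)
    identifiers: List[Identifier] = field(default_factory=list)

    def access_points(self, preferred: Name) -> List[ControlledAccessPoint]:
        """One access point per known name; only `preferred` is authorized."""
        return [ControlledAccessPoint(n.value, n, n == preferred) for n in self.names]


morel = Name("Morel, Pierre")
morellus = Name("Morellus, Petrus")
person = BibliographicEntity(
    "person",
    names=[morel, morellus],
    identifiers=[Identifier("example-scheme", "0001")],  # invented identifier
)
points = person.access_points(preferred=morel)
for point in points:
    print(point.form, "(authorized)" if point.authorized else "(variant)")
```

Passing the preferred name in as a parameter reflects the idea that different agencies may authorize different forms for the same entity.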

46 More FRAD Entities Rules governing construction of a controlled access point Agency applying the rules, and creating/modifying the controlled access point

47 MADS MODS users kept asking for a compatible authority record Metadata Authority Description Schema –April 2004, Preliminary version out for review –December 2004 new draft out for review –April 2005 version 1.0 published

48 MADS schema design Highly coordinated with MODS –Schema specifies high level elements and unique substructures –But MADS points to substructures in MODS where possible Each heading is wrapped in an XML tag: <authority> or <related> or <variant> Each subpart of a heading has authority list identifier

49 Components of MADS Authoritative heading Related heading(s) (see also) Variant heading(s) (see) Other elements

50 Heading elements – Same for and

51 Examples – Scotland – History – 18 th century – Historical fiction – Law, Felicia – Ways we move

52 Reference types –Attribute indicates: earlier later parentOrg broader narrower equivalent other –Attribute indicates: acronym abbreviation translation expansion Other

53 Other elements

54 Example - Unesco - - UnitedNations - - United Nations Educational, Cultural, and Scientific Organization -
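
The markup of the Unesco example was lost in transcription, so the following is a hedged reconstruction rather than the original slide content. It assumes the three wrapper elements named in the Components slide (`authority`, `related`, `variant`), the MODS-style `name` and `namePart` substructure, and the `parentOrg` and `expansion` reference types listed under Reference types. Namespace declarations are omitted for brevity, so the output is illustrative and not schema-valid as written.

```python
import xml.etree.ElementTree as ET


def add_heading(parent: ET.Element, wrapper: str, text: str, ref_type: str = "") -> ET.Element:
    """Append one heading (wrapper > name > namePart) to the record."""
    node = ET.SubElement(parent, wrapper)
    if ref_type:
        node.set("type", ref_type)
    name = ET.SubElement(node, "name", {"type": "corporate"})
    ET.SubElement(name, "namePart").text = text
    return node


mads = ET.Element("mads")
add_heading(mads, "authority", "Unesco")
add_heading(mads, "related", "United Nations", ref_type="parentOrg")
add_heading(
    mads,
    "variant",
    "United Nations Educational, Cultural, and Scientific Organization",
    ref_type="expansion",
)

xml_text = ET.tostring(mads, encoding="unicode")
print(xml_text)
```

The same helper serves all three wrappers, which mirrors the design point that heading elements share one structure regardless of their role in the record.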

55 Features Word oriented tagging –English word tags –Same as corresponding MODS elements –Easy to pick up and use? –Record creation by technicians?

56 Features Rich linking possibilities – element to link out at record level –xlink attribute for external links from elements –ID attribute to enable linking to an element

57 Features Special attributes on all elements –lang – MARC codes (ISO 639-2b) –xml:lang – ISO 639-1 –script – ISO 15924 –transliteration – no controlled list –authority – e.g., lcsh, naf

58 Subject heading example - Computer programming Computers Programming languages Systems analysis

59 MADS MADS is taking a fresh approach to authority records that is: –Coordinated with MARC 21 authorities and MODS –Accommodating to a variety of authority types and practices –Taking advantage of the XML environment –Web site: www.loc.gov/mads

60 Automated name metadata remediation Inconsistent name representation Metadata harvested from multiple providers Hand-crafted data is expensive Commercial alternatives are expensive

61 Johns Hopkins Project: Automated Name Authority Control (ANAC) 29,000 Levy sheet music records 13,764 unique names 3.5 million LC name authority records (at the time of the project)

62 ANAC The evidence used to determine the probability of a match between a name and an LC record is a set of Boolean tests involving the name, the Levy metadata associated with that name, and the LC record. The following fields were used by ANAC: Levy record: –Given name: often abbreviated –Middle names: often abbreviated –Family name –Modifiers: titles and suffixes –Date: publication year –Location: publication location (city) LC record: –Given name: includes abbreviations –Middle names: includes abbreviations –Family name –Modifiers: titles and suffixes –Birth: year of birth –Death: year of death –Context: miscellaneous data

63 ANAC The tests used are: first name equality and consistency, middle name equality and consistency, music terms present in LC record context, name modifier consistency, Levy sheet music publication consistent with LC author birth and death, and Levy record publication location in LC record context
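
A few of these Boolean tests can be sketched as below. This is not ANAC's code: the record classes, the `MUSIC_TERMS` list, the abbreviation rule in `consistent`, and the sample record values are all assumptions, and the date test ignores posthumous publication. The sketch only shows how one candidate pair might be turned into a set of true/false evidence that a classifier could then weigh.

```python
from dataclasses import dataclass
from typing import Dict, Optional


@dataclass
class LevyName:
    given: str
    family: str
    pub_year: Optional[int]
    pub_city: str


@dataclass
class LCRecord:
    given: str
    family: str
    birth: Optional[int]
    death: Optional[int]
    context: str


MUSIC_TERMS = ("composer", "lyricist", "songs", "music")  # hypothetical term list


def consistent(a: str, b: str) -> bool:
    """True when one form could abbreviate the other, e.g. 'S.' and 'Stephen'."""
    a, b = a.rstrip(".").casefold(), b.rstrip(".").casefold()
    return bool(a) and bool(b) and (a.startswith(b) or b.startswith(a))


def evidence(levy: LevyName, lc: LCRecord) -> Dict[str, bool]:
    """Run the Boolean tests for one (Levy name, LC record) candidate pair."""
    ctx = lc.context.casefold()
    date_ok = (
        levy.pub_year is not None
        and lc.birth is not None
        and lc.birth < levy.pub_year
        and (lc.death is None or levy.pub_year <= lc.death)
    )
    return {
        "given_equal": levy.given.casefold() == lc.given.casefold(),
        "given_consistent": consistent(levy.given, lc.given),
        "music_terms_in_context": any(term in ctx for term in MUSIC_TERMS),
        "date_consistent": date_ok,
        "location_in_context": bool(levy.pub_city) and levy.pub_city.casefold() in ctx,
    }


# Invented sample records for illustration only.
levy = LevyName(given="S.", family="Foster", pub_year=1851, pub_city="New York")
lc = LCRecord(given="Stephen", family="Foster", birth=1826, death=1864,
              context="American composer of songs; published in New York")
result = evidence(levy, lc)
print(result)
```

Separating equality from consistency matters because the Levy metadata often abbreviates given names, so an exact-match test alone would discard many true matches.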

64 ANAC In order to train the system, the Cataloging Department at the Sheridan Libraries generated ground truth data. For each name in 2,000 randomly selected Levy metadata records, catalogers recorded the authorized form of the name when a matching authority record was available. The entire process required 311 hours (approximately seven minutes per name). The human catalogers used much the same type of evidence as ANAC in establishing matches. Catalogers examined name similarity; compared publication dates from the Levy records to birth and death dates in the authority records; and examined authority record note fields for musical terms. In addition, the catalogers often searched for bibliographic records of other editions of a particular title to determine the authoritative name assigned to the subject.

65 ANAC Overall, ANAC was successful 58% of the time. When a name had an LC record, ANAC was successful 77% of the time, but when an LC record did not exist for a name ANAC was successful only 12% of the time. The reason for this discrepancy is that ANAC cannot learn whether or not a name has been added to the LC authority file. It took ANAC five hours and forty-five minutes to classify the 2,673 (2,841 minus 168) names, or about eight seconds per name. The database-bound process of retrieving the candidate set of MARC records given a family name consumed most of this time.
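
The timing figures quoted for ANAC and for the human catalogers can be checked with simple arithmetic. Only numbers stated in the slides are used; the variable names are arbitrary.

```python
# Machine side: 5 h 45 min to classify 2,841 - 168 names.
names_classified = 2841 - 168                   # 2,673 names
machine_seconds = 5 * 3600 + 45 * 60            # 20,700 seconds
machine_sec_per_name = machine_seconds / names_classified   # about 7.7 s, "about eight seconds"

# Human side: 311 hours at roughly seven minutes per name.
human_minutes = 311 * 60                        # 18,660 minutes
names_implied = human_minutes / 7               # about 2,666 names

# Ratio of human to machine time per name.
speedup = (7 * 60) / machine_sec_per_name       # roughly 54 times faster

print(names_classified, round(machine_sec_per_name, 1))
print(round(names_implied), round(speedup))
```

The implied human workload of roughly 2,666 names is close to the 2,673 names ANAC classified, so the two per-name figures describe comparable amounts of work.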

66 ANAC Matching very dependent on contextual data Machine matching much faster than manual Performance reasonable even with dirty metadata Machine matching could enhance manual work

67 ANAC: Conclusions Matching very dependent on contextual data Machine matching much faster than manual (8 sec. vs. 7 min.) Performance reasonable even with dirty metadata. Machine matching could enhance manual work Combination of machine processing and human intervention produced best results Approach could be tweaked by comparing names to multiple authority files or domain specific databases

68 Identifiers: People One area where growing interest in identifiers is very clear is that of people, particularly in their role as authors or creators. The benefits of using a consistent name are clear from a discovery point of view. So it is interesting that many people are inconsistent in how they identify themselves on their works. Search engines have probably made people more conscious of the distinctiveness - or otherwise - of their names? The additional step of unique identification would facilitate various services.

69 UK Names Project The project is going to scope the requirements of UK institutional and subject repositories for a service that will reliably and uniquely identify names of individuals and institutions. It will then go on to develop a prototype service which will test the various processes involved. This will include determining the data format, setting up an appropriate database, mapping data from different sources, populating the database with records and testing the use of the data. This will provide important information about the future usefulness of a name authority service for institutional and subject-based repositories, and other applications beyond the repository sector.

70 Virtual International Authority File (VIAF) Link authority records from national bibliographic agencies Build on their authority work Expand the concept of universal bibliographic control –Allow national or regional variations in authorized form to co-exist –Support needs for variations in preferred language, script, and spelling

71 VIAF Demonstrate feasibility of linking personal names across: Personennormdatei (PND) Library of Congress Name Authority File (LCNAF) Bibliothèque nationale de France

72 What is VIAF? System –Links between files –Web browser access –Multi-lingual and multi-scripts Maintenance –National agencies control their records –Records harvested from national systems Scalable –Any number of national authority files

73 Matching Variations In the LCNAF and PND authority files: Same name, same person Same name, different people Different names, same person Missing person in one file

74 Two Different People – One Name Adams, Mike PND: a golfer LCNAF: author of a Beatles collector's guide Same Name Different People

75 One Person – Two Names LCNAF: Morel, Pierre PND: Morellus, Petrus Same Person Different Names

76 Enhancing the Authorities Bibliographic Record Derived Authority Record Enhanced Authority

77 Strong Matching Attributes A work (title) in common Common control numbers (ISBN, ISSN, or LCCN) Exact birth and death year Joint authors Name as subject

78 Weaker Attributes Only one of birth/death date(s) (allows some variation) Subject area of works (two levels) Format (books, films, musical scores, etc.) Language Publisher Partial title match Date of publication Country Role (author, illustrator, composer, etc.)
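
One way to picture how strong and weaker attributes might combine is a simple decision rule. The slides do not say how VIAF actually weighs the evidence, so everything here is an assumption: the attribute labels, the rule that one strong attribute suffices, and the threshold of three weak attributes are all hypothetical.

```python
from typing import Set

STRONG = {
    "title_in_common",
    "control_number_in_common",      # ISBN, ISSN, or LCCN
    "exact_birth_and_death_year",
    "joint_author_in_common",
    "name_as_subject",
}

WEAK = {
    "one_date_matches",
    "subject_area",
    "format",
    "language",
    "publisher",
    "partial_title",
    "publication_date",
    "country",
    "role",
}


def classify(shared: Set[str], weak_needed: int = 3) -> str:
    """Label a candidate pair of authority records from the attributes they share."""
    unknown = shared - STRONG - WEAK
    if unknown:
        raise ValueError(f"unrecognized attributes: {sorted(unknown)}")
    if shared & STRONG:
        return "match"               # assumed rule: any strong attribute is enough
    if len(shared & WEAK) >= weak_needed:
        return "possible match"      # assumed rule: several weak attributes together
    return "no match"


print(classify({"title_in_common", "language"}))
print(classify({"language", "publisher", "country"}))
print(classify({"language"}))
```

Under this rule the two "Adams, Mike" records from the earlier slide, a golfer and a Beatles author, would share at most a language or country and so would stay unlinked.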

79 OCLC Cooperative Identities Hub Bring together information about creators now hidden within library, archival, and museum contexts, using a social networking model. Broaden the view of "authority work" beyond NACO contributors. Increase metadata creation efficiency. Make it easier for users to identify works by or about the same creator regardless of language or discipline. Expose information about personal and corporate bodies beyond the confines of library, archival, and museum silos and bring them into the "network flow".

80 Names can be ambiguous… “John Adams” … the US president? … the US composer? … the British mathematician & astronomer? … the British nuclear physicist? … or someone else?

81 Names depend on context… US: Chiang Kai-shek France, Germany: Jiang Jieshi China, Japan: 蒋介石 蔣中正 Arabic-speaking countries: شيانج كاي شيك Tamil: சங் கை செக்

82 Cooperative Identities Hub Framework to concatenate and merge authoritative information Gateway to all forms of names without preferring one form over another Use social networking model Provide a switch to extract relevant information for re-use in own contexts Create federated trust environment to authenticate and authorize contributors

83 Hub Objectives Increase metadata creation efficiency Easier to identify identity regardless of language or discipline Determine preferred form within own context Enable contributing agencies to augment own data resources Expose information about personal and corporate bodies beyond original contexts

84 Hub Data Elements At least one form of name Life events, with dates if known: origin, place(s) of output, knowledge domains, institutional affiliations… Associated entities (role and what relationship is) At least some works Short biographical history Unique identifiers from each source
