Formats and FRBR Catalogues – Where's our focus? Trond Aalberg NTNU and BIBSYS Norway
Topics
FRBRizing existing catalogues
– The BIBSYS FRBR project
Internal FRBR structures
– How to structure and store FRBR data internally
Exchange
– How to express and exchange FRBR data externally
What kind of specification do we want/need FRBR to be for implementations?
The BIBSYS FRBR project
A case study in the use of the FRBR model on the BIBSYS database
BIBSYS
– Norwegian service center for libraries: Norwegian university libraries, the National Library, all college libraries, and a number of research libraries
– Bibliographic database with circa 3.8 million records (8 million holdings)
– BIBSYSMARC ~ NORMARC (subset but not proper subset of USMARC)
Project cooperates with
– Norwegian University of Science and Technology (project management, modeling and implementations)
– The National Library of Norway (mapping FRBR – BIBSYSMARC)
– OCLC (running the Work-Set algorithm on the BIBSYS database)
– The National Database Project of Norwegian University Museums (CRM)
Funded by the Norwegian Archive, Library and Museum Authority (1/ – 31/8 2005) and is a part of the Norwegian Digital Library Initiative
Motivation and objectives
Large number of existing MARC-based bibliographic catalogues
– FRBRizing existing catalogues is a major challenge and the key to a FRBRized bibliographic universe
– Realistic FRBR prototypes can be used to validate the model
Holistic view
– Process the complete database (not an ideal subset)
– From FRBR data model to test database and search prototype
– Cover as much as possible of the BIBSYS data
Findings
– Possibilities and limitations
– How to improve support for FRBR in BIBSYSMARC
– Further research on specific problems
FRBRizing existing catalogues
Definition: to implement aspects of the FRBR model
Two different strategies:
– Presentation layer only
  Adding a system component that enables generation of FRBR
  Run-time or preprocessed
– Presentation and storage layer
  Convert data to a FRBR-compatible model
Levels of FRBRizing
Different levels of FRBRizing
– Implement group 1 entities and inherent relationships
– Implement group 2 and 3 entities and inherent relationships
– Implement other relationships
– Implement FRBR attributes
Implementing FRBR
[Diagram: Records 1 to 5 converted into an internal FRBR data structure]
Build on ER approach
Decompose and convert MARC to FRBR attributes
BIBSYSMARC Example record
*008 pv eng
*015 $a nf
*020 $a $b h.
*082 $d [S]
*100 $a Ibsen, Henrik
*241 $a Et dukkehjem $w dukkehjem
*245 $a A doll's house $c by Henrik Ibsen ; adapted by Frank McGuinness
*260 $a New York $b Dramatists Play Service $c c1998
*300 $a 70 s.
*700 $a McGuinness, Frank
*096 $a NBO $c Småtr. 582 $n 02ga00027
*096 $a NBO $c Ibsensenteret $n 01ga20306

The same fields repeated on the slide in three groups (apparently Work, Expression and Manifestation):

*100 $a Ibsen, Henrik
*241 $a Et dukkehjem $w dukkehjem

*008 pv eng
*245 $a A doll's house $c by Henrik Ibsen ; adapted by Frank McGuinness
*700 $a McGuinness, Frank

*020 $a $b h.
*245 $a A doll's house $c by Henrik Ibsen ; adapted by Frank McGuinness
*260 $a New York $b Dramatists Play Service $c c1998
*300 $a 70 s.
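The decomposition shown on this slide can be sketched in a few lines of Python. This is a minimal illustration, not the project's implementation: the function names `parse_record` and `decompose` are hypothetical, the text syntax is simply the one used in the transcript, and the tag-to-entity grouping is an assumption read off the three groups of repeated fields (the entity names themselves are an interpretation).

```python
import re

# Assumed tag-to-entity grouping, read off the three groups of repeated fields
# on the slide; the entity names are an interpretation, not slide text.
ENTITY_TAGS = {
    "work": {"100", "241"},
    "expression": {"008", "245", "700"},
    "manifestation": {"020", "245", "260", "300"},
}


def parse_record(text):
    """Parse '*TAG $a value $b value ...' into a list of (tag, subfields).

    subfields is a list of (code, value) pairs so repeated codes survive.
    Data without a subfield code (as in 008) is stored under the code '_'.
    """
    fields = []
    for chunk in text.split("*"):
        chunk = chunk.strip()
        if not chunk:
            continue
        tag, _, rest = chunk.partition(" ")
        subfields = [(m.group(1), m.group(2).strip())
                     for m in re.finditer(r"\$(\w)([^$]*)", rest)]
        if not subfields and rest.strip():
            subfields = [("_", rest.strip())]
        fields.append((tag, subfields))
    return fields


def decompose(fields):
    """Copy each field to every entity whose tag set contains its tag."""
    entities = {name: [] for name in ENTITY_TAGS}
    for tag, subfields in fields:
        for name, tags in ENTITY_TAGS.items():
            if tag in tags:
                entities[name].append((tag, subfields))
    return entities


# A shortened version of the example record (fields with elided values left out).
record = ("*008 pv eng *100 $a Ibsen, Henrik *241 $a Et dukkehjem "
          "*245 $a A doll's house $c by Henrik Ibsen ; adapted by Frank McGuinness "
          "*260 $a New York $b Dramatists Play Service $c c1998 *300 $a 70 s. "
          "*700 $a McGuinness, Frank")

entities = decompose(parse_record(record))
for name, fields in entities.items():
    print(name, [tag for tag, _ in fields])
```

Note that 245 lands in both the expression and the manifestation group, which is the kind of field that a fixed tag-to-entity classification cannot assign to one entity only.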
Whole/Part records in BIBSYS for numbered series and multi-volume publications

*
*008 pv eng
*241 $aNår vi døde vågner
*241 $aLille Eyolf
*241 $aJohn Gabriel Borkman
*245 $aLittle Eyolf ; John Gabriel Borkman ; When we dead awaken $cwith introductions by William Archer
*260 $c1907
*300 $aXXVIII, 456 s.
*491 $n x$q11$v11
*096ga$aNBO$cIbsensenteret/11$n85ga06648
*096ga$aNBO$cIbsensenteret/11$n75ga29424
*096ga$aNBO$cIbsensenteret/11$n74ga02038

*
*008 pv
*245 $aBrand$ctranslated and with introduction by C.H. Herford
*260 $c1906
*300 $aXIII, 262 s.
*491 $n x$q3$v3
*700 $aHereford, C.H.
*096ga$aNBO$cNA/A 2001:579$n75ga27601
*096ga$aNBO$cIbsensenteret/3$n74ga02035
*096ga$aNBO$cIbsensenteret/3$n74ga02036

* x
*008 pv eng
*015 $alc
*082 $c839.82/26
*100 $aIbsen, Henrik
*240 $aVerker$lEngelsk
*245 $aThe collected works of Henrik Ibsen$c[entirely revised and edited by William Archer]$wcollected works of Henrik Ibsen
*250 $aCopyright ed.
*260 $aLondon$bHeinemann$c
*300 $a12 b.
*700 $aArcher, William$d
*580 $aDette er et lenket flerbindsverk
*096kj$aUHS$bISS$c Ibs:Col$n75k
*096ga$aNBO$cIbsensenteret$n75ga27600
*096ga$aNBO$n75ga29508
*096ga$aNBO$cIbsensenteret$n74ga02037
*096ga$aNBO$cIbsensenteret$n85ga06639

*491 is used to implement an isPartOf reference
Approx. 20% of the records
Preliminary results: W:E:M Statistics from the BIBSYS database
Data quality problems
Typical problems for non-normalized data
– Redundant information: the same information is duplicated in multiple records
– Records are missing information
– The same information is expressed in different ways
Inherent problems with data quality
Results from earlier work on the subset of Ibsen records (~3000)
– Using manual inspection and corrections of entries (language, titles, etc.)
– Based on knowledge about author, works, titles, ...
Compared to results from automatic processing
Numbers indicate
– a high level of imprecise information
– that quality can be improved significantly
[Table: Works and Expressions, with and without error corrections; the figures are not preserved in the transcript]
Typical problems (shown on the slide along a scale from easy to difficult)
Different capitalization
Spelling errors
Substrings
Only selected values
Indicative information
Missing information
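The easy end of this scale can be illustrated with a small normalization step. The sketch below is an assumption about how such a step might look, not the method used in the project or in OCLC's Work-Set algorithm; `match_key` is a hypothetical helper that builds a comparison key from author and title so that variants differing only in capitalization or punctuation collapse to the same key.

```python
import re


def match_key(author, title):
    """Build a comparison key that ignores case, punctuation and extra spaces."""
    def norm(value):
        value = value.casefold()
        value = re.sub(r"[^\w\s]", " ", value)  # punctuation becomes a space
        return " ".join(value.split())           # collapse runs of whitespace
    return norm(author) + "/" + norm(title)


# Capitalization and punctuation variants collapse to one key ...
same = match_key("Ibsen, Henrik", "Et dukkehjem") == match_key("IBSEN, HENRIK", "Et Dukkehjem.")

# ... but a spelling error still produces a different key.
different = match_key("Ibsen, Henrik", "Et dukkehjem") != match_key("Ibsen, Henrik", "Et dukkehiem")

print(same, different)
```

Spelling errors, substrings and missing information are not addressed by a key of this kind, which is consistent with their position further toward the difficult end.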
Conversion process outlined
FRBR implementation model
FRBR – BIBSYSMARC mapping
Identify entities and relationships
Convert or extract from MARC fields to FRBR attributes
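FRBR in MARC catalogues
[Diagram: a MARC record related to the FRBR entities Work, Expression, Manifestation and Item, and to group 2 and 3 entities. Relationship labels on the diagram: 1~1, N:1, N:N, 1:N, and N:N for group 2 and 3 entities]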
FRBR attributes
Each of the entities in the model has associated with it a set of characteristics or attributes
Attributes serve as the means by which users formulate queries and interpret responses
Derived from a logical analysis of the data that are typically reflected in bibliographic records
Attributes are defined at a logical level
Some are generally applicable, others are applicable only to subtypes
Intended to be comprehensive but not exhaustive
Not every instance will exhibit all attributes listed
Mapping MARC to FRBR
FRBR attributes are the bridge between FRBR and other formats
Functional Analysis of the MARC 21 Bibliographic and Holdings Formats
Local mapping tables are needed
Mapping is easy but conversion is difficult
Depending on the purpose of mapping
– Full conversion of data
– Enable searching in different formats
– Mapping tables need to be close to conversion processes
– Requires refinement of many FRBR attributes
– and generalization of others
What structures/formats do we implement?
Example: Manifestation.title
245 TITLE
  $a – Title
  $b – Other title information
  $n – Number of part of work
  $p – Title of part of work
246 PARALLEL TITLE (R)
  $a – Title proper/short title
  $b – Other title information
740 ADDED ENTRY TITLE (R)
  $a – Title
  $b – Other title information
  $n – Number of part of work
  $p – Title of part of work
ABBREVIATED TITLE
  $a – Abbreviated title
  $b – Complementary information
KEY TITLE
  $a – Key title
  $b – Complementary information
Field names are translations of BIBSYSMARC field names
740 is also mapped to expression and work title
Complex data that maps to a single element
Generic category of information except for 740
Somewhat comparable structure
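A local mapping table for this example could be sketched as follows. The table only covers the three fields whose tags appear on the slide (245, 246, 740). The dotted attribute names, the `title_statements` helper and the 740 sample value are illustrative assumptions. Each statement keeps the MARC subfields, so the substructure is not lost when several fields map to one FRBR attribute.

```python
# Tag -> FRBR title attributes, following the slide: 245, 246 and 740 map to
# Manifestation.title, and 740 is also mapped to expression and work title.
# The attribute names are illustrative spellings, not a published vocabulary.
TITLE_TARGETS = {
    "245": ["manifestation.title"],
    "246": ["manifestation.title"],
    "740": ["manifestation.title", "expression.title", "work.title"],
}


def title_statements(fields):
    """Return one statement per (field, target), keeping the MARC substructure.

    fields is a list of (tag, {subfield code: value}) pairs.
    """
    statements = []
    for tag, subfields in fields:
        for target in TITLE_TARGETS.get(tag, []):
            statements.append({
                "attribute": target,
                "source": tag,
                "parts": dict(subfields),
            })
    return statements


# Sample input; the 740 value is made up for illustration.
fields = [
    ("245", {"a": "A doll's house"}),
    ("740", {"a": "Et dukkehjem"}),
    ("300", {"a": "70 s."}),  # not a title field, so it is ignored
]

for statement in title_statements(fields):
    print(statement["attribute"], "<-", statement["source"], statement["parts"])
```

A single 740 field yields three statements here, which is one way to read the remark that 740 is the exception to the otherwise generic category.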
Example: Manifestation.identifier
020 ISBN (R)
– $a ISBN
– $z Invalid ISBN
022 ISSN (R)
– $a ISSN
– $y Invalid ISSN
024 ISMN and ISRC (R)
– $a Number
– $x Type of number
– $y Invalid number
And 027, 028, ...
Complex data that maps to a single element
020 and 022 have comparable structure, but not 024
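The difference in structure between 020/022 and 024 shows up directly when the mapping is written as code. In this hedged sketch, 020 and 022 are handled by a lookup table because the identifier type follows from the tag, while 024 needs its own branch because the type is carried in $x. The `manifestation_identifiers` helper, the output keys and all identifier values are hypothetical; the placeholder values are deliberately not real numbers, and the content of $x is assumed to be a plain type name.

```python
# (tag, subfield code) -> (identifier type, is_valid), following the 020/022
# layout on the slide. 024 is handled separately below.
SIMPLE_IDENTIFIERS = {
    ("020", "a"): ("ISBN", True),
    ("020", "z"): ("ISBN", False),
    ("022", "a"): ("ISSN", True),
    ("022", "y"): ("ISSN", False),
}


def manifestation_identifiers(fields):
    """Collect identifiers from (tag, {subfield code: value}) pairs."""
    found = []
    for tag, subfields in fields:
        if tag == "024":
            # The type of number is data ($x), not implied by the tag.
            kind = subfields.get("x", "unknown")
            for code, valid in (("a", True), ("y", False)):
                if code in subfields:
                    found.append({"type": kind, "value": subfields[code], "valid": valid})
            continue
        for code, value in subfields.items():
            rule = SIMPLE_IDENTIFIERS.get((tag, code))
            if rule:
                kind, valid = rule
                found.append({"type": kind, "value": value, "valid": valid})
    return found


# Placeholder values only; "ISRC" as the content of $x is an assumption.
fields = [
    ("020", {"a": "ISBN-PLACEHOLDER", "z": "BAD-ISBN-PLACEHOLDER"}),
    ("024", {"a": "ISRC-PLACEHOLDER", "x": "ISRC"}),
    ("300", {"a": "70 s."}),
]

for identifier in manifestation_identifiers(fields):
    print(identifier)
```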
Example: 300 PHYSICAL DESCRIPTION
$a Extent
  = Extent of the Carrier
  ~ Form of Carrier
  ~ Presentation Format (Visual Projection)
  ~ Foliation (Hand-Printed Book)
  ~ Collation (Hand-Printed Book)
$b Illustrations (Other physical details)
  ~ Capture Mode
  ~ Colour (Image)
  ~ Playing Speed (Sound Recording)
  ~ Kind of Sound (Sound Recording)
$c Format (Dimensions of the carrier)
  = Dimensions of the Carrier
*Mapped to manifestation
Some FRBR attributes are too specific!
Prototype solution
Substructure is not always important for searching
Substructure is important for presentation
Mix models (FRBR and MARC)?
Classifying specific fields/subfields as belonging to a specific entity/attribute
– Not possible for fields that map to several FRBR entities and/or attributes
Decomposing record instances
– Determine what belongs to what entity/attribute
– Tag values in MARC records with FRBR entity/attribute
  E.g. extend MARCXML with attributes that identify FRBR entity/attribute
– Tag FRBR attribute values with original MARC field/subfield
Prototype solution using XML:
– Different records for different entities
– Maintain MARC substructure
– To avoid runtime selection of work and expression entities
– To facilitate error corrections and improve overall FRBR group 1 structure
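The idea of extending MARCXML with attributes that identify the FRBR entity and attribute can be sketched with Python's standard `xml.etree.ElementTree`. The element names and the MARCXML namespace are the standard ones. The `frbr:entity` and `frbr:attribute` attributes and their `urn:example:frbr` namespace are hypothetical, invented for this illustration; they are not part of the MARCXML schema and are not taken from the prototype.

```python
import xml.etree.ElementTree as ET

MARC_NS = "http://www.loc.gov/MARC21/slim"  # standard MARCXML namespace
FRBR_NS = "urn:example:frbr"                # hypothetical extension namespace

ET.register_namespace("marc", MARC_NS)
ET.register_namespace("frbr", FRBR_NS)


def tagged_datafield(tag, subfields, entity, attribute):
    """Build a MARCXML datafield annotated with a FRBR entity and attribute."""
    field = ET.Element(f"{{{MARC_NS}}}datafield",
                       {"tag": tag, "ind1": " ", "ind2": " "})
    # The two annotations below are the assumed extension, not MARCXML itself.
    field.set(f"{{{FRBR_NS}}}entity", entity)
    field.set(f"{{{FRBR_NS}}}attribute", attribute)
    for code, value in subfields:
        sub = ET.SubElement(field, f"{{{MARC_NS}}}subfield", {"code": code})
        sub.text = value
    return field


field = tagged_datafield(
    "245",
    [("a", "A doll's house"),
     ("c", "by Henrik Ibsen ; adapted by Frank McGuinness")],
    entity="manifestation",
    attribute="title",
)
xml_text = ET.tostring(field, encoding="unicode")
print(xml_text)
```

Because the MARC field and subfields are kept intact, the same element serves both directions named on the slide: the MARC value is tagged with its FRBR entity/attribute, and the FRBR attribute value still carries its original MARC field/subfield.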
FRBR as ontology
FRBR is a conceptual model
– Mainly interpreted as a reference model
Can be formalized to an ontology, e.g. using W3C OWL:
– This is a FRBR.Expression and it has a FRBR.Translation relationship to another FRBR.Expression
Using Topic Maps and FRBR as typology (example from another project)
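The statement "this is a FRBR.Expression and it has a FRBR.Translation relationship to another FRBR.Expression" amounts to three subject-predicate-object statements. The sketch below writes them as plain Python tuples rather than OWL, to avoid assuming any particular RDF library. The `ex:` identifiers and the `frbr:Expression` and `frbr:translationOf` names are hypothetical placeholders, not terms from a published FRBR vocabulary, and the English text is treated as a translation purely for illustration.

```python
# Hypothetical identifiers and vocabulary names, written as prefixed strings.
TRIPLES = {
    ("ex:et-dukkehjem", "rdf:type", "frbr:Expression"),
    ("ex:a-dolls-house", "rdf:type", "frbr:Expression"),
    ("ex:a-dolls-house", "frbr:translationOf", "ex:et-dukkehjem"),
}


def objects(subject, predicate, triples=TRIPLES):
    """Return all objects of the given subject and predicate, sorted."""
    return sorted(o for s, p, o in triples if s == subject and p == predicate)


def is_expression(node, triples=TRIPLES):
    """True if the node is typed as an expression."""
    return (node, "rdf:type", "frbr:Expression") in triples


# Follow the translation relationship and check both ends are expressions.
for source in objects("ex:a-dolls-house", "frbr:translationOf"):
    print(source, is_expression(source), is_expression("ex:a-dolls-house"))
```

Typed entities and typed relationships without attributes are also what the Topic Maps prototype uses, so the same three statements could be carried as topics and associations.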
TM prototype
FRBR as ontology for music information:
– Works and creators
– Artists and recorded performances
– Navigation as the main discovery/search strategy
Model and represent music information as distinct entities and relationships using FRBR as types
– Not including FRBR attributes
Exchange and integrate fragments using P2P (TMRAP)
Objective
– Explore and evaluate the use of FRBR entities and relationships
– P2P exchange and integration of rich music information
– Identifiers in the domain of music
– The use of FRBR as an ontology in Topic Maps
* Examples are based on a demo version of the Omnigator software from Ontopia
Conclusion
What do we want FRBR to be?
– A reference model for bibliographic catalogues
– A conceptual model for understanding bibliographic records
– An ontology for exchanging bibliographic information within the domain and with other domains