Automatic Conversion from MARC to FRBR
Christian Mönch (MTA SZTAKI), Trond Aalberg (NTNU)
DSD, Distributed Systems Division, MTA SZTAKI

2 Outline
Distributed Systems Division MTA SZTAKI DSD - ECDL 2003, Trondheim, Norway
- Bibliographic catalogs
- FRBR model
- A framework for extracting FRBR entities from MARC-based catalogs
- Application to the BIBSYS catalog
- Results

3 Record-Based Bibliographic Catalogs
- Structure: a set of records (used for search and exchange)
- Record: a surrogate for a publication, consisting of a set of attributes (name-value pairs)
- Problems:
  - Non-normalized structure with excessive data replication
  - Many search requests are unsupported or require knowledge of the bibliographic format
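To make the replication problem concrete, here is a minimal Python sketch of a record-based catalog in which each record is a flat list of name-value pairs. The attribute names (`creator`, `title`, `publisher`), the publisher values, and the helper `replicated_values` are hypothetical illustrations, not BIBSYS data.

```python
from collections import Counter

# Hypothetical record-based catalog: each record is a flat surrogate for one
# publication, stored as a list of (attribute name, value) pairs.
Record = list[tuple[str, str]]

catalog: list[Record] = [
    [("creator", "Ibsen, Henrik"), ("title", "Peer Gynt"), ("publisher", "Publisher A")],
    [("creator", "Ibsen, Henrik"), ("title", "Peer Gynt"), ("publisher", "Publisher B")],
    [("creator", "Ibsen, Henrik"), ("title", "Et dukkehjem"), ("publisher", "Publisher A")],
]


def replicated_values(catalog: list[Record], name: str) -> dict[str, int]:
    """Return each value of attribute `name` stored in more than one record, with its count."""
    counts = Counter(value for record in catalog for key, value in record if key == name)
    return {value: n for value, n in counts.items() if n > 1}


print(replicated_values(catalog, "creator"))  # {'Ibsen, Henrik': 3}
print(replicated_values(catalog, "title"))    # {'Peer Gynt': 2}
```

Nothing links the two Peer Gynt records, so a question such as "which editions of this work exist?" can only be answered by matching repeated strings.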

4 IFLA's FRBR Model
- Entity-relationship (ER) model with three groups of entities
- Four operations on entities: search, identify, select, obtain
[Diagram: Work is realized through Expression, Expression is embodied in Manifestation, Manifestation is exemplified by Item; further relationships shown: Translation, Adaptation, Whole/Part; group 2 entities: Person, Corporate Body]
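The group 1 entities and their chain of relationships can be sketched as Python dataclasses. This is a simplified illustration under our own assumptions: only a few attributes are modeled, the `translation_of` link stands in for the Translation relationship, and all class, field, and function names are hypothetical rather than taken from the FRBR report.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class Item:
    """A single exemplar, e.g. one copy held by a library."""
    identifier: str


@dataclass
class Manifestation:
    """A physical embodiment of an expression; it is exemplified by items."""
    description: str
    items: list[Item] = field(default_factory=list)


@dataclass
class Expression:
    """A realization of a work, e.g. the original text or a translation."""
    language: str
    manifestations: list[Manifestation] = field(default_factory=list)
    translation_of: Optional["Expression"] = None  # stands in for the Translation relationship


@dataclass
class Work:
    """A distinct intellectual or artistic creation; it is realized through expressions."""
    title: str
    creators: list[str] = field(default_factory=list)  # Person / Corporate Body names
    expressions: list[Expression] = field(default_factory=list)


def items_of(work: Work) -> list[Item]:
    """Navigate work -> expression -> manifestation -> item."""
    return [i for e in work.expressions for m in e.manifestations for i in m.items]


work = Work("Peer Gynt", ["Ibsen, Henrik"])
original = Expression("nor")
english = Expression("eng", translation_of=original)
english.manifestations.append(Manifestation("hypothetical English edition", [Item("copy-1")]))
work.expressions += [original, english]
print(len(items_of(work)))  # 1
```

The nesting is what enables the navigation the model promises: from a work, every translation, edition, and copy is reachable without string matching.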

5 Availability?
- A highly structured model that supports:
  - a multitude of search operations
  - navigation of bibliographic records
- Expensive to create: re-cataloging is unaffordable
- Hence: automatic conversion

6 Automatic Creation of FRBR Instances
[Diagram: Records yield Manifestation and Item; splitting produces SRecords, which work clustering groups into Work 1 and Work 2 and expression clustering groups into Expression 1 and Expression 2; Work is realized through Expression]
1. Extract manifestations and items from records
2. Identify and split aggregative records
3. Cluster the record set to identify works
4. Cluster each work set to identify expressions
5. Create entities from the clusters
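The five steps can be sketched as a small pipeline. This is only an illustration under strong simplifying assumptions: records are plain dicts with hypothetical keys (`id`, `creator`, `language`, `parts`), a work is identified by creator plus original title, and expressions are distinguished by language alone. None of these choices is claimed to be the framework's actual clustering criterion.

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class SRecord:
    """Hypothetical 'split record': one contained work of one catalog record."""
    record_id: str  # the record, i.e. the manifestation (step 1)
    creator: str
    title: str
    original_title: Optional[str]
    language: str


def split(record: dict) -> list[SRecord]:
    """Step 2: an aggregative record yields one SRecord per contained part."""
    return [
        SRecord(record["id"], record["creator"], part["title"],
                part.get("original_title"), record["language"])
        for part in record["parts"]
    ]


def work_key(s: SRecord) -> tuple[str, str]:
    """Assumed work identity: creator plus original title (falling back to title)."""
    return (s.creator.casefold(), (s.original_title or s.title).casefold())


def cluster(records: list[dict]) -> dict[tuple[str, str], dict[str, list[str]]]:
    srecords = [s for r in records for s in split(r)]
    # Step 3: cluster split records into works.
    works = defaultdict(list)
    for s in srecords:
        works[work_key(s)].append(s)
    # Step 4: cluster each work set into expressions (here simply by language).
    result = {}
    for key, members in works.items():
        expressions = defaultdict(list)
        for s in members:
            expressions[s.language].append(s.record_id)
        result[key] = dict(expressions)
    # Step 5 would create Work/Expression entities from these clusters.
    return result


records = [
    {"id": "r1", "creator": "Ibsen, Henrik", "language": "nor",
     "parts": [{"title": "Peer Gynt"}]},
    {"id": "r2", "creator": "Ibsen, Henrik", "language": "eng",
     "parts": [{"title": "Peer Gynt", "original_title": "Peer Gynt"},
               {"title": "A Doll's House", "original_title": "Et dukkehjem"}]},
]
print(cluster(records))
```

Record r2, a hypothetical aggregative English volume, ends up in two works, i.e. one manifestation carrying more than one expression.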

7-9 Obstacles to the Automatic Creation of FRBR Model Instances
- Inconsistency of data in catalogs:
  - Identical information is represented differently in different records (attributes, syntaxes)
  - Erroneous data
  - Might be resolved automatically, for example through authority files
- Incompleteness of data in catalogs:
  - Information necessary for clustering has not been captured in the records
  - Requires additional information linked to individual records
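Resolving inconsistent headings through an authority file can be sketched as normalization followed by a lookup. The tiny `AUTHORITY` table, the normalization rules, and the function names are hypothetical; a real authority file is far larger and is usually matched on identifiers as well as strings.

```python
from typing import Optional

# Hypothetical authority file: normalized variant heading -> authorized heading.
AUTHORITY = {
    "ibsen, henrik": "Ibsen, Henrik",
    "ibsen, h": "Ibsen, Henrik",
    "ibsen, henrik johan": "Ibsen, Henrik",
}


def normalize(heading: str) -> str:
    """Collapse whitespace, drop trailing punctuation, and case-fold."""
    collapsed = " ".join(heading.split())
    return collapsed.rstrip(" ,.").casefold()


def authorized_form(heading: str) -> Optional[str]:
    """Map a heading to its authorized form, or None if the authority file lacks it."""
    return AUTHORITY.get(normalize(heading))


print(authorized_form("  IBSEN,  Henrik. "))  # Ibsen, Henrik
print(authorized_form("Ibsen, H."))           # Ibsen, Henrik
print(authorized_form("Hamsun, Knut"))        # None
```

A lookup like this can only repair values that are present in a record; it does nothing for information that was never captured, which is why incompleteness needs additional information.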

10 The Attribute Layer
[Diagram: work clustering and expression clustering over SRecords (Work 1, Work 2; Expression 1, Expression 2), together with the attribute layer]
- Extracts consistent, error-free, FRBR-related generic attributes and properties from the records, e.g. title, creator, isTranslation
- Specific to bibliographic formats and catalogs
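One way to keep the clustering code independent of any particular format is to hide the format-specific extraction behind an interface. The sketch below uses a `typing.Protocol`; the interface name, the toy record keys (`ttl`, `aut`, `orig`), and the `usable` check are all hypothetical.

```python
from typing import Optional, Protocol


class AttributeLayer(Protocol):
    """Format- and catalog-specific adapter yielding FRBR-related generic attributes."""

    def title(self, record: dict) -> Optional[str]: ...
    def creator(self, record: dict) -> Optional[str]: ...
    def is_translation(self, record: dict) -> bool: ...


class ToyLayer:
    """Hypothetical layer for a toy format with keys 'ttl', 'aut', and 'orig'."""

    def title(self, record: dict) -> Optional[str]:
        return record.get("ttl")

    def creator(self, record: dict) -> Optional[str]:
        return record.get("aut")

    def is_translation(self, record: dict) -> bool:
        return "orig" in record


def usable(layer: AttributeLayer, record: dict) -> bool:
    """Format-independent code only talks to the layer; records lacking key
    attributes are set aside instead of being clustered."""
    return layer.title(record) is not None and layer.creator(record) is not None


layer = ToyLayer()
print(usable(layer, {"ttl": "Gengangere", "aut": "Ibsen, Henrik"}))  # True
print(usable(layer, {"ttl": "Gengangere"}))                           # False
```

Supporting another catalog then means writing another layer, while the clustering steps stay unchanged.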

11 The Attribute Layer for BIBSYS (I)
- Classify records: series, monographs
- Monographs may have either or both of the following characteristics: linked, aggregative
- Example of retrieving generic attributes from monograph records:
  - Attribute title: searched in 130$a, 740$a, 240$a (only if 240$l does not exist), and 245$a; extended to referenced records
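The title rule translates almost directly into code. The in-memory record shape (tag mapped to a list of field occurrences, each a subfield-code-to-value dict, one value per code) and the helper names are hypothetical. In MARC 21, 240$l holds the language of a work, so a 240 field with $l presumably marks a translation whose 240$a is the original title rather than this expression's title; that would explain the condition. Following references is simplified here to passing the referenced records in explicitly.

```python
# Hypothetical in-memory MARC record: tag -> list of field occurrences,
# each occurrence mapping a subfield code to its value.
Record = dict[str, list[dict[str, str]]]


def subfields(record: Record, tag: str, code: str) -> list[str]:
    """All values of subfield `code` in all occurrences of field `tag`."""
    return [f[code] for f in record.get(tag, []) if code in f]


def titles(record: Record) -> list[str]:
    """Title candidates: 130$a, 740$a, 240$a (only if 240$l is absent), 245$a."""
    found = subfields(record, "130", "a") + subfields(record, "740", "a")
    if not subfields(record, "240", "l"):
        found += subfields(record, "240", "a")
    found += subfields(record, "245", "a")
    return found


def titles_with_links(record: Record, referenced: list[Record]) -> list[str]:
    """'Extended to referenced records': also collect candidates from linked records."""
    found = titles(record)
    for other in referenced:
        found += titles(other)
    return found


original = {"245": [{"a": "Et dukkehjem"}]}
translation = {"240": [{"a": "Et dukkehjem", "l": "English"}],
               "245": [{"a": "A doll's house"}]}
print(titles(original))     # ['Et dukkehjem']
print(titles(translation))  # ["A doll's house"]
```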

12 The Attribute Layer for BIBSYS (II)
- Attribute original title: searched in 241$a, 240$a (only if 240$l exists), and 500$a (if the note starts with the marker "originaltittler:" or "orig.titt.:"); extended to referenced records
- Attribute creator: searched in 100$a and 110$a; extended to referenced records
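The original-title and creator rules can be sketched the same way, using the same hypothetical record shape (redefined so the block stands alone). Two assumptions are ours: the note markers are matched case-insensitively, and `is_translation` is derived simply from the presence of an original title. Referenced records are left out for brevity.

```python
Record = dict[str, list[dict[str, str]]]

# Note prefixes from the slide; case-insensitive matching is our assumption.
ORIGINAL_TITLE_MARKERS = ("originaltittler:", "orig.titt.:")


def subfields(record: Record, tag: str, code: str) -> list[str]:
    return [f[code] for f in record.get(tag, []) if code in f]


def original_titles(record: Record) -> list[str]:
    """Original-title candidates: 241$a, 240$a (only if 240$l exists), marked 500$a notes."""
    found = subfields(record, "241", "a")
    if subfields(record, "240", "l"):
        found += subfields(record, "240", "a")
    for note in subfields(record, "500", "a"):
        for marker in ORIGINAL_TITLE_MARKERS:
            if note[:len(marker)].casefold() == marker:
                found.append(note[len(marker):].strip())
                break
    return found


def creators(record: Record) -> list[str]:
    """Creator candidates: 100$a (personal name) and 110$a (corporate name)."""
    return subfields(record, "100", "a") + subfields(record, "110", "a")


def is_translation(record: Record) -> bool:
    """Hypothetical derived property: any original title implies a translation."""
    return bool(original_titles(record))


record = {
    "100": [{"a": "Ibsen, Henrik"}],
    "245": [{"a": "Ghosts"}],
    "500": [{"a": "Orig.titt.: Gengangere"}],
}
print(original_titles(record))  # ['Gengangere']
print(creators(record))         # ['Ibsen, Henrik']
print(is_translation(record))   # True
```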

13 Application of the Framework to BIBSYS
Tested on 4379 records related to Henrik Ibsen:
- Works: 41, of which eight were false positives caused by spelling variants or spelling errors
- Expressions: 1111
- Manifestations: 1072, of which 35 contained more than one expression
- But: 3307 records were ignored because generic attributes could not be retrieved from them reliably
- Unreliable: 580 works, 3706 expressions, 3567 manifestations
Not convincing!
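The reported figures can be cross-checked with a little arithmetic. The first line notes that the records not ignored equal the 1072 manifestations reported, which is consistent with one manifestation per processed record (the slide does not state this). The ratios are our own derived percentages, not figures from the talk.

```latex
\begin{aligned}
4379 - 3307 &= 1072 \quad \text{(records not ignored; equals the 1072 manifestations)}\\
\frac{3307}{4379} &\approx 0.755 \quad \text{(share of records ignored)}\\
\frac{8}{41} &\approx 0.195 \quad \text{(share of works that were false positives)}
\end{aligned}
```

With roughly three quarters of the records set aside, the "Not convincing!" verdict follows directly.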

14 Ongoing Work
- Fault-tolerant dissimilarity measure for the clustering process
- Use of authority files to disambiguate values
- Leverage information retrieved from high-quality records for incomplete records, thus making incompleteness a property of the whole catalog rather than of single records
- Apply to a 100,000-record subset of BIBSYS
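The slides do not say which fault-tolerant dissimilarity measure is being developed. One common choice, shown here only as an illustration and not as the authors' method, is edit (Levenshtein) distance normalized by string length. Under such a measure, spelling variants like those behind the eight false-positive works could fall under a threshold and be clustered together. The 0.2 threshold, the misspelling "Villanden", and the function names are hypothetical.

```python
def levenshtein(a: str, b: str) -> int:
    """Edit distance (insertions, deletions, substitutions), two-row dynamic programming."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1,                # delete ca
                           cur[j - 1] + 1,             # insert cb
                           prev[j - 1] + (ca != cb)))  # substitute (free if equal)
        prev = cur
    return prev[-1]


def dissimilarity(a: str, b: str) -> float:
    """Edit distance normalized to [0, 1] by the longer (case-folded) string."""
    a, b = a.casefold(), b.casefold()
    if not a and not b:
        return 0.0
    return levenshtein(a, b) / max(len(a), len(b))


def same_work_title(t1: str, t2: str, threshold: float = 0.2) -> bool:
    """Treat titles as matching if their dissimilarity is at most `threshold`."""
    return dissimilarity(t1, t2) <= threshold


print(same_work_title("Vildanden", "Villanden"))   # True: one substitution, 1/9
print(same_work_title("Vildanden", "Gengangere"))  # False
```

A fixed threshold trades merges against false matches: too low and spelling variants stay separate works, too high and distinct short titles collapse into one.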

15 Questions?
