Presentation transcript: "FRBR Work Match activities at DBC: Where are we and where are we going?" Author: Hans-Henrik Lund. ELAG 2002, Roma, 17.04.2002.
What do we have?
- A record collection of 16.5 million MARC records from 172 different 'libraries'
- Including:
  - the Danish national bibliography: 1.4 million
  - BNB: 1.3 million
  - LC: 3.3 million
- All converted to danMARC2
What do we want?
- Make this collection available to the end user as a "work" collection (and not as a collection of records).
- We have defined two works as different if their language or material type differs.
How do we do this?
- We have matched the entire database at the "edition/manifestation" level (in clusters). If you want the system to handle orders, it is important to maintain the edition level.
  - By making clusters based on manifestation, the logical number of records was reduced from 16.5 to 12.3 million.
From manifestation to work
- The result of a search is matched, on the fly, at work level (in the test version).
- An author search for "Stephen King" yields 362 cataloguing records, 231 manifestation clusters and 102 works.
- The benefit of this approach is that we can change the criteria for a "work" and test them.
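As a rough illustration of why on-the-fly grouping makes the "work" criteria easy to change, the sketch below groups search hits with a swappable key function. The field names (`author`, `title`, `language`, `material`), both key functions and the sample hits are invented for illustration; they are not DBC's actual criteria or data.

```python
from collections import defaultdict
from typing import Callable, Dict, Iterable, List, Tuple

Record = Dict[str, str]

def work_key_strict(rec: Record) -> Tuple[str, ...]:
    # Hypothetical criterion: same author and title, plus the two properties
    # the slides name as separating works (language and material type).
    return (rec["author"], rec["title"], rec["language"], rec["material"])

def work_key_loose(rec: Record) -> Tuple[str, ...]:
    # Alternative criterion to experiment with: ignore the material type.
    return (rec["author"], rec["title"], rec["language"])

def group_into_works(
    hits: Iterable[Record], key: Callable[[Record], Tuple[str, ...]]
) -> Dict[Tuple[str, ...], List[Record]]:
    # Group the hits of one search at query time; nothing is stored.
    works: Dict[Tuple[str, ...], List[Record]] = defaultdict(list)
    for rec in hits:
        works[key(rec)].append(rec)
    return dict(works)

# Invented sample hits (manifestation clusters) for one search.
hits = [
    {"author": "AUTHOR A", "title": "TITLE A", "language": "eng", "material": "book"},
    {"author": "AUTHOR A", "title": "TITLE A", "language": "eng", "material": "book"},
    {"author": "AUTHOR A", "title": "TITLE A", "language": "dan", "material": "book"},
    {"author": "AUTHOR A", "title": "TITLE A", "language": "eng", "material": "audio"},
]

strict_works = group_into_works(hits, work_key_strict)
loose_works = group_into_works(hits, work_key_loose)
```

Swapping `work_key_strict` for `work_key_loose` changes the number of works without touching the stored records, which is the kind of experiment the test version allows.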
The match program
- The match program works in two phases:
  - First, it makes a key. This key is like a hash key. The key could be based on the title and/or a known identifier (ISSN, ISBN etc.).
  - Second, it takes two records at a time, with the same key, and compares them according to the rules in the match script.
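The two phases can be sketched roughly as follows. This is a minimal illustration, not the DBC program: `make_key`, `records_match`, the record fields and the sample records are all assumptions, and the comparison rule merely stands in for a real match script.

```python
from collections import defaultdict
from itertools import combinations

def make_key(rec):
    # Phase 1: build a hash-like key, preferring a known identifier
    # and falling back to the title (assumed rule).
    return rec.get("isbn") or rec.get("title", "")

def records_match(a, b):
    # Phase 2 stand-in for the match-script rules (hypothetical rule set).
    return a.get("title") == b.get("title") and a.get("author") == b.get("author")

def find_matches(records):
    # Bucket records by key so that only records sharing a key are compared.
    buckets = defaultdict(list)
    for rec in records:
        buckets[make_key(rec)].append(rec)
    pairs = []
    for bucket in buckets.values():
        # Take two records at a time from the same bucket.
        for a, b in combinations(bucket, 2):
            if records_match(a, b):
                pairs.append((a["id"], b["id"]))
    return pairs

# Invented sample records.
records = [
    {"id": 1, "isbn": "X", "title": "T", "author": "A"},
    {"id": 2, "isbn": "X", "title": "T", "author": "A"},
    {"id": 3, "isbn": "X", "title": "T", "author": "B"},
    {"id": 4, "isbn": "Y", "title": "T", "author": "A"},
]
matches = find_matches(records)
```

Keying first keeps the expensive pairwise comparison confined to small buckets instead of the whole collection.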
Normalization of the text
- København’s freds kommité → KOBENHAVNS FREDS KOMMITE
- Hans Krüger → HANS KRYGER
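A normalization along these lines could look like the sketch below. Only the mappings visible in the two examples (ø to O, ü to Y, é to E, apostrophe dropped, upper case) come from the slide; the function name `normalize` and the general handling of other accents and punctuation are assumptions.

```python
import unicodedata

# Letter mappings taken from the slide's examples; applied before accent
# stripping so that "Ü" becomes "Y" rather than "U".
_SPECIAL = str.maketrans({"Ø": "O", "Ü": "Y"})

def normalize(text):
    text = text.upper().translate(_SPECIAL)
    # Decompose accented letters and drop the combining marks (É -> E).
    decomposed = unicodedata.normalize("NFKD", text)
    stripped = "".join(c for c in decomposed if not unicodedata.combining(c))
    # Assumption: punctuation such as apostrophes is removed entirely.
    kept = "".join(c for c in stripped if c.isalnum() or c.isspace())
    # Collapse runs of whitespace.
    return " ".join(kept.split())
```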
Different operands
- alike
- not_alike
- alike_or_missing
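The slide gives only the names of these comparisons. One plausible reading, offered purely as an assumption, is that `alike` and `not_alike` require both values to be present, while `alike_or_missing` also accepts a pair where one side is absent:

```python
def alike(a, b):
    # Assumed semantics: both values present and equal.
    return a is not None and b is not None and a == b

def not_alike(a, b):
    # Assumed semantics: both values present and different.
    return a is not None and b is not None and a != b

def alike_or_missing(a, b):
    # Assumed semantics: equal, or at least one value is missing.
    return a is None or b is None or a == b
```

Under this reading, `alike_or_missing` suits fields that some cataloguing agencies omit, since a missing value does not block a match.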
Logical fields
- A logical field containing data from many subfields
  - maintitle = 245*a | 239*t | 240*a & 240*d & 240*e & 240*f & 240*h
- A logical field containing only parts of a subfield
  - author = 700*a & 700*h:1
    - 100 *a Rifbjerg *h Klaus = 100 *a Rifbjerg *h K.
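The sketch below shows one way such definitions might be evaluated. The interpretation is an assumption: `|` is read as "first alternative that yields data", `&` as concatenation, and `:1` as "first character only". The flat record layout (`"100*a"` as a dictionary key) and the function names are likewise hypothetical.

```python
def eval_term(term, record):
    # A term is "tag*subfield" with an optional ":n" length limit.
    tag, _, length = term.strip().partition(":")
    value = record.get(tag.replace(" ", ""), "")
    return value[: int(length)] if length else value

def eval_logical_field(expr, record):
    # Try each "|" alternative in order; within one alternative,
    # join the non-empty "&" parts with a space.
    for alternative in expr.split("|"):
        parts = [eval_term(t, record) for t in alternative.split("&")]
        parts = [p for p in parts if p]
        if parts:
            return " ".join(parts)
    return ""

# Hypothetical rule applied to field 100, mirroring the slide's example.
author_rule = "100*a & 100*h:1"
full_name = {"100*a": "Rifbjerg", "100*h": "Klaus"}
initial_only = {"100*a": "Rifbjerg", "100*h": "K."}

maintitle_rule = "245*a | 239*t | 240*a & 240*d & 240*e & 240*f & 240*h"
```

With these assumptions both author records reduce to the same value, so a record with a full forename can match one that carries only an initial.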
Edition comparison
- We make a temporary field containing only the words recognized from the edition field (after it has been text converted)
- “EDITION” & | “REVISED” | “NEW” )
  - *a 3. ed. *x 12. reprint. = EDITION 3
  - *a 3. ed.,4. rep. = EDITION 3
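A reduced edition field of this kind could be built roughly as below. The regular expression, the function name `edition_key` and the way REVISED and NEW are appended are assumptions; the slide only shows that "3. ed." becomes "EDITION 3" and that reprint information is ignored.

```python
import re

# Assumed pattern: a number followed by an abbreviated or full "edition".
_EDITION = re.compile(r"(\d+)\s*\.?\s*(?:ED|EDITION)\b")

def edition_key(text):
    text = text.upper()
    tokens = []
    match = _EDITION.search(text)
    if match:
        tokens.append("EDITION " + match.group(1))
    # Assumption: the other recognized words are kept as plain flags.
    for word in ("REVISED", "NEW"):
        if re.search(r"\b" + word + r"\b", text):
            tokens.append(word)
    # Everything else (for example reprint statements) is dropped.
    return " ".join(tokens)
```

Comparing these reduced fields instead of the raw edition statements means that two records differing only in reprint information still match at edition level.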
Problems
- Different cataloguing practices
- Errors (typing etc.)
- More than one work in the same MARC record
  - A CD can contain works by many different artists
Development strategy
- The syntax and features of the match script have been developed along with the project, in collaboration between the librarian and the programmer.
- The librarian had an online test program for the match script.
The match test program
The future?
- It depends on the result of this project
- Perhaps the cost/benefit ratio is not good enough
- Perhaps we will actually make a publication database with records stored as works?
The script language
- An example
Some test results
- Boligsikring = 145 manifestations, 29 works
- Mankell and bøger and dansk = 44 manifestations, 19 works
- Verdi and opera and cd = 111 manifestations, 35 works