FRBR Work Match activities at DBC
Where are we and where are we going
Author: Hans-Henrik Lund
ELAG 2002 - Roma 17.04.2002 ( firstname.lastname@example.org )
What do we have
- A record collection of 16.5 million MARC records from 172 different 'libraries'
- Including:
  - the Danish national bibliography: 1.4 million
  - BNB: 1.3 million
  - LC: 3.3 million
- All converted to danMARC2
What do we want
- Make this collection available to the end user as a "work" collection (and not as a collection of records).
- We have defined two works as different if the language or the material type differs.
How do we do this
- We have matched the entire database at the "edition/manifestation" level (in clusters). If you want the system to handle orders, it is important to maintain the edition level.
  - Clustering by manifestation reduced the logical number of records from 16.5 million to 12.3 million.
From manifestation to work
- The result of a search is matched on the fly at work level (in the test version).
- An author search for "Stephen King" yields 362 cataloguing records, 231 manifestations/clusters and 102 works.
- The benefit of this approach is that we can change the criteria for a "work" and test them.
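As a rough illustration of grouping a result set into works on the fly, the sketch below assumes that each manifestation cluster already carries a normalized author and title plus a language and a material type. The field names, the `work_key` function and the sample hits are all hypothetical. The only rule taken from the slides is that two works differ when language or material type differs.

```python
from collections import defaultdict

def work_key(cluster):
    # Hypothetical work criteria: author + title, split by language and material type.
    # Changing this one function changes what counts as a "work".
    return (cluster["author"], cluster["title"], cluster["language"], cluster["material"])

def group_into_works(clusters):
    """Group manifestation clusters from one search result by their work key."""
    works = defaultdict(list)
    for cluster in clusters:
        works[work_key(cluster)].append(cluster["id"])
    return dict(works)

# Made-up search hits, for illustration only.
hits = [
    {"id": "c1", "author": "KING STEPHEN", "title": "IT", "language": "eng", "material": "book"},
    {"id": "c2", "author": "KING STEPHEN", "title": "IT", "language": "eng", "material": "book"},
    {"id": "c3", "author": "KING STEPHEN", "title": "IT", "language": "dan", "material": "book"},
    {"id": "c4", "author": "KING STEPHEN", "title": "IT", "language": "eng", "material": "audio"},
]
works = group_into_works(hits)
```

Because the grouping runs per search result rather than over the whole database, experimenting with different criteria only means swapping the key function.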
The match program
- The match program works in two phases:
  - First, it makes a key. This key is like a hash key. The key could be based on the title and/or a known identifier (ISSN, ISBN, etc.).
  - Second, it takes two records at a time with the same key and compares them according to the rules in the match-script.
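The two phases can be sketched as a key-then-compare loop. This is a minimal sketch, not the DBC program: the record layout, `make_key`, and the placeholder rule in `same_manifestation` (same year) are assumptions, and the sample records, including the ISBN, are made up.

```python
from collections import defaultdict
from itertools import combinations

def make_key(record):
    """Phase 1: build a key from a known identifier, else from the title."""
    isbn = record.get("isbn")
    if isbn:
        return "ISBN:" + isbn.replace("-", "").upper()
    return "TITLE:" + " ".join(record.get("title", "").upper().split())

def same_manifestation(a, b):
    """Phase 2 placeholder: a hypothetical stand-in for the match-script rules."""
    return a.get("year") == b.get("year")

def match(records):
    # Bucket records by key so that only records sharing a key are compared.
    buckets = defaultdict(list)
    for record in records:
        buckets[make_key(record)].append(record)
    pairs = []
    for bucket in buckets.values():
        # Take two records at a time within each bucket.
        for a, b in combinations(bucket, 2):
            if same_manifestation(a, b):
                pairs.append((a["id"], b["id"]))
    return pairs

records = [
    {"id": 1, "isbn": "87-00-12345-6", "title": "It", "year": 1986},
    {"id": 2, "isbn": "8700123456", "title": "It", "year": 1986},
    {"id": 3, "title": "It", "year": 1986},
    {"id": 4, "title": "it", "year": 1990},
]
matched = match(records)
```

The key keeps the pairwise comparison affordable: records are only compared inside a bucket, never across the whole collection.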
Normalization of the text
- København’s freds kommité → KOBENHAVNS FREDS KOMMITE
- Hans Krüger → HANS KRYGER
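A normalization with this behaviour might look like the sketch below. It reproduces the two examples on the slide; the actual conversion table used at DBC is not given, so the `SPECIAL` mapping and the handling of apostrophes are assumptions inferred from those examples.

```python
import unicodedata

# Letters that Unicode decomposition does not reduce to the form shown on the slide
# (assumed mapping: ø becomes O, ü becomes Y).
SPECIAL = {"ø": "o", "Ø": "O", "ü": "y", "Ü": "Y"}
APOSTROPHES = {"'", "\u2019"}

def normalize(text):
    """Fold special letters, strip diacritics and apostrophes, uppercase, collapse spaces."""
    text = "".join(SPECIAL.get(ch, ch) for ch in text)
    decomposed = unicodedata.normalize("NFKD", text)
    kept = [
        ch for ch in decomposed
        if not unicodedata.combining(ch) and ch not in APOSTROPHES
    ]
    return " ".join("".join(kept).upper().split())
```

The explicit table is needed because "ø" has no Unicode decomposition, whereas "é" decomposes into "e" plus a combining accent that can simply be dropped.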
3 different operands
- alike
- not_alike
- alike_or_missing
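The slide only names the three operands, so the semantics below are an assumption: `alike` and `not_alike` require a value on both sides, while `alike_or_missing` also accepts a field that is absent from either record. A missing field is represented here as `None`.

```python
def alike(a, b):
    """Both values are present and equal (assumed semantics)."""
    return a is not None and b is not None and a == b

def not_alike(a, b):
    """Both values are present and different (assumed semantics)."""
    return a is not None and b is not None and a != b

def alike_or_missing(a, b):
    """Values are equal, or at least one of them is missing (assumed semantics)."""
    return a is None or b is None or a == b
```

Under this reading, `alike_or_missing` is the forgiving choice for fields that many cataloguers leave out, while `not_alike` can veto a match only when both records actually state a value.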
Logical fields
- A logical field containing data from many subfields
  - maintitle = 245*a | 239*t | 240*a & 240*d & 240*e & 240*f & 240*h
- A logical field containing only parts of a subfield
  - author = 700*a & 700*h:1
    - 100 *a Rifbjerg *h Klaus = 100 *a Rifbjerg *h K.
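One possible reading of the logical-field notation is sketched below. It assumes that `|` means "use the first alternative that has data", that `&` means concatenation and binds tighter than `|`, and that `:1` keeps only the first character of a subfield. The record layout (a dict keyed by tag and subfield code) and all helper names are hypothetical. The slide defines `author` on field 700 but shows an example on field 100, so the tag is a parameter here.

```python
def sub(record, tag, code, length=None):
    """Return a subfield value, optionally truncated to `length` characters."""
    value = record.get((tag, code), "")
    return value[:length] if length is not None else value

def concat(*parts):
    """Assumed meaning of '&': join the non-empty parts."""
    return " ".join(part for part in parts if part)

def first_of(*alternatives):
    """Assumed meaning of '|': the first alternative that has data."""
    for alternative in alternatives:
        if alternative:
            return alternative
    return ""

def author(record, tag="100"):
    # author = <tag>*a & <tag>*h:1
    return concat(sub(record, tag, "a"), sub(record, tag, "h", 1))

def maintitle(record):
    # maintitle = 245*a | 239*t | 240*a & 240*d & 240*e & 240*f & 240*h
    return first_of(
        sub(record, "245", "a"),
        sub(record, "239", "t"),
        concat(*(sub(record, "240", code) for code in "adefh")),
    )

full = {("100", "a"): "Rifbjerg", ("100", "h"): "Klaus"}
abbreviated = {("100", "a"): "Rifbjerg", ("100", "h"): "K."}
```

With the forename cut to its first character, both records produce the same `author` value, which is the equality shown on the slide.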
Edition comparison
- We make a temp-field containing only the recognized words from the edition field (after it has been text-converted)
- "EDITION" & ( @digit | "REVISED" | "NEW" )
  - 250 00 *a 3. ed. *x 12. reprint. = EDITION 3
  - 250 00 *a 3. ed.,4. rep. = EDITION 3
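The temp-field construction could be approximated as follows. The sketch assumes that the text conversion expands "ed." to EDITION and that the qualifier is the token immediately before it; the abbreviation table and the function name are hypothetical. The two inputs in the comments are the examples from the slide.

```python
import re

ABBREVIATIONS = {"ED": "EDITION"}   # assumed expansion done by the text conversion
QUALIFIERS = {"REVISED", "NEW"}

def edition_key(field_text):
    """Keep only EDITION plus its digit/REVISED/NEW qualifier; ignore everything else."""
    tokens = [
        ABBREVIATIONS.get(token, token)
        for token in re.findall(r"[A-Z]+|\d+", field_text.upper())
    ]
    for i, token in enumerate(tokens):
        if token == "EDITION" and i > 0:
            previous = tokens[i - 1]
            if previous.isdigit() or previous in QUALIFIERS:
                return "EDITION " + previous
    return ""

# "3. ed. 12. reprint." and "3. ed.,4. rep." both reduce to the same key,
# because reprint numbers are not recognized words.
key_a = edition_key("3. ed. 12. reprint.")
key_b = edition_key("3. ed.,4. rep.")
```

Reducing the field to recognized words means that differences in reprint statements or punctuation no longer prevent two records of the same edition from matching.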
Problems
- Different cataloguing practices
- Errors (typing etc.)
- More than one work in the same MARC record
  - A CD can contain works by many different artists
Development strategy
- The syntax and features of the match-script have been developed along with the project, in collaboration between the librarian and the programmer.
- The librarian had an online test program for the match-script.