Presentation on theme: "Lis512 lecture 4 the MARC format structure, leader, directory."— Presentation transcript:
lis512 lecture 4 the MARC format structure, leader, directory
MARC 21 MARC 21 is as important example of a record format used in by the library community Integrated Library Systems (ILSs) all import MARC21 records into relational database system export MARC21 records from relational database systems MARC21 records describe records from library catalogues.
documentation I use the documentation provided by the library of congress at http://www.loc.gov/marc/.
warning: the blank The MARC record format uses blanks as values. Since blanks are not usually seen on a slide, I may denote them here by a special sign ␢. Most of the time ‘ ’ does the job. In the documentation at the LoC, they use the # character.
MARC 21 formats There are different format of MARC21 records – MARC 21 format for bibliographic data – MARC 21 format for authority data – MARC 21 format for holdings data – MARC 21 format for classification data We are only studying the first one here and call this the “MARC format” in what follows.
MARC format The MARC format is very complicated. The basic structure is – leader – directory – control fields – data fields The leader and directory are fixed fields. That means they have fixed length. The control and data fields are called variable fields. That means they have variable length.
fixed fields The leader is 24 characters/bytes long. It can not be repeated. Each directory entry is 12 characters/bytes long. Both parts can only contain ASCII characters. There may be many directory entries. Therefore the total length of the directory is not fixed.
MARC leader Described in http://www.loc.gov/marc/bibliographic/bdlea der.html The leader is 24 bytes long. Each byte houses one ASCII character, so we can also say that it is 24 characters long. We count them from 0 to 23.
grouping of leader bytes I have grouped the bytes into “boring” and “interesting” bytes. Boring bytes contain no information. Their values can be calculated by a computer. Interesting bytes contain some information. They have to be filled in by a human.
sub-classification of boring bytes Boring bytes can be classified as fixed boring bytes and variable boring bytes. Fixed boring bytes always contain the same value for any MARC record. Variable boring bytes contain potentially different values for every MARC record. I write “leader/??” when I want to refer to the byte ?? in the leader.
fixed boring byte positions Here are all fixed boring bytes – leader/10 always contains ‘2’ – leader/11 always contains ‘2’ – leader/20 always contains ‘4’ – leader/21 always contains ‘5’ – leader/22 always contains ‘0’ – leader/23 always contains ‘0’ Here is an example with fixed boring bytes underlined: “01178cam a2200313 a 4500”
fixed boring bytes For fixed boring bytes, we need to be able to see where they are. We do not really need to know what they mean since we can’t change them anyway. For the curious, I have explanations in the appendix.
Variable boring bytes These bytes contains length data about the record. Of course, for most records you will find different values. But the values hardly contain any useful information. The boring positions are 00–04 and 12–16. Both are 5 positions long and we study them in turn.
leader positions 00–04 At that position, there appears the length of the entire MARC record. This is the number of bytes, not the number of characters! There are 5 numerical characters. The number is right justified and unused characters contain zeros, as in say “00234”
leader positions 12–16 This contains the position of the start of the data, after the leader and the directory. This is contains the length of the leader plus the length of the directory, plus 1. It is encoded like positions 00–04. The number should be much smaller than the number encountered at 00–04. If you subtract one, you should find a multiple of 12.
finished with boredom Here is an example with fixed all boring bytes underlined: “01178cam a2200313 a 4500” In the interesting bytes, we find codings that give us information about the MARC records. If we create a MARC record, we need to fill them out. Sometimes a system may prepopulate them but we still need to know what they are.
interesting code positions From the example, you will see that the code positions that are interesting come in a rear part 17–19 and it a front part 05–09.
leader position 05 The records status indicates the relationship of the record to a set of records in a file. – ‘a’ The record has been changed to a higher encoding level as recorded in position 17. – ‘c’ Corrected or revised Addition/change other than in the encoding level code has been made to the record. – ‘d’ Deleted Record has been deleted. – ‘n’ New Record is newly input. – ‘p’ Increase in encoding level from prepublication.
leader position 6, slide 1 This gives the type of material. Values are – ‘a’ language material – ‘t’ manuscript language material – ‘c’ notated music – ‘d’ a manuscript notated music – ‘e’ cartographic material – ‘f ‘ manuscript cartographic material – ‘g’ projected medium
leader position 6, slide 2 further values are – ‘I’ nonmusical sound recording – ‘j’ musical sound recording – ‘k’ a two-dimensional non-projectable graphic – ‘m’ a computer file – ‘o’ a kit – ‘p’ mixed materials – ‘r’ a three-dimensional artifact or naturally occurring object
leader position 7 Bibliographic level of the description – ‘a’ monographic component part – ‘b’ serial component part – ‘c’ collection – ‘d’ a subunit – ‘I’ an integrating resource – ‘m’ a monograph – ‘s’ a serial
leader position 8 This gives the type of control. There are two valid values – ‘ ’ –> not specified – ‘a’ the item is described according to archival descriptive rules, which focus on the contextual relationships between items and on their provenance rather than on bibliographic detail. All forms of material can be controlled archivally.
leader position 9 This field indicates the character set and encoding scheme used in the record. There are two valid values – ‘ ’ MARC-8 – ‘a’ the UTF-8 encoding of UCS/Unicode
conclusion about the front part Normally, if we are cataloging books, and keep changing the records, we expect something like ‘cam a’. You are allowed to use that for your record provided you don’t go into multi-part resources.
leader position 17, slide 1 This is the encoding level. It indicates the fullness of the bibliographic information in the record. – ‘ ’ “full level”. It is a complete MARC record created from information derived from an inspection of the physical item. For serials, at least one issue of the serial is inspected. – ‘1’ “full level, material not examined” created from information derived from an extant description of the item, without reinspection of the item. – “2” “Less-than-full level, material not examined” created from an extant description of the material without reinspection of the item.
leader position 17, slide 2 More allowed values for position 17 are – ‘3’ “abbreviated level” meaning a brief record that does not meet minimal level cataloging specifications. – ‘4’ “core level” a less-than-full but greater-than-minimal level cataloging record – ‘5’ “partial (preliminary) level” a preliminary cataloging level record that is not considered final
leader position 17, slide 3 More allowed values for position 17 are – ‘7’ “minimal level” Record that meets the U.S. National Level Bibliographic Record minimal level cataloging specifications and is considered final – ‘8’ “prepublication level” prepublication level record. Includes records created in cataloging in publication programs. – ‘u’ “unknown” – ‘z’ “not applicable” the concept of encoding level does not apply to the record.
leader position 18 This field say what cataloging rules were applied in the descriptive part. Allowed values are – ‘ ’ the description does not follow the International Standard Bibliographic Description (ISBD) cataloging and punctuation provisions. – ‘a’ the description uses the AACR2 – ‘c’ the description follows the ISBD, but the end of subfield punctuation is omitted. – ‘i’ the description follows the ISBD including end of subfield punctuation – ‘u’ unknown
leader position 19 This has the multipart resource record level. This pertains to the situation where the record describes part of a resource or a resource that has many parts. – ‘ ’ not specified or not applicable – ‘a’ the record describes a set of resources – ‘b’ the record describes a resource which is part of a set. The resource has a title that allows it to be independent of the set. – ‘c’ the record describes a resource that is part of a set. The resource does not have a title that makes it understandable separately.
conclusion about the rear part Normally, when we download a MARC record, we expect the rear part to be ‘ a ’, with two blanks surrounding an ‘a’. In the record that we create for class we make it ‘7a ’. Let’s avoid complications with multi- part resources. ;-)
Thank you for your attention. http://openlib.org/home/krichel
MARC directory The MARC directory follows the leader. The directory contains fixed-length information about the variable-length fields. Each directory entry has 12 bytes numbered 0 to 11. For each variable field appearing in the record, there is a directory entry. The MARC directory contains no information.
appendix This is an appendix to the slides. It contains a description of the MARC directory It contains a partial rationale for the fixed boring bytes.
MARC directory bytes 0–2 Bytes 00–02 give the field name. The field names used in MARC are numbers. So you will find three ASCII numeric characters at that place.
the MARC directory bytes 03–06 They give the field length. This is 4 (four) ASCII numeric characters. The fact that there are four of them is determined by character 20 in the MARC leader. The number is right justified and filled by 0s if required. The field length of the variable field, including the field names, the field indicators, subfield codes, data, and the field terminator character.
the MARC directory bytes 07–11 Bytes 07–11 give the starting character position. This five ASCII numeric characters that specify the starting character position of the variable field relative to the base address. The fact that there are five of them is determined by character 21 in the MARC leader. The number is right justified and filled by 0s if required.
leader position 10 This is the indicator count. It says how many bytes are used by indicators to a field. The indicator count is always 2.
leader position 11 This indicates the number of character positions used for each subfield code in a variable data field. This always ‘2’. There are two characters in each subfield. First there is the delimiter ‘$’ and then there is a lowercase or number code for the field.
leader position 20 This is the length of the length-of-field portion. Meaning how long is the length of a field. It is always “4”.
leader position 21 Length of the starting-character-position portion. At this position, we always find the value “5”. This says that in each entry in the MARC directory (to be covered later) the starting character position part is 5 numbers long.
leader position 22 This gives the length of the implementation dependent part of the record. This is always 0. This means that the record is done according to the specification and there is no additional part that is custom made for use on the information system you are working with. If it were there, it would have to be short!
leader position 23 This is the last position. The meaning of the data at this position is undefined. I suppose it could be defined later when we find a need for it. This position is always occupied by a 0.