DIGITIZATION OF RARE LIBRARY MATERIALS. Metadata: Introduction. Mark-up. © Adolf Knoll, National Library of the Czech Republic.

3 [Figure: means of transport labelled by country (CZ, D, I, LV, LT, OK) and by type (car, lorry, motorcycle, aircraft).]

4 Marked-up categories of objects. All the objects are means of transport. Each is marked with a country (CZ, LV, D, I, OK, LT) and with a type (car, motorcycle, lorry, aircraft). However, the country is not always marked in the same way: for a means of road transport the Czech Republic is marked as CZ, but for an aircraft it is marked as OK.
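
The dependence of the country mark on the type of object can be sketched in Python. The table and function names are illustrative assumptions, not part of the slides; the only facts taken from the slide are that a Czech road vehicle is marked CZ and a Czech aircraft is marked OK.

```python
# Hypothetical lookup: the mark depends on both the country and the kind of transport.
ROAD_TYPES = {"car", "lorry", "motorcycle"}

# Marks for the Czech Republic as given on the slide (other countries omitted).
ROAD_MARK = {"Czech Republic": "CZ"}
AIRCRAFT_MARK = {"Czech Republic": "OK"}


def country_mark(country: str, vehicle_type: str) -> str:
    """Return the country mark used for this kind of object."""
    if vehicle_type in ROAD_TYPES:
        return ROAD_MARK[country]
    if vehicle_type == "aircraft":
        return AIRCRAFT_MARK[country]
    raise ValueError(f"unknown type: {vehicle_type}")


print(country_mark("Czech Republic", "car"))       # CZ
print(country_mark("Czech Republic", "aircraft"))  # OK
```

The same country yields two different marks because the decisive property (road vehicle or flying object) selects which table is consulted.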

5 What drives the mark-up? The same object can belong to different categories, each marked differently, depending on which of its properties is considered. The property regarded as decisive is taken as the basis for classification. In many cases we cannot foresee how many such decisive properties there will be.


7 Text processing. To display text on a page, it must be arranged graphically. Electronic text is a sequence of characters grouped into words separated by blanks, and some tool must direct where those characters are displayed or printed. There are several ways to do this:

8 Text processing. Page description languages, e.g. PostScript. Text editors: in obsolete editors, a paragraph is made by breaking the line, adding an empty line, and indenting; in modern editors, a block of text is declared to be a ¶paragraph¶. The paragraph is now marked, but what should be done with it?

9 Text processing: mark-up and behaviour. An object can be marked by a sequence of characters (a Latvian car marked as LV) or by a symbol or sign (a Latvian man marked by [symbol], a paragraph marked by ¶). Under certain conditions we may need to assign some behaviour to objects marked in a certain way. We therefore need behavioural information, kept somewhere, that tells identically marked objects how to behave:

10 Text processing: mark-up and behaviour. During an ice-hockey championship, men marked by [symbol] will play against men marked by [symbol]. Cars marked LV will undergo a different mandatory technical inspection than cars marked CZ.

11 Text processing: mark-up and behaviour. In a good text editor, e.g. MS Word, the formatting of the marked object (the paragraph) is set separately: whether it is indented, how many points of space come before or after it, etc. HTML, the language of the web, is analogous: the paragraph is marked as <p>, while the web browser knows that it must be displayed on a new line after some vertical space.
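
The separation between the mark and the behaviour assigned to it can be sketched as follows. This is a minimal illustration, not how any particular editor or browser works; the rule names (`indent`, `blank_lines_after`) and the `render` function are assumptions made for the example.

```python
# The content carries only the mark ("p" for paragraph); it says nothing about layout.
document = [("p", "First paragraph."), ("p", "Second paragraph.")]

# Hypothetical behaviour rules, kept apart from the content.
plain_rules = {"p": {"indent": 0, "blank_lines_after": 1}}
indented_rules = {"p": {"indent": 4, "blank_lines_after": 0}}


def render(doc, rules):
    """Lay out each marked block according to the rule for its mark."""
    lines = []
    for mark, text in doc:
        rule = rules[mark]
        lines.append(" " * rule["indent"] + text)
        lines.extend([""] * rule["blank_lines_after"])
    return "\n".join(lines)


print(render(document, plain_rules))
print(render(document, indented_rules))
```

The same marked content is laid out in two ways simply by swapping the rule set; the marks themselves never change.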

12 What is marked up? We have seen that objects are marked up. These can be objects from the real world or their representations. Objects can be represented by their names, which, when written, are mere sequences of characters. They can also be represented by images, symbols, or sound.

13 Object: car

14 Properties of the object INSECT. It may also be necessary to mark other properties of the object that are relevant for grouping or classifying its concrete representations. Labels shown: beetle, fly, butterfly, spider.

15 Concrete INSECT. A concrete insect can be a beetle such as a ladybird, goldsmith beetle, longicorn beetle, or may-bug. This is its name, which differs between languages: in Czech, for example, the beetles listed above are called beruška, zlatohlávek, tesařík, and chroust. However, differences in names do not affect the correctness of the content mark-up.

16 Summing up. When marking an object, we should distinguish between: the mark-up of the content; the complementary properties of the marked object; the names assigned to the object; and the information about how the object should behave when activated (display, printing, projection, …).

17 How to describe the object? INSECT BEETLE PICTURE Colorado Beetle INSECT BEETLE TEXT

18 How to prescribe behaviour to the described object? The behaviour is prescribed by special rules, in this case formatting rules. This behaviour is kept separate from the mark-up of the contents. A formatting rule takes only the representations of the objects and specifies how they behave when these representations are activated.
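
A formatting rule applied to a content description might look like the sketch below. The element names follow the labels INSECT, BEETLE, PICTURE and TEXT from the slides, but the nesting, the file name, and the assignment of "Colorado Beetle" to TEXT are assumptions made for illustration.

```python
import xml.etree.ElementTree as ET

# Hypothetical source data: element names follow the slide labels,
# the structure and the file name are invented for illustration.
SOURCE = """
<INSECT>
  <BEETLE>
    <PICTURE>colorado_beetle.jpg</PICTURE>
    <TEXT>Colorado Beetle</TEXT>
  </BEETLE>
</INSECT>
"""


def format_caption(root):
    """Formatting rule: show the picture reference followed by its text."""
    beetle = root.find("BEETLE")
    picture = beetle.findtext("PICTURE")
    text = beetle.findtext("TEXT")
    return f"[image: {picture}] {text}"


root = ET.fromstring(SOURCE)
print(format_caption(root))
```

The rule reads the representations (picture and text) and decides how they appear, while the content marks INSECT and BEETLE stay in the source and do not reach the output.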

19 Among the images used for my PowerPoint presentation, there is also this one: Colorado Beetle. This beetle is very nice.

20 The problems. In the formatted output here, we have lost the description of the contents, which is necessary for other kinds of work. Such output therefore cannot be taken as the only existing source data. However, it can be regarded as one of the possible appearances of the source data.

21 Source data and access. [Diagram: source data leading to Appearance no. 1, Appearance no. 2, and Appearance no. 3.] A direct and simple look inside the source data is desirable.
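
The relationship between one source and its several appearances can be sketched as a single record passed through different output functions. The record fields and the three functions are hypothetical; they only illustrate that each appearance is derived from the source, not the other way round.

```python
# Hypothetical source record: the content marks are kept with the data.
source = {"class": "INSECT", "group": "BEETLE", "name": "Colorado Beetle"}


def as_caption(record):
    """Appearance no. 1: a bare caption; the content marks are dropped."""
    return record["name"]


def as_index_entry(record):
    """Appearance no. 2: an index entry ordered by classification."""
    return f'{record["class"]} > {record["group"]} > {record["name"]}'


def as_sentence(record):
    """Appearance no. 3: running text."""
    return f'The {record["name"]} is a {record["group"].lower()}.'


appearances = [f(source) for f in (as_caption, as_index_entry, as_sentence)]
for text in appearances:
    print(text)
```

The caption alone no longer says that the object is an insect or a beetle, which is why the source record, and not any single appearance, has to remain accessible.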

