Content and Bibliographic Theory CS 431 Architecture of Web Information Systems Carl Lagoze Cornell University Acks to H. Van de Sompel.


1 Content and Bibliographic Theory CS 431 Architecture of Web Information Systems Carl Lagoze Cornell University Acks to H. Van de Sompel

2 People want stuff. Godfrey Rust 1999

3 Where did I put that file?

4 Where is that information?

5 Am I getting compensated for my talent? Copies? Derivations? Contributions?

6 Is that available in a way I can use it?

7 Are there other resources like these?

8 Content, Data, Metadata -- informal definitions. Content refers to resources as information that is of interest to a user. It is the human view of information. Examples: music (Beethoven's Fifth Symphony), database (Genome Database), literature (Gone with the Wind), web site (weather.com), software (MS Word).

9 Content, Data, Metadata -- informal definitions. Data emphasizes the bits and bytes to be processed by a computer. It is the computer representation of information: bit and byte layout (e.g., ASCII), compression schemes (e.g., MP3), image formats (GIF, JPEG, PNG).

10 Content, Data, Metadata -- informal definitions. Metadata is data about data/content: descriptive metadata (e.g., catalog records), administrative metadata (e.g., circulation records), structural metadata (e.g., serials record), rights metadata (e.g., shrink-wrap license).

11 Information vs. Data: formal basis. Claude Shannon: the problem of noisy communication channels. Entropy: informally, a measure of the amount of information in a data transmission; the amount of disorder in a system; proportional to the uncertainty of the recipient of a data stream about the content of the message. Implications: the same information can be encoded in multiple data streams; the size of the data stream necessary for given content is proportional to entropy.
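The informal definition of entropy above can be made concrete with a small calculation. The following is a minimal sketch, assuming the standard per-symbol formula H = sum of p * log2(1/p) over the observed symbol frequencies; the function name `shannon_entropy` and the two sample strings are illustrative and not part of the slides.

```python
import math
from collections import Counter


def shannon_entropy(symbols):
    """Return the Shannon entropy of a sequence, in bits per symbol."""
    counts = Counter(symbols)
    total = len(symbols)
    # H = sum over distinct symbols of p * log2(1 / p), with p = n / total
    return sum((n / total) * math.log2(total / n) for n in counts.values())


# A perfectly predictable stream: the recipient has no uncertainty.
constant = "aaaaaaaa"
# Four equally likely symbols: two bits of uncertainty per symbol.
uniform = "abcdabcd"

h_constant = shannon_entropy(constant)
h_uniform = shannon_entropy(uniform)
print(h_constant, h_uniform)
```

Both strings have the same length, but only the second needs two bits per symbol to encode. This is the sense in which the necessary size of a data stream tracks entropy rather than raw length.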

12 Bibliographic model: provides a user with an organized view of content/information/data in a collection. [Diagram labels: object = piece of content; collection; bibliographic system; descriptive metadata: works, creators, subjects; objectives?]

13 Objectives of a bibliographic system. 1. To locate objects in a file or database as the result of a search using attributes or relationships of the objects. This covers finding a singular object (known-item search) and locating sets of objects (search): all objects corresponding to the same work, expression, or manifestation; all objects by a given author; all objects about a given author; all objects on a given subject; all objects published by a given publisher; all objects defined by other criteria (cf. IFLA entities).

14 Objectives of a bibliographic system. 2. To identify an object (i.e., confirm that a described object corresponds to the sought object, or distinguish between objects with similar characteristics). 3. To select an object that is appropriate to the user’s need. 4. To obtain access to an object (purchase, loan, license, …). 5. To navigate the file or database (browse).
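The "locate" objective (known-item search versus set search by attribute) can be sketched over a tiny in-memory catalog. This is an illustration only: the `Record` type, its fields, the identifiers, and the subject terms are all hypothetical and do not come from the slides or from any real catalog.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Record:
    """Hypothetical descriptive-metadata record for one object."""
    identifier: str
    title: str
    author: str
    subjects: tuple


# Illustrative sample data; identifiers and subject terms are made up.
CATALOG = [
    Record("r1", "The Iliad", "Homer", ("Trojan War", "epic poetry")),
    Record("r2", "The Odyssey", "Homer", ("epic poetry",)),
    Record("r3", "Gone with the Wind", "Margaret Mitchell", ("American Civil War",)),
]


def find_known_item(catalog, identifier):
    """Known-item search: return the single matching record, or None."""
    for record in catalog:
        if record.identifier == identifier:
            return record
    return None


def locate_by_author(catalog, author):
    """Set search: all objects by a given author."""
    return [r for r in catalog if r.author == author]


def locate_by_subject(catalog, subject):
    """Set search: all objects on a given subject."""
    return [r for r in catalog if subject in r.subjects]


print(find_known_item(CATALOG, "r3").title)
print([r.title for r in locate_by_author(CATALOG, "Homer")])
print([r.title for r in locate_by_subject(CATALOG, "epic poetry")])
```

The other objectives (identify, select, obtain, navigate) would build on the same records: identification, for instance, compares a found record's attributes against what the user was looking for.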

15 Traditional models challenged by networked digital information: scale of corpus or collection; variety of content (Internet Commons); unbinding of information from its carrier; mutability of data; ‘universal context’ (all types of people, resources, needs). This requires more advanced data models to represent distinct entities, their relationships, and their evolution over time.

16 Variants of information entities, to be reflected in a bibliographic system. Psycho Killer: the score by David Byrne; the original recording by Talking Heads; Psycho Chicken (cover) by The Fools; Herbert’s personal copy of that single; a live performance by the Fools in 1981; the 45 RPM single released in 1979.

17 IFLA Model to represent object variants: entities. Entities are the key objects of interest to users of bibliographic data (i.e., of a bibliographic system). Group 1, products of intellectual endeavor: work, expression, manifestation, item. Group 2, the parties responsible for the intellectual content: person, corporate body. Group 3, the subjects of intellectual endeavor: concept, object, event, place. The IFLA model is a conceptual framework; it does not provide rigorous definitions.

18 IFLA Model: work, expression, manifestation, item. A work is an abstract entity, an idealization, e.g., The Iliad; The Weather Channel web site; Beethoven's Fifth Symphony; the Unix operating system; The Bible. This is roughly equivalent to the concept of "literary work" used in copyright law.

19 IFLA Model: work, expression, manifestation, item. An expression is a realization of a work: a representation of the work in a disseminable form. For example, The Iliad has oral expressions and written expressions; a musical work has a score, live performance(s), an original recording, cover(s), and so on. Many works have only a single expression, e.g., a web page, a book with only a single edition, a painting, or a medieval manuscript.

20 IFLA Model: work, expression, manifestation, item. A manifestation is the concrete embodiment of an expression; it reflects physical form. For example, the text of The Iliad has been manifested in numerous manuscripts and printed books; a musical recording can be distributed on CD, on cassette, or on the soundtrack of a DVD.

21 IFLA Model: work, expression, manifestation, item. When many copies are made of a manifestation, each copy is a separate item, e.g., the Cornell Library’s copy of an edition of the Iliad, or your copy of the latest Norah Jones CD.

22 IFLA Model: work, expression, manifestation, item. [Diagram level: work (CONTENT). Entry: Psycho Killer.]

23 IFLA Model: work, expression, manifestation, item. [Diagram levels: work, expression (CONTENT). Entries: Psycho Killer; The score by David Byrne; The original recording by Talking Heads; Psycho Chicken (cover) by The Fools.]

24 IFLA Model: work, expression, manifestation, item. [Diagram levels: work, expression, manifestation; regions: CONTENT, PHYSICAL. Entries: Psycho Killer; The score by David Byrne; The original recording by Talking Heads; Psycho Chicken (cover) by The Fools; A live performance by the Fools in 1981; The 45 RPM single released in 1979.]

25 IFLA Model: work, expression, manifestation, item. [Diagram levels: work, expression, manifestation, item; regions: CONTENT, PHYSICAL. Entries: Psycho Killer; The score by David Byrne; The original recording by Talking Heads; Psycho Chicken (cover) by The Fools; A live performance by the Fools in 1981; The 45 RPM single released in 1979; Herbert’s personal copy of that single.]
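The four Group 1 entities form a containment hierarchy that maps naturally onto nested record types. The sketch below is one possible reading, not a definitive encoding of the IFLA model. The class names follow the model's entity names, but the fields, the helper `items_of`, and the placement of the 45 RPM single under the Talking Heads recording are assumptions, since the flattened diagram does not state which manifestation embodies which expression.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Item:
    description: str


@dataclass
class Manifestation:
    description: str
    items: List[Item] = field(default_factory=list)


@dataclass
class Expression:
    description: str
    manifestations: List[Manifestation] = field(default_factory=list)


@dataclass
class Work:
    title: str
    expressions: List[Expression] = field(default_factory=list)


def items_of(work):
    """Collect every item reachable from a work through its expressions."""
    return [
        item
        for expression in work.expressions
        for manifestation in expression.manifestations
        for item in manifestation.items
    ]


# Assumed placement: the single is treated as a manifestation of the
# original recording; the slides do not state this link explicitly.
single = Manifestation(
    "The 45 RPM single released in 1979",
    items=[Item("Herbert's personal copy of that single")],
)
psycho_killer = Work(
    "Psycho Killer",
    expressions=[
        Expression("The score by David Byrne"),
        Expression("The original recording by Talking Heads", manifestations=[single]),
        Expression("Psycho Chicken (cover) by The Fools"),
    ],
)

for item in items_of(psycho_killer):
    print(item.description)
```

Walking from a work down to its items is the structural counterpart of the objective "all objects corresponding to the same work, expression, manifestation".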

26 IFLA Model: work, expression, manifestation, item. [Diagram levels: work, expression, manifestation, item; regions: CONTENT, PHYSICAL. Entries: A theory in high energy physics; A peer-reviewed paper …; An oral presentation …; A preprint …; TeX version posted by the author to arXiv.org; PDF version created by arXiv.org; The copy of the TeX version on the Italian mirror of arXiv.org.]

27 Why should we care about all this? It matches a cognitive model of our information seeking and usage behavior; it impacts intellectual property interests and laws; it drives preservation decisions.

28 Implications for Preservation. Preserve the data or the information? Bit preservation: the exact representation of the information is critical; focus on strategies such as media longevity and migration, and emulation of tools to interpret the bits. Information preservation: content is more important than the bits; focus on strategies such as migration of content to newer but ‘equivalent’ formats.
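The difference between preserving bits and preserving information can be shown with one piece of text stored in two encodings. This is a rough sketch under stated assumptions: a SHA-256 digest stands in for a bit-level fixity check, and re-encoding from UTF-8 to UTF-16 stands in for migration to an 'equivalent' format. Neither technique is named in the slides, and the sample text and the helper `fixity` are illustrative.

```python
import hashlib

text = "The Iliad, Book 1"

# Two data streams carrying the same content in different encodings.
utf8_bytes = text.encode("utf-8")
utf16_bytes = text.encode("utf-16")


def fixity(data):
    """Return a SHA-256 hex digest, used here as a bit-level fixity check."""
    return hashlib.sha256(data).hexdigest()


# Bit-level view: the streams differ, so their digests differ.
bits_identical = fixity(utf8_bytes) == fixity(utf16_bytes)

# Information-level view: both streams decode to the same text.
content_identical = utf8_bytes.decode("utf-8") == utf16_bytes.decode("utf-16")

print(bits_identical, content_identical)
```

A bit-preservation strategy would flag the migrated stream as changed, while an information-preservation strategy would accept it as equivalent.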

29 Information equivalence

30 Preserving the content vs. the data

31 References. IFLA Study Group on the Functional Requirements for Bibliographic Records. 1998. Functional Requirements for Bibliographic Records: final report. http://www.ifla.org/VII/s13/frbr/frbr.pdf (chapter 3). Svenonius, E. 2000. The Intellectual Foundation of Information Organization. MIT Press. http://www.netlibrary.com/SUMMARY.ASP?EV=1627610&ID=39954 (part of chapter 2).

