

Slide 1: ICS-FORTH, August 1, 2008
Integrated Information Management and Access - New Chances for Museums, Archives and Libraries
Martin Doerr
Center for Cultural Informatics, Institute of Computer Science, Foundation for Research and Technology - Hellas
Singapore, August 1, 2008

Slide 2: Integrated Information Management - Overview
- Information integration: a utility perspective
- Museum and library information
- Keywords, finding aids and thesauri
- Do we talk about the same thing?
- Understanding events, contexts and stories
- CIDOC CRM, simple implementations

Slide 3: Information Integration Management - A Perspective of Utility
Memory institutions maintain digital repositories ("digital memories"): information systems that preserve and provide access to primary information sources, scientific and scholarly information, and literature, such as digital libraries of publications, indices of archives of social or scientific activities, or documentation of physical collections.
Digital repositories are necessarily heterogeneous, so that each can optimize its function for different information forms and access needs, but the knowledge they contain forms a logical whole.
To get information and learn from it, we need uniform access, retrieval by human criteria, and connections between disparate information assets (e.g., a painting and the biography of its painter).

Slide 4: Information Integration Management - A Perspective of Utility
- Information integration provides a syntactically and semantically homogeneous layer on top of the sources, be it physical or virtual, manual or automated.
- Multiple standard formats can coexist if information can be transformed or merged. One format does not ensure that the information is connected!
- Standardization and transformation go hand in hand. For both, documentation (metadata) needs to be provided, adapted or "cleaned": legacy data to standard form, from one standard to another, and "tuning" data so that they can be transformed.
- The ultimate integration cost is the manual creation or adaptation of metadata.
- Better integration is not always more work, but it needs more foresight. Bad decisions cost the most.

Slide 5: Information Integration Management - A Perspective of Utility
Levels of integration. From one platform, I can:
1. read everything, if I have the ID: syntactic integration, the Web;
2. get everything that refers to the words X, Y, Z: Google and others;
3. get everything about a particular person, thing, place, fact or concept;
4. learn whether there are things or facts with given characteristics;
5. learn about associations and contexts of things across documents.
For instance:
- What species is this object?
- Which professions did the relatives of van Gogh have? Who were the clients for van Gogh's paintings?
- Were German soldiers in Russia before WWII?
- Which antique art objects may Michelangelo have seen? (a 25-year project!)

Slide 6: Information Integration Management - A Perspective of Utility
- The traditional library task: collect and preserve documents and provide finding aids. The job is done when the (one, best) document is handed out: "All you need is in this document."
- But understanding depends on relationships. Museum information has complex relationships, which may be categorical or factual:
  - Categorical (e.g., "smoking causes cancer"): richly exploited by Semantic Web technology. Their use and integration are limited to research results; they are not useful for primary research itself.
  - Factual: associations that concatenate information assets into meaningful ("epistemic") networks ("stories"). They support context-based hypothesis building, cross-disciplinary search, etc. (e.g., "John smoked at 20", "… at 30", "… at 40"; "John had lung cancer at 60").
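To make the idea of factual associations concrete, here is a minimal sketch of concatenating facts from separate documents into a "story". It assumes each fact already carries a resolved person identifier (the ids "john-1" and "john-2" and the document ids are hypothetical); establishing that identifier is exactly the co-reference problem discussed later.

```python
from collections import defaultdict

# Hypothetical factual statements, each taken from a different document.
# Tuple layout: (document id, person id, age, statement)
FACTS = [
    ("doc-A", "john-1", 20, "smoked"),
    ("doc-B", "john-1", 30, "smoked"),
    ("doc-C", "john-1", 40, "smoked"),
    ("doc-D", "john-1", 60, "had lung cancer"),
    ("doc-E", "john-2", 60, "ran a marathon"),  # a different John
]


def stories(facts):
    """Concatenate facts that share a person id into age-ordered 'stories'."""
    by_person = defaultdict(list)
    for doc, person, age, statement in facts:
        by_person[person].append((age, statement, doc))
    return {person: sorted(events) for person, events in by_person.items()}


story = stories(FACTS)["john-1"]
for age, statement, doc in story:
    print(f"at {age}: {statement} ({doc})")
```

No single document states the sequence; it only appears once the facts are joined on the shared identifier, which is the kind of context a hypothesis ("smoking and cancer in John's case") can be built on.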

Slide 7: Information Integration Management - Library, Archive, Museum Information
- Typical library contents: "the whole stories"
  - Secondary literature (research results)
  - Facts brought into causal context
  - Categorical: theories and hypotheses
  - Fiction
- Typical archive contents: "the needle in the haystack"
  - Primary sources, "bits and pieces" (letters, legal documents, administrative acts, images, scientific records)
  - Factual, kept in the sequence of creation, as arranged by the creator or the responsible body
- Typical museum information: "museum objects rarely talk"
  - Factual documentation of properties and context per object, references, classification
  - Highly heterogeneous and disparate

Slide 8: Museum Information - "A Monet is not like a Dinosaur"
Museum objects may be:
- Unique in form, valuable out of context: valued art objects ("La Pie" by Monet), aesthetic minerals, exceptional life forms, curiosities.
- Unique by a particular context, not valuable out of it, valuable only as illustration or symbol: historical heirlooms, relics of saints, "John Lennon's T-shirt".
- Not unique, not particularly valuable, used as examples of a category outside their particular context: most objects in natural history, ethnology, archaeology.
- Unique by rarity, valuable as evidence outside a particular context: most objects in paleontology, many unique archaeological objects ("6th left rib of a T. rex").

Slide 9: Information Integration Management - The Museum Information Problem
- The ultimate goal of users seeking information is not to get an "object" but to understand a topic.
- Understanding depends on relationships:
  - objects are interpreted by context (e.g., bone finds in Evans's "bathtubs");
  - contexts are interpreted by objects (e.g., many arrowheads in Troy IV);
  - objects are interpreted by categories (e.g., Evans's Minoan "bathtubs");
  - categories are supported by examples (e.g., the shape of a kris);
  - categories may be based on rare evidence (e.g., a hominid tooth).
- We need to integrate museums, archives and libraries in a sensible way to find integrated knowledge and produce new knowledge: to provide evidence for new hypotheses, or to verify or challenge old ones.

Slide 10: Information Integration Management - Library and Museum Information
- Museum and library information has complex interrelations: it partly overlaps and otherwise differs.
- Libraries document literature in order to facilitate access to it.
- Museum documentation classifies and describes museum objects, their context and relevance. It refers to literature, and museums regularly produce (secondary) literature.
- Museum objects are referred to and published in literature. Literature may describe museum objects, their context, and theories about and related to them. Literature describes concepts that are exemplified or illustrated by museum objects. There is no standard documentation format for that yet!
- Libraries may also produce literature, and may document and curate rare objects as museums do. Most museums maintain libraries.

Slide 11: Information Integration Management - Archive, Library and Museum Information
[Figure: relationships between libraries, museums and archives and the books, objects and primary documents they hold. Relations shown: illustrate, exemplify; refer to; provide finding aids; are about; document features & context; make narratives from; publish using.]

Slide 12: Keywords, Finding Aids and Thesauri - The Second Level of Integration
- Why is Google (i.e., any search engine!) good?
  - Low cost, no data tuning, scalable
  - Easily finds secondary literature, especially if abundant
  - Finds things by usual category names
  - No user training, no access language
  => Recommendation: you should always provide a good search engine!
- Why is Google bad?
  - The user must know all synonyms.
  - Names are not things: rare things are buried under frequent names (e.g., "George Bush", a software package called "Volcano").
  - Relations only by aggregation of terms appearing in the source (e.g., "First known Turkish - Greek marriage in Crete" (1635)).
  - No control over relevance, no statistics possible, no related sources.

Slide 13: Keywords, Finding Aids and Thesauri - The Second Level of Integration
- Finding aids:
  - Assumption: the user knows a topic, characterized by a noun, or knows associations of the topic that are unrelated to the problem to be solved (e.g., "organic farming" for "host-parasite studies", an author for a topic, or searching for an object by date of acquisition because I don't remember its name).
  - The Dublin Core Metadata Element Set makes 15 relationships to terms explicit (type, subject (classification), creator, publisher, date, format, etc.).
  - It increases precision.
  - It increases recall if the metadata add terms that do not appear in the content.

Slide 14: Keywords, Finding Aids and Thesauri - The Second Level of Integration
- Is Dublin Core better than Google?
  - Literature search by author and title: Google is sufficient or better.
  - Type, format, subject, coverage: DC is better only if the terms are not in the content.
  - Relationships: DC is better if they are not connected by a relevant term cluster.
  - Non-verbose, non-digital objects: DC provides the minimal metadata!
  - By Shakespeare or about Shakespeare: DC disambiguates!
- What does Dublin Core not do?
  - It is not appropriate for museum objects (no place, find information, or material).
  - No typed relationships, no context information.
  - No notion of identity (separation of URI and name, American library tradition).
=> DC has significant benefit for non-verbose digital objects.
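The "by Shakespeare or about Shakespeare" point can be sketched as follows. The two records are hypothetical and use the DC element names title, creator and subject; the comparison is between a search-engine style match anywhere in the text and a fielded match on one DC element.

```python
# Two hypothetical Dublin Core-style records.
RECORDS = [
    {"id": "r1", "title": "Hamlet", "creator": "Shakespeare, William", "subject": "Tragedy"},
    {"id": "r2", "title": "A Study of Shakespeare's Comedies",
     "creator": "Doe, Jane", "subject": "Shakespeare, William"},
]


def full_text(records, word):
    """Search-engine style: match the word anywhere in any field."""
    w = word.lower()
    return [r["id"] for r in records if any(w in v.lower() for v in r.values())]


def fielded(records, element, value):
    """Metadata style: match the value only in the named DC element."""
    v = value.lower()
    return [r["id"] for r in records if v in r.get(element, "").lower()]


print(full_text(RECORDS, "Shakespeare"))           # ['r1', 'r2']
print(fielded(RECORDS, "creator", "Shakespeare"))  # ['r1']  works BY Shakespeare
print(fielded(RECORDS, "subject", "Shakespeare"))  # ['r2']  works ABOUT Shakespeare
```

The full-text match cannot tell the two roles apart; the element name carries that distinction, which is the benefit the slide attributes to DC.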

Slide 15: Keywords, Finding Aids and Thesauri - The Second Level of Integration
- Thesauri of controlled terms (categories):
  - subjects, object types, place types, person roles, event types;
  - good for secondary literature search and metadata fields (libraries!);
  - bad: a "new language" that users must learn, and expensive to create;
  - invisible thesauri enhance search engines.
- "Museums do not like thesauri":
  - not suited for factual knowledge!
  - cultural terminology is a dynamic research tool ("every PhD a new typology"), used to reason from form to function, time, etc.;
  - only a few high-level terms are stable and useful for finding aids.
Recommendation: small thesauri for museums (small enough that users can see them on one page) increase the power of metadata and improve search results.
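An "invisible" thesaurus can be sketched as query expansion: the user types one term and the engine silently adds its synonyms. The thesaurus entry (kris / keris, two names for the same dagger type) and the object descriptions are hypothetical illustrations, and the matching is deliberately naive whitespace tokenization.

```python
# Hypothetical invisible thesaurus: preferred term -> synonyms.
THESAURUS = {"kris": {"keris"}}

# Hypothetical object descriptions.
DOCS = {
    "d1": "a keris with a wavy blade",
    "d2": "kris hilt, carved wood",
    "d3": "a straight sword",
}


def expand(term):
    """Return the term together with every synonym in its thesaurus group."""
    term = term.lower()
    terms = {term}
    for preferred, synonyms in THESAURUS.items():
        group = {preferred} | synonyms
        if term in group:
            terms |= group
    return terms


def search(term, expand_terms=True):
    """Keyword search over DOCS, optionally expanded through the thesaurus."""
    terms = expand(term) if expand_terms else {term.lower()}
    return sorted(d for d, text in DOCS.items() if terms & set(text.lower().split()))


print(search("kris", expand_terms=False))  # ['d2']
print(search("kris"))                      # ['d1', 'd2']
```

The user never sees the thesaurus, which avoids the "new language" cost noted above while still recovering synonyms a plain search engine would miss.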

Slide 16: Do We Talk about the Same Thing?
Co-reference can connect documents! Such networks hide stories (complementary information).

Slide 17: Do We Talk about the Same Thing? - Integration by Factual Relations
[Figure: documents in digital libraries linked through instances of real-world nodes (KOS), structured by the CIDOC CRM core ontology. Nodes: Ethiopia, Hadar, Johanson's Expedition, Discovery of Lucy, AL 288-1, Lucy, Donald Johanson, Cleveland Museum of Natural History. Link types: primary links corresponding to one document, links by co-reference, and deductions.]
Hypertext is wrong: documents contain links!

Slide 18: Do We Talk about the Same Thing? - Co-reference Links via Authority Files
[Figure: Source 1 and Source 2 each keep local ids for their content and match them against an authority service; a link table records the matches, and the authority id serves as a dynamic link. A query joins across sources by transitivity of co-reference (query input: "Martin", output: "George"; example names: "Κώστας" / "Kostas").]
Not scalable! Find "friends of a friend".

Slide 19: Do We Talk about the Same Thing? - Co-reference Links without Authority Files
[Figure: several sources keep local ids for their content and make co-reference links directly between each other's local ids. A query joins across sources by transitivity of co-reference (query input: "Martin", output: "George"; example names: "Κώστας" / "Kostas").]
Find "friends of a friend".
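Joining across sources "by transitivity of co-reference" can be sketched with a union-find (disjoint-set) structure, one common way to maintain co-reference clusters; it is not claimed to be the mechanism used in the project. The sources, local ids and names are hypothetical.

```python
class CoReference:
    """Union-find over (source, local id) pairs: each set is one co-reference cluster."""

    def __init__(self):
        self.parent = {}

    def find(self, item):
        """Return the representative of the item's cluster, compressing the path."""
        self.parent.setdefault(item, item)
        root = item
        while self.parent[root] != root:
            root = self.parent[root]
        while self.parent[item] != root:
            nxt = self.parent[item]
            self.parent[item] = root
            item = nxt
        return root

    def link(self, a, b):
        """Record a curated co-reference link: a and b denote the same thing."""
        ra, rb = self.find(a), self.find(b)
        if ra != rb:
            self.parent[rb] = ra

    def cluster(self, item):
        """All local ids known to denote the same thing as the item (linear scan)."""
        root = self.find(item)
        return sorted(k for k in list(self.parent) if self.find(k) == root)


# Hypothetical local ids and names in three sources.
NAMES = {
    ("source1", "p17"): "Kostas",
    ("source2", "a3"): "Κώστας",
    ("source3", "x9"): "Kostas",
}

co = CoReference()
co.link(("source1", "p17"), ("source2", "a3"))
co.link(("source2", "a3"), ("source3", "x9"))

# source1 and source3 were never linked directly, but transitivity joins them.
for member in co.cluster(("source1", "p17")):
    print(member, NAMES[member])
```

No central authority file is needed: each curator only records the links they can vouch for, and the clusters follow from transitivity.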

Slide 20: Do We Talk about the Same Thing? - The Third Level of Integration
- Do we talk about the same thing? Documents are connected if they refer to the same things: people, places, events ("co-reference"). The hypertext model is wrong.
- Authority files cannot keep up: they simplify the procedure but do not solve it. The scale is enormous.
- Curation of direct co-reference links (co-reference clusters) is needed.
  - It is not more expensive than a search engine index.
  - Duplicate detection, data cleaning and Web 2.0 methods can help to generate co-reference links massively.
Recommendation: prepare for co-reference in documentation practice! (tag names, link locally, etc.)
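One simple form of duplicate detection is to group records whose names coincide after normalization, and to offer every pair in a group to a curator as a candidate co-reference link. The sketch below assumes hypothetical record ids; it only strips accents, case and extra spaces, so it would not match inverted forms ("Balzac, Honoré de") or different scripts ("Κώστας" / "Kostas"), which need further rules such as transliteration.

```python
import unicodedata
from collections import defaultdict
from itertools import combinations


def name_key(name):
    """Normalize a name: strip accents, case-fold, collapse whitespace."""
    decomposed = unicodedata.normalize("NFKD", name)
    no_accents = "".join(c for c in decomposed if not unicodedata.combining(c))
    return " ".join(no_accents.casefold().split())


def candidate_links(records):
    """Group records by normalized name; each pair in a group is a candidate link."""
    groups = defaultdict(list)
    for rec_id, name in records:
        groups[name_key(name)].append(rec_id)
    return [pair for ids in groups.values() for pair in combinations(sorted(ids), 2)]


# Hypothetical records from three institutions.
RECORDS = [
    ("museum:41", "Honoré de Balzac"),
    ("library:7", "Honore de  Balzac"),
    ("archive:3", "HONORÉ DE BALZAC"),
    ("museum:42", "Auguste Rodin"),
]
print(candidate_links(RECORDS))
# [('archive:3', 'library:7'), ('archive:3', 'museum:41'), ('library:7', 'museum:41')]
```

The output is a list of candidates, not decisions: identical names can still denote different people, so curation remains necessary.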

Slide 21: Understanding Events, Contexts, Stories - The Fourth Level of Integration
- So far, integration has taught us nothing beyond what I could collect manually from each source.
- Co-reference allows for tracing stories, but not for querying them.
- Understanding depends on relationships.
- Is there a global model of relationships? (social, economic, material, geographic, biological relations…; thousands of documentation formats)
  - Dominance of the mesoscopic, human-activity scale.
  - Identification, classification, part-whole, reference, participation in meetings => these relations integrate museum and library information!
  - Confirmed by museums, e-science, historians.

Slide 22: Information Integration Management - Context as a Network of Related "Meetings"
[Figure: "meetings" plotted in space and time across Greece, Rome and Germany. The original "LAOKOON" (maker unknown) is copied by an unknown Roman; the "LAOKOON" (copy) is in the Vatican museum. Winckelmann's birth (with Winckelmann's mother; archive information), Winckelmann sees the "Laokoon", Winckelmann writes "…noble simplicity, silent grandeur…" (published; in a library), Winckelmann's death. A published inference (in a library) connects these events.]

Slide 23: The CIDOC CRM (ISO 21127)
The CIDOC Conceptual Reference Model (ISO 21127:2006):
- Developed by the CRM Special Interest Group of the International Committee for Documentation (CIDOC) of the International Council of Museums (ICOM), following an initiative of ICS-FORTH, Heraklion, Crete.
- An extensible core ontology describing the underlying semantics of over a hundred database schemata and structures from all museum disciplines, archives and libraries (now extended by FRBRoo, which models IFLA's FRBR).
- The result of 15 years of interdisciplinary work and agreement.
- In essence, a generic model for recording "what has happened" at human scale, i.e., a class of discourse.
- With it we can generate huge, meaningful networks of knowledge by a simple abstraction: history as meetings of people, things and information.
- It bears surprises: minimal or no specialization allows it to cover new domains.

Slide 24: The CIDOC CRM - Historical Archives… (Documents, Metadata, About…)
Type: Text
Title: Protocol of Proceedings of Crimea Conference
Title.Subtitle: II. Declaration of Liberated Europe
Date: February 11, 1945
Creator: The Premier of the Union of Soviet Socialist Republics; The Prime Minister of the United Kingdom; The President of the United States of America
Publisher: State Department
Subject: Postwar division of Europe and Japan
"The following declaration has been approved: The Premier of the Union of Soviet Socialist Republics, the Prime Minister of the United Kingdom and the President of the United States of America have consulted with each other in the common interests of the people of their countries and those of liberated Europe. They jointly declare their mutual agreement to concert… …and to ensure that Germany will never again be able to disturb the peace of the world…"

Slide 25: The CIDOC CRM - Images, Non-verbose Objects… (Photos, Persons, Metadata, About…)
Type: Image
Title: Allied Leaders at Yalta
Date: 1945
Publisher: United Press International (UPI)
Source: The Bettmann Archive
Copyright: Corbis
References: Churchill, Roosevelt, Stalin

Slide 26: The CIDOC CRM - Places and Objects (Places, Objects, About…)
TGN Id: 7012124
Names: Yalta (C,V), Jalta (C,V)
Types: inhabited place (C), city (C)
Position: Lat: 44 30 N, Long: 034 10 E
Hierarchy: Europe (continent) <– Ukrayina (nation) <– Krym (autonomous republic)
Note: …Site of conference between Allied powers in WW II in 1945; …
Source: TGN, Thesaurus of Geographic Names
Title: Yalta, Crimean Peninsula
Publisher: Kurgan-Lisnet
Source: Liaison Agency

Slide 27: The CIDOC CRM - Explicit Events, Object Identity, Symmetry
[Figure: CRM graph connecting the E31 Document "Yalta Agreement", an E38 Image, the E7 Activity "Crimea Conference", an E65 Creation Event, an E39 Actor, the E53 Place with TGN id 7012124 (Yalta), and the E52 Time-Spans "February 1945" and "11-2-1945". Properties shown: P14 performed, P11 participated in, P94 has created, P86 falls within, P7 took place at, P67 is referred to by, P81 ongoing throughout, P82 at some time within.]
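The point of explicit events and symmetry can be sketched with plain (subject, property, object) triples: the photograph and the agreement share no metadata field, yet both connect to the same conference node, and properties can be followed in either direction. This is a simplified, hypothetical encoding; in particular, modelling the document with "P67 refers to" (rather than through its creation event) is an assumption made for brevity.

```python
# Hypothetical, simplified triples for the Yalta example.
TRIPLES = [
    ("Allied Leaders at Yalta (image)", "P67 refers to", "Crimea Conference"),
    ("Yalta Agreement (document)", "P67 refers to", "Crimea Conference"),
    ("Crimea Conference", "P7 took place at", "TGN 7012124 (Yalta)"),
    ("Crimea Conference", "P11 had participant", "Churchill"),
    ("Crimea Conference", "P11 had participant", "Roosevelt"),
    ("Crimea Conference", "P11 had participant", "Stalin"),
]


def neighbours(node):
    """Follow properties in both directions: outgoing, then inverse."""
    out = [(p, o) for s, p, o in TRIPLES if s == node]
    inv = [(p + " (inverse)", s) for s, p, o in TRIPLES if o == node]
    return out + inv


def related_through_events(item):
    """Other items that refer to the same event as the given item."""
    events = {o for s, p, o in TRIPLES if s == item and p == "P67 refers to"}
    return sorted({s for s, p, o in TRIPLES if p == "P67 refers to" and o in events} - {item})


print(related_through_events("Allied Leaders at Yalta (image)"))  # ['Yalta Agreement (document)']
print(neighbours("Churchill"))  # [('P11 had participant (inverse)', 'Crimea Conference')]
```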

Slide 28: The CIDOC CRM - Data Example (e.g., from Extraction)
Transfer of Epitaphios GE34604 (entity: E10 Transfer of Custody and E8 Acquisition Event; multiple instantiation!)
- P28 custody surrendered by / P23 transferred title from: Metropolitan Church of the Greek Community of Ankara (E39 Actor)
- P29 custody received by / P22 transferred title to: Museum Benaki (E39 Actor)
- P14 carried out by: Exchangeable Fund of Refugees (E40 Legal Body; P2 has type: national foundation (E55 Type))
- P4 has time-span: GE34604_transfer_time (E52 Time-Span); P82 at some time within: 1923 - 1928 (E61 Time Primitive)
- P7 took place at: Greece (E53 Place; P2 has type: nation, republic); P89 falls within: Europe (E53 Place; P2 has type: continent; TGN data)
Epitaphios GE34604 (entity: E22 Man-Made Object; P2 has type …)
- P30 custody transferred through / P24 changed ownership through: the transfer above
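Why multiple instantiation matters can be sketched as a domain check: each property is only licensed for certain classes, and a single event that is both a custody transfer and an acquisition needs both classes to carry all its statements. The licensing table below is a hand-written simplification covering only the properties in this example (P14, P4 and P7 are treated as inherited by both classes); it is not the full CRM class hierarchy.

```python
# Simplified, hand-written domain table for the properties used in this example.
LICENSED = {
    "E10 Transfer of Custody": {"P28 custody surrendered by", "P29 custody received by",
                                "P30 transferred custody of"},
    "E8 Acquisition Event": {"P22 transferred title to", "P23 transferred title from",
                             "P24 transferred title of"},
}
# Properties both classes inherit from superclasses (simplification).
INHERITED = {"P14 carried out by", "P4 has time-span", "P7 took place at"}

TRANSFER = [
    ("P28 custody surrendered by", "Metropolitan Church of the Greek Community of Ankara"),
    ("P23 transferred title from", "Metropolitan Church of the Greek Community of Ankara"),
    ("P29 custody received by", "Museum Benaki"),
    ("P22 transferred title to", "Museum Benaki"),
    ("P14 carried out by", "Exchangeable Fund of Refugees"),
    ("P30 transferred custody of", "Epitaphios GE34604"),
    ("P24 transferred title of", "Epitaphios GE34604"),
]


def unlicensed(classes, statements):
    """Properties in the statements that none of the instance's classes allows."""
    allowed = set(INHERITED)
    for c in classes:
        allowed |= LICENSED[c]
    return sorted(p for p, _ in statements if p not in allowed)


print(unlicensed(["E10 Transfer of Custody", "E8 Acquisition Event"], TRANSFER))  # []
print(unlicensed(["E10 Transfer of Custody"], TRANSFER))
```

With only one class, the title-transfer statements have no valid home; with both, one event node records custody and ownership changes together.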

Slide 29: The CIDOC CRM - Top-level Entities Relevant for Integration
[Figure: E2 Temporal Entities related to E39 Actors, E18 Physical Things, E28 Conceptual Objects, E55 Types, E41 Appellations, E53 Places and E52 Time-Spans. Relations shown: participate in; affect or refer to; refer to / refine; refer to / identify; location; at; within.]

Slide 30: The CIDOC CRM - Example: The Temporal Entity Hierarchy

Slide 31: The CIDOC CRM - A Classification of Its Relationships
- Identification of real-world items by real-world names.
- Classification of real-world items.
- Part decomposition and structural properties of conceptual and physical objects, periods, actors, places and times.
- Participation of persistent items in temporal entities: this creates a notion of history as "world-lines" meeting in space-time.
- Location of periods in space-time and of physical objects in space.
- Influence of objects on activities and products, and vice versa.
- Reference of information objects to any real-world item.

Slide 32: The CIDOC CRM - What Is an Ontology?
- Ontologies are formalized knowledge: clearly defined concepts and relationships about real, possible states of affairs in a domain. "Semantics" is the world they refer to ("ontological commitment"), not a set of logical rules! (e.g., what is an event?)
- Ontologies describe a reality, independent of context and performance! Information models are not ontologies: they abbreviate, denormalize, select. E.g., "DC.creator", "DC.date", "birthday/birthplace", "destination" in the MIDAS schema (UK monuments records).
- Ontologies can be understood by people and processed by machines to enable data exchange, data integration and query mediation:
  - Local information systems may export information in a CRM-compatible form (CRM Core or more).
  - Local information systems may answer queries expressed with a subset of CRM concepts.
  - Exported information may be merged into another database ("data warehouse"). Complementary information can thus be easily integrated.
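How merging exports integrates complementary information can be sketched with two hypothetical local systems exporting CRM-style triples. The sketch assumes both already use the same identifier for Rodin (the "ID: 500016619" of the CRM Core example), stores years as plain integers, and merges by set union; the event identifiers are made up.

```python
# Hypothetical CRM-style exports from two local systems sharing an actor identifier.
MUSEUM_EXPORT = {
    ("production-1898", "P14 carried out by", "ID:500016619"),
    ("production-1898", "P108 has produced", "balzac-plaster"),
    ("production-1898", "P4 has time-span", 1898),
}
ARCHIVE_EXPORT = {
    ("birth-rodin", "P98 brought into life", "ID:500016619"),
    ("birth-rodin", "P4 has time-span", 1840),
}


def subjects(graph, prop, obj):
    return [s for s, p, o in graph if p == prop and o == obj]


def objects(graph, subject, prop):
    return [o for s, p, o in graph if s == subject and p == prop]


def years_from_birth_to_production(graph, actor, thing):
    """Difference in years (not exact age); None if the graph lacks either fact."""
    made_by_actor = set(subjects(graph, "P14 carried out by", actor))
    productions = set(subjects(graph, "P108 has produced", thing)) & made_by_actor
    births = subjects(graph, "P98 brought into life", actor)
    if not productions or not births:
        return None
    made = objects(graph, productions.pop(), "P4 has time-span")[0]
    born = objects(graph, births[0], "P4 has time-span")[0]
    return made - born


warehouse = MUSEUM_EXPORT | ARCHIVE_EXPORT
print(years_from_birth_to_production(MUSEUM_EXPORT, "ID:500016619", "balzac-plaster"))  # None
print(years_from_birth_to_production(warehouse, "ID:500016619", "balzac-plaster"))      # 58
```

Neither export alone answers the question; the merged warehouse does, because the two sets of facts meet at the shared actor node.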

Slide 33: Interoperability of Museum Information - Towards a Network of Knowledge
- There cannot be one database schema for all ALM (archive, library, museum) information. A global core ontology is a high-level explanation, not a format; it allows for automated correlation, mediation, transformation and generation of integrated views.
- A particular installation should have a core schema compatible with the core ontology, chosen through an informed decision about its integration and access capabilities: for instance, CRM Core, MuseumDat, or a similar CRM-compatible schema. DC and CRM Core can be combined.
- With the CRM, we know at any time what extension to more functionality means, e.g., FRBRoo / FRBR Core. (Extending DC simply failed!)
- CRM Core (or MuseumDat): a low-cost entry to CRM compatibility.
  - As easy as Dublin Core, but appropriate for relating ALM information.
  - Start with finding aids.
  - Add co-reference: manual, automated, Web 2.0.
  - Add NLP to recover more events.
  - Add more sophisticated relationships.

Slide 34: Interoperability of Museum Information - CRM Core Metadata Elements

Slide 35: Interoperability of Museum Information - Integration with CRM Core (Network View)
- E21 Person "Auguste Rodin" (P2 has type: E55 Type "sculptors")
  - P98B was born: E67 Birth "Rodin's birth" (P4 has time-span: E52 Time-Span 1840)
  - P100B died in: E69 Death "Rodin's death" (P4 has time-span: E52 Time-Span 1917)
- E12 Production "Rodin making 'Monument to Balzac' in 1898"
  - P14 carried out by: Auguste Rodin
  - P4 has time-span: E52 Time-Span 1898
  - P7 took place at: E53 Place "France (nation)"
- E84 Information Carrier "The 'Monument to Balzac' (plaster)" (P2 has type: E55 Type "plaster")
  - P108B was produced by: Rodin making "Monument to Balzac" in 1898
  - P62 depicts: E21 Person "Honoré de Balzac"
  - P16B was used for: Bronze casting "Monument to Balzac" in 1925
- E12 Production "Bronze casting 'Monument to Balzac' in 1925"
  - P134 continued: Rodin making "Monument to Balzac" in 1898
  - P120B occurs after: Rodin's death
  - P14 carried out by: E40 Legal Body "Rudier (Vve Alexis) et Fils" (P2 has type: E55 Type "companies")
  - P4 has time-span: E52 Time-Span 1925
- E84 Information Carrier "The 'Monument to Balzac' (S1296)" (P2 has type: E55 Type "bronze")
  - P108B was produced by: Bronze casting "Monument to Balzac" in 1925
  - P62 depicts: Honoré de Balzac

Slide 36: Interoperability of Museum Information - Integration with CRM Core (Metadata View)
Work (CRM Core)
  Category = E84 Information Carrier
  Classification = sculpture (visual work)
  Classification = plaster
  Identification = The Monument to Balzac (plaster)
  Description = Commissioned to honor one of France's greatest novelists, Rodin spent seven years preparing for Monument to Balzac. When the plaster original was exhibited in Paris in 1898, it was widely attacked. Rodin retired the plaster model to his home in the Paris suburbs. It was not cast in bronze until years after his death.
  Event
    Role in Event = P108B was produced by
    Identification = Rodin making Monument to Balzac in 1898
    Event Type = E12 Production
    Participant
      Identification = Rodin, Auguste
      Identification = ID: 500016619
      Participant Type = artists
      Participant Type = sculptors
    Date = 1898
    Place = France (nation)
    Related event
      Role in Event = P134B was continued by
      Identification = Bronze casting Monument to Balzac in 1925
  Event
    Role in Event = P16B was used for
    Identification = Bronze casting Monument to Balzac in 1925
    Event Type = E12 Production
    Participant
      Identification = Rudier (Vve Alexis) et Fils
      Participant Type = companies
    Thing Present
      Identification = The Monument to Balzac (S.1296)
      Thing Present Type = bronze
      Thing Present Type = sculpture (visual work)
    Date = 1925
    Related event
      Role in Event = P120B occurs after
      Identification = Rodin's death
  Relation
    To = Honoré de Balzac
    Relation type = refers to
Artist (CRM Core)
  Category = E21 Person
  Classification = artists
  Classification = sculptors
  Identification = Rodin, Auguste
  Identification = ID: 500016619
  Event
    Role in Event = P98B was born
    Identification = Rodin's birth
    Event Type = E67 Birth
    Date = 1840
  Event
    Role in Event = P100B died in
    Identification = Rodin's death
    Event Type = E69 Death
    Date = 1917
    Related event
      Role in Event = P120 occurs before
      Identification = Bronze casting Monument to Balzac in 1925
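The step from the metadata view to the network view can be sketched as flattening a record into statements. The dictionary is a hypothetical, cut-down rendering of the "Work" record above. Mapping Classification to "P2 has type" and Participant to the generic "P11 had participant" (rather than the more specific P14) are assumptions, and "instance of" is just an illustrative label for class membership.

```python
# Hypothetical dictionary form of part of the CRM Core "Work" record.
WORK = {
    "id": "The Monument to Balzac (plaster)",
    "category": "E84 Information Carrier",
    "classification": ["sculpture (visual work)", "plaster"],
    "events": [
        {"role": "P108B was produced by",
         "id": "Rodin making Monument to Balzac in 1898",
         "type": "E12 Production",
         "participants": ["Rodin, Auguste"],
         "date": "1898",
         "place": "France (nation)"},
    ],
}


def to_triples(record):
    """Flatten a metadata-view record into (subject, property, object) statements."""
    s = record["id"]
    triples = [(s, "instance of", record["category"])]
    triples += [(s, "P2 has type", c) for c in record["classification"]]
    for ev in record["events"]:
        e = ev["id"]
        triples.append((s, ev["role"], e))
        triples.append((e, "instance of", ev["type"]))
        triples += [(e, "P11 had participant", p) for p in ev["participants"]]
        if "date" in ev:
            triples.append((e, "P4 has time-span", ev["date"]))
        if "place" in ev:
            triples.append((e, "P7 took place at", ev["place"]))
    return triples


for t in to_triples(WORK):
    print(t)
```

Once flattened, the "Rodin, Auguste" node from this record and the one from the Artist record can be joined, which is what turns separate metadata records into a network.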

Slide 37: The CIDOC CRM - Why an Integration Layer on Top?
- Information acquisition needs:
  - sequence and order, completeness, case-specific language and constraints to guide and control data entry;
  - ergonomic documentation units, optimized for specialist needs;
  - workflow on series of analogous items, item-centric;
  - low interoperability needs (capability to be mapped!).
- Integration and comprehension need epistemic networks:
  - break up document boundaries, relate facts to wider context;
  - match shared identifiers of items, aggregate alternatives;
  - no preferred direction of search, no cardinality constraints;
  - high interoperability needs (mapping to a global schema).
- Interpretation, story-telling, hypothesis building:
  - explore context, paths, analogies (orthogonal to data acquisition);
  - present in order, resolve alternatives (enforce constraints);
  - deduction and induction.

Slide 38: Epistemic Networks on DLs
[Figure: sources keep their metadata; extracted, normalized metadata form surrogate nodes (Lucy, Johanson's Expedition, Donald Johanson, Hadar) under a core ontology (e.g., CIDOC CRM), connected by indirect co-reference links. Open question: Ethiopia?]
- Metadata at the sources and indirect co-reference links.
- Advantages: easy update; scalable, peer-to-peer.
- Drawbacks: slow querying, concatenation of facts, management of alternatives.
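Answering "Ethiopia?" means concatenating facts at query time. A breadth-first search over surrogate nodes is one simple way to sketch this; the links below are hypothetical extractions, mixing a co-reference link ("same as") with CRM-style property labels, and the search follows links in both directions.

```python
from collections import deque

# Hypothetical links extracted from different sources.
LINKS = [
    ("AL 288-1", "same as", "Lucy"),  # co-reference link
    ("Discovery of Lucy", "P12 occurred in the presence of", "Lucy"),
    ("Discovery of Lucy", "P7 took place at", "Hadar"),
    ("Hadar", "P89 falls within", "Ethiopia"),
    ("Discovery of Lucy", "P14 carried out by", "Donald Johanson"),
]


def build_graph(links):
    """Adjacency lists with both directions of every link."""
    graph = {}
    for s, p, o in links:
        graph.setdefault(s, []).append((p, o))
        graph.setdefault(o, []).append((p + " (inverse)", s))
    return graph


def explain(graph, start, goal):
    """Breadth-first search: shortest chain of facts from start to goal, or None."""
    parent = {start: None}
    queue = deque([start])
    while queue:
        node = queue.popleft()
        if node == goal:
            path = []
            while parent[node] is not None:
                prev, label = parent[node]
                path.append((prev, label, node))
                node = prev
            return path[::-1]
        for label, nxt in graph.get(node, []):
            if nxt not in parent:
                parent[nxt] = (node, label)
                queue.append(nxt)
    return None


for step in explain(build_graph(LINKS), "AL 288-1", "Ethiopia"):
    print(step)
```

No single source states that AL 288-1 is from Ethiopia; the answer is a four-step chain assembled at query time, which is why such networks are easy to update but slower to query.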

Slide 39: Interoperability of Museum Information - Conclusions
- Historical information is factual and contextual. Metadata formats for cultural heritage data must be adequate to the scientific discourse.
- We need small thesauri for museums. It is better to invest in gazetteers (place names) and authority files.
- CRM Core already captures a first sensible museum-archive-library connection: an immense benefit over Dublin Core, with similar effort.
- The co-reference problem is widely ignored (or even feared?). Its scale is extraordinary. Traditional KOS and data cleaning are not enough; we need Web 2.0 methods.
- The capacity to link and transform information is crucial for integrating information in the long term, beyond platforms. The CRM shows how to do that. Understand the historical perspective of information.

