Lis512 lecture 6 identifiers, dublin core and RDF.



identifiers When you are maintaining your information on your own and don't exchange it, you don't need to worry about identifiers. Identifiers come into play when you try to communicate to another party that you are talking about the same thing as before, or that the other party and you mean the same thing.

keys We have already seen keys in a database. A key is simply a column in a table whose value is unique for each line in the table. Put differently, a key is a field in a record that is unique for each record in a table.
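A minimal sketch of what "unique for each record" means in practice, using Python's built-in sqlite3 module. The table name, column names and the second name are assumptions made up for illustration; they are not from the lecture.

```python
import sqlite3

# Hypothetical table and column names, chosen only for illustration.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE person (id INTEGER PRIMARY KEY, name TEXT)")
conn.execute("INSERT INTO person (id, name) VALUES (1, 'Thomas Krichel')")


def insert_duplicate_key(connection):
    """Try to reuse key 1; return True if the database rejected it."""
    try:
        connection.execute(
            "INSERT INTO person (id, name) VALUES (1, 'Someone Else')"
        )
    except sqlite3.IntegrityError:
        # The key column must stay unique, so the second insert fails.
        return True
    return False


rejected = insert_duplicate_key(conn)
row_count = conn.execute("SELECT COUNT(*) FROM person").fetchone()[0]
print(rejected, row_count)
```

The database enforces uniqueness of the key, but it says nothing about whether two rows with different keys describe the same person.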

identifier An identifier is something a bit different. When two records have the same identifier, they are about the same thing. Being about the same thing calls the philosophers into the room. Things become pretty messy. Discussion on identifiers is often very confused, even among experts.

example Record 1 Name: Thomas Krichel Homepage: Record 2 Name: Thomas Krichel Homepage: Are they about the same person?

a person homepage finder Imagine a home page finder. Ask it what the homepage of Thomas Krichel is. It may say that there are two Thomas Krichels with two different homepages. If there is in fact only one Thomas Krichel, the service is wrong.

work to do If we have two records that look as if they describe the same person, we try to unite them. In a practical scenario, most of the time we already have a first Thomas Krichel. When the other Thomas Krichel comes along, we check, decide whether it is the same person, and if so update the homepage setting.

human labour Finding out whether a person is the same as another person is something that no computer can do. We need a person to verify this. Identity can only be controlled by a person. Worse: two people may differ on what “same person” means.

identifier The identifier is a special field that we keep constant over time as long as the record is about the same thing. When we get our first record about Thomas Krichel, we have Name: Thomas Krichel Homepage: Identifier: tk1

changes to the record Thomas gets married Name: Thomas Lechirk Homepage: Identifier: tk1 Later, he decides to change his homepage Name: Thomas Lechirk Homepage: Identifier: tk1
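The record history above can be sketched as a store keyed by identifier. This is only an illustration under assumptions: the homepage URLs are placeholders on example.org (the transcript does not preserve the real ones), and the `update_record` helper is hypothetical.

```python
# Records are stored under their identifier; every other field may change.
# The homepage values are placeholders, not the real addresses.
records = {
    "tk1": {"name": "Thomas Krichel", "homepage": "http://example.org/old-home"},
}


def update_record(store, identifier, **changes):
    """Change fields of an existing record without touching its identifier."""
    if identifier not in store:
        raise KeyError(f"unknown identifier: {identifier}")
    store[identifier].update(changes)
    return store[identifier]


# Thomas gets married and changes his name.
update_record(records, "tk1", name="Thomas Lechirk")
# Later, he changes his homepage.
update_record(records, "tk1", homepage="http://example.org/new-home")

print(records)
```

Anyone who stored "tk1" earlier can still look up the current record, whatever the name and homepage have become.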

problem Since he is now known as Thomas Lechirk, should he not be given an identifier tl1? Well, in general, the resolution of this question depends on the environment that we are operating in. Changing the identifier would destroy its value.

value of the identifier If you change the identifier every time you change the record, it is useless. The value of the identifier is that it is long-lived. With a good identifier, users will still be able to find the same thing even when they query the same dataset hundreds of years from now.

dumb vs intelligent identifiers Identifiers can be dumb or intelligent. Choosing a scheme for identifiers is a key step at the start of a project. Once we are running our project, changing identifiers can only be accomplished at great expense. Dumb identifiers are strings that have no meaning or structure, such as “na$19h-ab9a19]”.

dumb identifiers The advantages are: if they are long and complicated enough, they may be found by web searches; and they are not at risk from lobbying for change. The problem is that it is not easy to resolve the identifier into data about the thing it identifies.
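One way to mint dumb identifiers is to draw random strings and reject collisions. This is a sketch, not a recommended scheme: the function name, the 8-byte length and the in-memory set of issued identifiers are all assumptions made for illustration.

```python
import secrets


def new_dumb_identifier(existing, nbytes=8):
    """Return a random hex string that is not yet in `existing`, and record it."""
    while True:
        candidate = secrets.token_hex(nbytes)  # 2 hex characters per byte
        if candidate not in existing:
            existing.add(candidate)
            return candidate


issued = set()
first = new_dumb_identifier(issued)
second = new_dumb_identifier(issued)
print(first, second)
```

Nothing in such a string invites a change when the name or the publisher of the thing changes, which is exactly the point of a dumb identifier.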

intelligent identifiers Most identifiers have some form of intelligence. That intelligence says something about the identifier to a person who looks at it. The intelligence may also help a computer program to make decisions.

ISBN An ISBN is assigned to each edition of a book. Since 2007, it has 13 digits, the first 3 of which are 978 or 979. Before 2007, it had only 10 digits, without that prefix and with a differently computed check digit. After the prefix come 1 to 5 digits forming the group identifier for a country or country group. Then follows a publisher identifier. Then follows an item number assigned by the publisher. The last digit is a check digit. The parts are often separated by hyphens, but the hyphens are not part of the actual number.
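The intelligence in an ISBN is something a program can use. The sketch below strips the hyphens, checks the ISBN-13 check digit (the first twelve digits are weighted 1, 3, 1, 3, ... and the check digit brings the total to a multiple of 10), and converts an old 10-digit ISBN by adding the 978 prefix and recomputing the check digit. The function names are hypothetical, and the sample number is a commonly quoted example, not one from the lecture.

```python
def clean(isbn):
    """Remove hyphens and spaces, which are not part of the number."""
    return isbn.replace("-", "").replace(" ", "")


def isbn13_check_digit(first12):
    """Compute the 13th digit from the first twelve digits."""
    total = sum(
        int(digit) * (1 if position % 2 == 0 else 3)
        for position, digit in enumerate(first12)
    )
    return str((10 - total % 10) % 10)


def is_valid_isbn13(isbn):
    """True if the string has 13 digits and a correct check digit."""
    digits = clean(isbn)
    if len(digits) != 13 or not digits.isdigit():
        return False
    return isbn13_check_digit(digits[:12]) == digits[12]


def isbn10_to_isbn13(isbn10):
    """Prefix 978, drop the old check digit, and compute the new one."""
    digits = clean(isbn10)
    if len(digits) != 10 or not digits[:9].isdigit():
        raise ValueError("expected a 10-character ISBN")
    body = "978" + digits[:9]
    return body + isbn13_check_digit(body)


print(is_valid_isbn13("978-0-306-40615-7"))
print(isbn10_to_isbn13("0-306-40615-2"))
```

Note that the old check digit is discarded rather than copied, because ISBN-10 and ISBN-13 compute it differently.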

Uniform Resource Locators URL We all know URLs. Speaking in simplified terms, a URL typically has three components. a mechanism a host a local part, local to the host

URL mechanism The URL mechanism essentially says what to do with the thing when you actually find it. Typical URL mechanisms include – http – ftp – telnet – mailto

host The host is the domain name of a machine. Domain names are a hierarchical scheme of names that can be resolved into IP addresses. Example – openlib.org – fafner.openlib.org – I lease openlib.org.

local part Once the resolution reaches the host it needs to know what to send out. This is some sort of local identifier that the host knows about.

URIs The World Wide Web Consortium, in its quest to make the web a more universal information medium, generalized URLs to URIs. URIs are Uniform Resource Identifiers. They don't only allow for locating resources but also for naming them. The structure is similar, but there is a wide variety of mechanisms, called schemes.

example scheme info The info: URI scheme was created by representatives of the library community and the publishing industry as a light-weight way to make existing identifiers available as URIs. has the list of these schemes.

URL examples – mechanism: http – host: openlib.org – local part: /home/krichel/ – mechanism: mailto – host: openlib.org – local part: krichel
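The three components can be pulled apart with Python's standard urllib.parse module. The two URLs below are rebuilt from the components listed in the examples above, so treat them as assumptions; the `describe` helper is hypothetical.

```python
from urllib.parse import urlsplit


def describe(url):
    """Split a URL into the mechanism, host and local part used in the lecture."""
    parts = urlsplit(url)
    if parts.scheme == "mailto":
        # mailto URLs put the local part before the "@" and the host after it.
        local, _, host = parts.path.partition("@")
        return {"mechanism": parts.scheme, "host": host, "local part": local}
    return {
        "mechanism": parts.scheme,
        "host": parts.hostname,
        "local part": parts.path,
    }


web = describe("http://openlib.org/home/krichel/")
mail = describe("mailto:krichel@openlib.org")
print(web)
print(mail)
```

The mailto case needs its own branch because that scheme has no "//host" section, which shows that the three-part picture is a simplification.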

DC contributor URI Definition: An entity responsible for making contributions to the resource. Comment: Examples of a Contributor include a person, an organization, or a service. Typically, the name of a Contributor should be used to indicate the entity.

Dublin Core Dublin Core was initially created at an invitational meeting at OCLC in Dublin, OH. The aim was to create a metadata format that people who are not trained specialists could use to annotate resources on the Web. Nowadays, it serves as a basic common-denominator metadata format in a wide variety of applications.

DC coverage URI Definition: The spatial or temporal topic of the resource, the spatial applicability of the resource, or the jurisdiction under which the resource is relevant. Comment: Spatial topic and spatial applicability may be a named place or a location specified by its geographic coordinates. Temporal topic may be a named period, date, or date range.

DC creator URI Definition: An entity primarily responsible for making the resource. Comment: Examples of a Creator include a person, an organization, or a service. Typically, the name of a Creator should be used to indicate the entity.

DC date URI: Definition: A point or period of time associated with an event in the lifecycle of the resource. Comment: Date may be used to express temporal information at any level of granularity. Recommended best practice is to use an encoding scheme.

DC description Definition: An account of the resource. Comment: Description may include but is not limited to: – an abstract – a table of contents – a graphical representation, or – a free-text account of the resource.

DC format Definition: The file format, physical medium, or dimensions of the resource. Comment: Examples of dimensions include size and duration. Recommended best practice is to use a controlled vocabulary. The MIME types by the IANA are such a vocabulary for files.

DC identifier URI: Definition: An unambiguous reference to the resource within a given context. Comment: Recommended best practice is to identify the resource by means of a string conforming to a formal identification system.

DC language URI: Definition: A language of the resource. Comment: Recommended best practice is to use a controlled vocabulary such as RFC 4646

DC publisher URI: Definition: An entity responsible for making the resource available. Comment: Examples of a Publisher include a person, an organization, or a service. Typically, the name of a Publisher should be used to indicate the entity.

DC relation URI: Definition: A related resource. Comment: Recommended best practice is to identify the related resource by means of a string conforming to a formal identification system.

DC source URI: Definition: A related resource from which the described resource is derived. Comment: The described resource may be derived from the related resource in whole or in part. Recommended best practice is to identify the related resource by means of a string conforming to a formal identification system. Apparently only kept for historical purposes.

DC subject URI: Definition: The topic of the resource. Comment: Typically, the subject will be represented using keywords, key phrases, or classification codes. Recommended best practice is to use a controlled vocabulary. To describe the spatial or temporal topic of the resource, use the DC coverage element.

DC title URI: Definition: A name given to the resource. Comment: Typically, a Title will be a name by which the resource is formally known.

DC type URI: Definition: The nature or genre of the resource. Comment: Recommended best practice is to use a controlled vocabulary such as the DCMI Type Vocabulary. To describe the file format, physical medium, or dimensions of the resource, use the DC Format element.

DC abstract model Basically, this is something like the FRBR for Dublin Core. It may be quite useless in practice, but it makes for beautiful theory.

the resource A resource is anything that can be given an identifier. Since a URI can be given to almost everything, this includes a vast array of "stuff". But the important thing is that the resource is actually identified by some identifier, rather than being named. It's essentially the same thing as an entity in FRBR.

property / value pair A description occurs when a bunch of property / value pairs are associated with a resource. This is like the attributes in FRBR, except that it counts them together with their values.

property values Property values can be of two forms. They can be literals, i.e. human understandable strings. Or they can be resources, which again can be described. When a resource is used as a value, we quote an identifier.

example Resource: – DC.Creator: Thomas Krichel – DC.Format: text/html Note: properties themselves are identified by URIs.
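A description like this one can be sketched as a list of property / value pairs attached to a resource and written out one statement per line in the N-Triples syntax of RDF. The resource URIs on example.org are placeholders (the transcript does not preserve the original), and the third pair is added only to show a value that is itself a resource. The property URIs use the Dublin Core elements namespace.

```python
from typing import List, NamedTuple, Tuple, Union

DC = "http://purl.org/dc/elements/1.1/"  # namespace of the Dublin Core elements


class Resource(NamedTuple):
    """A value that is itself a resource, quoted by its identifier."""
    uri: str


Value = Union[str, Resource]  # a plain str stands for a literal


def to_ntriples(subject_uri: str, pairs: List[Tuple[str, Value]]) -> List[str]:
    """Turn property / value pairs about one resource into N-Triples lines."""
    lines = []
    for property_uri, value in pairs:
        if isinstance(value, Resource):
            obj = f"<{value.uri}>"
        else:
            # Literals are quoted; backslashes and quotes must be escaped.
            escaped = value.replace("\\", "\\\\").replace('"', '\\"')
            obj = f'"{escaped}"'
        lines.append(f"<{subject_uri}> <{property_uri}> {obj} .")
    return lines


page = "http://example.org/some-page"  # hypothetical resource identifier
description = [
    (DC + "creator", "Thomas Krichel"),
    (DC + "format", "text/html"),
    (DC + "relation", Resource("http://example.org/other-page")),  # hypothetical
]

triples = to_ntriples(page, description)
for line in triples:
    print(line)
```

Each output line reads as resource, property, value, which is the same shape in the DC abstract model and in RDF.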

similar: the resource description framework The resource description framework was developed by the W3C as a way to represent information generically, such that it can be processed via the Web. It again is a conceptual model. RDF can be expressed in a variety of syntaxes. The XML syntax of RDF is very ugly.

general idea The general idea is that of a semantic web. Of such a semantic web you could ask real questions, such as what tomorrow's weather will be here, or which athlete won the most gold medals in the Olympic Games of a given year. Such a system is many years away. Computer scientists are not making much progress with this.

key problem The key problem is identity. You and I may understand what a gold medal is and what the Olympic Games are. But teaching a machine, or even agreeing among ourselves on, what each precisely means appears to be a matter of great difficulty. Recently, one research project, DBpedia, used Wikipedia headings as concept names.