© Copyright 2011 TopQuadrant Inc. Evolving Practices of Linked Data. Irene Polikoff, TopQuadrant. June 29-30, 2011, W3C Government Linked Data Working Group.


1 Evolving Practices of Linked Data
Irene Polikoff, TopQuadrant
June 29-30, 2011
W3C Government Linked Data Working Group

2 What is data?
Data has:
- value
- type
- structure
- units of measure
- encoding
- bit and byte order
This is not a topic of this presentation, but many questions relevant to the interpretation of data depend on these attributes.

3 What is Linked Data?
- A set of best practices for publishing and connecting structured data on the Web
- A method of publishing structured data so that it can be interlinked and become more useful. It builds upon standard Web technologies such as HTTP and URIs, but rather than using them to serve web pages for human readers, it extends them to share information in a way that can be read automatically by computers. This enables data from different sources to be connected and queried.

4 How is LD publishing being done today?
- SPARQL endpoints
- Making static serialized RDF available at a URL
- A URL that corresponds to the base namespace?
- Content negotiation (a person gets an HTML document, a machine gets RDF)
- Structured markup embedded in HTML (RDFa, microdata, microformats)
- Provided as a meta tag link in an HTML page pointing to the corresponding RDF file
- Zipped RDF files downloadable from the web
- ???
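To make the "link in an HTML page" option concrete, here is a minimal sketch of how a client might discover the RDF file from the page's markup. It assumes the publisher uses a `<link rel="alternate">` element with an RDF media type, which is one common convention but not something the slide specifies; the page content and the file name `thesaurus.rdf` are hypothetical.

```python
from html.parser import HTMLParser

# Media types treated as RDF serializations in this sketch (an assumption).
RDF_TYPES = {"application/rdf+xml", "text/turtle"}


class RdfLinkFinder(HTMLParser):
    """Collects href values of <link rel="alternate"> tags with an RDF type."""

    def __init__(self):
        super().__init__()
        self.rdf_links = []

    def handle_starttag(self, tag, attrs):
        if tag != "link":
            return
        a = dict(attrs)
        rels = (a.get("rel") or "").lower().split()
        media_type = (a.get("type") or "").lower()
        if "alternate" in rels and media_type in RDF_TYPES and a.get("href"):
            self.rdf_links.append(a["href"])


def find_rdf_links(html_text):
    """Return the hrefs of all RDF alternates advertised by the page."""
    finder = RdfLinkFinder()
    finder.feed(html_text)
    return finder.rdf_links


# Hypothetical page; the file name is made up for illustration.
PAGE = """<html><head>
<link rel="stylesheet" type="text/css" href="style.css">
<link rel="alternate" type="application/rdf+xml" href="thesaurus.rdf">
</head><body>Core Subject Thesaurus</body></html>"""

print(find_rdf_links(PAGE))
```

The stylesheet link is ignored because neither its `rel` nor its `type` matches; only the RDF alternate is reported.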

5 What factors influence LD publishing decisions?
- Available infrastructure and its constraints
- Cost
- Data consumers' preferences
- Size of the data being published
- Frequency of change
- Skills and knowledge of the data publisher
- W3C recommendations
- ???

6 A data consumer viewpoint: in favor of a SPARQL endpoint
At the latest EBI Industry Day, industry reps requested that EBI curated content be made available as a SPARQL endpoint as opposed to, e.g., being published as a large download or re-hosted by a third party. The following arguments were made:
- Ease of access. The datasets are very large and are updated regularly. Downloading an entire dataset is time-consuming and costly. A (high-performance) SPARQL endpoint allows a client to specify just the data they want and get it in a just-in-time manner.
- Currency. The datasets change often; users want to know that they have the latest version without having to perform tedious checks at every access.
- Authority. The users of this data trust the EBI curation and don't know if they can trust a third party. Was the data corrupted? Is it the version it claims to be?
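The "ease of access" argument can be illustrated with a small sketch of how a client asks an endpoint for just the data it wants. In the SPARQL Protocol, a query can be sent as an HTTP GET with the query text in the `query` parameter. The endpoint address below is hypothetical, and the queried resource is only a placeholder borrowed from the example data in this deck.

```python
from urllib.parse import urlencode

# Hypothetical endpoint address, used only for illustration.
ENDPOINT = "http://example.org/sparql"

# Ask for the preferred labels of a single resource instead of
# downloading the whole dataset.
QUERY = """PREFIX skos: <http://www.w3.org/2004/02/skos/core#>
SELECT ?label WHERE {
  <http://my.site.com/#Recreation> skos:prefLabel ?label .
}"""


def sparql_get_url(endpoint, query):
    """Build a SPARQL Protocol GET URL with the query in the 'query' parameter."""
    return endpoint + "?" + urlencode({"query": query})


url = sparql_get_url(ENDPOINT, QUERY)
print(url)
```

The resulting URL can be fetched with any HTTP client, typically with an `Accept` header naming the desired result format.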

7 Each publishing approach requires guidance on best practices
For example, for content negotiation:
- How does a client identify its requirements (RDF/XML, Turtle, HTML, SPARQL endpoint)? The Turtle submission suggests the MIME type text/turtle for Turtle.
- What types of content can be negotiated? (SPARQL endpoint? RDF/XML? Turtle? N-Triples? OWL/XML?)
- Must all negotiated variants contain the same information? What does this mean when different formats have different interpretations (e.g., OWL/XML vs. Turtle)?
- Must all negotiated variants have the same prefix definitions? What about forms that don't have a notion of prefixes (N-Triples, HTML)?
And in a more general sense:
- How are versions managed (e.g., using owl:versionInfo)? How are the URLs for various versions managed?
- If one dataset uses resources from another, how does it indicate this? Just use it? rdfs:seeAlso? owl:imports? What is the appropriate behavior of a client in these situations?
- Is there any relationship between the location at which a file is found and the URIs it describes? How about its base URI, owl:Ontology or default namespace?
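As one way to picture the content negotiation questions above, the following sketch shows a server choosing among its available variants based on the client's `Accept` header. It is deliberately simplified: it orders preferences by `q` value only and ignores the finer precedence rules of HTTP (for example, specific types outranking wildcards). The list of variants is an assumption for illustration.

```python
def parse_accept(header):
    """Return (media_type, q) pairs from an Accept header, best first."""
    prefs = []
    for part in header.split(","):
        fields = [f.strip() for f in part.split(";")]
        media_type = fields[0].lower()
        if not media_type:
            continue
        q = 1.0  # default quality when no q parameter is given
        for param in fields[1:]:
            name, _, value = param.partition("=")
            if name.strip().lower() == "q":
                try:
                    q = float(value)
                except ValueError:
                    q = 0.0
        prefs.append((media_type, q))
    # sorted() is stable, so ties keep the order the client listed them in.
    return sorted(prefs, key=lambda p: p[1], reverse=True)


def matches(pattern, media_type):
    """True if an Accept pattern such as text/* covers the media type."""
    if pattern == "*/*":
        return True
    if pattern.endswith("/*"):
        return media_type.startswith(pattern[:-1])
    return pattern == media_type


def negotiate(accept_header, available):
    """Pick the first available variant matching the client's best preference."""
    for pattern, q in parse_accept(accept_header):
        if q <= 0:
            continue
        for media_type in available:
            if matches(pattern, media_type):
                return media_type
    return None  # a server would typically answer 406 Not Acceptable


# Variants this hypothetical server can produce.
VARIANTS = ["text/html", "application/rdf+xml", "text/turtle"]

print(negotiate("text/turtle, application/rdf+xml;q=0.8", VARIANTS))
print(negotiate("text/html,application/xhtml+xml,*/*;q=0.5", VARIANTS))
print(negotiate("application/n-triples", VARIANTS))
```

The first call models an RDF-aware client, the second a typical browser, and the third a client asking for a format the server does not offer.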

8 An example of what we may see when we look at the published data
"http://my.site.com/#Recoveries" Recoveries Recouvrement Bankruptcies Debtors Seizure (of property) The regaining of something of value, such as property or funds lent, as a result of special efforts by the owner or creditor. EC Economics and Industry
"http://my.site.com/#Recovery%20plans%20%28Environment%29" Recovery plans (Environment) Environmental management NE Nature and Environment
"http://my.site.com/#Recreation" Recreation Loisir Entertainment Hobbies Leisure Recreational activities Games Recreational facilities Sports Tourism Toys Outdoor recreation An activity that diverts, amuses or stimulates usually done in one's spare time. SO Society and Culture
Government of Canada Core Subject Thesaurus

9 Issues with the example
- Minting new URIs in someone else's namespace, e.g., skos:UsedFor, skos:SubjectCategory, etc.
- Providing no type definitions for the new URIs
- (Possibly) making errors in URIs: did they mean skos:scopeNote or skos:ScopeNote?
- (Potentially) misusing URIs: did they mean skos:narrower when they said skos:NarrowerTerm? If so, it is an object property
- Inventing a way to do language tags, e.g., skos:French, perhaps because they are not aware of how to do this correctly
- Not following the convention of lower camel case for properties
- Not linking their own data: the values of skos:NarrowerTerm and skos:RelatedTerm are all strings
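Several of these issues could be caught mechanically. The sketch below checks local names used in the skos: namespace against the terms the SKOS core vocabulary defines, and flags names that differ only by letter case. The function name and its messages are made up for illustration; the term list reflects the SKOS core vocabulary as I understand it and should be verified against the specification before being relied on.

```python
# Classes and properties defined in the SKOS core namespace
# (http://www.w3.org/2004/02/skos/core#).
SKOS_TERMS = {
    "Concept", "ConceptScheme", "Collection", "OrderedCollection",
    "prefLabel", "altLabel", "hiddenLabel", "notation",
    "note", "changeNote", "definition", "editorialNote",
    "example", "historyNote", "scopeNote",
    "semanticRelation", "broader", "narrower", "related",
    "broaderTransitive", "narrowerTransitive",
    "inScheme", "hasTopConcept", "topConceptOf",
    "member", "memberList",
    "mappingRelation", "exactMatch", "closeMatch",
    "broadMatch", "narrowMatch", "relatedMatch",
}


def check_skos_term(local_name):
    """Classify a local name used with the skos: prefix (hypothetical helper)."""
    if local_name in SKOS_TERMS:
        return "ok"
    by_lower = {term.lower(): term for term in SKOS_TERMS}
    match = by_lower.get(local_name.lower())
    if match:
        return "case error, did you mean skos:" + match + "?"
    return "not defined in SKOS"


# Terms mentioned on this slide, plus one correct term for comparison.
for name in ["prefLabel", "ScopeNote", "NarrowerTerm", "RelatedTerm", "French"]:
    print("skos:" + name, "->", check_skos_term(name))
```

A check like this only detects names that do not exist in the vocabulary; it cannot tell whether an existing property is being used with the right kind of value.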

10 One possible guideline or test
Assuming that information about a resource should be found at the place its URI resolves to, a resource like skos:RelatedTerm should be available there, which it isn't.
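The test on this slide could be sketched roughly as follows: expand the prefixed name to a full URI, work out which document an HTTP client would actually fetch (the fragment after `#` is not sent to the server), and check whether that document describes the URI. To stay self-contained, the sketch skips the network fetch and runs the check against a one-line N-Triples sample; the helper names and the sample are assumptions for illustration.

```python
from urllib.parse import urldefrag

PREFIXES = {"skos": "http://www.w3.org/2004/02/skos/core#"}


def expand(curie, prefixes=PREFIXES):
    """Expand a prefixed name such as skos:related to a full URI."""
    prefix, _, local = curie.partition(":")
    return prefixes[prefix] + local


def document_url(uri):
    """Return the URL a client fetches: the URI without its fragment."""
    return urldefrag(uri).url


def is_described(uri, ntriples_text):
    """True if the URI appears as the subject of any N-Triples line."""
    subject = "<" + uri + "> "
    return any(line.startswith(subject) for line in ntriples_text.splitlines())


# Stand-in for the fetched vocabulary document (a single illustrative triple).
SAMPLE = (
    "<http://www.w3.org/2004/02/skos/core#related> "
    "<http://www.w3.org/1999/02/22-rdf-syntax-ns#type> "
    "<http://www.w3.org/2002/07/owl#ObjectProperty> .\n"
)

for curie in ["skos:related", "skos:RelatedTerm"]:
    uri = expand(curie)
    print(curie, "fetch:", document_url(uri), "described:", is_described(uri, SAMPLE))
```

In a real check, `SAMPLE` would be replaced by the content retrieved from `document_url(uri)`, parsed with a proper RDF parser rather than matched line by line.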

11 Looking at examples helps
- There will be issues the working group would not have thought possible
- Understanding these will provide the needed scope and level of detail for best practices
- One good resource is the Pedantic Web Group

12 More questions to address - 1
- If you use someone else's vocabulary, do you include type declarations, effectively replicating information? It is very common to see something like: foaf:Person a owl:Class
- What role do imports play, if any, in Linked Data publishing?
- What name do we give to a set of graphs (ontologies) that belong together, e.g., skos and skos-xl, or the QUDT ontology collection? What should be the relationship between their URIs/namespaces? TQ has built grammars to resolve this

13 More questions to address - 2
- What information should be returned for a resource? All triples that it is a subject of? What about back links? What if the resource is a class?
- How to express vocabulary and data mappings? owl:sameAs, owl:equivalentClass, etc. are commonly used, sometimes without understanding the semantic commitment. SKOS mapping properties are an alternative. What about more complex mappings? At TQ, we use SPIN (SPARQL) maps
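The first question, what to return for a resource, can be made concrete with a small sketch over an in-memory list of triples. It contrasts returning only the triples where the resource is the subject with also returning back links, where it is the object. The data and the `ex:` prefix are hypothetical, and this is only one possible reading of "back links".

```python
# Hypothetical data: (subject, predicate, object) tuples in prefixed form.
TRIPLES = [
    ("ex:Recreation", "skos:prefLabel", '"Recreation"@en'),
    ("ex:Recreation", "skos:narrower", "ex:Sports"),
    ("ex:Leisure", "skos:related", "ex:Recreation"),
    ("ex:Sports", "skos:prefLabel", '"Sports"@en'),
]


def describe(resource, triples, include_backlinks=False):
    """Return triples about a resource, optionally with incoming links."""
    result = [t for t in triples if t[0] == resource]
    if include_backlinks:
        # Back links: triples that point at the resource from elsewhere.
        result += [t for t in triples if t[2] == resource and t[0] != resource]
    return result


print(describe("ex:Recreation", TRIPLES))
print(describe("ex:Recreation", TRIPLES, include_backlinks=True))
```

Without back links, a client looking up ex:Recreation never learns that ex:Leisure is related to it, which is the trade-off the slide is asking about.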

14 Thank You
Irene Polikoff

