Linked Open Data: a new resource for eResearch Dr Anne Cregan eResearch Analyst, Intersect and ANDS


1 Linked Open Data: a new resource for eResearch Dr Anne Cregan eResearch Analyst, Intersect and ANDS anne.cregan@intersect.org.au

2 What this talk will cover
– Open data
– The web of data
– RDF triples
– RDF graphs
– The Linked Open Data project
– Publishing to the web of data
– Consuming the web of data

3 Open data
The philosophy and practice of making data freely available to everyone, without restrictions from copyright, patents or other mechanisms of control.

4 Why make data open?
– Public money was used to fund the work, so it should be available to the public.
– Facts cannot legally be copyrighted.
– Sponsors of research do not get full value for money unless the resulting data are made freely available.
– In scientific research, the rate of discovery is accelerated by better access to data.
Source: How to Make the Dream Come True: The Astronomers Data Manifesto (Norris, 2007)

5 How to make open data useful… Principles
– Make it easy to find
– Make it available to everyone
– Separate it from the applications that use it
– Interlink it with related datasets in a meaningful way
– Make it machine processable

6 The web of data
– The web of data = a naming model + a data model on the web
– It’s a web of interlinked data that machines can read (whereas the web is a web of interlinked documents for people to read)
– Also known as the “Semantic Web” because of its formal semantics for reasoning and its relationship to meaning

9 The web of data
– It is an initiative of the World Wide Web Consortium (W3C) and is a collaborative effort of many parties.
– It derives from W3C director Sir Tim Berners-Lee's vision of the Web as a universal medium for data, information, and knowledge exchange.
– Like the web, anyone can publish to it: anyone can say anything about anything.
– However, they need to say it in RDF, not HTML.
– And anything they want to talk about has to be named by a URI.

10 URI = Uniform Resource Identifier
– The naming model for the web of data
– A URI is a unique name that identifies a resource
– A resource is anything to which we can attach identity
– A resource can be an information object, like a document or a webpage, but it can also be a real world object, like a person. It can be anything at all. For example:
  URL: http://www.w3.org/People/Berners-Lee/
  URL: http://www.w3.org/TR/rdf-concepts/
– A URL is a kind of URI that names the resource and also indicates a means of acting upon or obtaining it via its primary access mechanism, e.g. http, ftp
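
The distinction between a URI and a URL can be sketched in a few lines of Python using only the standard library. The two URLs are the ones shown on the slide; the `urn:isbn:` entry is a hypothetical placeholder added here for contrast, and the `is_locator` rule (http, https or ftp means "URL") is a simplifying assumption, not a formal definition.

```python
from urllib.parse import urlsplit

# The two URLs from the slide, plus one assumed non-URL URI for contrast.
uris = [
    "http://www.w3.org/People/Berners-Lee/",
    "http://www.w3.org/TR/rdf-concepts/",
    "urn:isbn:0000000000",  # hypothetical placeholder, not a real book
]

def access_scheme(uri: str) -> str:
    """Return the scheme, which for a URL names the access mechanism."""
    return urlsplit(uri).scheme

def is_locator(uri: str) -> bool:
    """Rough test: treat http/https/ftp URIs as URLs (locators)."""
    return access_scheme(uri) in {"http", "https", "ftp"}

for uri in uris:
    kind = "URL (name + access mechanism)" if is_locator(uri) else "URI (name only)"
    print(f"{uri} -> {kind}")
```

Every entry in the list is a valid name for a resource; only the first two also say how to fetch something about it.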

11 RDF = Resource Description Framework
– A framework for describing and linking resources on the web
– Allows URIs to be connected into a directed graph
– Based on the idea of triples: Subject, Predicate, Object

12 RDF = Resource Description Framework
– A framework for describing and linking resources on the web
– Allows URIs to be connected into a directed graph
– Based on the idea of triples, e.g. subject intersect.org.au/intersect-team/AnneCregan, predicate doac:organization, object intersect.org.au

13 RDF = Resource Description Framework
– Putting triples together creates a graph
[Diagram: intersect.org.au/intersect-team/AnneCregan is linked by doac:organization to intersect.org.au and by doac:organization to ands.org.au]

14 RDF = Resource Description Framework
– Putting triples together creates a graph
– Nodes of the graph are URIs and literals
[Diagram: as on slide 13, plus intersect.org.au/intersect-team/AnneCregan linked by foaf:firstName to the literal “Anne”]

15 RDF = Resource Description Framework
– Has a schema to describe relationships between things, called RDF Schema
[Diagram: as on slide 14]

16 RDF = Resource Description Framework
– Is a World Wide Web Consortium (W3C) Recommendation
– Is part of the Semantic Web “stack”
[Diagram: as on slide 14]
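
The triples in the slide diagrams can be written down as plain Python tuples. This is a minimal sketch, not an RDF library: the names are copied from the slides in their abbreviated form, the direction of each edge (Anne as subject) is read off the diagram, and `build_graph` and `objects` are hypothetical helpers introduced only for illustration.

```python
from collections import defaultdict

# Names are copied from the slides in their abbreviated form; a real graph
# would use full HTTP URIs and expanded prefixes.
ANNE = "intersect.org.au/intersect-team/AnneCregan"

# Each triple is (subject, predicate, object).
triples = [
    (ANNE, "doac:organization", "intersect.org.au"),
    (ANNE, "doac:organization", "ands.org.au"),
    (ANNE, "foaf:firstName", '"Anne"'),  # quoted: a literal, not a URI
]

def build_graph(triples):
    """Index triples as subject -> list of (predicate, object) edges."""
    graph = defaultdict(list)
    for s, p, o in triples:
        graph[s].append((p, o))
    return graph

def objects(graph, subject, predicate):
    """All objects reachable from `subject` along `predicate`."""
    return [o for p, o in graph[subject] if p == predicate]

graph = build_graph(triples)
print(objects(graph, ANNE, "doac:organization"))
```

Each tuple is one directed, labelled edge, so a list of tuples already is the directed graph the slides describe; the index only makes lookups by subject convenient.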

17 Semantic Web Technology Stack
– The Semantic Web standards build on each other
– URI is the naming mechanism
– RDF, RDF-Schema and OWL are the languages for describing resources and relationships between them
– SPARQL is a query language for RDF graphs

18 RDF Graphs Putting triples together creates a directed graph

19 RDF Graphs Putting triples together creates a directed graph

20 RDF Graphs Graphs can be interconnected by referring to URIs in other graphs
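
A short sketch of how two separately published graphs become one once they mention the same URI. All URIs and predicate names below are hypothetical `example.org` placeholders, and the sketch assumes graphs without blank nodes, in which case merging is simply a set union of triples.

```python
# Two independently published graphs (hypothetical example.org URIs).
PERSON = "http://example.org/a/person"
ORG = "http://example.org/b/org"
CITY = "http://example.org/b/city"

graph_a = {(PERSON, "http://example.org/vocab/worksFor", ORG)}
graph_b = {(ORG, "http://example.org/vocab/locatedIn", CITY)}

def merge(*graphs):
    """For graphs without blank nodes, merging is a set union of triples."""
    merged = set()
    for g in graphs:
        merged |= g
    return merged

def reachable(graph, start):
    """Follow edges from `start` and collect every node that can be reached."""
    seen, frontier = set(), [start]
    while frontier:
        node = frontier.pop()
        for s, _p, o in graph:
            if s == node and o not in seen:
                seen.add(o)
                frontier.append(o)
    return seen

web = merge(graph_a, graph_b)
print(reachable(graph_a, PERSON))  # only the organisation
print(reachable(web, PERSON))      # the organisation and, through it, the city
```

Graph A never mentions the city; it becomes reachable only because graph A refers to a URI that graph B also describes.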

21 RDF Graphs

22 Linking Open Data Project
– Community project of the W3C Semantic Web Education and Outreach (SWEO) group
– Started in 2007
– Has grown rapidly as members of the community have added open datasets
– Has created the largest existing RDF graph: over 18 billion triples!

23 Linking Open Data Project October 2007

24 Linking Open Data Project September 2008

25 Linking Open Data Project July 2009

26 Linking Open Data Project July 2009

27 Linking Open Data Project April 2010

28 Linking Open Data Project
As at May 2009 the project had created a linked open data cloud of 4.7 billion RDF triples; in April 2010 Linked Open Numbers added another 14 billion triples.
Datasets include:
– DBpedia: linked data version of Wikipedia
– US Census: 2000 US Census data set
– Gene Ontology: annotations from the Gene Ontology db
– DrugBank: info about FDA approved drugs
– UniProt: life sciences data set
– Lots of bio/life sciences data sets: the BIO2RDF cloud
More info at http://esw.w3.org/topic/TaskForces/CommunityProjects/LinkingOpenData/DataSets

29 Publishing to the Linked Open Data Cloud – Principles
1. Use URIs to name things
2. Use HTTP URIs so you can look up those things on the web
3. When someone looks up a URI, provide useful information (“dereference-able”)
4. Include RDF statements that link to other URIs so that they can discover related things
These principles are from Tim Berners-Lee’s 2007 note: http://www.w3.org/DesignIssues/LinkedData.html
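
The four principles can be illustrated by writing a tiny description in N-Triples, one of the standard RDF serialisations. This is a sketch under stated assumptions: the `example.org` URI and the name "Alice" are hypothetical, the helper functions are invented for this example, and the escaping covers only backslashes, double quotes and newlines. The FOAF namespace and the `card#i` URI are real and the latter appears elsewhere in the talk.

```python
FOAF = "http://xmlns.com/foaf/0.1/"

def literal(text: str) -> str:
    """Format a plain string literal, escaping backslashes, quotes and newlines."""
    escaped = (text.replace("\\", "\\\\").replace('"', '\\"')
                   .replace("\n", "\\n"))
    return f'"{escaped}"'

def uri(value: str) -> str:
    """Format a URI reference for N-Triples."""
    return f"<{value}>"

def to_ntriples(triples) -> str:
    """One '<s> <p> <o> .' statement per line."""
    return "\n".join(f"{s} {p} {o} ." for s, p, o in triples) + "\n"

# Principles 1 and 2: name the thing with an HTTP URI (hypothetical here).
me = "http://example.org/people/alice#me"

doc = to_ntriples([
    # Principle 3: useful information to return when the URI is looked up.
    (uri(me), uri(FOAF + "name"), literal("Alice")),
    # Principle 4: a link to a URI published by someone else.
    (uri(me), uri(FOAF + "knows"),
     uri("http://www.w3.org/People/Berners-Lee/card#i")),
])
print(doc)
```

Serving this text from the web server behind the `example.org` URI would be what makes the name dereference-able; the server setup itself is outside the scope of this sketch.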

30 Consuming linked open data
– Browsing linked data is easy
– You need an RDF browser like Tabulator, Disco, Zitgist, Marbles or OpenLink
– Let’s go for a ride on Disco: http://www4.wiwiss.fu-berlin.de/rdf_browser/
– Start here: http://www.w3.org/People/Berners-Lee/card#i
– We can travel through the linked open data cloud between URIs linked using RDF
– RDF browsers include Marbles: http://www5.wiwiss.fu-berlin.de/marbles

31 Consuming linked open data
eResearch example: Enabling drug discovery
Data sets published to the data cloud:
– Linked CT: Linked Clinical Trials; 60,000 trials in 158 countries
– DrugBank: FDA-approved drugs; 5,000 small molecule and biotech drugs
– Diseasome: disorders and disease genes; 4,300 disorders, disease genes and associations
– DailyMed: chemical structures of marketed drugs; 124,000 triples and 29,600 links
– SWAN: Alzheimers Hypothesis Browser Knowledgebase

32 Consuming linked open data
Using an RDF browser:
– See all drugs in trials for Alzheimer’s disease in Linked CT, including a Phase III trial for Varenicline.
– Follow a link to data from DailyMed showing that Varenicline is already on the market for nicotine addiction. The typical dose is 1 mg twice daily, and the Linked CT trial used no more than that, so it raises no new safety issues.
– Link to DrugBank to find that Varenicline is an alpha-4 beta-2 neuronal nicotinic acetylcholine receptor agonist.
– Diseasome indicates that the corresponding genes are only important in nicotine addiction, not Alzheimer’s.
– But the SWAN Knowledgebase shows there are hypotheses relating Alzheimer’s to nicotinic receptors through amyloid beta.
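
The walk described on this slide amounts to following a shared identifier from one dataset into the next. The sketch below mimics that with in-memory data: the identifiers (`trial:phase3`, `drug:varenicline`) and predicate names are invented for illustration and are not the real LinkedCT, DailyMed or DrugBank vocabularies; the facts themselves are the ones stated on the slide.

```python
# Hypothetical mini-datasets: identifiers and predicate names are invented
# for illustration; the facts themselves are the ones stated on the slide.
datasets = {
    "LinkedCT": [
        ("trial:phase3", "studiesCondition", "Alzheimer's disease"),
        ("trial:phase3", "testsDrug", "drug:varenicline"),
    ],
    "DailyMed": [
        ("drug:varenicline", "marketedFor", "nicotine addiction"),
    ],
    "DrugBank": [
        ("drug:varenicline", "mechanism",
         "alpha-4 beta-2 nicotinic acetylcholine receptor agonist"),
    ],
}

def describe(resource):
    """Collect every statement about `resource`, tagged with its source dataset."""
    return [(name, p, o)
            for name, triples in datasets.items()
            for s, p, o in triples
            if s == resource]

def drugs_in_trials_for(condition):
    """Find drugs tested in any trial that studies `condition`."""
    trials = {s for s, p, o in datasets["LinkedCT"]
              if p == "studiesCondition" and o == condition}
    return [o for s, p, o in datasets["LinkedCT"]
            if p == "testsDrug" and s in trials]

for drug in drugs_in_trials_for("Alzheimer's disease"):
    for source, predicate, value in describe(drug):
        print(f"{source}: {drug} {predicate} {value}")
```

No identifier mapping step appears in the code because all three datasets already use the same name for the drug, which is the point the talk makes about interlinking having been done in advance.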

33 Consuming linked open data
Using the linked open data cloud with an RDF browser, it is possible to:
– Browse data relating to companies, clinical trials, drugs, diseases and genetic variation
– See when extra data is available
– Gain access to data without needing to map identifiers and synonyms: interlinking has already been done
– Gain additional insights about interesting questions to ask
Jentzsch et al., “Enabling Tailored Therapeutics with Linked Data”: events.linkeddata.org/ldow2009/papers/ldow2009_paper9.pdf

34 Consuming linked open data
Querying using SPARQL queries
– A SPARQL endpoint enables users (human or other) to query a knowledge base via the SPARQL language.
– Results are typically returned in one or more machine-processable formats.
Examples: http://wiki.dbpedia.org/OnlineAccess
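
As a rough sketch of what talking to a SPARQL endpoint looks like, the Python below builds (but does not send) an HTTP request for DBpedia's public endpoint. The endpoint address and the example resource are assumptions about the current DBpedia service rather than something stated in the talk; the `query` parameter and the JSON results media type come from the SPARQL protocol.

```python
from urllib.parse import urlencode
from urllib.request import Request

ENDPOINT = "http://dbpedia.org/sparql"  # DBpedia's public endpoint (assumed reachable)

# Ask for up to ten statements about one resource.
QUERY = """
SELECT ?p ?o WHERE {
  <http://dbpedia.org/resource/Tim_Berners-Lee> ?p ?o .
} LIMIT 10
"""

def build_request(endpoint: str, query: str) -> Request:
    """Build an HTTP GET request that carries the query in the `query` parameter."""
    url = endpoint + "?" + urlencode({"query": query})
    # Ask for the SPARQL JSON results format.
    return Request(url, headers={"Accept": "application/sparql-results+json"})

request = build_request(ENDPOINT, QUERY)
print(request.full_url)
# To run it: urllib.request.urlopen(request).read()  (needs network access)
```

Keeping the request construction separate from the network call makes the example safe to run offline; the response, when fetched, is a machine-processable result set as the slide describes.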

35 Types of Queries
– Selection and extraction queries retrieve parts of the data based on its content, structure, or position
– Reduction queries specify which part of the data not to include in the answer
– Restructuring queries restructure data into other formats/serialisations
– Aggregation queries aggregate several data items into one new data item
– Combination and inference queries combine information that is not explicitly connected
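
Four of these query types can be illustrated on a handful of in-memory triples. This is a plain-Python sketch rather than SPARQL: the data reuses the abbreviated names from the RDF slides, and combination/inference queries are left out because they need rules or a reasoner beyond what a short example can show.

```python
from collections import Counter

# Toy data reusing the names from the RDF slides (abbreviated, as shown there).
ANNE = "intersect.org.au/intersect-team/AnneCregan"
triples = [
    (ANNE, "doac:organization", "intersect.org.au"),
    (ANNE, "doac:organization", "ands.org.au"),
    (ANNE, "foaf:firstName", "Anne"),
]

# Selection: keep only the triples matching a pattern.
selection = [t for t in triples if t[1] == "doac:organization"]

# Reduction: state what to leave out instead of what to keep.
reduction = [t for t in triples if t[1] != "foaf:firstName"]

# Restructuring: the same data in another shape (subject -> {predicate: [objects]}).
restructured = {}
for s, p, o in triples:
    restructured.setdefault(s, {}).setdefault(p, []).append(o)

# Aggregation: several data items become one new item (a count per predicate).
aggregation = Counter(p for _s, p, _o in triples)

print(selection, reduction, restructured, aggregation, sep="\n")
```

On this data the selection and the reduction happen to return the same two triples, which shows that the two query types differ in how the request is phrased rather than in what can be retrieved.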

36 Summary
– Open data
– The web of data
– RDF triples
– RDF graphs
– The Linked Open Data project
– Publishing to the web of data
– Consuming the web of data

37 Thank you
More details are at
– http://linkeddata.org/
– http://esw.w3.org/topic/SweoIG/TaskForces/CommunityProjects/LinkingOpenData
– http://www.w3.org/2001/sw/
Questions and comments may be emailed to anne.cregan@intersect.org.au

