The Web of Linked Data Information Universe Seongmin Lim Dept. of Industrial Engineering Seoul National University.


1 The Web of Linked Data Information Universe Seongmin Lim hovern@snu.ac.kr Dept. of Industrial Engineering Seoul National University

2 Contents
• Foundations of Dataspaces and Linked Data
  - Where do they overlap?
• The Web of Linked Data
  - What data is out there?
• Linked Data Applications
  - What is being done with the data?
• Remarks on
  - Identity
  - Self-descriptive Data
  - Pay-as-you-go Integration

3 From data integration systems to dataspaces
• In order to cope with the growing number of data sources
• Properties of dataspaces
  - may contain any kind of data (structured, semi-structured, unstructured)
  - require no upfront investment in a global schema
  - provide for data coexistence
  - give best-effort answers to queries
  - rely on pay-as-you-go data integration

4 Linked data principles
• For publishing structured data on the general Web
• Tim Berners-Lee
  1. Use URIs as names for things.
  2. Use HTTP URIs so that people can look up those names.
  3. When someone looks up a URI, provide useful RDF information.
  4. Include RDF statements that link to other URIs so that they can discover related things.
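A minimal sketch of principles 2 and 3, assuming the client asks for RDF through HTTP content negotiation. The helper name `build_lookup_request` and the choice of the `application/rdf+xml` media type are illustrative assumptions, not something the slides prescribe. The request is only constructed here, not sent.

```python
from urllib.request import Request

def build_lookup_request(uri: str) -> Request:
    """Prepare (but do not send) an HTTP GET that asks for RDF."""
    # Principle 2: the name is an HTTP URI, so it can be looked up.
    # Principle 3: the Accept header states that RDF is the preferred answer.
    return Request(uri, headers={"Accept": "application/rdf+xml"})

# URI taken from the DBpedia slide of this presentation.
req = build_lookup_request("http://dbpedia.org/resource/Calgary")
print(req.get_method(), req.full_url, req.get_header("Accept"))
```

Passing `req` to `urllib.request.urlopen` would perform the actual lookup; whether RDF comes back depends on the server.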

5 From classic Web to Web 2.0
Classic Web: single global information space
  1. Small set of simple standards
  2. Hyperlinks to connect everything
Web 2.0: no single global dataspace
  1. APIs have proprietary interfaces
  2. Mashups draw on a fixed set of data sources
  3. No hyperlinks between different APIs

6 Web APIs slice the Web into Walled Gardens

7 Can’t we just publish data as files?
• PDF
  - Easy to read and publish
• Excel
  - Allows further processing and analysis
• CSV
  - Processing without need for proprietary tools
But…
  - Structure of data not explained
  - No connection between different data sets (silos)
  - Static and fixed: can’t retrieve just the slices relevant to a problem

8 Linked data
• Extend the Web with a single global dataspace
  - By using RDF to publish structured data on the Web
  - By setting links between data items within different data sources

9 What is RDF?
• Resource Description Framework
• RDF is the data format for linked data
• It’s about writing down relations between things
• What is RDF for?
  - For everyone to do the same for data
  - To make the Web into a database

10 The essence of RDF: the ‘triple’
• Typical database table
[Figure: a database table, labeled "things" and "properties"]

11 Relations between ‘things’
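One way to read slides 10 and 11 together: each cell of a table says that a thing has some property with some value, which is a (subject, predicate, object) triple. The sketch below assumes a hypothetical `http://example.org/` namespace and made-up rows; none of it is taken from the presentation.

```python
EX = "http://example.org/"  # hypothetical namespace, for illustration only

# A toy "database table": one row per thing, one column per property.
table = {
    "alice": {"name": "Alice", "knows": EX + "bob"},
    "bob": {"name": "Bob"},
}

def table_to_triples(rows):
    """Turn every cell into a (subject, predicate, object) triple."""
    result = []
    for row_id, columns in rows.items():
        for column, value in columns.items():
            result.append((EX + row_id, EX + column, value))
    return result

def objects(data, subject, predicate):
    """Return all objects of triples matching the given subject and predicate."""
    return [o for s, p, o in data if s == subject and p == predicate]

triples = table_to_triples(table)
print(objects(triples, EX + "alice", EX + "knows"))
```

When the object of a triple is itself a URI, as in the `knows` cell, the triple expresses a relation between two things rather than an attribute value.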

12 Using the Web’s infrastructure
• Entities are identified with HTTP URIs
  - Specifically http://

14 Contents
• Foundations of Dataspaces and Linked Data
  - Where do they overlap?
• The Web of Linked Data
  - What data is out there?
• Linked Data Applications
  - What is being done with the data?
• Remarks on
  - Identity
  - Self-descriptive Data
  - Pay-as-you-go Integration

15 Properties of the Web of linked data
• Global, distributed dataspace built on a simple set of standards
  - RDF, URIs, HTTP
• Entities are connected by links
  - This enables the discovery of new data sources
• Provides for data coexistence
  - Everyone can publish data to the Web of Linked Data
  - Everyone can express their personal view on things
  - Everyone can use the schemata that they like for this

16 W3C linking open data project
• Publish existing open license datasets as linked data
• Interlink things between different data sources
• Started in 2007

17 LOD datasets on the Web: July 2009

18 DBpedia
• Community effort to extract structured information from Wikipedia
• Provides data about 3.4 million things
  - 312,000 persons
  - 140,000 organizations
  - 413,000 places
  - 94,000 music albums
  - 49,000 films
  - 146,000 species
  - …
• Provides identifiers for many common things
  - http://dbpedia.org/resource/Calgary
• Overlaps with many other data sources on the Web
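The identifier on this slide follows a recognizable pattern: a fixed prefix plus the name of the Wikipedia article. The sketch below assumes that spaces in the article title become underscores, which is the common DBpedia convention; the function name is hypothetical, and real DBpedia URIs may treat some special characters differently than `quote` does.

```python
from urllib.parse import quote

DBPEDIA_RESOURCE = "http://dbpedia.org/resource/"

def dbpedia_resource_uri(wikipedia_title: str) -> str:
    """Derive a candidate DBpedia identifier from a Wikipedia article title."""
    # Assumed convention: spaces become underscores, as in Wikipedia URLs.
    name = wikipedia_title.strip().replace(" ", "_")
    # Percent-encode anything that is not safe in a URI path segment.
    return DBPEDIA_RESOURCE + quote(name)

print(dbpedia_resource_uri("Calgary"))
print(dbpedia_resource_uri("Seoul National University"))
```

The result is only a candidate name; whether a description exists at that URI has to be checked by looking it up.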

19 Uptake in many areas
• Uptake in life sciences
  - W3C linking open drug data effort
  - Bio2RDF project
  - Allen Brain Atlas
• Governments, libraries, media industry, …

20 The structural continuum
• The Web of linked data is interwoven with the classic Web.
  - Unstructured data: HTML
  - Semi-structured data: RDFa embedded in HTML
  - Structured data: RDF/XML
• Services using named entity recognition to annotate texts with Linked Data URIs
  - Open Calais (Thomson Reuters) for news
  - Zemanta (startup) for blog posts
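To illustrate what "annotating text with Linked Data URIs" means, here is a deliberately naive sketch that matches labels from a small lookup table. It is not how Open Calais or Zemanta work; the `GAZETTEER` contents and the `annotate` function are assumptions made for this example.

```python
# Hypothetical lookup table from surface labels to Linked Data URIs.
GAZETTEER = {
    "Calgary": "http://dbpedia.org/resource/Calgary",
    "Seoul": "http://dbpedia.org/resource/Seoul",
}

def annotate(text, gazetteer):
    """Return (offset, label, uri) for the first mention of each known label."""
    found = []
    for label, uri in gazetteer.items():
        start = text.find(label)
        if start != -1:
            found.append((start, label, uri))
    return sorted(found)  # order annotations by position in the text

for offset, label, uri in annotate("Flights from Seoul to Calgary", GAZETTEER):
    print(offset, label, uri)
```

A real service also has to disambiguate mentions (which "Calgary"?), which plain string matching cannot do.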

21 Contents
• Foundations of Dataspaces and Linked Data
  - Where do they overlap?
• The Web of Linked Data
  - What data is out there?
• Linked Data Applications
  - What is being done with the data?
• Remarks on
  - Identity
  - Self-descriptive Data
  - Pay-as-you-go Integration

22 Linked data browsers
• Provide for navigating between data sources in order to explore the dataspace.
  - Tabulator Browser (MIT, USA)
  - Marbles (FU Berlin, DE)
  - OpenLink RDF Browser (OpenLink, UK)
  - Zitgist RDF Browser (Zitgist, USA)
  - Disco Hyperdata Browser (FU Berlin, DE)
  - Fenfire (DERI, Ireland)

24 Mashups (DBpedia Mobile)

25 Web of data search engines
• Crawl the dataspace and provide best-effort query answers over crawled data.
  - Falcons (IWS, China)
  - Sig.ma (DERI, Ireland)
  - Swoogle (UMBC, USA)
  - VisiNav (DERI, Ireland)
  - Watson (Open University, UK)
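The crawl-then-answer idea can be sketched with an in-memory stand-in for the Web. Everything below is hypothetical: the `WEB` dictionary, the `*.example` URIs, and the function names are assumptions for illustration, and a real crawler would fetch and parse RDF over HTTP instead.

```python
from collections import deque

# Hypothetical stand-in for the Web: URI -> triples published at that URI.
WEB = {
    "http://a.example/alice": [("http://a.example/alice", "knows", "http://b.example/bob")],
    "http://b.example/bob": [("http://b.example/bob", "basedNear", "http://c.example/calgary")],
    "http://c.example/calgary": [("http://c.example/calgary", "label", "Calgary")],
}

def fetch(uri):
    """Stand-in for an HTTP lookup: return the triples published at `uri`."""
    return WEB.get(uri, [])

def crawl(seed, max_documents=100):
    """Breadth-first crawl that follows URIs found in object position."""
    seen, queue, collected, fetched = {seed}, deque([seed]), [], 0
    while queue and fetched < max_documents:
        uri = queue.popleft()
        fetched += 1
        for s, p, o in fetch(uri):
            collected.append((s, p, o))
            if o.startswith("http://") and o not in seen:
                seen.add(o)
                queue.append(o)
    return collected

def search(triples, keyword):
    """Best-effort answer: match literal objects among the crawled triples only."""
    return [t for t in triples
            if not t[2].startswith("http://") and keyword.lower() in t[2].lower()]

print(search(crawl("http://a.example/alice"), "calgary"))
print(search(crawl("http://a.example/alice", max_documents=1), "calgary"))
```

The second call finds nothing because the crawl stopped before reaching the relevant source, which is what "best-effort" means here: answers are only as complete as the crawl.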

27 What are the big players doing?
• Yahoo! and Google have started to crawl Linked Data in its RDFa serialization as well as Microformats.
• Yahoo!
  - provides access to crawled data through the Yahoo! BOSS API
  - uses the data within Yahoo! SearchMonkey to make search results more useful and visually appealing
• Google
  - uses crawled RDF data for its Social Graph API
  - uses crawled data to enhance search result snippets for reviews and people

28 Yahoo! SearchMonkey

29 Contents
• Foundations of Dataspaces and Linked Data
  - Where do they overlap?
• The Web of Linked Data
  - What data is out there?
• Linked Data Applications
  - What is being done with the data?
• Remarks on
  - Identity
  - Self-descriptive Data
  - Pay-as-you-go Integration

30 Identity
• Real-world objects are identified with multiple URIs
  - Coupling of identification and retrieval
  - Data coexistence: everybody can say everything about anything
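When several URIs name the same real-world object, a consumer needs some way to group them. The sketch below assumes that equivalence links between URIs are available as pairs (in Linked Data these are commonly `owl:sameAs` statements, although the slide does not name the property) and groups them with a small union-find. The `example` URIs are hypothetical.

```python
def same_as_clusters(links):
    """Group URIs connected by equivalence links into identity clusters."""
    parent = {}

    def find(x):
        parent.setdefault(x, x)
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving keeps trees shallow
            x = parent[x]
        return x

    for a, b in links:
        parent[find(a)] = find(b)  # merge the two clusters

    clusters = {}
    for uri in list(parent):
        clusters.setdefault(find(uri), set()).add(uri)
    return list(clusters.values())

# Hypothetical equivalence links published by different parties.
links = [
    ("http://dbpedia.org/resource/Calgary", "http://geo.example/place/calgary"),
    ("http://geo.example/place/calgary", "http://news.example/topic/calgary"),
    ("http://people.example/alice", "http://social.example/users/alice"),
]
for cluster in same_as_clusters(links):
    print(sorted(cluster))
```

Each cluster can then be treated as one entity, while every source keeps its own URI and its own statements.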

31 Enable clients to retrieve the schema
• Clients can resolve the URIs that identify vocabulary terms in order to get their RDFS or OWL definitions.
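A small sketch of the first step of that resolution: working out which document to request for a given term URI. HTTP clients drop the fragment (the part after `#`) before sending a request, so a "hash" term resolves to its vocabulary document, while a "slash" term is requested as is. The function name `vocabulary_document` is an assumption for this example.

```python
from urllib.parse import urldefrag

def vocabulary_document(term_uri: str) -> str:
    """Return the URL a client would actually request for a vocabulary term."""
    # The fragment is never sent to the server, so strip it.
    return urldefrag(term_uri).url

# A hash-style term (RDFS) and a slash-style term (FOAF).
print(vocabulary_document("http://www.w3.org/2000/01/rdf-schema#label"))
print(vocabulary_document("http://xmlns.com/foaf/0.1/name"))
```

Fetching the resulting URL with an RDF `Accept` header is then expected to return the term's definition, if the vocabulary publisher follows the practice described on this slide.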

32 Reuse terms from common vocabularies
• Common vocabularies
  - Friend-of-a-Friend (FOAF) for describing people and their social network
  - SIOC for describing forums and blogs
  - SKOS for representing topic taxonomies
  - Organization Ontology for describing the structure of organizations
  - GoodRelations for describing products and business entities
  - Music Ontology for describing artists, albums, and performances
  - Review Vocabulary for representing reviews
• Common sources of identifiers (URIs) for real-world objects
  - LinkedGeoData and Geonames: locations
  - GeneID and UniProt: life science identifiers
  - DBpedia: wide range of things
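Reusing a vocabulary term in practice means using its full URI, usually written in shortened prefix:name form. The sketch below expands such short names for three of the vocabularies listed above; the `PREFIXES` table and `expand` helper are assumptions for illustration, and the terms used are only examples.

```python
# Namespace URIs of three of the vocabularies named on the slide.
PREFIXES = {
    "foaf": "http://xmlns.com/foaf/0.1/",
    "sioc": "http://rdfs.org/sioc/ns#",
    "skos": "http://www.w3.org/2004/02/skos/core#",
}

def expand(short_name: str) -> str:
    """Expand a prefix:name pair into the full term URI."""
    prefix, sep, local = short_name.partition(":")
    if not sep or prefix not in PREFIXES:
        raise KeyError(f"unknown prefix in {short_name!r}")
    return PREFIXES[prefix] + local

print(expand("foaf:name"))
print(expand("skos:broader"))
```

Because two publishers who both use `foaf:name` end up with the identical URI, a consumer can combine their data without any mapping step for that property.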

33 Somebody pays as you go
• The overall data integration effort is split between the data publisher, the data consumer and third parties.
• Data publisher
  - publishes data as RDF
  - publishes data in a self-descriptive fashion
  - sets links and publishes mappings
• Third parties
  - set links pointing at your data
  - publish mappings to the Web
• Data consumer
  - has to do the rest
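The division of labour can be sketched from the consumer's side: apply whatever property mappings publishers and third parties have made available, then see what is left to integrate by hand. All vocabularies, URIs and function names below are hypothetical and chosen only for illustration.

```python
# Hypothetical mappings published by a data publisher or a third party:
# source property URI -> property URI the consumer already understands.
MAPPINGS = {
    "http://shop.example/vocab#title": "http://consumer.example/vocab#name",
}

def apply_mappings(triples, mappings):
    """Rewrite predicates for which a mapping exists; leave the rest untouched."""
    return [(s, mappings.get(p, p), o) for s, p, o in triples]

def remaining_work(triples, understood):
    """Predicates the consumer still has to map: 'the rest'."""
    return sorted({p for _, p, _ in triples if p not in understood})

data = [
    ("http://shop.example/item/1", "http://shop.example/vocab#title", "Road bike"),
    ("http://shop.example/item/1", "http://shop.example/vocab#cost", "499"),
]
integrated = apply_mappings(data, MAPPINGS)
print(integrated)
print(remaining_work(integrated, {"http://consumer.example/vocab#name"}))
```

The data is usable as soon as the first mapping is applied; each further mapping, whoever publishes it, shrinks the list of remaining work.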

34 Summary
• Linked Data moves the dataspace vision to a global scale and adds the social/community aspect to it.
• The Web of Linked Data is growing rapidly
  - active deployment communities in different domains
  - might have exceeded the critical mass
• Great playground for experimentation
  - dataspace profiling
  - probabilistic and approximate schema mapping
  - data fusion, data quality, and trust
  - What will the user interfaces look like?
  - Will search engines turn into answer engines?

35 End of Document Seongmin Lim hovern@snu.ac.kr Dept. of Industrial Engineering Seoul National University

