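Kim Woo Ju, Department of Information and Industrial Engineering, Yonsei University. Contents: I. The Information Flood and How to Overcome It II. Building and Using Linked Data III. LOD2 - The Future of Semantic Technology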


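1 Kim Woo Ju, Department of Information and Industrial Engineering, Yonsei University

2 Contents I. The Information Flood and How to Overcome It II. Building and Using Linked Data III. LOD2 - The Future of Semantic Technology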

3 3

4 An Instrumented Interconnected World: 2+ billion people on the Web by end 2011; 30 billion RFID tags today (1.3B in 2005); 4.6 billion camera phones worldwide; 100s of millions of GPS-enabled devices sold annually; 76 million smart meters in 2009, 200M by 2014; 12+ TBs of tweet data every day; 25+ TBs of log data every day; ? TBs of data every day

5 Information Overflow on the Web Growth of the Web  The amount of information available on the Web is growing rapidly.  The February 2014 Netcraft survey counted at least 920,120,079 sites (http://news.netcraft.com/archives/category/web-server-survey/).

6 Information Overflow on the Web The Indexed Web contains at least 19.8 billion pages (Sunday, 02 March, 2014).  http://www.worldwidewebsize.com/

7 Information Overflow Problems  How to cover all available information? - Recall  How to find the relevant information? - Precision Not data (search), but integration, analysis and insight, leading to decisions and discovery
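
The two measures named on this slide can be made concrete with a small calculation. The sketch below uses the standard definitions (precision = relevant retrieved / retrieved, recall = relevant retrieved / relevant); the function name and the document ids are hypothetical and only serve the illustration.

```python
def precision_recall(retrieved, relevant):
    """Return (precision, recall) for one query, given collections of document ids."""
    retrieved, relevant = set(retrieved), set(relevant)
    hits = len(retrieved & relevant)  # relevant documents that were actually returned
    precision = hits / len(retrieved) if retrieved else 0.0
    recall = hits / len(relevant) if relevant else 0.0
    return precision, recall


# Hypothetical query: 4 pages returned, 5 pages truly relevant, 2 in common.
p, r = precision_recall({"d1", "d2", "d3", "d4"}, {"d2", "d4", "d5", "d6", "d7"})
print(p, r)  # 0.5 0.4
```

Covering more of the Web tends to raise recall, while better relevance filtering raises precision; the two usually have to be traded off.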

8 Example Query to Google: ‘iPad’ search example

9 Information Silo Problem Stove-piped Systems and Poor Content Aggregation

10 Semantic Interoperability To cope with the problems mentioned in the preceding slide, we need Semantic Interoperability. Semantics  “The meaning or the interpretation of a word, sentence, or other language form.” What is Semantic Interoperability?  “Processing or integration of resources based on an understanding of what is intended or expressed by other systems or parties.”

11 Front-endedness?

12 What if I want to... Move my content from one place to another?  RSS? Not enough. Aggregate my data?  An open FriendFeed? Re-use my Flickr friends on Twitter?  Invite. Again and again... The Semantic Web and ontologies can help!  By providing a common framework to interlink data from various providers in an open way.

13 How is it Possible? Ontology: agreement on a common vocabulary and domain knowledge. Semantic Annotation: metadata (manual and automatic metadata extraction). Reasoning: semantics-enabled search, integration, analysis, mining, discovery.

14 Semantic Web Layer Cake

15 Three Technical Building Blocks Basic Building Blocks  URIs for unambiguous names for resources,  RDF as a common data model for expressing metadata,  Ontology (OWL) for common vocabularies. Semantic Web becomes:  web of data/things/concepts What is a Thing/Concept? It can be anything in the world - a movie, a person, a disease, a location… Machines will be able to understand the concept behind an HTML page: this page is talking about ‘Barack Obama’, he is a ‘Person’ and he is the ‘President of USA’.
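
As a rough illustration of how the three building blocks fit together, the Barack Obama statements can be written as subject/predicate/object triples whose names are URIs. This is a minimal sketch: every URI under example.org and the property names holdsOffice and isAbout are assumptions made up for the example; only the rdf:type URI is the real RDF one.

```python
# Hypothetical URIs under example.org; they only illustrate the naming idea.
EX = "http://example.org/"
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"

triples = {
    (EX + "BarackObama", RDF_TYPE, EX + "Person"),
    (EX + "BarackObama", EX + "holdsOffice", EX + "PresidentOfUSA"),
    (EX + "page.html", EX + "isAbout", EX + "BarackObama"),
}


def describe(subject, graph):
    """Return every (predicate, object) pair recorded for one subject URI."""
    return sorted((p, o) for s, p, o in graph if s == subject)


facts = describe(EX + "BarackObama", triples)
for predicate, obj in facts:
    print(predicate, "->", obj)
```

Because the page, the person and the class are all named by URIs, a program can follow from the page to the thing it is about without parsing any prose.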

16 Who borrows this Idea? Facebook  Facebook Open Graph Protocol and Graph Search Google  Knowledge Graph Twitter  Real-time Semantic Web with Twitter Annotations

17 17

18 Linked Data Building a “Web of Data” to enhance the current Web The Linking Open Data (LOD) project:  http://linkeddata.org/  Translating existing datasets into RDF and linking them together. For example, DBpedia (Wikipedia), GeoNames, Freebase, BBC programmes, etc.  Government data is also available as Linked Data: DATA.gov, DATA.gov.uk

19 The LOD cloud (figures: 2007 and 2008)

20 The LOD cloud (figures: 2008 and 2009)

21 Web of Data

22 Web of Data (Statistics) The size of the Web of Data  It can be estimated from the data set statistics that the LOD community collects in the ESW wiki.  According to these statistics, as of 09/19/2011 the Web of Data consisted of 31 billion RDF triples, interlinked by around 500 million RDF links.

23 Types of Linked Data Applications: how Linked Data can be used

24 Semantic Search Engines Top 7 Semantic Search Engines as an Alternative to Google  Kngine  Hakia  Kosmix: now part of @WalmartLabs  DuckDuckGo  Evri: specialized for iPad and iPhone  Powerset: now part of Bing  Truevert: focuses only on environmental concerns.

25 What is the Purpose of RDF? The purpose of RDF (Resource Description Framework) is to give a standard way of specifying data "about" something. Here's an example of an XML document that specifies data about China's Yangtze river. [XML listing not preserved in this transcript; its values are: 6300 kilometers; western China's Qinghai-Tibet Plateau; East China Sea.] The document reads as: "Here is data about the Yangtze River. It has a length of 6300 kilometers. Its startingLocation is western China's Qinghai-Tibet Plateau. Its endingLocation is the East China Sea."
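
The XML listing itself did not survive in this transcript, so the following is only a plausible reconstruction. The element names length, startingLocation and endingLocation come from the reading quoted on the slide; the River element, the id attribute and the namespace URI are assumptions borrowed from later slides. The Python wrapper simply checks that the document parses and yields the three values.

```python
import xml.etree.ElementTree as ET

# Assumed reconstruction: names are inferred from the slide's plain-English
# reading, not copied from the original listing.
YANGTZE_XML = """<River id="Yangtze" xmlns="http://www.geodesy.org/river">
  <length>6300 kilometers</length>
  <startingLocation>western China's Qinghai-Tibet Plateau</startingLocation>
  <endingLocation>East China Sea</endingLocation>
</River>"""

NS = "{http://www.geodesy.org/river}"  # ElementTree prefixes tags with {namespace}

root = ET.fromstring(YANGTZE_XML)
river = {child.tag.replace(NS, ""): child.text for child in root}
print(root.get("id"), river)
```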

26 XML to RDF Modify the following XML document (Yangtze.xml) so that it is also a valid RDF document (Yangtze.rdf). [Listings not preserved in this transcript; both carry the values 6300 kilometers, western China's Qinghai-Tibet Plateau, East China Sea.]

27 The RDF Format [Listing not preserved; values: 6300 kilometers, western China's Qinghai-Tibet Plateau, East China Sea.] (1) RDF provides an ID attribute for identifying the resource being described. (2) The ID attribute is in the RDF namespace. (3) Add the "fragment identifier symbol" (#) to the namespace.

28 The RDF Format (cont.) [Listing not preserved; values: 6300 kilometers, western China's Qinghai-Tibet Plateau, East China Sea.] (1) Identifies the type (class) of the resource being described. (2) Identifies the resource being described. This resource is an instance of River. (3) These are properties, or attributes, of the type (class). (4) Values of the properties.
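
Since the annotated listing is missing, here is a hedged sketch of what an RDF-formatted version of the Yangtze document might look like, together with code that picks out the four things the callouts name: the type, the resource, the properties and their values. The RDF namespace URI is the real one; the vocabulary namespace http://www.geodesy.org/river# is taken from the triple slide, and the exact layout of the listing is an assumption.

```python
import xml.etree.ElementTree as ET

RDF_NS = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"

# Assumed layout: rdf:ID on the typed element, '#' appended to the vocabulary namespace.
YANGTZE_RDF = """<River rdf:ID="Yangtze"
    xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
    xmlns="http://www.geodesy.org/river#">
  <length>6300 kilometers</length>
  <startingLocation>western China's Qinghai-Tibet Plateau</startingLocation>
  <endingLocation>East China Sea</endingLocation>
</River>"""


def qname_to_uri(tag):
    """Turn ElementTree's '{namespace}local' tag into namespace + local."""
    namespace, local = tag[1:].split("}")
    return namespace + local


root = ET.fromstring(YANGTZE_RDF)
resource_type = qname_to_uri(root.tag)                    # the type (class)
resource_id = root.get("{%s}ID" % RDF_NS)                 # the resource being described
properties = {qname_to_uri(c.tag): c.text for c in root}  # properties and their values

print(resource_type)
print(resource_id)
print(properties)
```

The trailing # on the namespace is what makes namespace + element name come out as a clean URI such as http://www.geodesy.org/river#length.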

29 Advantage of using the RDF Format You may ask: "Why should I bother designing my XML to be in the RDF format?" Answer: there are numerous benefits:  The RDF format, if widely used, will help to make XML more interoperable: Tools can instantly characterize the structure, "this element is a type (class), and here are its properties". RDF promotes the use of standardized vocabularies... standardized types (classes) and standardized properties.  The RDF format gives you a structured approach to designing your XML documents. The RDF format is a regular, recurring pattern.  It enables you to quickly identify weaknesses and inconsistencies of non-RDF-compliant XML designs. It helps you to better understand your data!  You reap the benefits of both worlds: You can use standard XML editors and validators to create, edit, and validate your XML. You can use the RDF tools to apply inferencing to the data. It positions your data for the Semantic Web! (Network effect, Interoperability)

30 Uniquely Identify the Resource Earlier we said that RDF is very concerned about uniquely identifying the type (class) and the properties. RDF is also very concerned about uniquely identifying the resource being described. [Listing not preserved; values: 6300 kilometers, western China's Qinghai-Tibet Plateau, East China Sea.] We want to uniquely identify this resource.

31 rdf:about Instead of identifying a resource with a relative URI (which then requires a base URI to be prepended), we can give the complete identity of a resource. To do so we use rdf:about rather than rdf:ID. [Listings not preserved; values: 6300 kilometers, western China's Qinghai-Tibet Plateau, East China Sea.]
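
The difference between the two attributes can be sketched as two small resolution functions. This is an illustration of the idea rather than a conforming RDF/XML implementation; the base URI is inferred from the resource URI shown on the triple slide, and the unrelated.example document URI is hypothetical.

```python
from urllib.parse import urljoin

# Base URI inferred from the resource URI used on the triple slide.
BASE = "http://www.china.org/geography/rivers"


def resolve_rdf_id(base_uri, rdf_id):
    """rdf:ID gives a fragment name; the full URI is base + '#' + name."""
    return base_uri.split("#")[0] + "#" + rdf_id


def resolve_rdf_about(base_uri, about):
    """rdf:about gives a URI reference; an absolute one ignores the base."""
    return urljoin(base_uri, about)


from_id = resolve_rdf_id(BASE, "Yangtze")
# With rdf:about the document can live anywhere (hypothetical location here).
from_about = resolve_rdf_about("http://unrelated.example/doc.rdf", BASE + "#Yangtze")
print(from_id == from_about, from_id)
```

Both routes name the same resource; rdf:about just removes the dependency on where the document happens to be published.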

32 Triple = Resource/Property/Value (1) http://www.china.org/geography/rivers#Yangtze (resource) has a http://www.geodesy.org/river#length (property) of 6300 kilometers (value). (2) http://www.china.org/geography/rivers#Yangtze (resource) has a http://www.geodesy.org/river#startingLocation (property) of western China's... (value). (3) http://www.china.org/geography/rivers#Yangtze (resource) has a http://www.geodesy.org/river#endingLocation (property) of East China Sea (value).

33 RDF Model (graph) Legend: an ellipse indicates a "Resource"; a rectangle indicates a "literal string value".
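
The three triples on this slide can be derived mechanically from an RDF/XML description. The sketch below handles only this one simple shape (a single typed element with rdf:about and literal-valued children), and the listing is an assumed reconstruction; a real RDF parser covers far more syntax.

```python
import xml.etree.ElementTree as ET

RDF_NS = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"

# Assumed reconstruction of the rdf:about form of the Yangtze description.
YANGTZE_RDF = """<River rdf:about="http://www.china.org/geography/rivers#Yangtze"
    xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
    xmlns="http://www.geodesy.org/river#">
  <length>6300 kilometers</length>
  <startingLocation>western China's Qinghai-Tibet Plateau</startingLocation>
  <endingLocation>East China Sea</endingLocation>
</River>"""


def to_triples(xml_text):
    """Return (resource, property, value) tuples for one described resource."""
    root = ET.fromstring(xml_text)
    subject = root.get("{%s}about" % RDF_NS)
    result = []
    for child in root:
        namespace, local = child.tag[1:].split("}")  # '{ns}local' -> ns, local
        result.append((subject, namespace + local, child.text))
    return result


yangtze_triples = to_triples(YANGTZE_RDF)
for triple in yangtze_triples:
    print(triple)
```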

34 rdf:Description + rdf:type There is still another way of representing the XML. This way makes it very clear that you are describing something, and it makes it very clear what the type (class) is of the thing you are describing. [Listing not preserved; values: 6300 kilometers, western China's Qinghai-Tibet Plateau, East China Sea.] This is read as: "This is a Description about the resource http://www.china.org/geography/rivers#Yangtze. This resource is an instance of the River type (class). The http://www.china.org/geography/rivers#Yangtze resource has a length of 6300 kilometers, a startingLocation of western China's Qinghai-Tibet Plateau, and an endingLocation of the East China Sea." Note: this form of describing a resource is called the "long form". The form we have seen previously is an abbreviation of this long form. An RDF Parser interprets the abbreviated form as if it were this long form.
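
To make the claim about the abbreviated and long forms concrete, the sketch below reads both and checks that they produce the same set of triples. Both listings are assumed reconstructions, and the reader handles only these two shapes; it is not a general RDF/XML parser.

```python
import xml.etree.ElementTree as ET

RDF_NS = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"
RDF = "{%s}" % RDF_NS  # ElementTree spelling of the rdf: prefix

ABBREVIATED = """<River rdf:about="http://www.china.org/geography/rivers#Yangtze"
    xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
    xmlns="http://www.geodesy.org/river#">
  <length>6300 kilometers</length>
  <startingLocation>western China's Qinghai-Tibet Plateau</startingLocation>
  <endingLocation>East China Sea</endingLocation>
</River>"""

LONG_FORM = """<rdf:Description rdf:about="http://www.china.org/geography/rivers#Yangtze"
    xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
    xmlns="http://www.geodesy.org/river#">
  <rdf:type rdf:resource="http://www.geodesy.org/river#River"/>
  <length>6300 kilometers</length>
  <startingLocation>western China's Qinghai-Tibet Plateau</startingLocation>
  <endingLocation>East China Sea</endingLocation>
</rdf:Description>"""


def uri(tag):
    """Turn '{namespace}local' into namespace + local."""
    namespace, local = tag[1:].split("}")
    return namespace + local


def read_triples(xml_text):
    node = ET.fromstring(xml_text)
    subject = node.get(RDF + "about")
    found = set()
    if node.tag != RDF + "Description":
        # Abbreviated form: the element name itself states the rdf:type.
        found.add((subject, RDF_NS + "type", uri(node.tag)))
    for prop in node:
        # A property value is either a referenced resource or literal text.
        value = prop.get(RDF + "resource") or prop.text
        found.add((subject, uri(prop.tag), value))
    return found


same = read_triples(ABBREVIATED) == read_triples(LONG_FORM)
print(same)  # True
```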

35 RDF Namespace http://www.w3.org/1999/02/22-rdf-syntax-ns# defines: ID, about, type, resource, Description

36 RDF Parser There is a nice RDF parser at the W3C Web site: http://www.w3.org/RDF/Validator/ This RDF parser will tell you if your XML is in the proper RDF format.

37 Note the two types (classes) River: instance Yangtze; properties length, startingLocation, endingLocation. Dam: instance ThreeGorges; properties name, width, height, cost.

38 RDF Format [Listing not preserved; River values: 6300 kilometers, western China's Qinghai-Tibet Plateau, East China Sea; Dam values: The Three Gorges Dam, 1.5 miles, 610 feet, $30 billion.] As always, the other representations using rdf:about and rdf:Description are available.

39 RDF Model (graph)

40 Alternative Way Alternatively, suppose that someone has already created a document containing information about the Three Gorges Dam (Three-Gorges-Dam.rdf; values: The Three Gorges Dam, 1.5 miles, 610 feet, $30 billion). Then Yangtze.rdf (values: 6300 kilometers, western China's Qinghai-Tibet Plateau, East China Sea) can simply reference the Three Gorges Dam resource using rdf:resource.

41 A distributed network of data! http://www.china.org/geography/rivers/yangtze.rdf holds: 6300 kilometers; western China's Qinghai-Tibet Plateau; East China Sea. http://www.encyclopedia.org/yangtze-alternate-names.rdf holds: Dri Chu - Female Yak River; Tongtian He, Travelling-Through-the-Heavens River; Jinsha Jiang, River of Golden Sand. An aggregator tool collects data about the Yangtze from both documents. Aggregated data: 6300 kilometers; western China's Qinghai-Tibet Plateau; East China Sea; Dri Chu - Female Yak River; Tongtian He, Travelling-Through-the-Heavens River; Jinsha Jiang, River of Golden Sand.

42 Another Example of Aggregation http://www.china.org/geography/rivers/yangtze.rdf holds: 6300 kilometers; western China's Qinghai-Tibet Plateau; East China Sea. http://www.encyclopedia.org/three-gorges-dam.rdf holds: The Three Gorges Dam; 1.5 miles; 610 feet; $30 billion. Aggregated data: 6300 kilometers; western China's Qinghai-Tibet Plateau; East China Sea; The Three Gorges Dam; 1.5 miles; 610 feet; $30 billion. Note that the reference to the ThreeGorges Dam resource has been replaced by whatever information the aggregator could find on this resource!
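
The aggregation described on the last three slides can be sketched with plain sets of triples: documents are merged by set union, and a reference to another resource is expanded into whatever is known about it. The dam URI, the obstacle property and the dam vocabulary namespace are assumptions introduced for this example; the transcript does not preserve the real names.

```python
RIVER_NS = "http://www.geodesy.org/river#"
DAM_NS = "http://www.geodesy.org/dam#"  # hypothetical vocabulary namespace
YANGTZE = "http://www.china.org/geography/rivers#Yangtze"
DAM = "http://www.encyclopedia.org/three-gorges-dam.rdf#ThreeGorges"  # hypothetical URI

# yangtze.rdf: the 'obstacle' property (an assumed name) points at the dam,
# which is what rdf:resource would express in RDF/XML.
yangtze_doc = {
    (YANGTZE, RIVER_NS + "length", "6300 kilometers"),
    (YANGTZE, RIVER_NS + "obstacle", DAM),
}
# three-gorges-dam.rdf: published independently by someone else.
dam_doc = {
    (DAM, DAM_NS + "name", "The Three Gorges Dam"),
    (DAM, DAM_NS + "width", "1.5 miles"),
    (DAM, DAM_NS + "height", "610 feet"),
    (DAM, DAM_NS + "cost", "$30 billion"),
}


def aggregate(*documents):
    """Merge triple sets; shared URIs make the descriptions line up."""
    merged = set()
    for doc in documents:
        merged |= doc
    return merged


def expand(subject, graph):
    """Describe a subject, replacing references by the referenced data.

    Assumes the graph has no reference cycles.
    """
    description = {}
    for s, p, o in graph:
        if s != subject:
            continue
        has_data = any(s2 == o for s2, _, _ in graph)
        description[p] = expand(o, graph) if has_data else o
    return description


alone = expand(YANGTZE, yangtze_doc)
combined = expand(YANGTZE, aggregate(yangtze_doc, dam_doc))
print(alone[RIVER_NS + "obstacle"])
print(combined[RIVER_NS + "obstacle"])
```

Before aggregation the obstacle is only a URI; afterwards it carries the dam's name, width, height and cost, which is the replacement the slide describes.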

43 43

44 LOD2 : What is LOD2? LOD2 (Linked Open Data)  LOD2 is a large-scale integrating project co-funded by the European Commission within the FP7 Information and Communication Technologies Work Programme. It started in September 2010.  Partners: 14 partners (11 European countries)

45 LOD2 : Objectives of LOD2 LOD2 Project Objectives  Achieve visualization, deployment, sharing and accessibility for linked open data through software technology. Increase the visibility of Linked Data activities [Visualization] Support the deployment of Linked Data components [Deployment] Improve information sharing between Linked Data components so that publishing Linked Data is eased [Sharing] Improve access to the content: the online Linked Open Data [Accessibility] Improve the software technology which supports it [By software technology]

46 LOD2 Stack : Overview LOD2 Stack  The LOD2 project provides the LOD2 Stack for easy access to linked data software.  The LOD2 software stack is an integrated distribution of aligned tools supporting the life-cycle of Linked Data from extraction, authoring/creation over enrichment, interlinking, fusing to visualization and maintenance.

47 LOD2 Stack 3.0

48 LOD2 Stack : The overview of tools Apache Stanbol  In the LOD2 Stack, Apache Stanbol can be used for NLP services which rely on the stack-internal knowledge bases, such as named entity recognition and text classification. CubeViz  CubeViz is a facetted browser for statistical data utilizing the RDF Data Cube vocabulary, which is the state of the art in representing statistical data in RDF.

49 LOD2 Stack : The overview of tools DBpedia Spotlight  DBpedia Spotlight is a tool for automatically annotating mentions of DBpedia resources in text, providing a solution for linking unstructured information sources to the Linked Open Data cloud through DBpedia. D2RQ  D2RQ is a system for accessing relational databases (RDBMS) as virtual RDF graphs.

50 LOD2 Stack : The overview of tools DL-Learner  The DL-Learner software learns concepts in Description Logics (DLs) from user-provided examples (supervised learning). ORE  The ORE (Ontology Repair and Enrichment) tool allows knowledge engineers to improve an OWL ontology by fixing inconsistencies and making suggestions for adding further axioms to it.

51 LOD2 Stack : The overview of tools PoolParty  The PoolParty Extractor (PPX) offers an API providing text mining algorithms based on semantic knowledge models.

52 LOD2 Stack : The overview of tools SemMap  SemMap visualizes knowledge bases that have a spatial dimension. Silk  The Silk Link Discovery Framework supports data publishers in setting RDF links from their data sources to other data sources on the Web. Using the declarative Silk - Link Specification Language (Silk-LSL), developers can specify which types of RDF links should be discovered between data sources as well as which conditions data items must fulfill in order to be interlinked.
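
The idea behind link discovery can be shown in a few lines of Python. This is not Silk and does not use Silk-LSL syntax; it is a minimal sketch in which the link type is owl:sameAs and the linking condition is a label-similarity threshold. The data sets, their example.org URIs and the 0.9 threshold are all hypothetical.

```python
from difflib import SequenceMatcher

OWL_SAME_AS = "http://www.w3.org/2002/07/owl#sameAs"

# Two hypothetical data sources: resource URI -> label.
source_a = {
    "http://example.org/a/Yangtze": "Yangtze River",
    "http://example.org/a/Mekong": "Mekong River",
}
source_b = {
    "http://example.org/b/river1": "Yangtze river",
    "http://example.org/b/river2": "Yellow River",
}


def similarity(x, y):
    """Case-insensitive string similarity in [0, 1]."""
    return SequenceMatcher(None, x.lower(), y.lower()).ratio()


def discover_links(a, b, threshold=0.9):
    """Emit an owl:sameAs link for every pair whose labels are similar enough."""
    links = []
    for uri_a, label_a in a.items():
        for uri_b, label_b in b.items():
            if similarity(label_a, label_b) >= threshold:
                links.append((uri_a, OWL_SAME_AS, uri_b))
    return links


links = discover_links(source_a, source_b)
print(links)
```

The pairwise loop compares every item with every other item, which is exactly the cost that frameworks such as Silk and LIMES are designed to reduce on large data sets.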

53 LOD2 Stack : The overview of tools Sieve  Sieve allows Web data to be filtered according to different data quality assessment policies and provides for fusing Web data according to different conflict resolution methods. LIMES  LIMES is a link discovery framework for the Web of Data. It implements time-efficient approaches for large-scale link discovery based on the characteristics of metric spaces.

54 Silk : Link Discovery Framework Interlinking and Fusion Stage Component of LOD2 Stack  Can be used by data providers to generate RDF links between data sets on the Web of Data, especially to set explicit RDF links between data items within different data sources. “Data publishers can use Silk to set RDF links from their data sources to other data sources on the Web”

55 Silk : Silk – Link Specification Language Example Aggregation Example:  Combines multiple confidence values into a single value (average). The confidence value is the average of the two compared values; the comparisons are numeric differences between parameters.
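
The averaging step can be sketched as follows. This is plain Python, not Silk-LSL; the function names, the optional weights and the linear mapping from a numeric difference to a confidence value are assumptions chosen for illustration.

```python
def numeric_confidence(a, b, max_difference):
    """Map a numeric difference to [0, 1]: 1 when equal, 0 at max_difference or beyond."""
    return max(0.0, 1.0 - abs(a - b) / max_difference)


def average_confidence(scores, weights=None):
    """Combine several confidence values into one (optionally weighted) average."""
    if weights is None:
        weights = [1] * len(scores)
    return sum(s * w for s, w in zip(scores, weights)) / sum(weights)


# Two hypothetical numeric comparisons between a pair of candidate resources.
length_score = numeric_confidence(6300, 6300, max_difference=100)  # identical values
height_score = numeric_confidence(610, 615, max_difference=10)     # off by 5 of 10
combined_score = average_confidence([length_score, height_score])
print(length_score, height_score, combined_score)  # 1.0 0.5 0.75
```

A link would then be accepted only if the combined score clears a chosen threshold.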

56 DL-Learner Introduction  The goal of DL-Learner is to provide a DL/OWL based machine learning tool to solve supervised learning tasks.  The DL-Learner software learns concepts in Description Logics (DLs) from examples.

57 DL-Learner : Features

58 Demo of SDT Plug-in to Protégé

59 SWCL - Sample Example (diagram labels: Country, Province, hasPart, positiveInteger, PopulationValue)

60 Constraints Representation in SWCL

61 Our Direction to the Future Directions  Open: share your data, whenever and wherever you want  Semantic: enhance your data, to make more sense of it An example: LinkedGeoData.org  We need an integrated framework to enhance communication and information sharing in GeoData.

62 Q&A

