Frank van Harmelen Vrije Universiteit Amsterdam The Web of data and LarKC’s role in it Creative Commons License: allowed to share & remix, but must attribute.


1 Frank van Harmelen Vrije Universiteit Amsterdam The Web of data and LarKC’s role in it Creative Commons License: allowed to share & remix, but must attribute & non-commercial

2 The Current Information Universe: linked web-pages, written by people, written for people, used only by people... Many of these pages already come from data, usable by computers! But we can’t link the data... The Future Information Universe: linked data, usable by computers! useful for people!

3 How far away is this? Not very far away! Already many billions of facts & rules in the rapidly growing Linked Open Data cloud: encyclopedia; geographic names (millions); names of artists & art works (10,000s); scientific bibliographies; hierarchical dictionaries (UK, FR, NL); life-science databases; any CD ever recorded; (almost) every book sold by Amazon; basic facts on every country on the planet; common sense rules & facts (100,000s). It gets bigger every month.

4 Full Web-style decoupling: re-usability, independence. All identifiers are URLs (= on the Web). This allows total decoupling of data, vocabulary and meta-data: in a statement "x IsOfType T", the parts x, IsOfType and T can have different owners & locations.
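The decoupling described on this slide can be illustrated with a minimal Python sketch. It assumes a statement is modelled as a plain three-part tuple of URLs; the `Triple` class, the `owners` helper and all three example.* URLs are hypothetical placeholders and do not come from the talk.

```python
from typing import NamedTuple, Set
from urllib.parse import urlparse


class Triple(NamedTuple):
    """One statement: subject, predicate and object are all URLs."""
    subject: str
    predicate: str
    obj: str


# Hypothetical URLs: the data item, the vocabulary term and the type
# each live on a different host, so each can have a different owner.
triple = Triple(
    subject="http://data.example.org/items/x",
    predicate="http://vocab.example.net/terms/isOfType",
    obj="http://schema.example.com/types/T",
)


def owners(t: Triple) -> Set[str]:
    """Return the hosts that publish the three parts of a triple."""
    return {urlparse(part).netloc for part in t}


print(owners(triple))
```

Because every part is a Web address, the statement itself can be published by yet another party without copying any of the three resources.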

5 For the first time ever, it is now possible to re-use somebody else's knowledge base: without having to talk to them first (syntax, semantics); without having to make copies. Rapid growth: "billion triple challenge" (= machine-reason with a billion facts and rules). 2006: “where do we get a billion facts from?” 2008: “which billion shall we choose?”

6 What to do when success is becoming a problem? The Large Knowledge Collider: a platform for infinitely scalable reasoning on the data-web.

7 Infinite scalability? Parallelisation: cluster computing. Distribution: “Thinking@home”, “self-computing semantic Web”. Approximation: “almost” is often good enough; gets better with more resources.

8 First result: MaRVIN. MaRVIN scales by: distribution (over many nodes); approximation (sound but incomplete); anytime convergence (more complete over time). Brain the size of a planet.
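The phrases "sound but incomplete" and "more complete over time" can be made concrete with a small sketch. This is not MaRVIN's algorithm, which the slides do not describe; it is a toy, single-machine forward chainer over one transitivity rule, with a hypothetical `budget` parameter standing in for available resources. Every fact it returns is a valid consequence (sound), a small budget may miss some consequences (incomplete), and a larger budget recovers more of them (anytime).

```python
from typing import Set, Tuple

Fact = Tuple[str, str]  # (a, b) is read as "a is a subclass of b"


def anytime_closure(facts: Set[Fact], budget: int) -> Set[Fact]:
    """Apply the transitivity rule until `budget` new facts are derived
    or nothing new can be derived, whichever comes first."""
    known = set(facts)
    derived = 0
    changed = True
    while changed and derived < budget:
        changed = False
        # sorted() takes a snapshot, so adding to `known` inside is safe
        for (a, b) in sorted(known):
            for (c, d) in sorted(known):
                if b == c and (a, d) not in known:
                    if derived >= budget:
                        return known  # out of resources: stop early
                    known.add((a, d))
                    derived += 1
                    changed = True
    return known


# Hypothetical toy input, not taken from the talk.
facts = {("cat", "mammal"), ("mammal", "animal"), ("animal", "organism")}

partial = anytime_closure(facts, budget=1)
full = anytime_closure(facts, budget=100)
print(len(partial), len(full))
```

With the toy input, any partial result is a subset of the full closure, which is the property that makes stopping early safe.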

9 The consortium 14 partners, 50 people

10 The project: 10M€ budget, 3.5 years, 80 person years, 3 case studies, 14 partners

11 Use case: Drug Discovery. Problem: pharmaceutical R&D in early clinical development is stagnating. FDA white paper Innovation or Stagnation (March 2004): “developers have no choice but to use the tools of the last century to assess this century's candidate solutions.” “industry scientists often lack cross-cutting information about an entire product area, or information about techniques that may be used in areas other than theirs” “Show me any potential liver toxicity associated with the compound’s drug class, target, structure and disease.” Genetics: “Show me all liver toxicity associated with the target or the pathway.” Chemistry: “Show me all liver toxicity associated with compounds with similar structure.” Literature: “Show me all liver toxicity from the public literature and internal reports that are related to the drug class, disease and patient population.” Current NCBI: linking but no inference.

12 Use Case: City on-line. Our cities face many challenges. Urban Computing is the ICT way to address them. Is public transportation where the people are? Is public transportation where people will be? Which location attracts most people right now? Which landmarks attract more people? Where are people concentrating? Where is traffic moving? Improve the quality of life.

13 Is anybody doing this for real? OpenCalais: enrich text (news items) with semantic meta-data; recognise people, places, events, organisations, ...; useful for searching, selecting, personalising, aggregating, summarising, etc. From early ’09: identify “people, places, events, organisations, ...” by linking to the Open Data cloud.

14 Summarising: The Information Universe of the Future will be a Web of Data. This Web of Data is rapidly taking shape. There are compelling use-cases. Industrial take-up is beginning to happen. We are building new infrastructure to deal with required scale.

15 Contact Info: Frank.van.Harmelen@cs.vu.nl http://www.larkc.eu Want to ask questions? Want to play with LarKC? Want to contribute plugins? Want to run a use-case?

