What's Wrong With Current Semantic Web Reasoning (and how to fix it)


1 What's Wrong With Current Semantic Web Reasoning (and how to fix it)

2 This talk (and this workshop)
– Current state of Web Reasoning?
– What's wrong with it?
– What are we going to do about it?
– LarKC: one large new effort to do something about it

3 What's wrong with current SemWeb reasoning methods?

4 Characteristics of current Semantic Web reasoning
Centralised, algorithmic, boolean, deterministic.
Examples of current attempts at scalability:
– identify subsets of OWL: OWL DL, OWL DLP, OWL Horst
– identify alternative semantics for OWL, e.g. LP-style semantics
– scalability by muscle-power

5 Scalability by muscle power
Task/Data | System | Mill. Stats. | Time (sec) | Speed (stat/sec) | Inference
LUBM(500) [15] | Sesame's Native Store (v..0alpha3) | | | | RDFS +/-
LUBM(600) | SwiftOWLIM v0.92b | | | | OWL-Horst +/-
LUBM(8000) | BigOWLIM v0.92b | | | | OWL-Horst +/-, complete
Subset of UniProt | ORACLE, 10gR | | | | RDFS +
FOAF, DC, PRISM | Jena v | no info. | | | RDFS +/-, Reif.
LUBM + Movie & Actor DB | AllegroGraph RDFS++ Reasoner | | | | RDFS +/-
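As an illustration of what a materialisation benchmark of this kind measures, here is a minimal Python sketch using the rdflib and owlrl libraries; the input file name is an assumption, and this is not comparable to the specialised engines in the table above.

```python
# A rough sketch of what such a benchmark measures: load a triple dump,
# materialise the RDFS closure, and report statements per second.
# The file name is assumed; the systems in the table use their own engines.
import time

import rdflib
import owlrl

g = rdflib.Graph()
g.parse("lubm_dump.nt", format="nt")        # e.g. a LUBM-generated dump (assumed)

before = len(g)
start = time.time()
owlrl.DeductiveClosure(owlrl.RDFS_Semantics).expand(g)   # forward-chaining RDFS rules
elapsed = max(time.time() - start, 1e-9)

inferred = len(g) - before
print(f"{before} explicit statements, {inferred} inferred "
      f"in {elapsed:.1f}s ({inferred / elapsed:.0f} stat/sec)")
```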

6 Moving in the right direction: new BigOWLIM (Ontotext, Sirma)
– 4 switchable inference modes (owl-max, owl-horst, rdfs, optimised rdfs, none)
– custom rules for definable semantics
– < 100 ms query performance on a billion triples (but 34 hrs upload)

7 Why we need something different
Gartner (May 2007, G): "By 2012, 70% of public Web pages will have some level of semantic markup, 20% will use more extensive Semantic Web-based ontologies."
Semantic Technologies at Web Scale?
– 20% of 30 billion pages × 1000 triples per page = 6 trillion triples
– 30 billion and 1000 are underestimates; imagine 6 years from now…
– data-integration and semantic search at web-scale?
Inference will have to become distributed, heuristic, approximate, probabilistic, not centralised, algorithmic, boolean, deterministic.
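The 6-trillion figure is just the slide's own back-of-envelope arithmetic made explicit:

```python
# Back-of-envelope estimate from the slide: 20% of 30 billion pages,
# each carrying roughly 1000 triples of semantic markup.
pages = 30e9
semantic_fraction = 0.20
triples_per_page = 1000

total_triples = pages * semantic_fraction * triples_per_page
print(f"{total_triples:.1e} triples")    # 6.0e+12, i.e. 6 trillion
```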

8 Why we need something different
Problem: pharmaceutical R&D in early clinical development is stagnating (Q1, Q2, Q3).
FDA white paper "Innovation or Stagnation" (March 2004):
– "developers have no choice but to use the tools of the last century to assess this century's candidate solutions"
– "industry scientists often lack cross-cutting information about an entire product area, or information about techniques that may be used in areas other than theirs"
– Show me any potential liver toxicity associated with the compound's drug class, target, structure and disease.
– Genetics: show me all liver toxicity associated with the target or the pathway.
– Chemistry: show me all liver toxicity associated with compounds with similar structure.
– Literature: show me all liver toxicity from the public literature and internal reports that are related to the drug class, disease and patient population.
Current NCBI: linking but no inference.

9 Why we need something different
Our cities face many challenges; Urban Computing is the ICT way to address them.
– Is public transportation where the people are?
– Which landmarks attract more people?
– Where are people concentrating?
– Where is traffic moving?

10 What's wrong with current Semantic Web Reasoning
Properties of current inference techniques, based on logic as the guiding paradigm:
– Exact
– Abrupt
– Expensive

11 Current inference is exact
Yes or no, not: almost, not by a long shot, yes except for a few, etc. (think of subClassOf).
This was OK as long as ontologies were clean:
– hand-crafted
– well-designed
– carefully populated
– well maintained
– etc.

12 Current inference is exact
But current ontologies are sloppy (and will be increasingly so):
– made by non-experts
– made by machines (see the sketch below):
  – scraping from file-hierarchies, mail-folders, todo-lists & phone-books on PDAs
  – machine learning from examples
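As a sketch of what such a machine-made ontology might look like (illustrative only, not from the talk; the namespace and starting folder are assumptions), folder names can be scraped into an rdflib class hierarchy:

```python
# Scrape a directory tree into subClassOf triples: each folder becomes a class,
# each sub-folder a subclass. Duplicate or messy folder names carry straight
# over into the ontology, which is exactly the "sloppiness" the slide is about.
import os

import rdflib
from rdflib.namespace import RDFS

EX = rdflib.Namespace("http://example.org/scraped/")      # assumed namespace
g = rdflib.Graph()

root = "/home/user/documents"                              # assumed starting folder
for dirpath, dirnames, _files in os.walk(root):
    parent = EX[(os.path.basename(dirpath) or "root").replace(" ", "_")]
    for d in dirnames:
        g.add((EX[d.replace(" ", "_")], RDFS.subClassOf, parent))

print(g.serialize(format="turtle"))
```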

13 Sloppy ontologies need sloppy inference

14 Sloppy ontologies need sloppy inference
"almost subClassOf"

15 Combined ontologies need sloppy inference
Mapping ontologies is almost always messy: post-doc "almost equal" young-researcher (see the sketch below).
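One way to make "almost subClassOf" and "almost equal" operational is to compare the known instances of the two concepts; the functions and toy data below are illustrative, not from the talk.

```python
# Graded subsumption and equivalence by instance overlap (a sketch).
def almost_subclass_of(instances_a, instances_b):
    """Degree (0..1) to which A's known extension lies inside B's."""
    if not instances_a:
        return 1.0                                        # vacuously a subclass
    return len(instances_a & instances_b) / len(instances_a)

def almost_equal(instances_a, instances_b):
    """Symmetric variant (Jaccard overlap) for 'almost equal' mappings."""
    union = instances_a | instances_b
    return len(instances_a & instances_b) / len(union) if union else 1.0

post_docs = {"alice", "bob", "carol", "dave"}
young_researchers = {"alice", "bob", "carol", "eve"}

print(almost_subclass_of(post_docs, young_researchers))   # 0.75: almost subClassOf
print(almost_equal(post_docs, young_researchers))         # 0.60: almost equal
```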

16 Properties of current inference techniques
Based on logic as the guiding paradigm:
– Exact → approximate
– Abrupt
– Expensive

17 Current inference is abrupt
nothing ……………….. yes!
We want gradual answers (see the sketch below):
– anytime computation: the agent (human or machine) can decide how good is good enough
– deadline computation: pay for quality; load balancing
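A minimal sketch of anytime/deadline answering; derive_answers is a hypothetical generator that yields answers as it finds them, and nothing here is LarKC code.

```python
# Gradual answering: collect answers until a deadline passes or the caller's
# notion of "good enough" is reached, and return whatever has been found so far.
import time

def anytime_query(kb, query, derive_answers, deadline_seconds, enough=None):
    answers = []
    stop_at = time.time() + deadline_seconds
    for answer in derive_answers(kb, query):      # hypothetical answer generator
        answers.append(answer)
        if enough is not None and len(answers) >= enough:
            break                                 # caller decided this is good enough
        if time.time() >= stop_at:
            break                                 # deadline computation: pay for quality
    return answers                                # a partial result, never just "nothing"
```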

18 Current inference is expensive
– approximate answers are cheap
– gradual answers are arbitrarily cheap (WYPIWYG)

19 Properties of current inference techniques
Based on logic as the guiding paradigm:
– Exact → approximate
– Abrupt → gradual
– Expensive → cheap

20 What's wrong with current Semantic Web Reasoning
Obsession with worst-case asymptotic complexity.

21 Who cares about decidability?
Decidability / completeness: the guarantee to find an answer, or to tell you it doesn't exist, given enough run-time & memory.
Sources of incompleteness:
– incompleteness of the input data
– insufficient run-time to wait for the answer
Completeness is unachievable in practice anyway, regardless of the completeness of the algorithm.

22 Who cares about undecidability?
Undecidability does not mean: always guaranteed not to find an answer.
Undecidability = not always guaranteed to find an answer.
Undecidability may be harmless in many cases (indeed, in all cases that matter).

23 Who cares about complexity?
– worst-case: may be exponentially rare
– asymptotic: ignores constants

24 What to do instead?
There is no good framework for average-case complexity.
Second best: do more experimental performance profiles with realistic data.

25 What's wrong with current Semantic Web Reasoning
– obsession with worst-case asymptotic complexity (not even a good framework for "average" complexity)
– obsession with recall & precision

26 Need for approximation
Trade off recall for precision or vice versa (see the sketch below):
– security: prefer recall
– medicine: prefer precision
Trade off both for speed.
Logician's nightmare: drop soundness & completeness!
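A small sketch of the trade-off: a single confidence threshold over scored answers, where a low threshold favours recall (the security-style preference) and a high one favours precision (the medicine-style preference). Scores and relevance labels are invented.

```python
# Precision/recall of a thresholded answer set (toy data).
def precision_recall(scored_answers, relevant, threshold):
    returned = {a for a, score in scored_answers if score >= threshold}
    if not returned:
        return 1.0, 0.0
    hits = len(returned & relevant)
    return hits / len(returned), hits / len(relevant)

scored = [("a1", 0.9), ("a2", 0.7), ("a3", 0.4), ("a4", 0.2)]
relevant = {"a1", "a2", "a4"}

for t in (0.1, 0.5, 0.8):
    p, r = precision_recall(scored, relevant, t)
    print(f"threshold {t}: precision {p:.2f}, recall {r:.2f}")
```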

27 A logician's nightmare (Dieter Fensel): a plot of precision (soundness) against recall (completeness), positioning logic, IR and the Semantic Web in that space.

28 What's wrong with current Semantic Web Reasoning
– obsession with worst-case asymptotic complexity (no good framework for "average" complexity)
– obsession with recall & precision (no good framework for "good enough")
– separation of reasoning and search

29 Integrating Search with Reasoning
Search → axioms (a hasType b, b subClassOf c) → Reasoning → conclusion (see the sketch below).
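A toy sketch of that combination (not the LarKC implementation): a crude keyword "search" first selects candidate axioms, then one RDFS-style rule (x hasType B, B subClassOf C entails x hasType C) reasons only over the selection.

```python
# Search narrows the axiom set; reasoning then runs only over the selection.
def search(axioms, keywords):
    """Crude relevance selection: keep triples that mention any keyword."""
    return [ax for ax in axioms if any(k in ax for k in keywords)]

def reason(selected):
    """One round of type propagation over the selected axioms."""
    types = {(s, o) for (s, p, o) in selected if p == "hasType"}
    subclass = {(s, o) for (s, p, o) in selected if p == "subClassOf"}
    return {(x, c) for (x, b) in types for (b2, c) in subclass if b == b2}

axioms = [
    ("a", "hasType", "b"),
    ("b", "subClassOf", "c"),
    ("unrelated", "hasType", "noise"),
]
print(reason(search(axioms, keywords={"a", "b", "c"})))   # {('a', 'c')}
```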

30 Summary of analysis
– Based on logic, which is strict, abrupt, expensive
– Obsession with complexity
– Obsession with traditional soundness/completeness & recall/precision
– No recognition that different use-cases need different performance trade-offs

31

32 Goals of LarKC
1. Scaling to infinity
  – by giving up soundness & completeness
  – by switching between reasoning and search
2. Reasoning pipeline
  – by plug-in architecture
3. Large computing platform
  – by cluster computing
  – by wide-area distribution

33 Scaling to infinity: possible approaches
– Markov Logic (probability in the logic, judging the truth of a formula): adds a learnable weight to each FOL formula, specifying a probability distribution over Herbrand interpretations (possible worlds)
– Weighted RDF graphs (probability as a heuristic, judging the relevance of a formula): weighted activation spreading (for selection), followed by classical inference over the selected subgraph (see the sketch below)
– Model sampling (probability in the logic): sampling the space of all truth assignments, driven by the probability of the model
– and others
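A sketch of the second option only: weighted activation spreading selects a relevant subgraph, over which a classical reasoner would then run. The graph, weights, decay and threshold are invented for illustration.

```python
# Spreading activation over a weighted graph; returns the activated node set.
def spread_activation(edges, seeds, decay=0.5, threshold=0.1):
    """edges maps (source, target) pairs to weights in [0, 1]."""
    activation = {n: 1.0 for n in seeds}
    frontier = set(seeds)
    while frontier:
        next_frontier = set()
        for (s, o), w in edges.items():
            if s in frontier:
                a = activation[s] * w * decay
                if a >= threshold and a > activation.get(o, 0.0):
                    activation[o] = a
                    next_frontier.add(o)
        frontier = next_frontier
    return set(activation)

edges = {("aspirin", "drug_class"): 0.9, ("drug_class", "liver_toxicity"): 0.8,
         ("aspirin", "colour"): 0.2, ("colour", "packaging"): 0.9}
print(spread_activation(edges, seeds={"aspirin"}))
# classical (e.g. RDFS/OWL) inference would then run over this selection only
```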

34 Goals of LarKC
1. Scaling to infinity
  – by giving up soundness & completeness
  – by switching between reasoning and search
2. Reasoning pipeline
  – by plug-in architecture
3. Large computing platform
  – by cluster computing
  – by wide-area distribution

35 What is the Large Knowledge Collider? A plug-in architecture:
1. Retrieve: relevant sources, relevant content, relevant context
2. Abstract: extract information, calculate statistics, transform to logic
3. Select: relevant problems, relevant methods, relevant data
4. Reason: probabilistic inference, classification, context reasoning
5. Decide: enough answers? enough certainty? enough effort/cost?
(A sketch of the pipeline follows below.)
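A minimal sketch of the five-step plug-in pipeline as named on the slide; the function signatures are assumptions made for illustration, not the actual LarKC plug-in APIs.

```python
# Retrieve -> Abstract -> Select -> Reason -> Decide, looped until "enough".
def run_pipeline(query, retrieve, abstract, select, reason, decide, max_rounds=5):
    answers = []
    for _ in range(max_rounds):
        sources = retrieve(query)               # relevant sources, content, context
        facts = abstract(sources)               # extract info, statistics, transform to logic
        subset = select(facts, query)           # relevant problems, methods, data
        answers.extend(reason(subset, query))   # probabilistic inference, classification, ...
        if decide(answers):                     # enough answers / certainty / effort?
            break
    return answers

# Plug-ins can be swapped independently, e.g.:
# run_pipeline("liver toxicity?", retrieve=my_crawler, abstract=my_extractor,
#              select=my_selector, reason=my_reasoner,
#              decide=lambda ans: len(ans) >= 10)
```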

36 What is the Large Knowledge Collider?
Integrating reasoning and search: dynamic, web-scale, and open-world, in a pluggable architecture.
Combining consortium competence:
– IR, Cognition
– ML, Ontologies
– Statistics, ML, Cognition, DB
– Logic, DB, Probabilistic Inference
– Economics, Decision Theory

37 Goals of LarKC
1. Scaling to infinity
  – by giving up soundness & completeness
  – by switching between reasoning and search
2. Reasoning pipeline
  – by plug-in architecture
3. Large computing platform
  – by cluster computing
  – by wide-area distribution

38 Two parallel implementations
1. Medium-size tight-cluster parallel computing: O(10^2) nodes
  – fully available
  – fully reliable
  – (almost) no bandwidth restrictions
2. Large-scale wide-area distributed computing: O(10^4) nodes
  – unpredictable, unreliable, very limited bandwidth

39 How & when will others get access to the results?
– Public releases of the LarKC platform
– Public APIs enabling others to develop plug-ins
– Create an Early Access Group
– Encourage uptake through Challenge Tasks
– Encourage participation: the World Health Organisation use-case uses public-domain data
– Give access to best practice through contributions to W3C SWBPD, SWEO, HCLS

40 Who will build this?
Organisation | Country
DERI Innsbruck | Austria
AstraZeneca | Sweden
CEFRIEL | Italy
Cycorp Europe | Slovenia
Universität Stuttgart, High Performance Computing | Germany
Max Planck (Psychology) | Germany
Ontotext Lab | Bulgaria
Saltlux | Korea
Siemens | Germany
Sheffield | United Kingdom
Vrije Universiteit Amsterdam | Netherlands
Beijing University of Technology | PRC
World Health Organisation: Cancer Research | France

41 Timing
– Start in April 08
– First prototype after 1 year
– Limited open access after 2 years
– Open access after 2.5 years: open APIs, competition
– First demonstrators after 2.5 years
– Run-time: 3.5 years

42 Most important results?
An implemented, configurable platform for large-scale semantic computing, together with a library of plug-ins and APIs enabling development by others, the practicality of which is shown in three demonstrated deployments: medical research, drug development, and urban computing using mobile data.
Open to the community: come and play with us!

