1 What’s Wrong With current Semantic Web Reasoning (and how to fix it)

2 This talk (and this workshop)
- Current state of Web Reasoning?
- What's wrong with it?
- What are we going to do about it?
- LarKC: one large new effort to do something about it

3 What’s wrong with current Semantic Web reasoning?

4 Characteristics of current Semantic Web reasoning
- centralised, algorithmic, boolean, deterministic
- examples of current attempts at scalability:
  - identify subsets of OWL: OWL DL, OWL DLP, OWL Horst
  - identify alternative semantics for OWL, e.g. LP-style semantics
  - scalability by muscle-power

5 Scalability by muscle power
Task/Data | System | Mill. Stats. | Time (sec) | Speed (stat/sec) | Inference
LUBM(500) [15] | Sesame Native Store (v2.0alpha3) | 70 | 10 800 | 6 481 | RDFS +/-
LUBM(600) | SwiftOWLIM v0.92b | 83 | 3 941 | 20 979 | OWL-Horst +/-
LUBM(8000) | BigOWLIM v0.92b | 1 060 | | 4 216 | OWL-Horst +/-, complete
Subset of UniProt | ORACLE 10gR2 | 100 | | 361 | RDFS +
FOAF, DC, PRISM | Jena v2.3 | 200 | no info | | Reif.
LUBM + Movie & Actor DB | AllegroGraph RDFS++ Reasoner 1.2 | 1 000 | 50 580 | 19 771 | RDFS +/-

6 Moving in the right direction: New BigOWLIM (OntoText, Sirma)
- 4 switchable inference modes (owl-max, owl-horst, (optimised) rdfs, none)
- custom rules for definable semantics
- < 100 ms query performance on a billion triples (but 34 hrs upload)

7 Why we need “something different”
- Gartner (May 2007, G ): "By 2012, 70% of public Web pages will have some level of semantic markup, 20% will use more extensive Semantic Web-based ontologies"
- Semantic Technologies at Web Scale?
  - 20% of 30 billion pages × 1,000 triples per page = 6 trillion triples (see the quick calculation below)
  - 30 billion and 1,000 are underestimates; imagine 6 years from now…
  - data-integration and semantic search at web-scale?
- Inference will have to become distributed, heuristic, approximate, probabilistic, not centralised, algorithmic, boolean, deterministic
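
For concreteness, here is the slide's estimate written out as a quick calculation. The page count, adoption fraction and triples-per-page figure are the slide's own rough inputs, not measured values:

```python
# Back-of-the-envelope estimate from the slide (all inputs are rough assumptions).
pages = 30e9              # public Web pages ("an underestimate", per the slide)
semantic_fraction = 0.20  # share using extensive Semantic Web ontologies (Gartner figure)
triples_per_page = 1000   # triples per annotated page (also an underestimate)

total_triples = pages * semantic_fraction * triples_per_page
print(f"{total_triples:.0e} triples")  # -> 6e+12, i.e. 6 trillion triples
```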

8 Why we need “something different”
- FDA white paper "Innovation or Stagnation" (March 2004):
  - "developers have no choice but to use the tools of the last century to assess this century's candidate solutions."
  - "industry scientists often lack cross-cutting information about an entire product area, or information about techniques that may be used in areas other than theirs"
- Problem: pharmaceutical R&D in early clinical development is stagnating
- Current NCBI: linking but no inference
- "Show me any potential liver toxicity associated with the compound's drug class, target, structure and disease." (Q1 + Q2 + Q3; a query sketch follows after this list)
  - Q1, Genetics: "Show me all liver toxicity associated with the target or the pathway."
  - Q2, Chemistry: "Show me all liver toxicity associated with compounds with similar structure."
  - Q3, Literature: "Show me all liver toxicity from the public literature and internal reports that are related to the drug class, disease and patient population."
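
To illustrate the kind of linked-data question Q1 translates into, here is a small sketch using rdflib and SPARQL over a toy graph. The vocabulary (ex:hasTarget, ex:concernsTarget, ex:reportsToxicity, ex:source) and the individuals are invented for illustration; they are not any real pharma or NCBI schema:

```python
from rdflib import Graph, Namespace, Literal

EX = Namespace("http://example.org/pharma#")  # hypothetical vocabulary
g = Graph()

# Toy facts: a compound, its target, and a toxicity finding reported for that target.
g.add((EX.compound42, EX.hasTarget, EX.kinaseX))
g.add((EX.report17, EX.concernsTarget, EX.kinaseX))
g.add((EX.report17, EX.reportsToxicity, EX.liverToxicity))
g.add((EX.report17, EX.source, Literal("public literature")))

# Q1-style question: liver toxicity associated with the compound's target.
q = """
PREFIX ex: <http://example.org/pharma#>
SELECT ?report ?source WHERE {
    ex:compound42 ex:hasTarget ?target .
    ?report ex:concernsTarget ?target ;
            ex:reportsToxicity ex:liverToxicity ;
            ex:source ?source .
}
"""
for report, source in g.query(q):
    print(report, source)
```

This only follows explicit links; the talk's point is that answering Q1–Q3 together also needs inference over class hierarchies, similarity and provenance.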

9 Why we need “something different”
Our cities face many challenges; Urban Computing is the ICT way to address them
- Is public transportation where the people are?
- Which landmarks attract more people?
- Where are people concentrating?
- Where is traffic moving?

10 What’s wrong with current Semantic Web Reasoning
Properties of current inference techniques, based on logic as guiding paradigm:
- Exact
- Abrupt
- Expensive

11 Current inference is exact
- “yes” or “no”; not “almost”, “not by a long shot”, “yes, except for a few”, etc. (think of subClassOf)
- This was OK as long as ontologies were clean:
  - hand-crafted
  - well-designed
  - carefully populated
  - well maintained
  - etc.

12 Current inference is exact
- But current ontologies are sloppy (and will be increasingly so):
  - made by non-experts
  - made by machines: scraping from file hierarchies, mail folders, todo lists & phone books on PDAs
  - machine learning from examples

13 Sloppy ontologies need sloppy inference

14 Sloppy ontologies need sloppy inference
“almost subClassOf”

15 Combined ontologies need sloppy inference
Mapping ontologies is almost always messy: post-doc ≈ young-researcher (“almost equal”)
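
One simple way to make “almost subClassOf” and “almost equal” operational, purely as an illustration of the idea rather than the speaker's specific proposal, is to measure the overlap between the instance sets of two classes:

```python
def subsumption_degree(a_instances, b_instances):
    """Degree to which class A is 'almost subClassOf' class B:
    the fraction of A's instances that are also instances of B."""
    a, b = set(a_instances), set(b_instances)
    return len(a & b) / len(a) if a else 1.0

def equality_degree(a_instances, b_instances):
    """Degree to which two classes are 'almost equal' (Jaccard overlap)."""
    a, b = set(a_instances), set(b_instances)
    return len(a & b) / len(a | b) if (a or b) else 1.0

# post-doc vs. young-researcher: heavily overlapping but not identical populations.
postdocs = {"ann", "bob", "carol", "dave"}
young_researchers = {"ann", "bob", "carol", "eve", "frank"}
print(subsumption_degree(postdocs, young_researchers))  # 0.75 -> "almost subClassOf"
print(equality_degree(postdocs, young_researchers))     # 0.5  -> partly "almost equal"
```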

16 Properties of current inference techniques
Based on logic as guiding paradigm:
- Exact → approximate
- Abrupt
- Expensive

17 Current inference is abrupt
- Current: nothing ……………….. yes!
- We want gradual answers:
  - anytime computation (sketched below)
  - the agent, human or machine, can decide how good is good enough
  - deadline computation
  - pay for quality
  - load balancing
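
A minimal sketch of what “anytime” / “deadline” computation means here (my own illustration, not a LarKC component): the reasoner keeps collecting answers until a deadline and can be stopped at any point, returning the best result so far together with a crude quality estimate.

```python
import time

def anytime_query(candidates, is_answer, deadline_s):
    """Scan candidates until the deadline; return partial answers plus
    the fraction of the search space actually examined (a crude quality measure)."""
    answers, examined = [], 0
    start = time.monotonic()
    for item in candidates:
        if time.monotonic() - start > deadline_s:
            break  # deadline reached: stop and report what we have
        if is_answer(item):
            answers.append(item)
        examined += 1
    quality = examined / len(candidates) if candidates else 1.0
    return answers, quality

# The caller (human or machine) decides how good is good enough via the deadline.
answers, quality = anytime_query(range(10_000_000), lambda x: x % 999_983 == 0, 0.05)
print(len(answers), f"coverage={quality:.1%}")
```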

18 Current inference is expensive
- approximate answers are cheap
- gradual answers are arbitrarily cheap (WYPIWYG: What You Pay Is What You Get)

19 Properties of current inference techniques
Based on logic as guiding paradigm:
- Exact → approximate
- Abrupt → gradual
- Expensive → cheap

20 What’s wrong with current Semantic Web Reasoning
obsession with worst-case asymptotic complexity

21 Who cares about decidability?
- Decidability ≈ completeness: a guarantee to find an answer, or tell you it doesn’t exist, given enough run-time & memory
- Sources of incompleteness:
  - incompleteness of the input data
  - insufficient run-time to wait for the answer
- Completeness is unachievable in practice anyway, regardless of the completeness of the algorithm

22 Who cares about undecidability?
- Undecidability ≠ always guaranteed not to find an answer
- Undecidability = not always guaranteed to find an answer
- Undecidability may be harmless in many cases; in all cases that matter

23 Who cares about complexity?
- worst-case: may be exponentially rare
- asymptotic: ignores constants

24 What to do instead?
No good framework for "average case" complexity
2nd best: do more experimental performance profiles with realistic data

25 What’s wrong with current Semantic Web Reasoning
- obsession with worst-case asymptotic complexity: not even a good framework for "average" complexity
- obsession with recall & precision
Why we need “something different”

26 Need for approximation
- Trade off recall for precision or vice versa (see the precision/recall sketch below):
  - security: prefer recall
  - medicine: prefer precision
- Trade off both for speed
- Logician’s nightmare: drop soundness & completeness!
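
For reference, the quantities being traded off, written out with the standard definitions plus the usual F-beta score, where beta > 1 favours recall (the security case) and beta < 1 favours precision (the medicine case):

```python
def precision(tp, fp):   # soundness-like: how many returned answers are correct
    return tp / (tp + fp) if tp + fp else 1.0

def recall(tp, fn):      # completeness-like: how many correct answers are returned
    return tp / (tp + fn) if tp + fn else 1.0

def f_beta(p, r, beta):  # beta > 1 weights recall higher, beta < 1 weights precision higher
    return (1 + beta**2) * p * r / (beta**2 * p + r) if p + r else 0.0

p, r = precision(tp=80, fp=20), recall(tp=80, fn=40)
print(f_beta(p, r, beta=2.0))   # security-style weighting: recall matters more
print(f_beta(p, r, beta=0.5))   # medicine-style weighting: precision matters more
```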

27 A logician’s nightmare
Figure (after Dieter Fensel): precision (soundness) vs. recall (completeness), with logic, IR and the Semantic Web positioned in that space.

28 What’s wrong with current Semantic Web Reasoning
- obsession with worst-case asymptotic complexity: no good framework for "average" complexity
- obsession with recall & precision: no good framework for “good enough”
- separation of reasoning and search

29 Integrating Search with Reasoning
Axioms: a hasType b, b subClassOf c → Reasoning → Conclusion: a hasType c
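
The slide's example is the standard RDFS-style rule: from “a hasType b” and “b subClassOf c”, conclude “a hasType c”. A tiny forward-chaining sketch (plain Python, no reasoner library) that applies exactly this rule until nothing new follows:

```python
def infer_types(triples):
    """Close a set of (s, p, o) triples under:
    (a, hasType, b), (b, subClassOf, c)  =>  (a, hasType, c)."""
    facts = set(triples)
    changed = True
    while changed:
        changed = False
        new = {(a, "hasType", c)
               for (a, p1, b) in facts if p1 == "hasType"
               for (b2, p2, c) in facts if p2 == "subClassOf" and b2 == b}
        if not new <= facts:
            facts |= new
            changed = True
    return facts

axioms = {("a", "hasType", "b"), ("b", "subClassOf", "c")}
print(infer_types(axioms) - axioms)  # {('a', 'hasType', 'c')}
```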

30 Summary of analysis
- Based on logic, which is strict, abrupt, expensive
- Obsession with complexity
- Obsession with traditional soundness/completeness & recall/precision
- No recognition that different use-cases need different performance trade-offs

31 The Large Knowledge Collider

32 Goals of LarKC
- Scaling to infinity
  - by giving up soundness & completeness
  - by switching between reasoning and search
- Reasoning pipeline
  - by plugin architecture
- Large computing platform
  - by cluster computing
  - by wide-area distribution

33 Scaling to infinity: possible approaches
- Markov Logic (probability in the logic, judging the truth of formulas): adds a learnable weight to each FOL formula, specifying a probability distribution over Herbrand interpretations (possible worlds)
- weighted RDF graphs (probability as a heuristic, judging the relevance of formulas): weighted activation spreading (for selection, sketched below), followed by classical inference over the selected subgraph
- model sampling (probability in the logic): sampling the space of all truth assignments, driven by the probability of the model
- and others
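
A minimal sketch of the second approach, weighted activation spreading used purely for selection; the edge weights, decay factor and threshold below are illustrative assumptions, not values from the talk. Activation flows from the query's seed nodes over a weighted graph, and only the sufficiently activated subgraph is handed to a classical reasoner.

```python
from collections import defaultdict

def spread_activation(edges, seeds, decay=0.5, steps=3, threshold=0.1):
    """edges: {node: [(neighbour, weight), ...]} with weights in [0, 1].
    Returns the set of nodes activated above the threshold."""
    activation = defaultdict(float)
    for s in seeds:
        activation[s] = 1.0
    for _ in range(steps):
        new = defaultdict(float, activation)
        for node, act in activation.items():
            for neighbour, weight in edges.get(node, []):
                new[neighbour] = max(new[neighbour], act * weight * decay)
        activation = new
    return {n for n, a in activation.items() if a >= threshold}

# Toy weighted RDF graph: edge weights encode estimated relevance, not truth.
edges = {
    "LiverToxicity": [("CompoundX", 0.9), ("Aspirin", 0.2)],
    "CompoundX":     [("KinaseTarget", 0.8)],
}
selected = spread_activation(edges, seeds={"LiverToxicity"})
print(selected)  # the subgraph to pass on to a classical reasoner
```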

34 Goals of LarKC
- Scaling to infinity
  - by giving up soundness & completeness
  - by switching between reasoning and search
- Reasoning pipeline
  - by plugin architecture
- Large computing platform
  - by cluster computing
  - by wide-area distribution

35 What is the Large Knowledge Collider? Plug-in architecture
- Retrieve: relevant sources, relevant content, relevant context
- Abstract: extract information, calculate statistics, transform to logic
- Select: relevant problems, relevant methods, relevant data
- Reason: probabilistic inference, classification, context reasoning
- Decide: enough answers? enough certainty? enough effort/cost?
Pipeline: Retrieve → Abstract → Select → Reason → Decide
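
To show how a plug-in pipeline of this shape might be wired together (an illustrative sketch only; it does not reproduce the real LarKC APIs), each stage is a replaceable component and the Decide step controls whether another round of Retrieve-Abstract-Select-Reason is worth its cost:

```python
from typing import Callable, Any

# Each stage is a plug-in: a plain callable with an agreed-upon signature.
Retrieve = Callable[[str], list]        # query          -> raw sources
Abstract = Callable[[list], list]       # raw sources    -> statements
Select   = Callable[[list], list]       # statements     -> relevant subset
Reason   = Callable[[list], Any]        # relevant data  -> (partial) answers
Decide   = Callable[[Any, int], bool]   # answers, round -> good enough?

def run_pipeline(query: str, retrieve: Retrieve, abstract: Abstract,
                 select: Select, reason: Reason, decide: Decide,
                 max_rounds: int = 5) -> Any:
    answers = None
    for round_no in range(1, max_rounds + 1):
        data = select(abstract(retrieve(query)))
        answers = reason(data)
        if decide(answers, round_no):   # enough answers / certainty / budget spent?
            break                       # otherwise: loop again, ideally retrieving more widely
    return answers

# Trivial stand-in plug-ins, just to show the wiring.
result = run_pipeline(
    "liver toxicity of compound X",
    retrieve=lambda q: [q + " :: doc1", q + " :: doc2"],
    abstract=lambda docs: [d.upper() for d in docs],
    select=lambda stmts: stmts[:1],
    reason=lambda data: {"answers": data},
    decide=lambda ans, n: len(ans["answers"]) > 0 or n >= 3,
)
print(result)
```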

36 What is the Large Knowledge Collider
- Integrating Reasoning and Search: dynamic, web-scale, and open-world, in a pluggable architecture
- Combining consortium competence across the pipeline steps:
  - Retrieve: IR, Cognition
  - Abstract: ML, Ontologies
  - Select: Statistics, ML, Cognition, DB
  - Reason: Logic, DB, Probabilistic Inference
  - Decide: Economics, Decision Theory

37 Goals of LarKC
- Scaling to infinity
  - by giving up soundness & completeness
  - by switching between reasoning and search
- Reasoning pipeline
  - by plugin architecture
- Large computing platform
  - by cluster computing
  - by wide-area distribution

38 Two parallel implementations
- Medium-size tight cluster (parallel computing):
  - ≈ O(10^2) nodes
  - fully available, fully reliable
  - (almost) no bandwidth restrictions
- Large-scale wide-area distributed computing:
  - ≈ O(10^4) nodes
  - unpredictable, unreliable, very limited bandwidth

39 How & when will others get access to the results
- Public releases of the LarKC platform
- Public APIs enabling others to develop plug-ins
- Create an Early Access Group
- Encourage uptake through Challenge Tasks
- Encourage participation: the World Health Org. use-case is public-domain data
- Give access to best practice through contributions to W3C SWBPD, SWEO, HCLS
- Note that the Challenge Tasks are not listed as deliverables…

40 Who will build this?
Organisation | Country
DERI Innsbruck | Austria
AstraZeneca | Sweden
CEFRIEL | Italy
Cycorp Europe | Slovenia
Universität Stuttgart, High Performance Computing | Germany
Max Planck (Psychology) | Germany
Ontotext Lab | Bulgaria
Saltlux | Korea
Siemens | Germany
Sheffield | United Kingdom
Vrije Universiteit Amsterdam | Netherlands
Beijing University of Technology | PRC
World Health Organisation: Cancer Research | France

41 Timing
- Start in April ’08
- First prototype after 1 year
- Limited open access after 2 years
- Open access after 2.5 years: open APIs, competition
- First demonstrators after 2.5 years
- Run-time: … years

42 Most important results?
- “An implemented configurable platform for large scale semantic computing, together with a library of plug-ins and APIs enabling development by others, the practicality of which is shown in three demonstrated deployments in medical research, drug development and urban computing using mobile data.”
- Open to the community: come and play with us
- "Implementation of the platform, and proving that it (i) solves real problems in telecommunications and life sciences and (ii) outperforms the best currently existing technology"

