Probabilistic RDF Octavian Udrea 1 V.S. Subrahmanian 1 Zoran Majkić 2 1 University of Maryland College Park 2 University “La Sapienza”, Rome, Italy.

1 Probabilistic RDF Octavian Udrea 1 V.S. Subrahmanian 1 Zoran Majkić 2 1 University of Maryland College Park 2 University “La Sapienza”, Rome, Italy

2 Motivation
- Not all information on the Web is easily expressible in “classic” models (e.g., the relational model)
- RDF extraction from text
  - STORY is the first, very successful prototype
  - Need to extend RDF with temporal and uncertainty components
- Goal: build a logical model of RDF with uncertainty and provide query algorithms

3 The Probabilistic RDF idea
- An RDF theory is a set of triples (subject, property, value)
  - (USA hasCapital Washington DC)
  - (Washington DC hasPopulation 500,000)
- Probabilistic RDF extends this model with uncertainty over the set of values: (USA hasCapital {(Washington DC, 0.95), (State of Washington, 0.05)})
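
As a rough illustration of the data model on this slide, a plain triple and its probabilistic extension could be represented as below. The class names `Triple` and `PTriple` and the helper `prob` are hypothetical, chosen only for this sketch; they do not come from the presentation.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Triple:
    """A classic RDF triple (subject, property, value)."""
    subject: str
    prop: str
    value: str


@dataclass(frozen=True)
class PTriple:
    """A probabilistic triple: the value slot holds (value, probability) pairs."""
    subject: str
    prop: str
    dist: tuple  # tuple of (value, probability) pairs


def prob(t: PTriple, value: str) -> float:
    """Probability attached to `value`, or 0.0 if the value is not listed."""
    return dict(t.dist).get(value, 0.0)


plain = Triple("USA", "hasCapital", "Washington DC")
uncertain = PTriple(
    "USA",
    "hasCapital",
    (("Washington DC", 0.95), ("State of Washington", 0.05)),
)
```

Here `prob(uncertain, "Washington DC")` looks up the probability the slide assigns to that value.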

4 Probabilistic RDF example [figure not reproduced; data extracted from www.wrongdiagnosis.com]

5 Probabilistic RDF example [figure not reproduced]

8 Probabilistic RDF syntax
- Schema uncertainty: (c subClassOf (C, δ)), with $\sum_{d \in C} \delta(d) \le 1$
- Class-instance uncertainty: (x rdf:type (C, δ)), with $\sum_{d \in C} \delta(d) \le 1$
- Instance-based uncertainty: (x p (Y, δ)), with $\sum_{y \in Y} \delta(y) \le 1$
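
The constraint shared by all three forms is that the probabilities attached to one value set sum to at most 1. A minimal check might look like the sketch below. The function name `is_valid_distribution`, the dict representation of δ, and the extra requirement that each probability lies in [0, 1] are assumptions of this example, not part of the slides.

```python
import math


def is_valid_distribution(delta: dict, tol: float = 1e-9) -> bool:
    """Check that delta (value -> probability) sums to at most 1.

    Assumption: each individual probability must also lie in [0, 1].
    The sum may be strictly below 1; the slide only requires <= 1.
    """
    if any(p < 0.0 or p > 1.0 for p in delta.values()):
        return False
    # fsum avoids accumulating floating-point error over many values
    return math.fsum(delta.values()) <= 1.0 + tol
```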

9 Probabilistic RDF syntax
- Sanity requirements
  - If (c subClassOf ($C_1$, $\delta_1$)) and (c subClassOf ($C_2$, $\delta_2$)), then ($C_1 = C_2$ and $\delta_1 = \delta_2$) or $C_1 \cap C_2 = \emptyset$
  - The same applies to the other types of uncertainty
- Transitive properties
  - Simple inferential capability
  - Examples: associatedWith, controlledBy
- P-path: a set of triples connected by transitive properties
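
The sanity requirement can be read as: two triples with the same subject and property must carry either the same value set and distribution, or disjoint value sets. The sketch below checks this for a list of triples. The representation (a tuple of subject, property, and a dict mapping values to probabilities) and the name `satisfies_sanity` are assumptions made for illustration.

```python
from itertools import combinations


def satisfies_sanity(triples) -> bool:
    """triples: list of (subject, prop, delta), delta a dict value -> probability.

    Returns False if two triples with the same subject and property have
    value sets that overlap without being identical distributions.
    """
    for (s1, p1, d1), (s2, p2, d2) in combinations(triples, 2):
        if (s1, p1) != (s2, p2):
            continue  # the requirement only relates triples on the same (s, p)
        identical = d1 == d2  # same value set and same probabilities
        disjoint = set(d1).isdisjoint(d2)
        if not (identical or disjoint):
            return False
    return True
```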

10 Example p-path

11 P-path semantics and t-norms
- We cannot generally assume independence between triples on a transitive path
  - Flu, AcuteBronchitis, Pneumonia
- T-norms are used to express the user’s knowledge of the relationship between triples
  - $\otimes$ is associative and commutative
  - $0 \otimes x = 0$, $1 \otimes x = x$
  - $x \le y,\ z \le w \Rightarrow x \otimes z \le y \otimes w$
- P-path probability: the t-norm applied to the individual probabilities on the path
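
To make the p-path probability concrete, the sketch below folds a t-norm over the probabilities along a path. Only the product t-norm is named in the slides; the minimum and Łukasiewicz t-norms are standard textbook examples added here for comparison, and all function names are hypothetical.

```python
from functools import reduce


def product_tnorm(x: float, y: float) -> float:
    """Product t-norm: corresponds to treating the triples as independent."""
    return x * y


def minimum_tnorm(x: float, y: float) -> float:
    """Minimum (Goedel) t-norm: the largest possible t-norm."""
    return min(x, y)


def lukasiewicz_tnorm(x: float, y: float) -> float:
    """Lukasiewicz t-norm: max(0, x + y - 1)."""
    return max(0.0, x + y - 1.0)


def path_probability(probs, tnorm) -> float:
    """Combine the probabilities on a p-path with the given t-norm.

    1.0 is the identity element of every t-norm, so it is a safe start value.
    """
    return reduce(tnorm, probs, 1.0)


# Path Flu -> AcuteBronchitis -> Pneumonia with probabilities 0.7 and 0.65
path = [0.7, 0.65]
by_product = path_probability(path, product_tnorm)
by_minimum = path_probability(path, minimum_tnorm)
```

The choice of t-norm encodes what the user believes about the dependence between the triples, so the same path can yield different probabilities under different t-norms.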

12 Example p-path (Flu, associatedWith, (Pneumonia, 0.455)) w.r.t. the product t-norm

13 pRDF semantics
- A world W is a set of simple triples (with no probabilities)
- An interpretation I associates a probability with each world
- I satisfies a pRDF theory if:
  - For each (s, p, (V, δ)) and each v ∈ V: $\delta(v) \le \sum_{W \ni (s,p,v)} I(W)$
  - The same applies to paths w.r.t. a given t-norm
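
The satisfaction condition for single triples can be checked directly on a small, explicitly enumerated interpretation. The sketch below does only that; it ignores the path condition, and the data layout (worlds as frozensets of plain triples, an interpretation as a list of world and probability pairs) is an assumption made for illustration.

```python
def satisfies(interpretation, theory, tol: float = 1e-9) -> bool:
    """Check delta(v) <= sum of I(W) over worlds W containing (s, p, v).

    interpretation: list of (world, probability); world is a frozenset of
                    plain (subject, property, value) triples.
    theory:         list of (subject, property, delta), delta a dict
                    mapping value -> probability.
    Only the single-triple condition is checked, not the path condition.
    """
    for s, p, delta in theory:
        for v, lower_bound in delta.items():
            mass = sum(pr for world, pr in interpretation if (s, p, v) in world)
            if mass + tol < lower_bound:
                return False
    return True


theory = [("USA", "hasCapital",
           {"Washington DC": 0.95, "State of Washington": 0.05})]
w1 = frozenset({("USA", "hasCapital", "Washington DC")})
w2 = frozenset({("USA", "hasCapital", "State of Washington")})
good_interpretation = [(w1, 0.95), (w2, 0.05)]
bad_interpretation = [(w1, 0.5), (w2, 0.5)]
```

With these inputs, `good_interpretation` meets both lower bounds, while `bad_interpretation` gives the worlds containing (USA hasCapital Washington DC) a total mass of only 0.5, below the required 0.95.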

14 pRDF semantics A theory is consistent iff it has a satisfying interpretation  Every pRDF theory is consistent Entailment: T entails T’ iff every satisfying interpretation of T satisfies T’ Closure of a theory: The entire set of triples entailed by the theory  Maximal w.r.t. the probability values

15 pRDF fixpoint semantics
- The closure operator Δ adds exactly one entailed triple at each step
- (Flu associatedWith (Acute Bronchitis, 0.7)) and (Acute Bronchitis associatedWith (Pneumonia, 0.65)) yield (Flu associatedWith (Pneumonia, 0.455)) w.r.t. the product t-norm
- Δ has a fixpoint, which is the theory closure
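
A naive fixpoint computation for a single transitive property could be sketched as follows. This is a simplification, not the Δ operator from the slides: it may add several triples per pass rather than exactly one, it handles only one property, and it keeps the maximal probability found for each pair. The function name `closure` and the dict layout are assumptions.

```python
def closure(edges, tnorm=lambda x, y: x * y):
    """Naive closure of one transitive property.

    edges: dict {(subject, value): probability}.
    Repeatedly combines (a, b) and (b, d) into (a, d) with the t-norm,
    keeping the highest probability seen, until nothing changes.
    """
    closed = dict(edges)
    changed = True
    while changed:
        changed = False
        snapshot = list(closed.items())  # iterate over a copy while updating
        for (a, b), p_ab in snapshot:
            for (c, d), p_cd in snapshot:
                if b != c or a == d:
                    continue  # not chainable, or would create a self-loop
                p = tnorm(p_ab, p_cd)
                if p > closed.get((a, d), 0.0):
                    closed[(a, d)] = p
                    changed = True
    return closed


edges = {
    ("Flu", "Acute Bronchitis"): 0.7,
    ("Acute Bronchitis", "Pneumonia"): 0.65,
}
closed = closure(edges)
```

On the two triples from the slide, the loop derives the pair (Flu, Pneumonia) in the first pass and stops after a second pass that changes nothing.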

16 pRDF query processing
- We will consider only simple queries: a triple with a variable term
  - Example: (? associatedWith Pneumonia 0.4)
  - What is associated with Pneumonia with probability above 0.4?
- Simple method:
  - Compute the closure
  - Select any triple in the closure that matches the query
  - VERY expensive computationally
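
The simple method amounts to a filter over a precomputed closure. The sketch below assumes the closure for associatedWith is already available as a dict (its three entries are the triples from the fixpoint example), and the function name `answer_subject_query` is hypothetical.

```python
def answer_subject_query(closed, value, threshold):
    """Answer (? property value threshold) against a precomputed closure.

    closed: dict {(subject, value): probability} for one property.
    Returns the subjects whose probability is strictly above the threshold.
    """
    return sorted(
        s for (s, v), p in closed.items() if v == value and p > threshold
    )


# Closure of associatedWith, assumed to have been computed beforehand
closed_associated_with = {
    ("Flu", "Acute Bronchitis"): 0.7,
    ("Acute Bronchitis", "Pneumonia"): 0.65,
    ("Flu", "Pneumonia"): 0.455,
}

# (? associatedWith Pneumonia 0.4)
answers = answer_subject_query(closed_associated_with, "Pneumonia", 0.4)
```

The filter itself is cheap; the cost the slide warns about lies in computing the full closure first.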

17 pRDF query processing Set of algorithms for answering simple queries and conjunctions:  pRDF_Subject, pRDF_Property, …, pRDF_conjunction Central idea:  Apply Δ in only those directions that yield tuples relevant to the query  Cut off path computations when the threshold can no longer be reached. min  (current_probability, threshold)
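
The pruning idea can be illustrated with a best-first backward search from the queried value. This is not the authors' pRDF_Subject algorithm, whose details are not given in the slides; it is only a sketch of the cut-off principle, relying on the t-norm property that combining with another probability can never raise the current one. The function name and the extra Cold triple in the usage data are hypothetical.

```python
import heapq


def subjects_above(edges, target, threshold, tnorm=lambda x, y: x * y):
    """Find subjects linked to `target` with path probability above threshold.

    edges: dict {(subject, value): probability} for one transitive property.
    Explores backwards from `target`, highest probability first, and never
    extends a path whose probability has dropped to the threshold or below.
    """
    incoming = {}
    for (s, v), p in edges.items():
        incoming.setdefault(v, []).append((s, p))

    best = {}
    heap = [(-1.0, target)]  # max-heap via negated probabilities
    while heap:
        neg_p, node = heapq.heappop(heap)
        if node in best:
            continue  # already settled with a probability at least as high
        current = -neg_p
        best[node] = current
        for s, p in incoming.get(node, []):
            extended = tnorm(p, current)
            # Cut-off: a t-norm never increases, so this path is hopeless
            if extended > threshold and s not in best:
                heapq.heappush(heap, (-extended, s))
    best.pop(target, None)  # the target itself is not an answer
    return best


edges = {
    ("Flu", "Acute Bronchitis"): 0.7,
    ("Acute Bronchitis", "Pneumonia"): 0.65,
    ("Cold", "Flu"): 0.5,  # hypothetical extra triple to show pruning
}
result = subjects_above(edges, "Pneumonia", 0.4)
```

In this example the path through Cold is dropped as soon as its probability falls below 0.4, so nothing upstream of Cold would ever be explored.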

18 Experimental results Implementation  Java, 1700 LOC  Disk-based storage for pRDF theories Synthetically generated datasets  According to varying underlying distributions Datasets extracted from Web sources

19 Experimental questions Does the underlying distribution affect query running time? From a practical point of view, which are the “fastest” types of queries? How does running time vary with the number of atoms in a conjunction? What other theory-dependent factors affect running time?  Theory width  Number of properties

20 Query running time (Poisson)

21 Query running time (Zipf)

22 Conjunctive queries running time

23 Dependence on property width

24 Number of properties

25 Take away points RDF syntax with uncertainty Model-theory and fixpoint semantics for pRDF Efficient query algorithms for pRDF

26 The end http://om.umiacs.umd.edu/ Thank you! Questions & comments

