# Selectivity Estimation Example Mohammad Farhan Husain.

## Example Data

| Subject | Predicate | Object |
|---------|-----------|--------|
| R1 | P1 | L1 |
| R2 | P1 | L2 |
| R3 | P1 | R4 |
| R5 | P1 | R2 |
| R6 | P1 | L3 |
| R7 | P2 | L4 |
| R8 | P2 | R1 |
| R3 | P2 | L5 |

- R1, R2, …, R8 are resources, i.e. URIs.
- P1 and P2 are predicates, which are also URIs.
- L1, L2, …, L5 are literals.
- $R$ = total number of unique resources = 8
- $T$ = total number of triples = 8
- $T_{P1}$ = total number of triples having predicate P1 = 5
- $T_{P2}$ = total number of triples having predicate P2 = 3

For any query:

- Selectivity of a bound subject $s$: $sel(s) = 1 / R = 1 / 8 = 0.125$
- Selectivity of predicate P1: $sel(P1) = T_{P1} / T = 5 / 8 = 0.625$
- Selectivity of predicate P2: $sel(P2) = T_{P2} / T = 3 / 8 = 0.375$
- Selectivity of an unbound subject, predicate, or object = 1.0
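The statistics on this slide can be recomputed directly from the eight triples. The following is a minimal sketch, not part of the original presentation; the names `TRIPLES` and `is_resource`, and the rule that resource names start with "R", are assumptions made only for this illustration.

```python
from collections import Counter

# The eight example triples as (subject, predicate, object).
TRIPLES = [
    ("R1", "P1", "L1"),
    ("R2", "P1", "L2"),
    ("R3", "P1", "R4"),
    ("R5", "P1", "R2"),
    ("R6", "P1", "L3"),
    ("R7", "P2", "L4"),
    ("R8", "P2", "R1"),
    ("R3", "P2", "L5"),
]


def is_resource(term):
    # Assumption for this sketch: resources are named R..., literals L...
    return term.startswith("R")


# Unique resources appearing in subject or object position.
resources = {term for s, _, o in TRIPLES for term in (s, o) if is_resource(term)}
R = len(resources)
T = len(TRIPLES)

# Number of triples per predicate (T_P1, T_P2).
triples_per_predicate = Counter(p for _, p, _ in TRIPLES)

sel_bound_subject = 1 / R
sel_predicate = {p: count / T for p, count in triples_per_predicate.items()}

print("R =", R, "T =", T)
print("triples per predicate:", dict(triples_per_predicate))
print("sel(bound subject) =", sel_bound_subject)
print("sel(predicate) =", sel_predicate)
```

Running it should reproduce the values listed on the slide: 8 resources, 8 triples, 5 and 3 triples for P1 and P2, and selectivities 0.125, 0.625 and 0.375.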

## Example Histogram for P1

Suppose there is a hash function that assigns the object values of triples having predicate P1 to two bins in the following manner:

- Bin 1 contains L1, L2 and R2 (height 3).
- Bin 2 contains R4 and L3 (height 2).

## Example Histogram for P2

Suppose the same hash function assigns the object values of triples having predicate P2 to two bins in the following manner:

- Bin 1 contains L5 (height 1).
- Bin 2 contains L4 and R1 (height 2).
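Both histograms can be built by counting, per predicate, how many object values land in each bin. The sketch below is illustrative only: the presentation does not specify the hash function, so the lookup table `BIN_OF` simply hard-codes the bin assignments given on the two histogram slides, and all names are hypothetical.

```python
from collections import Counter, defaultdict

# The eight example triples as (subject, predicate, object).
TRIPLES = [
    ("R1", "P1", "L1"), ("R2", "P1", "L2"), ("R3", "P1", "R4"),
    ("R5", "P1", "R2"), ("R6", "P1", "L3"), ("R7", "P2", "L4"),
    ("R8", "P2", "R1"), ("R3", "P2", "L5"),
]

# Stand-in for the unspecified hash function (assumption):
# object value -> bin number, copied from the histogram slides.
BIN_OF = {
    "L1": 1, "L2": 1, "R2": 1, "L5": 1,
    "R4": 2, "L3": 2, "L4": 2, "R1": 2,
}

# histograms[predicate][bin] = number of triples whose object falls in that bin.
histograms = defaultdict(Counter)
for _, predicate, obj in TRIPLES:
    histograms[predicate][BIN_OF[obj]] += 1

for predicate in sorted(histograms):
    print(predicate, dict(sorted(histograms[predicate].items())))
```

The bin heights come out as 3 and 2 for P1, and 1 and 2 for P2, which are the values used in the estimation examples.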

## Estimation Approach – Base Equations

| Equation | Notes |
|----------|-------|
| $sel(t) = sel(s) \times sel(p) \times sel(o)$ | $t$ refers to a triple pattern |
| $sel(s) = 1 / R$ | $R$: number of unique resources in the knowledge store |
| $sel(p) = T_p / T$ | $T$: total number of triples; $T_p$: number of triples matching predicate $p$ |
| $sel(o) = h_c(p, o_c) / T_p$ | $(p, o_c)$ represents the class of the histogram for predicate $p$ in which object $o$ falls |
| $sel(?a) = 1$ | when $?a$ is an unbound subject, predicate, or object |
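The base equations can be combined into one small estimator. This is a sketch under stated assumptions rather than the author's implementation: the statistics are hard-coded from the example data, `BIN_OF` stands in for the unspecified hash function, variables are recognised by a leading `?`, and the case of a bound object with an unbound predicate follows the sum over all predicates used in the unbound-predicate example of this presentation.

```python
# Statistics taken from the example data (hard-coded for this sketch).
R = 8                                   # unique resources
T = 8                                   # total triples
TRIPLES_PER_PREDICATE = {"P1": 5, "P2": 3}
HISTOGRAM = {"P1": {1: 3, 2: 2}, "P2": {1: 1, 2: 2}}

# Stand-in for the unspecified hash function (assumption).
BIN_OF = {
    "L1": 1, "L2": 1, "R2": 1, "L5": 1,
    "R4": 2, "L3": 2, "L4": 2, "R1": 2,
}


def is_variable(term):
    # Convention assumed here: unbound terms are written like "?s".
    return term.startswith("?")


def sel_object(predicate, obj):
    # sel(o) = h_c(p, o_c) / T_p: height of the bin holding obj over T_p.
    # Raises KeyError for objects the lookup table does not cover.
    return HISTOGRAM[predicate][BIN_OF[obj]] / TRIPLES_PER_PREDICATE[predicate]


def selectivity(s, p, o):
    """Estimate sel(t) = sel(s) * sel(p) * sel(o) for a triple pattern."""
    sel_s = 1.0 if is_variable(s) else 1 / R
    sel_p = 1.0 if is_variable(p) else TRIPLES_PER_PREDICATE[p] / T
    if is_variable(o):
        sel_o = 1.0
    elif is_variable(p):
        # Unbound predicate: sum the object selectivity over every predicate.
        sel_o = sum(sel_object(pi, o) for pi in TRIPLES_PER_PREDICATE)
    else:
        sel_o = sel_object(p, o)
    return sel_s * sel_p * sel_o


for pattern in [("?s", "P1", "L2"), ("?s", "?p", "L2"), ("?s", "P1", "?o")]:
    print(pattern, f"{selectivity(*pattern):.3f}")
```

The three patterns printed at the end are the ones worked through on the remaining slides, so the output can be compared against 0.375, 0.933 and 0.625.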

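## Selectivity Estimation for Triple Pattern: Example with Bound Predicate

Triple pattern: `?s P1 L2`

$$
\begin{aligned}
\text{Estimated selectivity} &= sel(s) \times sel(P1) \times sel(L2) \\
&= 1.0 \times 0.625 \times sel(P1, L2) \\
&= 1.0 \times 0.625 \times \frac{h_1(P1, L2)}{T_{P1}} \\
&= 1.0 \times 0.625 \times \frac{\text{height of Bin 1}}{T_{P1}} \\
&= 1.0 \times 0.625 \times \frac{3}{5} \\
&= 0.375
\end{aligned}
$$

Here, $sel(s) = 1.0$ because the subject `?s` is unbound, and $h_1(P1, L2)$ denotes the height of the bin of the histogram for predicate P1 into which the hash function puts L2.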
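The arithmetic for this pattern can be checked with exact fractions, which avoids any rounding in the intermediate steps. This is only a worked check of the numbers on the slide; the variable names are hypothetical.

```python
from fractions import Fraction

T = 8          # total triples
T_P1 = 5       # triples with predicate P1
BIN1_P1 = 3    # height of Bin 1 in the P1 histogram (L1, L2, R2)

sel_s = Fraction(1)              # ?s is unbound
sel_p1 = Fraction(T_P1, T)       # 5/8
sel_l2 = Fraction(BIN1_P1, T_P1) # L2 falls in Bin 1, so 3/5

estimate = sel_s * sel_p1 * sel_l2
print(estimate, float(estimate))  # 3/8 0.375
```

The $T_{P1}$ terms cancel, so for an unbound subject and a bound predicate the estimate reduces to the bin height divided by $T$.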

## Selectivity Estimation for Triple Pattern: Example with Unbound Predicate

Triple pattern: `?s ?p L2`

$$
\begin{aligned}
\text{Estimated selectivity} &= sel(s) \times sel(p) \times sel(L2) \\
&= 1.0 \times 1.0 \times \sum_{P_i \in P} sel(P_i, L2) \\
&= 1.0 \times 1.0 \times \{ sel(P1, L2) + sel(P2, L2) \} \\
&= 1.0 \times 1.0 \times \left\{ \frac{h_1(P1, L2)}{T_{P1}} + \frac{h_1(P2, L2)}{T_{P2}} \right\} \\
&= 1.0 \times 1.0 \times \left\{ \frac{\text{height of Bin 1 of P1 histogram}}{T_{P1}} + \frac{\text{height of Bin 1 of P2 histogram}}{T_{P2}} \right\} \\
&= 1.0 \times 1.0 \times \left\{ \frac{3}{5} + \frac{1}{3} \right\} \\
&\approx 0.933
\end{aligned}
$$

Note that the hash function always puts the value L2 into Bin 1. That is why we pick the height of Bin 1 of the histogram for P2, even though no triple with predicate P2 has L2 as its object.
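The sum over predicates can be reproduced with exact fractions to see where 0.933 comes from. This is a sketch for checking the slide's arithmetic; `BIN_OF_L2` encodes the slide's statement that the hash function puts L2 into Bin 1, and the other names are hypothetical.

```python
from fractions import Fraction

TRIPLES_PER_PREDICATE = {"P1": 5, "P2": 3}
HISTOGRAM = {"P1": {1: 3, 2: 2}, "P2": {1: 1, 2: 2}}
BIN_OF_L2 = 1  # the hash function always puts L2 into Bin 1

# sel(Pi, L2) = height of L2's bin in the Pi histogram / T_Pi
per_predicate = {
    p: Fraction(HISTOGRAM[p][BIN_OF_L2], TRIPLES_PER_PREDICATE[p])
    for p in TRIPLES_PER_PREDICATE
}

# ?s and ?p are unbound, so both contribute a factor of 1.
estimate = Fraction(1) * Fraction(1) * sum(per_predicate.values())

print(per_predicate)                  # P1 -> 3/5, P2 -> 1/3
print(estimate, f"{float(estimate):.3f}")  # 14/15 0.933
```

The P2 term contributes 1/3 although P2 never has L2 as an object, which illustrates the over-estimation that hash-bin collisions introduce.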

## Selectivity Estimation for Triple Pattern: Example with Unbound Object

Triple pattern: `?s P1 ?o`

$$
\begin{aligned}
\text{Estimated selectivity} &= sel(s) \times sel(P1) \times sel(o) \\
&= 1.0 \times 0.625 \times 1.0 \\
&= 0.625
\end{aligned}
$$