Chase Methods based on Knowledge Discovery Agnieszka Dardzinska & Zbigniew W. Ras &

Chase Methods based on Knowledge Discovery Agnieszka Dardzinska & Zbigniew W. Ras agnadar@wp.pl & ras@uncc.edu

XFaculty-NameDept.Chair x1x1 Bob x2x2 JohnJones x3x3 Mike x4x4 EE x5x5 TomEE GIVEN:  Incomplete Information System (IIS)  Constraints (functional dependencies,..) which IIS satisfies [ Dept  Chair, Chair  Dept  Faculty-Name, … Dept(x 1 ) =Dept(x 2 ) ] Algorithm Chase

Tableau System for IIS – information system with null values replaced by variables XFaculty NameDepartmentChair x1x1 Bobvdvd n1n1 x2x2 Johnvdvd Jones x3x3 Miken2n2 n3n3 x4x4 vEvE EEn4n4 x5x5 TomEEn5n5

 distinguished variables, one for each attribute (if b is an attribute of interest, then v b is the corresponding distinguished variable)  nondistinguished variables (there are countably many of them: n 1, n 2, n 3, ….) Variables in Tableaux System

XFaculty NameDepartmentChair x1x1 Bobvdvd n1n1 x2x2 Johnvdvd Jones x3x3 Miken2n2 n3n3 x4x4 vEvE EEn4n4 x5x5 TomEEn5n5 XFaculty NameDepartmentChair x1x1 Bobvdvd Jones x2x2 Johnvdvd Jones x3x3 Miken2n2 n3n3 x4x4 TomEEn4n4 x5x5 TomEEn4n4 Functional Dependencies: [Department → Chair] [Department *Chair → Faculty Name]

Input: tableaux system S and set of functional dependencies F Output: tableaux system CHASE F (S) Begin S 1 :=S; while there are t 1, t 2  S 1 and (B  b)  F such that t 1 [B]= t 2 [B] and t 1 [b] < t 2 [b] dochange all the occurrences of the value t 2 [b] in S 1 to t 1 [b] CHASE F (S):=S 1 End Algorithm Chase

Input: tableaux system S and set of functional dependencies F Output: tableaux system CHASE F (S) Begin S 1 :=S; while there are t 1, t 2  S 1 and (B  b)  F such that t 1 [B]= t 2 [B] and t 1 [b] < t 2 [b] dochange all the occurrences of the value t 2 [b] in S 1 to t 1 [b] CHASE F (S):=S 1 End The algorithm always terminates if applied to a finite tableaux system. If one execution of the algorithm generates a tableaux system satisfying F, then every execution of the algorithm generates the same tableaux system. Algorithm Chase

Algorithm Chase 1

1.Chase 1 identifies all incomplete attributes (their values are called concepts) in IS. 2. Main Algorithm - Extraction of rules from IS describing these concepts, - Null values in IS are replaced by values suggested by these rules. 3. These two steps are repeated till fixpoint is reached. Chase supported by rules extracted from IIS (Chase 1)

Xbcdefg x1x1 b1b1 c1c1 e2e2 f1f1 x2x2 b2b2 c2c2 d2d2 e1e1 f2f2 g3g3 x3x3 b1b1 c1c1 d3d3 e1e1 f1f1 g1g1 x4x4 b3b3 c3c3 d3d3 e3e3 f1f1 g3g3 x5x5 b2b2 c2c2 e3e3 f1f1 g2g2 x6x6 c1c1 d2d2 f2f2 g1g1 x7x7 b1b1 d2d2 e2e2 f4f4 g1g1 x8x8 d2d2 e2e2 f2f2 g3g3 x9x9 b3b3 c1c1 d1d1 f2f2 x 10 b2b2 c1c1 e3e3 f4f4 g2g2 X = {x 1, x 2, x 3, x 4, x 5, x 6, x 7, x 8, x 9, x 10 } A = {b, c, d, e, f, g} e 2  b 1 (support 2), c 1  f 1  b 1 (support 2), g 2  b 2 (support 2), c 3  b 3 (support 1), c 2  b 2 (support 2), g 3  d 2  b 2 (support 1), e 3  d 3  b 3 (support 1), f 2  d 2  b 2 (support 1). Attribute b Example (Chase1)

Xbcdefg x1x1 b1b1 c1c1 e2e2 f1f1 x2x2 b2b2 c2c2 d2d2 e1e1 f2f2 g3g3 x3x3 b1b1 c1c1 d3d3 e1e1 f1f1 g1g1 x4x4 b3b3 c3c3 d3d3 e3e3 f1f1 g3g3 x5x5 b2b2 c2c2 e3e3 f1f1 g2g2 x6x6 c1c1 d2d2 f2f2 g1g1 x7x7 b1b1 d2d2 e2e2 f4f4 g1g1 x8x8 d2d2 e2e2 f2f2 g3g3 x9x9 b3b3 c1c1 d1d1 f2f2 x 10 b2b2 c1c1 e3e3 f4f4 g2g2 Attribute b Two null values in S: b(x 6 ), b(x 8 ) b(x 6 ): e 2  b 1 (support 2), c 1  f 1  b 1 (support 2), g 2  b 2 (support 2), c 3  b 3 (support 1), c 2  b 2 (support 2), g 3  d 2  b 2 (support 1), e 3  d 3  b 3 (support 1), f 2  d 2  b 2 (support 1). Example (Chase1)

Xbcdefg x1x1 b1b1 c1c1 e2e2 f1f1 x2x2 b2b2 c2c2 d2d2 e1e1 f2f2 g3g3 x3x3 b1b1 c1c1 d3d3 e1e1 f1f1 g1g1 x4x4 b3b3 c3c3 d3d3 e3e3 f1f1 g3g3 x5x5 b2b2 c2c2 e3e3 f1f1 g2g2 x6x6 c1c1 d2d2 f2f2 g1g1 x7x7 b1b1 d2d2 e2e2 f4f4 g1g1 x8x8 d2d2 e2e2 f2f2 g3g3 x9x9 b3b3 c1c1 d1d1 f2f2 x 10 b2b2 c1c1 e3e3 f4f4 g2g2 Attribute b Two null values in S: b(x 6 ), b(x 8 ) b(x 8 ): e 2  b 1 (support 2), c 1  f 1  b 1 (support 2), g 2  b 2 (support 2), c 3  b 3 (support 1), c 2  b 2 (support 2), g 3  d 2  b 2 (support 1), e 3  d 3  b 3 (support 1), f 2  d 2  b 2 (support 1). b(x 6 ) = b 2 Example (Chase1)

Xbcdefg x1x1 b1b1 c1c1 e2e2 f1f1 x2x2 b2b2 c2c2 d2d2 e1e1 f2f2 g3g3 x3x3 b1b1 c1c1 d3d3 e1e1 f1f1 g1g1 x4x4 b3b3 c3c3 d3d3 e3e3 f1f1 g3g3 x5x5 b2b2 c2c2 e3e3 f1f1 g2g2 x6x6 c1c1 d2d2 f2f2 g1g1 x7x7 b1b1 d2d2 e2e2 f4f4 g1g1 x8x8 d2d2 e2e2 f2f2 g3g3 x9x9 b3b3 c1c1 d1d1 f2f2 x 10 b2b2 c1c1 e3e3 f4f4 g2g2 c(x 7 ): b(x 6 ) = b 2 Two null values in S: c(x 7 ), c(x 8 ). b 1  c 1 (support 2), e 2  c 1 (support 1), f 4  c 1 (support 1), g 1  c 1 (support 2), b 2  d 2  c 2 (support 1), b 2  e 1  c 2 (support 1), b 2  f 2  c 2 (support 1), b 2  g 3  c 2 (support 1), d 2  e 1  c 2 (support 1), d 2  g 3  c 2 (support 1). Example (Chase1)

Xbcdefg x1x1 b1b1 c1c1 e2e2 f1f1 x2x2 b2b2 c2c2 d2d2 e1e1 f2f2 g3g3 x3x3 b1b1 c1c1 d3d3 e1e1 f1f1 g1g1 x4x4 b3b3 c3c3 d3d3 e3e3 f1f1 g3g3 x5x5 b2b2 c2c2 e3e3 f1f1 g2g2 x6x6 c1c1 d2d2 f2f2 g1g1 x7x7 b1b1 d2d2 e2e2 f4f4 g1g1 x8x8 d2d2 e2e2 f2f2 g3g3 x9x9 b3b3 c1c1 d1d1 f2f2 x 10 b2b2 c1c1 e3e3 f4f4 g2g2 c(x 8 ): c(x 7 ) = c 1 Two null values in S: c(x 7 ), c(x 8 ). b 1  c 1 (support 2), e 2  c 1 (support 1), f 4  c 1 (support 1), g 1  c 1 (support 2), b 2  d 2  c 2 (support 1), b 2  e 1  c 2 (support 1), b 2  f 2  c 2 (support 1), b 2  g 3  c 2 (support 1), d 2  e 1  c 2 (support 1), d 2  g 3  c 2 (support 1). b(x 6 ) = b 2, Example (Chase1)

Input: System S=(X, A, V) Set of incomplete attributes In(A)={a 1, a 2, …, a k } Set of rules L(D) Output: System Chase1(S) begin j:=1; while j ≤ k do begin S j :=S for all v  V aj do while there is x  X and rule (t  v)  L(D) such that x  N Sj (t) and card(a j(x) )≠1 begin a(x):=v; end j:=j+1 end S:=  {S j :1 ≤ j ≤ k}, Chase1 (S, In(A), L(D)) end Algorithm Chase1(S, In(A), L(D))

A1- “the no. of different attributes used in a query" A2- “the percent of null values in a queried IS" A3- “the no. of objects returned by QAS when IS-complete" A4- “the no. of objects returned by QAS when IS-incomplete" (optimistic interpretation) A5- “the no. of objects returned by QAS based on rule-based chase algorithm" (pessimistic interpretation) A6- “the no. of bad objects retrieved" A7- “the no. of passes of rule-based chase algorithm" ZOO Database

Rules Discovery from partially Incomplete Information Systems

Information System S = ( X, A, V ) X - finite set of objects, A - finite set of attributes, - set of their values. Assumption 1. For any, 2. For any Data (Incomplete)

x8x8 x7x7 x6x6 x5x5 x4x4 x3x3 x2x2 x1x1 edcbaX Example

Goal: Describe e in terms of {a,b,c,d} x8x8 x7x7 x6x6 x5x5 x4x4 x3x3 x2x2 x1x1 edcbaX Algorithm ERID for Extracting Rules from partially Incomplete Information System

Goal: Describe e in terms of {a,b,c,d} x8x8 x7x7 x6x6 x5x5 x4x4 x3x3 x2x2 x1x1 edcbaX For the values of the decision attribute we have: Algorithm ERID

Goal: Describe e in terms of {a,b,c,d}. x8x8 x7x7 x6x6 x5x5 x4x4 x3x3 x2x2 x1x1 edcbaX 2.Check the relationship “ ” between values of classification attributes {a,b,c,d} and values of decision attribute e Algorithm ERID

Goal: Describe e in terms of {a,b,c,d} x8x8 x7x7 x6x6 x5x5 x4x4 x3x3 x2x2 x1x1 edcbaX Let,. and confidence of the rule are above some threshold values. We say that: iff support Algorithm ERID

Goal: Describe e in terms of {a,b,c,d} x8x8 x7x7 x6x6 x5x5 x4x4 x3x3 x2x2 x1x1 edcbaX Let,. and confidence of the rule are above some threshold values. We say that: iff support How to define support and confidence of a rule ? Algorithm ERID

To define support and confidence of the rule a 1  e 3 we compute: Support of the rule: Support of the term a 1 : Confidence of the rule: Definition of Support and Confidence (by example)

Goal: Describe e in terms of {a,b,c,d} x8x8 x7x7 x6x6 x5x5 x4x4 x3x3 x2x2 x1x1 edcbaX - marked negative - marked positive Thresholds (provided by user): Minimal support (λ 1 = 1) Minimal confidence (λ 2 = ½) - marked negative Extracting Rules from partially Incomplete Information System (Algorithm ERID(λ1, λ2))

x8x8 x7x7 x6x6 x5x5 x4x4 x3x3 x2x2 x1x1 edcbaX but Algorithm ERID(λ1, λ2)

x8x8 x7x7 x6x6 x5x5 x4x4 x3x3 x2x2 x1x1 edcbaX but They all are not marked Algorithm ERID(λ1, λ2)

x8x8 x7x7 x6x6 x5x5 x4x4 x3x3 x2x2 x1x1 edcbaX and Algorithm ERID(λ1, λ2)

x8x8 x7x7 x6x6 x5x5 x4x4 x3x3 x2x2 x1x1 edcbaX and They all are marked positive. Algorithm ERID(λ1, λ2)

x8x8 x7x7 x6x6 x5x5 x4x4 x3x3 x2x2 x1x1 edcbaX and They all are marked positive. They all are marked negative. Algorithm ERID(λ1, λ2)

x8x8 x7x7 x6x6 x5x5 x4x4 x3x3 x2x2 x1x1 edcbaX The algorithm continues for terms of length 3, 4, … till all of them have either positive or negative marks. Rules are automatically constructed from relations marked positive. Algorithm ERID(λ1, λ2)

Algorithm Chase 2 (for Partially IIS)

- partially incomplete information system of type λ, if S is incomplete and the following three conditions hold:  is defined for any,   Algorithm Chase 2

S 1, S 2 - partially incomplete, both of type λ and both classifying the same sets of objects (from X) using the same sets of attributes (A) Let and The pair (S 1, S 2 ) satisfies containment relation Ψ (or Ψ(S 1 )= S 2 ) if:   We also denote that fact by Algorithm Chase 2

Assumptions:  - information system of type λ  - set of all pairwise independent rules extracted by ERID from S  N S (t) - standard interpretation of term t in S, meaning that:, for any      where for any, we have:  In(A) = {a 1, …, a k } - incomplete attributes in S

.................. for all do begin if and is a maximal subset of rules from L(D) such that then if then begin end p j := p j +n j ; end if / containment relation holds between a j (x), [b j (x)/p j ] / then.................. Algorithm Chase 2 (S, In(A), L(D))

x8x8 x7x7 x6x6 x5x5 x4x4 x3x3 x2x2 x1x1 edcbaX Incomplete Information System S of type λ = 0.3 λ 1 =1, λ 2 =0.5 ERID(λ 1, λ 2 )

x8x8 x7x7 x6x6 x5x5 x4x4 x3x3 x2x2 x1x1 edcbaX Incomplete Information System S of type λ = 0.3 λ 1 =1, λ 2 =0.5 ERID(λ 1, λ 2 ) Algorithm Chase 2 will try to replace by e new (x 1 ) = {(e 1, ), (e 2, ), (e 3, )}. We will show that Ψ(e(x 1 )) = e new (x 1 ) (the value e(x 1 ) will be changed by Chase 2).

x8x8 x7x7 x6x6 x5x5 x4x4 x3x3 x2x2 x1x1 edcbaX For x1 : we have: Because the confidence assigned to e 3 is below the threshold λ, then only two values remain: (e 1, 0.48), (e 2, 0.97). The value of attribute e assigned to x 1 is: {(e 1, 0.33), (e 2, 0.67)}. Incomplete Information System S of type λ = 0.3

Distributed Chase Algorithm (Chase 3)

Let: - distributed autonomous information systems - information system for any (I - set of sites) - knowledge-base at site - set of (k, i)-rules (constructed at site k and sent to site i)

Strategy for constructing knowledge-base and Algorithm Chase 3 Notation: q S =[a, b, c : d, e] - request by S for definitions of a, b, c with additional information that d, e are complete attributes in S.

Global Ontology gabc g1g1 b2b2 g1g1 a2a2 b1b1 c2c2 g1g1 a2a2 c1c1 g1g1 a1a1 b1b1 c1c1 S2S2 bade a1a1 d2d2 b2b2 a2a2 d2d2 e2e2 b1b1 a2a2 d1d1 e1e1 d1d1 S1S1 abcd a1a1 b2b2 b1b1 c2c2 a2a2 b2b2 d2d2 a2a2 b1b1 c1c1 rulesupportsystem S KB KBS r1r1 r2r2 q S =[a, c, d : b] qSqS

Global Ontology gabc g1g1 b2b2 g1g1 a2a2 b1b1 c2c2 g1g1 a2a2 c1c1 g1g1 a1a1 b1b1 c1c1 S2S2 bade a1a1 d2d2 b2b2 a2a2 d2d2 e2e2 b1b1 a2a2 d1d1 e1e1 d1d1 S1S1 abcd a1a1 b2b2 b1b1 c2c2 a2a2 b2b2 d2d2 a2a2 b1b1 c1c1 rulesupportsystem b1a2b1a2 1S b2*d2a2b2*d2a2 1S b2a2b2a2 1S1S1 c1*b1a1c1*b1a1 1S2S2 S r1r1 r2r2 KBS r1r1 r2r2 q S =[a, c, d : b] qSqS

Assumption:. D i - granularity level of values of attributes used in rules from D i may differ from the granularity level of values of attribute used in descriptions of objects in Chase 3 algorithm to be applicable to S i has to be based on rules from D i satisfying the following two conditions:.  attribute value used in the decision part of a rule has the granularity level either equal to or finer than the granularity level of the corresponding attribute in S i  the granularity level of any attribute used in the classification part of a rule is either equal or softer than the granularity level of the corresponding attribute in S i

Hierarchical attributes: age, salary Rule in D i: (age, young)  (salary, 40k) age young middle-aged old salary low medium high 18 … 29 30 … 60 61 … 80 10k…40k 50k 60k 70k 80k…100k Example

Assumption: tuple t in S i supports rule. 1. An overlapping attribute between rule and the tuple is the decision attribute in. If two attributes, involved in that match, have different granularities, then the decision value d has to be replaced by a softer value which granularity will match the granularity of the corresponding attribute in S i. 2. An overlapping attribute between rule and the tuple is the classification attribute in. If two attributes, involved in that match, have different granularities, then the value of attribute a has to be replaced by a finer value which granularity will match the granularity of a in S i. Two cases: Algorithm Chase 3 (Construction of new D i followed by Chase 2)

Chase 4 (All Information Systems are equally involved in chase)

Chase Methods based on Knowledge Discovery Agnieszka Dardzinska & Zbigniew W. Ras &

Similar presentations

Presentation on theme: "Chase Methods based on Knowledge Discovery Agnieszka Dardzinska & Zbigniew W. Ras &"— Presentation transcript:

Similar presentations

About project

Feedback

Log in

Auth with social network:

Chase Methods based on Knowledge Discovery Agnieszka Dardzinska & Zbigniew W. Ras &

Similar presentations

Presentation on theme: "Chase Methods based on Knowledge Discovery Agnieszka Dardzinska & Zbigniew W. Ras &"— Presentation transcript:

Similar presentations

About project

Feedback