based on Knowledge Discovery


1 Chase Methods based on Knowledge Discovery
Zbigniew W. Ras

2 Algorithm Chase
GIVEN:
Incomplete Information System (IIS)
Constraints (functional dependencies, ...) which IIS satisfies
[ Dept → Chair, Chair * Dept → Faculty-Name, … Dept(x1) = Dept(x2) ]
X Faculty-Name Dept. Chair
x1 Bob
x2 John Jones
x3 Mike
x4 EE
x5 Tom

3 Tableau System for IIS
An information system with null values replaced by variables.
X Faculty Name Department Chair
x1 Bob vd n1
x2 John Jones
x3 Mike n2 n3
x4 vE EE n4
x5 Tom n5

4 Variables in Tableaux System
distinguished variables, one for each attribute (if b is an attribute of interest, then vb is the corresponding distinguished variable)
nondistinguished variables (there are countably many of them: n1, n2, n3, …)

5 Functional Dependencies:
[Department → Chair]
[Department * Chair → Faculty Name]
Tableau before the chase:
X Faculty Name Department Chair
x1 Bob vd n1
x2 John Jones
x3 Mike n2 n3
x4 vE EE n4
x5 Tom n5
Tableau after the chase:
X Faculty Name Department Chair
x1 Bob vd Jones
x2 John
x3 Mike n2 n3
x4 Tom EE n4
x5

7 Algorithm Chase
Input: tableaux system S and set of functional dependencies F
Output: tableaux system CHASE_F(S)
Begin
  S1 := S;
  while there are t1, t2 ∈ S1 and (B → b) ∈ F such that t1[B] = t2[B] and t1[b] < t2[b] do
    change all the occurrences of the value t2[b] in S1 to t1[b]
  CHASE_F(S) := S1
End
The algorithm always terminates if applied to a finite tableaux system.
If one execution of the algorithm generates a tableaux system satisfying F, then every execution of the algorithm generates the same tableaux system.
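
The chase loop above can be sketched in Python. This is a minimal illustration under stated assumptions, not the author's implementation: the encoding of variables as tuples, the helper names `is_variable`, `rank` and `chase`, the ordering used for "<" (constants, then distinguished, then nondistinguished variables), and the sample tableau are all hypothetical.

```python
from itertools import combinations

def is_variable(value):
    # Assumed encoding: variables are tuples, ("v", name) or ("n", index); constants are strings.
    return isinstance(value, tuple)

def rank(value):
    # Assumed ordering for "<": constants, then distinguished, then nondistinguished variables.
    if not is_variable(value):
        return (0, value)
    kind, ident = value
    return (1, ident) if kind == "v" else (2, ident)

def chase(tableau, fds):
    """Apply dependencies given as (lhs_attributes, rhs_attribute) until nothing changes."""
    rows = [dict(row) for row in tableau]  # work on a copy, like S1 := S
    changed = True
    while changed:
        changed = False
        for lhs, b in fds:
            for t1, t2 in combinations(rows, 2):
                if any(t1[a] != t2[a] for a in lhs) or t1[b] == t2[b]:
                    continue
                low, high = sorted((t1[b], t2[b]), key=rank)
                if not is_variable(high):
                    continue  # two different constants: a conflict this sketch leaves untouched
                for row in rows:  # replace every occurrence of the larger value
                    for attr, value in row.items():
                        if value == high:
                            row[attr] = low
                changed = True
    return rows

# Hypothetical tableau, loosely modeled on the faculty example.
tableau = [
    {"name": "Bob", "dept": ("v", "d"), "chair": ("n", 1)},
    {"name": "John", "dept": ("v", "d"), "chair": "Jones"},
    {"name": "Mike", "dept": ("n", 2), "chair": ("n", 3)},
]
result = chase(tableau, [(("dept",), "chair")])
print(result[0]["chair"])
```

Each replacement removes one variable symbol from the tableau, which is why the loop stops on a finite input.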

8 Algorithm Chase 1

9 Chase supported by rules extracted from IIS (Chase 1)
1. Chase 1 identifies all incomplete attributes (their values are called concepts) in IS.
2. Main Algorithm
- Extraction of rules from IS describing these concepts,
- Null values in IS are replaced by values suggested by these rules.
3. These two steps are repeated till fixpoint is reached.

10 Example (Chase1)
X = {x1, x2, x3, x4, x5, x6, x7, x8, x9, x10}
A = {b, c, d, e, f, g}
X b c d e f g
x1 b1 c1 e2 f1
x2 b2 c2 d2 e1 f2 g3
x3 d3 g1
x4 b3 c3 e3
x5 g2
x6
x7 f4
x8
x9 d1
x10
Attribute b
e2 → b1 (support 2), c1 f1 → b1 (support 2), g2 → b2 (support 2), c3 → b3 (support 1), c2 → b2 (support 2), g3d2 → b2 (support 1), e3d3 → b3 (support 1), f2d2 → b2 (support 1).

11 Example (Chase1)
X b c d e f g
x1 b1 c1 e2 f1
x2 b2 c2 d2 e1 f2 g3
x3 d3 g1
x4 b3 c3 e3
x5 g2
x6
x7 f4
x8
x9 d1
x10
Attribute b
Two null values in S: b(x6), b(x8)
b(x6):
e2 → b1 (support 2), c1 f1 → b1 (support 2), g2 → b2 (support 2), c3 → b3 (support 1), c2 → b2 (support 2), g3d2 → b2 (support 1), e3d3 → b3 (support 1), f2d2 → b2 (support 1).

13 Example (Chase1)
X b c d e f g
x1 b1 c1 e2 f1
x2 b2 c2 d2 e1 f2 g3
x3 d3 g1
x4 b3 c3 e3
x5 g2
x6
x7 f4
x8
x9 d1
x10
Attribute b
Two null values in S: b(x6), b(x8)
b(x8):
e2 → b1 (support 2), c1 f1 → b1 (support 2), g2 → b2 (support 2), c3 → b3 (support 1), c2 → b2 (support 2), g3d2 → b2 (support 1), e3d3 → b3 (support 1), f2d2 → b2 (support 1).
b(x6) = b2

14 Example (Chase1)
X b c d e f g
x1 b1 c1 e2 f1
x2 b2 c2 d2 e1 f2 g3
x3 d3 g1
x4 b3 c3 e3
x5 g2
x6
x7 f4
x8
x9 d1
x10
Two null values in S: c(x7), c(x8).
c(x7):
b1 → c1 (support 2), e2 → c1 (support 1), f4 → c1 (support 1), g1 → c1 (support 2), b2 d2 → c2 (support 1), b2e1 → c2 (support 1), b2f2 → c2 (support 1), b2g3 → c2 (support 1), d2e1 → c2 (support 1), d2g3 → c2 (support 1).
b(x6) = b2

16 Example (Chase1)
X b c d e f g
x1 b1 c1 e2 f1
x2 b2 c2 d2 e1 f2 g3
x3 d3 g1
x4 b3 c3 e3
x5 g2
x6
x7 f4
x8
x9 d1
x10
Two null values in S: c(x7), c(x8).
c(x8):
b1 → c1 (support 2), e2 → c1 (support 1), f4 → c1 (support 1), g1 → c1 (support 2), b2 d2 → c2 (support 1), b2e1 → c2 (support 1), b2f2 → c2 (support 1), b2g3 → c2 (support 1), d2e1 → c2 (support 1), d2g3 → c2 (support 1).
b(x6) = b2, c(x7) = c1

17 Algorithm Chase1(S, In(A), L(D))
Input: System S = (X, A, V)
       Set of incomplete attributes In(A) = {a1, a2, …, ak}
       Set of rules L(D)
Output: System Chase1(S)
begin
  j := 1;
  while j ≤ k do
  begin
    Sj := S
    for all v ∈ V_aj do
      while there is x ∈ X and rule (t → v) ∈ L(D) such that x ∈ N_Sj(t) and card(aj(x)) ≠ 1 do
        aj(x) := v;
    j := j + 1
  end
  S := {Sj : 1 ≤ j ≤ k},
  Chase1(S, In(A), L(D))
end
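
A minimal Python sketch of the rule-based filling step follows. It is an illustration under assumptions rather than the algorithm as published: the rule set is fixed instead of being re-extracted on each pass, conflicts between rules are resolved by the highest total support (ties leave the cell null), and the function name `chase1`, the data layout, and the two sample objects are hypothetical.

```python
from collections import defaultdict

def chase1(table, rules):
    """Fill null cells (None) with the value best supported by the matching rules.

    table: {object_id: {attribute: value or None}}
    rules: list of (conditions_dict, (target_attribute, value), support)
    """
    table = {x: dict(row) for x, row in table.items()}  # do not mutate the input
    changed = True
    while changed:  # repeat until a fixpoint is reached
        changed = False
        for row in table.values():
            for attr in list(row):
                if row[attr] is not None:
                    continue
                votes = defaultdict(int)
                for conditions, (target, value), support in rules:
                    if target == attr and all(row.get(a) == c for a, c in conditions.items()):
                        votes[value] += support
                ranked = sorted(votes.items(), key=lambda kv: kv[1], reverse=True)
                # Assumed tie-break: fill only when one value strictly wins.
                if ranked and (len(ranked) == 1 or ranked[0][1] > ranked[1][1]):
                    row[attr] = ranked[0][0]
                    changed = True
    return table

# Three of the rules listed for attribute b, applied to hypothetical objects.
rules = [
    ({"e": "e2"}, ("b", "b1"), 2),
    ({"g": "g2"}, ("b", "b2"), 2),
    ({"c": "c2"}, ("b", "b2"), 2),
]
table = {
    "x6": {"b": None, "c": "c2", "e": "e2", "g": "g2"},
    "x8": {"b": None, "c": "c1", "e": "e1", "g": "g1"},
}
filled = chase1(table, rules)
print(filled["x6"]["b"], filled["x8"]["b"])
```

In this sketch a cell that no rule covers simply stays null, so the loop ends as soon as a full pass fills nothing.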

18 ZOO Database
A1 - the no. of different attributes used in a query
A2 - the percent of null values in a queried IS
A3 - the no. of objects returned by QAS when IS-complete
A4 - the no. of objects returned by QAS when IS-incomplete (optimistic interpretation)
A5 - the no. of objects returned by QAS based on rule-based chase algorithm (pessimistic interpretation)
A6 - the no. of bad objects retrieved
A7 - the no. of passes of rule-based query
A1 A2 A3 A4 A6 A7 A5 4 2 8 q1 q2 3 13 12 1 5 14 15 17 9 27 16 q3 22 25 32 30 28 23

19 Rules Discovery from partially Incomplete Information Systems

20 Data
(Incomplete) Information System S = (X, A, V)
X - finite set of objects,
A - finite set of attributes,
V - set of their values.
Assumption
1. For any ,
2. For any

21 Example
[Table: objects x1 to x8 (X) described by attributes a, b, c, d, e; the cell values are missing from the transcript.]

22 Algorithm ERID for Extracting Rules from partially Incomplete Information System
Goal: Describe e in terms of {a,b,c,d}

24 Algorithm ERID
Goal: Describe e in terms of {a,b,c,d}
For the values of the decision attribute we have:

25 Algorithm ERID
Goal: Describe e in terms of {a,b,c,d}.
Check the relationship “ ” between values of classification attributes {a,b,c,d} and values of decision attribute e

27 Algorithm ERID
Goal: Describe e in terms of {a,b,c,d}
Let , We say that: iff support and confidence of the rule are above some threshold values.
How to define support and confidence of a rule?

28 Definition of Support and Confidence (by example)
To define support and confidence of the rule a1 → e3 we compute:
Support of the rule:
Support of the term a1:
Confidence of the rule:
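
The formulas for this slide are missing from the transcript. The sketch below shows one common reading, stated here as an assumption rather than the slide's definition: each cell holds weighted values, the support of a term is the sum of its weights, the support of a rule is the sum over objects of the product of the two weights, and confidence is their ratio. The function names and the three sample objects are hypothetical.

```python
def term_support(objects, attr, value):
    """Sum of the weights with which `value` appears under `attr`."""
    return sum(obj[attr].get(value, 0.0) for obj in objects.values())

def rule_support(objects, condition, decision):
    """Sum over objects of weight(condition) * weight(decision)."""
    (a, v), (d, w) = condition, decision
    return sum(obj[a].get(v, 0.0) * obj[d].get(w, 0.0) for obj in objects.values())

def confidence(objects, condition, decision):
    """Rule support divided by the support of the condition term (0 if the term never occurs)."""
    base = term_support(objects, *condition)
    return rule_support(objects, condition, decision) / base if base else 0.0

# Hypothetical data: each cell maps candidate values to weights that sum to 1.
objects = {
    "x1": {"a": {"a1": 1.0}, "e": {"e1": 0.5, "e3": 0.5}},
    "x2": {"a": {"a1": 0.5, "a2": 0.5}, "e": {"e3": 1.0}},
    "x3": {"a": {"a2": 1.0}, "e": {"e3": 1.0}},
}
sup_rule = rule_support(objects, ("a", "a1"), ("e", "e3"))
sup_term = term_support(objects, "a", "a1")
conf = confidence(objects, ("a", "a1"), ("e", "e3"))
print(sup_rule, sup_term, conf)
```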

29 Extracting Rules from partially Incomplete Information System (Algorithm ERID(λ1, λ2))
Goal: Describe e in terms of {a,b,c,d}
Thresholds (provided by user):
Minimal support (λ1 = 1)
Minimal confidence (λ2 = ½)
- marked negative
- marked negative
- marked positive

31 Algorithm ERID(λ1, λ2)
[Three relations, each qualified by "but"; the formulas are missing from the transcript.]
They all are not marked.

34 Algorithm ERID(λ1, λ2)
[Two groups of relations, each joined by "and"; the formulas are missing from the transcript.]
They all are marked positive.
They all are marked negative.

35 Algorithm ERID(λ1, λ2)
The algorithm continues for terms of length 3, 4, … until every term is marked either positive or negative.
Rules are automatically constructed from relations marked positive.

36 Algorithm Chase 2 (for Partially IIS)

37 Algorithm Chase 2
S is a partially incomplete information system of type λ, if S is incomplete and the following three conditions hold:
is defined for any ,

38 Algorithm Chase 2
S1, S2 - partially incomplete, both of type λ and both classifying the same sets of objects (from X) using the same sets of attributes (A)
Let and
The pair (S1, S2) satisfies containment relation Ψ (or Ψ(S1) = S2) if:
We also denote that fact by

39 [Tables: System S1 and System S2 over objects x1 to x8 (X) and attributes a, b, c, d, e; the cell values are missing from the transcript.]

40 Assumptions:
S - information system of type λ
L(D) - set of all pairwise independent rules extracted by ERID from S
NS(t) - standard interpretation of term t in S, meaning that: , for any where for any , we have:
In(A) = {a1, … , ak} - incomplete attributes in S

41 Algorithm Chase 2 (S, In(A), L(D))
for all do begin if and is a maximal subset of rules from L(D) such that then if then begin end pj:= pj +nj; if /containment relation holds between aj(x), [bj(x)/pj]/ then

42 Incomplete Information System S
Incomplete Information System S of type λ = 0.3
[Table: objects x1 to x8 (X) described by attributes a, b, c, d, e; the cell values are missing from the transcript.]
ERID(λ1, λ2) with λ1 = 1, λ2 = 0.5

43 Incomplete Information System S
Incomplete Information System S of type λ = 0.3
ERID(λ1, λ2) with λ1 = 1, λ2 = 0.5
Algorithm Chase 2 will try to replace e(x1) by enew(x1) = {(e1, ), (e2, ), (e3, )}.
We will show that Ψ(e(x1)) = enew(x1) (the value e(x1) will be changed by Chase 2).

44 Incomplete Information System S
Incomplete Information System S of type λ = 0.3
For x1 we have:
Because the confidence assigned to e3 is below the threshold λ, only two values remain: (e1, 0.48), (e2, 0.97).
After normalizing these weights so that they sum to 1 (0.48/1.45 ≈ 0.33, 0.97/1.45 ≈ 0.67), the value of attribute e assigned to x1 is: {(e1, 0.33), (e2, 0.67)}.
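
The last step of this example, dropping values below the threshold and rescaling the rest, can be checked with a few lines of Python. This is a small sketch under assumptions: the function name `apply_threshold` is hypothetical, and the weight 0.2 used for e3 is a placeholder, since the transcript only says that it lies below λ = 0.3.

```python
def apply_threshold(weights, lam):
    """Keep values whose weight is at least lam, then rescale the rest to sum to 1."""
    kept = {value: p for value, p in weights.items() if p >= lam}
    total = sum(kept.values())
    if total == 0:
        return {}
    return {value: round(p / total, 2) for value, p in kept.items()}

# 0.48 and 0.97 come from the slide; 0.2 for e3 is an assumed placeholder below the threshold.
candidate = {"e1": 0.48, "e2": 0.97, "e3": 0.2}
new_value = apply_threshold(candidate, lam=0.3)
print(new_value)  # {'e1': 0.33, 'e2': 0.67}
```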

45 Distributed Chase Algorithm

46 Let:
- distributed autonomous information systems
- information system for any (I - set of sites)
- knowledge-base at site
set of (k, i)-rules (constructed at site k and sent to site i)

47 Strategy for constructing knowledge-base and Algorithm Chase 3
Notation: qS=[a, b, c : d, e] - request by S for definitions of a, b, c with additional information that d, e are complete attributes in S.

49 Global Ontology
[Diagram: systems S, S1 and S2 and the knowledge base KBS under a global ontology; the layout is missing from the transcript.]
qS = [a, c, d : b]
KBS (columns: rule, support, system), entries as extracted: b1 → a2, 1, S; b2*d2 → a2; b2 → a2, S1; c1*b1 → a1, S2; r1, r2

50 Assumption:
Di - the granularity level of values of attributes used in rules from Di may differ from the granularity level of values of attributes used in descriptions of objects in Si.
For the Chase 3 algorithm to be applicable to Si, it has to be based on rules from Di satisfying the following two conditions:
- the attribute value used in the decision part of a rule has a granularity level either equal to or finer than the granularity level of the corresponding attribute in Si,
- the granularity level of any attribute used in the classification part of a rule is either equal to or softer than the granularity level of the corresponding attribute in Si.

51 Example
Hierarchical attributes: age, salary
Rule in Di: (age, young) → (salary, 40k)
age: young, middle-aged, old (lower level: 18 … … … 80)
salary: low, medium, high (lower level: 10k…40k 50k 60k 70k 80k…100k)

52 Algorithm Chase 3 (Construction of new Di followed by Chase 2)
Assumption: tuple t in Si supports rule
Two cases:
1. An overlapping attribute between rule and the tuple is the decision attribute in
If two attributes, involved in that match, have different granularities, then the decision value d has to be replaced by a softer value whose granularity matches the granularity of the corresponding attribute in Si.
2. An overlapping attribute between rule and the tuple is the classification attribute in
If two attributes, involved in that match, have different granularities, then the value of attribute a has to be replaced by a finer value whose granularity matches the granularity of a in Si.

53 Chase 4 (All Information Systems are equally involved in chase)

55 [Diagram: systems S1, S2 and S3 connected to a shared knowledge base KB; the layout is missing from the transcript.]
qS1 = [a, c, d : b]
qS2 = [b, a, e : d]
qS3 = [a, b, c : g]
r1, r2 - extracted from S2
r3, r4 - extracted from S3
r5, r6 - extracted from S1
