An Index of Data Size to Extract Decomposable Structures in LAD
Hirotaka Ono, Mutsunori Yagiura, Toshihide Ibaraki (Kyoto University)


2 Overview
1. Overview of LAD
2. Decomposability
   - Importance & motivation
3. An index of decomposability
   - Number of data vectors needed to extract reliable decomposable structures
   - Based on probabilistic analyses
4. Numerical experiments
5. Conclusion

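3 Logical Analysis of Data (LAD)
For a phenomenon:
Input: $(T, F)$ with $T, F \subseteq \{0,1\}^n$
- $T$: positive examples (the phenomenon occurs)
- $F$: negative examples (the phenomenon does not occur)
Output: a discriminant function $f(x)$ with $f(a) = 1$ for all $a \in T$ and $f(b) = 0$ for all $b \in F$; $f(x)$ is a logical explanation of the phenomenon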

4 Example: influenza
Attributes (1 = Yes, 0 = No): Fever, Headache, Cough, Snivel, Stomachache
- $T$ (set of patients having influenza): 11011, 10111, 11110
- $F$ (set of patients having common cold): 10011, 11000, 01011
A discriminant function $f(x)$ represents the knowledge "influenza"; finding one is a kind of knowledge acquisition.
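The defining condition of a discriminant function can be checked mechanically. This is a minimal illustration, not the authors' code: the candidate $f(x) = x_1x_2x_4 \vee x_1x_3x_4$ is the function shown on the decomposability slide, the T/F split (first three rows influenza, last three common cold) is assumed to match the data there, and the helper names `to_bits` and `is_discriminant` are hypothetical.

```python
# Influenza example: columns are Fever, Headache, Cough, Snivel, Stomachache
# (1 = Yes, 0 = No). The T/F split is assumed to match the decomposability slide.
T = ["11011", "10111", "11110"]  # patients having influenza
F = ["10011", "11000", "01011"]  # patients having common cold


def to_bits(s: str) -> tuple[int, ...]:
    """Convert a 0/1 string such as "11011" into (x1, ..., xn)."""
    return tuple(int(c) for c in s)


def f(x: tuple[int, ...]) -> int:
    """Candidate discriminant function f(x) = x1 x2 x4 OR x1 x3 x4."""
    x1, x2, x3, x4, _x5 = x
    return (x1 & x2 & x4) | (x1 & x3 & x4)


def is_discriminant(func, T: list[str], F: list[str]) -> bool:
    """True iff func is 1 on every vector of T and 0 on every vector of F."""
    return all(func(to_bits(a)) == 1 for a in T) and all(
        func(to_bits(b)) == 0 for b in F
    )


print(is_discriminant(f, T, F))  # True
```

By contrast, the single attribute "Fever" ($x_1$) is not a discriminant function here, since the cold patient 10011 also has a fever.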

5 Guidelines for finding a discriminant function
- Simplicity
- Explaining the structure of the phenomenon

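6 Decomposability
Data (columns $x_1, \dots, x_5$, followed by the value of $h(x[S_1])$):
- $T$: 11011 | 1, 10111 | 1, 11110 | 1
- $F$: 10011 | 0, 11000 | 1, 01011 | 1
With $S_0 = \{1, 4, 5\}$, $S_1 = \{2, 3\}$ and $h(x[S_1]) = x_2 \vee x_3$:
$f(x) = x_1x_2x_4 \vee x_1x_3x_4 = x_1x_4\, h(x[S_1])$, so $f$ is decomposable!
- $f$ is decomposable $\iff f(x) = g(x[S_0], h(x[S_1]))$ for some functions $g$ and $h$
- $(T, F)$ is decomposable $\iff$ there exists a decomposable discriminant function $f$ of $(T, F)$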
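A short sketch that checks the decomposition on this slide: it compares $g(x[S_0], h(x[S_1]))$, with $h = x_2 \vee x_3$ and $g = x_1 x_4 h$, against $f$ on all $2^5$ inputs, and recomputes the $h(x[S_1])$ column for the six data vectors. The names `proj`, `g` and `h` are illustrative, not taken from the slides.

```python
from itertools import product

S0 = (1, 4, 5)  # 1-based attribute indices, as on the slide
S1 = (2, 3)


def proj(x: tuple[int, ...], S: tuple[int, ...]) -> tuple[int, ...]:
    """Restriction x[S] of vector x to the attributes in S."""
    return tuple(x[i - 1] for i in S)


def h(y: tuple[int, ...]) -> int:
    """Inner function h(x[S1]) = x2 OR x3."""
    x2, x3 = y
    return x2 | x3


def g(z: tuple[int, ...], hv: int) -> int:
    """Outer function g(x[S0], h) = x1 AND x4 AND h (x5 is unused)."""
    x1, x4, _x5 = z
    return x1 & x4 & hv


def f(x: tuple[int, ...]) -> int:
    """Original form f(x) = x1 x2 x4 OR x1 x3 x4."""
    x1, x2, x3, x4, _x5 = x
    return (x1 & x2 & x4) | (x1 & x3 & x4)


# The two forms agree on all 2^5 inputs.
agree = all(f(x) == g(proj(x, S0), h(proj(x, S1))) for x in product((0, 1), repeat=5))

# Values of h(x[S1]) on the data rows (the last column of the slide's table).
data = {"T": ["11011", "10111", "11110"], "F": ["10011", "11000", "01011"]}
h_column = {k: [h(proj(tuple(map(int, v)), S1)) for v in rows] for k, rows in data.items()}

print(agree)     # True
print(h_column)  # {'T': [1, 1, 1], 'F': [0, 1, 1]}
```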

7 Example: concept of "square"
Attributes (1 = Yes, 0 = No), in this order:
1. the lengths of all edges are equal
2. the number of vertices is 4
3. contains a right angle
4. the area is over 100
Shapes:
- i: 1110
- ii: 1111
- iii: 0110
- iv: 1001
- v: 1101

8 Example: concept of "square"
Flat description:
Square
- the lengths of all edges are equal
- the number of vertices is 4
- contains a right angle
Hierarchical description, via the sub-concept "rhombus":
Square
- rhombus
  - the lengths of all edges are equal
  - the number of vertices is 4
- contains a right angle
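The square example fits the form $g(x[S_0], h(x[S_1]))$ with $S_1 = \{1, 2\}$ (the attributes defining "rhombus") and $S_0 = \{3, 4\}$. The sketch below, in which `a1` to `a4` are hypothetical labels for the four attributes in slide order, checks that the flat and hierarchical descriptions agree on every input and lists the shapes from the table that satisfy them.

```python
from itertools import product

# a1: all edges equal, a2: 4 vertices, a3: right angle, a4: area over 100
shapes = {"i": "1110", "ii": "1111", "iii": "0110", "iv": "1001", "v": "1101"}


def rhombus(a1: int, a2: int) -> int:
    """Sub-concept h(x[S1]) on S1 = {1, 2}."""
    return a1 & a2


def square_flat(a1: int, a2: int, a3: int, a4: int) -> int:
    """Flat description: all three conditions at once (a4 is irrelevant)."""
    return a1 & a2 & a3


def square_hier(a1: int, a2: int, a3: int, a4: int) -> int:
    """Hierarchical description g(x[S0], h(x[S1])) = rhombus AND right angle."""
    return rhombus(a1, a2) & a3


same = all(square_flat(*x) == square_hier(*x) for x in product((0, 1), repeat=4))
squares = [name for name, bits in shapes.items() if square_flat(*map(int, bits))]
print(same, squares)  # True ['i', 'ii']
```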

9 Hierarchical structures and decomposable structures
[Figure: concept and attributes]

10 Hierarchical structures and decomposable structures
[Figure: concept, sub-concept, and attributes]

11 Previous research on decomposability
- Finding basic decomposable functions (e.g., $f(x) = g(x[S_0], h(x[S_1]))$) for given $(T, F)$ and attribute sets $S_0, S_1$: polynomial time [Boros, et al. 1994]
- Finding other classes (positive, Horn, and their mixtures) of decomposable functions for given $(T, F)$ and attribute sets $S_0, S_1$ [Makino, et al. 1995]
- Finding a (positive) decomposable function for given $(T, F)$ ($S_0, S_1$ not given): NP-hard; a heuristic algorithm was proposed [Ono, et al. 1999]

12 The number of data and decomposable structures
Case 1: The size of given data is small.
- Advantage: Less computational time is needed to find a decomposable structure.
- Disadvantage: Decomposable structures easily exist in the data (because there are fewer constraints), so most decomposable structures found are deceptive.

13 The number of data and decomposable structures
Case 2: The size of given data is large.
- Advantage: Deceptive decomposable structures will not be found.
- Disadvantage: More computational time is needed.
How many data vectors should be prepared to extract real decomposable structures? This question leads to an index of decomposability.

14 Overview of our approach
$(T, F)$ is decomposable $\iff$ the conflict graph of $(T, F)$ is bipartite.
Assume that $(T, F)$ is a set of $l$ vectors randomly chosen from $\{0,1\}^n$.
1. Compute the probability of an edge appearing in the conflict graph.
2. Regard the conflict graph as a random graph.
3. Investigate the probability that the conflict graph is non-bipartite.

15 Conflict graph
[Figure: example data (T, F) and its conflict graph on the vertices 00, 01, 11, 10]
$(T, F)$ is decomposable $\iff$ the conflict graph of $(T, F)$ is bipartite.
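A minimal sketch of the conflict-graph test on the influenza data with $S_0 = \{1, 4, 5\}$, $S_1 = \{2, 3\}$. It assumes the standard construction (vertices are $S_1$-projections; a vector $a \in T$ and a vector $b \in F$ form a linked pair when $a[S_0] = b[S_0]$, and each linked pair adds the edge $(a[S_1], b[S_1])$), then tests bipartiteness by breadth-first 2-coloring. Function names are hypothetical.

```python
from collections import deque


def proj(x, S):
    """Restriction x[S] (S holds 1-based attribute indices)."""
    return tuple(x[i - 1] for i in S)


def conflict_graph(T, F, S0, S1):
    """Adjacency sets over S1-projections, one edge per linked pair (assumed definition)."""
    adj = {}
    for a in T:
        for b in F:
            if proj(a, S0) == proj(b, S0):  # linked pair
                u, v = proj(a, S1), proj(b, S1)
                adj.setdefault(u, set()).add(v)
                adj.setdefault(v, set()).add(u)
    return adj


def is_bipartite(adj):
    """Breadth-first 2-coloring; False as soon as an edge joins equal colors."""
    color = {}
    for start in adj:
        if start in color:
            continue
        color[start] = 0
        queue = deque([start])
        while queue:
            u = queue.popleft()
            for v in adj[u]:
                if v not in color:
                    color[v] = 1 - color[u]
                    queue.append(v)
                elif color[v] == color[u]:
                    return False
    return True


def to_vec(s):
    return tuple(int(c) for c in s)


T = [to_vec(s) for s in ("11011", "10111", "11110")]
F = [to_vec(s) for s in ("10011", "11000", "01011")]
G = conflict_graph(T, F, S0=(1, 4, 5), S1=(2, 3))
print(G)                # edges (1, 0)-(0, 0) and (0, 1)-(0, 0) only
print(is_bipartite(G))  # True
```

Any 2-coloring of the resulting graph gives a valid $h$ on the vertices that appear; the choice 00 ↦ 0 and 10, 01 ↦ 1 agrees with $h = x_2 \vee x_3$ from the decomposability slide.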

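16 Probability of an edge appearing in the conflict graph
An edge $(u, v)$ appears in the conflict graph $\iff$ there exists a linked pair.
A pair of vectors $(a, b)$ with $a \in T$ and $b \in F$ is called linked if $a[S_0] = b[S_0]$ and $\{a[S_1], b[S_1]\} = \{u, v\}$.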

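17 Define a random variable as the number of linked pairs for a fixed pair of vertices $(u, v)$. The edge $(u, v)$ appears in the conflict graph if and only if there exists a linked pair, i.e., if this number is at least 1. We want to compute the probability of this event.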

18 Assumptions
Generation of $(T, F)$:
- $|T| + |F| = l$ vectors are randomly sampled from $\{0,1\}^n$ without replacement; $M = 2^n$ is the number of vectors in $\{0,1\}^n$.
- A sampled vector is in $T$ with probability $p$, and in $F$ with probability $q = 1 - p$.

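19 How to compute the edge probability
The expected number of linked pairs is easier to compute. A candidate pair of vectors (agreeing on $S_0$, with $S_1$-parts $u$ and $v$) is linked when
1. both of them are sampled, and
2. they have different values (i.e., 0 and 1), so that one is in $T$ and the other in $F$.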

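20 Upper and lower bounds on the edge probability
- Upper bound: by Markov's inequality and linearity of expectation, the probability that at least one linked pair exists is at most the expected number of linked pairs.
- Lower bound: by the principle of inclusion and exclusion.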
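Under the sampling model of the Assumptions slide, both bounds can be written out explicitly. The expressions below are our own derivation from those assumptions, not formulas copied from the slides: a fixed edge $(u, v)$ has $K = 2^{|S_0|}$ candidate pairs $((w, u), (w, v))$, each linked with probability $\frac{l(l-1)}{M(M-1)} \cdot 2pq$, and two distinct candidates involve four distinct vectors. The sketch compares the bounds with a Monte Carlo estimate; the integer encoding (low bits as $x[S_1]$) is a hypothetical choice of partition.

```python
import random
from math import comb


def edge_prob_bounds(n: int, s0: int, l: int, p: float) -> tuple[float, float]:
    """(lower, upper) bounds on Pr(a fixed edge (u, v) appears in the conflict graph)."""
    M = 2 ** n
    K = 2 ** s0                           # one candidate pair per w in {0,1}^|S0|
    d = 2 * p * (1 - p)                   # the two vectors get different labels
    both2 = l * (l - 1) / (M * (M - 1))   # two given vectors are both sampled
    both4 = l * (l - 1) * (l - 2) * (l - 3) / (M * (M - 1) * (M - 2) * (M - 3))
    expectation = K * both2 * d           # E[#linked pairs]: Markov upper bound
    pairwise = comb(K, 2) * both4 * d ** 2
    return max(0.0, expectation - pairwise), min(1.0, expectation)


def edge_prob_mc(n: int, s0: int, l: int, p: float,
                 trials: int = 20000, seed: int = 0) -> float:
    """Monte Carlo estimate; the low s1 = n - s0 bits of a vector are x[S1]."""
    rng = random.Random(seed)
    s1 = n - s0
    M = 2 ** n
    u, v = 0, 1  # two fixed, distinct S1-patterns (needs s1 >= 1)
    hits = 0
    for _ in range(trials):
        # Sample l distinct vectors; each goes to T (True) with probability p.
        in_T = {x: rng.random() < p for x in rng.sample(range(M), l)}
        for w in range(2 ** s0):
            a, b = (w << s1) | u, (w << s1) | v
            if a in in_T and b in in_T and in_T[a] != in_T[b]:
                hits += 1  # a linked pair exists, so the edge appears
                break
    return hits / trials


lower, upper = edge_prob_bounds(n=10, s0=5, l=64, p=0.5)
estimate = edge_prob_mc(n=10, s0=5, l=64, p=0.5)
print(f"{lower:.4f} <= {estimate:.4f} <= {upper:.4f}")
```

For small $l$ relative to $M$ the pairwise term is tiny, so the two bounds nearly coincide; this is one reason the expectation is a natural approximation of the edge probability.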

21 Approximation of the probability that an edge appears in the conflict graph

22 Random graph
In our analysis, the approximation above is assumed to be the probability $r$ of an edge appearing in the conflict graph.
Random graph $G(N, r)$
- $N$: the number of vertices
- Each edge $e = (u, v)$ appears in $G(N, r)$ independently with probability $r$

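23 Probability that a random graph is non-bipartite
- $Y_{\mathrm{odd}}$: random variable representing the number of odd cycles in $G(N, r)$
- $\Pr(Y_{\mathrm{odd}} \ge 1)$: probability that $G(N, r)$ is not bipartite
By Markov's inequality,
$\Pr(Y_{\mathrm{odd}} \ge 1) \le \mathrm{E}[Y_{\mathrm{odd}}] = \sum_{k\ \mathrm{odd},\ 3 \le k \le N} \frac{N(N-1)\cdots(N-k+1)}{2k}\, r^k$,
where $N(N-1)\cdots(N-k+1)$ is the number of sequences of $k$ distinct vertices.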
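The Markov bound on this slide can be evaluated numerically and compared with simulated random graphs. A minimal sketch with hypothetical function names: `expected_odd_cycles` accumulates $N(N-1)\cdots(N-k+1)\, r^k / (2k)$ over odd $k \ge 3$ (each $k$-cycle corresponds to $2k$ vertex sequences), and `non_bipartite_mc` estimates $\Pr(Y_{\mathrm{odd}} \ge 1)$ by sampling $G(N, r)$.

```python
import random
from collections import deque


def expected_odd_cycles(N: int, r: float) -> float:
    """E[Y_odd] in G(N, r): sum over odd k >= 3 of N(N-1)...(N-k+1) r^k / (2k)."""
    total = 0.0
    falling = 1.0  # after step k: N(N-1)...(N-k+1) * r**k
    for k in range(1, N + 1):
        falling *= (N - k + 1) * r
        if k >= 3 and k % 2 == 1:
            total += falling / (2 * k)
    return total


def random_graph(N: int, r: float, rng: random.Random) -> list[list[int]]:
    """Adjacency lists of G(N, r): each possible edge appears independently w.p. r."""
    adj: list[list[int]] = [[] for _ in range(N)]
    for u in range(N):
        for v in range(u + 1, N):
            if rng.random() < r:
                adj[u].append(v)
                adj[v].append(u)
    return adj


def is_bipartite(adj: list[list[int]]) -> bool:
    """Breadth-first 2-coloring of every component."""
    color = [-1] * len(adj)
    for s in range(len(adj)):
        if color[s] != -1:
            continue
        color[s] = 0
        queue = deque([s])
        while queue:
            u = queue.popleft()
            for v in adj[u]:
                if color[v] == -1:
                    color[v] = 1 - color[u]
                    queue.append(v)
                elif color[v] == color[u]:
                    return False
    return True


def non_bipartite_mc(N: int, r: float, trials: int = 2000, seed: int = 0) -> float:
    """Fraction of sampled G(N, r) that contain an odd cycle."""
    rng = random.Random(seed)
    return sum(not is_bipartite(random_graph(N, r, rng)) for _ in range(trials)) / trials


N, r = 20, 0.02
print(min(1.0, expected_odd_cycles(N, r)), non_bipartite_mc(N, r))
```

As a sanity check, `expected_odd_cycles(4, 1.0)` returns 4.0, the number of triangles in the complete graph on four vertices.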

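24 Taylor series of $\ln(1 - z)$: $\ln(1 - z) = -\sum_{k \ge 1} z^k / k$ for $|z| < 1$
The series is used to derive an upper bound on the probability that $G(N, r)$ is not bipartite.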
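One way the Taylor series can turn the Markov bound into a closed form, assuming $z = Nr < 1$ and using $N(N-1)\cdots(N-k+1) \le N^k$ together with extending the sum to all odd $k \ge 3$ (all terms are positive). This is an illustrative derivation, not necessarily the exact bound on the slide:

```latex
\Pr(Y_{\mathrm{odd}} \ge 1)
  \le \sum_{\substack{k \ge 3 \\ k\ \mathrm{odd}}} \frac{N(N-1)\cdots(N-k+1)}{2k}\, r^k
  \le \frac{1}{2} \sum_{\substack{k \ge 3 \\ k\ \mathrm{odd}}} \frac{z^k}{k}
  = \frac{1}{2} \left( \frac{\ln(1+z) - \ln(1-z)}{2} - z \right)
  = \frac{1}{4} \ln\frac{1+z}{1-z} - \frac{z}{2}
```

The sum over odd $k$ comes from adding $\ln(1+z) = \sum_{k \ge 1} (-1)^{k+1} z^k / k$ and $-\ln(1-z) = \sum_{k \ge 1} z^k / k$, which cancels the even terms and doubles the odd ones. The bound is small for $Nr$ well below 1 and diverges as $Nr \to 1$.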

25 Lower bound when Nr  1: For sufficiently large N, (c  [0, 1) and   (0, 0.5) are constants)

26 Our index
Our index combines:
- the probability of an edge appearing in the conflict graph
- the threshold for a random graph to be bipartite or not
Assumptions
- probabilities $p$ and $q$ are given by $p : q = |T| : |F|$
- the conflict graph is a random graph ($|S_0| + |S_1| = n$)

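27 Our index
- If the data size $|T| + |F|$ is smaller than our index, $(T, F)$ tends to have many deceptive decomposable structures.
- If $|T| + |F|$ is larger than our index, $(T, F)$ tends to have no deceptive decomposable structure.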
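To get a feel for the magnitudes involved, here is a back-of-envelope sketch. It is not the index formula from the slides; it only combines assumptions stated loudly here: the conflict graph has $N = 2^{|S_1|}$ vertices, the edge probability is approximated by the expected number of linked pairs $r \approx 2pq \cdot 2^{|S_0|} \cdot \frac{l(l-1)}{M(M-1)}$, the threshold is taken to be $Nr = 1$, and $|S_0| + |S_1| = n$. Under these assumptions the partition cancels and $Nr = 2pq\, l(l-1)/(M-1)$. The function names are hypothetical.

```python
from math import sqrt


def edge_count_ratio(n: int, l: float, p: float) -> float:
    """Hypothetical N*r = 2pq * l(l-1) / (M - 1), with M = 2**n (see assumptions above)."""
    M = 2 ** n
    return 2 * p * (1 - p) * l * (l - 1) / (M - 1)


def hypothetical_index(n: int, p: float) -> float:
    """Data size l where N*r = 1: positive root of l^2 - l - c = 0, c = (M-1)/(2pq)."""
    c = (2 ** n - 1) / (2 * p * (1 - p))
    return (1 + sqrt(1 + 4 * c)) / 2


for n in (10, 20):
    l_star = hypothetical_index(n, p=0.5)
    print(n, round(l_star, 1), f"sampling ratio {100 * l_star / 2 ** n:.2f}%")
```

Under these assumptions the critical data size grows roughly like $2^{n/2}$, so the corresponding sampling ratio shrinks as $n$ grows.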

28 Numerical experiments
1. Prepare non-decomposable, randomly generated functions and construct 10 data sets $(T, F)$ for each data size $|T| + |F|$.
2. Check their decomposability.
Randomly generated data
- Target functions are not decomposable
- Dimensions of data are $n = 10, 20$
- Two types of data: $p$ and $q$ biased, and not biased
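The experimental procedure can be sketched for a single, fixed partition $(S_0, S_1)$; the slides do not say which partitions were checked, so the fixed split (low `s1` bits as $x[S_1]$), the function names, and the default parameters are all hypothetical. A random truth table is drawn and redrawn until it is not decomposable for that partition on the full $\{0,1\}^n$, then 10 data sets are sampled per data size and tested with the conflict-graph criterion.

```python
import random
from collections import deque


def split(x: int, s1: int) -> tuple[int, int]:
    """(x[S0], x[S1]) for an integer-encoded vector; S1 = low s1 bits (hypothetical)."""
    return x >> s1, x & ((1 << s1) - 1)


def is_decomposable(T: list[int], F: list[int], s1: int) -> bool:
    """Conflict graph over S1-patterns, then a breadth-first 2-coloring test."""
    F_parts = [split(b, s1) for b in F]
    adj: dict[int, set[int]] = {}
    for a in T:
        wa, ua = split(a, s1)
        for wb, ub in F_parts:
            if wa == wb:  # linked pair: same S0-part, one in T and one in F
                adj.setdefault(ua, set()).add(ub)
                adj.setdefault(ub, set()).add(ua)
    color: dict[int, int] = {}
    for start in adj:
        if start in color:
            continue
        color[start] = 0
        queue = deque([start])
        while queue:
            u = queue.popleft()
            for v in adj[u]:
                if v not in color:
                    color[v] = 1 - color[u]
                    queue.append(v)
                elif color[v] == color[u]:
                    return False
    return True


def decomposable_ratio(n: int, s1: int, ratio: float, p: float = 0.5,
                       trials: int = 10, seed: int = 0) -> float:
    """Fraction of `trials` sampled (T, F)s that are decomposable."""
    rng = random.Random(seed)
    M = 2 ** n
    l = max(2, round(ratio * M))
    # Random target function, redrawn until it is not decomposable
    # w.r.t. the fixed partition when all M vectors are known.
    while True:
        f = [rng.random() < p for _ in range(M)]
        if not is_decomposable([x for x in range(M) if f[x]],
                               [x for x in range(M) if not f[x]], s1):
            break
    count = 0
    for _ in range(trials):
        sample = rng.sample(range(M), l)
        T = [x for x in sample if f[x]]
        F = [x for x in sample if not f[x]]
        count += is_decomposable(T, F, s1)
    return count / trials


for ratio in (0.01, 0.05, 0.10, 0.50):
    print(f"{ratio:.0%}", decomposable_ratio(n=10, s1=5, ratio=ratio))
```

Small samples are almost always decomposable even though the target function is not, which is exactly the deceptive behavior the index is meant to detect.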

29 Randomly generated data
[Figure: ratio of decomposable (T, F)s (%) versus sampling ratio (%), with our index marked]

30 Randomly generated data
[Figure: ratio of decomposable (T, F)s (%) versus sampling ratio (%), with our index marked]

31 Real-world data: Breast Cancer in Wisconsin (a.k.a. BCW)
- Already binarized
- The dimension is $n = 11$
- Compared with randomly generated data having the same $n$, $p$ and $q$

32 BCW and randomly generated data
[Figure: panels "BCW" and "Randomly generated data"; ratio of decomposable (T, F)s (%) versus sampling ratio (%), with our index marked]

33 Discussion and conclusion An index to extract reliable decomposable structures Computational experiments on random & real-world data - proposed index is a good estimate - |S 0 |  1 or |S 1 |  2  threshold behavior is not clear

34 Future work
- Analyze the sharpness of the threshold behavior, to determine a sufficient $|T| + |F|$ for extracting reliable decomposable structures
- Apply a similar approach to other classes of Boolean functions
[Figure: number of decomposable structures versus $|T| + |F|$, marking the proposed index and the value we want to estimate]

