
Tadeusz Łuba, Institute of Telecommunications. Methods of Logic Synthesis and Their Application in Data Mining. Presentation delivered at KNU Daegu (Korea).


1 Methods of Logic Synthesis and Their Application in Data Mining
Tadeusz Łuba, Institute of Telecommunications, Faculty of Electronics and Information Technology, Warsaw University of Technology
Presentation delivered at KNU Daegu (Korea)

2 Logic synthesis vs. Data Mining
Applicability of logic synthesis algorithms in Data Mining.
Data mining extends the application of logic synthesis (LS) to:
– medicine
– pharmacology
– banking
– linguistics
– telecommunication
– environmental engineering
Data Mining is the process of automatic discovery of significant and previously unknown information from large databases.

3 Data mining is also called knowledge discovery in databases
It is able to:
– diagnose a patient
– decide whether to grant a loan to a bank customer
– classify data
– make a survey

4 Gaining knowledge from databases
At the abstract level of data mining algorithms, this means using the procedures of:
– reduction of attributes,
– generalization of decision rules,
– hierarchical decision making.
These algorithms are similar to those used in logic synthesis!

5 Data mining vs. logic synthesis
Data mining ↔ Logic synthesis:
– generalization of decision rules (rule induction) ↔ minimization of the Boolean function (logic minimization)
– reduction of attributes ↔ reduction of arguments
– hierarchical decision making ↔ functional decomposition

6 Data mining systems
ROSETTA: Rough Set Toolkit for Analysis of Data. Biomedical Centre (BMC), Uppsala, Sweden.

7 Diagnosis of breast cancer
Breast Cancer Database:
– Number of instances: 699 training cases
– Number of attributes: 10
– Classification (2 classes)
Attributes:
1. Clump Thickness
2. Uniformity of Cell Size
3. Uniformity of Cell Shape
…
9. Mitoses
Source: Dr. William H. Wolberg (physician), University of Wisconsin Hospital, Madison, Wisconsin, USA

8 Breast Cancer database
[Table with columns ID, x1, x2, x3, x4, x5, x6, x7, x8, x9, x10; data rows not recoverable]

9 RULE_SET breast_cancer
RULES 35
(x9=1)&(x8=1)&(x2=1)&(x6=1)=>(x10=2)
(x9=1)&(x2=1)&(x3=1)&(x6=1)=>(x10=2)
(x9=1)&(x8=1)&(x4=1)&(x3=1)=>(x10=2)
(x9=1)&(x4=1)&(x6=1)&(x5=2)=>(x10=2)
…
(x9=1)&(x6=10)&(x1=10)=>(x10=4)
(x9=1)&(x6=10)&(x5=4)=>(x10=4)
(x9=1)&(x6=10)&(x1=8)=>(x10=4)
REDUCTS (27)
{ x1, x2, x3, x4, x6 }
{ x1, x2, x3, x5, x6 }
{ x2, x3, x4, x6, x7 }
{ x1, x3, x4, x6, x7 }
{ x1, x2, x4, x6, x7 }
…
{ x3, x4, x5, x6, x7, x8 }
{ x3, x4, x6, x7, x8, x9 }
{ x4, x5, x6, x7, x8, x9 }

10 Increasing requirements
We are overwhelmed with data!
References: [1]

11 UC Irvine Machine Learning Repository
– Breast Cancer Database: 10 attributes
– Audiology Database: 71 attributes
– Dermatology Database: 34 attributes
Are existing methods and algorithms for data mining sufficiently efficient? If not, why? How can these algorithms be improved?

12 Classic method
– Discernibility matrix (DM)
– Discernibility function (DF): a conjunction of clauses, where each clause is a disjunction of attributes
The key issue is to transform the DF from CNF to DNF, which is NP-hard. Every monomial of the DNF corresponds to a reduct.
References: [9]
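The construction of the discernibility function can be sketched in Python. This is a minimal illustration, not the presenter's code: it assumes the decision-relative form, in which one clause is produced for every pair of objects with different decisions, and the function name `discernibility_clauses` and the toy table are hypothetical.

```python
from itertools import combinations

def discernibility_clauses(rows):
    """Return the clauses of the discernibility function.

    Each row is a tuple of condition attribute values followed by the decision.
    A clause is the set of (1-based) attribute numbers on which two objects
    with different decisions differ.
    """
    clauses = set()
    for u, v in combinations(rows, 2):
        if u[-1] == v[-1]:
            continue  # same decision: nothing to discern
        differing = frozenset(
            j + 1 for j in range(len(u) - 1) if u[j] != v[j]
        )
        if differing:  # identical objects with different decisions are skipped
            clauses.add(differing)
    return clauses

# Hypothetical toy decision table: (x1, x2, x3, decision)
table = [
    (0, 0, 1, 0),
    (1, 0, 1, 1),
    (0, 1, 0, 1),
]
clauses = discernibility_clauses(table)
```

For this toy table the clauses are {x1} and {x2, x3}, so the discernibility function is x1 (x2 + x3) and its DNF x1 x2 + x1 x3 yields the reducts {x1, x2} and {x1, x3}.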

13 The method can be significantly improved...
...by using the typical logic synthesis procedure for Boolean function complementation.
Instead of transforming CNF to DNF:
– we represent the CNF as a binary matrix M,
– M is treated as a Boolean function F,
– we compute the complement of function F.
F is always a unate function!

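14 Using Complement Theorem…
Discernibility function:
$F = (x_1 + x_2 + x_4)(x_3 + x_4)(x_1 + x_2)(x_1 + x_4)$
Binary matrix M (columns $x_1\ x_2\ x_3\ x_4$, one row per clause; Espresso input with .i 4 and .o 1):
1101
0011
1100
1001
Standard method:
$(x_1 + x_2)(x_1 + x_4)(x_3 + x_4) = (x_1 + x_2 x_4)(x_3 + x_4) = x_1 x_3 + x_2 x_4 + x_1 x_4$
Complement method:
$f_M = x_1 x_2 x_4 + x_3 x_4 + x_1 x_2 + x_1 x_4 = x_3 x_4 + x_1 x_4 + x_1 x_2$
$\overline{f_M} = \bar{x}_1 \bar{x}_3 + \bar{x}_2 \bar{x}_4 + \bar{x}_1 \bar{x}_4$
The same result!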
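Why complementing $f_M$ reproduces the monomials of the standard method can be spelled out with De Morgan's laws. The derivation below is a worked sketch, assuming $f_M$ is the sum of products read row by row from M:

```latex
\begin{align*}
f_M &= x_1 x_2 + x_1 x_4 + x_3 x_4
  && \text{($x_1 x_2 x_4$ is absorbed by $x_1 x_2$)} \\
\overline{f_M} &= (\bar{x}_1 + \bar{x}_2)(\bar{x}_1 + \bar{x}_4)(\bar{x}_3 + \bar{x}_4)
  && \text{(De Morgan)} \\
&= (\bar{x}_1 + \bar{x}_2 \bar{x}_4)(\bar{x}_3 + \bar{x}_4) \\
&= \bar{x}_1 \bar{x}_3 + \bar{x}_1 \bar{x}_4 + \bar{x}_2 \bar{x}_4
  && \text{($\bar{x}_2 \bar{x}_3 \bar{x}_4$ is absorbed by $\bar{x}_2 \bar{x}_4$)}
\end{align*}
```

Reading the variables of each product as a set gives {x1, x3}, {x1, x4} and {x2, x4}, the same supports as the monomials obtained by multiplying out the CNF.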

15 The key element: Fast Complementation Algorithm
Recursive Complementation Theorem:
$\overline{F} = x_j \cdot \overline{F_{x_j}} + \bar{x}_j \cdot \overline{F_{\bar{x}_j}}$
where $F_{x_j}$ and $F_{\bar{x}_j}$ are the cofactors of F with respect to variable $x_j$.
The problem of complementing function F is transformed into the problem of finding the complements of two simpler cofactors.

16 Unate Complementation
The entire process reduces to three simple calculations:
– the choice of the splitting variable,
– calculation of cofactors,
– testing rules for termination.
[Diagram: matrix M is split recursively into Cofactor 0 and Cofactor 1; their complements are merged into the complement of F.]
Merging: $\overline{F} = \bar{x}_j \cdot \overline{F_0} + \overline{F_1}$
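The three steps of unate complementation can be sketched in Python. This is a minimal sketch under stated assumptions, not the implementation used in the presentation: F is assumed to be a positive unate cover stored as a list of variable sets (the rows of M), the most frequent variable is used as the splitting variable (a heuristic chosen here), and the names `unate_complement` and `drop_non_minimal` are hypothetical. Each returned set lists the variables that appear complemented in one product of the complement.

```python
def drop_non_minimal(sets):
    """Keep only the sets that contain no other set of the collection."""
    unique = set(sets)
    return [s for s in unique if not any(t < s for t in unique)]

def unate_complement(cubes):
    """Complement a positive unate cover given as an iterable of variable sets."""
    cubes = [frozenset(c) for c in cubes]
    # Termination rules
    if not cubes:                      # F = 0, so the complement is 1
        return [frozenset()]
    if any(len(c) == 0 for c in cubes):
        return []                      # F = 1, so the complement is 0
    # Choice of the splitting variable: the most frequent one (assumed heuristic)
    counts = {}
    for c in cubes:
        for v in c:
            counts[v] = counts.get(v, 0) + 1
    xj = max(sorted(counts), key=lambda v: counts[v])
    # Cofactors: F1 sets xj = 1, F0 sets xj = 0
    f1 = [c - {xj} for c in cubes]
    f0 = [c for c in cubes if xj not in c]
    # Merging: complement(F) = not(xj) * complement(F0) + complement(F1)
    merged = unate_complement(f1) + [c | {xj} for c in unate_complement(f0)]
    return drop_non_minimal(merged)

# Rows of the 4-variable matrix M from the earlier example
rows = [{1, 2, 4}, {3, 4}, {1, 2}, {1, 4}]
result = set(unate_complement(rows))
```

For the four-variable example the sketch returns the sets {1, 3}, {2, 4} and {1, 4}, matching the monomials x1 x3 + x2 x4 + x1 x4 of the standard method.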

17 An Example
$F = x_4 x_6 (x_1 + x_2)(x_3 + x_5 + x_7)(x_2 + x_3)(x_2 + x_7)$
[Espresso input: .i 7 .o 1 .p … .end]

18 An Example…
Groups obtained from the complement: $\{x_2, x_3\}$, $\{x_2, x_5\}$, $\{x_2, x_7\}$, $\{x_1, x_3, x_7\}$
Reducts:
$\{x_1, x_3, x_4, x_6, x_7\}$
$\{x_2, x_3, x_4, x_6\}$
$\{x_2, x_4, x_5, x_6\}$
$\{x_2, x_4, x_6, x_7\}$

19 Verification
Calculating reducts using the standard method:
$(x_1 + x_2)(x_3 + x_5 + x_7)(x_2 + x_3)(x_2 + x_7)$
$= (x_2 + x_1)(x_2 + x_3)(x_2 + x_7)(x_3 + x_5 + x_7)$
$= (x_2 + x_1 x_3 x_7)(x_3 + x_5 + x_7)$
$= x_2 x_3 + x_2 x_5 + x_2 x_7 + x_1 x_3 x_7$
Each product is extended with $\{x_4, x_6\}$:
$\{x_2, x_3, x_4, x_6\}$
$\{x_2, x_4, x_5, x_6\}$
$\{x_2, x_4, x_6, x_7\}$
$\{x_1, x_3, x_4, x_6, x_7\}$
The same set of reducts!
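The standard method used for this verification, multiplying out the clauses and applying absorption after each step, can also be sketched in Python. This is an illustrative sketch, not a tool from the presentation: the function name `cnf_to_dnf` is hypothetical, and clauses are assumed to be sets of attribute numbers.

```python
def cnf_to_dnf(clauses):
    """Multiply out a monotone CNF clause by clause, absorbing redundant products."""
    products = {frozenset()}  # the empty product represents the constant 1
    for clause in clauses:
        # Distribute the clause over every product collected so far
        expanded = {p | {v} for p in products for v in clause}
        # Absorption: drop any product that strictly contains another one
        products = {p for p in expanded if not any(q < p for q in expanded)}
    return products

# Clauses of F = x4 x6 (x1 + x2)(x3 + x5 + x7)(x2 + x3)(x2 + x7)
example = [{4}, {6}, {1, 2}, {3, 5, 7}, {2, 3}, {2, 7}]
reducts = cnf_to_dnf(example)
```

For the seven-variable example this yields {x2, x3, x4, x6}, {x2, x4, x5, x6}, {x2, x4, x6, x7} and {x1, x3, x4, x6, x7}. The intermediate set of products can grow exponentially with the number of clauses, which is the cost the complementation approach is meant to avoid.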

20 Boolean function KAZ
[Espresso input: .type fr .i 21 .o 1 .p … .end]
All solutions: 5574 with a minimal set of arguments, of which 35 have the smallest number of arguments.
Computation time: RSES = 70 min; proposed method = 234 ms.
… times faster!
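The speed-up implied by the two quoted times can be worked out directly. The ratio below is computed here from the 70 min and 234 ms figures and is not a number taken from the slide:

```latex
\[
\frac{70\ \text{min}}{234\ \text{ms}}
  = \frac{70 \cdot 60 \cdot 1000\ \text{ms}}{234\ \text{ms}}
  = \frac{4\,200\,000}{234}
  \approx 17\,949
\]
```

That is a factor of roughly $1.8 \times 10^4$, or about four orders of magnitude.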

21 Conclusion
The new method reduces the computation time by several orders of magnitude.
How will this acceleration affect the speed of calculations for typical databases?

22 Experimental results
database | attr. | inst. | RSES/ROSETTA | compl. method | reducts | compl. method (least) | reducts (least)
house | 17 | 232 | 1s | 187ms | 4 | 171ms | 1 (8 attr)
breast-cancer-wisconsin | … | … | …s | 823ms | 27 | 826ms | 24 (5 attr)
KAZ | 22 | 31 | 70min | 234ms | … | …ms | 35 (5 attr)
trains | 33 | 10 | out of memory (5h 38min) | 6ms | 689 | 1ms | 1 (1 attr)
agaricus-lepiota-mushroom | … | … | …min | 4m 47s | 507 | 4m 51s | 3 (4 attr)
urology | 36 | 500 | out of memory (12h) | 42s 741ms | … | …s 499ms | 1 (2 attr)
audiology | 71 | 200 | out of memory (1h 17min) | 14s 508ms | … | …ms | 1 (1 attr)
dermatology | 35 | 366 | out of memory (3h 27min) | 3m 32s | … | …s 474ms | 27 (6 attr)
lung-cancer | 57 | 32 | out of memory (5h 20min) | 111h 57m | … | …ms | 613 (4 attr)
The absolute triumph of the complementation method!

23 Further possibilities…
…of application of logic synthesis methods in issues of Data Mining:
– generalization of decision rules ↔ minimization of the Boolean function
– hierarchical decision making ↔ functional decomposition

24 RSES vs Espresso
ESPRESSO input:
.type fr
.i 7
.o 1
.p …
.e
RSES input:
TABLE extlbis
ATTRIBUTES 8
x1 numeric 0
x2 numeric 0
x3 numeric 0
x4 numeric 0
x5 numeric 0
x6 numeric 0
x7 numeric 0
x8 numeric 0
OBJECTS …
Rules:
(x1=1)&(x5=1)&(x6=1)&(x2=1)=>(x8=0)
(x1=1)&(x2=0)&(x5=1)&(x3=0)&(x4=0)&(x6=0)=>(x8=0)
(x4=0)&(x1=1)&(x2=0)&(x7=0)=>(x8=1)
(x2=1)&(x4=0)&(x5=1)&(x6=0)=>(x8=1)
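How a binary decision table maps onto the Espresso header keywords shown here (.type fr, .i, .o, .p, .e) can be sketched in Python. This is an illustration only: the helper name `to_espresso_pla` and the sample rows are hypothetical, and a single binary decision column is assumed.

```python
def to_espresso_pla(rows):
    """Format binary rows (inputs followed by one decision bit) as PLA text."""
    n_inputs = len(rows[0]) - 1
    lines = [
        ".type fr",            # the file lists both the ON-set and the OFF-set
        f".i {n_inputs}",      # number of input variables
        ".o 1",                # a single output: the decision attribute
        f".p {len(rows)}",     # number of product terms (table rows)
    ]
    for row in rows:
        inputs = "".join(str(bit) for bit in row[:-1])
        lines.append(f"{inputs} {row[-1]}")
    lines.append(".e")
    return "\n".join(lines)

# Hypothetical two-row table with two condition attributes
pla_text = to_espresso_pla([(1, 0, 1), (0, 1, 0)])
```

With `.type fr`, rows whose output is 1 form the ON-set and rows whose output is 0 form the OFF-set, so input combinations missing from the table are treated as don't-cares, which mirrors how a decision table leaves unseen objects undefined.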

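25 Hierarchical decision making
[Diagram: attributes B feed decision table DT(G), which produces an intermediate decision; attributes A together with the intermediate decision feed decision table DT(H), which produces the final decision.]
Decomposition: $F = H(A, G(B)) \iff$ there exists a partition $\Pi_G \geq P(B)$ such that $P(A) \cdot \Pi_G \leq P_D$
Is it possible to use the decomposition to solve difficult tasks of data mining?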
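Whether a candidate intermediate decision G supports the relation F = H(A, G(B)) can be checked row by row. The sketch below is an illustration under stated assumptions, not the decomposition algorithm itself: rows are tuples with the decision last, `g` is a hypothetical function of the B values (so its partition is automatically no finer than P(B)), and the test is that objects agreeing on A and on G(B) never disagree on the decision.

```python
from itertools import product

def respects_decomposition(rows, a_idx, b_idx, g):
    """Return True if H(A, G(B)) can reproduce the decision of every row."""
    decision_of = {}
    for row in rows:
        a_part = tuple(row[i] for i in a_idx)
        g_value = g(tuple(row[i] for i in b_idx))
        decision = row[-1]
        # The first row seen for a block of P(A) * P_G fixes its decision
        if decision_of.setdefault((a_part, g_value), decision) != decision:
            return False  # one block would need two different decisions
    return True

# Hypothetical table for f(a, b1, b2) = a AND (b1 XOR b2)
rows = [(a, b1, b2, a & (b1 ^ b2)) for a, b1, b2 in product((0, 1), repeat=3)]

def xor_g(b):
    return b[0] ^ b[1]

def and_g(b):
    return b[0] & b[1]

ok_with_xor = respects_decomposition(rows, [0], [1, 2], xor_g)
ok_with_and = respects_decomposition(rows, [0], [1, 2], and_g)
```

With `xor_g` the check succeeds, because the intermediate decision carries everything H needs to know about B. With `and_g` it fails, because the rows (1, 0, 0) and (1, 0, 1) fall into the same block but have different decisions.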

26 Data compression
The data set HOUSE (1984 United States Congressional Voting Records Database):
democrat n y y n y y n n n n n n y y y y
republican n y n y y y n n n n n y y y n y
democrat y y y n n n y y y n y n n n y y
democrat y y y n n n y y y n n n n n y y
democrat y n y n n n y y y y n n n n y y
democrat y n y n n n y y y n y n n n y y
democrat y y y n n n y y y n y n n n y y
republican y n n y y n y y y n n y y y n y
…
democrat y y y n n n y y y n y n n n y y
republican y y n y y y n n n n n n y y n y
republican n y n y y y n n n y n y y y n n
democrat y n y n n n y y y y y n y n y y
democrat y n y n n n y y y n n n n n n y
democrat y n y n n n y y y n n n n n y y
Decomposition into G and H: 68% space reduction

27 Summary
– Typical logic synthesis algorithms and methods are effectively applicable to seemingly different modern problems of data mining.
– It is also important to study the theoretical foundations of new concepts in data mining, e.g. functional decomposition.
– Solving these challenges requires the cooperation of specialists from different fields of knowledge.

28 References
1. Abdullah, S., Golafshan, L., Mohd Zakree Ahmad Nazri: Re-heat simulated annealing algorithm for rough set attribute reduction. International Journal of the Physical Sciences 6(8), 2083–2089 (2011)
2. Agrawal, R., Mannila, H., Srikant, R., Toivonen, H., Verkamo, A.I.: Fast Discovery of Association Rules. In: Advances in KDD, pp. 307–328. AAAI, Menlo Park (1996)
3. An, A., Shan, N., Chan, C., Cercone, N. and Ziarko, W. Discovering rules for water demand prediction: an enhanced rough-set approach, Engineering Application and Artificial Intelligence, 9
4. Bazan, J., Nguyen, H.S., Nguyen, S.H., Synak, P., Wróblewski, J.: Rough set algorithms in classification problem. In: Rough Set Methods and Applications: New Developments in Knowledge Discovery in Information Systems, vol. 56, pp. 49–88. Physica-Verlag, Heidelberg (2000)
5. Bazan, J., Skowron, A., Synak, P.: Dynamic Reducts as a Tool for Extracting Laws from Decision Tables. In: Raś, Z.W., Zemankova, M. (eds.) ISMIS LNCS (LNAI), vol. 869, pp. 346–355. Springer, Heidelberg (1994)
6. Bazan, J.G., Szczuka, M.S.: RSES and RSESlib - A Collection of Tools for Rough Set Computations. In: Rough Sets and Current Trends in Computing, pp. 106–113 (2000)
7. Bazan, J.G., Nguyen, H.S., Nguyen, S.H., Synak, P. and Wroblewski, J. Rough set algorithms in classification problem, in: Polkowski, L., Tsumoto, S. and Lin, T.Y. (Eds.), Rough Set Methods and Applications, 49-88, Information Sciences, 178(17), Elsevier B.V.
8. Beynon, M. Reducts within the variable precision rough sets model: a further investigation, European Journal of Operational Research, 134
9. Borowik, G., Łuba, T., Zydek, D.: Features Reduction using logic minimization techniques. In: Intl. Journal of Electronics and Telecommunications, vol. 58, No. 1 (2012)
10. Brayton, R.K., Hachtel, G.D., McMullen, C.T., Sangiovanni-Vincentelli, A.: Logic Minimization Algorithms for VLSI Synthesis. Kluwer Academic Publishers (1984)
11. Brzozowski, J.A., Łuba, T.: Decomposition of boolean functions specified by cubes. Journal of Multi-Valued Logic & Soft Computing 9, 377–417 (2003)
12. Dash, R., Dash, R., Mishra, D.: A hybridized rough-PCA approach of attribute reduction for high dimensional data set. European Journal of Scientific Research 44(1), 29–38 (2010)
13. Feixiang, Z., Yingjun, Z., Li, Z.: An efficient attribute reduction in decision information systems. In: International Conference on Computer Science and Software Engineering, pp. 466–469. Wuhan, Hubei (2008), DOI: /CSSE
14. Grzenda, M.: Prediction-Oriented Dimensionality Reduction of Industrial Data Sets. In: Modern Approaches in Applied Intelligence, Mehrotra, K.G.; Mohan, C.K.; Oh, J.C.; Varshney, P.K.; Ali, M. (Eds.), LNAI 6703 (2011)
15. Hedar, A.R., Wang, J., Fukushima, M.: Tabu search for attribute reduction in rough set theory. Journal of Soft Computing – A Fusion of Foundations, Methodologies and Applications 12(9), 909–918 (Apr 2008), DOI: /s
16. Herbert, J.P. and Yao, J.T. Rough set model selection for practical decision making, Proceedings of the 4th International Conference on Fuzzy Systems and Knowledge Discovery
17. Huhtala, Y., Karkkainen, J., Porkka, P., Toivonen, H.: TANE: An Efficient Algorithm for Discovering Functional and Approximate Dependencies. The Computer Journal 42(2), 100–111 (1999)
18. Inuiguchi, M. Several approaches to attribute reduction in variable precision rough set model, Modeling Decisions for Artificial Intelligence
19. Jelonek, J., Krawiec, K., Stefanowski, J.: Comparative study of feature subset selection techniques for machine learning tasks. In: Proceedings of IIS, Malbork, Poland, pp. 68–77 (1998)
20. Jensen R., Shen Q. Semantics-preserving dimensionality reduction: Rough and fuzzy rough-based approaches. IEEE Transactions on Knowledge and Data Engineering, vol. 16, pp. 1457–1471 (2004)
21. Jing, S., She, K.: Heterogeneous attribute reduction in noisy system based on a generalized neighborhood rough sets model. World Academy of Science, Engineering and Technology 75, 1067–1072 (2011)
22. Kalyani, P., Karnan, M.: A new implementation of attribute reduction using quick relative reduct algorithm. International Journal of Internet Computing 1(1), 99–102 (2011)
23. Katzberg, J.D. and Ziarko, W. Variable precision rough sets with asymmetric bounds, in: W. Ziarko (Ed.) Rough Sets, Fuzzy Sets and Knowledge Discovery, Springer, London
24. Kryszkiewicz, M., Cichoń, K.: Towards scalable algorithms for discovering rough set reducts. In: Peters, J., Skowron, A., Grzymała-Busse, J., Kostek, B., Świniarski, R., Szczuka, M. (eds.) Transactions on Rough Sets I, Lecture Notes in Computer Science, vol. 3100, pp. 120–143. Springer Berlin / Heidelberg, Berlin (2004), DOI: /
…
40. Pawlak, Z. and Skowron, A. Rudiments of rough sets, Information Sciences, 177, 3-27
41. Pei, X., Wang, Y.: An approximate approach to attribute reduction. International Journal of Information Technology 12(4), 128–135 (2006)
42. Rawski, M., Borowik, G., Łuba, T., Tomaszewicz, P., Falkowski, B.J.: Logic synthesis strategy for FPGAs with embedded memory blocks. Electrical Review 86(11a), 94–101 (2010)
43. Shan, N., Ziarko, W., Hamilton, H.J., Cercone, N.: Discovering Classification Knowledge in Databases Using Rough Sets. In: Proceedings of KDD, pp. 271–274 (1996)
44. Skowron, A.: Boolean Reasoning for Decision Rules Generation. In: Komorowski, J., Raś, Z.W. (eds.) ISMIS LNCS, vol. 689, pp. 295–305. Springer, Heidelberg (1993)
45. Skowron, A., Rauszer, C.: The discernibility matrices and functions in information systems. In: Słowiński, R. (ed.) Intelligent Decision Support – Handbook of Application and Advances of the Rough Sets Theory. Kluwer Academic Publishers (1992)
46. Slezak, D.: Approximate Reducts in Decision Tables. In: Proceedings of IPMU, Granada, Spain, vol. 3, pp. 1159–1164 (1996)
47. Slezak, D.: Searching for Frequential Reducts in Decision Tables with Uncertain Objects. In: Polkowski, L., Skowron, A. (eds.) RSCTC LNCS, vol. 1424, pp. 52–59. Springer, Heidelberg (1998)
48. Slezak, D.: Association Reducts: Complexity and Heuristics. In: Greco, S., Hata, Y., Hirano, S., Inuiguchi, M., Miyamoto, S., Nguyen, H.S., Słowiński, R. (eds.) RSCTC LNCS, vol. 4259, pp. 157–164. Springer, Heidelberg (2006)
49. Slezak, D. and Ziarko, W. Attribute reduction in the Bayesian version of variable precision rough set model, Electronic Notes in Theoretical Computer Science, 82
50. Słowiński, R. (ed.): Intelligent Decision Support, Handbook of Applications and Advances of the Rough Sets Theory, vol. 11. Kluwer Academic Publishers, Dordrecht (1992)
51. Słowiński, K., Sharif, E.: Rough Sets Analysis of Experience in Surgical Practice. International Workshop: Rough Sets: State of The Art and Perspectives, Poznan-Kiekrz (1992)
52. Stepaniuk, J.: Approximation Spaces, Reducts and Representatives. In: Rough Sets in Data Mining and Knowledge Discovery. Springer, Berlin (1998)
53. Swiniarski, R.W. Rough sets methods in feature reduction and classification, International Journal of Applied Mathematics and Computer Science, 11
54. Swiniarski, R.W. and Skowron, A. Rough set methods in feature selection and recognition, Pattern Recognition Letters, 24
55. Wang, C., Ou, F.: An attribute reduction algorithm based on conditional entropy and frequency of attributes. In: Proceedings of the 2008 International Conference on Intelligent Computation Technology and Automation. ICICTA ’08, vol. 1, pp. 752–756. IEEE Computer Society, Washington, DC, USA (2008), DOI: /ICICTA
56. Wang, G., Yu, H. and Yang, D. Decision table reduction based on conditional information entropy, Chinese Journal of Computers, 25
57. Wang, G.Y., Zhao, J. and Wu, J. A comparative study of algebra viewpoint and information viewpoint in attribute reduction, Fundamenta Informaticae, 68, 1-13
58. Wróblewski, J.: Finding Minimal Reducts Using Genetic Algorithms. In: Proceedings of JCIS, Wrightsville Beach, NC, September/October 1995, pp. 186–189 (1995)
59. Wu, W.Z., Zhang, M., Li, H.Z. and Mi, J.S. Knowledge reduction in random information systems via Dempster-Shafer theory of evidence, Information Sciences, 174
60. Yao, Y., Zhao, Y.: Attribute reduction in decision-theoretic rough set models. Information Sciences 178(17), 3356–3373 (2008), DOI: /j.ins
61. Zhang, W.X., Mi, J.S. and Wu, W.Z. Knowledge reduction in inconsistent information systems, Chinese Journal of Computers, 1, 12-18
62. Zhao, Y., Luo, F., Wong, S.K.M. and Yao, Y.Y. A general definition of an attribute reduction, Proceedings of the Second Rough Sets and Knowledge Technology
63. ROSE2 – Rough Sets Data Explorer, http://idss.cs.put.poznan.pl/site/rose.html
64. ROSETTA – A Rough Set Toolkit for Analysis of Data, http://www.lcb.uu.se/tools/rosetta/
65. RSES – Rough Set Exploration System

1. ROSETTA – A Rough Set Toolkit for Analysis of Data, http://www.lcb.uu.se/tools/rosetta/
2. RSES – Rough Set Exploration System
3. Borowik, G., Łuba, T., Zydek, D.: Features Reduction using logic minimization techniques. In: Intl. Journal of Electronics and Telecommunications, vol. 58, No. 1 (2012)
4. Skowron, A., Rauszer, C.: The discernibility matrices and functions in information systems. In: Słowiński, R. (ed.) Intelligent Decision Support – Handbook of Application and Advances of the Rough Sets Theory. Kluwer Academic Publishers (1992)
5. Tadeusiewicz R.: Rola technologii cyfrowych w komunikacji społecznej oraz w kulturze i edukacji [The role of digital technologies in social communication, culture and education]. PPT presentation.

