PROBABILISTIC AND LOGIC APPROACHES TO MACHINE LEARNING AND DATA MINING


1 PROBABILISTIC AND LOGIC APPROACHES TO MACHINE LEARNING AND DATA MINING
Marek Perkowski Portland State University

2 Essence of logic synthesis approach to learning

3 Example of Logical Synthesis
[Photographs of eight men: John, Dave, Mark, Jim, Alan, Nick, Mate, Robert.]

4 Good guys and bad guys
[Figure: Dave, Jim, John, and Mark are the good guys; Alan, Nick, Mate, and Robert are the bad guys.] Attributes: A - size of hair, B - size of nose, C - size of beard, D - color of eyes.

5 Good guys
[Karnaugh map over AB (rows) and CD (columns) with the good guys marked: Mark, John, Dave, and Jim at the cells A'B'CD, A'B'CD, A'BCD, and A'BCD' as listed on the slide.] Attributes: A - size of hair, B - size of nose, C - size of beard, D - color of eyes.

6 Bad guys
[Karnaugh map over AB/CD with the bad guys marked: Alan = A'B'C'D, Nick = ABCD, Mate = A'BC'D', Robert = AB'C'D; the loop A'C is drawn on the map.] Attributes: A - size of hair, B - size of nose, C - size of beard, D - color of eyes.

7 Generalizations
Generalization 1: bald guys with beards are good (the product A'C). Generalization 2: all other guys are no good. [Karnaugh map over AB/CD with the loop A'C marked.] Attributes: A - size of hair, B - size of nose, C - size of beard, D - color of eyes.

8 SOP (DNF) approach to learning

9 Sum of Products
A Sum of Products (SOP, also called DNF) circuit consists of AND gates followed by an OR gate that produces the output, with inverters used as needed. There are many algorithms to minimize SOP expressions; they were created either in the ML community or in the logic synthesis community. We will illustrate three different algorithms.
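As a minimal illustration (the function f = AB' + CD and all names below are ours, not from the slides), this two-level AND/OR structure can be written directly in Python:

```python
# A minimal sketch: the SOP expression f = A*B' + C*D evaluated as a
# two-level circuit. Each product term is an AND of possibly inverted
# inputs; the final OR gate combines the terms.
def f(a: int, b: int, c: int, d: int) -> int:
    term1 = a & (1 - b)   # AND gate with an inverter on B
    term2 = c & d         # plain AND gate
    return term1 | term2  # OR gate producing the output

assert f(1, 0, 0, 0) == 1   # first product term fires
assert f(0, 0, 1, 1) == 1   # second product term fires
assert f(0, 0, 0, 0) == 0   # no term fires
```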

10 Method 1: SOP minimization based on graph coloring

11 Reduction of SOP (DNF) Machine Learning to graph coloring
In the previous example there were 4 binary variables; here there are two variables, each with 4 values. We encode every group or minterm using the encoding shown on the right. For every two groups we check whether they can be combined: the combined group must not cover any zeros. If the combined group covers zeros, the groups cannot be combined. Let us try to combine a1 and b1: we do a bitwise OR, and the combined group does not cover zeros.

12 Reduction of SOP (DNF) Machine Learning to graph coloring
Let us try to combine a2 and b2: we do a bitwise OR, and the combined group covers zeros, so groups a2 and b2 are not compatible. For every pair of incompatible groups there is an edge in the graph.

13 Reduction of SOP (DNF) Machine Learning to graph coloring
Based on the incompatibility of groups we create the INCOMPATIBILITY GRAPH: every two incompatible groups are joined by an edge. We color the graph with the minimum number of colors; this minimum is called the chromatic number. Finally we combine the nodes that received the same color.
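A small sketch of Method 1, under stated assumptions: groups are modeled as sets of covered K-map cells, the sample data is ours, and the coloring is a greedy heuristic rather than an exact chromatic-number algorithm:

```python
# Sketch of Method 1 in the slides' convention: a group is the set of
# K-map cells it covers (a frozenset here), and `zeros` is the set of
# cells where the function is 0. All data structures are illustrative.
def incompatible(g1, g2, zeros):
    # Combining two groups is the union of their cells (the "bitwise OR"
    # of the slides); the merge is illegal if it would cover a zero.
    return bool((g1 | g2) & zeros)

def greedy_coloring(groups, zeros):
    # Heuristic coloring of the incompatibility graph. Finding the true
    # chromatic number is NP-hard, so real tools also rely on heuristics.
    colors = {}
    for g in groups:
        used = {colors[h] for h in colors if incompatible(g, h, zeros)}
        colors[g] = next(c for c in range(len(groups)) if c not in used)
    return colors

zeros = frozenset({5, 7})
groups = [frozenset({0, 1}), frozenset({1, 3}), frozenset({4, 5})]
print(greedy_coloring(groups, zeros))  # groups sharing a color can merge
```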

14 The minimum coloring corresponds to the minimum number of combined groups in the final solution.
These groups are usually products, but they may also be of the form PRODUCT1 * (PRODUCT2)'.

15 Method 2: SOP minimization based on set covering with primes

16 SOP through Set Covering
Find all prime implicants of the function. Create a table whose columns are the true minterms and whose rows are the prime implicants. This is called the covering problem: you want to find the smallest subset of rows that covers all columns. There are many algorithms for this problem; some use BDDs, some SAT, some matrices. The same method can be used for Boolean minimization, for test generation (covering all faults with the minimum number of tests), and for selecting the best positions for robots guarding a building against terrorists.

17 SOP through Set Covering
[Covering table: rows correspond to the prime implicants T0, T1, T2, T3; columns correspond to the minterms with value 1.] T0 and T2 alone are not a solution because column b0 is not covered. T0, T2, and T3 together are a solution.
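A sketch of a greedy heuristic for this covering table (the table contents below are hypothetical stand-ins for the slide's figure; an exact minimum cover would need e.g. branch-and-bound or SAT):

```python
# Greedy heuristic for prime-implicant covering: rows are primes, each
# mapped to the set of true minterms (columns) it covers; repeatedly
# pick the prime covering the most still-uncovered minterms.
def greedy_cover(primes: dict[str, set[str]]) -> list[str]:
    uncovered = set().union(*primes.values())
    chosen = []
    while uncovered:
        best = max(primes, key=lambda p: len(primes[p] & uncovered))
        chosen.append(best)
        uncovered -= primes[best]
    return chosen

# Hypothetical table in the spirit of slide 17:
table = {"T0": {"a0", "a1"}, "T1": {"a1", "b0"},
         "T2": {"b1", "c0"}, "T3": {"b0", "c1"}}
print(greedy_cover(table))   # -> ['T0', 'T2', 'T3'], covering all columns
```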

18 Method 3: SOP minimization based on sequential finding of secondary essential primes

19 Machine Learning SOP through sequential finding of essential and secondary essential primes
1. Find essential primes. [Karnaugh map over AB/CD with two essential primes marked.]

20 Machine Learning SOP through sequential finding of essential and secondary essential primes
2. Remove the essential primes. [Karnaugh map over AB/CD: a secondary essential prime is marked; the orange and yellow primes are redundant.]

21 3. Iterate
[Karnaugh map over AB/CD with an essential prime and a secondary essential prime marked.] The solution consists of the essential primes and the secondary essential primes of all levels. If the algorithm does not terminate, make a random choice and iterate, or use another algorithm.
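A sketch of Method 3 in our own formulation (the covering-table representation is an assumption, not given in the slides):

```python
# Sketch of Method 3: a prime is essential at the current stage if it is
# the only prime covering some still-uncovered minterm. Select such
# primes, remove the minterms they cover, and iterate; primes found in
# later rounds are the secondary essential primes. If no prime is
# essential, make a heuristic choice, as the slide suggests.
def essential_prime_cover(primes):
    remaining = set().union(*primes.values())
    solution = []
    while remaining:
        picked = []
        for m in remaining:
            covering = [p for p in primes if m in primes[p]]
            if len(covering) == 1 and covering[0] not in picked:
                picked.append(covering[0])        # essential for minterm m
        if not picked:                            # no essential prime left
            picked = [max(primes, key=lambda p: len(primes[p] & remaining))]
        for p in picked:
            solution.append(p)
            remaining -= primes[p]
    return solution
```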

22 Multivalued relations approach to learning

23 Short Introduction: multiple-valued logic
Signals can have values from some set, for instance {0,1,2} or {0,1,2,3}: {0,1} is binary logic (a special case), {0,1,2} is a ternary logic, {0,1,2,3} is a quaternary logic, etc. [Tables of the MIN and MAX operators: MIN returns the minimal value of its arguments, MAX the maximal value.]
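For concreteness, a minimal sketch of the MIN/MAX operators, the standard generalization of AND/OR to multiple-valued logic:

```python
# Multiple-valued logic sketch: MIN and MAX generalize Boolean AND and
# OR over a value set such as {0, 1, 2} for ternary logic.
def mv_and(a: int, b: int) -> int:
    return min(a, b)      # MIN plays the role of AND

def mv_or(a: int, b: int) -> int:
    return max(a, b)      # MAX plays the role of OR

assert mv_and(1, 2) == 1 and mv_or(1, 2) == 2
assert mv_and(0, 1) == 0 and mv_or(0, 1) == 1   # binary case is recovered
```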

24 Functional Decomposition
Evaluates the data function and attempts to decompose it into simpler functions: F(X) = H(G(B), A), where X = A ∪ B. A is the free set and B is the bound set. If A ∩ B = ∅, it is a disjoint decomposition; if A ∩ B ≠ ∅, it is a non-disjoint decomposition.
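A toy sketch of such a decomposition, assuming illustrative functions G and H with free set A = {a} and bound set B = {b, c} (disjoint, since A and B do not overlap):

```python
# Toy sketch of F(X) = H(G(B), A); all functions are illustrative.
def G(b, c):           # subfunction on the bound set B
    return b ^ c

def H(g, a):           # composition function on G's output and free set A
    return g & a

def F(a, b, c):        # the decomposed function
    return H(G(b, c), a)

assert F(1, 1, 0) == 1 and F(0, 1, 0) == 0
```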

25 Pros and cons
In generating the final combinational network, BDD decomposition (based on multiplexers) and SOP decomposition trade flexibility in circuit topology for time efficiency. Generalized functional decomposition sacrifices speed for a higher likelihood of minimizing the complexity of the final network.

26 A Standard Map of function ‘z’
[Karnaugh map of z with rows indexed by the free set {a, b} and columns by the bound set {c}. Columns 0 and 1, and columns 0 and 2, are compatible; column compatibility = 2.]

27 Principle of finding patterns
We have a tabular representation of data and we want to find patterns; in this case we are looking for patterns in columns. Two columns have the same pattern if the symbols in each row can be combined; we say that such columns are COMPATIBLE. If in a row we have 0 and 0, 1 and 1, 0 and -, 1 and -, or - and -, then the cells agree. If we have a 0 and a relation {0,1}, the columns are still compatible, since one can select 0 from the choice {0,1}.
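A sketch of this compatibility test (the cell encoding below is our assumption: '-' for a don't care, a Python set for a relation cell):

```python
# Column-compatibility test: a cell is 0, 1, '-' (don't care), or a set
# of allowed values for a relation, e.g. {0, 1}. Two cells agree if some
# common value can be selected for both.
def as_set(v):
    if v == '-':
        return {0, 1}
    return v if isinstance(v, set) else {v}

def cells_agree(x, y):
    return bool(as_set(x) & as_set(y))

def columns_compatible(col1, col2):
    return all(cells_agree(x, y) for x, y in zip(col1, col2))

assert columns_compatible([0, 1, '-'], [0, '-', 1])
assert not columns_compatible([0, 1], [1, 1])        # 0 vs 1 in row 1
assert columns_compatible([{0, 1}, 1], [0, 1])       # select 0 from {0,1}
```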

28 Decomposition of Multi-Valued Relations
F(X) = H(G(B), A), where X = A ∪ B, and F, G, and H are relations. If A ∩ B = ∅, it is a disjoint decomposition; if A ∩ B ≠ ∅, it is a non-disjoint decomposition.

29 Forming a CCG from a K-Map
[The same map of z, rows indexed by the free set {a, b}, columns by the bound set {c}: columns 0 and 1, and columns 0 and 2, are compatible, column compatibility index = 2. The Column Compatibility Graph on nodes C0, C1, C2 has edges between compatible columns.]

30 Forming a CIG from a K-Map
[From the same map of z: columns 1 and 2 are incompatible. The Column Incompatibility Graph on nodes C0, C1, C2 has an edge between incompatible columns; its chromatic number = 2.]

31 CCG and CIG are complementary
The Column Compatibility Graph and the Column Incompatibility Graph are complementary: graph coloring of the CIG corresponds to clique partitioning of the CCG, and graph multi-coloring of the CIG corresponds to maximal clique covering of the CCG. [Both graphs are drawn on the nodes C0, C1, C2.]
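A minimal sketch of this complementarity, using the node names C0, C1, C2 and the edges from the CCG example above:

```python
# The CIG is the complement of the CCG on the same nodes, so either
# graph can be derived from the other.
nodes = ["C0", "C1", "C2"]
ccg_edges = {("C0", "C1"), ("C0", "C2")}            # compatible pairs
all_pairs = {(a, b) for i, a in enumerate(nodes)
             for b in nodes[i + 1:]}
cig_edges = all_pairs - ccg_edges                   # incompatible pairs
assert cig_edges == {("C1", "C2")}                  # chromatic number 2
```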

32 Clique partitioning example. [Figure.]

33 Maximal clique covering example. [Figure.]

34 Map of relation G
[Maps of G versus c, after induction and from the CIG.] g = a high-pass filter whose acceptance threshold begins at c > 1.

35 The Meaning of Attributes

36 Attributes
Static: facial features, gestures, objects to grasp, objects to avoid, symptoms of illness, views of body cells, crystallization parameters of liquids. Dynamic: changes in the stock market, changes in facial features (facial gestures), change of an object's view as a robot approaches it, dynamical change of a body part in motion, changes of moles on the skin, changing symptoms of an illness.

37 Static versus dynamic attributes
[Static: one vector of attributes p1..p8. Dynamic: the vectors p1..p8 at times t0, t1, t2, represented as one long vector for Machine Learning.]
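A minimal sketch of this concatenation (the attribute values below are illustrative):

```python
# Dynamic attributes are handled by concatenating the frames at t0, t1,
# t2 into one long static feature vector for the learner.
frames = [
    [0, 1, 0, 1, 1, 0, 0, 1],   # attributes p1..p8 at t0
    [0, 1, 1, 1, 0, 0, 0, 1],   # attributes p1..p8 at t1
    [1, 1, 1, 0, 0, 0, 1, 1],   # attributes p1..p8 at t2
]
long_vector = [v for frame in frames for v in frame]
assert len(long_vector) == 3 * 8   # one vector the learner can consume
```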

38 Representation Models for Logic Based Machine Learning

39 Types of Logical Synthesis
Sum of Products; Decision Trees; Decision Diagrams; Functional Decomposition (the method we are using).

40 Binary Decision Diagrams
There are many types of decision trees and many generalizations of them, used both in logic and in ML.

41 Decision Diagrams
A decision diagram breaks down a Karnaugh map into a set of decision trees. A decision diagram ends when all of its branches have a yes, no, or don't-care solution. [Example Karnaugh map.] The diagram can become quite complex if the data is spread out, as in the following example.

42 Decision Tree for Example Karnaugh Map

43 BDD Representation of function
[Karnaugh map over AB/CD of an incompletely specified function.]

44 BDD Representation of function
[Karnaugh map over AB/CD of a completely specified function.] The problem is how to find the minimum tree or decision diagram for your given data.

45 Absolutely Minimum Background on Binary Decision Diagrams (BDD)
BDDs are based on the recursive Shannon expansion F = x·Fx + x'·Fx'. They are a compact data structure for Boolean logic and can represent sets of objects (states) encoded as Boolean functions. They are a canonical representation: reduced ordered BDDs (ROBDDs) are canonical, which is essential for simulation, analysis, synthesis, and verification.

46 Other expansions, other trees, other diagrams
The standard decision tree is based on the Shannon expansion F = x·Fx + x'·Fx'. [Figure: a Shannon node for variable x splits all examples into the cofactors Fx and Fx'; an ML node for WIND splits all examples into those for which WIND was WEAK and those for which WIND was STRONG.] Data separation in ML is the same concept as Shannon expansion in logic.
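A sketch of this correspondence, with an illustrative example set (the encoding is ours: attribute tuples mapped to class labels):

```python
# Shannon expansion as a data split: splitting the example set on one
# variable yields exactly the two cofactors F_x and F_x'.
def split_on(examples, var_index):
    f_pos = {x: y for x, y in examples.items() if x[var_index] == 1}
    f_neg = {x: y for x, y in examples.items() if x[var_index] == 0}
    return f_pos, f_neg          # cofactors F_x and F_x'

examples = {(0, 0, 1): 1, (0, 1, 1): 1, (1, 0, 0): 0, (1, 1, 0): 1}
f_x, f_not_x = split_on(examples, 0)   # like splitting on WIND in ML
```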

47 Absolutely Minimum Background on Binary Decision Diagrams (BDD) and Kronecker Functional Decision Diagrams
BDDs are based on the recursive Shannon expansion F = x·Fx + x'·Fx', where Fx is the positive cofactor of F with respect to variable x and Fx' is the negative cofactor. They are a compact data structure for Boolean logic and can represent sets of objects (states) encoded as Boolean functions. Reduced ordered BDDs (ROBDDs) are canonical, which is essential for simulation, analysis, synthesis, and verification.

48 BDD Construction
Typically done using the APPLY operator. Reduction rules: remove duplicate terminals; merge duplicate nodes (isomorphic subgraphs); remove redundant nodes (nodes with identical children). [Figure: a small BDD over a, b, c before and after reduction.]
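A toy sketch of how the merging and redundancy rules are enforced during construction via a unique table (a minimal scheme of our own, not the API of any real BDD package):

```python
# Sketch of ROBDD node creation enforcing the reduction rules.
unique = {}                  # (var, low, high) -> node id
nodes = {0: None, 1: None}   # ids 0 and 1 are the terminal nodes

def mk(var, low, high):
    if low == high:          # redundant node: both children identical
        return low
    key = (var, low, high)
    if key not in unique:    # merge duplicate (isomorphic) nodes
        node_id = len(nodes)
        nodes[node_id] = key
        unique[key] = node_id
    return unique[key]

n = mk('a', 0, 1)            # node testing variable a
assert mk('a', 0, 1) == n    # duplicate request returns the same node
assert mk('b', n, n) == n    # redundant test on b is never created
```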

49 BDD Construction – your first BDD
Construction of a Reduced Ordered BDD for f = ac + bc. [Figure: the truth table of f over a, b, c and the corresponding decision tree, with 0-edges and 1-edges marked.]

50 BDD Construction – cont’d
[Figure: the decision tree for f = (a+b)c reduced step by step.] 1. Remove duplicate terminals. 2. Merge duplicate nodes. 3. Remove redundant nodes.

51 ROBDD as a multiplexer circuit
[Figure: an ROBDD over a, b, c, d and the equivalent MUX circuit with E/T inputs.] What is a decision diagram in theory and in ML is a logic circuit built from multiplexers in logic design.

52 Kronecker Decision Diagrams
There are many types of decision trees and many generalizations of them, used in logic and in ML. Kronecker Decision Diagrams are a generalization of BDDs.

53 Decomposition types
Decomposition types are associated with the variables in Xn with the help of a decomposition type list (DTL) d := (d1, …, dn), where di ∈ {S, pD, nD} (Shannon, positive Davio, negative Davio).

54 KFDD
Definition: in KF trees and KFDDs we can have any variable and any of the three expansions at any level.

55 Representation with reversible gates
The nodes of a KFDD or KF tree can be interpreted as logic gates, as shown below. [Figure: for F = a'b ⊕ ac, with the Shannon expansion F = x·Fx + x'·Fx' (Fx the positive cofactor with respect to x, Fx' the negative cofactor), the function realized with a Shannon cell and a Dipal cell.]
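A sketch verifying the three expansion types on single-bit cofactors (the formulation is ours; the mis-encoded "Å" in the slide is the XOR operator ⊕):

```python
# The three expansion types used in KFDD nodes, checked exhaustively on
# single-bit cofactors f_pos = F_x and f_neg = F_x'.
def shannon(x, f_pos, f_neg):          # F = x*F_x  XOR  x'*F_x'
    return (x & f_pos) ^ ((1 - x) & f_neg)

def positive_davio(x, f_pos, f_neg):   # F = F_x' XOR x*(F_x XOR F_x')
    return f_neg ^ (x & (f_pos ^ f_neg))

def negative_davio(x, f_pos, f_neg):   # F = F_x  XOR x'*(F_x XOR F_x')
    return f_pos ^ ((1 - x) & (f_pos ^ f_neg))

for x in (0, 1):
    for fp in (0, 1):
        for fn in (0, 1):
            want = fp if x else fn     # the function value F
            assert shannon(x, fp, fn) == want
            assert positive_davio(x, fp, fn) == want
            assert negative_davio(x, fp, fn) == want
```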

56 Three different reduction types
Type I: each node in a DD is a candidate for the application. [Figure.] This applies to BDDs and KDDs.

57 Three different reduction types (cont’d)
[Figure: the second reduction type.] This applies to BDDs and KDDs.

58 Three different reduction types (cont’d)
Type D. [Figure.] This is used in functional diagrams such as KFDDs.

59 Example for OKFDD
[Figure: an OKFDD with Shannon nodes over the variables x1…x4 and the resulting expression for F.]

60 Example for OKFDD (cont’d)
The diagram below explains the expansions from the previous slide: x1 is an S-node (Shannon), x2 is a pD-node (positive Davio), and x3 is an nD-node (negative Davio).

61 How to combine trees with any other Classifiers

62 It can be a Shannon tree, a KFDD, or any other tree
You have a tree on top; it can be a Shannon tree, a KFDD, or any other tree. [Figure: any cofactor of the variables on top of the tree feeds ANY REMAINDER LOGIC, i.e., an arbitrary classifier below the tree.]

63 Overview of data mining

64 What is Data Mining?
Databases with millions of records and thousands of fields are now common in business, medicine, engineering, and the sciences. Extracting useful information from such data sets is an important practical problem. Data Mining is the study of methods to find useful information in a database and to use the data to make predictions about the people or events the data was derived from. Its goals are classification with small error and understanding the data (patterns, rules, statistics).

65 Some Examples of Data Mining
1) Stock Market Predictions 2) Large companies tracking sales 3) Military and intelligence applications

66 Data Mining in Epidemiology
Epidemiologists track the spread of infectious disease and try to determine the disease's original source. Often epidemiologists have only an initial suspicion about what is causing an illness; they interview people to find out what those who got sick have in common. Currently they have to sort through this data by hand to try to determine the initial source of the disease. A data mining application would speed up this process and allow them to quickly track the source of an infectious disease.

67 Types of Data Mining
Data Mining applications use, among others, three methods to process data: 1) Neural Nets; 2) Statistical (Probabilistic) Analysis; 3) Logical Methods. The last two are the methods we will discuss here.

68 Data Mining versus Machine Learning
[Figure: DATA MINING is tied to understanding of the underlying area, like medicine or finance; MACHINE LEARNING is tied to formal mathematics and programming.]

69 Conclusions

70 Logic Synthesis and ML classifier design are similar
Logic synthesis and ML classifier design are similar: in ML we have continuous or multi-valued data and many don't cares, and we design for accuracy and interpretability; Occam's Razor plays a similar role in both. SOP is a very good method, not sufficiently recognized by the ML community; we showed three algorithms for SOP, and there are many more. The compatibility of columns is important for finding patterns in data, and we will use this method; it can be reduced to clique partitioning, clique covering, and graph coloring. Decision trees, as shown in the previous lectures, are only particular examples of hierarchically decomposed structures; other such structures are decision diagrams and functional decompositions. Kronecker diagrams and trees generalize decision diagrams and trees.

71 Questions and Problems
What is the principle of logic-based representation of data for Machine Learning? Give examples of learning representations based on logic. Static versus dynamic attributes. What is the difference between decision trees and decision diagrams? How can Boolean concepts be generalized to multi-valued concepts in Machine Learning? Binary versus ternary Ashenhurst decomposition. What is Data Mining and how can it be used? Give your own example of Data Mining using Machine Learning methods known to you. What are bound and free sets in decomposition? What is disjoint versus non-disjoint decomposition? Can decomposition be combined with other learning methods? How?

72 Questions and Problems
Explain the graph coloring approach to SOP. Explain the essential prime approach to SOP. Explain the set covering (clique covering) approach to SOP. Compare the covering and coloring ideas. What is a maximum clique? What is a maximum independent set? Apply the Shannon expansion to a function from any K-map in these slides; select any variable you want. Apply the positive Davio expansion to any function above. Apply the negative Davio expansion to any function above. What is the difference between clique partitioning and clique covering? What is the relation between clique covering and SOP? What is the difference between a decision tree and a decision diagram? How can one create a KFDD for a function with don't cares? (DIFFICULT)

