Slide 1: Pattern Evaluation and Process Control
Wei-Min Shen, Information Sciences Institute, University of Southern California
UCLA Data Mining Short Course, 2/5/98

Slide 2: Outline
- Intuition of interestingness
- Principles for measuring interestingness
- Existing measurement systems
- The minimal description length principle
- Methods for process control

Slide 3: Why Is a Pattern "Interesting"?
- I did not know X before
- It contradicts my thinking (surprise)
- It is supported by the majority of the data
- It is an exception to the usual cases
- Occam's Razor: simpler is better
- More?

Slide 4: The Types of Classification Rule
Let h be a hypothesis and e the evidence. Then, with respect to any given tuple, we have:
- Characteristic rule: h → e
- Discriminate rule: e → h
e and h can also be interpreted as the sets of tuples satisfying e and h, respectively.

Slide 5: A Few Definitions
Given a discriminate rule R: e → h:
- |e| is the cover of the rule
- |h ∧ e| / |e| is the confidence (also called reliability or certainty factor) of the rule
- R is "X% complete" if |h ∧ e| / |h| = X% (e covers X% of the tuples in h)
- R is "Y% discriminate" if |¬h ∧ e| / |¬h| = (100 − Y)% (e covers (100 − Y)% of the tuples in ¬h)
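
These definitions can be sketched directly on tuple-ID sets; the relation size and the two sets below are made-up illustrative numbers, not data from the course:

```python
# Hypothetical tuple-ID sets in a 100-tuple relation; numbers are illustrative.
N = 100
e = set(range(0, 40))   # tuples satisfying the evidence e, |e| = 40
h = set(range(20, 70))  # tuples satisfying the hypothesis h, |h| = 50
not_h = set(range(N)) - h

cover = len(e)                                  # |e|
confidence = len(h & e) / len(e)                # |h ∧ e| / |e|
completeness = len(h & e) / len(h)              # |h ∧ e| / |h|  ("X% complete")
discriminate = 1 - len(not_h & e) / len(not_h)  # Y%, since |¬h ∧ e|/|¬h| = (100-Y)%

print(cover, confidence, completeness, discriminate)
```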

Slide 6: Principles for Measuring "I"
1. I = 0 if h and e are statistically independent
   (e and h have no relation at all)
2. I increases monotonically with |h ∧ e| when |h|, |¬h|, and |e| remain the same
   (I relates to reliability)

Slide 7: Principles for Measuring "I" (continued)
3. I decreases monotonically with |h| (or |e|) when |h ∧ e|, |e| (or |h|), and |¬h| remain the same
   (I relates to completeness)
4. I increases monotonically with |e| when the reliability |h ∧ e| / |e|, |h|, and |¬h| remain the same
   (I relates to cover when reliability is the same)

Slide 8: Treat Discriminate and Characteristic Rules Differently
Principles 1, 2, 3, and 4 apply to both discriminate and characteristic rules.
5. Treat discriminate and characteristic rules differently:

   Rule   E        H     Discrim.  Complete
   A      Fever    Flu   80%       30%
   B      Sneeze   Flu   30%       80%

As a discriminate rule, I(A) > I(B); as a characteristic rule, I(B) > I(A).

Slide 9: Existing Measurement Systems
- RI (Piatetsky-Shapiro, 1991)
- J (Smyth and Goodman, 1992)
- CE (Hong and Mao, 1991)
- IC++ (Kamber and Shinghal, 1996)
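
For instance, Piatetsky-Shapiro's RI measure is |h ∧ e| − |e|·|h| / N. A small sketch (the counts are illustrative) showing that it satisfies principles 1 and 2 from the previous slides:

```python
def rule_interest(n_he, n_e, n_h, N):
    """Piatetsky-Shapiro's RI measure: |h ∧ e| - |e|·|h| / N."""
    return n_he - n_e * n_h / N

# Principle 1: RI = 0 when h and e are statistically independent.
print(rule_interest(n_he=20, n_e=40, n_h=50, N=100))  # 0.0
# Principle 2: RI grows with |h ∧ e| when |e|, |h|, and N stay fixed.
print(rule_interest(30, 40, 50, 100) > rule_interest(20, 40, 50, 100))  # True
```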

Slide 10: The IC++ Measurement for Characteristic Rules
Given h and e, let rule d be e → h and rule c be h → e:
- Nec(d) = P(¬e|h) / P(¬e|¬h)
- Suf(d) = P(e|h) / P(e|¬h)
- For h → e:   C++ = (1 − Nec(d)) · P(h)   if 0 ≤ Nec(d) < 1, else 0
- For h → ¬e:  C+- = (1 − Suf(d)) · P(h)   if 0 ≤ Suf(d) < 1, else 0
- For ¬h → e:  C-+ = (1 − 1/Nec(d)) · P(¬h)  if 0 < Nec(d) < ∞, else 0
- For ¬h → ¬e: C-- = (1 − 1/Suf(d)) · P(¬h)  if 0 < Suf(d) < ∞, else 0
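
The C++ case can be sketched from joint counts. The helper below and its counts are illustrative assumptions, not Kamber and Shinghal's actual implementation:

```python
def c_plus_plus(n_he, n_h, n_e, N):
    """Sketch of C++ for the rule h -> e from illustrative joint counts:
    C++ = (1 - Nec(d)) * P(h) when 0 <= Nec(d) < 1, else 0."""
    n_nh = N - n_h
    p_ne_h = (n_h - n_he) / n_h              # P(¬e | h)
    p_ne_nh = (n_nh - (n_e - n_he)) / n_nh   # P(¬e | ¬h)
    nec_d = p_ne_h / p_ne_nh                 # Nec(d) for d: e -> h
    return (1 - nec_d) * (n_h / N) if 0 <= nec_d < 1 else 0.0

# A rule whose evidence almost always accompanies the hypothesis scores high.
print(c_plus_plus(n_he=45, n_h=50, n_e=50, N=100))
```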

Slide 11: The Minimal Description Length Principle
The goodness of a theory or hypothesis H relative to a set of data D is measured by the sum of:
- the length of H, and
- the length of the explanation of D using H,
assuming both use an optimal coding scheme.

Slide 12: The Derivation of MDL
By probability theory, the best hypothesis H with respect to D is:
- the one maximizing P(H)P(D|H)
- equivalently, maximizing log P(H) + log P(D|H)
- equivalently, minimizing −log P(H) − log P(D|H)
Since the length of an optimal encoding of an element is related to its probability (−log P), we obtain MDL:
- minimize |coding1(H)| + |coding2(D|H)|
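
A toy sketch of this trade-off: comparing two hypothetical coin models for 70 heads in 100 flips, charging each hypothesis an assumed code length in bits (the hypothesis costs here are made up for illustration):

```python
import math

def description_length(p_heads, heads, tails, hypothesis_bits):
    """Total bits: |coding1(H)| + |coding2(D|H)|, with -log2 code lengths."""
    data_bits = -(heads * math.log2(p_heads) + tails * math.log2(1 - p_heads))
    return hypothesis_bits + data_bits

# Hypothetical costs: the fair coin is the shorter hypothesis, but the biased
# coin explains 70 heads / 30 tails in fewer total bits.
fair = description_length(0.5, 70, 30, hypothesis_bits=1)
biased = description_length(0.7, 70, 30, hypothesis_bits=8)
print(fair > biased)  # True: MDL prefers the biased-coin hypothesis here
```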

Slide 13: An Illustration of MDL
- One-line theory: explanation length = 294.9
- Two-line theory: explanation length = 298.7

Slide 14: Fit Points with Lines
- Theory = lines (number, angle, length, center)
- Explanation, for each point:
  - the line it belongs to
  - the position on the line
  - the distance to the line
- Note that the current coding is (x, y); the result differs if we choose the coding (r, theta) instead.

Slide 15: Process Control
- The goal: to predict the future from the past
- The given: the past data sequence
- The methods:
  - adaptive control theory
  - chaos theory
  - state machines

Slide 16: Chaos Theory
- The data sequence may appear chaotic
- The underlying model may be very simple
- Extremely sensitive to initial conditions
- Long-term prediction is difficult
- Short-term prediction is possible

Slide 17: An Example Chaotic Sequence
[Plot: s(k) on [0.0, 1.0] versus time step k = 20, 40, 60, 80, 100]
The simple logistic-map model: s_{k+1} = a · s_k · (1 − s_k), where a = 4.
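
The logistic map above is easy to simulate. The sketch below (initial values are arbitrary) shows two nearby trajectories diverging, which is why long-term prediction fails while short-term prediction remains possible:

```python
# The logistic map from the slide: s_{k+1} = a * s_k * (1 - s_k), with a = 4.
def logistic(s0, steps, a=4.0):
    s = s0
    seq = [s]
    for _ in range(steps):
        s = a * s * (1 - s)
        seq.append(s)
    return seq

# Two trajectories from nearly identical initial conditions.
a_seq = logistic(0.400000, 50)
b_seq = logistic(0.400001, 50)
print(abs(a_seq[5] - b_seq[5]))    # still tiny after a few steps
print(abs(a_seq[50] - b_seq[50]))  # typically order-one after fifty steps
```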

Slide 18: Steps of Using Chaos Theory
- Reconstruct the state space:
  x_k = [x_k, x_{k−τ}, …, x_{k−(m−1)τ}]^T
  where τ is a time delay and m is the embedding dimension
- Takens' theorem: one can always find an embedding dimension m ≥ 2⌊d⌋ + 1, where ⌊d⌋ is the integer part of the attractor's dimension, that preserves the invariant measures
- The central task: choose m and τ
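
The reconstruction step can be sketched as follows; the sequence values are illustrative logistic-map iterates:

```python
# Delay-coordinate reconstruction: build vectors
# x_k = [s_k, s_{k-tau}, ..., s_{k-(m-1)tau}] from a scalar sequence.
def embed(seq, m, tau):
    start = (m - 1) * tau  # first index with a full history behind it
    return [[seq[k - j * tau] for j in range(m)] for k in range(start, len(seq))]

seq = [0.1, 0.36, 0.9216, 0.2890, 0.8219, 0.5854, 0.9708]  # logistic-map values
vectors = embed(seq, m=2, tau=1)
print(vectors[0])  # the first reconstructed state vector
```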

Slide 19: The State Machine Approach
- Identify the number of states by clustering all points in the sequence
- Construct a transition function by learning from the sequence
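
A minimal sketch of these two steps, using fixed-width binning as a stand-in for a real clustering algorithm (the sequence and bin count are illustrative):

```python
from collections import defaultdict

def to_states(seq, n_bins=4):
    """Crude 'clustering' of values in [0, 1] into n_bins discrete states."""
    return [min(int(s * n_bins), n_bins - 1) for s in seq]

def transition_counts(states):
    """Learn a transition table by counting consecutive state pairs."""
    counts = defaultdict(int)
    for a, b in zip(states, states[1:]):
        counts[(a, b)] += 1
    return counts

states = to_states([0.1, 0.36, 0.92, 0.29, 0.82, 0.59, 0.97])
print(states)
print(dict(transition_counts(states)))
```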

Slide 20: Construction and Synchronization
- Environment = (A, P, Q, δ, r), where |P| < |Q|
- Model = (A, P, S, δ, t), which may be:
  - visibly equivalent
  - perfect
  - synchronized
- The construction problem: when and how to construct new model states
- The synchronization problem: how to determine which model state is current

Slide 21: Learning with a Reset Button
- Two environmental states p and q (which may appear the same to the learner) are different if and only if there exists a sequence e of actions that leads from p and from q to states that are visibly different
- The interaction with the environment:
  - membership queries
  - equivalence queries: answered "yes" or with a counterexample

Slide 22: The Observation Table
- Model states: {row(s) : s ∈ S}
- Initial state: row(λ)
- Final states: {row(s) : s ∈ S and T(s) = 1}
- Transitions: δ(row(s), a) = row(sa)
- Closed table: ∀s, a ∃s' such that row(sa) = row(s')
- Consistent table: row(s) = row(s') implies row(sa) = row(s'a) for every action a
The table T records observations: its rows are indexed by S (action sequences from the initial state, i.e., states) and S×A (transitions), and its columns by E (experiments).
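
The closedness and consistency predicates can be sketched as below; the table encoding (strings mapped to 0/1 observations) and the tiny example are illustrative assumptions:

```python
# T maps each string in (S ∪ S·A), extended by each experiment in E, to 0/1;
# row(s) is the tuple of observations of s over all experiments.
def row(T, s, E):
    return tuple(T[s + e] for e in E)

def is_closed(T, S, A, E):
    rows_S = {row(T, s, E) for s in S}
    return all(row(T, s + a, E) in rows_S for s in S for a in A)

def is_consistent(T, S, A, E):
    return all(row(T, s1 + a, E) == row(T, s2 + a, E)
               for s1 in S for s2 in S
               if row(T, s1, E) == row(T, s2, E)
               for a in A)

S, A, E = [''], ['a'], ['']  # '' is the empty string λ
print(is_closed({'': 0, 'a': 0}, S, A, E))  # True: row(a) equals row(λ)
print(is_closed({'': 0, 'a': 1}, S, A, E))  # False: row(a) is new, so extend S
```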

Slide 23: The L* Algorithm
Initialize T for λ and each action in A
Loop:
  Use membership queries to make T complete, closed, and consistent
  If EQ(T) = w  /* a counterexample */
  then add w and all its prefixes to S
Until EQ(T) = yes

Slide 24: The Little Prince Example
- A counterexample ftf for M3 (Fig. 5.3): the model ends at rose, but the real observation is volcano
- An inconsistency in T4 (Tab. 5.5), where row(f) = row(ft) but row(ff) ≠ row(ftf)

Slide 25: Homing Sequences
- L* is limited by its need for a reset button
- Homing sequence h: if the observation sequences produced by executing h from two states are the same, then h leads those two states to the same state
- Let q⟨h⟩ be the observation sequence and qh the ending state; then h is a homing sequence iff for all p, q: [p⟨h⟩ = q⟨h⟩] ⇒ [ph = qh]
- E.g., {fwd} is a homing sequence for the Little Prince
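
The homing-sequence condition can be checked directly on a toy machine; the transition and observation maps below are made up for illustration:

```python
# delta: (state, action) -> next state; obs: state -> observation.
def run(delta, obs, q, h):
    """Execute action sequence h from q; return (observation trace, end state)."""
    trace = []
    for a in h:
        q = delta[(q, a)]
        trace.append(obs[q])
    return tuple(trace), q

def is_homing(delta, obs, states, h):
    """h is homing iff equal observation traces imply equal ending states."""
    outcome = {}
    for q in states:
        trace, end = run(delta, obs, q, h)
        if outcome.setdefault(trace, end) != end:
            return False  # same trace reached from two different ending states
    return True

delta = {(0, 'f'): 0, (1, 'f'): 2, (2, 'f'): 0}
obs = {0: 'x', 1: 'y', 2: 'x'}
print(is_homing(delta, obs, [0, 1, 2], ['f']))       # False: trace ('x',) is ambiguous
print(is_homing(delta, obs, [0, 1, 2], ['f', 'f']))  # True after appending an action
```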

Slide 26: Properties of Homing Sequences
- Every FDA has a homing sequence
- One can be constructed from an FDA by appending (fewer than n times) action sequences that distinguish pairs of states
- The length of this construction is at most n²
- There are FDAs whose shortest homing sequence has length n²
- h can be used as a reset
- h cannot guarantee arriving at a fixed state

Slide 27: L* with a Homing Sequence h
- Every time a reset is needed, repeat h until you see the desired observation sequence
- Or, for each possible observation sequence of h, make a copy of L* (see Fig. 5.6)

Slide 28: Learning the Homing Sequence
If h is not a homing sequence, then we may discover that the same observation sequence produced by executing h leads us to two different states, p and q, i.e., there is a sequence of actions x on which p and q are visibly different. Then a better approximation of the homing sequence is hx.

Slide 29: L* + Learning h
- Assume a homing sequence h, initially h = λ
- When h is shown to be incorrect, extend it, then discard all copies of L* and start again
- When h is incorrect, there exists an x that distinguishes qh from ph, even though q⟨h⟩ = p⟨h⟩

Slide 30: Learning h and the Model
- Rivest and Schapire's algorithm (Fig. 5.7)
- The Little Prince example (note the inconsistency produced by ff in Fig. 5.10)

