Notes from 02_CAINE conference


1 Notes from 02_CAINE conference
Some data classification techniques are: decision tree induction, Bayesian classification, neural networks, k-nearest neighbor, case-based reasoning, genetic algorithms, rough sets, and fuzzy logic techniques. When using P-trees, are they all essentially the same?

2 Bayesian Classifier
Based on Bayes' theorem: Pr(Ci | X) = Pr(X | Ci) * Pr(Ci) / Pr(X). Pr(Ci | X) is the posterior probability and Pr(Ci) is the prior probability. The conditional probabilities Pr(X | Ci) can be estimated from the training data. Classify X with the class Ci that maximizes Pr(Ci | X). Since Pr(X) is independent of class, it suffices to maximize Pr(X | Ci) * Pr(Ci). Use the naïve assumption? Pr(X | Ci) = Pr(X1 | Ci) × Pr(X2 | Ci) × … × Pr(Xn | Ci)
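To make the decision rule concrete, here is a minimal naive-Bayes sketch in Python; the names (train, classify, rows) are hypothetical and the probabilities are estimated from simple counts rather than from P-trees.

```python
# Minimal naive-Bayes sketch of the decision rule on this slide.
# All names are illustrative, not from the paper.
from collections import Counter

def train(rows):
    """rows: list of (attribute_tuple, class_label)."""
    class_counts = Counter(c for _, c in rows)
    value_counts = Counter()              # keyed by (attr_index, value, class)
    for x, c in rows:
        for i, v in enumerate(x):
            value_counts[(i, v, c)] += 1
    return class_counts, value_counts, len(rows)

def classify(x, class_counts, value_counts, n):
    """Return the class Ci maximizing Pr(Ci) * prod_j Pr(Xj | Ci)."""
    best, best_score = None, -1.0
    for c, cc in class_counts.items():
        score = cc / n                    # Pr(Ci)
        for i, v in enumerate(x):
            score *= value_counts[(i, v, c)] / cc   # Pr(Xj | Ci)
        if score > best_score:
            best, best_score = c, score
    return best
```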

3 Calculating Probabilities Pr(X|Ci)
Traditional Bayesian classification calculates Pr(X|Ci) and Pr(Ci) for each X by scanning the training database (TDB). If (X,Ci) exists in the training DB, then Pr(X|Ci) is estimated as cnt(X,Ci)/cnt(Ci), but this requires a TDB scan to determine. If cnt(X,Ci) = 0 for all i, then one can either choose the default class or use the naïve assumption: cnt(X1,Ci)/cnt(Ci) × cnt(X2,Ci)/cnt(Ci) × … × cnt(Xn,Ci)/cnt(Ci), where cnt(Xj,Ci) counts tuples whose j-th attribute equals Xj and whose class is Ci. Bayesian belief networks simply replace the naïve assumption with other methods of estimating Pr(X|Ci) (e.g., domain knowledge). All the probabilities above involve counts, so they can all be computed using P-trees: Pr(X|Ci)*Pr(Ci) = {RC[P1(X1) ^ P2(X2) ^ … ^ Pn(Xn) ^ PC(Ci)] / RC[PC(Ci)]} * RC[PC(Ci)] / cntTDB = RC[P1(X1) ^ P2(X2) ^ … ^ Pn(Xn) ^ PC(Ci)] / cntTDB. Problem: what if RC[P1(X1) ^ P2(X2) ^ … ^ Pn(Xn) ^ PC(Ci)] = 0 for all i, i.e., the unclassified pattern does not exist in the training set?
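A minimal sketch of the count-based estimate above, using Python ints as bitmap stand-ins for P-trees: RC (root count) is the number of set bits, and the ^ on the slide corresponds to bitwise AND. The helper names (value_ptree, rc, pr_x_and_ci) are illustrative, not from the paper.

```python
def value_ptree(column, value):
    """Bitmap with bit t set iff tuple t has `value` in this column."""
    bits = 0
    for t, v in enumerate(column):
        if v == value:
            bits |= 1 << t
    return bits

def rc(bits):
    """Root count: number of 1-bits in the bitmap."""
    return bin(bits).count("1")

def pr_x_and_ci(columns, x, class_column, ci, n):
    """Estimate Pr(X|Ci)*Pr(Ci) = RC[P1(x1) & ... & Pn(xn) & PC(ci)] / n."""
    acc = value_ptree(class_column, ci)
    for col, v in zip(columns, x):
        acc &= value_ptree(col, v)
    return rc(acc) / n
```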

4 Band-based P-tree Approach
When RC = 0 for the given pattern, reduce the restrictiveness of the pattern by removing the attribute with the least information gain. Then calculate (assuming attribute 2 has the least IG): Pr(X|Ci)*Pr(Ci) = RC[P1(X1) ^ P3(X3) ^ … ^ Pn(Xn) ^ PC(Ci)] / cnt[TDB]. The information gains can themselves be calculated using P-trees, as a one-time calculation over the entire training data; a sketch of this fallback follows below.
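A minimal sketch of that fallback, reusing the hypothetical bitmap helpers (value_ptree, rc) from the previous sketch; the info_gain list is assumed to have been computed once over the training data, and all names are illustrative.

```python
# Sketch: if the full pattern has root count 0, drop the attribute with the
# least information gain and re-count, repeating until a match is found.
def count_with_fallback(columns, x, class_column, ci, info_gain):
    """Return RC for the pattern, removing lowest-IG attributes while RC == 0."""
    active = list(range(len(columns)))                  # attribute indices still used
    order = sorted(active, key=lambda i: info_gain[i])  # least informative first
    while True:
        acc = value_ptree(class_column, ci)
        for i in active:
            acc &= value_ptree(columns[i], x[i])
        if rc(acc) > 0 or not active:
            return rc(acc)
        active.remove(order.pop(0))                     # drop least-informative attribute
```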

5 Bit-based Approach
Search for similar patterns by removing the least significant bits in the attribute space; the order in which bits are removed is selected by calculating the information gain (IG). E.g., calculate the Bayesian conditional probability value for the pattern [G,R] = [10,01] in a 2-attribute space. Assume the IG of the 1st significant bit of R is less than that of G, and the IG of the 2nd significant bit of G is less than that of R. Initially, search for the pattern [10,01] (a). If not found, search for [1_,01], dropping the 2nd significant bit of G; the search space increases (b). If not found, search for [1_,0_], dropping the 2nd significant bit of R; the search space increases again (c). If not found, search for [1_,_ _], dropping the 1st significant bit of R; the search space increases further (d). (Figures (a)-(d): the widening search neighborhoods plotted over the 2-bit G and R axes.) This is almost identical to KNN using HOBbit neighborhoods! It also seems very similar to DTI. Is there really just one P-tree classifier? Idea: use domain knowledge to weight the feature attributes, decide as above using weighted IG, and use GAs to improve the weights. Note: one would want to consider all patterns, not just the four above.
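A minimal sketch of this progressive bit-dropping search, assuming the per-bit information gains have already been computed; it scans a plain list of training rows rather than ANDing P-trees, and the names (hobbit_count, masked_match, bit_order) are hypothetical.

```python
def masked_match(pattern, masks, row):
    """True if row agrees with pattern on all unmasked bit positions."""
    return all((rv & m) == (pv & m) for rv, pv, m in zip(row, pattern, masks))

def hobbit_count(pattern, bit_order, rows, bits_per_attr=2):
    """bit_order: (attr_index, bit_position) pairs, least informative first."""
    full = (1 << bits_per_attr) - 1
    masks = [full] * len(pattern)          # start with the exact pattern
    drops = list(bit_order)
    while True:
        n = sum(masked_match(pattern, masks, r) for r in rows)
        if n > 0 or not drops:
            return n, masks
        a, b = drops.pop(0)                # widen: drop next least-informative bit
        masks[a] &= ~(1 << b)
```

For [G,R] = [10,01] with the IG ordering assumed on this slide (attribute 0 = G, attribute 1 = R, bit 0 = least significant), bit_order would be [(0,0), (1,0), (1,1), (0,1)], reproducing the sequence (a)-(d) above.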

6 Bit-based Approach
For [G,R] = [10,01], the nine ways of ignoring bits (nine HOBbit neighborhoods) are: [10,01], [1_,01], [10,0_], [_ _,01], [10,_ _], [1_,0_], [1_,_ _], [_ _,0_], [_ _,_ _]. (Figures: each of the nine neighborhoods plotted over the 2-bit G and R axes.) Of course, HOBbit neighborhoods can be replaced by Lp-neighborhoods at some cost in complexity but some benefit with respect to accuracy (using the OR to get Lp-neighborhoods).
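A minimal sketch of that OR construction, reusing the hypothetical value_ptree bitmap helper from the earlier sketch: an interval-style neighborhood around a value is just the OR of the per-value P-trees inside the interval.

```python
def interval_ptree(column, lo, hi):
    """Bitmap of tuples whose value in this column lies in [lo, hi]."""
    bits = 0
    for v in range(lo, hi + 1):
        bits |= value_ptree(column, v)     # OR of the per-value P-trees
    return bits
```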

7 Rank order for moving windows
The element of rank r in a sequence is the r-th smallest element. Rank order filtering is widely used in signal, image, and voice processing; such filters are called order-statistic filters. They are good for removing shot noise from images while maintaining sharp edges, have excellent noise elimination and reduction properties, and are good for image enhancement and restoration. Research project: r-rank order filtering in 2-D using 2nx2n-trees or 3nx3n-trees.
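To make the definition concrete, here is a minimal 1-D rank-order filter sketch in Python; r = window // 2 gives the familiar median filter. This is illustrative only, not the proposed 2-D tree-based version.

```python
def rank_order_filter(signal, window, r):
    """Slide a window over the signal and keep the element of rank r in each window."""
    half = window // 2
    out = []
    for i in range(len(signal)):
        lo, hi = max(0, i - half), min(len(signal), i + half + 1)
        out.append(sorted(signal[lo:hi])[min(r, hi - lo - 1)])
    return out

# Example: the median (rank 1 in a window of 3) removes the isolated spike.
print(rank_order_filter([1, 1, 9, 1, 1], window=3, r=1))   # -> [1, 1, 1, 1, 1]
```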

