732A02 Data Mining - Clustering and Association Analysis. Jose M. Peña. Constrained frequent itemset mining.


1 Constrained frequent itemset mining

732A02 Data Mining - Clustering and Association Analysis
Jose M. Peña, jospe@ida.liu.se

2 Constraints

A constraint C(.) is
- Monotone
  - If C(A) then C(B) for all A ⊆ B.
  - E.g. A' ⊆ A, for a fixed itemset A'.
- Antimonotone
  - If C(A) then C(B) for all B ⊆ A.
  - Or, if not C(B) then not C(A) for all B ⊆ A.
  - E.g. support ≥ min_support.
- The apriori property applies to any antimonotone constraint.

3 Constraints

- sum(S.Price) ≥ v is monotone (positive prices).
- min(S.Price) ≤ v is monotone.
- range(S.Price) ≥ 15 is monotone.
  - Itemset ab satisfies C.
  - So does every superset of ab.

Item  Price
a      40
b       0
c     -20
d      10
e     -30
f      30
g      20
h     -10

4 Constraints

- sum(S.Price) ≤ v is antimonotone (positive prices).
- sum(S.Price) ≥ v is not antimonotone.
- range(S.Price) ≤ 15 is antimonotone.
  - Itemset ab violates C.
  - So does every superset of ab.

Item  Price
a      40
b       0
c     -20
d      10
e     -30
f      30
g      20
h     -10
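The claims on the two slides above can be checked by brute force over the eight-item price table. The following is only a sketch: the helper names (`is_monotone`, `is_antimonotone`, `price_range`, `price_sum`) and the threshold 20 for the sum constraint are assumptions made for illustration, and the empty itemset is left out because range and min are undefined for it.

```python
from itertools import combinations

# Price table from the slides.
PRICE = {"a": 40, "b": 0, "c": -20, "d": 10, "e": -30, "f": 30, "g": 20, "h": -10}

def nonempty_itemsets(items):
    """Yield every nonempty subset of `items` as a frozenset."""
    items = sorted(items)
    for k in range(1, len(items) + 1):
        for combo in combinations(items, k):
            yield frozenset(combo)

ITEMSETS = list(nonempty_itemsets(PRICE))

def is_monotone(constraint):
    """True if C(A) implies C(B) for every superset B of A."""
    return all(constraint(b)
               for a in ITEMSETS if constraint(a)
               for b in ITEMSETS if a <= b)

def is_antimonotone(constraint):
    """True if C(A) implies C(B) for every nonempty subset B of A."""
    return all(constraint(b)
               for a in ITEMSETS if constraint(a)
               for b in ITEMSETS if b <= a)

def price_range(s):
    values = [PRICE[i] for i in s]
    return max(values) - min(values)

def price_sum(s):
    return sum(PRICE[i] for i in s)

print(is_monotone(lambda s: price_range(s) >= 15))      # True
print(is_antimonotone(lambda s: price_range(s) <= 15))  # True
# With negative prices in the table, the sum constraint is neither.
print(is_antimonotone(lambda s: price_sum(s) >= 20))    # False
print(is_monotone(lambda s: price_sum(s) >= 20))        # False
```

Itemset ab has range 40, so it satisfies range ≥ 15 and violates range ≤ 15, as the slides state.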

5 Constraints

Constraint                      Antimonotone         Monotone
v ∈ S                           no                   yes
S ⊇ V                           no                   yes
S ⊆ V                           yes                  no
min(S) ≤ v                      no                   yes
min(S) ≥ v                      yes                  no
max(S) ≤ v                      yes                  no
max(S) ≥ v                      no                   yes
count(S) ≤ v                    yes                  no
count(S) ≥ v                    no                   yes
sum(S) ≤ v (∀a ∈ S, a ≥ 0)      yes                  no
sum(S) ≥ v (∀a ∈ S, a ≥ 0)      no                   yes
range(S) ≤ v                    yes                  no
range(S) ≥ v                    no                   yes
avg(S) θ v, θ ∈ {=, ≤, ≥}       no, but convertible  no, but convertible
support(S) ≥ ξ                  yes                  no
support(S) ≤ ξ                  no                   yes

6 Apriori algorithm + any constraint

[Figure: Apriori run on database D (scan D; candidate sets C1, C2, C3; frequent sets L1, L2, L3).]

Constraint: Sum{S.price} < 5, where item price equals item id.

7 Apriori algorithm + antimonotone constraint

[Figure: Apriori run on database D (scan D; candidate sets C1, C2, C3; frequent sets L1, L2, L3).]

Constraint: Sum{S.price} < 5, where item price equals item id.

Prune search space.
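A minimal sketch of how an antimonotone constraint can prune Apriori's search space alongside the support test. The database D from the slide's figure is not in the transcript, so the four transactions below are made up for illustration; the function name `apriori_antimonotone` and the support threshold of 2 are likewise assumptions.

```python
from itertools import combinations

def apriori_antimonotone(transactions, min_support, constraint):
    """Apriori in which candidates violating an antimonotone constraint are pruned."""
    transactions = [frozenset(t) for t in transactions]

    def support(itemset):
        return sum(1 for t in transactions if itemset <= t)

    def keep(itemset):
        # Both tests are antimonotone, so a failing itemset can be dropped for good:
        # none of its supersets can pass.
        return constraint(itemset) and support(itemset) >= min_support

    items = sorted({i for t in transactions for i in t})
    level = [s for s in (frozenset([i]) for i in items) if keep(s)]
    result = {}
    k = 1
    while level:
        result.update((s, support(s)) for s in level)
        survivors = set(level)
        # Join step: unions of two surviving k-itemsets that have size k + 1.
        joined = {a | b for a in level for b in level if len(a | b) == k + 1}
        # Prune step: every k-subset of a candidate must itself have survived.
        candidates = [c for c in joined
                      if all(frozenset(sub) in survivors for sub in combinations(c, k))]
        level = [c for c in candidates if keep(c)]
        k += 1
    return result

# Hypothetical database (not the one in the slide's figure); item price equals item id.
D = [{1, 3, 4}, {2, 3, 5}, {1, 2, 3, 5}, {2, 5}]
found = apriori_antimonotone(D, min_support=2, constraint=lambda s: sum(s) < 5)
print(sorted(map(sorted, found)))  # [[1], [1, 3], [2], [3]]
```

In this run item 5 is dropped at the first level because its price alone violates Sum < 5, so no candidate containing 5 is ever generated or counted.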

8 Apriori algorithm + monotone constraint

[Figure: Apriori run on database D (scan D; candidate sets C1, C2, C3; frequent sets L1, L2, L3), with four itemsets marked ☺.]

Constraint: Sum{S.price} ≥ 5, where item price equals item id.

- Does not prune search space but avoids constraint checking.
- Not in the output, since they don't satisfy the constraint.
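A sketch of the idea that a monotone constraint saves constraint checks without pruning anything: once a frequent itemset satisfies C, its supersets inherit that fact. The database is again hypothetical (the figure's database D is not in the transcript), and `apriori_monotone` and its returned check counter are names introduced here for illustration.

```python
from itertools import combinations

def apriori_monotone(transactions, min_support, constraint):
    """Apriori that skips checking a monotone constraint when a subset already satisfies it."""
    transactions = [frozenset(t) for t in transactions]

    def support(itemset):
        return sum(1 for t in transactions if itemset <= t)

    checks = 0
    satisfied = {}  # frequent itemset -> whether it satisfies the constraint
    items = sorted({i for t in transactions for i in t})
    candidates = [frozenset([i]) for i in items]
    k = 1
    while candidates:
        level = {}
        for c in candidates:
            if support(c) < min_support:
                continue
            # All (k-1)-subsets of c are frequent, so their flags are known.
            if k > 1 and any(satisfied[frozenset(sub)] for sub in combinations(c, k - 1)):
                level[c] = True  # inherited from a subset, no check needed
            else:
                checks += 1
                level[c] = constraint(c)
        satisfied.update(level)
        # Candidate generation is unchanged: the search space is not pruned.
        joined = {a | b for a in level for b in level if len(a | b) == k + 1}
        candidates = [c for c in joined
                      if all(frozenset(sub) in level for sub in combinations(c, k))]
        k += 1
    output = {s for s, ok in satisfied.items() if ok}
    return output, checks

# Hypothetical database; item price equals item id.
D = [{1, 3, 4}, {2, 3, 5}, {1, 2, 3, 5}, {2, 5}]
output, checks = apriori_monotone(D, min_support=2, constraint=lambda s: sum(s) >= 5)
print(sorted(map(sorted, output)))  # [[2, 3], [2, 3, 5], [2, 5], [3, 5], [5]]
print(checks)                       # 6
```

On this data there are nine frequent itemsets but only six constraint evaluations; the frequent itemsets {1}, {2}, {3} and {1, 3} are still explored but left out of the output because they do not satisfy the constraint.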

9 FP grow algorithm + antimonotone constraint

Annotations on the algorithm listing:
- Similar to Apriori (prunes the search space).
- Specific to FP grow (avoids constraint checks).

10 FP grow algorithm + monotone constraint

- If C(α) then do not check C(.) in TDB|α.

11 Constraints

- avg(S.Price) ≤ v and avg(S.Price) ≥ v are neither monotone nor antimonotone.
- Convertible monotone
  - If there exists an item order R such that
    - If C(A) then C(B) for all A and B respecting R such that A is a suffix of B.
  - E.g. avg(S.Price) ≥ v w.r.t. decreasing price order.
- Convertible antimonotone
  - If there exists an item order R such that
    - If C(A) then C(B) for all A and B respecting R such that B is a suffix of A.
    - Or, if not C(B) then not C(A) for all A and B respecting R such that B is a suffix of A.
  - E.g. avg(S.Price) ≥ v w.r.t. increasing price order.

12 Constraints

- avg(X) ≥ 25 is convertible monotone w.r.t. descending item price order R:
  - If an itemset d satisfies a constraint C, so do itemsets fd and afd, which have d as a suffix.
- avg(X) ≥ 25 is convertible antimonotone w.r.t. ascending item price order R^-1:
  - If an itemset dfa satisfies a constraint C, so do itemsets fa and a, which are suffixes of dfa.
- Thus, avg(X) ≥ 25 is strongly convertible.
- Check that avg(X) ≤ 25 is also strongly convertible.
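The suffix-based definitions can be tested exhaustively on the price table from the earlier slides (a 40, b 0, c -20, d 10, e -30, f 30, g 20, h -10). This is a sketch under that assumption; the helper names `convertible_monotone`, `convertible_antimonotone` and `avg_ge` are invented for the example.

```python
from itertools import combinations

# Price table from the earlier slides.
PRICE = {"a": 40, "b": 0, "c": -20, "d": 10, "e": -30, "f": 30, "g": 20, "h": -10}

DESCENDING = sorted(PRICE, key=PRICE.get, reverse=True)  # order R
ASCENDING = DESCENDING[::-1]                             # order R^-1

def ordered_itemsets(order):
    """All nonempty itemsets, each written as a tuple that respects `order`."""
    for k in range(1, len(order) + 1):
        yield from combinations(order, k)  # combinations keeps the input order

def avg_ge(v):
    """Constraint avg(price) >= v, written without division."""
    return lambda itemset: sum(PRICE[i] for i in itemset) >= v * len(itemset)

def convertible_monotone(constraint, order):
    """If a proper suffix A of B satisfies C, then B must satisfy C."""
    return all(constraint(b)
               for b in ordered_itemsets(order)
               for i in range(1, len(b)) if constraint(b[i:]))

def convertible_antimonotone(constraint, order):
    """If A satisfies C, then every proper suffix B of A must satisfy C."""
    return all(constraint(a[i:])
               for a in ordered_itemsets(order) if constraint(a)
               for i in range(1, len(a)))

c = avg_ge(25)
print(convertible_monotone(c, DESCENDING))     # True
print(convertible_antimonotone(c, ASCENDING))  # True
# The order matters: with ascending prices, a satisfies C but ea does not.
print(convertible_monotone(c, ASCENDING))      # False
```

The same two helpers could be reused with an avg ≤ 25 constraint to work through the last bullet of the slide.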

13 Constraints

Constraint                                          Convertible antimonotone  Convertible monotone  Strongly convertible
avg(S) ≤ v, avg(S) ≥ v                              Yes                       Yes                   Yes
median(S) ≤ v, median(S) ≥ v                        Yes                       Yes                   Yes
sum(S) ≤ v (items could be of any value, v ≥ 0)     Yes                       No                    No
sum(S) ≤ v (items could be of any value, v ≤ 0)     No                        Yes                   No
sum(S) ≥ v (items could be of any value, v ≥ 0)     No                        Yes                   No
sum(S) ≥ v (items could be of any value, v ≤ 0)     Yes                       No                    No
……

14 Constraints

[Figure: classification of constraints into antimonotone, monotone, convertible antimonotone, convertible monotone, strongly convertible, and inconvertible. Example shown in the figure: avg(S) - median(S) = 0.]

15 FP grow algorithm + convertible antimonotone constraint

- Instead of ordering the items according to decreasing frequency, the items are now ordered according to the order R of the constraint.

Annotations on the algorithm listing:
- False: Such items can appear not only as a suffix.
- False: No check is needed for those itemsets that are a suffix of α ∪ β. The check is needed for the rest of the items.
- True: α will be added as suffix to any itemset derived from TDB|α and the result respects R.
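A rough sketch of the ordering idea on this slide, not of FP grow itself: it builds no FP-tree and uses plain lists as projected databases TDB|α. Items are ordered by R (ascending price, taken from the price table on the earlier slides), patterns grow by prepending items to the suffix α, and a branch is cut as soon as α violates the convertible antimonotone constraint avg(price) ≥ 25. The transactions, the support threshold and the function name `mine_convertible_antimonotone` are assumptions made for illustration.

```python
# Price table from the earlier slides.
PRICE = {"a": 40, "b": 0, "c": -20, "d": 10, "e": -30, "f": 30, "g": 20, "h": -10}

def mine_convertible_antimonotone(transactions, order, min_support, constraint):
    """Suffix-growing pattern mining with items ordered by R (`order`)."""
    results = {}

    def grow(db, suffix, items_before):
        # `db` holds the transactions containing every item of `suffix` (TDB|suffix).
        for idx, item in enumerate(items_before):
            projected = [t for t in db if item in t]
            if len(projected) < min_support:
                continue
            pattern = (item,) + suffix  # respects R, with `suffix` as its suffix
            if not constraint(pattern):
                continue  # every pattern ending in `pattern` would violate C too
            results[pattern] = len(projected)
            # Only items preceding `item` in R may be prepended next.
            grow(projected, pattern, items_before[:idx])

    grow([frozenset(t) for t in transactions], (), list(order))
    return results

def avg_ge(v):
    return lambda itemset: sum(PRICE[i] for i in itemset) >= v * len(itemset)

ASCENDING = sorted(PRICE, key=PRICE.get)  # order R for avg >= v
# Hypothetical transactions over the items of the price table.
TDB = [{"a", "f", "d"}, {"a", "f", "g", "b"}, {"a", "d", "g"}, {"f", "g", "h"}]
patterns = mine_convertible_antimonotone(TDB, ASCENDING, 2, avg_ge(25))
print(sorted(patterns.items()))
```

In this run the suffixes d and g are frequent but violate the constraint, so nothing is ever grown from them; itemsets such as da and ga are still found because they are reached through the suffix a.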

16 FP grow algorithm + convertible monotone constraint

- With monotone constraint
  - If C(α) then do not check C(.) in TDB|α.
- With convertible monotone constraint
  - Instead of ordering the items according to decreasing frequency, the items are now ordered according to the order R of the constraint.
  - If C(α) then do not check C(.) in TDB|α, because α will be added as suffix to any itemset derived from TDB|α and the result respects R.
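The rule "if C(α) then do not check C(.) in TDB|α" can be sketched with the same kind of simplified suffix growth: no FP-tree, plain lists as projected databases. Items follow R (descending price, using the price table from the earlier slides), and a flag records that the current suffix already satisfies avg(price) ≥ 25. The transactions, the threshold and the function name `mine_convertible_monotone` are hypothetical.

```python
# Price table from the earlier slides.
PRICE = {"a": 40, "b": 0, "c": -20, "d": 10, "e": -30, "f": 30, "g": 20, "h": -10}

def mine_convertible_monotone(transactions, order, min_support, constraint):
    """Suffix growth that stops checking C once the suffix satisfies it."""
    output = {}
    checks = 0

    def grow(db, suffix, items_before, suffix_ok):
        nonlocal checks
        for idx, item in enumerate(items_before):
            projected = [t for t in db if item in t]
            if len(projected) < min_support:
                continue
            pattern = (item,) + suffix
            if suffix_ok:
                ok = True  # C(suffix) holds, so C holds throughout TDB|suffix
            else:
                checks += 1
                ok = constraint(pattern)
            if ok:
                output[pattern] = len(projected)
            # No pruning here: a violating pattern may still grow into a valid one.
            grow(projected, pattern, items_before[:idx], ok)

    grow([frozenset(t) for t in transactions], (), list(order), False)
    return output, checks

def avg_ge(v):
    return lambda itemset: sum(PRICE[i] for i in itemset) >= v * len(itemset)

DESCENDING = sorted(PRICE, key=PRICE.get, reverse=True)  # order R for avg >= v
# Hypothetical transactions over the items of the price table.
TDB = [{"a", "f", "d"}, {"a", "f", "g", "b"}, {"a", "d", "g"}, {"f", "g", "h"}]
output, checks = mine_convertible_monotone(TDB, DESCENDING, 2, avg_ge(25))
print(sorted(output.items()))
print(checks)  # 7
```

Eight frequent patterns are visited but the constraint is evaluated only seven times, because af is derived from the suffix f, which already satisfies it. The saving would grow with longer patterns.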

17 Exercise

- How would you incorporate convertible constraints in the Apriori algorithm?
