
Facilitating Interactive Mining of Global and Local Association Rules Abhishek Mukherji* Elke A. Rundensteiner Matthew O. Ward Department of Computer Science,


1 Facilitating Interactive Mining of Global and Local Association Rules
Abhishek Mukherji*, Elke A. Rundensteiner, Matthew O. Ward
Department of Computer Science, Worcester Polytechnic Institute, MA, USA. *Samsung Research America, CA, USA.
XmdvTool is an open-source multivariate visual analytics tool developed at WPI with support from a series of NSF grants over the past 20 years (http://sourceforge.net/projects/xmdvtool/). This PhD research work was partly supported by NSF under grants IIS , CCF and IIS

2 Era of Big Data ... and we are DRIVING!
Volume, Veracity, Variety, Velocity
1. Where’s the Data in the Big Data Wave? Gerhard Weikum, Res. Director at Max Planck Institute.
2. Analytic DB Technology for the Data Enthusiast. Pat Hanrahan, Stanford & Tableau, SIGMOD‘12 Keynote Talk.

3 XmdvTool’s Efforts Towards This Paradigm Shift (11/03/2014)
Visualize Static Data, Visualize Data Records
I. Visualize Stream & Sensor Data: SNIFTool & FireStream, ViStream*
II. Visualize Mined Results: PARAS/FIRE, COLARM
*Di Yang et al., Interactive visual exploration of neighbor-based patterns in data streams, ACM SIGMOD’10 Demo.

4 Summary of Graduate Research Works
I. Stream & Sensor Data Processing
1. SNIFTool/FireStream: Discover Patterns in Live Stream [CIKM ’08, ICDE Demo ’07]
2. JAQPOT: High Velocity Streams MJoin Exec. [BNCOD ’11]
II. Interactive Mining
1. PARAS/FIRE [VLDB’13, SIGMOD’13, CIKM’13]
2. COLARM [EDBT’14]
III. Scalable Nugget-guided Hypothesis Testing
1. SPHINX: Evidence-Hypotheses Explor. [CIKM’13]
2. Iterative Multi-Evidence-Hypotheses Model
CAPE* XMDVTool^
* ^http://davis.wpi.edu/xmdv/index.html

5 PARAS/FIRE: Interactive Visual Support for Parameter Space-Driven Mining of Global Rules [PVLDB 2013, SIGMOD 2013, CIKM 2013] Joint work with Xika Lin, Christopher Ryan Botaish, Jason Whitehouse, Elke A. Rundensteiner, Matthew O. Ward Department of Computer Science, Worcester Polytechnic Institute (WPI), MA, USA.

6 Association Rule Mining (ARM) Basics
Which customers to target for multi-car discount promos?
Example rule: ... and ... → ... (Support = 40%, Confidence = 100%)
RecordID  Age  Married  NumCars
100       23   No
               Yes
               No
               Yes
               Yes      2
R. Agrawal and R. Srikant, Fast algorithms for mining association rules in large databases, VLDB’94.
R. Srikant and R. Agrawal, Mining quantitative association rules in large relational tables, SIGMOD’96.
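
The support and confidence figures quoted on this slide can be reproduced with a few lines of Python. The sketch below is only an illustration: the five records, the field names, and the rule (age in the thirties and married implies two cars) are hypothetical values chosen so that the rule reaches 40% support and 100% confidence. They are not taken from the slide's table.

```python
from typing import Callable, Dict, List, Tuple

Record = Dict[str, object]

# Hypothetical five-record table (assumed values, for illustration only).
RECORDS: List[Record] = [
    {"id": 100, "age": 23, "married": False, "num_cars": 1},
    {"id": 200, "age": 25, "married": True, "num_cars": 1},
    {"id": 300, "age": 29, "married": False, "num_cars": 0},
    {"id": 400, "age": 34, "married": True, "num_cars": 2},
    {"id": 500, "age": 38, "married": True, "num_cars": 2},
]


def support_and_confidence(
    records: List[Record],
    antecedent: Callable[[Record], bool],
    consequent: Callable[[Record], bool],
) -> Tuple[float, float]:
    """Return (support, confidence) of the rule antecedent -> consequent."""
    n_antecedent = sum(1 for r in records if antecedent(r))
    n_both = sum(1 for r in records if antecedent(r) and consequent(r))
    support = n_both / len(records)  # fraction of all records matching both sides
    confidence = n_both / n_antecedent if n_antecedent else 0.0
    return support, confidence


def in_thirties_and_married(r: Record) -> bool:
    return 30 <= r["age"] <= 39 and bool(r["married"])


def has_two_cars(r: Record) -> bool:
    return r["num_cars"] == 2


support, confidence = support_and_confidence(
    RECORDS, in_thirties_and_married, has_two_cars
)
print(f"support={support:.0%}, confidence={confidence:.0%}")
```

Support counts records that satisfy both sides of the rule against the whole table, while confidence counts them only against records that satisfy the antecedent.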

7 Motivation for Interactive Mining
Data Analyst: (minsupp, minconf) to Data Miner, {ARs} back
Limitations
• Unacceptably long response time.
• Trial-and-error iterations.
• Forced to rerun for each subset.
Research Goals
• Improve turnaround times of mining queries.
• Provide parameter recommendations.
• Preprocess data to enable fast interactive mining experience.
C.C. Aggarwal and P.S. Yu, A new approach to online generation of association rules, IEEE TKDE’01.
C. Hidber. Online Association Rule Mining, SIGMOD’99.
B. Nag, P. M. Deshpande, and D. J. DeWitt, Using a knowledge cache for interactive discovery of association rules, SIGKDD’99.
M. Kubat et al., Itemset trees for targeted association querying, IEEE TKDE’03.
M. Kaya and R. Alhajj. Online mining of fuzzy multidimensional weighted association rules. Applied Intelligence’08.

8 The State-of-the-art in Online Rule Mining
[Figure: itemset lattice over {}, X, Y, Z, XY, XZ, YZ, XYZ]
I. Frequent Itemset Generation (Offline)
II. Rule Generation (Online)
Assumptions
1. Cost(Freq. Itemset Generation) >> Cost(Rule Generation),
2. Count(Itemsets) << Count(Rules).
^Cost per GB of RAM: $1000 (in 2000) → $25 (in 2012).
C.C. Aggarwal and P.S. Yu, A new approach to online generation of association rules, IEEE TKDE’01.
M. Kubat et al., Itemset trees for targeted association querying, IEEE TKDE’03.
M. Kaya and R. Alhajj. Online mining of fuzzy multidimensional weighted association rules. Applied Intelligence’08.
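
The two-phase split on this slide (frequent itemsets computed offline, rules generated online per query) can be sketched as follows. This is a minimal illustration under stated assumptions, not the cited systems' implementation: the `ITEMSET_SUPPORT` table stands in for the prestored itemsets and its support values are made up, and `generate_rules` is a hypothetical helper name.

```python
from itertools import combinations
from typing import Dict, FrozenSet, List, Tuple

Rule = Tuple[FrozenSet[str], FrozenSet[str], float, float]

# Offline phase (assumed already done): frequent itemsets with their supports.
# The values below are hypothetical.
ITEMSET_SUPPORT: Dict[FrozenSet[str], float] = {
    frozenset("X"): 0.6, frozenset("Y"): 0.5, frozenset("Z"): 0.4,
    frozenset("XY"): 0.4, frozenset("XZ"): 0.3, frozenset("YZ"): 0.2,
    frozenset("XYZ"): 0.2,
}


def generate_rules(
    itemset_support: Dict[FrozenSet[str], float],
    minsupp: float,
    minconf: float,
) -> List[Rule]:
    """Online phase: derive rules antecedent -> consequent from stored itemsets."""
    rules: List[Rule] = []
    for itemset, supp in itemset_support.items():
        if len(itemset) < 2 or supp < minsupp:
            continue
        # Every non-empty proper subset of the itemset is a candidate antecedent.
        for size in range(1, len(itemset)):
            for items in combinations(sorted(itemset), size):
                antecedent = frozenset(items)
                conf = supp / itemset_support[antecedent]
                if conf >= minconf:
                    rules.append((antecedent, itemset - antecedent, supp, conf))
    return rules


for ante, cons, supp, conf in generate_rules(ITEMSET_SUPPORT, 0.3, 0.7):
    print("".join(sorted(ante)), "->", "".join(sorted(cons)), supp, round(conf, 2))
```

Each query still enumerates antecedents for every qualifying itemset, which is the online cost the parameterized rule index described later in the talk aims to avoid.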

9 Adjacency Lattice and Redundancy
[Figure: adjacency lattice over {}, X, Y, Z, XY, XZ, YZ, XYZ]
Simple Redundancy [ (A∪C) = XYZ, (A) X → YZ ⊂ (A) XY → Z ]
Strict Redundancy [ (A∪C) X → YZ ⊃ (A∪C) X → Y ]
Starting with maximal ancestors of XYZ, i.e., X, Y and Z: if X → YZ qualifies, then skip XY and XZ as antecedent (simple) or consequent (strict).
C.C. Aggarwal and P.S. Yu, A new approach to online generation of association rules, IEEE TKDE’01.
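
The two redundancy notions on this slide can be written as set comparisons. The sketch below is one reading of the slide's bracketed examples, not code from the cited paper: a rule is treated as simply redundant when it covers the same itemset (antecedent ∪ consequent) as a reference rule but has a strictly larger antecedent, and as strictly redundant when its itemset is a proper subset of the reference rule's itemset and its antecedent contains the reference antecedent. The `Rule` type and function names are hypothetical.

```python
from typing import FrozenSet, NamedTuple


class Rule(NamedTuple):
    antecedent: FrozenSet[str]
    consequent: FrozenSet[str]

    @property
    def itemset(self) -> FrozenSet[str]:
        # A ∪ C: all items mentioned by the rule.
        return self.antecedent | self.consequent


def rule(antecedent: str, consequent: str) -> Rule:
    """Build a rule from strings of single-letter items, e.g. rule("X", "YZ")."""
    return Rule(frozenset(antecedent), frozenset(consequent))


def simply_redundant(candidate: Rule, reference: Rule) -> bool:
    """Same itemset, but the candidate needs a strictly larger antecedent."""
    return (candidate.itemset == reference.itemset
            and reference.antecedent < candidate.antecedent)


def strictly_redundant(candidate: Rule, reference: Rule) -> bool:
    """Candidate covers a proper sub-itemset with at least the same antecedent."""
    return (candidate.itemset < reference.itemset
            and reference.antecedent <= candidate.antecedent)


reference = rule("X", "YZ")
print(simply_redundant(rule("XY", "Z"), reference))   # XY -> Z versus X -> YZ
print(strictly_redundant(rule("X", "Y"), reference))  # X -> Y versus X -> YZ
```

Under this reading, once X → YZ qualifies, both XY → Z and X → Y add no information and can be skipped.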

10 Research Challenges
PARAS: Preprocessing/Computational aspects
1. Instead of the itemsets, can we pre-store association rules to altogether avoid the online rule generation step?
2. Instead of the itemset index, can we have a direct look-up using (minsupp, minconf)?
3. How to handle Redundancy Relationships in the context of the parameterized Index?
FIRE: Visualization aspects
1. How should we visually present these mining results to the users?
2. How can we leverage these results to support interactive rule exploration?
3. Can we utilize some data visualization techniques to help users better understand these mined results?

11 Parameter Space Model (PARAS)
[Figure: rules from the itemset lattice ({}, X, Y, Z, XY, XZ, YZ, XYZ) placed in a two-dimensional parameter space with axes Support and Confidence. Rules shown: X → YZ; XY → Z, Z → XY; XZ → Y, YZ → X; Y → X; Z → X, Z → Y. Points labeled: (0.1, 0.125), (0.4, 0.67), (0, 0.5), (0.2, 0), (0.4, 0.5). Locations l1, l2, l3. Stable Regions S1 and S2.]
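
The idea behind the parameter space can be sketched in Python: each rule becomes a point (support, confidence), a query (minsupp, minconf) returns every rule at or above both thresholds, and nearby parameter settings that return the same rule set belong to the same region. The rule-to-point assignments below are hypothetical (the pairing on the slide is not recoverable), and `region_key` is an assumed helper that snaps a query up to the nearest rule coordinates. It yields a conservative grid approximation of stable regions, not the PARAS index itself.

```python
from bisect import bisect_left
from typing import Dict, FrozenSet, List, Optional, Tuple

# Hypothetical rule locations as (support, confidence) points.
RULE_LOCATIONS: Dict[str, Tuple[float, float]] = {
    "X -> YZ": (0.10, 0.125),
    "Z -> X": (0.20, 0.50),
    "X -> Y": (0.40, 0.50),
    "Y -> X": (0.40, 0.67),
}

SUPPORTS: List[float] = sorted({s for s, _ in RULE_LOCATIONS.values()})
CONFIDENCES: List[float] = sorted({c for _, c in RULE_LOCATIONS.values()})


def query(minsupp: float, minconf: float) -> FrozenSet[str]:
    """Return all rules whose location dominates (minsupp, minconf)."""
    return frozenset(
        name for name, (s, c) in RULE_LOCATIONS.items()
        if s >= minsupp and c >= minconf
    )


def snap_up(values: List[float], threshold: float) -> Optional[float]:
    """Smallest stored value >= threshold, or None if no rule qualifies."""
    i = bisect_left(values, threshold)
    return values[i] if i < len(values) else None


def region_key(minsupp: float, minconf: float) -> Tuple[Optional[float], Optional[float]]:
    """Queries with the same key are guaranteed to return the same rule set."""
    return snap_up(SUPPORTS, minsupp), snap_up(CONFIDENCES, minconf)


print(region_key(0.25, 0.55), sorted(query(0.25, 0.55)))
print(region_key(0.40, 0.60), sorted(query(0.40, 0.60)))
```

Because a rule qualifies exactly when its coordinates are at least the snapped thresholds, two settings with the same key always produce identical output. Different keys may still produce the same output, so a true stable region can span several grid cells.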

12 Stable Regions {S} w/ Neighbors + Rules +
[Figure: stable regions S1 and S2]
Further, re-examining the redundancy definitions, we observed certain properties that enabled us to optimize computation and storage of redundancy information with respect to the parameter space.*
PARAS: Parameter Space Framework for Online Association Mining, VLDB
* Xika Lin was the major contributor to the redundancy results.

13 Framework for Interactive Rule Exploration (FIRE)
[Figure: Mushroom dataset, Chess dataset]

14 All rules versus unique rules view

15 Unique + non-redundant rules view

16 Two-region Comparison

17 Rule Glyph View
[Figure: Lined Glyph*, Filled Glyph, Filled Glyph MDS layout]
{poisonous? = edible} → {gill-attachment = free, veil-type = partial, veil-color = white}
*M. O. Ward, A taxonomy of glyph placement strategies for multidimensional data visualization, Information Visualization 2002.

18 PARAS: Experimental Evaluation
Data sets
• Synthetic*: IBM Quest Generator (T10I4D100k and T10I4D5000k). Tx_Iy_Dz = x avg # of items per transaction, y × 1k total # of items, z transactions.
• Real~: Chess, Mushroom, Webdocs`.
* R. Agrawal and R. Srikant, Fast algorithms for mining association rules in large databases, VLDB’94.
~ A. Asuncion and D. Newman, UCI ML repository
` C. Lucchese and S. Orlando and R. Perego and F. Silvestri, Webdocs: a real-life huge transactional dataset. FIMI’04.
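
The Tx_Iy_Dz naming convention can be decoded mechanically. The helper below is a small sketch that follows the reading given on this slide (x = average items per transaction, y × 1k = total number of items, z = number of transactions); the function name `parse_quest_name` and the returned field names are hypothetical.

```python
import re
from typing import Dict

_NAME = re.compile(r"^T(\d+)I(\d+)D(\d+)(k?)$", re.IGNORECASE)


def parse_quest_name(name: str) -> Dict[str, int]:
    """Decode a synthetic dataset name such as 'T10I4D100k'."""
    match = _NAME.match(name)
    if match is None:
        raise ValueError(f"not a Tx_Iy_Dz dataset name: {name!r}")
    t, i, d, kilo = match.groups()
    return {
        "avg_items_per_transaction": int(t),
        "total_items": int(i) * 1000,  # slide's reading: y x 1k items
        "transactions": int(d) * (1000 if kilo else 1),
    }


print(parse_quest_name("T10I4D100k"))
print(parse_quest_name("T10I4D5000k"))
```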

19 PARAS: Experimental Evaluation
Tested Algorithms^
w/o redundancy resolution
1. Apriori
2. Eclat
3. FP-Growth
4. PARAS
w/ redundancy resolution
1. Apriori_RR
2. Eclat_RR
3. FP-Growth_RR
4. AdjLattice_RR
5. PARAS_RR
^ C. Borgelt, Efficient apriori, eclat & fp-growth,

20 PARAS: Experimental Methodologies
1. Average Online Processing Times (w/ and w/o RR).
• Varying minsupp, fixed minconf
• Fixed minsupp, varying minconf
2. Offline Preprocessing Times (AdjLatticeRR vs. PARAS)

21 1. Average Online Processing Times (T5000k)
[Figure panels: w/ RR, w/o RR]
For a large diversity of online queries, PARAS consistently outperforms the state-of-the-art competitors from the literature by 2 to 5 orders of magnitude over the tested datasets.

22 2. Pre-processing Times
PARAS requires ~10% extra offline preprocessing time compared with AdjLatticeRR.
Rule Generation
• T5000k = 4 sec
• Webdocs = 220 sec
Confirmed: Cost(Freq. Itemset Generation) >> Cost(Rule Generation)
C.C. Aggarwal and P.S. Yu, A new approach to online generation of association rules, IEEE TKDE’01.
B. Nag, P. M. Deshpande, and D. J. DeWitt, Using a knowledge cache for interactive discovery of association rules, SIGKDD’99.

23 FIRE: User Study Questions
• 22 subjects
• Mushroom and chess datasets
• Cached Rule Miner (CRM) versus FIRE
• Randomization to eliminate pre-knowledge
• Stable Region Usage Tests
T1: What are the most prominent rules by support and confidence?
T2: Which setting (out of a choice of 4) returns a different set of rules?
T3: Find the common and unique rules for two distinct parameter settings.
• Filter/Redundancy Test
T4: Find the most frequent characteristics of edible and poisonous mushrooms.
• Skyline View Test
T5: Find the parameter settings that produce top-k rules in the dataset, where k = 20, 50, 100.

24 Mushroom Dataset: Tasks 1, 2 and 3
Overall, FIRE outperforms the competing CRM approach: users achieve similar or better accuracy while spending significantly less time on the tasks.

25 Tasks 4 and 5
Overall, FIRE outperforms the competing CRM approach: users achieve similar or better accuracy while spending significantly less time on the tasks.

26 Conclusion
Gains of several orders of magnitude in online processing with PARAS outweigh the one-time, minimal offline preprocessing time and storage requirements.
We proposed a novel parameter space model, developed optimal algorithms, and designed effective visualizations to facilitate interactive rule exploration, tackling challenges in both the computational and the visualization aspects of online rule mining.
Our user study establishes the usability and effectiveness of the FIRE system's features and interactions in facilitating interactive rule mining.

27 Recent works at Samsung Research America
User Behavior Analysis via On-device Mobile Sensing
• Unobtrusively learn sequential patterns of mobile users: “Typically, when I am home on Sunday nights, I call my parents”
• Association rule mining over multi-modal mobile context data
MobileMiner: Mining Your Frequent Behavior Patterns On Your Phone. V Srinivasan et al., ACM UbiComp 2014 (Best Paper Nominee), HotMobile
Mobile Sequence Miner: Adding Intelligence to Your Mobile Device via On-Device Sequential Pattern Mining. A Mukherji et al., ACM MCSS Workshop in UbiComp

28 Thanks
Contact me with questions: Abhishek Mukherji, Samsung Research America

