Outline. Conceptualization of diversity with unbalanced hierarchies.

Outline

Conceptualization of diversity with unbalanced hierarchies

Lower-bound balanced frequent pattern: Consider an unbalanced frequent pattern Y = {i1, i2, ..., in} with n items and a concept hierarchy of height h, and let h(i1), h(i2), ..., h(in) be the heights of the corresponding items. Let h(ij) be the least of these heights. The lower-bound balanced frequent pattern is obtained by bringing all the items in Y to the same level h(ij).

Construction: Given an unbalanced frequent pattern, first find the height of the item that lies highest in the concept hierarchy; let this height be h(l). Remove all the extra edges below level h(l), and replace every item that lies below level h(l) with its ancestor at level h(l), removing duplicates.
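
The lower-bound construction can be sketched as follows. This is a minimal illustration, not the authors' implementation: it assumes the concept hierarchy is stored as a child-to-parent map, and it takes the "height" of an item to be its number of edges from the root. The hierarchy, the `PARENT` map, and the function names are hypothetical.

```python
# Hypothetical toy hierarchy (not the one from the slides): child -> parent.
PARENT = {
    "root": None,
    "drink": "root", "beauty": "root",
    "milk": "drink", "soft_drink": "drink", "shampoo": "beauty",
    "coke": "soft_drink", "pepsi": "soft_drink",
}


def level(item, parent):
    """Number of edges from the root down to `item` (its 'height' in the slides)."""
    depth = 0
    while parent[item] is not None:
        item = parent[item]
        depth += 1
    return depth


def lower_bound_balanced(pattern, parent):
    """Lift every item to the level of the item closest to the root."""
    target = min(level(item, parent) for item in pattern)
    balanced = []
    for item in pattern:
        # Climb from the item to its ancestor at the target level.
        for _ in range(level(item, parent) - target):
            item = parent[item]
        # Several items may share an ancestor, so drop duplicates.
        if item not in balanced:
            balanced.append(item)
    return balanced


print(lower_bound_balanced(["milk", "coke", "pepsi", "shampoo"], PARENT))
# ['milk', 'soft_drink', 'shampoo']
```

In this toy case coke and pepsi both collapse into their shared parent, which is why the balanced pattern has three items instead of four.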

Upper-bound balanced frequent pattern: Consider an unbalanced frequent pattern Y = {i1, i2, ..., in} with n items and a concept hierarchy of height h, and let h(i1), h(i2), ..., h(in) be the heights of the corresponding items. Let h(ij) be the greatest of these heights. The upper-bound balanced frequent pattern is obtained by bringing all the items in Y down to the same level h(ij).

Construction: First find the heights of the items that lie highest and deepest in the concept hierarchy; let these heights be h(il) and h(id), respectively. Starting from level h(il), keep adding one dummy edge below each imbalanced node until level h(id) is reached.
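
A minimal sketch of the upper-bound construction, under the same assumptions as the definition suggests: the hierarchy is a child-to-parent map and an item's "height" is its number of edges from the root. The toy hierarchy, the `dummyN` naming scheme, and the function names are hypothetical and chosen only for illustration.

```python
# Hypothetical toy hierarchy (not the one from the slides): child -> parent.
PARENT = {
    "root": None,
    "drink": "root", "beauty": "root",
    "milk": "drink", "soft_drink": "drink", "shampoo": "beauty",
    "coke": "soft_drink", "pepsi": "soft_drink",
}


def level(item, parent):
    """Number of edges from the root down to `item` (its 'height' in the slides)."""
    depth = 0
    while parent[item] is not None:
        item = parent[item]
        depth += 1
    return depth


def upper_bound_balanced(pattern, parent):
    """Push every item down to the deepest level by chaining dummy nodes."""
    deepest = max(level(item, parent) for item in pattern)
    extended = dict(parent)  # work on a copy; the input hierarchy is untouched
    balanced = []
    counter = 0
    for item in pattern:
        node = item
        # Add one dummy edge per missing level below the item.
        for _ in range(deepest - level(item, parent)):
            counter += 1
            dummy = f"dummy{counter}"  # assumed not to clash with real node names
            extended[dummy] = node
            node = dummy
        balanced.append(node)
    return balanced, extended


balanced, extended = upper_bound_balanced(["milk", "coke", "pepsi", "shampoo"], PARENT)
print(balanced)  # ['dummy1', 'coke', 'pepsi', 'dummy2']
```

Returning the extended hierarchy alongside the balanced pattern keeps the dummy edges available for any later per-level edge counting.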

Figure: Unbalanced frequent pattern and the corresponding lower-bound and upper-bound balanced frequent patterns. The set {i1, i2} in (b) denotes the parent of items i1 and i2 at level 1.

Outline

Computing diversity

Generalization to the case where certain items are missing. The diversity of a balanced frequent pattern depends on two factors: the Merging Factor (MF) and the Level Factor. In an unbalanced pattern some items are already at a generalized level, so the contribution of MF has to be determined at every level. The Adjustment Factor (AF) is introduced to calculate the contribution of MF at each level.

Adjustment Factor (AF)

Example of Adjustment Factor (AF)
Y = {whole milk, pepsi, coke, shampoo}
Y' = {whole milk, node2, node3, node4}
Height of the hierarchy = 5
Height of the item that lies deepest = 5
Number of edges in Y at level 4: |E_UFP(Y, 4)| = 1
Number of edges in the upper-bound balanced pattern of Y at level 4: |E_UBFP(Y, 4)| = 4
AF(Y, 4) = 1/4 = 0.25
Similarly, AF(Y, 3) = 3/4 = 0.75
[Figure: concept hierarchy rooted at root, with categories including Drink, Beauty, milk, Soft Drink and hair, the items of Y, and the dummy nodes node1 to node4.]
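
The arithmetic in this example can be reproduced with a few lines. The sketch assumes, as the numbers above suggest, that AF at a level is the number of edges of the unbalanced pattern at that level divided by the number of edges of its upper-bound balanced pattern at the same level. The function name and its argument names are hypothetical, and the edge counts for level 3 are read off the fraction 3/4.

```python
def adjustment_factor(real_edges, balanced_edges):
    """Ratio of real edges to edges in the upper-bound balanced pattern at one level.

    Assumed form: AF(Y, l) = |E_UFP(Y, l)| / |E_UBFP(Y, l)|.
    """
    if balanced_edges <= 0:
        raise ValueError("the balanced pattern needs at least one edge at this level")
    if not 0 <= real_edges <= balanced_edges:
        raise ValueError("real edges must lie between 0 and the balanced edge count")
    return real_edges / balanced_edges


# Edge counts taken from the example: level 4 has 1 real edge out of 4,
# level 3 has 3 real edges out of 4.
print(adjustment_factor(1, 4))  # 0.25
print(adjustment_factor(3, 4))  # 0.75
```

An AF of 1 at a level would mean no dummy edges were needed there, so MF contributes fully at that level.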

DiverseRank(DRB)

Outline

Algorithm

Outline

Experimental Analysis

Generating Concept hierarchy

Number of diverse-frequent patterns vs. minDiv
• As minDiv increases, the number of diverse-frequent patterns decreases, irrespective of the minSup threshold.
• This is because the items in several of the frequent patterns belong to one or a few categories.
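
The trend can be illustrated with a small counting sketch. It assumes that a pattern counts as diverse-frequent when its support is at least minSup and its DiverseRank is at least minDiv; the transcript does not spell out this definition. The (support, DiverseRank) pairs below are made-up toy values, not results from the experiments.

```python
# Hypothetical (support, DiverseRank) pairs; purely illustrative toy values.
PATTERNS = [(0.05, 0.9), (0.04, 0.4), (0.03, 0.7), (0.02, 0.2)]


def count_diverse_frequent(patterns, min_sup, min_div):
    """Count patterns meeting both thresholds (assumed definition)."""
    return sum(1 for support, rank in patterns
               if support >= min_sup and rank >= min_div)


for min_sup in (0.01, 0.035):
    counts = [count_diverse_frequent(PATTERNS, min_sup, min_div)
              for min_div in (0.1, 0.5, 0.8)]
    print(min_sup, counts)
# 0.01 [4, 2, 1]
# 0.035 [2, 1, 1]
```

Raising minDiv can only remove patterns from the qualifying set, so the count never grows, whatever minSup is.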

Top 10 frequent patterns w.r.t. support
• Extracted the top 10 frequent patterns of size 3 w.r.t. support.
• The highest support of a pattern is 2.3%.

Top 10 diverse-frequent patterns
• Extracted the top 10 diverse-frequent patterns of size 3.
• The highest DiverseRank value is 1.

Experimental Observations

Experimental Results

Experimental Results
Height of the simulated concept hierarchy: 14. The distribution of items in the simulated concept hierarchy is as follows:
[Table: distribution of items in the simulated concept hierarchy]

Experimental Results

Outline

Related Work

Concept Hierarchies in Data Mining

Related Work

Outline

Conclusions and Future Work

References