Gedas Adomavicius Jesse Bockstedt

1 C-TREND: A New Technique for Identifying and Visualizing Trends in Transactional Data
Gedas Adomavicius, Jesse Bockstedt. Winter Conference on Business Intelligence, February 23, 2007

2 © Adomavicius and Bockstedt 2007
Motivation. Graphing patterns of technology evolution. Current techniques: time-series, sequence analysis/data streams, data mining. Represent multi-attribute transaction data and identify trends. Provide domain experts with an interactive tool for data visualization. C-TREND: Cluster-based Temporal Representation of EveNt Data, a meta-analysis technique that visualizes hierarchical clustering output over multiple time periods.
Speaker notes: Mention that this came out of work on modeling technology evolution. We needed a method for representing trends in multi-attribute data. Available techniques include time-series, sequence analysis (pattern matching) / data streams, and existing data mining methods.

3 C-TREND
C-TREND provides users with a set of parameters to adjust the visualization: k, the cluster solution size; α, the within-period trend strength; β, the cross-period trend strength.

4 Example Graph
[Figure: example C-TREND graph over four partitions, with α = 0.05 and β = 0.8. Partition labels: |D1| = 89, k1 = 4; |D2| = 75, k2 = 4; |D3| = 94, k3 = 3; |D4| = 122, k4 = 5. Node labels: 20, 29, 40, 54, 36, 16, 38, 18, 21, 45, 30. Edge labels: 1.08, 0.56, 0.51, 0.08, 0.21, 0.06, 0.15, 0.49, 0.25.]

5 Data Structure
Two data structures are created in preprocessing. Node list: dendrogram nodes indexed for efficient search, each storing the node center and size. Edge list: all possible edges are generated, each with an edge weight. Flags indicate inclusion of nodes and edges, so updating the graph simply switches flags.
Speaker note: Draw these data structures on the board under the dendrogram.
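A minimal sketch of the two structures described on this slide. The class and field names (`Node`, `Edge`, `visible`, `build_edge_list`) are hypothetical, and using the Euclidean distance between cluster centers as the edge weight is an assumption; the slide only says that every edge carries a weight.

```python
from dataclasses import dataclass
from itertools import product
from math import dist


@dataclass
class Node:
    node_id: int           # index of the node within its partition's dendrogram
    partition: int         # time period the node belongs to
    center: tuple          # cluster center
    size: int              # number of data points in the cluster
    visible: bool = False  # inclusion flag, switched when the graph is updated


@dataclass
class Edge:
    source: Node
    target: Node
    weight: float
    visible: bool = False  # inclusion flag, switched when the graph is updated


def build_edge_list(nodes):
    """Generate every possible edge between nodes of adjacent partitions."""
    by_partition = {}
    for node in nodes:
        by_partition.setdefault(node.partition, []).append(node)
    edges = []
    for p in sorted(by_partition):
        # Pair each node in partition p with each node in partition p + 1.
        for a, b in product(by_partition[p], by_partition.get(p + 1, [])):
            # Assumed weight: Euclidean distance between the two centers.
            edges.append(Edge(a, b, dist(a.center, b.center)))
    return edges
```

Because nodes and edges are computed once, changing a parameter only needs to flip `visible` flags rather than rebuild the graph.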

6 Extracting k-sized solutions
k is initially set optimally using a gap statistic. The k parameter provides a “zoom”-like feature for data visualization. To extract a cluster solution of size k:
1. Create a cluster set starting at the root of the dendrogram.
2. If the size of the cluster set is k, output the solution.
3. Else, replace the highest cluster in the cluster set by its children.
4. Repeat steps 2 and 3.
[Figure: dendrogram with nodes numbered 1 to 19.]
Speaker note: Using the same dendrogram, do an example extraction of k = 4.
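The extraction steps above can be sketched as follows. This is an illustration under assumptions, not the authors' implementation: the `Cluster` class is hypothetical, the dendrogram is assumed to be binary, and "highest" is taken to mean the cluster with the largest merge height.

```python
import heapq
from dataclasses import dataclass
from typing import Optional


@dataclass
class Cluster:
    label: int
    height: float = 0.0  # merge height in the dendrogram; leaves sit at 0
    left: Optional["Cluster"] = None
    right: Optional["Cluster"] = None

    def is_leaf(self):
        return self.left is None and self.right is None


def extract_solution(root, k):
    """Return up to k clusters by repeatedly splitting the highest one."""
    # Max-heap on height (negated for heapq); unique labels break ties.
    heap = [(-root.height, root.label, root)]
    leaves = []  # leaves cannot be split, so they are kept outside the heap
    while heap and len(heap) + len(leaves) < k:
        _, _, top = heapq.heappop(heap)
        for child in (top.left, top.right):
            if child.is_leaf():
                leaves.append(child)
            else:
                heapq.heappush(heap, (-child.height, child.label, child))
    return [c for _, _, c in heap] + leaves
```

Each split grows the cluster set by exactly one, so going from the root to a solution of size k takes k - 1 splits. If k exceeds the number of leaves, the sketch simply returns all leaves.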

7 Node Filters
User specifies α ∈ (0,1), the within-period trend strength, to filter out spurious clusters. Nodes included in the output graph for partition i are clusters of size at least α|Di|, where |Di| is the number of data points in partition i. For example, if α = 0.02 and |Di| = 500, only clusters of size 10 or greater would render nodes.
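The node filter can be sketched in a few lines. The function name `filter_nodes` and the parameter name `alpha` (standing for the within-period trend strength) are hypothetical. The comparison is inclusive, matching the slide's worked example in which clusters of size 10 or greater are rendered.

```python
def filter_nodes(cluster_sizes, partition_size, alpha):
    """Return one inclusion flag per cluster in a partition.

    alpha is the within-period trend strength, expected in (0, 1).
    """
    if not 0 < alpha < 1:
        raise ValueError("alpha must lie in (0, 1)")
    threshold = alpha * partition_size
    # A cluster renders a node only if it holds enough of the partition's points.
    return [size >= threshold for size in cluster_sizes]
```

With `alpha = 0.02` and a partition of 500 points, the threshold is 10, so a cluster of 9 points is hidden while clusters of 10 or 11 points are shown.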

8 Edge Filters
Edges represent similarity between clusters across periods. User specifies β ∈ (0,1), the cross-period trend strength. An edge is rendered if 1) it is incident to two rendered nodes and 2) its weight is less than or equal to β(r(Vi) + r(Vi+1))/2, where r(Vi) is the average distance between a data point and its cluster center over all points in partition i. For example, with β = 0.8, r(V1) = 1.4, and r(V2) = 1.8, an edge between x and y will be rendered only if d(x, y) ≤ 1.28.
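The slide's worked example implies that the cutoff is the cross-period trend strength multiplied by the mean of the two partitions' average point-to-center distances: 0.8 × (1.4 + 1.8) / 2 = 1.28. A minimal sketch of that rule follows; the function and parameter names (`edge_threshold`, `edge_is_rendered`, `beta`) are hypothetical.

```python
def edge_threshold(beta, r_i, r_next):
    """Cutoff for edge weights between partition i and partition i + 1.

    beta is the cross-period trend strength; r_i and r_next are the average
    point-to-center distances r(Vi) and r(Vi+1) of the two partitions.
    """
    return beta * (r_i + r_next) / 2


def edge_is_rendered(weight, source_rendered, target_rendered, beta, r_i, r_next):
    """Apply both conditions: rendered endpoints and a small enough weight."""
    return (
        source_rendered
        and target_rendered
        and weight <= edge_threshold(beta, r_i, r_next)
    )
```

For the slide's numbers, `edge_threshold(0.8, 1.4, 1.8)` evaluates to approximately 1.28 (up to floating-point rounding), so an edge of weight 1.2 between two rendered nodes is drawn and one of weight 1.3 is not.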

9 Performance

10 WiFi Case Study: Data
2425 WiFi technology certifications issued by the WiFi Alliance between March 2000 and December [...] (Source: [...]). 10 WiFi Alliance technology categories. 3 types of technical functionality classifications. 29 total attributes per certification.

11 Wi-Fi Case Study
[Figure: C-TREND graph with α = 0.02, β = 0.8. Partition labels: |D1| = 67, k1 = 3; |D4| = 445, k2 = 7. Cluster labels: 802.11b Access Points; 802.11b/g Access Points with WPA Security; 802.11b Internal Cards; 802.11b External Cards; 802.11b Cards; 802.11b with WPA Security; 802.11b/g Cell Phones and PDAs with Security; 802.11b/g components with WPA Security.]

12 Wi-Fi Case Study
DEMO

13 Extensions
Trend and graph metrics. Hypothesis testing. Optimal partitioning. Predictive modeling. Extension of the interactive GUI. Non-temporal data.

14 Questions and Discussion

15 Preprocessing
The data set is separated into partitions D1, …, Dt. Hierarchical clustering is performed on each partition, creating t dendrograms. A data structure is created to store the results of clustering. C-TREND uses agglomerative hierarchical clustering.
Speaker note: Provide a quick example of hierarchical clustering and a dendrogram.
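A minimal sketch of the preprocessing step, assuming each record carries a time period and a numeric point. The function names are hypothetical, and centroid linkage is an assumption because the slide does not name a linkage criterion. The naive search for the closest pair is only meant to show the idea on small inputs.

```python
from collections import defaultdict
from math import dist


def partition_by_period(records, period_of):
    """Split records into partitions D1..Dt, ordered by period."""
    parts = defaultdict(list)
    for rec in records:
        parts[period_of(rec)].append(rec)
    return [parts[p] for p in sorted(parts)]


def centroid(points):
    n = len(points)
    return tuple(sum(coord) / n for coord in zip(*points))


def agglomerate(points):
    """Naive agglomerative clustering with centroid linkage (assumed).

    Returns the merges in order as (members_a, members_b, distance);
    together they describe one dendrogram.
    """
    clusters = [[p] for p in points]
    merges = []
    while len(clusters) > 1:
        # Find the pair of clusters whose centroids are closest.
        best = None
        for i in range(len(clusters)):
            for j in range(i + 1, len(clusters)):
                d = dist(centroid(clusters[i]), centroid(clusters[j]))
                if best is None or d < best[0]:
                    best = (d, i, j)
        d, i, j = best
        merges.append((clusters[i], clusters[j], d))
        merged = clusters[i] + clusters[j]
        clusters = [c for idx, c in enumerate(clusters) if idx not in (i, j)]
        clusters.append(merged)
    return merges
```

Running `agglomerate` once per partition yields the t dendrograms; a partition of n points always produces n - 1 merges.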

16 C-TREND Output
[Figure: C-TREND output graph with α = 0.02, β = 0.8. Partition labels: |D1| = 67, k1 = 3; |D4| = 445, k2 = 7.]

