Presentation is loading. Please wait.

Presentation is loading. Please wait.

1 A Rank-by-Feature Framework for Interactive Exploration of Multidimensional Data Jinwook Seo, Ben Shneiderman University of Maryland Hyun Young Song.

Similar presentations


Presentation on theme: "1 A Rank-by-Feature Framework for Interactive Exploration of Multidimensional Data Jinwook Seo, Ben Shneiderman University of Maryland Hyun Young Song."— Presentation transcript:

1 1 A Rank-by-Feature Framework for Interactive Exploration of Multidimensional Data Jinwook Seo, Ben Shneiderman University of Maryland Hyun Young Song (hsong@cs.umd.edu)hsong@cs.umd.edu Maryam Farboodi (farboodi@cs.umd.edu)farboodi@cs.umd.edu Feb, 09 2006

2 2 HCE 3.0 HCE (Hierarchical Clustering Explorer) Main Idea: GRID principles Graphics, Ranking and Interaction for Discovery Feature Application http://www.cs.umd.edu/hcil/hce/ User Manual http://www.cs.umd.edu/hcil/hce/hce3- manual/hce3_manual.html http://www.cs.umd.edu/hcil/hce/hce3- manual/hce3_manual.html Dataset http://www.cs.umd.edu/hcil/hce/examples/application_ examples.html http://www.cs.umd.edu/hcil/hce/examples/application_ examples.html

3 3 Axis-Parallel vs. Non Axis-Parallel Approach Definition 3 dimensions X, Y & Z Axis-parallel: Projection on either X & Y; X & Z or Y & Z Non axis-parallel: Can project on a.X+b.Y & Z Simplicity vs. power Users

4 4 Related Works Axis-parallel: Machine learning, Info. Vis. Pattern recognition Subset of dimensions to find specific patterns Machine learning and Data mining Supervised/ Unsupervised classification Subspace-based clustering analysis Projections naturally partitioning the data set Information Visualization Permutation Matrix Parallel coordinates: dimension ordering Conditional Entropy

5 5 Related Work (cntd.) Non axis-parallel: statisticians Two-dimensional projection SOM (Self Organizing Maps) XGobi: Grand tour, Projection pursuit No ranking HD-Eye interactive hierarchical clustering OptiGrid (partitioning clustering algorithm)

6 6 Major Contributions GRID (Graphics, Ranking and Interaction for Discovery) Study 1D, study 2D, then find features Ranking guides insight, statistics confirm Visualization Techniques Overview Coordination (multiple windows) Dynamic query (item slider)

7 7 General Overview Menu Toolbar Overviews, Color setting Dendrogram (binary tree), scatterplot 7 tabs Color mosaic, Table view, Histogram Ordering, Scatterplot ordering, Profile search, Gene ontology, K-means

8 8 General Overview back

9 9 Load/Transformation Data Natural Log Standardization Normalization To the first column Median Linear scaling back

10 10 Clustering Algorithm 1. Initially, each data a cluster by itself 2. Merge the pair with highest similarity value 3. Update similarity values 4. Repeat 2 & 3 for n - 1 times to reach one cluster of size n  No predefined number of clusters

11 11 Choosing Algorithm Parameters

12 12 Linkage Method Average Linkage Average Group Linkage Complete Linkage Single Linkage Scheinderman’s 1by1 Linkage Tries to grow the newly merged cluster of last iteration first

13 13 Dendrogram View back

14 14 7 Tabs

15 15 1D Histogram Interface Interface description Control panel, Score overview, Ordered list, Histogram browser

16 16 1D Histogram Ordering Ranking criteria Normality of the distribution (0~∞) s: skewness, k: kurtosis: Uniformity of the distribution (0~∞) Number of potential outliers (0~n) IQR = Q3 – Q1, d: item value Suspected outlier: Extreme outlier: Number of unique values (0~n) Size of the biggest gap (0~max. dim. range) mf: max frequency, t: tolerance:

17 17 2D Scatterplot Interface Interface description Control panel, Score overview, Ordered list, Scatterplot browser

18 18 2D Scatterplot Ordering Ranking criteria Statistical Relationship Correlation coefficient(-1~1): Pearson’s coefficient Least square error for curvilinear regression(0~1) Quadracity(-∞~∞) Distribution Characteristics Number of potential outliers(0~n)  LOF-based: Density-based outlier detection Number of items in area of interest(0~n) Uniformity(0~∞) :

19 19 Demo

20 20 System Constraints Computational Complexity n data in m dimensional space : O(nm²) O(n) : scoring complexity O(m²) :combination of dimension Display Constraints Appropriate number of dimensions for score overview component: 0~130 Lack of sliders to adjust displacement

21 21 Evaluation of HCE 3.0 Linear color mapping (3 color or 1 color) Consistent layout of the components Focus-context F: dendrogram – C: rank-by-feature F: ordered list - C: histogram, scatter plot Item slider Dynamic query Multi-window view Dynamic update of data selection in different window

22 22 Futureworks User study Various statistical tools and data mining algorithms HCE 3.0 (HCE 3.5)HCE 4.0 ?? 1D, 2D axis parallel projection  3D projection Numerical data format  Numerical + categorical, binary, Nominal Limited number of applicable datasets ( us cities, cereal, netscan …) 1D - 5 ranking criteria 2D – 6 ranking criteria  More meaningful datasets to demonstrate the power of each ranking criteria  Incorporate more criterion into rank-by- feature framework

23 23 Thank you! Questions?


Download ppt "1 A Rank-by-Feature Framework for Interactive Exploration of Multidimensional Data Jinwook Seo, Ben Shneiderman University of Maryland Hyun Young Song."

Similar presentations


Ads by Google