A computational tool for depth-based Statistical analysis Eynat Rafalin, Tufts University Computer Science Department.


1 A computational tool for depth-based Statistical analysis Eynat Rafalin, Tufts University Computer Science Department

2 The tool: An easy-to-use, efficient, and expandable interface for statistical research, based on the notion of data depth. Intended for scientists with no computer science background.

3 Our goal: Present the tool to the community. Code/software available on request. Run on real data. Get feedback: Is such a tool needed? Additions/improvements?

4 General: C++-based software (no additional tools/software needed). Simple interface. Should allow the user to enter data files, sort the data points and filter unwanted data, perform calculations, present the results in an easy-to-understand graphical interface, and save and output data for future use. Fast. Portable code.

5 General description: Data filter; contours display and selection; statistical modules; output (txt, Excel files, Geomview).

6 Data filter: Graphical user interface developed in C++. Used to crop/manipulate a data set before it is fed into the statistical modules. Fast and light. Convenient, easy-to-use user interface. Portable code (UNIX, Solaris, Linux, Windows).

7 Data filter

8 Statistical modules: Depth contours (2D). Half-space (location) depth contours: optimal O(n^2) time; supports two approaches for defining contours; includes the Tukey median and the bagplot; includes contour parameters (size, etc.). Convex hull peeling depth contours. Simplicial depth contours. Tukey median computation (O(n log^3 n)). Locating a new point in a set of depth contours (O(log n) query time).
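To make the notion of half-space depth concrete, the following is a minimal brute-force sketch in C++. It is not the tool's optimal contour algorithm, only an illustration of the definition: the depth of a point q is the smallest number of sample points in any closed half-plane whose boundary passes through q. The names `Point` and `halfspaceDepth` are hypothetical, and exact floating-point comparisons are assumed to be acceptable (for example, with integer-valued coordinates).

```cpp
#include <algorithm>
#include <cassert>
#include <vector>

struct Point { double x, y; };  // hypothetical 2D point type

// Brute-force half-space (Tukey) depth of q relative to `sample`.
// Runs in O(n^2) time per query point.
int halfspaceDepth(const Point& q, const std::vector<Point>& sample) {
    // Sample points equal to q lie in every closed half-plane through q.
    int coincident = 0;
    for (const Point& p : sample)
        if (p.x == q.x && p.y == q.y) ++coincident;

    // Upper bound; also the answer when all points coincide with q.
    int best = static_cast<int>(sample.size());

    // The count can only change when the boundary line passes through a
    // sample point, so it suffices to examine lines through q and each
    // sample point, rotated infinitesimally in either direction.
    for (const Point& a : sample) {
        const double dx = a.x - q.x, dy = a.y - q.y;
        if (dx == 0.0 && dy == 0.0) continue;
        int left = 0, right = 0, ahead = 0, behind = 0;
        for (const Point& p : sample) {
            const double px = p.x - q.x, py = p.y - q.y;
            if (px == 0.0 && py == 0.0) continue;
            const double cross = dx * py - dy * px;
            if (cross > 0.0) ++left;                        // strictly left of the line
            else if (cross < 0.0) ++right;                  // strictly right of the line
            else if (dx * px + dy * py > 0.0) ++ahead;      // on the line, same side as a
            else ++behind;                                  // on the line, opposite side
        }
        // A tiny rotation sends the "ahead" points to one side and the
        // "behind" points to the other, so all four pairings are reachable.
        best = std::min(best, std::min(left, right) +
                              std::min(ahead, behind) + coincident);
    }
    return best;
}
```

For the four corners of a square plus its center, this sketch assigns depth 3 to the center, depth 1 to each corner, and depth 0 to any point outside the square.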

9 Approaches for defining depth contours: P. Rousseeuw et al.: the k-th depth contour is the boundary of the set of points in the plane with depth ≥ k. R. Liu et al. (based on order statistics): the sample p-th central hull is the convex hull containing the most central fraction p of the sample points.
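The order-statistics definition can be sketched in C++ as follows. This is an illustrative sketch, not the tool's code: it assumes the depth of every sample point has already been computed by some depth function, keeps the ceil(p * n) deepest points (ties broken by input order, which is an assumption), and returns their convex hull using Andrew's monotone chain. The names `Point`, `convexHull`, and `centralHull` are hypothetical.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <cstddef>
#include <numeric>
#include <vector>

struct Point { double x, y; };  // hypothetical 2D point type

static double cross(const Point& o, const Point& a, const Point& b) {
    return (a.x - o.x) * (b.y - o.y) - (a.y - o.y) * (b.x - o.x);
}

// Andrew's monotone chain; returns hull vertices in counter-clockwise order.
std::vector<Point> convexHull(std::vector<Point> pts) {
    std::sort(pts.begin(), pts.end(), [](const Point& a, const Point& b) {
        return a.x < b.x || (a.x == b.x && a.y < b.y);
    });
    if (pts.size() < 3) return pts;
    std::vector<Point> hull(2 * pts.size());
    std::size_t k = 0;
    for (std::size_t i = 0; i < pts.size(); ++i) {              // lower hull
        while (k >= 2 && cross(hull[k - 2], hull[k - 1], pts[i]) <= 0) --k;
        hull[k++] = pts[i];
    }
    for (std::size_t i = pts.size() - 1, t = k + 1; i > 0; --i) {  // upper hull
        while (k >= t && cross(hull[k - 2], hull[k - 1], pts[i - 1]) <= 0) --k;
        hull[k++] = pts[i - 1];
    }
    hull.resize(k - 1);  // the last point repeats the first
    return hull;
}

// Sample p-th central hull: convex hull of the ceil(p * n) deepest points.
// depth[i] is the precomputed depth of pts[i]; both vectors are assumed to
// have the same length. p is clamped to [0, 1].
std::vector<Point> centralHull(const std::vector<Point>& pts,
                               const std::vector<double>& depth, double p) {
    std::vector<std::size_t> order(pts.size());
    std::iota(order.begin(), order.end(), std::size_t{0});
    std::stable_sort(order.begin(), order.end(),
                     [&](std::size_t a, std::size_t b) { return depth[a] > depth[b]; });
    const double clamped = std::min(1.0, std::max(0.0, p));
    const std::size_t m = static_cast<std::size_t>(
        std::ceil(clamped * static_cast<double>(pts.size())));
    std::vector<Point> central;
    central.reserve(m);
    for (std::size_t i = 0; i < m; ++i) central.push_back(pts[order[i]]);
    return convexHull(central);
}
```

Calling `centralHull(pts, depth, 0.5)` would return the hull of the deeper half of the sample. Nesting the hulls for increasing p gives a family of contours in the sense of the second definition.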

10 Half-space (location) depth contours module. Figures: depth contours for a sample set with 8 data points; depth contours for a data set describing diabetic patients.

11 Statistical modules (cont.): Plots: DD (Depth vs. Depth) plots, O(n^2) time; shrinkage plots; fan plots.

12 DD (Depth vs. Depth) plots module: Two 2D data sets of 50 points each, drawn from normal distributions centered at (0,0) with different covariance matrices (the identity and 4 times the identity). Axes: depth according to set A; depth according to set B.
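A DD plot pairs, for every point of the pooled sample, its depth relative to set A with its depth relative to set B. The following C++ sketch shows only that pairing step and leaves the depth function as a caller-supplied parameter. The names `Point`, `DepthFn`, and `ddPlot` are hypothetical and are not taken from the tool.

```cpp
#include <cassert>
#include <functional>
#include <utility>
#include <vector>

struct Point { double x, y; };  // hypothetical 2D point type

// Any depth function: depth of a query point relative to a sample.
using DepthFn = std::function<double(const Point&, const std::vector<Point>&)>;

// Returns one (depth w.r.t. A, depth w.r.t. B) pair per point of the pooled
// sample, listing the points of A first and then the points of B.
std::vector<std::pair<double, double>> ddPlot(const std::vector<Point>& a,
                                              const std::vector<Point>& b,
                                              const DepthFn& depth) {
    std::vector<std::pair<double, double>> coords;
    coords.reserve(a.size() + b.size());
    auto append = [&](const std::vector<Point>& set) {
        for (const Point& p : set)
            coords.emplace_back(depth(p, a), depth(p, b));
    };
    append(a);
    append(b);
    return coords;
}
```

If the two samples come from the same distribution, the resulting pairs tend to concentrate along the diagonal. Differences in location or scale pull them away from it, which is what the plot is meant to reveal.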

13 Fan plots: 50 data points, created from a random distribution with covariance matrix 4 times the identity. The fans are created for data sets containing the 1/6, 2/6, ... central regions. For each region, the area of the convex hull (CH) of 2, 4, 6, ...% of the points is computed. Axes: relative area (CH of p% / CH); percentile of points.
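The quantity on the vertical axis, the area of the convex hull of the central p% of the points divided by the area of the full convex hull, can be sketched in C++ with the shoelace formula. This sketch assumes both hulls are already available as ordered vertex lists. The names `Point`, `polygonArea`, and `relativeArea` are hypothetical.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

struct Point { double x, y; };  // hypothetical 2D point type

// Area of a simple polygon given by its vertices in order (shoelace formula).
double polygonArea(const std::vector<Point>& poly) {
    double twiceArea = 0.0;
    const std::size_t n = poly.size();
    for (std::size_t i = 0; i < n; ++i) {
        const Point& p = poly[i];
        const Point& q = poly[(i + 1) % n];
        twiceArea += p.x * q.y - q.x * p.y;
    }
    return std::abs(twiceArea) / 2.0;  // orientation-independent
}

// Relative area: area(CH of the central p%) / area(CH of all points).
// Returns 0 when the full hull is degenerate (zero area).
double relativeArea(const std::vector<Point>& centralHull,
                    const std::vector<Point>& fullHull) {
    const double full = polygonArea(fullHull);
    return full > 0.0 ? polygonArea(centralHull) / full : 0.0;
}
```

Evaluating `relativeArea` for the hulls of the central 2%, 4%, 6%, ... of the points and plotting the values against the percentile gives one curve of the fan.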

14 Graphical contour selection tool: Plots depth contours and selects data ranges. Actions: import/export, select points, depth slider, filter.

15 Future work: Run the tool on existing data sets. Distribute preliminary versions and get user feedback. Data filter: group by row/column; filter by row/column; interactions between rows/columns (addition, substitution, logical operations). Statistical modules: implement additional modules; improve running times.

16 Contributors: Prof. Diane Souvaine, Prof. Alva Couch, Eynat Rafalin, Michael Burr, Joe Handelman, James Hayes, Ori Taka, Alok Lal, Janet Luan, Kim Miller, Tim Mitchell, Nikolai Shvertner

