Object Orie’d Data Analysis, Last Time Statistical Smoothing –Histograms – Density Estimation –Scatterplot Smoothing – Nonpar. Regression SiZer Analysis.

Presentation transcript:

Object Orie’d Data Analysis, Last Time Statistical Smoothing –Histograms – Density Estimation –Scatterplot Smoothing – Nonpar. Regression SiZer Analysis –Replaces bandwidth selection –Scale Space –Statistical Inference: Which bumps are “really there”? –Visualization
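The SiZer idea recalled above (look at every bandwidth at once and flag where the smooth is significantly increasing or decreasing) can be illustrated with a rough sketch. This is not the published SiZer algorithm: it assumes a Gaussian-kernel estimate of the density derivative and a plain pointwise normal quantile `q`, whereas SiZer proper uses a simultaneous quantile. The function name `sizer_map` and the toy data are hypothetical.

```python
import numpy as np

def sizer_map(data, grid, bandwidths, q=1.96):
    """Return flags of shape (len(bandwidths), len(grid)):
    +1 = slope significantly positive, -1 = significantly negative, 0 = neither."""
    data = np.asarray(data, dtype=float)
    grid = np.asarray(grid, dtype=float)
    n = data.size
    flags = np.zeros((len(bandwidths), grid.size), dtype=int)
    for i, h in enumerate(bandwidths):
        u = (grid[:, None] - data[None, :]) / h            # shape (grid, n)
        kern = np.exp(-0.5 * u**2) / (np.sqrt(2.0 * np.pi) * h)
        terms = -u / h * kern                               # d/dx of each kernel bump
        slope = terms.mean(axis=1)                          # derivative estimate
        se = terms.std(axis=1, ddof=1) / np.sqrt(n)         # pointwise standard error
        flags[i, slope - q * se > 0] = 1                    # assumption: pointwise CI
        flags[i, slope + q * se < 0] = -1
    return flags

# Toy check on a single Gaussian bump (hypothetical data).
rng = np.random.default_rng(1)
toy = rng.normal(0.0, 1.0, 2000)
toy_grid = np.array([-1.0, 0.0, 1.0])
toy_flags = sizer_map(toy, toy_grid, bandwidths=[0.5])
```

Plotting the flags as colors over the (location, bandwidth) plane gives a map in the spirit of the SiZer displays shown on the later slides.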

Kernel Density Estimation Choice of bandwidth (window width)? Very important to performance Fundamental Issue: Which modes are “really there”?
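To make the bandwidth issue concrete, here is a minimal sketch, assuming a Gaussian kernel and a simulated two-component sample (the data, grid, and bandwidth values are all illustrative choices, not taken from the slides). It counts the local maxima of the estimate at three bandwidths.

```python
import numpy as np

def gaussian_kde(data, grid, h):
    """Gaussian kernel density estimate evaluated on `grid` with bandwidth `h`."""
    u = (grid[:, None] - data[None, :]) / h
    return np.exp(-0.5 * u**2).sum(axis=1) / (data.size * h * np.sqrt(2.0 * np.pi))

def count_modes(density):
    """Count interior local maxima of a curve sampled on a grid."""
    d = np.diff(density)
    return int(np.sum((d[:-1] > 0) & (d[1:] <= 0)))

# Hypothetical bimodal sample.
rng = np.random.default_rng(0)
sample = np.concatenate([rng.normal(-2.0, 1.0, 200), rng.normal(2.0, 1.0, 200)])
grid = np.linspace(-6.0, 6.0, 400)
modes_by_h = {h: count_modes(gaussian_kde(sample, grid, h)) for h in (0.05, 0.5, 5.0)}
```

A very small bandwidth produces many spurious bumps and a very large one merges the two real modes, which is why the question of which modes are "really there" cannot be settled by a single bandwidth.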

SiZer Background Fun Scale Spaces Views (Incomes Data) Surface View

SiZer Background SiZer analysis of British Incomes data:

SiZer Background Finance "tick data": (time, price) of single stock transactions Idea: "on line" version of SiZer for viewing and understanding trends

SiZer Background Finance "tick data": (time, price) of single stock transactions Idea: "on line" version of SiZer for viewing and understanding trends Notes: trends depend heavily on scale double points and more background color transition (flop over at top)

SiZer Background Internet traffic data analysis: SiZer analysis of time series of packet times at internet hub (UNC) Hannig, Marron, and Riedi (2001)

SiZer Background Internet traffic data analysis: SiZer analysis of time series of packet times at internet hub (UNC) across very wide range of scales needs more pixels than screen allows thus do zooming view (zoom in over time) –zoom in to yellow bd’ry in next frame –readjust vertical axis

SiZer Background Internet traffic data analysis (cont.) Insights from SiZer analysis: Coarse scales: amazing amount of significant structure Evidence of self-similar fractal type process? Fewer significant features at small scales But they exist, so not Poisson process Poisson approximation OK at small scale??? Smooths (top part) stable at large scales?

Dependent SiZer Rondonotti, Marron, and Park (2007) SiZer compares data with white noise Inappropriate in time series Dependent SiZer compares data with an assumed model Visual Goodness of Fit test

Dep’ent SiZer : 2002 Apr 13 Sat 1 pm – 3 pm

Zoomed view (to red region, i.e. “flat top”)

Further Zoom: finds very periodic behavior!

Possible Physical Explanation IP “Port Scan” Common device of hackers Searching for “break in points” Send query to every possible (within UNC domain): –IP address –Port Number Replies can indicate system weaknesses Internet Traffic is hard to model

SiZer Overview Would you like to try a SiZer analysis? Matlab software: JAVA version (demo, beta): Follow the SiZer link from the Wagner Associates home page: More details, examples and discussions:

PCA to find clusters Return to PCA of Mass Flux Data:

PCA to find clusters SiZer analysis of Mass Flux, PC1

PCA to find clusters SiZer analysis of Mass Flux, PC1 Conclusion: Found 3 significant clusters! Correspond to 3 known “cloud types” Worth deeper investigation

Recall Yeast Cell Cycle Data “Gene Expression” – Micro-array data Data (after major preprocessing): Expression “level” of: thousands of genes (d ~ 1,000s) but only dozens of “cases” (n ~ 10s) Interesting statistical issue: High Dimension Low Sample Size data (HDLSS)

Yeast Cell Cycle Data, FDA View Central question: Which genes are “periodic” over 2 cell cycles?

Yeast Cell Cycle Data, FDA View Periodic genes? Naïve approach: Simple PCA

Yeast Cell Cycle Data, FDA View Central question: which genes are “periodic” over 2 cell cycles? Naïve approach: Simple PCA No apparent (2 cycle) periodic structure? Eigenvalues suggest large amount of “variation” PCA finds “directions of maximal variation” Often, but not always, same as “interesting directions” Here need better approach to study periodicities
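As a reminder of what "directions of maximal variation" means computationally, here is a minimal PCA sketch via the singular value decomposition, which stays cheap when d is much larger than n. The function name `pca_svd` and the simulated HDLSS data are hypothetical; nothing here is specific to the yeast data.

```python
import numpy as np

def pca_svd(X, k):
    """PCA of an (n cases) x (d features) matrix via the SVD.
    Returns the first k directions, their eigenvalues, and the scores."""
    Xc = X - X.mean(axis=0)                       # center each feature
    U, s, Vt = np.linalg.svd(Xc, full_matrices=False)
    eigvals = s**2 / (X.shape[0] - 1)             # variances along each direction
    return Vt[:k], eigvals[:k], Xc @ Vt[:k].T

# Hypothetical HDLSS example: n = 20 cases, d = 1000 features,
# with one strong direction of variation along the first coordinate.
rng = np.random.default_rng(0)
n, d = 20, 1000
v = np.zeros(d)
v[0] = 1.0
X = rng.normal(0.0, 10.0, size=(n, 1)) * v + rng.normal(0.0, 0.1, size=(n, d))
directions, eigvals, scores = pca_svd(X, k=2)
```

The first direction recovers the high-variance axis, but high variance need not coincide with the structure of interest, which is the point the slide makes about periodicity.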

Yeast Cell Cycles, Freq. 2 Proj. PCA on Freq. 2 Periodic Component Of Data

Frequency 2 Analysis

Project data onto 2-dim space of sin and cos (freq. 2) Useful view: scatterplot Angle (in polar coordinates) shows phase Colors: Spellman’s cell cycle phase classification Black was labeled “not periodic” Within class phases approx’ly same, but notable differences Now try to improve “phase classification”
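The projection onto the frequency-2 sine and cosine can be sketched as follows. This assumes equally spaced time points that cover exactly the two cycles, so that frequency 2 means two full periods over the observation window; the function name `freq2_projection` and the 18-point toy series are illustrative assumptions.

```python
import numpy as np

def freq2_projection(Y, freq=2):
    """Project each row of Y (genes x time points) onto the unit-norm
    cosine and sine of the given frequency; return coordinates, phase, radius."""
    n_time = Y.shape[1]
    t = np.arange(n_time) / n_time                # assumption: equally spaced times
    c = np.cos(2.0 * np.pi * freq * t)
    s = np.sin(2.0 * np.pi * freq * t)
    c /= np.linalg.norm(c)
    s /= np.linalg.norm(s)
    Yc = Y - Y.mean(axis=1, keepdims=True)        # remove each gene's mean level
    a, b = Yc @ c, Yc @ s                         # scatterplot coordinates
    phase = np.arctan2(b, a) % (2.0 * np.pi)      # angle in polar coordinates
    return a, b, phase, np.hypot(a, b)

# Toy gene: a pure frequency-2 wave with phase shift 1.0 radian.
n_time = 18
t = np.arange(n_time) / n_time
toy_gene = np.cos(2.0 * np.pi * 2 * t - 1.0)[None, :]
a, b, phase, radius = freq2_projection(toy_gene)
```

Genes with large radius are strongly periodic at this frequency, and the angle recovers where in the cycle the expression peaks.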

Yeast Cell Cycle Revisit “phase classification”, approach: Use outer 200 genes (other numbers tried, less resolution) Study distribution of angles Use SiZer analysis (finds significant bumps, etc., in histogram) Carefully redrew boundaries Check by studying k.d.e. angles
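The first two steps of this approach (keep the outer genes, then look at the distribution of their angles) might look like the sketch below. It stops at a plain histogram and does not attempt the SiZer step; the function name `outer_angle_histogram`, the bin count, and the simulated coordinates are hypothetical.

```python
import numpy as np

def outer_angle_histogram(a, b, k=200, n_bins=12):
    """Keep the k points farthest from the origin in the (cos, sin) plane
    and histogram their polar angles on [0, 2*pi]."""
    radius = np.hypot(a, b)
    keep = np.argsort(radius)[-k:]                         # indices of the outer k
    angles = np.arctan2(b[keep], a[keep]) % (2.0 * np.pi)
    counts, _ = np.histogram(angles, bins=n_bins, range=(0.0, 2.0 * np.pi))
    return keep, angles, counts

# Hypothetical data: 300 strongly periodic "genes" (radius 5), 700 weak ones (radius 1).
rng = np.random.default_rng(0)
theta = rng.uniform(0.0, 2.0 * np.pi, 1000)
r = np.where(np.arange(1000) < 300, 5.0, 1.0)
a, b = r * np.cos(theta), r * np.sin(theta)
keep, angles, counts = outer_angle_histogram(a, b, k=200, n_bins=12)
```

Because angles are periodic, any smoothing of this histogram should wrap around at 0 and 2π rather than treat the two ends as far apart.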

SiZer Study of Dist’n of Angles

Reclassification of Major Genes

Compare to Previous Classif’n

New Subpopulation View

OODA in Image Analysis First Generation Problems: Denoising Segmentation (find object boundaries) Registration (align objects) (all about single images)

OODA in Image Analysis Second Generation Problems: Populations of Images –Understanding Population Variation –Discrimination (a.k.a. Classification) Complex Data Structures (& Spaces) HDLSS Statistics

HDLSS Data in Image Analysis Why HDLSS (High Dim, Low Sample Size)? Complex 3-d Objects Hard to Represent –Often need d = 100’s of parameters Complex 3-d Objects Costly to Segment –Often have n = 10’s of cases

Image Object Representation Major Approaches for Images: Landmark Representations Boundary Representations Medial Representations

Landmark Representations Main Idea: On each object find important points Treat point locations as features I.e. represent objects by vectors of point locations (in 2-d or 3-d) (Fits in OODA framework)

Landmark Representations Basis of Field of Statistical Shape Analysis: (important precursor of FDA & OODA) Main References: Kendall (1981, 1984) Bookstein (1984) Dryden and Mardia (1998) (most readable and comprehensive)

Landmark Representations Nice Example: Fly Wing Data (Drosophila fruit flies) From George Gilchrist, W. & M. U. Graphic Illustrating Landmarks (next page) –Same veins appear in all flies –And always have same relationship –I.e. all landmarks always identifiable

Landmark Representations Landmarks for fly wing data:

Landmark Representations Important issue for landmark approaches: Location, i.e. Registration Illustration with Fly Wing Data (next slide) Problem: coordinates are “locations in photo” & unclear where wing is positioned …

Landmark Representations Illustration of Registration, with Fly Wing Data

Landmark Representations Standard Approach to Registration Problem: Procrustes Analysis Idea: mod out location Can also mod out rotation Can also mod out size Recommended reference: Dryden and Mardia (1998)
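A minimal sketch of what "modding out" location, size, and rotation can mean for one landmark configuration: center it, rescale to unit centroid size, then rotate to best match a reference using the SVD solution of the orthogonal Procrustes problem. This aligns a single object to a fixed reference and is not the full generalized Procrustes procedure (which also iterates on the mean shape); the function names and toy landmarks are hypothetical.

```python
import numpy as np

def preshape(X):
    """Remove location and size from a (landmarks x dims) configuration."""
    Xc = X - X.mean(axis=0)                 # mod out location
    return Xc / np.linalg.norm(Xc)          # mod out size (unit centroid size)

def procrustes_align(X, ref):
    """Rotate the preshape of X to best match the preshape of ref (least squares)."""
    A, B = preshape(X), preshape(ref)
    U, _, Vt = np.linalg.svd(A.T @ B)
    D = np.eye(U.shape[0])
    if np.linalg.det(U @ Vt) < 0:           # forbid reflections: rotations only
        D[-1, -1] = -1.0
    return A @ (U @ D @ Vt)                 # mod out rotation

# Toy check: a shifted, scaled, rotated copy should align back onto the reference.
rng = np.random.default_rng(0)
ref = rng.normal(size=(10, 2))
ang = 0.7
rot = np.array([[np.cos(ang), -np.sin(ang)], [np.sin(ang), np.cos(ang)]])
moved = 3.0 * ref @ rot + np.array([5.0, -2.0])
aligned = procrustes_align(moved, ref)
```

After alignment, what remains of the difference between two configurations is attributable to shape rather than to where, how large, or at what angle the object sat in the photo.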

Landmark Representations Procrustes Results for Fly Wing Data

Landmark Representations Effect of Procrustes Analysis: Study Difference Between Continents Flies from Europe & South America Look for important differences Project onto mean difference direction Visualize with movie –Equal time spacing –Through range of data
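The movie construction described above (project onto the mean difference direction, then step through the range of the data with equal spacing) can be sketched as below. The group names, dimensions, and frame count are hypothetical; each row of `frames` is one landmark vector that would be drawn as one movie frame.

```python
import numpy as np

def mean_difference_frames(X1, X2, n_frames=10):
    """Unit mean-difference direction between two groups, the projection scores
    of all cases, and equally spaced frames spanning the range of the scores."""
    direction = X2.mean(axis=0) - X1.mean(axis=0)
    direction = direction / np.linalg.norm(direction)
    allX = np.vstack([X1, X2])
    center = allX.mean(axis=0)
    scores = (allX - center) @ direction                    # 1-d projections
    steps = np.linspace(scores.min(), scores.max(), n_frames)
    frames = center + steps[:, None] * direction            # one row per frame
    return direction, scores, frames

# Hypothetical landmark vectors: 30 cases per group, 8 coordinates each.
rng = np.random.default_rng(0)
europe = rng.normal(0.0, 1.0, size=(30, 8))
south_america = rng.normal(0.0, 1.0, size=(30, 8)) + 2.0
direction, scores, frames = mean_difference_frames(europe, south_america, n_frames=5)
```

Running this on raw coordinates versus Procrustes-aligned coordinates is what distinguishes the movies compared on the following slides.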

Landmark Representations No Procrustes Adjustment: Movies on Difference Between Continents

Landmark Representations Effect of Procrustes Analysis: Movies on Difference Between Continents Raw Data –Driven by location effects –Strongly feels size –Hard to understand shape

Landmark Representations Location, Rotation, Scale Procrustes: Movies on Difference Between Continents

Landmark Representations Effect of Procrustes Analysis: Movies on Difference Between Continents Raw Data –Driven by location effects –Strongly feels size –Hard to understand shape Full Procrustes –Mods out location, size, rotation –Allows clear focus on shape

Landmark Representations Major Drawback of Landmarks: Need to always find each landmark Need same relationship I.e. Landmarks need to correspond Often fails for medical images E.g. How many corresponding landmarks on a set of kidneys, livers or brains???

Landmark Representations Landmarks for brains??? (thanks to Liz Bullitt) Very hard to identify

Landmark Representations Look across people: Some structure in common But “folds” are different Consistent Landmarks???

Boundary Representations Major sets of ideas: Triangular Meshes –Survey: Owen (1998) Active Shape Models –Cootes, et al (1993) Fourier Boundary Representations –Kelemen, et al (1997 & 1999)

Boundary Representations Example of triangular mesh rep’n: From:

Boundary Representations Example of triangular mesh rep’n for a brain: From: meshlab.sourceforge.net/SnapMeshLab.brain.jpg

Boundary Representations Main Drawback: Correspondence For OODA (on vectors of parameters): Need to “match up points” Easy to find triangular mesh –Lots of research on this driven by gamers Challenge: match mesh across objects –There are some interesting ideas…

Medial Representations Main Idea: Represent Objects as: Discretized skeletons (medial atoms) Plus spokes from center to edge Which imply a boundary Very accessible early reference: Yushkevich, et al (2001)
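How atoms plus spokes imply a boundary can be illustrated in 2-d with a small sketch. The parameterization is an assumption made for illustration, not necessarily the one used in the cited work: each atom has a position, a radius, a bisector direction, and an object angle, and its two spokes of length equal to the radius point at the bisector angle plus and minus the object angle. All names and values are hypothetical.

```python
import numpy as np

def implied_boundary(positions, radii, bisector_angles, object_angles):
    """Boundary points implied by 2-d medial atoms.
    Assumed parameterization: spokes at (bisector +/- object angle), length = radius."""
    up = bisector_angles + object_angles
    down = bisector_angles - object_angles
    top = positions + radii[:, None] * np.column_stack([np.cos(up), np.sin(up)])
    bottom = positions + radii[:, None] * np.column_stack([np.cos(down), np.sin(down)])
    return top, bottom

# Two toy atoms on the x-axis with spokes pointing straight up and down.
pos = np.array([[0.0, 0.0], [1.0, 0.0]])
rad = np.array([1.0, 0.5])
bis = np.array([0.0, 0.0])
obj = np.array([np.pi / 2, np.pi / 2])
top, bottom = implied_boundary(pos, rad, bis, obj)
```

Joining the `top` points and the `bottom` points along the skeleton traces out the implied boundary; the feature vector for an object is then the collection of positions, radii, and angles.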

Medial Representations 2-d M-Rep Example: Corpus Callosum (Yushkevich)

Medial Representations 2-d M-Rep Example: Corpus Callosum (Yushkevich) Atoms Spokes Implied Boundary

Medial Representations 3-d M-Rep Example: From Ja-Yeon Jeong Bladder – Prostate - Rectum Atoms - Spokes - Implied Boundary

Medial Representations 3-d M-reps: there are several variations Two choices: From Fletcher (2004)

Medial Representations Statistical Challenge M-rep parameters are: –Locations –Radii –Angles (not comparable) Stuffed into a long vector I.e. many direct products of these

Medial Representations Statistical Challenge: How to analyze angles as data? E.g. what is the average of: – ??? (average of the numbers) – (of course!) Correct View of angular data: Consider as points on the unit circle
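Treating angles as points on the unit circle leads to the usual circular mean: average the unit vectors, then take the direction of the result. The sketch below uses hypothetical angles clustered around 0°, chosen so that the ordinary arithmetic mean lands on the opposite side of the circle.

```python
import numpy as np

def circular_mean_deg(angles_deg):
    """Mean direction of angles (in degrees), computed on the unit circle."""
    a = np.deg2rad(np.asarray(angles_deg, dtype=float))
    mean = np.arctan2(np.sin(a).mean(), np.cos(a).mean())   # direction of mean vector
    return float(np.rad2deg(mean) % 360.0)

# Hypothetical example angles, all within a few degrees of 0.
example = [3.0, 4.0, 358.0, 359.0]
naive = float(np.mean(example))          # arithmetic mean of the numbers
circular = circular_mean_deg(example)    # mean as points on the circle
```

The arithmetic mean points almost directly away from every observation, while the circular mean sits in the middle of the cluster.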

Medial Representations What is the average (181°?) or (1°?) of:

Medial Representations Statistical Analysis of Directional Data: Common Examples: –Wind Directions (0-360) –Magnetic Fields (0-360) –Cracks (0-180) There is a literature (monographs): –Mardia (1972, 2000) –Fisher, et al (1987, 1993)

Medial Representations Statistical Challenge Many direct products of: –Locations –Radii –Angles (not comparable) Appropriate View: Data Lie on Curved Manifold Embedded in higher dim’al Eucl’n Space

Medial Representations Data on Curved Manifold Toy Example:

Medial Representations Data on Curved Manifold Viewpoint: Very Simple Toy Example (last movie) Data on a Cylinder = Notes: –Simplest non-Euclidean Example –2-d data, embedded on manifold in –Can flatten the cylinder, to a plane –Have periodic representation –Movie by: Suman Sen Same idea for more complex direct prod’s
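The cylinder example can be made concrete with a short sketch, assuming the usual embedding of a unit-radius cylinder in three-dimensional space (the slide's own notation for the spaces is not reproduced here). Each data point is an (angle, height) pair; embedding wraps it onto the cylinder, and flattening recovers the pair with the angle taken modulo 2π. Function names and values are hypothetical.

```python
import numpy as np

def embed_cylinder(theta, z):
    """Map (angle, height) pairs to points on the unit cylinder in 3-d."""
    return np.column_stack([np.cos(theta), np.sin(theta), z])

def flatten_cylinder(points):
    """Unroll points on the unit cylinder back to (angle in [0, 2*pi), height)."""
    theta = np.arctan2(points[:, 1], points[:, 0]) % (2.0 * np.pi)
    return theta, points[:, 2]

theta = np.array([0.1, 3.0, 6.0])
z = np.array([-1.0, 0.0, 2.0])
pts = embed_cylinder(theta, z)
theta_back, z_back = flatten_cylinder(pts)
```

The flattened plane is a periodic representation: angles that differ by 2π name the same point, so methods applied in the flattened view need to respect that wrap-around.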