Finding generators for H1.


HanTun software available at http://web.cse.ohio-state.edu/~tamaldey/handle/hantun.html

Shortloop software (more general) available at http://web.cse.ohio-state.edu/~tamaldey/shortloop.html Figures from http://web.cse.ohio-state.edu/~tamaldey/shortloop-pictures.html

http://web.cse.ohio-state.edu/~tamaldey/homology.html

Finding generators for H0

Influenza: for a single segment, no Hk for k > 0, hence no horizontal transfer (i.e., no homologous recombination). Reconstructing phylogeny from persistent homology of avian influenza HA. (A) Barcode plot in dimension 0 of all avian HA subtypes. Each bar represents a connected simplex of sequences given a Hamming distance of ε. When a bar ends at a given ε, it merges with another simplex. Gray bars indicate that two simplices of the same HA subtype merge together at a given ε. Solid color bars indicate that two simplices of different HA subtypes but the same major clade merge together. Interpolated color bars indicate that two simplices of different major clades merge together. Colors correspond to known major clades of HA. For specific parameters, see SI Appendix, Supplementary Text. (B) Phylogeny of avian HA reconstructed from the barcode plot in A. Major clades are color-coded. (C) Neighbor-joining tree of avian HA (SI Appendix, Supplementary Text). ©2013 by National Academy of Sciences. Chan J M et al. PNAS 2013;110:18566-18571.
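
The dimension-0 barcode described above is, in essence, single-linkage clustering on the pairwise Hamming distances: each finite bar ends at the ε where its component merges into another. A minimal Python sketch of that correspondence (toy binary "sequences" of my own, not the paper's data):

# Sketch: H0 bars of a Vietoris-Rips filtration under Hamming distance
# correspond to single-linkage merge heights.
import numpy as np
from scipy.spatial.distance import pdist
from scipy.cluster.hierarchy import linkage

seqs = np.array([
    [0, 0, 0, 1, 1],
    [0, 0, 1, 1, 1],
    [1, 1, 0, 0, 0],
    [1, 1, 1, 0, 0],
])

# pdist's 'hamming' is the *fraction* of differing positions; scale by the
# sequence length to get the Hamming distance used as the filtration parameter
d = pdist(seqs, metric='hamming') * seqs.shape[1]

Z = linkage(d, method='single')
deaths = Z[:, 2]   # merge heights = endpoints of the finite H0 bars
print("finite H0 bars die at epsilon =", deaths)  # the last surviving component's bar never dies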

Hierarchical clustering: data and the corresponding dendrogram. Figures: http://en.wikipedia.org/wiki/File:Clusters.svg, http://en.wikipedia.org/wiki/File:Hierarchical_clustering_simple_diagram.svg

Different types of hierarchical clustering: what is the distance between two clusters (the linkage)? Figures: http://en.wikipedia.org/wiki/File:Hierarchical_clustering_simple_diagram.svg, http://www.multid.se/genex/hs515.htm
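
For illustration, a short Python sketch (my own synthetic data, assuming scipy and matplotlib) comparing three common answers to that question, namely single, complete, and average linkage:

# Sketch: the choice of inter-cluster distance ("linkage") changes the dendrogram.
import numpy as np
from scipy.spatial.distance import pdist
from scipy.cluster.hierarchy import linkage, dendrogram
import matplotlib.pyplot as plt

rng = np.random.default_rng(0)
X = np.vstack([rng.normal(0, 0.3, (10, 2)),       # three synthetic clusters
               rng.normal(3, 0.3, (10, 2)),
               rng.normal([0, 3], 0.3, (10, 2))])
d = pdist(X)

fig, axes = plt.subplots(1, 3, figsize=(12, 3))
for ax, method in zip(axes, ['single', 'complete', 'average']):
    # 'single' = min pairwise distance, 'complete' = max, 'average' = mean
    dendrogram(linkage(d, method=method), ax=ax, no_labels=True)
    ax.set_title(method + ' linkage')
plt.show()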

The Elements of Statistical Learning (2nd edition), Hastie, Tibshirani, and Friedman: http://statweb.stanford.edu/~tibs/ElemStatLearn/

Background for k-means clustering

Creating the Delaunay triangulation via Voronoi diagrams. In this very simplified case the data points lie in a two-dimensional plane. Normally data points are high-dimensional: for example, I may be comparing the expression of thousands of genes in tumor cells versus healthy cells using microarray data, or comparing politicians' voting records, or comparing the stats of basketball players. These three applications were all, by the way, published by Lum et al. this past February in Nature's Scientific Reports. I have included a link to their paper on my YouTube site: http://www.nature.com/srep/2013/130207/srep01236/full/srep01236.html

Voronoi diagram: Suppose your data points live in Rn. Choose a data point v. For each other data point w, let H(v,w) = { x in Rn : d(x, v) ≤ d(x, w) }, the set of points at least as close to v as to w. The Voronoi cell associated with v is the intersection of H(v,w) over all w ≠ v:
Cv = { x in Rn : d(x, v) ≤ d(x, w) for all w ≠ v }
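
A minimal Python sketch (assuming 2-D points and scipy) of the duality these slides use: for points in general position, the Delaunay triangulation connects exactly those points whose Voronoi cells share a facet.

# Sketch: Voronoi diagram and its dual Delaunay triangulation with scipy.spatial.
import numpy as np
from scipy.spatial import Voronoi, Delaunay

rng = np.random.default_rng(1)
pts = rng.random((12, 2))          # data points in the plane

vor = Voronoi(pts)                 # Voronoi cells C_v
tri = Delaunay(pts)                # dual Delaunay triangulation

# ridge_points lists the pairs of input points whose cells share a facet;
# these pairs are precisely the Delaunay edges
delaunay_edges = {tuple(sorted(pair)) for pair in vor.ridge_points}
print(sorted(delaunay_edges))
print(tri.simplices)               # triangles of the Delaunay triangulation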

k-means clustering. k = desired number of clusters. Data set = grey boxes. Let k = 3. Randomly choose 3 points (the points need not be in the data set); 3 points = colored circles.

Partition the data set into 3 Voronoi cells corresponding to the 3 colored circles.

Find the centroid of the data points in each of the 3 Voronoi cells.

Re-partition the data set into 3 Voronoi cells corresponding to the 3 centroids. Repeat until the centroids stop moving.
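
A minimal numpy sketch of the k-means (Lloyd) loop just described; the variable names and stopping rule are my own choices, not part of the slides.

# Sketch: k = 3 random initial centers (need not be data points), then alternate
# Voronoi assignment and centroid updates until the centers stop moving.
import numpy as np

def kmeans(X, k=3, n_iter=20, seed=0):
    rng = np.random.default_rng(seed)
    lo, hi = X.min(axis=0), X.max(axis=0)
    centers = rng.uniform(lo, hi, size=(k, X.shape[1]))     # random initial points
    labels = np.zeros(len(X), dtype=int)
    for _ in range(n_iter):
        # assign each point to the Voronoi cell of the nearest center
        d = np.linalg.norm(X[:, None, :] - centers[None, :, :], axis=2)
        labels = d.argmin(axis=1)
        # move each center to the centroid of its cell (keep it if the cell is empty)
        new_centers = np.array([X[labels == j].mean(axis=0) if np.any(labels == j)
                                else centers[j] for j in range(k)])
        if np.allclose(new_centers, centers):               # repeat until centers stop moving
            break
        centers = new_centers
    return centers, labels

X = np.random.default_rng(1).random((60, 2))
centers, labels = kmeans(X)
print(centers)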

Lee-Mumford-Pedersen [LMP] study only high-contrast patches. Collection: 4.5 × 10^6 high-contrast patches from a collection of images obtained by van Hateren and van der Schaaf. http://www.kyb.mpg.de/de/forschung/fg/bethgegroup/downloads/van-hateren-dataset.html

M(100, 10) ∪ Q, where |Q| = 30. On the Local Behavior of Spaces of Natural Images, Gunnar Carlsson, Tigran Ishkhanov, Vin de Silva, Afra Zomorodian, International Journal of Computer Vision, 2008, pp. 1-12.

Each normalized patch is a point in S7. The data set M has over 4 × 10^6 points in S7. Randomly choose 5000 points. Take the T% densest points. Choose a subset of 50 landmark points. http://www.ima.umn.edu/2005-2006/PISG7.10-28.06/activities/carlsson/mississippitwo.pdf
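
A hedged sketch of the density-filtering step (stand-in data and parameter names of my own; one common density estimate in this line of work is the distance to the k-th nearest neighbour):

# Sketch: estimate density by the distance to the k-th nearest neighbour and
# keep the densest T percent of the 5000 sampled points.
import numpy as np
from scipy.spatial import cKDTree

rng = np.random.default_rng(0)
M = rng.normal(size=(5000, 8))
M /= np.linalg.norm(M, axis=1, keepdims=True)    # points on S^7 (stand-in for patches)

k, T = 100, 10                                    # k-NN density parameter, top-T% cut
tree = cKDTree(M)
dist_k = tree.query(M, k=k + 1)[0][:, -1]         # distance to k-th neighbour (excluding self)
density = 1.0 / dist_k                            # smaller k-NN distance = denser
keep = density >= np.percentile(density, 100 - T)
dense_points = M[keep]                            # the T% densest points
print(dense_points.shape)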

comptop.stanford.edu/preprints/witness.pdf

Witness complex. Let D = set of point cloud data points. Choose L ⊂ D, L = set of landmark points.

Witness complex. Let D = set of point cloud data points. Choose L ⊂ D, L = set of landmark points. Normally L is a small subset, but in this example L is a large red subset.

W∞(D) = witness complex. Let D = set of point cloud data points. Choose L ⊂ D, L = set of landmark points = vertices. v0, v1, ..., vk span a k-simplex iff there is a point w ∈ D whose k+1 nearest neighbours in L are v0, v1, ..., vk and all the faces of {v0, v1, ..., vk} belong to the witness complex; w is called a "weak" witness.
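
A minimal Python sketch of the weak-witness rule above (my own function and variable names), building W∞(D) up to a chosen dimension:

# Sketch: {v0..vk} spans a k-simplex iff some w in D has exactly these k+1
# landmarks as its nearest neighbours in L, and every face is already present.
import numpy as np
from itertools import combinations
from scipy.spatial import cKDTree

def witness_complex(D, landmark_idx, max_dim=2):
    L = D[landmark_idx]
    tree = cKDTree(L)
    complex_ = {(i,) for i in range(len(L))}          # every landmark is a vertex
    for k in range(1, max_dim + 1):
        # the k+1 nearest landmarks of every potential witness w in D
        _, nbrs = tree.query(D, k=k + 1)
        for row in nbrs:
            simplex = tuple(sorted(int(i) for i in row))
            faces_ok = all(tuple(sorted(f)) in complex_
                           for f in combinations(simplex, k))
            if faces_ok:
                complex_.add(simplex)
    return complex_

D = np.random.default_rng(0).random((200, 2))
landmarks = np.arange(0, 200, 20)                     # every 20th point as a landmark
print(len(witness_complex(D, landmarks, max_dim=2)))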

W1(D) = lazy witness complex. Let L = set of landmark points. The 1-skeleton of W1(D) = the 1-skeleton of W∞(D). Create the flag (or clique) complex: add all possible simplices of dimension > 1.
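
A minimal sketch of the lazy construction (assuming networkx and scipy, with names of my own): build the 1-skeleton from pairs of nearest landmarks, then add every clique as a simplex.

# Sketch: {v0, v1} is an edge iff some w in D has v0, v1 as its 2 nearest
# landmarks; the flag (clique) complex then fills in all higher simplices.
import numpy as np
import networkx as nx
from scipy.spatial import cKDTree

D = np.random.default_rng(0).random((200, 2))
landmark_idx = np.arange(0, 200, 20)
L = D[landmark_idx]

_, nbrs = cKDTree(L).query(D, k=2)     # 2 nearest landmarks of each potential witness
G = nx.Graph()
G.add_nodes_from(range(len(L)))
G.add_edges_from(map(tuple, nbrs))     # the 1-skeleton

# flag (clique) complex: every clique of the 1-skeleton becomes a simplex
flag_complex = [tuple(sorted(c)) for c in nx.enumerate_all_cliques(G)]
print(flag_complex[:10])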

Choosing landmark points:
A.) Random.
B.) Maxmin:
1.) Choose the point l1 randomly.
2.) If {l1, …, lk-1} have been chosen, choose lk in D - {l1, …, lk-1} such that min {d(lk, l1), …, d(lk, lk-1)} ≥ min {d(v, l1), …, d(v, lk-1)} for every v in D - {l1, …, lk-1}.
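
A minimal numpy sketch of the maxmin procedure above: each new landmark is the data point whose minimum distance to the landmarks chosen so far is largest.

# Sketch of maxmin landmark selection.
import numpy as np

def maxmin_landmarks(D, n_landmarks, seed=0):
    rng = np.random.default_rng(seed)
    idx = [int(rng.integers(len(D)))]                # 1.) choose l1 randomly
    d_min = np.linalg.norm(D - D[idx[0]], axis=1)    # distance of every v to the landmark set
    for _ in range(n_landmarks - 1):
        lk = int(d_min.argmax())                     # 2.) maximize the min distance to {l1..l(k-1)}
        idx.append(lk)
        d_min = np.minimum(d_min, np.linalg.norm(D - D[lk], axis=1))
    return np.array(idx)

D = np.random.default_rng(1).random((500, 2))
print(maxmin_landmarks(D, 10))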

Graph Induced Complex: A Data Sparsifier for Homology Inference. Tamal K. Dey, http://www.cse.ohio-state.edu/~tamaldey/
Video: http://www.ima.umn.edu/videos/?id=2497
Slides: http://web.cse.ohio-state.edu/~tamaldey/talk/GIC/GIC.pdf
Paper: http://web.cse.ohio-state.edu/~tamaldey/paper/GIC/GIC.pdf
Graph Induced Complex on Point Data, T. K. Dey, F. Fan, and Y. Wang, Proc. 29th Annu. Sympos. Comput. Geom. (SoCG 2013), 107-116.
Website: http://web.cse.ohio-state.edu/~tamaldey/GIC/gic.html
Abstract: The efficiency of extracting topological information from point data depends largely on the complex that is built on top of the data points. From a computational viewpoint, the most favored complexes for this purpose have so far been the Vietoris-Rips and witness complexes. While the Vietoris-Rips complex is simple to compute and is a good vehicle for extracting the topology of sampled spaces, its size is huge, particularly in high dimensions. The witness complex, on the other hand, enjoys a smaller size because of subsampling, but fails to capture the topology in high dimensions unless extra structure is imposed. We investigate a complex called the graph induced complex that, to some extent, enjoys the advantages of both. It works on a subsample but still retains the power of capturing the topology, as the Vietoris-Rips complex does. It only needs a graph connecting the original sample points, from which it builds a complex on the subsample, thus taming the size considerably. We show that, using the graph induced complex, one can (i) infer the one-dimensional homology of a manifold from a very lean subsample, (ii) reconstruct a surface in three dimensions from a sparse subsample without computing Delaunay triangulations, and (iii) infer the persistent homology groups of compact sets from a sufficiently dense sample. We provide experimental evidence in support of our theory.
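
A hedged sketch of the graph induced complex as I read the definition (not the authors' code, and the radius and subsample below are my own choices): given a neighborhood graph G on the data P, a subsample Q, and the map ν sending each point to its nearest subsample point, a subset of Q spans a simplex whenever some clique of G maps onto it under ν.

# Sketch of a graph induced complex on random planar data.
import numpy as np
import networkx as nx
from scipy.spatial import cKDTree

rng = np.random.default_rng(0)
P = rng.random((300, 2))
Q_idx = np.arange(0, 300, 15)                        # the subsample Q

# neighborhood graph on P: connect points within distance r
r = 0.08
G = nx.Graph()
G.add_nodes_from(range(len(P)))
G.add_edges_from(cKDTree(P).query_pairs(r))

# nu: each data point -> index (into Q_idx) of its nearest subsample point
nu = cKDTree(P[Q_idx]).query(P)[1]

gic = set()
for clique in nx.enumerate_all_cliques(G):           # cliques of G, all sizes
    simplex = tuple(sorted(set(int(nu[p]) for p in clique)))
    gic.add(simplex)
print(sum(len(s) == 2 for s in gic), "edges,", sum(len(s) == 3 for s in gic), "triangles")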