Applications of Visualization and Data Clustering to 3D Gene Expression Data

Oliver Rübel 1,2,3,7, Gunther H. Weber 3,7, Min-Yu Huang 1,7, E. Wes Bethel 3, Mark D. Biggin 4,7, Charless C. Fowlkes 5,7, Cris L. Luengo Hendriks 6,7, Soile V. E. Keränen 4,7, Michael B. Eisen 4,7, David W. Knowles 6,7, Jitendra Malik 5,7, Hans Hagen 2, and Bernd Hamann 1,2,3,7

1. Institute for Data Analysis and Visualization, University of California, Davis, One Shields Avenue, Davis CA 95616, USA
2. International Research Training Group “Visualization of Large and Unstructured Data Sets,” University of Kaiserslautern, Germany
3. Computational Research Division, Lawrence Berkeley National Laboratory, One Cyclotron Road, Berkeley CA 94620, USA
4. Genomics Division, Lawrence Berkeley National Laboratory, One Cyclotron Road, Berkeley CA 94620, USA
5. Computer Science Division, University of California, Berkeley, CA, USA
6. Life Sciences Division, Lawrence Berkeley National Laboratory, One Cyclotron Road, Berkeley CA 94620, USA
7. Berkeley Drosophila Transcription Network Project, Lawrence Berkeley National Laboratory, One Cyclotron Road, Berkeley CA 94620, USA

Biological Background
Animals comprise dynamic 3D arrays of cells that express gene products in intricate spatial and temporal patterns. These patterns of gene expression determine the shape and form of the animal. Biologists have typically analyzed gene expression and morphology by visual inspection of 2D microscopic images. A rigorous understanding of developmental processes requires methods that can quantitatively analyze these phenomenally complex arrays at the level of cellular resolution.

Single Pattern Analysis
Genes are frequently expressed in complex patterns consisting of quantitative differences in expression between cells of an embryo. Clustering can be used effectively to discretize the expression pattern of a gene.
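As a rough illustration of how clustering can discretize a single expression pattern, the sketch below runs a one-dimensional k-means over per-cell expression values and returns one level per cell. The function name `discretize_expression`, the evenly spaced starting centroids, and the choice of k-means itself are assumptions made for this example; the poster does not state which clustering algorithm PointCloudXplore uses.

```python
from typing import List, Sequence


def discretize_expression(values: Sequence[float], k: int, max_iter: int = 100) -> List[int]:
    """Assign each cell an expression level in 0..k-1 using 1D k-means.

    Hypothetical helper for illustration; not part of PointCloudXplore.
    """
    if k < 1 or len(values) == 0:
        raise ValueError("need at least one value and k >= 1")
    lo, hi = min(values), max(values)
    # Deterministic start: centroids spread evenly over the observed range.
    centroids = [lo + (hi - lo) * (i + 0.5) / k for i in range(k)]
    labels: List[int] = []
    for _ in range(max_iter):
        # Assignment step: each cell goes to its nearest centroid.
        new_labels = [min(range(k), key=lambda c: abs(v - centroids[c])) for v in values]
        if new_labels == labels:
            break  # assignments stopped changing
        labels = new_labels
        # Update step: move each centroid to the mean of its member cells.
        for c in range(k):
            members = [v for v, lab in zip(values, labels) if lab == c]
            if members:
                centroids[c] = sum(members) / len(members)
    # Renumber clusters so that level 0 is the lowest expression level.
    order = sorted(range(k), key=lambda c: centroids[c])
    rank = {c: r for r, c in enumerate(order)}
    return [rank[lab] for lab in labels]


# Made-up expression values for six cells, split into "off" and "on".
levels = discretize_expression([0.0, 0.1, 0.05, 0.9, 1.0, 0.95], k=2)
print(levels)  # [0, 0, 0, 1, 1, 1]
```

Calling the same function with a larger `k` yields a finer discretization of the same values.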
Discretization of expression patterns can be very useful, e.g., to create logical models of gene networks. Here the pattern of eve (a) is classified into 2, 3, and 6 levels (b-d). Based on the results shown in (d), seven clusters, each selecting one stripe of the eve pattern, are created using cluster post-processing techniques. Characteristics of the seven stripes are revealed in the scatter plot of the three eve regulators gt, hb, and Kr.

Temporal Variation Analysis
Gene expression patterns are not static but highly dynamic. Understanding the temporal profile of a gene expression pattern is therefore essential if we are to understand complex relationships between genes. To assist in the analysis of the spatio-temporal expression pattern of genes, we use PointCloudXplore to cluster cells into groups based on the similarity of their temporal expression profiles. The example here shows the classification of the spatio-temporal pattern of giant (gt) expression. Cluster statistics, such as the average temporal expression profiles of clusters, reveal the complex changes of gene patterns and allow quantitation of their temporal variation.

Multiple Pattern Analysis
To dissect the complex regulatory interactions between genes, the expression patterns of multiple potential regulatory transcription factors can be used as input to cluster analysis. Cells are classified into clusters that have similar combinations of expression for the input set of regulators. Each cluster describes one potential sub-pattern that a regulatory network composed of these factors could give rise to. The results of such a clustering can also be compared to the expression patterns of suspected target genes to assess possible regulatory relationships. Here, the patterns of the genes giant (gt), hunchback (hb), and Krüppel (Kr) have been used as input to the clustering.
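A minimal sketch of clustering cells by their combined expression of several regulators, followed by a comparison against a target cell selection. It assumes k-means with a deterministic farthest-point start; the names `cluster_cells` and `overlap_with_target`, the overlap measure, and the expression values are all hypothetical and are not taken from PointCloudXplore.

```python
from typing import Dict, List, Sequence, Set

Vector = Sequence[float]


def _dist2(a: Vector, b: Vector) -> float:
    """Squared Euclidean distance between two expression vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def cluster_cells(profiles: List[Vector], k: int, max_iter: int = 100) -> List[int]:
    """k-means over per-cell expression vectors, e.g. (gt, hb, Kr)."""
    if not 1 <= k <= len(profiles):
        raise ValueError("k must be between 1 and the number of cells")
    # Deterministic farthest-point initialization, starting from cell 0.
    centroids = [list(profiles[0])]
    while len(centroids) < k:
        far = max(profiles, key=lambda p: min(_dist2(p, c) for c in centroids))
        centroids.append(list(far))
    labels: List[int] = []
    for _ in range(max_iter):
        new_labels = [min(range(k), key=lambda c: _dist2(p, centroids[c])) for p in profiles]
        if new_labels == labels:
            break  # converged
        labels = new_labels
        for c in range(k):
            members = [p for p, lab in zip(profiles, labels) if lab == c]
            if members:
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return labels


def overlap_with_target(labels: List[int], target: Set[int]) -> Dict[int, float]:
    """Fraction of each cluster's cells that lie inside a target cell selection."""
    result: Dict[int, float] = {}
    for c in set(labels):
        cells = {i for i, lab in enumerate(labels) if lab == c}
        result[c] = len(cells & target) / len(cells)
    return result


# Made-up (gt, hb, Kr) values for six cells; cells 2 and 3 form the target selection.
profiles = [(0.9, 0.1, 0.0), (0.8, 0.2, 0.1), (0.1, 0.9, 0.0),
            (0.0, 0.8, 0.1), (0.0, 0.1, 0.9), (0.1, 0.0, 0.8)]
labels = cluster_cells(profiles, k=3)
print(overlap_with_target(labels, target={2, 3}))
```

Because `cluster_cells` accepts vectors of any length, the same sketch could also be fed one expression value per time step for each cell, which would group cells by temporal profile instead of by regulator combination.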
Clustering results are compared to stripe two of the eve expression pattern, suggesting that the anterior and posterior borders of the stripe as well as the ventral dip in eve expression can be modeled using gt, hb, and Kr expression levels.

3D Gene Expression Data
The BDTNP has developed a suite of methods to quantitate the expression of genes in 3D at cellular resolution from whole Drosophila embryos. Drosophila embryos are first imaged using two-photon fluorescence microscopy. The resulting 3D image stacks are segmented in order to extract information about the expression of genes on a per-cell basis. Currently, datasets with information on up to about 100 genes at up to six different time steps are available.

PointCloudXplore: A Framework for Visualization and Clustering of 3D Gene Expression Data
In our software, called PointCloudXplore, we have linked dedicated physical and information visualization views of the data via the concept of brushing (cell selection). A user can select and highlight cells of interest in any view. All brushes (cell selectors) are then stored in a central cell selector management system, allowing one to highlight all selections in any view. Data clustering provides a means for automatic detection and definition of data features by automatically classifying cells into groups of similar behavior, the clusters. Clusters, each defining a selection of cells, can be managed and visualized in the same way as user-defined cell selections. Visualization is used for validation and improvement of clustering results, while clustering is used to analyze the data as well as to improve the visualization. To improve clustering results, we have developed dedicated cluster post-processing techniques, such as splitting, merging, and filtering of clusters based on spatial cell positions.

e) Clustering-based False Coloring
Using hierarchical clustering one can define a linear order of the cells.
This linear order can be used as the basis for false coloring of the data. By defining ranges in this linear cell order one can also easily define data features based on cell similarity.

[Figure labels: Data Clustering; Cell Selector Management; Data Selection; Physical Views; Abstract Views; Cell Selector Statistics; Post-Processing; Visualization; PointCloudXplore; Clusters; giant (gt), Krüppel (Kr), hunchback (hb), tailless (tll)]
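The linear cell order behind clustering-based false coloring can be sketched as follows. This is a naive average-linkage agglomerative clustering whose merge sequence yields a leaf order, followed by a rank-to-hue mapping. The linkage criterion, the color map, and the names `linear_cell_order` and `false_colors` are assumptions for illustration only; the poster does not specify them, and the cubic-time loop is suitable only for small examples.

```python
import colorsys
from typing import List, Sequence, Tuple

Vector = Sequence[float]


def _dist(a: Vector, b: Vector) -> float:
    """Euclidean distance between two expression vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b)) ** 0.5


def linear_cell_order(profiles: List[Vector]) -> List[int]:
    """Leaf order of a naive average-linkage agglomerative clustering."""
    clusters = [[i] for i in range(len(profiles))]
    while len(clusters) > 1:
        best = None
        for a in range(len(clusters)):
            for b in range(a + 1, len(clusters)):
                # Average pairwise distance between the two clusters.
                d = sum(_dist(profiles[i], profiles[j])
                        for i in clusters[a] for j in clusters[b])
                d /= len(clusters[a]) * len(clusters[b])
                if best is None or d < best[0]:
                    best = (d, a, b)
        _, a, b = best
        # Concatenating the leaf lists keeps similar cells next to each other.
        merged = clusters[a] + clusters[b]
        clusters = [c for idx, c in enumerate(clusters) if idx not in (a, b)] + [merged]
    return clusters[0] if clusters else []


def false_colors(order: List[int]) -> List[Tuple[float, float, float]]:
    """RGB color per cell; the hue follows the cell's rank in the linear order."""
    n = len(order)
    colors = [(0.0, 0.0, 0.0)] * n
    for rank, cell in enumerate(order):
        # Sweep the hue from red towards violet without wrapping back to red.
        hue = 0.75 * rank / max(n - 1, 1)
        colors[cell] = colorsys.hsv_to_rgb(hue, 1.0, 1.0)
    return colors


# Made-up one-gene profiles for four cells.
order = linear_cell_order([(0.0,), (1.0,), (0.1,), (0.9,)])
colors = false_colors(order)
```

A contiguous slice of `order`, for example `order[:2]`, then corresponds to a range in the linear cell order and could serve as a cell selection of mutually similar cells.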