DACIDR for Gene Analysis

Deterministic Annealing Clustering and Interpolative Dimension Reduction Method (DACIDR)
Use Hadoop for pleasingly parallel applications and Twister (being replaced by YARN) for iterative MapReduce applications.
Simplified Flow Chart of DACIDR (stages: All-Pair Sequence Alignment, Pairwise Clustering, Multidimensional Scaling, Streaming Visualization)
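
The "interpolative" part of DACIDR places sequences that were not in the MDS sample into the existing map. Below is a minimal, hypothetical Python sketch of a majorization-style interpolation for one new point. It keeps only the k nearest mapped sequences (by original dissimilarity delta_i) and iterates x <- p_mean + (1/k) * sum_i (delta_i / d_i) * (x - p_i). The function name `interpolate_point`, the choice of k, the centroid start, and the stopping rule are all assumptions for illustration. It is not the DACIDR code.

```python
import numpy as np


def interpolate_point(sample_coords, deltas, k=5, iters=200, tol=1e-10):
    """Place one out-of-sample item into an existing low-dimensional map.

    sample_coords: (n, L) array of already-mapped (in-sample) coordinates.
    deltas: (n,) original dissimilarities from the new item to each mapped item.
    Hypothetical helper: a sketch of majorization-based interpolation, not DACIDR code.
    """
    sample_coords = np.asarray(sample_coords, dtype=float)
    deltas = np.asarray(deltas, dtype=float)
    nearest = np.argsort(deltas)[:k]          # k most similar mapped items
    p = sample_coords[nearest]                # (k, L) reference points
    d = deltas[nearest]                       # (k,) target distances
    center = p.mean(axis=0)
    x = center.copy()                         # assumed start: centroid of references
    for _ in range(iters):
        dist = np.maximum(np.linalg.norm(x - p, axis=1), 1e-12)  # guard against /0
        # Majorization update: x <- center + mean_i (d_i / dist_i) * (x - p_i)
        x_new = center + np.mean((d / dist)[:, None] * (x - p), axis=0)
        step = np.linalg.norm(x_new - x)
        x = x_new
        if step < tol:
            break
    return x
```

Consider a simple case: four rectangle corners (0, 0), (4, 0), (0, 3), (4, 3) serve as the mapped points, and the inputs are exact distances from (1, 1). The iteration then starts at the centroid and moves toward (1, 1). A true zero-error position is a fixed point of the update.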

PWA vs MSA
Pairwise sequence alignment (PWA) is much faster than multiple sequence alignment (MSA), and the distances it produces correlate very highly with those from MSA.
Figure: Comparison, using the Mantel test, between the distances generated by three sequence alignment methods and RAxML.
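
The Mantel test compares two distance matrices over the same items. It correlates their off-diagonal entries and judges significance by permuting the rows and columns of one matrix together. The sketch below assumes full square, symmetric NumPy matrices (for example, alignment-based distances versus tree-based distances). It uses a one-sided Pearson statistic. The slide does not say which Mantel variant or permutation count was used, and `mantel` here is a hypothetical helper.

```python
import numpy as np


def mantel(d1, d2, permutations=999, seed=0):
    """One-sided Mantel test between two square, symmetric distance matrices.

    Returns (r, p): the Pearson correlation of the upper-triangle entries and the
    fraction of permutations (observed one included) with a correlation >= r.
    Hypothetical helper for illustration.
    """
    d1 = np.asarray(d1, dtype=float)
    d2 = np.asarray(d2, dtype=float)
    n = d1.shape[0]
    iu = np.triu_indices(n, k=1)               # each unordered pair once
    x = d1[iu]
    r_obs = np.corrcoef(x, d2[iu])[0, 1]
    rng = np.random.default_rng(seed)
    hits = 1                                   # the observed labelling counts as one
    for _ in range(permutations):
        perm = rng.permutation(n)
        y = d2[np.ix_(perm, perm)][iu]         # relabel items: rows and columns together
        if np.corrcoef(x, y)[0, 1] >= r_obs:
            hits += 1
    return r_obs, hits / (permutations + 1)
```

For real analyses, scikit-bio's `skbio.stats.distance.mantel` is a maintained implementation of the same test.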

Summarize a Million Fungi Sequences
Spherical Phylogram Visualization
Figures: RAxML result visualized in FigTree; Spherical Phylogram visualized in PlotViz.
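
A spherical phylogram draws the tree's nodes at points in 3D space. Each branch can therefore be measured as a Euclidean distance, and summing those distances gives the total branch length of the drawn tree. The sketch below assumes the tree is given as a hypothetical node-to-coordinate dictionary plus a list of (parent, child) edges. How DACIDR positions internal nodes is not covered here.

```python
import numpy as np


def sum_of_branch_lengths(coords, edges):
    """Total Euclidean length of a tree whose nodes are drawn in 3D.

    coords: dict mapping node id -> (x, y, z)
    edges: iterable of (parent, child) node-id pairs
    """
    return float(sum(np.linalg.norm(np.subtract(coords[a], coords[b]))
                     for a, b in edges))


# Example: three branches of length 1, 2 and 5 hanging off a root.
coords = {"root": (0, 0, 0), "a": (1, 0, 0), "b": (0, 2, 0), "c": (3, 4, 0)}
edges = [("root", "a"), ("root", "b"), ("root", "c")]
print(sum_of_branch_lengths(coords, edges))  # 8.0
```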

MDS Methods
The sum of branch lengths will be lower if a better dimension reduction method is used. WDA-SMACOF finds global optima.
Figure: Sum of branch lengths of the SP (spherical phylogram) generated in 3D space on the 599nts dataset optimized with 454 sequences and on the 999nts dataset.
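
WDA-SMACOF, as named on the slide, adds weights and deterministic annealing on top of SMACOF. SMACOF itself minimizes the raw stress sum_{i<j} (d_ij(X) - delta_ij)^2 by repeated Guttman transforms. The sketch below shows only that plain core under stated assumptions: unit weights, no annealing schedule, and a random start. The names are hypothetical, and it is not the WDA-SMACOF implementation. Each Guttman transform cannot increase stress, so the result is a local optimum that depends on the starting layout. That dependence is the issue the annealing in WDA-SMACOF is meant to address.

```python
import numpy as np


def pairwise_dist(X):
    """Euclidean distance matrix between the rows of X."""
    return np.linalg.norm(X[:, None, :] - X[None, :, :], axis=-1)


def stress(X, delta):
    """Raw stress: sum over pairs i<j of (d_ij(X) - delta_ij)^2."""
    iu = np.triu_indices(len(X), k=1)
    return float(np.sum((pairwise_dist(X)[iu] - delta[iu]) ** 2))


def smacof(delta, dim=3, iters=300, tol=1e-9, seed=0):
    """Plain SMACOF (unit weights, no annealing) via the Guttman transform."""
    delta = np.asarray(delta, dtype=float)
    n = delta.shape[0]
    X = np.random.default_rng(seed).standard_normal((n, dim))  # random start
    cur = stress(X, delta)
    for _ in range(iters):
        D = pairwise_dist(X)
        with np.errstate(divide="ignore", invalid="ignore"):
            ratio = np.where(D > 0, delta / D, 0.0)   # delta_ij / d_ij, 0 when d_ij = 0
        B = -ratio
        np.fill_diagonal(B, 0.0)
        np.fill_diagonal(B, -B.sum(axis=1))           # b_ii = -sum_{j != i} b_ij
        X = B @ X / n                                 # Guttman transform (unit weights)
        prev, cur = cur, stress(X, delta)
        if prev - cur < tol:                          # stress never increases
            break
    return X, cur
```

If a library routine is preferred, scikit-learn's `sklearn.manifold.MDS` is also based on SMACOF.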