1 Bio-Sequence Analysis with Cradle’s 3SoC™ Software Scalable System on Chip Xiandong Meng, Vipin Chaudhary Parallel and Distributed Computing Lab Wayne.

Presentation transcript:

1 Bio-Sequence Analysis with Cradle’s 3SoC™ Software Scalable System on Chip Xiandong Meng, Vipin Chaudhary Parallel and Distributed Computing Lab Wayne State University Detroit, MI USA

2 Outline
- Motivation
- Smith-Waterman Algorithm
- Related Works
- Parallel Architecture of 3SoC Chip
- Implementation Strategy on 3SoC Chip
- Performance Evaluation
- Future Work
- References

3 Motivation
- Genetic sequence databases are growing exponentially.
- Newly discovered sequences are analyzed by comparison with databases.
- The complexity of sequence comparison is proportional to the product of query size and database size, so analysis is too slow on sequential computers.
- Two possible approaches:
  - Heuristics, e.g. BLAST, FastA
  - Exhaustive, e.g. Smith-Waterman
- Implementing the most sensitive algorithm, Smith-Waterman, on the 3SoC parallel chip multiprocessor architecture could provide high quality and performance at low cost.

4 Sequence Alignment Comparison
- Methods compared: BLAST, FastA, Smith-Waterman, on two scales: search speed (slower to faster) and data quality (lower to higher).
- Heuristics, e.g. BLAST, FastA: the more efficient the heuristic, the worse the quality of the results.
- Exhaustive, e.g. Smith-Waterman: high-quality results but long computation time.
- Example alignment:
  CCA – CGAAGCTTGGCTGGAACAGGACTTCTG - GG
  : : : : : : : : : : : : : : : : : : : : : : :
  CCAGCC AAGCTTCGTGGGCA -AGGAGGCCAGCGG

5 Smith-Waterman Algorithm
- Optimal local alignment of two sequences
- Performs an exhaustive search for the optimal local alignment
  - Complexity O(n × m) for sequence lengths n and m
- Based on the 'dynamic programming' (DP) algorithm
  - Fill the DP matrix
  - Find the maximal value (score) in the matrix
  - Trace back from the score until a 0 value is reached
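The three DP steps listed on this slide (fill, find the maximum, trace back) can be sketched in Python as follows. This is a minimal illustration, not the authors' 3SoC implementation: the function name `smith_waterman`, the linear gap penalty, and the scoring values (`match=2`, `mismatch=-1`, `gap=-1`) are assumptions chosen for the example.

```python
def smith_waterman(s1, s2, match=2, mismatch=-1, gap=-1):
    """Return (best_score, aligned_s1, aligned_s2) for one optimal local alignment.

    Scoring values are illustrative assumptions, not taken from the slides.
    """
    n, m = len(s1), len(s2)
    # Step 1: fill the DP matrix. Row 0 and column 0 stay at 0.
    H = [[0] * (m + 1) for _ in range(n + 1)]
    best, best_pos = 0, (0, 0)
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            sub = match if s1[i - 1] == s2[j - 1] else mismatch
            H[i][j] = max(
                0,                      # start a new local alignment
                H[i - 1][j - 1] + sub,  # align s1[i-1] with s2[j-1]
                H[i - 1][j] + gap,      # gap in s2
                H[i][j - 1] + gap,      # gap in s1
            )
            # Step 2: remember the maximal value (score) in the matrix.
            if H[i][j] > best:
                best, best_pos = H[i][j], (i, j)
    # Step 3: trace back from the best cell until a 0 value is reached.
    i, j = best_pos
    a1, a2 = [], []
    while i > 0 and j > 0 and H[i][j] > 0:
        sub = match if s1[i - 1] == s2[j - 1] else mismatch
        if H[i][j] == H[i - 1][j - 1] + sub:
            a1.append(s1[i - 1]); a2.append(s2[j - 1])
            i, j = i - 1, j - 1
        elif H[i][j] == H[i - 1][j] + gap:
            a1.append(s1[i - 1]); a2.append("-")
            i -= 1
        else:
            a1.append("-"); a2.append(s2[j - 1])
            j -= 1
    return best, "".join(reversed(a1)), "".join(reversed(a2))
```

For example, `smith_waterman("TTACGTT", "GGACGGG")` finds the shared substring `ACG` with these scores. The two nested loops make the O(n × m) cost visible: every cell of the matrix is computed once.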

6 Smith-Waterman Algorithm (cont.)
- Optimal local alignment of two sequences
- Performs an exhaustive search for the optimal local alignment
- Based on the 'dynamic programming' (DP) algorithm
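For reference, the DP recurrence is commonly written as below. This is the textbook form with a linear gap penalty; the symbols $s(a_i, b_j)$ (substitution score) and $g$ (gap penalty) are notation assumed here, since the scoring scheme used in the presentation is not recoverable from the transcript.

```latex
H_{i,0} = H_{0,j} = 0, \qquad
H_{i,j} = \max
\begin{cases}
0 \\
H_{i-1,j-1} + s(a_i, b_j) \\
H_{i-1,j} - g \\
H_{i,j-1} - g
\end{cases}
\qquad 1 \le i \le n,\; 1 \le j \le m
```

The alignment score is the largest $H_{i,j}$ in the matrix, and the zero in the maximum is what makes the alignment local: a cell never carries a negative score forward.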

7 Smith-Waterman Algorithm (Example)
Align S1 = ATCTCGTATGATG with S2 = GTCTATCAC.
[Figure: DP matrix with GTCTATCAC along one axis and ATCTCGTATGATG along the other; two scoring parameters, each set to 1 (their symbols are not recoverable); the resulting alignment pairs ATCTCGTATGATG with GTCTATCAC, with a gap symbol after GTC.]

8 Data dependencies in S-W Algorithm
Computational dependencies in the Smith-Waterman alignment matrix.
[Figure: alignment matrix with elements a1 to a4 on one axis and the database sequence b1 to b6 on the other.]
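In the standard recurrence, each cell depends only on its left, upper, and upper-left neighbours, so all cells on the same anti-diagonal (constant i + j) are independent of one another. The sketch below illustrates that general property by filling the matrix one anti-diagonal at a time. The helper names `antidiagonals` and `sw_score_wavefront` and the scoring values are assumptions for illustration, and the transcript does not say whether the 3SoC implementation exploits this form of parallelism.

```python
def antidiagonals(n, m):
    """Yield lists of 1-based (i, j) cells of an n x m matrix, grouped by i + j."""
    for d in range(2, n + m + 1):
        yield [(i, d - i) for i in range(max(1, d - m), min(n, d - 1) + 1)]


def sw_score_wavefront(a, b, match=2, mismatch=-1, gap=-1):
    """Smith-Waterman score computed in anti-diagonal (wavefront) order.

    Scoring values are illustrative assumptions.
    """
    n, m = len(a), len(b)
    H = [[0] * (m + 1) for _ in range(n + 1)]
    best = 0
    for cells in antidiagonals(n, m):
        # Cells in `cells` read only from earlier anti-diagonals, so this
        # inner loop could be distributed across processing elements.
        for i, j in cells:
            sub = match if a[i - 1] == b[j - 1] else mismatch
            H[i][j] = max(0,
                          H[i - 1][j - 1] + sub,
                          H[i - 1][j] + gap,
                          H[i][j - 1] + gap)
            best = max(best, H[i][j])
    return best
```

For the 4 x 6 matrix in the figure (a1 to a4 against b1 to b6), `antidiagonals(4, 6)` produces 9 wavefronts, none with more than 4 cells, which bounds how many cells of a single comparison can be computed at once.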

9 Related Works
- Kestrel parallel processor: a board with 512 processing elements (PEs) at UC Santa Cruz
- Field Programmable Gate Array (FPGA): a PCI board with one 144-PE FPGA at the University of Tsukuba
- Fuzion: PEs on a single chip at Nanyang Technological University

10 Architecture of 3SoC chip
Characteristics of 3SoC:
- Quads: The Quad is the primary unit of replication for 3SoC. A 3SoC chip has one or more Quads.
- PEs: Each PE is a 32-bit processor with 16-bit instructions and thirty-two 32-bit registers. The PE is rated at approximately 90 MIPS.
- DSEs: Each DSE is a 32-bit processor with 128 registers. The DSE is the primary compute engine of the UMS and is rated at approximately 350 MIPS for integer or floating-point performance.

11 Protein Sequence search on 3SoC chips
[Figure: processing flow]
- Input: DNA/protein query sequence
- 3SoC chips: compare the query sequence with well-known sequences in the sequence database by doing the S-W sequence alignment
- Output: results

12 Implementation Strategy on 3SoC
[Figure: query sequences enter the chip; labels show 12 PEs and DSE1 to DSE24, each DSE paired with one of DB1 to DB24; results are output.]
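One way to read this diagram is that the database is split into 24 parts and each DSE scores the query against its own part. The sketch below illustrates that reading in Python and is not the authors' code. It assumes DB1 to DB24 denote database partitions, uses a round-robin split, and lets threads stand in for DSEs. The names `NUM_WORKERS`, `sw_score`, `partition`, and `search`, and the scoring values, are all hypothetical.

```python
from concurrent.futures import ThreadPoolExecutor

NUM_WORKERS = 24  # one worker per DSE/DB pair shown on the slide


def sw_score(a, b, match=2, mismatch=-1, gap=-1):
    """Smith-Waterman score only, keeping two matrix rows in memory."""
    prev = [0] * (len(b) + 1)
    best = 0
    for ca in a:
        cur = [0]
        for j, cb in enumerate(b, start=1):
            sub = match if ca == cb else mismatch
            cur.append(max(0, prev[j - 1] + sub, prev[j] + gap, cur[j - 1] + gap))
        best = max(best, max(cur))
        prev = cur
    return best


def partition(db, k):
    """Round-robin split of the database into k parts (an assumed policy)."""
    return [db[w::k] for w in range(k)]


def search(query, db, k=NUM_WORKERS):
    """Score the query against every database sequence; best hits first."""
    def scan(part):
        # Each worker sees the whole query but only its own database part.
        return [(sw_score(query, seq), seq) for seq in part]

    with ThreadPoolExecutor(max_workers=k) as pool:
        hits = [hit for part_hits in pool.map(scan, partition(db, k))
                for hit in part_hits]
    return sorted(hits, key=lambda hit: hit[0], reverse=True)
```

Because every database sequence is scored independently, this coarse-grained split needs no communication between workers until the final merge. In CPython the threads here only model the structure; they do not give a real speedup for CPU-bound scoring.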

13 Implementation Details of S-W Algorithm
[Figure labels: PE, MTE, DSE 1, DSE 2, Buffer 1, Buffer 2, Query Seq., DB Seq1, DB Seq2, DRAM; regions marked "Inside chip" and "Outside chip".]

14 Performance Analysis

15 Conclusions and Future Work
- Demonstrated that the 3SoC chip multiprocessor architecture can be applied efficiently to comparative genomics.
- Optimize the implementation.
- Apply the next-generation 3SoC architecture to high-performance embedded systems for bioinformatics computing.

16 References
1. Smith, T.F. and Waterman, M.S. Identification of common molecular subsequences. J. Mol. Biol., 147 (1981).
2. Needleman, S. and Wunsch, C. A general method applicable to the search for similarities in the amino acid sequence of two proteins. J. Mol. Biol., 48(3) (1970).
3. Hughey, R. Parallel hardware for sequence comparison and alignment. Comput. Appl. Biosci., 12 (1996).
4. Schmidt, B., Schroder, H. and Schimmler, M. Massively Parallel Solutions for Molecular Sequence Analysis. International Parallel and Distributed Processing Symposium: IPDPS Workshops (2002).
5. Yamaguchi, Y. and Maruyama, T. High Speed Homology Search with FPGA. Pacific Symposium on Biocomputing.
6. Grate, L., Diekhans, M., Dahle, D. and Hughey, R. Sequence Analysis With the Kestrel SIMD Parallel Processor. Pacific Symposium on Biocomputing (2001).

17 Questions?