Massively Parallel Solutions for Molecular Sequence Analysis Bertil Schmidt School of Computer Engineering, Nanyang Technological University, Singapore.

Massively Parallel Solutions for Molecular Sequence Analysis
Bertil Schmidt, School of Computer Engineering, Nanyang Technological University, Singapore
Heiko Schröder, School of Computer Science and Information Technology, RMIT University, Melbourne, Australia
Manfred Schimmler, Institut für Datentechnik und Kommunikationsnetze, TU Braunschweig, Germany

Contents
- Motivation
- Smith-Waterman Algorithm
- Parallelization on the Hybrid Architecture
- Parallelization on the Fuzion 150
- Performance Evaluation
- Conclusion and Future Work

Motivation
- Genetic sequence databases are growing exponentially
- The growth rate will continue, since multiple concurrent genome projects have begun, with more to come

- Discovered sequences are analyzed by comparison with databases
- Complexity of sequence comparison is proportional to the product of query size and database size
- Analysis is too slow on sequential computers
- Two possible approaches:
  - Heuristics, e.g. BLAST, FastA; but the more efficient the heuristic, the worse the quality of the results
  - Parallel processing: get high-quality results in reasonable time

Full Genome Comparison
- Related organisms, but Tuberculosis causes a disease → find common and different parts
- 16 × 10^6 pairwise sequence comparisons
- Many genome-genome comparisons will be required in the near future
- Data sets: 3918 protein sequences (1,329,298 amino acids) and 4289 protein sequences (1,359,008 amino acids)

Protein Sequence Alignment
- BLAST, FastA, Smith-Waterman
- Example alignment:
    GGHSRLILSQLGEEG.RLLAIDRDPQAIAVAKT....IDDPRFSII
    GGHAERFL.E.GLPGLRLIGLDRDPTALDVARSRLVRFAD.RLTLV
- [Figure: trade-off between search speed and data quality; BLAST is faster with lower quality, Smith-Waterman is slower with higher quality, FastA lies in between]

Smith-Waterman Algorithm
- Optimal local alignment of two sequences
- Performs an exhaustive search for the optimal local alignment
  - Complexity O(n × m) for sequence lengths n and m
- Based on the 'dynamic programming' (DP) algorithm
  - Fill the DP matrix using a substitution (mutation) matrix
  - Find the maximal value (score) in the matrix
  - Trace back from the score until a 0 value is reached

Smith-Waterman Algorithm
- Aligning S1 and S2 of lengths n and m using recurrences
- Calculate three possible ways to extend the alignment:
  - by one amino acid (AA) in each sequence
  - by one AA in the first sequence, aligned with a gap in the second
  - by one AA in the second sequence, aligned with a gap in the first
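The three extension cases and the fill / find-maximum / trace-back steps can be sketched in Python as follows. This is a minimal sketch, not the authors' implementation: it assumes a simple match/mismatch score instead of a protein substitution matrix and a linear gap penalty. The values match = +2, mismatch = -1, gap = 1 are assumptions, chosen because they reproduce the maximal score of 10 in the example matrix for S1 = ATCTCGTATGATG and S2 = GTCTATCAC.

```python
def smith_waterman(s1, s2, match=2, mismatch=-1, gap=1):
    """Return (best score, aligned s1, aligned s2) for a local alignment.

    Scoring values are illustrative assumptions: +match / mismatch for a
    substitution and a linear penalty `gap` per gap position.
    """
    n, m = len(s1), len(s2)
    # H[i][j] = best score of a local alignment ending at s1[i-1], s2[j-1]
    H = [[0] * (m + 1) for _ in range(n + 1)]
    best, best_pos = 0, (0, 0)
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            sub = match if s1[i - 1] == s2[j - 1] else mismatch
            H[i][j] = max(
                0,                      # start a new local alignment
                H[i - 1][j - 1] + sub,  # one residue in each sequence
                H[i - 1][j] - gap,      # residue of s1 against a gap in s2
                H[i][j - 1] - gap,      # residue of s2 against a gap in s1
            )
            if H[i][j] > best:
                best, best_pos = H[i][j], (i, j)
    # Trace back from the maximal value until a 0 value is reached
    a1, a2 = [], []
    i, j = best_pos
    while i > 0 and j > 0 and H[i][j] > 0:
        sub = match if s1[i - 1] == s2[j - 1] else mismatch
        if H[i][j] == H[i - 1][j - 1] + sub:
            a1.append(s1[i - 1])
            a2.append(s2[j - 1])
            i, j = i - 1, j - 1
        elif H[i][j] == H[i - 1][j] - gap:
            a1.append(s1[i - 1])
            a2.append("-")
            i -= 1
        else:
            a1.append("-")
            a2.append(s2[j - 1])
            j -= 1
    return best, "".join(reversed(a1)), "".join(reversed(a2))
```

For example, `smith_waterman("ACGT", "AGT")` returns `(5, "ACGT", "A-GT")`: three matches and one gap.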

Smith-Waterman Algorithm
- Example: align S1 = ATCTCGTATGATG and S2 = GTCTATCAC
- [Figure: DP matrix for S1 and S2 with the traceback path; both gap penalty parameters are set to 1; the maximal score is 10]
- Resulting alignment:
    ATCTCGTATGATG
      GTC-TATCAC

Parallel Architectures for Bioinformatics
- Embedded massively parallel accelerators
  - Systola 1024: PC add-on board with 1024 processors (ISATEC, Germany)
  - Fuzion 150: 1536 processors on a single chip (Clearspeed Technology, UK)

Parallel Architectures for Bioinformatics
- Hybrid Computer
  - Combines the SIMD and MIMD paradigms within one parallel architecture
  - Supercomputer performance at low cost
- [Figure: hybrid computer; labels: Systola 1024, high-speed Myrinet switch]

Previous Applications
- Scientific Computing
- Volume Visualization
- Automatic Visual Quality Control
- Cryptography
- Computer Tomography
- Video Compression
- Range of Transforms (Fourier, Wavelet, Hough, Radon)
- Computer Graphics

Architecture of Systola 1024
- Instruction Systolic Array:
  - 32 × 32 mesh of processing elements
  - wavefront instruction execution

Instruction Systolic Array
- [Figure: processor array with row selectors, column selectors and the instruction stream]
- Wavefront instruction execution → fast accumulation operations (e.g. row sum, broadcast, ringshift)

Parallelization of Smith-Waterman
- Matrix cells along a single diagonal are computed in parallel
- Comparison is performed in A+B-1 steps on A PEs
- [Figure: DP matrix of the example with sequence lengths A and B; the cells of one diagonal are assigned to PEs P1, P2, ..., P13]
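The diagonal-wise evaluation can be illustrated with a sequential Python sketch that visits the matrix one anti-diagonal at a time. All cells of a diagonal depend only on the two previous diagonals, so the inner loop is the part that the PEs would execute simultaneously. The scoring values (match +2, mismatch -1, linear gap penalty 1) are assumptions for illustration; the sketch models the schedule, not the Systola or Fuzion instruction streams.

```python
def sw_score_wavefront(a, b, match=2, mismatch=-1, gap=1):
    """Smith-Waterman score computed one anti-diagonal at a time.

    Returns (best score, number of wavefront steps). Scoring values are
    illustrative assumptions.
    """
    A, B = len(a), len(b)
    H = [[0] * (B + 1) for _ in range(A + 1)]
    best, steps = 0, 0
    # Cells with i + j == d depend only on diagonals d-1 and d-2, so each
    # cell of one diagonal could be updated by a different PE in parallel.
    for d in range(2, A + B + 1):
        for i in range(max(1, d - B), min(A, d - 1) + 1):  # "PE i"
            j = d - i
            sub = match if a[i - 1] == b[j - 1] else mismatch
            H[i][j] = max(0,
                          H[i - 1][j - 1] + sub,
                          H[i - 1][j] - gap,
                          H[i][j - 1] - gap)
            best = max(best, H[i][j])
        steps += 1
    return best, steps
```

For the example sequences (A = 13, B = 9) the sketch needs 13 + 9 - 1 = 21 wavefront steps and returns the score 10.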

Mapping onto Systola 1024
- [Figure: query sequence a (a0 ... a1023) distributed over the 32 × 32 array; subject sequences b (b0 ... bk) and c (c0, c1, ...) are shifted through the array]
- a: query sequence (length equal to 1024)
- b: subject sequence
- Subject sequences can be pipelined with only 1 step delay → k steps for a subject sequence of length k
- Efficient routing on the ISA: Row Ringshift and Broadcast

Performance Evaluation
- Scan times in seconds for TrEMBL 14 (351,834 protein sequences) for various query sequence lengths

  Query sequence length          256    512   1024   2048   4096
  Systola 1024 (s)               294    577   1137   2241   4611
    speedup to PIII 850            5      6      6      6      6
  Cluster of 16 Systolas (s)      20     38     73    142    290
    speedup to PIII 850           81     86     91     94     94

- Parallel implementation scales linearly with sequence length and number of PCs
- Computing time dominates data transfer time
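The linear-scaling claim can be checked with a little arithmetic on the scan times reported on this slide. The sketch below is only a reading aid: the helper `cluster_efficiency` is a hypothetical name, and "efficiency" is taken to mean the achieved fraction of the ideal 16-fold gain over a single board.

```python
# Scan times in seconds, copied from the slide
query_lengths = [256, 512, 1024, 2048, 4096]
single_board = [294, 577, 1137, 2241, 4611]   # one Systola 1024
cluster_16 = [20, 38, 73, 142, 290]           # cluster of 16 Systolas


def cluster_efficiency(t_single, t_cluster, boards=16):
    """Fraction of the ideal `boards`-fold speedup that is achieved."""
    return t_single / (boards * t_cluster)


# Efficiency of the 16-board cluster for each query length
efficiencies = [cluster_efficiency(s, c)
                for s, c in zip(single_board, cluster_16)]

# Growth of the single-board scan time each time the query length doubles
growth = [later / earlier
          for earlier, later in zip(single_board, single_board[1:])]
```

With these numbers the cluster efficiency rises from about 0.92 at query length 256 to about 0.99 at length 4096, and each doubling of the query length roughly doubles the single-board scan time.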

Fuzion 150 Architecture
- 0.25-µm, single-chip SIMD architecture
- 1536 PEs @ 200 MHz → 300 GOPS
- 600 GB/s on-chip, 6.4 GB/s off-chip bandwidth
- Multithreading (control units interact via semaphores)
- Developed by Clearspeed Technology (UK) for graphics and network processing
- [Figure: block diagram; labels: linear SIMD array (1536 PEs, each with 2 Kbytes DRAM), FUZION bus, 32-bit EPU (ARC), SIMD controller, instruction fetch, video I/O, display, local memory (1, 2 or 4 channels, 6.4 GB/s), host, AGP, Rambus]

Fuzion 150 Architecture
- [Figure: six blocks (Block 0 to Block 5) of 256 PEs each, PE (0,0) ... PE (5,255), attached to the Fuzion bus and local memory]
- [Figure: a single PE; labels: ALU (8 bits), register file (32 bytes), PE memory (2 KByte DRAM), instructions, block I/O channel, left PE, right PE]

Mapping onto the Fuzion 150
- [Figure: query sequence a (a0 ... a1535) distributed over Block 0 to Block 5; subject sequences b (b0 ... bk) and c (c0, c1, ...) are shifted through the array]
- a: query sequence (length equal to 1536)
- b: subject sequence
- No fast global communication → 2-step local communication
- Subject sequence can be pipelined with only step delay

Mapping onto the Fuzion 150
- Reduce communication time
  - Assign 16 AAs to each PE → query lengths up to 24576 AAs can be processed within a single pass
- Partitioning for query lengths < 24576:
  - each subarray of corresponding size computes the alignment of the same query sequence with different subject sequences
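The partitioning arithmetic can be sketched as follows. The slides give the 1536 PEs and the 16 amino acids per PE; the rule for sizing a subarray (round the query up to whole PEs, then fit as many subarrays as possible) is an assumption made for illustration, and `partition` is a hypothetical helper name.

```python
import math

NUM_PES = 1536     # PEs on the Fuzion 150 (from the slides)
AAS_PER_PE = 16    # amino acids of the query held by each PE (from the slides)


def partition(query_length):
    """Return (PEs per subarray, number of subarrays) for one query.

    Assumption: a subarray uses ceil(query_length / 16) neighbouring PEs
    and the array is filled with as many such subarrays as fit. Each
    subarray would then process a different subject sequence.
    """
    if not 0 < query_length <= NUM_PES * AAS_PER_PE:
        raise ValueError("query does not fit in a single pass")
    pes_per_subarray = math.ceil(query_length / AAS_PER_PE)
    return pes_per_subarray, NUM_PES // pes_per_subarray
```

Under this assumption a query of length 256 occupies 16 PEs, so 96 subject sequences could be compared concurrently, while a query of length 24576 uses the whole array.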

Performance Evaluation
- Scan times in seconds for TrEMBL 14 (351,834 protein sequences) for various query sequence lengths

  Query sequence length          256    512   1024   2048   4096
  Fuzion 150 (s)                  12     22     42     82    162
    speedup to PIII 850          136    151    157    163    165

- Parallel implementation scales linearly with sequence length
- Computing time dominates data transfer time

Performance Evaluation
- Normalized time comparison for a 10 Mbase search on different parallel architectures with different query lengths
- 4× faster than 16K-PE MasPar
- 6× faster than Kestrel
- 5× faster than SAMBA (special-purpose 3-board architecture)

Performance Evaluation for Full Genome Comparison
- Scan times for pairwise protein sequence comparison of Mycobacterium tuberculosis and Escherichia coli

  Architecture               Scan time   Speedup to PIII 850
  Cluster of Systola 1024    17 min      79
  Fuzion 150                 11 min      133

- Comparison has to be performed for several parameters (substitution matrices, gap penalties)
- Mycobacterium smegmatis will be published later this year
- Results of the comparison will be interpreted with the Centre for Molecular Cell Biology, NUS, Singapore

Conclusions and Future Work
- Demonstrated how fine-grained parallel architectures can be applied efficiently for comparative genomics
- Significant runtime savings for genome comparisons and database searching → more discovery is possible at a good price-performance ratio
- Other computational biology applications of interest to us:
  - ClustalW
  - HMM
  - pattern matching algorithms, such as inverted repeats, short tandem repeats, etc.
- Availability of accelerators as a special resource in a Grid environment

Contents
- Protein Structure
- Protein Structure Prediction
- Approach based on Local Protein Structure
- Refinements
- Conclusions and Future Work

Protein Structure
- Proteins are large molecules composed of smaller molecules called amino acids
- There are 20 kinds of amino acids found in natural proteins
- All share a common structure
- [Figure: general structure of an amino acid; labels: alpha carbon (with attached hydrogen), amine group, carboxyl group, side chain R]

Protein Structure

From Primary to Tertiary Structure
- A protein's 3D shape is determined by its primary amino acid sequence (Anfinsen, 1963)
- Predicting tertiary structure from amino acid sequence is an unsolved problem
  - Difficult to model the energies that stabilize a protein molecule
  - Conformational search space is enormous

Prediction Methods
- Given an amino acid sequence:
  - search a set of known folds by aligning the sequence with a template representing each fold
  - predict the fold that gets the best-scoring alignment
- [Figure: labels: target amino acid sequence, template fold library, template amino acid sequence; example sequences YLAADTYK and FISSETCNMEPSSYVTGLIRKN; target/template score: 7212]

Prediction Methods
- This method is very effective when target and template have >30% sequence identity
- Approximately 1/3 of protein sequences can be assigned folds and modeled this way
- Our aim is to help determine tertiary structures in cases where no matching sequence can be found

Local structure and prediction
- What is local structure?
  - describes the environment of an amino acid
  - an amino acid's relationship to its neighbors
- We use this information to predict structure from the primary sequence

Dihedral Angles
- The 6 atoms in each peptide unit lie in the same plane
- φ and ψ are free to rotate
- The structure of a protein is almost totally determined if all angles φ and ψ are known
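For reference, a dihedral angle is the angle between two planes defined by four consecutive atoms. In the usual convention φ is measured over the atoms C(i-1), N, Cα, C and ψ over N, Cα, C, N(i+1). The sketch below computes such an angle from four 3D coordinates in plain Python; the function name and the tuple representation of points are assumptions for illustration, not part of the presented method.

```python
import math


def _sub(u, v):
    return (u[0] - v[0], u[1] - v[1], u[2] - v[2])


def _dot(u, v):
    return u[0] * v[0] + u[1] * v[1] + u[2] * v[2]


def _cross(u, v):
    return (u[1] * v[2] - u[2] * v[1],
            u[2] * v[0] - u[0] * v[2],
            u[0] * v[1] - u[1] * v[0])


def dihedral(p0, p1, p2, p3):
    """Dihedral angle in degrees (range -180..180) defined by four points."""
    b0 = _sub(p0, p1)
    b1 = _sub(p2, p1)            # central bond p1 -> p2
    b2 = _sub(p3, p2)
    norm = math.sqrt(_dot(b1, b1))
    b1 = (b1[0] / norm, b1[1] / norm, b1[2] / norm)
    # Project b0 and b2 onto the plane perpendicular to the central bond
    v = _sub(b0, tuple(_dot(b0, b1) * c for c in b1))
    w = _sub(b2, tuple(_dot(b2, b1) * c for c in b1))
    x = _dot(v, w)
    y = _dot(_cross(b1, v), w)
    return math.degrees(math.atan2(y, x))
```

Four points in a cis arrangement give 0 degrees, a trans arrangement gives 180 degrees (up to sign).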

Idea of our Approach
- Stiff vs. free
- Local predictability → database of sub-chain structures
- Reducing the number of degrees of freedom by 10 reduces the computation time significantly in combination with a global optimization algorithm (e.g. GA or SA)
- [Figure: backbone and side chains with the atoms Cα, C and N and the angles φ and ψ]

Classification of Dihedral Angles
- [Figure: processing pipeline; selected PDB structures → dihedral angle extraction → histogram for each amino acid pair → classification as stiff, multiple or flexible]

Classification of Dihedral Angles
- [Figure: angle histograms (frequency vs. dihedral angle) for ALA-ALA (stiff), LEU-ARG (multiple) and GLY-ILE (flexible)]

Classification of Dihedral Angles
- [Figure: processing pipeline; selected PDB structures → dihedral angle extraction → histogram for each amino acid pair → classification as stiff, multiple or flexible]
- Stiff angles: determine the mean value
- Multiple angles: determine a sequence of mean values, one for each peak, in decreasing order of peak size
- Flexible angles: determine the mean value and mark as flexible
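The classification step can be sketched as a simple peak count on a histogram. The slides do not state the actual criteria, so everything that decides the class here is an assumption: the 10-degree bin width, the rule that a bin belongs to a peak when it holds more than 25% of the samples, and the merging of adjacent peak bins. Wrap-around at ±180 degrees is ignored for brevity.

```python
def classify_angles(angles, bin_width=10, peak_fraction=0.25):
    """Classify a list of dihedral angles (degrees, -180..180).

    Returns (label, means) with label 'stiff', 'multiple' or 'flexible'.
    Bin width, peak threshold and peak definition are illustrative
    assumptions, not the criteria used in the slides.
    """
    if not angles:
        raise ValueError("no angles given")
    nbins = 360 // bin_width
    bins = [[] for _ in range(nbins)]
    for a in angles:
        bins[int((a + 180) // bin_width) % nbins].append(a)
    threshold = peak_fraction * len(angles)
    # A peak is a maximal run of adjacent bins that each exceed the threshold
    peaks, current = [], []
    for b in bins:
        if len(b) > threshold:
            current.extend(b)
        elif current:
            peaks.append(current)
            current = []
    if current:
        peaks.append(current)
    if not peaks:
        # Flexible: report the overall mean, the label marks it as flexible
        return "flexible", [sum(angles) / len(angles)]
    peaks.sort(key=len, reverse=True)   # decreasing peak size
    means = [sum(p) / len(p) for p in peaks]
    return ("stiff" if len(peaks) == 1 else "multiple"), means
```

For instance, `classify_angles([-60] * 6 + [120] * 4)` yields `("multiple", [-60.0, 120.0])`, with the larger peak listed first.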

Prediction based on Classification
- Given a sequence of amino acids, find the subsequences in which all angles are of type stiff
- Predict the structure of these subsequences using the mean values of the corresponding histograms
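Finding the all-stiff subsequences amounts to a single scan over consecutive residue pairs. In the sketch below the sequence is a list of residue names and `pair_class` is a lookup table from a residue pair to its class; both the representation and the table contents are hypothetical, and pairs missing from the table are treated as not stiff.

```python
def stiff_subsequences(sequence, pair_class):
    """Return maximal subsequences whose consecutive pairs are all stiff.

    `sequence` is a list of residue names; `pair_class` maps a pair (A, B)
    to 'stiff', 'multiple' or 'flexible' (hypothetical lookup table).
    """
    runs, start = [], 0
    for i in range(len(sequence) - 1):
        pair = (sequence[i], sequence[i + 1])
        if pair_class.get(pair) != "stiff":
            if i > start:                 # run contains at least one pair
                runs.append(sequence[start:i + 1])
            start = i + 1
    if len(sequence) - 1 > start:         # run that reaches the end
        runs.append(sequence[start:])
    return runs
```

Each returned run could then be assigned the mean angle values of its pairs to build the predicted local structure.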

Prediction based on Classification
- [Figure: part of a protein predicted with this method; backbone of a helix, original structure on the left, predicted structure on the right]
- Successfully predicted certain stiff structures of subsequences up to a length of 15

Refinement of the method
- For multiple angles:
  - consider sequences of length 3 or 4:
    - extract sequences (C,A,B,D) and determine the histogram of the angles φ and ψ related to the peptide chain between A and B
    - if the histogram of an angle for amino acids (A,B) is multiple, check whether the angle for (C,A,B,D) is stiff
    - with longer subsequences, the number of occurrences of these sequences drops dramatically

Refinement of the method
- For multiple angles:
  - if an amino acid sequence has only a small number of multiple edges, it is possible to try all combinations of possible peaks
  - many combinations lead to collisions in part of the protein and can thus be eliminated
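Trying all peak combinations is a Cartesian product over the candidate values of each multiple angle, followed by a filter. The sketch below shows only that enumeration; `collides` is a caller-supplied, hypothetical predicate standing in for a geometric collision test, which the slides do not specify.

```python
from itertools import product


def candidate_conformations(peak_choices, collides):
    """Enumerate all peak combinations and drop those that collide.

    `peak_choices` holds, for each multiple angle, the list of its peak
    mean values; `collides` is a hypothetical predicate reporting whether
    a complete angle assignment leads to a collision.
    """
    return [combo for combo in product(*peak_choices)
            if not collides(combo)]
```

The number of combinations is the product of the peak counts, which is why the approach is restricted to sequences with only a few multiple angles.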

Conclusion and Future Work
- Presented a method to predict stiff structures of subsequences up to a certain length
- Presented a refinement of the method to handle multiple angles
- How to handle flexible angles?
- Use the local prediction as input for a global optimization method, e.g. based on Simulated Annealing
