A Study of GeneWise with the Drosophila Adh Region Asta Gindulyte CMSC 838 Presentation Authors: Yi Mo, Moira Regelson, and Mike Sievers Paracel Inc.,


1 A Study of GeneWise with the Drosophila Adh Region Asta Gindulyte CMSC 838 Presentation Authors: Yi Mo, Moira Regelson, and Mike Sievers Paracel Inc., Pasadena, CA

2 CMSC 838T – Presentation Motivation
- Genome annotation
  - Extraction of biologically relevant knowledge from raw genomic sequence data
- Need faster genome annotation methods
  - DNA sequences are very long (millions of nucleotides)
  - Current methods are computationally too expensive
- Approach/Solution
  - GeneMatcher2 hardware acceleration of GeneWise

3 CMSC 838T – Presentation Outline
- Motivation
  - Genome annotation
- GeneMatcher2
  - Design
  - ASIC hardware
- Comparison
  - GeneWise algorithm
  - HalfWise algorithm
  - Performance (time, precision)
- Observations
  - Performance improvement
  - Cost effectiveness

4 CMSC 838T – Presentation Approach
- Problem: make GeneWise run faster
  - "Embarrassingly parallel" algorithm
  - Computationally too expensive even when run in parallel on PCs
- Paracel's solution: hardware acceleration
  - Don't change the algorithm
  - Produce an implementation on the GeneMatcher2 supercomputer that works as much like the original software as possible
  - 6LITE algorithm, now also in Wise2
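To make "embarrassingly parallel" concrete: each profile-versus-genome comparison is independent of every other, so the work can be divided among workers with no communication between them. The sketch below is only an illustration of that independence. `score_profile` is a hypothetical stand-in (a plain substring count), not the GeneWise comparison, and a thread pool stands in for a cluster of PCs.

```python
from concurrent.futures import ThreadPoolExecutor


def score_profile(profile, genome):
    """Hypothetical stand-in for one profile-vs-genome comparison."""
    return genome.count(profile)


def search_all(profiles, genome, workers=4):
    """Score every profile independently and collect the results by profile."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        # No task needs another task's result, so the order of execution
        # does not matter; map() still returns scores in input order.
        scores = pool.map(lambda p: score_profile(p, genome), profiles)
        return dict(zip(profiles, scores))


if __name__ == "__main__":
    print(search_all(["ACG", "TTT", "GGG"], "ACGTTTACG"))
```

Because the tasks never interact, adding workers divides the wall-clock time but not the total amount of computation, which is the cost the slide describes as still too high on PCs.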

5 CMSC 838T – Presentation GeneMatcher Architecture

6 CMSC 838T – Presentation ASIC Hardware
- ASIC: application-specific integrated circuit
  - Designed to speed up dynamic programming algorithms
    - (could be used for Smith-Waterman)
  - Each ASIC board has 3072 processors
  - System has up to 9 boards
  - Cost per board around $40K
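As a reminder of the kind of dynamic programming the slide refers to, here is a minimal Smith-Waterman local alignment score in plain Python. It is a generic textbook sketch with arbitrarily chosen scoring parameters (`match`, `mismatch`, `gap`), not the code that runs on the ASIC. The last lines simply multiply out the board figures from the slide.

```python
def smith_waterman_score(a, b, match=2, mismatch=-1, gap=-2):
    """Return the best local alignment score of a against b (linear gap penalty)."""
    prev = [0] * (len(b) + 1)  # previous row of the DP matrix
    best = 0
    for i in range(1, len(a) + 1):
        cur = [0] * (len(b) + 1)
        for j in range(1, len(b) + 1):
            diag = prev[j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            # Each cell depends only on its left, upper and upper-left neighbours.
            cur[j] = max(0, diag, prev[j] + gap, cur[j - 1] + gap)
            best = max(best, cur[j])
        prev = cur
    return best


# Capacity implied by the slide: 3072 processors per board, up to 9 boards.
PROCESSORS_PER_BOARD = 3072
MAX_BOARDS = 9
MAX_PROCESSORS = PROCESSORS_PER_BOARD * MAX_BOARDS  # 27648

if __name__ == "__main__":
    print(smith_waterman_score("TTACGTTT", "GGACGGG"))
    print(MAX_PROCESSORS)
```

Cells on the same anti-diagonal of the matrix do not depend on one another, which is the general property that lets parallel hardware fill many cells at once. How GeneMatcher2 actually schedules the work is not described in the slides.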

7 CMSC 838T – Presentation GeneWise Algorithm
- Perform a search of genomic DNA sequence data using a protein HMM
  - Build HMMs from protein families
  - Scan genome using HMM
    - Look for start codon
    - "GT" sequence signals possible 5' splice site
    - "AG" sequence signals possible 3' splice site
  - Dynamic programming used in the scanning process
    - Obtain probability of the most likely path in HMM generating the sequence
    - Obtain alignment by backtracking
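The last two points on this slide (most likely path, then backtracking) describe the Viterbi procedure. The sketch below runs it on a toy two-state HMM to show the mechanics. The states, the probabilities, and the names `STATES`, `LOG_START`, `LOG_TRANS` and `LOG_EMIT` are all invented for illustration; the real GeneWise model is considerably larger and is not reproduced here.

```python
import math


def viterbi(seq, states, log_start, log_trans, log_emit):
    """Return (log-probability, state path) of the most likely path for a non-empty seq."""
    # score[i][s]: best log-probability of a path emitting seq[:i + 1] and ending in s
    score = [{s: log_start[s] + log_emit[s][seq[0]] for s in states}]
    back = [{}]  # back[i][s]: predecessor of s on that best path
    for i in range(1, len(seq)):
        score.append({})
        back.append({})
        for s in states:
            prev = max(states, key=lambda p: score[i - 1][p] + log_trans[p][s])
            score[i][s] = score[i - 1][prev] + log_trans[prev][s] + log_emit[s][seq[i]]
            back[i][s] = prev
    # Backtrack from the best final state to recover the path.
    last = max(states, key=lambda s: score[-1][s])
    path = [last]
    for i in range(len(seq) - 1, 0, -1):
        path.append(back[i][path[-1]])
    path.reverse()
    return score[-1][last], path


def _logs(probs):
    return {k: math.log(v) for k, v in probs.items()}


# Toy model: every number below is made up for this example.
STATES = ("exon", "intron")
LOG_START = _logs({"exon": 0.5, "intron": 0.5})
LOG_TRANS = {
    "exon": _logs({"exon": 0.9, "intron": 0.1}),
    "intron": _logs({"exon": 0.1, "intron": 0.9}),
}
LOG_EMIT = {
    "exon": _logs({"A": 0.1, "C": 0.4, "G": 0.4, "T": 0.1}),
    "intron": _logs({"A": 0.4, "C": 0.1, "G": 0.1, "T": 0.4}),
}

if __name__ == "__main__":
    log_p, path = viterbi("GGGGAAAA", STATES, LOG_START, LOG_TRANS, LOG_EMIT)
    print(log_p, path)
```

Working in log space avoids numerical underflow on long sequences, which matters when the input is millions of nucleotides long.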

8 CMSC 838T – Presentation GeneWise model on GeneMatcher2

9 CMSC 838T – Presentation HalfWise Algorithm
- Reduce cost by running BLAST to select HMMs with possible hits
- Use these HMMs with GeneWise database search and sequence alignment algorithm
- May miss some genes due to BLAST misses
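The filter-then-search structure described on this slide can be sketched as follows. Everything here is a toy under stated assumptions: `shares_kmer` is a crude exact k-mer test standing in for BLAST, `toy_score` is a hypothetical placeholder for the expensive GeneWise comparison, and profiles are represented by short consensus strings rather than HMMs.

```python
def shares_kmer(consensus, target, k):
    """Cheap prefilter (stand-in for BLAST): does any exact k-mer of consensus occur in target?"""
    return any(consensus[i:i + k] in target for i in range(len(consensus) - k + 1))


def toy_score(consensus, target):
    """Placeholder for the expensive comparison: number of consensus 3-mers found in target."""
    return sum(consensus[i:i + 3] in target for i in range(len(consensus) - 2))


def halfwise_style_search(profiles, target, expensive_score, k=4):
    """Run expensive_score only on the profiles that pass the cheap prefilter."""
    candidates = {name: c for name, c in profiles.items() if shares_kmer(c, target, k)}
    return {name: expensive_score(c, target) for name, c in candidates.items()}


if __name__ == "__main__":
    target = "ACGTACGTTTGACCA"
    profiles = {"famA": "ACGTAC", "famB": "TTGACC", "famC": "GGGCCC", "famD": "ACGAAC"}
    print(halfwise_style_search(profiles, target, toy_score))
    # famD never reaches the expensive step, although scoring it directly is non-zero:
    print(toy_score(profiles["famD"], target))
```

In this toy run `famD` has a weak similarity to the target that the prefilter cannot see, so it is dropped before the expensive step. That is the trade-off the slide names: fewer expensive comparisons at the risk of missed hits.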

10 CMSC 838T – Presentation Evaluation
- Test data set
  - A genomic DNA sequence contig of about 2.9 Mb from the Drosophila Adh region
  - Focus on finding all Pfam (Protein families database of alignments and HMMs) protein profile-HMMs that occur in the Adh genomic sequence

11 CMSC 838T – Presentation Evaluation: Speed

12 CMSC 838T – Presentation Evaluation: Score

13 CMSC 838T – Presentation Evaluation: Sensitivity and Specificity

14 CMSC 838T – Presentation Observations
- Performance improvement
  - The speedup is several orders of magnitude
    - Makes real target applications possible
  - Accuracy might be improved over the HalfWise algorithm
- Cost effectiveness
  - System used costs around $500K
  - $500K worth of Linux PCs (500 processors at $1K each) would run about 10 times slower
- Weaknesses
  - Cannot modify the algorithm
  - Not enough data to assess scalability
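The cost-effectiveness figures on this slide can be worked through in a few lines. The inputs are the slide's own round numbers. The final step assumes that a PC cluster scales linearly with the number of machines, which is an assumption made here for illustration and which the slide itself says cannot be assessed from the available data.

```python
SYSTEM_COST = 500_000  # approximate cost of the accelerated system (from the slide)
PC_COST = 1_000        # cost per Linux PC processor (from the slide)
PC_SLOWDOWN = 10       # an equal-cost PC cluster runs about 10 times slower (from the slide)

# Number of PCs the same budget buys.
pcs_for_same_budget = SYSTEM_COST // PC_COST

# ASSUMPTION: linear scaling with the number of PCs.
pcs_to_match_speed = pcs_for_same_budget * PC_SLOWDOWN
cluster_cost_to_match_speed = pcs_to_match_speed * PC_COST

if __name__ == "__main__":
    print(pcs_for_same_budget)
    print(pcs_to_match_speed)
    print(cluster_cost_to_match_speed)
```

Under that assumption, matching the accelerated system's speed with PCs would take roughly ten times the budget, which is another way of stating the slide's claim of about ten times more throughput per dollar.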

