Parallel System for BLAST


1 Parallel System for BLAST
Department of Computer Engineering, Chulalongkorn University
Prabhas Chongstitvatana, Warin Wattanapornprom

2 Introduction: What is BLAST?
BLAST (Basic Local Alignment Search Tool) finds matches between sequences of genetic material or proteins. It helps scientists make inferences about the structures and functions of proteins and screen new sequences for further investigation.

3 Introduction: Problem found
Searching a large number of queries against a large database takes a very long time. Genome databases are enormous and double in size every 1.3 years.

4 Background Knowledge DNA

5 Background Knowledge: Gene Finding
Search by content
Search by signal
Search by sequence similarity (BLAST)

6 Background Knowledge BLAST

7 Background Knowledge BLAST Algorithm

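8 Background Knowledge: BLAST Algorithm
blastn -> compares a DNA (nucleotide) query against a nucleotide database
blastx -> translates a nucleotide query in all 6 reading frames and compares the translations against a protein database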
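The "6 reading frames" mentioned for blastx can be illustrated with a minimal sketch: three codon offsets on the given strand and three on its reverse complement. The function names below are hypothetical and are not part of any BLAST distribution; the sketch only enumerates the frames and does not translate codons into amino acids.

```python
from typing import List

# Hypothetical helpers for illustration only; not BLAST code.
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(dna: str) -> str:
    """Return the reverse complement of an upper-case DNA string."""
    return dna.translate(COMPLEMENT)[::-1]


def six_reading_frames(dna: str) -> List[str]:
    """Return six frames: offsets 0-2 on each strand,
    each trimmed to a whole number of codons."""
    frames = []
    for strand in (dna, reverse_complement(dna)):
        for offset in range(3):
            usable = len(strand) - offset
            frames.append(strand[offset:offset + usable - usable % 3])
    return frames


for frame in six_reading_frames("ATGCCGTA"):
    print(frame)
```

Each frame would then be translated codon by codon before the protein comparison; that step is omitted here.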

9 Background Knowledge BLAST Algorithm
[Figure: nucleotide sequences shown base by base; the fragments TTCTCGGACCTGGAGATTCACAG and AAGAGCCTGGACCTCTAAGTGTC each appear more than once]

10 Performance in memory search limited memory faster CPU

11 Cache memory
Small, fast-access memory embedded on the same chip as the CPU. It stores data and program instructions and lets the CPU access data up to 10 times faster.

12 Preliminary Study

13 Preliminary Study
Columns: No. of cuts | DB part no. | DB size (MB) | Time used (min) | Time used to concatenate (min) | Total time used (min)
Values: 1 234 70 2 109 17 38 125 20 3 100 16 1.5 38.5 15 34

14 Preliminary Study: Speedup of the Improved System
SpeedUp = Time(Original) / Time(Improved) = 351 / (180 + 5) ≈ 1.9 times
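The speedup arithmetic on this slide can be checked with a short sketch. The slide does not say what the two terms 180 and 5 stand for, so the code treats them simply as the two components of the improved system's time; the function name is hypothetical.

```python
def speedup(original_min: float, improved_min: float) -> float:
    """Speedup = time of the original system / time of the improved system."""
    return original_min / improved_min


# Figures taken from the slide; their individual meanings are not stated there.
original_min = 351
improved_parts_min = (180, 5)

result = speedup(original_min, sum(improved_parts_min))
print(round(result, 1))  # 1.9
```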

15 Parallel BLAST
Separate databases
Separate queries
Hybrid

16 Parallel System for BLAST
Parallel Query Improving the performance of BLAST in a Memory Limited Environment

17 Parallel System for BLAST

18 Parallel System for BLAST
Considerations Compatibility Scalability Fault-tolerance Flexibility

19 Parallel System for BLAST
Processes
Preprocessing: split the database, distribute the parts, and create an index for each of them; split the queries
Distributing queries: distribute the split queries to each node
Postprocessing: merge the search outputs together
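The query-splitting and merging steps can be sketched as follows. This is a minimal illustration, assuming the queries arrive as FASTA text and are dealt out round-robin; the slides do not say how the actual system divides the queries or merges the outputs, and all function names are hypothetical.

```python
from typing import List


def split_fasta(text: str) -> List[str]:
    """Split FASTA text into records; each record starts with a '>' header line."""
    records, current = [], []
    for line in text.splitlines():
        if line.startswith(">") and current:
            records.append("\n".join(current))
            current = []
        if line.strip():
            current.append(line)
    if current:
        records.append("\n".join(current))
    return records


def distribute(records: List[str], n_parts: int) -> List[List[str]]:
    """Deal records round-robin into n_parts lists (an assumed policy)."""
    parts: List[List[str]] = [[] for _ in range(n_parts)]
    for i, record in enumerate(records):
        parts[i % n_parts].append(record)
    return parts


def merge_outputs(outputs: List[str]) -> str:
    """Postprocessing stand-in: concatenate per-chunk outputs in order."""
    return "\n".join(outputs)


queries = ">q1\nACGT\n>q2\nTTGA\nCC\n>q3\nGG\n"
chunks = distribute(split_fasta(queries), 2)
print(chunks)
```

Round-robin dealing keeps chunk sizes within one record of each other, which is one simple way to balance work when query lengths are similar.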

20 Parallel System for BLAST
Data Structure
[Figure: a table with dimensions Max Clients by Number of databases (DB1, DB2, ..., DBn)]

21 Parallel System for BLAST
Data Structure
[Figure: a table with dimensions Number of Sequences (Sequence1, Sequence2, ..., Sequencen) by Number of databases (DB1, DB2, ..., DBn)]

22 Parallel System for BLAST
BLASTServer

23 Parallel System for BLAST
BLASTClient

24 Parallel System for BLAST
[Animation: tables of query sequences Seq1-Seq6 whose entries are client IDs (CID01-CID06) or -1, redrawn at each step. Captions:]
BLASTClients CID01-CID05 register in turn.
BLASTClient CID06 registers and is assigned a task.
BLASTServer assigns tasks to each client.
Tasks for DB1 have been finished; BLASTClients CID05 and CID06 are assigned a new database.
Each BLASTClient finishes its task and requests a new one.
Tasks for DB2 and DB3 have been finished.
BLASTClient CID01 disconnects from the system.
Each BLASTClient finishes its task and requests a new one.
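The register, assign, finish, and disconnect steps in this walkthrough can be sketched as a small task table. This is an illustration under stated assumptions, not the BLASTServer implementation: it assumes that -1 marks a cell with no client, that tasks are handed out database by database, and that a disconnected client's unfinished tasks return to the pool. The class and method names are hypothetical.

```python
from typing import Dict, List, Optional, Set, Tuple, Union

UNASSIGNED = -1  # assumption: the slide's -1 entries mark cells with no client

Task = Tuple[str, str]  # (database part, query sequence)


class TaskTable:
    """Tracks which client holds each (database, sequence) search task."""

    def __init__(self, databases: List[str], sequences: List[str]) -> None:
        # Database-major order, so one database is exhausted before the next.
        self.owner: Dict[Task, Union[str, int]] = {
            (db, seq): UNASSIGNED for db in databases for seq in sequences
        }
        self.done: Set[Task] = set()

    def request(self, client_id: str) -> Optional[Task]:
        """Give the client the first unassigned task, or None if none is left."""
        for task, owner in self.owner.items():
            if owner == UNASSIGNED:
                self.owner[task] = client_id
                return task
        return None

    def finish(self, client_id: str, task: Task) -> None:
        """Mark a task as done if this client actually holds it."""
        if self.owner.get(task) == client_id:
            self.done.add(task)

    def disconnect(self, client_id: str) -> None:
        """Return the client's unfinished tasks to the pool."""
        for task, owner in list(self.owner.items()):
            if owner == client_id and task not in self.done:
                self.owner[task] = UNASSIGNED


table = TaskTable(["DB1", "DB2"], ["Seq1", "Seq2"])
first = table.request("CID01")
table.disconnect("CID01")          # CID01 leaves before finishing
print(table.request("CID02"))      # the released task is handed out again
```

Releasing tasks on disconnect is what gives this sketch its fault tolerance: no task is lost when a client drops out, at the cost of repeating that client's partial work.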

25 Performance
No. of BLASTClients   Preprocessing (min)   Search (min)   Postprocessing (min)   Total (min)   Speedup (times)
1                     5                     421            10                     436           1.84
2                     5                     191            10                     206           3.9
3                     5                     122            10                     137           5.8
4                     5                     88             10                     103           7.8
5                     5                     78             10                     93            8.6
6                     5                     65             10                     80            -
7                     5                     58             10                     73            11
8                     5                     51             10                     66            12

26 Performance

27 Performance

28 Performance

29 Performance

30 Performance

31 Conclusion
The superlinear speedup is gained by eliminating swapping time.
The performance of the system is limited because preprocessing and postprocessing are still sequential tasks.
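The limit imposed by the sequential steps can be made concrete with the figures from the Performance slide. The sketch below assumes that the 5-minute preprocessing and 10-minute postprocessing times apply to every client count (the totals on that slide equal the search time plus 15 minutes in each row) and that the 78-minute search time belongs to 5 clients. The names are hypothetical.

```python
# Minutes, taken from the Performance slide (see the assumptions above).
PRE_MIN, POST_MIN = 5, 10
SEARCH_MIN = {1: 421, 2: 191, 3: 122, 4: 88, 5: 78, 6: 65, 7: 58, 8: 51}


def total_time(clients: int) -> int:
    """Total run time: sequential preprocessing + parallel search + sequential postprocessing."""
    return PRE_MIN + SEARCH_MIN[clients] + POST_MIN


def sequential_share(clients: int) -> float:
    """Fraction of the total run time spent in the sequential steps."""
    return (PRE_MIN + POST_MIN) / total_time(clients)


for n in SEARCH_MIN:
    print(n, total_time(n), f"{sequential_share(n):.1%}")
```

With one client the sequential steps are about 3% of the run; with eight they are about 23%, and no number of clients can bring the total below the 15 sequential minutes.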

32 Suggestions for further study
Support more BLAST algorithms and parameters
Add a recovery system in case BLASTServer or SharedStorage crashes
Estimate the search time
Improve performance by eliminating the sequential tasks
Distribute databases dynamically during search

33 More Information Master thesis: Warin Wattanapornprom
A parallel processing system for a BLAST program Faculty of Engineering, Chulalongkorn University, 2002 ISBN

34 Thank you for your attention

