Distributed Protein Structure Analysis, by Jeremy S. Brown and Travis E. Brown


1 Distributed Protein Structure Analysis, by Jeremy S. Brown and Travis E. Brown

2 The Problem
• An exhaustive search of proteins against a known database
• Each string is between 400 and 600 characters long
• Comparing 10,000 strings against 1,000,000 random strings would take 44 days on a 1 GHz processor
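The 44-day figure can be turned into a per-comparison cost with a little arithmetic. The sketch below uses only the numbers from this slide; the variable names are illustrative, and it assumes one "comparison" means scoring one query string against one database string.

```python
# Back-of-the-envelope cost per comparison, using only the slide's figures.
SECONDS_PER_DAY = 86_400

queries = 10_000          # strings to search for
database = 1_000_000      # strings to search against
days = 44                 # reported single-machine run time

comparisons = queries * database            # every query against every entry
seconds = days * SECONDS_PER_DAY
rate = comparisons / seconds                # comparisons per second
micros_per_comparison = 1e6 / rate          # microseconds per comparison

print(f"{comparisons:.0e} comparisons in {seconds:,} s")
print(f"about {rate:,.0f} comparisons/s, {micros_per_comparison:.0f} microseconds each")
```

Under these assumptions the machine manages roughly 2,600 comparisons per second, or a little under 400 microseconds per pair of 400 to 600 character strings.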

3 Why an exhaustive search?
• Initial intent was to analyze proteins to determine 3-dimensional structure
• Exhaustive search is required to ensure that the match found is the best match

4 Solution
• Distribute the search among many PCs to obtain an answer faster
• This solution raises problems of its own, however

5 How to Distribute
• Distributing search strings is not enough
• Also distribute the search space
• Must find an efficient way to distribute 1 GB of data without duplication
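One simple way to split a search space without duplication is to hand each client a contiguous, non-overlapping range of database indices. The slides do not say how the authors partitioned the data, so the following is only a sketch; `partition` and its parameters are hypothetical names.

```python
def partition(n_items: int, n_clients: int) -> list[range]:
    """Split indices 0..n_items-1 into n_clients disjoint, contiguous ranges.

    Sizes differ by at most one, so no client gets a much larger share.
    """
    base, extra = divmod(n_items, n_clients)
    ranges = []
    start = 0
    for i in range(n_clients):
        size = base + (1 if i < extra else 0)  # first `extra` clients get one more
        ranges.append(range(start, start + size))
        start += size
    return ranges


slices = partition(1_000_000, 3)
for client, r in enumerate(slices):
    print(f"client {client}: entries {r.start}..{r.stop - 1} ({len(r)} items)")
```

With the search space split this way, every search string still has to be scored against every slice, so the per-slice best matches must be merged afterwards to recover the overall best match.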

6 Program Details
• Client/Server architecture
• Uses a proprietary protocol over TCP/IP to distribute data
• Server uses an SQL database to store a list of ‘jobs’ and ‘known’ sequences

7 Program Details
• Server issues a single ‘job’ to each client upon request
• Client may also request a batch of data for comparison
• Server marks which data has been sent to clients and avoids resending that data to new clients
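The bookkeeping described on this slide can be sketched with an in-memory dispatcher. This is not the authors' server, which used an SQL database and a proprietary TCP/IP protocol; the `Dispatcher` class, its method names, and the identifiers below are all hypothetical.

```python
from collections import deque


class Dispatcher:
    """Hands out one job per request and never sends the same batch twice."""

    def __init__(self, jobs, batch_ids):
        self._pending_jobs = deque(jobs)         # search strings not yet issued
        self._unsent_batches = deque(batch_ids)  # database batches not yet sent
        self.sent = {}                           # batch_id -> client that holds it

    def next_job(self, client_id):
        """Return a single job, or None when no jobs remain."""
        return self._pending_jobs.popleft() if self._pending_jobs else None

    def next_batch(self, client_id):
        """Return an unsent batch and mark it as sent, or None if all are out."""
        if not self._unsent_batches:
            return None
        batch_id = self._unsent_batches.popleft()
        self.sent[batch_id] = client_id
        return batch_id


server = Dispatcher(jobs=["query-1", "query-2"], batch_ids=[0, 1, 2])
first = server.next_batch("client-a")
second = server.next_batch("client-b")   # a new client gets different data
job = server.next_job("client-a")
print(first, second, job, server.sent)
```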

8 The Server

9 The Client

10 Problems
• Server is slow at updating its database
• This cost is only incurred once per client, however

11 Performance Analysis
• 1 client = 44 days (best algorithm)
• 2 clients = 26 days
• 3 clients = 19 days
• Adding clients reduces run time, with a speedup that is almost linear, though distribution is expensive
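Speedup and parallel efficiency follow directly from the three run times on this slide. The sketch below uses the standard definitions (speedup = single-client time / n-client time, efficiency = speedup / n); the variable names are illustrative and no figures beyond the slide's are assumed.

```python
# Run times in days, keyed by number of clients (figures from the slide).
days = {1: 44, 2: 26, 3: 19}

baseline = days[1]
results = {}
for n_clients, t in days.items():
    speedup = baseline / t
    efficiency = speedup / n_clients
    results[n_clients] = (speedup, efficiency)
    print(f"{n_clients} client(s): speedup {speedup:.2f}x, efficiency {efficiency:.0%}")
```

By this calculation two clients give a speedup of about 1.69x (roughly 85% efficiency) and three clients about 2.32x (roughly 77% efficiency).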

12 Graphs

13 Graphs

14 Graphs

15 Graphs

16 Notes about the graphs
• Graphs do not include initial distribution, since this is only done once per client
• If search data distribution were to be included, efficiency would start at about 70% and increase to ~90% over time

17 Verifying Data Accuracy
• Add an entry into the search table with a known score
• Ensure that the result returned by the client is the known entry in the database
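The check on this slide can be sketched as a planted-entry test: insert an entry whose score is known to be the maximum and confirm the search returns it. The scoring function here (count of matching positions) is a stand-in chosen for illustration, not the authors' comparison algorithm, and all function names are hypothetical.

```python
def score(a: str, b: str) -> int:
    """Placeholder score: number of positions where the two strings agree."""
    return sum(x == y for x, y in zip(a, b))


def best_match(query: str, database: list[str]) -> str:
    """Exhaustive search: score every entry and return the highest-scoring one."""
    return max(database, key=lambda entry: score(query, entry))


def verify(query: str, database: list[str], search=best_match) -> bool:
    """Plant a copy of the query (known maximum score) and check it is returned."""
    planted = query
    return search(query, database + [planted]) == planted


db = ["AAAA", "ACDF", "EDCA"]
print(verify("ACDE", db))                              # correct search
print(verify("ACDE", db, search=lambda q, d: d[0]))    # deliberately broken search
```

A search that skips entries or mis-scores them fails the check, because the planted entry is the only one that can reach the maximum score.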

18 Lessons Learned
• Reading and writing single elements to an SQL database can be very expensive
• Even the best designs aren’t perfect, especially when the problem is not fully understood
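A common way to avoid paying for each element separately is to group many writes into one transaction. The slides do not name the database the authors used, so the sketch below uses Python's built-in `sqlite3` module purely as an assumption; the `sent` table and its columns are hypothetical.

```python
import sqlite3

# In-memory database so the example is self-contained.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sent (batch_id INTEGER PRIMARY KEY, client_id TEXT)")

# Record 1,000 'batch sent to client' rows in a single transaction.
rows = [(batch_id, "client-1") for batch_id in range(1000)]
with conn:  # commits once on success, rolls back on error
    conn.executemany("INSERT INTO sent (batch_id, client_id) VALUES (?, ?)", rows)

count = conn.execute("SELECT COUNT(*) FROM sent").fetchone()[0]
print(count)
conn.close()
```

Whether batching would have removed the bottleneck the authors saw depends on their database and schema, which the slides do not describe.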

19 Notes
• Biggest problem was distribution of data
• Distribution was very costly, so we tried to reuse data that had already been distributed
• Program is pluggable, so any comparison algorithm can be used
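A pluggable design of this kind usually means the search loop accepts the comparison function as a parameter. The slides give no detail on the authors' plug-in interface, so this is only a sketch; the `Scorer` type and both scoring functions are hypothetical placeholders, not real alignment algorithms.

```python
from typing import Callable, Sequence, Tuple

# Any function taking two strings and returning a number can be plugged in.
Scorer = Callable[[str, str], float]


def matching_positions(a: str, b: str) -> float:
    """Placeholder scorer: count of positions where the strings agree."""
    return float(sum(x == y for x, y in zip(a, b)))


def shared_residues(a: str, b: str) -> float:
    """Placeholder scorer: number of distinct characters the strings share."""
    return float(len(set(a) & set(b)))


def best_match(query: str, database: Sequence[str], scorer: Scorer) -> Tuple[str, float]:
    """Exhaustively score every entry with the supplied scorer; return the best."""
    return max(((entry, scorer(query, entry)) for entry in database),
               key=lambda pair: pair[1])


db = ["AAAA", "ACDF", "EDCA"]
by_position = best_match("ACDE", db, matching_positions)
by_residue = best_match("ACDE", db, shared_residues)
print(by_position, by_residue)
```

Swapping the scorer changes which entry wins without touching the search or distribution code, which is the property the slide describes.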

20 Questions/Comments

