Group May 09-06 Bryan McCoy Kinit Patel Tyson Williams Advisor/Client: Zhao Zhang.


1 Group May 09-06 Bryan McCoy Kinit Patel Tyson Williams Advisor/Client: Zhao Zhang

2 What is Bioinformatics?
  • Genetic sequencing
  • Massive amounts of data
  • Many simple operations
  • Perfect for distributed computing

3 Problem
  • Current solutions are not realistically feasible
    - Too expensive
      ○ Supercomputers
      ○ High-powered servers
    - Too slow
      ○ Some inputs can take several days
  • Need for high-speed, low-cost solutions

4 Our Solution
  • Cell Processor
    - Based on Phase 1
  • Cluster of PlayStation 3s
  • MPI
    - Message Passing Interface

5 IBM Cell Broadband Engine
  • 1 Power Processing Element (PPE)
  • 8 Synergistic Processing Elements (SPEs)
    - Only 6 SPEs are accessible on a PlayStation 3
  • 4 high-speed rings for processor communication

6 DNAPenny
  • Compares DNA strands from different species
  • Score indicates the evolutionary similarity between two species
  • Branch and bound search algorithm
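The branch and bound idea named on this slide can be illustrated with a minimal sketch. It does not reproduce DNAPenny's tree search. As a stand-in, it solves a toy problem (pick one non-negative cost from each row so that the total is minimal), where the partial sum plays the role that the score of a partially built tree would presumably play in a parsimony search. The function name and the toy data are assumptions made for illustration.

```python
def branch_and_bound(rows):
    """Pick one non-negative cost per row so that the total is minimal.

    Because costs are non-negative, the partial sum of a prefix never
    decreases as rows are added, so it is a valid lower bound on every
    completion of that prefix.
    """
    best = {"cost": float("inf"), "choice": None, "visited": 0}

    def search(depth, partial_cost, choice):
        best["visited"] += 1
        # Bound: a branch that is already no better than the best complete
        # solution found so far cannot improve on it, so it is pruned.
        if partial_cost >= best["cost"]:
            return
        if depth == len(rows):
            best["cost"], best["choice"] = partial_cost, list(choice)
            return
        # Branch: try every option for the next row, cheapest first, so a
        # tight bound is found early.
        order = sorted(range(len(rows[depth])), key=lambda i: rows[depth][i])
        for index in order:
            choice.append(index)
            search(depth + 1, partial_cost + rows[depth][index], choice)
            choice.pop()

    search(0, 0, [])
    return best["cost"], best["choice"], best["visited"]


# Hypothetical toy input; an exhaustive search would visit 40 nodes.
TOY_ROWS = [[4, 1, 3], [2, 5, 0], [6, 2, 2]]

if __name__ == "__main__":
    cost, choice, visited = branch_and_bound(TOY_ROWS)
    print("best cost:", cost, "choice:", choice, "nodes visited:", visited)
```

The pruning test is what separates this from exhaustive search: the answer is the same, but most of the search tree is never expanded.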

7 Functional Requirements
  • FR1. Ported applications shall run on the Cell B.E.
  • FR2. The results returned shall be the same as the original program.
  • FR3. The applications shall return their runtime.
  • FR4. The applications shall execute in parallel on multiple Cell B.E.s.

8 Non-Functional Requirements
  • NF1. The Cells shall all run on the Linux OS.
  • NF2. The runtimes of the ported applications shall be faster than those of the original applications.
  • NF3. The ported application shall be coded in the C language.

9 Market Survey
  • Results of the survey point to a huge speedup of computationally intensive programs.
  • Dr. Gaurav Khanna at the University of Massachusetts Dartmouth used a cluster of 8 PS3s to replace a supercomputer.
  • Universitat Pompeu Fabra, in Barcelona, deployed a BOINC system called PS3GRID in 2007 for collaborative biological computing.

10 Risk Assessment
  • Slow network speed
  • Software support
  • Limited RAM
  • Hardware failure
    - Lower quality entertainment hardware
  • Limited prior experience
  • Software development schedule

11 Resource Requirements
  • 3 PlayStation 3s
  • High performance network switch
  • Cell programming books
  • Front node (desktop computer)
  • Time

12 Software Environment
  • Use Fedora 9 as the OS, as it is currently supported by the Cell SDK 3.1.
  • Use the command line for the user interface.
  • Use the IBM XLC compiler and/or the current GCC compiler.

13 Hardware Environment
  • 3 PlayStation 3s
  • High speed crossbar switch
  • Private network
  • Front node (desktop computer)
    - Proxy server
    - Network File System (NFS)

14 I/O
  • Input
    - Inputs are DNA sequences stored in a text file.
    - Text is a ClustalW alignment organized in Phylip format, a standard format for biological applications.
  • Output
    - Outputs are
      ○ The parsimony score
      ○ The best trees
      ○ The execution time
    - The score and best trees are output to the screen and to text files.
    - The execution time is output to a CSV (Comma Separated Value) file.
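As a rough illustration of the input side, the sketch below reads a Phylip alignment into a dictionary. It assumes the strict sequential layout (a header line with the number of species and sites, then one line per species whose first 10 characters hold the name). The project's own reader is not shown in the slides, so the function name and the sample alignment are invented for this example.

```python
def parse_phylip(text):
    """Parse a sequential Phylip alignment into {name: sequence}.

    Assumes a header "<species> <sites>" followed by one line per species,
    with the name in the first 10 characters of the line.
    """
    lines = [line for line in text.splitlines() if line.strip()]
    n_species, n_sites = (int(field) for field in lines[0].split())
    if len(lines) - 1 < n_species:
        raise ValueError("fewer sequence lines than the header promises")

    sequences = {}
    for line in lines[1:1 + n_species]:
        name = line[:10].strip()
        # Blanks inside the sequence are only visual grouping; drop them.
        sequence = "".join(line[10:].split()).upper()
        if len(sequence) != n_sites:
            raise ValueError(
                f"{name}: expected {n_sites} sites, got {len(sequence)}"
            )
        sequences[name] = sequence
    return sequences


# Made-up three-species alignment, used only to exercise the parser.
SAMPLE = "\n".join([
    "3 8",
    "Alpha".ljust(10) + "ACGTACGT",
    "Beta".ljust(10) + "ACGT ACGA",
    "Gamma".ljust(10) + "acttacga",
])

if __name__ == "__main__":
    for name, sequence in parse_phylip(SAMPLE).items():
        print(name, sequence)
```

Interleaved Phylip files, where each sequence continues in later blocks, would need extra handling that this sketch leaves out.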

15 Work Breakdown Structure
  Port Apps to Cluster PS3s
  • Problem Definition
    - Research Cell/B.E.
    - Research Bioperf Suite
    - Research Distributed Parallel Algorithms
    - Research Previously Done Work
  • End Product Design
    - Design Requirements
    - Design Process
    - Design Documents
  • Considerations and Selections
    - Decide Which Linux to Install
    - Decide Which Applications to Port
  • End Product Implementation
    - Hardware Implementation
    - Prototyping Implementation
    - Software Implementation
  • End Product Testing
    - Ensure Correctness of Output Results
    - Benchmarking
  • Final Documentation and Demonstration
    - Create Final Report
    - Create Project Poster
    - Prepare for Presentation

16 Work Schedule
  • Gantt chart

17 Deliverables
  • Source Code
  • Compiled Executable
  • Runtime Comparisons
  • Final Report
  • Poster
  • Final Presentation

18 Costs
  • Time
    - Approximately 555 man hours total.
    - Freely donated.
    - Total cost $0.
  • Equipment
    - 3 PlayStation 3s
      ○ Provided by client
    - Crossbar router
      ○ Provided by client
    - Standard desktop computer
      ○ Provided by department
    - Total cost $0.

19 Development: Initial Overview
  • Use MPI to distribute the program to the multiple PlayStations.
  • Each PlayStation would search one branch of the tree.
  • One function (supplement) took 90% of the runtime.
    - Phase 1 ported this function to the SPEs

20 Development: Difficulties
  • Found a bug in supplement.
  • The bug did not affect results but did affect runtime.
  • We contacted the original developer, Dr. Felsenstein at the University of Washington, who fixed the bug.
  • The fix significantly improved runtime.
  • However, the fix negated all work done by Phase 1, as supplement no longer took a significant amount of runtime.

21 Development: Reworking
  • After the bug fix, no single function took a significant amount of runtime.
  • We decided to distribute branches of the tree search to different processors.
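One possible way to distribute branches of a search across processors is sketched below in plain Python. The slides do not say how the branches were actually assigned, so the static round-robin split is an assumption, and the loop over ranks only simulates what separate processes would do concurrently. In an MPI program the rank and size would come from MPI_Comm_rank and MPI_Comm_size, and the final minimum could be obtained with a reduction such as MPI_Reduce using MPI_MIN.

```python
def branches_for_rank(branches, rank, size):
    """Round-robin split: rank r takes branches r, r + size, r + 2*size, ..."""
    return branches[rank::size]


def distributed_minimum(branches, size, evaluate):
    """Simulate `size` workers that each search their own branches.

    Returns the overall best score and the per-worker best scores. The
    loop over ranks stands in for processes that would run in parallel.
    """
    local_best = []
    for rank in range(size):
        mine = branches_for_rank(branches, rank, size)
        # A worker with no branches reports "no result" as infinity.
        local_best.append(min((evaluate(b) for b in mine), default=float("inf")))
    # Stand-in for the reduction step that combines the local results.
    return min(local_best), local_best


def toy_score(branch):
    """Hypothetical scoring function; lower is better."""
    return (branch - 4) ** 2 + 10


if __name__ == "__main__":
    best, per_worker = distributed_minimum(list(range(7)), 3, toy_score)
    print("per-worker best:", per_worker, "overall best:", best)
```

A static split like this is simple but can leave workers idle when branches differ widely in cost, and it does not share an improved bound between workers.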

22 Development: Results
  • Completed our goals
    - Divided work among 3 PlayStation 3s.
    - Produced faster code than a comparable sequential environment.
  • Due to time constraints, we were not able to port the code to the SPEs.

23 Testing
  • Used a script to test multiple inputs.
  • Averaged the runtimes.
  • Used several different code revisions and machines to provide comparisons.
  • Projected the speedup that could be attained if the code were ported to the SPEs.

24 Results: Actual
  • Our current code is 20.76 times faster than it was at the beginning of the semester.
  • Surpassed our original projections, which assumed the use of the SPEs.

  Code revision                                          | Runtime (sec) | X Speedup | X Speedup (compared to desktop) | # of available cores
  -------------------------------------------------------|---------------|-----------|---------------------------------|---------------------
  Original (Core 2)                                      |       1861.66 |           |                           0.116 |                    1
  With Bug Fixes (Core 2)                                |        218.81 |           |                               1 |
  Original (1 PPE)                                       |       4953.51 |         1 |                           0.044 |                    1
  With Bug Fixes (1 PPE)                                 |        662.70 |      7.47 |                           0.330 |                    1
  MPI with Bug Fixes (3 PPEs)                            |        238.57 |     20.76 |                           0.917 |                    3
  MPI with Bug Fixes (3 PPEs, 18 SPEs) (Projected)       |         34.08 |    145.35 |                           6.420 |                   21
  Original Projections                                   |        334.82 |     14.79 |                           0.653 |                   21

25 Results: MPI
  • The speedup for MPI was 2.78.
  • Excellent speedup for 3 nodes.
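The speedup figures quoted on slides 24 and 25 can be checked from the runtimes in the results table. The sketch below does that arithmetic. The parallel efficiency value (speedup divided by node count) does not appear in the slides and is added here only as one common way to quantify "excellent speedup for 3 nodes".

```python
# Runtimes in seconds, copied from the results table.
ORIGINAL_1_PPE = 4953.51
BUG_FIXED_1_PPE = 662.70
MPI_3_PPES = 238.57


def speedup(old_seconds, new_seconds):
    """How many times faster the new runtime is than the old one."""
    return old_seconds / new_seconds


def parallel_efficiency(speedup_value, nodes):
    """Fraction of ideal linear speedup achieved on `nodes` nodes."""
    return speedup_value / nodes


overall = speedup(ORIGINAL_1_PPE, MPI_3_PPES)    # start of semester vs. final
mpi_only = speedup(BUG_FIXED_1_PPE, MPI_3_PPES)  # 1 PPE vs. 3 PPEs, same code
efficiency = parallel_efficiency(mpi_only, 3)

if __name__ == "__main__":
    print(f"overall speedup:     {overall:.2f}")
    print(f"MPI speedup:         {mpi_only:.2f}")
    print(f"parallel efficiency: {efficiency:.2f}")
```

The overall figure combines the gain from the bug fix with the gain from MPI, while the MPI figure isolates the effect of going from one node to three.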

26 Results: Comparison
  • Our final code came close to a high-powered desktop.
    - Core 2 Quad at 2.66 GHz
  • Our projected results indicate a speedup of 6.4 over that desktop.

27 Results: Projected
  • Using all SPEs, the speedup should be 145.35
    - Assuming SPEs run as fast as the PPEs
      ○ Before SPE vectorization
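The projected row of the results table appears consistent with simple linear scaling from the 3 measured PPEs to all 21 cores (3 PPEs plus 6 usable SPEs on each of 3 consoles). That reading is an inference from the numbers, not something the slides state outright. The sketch below reproduces the arithmetic under that assumption.

```python
# Runtimes in seconds, copied from the results table.
ORIGINAL_1_PPE = 4953.51
DESKTOP_BUG_FIXED = 218.81
MPI_3_PPES = 238.57


def project_runtime(measured_seconds, cores_now, cores_projected):
    """Linear-scaling projection: runtime shrinks in proportion to cores.

    Assumes every added core is as fast as the measured ones and that
    the work divides evenly, which is an optimistic simplification.
    """
    return measured_seconds * cores_now / cores_projected


consoles = 3
cores_projected = consoles * (1 + 6)  # 1 PPE + 6 usable SPEs per console

# Round to two decimals, matching the precision shown in the table.
projected = round(project_runtime(MPI_3_PPES, consoles, cores_projected), 2)

if __name__ == "__main__":
    print(f"projected runtime:    {projected:.2f} s")
    print(f"vs. original (1 PPE): {ORIGINAL_1_PPE / projected:.2f}x")
    print(f"vs. desktop:          {DESKTOP_BUG_FIXED / projected:.3f}x")
```

The 145.35 figure is reproduced when the speedup is computed from the rounded runtime of 34.08 seconds.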

28 Conclusions
  • Achieved our goal of using MPI to get runtime improvement.
  • Contributed a major fix to a widely used application.
  • Surpassed our initial runtime goal.
  • Projected results show an even larger runtime improvement still possible.

29 Acknowledgements
  • May08-24 group (Phase I)
    - Kyle Byerly
    - Shannon McCormick
    - Matt Rohlf
    - Bryan Venteicher
  • DNAPenny Author
    - Dr. Felsenstein
  • Advisor
    - Zhao Zhang
  • Environment Help
    - Steve Nystrom

30 Questions?

