BiG: A Grid Service to Distribute Large BLAST Runs


1 BiG: A Grid Service to Distribute Large BLAST Runs
Ignacio Blanquer
Valencia University of Technology (Universidad Politécnica de Valencia)
EGEE-II INFSO-RI-031688, Enabling Grids for E-sciencE, www.eu-egee.org
EGEE and gLite are registered trademarks

2 Contents
Problem Addressed: BLAST.
Requirements and Design Objectives.
Architecture
– Security.
– Load Balancing.
– Accessibility.
Performance and Usage.
Conclusions.
2nd EGEE User Forum - Manchester

3 BLAST
BLAST (Basic Local Alignment Search Tool) is a Bioinformatics Procedure Applied to Identify Compatible Protein and Nucleotide Sequences in Protein and DNA Databases.
BLAST can be Applied, Among Other Uses, to Annotate the Estimated Function of Unknown Sequences.
BLAST is Computationally Intensive.

4 Design Objectives and Requirements I
Easy Interface with High Compatibility (Web Service + NCBI Based)
– Same Parameters as BLAST.
– User-friendly and Intuitive.
Support for Searching Simultaneously on Multiple Databases
– Parallel Processing of Queries on Multiple Databases.
Architecture Exportable to Other Common Problems
– Modular Structure of the System Components.
Secure and Efficient
– Simple but Effective Control of Users.
– No Exposure of Credentials.

5 Design Objectives and Requirements II
Scalability
– Data Partition in Grid Approach Gives Scalability with Huge Quantities of Data.
High Performance
– Grid Computing + MPI Parallel Jobs in Dedicated Clusters.
Robust
– Fault Tolerance on Server and Client.
Interoperable
– Accessible Through Web Services, Stand-Alone Applications and Web Portals.
Portable
– Sessions Could Be Independent of the Server.

6 Architecture
[Diagram]
Inputs:
– FASTA File (Input Sequence), e.g. AGTACGTAGTAGCTGC TGCTACGTGGCTAGCT AGTACGTCAGACGTAG ATGCTAGCTGACTCGA
– Execution Parameters
– Protein Database (e.g. Non-Redundant)
Output:
– Matches

7 BiG Security - Authentication
Two-Level Credentials
– Instead of Storing a Portal Certificate Private Key or Transferring User Private Keys (Even Securely), a MyProxy Certificate Server is Used.
– Certificates in the MyProxy Server are Renewed Manually and Periodically (Planned Weekly), and Short-Lived Certificates are Retrieved by the UI when Required.
– This Enhances Security and Does not Expose Credentials, Even in Secure Environments.

8 BiG Security - Authorisation
Alternatives and Problems
– Uploading VOMS Credentials: A MyProxy Credential is Uploaded to a Proxy Server.
  • It is Suggested that VOMS Attributes Should be Added After the Retrieval of a Delegated Copy of the Proxy. This Does not Work.
  • VOMS Attributes Could not be Uploaded With Standard MyProxy Commands. Workaround: Use an Updated Version from INFN.
– VOMS Credentials Duration and Proxy Renewal: A Delegated MyProxy Credential Needs to Be Renewed for a Long-Living Job.
  • Renewal Does Not Work with VOMS Credentials. The VOMS Lifetime is 24 Hours, so this is an Unsolved Problem for Long-Living Executions.
  • Incorrect Configuration of Automatic Renewal on RBs.
Proposals
– Upload VOMS-Extended MyProxy Credentials and Do not Use Renewal.
– Do not Use VOMS if Automatic Renewal is Required.
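The two proposals on this slide can be read as a simple decision rule. The sketch below is one possible reading, not BiG's actual code: it assumes the choice is driven by the expected job duration compared with the 24-hour VOMS lifetime stated above. The names `CredentialPlan` and `plan_credentials` are hypothetical.

```python
from dataclasses import dataclass

# VOMS attribute lifetime as stated on the slide.
VOMS_LIFETIME_HOURS = 24


@dataclass(frozen=True)
class CredentialPlan:
    """Hypothetical record of which mechanisms a job would rely on."""
    use_voms: bool
    use_renewal: bool


def plan_credentials(expected_hours: float) -> CredentialPlan:
    """Choose between the two proposals from the expected job duration.

    Assumption: a job needs automatic renewal only if it may outlive
    the VOMS lifetime. The slide does not state this threshold rule.
    """
    if expected_hours < 0:
        raise ValueError("expected_hours must be non-negative")
    if expected_hours <= VOMS_LIFETIME_HOURS:
        # Proposal 1: VOMS-extended MyProxy credential, no renewal.
        return CredentialPlan(use_voms=True, use_renewal=False)
    # Proposal 2: renewal is required, so VOMS is not used.
    return CredentialPlan(use_voms=False, use_renewal=True)


if __name__ == "__main__":
    print(plan_credentials(2))
    print(plan_credentials(30))
```

Under this reading, a 2-hour run keeps its VOMS attributes, while a 30-hour run gives them up in exchange for renewal.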

9 BiG – Load Balancing
BiG Provides a Grid Interface to MPIBlast
– MPIBlast Scalability Depends Highly on the Efficiency of the MPI Version.
– It is Configured on a Per-site Basis.
– The Value Currently Used is 20 CPUs.
– Databases are Pre-Distributed to Reduce Overhead.
Larger Scalability is Managed Through Splitting the Input Sequences into Multiple Jobs
– Multiple Parallel Jobs are Scheduled.
– Embarrassingly Parallel Approach.
Multiple Databases can be Searched in Parallel
– Directly Multiplies Performance.
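The splitting strategy on this slide can be sketched in a few lines. This is a minimal illustration, not BiG's implementation: it assumes the input is plain FASTA text and that each job receives a fixed number of sequences. The function names (`read_fasta`, `split_records`, `plan_jobs`), the `per_job` parameter and the database labels in the example are all hypothetical.

```python
from itertools import product
from typing import List, Tuple

Record = Tuple[str, str]  # (header, sequence)


def read_fasta(text: str) -> List[Record]:
    """Parse FASTA text into (header, sequence) pairs."""
    records: List[Record] = []
    header, parts = None, []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            # A new header closes the previous record, if any.
            if header is not None:
                records.append((header, "".join(parts)))
            header, parts = line[1:], []
        elif header is not None:
            parts.append(line)
    if header is not None:
        records.append((header, "".join(parts)))
    return records


def split_records(records: List[Record], per_job: int) -> List[List[Record]]:
    """Cut the input sequences into chunks of at most per_job records."""
    if per_job < 1:
        raise ValueError("per_job must be at least 1")
    return [records[i:i + per_job] for i in range(0, len(records), per_job)]


def plan_jobs(records: List[Record], databases: List[str],
              per_job: int) -> List[Tuple[str, List[Record]]]:
    """One independent job per (database, chunk) pair."""
    chunks = split_records(records, per_job)
    return [(db, chunk) for chunk, db in product(chunks, databases)]


if __name__ == "__main__":
    fasta = ">s1\nAGTACGTAGTAGCTGC\n>s2\nTGCTACGTGGCTAGCT\n>s3\nAGTACGTCAGACGTAG\n"
    for db, chunk in plan_jobs(read_fasta(fasta), ["db_a", "db_b"], per_job=2):
        print(db, [header for header, _ in chunk])
```

Because no job depends on another, the jobs can be scheduled in any order, which is what makes the approach embarrassingly parallel. Searching a second database doubles the number of jobs rather than lengthening each one.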

10 BiG - Accessibility
Users Access the System Through Stand-Alone Applications or Web Portals.
Currently
– BLAST2GO: www.blast2go.org
– Web from Cecalcula: http://portal-bio.ula.ve
– Even CLI.

11 BiG: Usage Report
Period: Jul'06-Dec'06.
Usage Statistics:
– Number of Jobs: 284.
– CPU Consumed: 173 CPU-Days.
– Resources Used: ramses.dsic.upv.es:2119/jobmanager-pbs-biomedg.
– BiG is Being Used at the University of Los Andes to Work on the Complete Genome of Plasmodium falciparum for the Identification of DHFR Antigenic Proteins.
[Chart: CPU Hours and Number of Jobs per Month, Jul-06 to Dec-06]
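The totals above give a rough sense of job size. The short calculation below only rearranges the two reported figures (284 jobs, 173 CPU-days) over the six-month period; the averages are derived here and are not figures from the slide. The function name `usage_averages` is hypothetical.

```python
JOBS = 284       # number of jobs reported for Jul'06-Dec'06
CPU_DAYS = 173   # CPU consumed over the same period
MONTHS = 6       # July to December inclusive


def usage_averages(jobs: int, cpu_days: float, months: int):
    """Return (total CPU hours, CPU hours per job, jobs per month)."""
    cpu_hours = cpu_days * 24
    return cpu_hours, cpu_hours / jobs, jobs / months


if __name__ == "__main__":
    total, per_job, per_month = usage_averages(JOBS, CPU_DAYS, MONTHS)
    print(f"total CPU hours: {total}")            # 4152
    print(f"CPU hours per job: {per_job:.1f}")    # 14.6
    print(f"jobs per month: {per_month:.1f}")     # 47.3
```

An average of about 14.6 CPU hours per job is consistent with jobs that run on many CPUs at once rather than on a single processor.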

12 BiG Usage Agreements
The Greatest Difficulty is Dealing with the Quality of Service
– Users Do not Understand Waiting Times and Unpredictable Response Times.
– Lack of MPI Resources Reduces the Availability of the System.
Resources are Available for Short Executions (Below 15 Minutes in Total).
Larger Executions Require Pre-Reservation of Resources
– And Human Supervision due to Potential Instability.
– Users Negotiate the Experiments with the Resource Providers (UPV).
A General Adoption of Such Mechanisms Inside the Infrastructure will be Necessary in the Long Term.

13 BiG Current Actions
The Work on BiG is Currently Focused on Three Areas
– Improve Technical Issues
  • Better Error Management to Ease Recovery in the Client Applications.
  • Migration of Sessions Among Different Portals.
  • Enhanced Robustness.
– Foster the Usage
  • A New Portal is Being Developed.
  • New Users have been Identified in Computational Biology and Pharmacoepidemiology.
– Generalisation of the Service Model and Extension to Other Problems.

14 Conclusions
BiG is Service-Oriented, Being Interoperable with Many Application Models (Portals, Applications or Scripts).
BiG is Intended for Processing Big Sets of Sequences, Although it Works Efficiently Even with Short Sequences.
A Complete Genome Screening Implies Tens of Thousands of Sequences and Could Take More Than 30 Hours in a Conventional Computer.
This is Done Periodically to Check the New Versions of the Target Databases.

15 Contact
Vicente Hernández / Ignacio Blanquer
Universidad Politécnica de Valencia
Camino de Vera s/n, 46022 Valencia, Spain
Tel: +34-963879743
Fax: +34-963877274
E-mail: vhernand@dsic.upv.es, iblanque@dsic.upv.es

