dynamicBLAST on SURAgrid: Overview, Update, and Demo
John-Paul Robinson, Enis Afgan, and Purushotham Bangalore
University of Alabama at Birmingham
SURAgrid All-Hands Meeting, March 15, 2007, Washington, DC

What is BLAST?
- BLAST stands for Basic Local Alignment Search Tool
- BLAST is a search tool for finding gene sequences in gene databases
- The search is really a statistical analysis of the "closeness" of a gene sequence, not a simple string-pattern match
- There are many implementations
- NCBI (the National Center for Biotechnology Information) has much more information

How Do I Run BLAST?
- Find a gene sequence that interests you (research)
- Select a BLAST program that implements the search type you are interested in (nucleotide, protein)
- Select a sequence database for the organism of interest (yeast, E. coli, ...)
- Run the search (a scripted example follows below)
- Analyze the statistical results
- Repeat
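For concreteness, here is a minimal sketch of a single scripted run, assuming the legacy NCBI blastall binary is installed and on the PATH; the query and output file names are hypothetical:

```python
import subprocess

# Hypothetical file names; substitute your own query and database.
QUERY = "my_gene.fasta"   # FASTA file holding the sequence of interest
DATABASE = "yeast.nt"     # nucleotide database, preformatted with formatdb

# blastn searches a nucleotide query against a nucleotide database;
# blastp would be the choice for protein searches.
subprocess.run(
    ["blastall", "-p", "blastn",
     "-d", DATABASE, "-i", QUERY, "-o", "hits.out"],
    check=True,
)
# hits.out now holds the ranked alignments with E-values, the
# statistical "closeness" measure, ready for analysis.
```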

BLAST Requirements
- The BLAST application must be installed
  – Many implementations; NCBI packages BLAST for many common platforms
- Target organism databases must be installed (a download-and-format sketch follows below)
  – NCBI publishes many; sizes run to hundreds of MB and growing
- Match your computer configuration to the application's demands
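A minimal sketch of installing one database with the legacy NCBI toolchain; the download URL is illustrative of NCBI's published databases, not a verified path:

```python
import subprocess
import urllib.request

# Illustrative URL; NCBI publishes databases under its BLAST FTP area.
DB_URL = "ftp://ftp.ncbi.nlm.nih.gov/blast/db/FASTA/yeast.nt.gz"

urllib.request.urlretrieve(DB_URL, "yeast.nt.gz")
subprocess.run(["gunzip", "-f", "yeast.nt.gz"], check=True)

# formatdb indexes the FASTA file so blastall can search it;
# -p F marks the database as nucleotide rather than protein.
subprocess.run(["formatdb", "-i", "yeast.nt", "-p", "F"], check=True)
```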

How to Run BLAST Faster?
Query Splitting (sketched below)
- Runs many queries as fast as possible
- Database reads limit speed
- Low IPC makes commodity (COTS) clusters ideal
Database Splitting
- Runs one query as fast as possible
- Database chunks improve read performance
- Heavy IPC for the boundary conditions in the statistical analysis (MPI)
- SMP machines are best
[Diagram: query splitting searches separate query subsets against full database copies; database splitting searches one query against database fragments]
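To make query splitting concrete, here is a small sketch, not dynamicBLAST's actual fragmenter, that partitions a multi-FASTA query file into independent fragments; each fragment can then be searched against a full database copy with no IPC between searches:

```python
def split_queries(fasta_path, n_fragments):
    """Partition a multi-FASTA query file into roughly equal fragments."""
    # Read whole records so no sequence is cut mid-entry.
    with open(fasta_path) as fh:
        records = fh.read().split("\n>")
    records = [r if r.startswith(">") else ">" + r
               for r in records if r.strip()]

    paths = []
    for i in range(n_fragments):
        part = records[i::n_fragments]            # round-robin assignment
        path = f"{fasta_path}.frag{i}"
        with open(path, "w") as out:
            out.write("\n".join(part) + "\n")
        paths.append(path)
    return paths
```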

Can Grids Make BLAST Faster?
- Focus on query splitting because of its low inter-process communication (IPC)
- Focus on workflow management to maximize query throughput and resource utilization
- dynamicBLAST takes both of these approaches

dynamicBLAST Architecture
[Architecture diagram: a web interface and the Dynamic BLAST core (MySQL, Perl scripts, Java) drive GridDeploy and GridWay, which dispatch BLAST jobs through GT4 to resources running SGE, PBS, or no local scheduler]

dynamicBLAST Operation
- Accepts a user query list
- Maintains resource and application configuration in a local database
- Processes queries in a three-stage workflow
- Uses GridWay to manage the workflow and distribute jobs across grid nodes (a DRMAA-style submission is sketched below)
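dynamicBLAST itself is written in Java, but the submission pattern can be sketched against GridWay's standard DRMAA interface; this minimal Python version assumes the drmaa bindings with a GridWay-backed DRMAA library, and the file names are illustrative:

```python
import drmaa  # GridWay exposes job submission through the DRMAA standard

def submit_fragment(session, fragment_path, database):
    """Submit one BLAST fragment as a grid job via DRMAA."""
    jt = session.createJobTemplate()
    jt.remoteCommand = "blastall"
    jt.args = ["-p", "blastn", "-d", database,
               "-i", fragment_path, "-o", fragment_path + ".out"]
    job_id = session.runJob(jt)
    session.deleteJobTemplate(jt)
    return job_id

session = drmaa.Session()
session.initialize()
job = submit_fragment(session, "queries.frag0", "yeast.nt")
info = session.wait(job, drmaa.Session.TIMEOUT_WAIT_FOREVER)
print("job", job, "exited with status", info.exitStatus)
session.exit()
```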

dynamicBLAST Workflow
[Workflow diagram: data analysis, then file parsing and fragment creation, then job submission; a job monitor tracks the submitted jobs through post-processing]

Stage 1: Analysis
- The analysis job processes the user query list
- Helps with algorithm selection
- Determines the granularity of splitting (a toy heuristic is sketched below)
- Uses GridDeploy to manage the analysis
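The real analysis logic is internal to dynamicBLAST; the toy version below only shows the kind of decision the stage makes, counting queries and choosing a fragment count, with thresholds invented for illustration:

```python
def analyze(query_path, max_fragments=10, queries_per_fragment=100):
    """Toy analysis pass: count queries, pick a splitting granularity."""
    with open(query_path) as fh:
        n_queries = sum(1 for line in fh if line.startswith(">"))
    # Invented heuristic: one fragment per 100 queries, capped at 10.
    n_fragments = min(max_fragments,
                      max(1, n_queries // queries_per_fragment))
    return n_queries, n_fragments
```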

Stage 2: Fragmentation
- The fragmentation job divides the user query list
- Could potentially divide the query list based on resource capabilities
- Currently just divides the query list across the available resources (see the sketch below)
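The current even division can be sketched from the resource list shown in the demo output; the function below is illustrative, assigning one fragment per reported free node and leaving any surplus queued for the master/worker stage:

```python
def plan_fragments(n_fragments, resources):
    """Assign fragments to resources, one per reported free node."""
    plan, remaining = [], n_fragments
    for host, free_nodes in resources:
        take = min(free_nodes, remaining)
        plan.extend([host] * take)                # one fragment per node
        remaining -= take
        if remaining == 0:
            break
    return plan, remaining

# The resource list from the demo run: 10 fragments fill the 4 + 4 + 2
# free nodes on the first three hosts, so everest is not needed.
resources = [("mileva.hpc.odu.edu", 4), ("titanic.hpcl.cis.uab.edu", 4),
             ("olympus.cis.uab.edu", 2), ("everest.cis.uab.edu", 18)]
plan, queued = plan_fragments(10, resources)
```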

Stage 3: BLAST Searches
- Master/worker model (illustrated below)
- Assigns queries to available resources
- All reporting resources are used
- Loops through all queries until done
- Takes up resources as they become available
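dynamicBLAST dispatches through GridWay; the thread-based sketch below only illustrates the master/worker pattern itself, with run_blast standing in for an actual grid submission:

```python
import queue
import threading

def run_searches(fragments, resources, run_blast):
    """Master/worker sketch: one worker per free node pulls fragments
    from a shared queue until every fragment has been processed."""
    work = queue.Queue()
    for frag in fragments:
        work.put(frag)

    def worker(host):
        while True:
            try:
                frag = work.get_nowait()
            except queue.Empty:
                return                      # no queries left: worker retires
            run_blast(host, frag)           # stand-in for a grid submission
            work.task_done()

    threads = [threading.Thread(target=worker, args=(host,))
               for host, free_nodes in resources
               for _ in range(free_nodes)]
    for t in threads:
        t.start()
    work.join()                             # block until all fragments finish
```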

Deployment on SURAgrid
- Incorporates the ODU resource Mileva: a 4-node cluster dedicated to SURAgrid purposes
- Local UAB BLAST web interface
  – Leverages the existing UAB identity infrastructure
  – Cross-certification with the SURA Bridge CA
  – Authorization on a per-user basis
- Initial goals:
  – Solidify the execution requirements of dynamicBLAST
  – Perform scalability tests
  – Engage researchers further in the promise of grid computing

Demo

drmaa]$ ./dblast51.sh
[2007/03/15 08:27:08] Started Dynamic BLAST.
[2007/03/15 08:27:08] Using job properties file: props.properties
...
[2007/03/15 08:27:08] Analisys job successfully submitted, job ID: 64
[2007/03/15 08:28:01] Job 64 finished regularly with exit status 0
...
[2007/03/15 08:28:01] Total number of fragments to be created: 10
[2007/03/15 08:28:01] Fragmentation job successfully submitted, ID: 65
[2007/03/15 08:29:03] Job 65 finished regularly with exit status 0
...
[2007/03/15 08:29:03] Gathering resource information...
[2007/03/15 08:29:29] Succesfully accesed current resource data.
availableResourceList=[[mileva.hpc.odu.edu, 4], [titanic.hpcl.cis.uab.edu, 4], [olympus.cis.uab.edu, 2], [everest.cis.uab.edu, 18]]
...
[2007/03/15 08:29:29] Remaining num jobs (frags) to submit: 10
[2007/03/15 08:29:29] Creating job(s) for mileva.hpc.odu.edu with 4 available node(s).
[2007/03/15 08:29:29] Creating job(s) for titanic.hpcl.cis.uab.edu with 4 available node(s).
[2007/03/15 08:29:29] Creating job(s) for olympus.cis.uab.edu with 2 available node(s).
exit_time=08:30:37
Job usage (for jobs [66, 67, 68, 69]) on titanic.hpcl.cis.uab.edu:
...
**********************************
All fragments have been processed.
**********************************
...

dynamicBLAST Performance
[Figure: comparison of execution time between query-splitting BLAST, Dynamic BLAST, and mpiBLAST for searching 1,000 queries against the yeast.nt database]
[Figure: execution time of Dynamic BLAST on a variable number of nodes for 1,000 queries against the yeast.nt database]

dynamicBLAST Experiences
- Addresses how to split a workflow across grid nodes
  – Many jobs have distributable sub-tasks
  – Manages the coordination of dependencies
- BLAST exercises many of the open problems of the grid space
  – Data distribution
  – Coordinated execution
  – The holy grail: MPI on the grid :)

SURAgrid Experience
- Combines local needs with regional needs
- No need to build a campus grid in isolation
- Synergistic collaborations
  – Lead to solid architectural foundations
  – Lead to solid documentation
- Extends practical experience for UAB students

Next Steps
- Integrate dynamicBLAST with a familiar BLAST web interface
- Integrate the web interface with the UABgrid 2.0 collaboration environment
  – Leverage Shibboleth, GridShib, and myVocs to federate the web interface
- Extend the number of resources running BLAST
  – ODU integration helps document the steps and is a model for additional sites
- Explore other applications relevant to the UAB research community

References and Contacts
Purushotham Bangalore, Computer and Information Sciences, UAB
Enis Afgan, Computer and Information Sciences, UAB
John-Paul Robinson, HPC Services, Office of the VP of IT, UAB

Acknowledgements
- ODU for access to remote resources
- Mahantesh Halappanavar at ODU for his efforts to support dynamicBLAST on Mileva