Finding Discriminating DNA Probe Sequences by Implementing a Parallelized Solution in a Cluster
REU: Camilo A. Silva
Professor and Advisor: Dr. S. Masoud Sadjadi
Summer 2008

BIOinformatics Group Members: Liu, Guangyuan; Robinson, Michael; Silva, Camilo A.

objectives: Problem, Motivation, Initial goals, Project Schedule, Challenges, Lessons learned, Accomplishments, Project Status, Wrapping up, Continuation of project, Future work, Conclusion, Acknowledgements, References

problem: What is the most efficient parallel program structure to run on a cluster using MPI, and what kind of algorithm is needed for the program to be both self-healing and self-optimizing while maintaining optimal performance? How can this program be exposed as a web application, using tools that provide a user-friendly interface?

motivation: Our project is a discriminating probe finder. Characteristics:
– Capable of finding all possible probe combinations of a given genome and comparing them against another genome, based on a probe length parameter
– Finds other variations such as the reverse, inverse, and complement whenever specified as a parameter
– The output is the set of probes that are present in one genome but not in the other; these are known as discriminating probes
– Able to run on a cluster
– Implements self-managing functions
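To make the core idea concrete, here is a minimal sketch (not the project18 implementation) of discriminating-probe finding: every probe of length k taken from one genome is kept only if neither it nor its reverse complement occurs in the other genome. The genome strings and the probe length are made-up placeholders.

    /* Illustrative sketch (not the actual project18 code): keep every
       length-k substring of genome A that does not occur in genome B,
       checking the reverse complement as well. */
    #include <stdio.h>
    #include <string.h>

    /* Reverse complement of a DNA probe (A<->T, C<->G). */
    static void reverse_complement(const char *probe, char *out, size_t k)
    {
        for (size_t i = 0; i < k; i++) {
            char base = probe[k - 1 - i];
            switch (base) {
            case 'A': out[i] = 'T'; break;
            case 'T': out[i] = 'A'; break;
            case 'C': out[i] = 'G'; break;
            case 'G': out[i] = 'C'; break;
            default:  out[i] = 'N'; break;
            }
        }
        out[k] = '\0';
    }

    int main(void)
    {
        const char *genome_a = "ACGTACGGTCAGTTACG";   /* hypothetical inputs */
        const char *genome_b = "TTGCACGTAGGCATCCA";
        const size_t k = 5;                            /* probe length parameter */
        char probe[64], rc[64];

        for (size_t i = 0; i + k <= strlen(genome_a); i++) {
            memcpy(probe, genome_a + i, k);
            probe[k] = '\0';
            reverse_complement(probe, rc, k);
            /* Discriminating probe: present in A, absent from B in both
               orientations.  strstr is a brute-force check, fine for a sketch;
               the real tool needs an indexed, parallel approach. */
            if (strstr(genome_b, probe) == NULL && strstr(genome_b, rc) == NULL)
                printf("discriminating probe at position %zu: %s\n", i, probe);
        }
        return 0;
    }

In the real program it is this probe comparison that gets distributed across the cluster nodes, as described in the parallelization slides that follow.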

initial goals: Implement the parallelization of the “finding discriminating probes” application; create a self-managing system for the application; implement a web application for the project.

project schedule:
6/12-6/23: MPI theory preparation + autonomic computing
6/25 (Wed): parallelization programming starts
7/7-7/13: Test simulated MPI programs; learn about MPI-IO and explain to my group members how to use MPI
7/15/2008: Deadline to have the MPI implementation ready for the project
7/16-7/23: Learn about MPI error handling and MPI debugging
7/23-7/27: First parallelized jobs were assigned to GCB during this past weekend
7/27: MPI parallelized program was completed with an implementation of a self-healing attribute
7/27-8/14: Make modifications to the parallelized program if necessary
7/31-8/14: Write and complete the paper to be submitted on August 15, 2008

challenges:
– Having to learn to be an independent researcher
– Communicating my ideas to my team members in an efficient fashion
– Completing all my time projections on schedule
– Studying massive amounts of detailed material in short periods of time
– Debugging, debugging, debugging

lessons learned:
– The ability to work virtually in a global team environment is an opportunity to take advantage of
– If someone has the willingness to explore new lands and learn new “magic”, just do it, read it, and practice it
– Problems can be solved by communicating with others
– ENJOY and LOVE what YOU DO!

accomplishments: Three topics will be discussed: parallelization, self-management, and results.

parallelization:
– The master node acquires information from the user about the different genomes to be compared by project18
– The master node administers the data and creates jobs for each slave node
– Each slave node receives the data from the master node and starts executing project18
– After a node has completed its task, it reports its completion to the master node, which determines whether there are more tasks to be completed; if there are, the next task is given to that node
– When the program has finished, all results are stored in a predefined directory where they are available for review

parallel program design (diagram): the master node (M.N.) dispatches tasks to slave nodes 1-7 between start and end; F marks Finish and C marks Completion.

a brief pseudo-code of the parallelization…

    // libraries + definitions
    #include <mpi.h>
    // …

    int main(int argc, char **argv)
    {
        // variable definitions, MPI_Init, rank lookup …
        if (rank == MASTER_NODE) {
            // ask the user for input, create the queue, and initialize all tasks
            while ( /* there are more items left in the queue */ ) {
                // receive completion signals, keep fault control and task control
                // active, and assign new available tasks to available nodes
            } // end while
        } // end if
        else {
            // receive the number of items left in the queue
            while ( /* there are more items left */ ) {
                // receive the genome parameter from the master node
                // execute project18
                // create the output files
                // submit a completion code to node 0
            } // end while
        } // end else
        MPI_Finalize();
    } // end of main

    void taskControl( /* … */ )
    {
        // makes sure that each task is completed accordingly and is successful
    }

self-management: The self-management feature added to my program is a self-healing property. It is a simple mechanism that checks for errors whenever messages are sent from the master node to the slave nodes. If an error is found, the affected task is recorded in an array that carries the error status of each assigned task. Each task whose message failed during sending is then reassigned accordingly.

self-healing (diagram: an error-status array indexed by task number, with TASK_NUMS entries): Each time a message is sent from the master node to a slave node, an MPI error handler is active, checking for errors. If there is an error in the message being sent, it is recorded in the array. Afterwards, the master node resends the message to the assigned slave node.
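As a rough illustration of this error-handler pattern (a minimal sketch, not the actual project code), the master below switches MPI_COMM_WORLD to MPI_ERRORS_RETURN so that a failed MPI_Send returns an error code instead of aborting, and records the failure in a per-task status array so the flagged tasks could later be resent; the task count, tags, and round-robin assignment are assumptions made for the example.

    /* Minimal self-healing sketch: mark tasks whose send failed so the
       master can reassign them.  TASK_NUMS and the round-robin scheme are
       hypothetical; run with at least 2 MPI processes. */
    #include <mpi.h>
    #include <stdio.h>

    #define TASK_NUMS 8

    int main(int argc, char **argv)
    {
        int rank, nprocs;
        int task_error[TASK_NUMS] = {0};   /* 1 = send failed, resend later */

        MPI_Init(&argc, &argv);
        MPI_Comm_rank(MPI_COMM_WORLD, &rank);
        MPI_Comm_size(MPI_COMM_WORLD, &nprocs);

        /* Return error codes instead of aborting, so the master can react. */
        MPI_Comm_set_errhandler(MPI_COMM_WORLD, MPI_ERRORS_RETURN);

        if (rank == 0) {
            for (int task = 0; task < TASK_NUMS; task++) {
                int dest = 1 + task % (nprocs - 1);   /* round-robin over slaves */
                int rc = MPI_Send(&task, 1, MPI_INT, dest, 0, MPI_COMM_WORLD);
                if (rc != MPI_SUCCESS) {
                    task_error[task] = 1;             /* remember the failure */
                    fprintf(stderr, "task %d: send to node %d failed\n", task, dest);
                }
            }
            /* A fuller version would now loop over task_error and resend
               every flagged task, possibly to a different slave node. */
        } else {
            int task;
            /* Each slave receives the tasks that the round-robin above
               assigned to its rank. */
            for (int i = rank - 1; i < TASK_NUMS; i += nprocs - 1)
                MPI_Recv(&task, 1, MPI_INT, 0, 0, MPI_COMM_WORLD, MPI_STATUS_IGNORE);
        }

        MPI_Finalize();
        return 0;
    }

The check on the MPI_Send return code is what makes the reassignment possible; with the default handler (MPI_ERRORS_ARE_FATAL) a communication error would simply abort the whole job.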

results:
– The parallelization of the program works: our group leader Michael Robinson ran a test with some of the small files two weeks ago.
– The self-healing property has not yet been exercised against real errors, because so far no errors have occurred.
– The results shown next are from the runs of the parallelized program that Gary completed.

results table

END    JULY  START  NODE  TIME USED  !FOUND   GENOMES
19:38  25    17:00  1     26:38:00   109,
:07    25    17:00  2     27:07:00   137,253  01cs2
21:43  25    17:00  3     28:43:00   91,
:52    25    17:00  4     29:52:00   18,
:18    25    17:00  5     30:18:00   38,

results statistics

project status:
– The parallelization part of the project is complete.
– The self-healing part of the program could be enhanced by having two autonomic agents: one that checks the connectivity of the nodes and another that checks the functional status of each slave node.
– There is another thing to fix: an error that seems to be linked to a memory leak, which appears whenever more tasks are assigned than there are nodes.
– One of the most important remaining items is data validation.
– Finally, performance tests will be completed in the next couple of days for the data analysis.

wrapping up: My goal is to help my group write the final draft of the paper. If necessary, I will modify my parallel program to fit the testing needs; for example, instead of asking the user for input, all the input should be read from a file.

continuation of project: I would like the opportunity to enhance my program with two self-healing autonomic components that would help find faults in both the connectivity and the task functionality of the slave nodes, and to find a way to self-optimize my program.

future work: One of my initial goals was to create a web interface that could initialize the tasks on the cluster. This would be fun and interesting work to pursue in the future.

conclusion: Throughout my first research experience I had the opportunity to learn what it takes to be an independent researcher, as well as to work with a team on a specific task. Of my initial four goals, I was able to successfully accomplish two. Although I thought I would be able to do everything I projected, I did not take into account the amount of reading and learning I had to do before programming in parallel, nor the debugging and testing. Still, I am glad to know that I am not the same person I was two months ago. Now I am more knowledgeable in a specific topic, and I feel a desire to continue doing research and to contribute to science!

Acknowledgements: Special thanks to David Villegas, Javier Delgado, Javier Figueroa, Juan C. Martinez, Dr. S. Masoud Sadjadi, Dr. Hector Duran, Dr. Scott Graham, Dr. Masoud Milani, and the REU + PIRE staff. My group members: Guangyuan “Gary” Liu and Michael Robinson. And God for giving me the strength to study hard… And to all of YOU for being here listening to me!
