A Grid Parallel Application Framework
Jeremy Villalobos, PhD student
Department of Computer Science
University of North Carolina at Charlotte
Overview Parallel Applications on the Grid Latency Hiding by Redundant Processing (LHRP) PGAFramework Related work Conclusion
Parallel Applications on the Grid
Advantages:
- Access to more resources
- Lower costs
- Future profits from a Grid Economy?
Challenges:
- The I/O problem
- Need for an easy-to-use interface
- Heterogeneous hardware
Latency Hiding by Redundant Processing
- The latency hiding problem
- LHRP algorithm: CPU types and the task assigned to each CPU type
- Versioning system
- Mathematical model to describe LHRP
- Results
LHRP: conventional Latency Hiding vs. Latency Hiding by Redundant Processing
LHRP Algorithm
- Internal: communicates only with LAN CPUs
- Border: communicates with LAN CPUs and one Buffer CPU
- Buffer: communicates with the LAN Border CPU and receives data from the WAN Border CPU
Computation and Communication Stages (Internal):
1. Compute borders
2. Transfer borders (non-blocking)
3. Compute core matrix
4. Wait for transfer ACK
Computation and Communication Stages (Border):
1. Compute borders
2. Transfer borders (non-blocking)
3. Send far border
4. Compute core matrix
5. Wait for transfer ACK
6. Check on far-border transfer ACK (wait if it is the last iteration)
Computation and Communication Stages (Buffer):
1. Compute borders
2. Transfer borders (non-blocking)
3. Receive far border
4. Compute core matrix
5. Wait for transfer ACK
6. Check on far-border transfer ACK (wait if it is the last iteration)
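The stage ordering above can be sketched in Java. This is an illustrative stand-in, not the framework's actual code: the class, method names, and the simulated transfer are all hypothetical. It shows the key idea, starting the border exchange non-blockingly so the core computation overlaps the communication.

```java
import java.util.concurrent.*;

// Hypothetical sketch of one LHRP iteration for an Internal node:
// borders first, non-blocking send, core work overlapped, then wait for ACK.
public class LhrpIteration {
    static final ExecutorService net = Executors.newSingleThreadExecutor();

    // Stand-in for a non-blocking LAN border transfer.
    static Future<Void> sendBordersAsync(double[] borders) {
        return net.submit(() -> { Thread.sleep(5); return null; });
    }

    static double[] computeBorders(double[] m) {
        return new double[]{ m[0], m[m.length - 1] };          // stage 1
    }

    static void computeCore(double[] m) {
        for (int i = 1; i < m.length - 1; i++) m[i] += 1.0;    // stage 3
    }

    public static void main(String[] args) throws Exception {
        double[] matrix = new double[1024];
        for (int iter = 0; iter < 3; iter++) {
            double[] borders = computeBorders(matrix);
            Future<Void> ack = sendBordersAsync(borders);      // stage 2: non-blocking
            computeCore(matrix);                               // overlaps the transfer
            ack.get();                                         // stage 4: wait for ACK
        }
        net.shutdown();
    }
}
```

A Border or Buffer node would add the far-border send/receive between stages 2 and 3 and check its ACK only at the end of the loop, as the slides describe.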
Buffer Node Versioning Algorithm
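A hedged sketch of the buffer-node versioning idea: because the far border arrives over the high-latency WAN link, the Buffer node keeps older versions of its columns (the extra matrix noted in the memory-footprint results) so it can recompute from the right iteration once the delayed data arrives. The class, method names, and eviction policy below are hypothetical.

```java
import java.util.*;

// Hypothetical version store for a Buffer node: retains the last `keep`
// column snapshots, keyed by iteration number, so the node can roll back
// to the iteration a late-arriving far border corresponds to.
public class ColumnVersionStore {
    private final NavigableMap<Integer, double[]> versions = new TreeMap<>();
    private final int keep; // how many old versions to retain

    public ColumnVersionStore(int keep) { this.keep = keep; }

    public void save(int iteration, double[] column) {
        versions.put(iteration, column.clone());
        while (versions.size() > keep) versions.pollFirstEntry(); // drop oldest
    }

    // Newest stored version at or before the requested iteration,
    // or null if it has already been evicted.
    public double[] rollbackTo(int iteration) {
        Map.Entry<Integer, double[]> e = versions.floorEntry(iteration);
        return e == null ? null : e.getValue();
    }
}
```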
LHRP Algorithm Review
- Node types: Internal, Border, Buffer
- Far border transfer
- Buffer Node Versioning system
Estimated Algorithm Performance
G: Grid latency
I: Internal latency
B: amount of data tuples used by the Buffer Node
W: total amount of work for all CPUs
C: amount of CPUs doing non-redundant work
Experimental Result: Memory Footprint
21% increase in memory use over the conventional form of latency hiding.
Causes:
- an extra matrix in the Buffer Node to store old column versions
- extra far-border buffers
Experimental Results: Performance
PGAFramework
- Objective
- Design requirements
- Implementation technology choices
- API design
- API workpool example
- Other API features: synchronization option, recursive option
PGAFramework objective: to create an efficient parallel application framework for the Grid that lets the user programmer interact easily with Grid resources.
Design Requirements
- Platform independence
- Self-deployment
- Easy-to-use interface
- Provide the following services without requiring extra effort from the user programmer: load balancing, scheduling, fault tolerance, latency/bandwidth tolerance
Design (layer diagram): the user's applications sit on the PGAFramework API (interface); the framework provides load balancing, scheduling, fault tolerance, and latency/bandwidth tolerance on top of the Globus job scheduler (Condor) and the hardware resources.
Deployment (diagram): a job submit node with a scheduling service and resource discovery dispatches through GridWay(?) and Globus to Condor/PBS and SGE back ends; target resources include desktop PCs, cluster computer nodes, and supercomputers.
Implementation
- Java: platform independence
- JXTA (JXSE): peer-to-peer API; provides tools to work around NATs and firewalls; provides a library and module runtime-loading API
Motivation for API Design: video codecs. Codecs follow an interface: what happens inside the codec does not matter; only the codec's input and output need to be specified. A video player (display a GUI, load a file, output video to the screen) feeds an MPEG-encoded stream to a codec (mpeg, ogg, h.264) and receives raw video data back.
PGAFramework API
- There may be multiple "template" APIs. Each API has interfaces that the user implements; the user "inserts" his module into the framework.
- User compute module: get data from the framework, compute on the data, return the processed data, request sync (optional).
- User data module: give data to the framework, get data from the framework, store or pipe data.
- Framework: schedule processes on resources, load user data, create the network, determine topology and network behavior, send the user process to compute nodes, get data from the user class, send it to the master node, and repeat the loop until done.
API Sample Code
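The sample code from this slide is not in the extracted text, so the following is a hedged reconstruction of the workpool "template" style described above: the user implements a small interface and the framework moves data in and out. The interface and class names are hypothetical, not the framework's published API.

```java
// Hypothetical workpool module interface: the framework hands each work
// unit to compute() and collects the result.
interface WorkpoolModule<I, O> {
    O compute(I work);
}

// Example user module: squares each integer the framework hands to it.
class SquareModule implements WorkpoolModule<Integer, Integer> {
    public Integer compute(Integer work) { return work * work; }
}

// Minimal stand-in for the framework's dispatch loop.
public class WorkpoolDemo {
    public static void main(String[] args) {
        WorkpoolModule<Integer, Integer> user = new SquareModule();
        int[] results = new int[4];
        for (int i = 0; i < 4; i++) {
            results[i] = user.compute(i); // framework dispatches work units
        }
        System.out.println(java.util.Arrays.toString(results)); // [0, 1, 4, 9]
    }
}
```

The point of the codec analogy carries over: the framework only needs the module's input and output types; what happens inside compute() does not matter to it.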
API
API Sample Code
Synchronization option
- RemoteHandler provides an interface to synchronize data
- Data is synced non-blocking
- The user creates blocking procedures if needed
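A minimal sketch of that split of responsibilities, assuming a Future-style handle; RemoteHandler is the name from the slide, but this signature is hypothetical. The framework's sync call returns immediately, and the user adds a blocking wait only where ordering matters.

```java
import java.util.concurrent.*;

// Hypothetical sketch: non-blocking sync by default, with the blocking
// step supplied by the user, not the framework.
public class SyncSketch {
    static class RemoteHandler {
        private final ExecutorService pool = Executors.newSingleThreadExecutor();

        // Non-blocking: returns at once; data syncs in the background.
        Future<Void> sync(double[] data) {
            return pool.submit(() -> null); // stand-in for the real transfer
        }

        void shutdown() { pool.shutdown(); }
    }

    public static void main(String[] args) throws Exception {
        RemoteHandler handler = new RemoteHandler();
        Future<Void> pending = handler.sync(new double[]{1.0, 2.0});
        // User-created blocking procedure, used only when needed:
        pending.get();
        handler.shutdown();
    }
}
```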
Recursive Feature
Allows multiple levels of parallelization (granularity).
Example levels: decode video, cut raw video into pictures, blur pictures, blur a portion of a picture.
Paradigms: pipeline, workpool, synchronous.
Related Work
- MPI implementations for the Grid: MPICH-G2, GridMPI, MPICH-V2 (MPICH-V1)
- Peer-to-peer parallel frameworks: P2PMPI (for cluster computing), P3 (for cluster computing)
- Self-deploying frameworks: Jojo
Conclusions Parallel Applications on the Grid Latency Hiding by Redundant Processing (LHRP) PGAFramework Related work