Designing a Java-based Grid Scheduler using Commodity Services. Patrick Wendel, Arnold Fung, Moustafa Ghanem, Yike Guo. Discovery Net, Department of Computing, Imperial College, London; InforSense, London
19/09/2006, All Hands Meeting, Nottingham. Outline: Discovery Net Project; Platform; Workflow Execution; Design; Deployment; Conclusions and Future Works
Discovery Net: Multidisciplinary project funded by the EPSRC under the UK e-Science programme (started Oct 2002, ended March 2005). Goal: develop an infrastructure for integrating various types of data sources, software and hardware resources, targeted at e-Scientists. Applications: Life Sciences (high-throughput genomics and proteomics); Real-time Environmental Monitoring (high-throughput dispersed air sensing technology); Geo-Hazard modelling (earthquake modelling through satellite imagery). The project covered many areas, including infrastructure, applications and algorithms (e.g. text mining). It produced the Discovery Net platform, which aims to integrate, compose, coordinate and deploy knowledge discovery services using a workflow technology.
Discovery Net Overview (diagram). Labels: Multiple data sources (Files, Excel, SQL Databases, Online Sources); Data Processing Tools, Analysis Services, Third Party Tools; Web Services, Grid Services, Web/Grid Service; Data, Applications, Components, Computations, Services; Integrative Analytics Workflow Environment; Dynamic Data & App Integration; Interactive Knowledge Discovery; Interactive Solution Building; Rapid Application Deployment; Automation & Scheduling; Business Process; Portal / Dashboard Application; Distribution to Scientists.
Latest Applications. Rail Network Data Analysis: collaboration between the London e-Science Centre and AEA Technology Rail, funded by DfT; the project showed how the large amounts of data available within the rail industry can be analysed using e-Science methods and Grid computing. Imaging Applications: project using imageodesy algorithms; medical imaging. Combinatorial Chemistry: TOPCOMBI (EU project), 22 partners.
SIMDAT: EU-funded project, 4 years. Start date: September 1st, partners. InforSense is technology champion for workflow systems. Pharma applications, automotive applications, knowledge services application. Capability Providers, Grid Technologists, End Users.
SIMDAT: Work conducted within SIMDAT (EU-funded project). Extended the workflow engine to support B2B use-case scenarios in the automotive and pharmaceutical sectors. Integration with GRIA. Prototypes for coupled workflow engines. Prototypes for workflow engine interoperability.
Modules (diagram). Labels: Interface; Submission, Execution/Optimisation, Monitoring, Interaction, Verification; Enactment/Execution; Intermediate Results, Data Access, Table Management; Data Management; Activities: Activity Definition, Activity Authorisation; Workflows: Workflow Storage; Workflow Client Tool, Web Portal, Web Service; Workflow Execution History, Persistent Results; Workflow Authorisation, Data Authorisation; Authorisation.
Workflow? Data-flow dependency graph. Workflow construction paradigm: visual graph construction (layout, annotation); aided construction through application-specific wizards. Using workflows provides: a simple rapid application development environment; a visual representation of the process; re-usable, maintainable and shared processes; workflow-based knowledge management (provenance, audit, policies, warehousing); handling of basic parallel programming constructs (concurrent execution of branches, pipelining of executions for certain types of data and certain activities, an interface for data-parallel activity implementations); coupling with data set management.
Interface. Client interface: workflow construction, verification, execution, monitoring; supports visualisation and interactive activities (activities executed in the client); synchronised with the activity repository (using JWS). Web Portal and Web Services endpoint, for accessing workflows as services.
Server-side Architecture (diagram). Labels: Repositories: User/Group, Workflows, Results, Interm. Results, Jobs, Activities. Generic Services: Logging (Log4J), Container-Managed Persistence for EJB, Messaging (JMS) P2P and Publish/Subscribe, Security (JAAS), Database Connectivity (JDBC), Naming Service (JNDI), Management Service (JMX), Plugin Framework (JPF), Software Delivery (JWS). Services: Stateless, Stateful, CMP, Message-driven. HTTP Servlets, Presentation: JSP, Struts, Portlets (JSR 168). Cache/Results Access, Code Download, Task Management, Job, ExecutionHandler, DataMgmt, Component Mgmt, Data Transfer, Web Service, Jobs History. Queues and Topics: Jobs queue, Status topic, Jobs topic.
Distributed Execution. Activity-level distributed computing: SSH (data streaming), SGE, LSF…; Web Services, GRIA, HTTPClient (Groovy). Workflow-level (scheduling of the overall execution) depends on the usage and type of workflow. Developing prototype workflows: iterative refinement; caching and reuse of intermediate results within a user session. Stateless production workflows: the entire workflow is executed for different inputs/parameters; scheduling. Stateful production workflows as services: the workflow is executed following a process/guide; the execution engine must be able to reuse cached results.
Granularity of an execution. The architecture is based on the Java EE stack, which provides a hosting environment for the activities (context, security, logging, access to resources and application-level environment information). Each workflow execution is handled by one or more threads running in a Java server, whereas tasks submitted to grid schedulers are usually OS processes. Periodic monitoring information is generated by each activity (not only by the workflow engine) and sent back to the client tool or portal. What's the best way to handle task scheduling in that context?
Requirements Summary. From the Discovery Net architecture: workflow execution and activities are reliant on JEE services; scheduling should depend on the need to reuse, and the availability of, intermediate results for the workflow. Additional constraints: execution servers can be distributed over a WAN; based only on standard Grid infrastructure or JEE services; no direct communication between execution servers and the client tool.
First attempt: Grid Scheduler. Submit execution to SGE. Issues: Cannot start an instance of the server for each execution (only one instance of JBoss at a time, unless new configurations are added for each execution). The start-up cost of the server is not negligible for some workflows. The execution server needs to connect back to the submission server and set up a two-way communication channel. How is the client notified of new status?
Second attempt: AS Clustering. Application-server-level clustering; CMP Entity Bean clustering; experiment with JBoss Clustering (based on JGroups). Issues: Application server clustering is not fully standardised, with different issues on different application servers. Cluster configuration based on JGroups only supported static clusters (set of IPs) or a join protocol based on broadcast (may be better now?). Modifications of the clustering code were required to ensure that a unique instance of the Entity Bean representing the task is created and used throughout the execution. Not designed for long-running tasks.
Third Attempt: Using Java Services. Stateless Session Bean as entry point (Task Management Service): mapping to IIOP/RMI or SOAP/HTTP. Stateful CMP Entity Bean to represent the state of the task (workflow, cached results, monitoring information). Job JMS Queue to submit requests. ExecutionHandler Message-Driven Bean to handle the requests. Job Topic to send control commands to the execution. Status Topic to send back information from the execution. Scheduling policy implemented by the JMS Queue service provider: default using round-robin; integrated with SGE using simple scripts to find a potential execution server (extended to check whether the execution server is started or should be started); customised implementation to check for cached intermediate results of the workflow. The number of concurrent executions on each execution server is defined by the size of the pool for the ExecutionHandler MDB.
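The scheduling policy on this slide (round-robin by default, with a customised check for cached intermediate results) can be illustrated with a minimal plain-Java sketch. In the talk the policy lives inside the JMS Queue service provider; the class below is only a hypothetical stand-alone model, and the names `ServerSelector`, `recordCachedResults` and `select` are assumptions made for illustration, not part of Discovery Net or of any JMS API.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

/** Hypothetical model of the server-selection policy; not the JMS provider's API. */
class ServerSelector {
    private final List<String> servers;
    // workflowId -> servers assumed to hold cached intermediate results for it
    private final Map<String, Set<String>> cache = new HashMap<>();
    private int next = 0; // round-robin cursor

    ServerSelector(List<String> servers) {
        if (servers.isEmpty()) {
            throw new IllegalArgumentException("no execution servers");
        }
        this.servers = new ArrayList<>(servers);
    }

    /** Records that a server holds intermediate results for a workflow. */
    void recordCachedResults(String workflowId, String server) {
        cache.computeIfAbsent(workflowId, k -> new HashSet<>()).add(server);
    }

    /** Prefers a server with cached results; otherwise falls back to round-robin. */
    String select(String workflowId) {
        Set<String> holders = cache.get(workflowId);
        if (holders != null) {
            for (String s : servers) {
                if (holders.contains(s)) {
                    return s; // reuse cached results, cursor is left untouched
                }
            }
        }
        String chosen = servers.get(next);
        next = (next + 1) % servers.size();
        return chosen;
    }
}
```

With servers `a`, `b`, `c`, two unrelated workflows go to `a` and `b`; a workflow whose results are recorded on `a` is sent back to `a` without advancing the round-robin cursor, so the next unrelated workflow goes to `c`.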
Design (diagram). Components: Client Tool, Web Portal, Task Management Service (Stateless), Messaging Service Provider, Persistence Service, Execution Server 1 to Execution Server N, Services Provided. Connections: submit, publish, subscribe, load/save.
Submission (diagram). Components: Client Tool, Task Management Service, Job queue, Execution Server, Persistence Provider for JobEntity, Services Provided. Connections: submit, publish, subscribe, load/save. Steps: create JobEntity; publish to Job queue; message-triggered ExecutionHandler receives notification; JobEntity activated.
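The submission path, combined with the earlier point that the number of concurrent executions per server is set by the size of the ExecutionHandler MDB pool, can be modelled with a fixed-size thread pool. This is a hedged sketch in plain Java, not EJB code: `ExecutionServerModel` and its methods are hypothetical names, and a `java.util.concurrent` executor stands in for the container-managed MDB pool.

```java
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;

/** Hypothetical stand-in for one execution server; a thread pool plays the role of the MDB pool. */
class ExecutionServerModel {
    private final ExecutorService handlers;
    private final AtomicInteger running = new AtomicInteger();
    private final AtomicInteger peak = new AtomicInteger();
    private final AtomicInteger completed = new AtomicInteger();

    ExecutionServerModel(int poolSize) {
        this.handlers = Executors.newFixedThreadPool(poolSize);
    }

    /** Queues a job; it starts only when one of the pooled handlers is free. */
    void submit(Runnable workflow) {
        handlers.execute(() -> {
            int now = running.incrementAndGet();
            peak.accumulateAndGet(now, Math::max); // remember the highest concurrency seen
            try {
                workflow.run();
            } finally {
                running.decrementAndGet();
                completed.incrementAndGet();
            }
        });
    }

    /** Stops accepting jobs and waits for queued ones; returns false on timeout or interruption. */
    boolean drain(long timeoutMillis) {
        handlers.shutdown();
        try {
            return handlers.awaitTermination(timeoutMillis, TimeUnit.MILLISECONDS);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            return false;
        }
    }

    int peakConcurrency() { return peak.get(); }

    int completedJobs() { return completed.get(); }
}
```

Submitting six jobs to a model with pool size 2 completes all six, and `peakConcurrency()` never exceeds 2. The extra jobs simply wait in the queue, which is the behaviour the slides attribute to the Job queue feeding the ExecutionHandler.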
Control (diagram). Components: Client Tool, Task Management Service, Job topic, Execution Server, Persistence Provider for JobEntity, Services Provided. Connections: control (pause/resume/kill), publish, subscribe to messages for Job ID, load/save. Steps: publish control request on Job topic; JobEntity receives notification.
Monitoring (diagram). Components: Client Tool, Task Management Service, Status Topic, Execution Server, Persistence Provider for JobEntity, Services Provided. Connections: update, publish, subscribe to messages for Job ID. Steps: update JobEntity state; publish status update.
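The "subscribe to messages for Job ID" idea used by the Control and Monitoring flows can be sketched as a tiny in-memory publish/subscribe model. This is an illustration only: `StatusTopicModel` is a hypothetical class, and with a real JMS provider the per-job filtering would more likely be expressed through the provider's topic subscription facilities than with a map like this one.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.function.Consumer;

/** Hypothetical in-memory model of a topic whose subscribers filter on a job ID. */
class StatusTopicModel {
    // jobId -> listeners interested in that job only
    private final Map<String, List<Consumer<String>>> subscribers = new HashMap<>();

    /** Registers a listener that receives updates for one job ID. */
    void subscribe(String jobId, Consumer<String> listener) {
        subscribers.computeIfAbsent(jobId, k -> new ArrayList<>()).add(listener);
    }

    /** Delivers the status only to listeners registered for this job; returns the delivery count. */
    int publish(String jobId, String status) {
        List<Consumer<String>> listeners = subscribers.get(jobId);
        if (listeners == null) {
            return 0; // nobody is watching this job
        }
        for (Consumer<String> listener : listeners) {
            listener.accept(status);
        }
        return listeners.size();
    }
}
```

A client that subscribes for `job-1` sees a `RUNNING` update published for `job-1` and nothing published for `job-2`, which mirrors how the client tool follows only the executions it submitted.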
Management. Status update period: the ExecutionHandler regularly checks (starting from a base period) whether the monitoring information of the workflow has changed. If it has not, the period is increased (up to a maximum update period); if it has, the Status Topic is notified. Failure detection: the server hosting the Task Management service also checks for tasks whose time since the last update is significantly higher than the maximum update period. Security context: each execution server can have a dedicated JAAS configuration. To avoid having to re-authenticate the user who submitted the workflow, execution servers use a customised login module to handle the delegation.
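The status update period and the failure check described on this slide can be written down as a short sketch. The slide does not say by how much the period grows, whether it resets after a change, or what "significantly higher" means, so the doubling, the reset to the base period and the `toleranceFactor` parameter below are all assumptions, as is the class name `StatusUpdateSchedule`.

```java
/** Hypothetical sketch of the adaptive status-update period; growth rule and factor are assumptions. */
class StatusUpdateSchedule {
    private final long basePeriodMs;
    private final long maxPeriodMs;
    private long currentPeriodMs;

    StatusUpdateSchedule(long basePeriodMs, long maxPeriodMs) {
        if (basePeriodMs <= 0 || maxPeriodMs < basePeriodMs) {
            throw new IllegalArgumentException("need 0 < base <= max");
        }
        this.basePeriodMs = basePeriodMs;
        this.maxPeriodMs = maxPeriodMs;
        this.currentPeriodMs = basePeriodMs;
    }

    /**
     * Called after each check; returns the delay before the next check.
     * Assumption: the period doubles when nothing changed (capped at the maximum)
     * and resets to the base period when the monitoring information changed.
     */
    long afterCheck(boolean monitoringInfoChanged) {
        if (monitoringInfoChanged) {
            currentPeriodMs = basePeriodMs;
        } else {
            currentPeriodMs = Math.min(currentPeriodMs * 2, maxPeriodMs);
        }
        return currentPeriodMs;
    }

    /** Failure test: silence longer than toleranceFactor times the maximum period (factor assumed). */
    boolean looksFailed(long lastUpdateMs, long nowMs, double toleranceFactor) {
        return (nowMs - lastUpdateMs) > toleranceFactor * maxPeriodMs;
    }
}
```

With a base period of 1000 ms and a maximum of 8000 ms, three unchanged checks yield delays of 2000, 4000 and 8000 ms, and further unchanged checks stay at 8000 ms. With a tolerance factor of 3, a task silent for 30 seconds would be flagged as failed, while one silent for 20 seconds would not.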
Deployment. JBoss 3.2, JBossMQ, Hibernate. LAN: using faster native Java protocols (RMI/JRMP) and callbacks. WAN: using HTTP-based and polling-based protocols.
LAN Deployment (diagram). Components: Client Tool, Task Management Service, Messaging Service Provider, Persistence Service, Execution Server 1 to Execution Server N, Services Provided. Protocol labels: TCP, RMI/JRMP, IIOP, Web Service/HTTP.
WAN Deployment (diagram). Components: Client Tool, Task Management Service, Messaging Service Provider, Persistence Service, Execution Server 1 to Execution Server N, Services Provided, Firewall/NAT. Protocol labels: TCP, HTTP, Web Service/HTTP, IIOP.
Evaluation. Reliability: dependent on the reliability of the CMP provider and the JMS provider; the Task Management service is stateless; Execution Servers do not hold the state of the task (only intermediate results). LAN configuration: used for running nightly regression test workflows over a heterogeneous cluster (Linux servers + desktop PCs); deployed on production clusters (with limited connectivity from the slaves to the outside network). WAN configuration: adds several seconds of delay depending on the workflow; workflow submission is still a synchronous RPC using tunnelled JRMP; monitoring information uses Java serialisation as well.
Conclusion. A simple, scalable solution based on Java EE commodity services instead of working around Grid submission APIs, yet customisable to use any command-line-based scheduler, resource monitor or workflow-specific policies. The implementation is not bound to any network protocol. Issues: Custom policies rely on the flexibility of the JMS provider. There is no software delivery mechanism for execution servers (unlike the client); the software has to be installed manually. Reliance on the performance of JEE services? Why bother about having a hosting environment for workflow execution?
Future Works. Use the workflow structure to refine the scheduling algorithm, taking into account information about the workflow (such as the number of branches and pipelined activities). User-defined rules/scripts to define workflow-level or activity-level scheduling policies.