1
Remote execution of long-running CGIs
Solving the 30-sec WEB timeout problem
2
Current Architecture
WEB Server: CGI Application (30 sec timeout)
3
Remote CGI infrastructure:
The gateway relays the CGI environment and input streams to the grid.
While the query is running, it checks the status and displays a status report HTML page (customizable).
Output is transferred to the WEB front end.
Diagram labels: WEB Server: CGI2RCGI gateway; NetSchedule; Grid; Back-end CGIs (worker nodes); Job status check; Progress report
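A minimal sketch of the gateway behaviour described on this slide: each web request returns quickly, either with a status page or with the finished output. The names `PollResult` and `GatewayStepSketch` and both callbacks are hypothetical and are not part of the NCBI toolkit.

```cpp
#include <cassert>
#include <functional>
#include <string>

// Hypothetical job states as seen by the gateway.
enum class PollResult { Running, Done };

// Handle one web request: if the job is finished, return its output;
// otherwise return a short status page instead of blocking the request.
std::string GatewayStepSketch(const std::function<PollResult()>& check_status,
                              const std::function<std::string()>& fetch_output,
                              const std::string& progress_message) {
    if (check_status() == PollResult::Done) {
        return fetch_output();
    }
    return "<html><body><p>Job is running: " + progress_message +
           "</p></body></html>";
}
```

Because no single request waits for the job to finish, the 30-second limit applies only to the short status check, not to the job itself.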
4
Why does the 30 sec. timeout happen?
Peak hours: the number of processor-hungry tasks exceeds the number of CPUs.
CGIs are used as a platform to implement complex algorithms.
CGI inefficiency.
5
NetCache Brief: Temporary network-accessible BLOB storage.
A client submits data and receives a token (BLOB ID), which can be used to access the data (the token is ready for URLs, cookies, etc.).
BLOBs are deleted automatically based on access pattern and timeout.
Used for:
  Session data storage (CGIs)
  Storage of on-line generated graphics (CGIs)
  Local data cache (Object Manager)
Load balanced using LBSM
Cross-platform
Uses Berkeley DB (fast, transaction protected, can be redistributed and embedded into other apps (GBENCH))
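The token-and-expiry idea can be sketched with an in-memory map. This is only an illustration under assumed names (`BlobStoreSketch`, `Put`, `Get`); the real NetCache is a network server backed by Berkeley DB, and its actual API is not shown in these slides. Time is passed in explicitly so the expiry rule is easy to follow.

```cpp
#include <cassert>
#include <chrono>
#include <optional>
#include <string>
#include <unordered_map>

// Hypothetical in-memory stand-in for a NetCache-style BLOB store.
class BlobStoreSketch {
public:
    using Clock = std::chrono::steady_clock;

    explicit BlobStoreSketch(std::chrono::seconds ttl) : ttl_(ttl) {}

    // Store data and return a plain-text token (BLOB ID) that could be
    // placed in a URL or cookie.
    std::string Put(const std::string& data, Clock::time_point now) {
        std::string token = "blob_" + std::to_string(next_id_++);
        blobs_[token] = Entry{data, now};
        return token;
    }

    // Return the data if the BLOB has not timed out. Every successful read
    // refreshes the last-access time, so expiry follows the access pattern.
    std::optional<std::string> Get(const std::string& token,
                                   Clock::time_point now) {
        auto it = blobs_.find(token);
        if (it == blobs_.end()) return std::nullopt;
        if (now - it->second.last_access > ttl_) {
            blobs_.erase(it);  // expired: delete automatically
            return std::nullopt;
        }
        it->second.last_access = now;
        return it->second.data;
    }

private:
    struct Entry {
        std::string data;
        Clock::time_point last_access;
    };
    std::chrono::seconds ttl_;
    unsigned long next_id_ = 1;
    std::unordered_map<std::string, Entry> blobs_;
};
```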
6
NetSchedule Push-Pull Model
NetSchedule server maintains several FIFO queues.
Diagram labels: Push Job; Queue 1, Queue 2; Job 1, Job 2, Job 3, …; Pull Job
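The push-pull model with several FIFO queues can be sketched as follows. All names here (`JobSketch`, `QueueServerSketch`, `Submit`, `GetJob`) are hypothetical; the sketch only shows that the server never initiates work: submitters push, workers pull.

```cpp
#include <cassert>
#include <deque>
#include <map>
#include <optional>
#include <string>

// Hypothetical job record: a key handed back to the submitter plus the input.
struct JobSketch {
    std::string key;
    std::string input;
};

// Hypothetical passive queue server holding several named FIFO queues.
class QueueServerSketch {
public:
    // Push: append a job to the named queue and return its job key.
    std::string Submit(const std::string& queue, const std::string& input) {
        std::string key = "job_" + std::to_string(next_id_++);
        queues_[queue].push_back(JobSketch{key, input});
        return key;
    }

    // Pull: a worker asks for the oldest job in a queue, if there is one.
    std::optional<JobSketch> GetJob(const std::string& queue) {
        auto it = queues_.find(queue);
        if (it == queues_.end() || it->second.empty()) return std::nullopt;
        JobSketch job = it->second.front();
        it->second.pop_front();
        return job;
    }

private:
    unsigned long next_id_ = 1;
    std::map<std::string, std::deque<JobSketch>> queues_;
};
```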
7
NetSchedule
Clients & Worker Nodes
Network Communication front-end
Queue 1: FSM (in-memory), Transactional Data Base
Queue 2: FSM (in-memory), Transactional Data Base
Each FSM holds one row per job status (Pending: …, Running: …, …, Done: 1 0 0).
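A per-queue status tracker of this kind might look like the sketch below. The statuses Pending, Running and Done come from the slide; the set of allowed transitions and the names `JobStatus` and `StatusTrackerSketch` are assumptions made for illustration, and the transactional database behind the real FSM is left out.

```cpp
#include <cassert>
#include <map>
#include <set>

enum class JobStatus { Pending, Running, Done };

// Hypothetical in-memory tracker: one set of job ids per status.
class StatusTrackerSketch {
public:
    void AddPending(unsigned id) { sets_[JobStatus::Pending].insert(id); }

    // Move a job between statuses; refuse transitions the sketch does not
    // allow or jobs that are not currently in the 'from' status.
    bool Move(unsigned id, JobStatus from, JobStatus to) {
        if (!Allowed(from, to)) return false;
        if (sets_[from].erase(id) == 0) return false;
        sets_[to].insert(id);
        return true;
    }

    bool Is(unsigned id, JobStatus status) const {
        auto it = sets_.find(status);
        return it != sets_.end() && it->second.count(id) > 0;
    }

private:
    // Assumed transitions: take a job, finish it, or return it to the queue.
    static bool Allowed(JobStatus from, JobStatus to) {
        return (from == JobStatus::Pending && to == JobStatus::Running) ||
               (from == JobStatus::Running && to == JobStatus::Done) ||
               (from == JobStatus::Running && to == JobStatus::Pending);
    }

    std::map<JobStatus, std::set<unsigned>> sets_;
};
```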
8
NetCache/NetSchedule In Grid Framework
CGI: Submit job; Wait for result (off-line); Render result as HTML
NetCache Server: keep input & output BLOBs
NetSchedule Server: control job queue
Worker Node: Get new job; Get input data (BLOB); Do the job; Submit output BLOB; Submit job result
1. Put Input Data
1.1 Data key (inp_blob_key)
2. Job Submit (inp_blob_key)
2.1 (job_key)
3. Get Job
4. Get Input data (inp_blob_key)
5. Put Result
5.1 Data key (out_blob_key)
6. Put Job Result (out_blob_key)
7. Get Result (job_key)
7.1 (out_blob_key)
8. Get Output Data (out_blob_key)
8.1 Output data
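The numbered steps on this slide can be traced in a single-process sketch. Everything here is a stand-in: `GridSketch` plays both the NetCache and NetSchedule roles with in-memory maps, the function names are hypothetical, and "Do the job" is replaced by reversing the input string.

```cpp
#include <cassert>
#include <deque>
#include <map>
#include <optional>
#include <string>

// Hypothetical in-memory stand-in for both servers.
struct GridSketch {
    std::map<std::string, std::string> blobs;       // NetCache role
    std::deque<std::string> pending;                // NetSchedule role (FIFO)
    std::map<std::string, std::string> job_input;   // job_key -> inp_blob_key
    std::map<std::string, std::string> job_output;  // job_key -> out_blob_key
    unsigned next_id = 1;

    std::string PutBlob(const std::string& data) {
        std::string key = "blob_" + std::to_string(next_id++);
        blobs[key] = data;
        return key;
    }
};

// CGI side, steps 1 to 2.1: store the input, submit the job, get a job key.
std::string CgiSubmit(GridSketch& grid, const std::string& input) {
    std::string inp_blob_key = grid.PutBlob(input);             // 1, 1.1
    std::string job_key = "job_" + std::to_string(grid.next_id++);
    grid.job_input[job_key] = inp_blob_key;                     // 2
    grid.pending.push_back(job_key);
    return job_key;                                             // 2.1
}

// Worker side, steps 3 to 6: take a job, read input, work, store the result.
bool WorkerRunOne(GridSketch& grid) {
    if (grid.pending.empty()) return false;
    std::string job_key = grid.pending.front();                 // 3
    grid.pending.pop_front();
    std::string input = grid.blobs[grid.job_input[job_key]];    // 4
    std::string output(input.rbegin(), input.rend());           // placeholder work
    std::string out_blob_key = grid.PutBlob(output);            // 5, 5.1
    grid.job_output[job_key] = out_blob_key;                    // 6
    return true;
}

// CGI side, steps 7 to 8.1: ask for the result key, then fetch the output.
std::optional<std::string> CgiFetch(const GridSketch& grid,
                                    const std::string& job_key) {
    auto job = grid.job_output.find(job_key);                   // 7
    if (job == grid.job_output.end()) return std::nullopt;      // not done yet
    return grid.blobs.at(job->second);                          // 7.1, 8, 8.1
}
```

Only keys travel through the job queue; the input and output data stay in the BLOB store, which is what keeps the queue records small.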
9
How to convert an existing CGI?
#include <misc/grid_cgi/remote_cgiapp.hpp>
…………..
class CRemoteCgiAppSample : public CRemoteCgiApp
{
    …….
};

void CRemoteCgiAppSample::Init()
{
    // Standard CGI framework initialization
    CRemoteCgiApp::Init();
    ……….
}

int CRemoteCgiAppSample::ProcessRequest( CCgiContext& ctx )
{
    ……………..
    // Report progress back to the user while the job is running
    PutProgressMessage( "Work in progress" );
    ……………...
}
10
High availability
All central components (queue and data storage) are duplicated.
All components are controlled by the NCBI load balancer.
Protection against back-end (remote CGI) failures: by timeout or via explicit rescheduling.
11
Features
Back-end servers can run requests for more than 30 seconds (the WEB timeout).
Easy application migration (minor code tweak and recompilation).
Back-end machines can send progress messages (feedback to the user).
12
Architecture
Push-Pull design (NetSchedule server is passive).
Worker nodes can subscribe to queue event notifications.
Notifications use a connection-less network protocol (UDP/IP).
Diagram labels: Client-submitter; NetSchedule (Queue 1, Queue 2, …; Job 1, Job 2, Job 3); Client waiting for job; Worker Node (Active); Worker Node (Waiting for a new job); GetJob(), PutResult(), GetStatus(); Notification (UDP/IP)
13
Framework Levels
Remote CGI: the application looks like an NCBI CGI App.
Grid framework: high-level API; job input and output are just C++ streams.
NetSchedule/NetCache API: low-level Queue/Storage access API.
All APIs need NetCache / NetSchedule components running in the local network.
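To illustrate what "job input and output are just C++ streams" can mean in practice, here is a small sketch in which a job body sees only an input stream and an output stream, while a runner wraps BLOB contents in string streams. `CountWordsJob` and `RunJobOnBlob` are hypothetical names; the real Grid framework API is not shown in these slides.

```cpp
#include <cassert>
#include <istream>
#include <ostream>
#include <sstream>
#include <string>

// Hypothetical job body: reads whitespace-separated words from the input
// stream and writes their count to the output stream.
void CountWordsJob(std::istream& in, std::ostream& out) {
    std::string word;
    unsigned long count = 0;
    while (in >> word) ++count;
    out << count;
}

// Hypothetical runner: exposes the input BLOB as a stream, runs the job,
// and returns whatever the job wrote as the output BLOB.
std::string RunJobOnBlob(const std::string& input_blob,
                         void (*job)(std::istream&, std::ostream&)) {
    std::istringstream in(input_blob);
    std::ostringstream out;
    job(in, out);
    return out.str();
}
```

With this shape the job code does not need to know where its data is stored, which is one way a stream-based API can keep application changes small.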
14
Worker node API
High-level design (use of C++ streams, compatibility with ASN.1 serialization).
Support of SMP (a node can run parallel jobs).
Remote administrative access to worker nodes (shutdown, availability check, statistics).
15
Performance & Latency
NetSchedule Queue performance guarantees low overhead compared to the conventional CGI model.
Stress test results:
Submit 5000 jobs. Done. Elapsed: sec. Avg time: sec.
GetStatus 5000 jobs. Elapsed: sec. Avg time: sec.
Take-Return jobs. Returned 2500 jobs. Jobs processed: 2500. Elapsed: sec. Avg time: sec.
Test environment: Linux-to-Linux, 2 CPU machine, one active submitter.
16
Job Timeout Protection
Estimated job execution time (based on input analysis): 50 seconds
Diagram labels: NetSchedule; Worker node 1 (Job N); Worker node 2
17
Job Timeout Protection
Node Failure!
Job N expired and rescheduled.
Diagram labels: NetSchedule (Timer: Job N); Worker node 1 (Job N); Worker node 2 (Job N)
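The timeout protection shown on these two slides can be sketched as a queue that records a deadline for every running job and returns expired jobs to the pending list. The class and method names are hypothetical, time is a plain count of seconds, and the 50-second value is taken from the previous slide.

```cpp
#include <cassert>
#include <cstddef>
#include <deque>
#include <map>
#include <optional>
#include <string>

// Hypothetical queue with per-job run deadlines.
class TimeoutQueueSketch {
public:
    void Submit(const std::string& job_key) { pending_.push_back(job_key); }

    // A worker takes the oldest pending job; the server starts a timer for it.
    std::optional<std::string> GetJob(long now, long run_timeout) {
        if (pending_.empty()) return std::nullopt;
        std::string key = pending_.front();
        pending_.pop_front();
        deadline_[key] = now + run_timeout;
        return key;
    }

    // A worker reports completion; the timer is cancelled.
    bool PutResult(const std::string& job_key) {
        return deadline_.erase(job_key) > 0;
    }

    // Move every job whose timer has run out back to the pending queue,
    // so another worker can pick it up. Returns how many were rescheduled.
    std::size_t RescheduleExpired(long now) {
        std::size_t count = 0;
        for (auto it = deadline_.begin(); it != deadline_.end();) {
            if (it->second <= now) {
                pending_.push_back(it->first);
                it = deadline_.erase(it);
                ++count;
            } else {
                ++it;
            }
        }
        return count;
    }

private:
    std::deque<std::string> pending_;
    std::map<std::string, long> deadline_;
};
```

A failed worker never reports a result, so the only signal the server needs is the expired timer.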