Published by Cecil Clark. Modified over 9 years ago.
1
OOI CI LCA REVIEW August 2010
Ocean Observatories Initiative
Common Execution Environment
Kate Keahey, Tim Freeman, Alex Clemesha, John Bresnahan, David LaBissoniere
Life Cycle Architecture Review
La Jolla, CA
2
Subsystem Purpose
  Provide highly available services
  Rapidly provision resources
  Scale to demand
3
R1 Use Cases
  UC.R1.16: Scale the processing
    A load is put on the system
    Additional demand is recognized via different sensors
      Message queue length, CPU loads, disk usage
    System scales up to meet increased demand
    System scales down when demand goes away
  UC.R1.25: Assure reliability
    Failures happen
    Remedial actions happen
    No significant impact on observatory operation
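The scale-up and scale-down behaviour in UC.R1.16 can be illustrated with a small sketch. It is not the deck's Controller or Planner code: the function names, the one-job-per-VM default, and the use of queue length as the only sensor are assumptions made for illustration. The default cap of 40 VMs is borrowed from the bounded policy quoted on the reliability test slide.

```python
# Hypothetical sketch of a queue-length scaling policy (not the actual CEI code).

def target_vm_count(queue_length, jobs_per_vm=1, min_vms=0, max_vms=40):
    """Return the number of worker VMs a simple queue-length policy asks for."""
    if jobs_per_vm < 1:
        raise ValueError("jobs_per_vm must be at least 1")
    # Ceiling division: enough VMs to hold every queued job.
    needed = -(-queue_length // jobs_per_vm)
    # Clamp to the policy bounds.
    return max(min_vms, min(max_vms, needed))


def scaling_action(current_vms, queue_length, **policy):
    """Compare the running VM count with the target and name the action."""
    delta = target_vm_count(queue_length, **policy) - current_vms
    if delta > 0:
        return ("launch", delta)      # demand increased: scale up
    if delta < 0:
        return ("terminate", -delta)  # demand went away: scale down
    return ("hold", 0)
```

For example, `scaling_action(2, 5)` asks for three more VMs, and `scaling_action(5, 0)` asks for all five to be terminated once the queue is empty.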
4
Architecture Overview
5
More Architecture Overview
6
Even More Architecture Overview (HA Provisioning)
7
Even More Bootstrapping
8
Implementation/Achievement Status
  All major components implemented: Provisioner, Controller, Planner, Sensor Aggregator
  Integrated with ION
  Tested on infrastructure ranging from Magellan to EC2
  Needing refinement: exchange points, EPU adapters, cc-agent, use of Registry, Planner, Image building and management
  Need to put together HA-Provisioner and bootstrapping
  Draft user and administrator process
9
Technology Choices
  Services: Python, Twisted
  Messaging: ION container (AMQP)
  Infrastructure: Nimbus IaaS, EC2 IaaS, libcloud, Nimboss, boto
  Contextualization: Nimbus Context Broker/agent, Nimboss, Chef Solo, Fabric
  Special: txrabbitmq, Twotp (queue length sensor)
10
Use Cases Addressed and Demonstrated
  UC.R1.16: Scale the processing
  UC.R1.25: Assure reliability
11
The Testfest
  Tests run on EC2:
    CEI infrastructure deployed on two small instances: Provisioner, DTRS, EPU_Controller, Sensor Aggregator, client
    RabbitMQ server on a large instance with elastic IP
    UC Context Broker instance
  Fully experimental (except for queue length)
  Target: scale the scenarios up to 1000 VMs
12
Scale the Processing
  Test 1
    70 jobs
    Submitted over 28 minutes, 5 jobs every 2 minutes
    Long job duration, one job per VM
  Test 2
    70 jobs
    Submitted at the same time
    Long job duration, one job per VM
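The two submission patterns can be checked with a short sketch. The helper name `batch_schedule` is hypothetical, and the sketch assumes the first batch goes out at minute 0. Under that assumption Test 1 produces 14 batches of 5 jobs, with the last batch at minute 26, so the 14 two-minute slots cover the 28 minutes quoted on the slide.

```python
# Hypothetical helper that reproduces the submission patterns of the two tests.

def batch_schedule(total_jobs, batch_size, interval_minutes):
    """Return a list of (submit_minute, jobs_in_batch) pairs."""
    schedule = []
    submitted = 0
    minute = 0  # assumption: the first batch is submitted at minute 0
    while submitted < total_jobs:
        jobs = min(batch_size, total_jobs - submitted)
        schedule.append((minute, jobs))
        submitted += jobs
        minute += interval_minutes
    return schedule


# Test 1: 70 jobs, 5 jobs every 2 minutes.
test1 = batch_schedule(70, 5, 2)

# Test 2: all 70 jobs submitted at the same time (a single batch).
test2 = batch_schedule(70, 70, 2)
```

With one job per VM, Test 1 raises demand gradually in steps of 5 VMs, while Test 2 asks for all 70 VMs at once.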
13
Assure Reliability
  How does the system react to failure?
  Killing 5 VMs in 5-minute intervals
  Saturating the system with 10-second jobs
  Bounded policy: 40 VMs
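One way to read this test is as a reconciliation loop: with the queue saturated, the policy wants the full 40 VMs, so every round of kills should be followed by the same number of replacement launches. The sketch below is an assumption about that logic, not the deck's implementation, and `replacements_needed` is a hypothetical name.

```python
# Hypothetical sketch of the remedial action under a bounded policy.

def replacements_needed(running_vms, pending_jobs, max_vms=40):
    """Return how many VMs to launch to get back to the desired count."""
    # With one job per VM, never ask for more VMs than there are jobs,
    # and never exceed the policy bound.
    desired = min(max_vms, pending_jobs)
    return max(0, desired - running_vms)


# Saturated queue, 40 VMs running, then 5 VMs are killed.
running = 40
running -= 5
to_launch = replacements_needed(running, pending_jobs=1000)
```

Here `to_launch` is 5, so the pool returns to the bound of 40. If the queue held only 10 jobs, no replacements would be requested because 35 VMs already exceed the demand.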
14
Technology Challenges
  Most significant resolved issues:
    Semantic differences between IaaS providers
    Issues in contextualization recipes
    Multiple simple bugs
  Most significant unresolved issues:
    AMQP connections closed unexpectedly
      Currently prevents us from running at scale
    RabbitMQ crashes when used by multiple clients
    Inspecting queue remotely issue
    Non-concurrent container issue
    Pulling work issue
15
Plan for Construction
  Complete feature set
  Continue evaluation and improvement
  Make code ready for users
  Refine documentation
  Continue integration with other subsystems
  Acceptance tests and harness
16
Significant Risks