
1 Monitoring Latency Sensitive Enterprise Applications on the Cloud
Shankar Narayanan, Ashiwan Sivakumar

2 Enterprise Applications (EA)
Stock Trader benchmark application. Components: Front End (FE), Business Service (BS), Order Processing Service (OS), Configuration Service (CS), Database (DB).

3 EA as Services
[Diagram: users reach replicated FE instances through load balancers; the FE, BS, OS, and DB tiers are exposed to each other as service endpoints.]

4 EA Characteristics
EA property / relevant cloud characteristic:
- Scalability: dynamic deployment sizes
- Availability: geo-redundancy
- Economics: pay-as-you-use
- Elasticity: decoupled services
- Low latency: deploy closer to user groups
- Utilization: load balancing
Note the dynamic and distributed nature of cloud deployments. Reducing user-observed latency is the goal: monitor this!

5 Performance Variation: Time Series and CDF of DB Latency
[Figure: time series and CDF of DB latency; the data snapshot covers 4 hours on each of two days.]

6 Monitoring Framework – Design Goals
- Resilience: less sensitive to cloud variability
- Scalability: capable of scaling with the number of component instances
- Portability: easy to integrate with applications
- Flexibility: multiple levels of measurement (user-level latency, component-level isolation)
- Efficiency: fast and accurate measurements

7 Why Is Monitoring Hard
- Dynamic environment: the number of components changes
- Distributed deployment: needs a collection framework
- Variable request path: different choices of components
Existing monitoring tools do not support service-oriented architectures, are too detailed, and do not scale.
Remember: user-observed latency is our goal. Abstract away unnecessary details!

8 Measuring End-points – Existing Tools
[Diagram: a single user request to the FE fans out into 13 numbered messages across FE, BS, and DB (HTTP requests and responses, SOAP responses, MySQL replies). Existing tools record each message individually. Aggregate!]

9 Measurement Model
[Diagram: components C_i, C_{i+1}, C_{i+2}, with timestamps T_{i-1,i}, T_{i,i+1}, T_{i+1,i+2}, T_{i,i+2} (and primed variants T', T'', ... for repeated calls) taken as the request crosses component boundaries.]
- CL_i = component latency of the i-th component
- LL_{i,i+1} = link latency across components i and i+1
- N = number of components C_i communicates with
- n_j = number of calls made by C_i to the j-th of these components
(A plausible combining formula is reconstructed below.)
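
The transcript preserves the symbol definitions but not the combining formula on the diagram. One plausible reconstruction, assuming T^{(k)}_{i,j} and T'^{(k)}_{i,j} bracket the k-th call from C_i to its j-th downstream component (an inferred reading, not taken verbatim from the slides):

```latex
% Component latency: the time C_i holds the request, minus the time it
% spends blocked on downstream calls (hedged reconstruction).
CL_i = \bigl(T^{\mathrm{out}}_{i-1,i} - T^{\mathrm{in}}_{i-1,i}\bigr)
       - \sum_{j=1}^{N} \sum_{k=1}^{n_j}
         \bigl(T'^{(k)}_{i,j} - T^{(k)}_{i,j}\bigr)

% For a single call with no further fan-out, the round trip decomposes as
T'_{i,i+1} - T_{i,i+1} = 2\,LL_{i,i+1} + CL_{i+1}
% so LL_{i,i+1} can be isolated once CL_{i+1} is known.
```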

10 Monitoring Framework Architecture
[Diagram: each instrumented application component writes raw logs to local storage; a local log server, driven by a notification queue, produces an aggregated log; a global collector gathers the aggregated logs from all nodes.]

11 Outline
Monitoring tool
- Collection framework
- Instrumentation framework

12 The Collection Framework
- Each component writes to local storage
- The front end sends a "done" message to a local queue
- Queues decouple producer and consumer entities
- Storage gives persistence, with no limit on size
- Both are scalable and robust
Question: why is this the right model? When in doubt, measure! (A minimal sketch of this flow follows.)
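
To make the flow concrete, here is a minimal sketch of the collection path described above, using Python stdlib stand-ins (queue.Queue, a dict, a list) for the notification queue, local raw-log storage, and aggregated cloud storage; all names are illustrative, not from the paper.

```python
# Minimal sketch of the collection flow: components log locally, the FE posts
# "done" per request, and the local log server turns N raw records into ONE
# aggregated write. Stdlib stand-ins replace the real cloud queue/storage.
import json
import queue
import time
from collections import defaultdict

done_q = queue.Queue()       # stand-in for the local notification queue
raw_log = defaultdict(list)  # stand-in for local raw-log storage, keyed by request id

def log_record(req_id, component, t_in, t_out):
    """Each instrumented component appends one timing record locally."""
    raw_log[req_id].append({"component": component, "in": t_in, "out": t_out})

def front_end_done(req_id):
    """The front end signals that the user request has completed."""
    done_q.put(req_id)

def log_server(aggregated_store):
    """On each 'done' message, collate that request's records into one write."""
    while not done_q.empty():
        req_id = done_q.get()
        aggregated_store.append(json.dumps({req_id: raw_log.pop(req_id)}))

# Usage: two components log for request "r1", then the FE marks it done.
t = time.time()
log_record("r1", "BS", t + 0.002, t + 0.006)
log_record("r1", "FE", t, t + 0.010)
front_end_done("r1")
store = []                   # stand-in for cloud blob storage
log_server(store)
print(store[0])              # a single aggregated write for request r1
```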

13 Alternative Model
- All components write to the queue
- The collection framework dequeues the entries
- It forms a P2P network to collate the data

14 Experiments on Azure and EC2
Experiments evaluating the performance of storage and queues on real cloud deployments (Microsoft Azure, Amazon AWS). Extensive measurements from all datacenters:
- US (East/West/North/South)
- Europe (West/Central)
- Asia (East/South East)

15 Performance of Storage and Queues
[Figure: CDFs of queue-read, queue-write, and store-write latency for Microsoft Azure and Amazon AWS.]
- Measurements made in all 12 datacenter regions (Azure and AWS)
- Experiment length: 24–26 hours
- Approx. 100,000 requests to storage, 16,000 requests to the queues
(A sketch of this kind of probe follows.)
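
For the AWS side, a probe of this kind could look like the sketch below, using boto3; the bucket name and queue URL are placeholders, and the slides' actual methodology (regions, request counts) is only summarized above, not reproduced. The Azure side would use the corresponding Azure SDK calls.

```python
# Hedged sketch of a storage/queue latency probe (AWS side only). Requires
# boto3 plus valid credentials; BUCKET and QUEUE_URL are placeholders.
import time

import boto3

s3 = boto3.client("s3")
sqs = boto3.client("sqs")
BUCKET = "my-probe-bucket"                                   # placeholder
QUEUE_URL = "https://sqs.us-east-1.amazonaws.com/123/probe"  # placeholder

def timed_ms(fn, **kwargs):
    """Run one request and return its latency in milliseconds."""
    t0 = time.perf_counter()
    fn(**kwargs)
    return (time.perf_counter() - t0) * 1000.0

write_store = timed_ms(s3.put_object, Bucket=BUCKET, Key="probe", Body=b"x" * 256)
write_q = timed_ms(sqs.send_message, QueueUrl=QUEUE_URL, MessageBody="probe")
read_q = timed_ms(sqs.receive_message, QueueUrl=QUEUE_URL, MaxNumberOfMessages=1)
print(f"store write {write_store:.1f} ms, "
      f"queue write {write_q:.1f} ms, queue read {read_q:.1f} ms")
```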

16 Outline
Monitoring tool
- Collection framework
- Instrumentation framework

17 Instrumentation Framework – Goals
- Minimize coding effort and intervention
- Measure latency at the granularity of a user request
- Automate instrumentation as much as possible
- Generate minimal measurement parameters

18 Comparison of Existing Tools

19 Instrumentation Framework
[Diagram: aspects transform the original application component into an instrumented application component, driven by three inputs: a specification of the application end-points (X-Trace: log events), a measurement-metric specification (X-Trace: meta-data), and log-format specifications.]
(A decorator-based analogue of this idea is sketched below.)
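
The slides weave AspectJ-style aspects around Java endpoints; the sketch below expresses the same idea as a Python decorator (a deliberate stand-in, not the authors' aspect code). The x_trace_id argument models the X-Trace metadata carried along cross-component calls; all names are illustrative.

```python
# Decorator analogue of aspect-based instrumentation: wrap an application
# endpoint so every call emits one entry/exit log line keyed by the X-Trace id.
import functools
import time

def instrument(component):
    """Return a decorator that logs per-request entry/exit times for `component`."""
    def wrap(fn):
        @functools.wraps(fn)
        def inner(x_trace_id, *args, **kwargs):
            t_in = time.time()
            try:
                return fn(x_trace_id, *args, **kwargs)
            finally:
                # One log line per call, in a fixed format the collector parses.
                print(f"{x_trace_id} {component} {fn.__name__} {t_in} {time.time()}")
        return inner
    return wrap

@instrument("BS")
def get_quote(x_trace_id, symbol):
    """A hypothetical business-service endpoint."""
    return {"symbol": symbol, "price": 42.0}

get_quote("req-001", "IBM")  # emits: req-001 BS get_quote <t_in> <t_out>
```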

20 Experiment Set-up
- Deployed two similar benchmark applications: DayTrader on Amazon AWS; StockTrader on Windows Azure (prior work)
- Deployed the collection framework on AWS and Azure
- User sessions and request patterns taken from the DaCapo benchmark suite
- Instrumentation: automated using aspects for DayTrader (AWS); custom coded for DayTrader and StockTrader

21 Aggregation Benefit: DayTrader
Storage writes per user request type, without vs. with aggregation (FE / BS):
- Login: 3 / 5 vs. 1 / 1
- Portfolio: 10 / 10 vs. 1 / 1
- Update profile: 4 / 5 vs. 1 / 1
- Home: 2 / 2 vs. 1 / 1
- Buy: 1 / 7 vs. 1 / 1
- Sell: 1 / 8 vs. 1 / 1
- Account: 3 / 3 vs. 1 / 1
- Total: 24 / 40 vs. 7 / 7
User sessions: 20, one every 10 seconds. Results shown for a random user from DaCapo. Writes reduced by 78% in this case (see the arithmetic below).
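
The 78% figure follows directly from the table's Total row:

```latex
\frac{(24 + 40) - (7 + 7)}{24 + 40} = \frac{50}{64} \approx 78\%
```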

22 Aggregation Benefit: MedRec Application Suite
Storage writes per application, without vs. with aggregation (FE / BS):
- MedRec App: 4 / 8 vs. 1 / 1
- Physician App: 8 / 15 vs. 1 / 1
- Admin App: 2 / 5 vs. 1 / 1
Storage writes reduced by at least 50% from the FE and 80% from the BS.

23 Instrumentation Benefit
Lines of code (# of files), handcrafted vs. X-Trace with aspects:
- same: 15250 (88) vs. 15250 (92)
- modified: 593 (74) vs. 465 (70)
- added: 878 (0) vs. 166 (2)
- automatable: 0 (0) vs. 166 (2)
FE component code is automatable using aspects with X-Trace. Cross-component calls: the X-Trace object is passed as a parameter. New lines of code reduced by ~80%; SLOC reduced by ~20%. Aspects can be automated.

24 Future Work
- Scaling the framework: what is the right ratio of application scale to framework scale? Per datacenter? Per VM? Does it vary per cloud provider?
- Impact of these design decisions on the sensitivity of the framework

25 Conclusions
Architectural benefits:
- Generic across applications, number of components, and access patterns
- Scalable: decoupled entities
Aggregation benefits:
- N writes to storage become one write
- The log server offloads work from the application
Instrumentation benefits:
- Easy to integrate with an application
- New lines of code reduced by ~80%; SLOC reduced by ~20%

26 Q & A

27 Backup Slides

28 Azure Blob Read and Write Latency
[Figure] Blob read/write latency is at least 30-40 msec.

29 Azure Queue Read and Write Latency
[Figure] Queue reads are costly; queue writes are comparable to blob writes.

30 SQL Azure Performance Issue Snapshot (6 Days)

