1
HyScale: Hybrid Scaling of Dockerized Microservices Architectures
Before diving into scaling, you might be wondering what MAs are, where they fit in the full stack, and why we want to use them. What/where do they fit in, and why are we interested in dockerized microservices architectures? M.A.Sc. thesis presented by Jonathon Wong, Department of Electrical and Computer Engineering, University of Toronto
2
Cloud Architecture
Application level (e.g., Netflix: login, browsing/search, streaming, etc.)
Virtualization level (houses the application): two types, but we deal with Docker containers, introduced at a high level (compared to VMs); Docker/container/microservice are used interchangeably
Resource management level: placement and resource allocation, where scaling happens (the layer of interest)
Physical hardware/resources level: also enables the virtualization
What are these mysterious microservices architectures?
3
Microservices Architecture
Catalog Service Product Service Libraries Java 7 Python 2.7 JavaScript Libraries Java 8 Python 2.7 JavaScript Customer Service Cart Service Loyalty Service Order Service Login Service Account Service
Traditionally, applications followed the monolithic architecture; here we break the application into many small, decoupled microservices and compare. A microservices architecture is a method of developing software applications as a suite of independently deployable, small, modular services, in which each service runs a unique process and communicates through a well-defined, lightweight mechanism to serve a business goal.
4
Benefits of Microservices Architectures
Independent redeployments without compromising application integrity Only need to change one or more distinct services instead of redeploying entire application Decoupled nature copes with failures One service failing does not affect other services Promotes and facilitates scalability An overloaded service can easily be replicated to distribute load
5
Monetizing Microservices Architectures
Hosting your own microservices architecture (MA) is expensive, so customers/tenants typically pay cloud data centres to host their MA instances. Modern data centres categorize various classes of tenants, with each class associated with a service-level agreement (SLA) dictating various quality of service (QoS) requirements. QoS metrics usually revolve around the availability of microservices, and violations of SLAs incur a penalty on the provider. Yeah, I want to host my own! … Now every business is doing it! The problem hasn't gone away; it has just shifted to the cloud data centre to deal with.
6
Scalability Issues Data centres reaching physical limits in terms of:
Hardware resources
Operating costs
Space/real estate
Data centre resources are also often underutilized by tenants: data centres tend to overprovision resources to tenants to prevent SLA violations. Resources allocated to one tenant could be given to another instead! Even worse, sometimes they are not used at all! Better ways of utilizing data centre resources must be identified.
7
Cloud Goals Reduce overall operating costs
Increase physical resource utilization (μ1, μ2, μ3). We want to achieve lower operating costs, usually through higher resource efficiency; this is where the resource management layer comes into play.
8
Cloud Goals Reduce overall operating costs
Increase physical resource utilization. Reduce SLA violations: add resources to / scale services that are about to violate the performance SLA. Achieve high availability: replication, fault tolerance. Even though we can share resources, we don't want to impact performance (SLAs), and we still want to maintain high availability.
9
Scaling Approaches Horizontal vs. Vertical Scaling
Granularity/sizing, placement
Reactive vs. Predictive Scaling
Timing
Two characteristics. Horizontal vs. vertical: coarse- vs. fine-grained (placement is implicit). Predictive vs. reactive: do we perform an action for now or for the future?
10
Horizontal and Vertical Scaling
Traditional scaling techniques use either horizontal or vertical scaling on its own. Hybrid scaling leverages both horizontal and vertical scaling techniques simultaneously to achieve both high resource utilization and high availability. Vertical Scaling: Service, Memory, CPU, Disk (high load detected). Horizontal Scaling: Service (high load detected). Traditionally, scaling techniques applied to both VMs and containers follow either horizontal or vertical scaling alone. Horizontal is good for fault tolerance and availability; vertical is good for resource allocation granularity.
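To make the contrast concrete, here is a minimal sketch of the two actions using the docker-java client library (the library the architecture uses elsewhere in this deck); the container ID, image name, and share values are illustrative, not taken from the thesis:

```java
import com.github.dockerjava.api.DockerClient;
import com.github.dockerjava.core.DockerClientBuilder;

public class ScalingActions {
    private final DockerClient docker = DockerClientBuilder.getInstance().build();

    // Vertical scaling: resize a running container in place by adjusting
    // its CPU-shares weight (fine-grained; no new instance is created).
    void scaleVertically(String containerId, int newCpuShares) {
        docker.updateContainerCmd(containerId)
              .withCpuShares(newCpuShares)
              .exec();
    }

    // Horizontal scaling: launch an additional replica of the same image
    // (coarse-grained, but adds fault tolerance and availability).
    void scaleHorizontally(String imageName) {
        String replicaId = docker.createContainerCmd(imageName).exec().getId();
        docker.startContainerCmd(replicaId).exec();
    }
}
```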
11
Reactive and Predictive Scaling
Reactive scaling is popular in industry. Predictive scaling is much harder to accomplish in practice: trends are highly non-linear, and it is difficult to account for external factors. Reactive is simple: look at current/past values and accommodate (usually too late if the load is spiky). It is good when unexpected fluctuations occur (and usually overprovisions with a % buffer). Predictive assumes trends exist (e.g., getting off work to play video games, annual sales), but external factors such as fads or political events interfere. Examples use ARIMA models or machine learning.
12
Scaling Difficulties Multidimensional problem
There exists an infinite number of different resource configurations at any given point in time. How many machines (horizontal scaling)? How many replicas per microservice (horizontal scaling)? How many resources per microservice (vertical scaling)? CPU, memory, I/O. It is intractable to iterate over every possible configuration! This is dynamic bin packing (alluded to in the previous slide). How do we tackle this?
13
Contributions Develop novel predictive and reactive hybrid algorithms for Docker containers Resource allocation granularity of vertical scaling High availability and fault tolerance of horizontal scaling Quick response to sudden fluctuations and flash crowds of reactive scaling Pre-emptive resource allocation for workload trends of predictive scaling Explore both predictive and reactive sides
14
HyScale Architecture Autoscaler Monitor Load Balancers
Main components: Docker containers/microservices, load balancers and clients, Node Managers (NM), Monitor, and Autoscaler. Built due to the lack of hybrid scaling architectures for containers, and to implement and benchmark/evaluate our scaling techniques. Microservices and their replicas run on nodes. Clients speak to the server-side load balancers to identify which container instance to connect to. Node Managers issue Docker commands to bring containers up/down and to resize them. The Monitor is the central arbiter with a view of all nodes and their usage; it queries the autoscaling module for scaling decisions and issues them to the NMs. Any scaling algorithm can be implemented in the Autoscaler.
15
Reactive Scaling: HyScaleres Algorithm
$$Missing_{m,res} = \frac{\sum_r usage_r - \sum_r requested_r \cdot Target_m}{Target_m}$$
$$Reclaimable_{r,res} = requested_r - \frac{usage_r}{Target_m \cdot 0.9}$$
$$Required_{r,res} = \frac{usage_r}{Target_m \cdot 0.9} - requested_r$$
The first approach I will show. Usage is the past observed value. Reclaimable tells us how much to scale down by; Acquired (the capped form of Required) tells us how much to scale up by.
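A minimal sketch of how these per-resource quantities might be computed, directly following the reconstructed equations above; the class and method names are ours, not from the thesis:

```java
public final class HyScaleRes {
    // Missing_{m,res}: capacity the whole microservice m lacks, summed over
    // its replicas r (usage and requested are per-replica measurements).
    static double missing(double[] usage, double[] requested, double target) {
        double usageSum = 0, requestedSum = 0;
        for (double u : usage) usageSum += u;
        for (double q : requested) requestedSum += q;
        return (usageSum - requestedSum * target) / target;
    }

    // Reclaimable_{r,res}: positive when replica r holds more than it needs
    // to sit at 90% of the target; this is how much to scale down by.
    static double reclaimable(double usage, double requested, double target) {
        return requested - usage / (target * 0.9);
    }

    // Required_{r,res}: positive when replica r needs more than it holds;
    // this is how much to scale up by (capped later as "Acquired").
    static double required(double usage, double requested, double target) {
        return usage / (target * 0.9) - requested;
    }
}
```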
16
Evaluation: Microbenchmarks
"Microservices" running small CPU and memory workloads. The workload is triggered upon receiving a request from a client. Vary microservice resource type heterogeneity: CPU only, memory only, a mixture of both. Emulate client request load to the microservices: constant, wave. Mem | Mem + CPU | CPU
17
Compared our algorithms against Google’s popular Kubernetes horizontal scaling technique
Lower response times for CPU (negligible failures)
18
Results: Resource Utilization
Constant CPU Experiment

| | Average CPU Usage (%) | RTE_CPU (ms/%) |
|---|---|---|
| Kubernetes | 37.38 | 90.88 |
| HyScaleCPU | 34.21 | 86.17 |
| HyScaleCPU+Mem | 35.41 | 67.41 |

Wave CPU Experiment

| | Average CPU Usage (%) | RTE_CPU (ms/%) |
|---|---|---|
| Kubernetes | 50.34 | 60.47 |
| HyScaleCPU | 44.72 | 59.43 |
| HyScaleCPU+Mem | 45.94 | 44.47 |

Average CPU utilization per node. RTE is a relative metric comparing the response-time overhead generated to the resources expended; HyScale generates less response-time overhead per % of resource used.
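The deck does not define RTE explicitly; one consistent reading, and an assumption on our part, is response-time overhead normalized by average resource usage:

$$\mathrm{RTE}_{\mathrm{CPU}} = \frac{\text{average response time (ms)}}{\text{average CPU usage (\%)}}$$

Under this reading, a lower RTE means less response-time overhead per percent of CPU consumed, which matches how the tables are compared.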
19
Predictive Approach: Machine Learning
ANNs (artificial neural networks) are usually trained on a static dataset: they model old data trends well and are known for their generalization capabilities, but this can result in poor predictions when encountering new data, and it requires the dataset to encapsulate all anticipated data. Online learning attempts to "learn" on the spot, allowing the model to incorporate new data trends. It is a double-edged sword: it overwrites old learned trends. Next I want to introduce our predictive approach using machine learning. Dataset coverage is good; online learning is especially good when many different jobs/workloads come in dynamically (no offline profiling needed).
20
Online Learning Technique
An experience buffer encapsulates new and old trend information. Old data samples are filtered by Pearson's product-moment coefficient; new data samples are pushed into a FIFO queue. The experience buffer is randomly sampled to create a training batch, and this online training batch is what the ANN trains on in real time.
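A sketch of one way this buffer could look. The direction of the Pearson filter (dropping old windows that are too similar to incoming data, to keep the buffer diverse) is our reading of the slide, and all names and parameter values are illustrative:

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.List;
import java.util.Random;

public class ExperienceBuffer {
    private final Deque<double[]> recent = new ArrayDeque<>(); // FIFO of new windows
    private final List<double[]> prior = new ArrayList<>();    // retained old windows
    private final double similarityThreshold;
    private final double priorSampleFraction; // share of each batch drawn from old data
    private final int capacity;
    private final Random rng = new Random();

    ExperienceBuffer(int capacity, double similarityThreshold, double priorSampleFraction) {
        this.capacity = capacity;
        this.similarityThreshold = similarityThreshold;
        this.priorSampleFraction = priorSampleFraction;
    }

    void add(double[] window) {
        if (recent.size() == capacity) {
            double[] oldest = recent.removeFirst();
            // Keep an evicted window only if it is not redundant with new data.
            if (Math.abs(pearson(oldest, window)) < similarityThreshold) {
                prior.add(oldest);
            }
        }
        recent.addLast(window);
    }

    // Randomly mix old and new windows into one online training batch.
    List<double[]> sampleBatch(int batchSize) {
        List<double[]> batch = new ArrayList<>(batchSize);
        List<double[]> recentList = new ArrayList<>(recent);
        int fromPrior = (int) (batchSize * priorSampleFraction);
        for (int i = 0; i < fromPrior && !prior.isEmpty(); i++)
            batch.add(prior.get(rng.nextInt(prior.size())));
        while (batch.size() < batchSize && !recentList.isEmpty())
            batch.add(recentList.get(rng.nextInt(recentList.size())));
        return batch;
    }

    // Pearson product-moment correlation between two equal-length windows.
    static double pearson(double[] x, double[] y) {
        int n = x.length;
        double mx = 0, my = 0;
        for (int i = 0; i < n; i++) { mx += x[i]; my += y[i]; }
        mx /= n; my /= n;
        double cov = 0, vx = 0, vy = 0;
        for (int i = 0; i < n; i++) {
            cov += (x[i] - mx) * (y[i] - my);
            vx  += (x[i] - mx) * (x[i] - mx);
            vy  += (y[i] - my) * (y[i] - my);
        }
        if (vx == 0 || vy == 0) return 0; // constant window: treat as uncorrelated
        return cov / Math.sqrt(vx * vy);
    }
}
```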
21
Results: Parameter Space Exploration
Parameter space search. Learning rate and update rate control how fast the model learns from new online training batches; the similarity threshold and prior sample percentage are characteristics of the old data. The exact numbers aren't that important here; the trends are. The sweet spot is where you don't forget too much old data but still learn enough new data. Update rate = the number of iterations before each training step.
22
Conclusions Developed an autoscaling architecture for benchmarking various autoscaling algorithms; it supports hybrid scaling of Docker containers. Developed novel reactive hybrid algorithms (HyScale) that leverage the availability of horizontal scaling and the granularity of vertical scaling. Developed a novel predictive algorithm using ANNs and online learning whose model learns new trends while retaining old trends.
23
Future Work Implement the predictive approach on the HyScale architecture for benchmarking. Combine the reactive and predictive approaches into a single algorithm. Incorporate network and disk I/O into the algorithms. CRIU? Weighted? Support stateful microservices
24
Thank You! Questions?
25
Containers vs. Virtual Machines
26
Reactive Scaling Example: Kubernetes
Horizontally scales containers according to the following equations:
$$utilization_r = \frac{usage_r}{requested_r}$$
$$NumReplicas_m = \left\lceil \frac{\sum_r utilization_r}{Target_m} \right\rceil$$
To avoid thrashing, the following condition must also be met:
$$\left| \frac{\sum_r usage_r}{NumReplicas_m \cdot Target_m} - 1 \right| > 0.1$$
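A worked version of this rule as code, a sketch following the equations above (usage and requested are per-replica measurements; the 10% tolerance mirrors the anti-thrashing condition):

```java
public final class KubernetesHpaRule {
    // NumReplicas_m = ceil( sum_r(usage_r / requested_r) / Target_m )
    static int desiredReplicas(double[] usage, double[] requested, double target) {
        double sumUtilization = 0;
        for (int r = 0; r < usage.length; r++) {
            sumUtilization += usage[r] / requested[r];
        }
        return (int) Math.ceil(sumUtilization / target);
    }

    // Anti-thrashing guard: only rescale when current load is more than 10%
    // away from what the current replica count is sized for.
    static boolean outsideTolerance(double[] usage, int numReplicas, double target) {
        double sumUsage = 0;
        for (double u : usage) sumUsage += u;
        return Math.abs(sumUsage / (numReplicas * target) - 1) > 0.1;
    }
}
```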
27
Reactive Scaling Example: ElasticDocker
Vertically scales Docker containers. Memory is scaled up by 256 MB and down by 128 MB (threshold-based). vCPU cores are added or removed based on CPU utilization thresholds (90% up / 70% down).
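A sketch of such threshold rules. The step sizes and the 90%/70% CPU thresholds are the ones quoted above; applying the same thresholds to memory, and the floor values, are our assumptions:

```java
public final class ThresholdVerticalScaler {
    private static final long MB = 1024L * 1024L;

    // Memory: +256 MB above the upper threshold, -128 MB below the lower one.
    // (Reusing the 90%/70% CPU thresholds for memory is our assumption.)
    static long nextMemoryBytes(long currentBytes, double memUtilization) {
        if (memUtilization > 0.90) return currentBytes + 256 * MB;
        if (memUtilization < 0.70) return Math.max(256 * MB, currentBytes - 128 * MB);
        return currentBytes;
    }

    // vCPUs: add a core above 90% utilization, remove one below 70%.
    static int nextVcpuCount(int currentVcpus, double cpuUtilization) {
        if (cpuUtilization > 0.90) return currentVcpus + 1;
        if (cpuUtilization < 0.70) return Math.max(1, currentVcpus - 1);
        return currentVcpus;
    }
}
```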
28
Reactive Scaling Example: Self-Adaptive
While the response-time SLA is violated, increase resources for the service above its utilization threshold, creating a new VM instance if necessary. Resources are removed based on thresholds; a VM is removed when no more resources are allocated on it.
29
Predictive Scaling Examples
Qazi Ullah uses an ARNN and compares it to ARIMA on the BitBrains fastStorage workload trace. Binbin Song uses LSTMs to predict workloads in the Google cloud trace.
30
Predictive-Reactive Scaling Examples
Multi-tiered (VM, horizontal): a probability distribution for the predictive side and threshold-based rules for the reactive side. DoCloud (container, horizontal): an ARMA model for the predictive side and threshold-based rules for the reactive side.
31
CPU Container Experimental Setup*
A baseline microservice calculates 20,000 prime numbers per client request, and response times are measured. To create contention for CPU resources, the microservice is run alongside a third-party container: progrium/stress consumes CPU resources by performing sqrt(rand()) calculations. μ1 μ2. This simulates CPU load on the system from the request/response framework that is inherent in a microservices architecture.
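One plausible implementation of the per-request workload; the method name and trial-division approach are our assumptions, since the deck only specifies 20,000 primes per request:

```java
public final class PrimeWorkload {
    // Find the first 20,000 primes by trial division; runs once per request.
    static int handleRequest() {
        int found = 0;
        int candidate = 1;
        while (found < 20_000) {
            candidate++;
            if (isPrime(candidate)) found++;
        }
        return candidate; // the 20,000th prime
    }

    private static boolean isPrime(int n) {
        if (n < 2) return false;
        for (int d = 2; (long) d * d <= n; d++) {
            if (n % d == 0) return false;
        }
        return true;
    }
}
```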
32
Docker CPU Vertical Scaling*
Docker CPU shares define a container's proportion of CPU cycles on a single machine. μ1 μ2: 1024 CPU shares, 1024 CPU shares, 3072 CPU shares, 4096 CPU shares, 2048 CPU shares. By tuning the CPU shares allocated to each microservice, we effectively control their relative weights. For example, if two microservice containers run on a single machine with CPU shares of 1024 and 3072, the containers will have 1/4 and 3/4 of the access time to the CPU, respectively.
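Written out, a container's weight is simply its shares over the node total; for the example above:

$$\text{fraction}_i = \frac{shares_i}{\sum_j shares_j}, \qquad \frac{1024}{1024+3072} = \frac{1}{4}, \qquad \frac{3072}{1024+3072} = \frac{3}{4}.$$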
33
Horizontal vs. Vertical Scaling: Experimental Scenarios
1. No CPU contention: μ1, μ2; 1024 CPU shares; 400% CPU. Five scenarios to see the effects of each. Speaker note (applies to scenarios 2-5 as well): in the vertical scaling emulation, we allocated 1024 CPU shares to both the microservice and the progrium/stress container, splitting CPU access time equally between the two. Assuming that nodes have 4 CPU cores, both are provisioned 2 CPU cores' worth of access time. An equivalent horizontally scaled resource allocation with 3 microservices running over 3 machines allocates 1024 and 5120 CPU shares to the microservice and the progrium/stress container, respectively. This results in 1/6 of the CPU access time of each node for each microservice, again totaling 2 CPU cores' worth of access time (i.e., 1/6 of the 12 CPU cores). This equivalence was reproduced with 2 microservices and 2 nodes, and with 4 microservices and 4 nodes.
34
Horizontal vs. Vertical Scaling: Experimental Scenarios
2. CPU contention: μ1 : μ2 = 2048 : 1024 CPU shares; 200% CPU.
35
Horizontal vs. Vertical Scaling: Experimental Scenarios
3. Equivalent horizontal scaling with 2 replicas: μ1, μ2 on each of 2 nodes; 4096 CPU shares; 100% CPU + 100% CPU = 200% CPU.
36
Horizontal vs. Vertical Scaling: Experimental Scenarios
4. Equivalent horizontal scaling with 3 replicas: μ1, μ2 on each of 3 nodes; 6144 CPU shares; 67% CPU + 67% CPU + 67% CPU = 200% CPU.
37
Horizontal vs. Vertical Scaling: Experimental Scenarios
5. Equivalent horizontal scaling with 4 replicas: μ1, μ2 on each of 4 nodes; 8192 CPU shares; 50% CPU + 50% CPU + 50% CPU + 50% CPU = 200% CPU.
38
Horizontal vs. Vertical Scaling Analysis
Contention over CPU resources introduces 17% overhead. Replicated instances decrease overall CPU performance: overhead within applications (in our case, the JVM), replicated several times, affects response times. Preference for vertical scaling: it provided the fastest request processing times when compared to the equivalently horizontally scaled instances. Although Docker containers themselves have negligible overhead, contention over shared CPU resources introduces significant overhead, and this contention would be further exacerbated by the presence of more co-located containers. (Chart: vertical scaling vs. horizontal scaling.)
39
Dockerized Microservices
Packages all libraries and dependencies within an isolated container. Each Docker container hosts a microservice; microservices receive and perform task requests, and each task consumes computing resources (CPU, memory, I/O). Microservices periodically heartbeat the Node Manager (NM), ensuring liveness and providing resource usage information. At the heart of our architecture lie the actual Docker containers running the microservices hosted on each of the nodes. Docker containers package all libraries and dependencies within an isolated virtual container. In our architecture, each of these Docker containers hosts a microservice that receives and performs task requests; these requests can come from clients as well as from other services running on the same node or on other nodes in the network. In our implementation, the microservices have been fitted with a component that periodically heartbeats its Node Manager, sending the microservice's resource usage information to the Node Manager. The heartbeats are also used by the Node Manager to ensure the liveness and availability of each container.
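A sketch of what that heartbeat component could look like; the transport, interface, and sampling are illustrative assumptions, since the deck only specifies periodic usage reports that double as liveness signals:

```java
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;

public class HeartbeatAgent {
    // Hypothetical NM-side interface; the real transport is not specified.
    interface NodeManagerClient {
        void heartbeat(String serviceId, double cpuUsage, double memUsage);
    }

    private final ScheduledExecutorService scheduler =
            Executors.newSingleThreadScheduledExecutor();

    // Periodically report resource usage; the same message doubles as a
    // liveness signal, so a missed heartbeat indicates a crashed container.
    void start(String serviceId, NodeManagerClient nm, long periodSeconds) {
        scheduler.scheduleAtFixedRate(() -> {
            double cpu = sampleCpuUsage();
            double mem = sampleMemUsage();
            nm.heartbeat(serviceId, cpu, mem);
        }, 0, periodSeconds, TimeUnit.SECONDS);
    }

    // Placeholders: real values would come from cgroup stats or JMX.
    private double sampleCpuUsage() { return 0.0; }
    private double sampleMemUsage() { return 0.0; }
}
```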
40
Node Manager
Ensures containers are running via heartbeats, restarting a service if a crash is detected. Heartbeats the Monitor to ensure liveness and forwards the resource utilization of all containers running on the node to the Monitor. Performs resource update commands received from the Monitor, reallocating resources from one container to another. Uses the docker-java client library to interface with Docker. Node Managers, as briefly mentioned on the previous slide, receive resource information from the containers and ensure that the containers are running via heartbeats; if a container fails, the service is restarted. The Node Manager gathers the resource utilization information for all of its monitored microservices and forwards it to the central Monitor in the form of heartbeats, which the Monitor also uses to ensure the liveness of each node. After the Monitor makes a resource reconfiguration and scaling decision, it forwards the decision to the Node Managers, who execute it using the docker-java client, an open-source Docker interface written in Java.
41
Monitor and Autoscaler
Ensures nodes are running via heartbeats. Receives resource usage information from all Node Managers (NMs), giving it a centralized view of resource utilization across all machines. Performs resource adjustments. Provides the Autoscaler module with cluster state information (e.g., monitored node usage) and receives resource adjustment decisions from the Autoscaler. The Monitor is the central arbiter of the system, tasked with gathering the resource usage information from all the Node Managers and making resource adjustment and scaling decisions. These decisions are hybrid in nature, utilizing both vertical and horizontal scaling: the Monitor periodically invokes the cost function to identify the resource configuration with the lowest cost and adjusts/scales resources to reach that cost-effective configuration. The Monitor also has a role in keeping the nodes available: the heartbeats sent by the Node Managers are used to track node liveness.
42
Clients and Load Balancers
The load balancers have knowledge of all microservices and their respective IP addresses and perform load balancing across the replicas of each microservice. Clients contact a load balancer to discover the service location; the load balancer responds with an IP address, and afterwards the client sends its requests directly to that IP address. The load balancer knows all microservices, the replicas that exist because of horizontal scaling, and their respective IP addresses, and uses this information to balance incoming client requests among the replicas of each microservice.
43
HyScaleCPU Algorithm: CPU Shares
$$totalCPUShares_n = numReplicas_n \times 1024 \text{ CPU shares}$$
$$newRequestedCPUs_r = requested_r - ReclaimableCPUs_r + AcquiredCPUs_r$$
$$newCPUShares_r = \frac{newRequestedCPUs_r}{numCPUs_n} \times totalCPUShares_n$$
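Plugging in illustrative numbers (our own, for intuition): on a node with 4 CPUs hosting 2 replicas, a replica whose new request works out to 1 CPU's worth of time would receive

$$totalCPUShares_n = 2 \times 1024 = 2048, \qquad newCPUShares_r = \frac{1}{4} \times 2048 = 512.$$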
44
HyScaleCPU+Mem Algorithm
$$MissingMem_m = \frac{\sum_r usage_r - \sum_r requested_r \cdot Target_m}{Target_m}$$
$$ReclaimableMem_r = requested_r - \frac{usage_r}{Target_m \cdot 0.9}$$
$$RequiredMem_r = \frac{usage_r}{Target_m \cdot 0.9} - requested_r$$
$$AcquiredMem_r = \min\left(RequiredMem_r,\ AvailableMem_r\right)$$
45
Evaluation “Microservices” running miniature CPU and memory benchmarks
The benchmark is triggered by an incoming request from a client. Vary the SLA heterogeneity of each microservice. Vary microservice resource type heterogeneity: memory only, CPU only, a mixture of both. Emulate client request load to the microservices: constant, wave. CPU | Mem | Mem + CPU
46
User-perceived metrics
Explain response times. Lower response times for CPU (negligible failures). Kubernetes has fewer failed requests because replication implicitly adds more memory. Connection failures make requests return quickly, lowering the average response times.
47
SLA metrics Explain the number of SLA violations, the violation time, and the total violation cost. Lower violation times and costs for the HyScale algorithms. In WaveMix, failed requests return too quickly and reduce the number of violations.
48
HyScaleCPU is better due to lots of docker update commands being issued
49
Results: Resource Utilization for ConstantCPU and WaveCPU Experiments
WaveCPU Experiment

| | Average CPU Usage (%) | Min | Max | RTE_CPU (ms/%) |
|---|---|---|---|---|
| Kubernetes | 50.34 | 21.56 | 83.75 | 60.47 |
| HyScaleCPU | 44.72 | 40.89 | 53.18 | 59.43 |
| HyScaleCPU+Mem | 45.94 | 35.96 | 54.95 | 44.47 |

ConstantCPU Experiment

| | Average CPU Usage (%) | Min | Max | RTE_CPU (ms/%) |
|---|---|---|---|---|
| Kubernetes | 37.38 | 8.98 | 67.13 | 90.88 |
| HyScaleCPU | 34.21 | 26.10 | 39.90 | 86.17 |
| HyScaleCPU+Mem | 35.41 | 32.63 | 41.19 | 67.41 |

RTE is a relative metric comparing the response-time overhead generated to the resources expended; HyScale generates less response-time overhead per % of resource used.
50
Results: Resource Utilization for ConstantMix
| | CPU Avg (%) | CPU Min (%) | CPU Max (%) | Mem Avg (%) | Mem Min (%) | Mem Max (%) |
|---|---|---|---|---|---|---|
| Kubernetes | 38.86 | 24.08 | 62.21 | 42.54 | 27.78 | 50.79 |
| HyScaleCPU | 36.25 | 33.42 | 42.56 | 41.00 | 28.21 | 48.61 |
| HyScaleCPU+Mem | 37.48 | 34.72 | 46.81 | 41.03 | 28.17 | 48.10 |

| | RTE_CPU (ms/%) | RTE_Mem (ms/%) |
|---|---|---|
| Kubernetes | 49.28 | 45.02 |
| HyScaleCPU | 61.99 | 54.80 |
| HyScaleCPU+Mem | 42.26 | 38.61 |
51
Results: Resource Utilization for the WaveMix Experiment
| | CPU Avg (%) | CPU Min (%) | CPU Max (%) | Mem Avg (%) | Mem Min (%) | Mem Max (%) |
|---|---|---|---|---|---|---|
| Kubernetes | 16.30 | 9.72 | 21.58 | 42.08 | 31.66 | 47.82 |
| HyScaleCPU | 19.98 | 17.03 | 23.11 | 41.04 | 27.68 | 48.70 |
| HyScaleCPU+Mem | 24.22 | 22.13 | 28.74 | 42.16 | 29.35 | 50.36 |

| | RTE_CPU (ms/%) | RTE_Mem (ms/%) |
|---|---|---|
| Kubernetes | 96.38 | 37.33 |
| HyScaleCPU | 73.02 | 35.55 |
| HyScaleCPU+Mem | 62.59 | 35.96 |

RTE_Mem is skewed by the low response times of failed requests. Note that most spiky real-world workloads resemble WaveMix.
52
Evaluation: BitBrains Workload
GWA-T-12 BitBrains Rnd workload trace containing 500 VMs.
53
Results: Resource Utilization for BitBrains Workload
| | CPU Avg (%) | CPU Min (%) | CPU Max (%) | Mem Avg (%) | Mem Min (%) | Mem Max (%) |
|---|---|---|---|---|---|---|
| Kubernetes | 10.57 | 5.66 | 33.19 | 21.01 | 19.41 | 22.63 |
| HyScaleCPU | 12.52 | 4.90 | 67.48 | 20.75 | 19.43 | 22.09 |
| HyScaleCPU+Mem | 8.08 | 6.05 | 15.40 | 21.30 | 19.50 | 23.19 |

| | RTE_CPU (ms/%) | RTE_Mem (ms/%) |
|---|---|---|
| Kubernetes | 68.21 | 34.32 |
| HyScaleCPU | 113.34 | 68.39 |
| HyScaleCPU+Mem | 55.82 | 21.17 |
54
ANN Models: Basic ANN, LSTM, LN-LSTM, Multi-layer LN-LSTM, Convolutional LN-LSTM. Each model is a black box mapping an input history x_{t-n}, …, x_t to a prediction x_{t+1}; any ANN could be used with the online learning technique (so we tried a few). Explain the basic input and output (black box) of the models.
55
Unrolled RNN Layer
56
LSTM Block
57
Custom Convolutional LN-LSTM
Convolutional filters reduce frequency variance in the input; LSTMs model the input over time.
58
Prediction Errors of Pre-trained ANNs
| ANN Model | RMSE (%) |
|---|---|
| Standard ANN | 4.1264 |
| LSTM | 4.0696 |
| LN-LSTM | 4.0118 |
| Multi-layer LN-LSTM | 4.0163 |
| Conv + LN-LSTMs | 3.9302 |

36-sample history; SMA baseline = 8.99%.
59
Results: Parameter Space Exploration
Best online RMSE: 3.7694%. Almost a 9% gap closure to perfect predictions, or more than 4% gap closure. Diminishing returns.