Building Scalable Resilient Websites in Azure Steve Spencer Solution Architect, Liberis, Nottingham Microsoft Azure MVP @sdspencer http://blogs.recneps.org
Agenda What is resilient and scalable? Why do we care? Building Scalability Building Resilience How can Microsoft Azure help? Tips and Tricks
What is resilient and scalable? “Resilience is the ability to provide and maintain an acceptable level of service in the face of faults and challenges to normal operation” “Scalability is the capability of a system, network, or process to handle a growing amount of work, or its potential to be enlarged in order to accommodate that growth”
Why do we care? We want to provide a good service We want the web site to be quick We want the web site to work even when things go wrong Our site makes us money, so we don’t want things to go wrong We want to make a good impression
Building Scalability Scale Up Add more resources to an existing server e.g. Cores, RAM, Disk space Easier to scale existing applications Limited by cost and physics Scale Out Add more of the same servers More difficult to scale existing applications More cost effective for large scale Likely to need infrastructure and application changes
Building Scalability Web Site Web Services Business Logic Data Access
Building Scalability Web Site Web Site Web Site Web Site Web Site Web Services Web Services Web Services Web Services Business Logic Business Logic Data Access Data Access
Scaling out – Application Issues File access Files written to local disk not accessible to other nodes Can use file share or data store (e.g. a document database like MongoDB or a key-value store like Redis) Session State In memory session not available to other nodes Caching In memory cache needs distributing to all nodes Shared Resource contention E.g. Writing to the same file Bottlenecks E.g. Database
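The session state issue above can be sketched as follows. This is a minimal illustration, not a production design: `InMemoryBackend` and `WebNode` are hypothetical names, and the backend is a plain dictionary standing in for an external store such as Redis. The point is only that the nodes hold no session state themselves, so any node can serve any request.

```python
class InMemoryBackend:
    """Stand-in for an external store (e.g. Redis); one instance is shared by all nodes."""

    def __init__(self):
        self._data = {}

    def get(self, key):
        return self._data.get(key)

    def set(self, key, value):
        self._data[key] = value


class WebNode:
    """A web server instance that keeps no session state of its own."""

    def __init__(self, name, backend):
        self.name = name
        self.backend = backend

    def add_to_basket(self, session_id, item):
        # Read, modify, and write back through the shared store.
        basket = self.backend.get(session_id) or []
        self.backend.set(session_id, basket + [item])

    def basket(self, session_id):
        return self.backend.get(session_id) or []


shared = InMemoryBackend()
node_a = WebNode("a", shared)
node_b = WebNode("b", shared)

node_a.add_to_basket("session-1", "book")   # first request lands on node A
items_seen_by_b = node_b.basket("session-1")  # next request lands on node B
```

If each node kept its own dictionary instead of the shared backend, node B would see an empty basket, which is the failure the slide describes.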
Scale out – Infrastructure Issues Routing Load balancing File Sharing Security, Access etc Accessing Shared Resources (e.g. Database) IP whitelisting, Firewalls etc Bandwidth Latency Legacy Hardware/Software e.g. Mainframe Configuration Management
Building Resilience Handle Failure Disaster Recovery Transient failures Cyber attacks Upgrading/Maintenance Assume things will go wrong and deal with it Understand bottlenecks and points of failure What happens if something fails? What do you want to happen?
Handling Failure – Multiple Instances Same DC Different DC Helps mitigate DR, Cyber Attacks & Upgrading/Maintenance Has the same issues as Scale Out
Handling Failure – Retry Logic Distributed systems fail Accessing things over a network is unreliable Retry calls to network resources if a failure occurs Add a limit to the number of times to retry Increase the timeout between retries Randomise Helps mitigate against transient failures
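The retry points above (limit the attempts, increase the wait, randomise) can be sketched in a few lines. This is one possible shape under stated assumptions: `retry_with_backoff` and its parameters are hypothetical names, the default exception types are a guess at what counts as transient, and the doubling delay with random jitter is one common choice rather than the only one.

```python
import random
import time


def retry_with_backoff(operation, max_attempts=5, base_delay=0.5, max_delay=30.0,
                       retry_on=(ConnectionError, TimeoutError), sleep=time.sleep):
    """Call operation(), retrying transient failures with a capped, randomised delay."""
    for attempt in range(1, max_attempts + 1):
        try:
            return operation()
        except retry_on:
            if attempt == max_attempts:
                raise  # limit reached: surface the failure to the caller
            # Double the upper bound after each failure, capped at max_delay.
            ceiling = min(max_delay, base_delay * 2 ** (attempt - 1))
            # Randomise the wait so many clients do not all retry at the same instant.
            sleep(random.uniform(0, ceiling))
```

Usage would look like `retry_with_backoff(lambda: call_remote_service())`, where `call_remote_service` is whatever network call needs protecting. Only exceptions listed in `retry_on` are retried; anything else is treated as a permanent failure and raised immediately.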
Handling Failure – UI Don’t just fail in the UI Display a maintenance screen for planned outages Show as much information as possible but highlight time critical information as being out of date Use caching in the UI
Handling Failure – UI If stale data not acceptable then show message for that data Allow users to do as much as possible Use queuing for updates Cache locally Make dynamic pages static so not always pulling data from data store Make XML file containing the data. Only useful for data that doesn’t change often e.g. Account history – mainly static data for each account that updates over time
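One way to combine local caching with the stale data message is to have the cache report whether the value it returns is out of date. The sketch below assumes a hypothetical `StaleTolerantCache` class and a caller-supplied `fetch` function; it is an illustration of the idea, not a recommendation of a specific library.

```python
import time


class StaleTolerantCache:
    """Serve fresh data when possible; fall back to the last known value if the store fails."""

    def __init__(self, fetch, max_age_seconds=60.0, clock=time.monotonic):
        self._fetch = fetch            # function that loads a value from the data store
        self._max_age = max_age_seconds
        self._clock = clock
        self._entries = {}             # key -> (value, time fetched)

    def get(self, key):
        """Return (value, is_stale). Raises only if the fetch fails and nothing is cached."""
        now = self._clock()
        entry = self._entries.get(key)
        if entry is not None and now - entry[1] <= self._max_age:
            return entry[0], False     # still fresh, no call to the store
        try:
            value = self._fetch(key)
        except Exception:
            if entry is None:
                raise                  # nothing to fall back on
            return entry[0], True      # store is down: serve the old value, flagged stale
        self._entries[key] = (value, now)
        return value, False
```

The UI can then render the value as usual and, when `is_stale` is true, either mark it as out of date or replace it with a message if stale data is not acceptable for that field.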
Bottlenecks Single Points of failure Single points of load Normally difficult to scale Types Legacy components Databases Logging Understand your application & identify potential bottlenecks Load/performance test
How can Microsoft Azure help? Cloud based not on premises Multi-tenant environment Windows and Linux Any Language (.NET, Java, Node.js, Python etc) Collection of services to build applications Different hosting models Infrastructure as a Service (IaaS) e.g. Virtual Machines Platform as a Service (PaaS) e.g. WebApps, Service Fabric Software as a Service (SaaS) e.g. Office 365 (Exchange, Lync, SharePoint), CRM 365 Container as a Service (CaaS) e.g. Azure Container Service (currently Linux only) Scale In and Out – Pay for what you use (Opex not Capex)
Issues moving to the cloud Legacy components Can they run in the cloud? Does the cloud support the legacy OS? Access to on premises resources E.g. Mainframe access, CRM, ERP, Legacy component etc File access Can you use file shares in the cloud? Security Cloud is in the public domain so need to make sure things are secure Don’t use Pass@word1
How can Microsoft Azure help? Multi-region Geo-redundancy Hosting Azure websites Virtual Machines Service Fabric Data SQL DB & Elastic DB Redis Cache DocumentDB Blob & CDN
How can Microsoft Azure help? Networking & Integration Traffic Manager Service Bus Hybrid Connections Security Azure AD (AAD) AAD B2C MFA Key Vault Tools TFS Online Load Testing Dev/Test Labs
Blobs Scalable File storage Three instances of data across a single data centre Automatically load balances Can be configured as Geo-Redundant with Read Only Access Can create SMB Share Access Use Blob API (Private Blob) Open public URL (Public Blob) Public URL with SAS (Private Blob with Shared Access Signature)
Databases Standard SQL Elastic Database Scale Up Threat Detection and Alerts Automatic Tuning Point in Time Restore High On-Prem SQL Compatibility Geo-Replication Elastic Database Scale Out Sharding
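Sharding, as listed above, means each record lives in exactly one of several databases, chosen from a key. The sketch below shows the general idea only and is not the Elastic Database API: `shard_for` and the `SHARDS` list are hypothetical, and simple hash-modulo routing is assumed.

```python
import hashlib

# Hypothetical shard names; in practice these would be connection strings.
SHARDS = ["customers-db-0", "customers-db-1", "customers-db-2", "customers-db-3"]


def shard_for(key: str, shard_count: int) -> int:
    """Map a key to a shard index using a hash that is stable across processes."""
    # Python's built-in hash() is randomised per process for strings, so use SHA-256.
    digest = hashlib.sha256(key.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % shard_count


def database_for(customer_id: str) -> str:
    """Pick the database that holds this customer's data."""
    return SHARDS[shard_for(customer_id, len(SHARDS))]
```

With plain modulo routing, changing the number of shards moves most keys to a different database, so the shard count is usually planned up front or replaced by a lookup table that records where each key lives.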
Azure Websites Platform as a Service You only need to manage your application Alerts and Monitoring Manual or Auto scale Web jobs Good for backend processing Can be scaled out separately Multiple deployment slots Staging A-B testing VIP swap Easy to automate deployment Supports Hybrid Connections
Traffic Manager DNS based routing Across data centres Can handle a single total datacentre failure Can access any service with public access Non-Microsoft Clouds Own hosting Can be configured for Priority (DR) Weighted (Round Robin) – Can be used for gradual upgrade Performance (Fastest)
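The three routing methods above can be illustrated with a small simulation. This is not the Traffic Manager API; `Endpoint` and the `route_*` functions are hypothetical names used only to show how each method picks an endpoint, and how an unhealthy endpoint drops out of the choice.

```python
import random
from dataclasses import dataclass


@dataclass
class Endpoint:
    name: str
    healthy: bool = True
    priority: int = 1        # lower value is preferred
    weight: int = 1          # share of traffic for weighted routing
    latency_ms: float = 0.0  # measured latency from the caller's location


def _healthy(endpoints):
    live = [e for e in endpoints if e.healthy]
    if not live:
        raise RuntimeError("no healthy endpoints")
    return live


def route_priority(endpoints):
    """Priority (DR): always use the most preferred healthy endpoint."""
    return min(_healthy(endpoints), key=lambda e: e.priority)


def route_weighted(endpoints, rng=random):
    """Weighted: spread traffic in proportion to weight, e.g. 90/10 for a gradual upgrade."""
    live = _healthy(endpoints)
    return rng.choices(live, weights=[e.weight for e in live], k=1)[0]


def route_performance(endpoints):
    """Performance: send the caller to the endpoint with the lowest latency."""
    return min(_healthy(endpoints), key=lambda e: e.latency_ms)


endpoints = [
    Endpoint("north-europe", priority=1, weight=9, latency_ms=40),
    Endpoint("west-europe", priority=2, weight=1, latency_ms=25),
]
```

Because the real service works at the DNS level, a client that has already resolved a name keeps using that answer until its cached DNS entry expires, so failover is not instantaneous.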
CDN Content Delivery Network Integrates with Azure Blob Storage Global Coverage – More than just Azure Data centre Caches Blobs local to calling client Integrates with Blobs with different URL
Tips and Tricks – No Code Changes Autoscale Move load from website e.g. use blobs and CDN Use Redis for caching ASP.NET Session State Provider for Azure Redis Cache (config change + NuGet) Load test and measure performance Helps identify bottlenecks Provides baseline data to help understand limits for scaling
Tips and Tricks – Code Changes Don’t wait, use Async Use Queues Load balance back end More resilient Use Geo-Redundant Read Only data Database Sharding or Partitioning E.g. use multiple databases
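The async and queue tips above fit together: the front end accepts a request, places the work on a queue, and returns without waiting, while back-end workers drain the queue at their own pace. The sketch below uses an in-process `asyncio.Queue` purely as a stand-in for a durable queue service; `handle_request`, `worker`, and `main` are hypothetical names.

```python
import asyncio


async def handle_request(queue, order):
    """Front end: accept the update, enqueue it, and reply without waiting for processing."""
    await queue.put(order)
    return "accepted"


async def worker(queue, processed):
    """Back end: take items off the queue one at a time."""
    while True:
        order = await queue.get()
        await asyncio.sleep(0)  # placeholder for slow work such as a database write
        processed.append(order)
        queue.task_done()


async def main():
    queue = asyncio.Queue()
    processed = []
    # Two workers share the queue, so back-end load is balanced between them.
    workers = [asyncio.create_task(worker(queue, processed)) for _ in range(2)]
    replies = await asyncio.gather(*(handle_request(queue, n) for n in range(5)))
    await queue.join()  # wait until every queued item has been processed
    for task in workers:
        task.cancel()
    await asyncio.gather(*workers, return_exceptions=True)
    return replies, sorted(processed)
```

Running `asyncio.run(main())` returns the front-end replies and the list of processed items. An in-memory queue loses its contents if the process dies, so the resilience benefit on the slide depends on using a queue that persists messages outside the web server.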
Summary Applications & Infrastructure will need to change for scale out and resilience Things fail in a networked environment so be prepared Know your application and understand the bottlenecks Azure offers support for Scaling and Resilience Make use of the tools and services available to make your website scalable and resilient
Thank you for listening. Any Questions? @sdspencer http://blogs.recneps.org