Azure Best Practices How to Successfully Architect Windows Azure Apps for the Cloud 13-Mar-2013 (1:00 PM EDT) Bill Wilder An App in the Cloud is not (necessarily)

Presentation transcript:

Azure Best Practices How to Successfully Architect Windows Azure Apps for the Cloud 13-Mar-2013 (1:00 PM EDT) Bill Wilder An App in the Cloud is not (necessarily) a Cloud-Native App

Who is Bill Wilder?

Roadmap for this talk…
1. App in the Cloud != Cloud App (or at least not a Cloud-Native App)
2. Put Cloud-Native in context of cloud platform types from a software development point of view
3. How to keep running when things go wrong?
4. How to scale?
5. How to minimize costs?
Assumptions:
– You know what “the cloud” is, so we can focus on application architecture using cloud as a toolbox
– You are interested in understanding cloud-native apps

The term “cloud” is nebulous…

“Bring Your Own” ____ as a Service NIST: Most productive platforms for Cloud-Native Apps

What’s different about the [public] cloud?

1/9th above water  TTM & Sleeping well =

MTBF, MTTR
commodity hardware + multitenant services = cost-efficient cloud
failure is routine (so you better be good at handling it)

This bar is always open *and* has an API Pay by the Drink

The “illusion of infinite resources”
Resource allocation (scaling) is:
– Horizontal
– Bi-directional
– Automatable

Cloud-Native Application Characteristics
Cloud-Native Applications have their Application Architecture aligned with the Cloud Platform Architecture
– Use the platform in the most natural way
– Let the platform do the heavy lifting where appropriate
– Take responsibility for error handling, self-healing, and some aspects of scaling

Tells: Traditional vs Cloud-Native
TELLS/CLUES
– Traditional: 2-tier; Single data center; Vertical scaling; Ignores failure; Hardware or IaaS
– Cloud-Native: 3- or N-tier, SOA; Multi-data center; Horizontal scaling; Expects failure; PaaS
CONSEQUENCES
– Traditional: Less flexible; More manual/attention; Less reliable (SPoF); Maintenance window; Less scalable, more $$
– Cloud-Native: Agile/faster TTM; Auto-scaling; Self-healing; HA; Geo-LB/FO
Which is “best” architecture? There is no “best” architecture: it is situational, a Technical Business Decision. Cloud-native popularity is growing in proportion to the shrinking cost and competitive benefits.

Putting Cloud Services to work

Original Approach
[Diagram: pageofphotos.com/maura, Web Tier, Database]
2-tier architecture
Stateful web nodes
Pros
– Well understood
– Easy to get working
[Potential] Cons
– UX fails for upgrades, hardware failures, app pool recycling
– Limited scale
Not Cloud-Native

[Diagram: pageofphotos.com/maura, Web Tier, Service Tier, Database, Database]
1. Scale web tier (stateless)
2. Scale service tier (async)
3. Scale data tier (shard)
All while… handling failure and optimizing for cost efficiency and operational efficiency
Scale the app, not the team!

Horizontal Scaling Compute Pattern pattern 1 of 5

Vertical Scaling vs. Horizontal Scaling
Common Terminology:
– Scaling Up/Down → Vertical Scaling
– Scaling Out/In → Horizontal “Scaling” → but really is Horizontal Resource Allocation
Architectural Decision
– Big decision… hard to change

Vertical Scaling (“Scaling Up”)
Resources that can be “Scaled Up”
– Memory: speed, amount
– CPU: speed, number of CPUs
– Disk: speed, size, multiple controllers
– Bandwidth: higher capacity pipe
– … and it sure is EASY
Downsides of Scaling Up
– Hard Upper Limit
– HIGH END HARDWARE → HIGH END CO$T
– Lower value than “commodity hardware”
– May have no other choice (architectural)

Horizontal Scaling (“Scaling Out”)
– Autonomous nodes for scalability (stateless web servers, shared nothing DBs, your custom code in QCW)
– Autonomous nodes *and* Homogeneous nodes for operational simplicity
– *and* Anonymous nodes: don’t get emotionally involved!
This is how a [public] CLOUD PLATFORM works
*and* This is how YOUR CLOUD-NATIVE app works

Load Balancer (Cloud Service) Managed VMs (Cloud Service) “Web Role” Example: Web Tier

Horizontal Scaling Considerations
1. Auto-Scale
– Bidirectional
2. Nodes can fail
– Releasing VM resources (e.g., via Auto-Scale) is one cause
– Handle shutdown signals
– Externalize session state (e.g., see ASP.NET Session State Providers for Azure Tables, Azure Cache)
– N+1 rule as UX optimization
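
One way to picture “externalize session state” is the small Python sketch below. It is an illustration only, not the ASP.NET session state providers named on the slide: `WebNode` and the plain dictionary standing in for Azure Tables or Azure Cache are hypothetical names introduced here.

```python
class WebNode:
    """A stateless web node: it keeps no session data of its own."""

    def __init__(self, name, session_store):
        self.name = name
        self.sessions = session_store  # shared store that lives outside the node

    def add_to_cart(self, session_id, item):
        # Read and write the session in the external store on every request.
        cart = self.sessions.setdefault(session_id, [])
        cart.append(item)
        return list(cart)


# Assumption: a dict stands in for an external store such as a cache or table service.
session_store = {}
node_a = WebNode("a", session_store)
node_b = WebNode("b", session_store)

node_a.add_to_cart("session-1", "photo1")
del node_a  # the node goes away, for example released by auto-scale
cart = node_b.add_to_cart("session-1", "photo2")  # another node picks up the session
```

Because the session survives the loss of `node_a`, the load balancer can send the next request to any remaining node.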

How many users does your cloud-native application need before it needs to be able to horizontally scale?

What’s the difference between performance and scale?

Queue-Centric Workflow Pattern (QCW for short) pattern 2 of 5

Extend into a new Service Tier QCW enables applications where the UI and back-end services are Loosely Coupled [ Similar to CQRS Pattern ]

Web Tier pageofphotos.com Add service tier (async) Leave Web Tier to do what it’s good at Database Service Tier /maura

QCW Example: User Uploads Photo Web Tier Service Tier Reliable Queue Reliable Storage

QCW
WE NEED:
– Compute (VM) resources to run our code
– Reliable Queue to communicate
– Durable/Persistent Storage

Where does Windows Azure fit?

QCW [on Windows Azure]
WE NEED:
– Compute (VM) resources to run our code: Web Roles (IIS – Web Tier), Worker Roles (w/o IIS – Service Tier)
– Reliable Queue to communicate: Azure Storage Queues
– Durable/Persistent Storage: Azure Storage Blobs

QCW on Azure: User Uploads a Photo
[Diagram: Web Role (IIS), Worker Role, Azure Queue, Azure Blob]
UX implications: how does user know thumbnail is ready? (push, pull)

QCW enables Responsive UX
Response to interactive users is as fast as a work request can be persisted
Time-consuming work done asynchronously
Comparable total resource consumption, arguably better subjective UX
UX challenge: how to express Async to users?
– Communicate Progress
– Display Final results
– Long Polling/Web Sockets (e.g., SignalR or Node.io)

QCW enables Scalable App
Decoupled front/back provides insulation
– Blocking is Bane of Scalability
– Order processing partner doing maintenance
– Twitter down
– Server unreachable
– Internet connectivity interruption
Loosely coupled, concern-independent scaling
– (see next slide)
– Get Scale Units right
– Key to optimizing operational CO$T$

General Case: Many Roles, Many Queues
[Diagram: Web Roles (IIS, Public, Admin), Worker Roles (Type 1, Type 2), Queues (Type 1, Type 2, Type 3)]
Scaling is best when Investment ∝ Benefit
Optimize for CO$T EFFICIENCY
Logical vs. Physical Architecture depends on current scale

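Reliable Queue & 2-step Delete
Web Role:
var url = "…"; // URL value is cut off in the transcript
queue.AddMessage( new CloudQueueMessage( url ) );
Worker Role:
var invisibilityWindow = TimeSpan.FromSeconds( 10 );
// Step 1: GetMessage hides the message for the invisibility window; it is not removed yet
CloudQueueMessage msg = queue.GetMessage( invisibilityWindow );
// do all necessary processing…
// Step 2: delete only after processing has succeeded
queue.DeleteMessage( msg );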
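
The two-step delete can be simulated without any cloud service. The Python sketch below is a toy model under stated assumptions: `InMemoryQueue` and its method names are hypothetical and only mimic the behavior described on the slide (it is not the Azure SDK), and time is passed in explicitly so the run is deterministic.

```python
import itertools


class InMemoryQueue:
    """Toy stand-in for a reliable queue with an invisibility window."""

    def __init__(self):
        # message id -> [body, visible_at, dequeue_count]
        self._messages = {}
        self._ids = itertools.count(1)

    def add_message(self, body, now=0.0):
        self._messages[next(self._ids)] = [body, now, 0]

    def get_message(self, invisibility_window, now):
        """Step 1: hide the oldest visible message instead of removing it."""
        for msg_id, entry in self._messages.items():
            if entry[1] <= now:
                entry[1] = now + invisibility_window
                entry[2] += 1
                return msg_id, entry[0], entry[2]
        return None

    def delete_message(self, msg_id):
        """Step 2: remove the message once processing has succeeded."""
        self._messages.pop(msg_id, None)


queue = InMemoryQueue()
queue.add_message("photo-123", now=0)

# A worker takes the message, then crashes before deleting it.
first = queue.get_message(invisibility_window=10, now=0)
# While the window is open, no other worker sees the message.
hidden = queue.get_message(invisibility_window=10, now=5)
# After the window expires, the message is visible again for another worker.
second = queue.get_message(invisibility_window=10, now=11)
queue.delete_message(second[0])
remaining = queue.get_message(invisibility_window=10, now=100)
```

The message is delivered twice here, which is why the processing step has to tolerate repeats.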

QCW requires Idempotency
Perform an idempotent operation more than once and the end result is the same as if we did it once
Example with Thumbnailing (easy case)
App-specific concerns dictate approaches
– Compensating action, Last write wins, etc.
PARTNERSHIP: division of responsibility between cloud platform & app
– Transaction cannot span database + queue
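
Thumbnailing is the easy case because the output name can be derived from the input name. The Python sketch below illustrates that idea under assumptions: the dictionary standing in for blob storage, the `thumb/` prefix, and the byte slicing used in place of real image resizing are all placeholders invented for this example.

```python
def make_thumbnail(blob_store, photo_name):
    """Write the thumbnail under a name derived from the input name."""
    original = blob_store[photo_name]
    # Deterministic key: a repeated run overwrites the same blob, never adds a new one.
    thumb_name = "thumb/" + photo_name
    blob_store[thumb_name] = original[:4]  # placeholder for real image resizing
    return thumb_name


# Assumption: a dict stands in for durable blob storage.
blob_store = {"maura/cat.jpg": b"0123456789"}

make_thumbnail(blob_store, "maura/cat.jpg")
after_first_run = dict(blob_store)

# The queue delivers the same message a second time.
make_thumbnail(blob_store, "maura/cat.jpg")
after_second_run = dict(blob_store)
```

Running the handler twice leaves storage in the same state as running it once, which is what idempotent means here.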

QCW expects Poison Messages
A Poison Message cannot be processed
– Error condition for non-transient reason
– Check CloudQueueMessage.DequeueCount property
Falling off the queue may kill your system
Determine a Max Retry policy per queue
– Delete, put on “bad” queue, alert human, …
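
A max-retry policy based on the dequeue count might look like the Python sketch below. It is a simplified model, not SDK code: the message is a plain dictionary, and `MAX_DEQUEUE_COUNT`, `handle`, and the list used as the “bad” queue are hypothetical names chosen for illustration.

```python
MAX_DEQUEUE_COUNT = 3  # assumed retry limit for this queue


def handle(message, process, bad_queue):
    """Return True when the message should be deleted from the main queue."""
    if message["dequeue_count"] > MAX_DEQUEUE_COUNT:
        bad_queue.append(message["body"])  # park it for a human to inspect
        return True
    try:
        process(message["body"])
    except Exception:
        # Leave the message alone; it reappears after the invisibility window.
        return False
    return True


def always_fails(body):
    raise ValueError("cannot parse " + body)


bad_queue = []
message = {"body": "corrupt-photo", "dequeue_count": 0}
deliveries = 0
done = False
while not done:
    message["dequeue_count"] += 1  # the queue increments this on every delivery
    deliveries += 1
    done = handle(message, always_fails, bad_queue)
```

With a limit of three, the message is attempted three times and parked on the fourth delivery instead of cycling forever.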

QCW requires “Plan for Failure”
VM restarts will happen
– Hardware failure, O/S patching, crash (bug)
Bake in handling of restarts into our apps
– Restarts are routine: system “just keeps working”
– Idempotent mindset is key
– Event Sourcing (commonly seen with CQRS) may help
Not an exception case! Expect it!
Consider N+1 Rule

Aside: Is QCW same as CQRS?
Short answer: “no”
CQRS – Command Query Responsibility Segregation
– Commands change state
– Queries ask for current state
– Any operation is one or the other
– Sometimes includes Event Sourcing
– Sometimes modeled using Domain Driven Design (DDD)

What about the Data?
You: Azure Web Roles and Azure Worker Roles
– Taking user input, dispatching work, doing work
– Follow a decoupled queue-in-the-middle pattern
– Stateless compute nodes
Cloud: “Hard Part”: persistent, scalable data
– Azure Queue & Blob Services
– Three copies of each byte
– Blobs are geo-replicated
– Busy Signal Pattern

Database Sharding Pattern pattern 3 of 5

Extend example into Data Tier What happens when demands on data tier outgrow one physical database?

Web Tier pageofphotos.com Scale data tier (shard) Sharding is horizontal scaling for databases. Unlike compute nodes, databases are not stateless. Database Service Tier Database /maura Database

Database Sharding
Problem: too much for one physical database
– Too much data (e.g., 150 GB limit in WASD)
– Not sufficiently performant
Solution: split data across multiple databases
– One Logical Database, multiple Physical Databases
Each Physical Database Node is a Shard
Goal is a Shared Nothing design & single shard handles most common business operations
– May require some denormalization (duplication)

All shards have same schema SHARDS

Sharding is Difficult
What defines a shard? (Where to put/find stuff?)
– Example – by HOME STATE: customer_ma, customer_ia, customer_co, customer_ri, …
– Design to avoid query / join / transact across shards
What happens if a shard gets too big?
– Rebalancing shards can get complex
– Foursquare case study is interesting
Cache coherence, connection pool management
– Rolling-your-own is complex
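
The “where to put/find stuff” question for the HOME STATE example can be sketched as a routing function. The Python below is a minimal illustration that follows the `customer_ma` naming from the slide; `shard_for` and `group_by_shard` are hypothetical helpers, and a real application layer would also need connection management and rebalancing.

```python
def shard_for(home_state):
    """Map a customer's home state to the name of the shard that holds the row."""
    return "customer_" + home_state.strip().lower()


def group_by_shard(customers):
    """Group (name, home_state) pairs by shard so each shard is visited once."""
    shards = {}
    for name, home_state in customers:
        shards.setdefault(shard_for(home_state), []).append(name)
    return shards


customers = [("Maura", "MA"), ("Ian", "IA"), ("Mel", "ma")]
placement = group_by_shard(customers)
```

A lookup for one customer touches exactly one shard, which is the property the shard key is chosen for.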

Where does Windows Azure fit?

Windows Azure SQL Database (WASD) is SQL Server… with a few diffs…
Common
– “Just change the connection string…”
SQL Server Specific (for now)
– Full Text Search
– Transparent Data Encryption (TDE)
– Many more…
WASD Specific
– Limitations: 150 GB size limit, Busy Signal Pattern
– Extra Capabilities: Managed Service, Highly Available, Rental model, Federations
Additional information on Differences:

Windows Azure SQL Database Federations for Sharding
Single “master” database
– “Query Fanout” makes partitions transparent
– Instead of customer_ma, customer_ia, etc… we are back to customer database
Handles redistributing shards
Handles cache coherence and simplifies connection pooling
No MERGE (yet); SPLIT only
Bonus feature for Multitenant Applications
USE FEDERATION myfed (myfedkey = 911) WITH FILTERING=ON, RESET
connectivity-model-for-federated-data.aspx

Key Take-away Database Sharding has historically been an APPLICATION LAYER concern Windows Azure SQL Database Federations supports sharding lower in the stack as a DATABASE LAYER concern

My database instance is limited to 150 GB. Does that mean the cloud doesn’t really offer the illusion of infinite resources?

Busy Signal Pattern pattern 4 of 5

Language/Platform SDKs on TOPAZ from Microsoft P&P: All have Retry Policies
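
The retry policies mentioned above generally follow the same shape: retry only transient failures, wait longer after each attempt, and give up after a limit. The Python sketch below shows that shape under assumptions; it is not TOPAZ or any SDK's retry policy, and `TransientError`, `call_with_retry`, and the default values are hypothetical.

```python
import random
import time


class TransientError(Exception):
    """Hypothetical marker for a retryable failure (throttling, failover, ...)."""


def call_with_retry(operation, max_attempts=4, base_delay=0.1, sleep=time.sleep):
    """Call operation, retrying transient failures with exponential backoff."""
    for attempt in range(1, max_attempts + 1):
        try:
            return operation()
        except TransientError:
            if attempt == max_attempts:
                raise  # out of retries: surface the failure to the caller
            # Double the wait on each attempt and add jitter to spread out retries.
            sleep(base_delay * 2 ** (attempt - 1) * random.uniform(0.5, 1.5))


attempts = []


def flaky():
    attempts.append(1)
    if len(attempts) < 3:
        raise TransientError("server busy")
    return "ok"


delays = []  # record the waits instead of actually sleeping
result = call_with_retry(flaky, sleep=delays.append)
```

Errors that are not `TransientError` propagate immediately, since retrying a non-transient failure only delays the inevitable.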

Auto-Scaling Pattern pattern 5 of 5

Goal is AUTOSCALING – using a library or services
– Microsoft “WASABi” block from P&P (you run it)
– MetricsHub is in the Azure store (very basic service)
– Third Party Services: a few SaaS choices for Auto-Scaling and Monitoring
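
Whatever tool runs it, an auto-scaling rule boils down to turning a signal into a target instance count. The Python sketch below is one hypothetical rule, not how WASABi or MetricsHub work: it sizes worker instances from queue length, adds one spare instance in the spirit of the N+1 rule, and clamps the result to assumed limits.

```python
import math


def desired_instances(queue_length, msgs_per_instance,
                      min_instances=2, max_instances=20):
    """Return the target worker count for the current queue backlog."""
    needed = math.ceil(queue_length / msgs_per_instance)
    # N+1: keep one spare instance, then clamp to the allowed range.
    return max(min_instances, min(max_instances, needed + 1))


quiet = desired_instances(queue_length=0, msgs_per_instance=100)
busy = desired_instances(queue_length=450, msgs_per_instance=100)
spike = desired_instances(queue_length=10000, msgs_per_instance=100)
```

The same function scales in as well as out, which is the bidirectional behavior that keeps costs tied to load.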

In Conclusion

Optimize for MTTR (1/2)
Apply Busy Signal Pattern
– Retry transient failures due to issues with network, throttling, failovers
– Applies to all cloud services
Apply Node Failure Pattern
– Stateless Nodes, QCW Pattern, handle node shutdown signals, covers nodes going away due to scaling action
– Consider N+1 Rule
Detect Poison Messages
– Protect against Bad Data

Optimize for MTTR (2/2)
Prevent Resource Failures
– Environmental-signal-based Auto-Scaling (for surprises)
– Proactive Auto-Scaling for known spikes (e.g., Super Bowl ad, lunch rush)
– QCW Pattern (allow work to pile up w/o blocking users)
Log Everything
– Gather logs with Windows Azure Diagnostics

What’s Up? Reliability as EMERGENT PROPERTY
[Table columns: Typical Site, Any 1 Role Inst, Overall System]
[Table rows: Operating System Upgrade; Application Code Update; Scale Up, Down, or In; Hardware Failure; Software Failure (Bug); Security Patch]

Optimize for Cost
Operational Efficiency Big Factor
– Human costs can dominate
– Automate (CI & CD and self-healing)
– Simplify: homogeneous nodes
Review costs billed (so transparent!)
– Be on lookout for missed efficiencies
“Watch out for money leaks!”
– Inefficient coding can increase the monthly bill
Prefer to Buy/Rent rather than Build
– Save costs (and TTM) of expensive engineering

Optimize for Scale
With the right architecture…
– Scale efficiently (linearly)
– Scale all Application Tiers
– Auto-Scale
– Scale Globally (8/24 data centers)
Use Horizontal Resourcing
Use Stateless Nodes
Upgrade without Downtime, even at scale
Do not need to sacrifice User Experience (UX)

Cloud Architecture Patterns book
Primer Chapters
1. Scalability
2. Eventual Consistency
3. Multitenancy and Commodity Hardware
4. Network Latency

Cloud Architecture Patterns book
Pattern Chapters
1. Horizontally Scaling Compute Pattern
2. Queue-Centric Workflow Pattern
3. Auto-Scaling Pattern
4. MapReduce Pattern
5. Database Sharding Pattern
6. Busy Signal Pattern
7. Node Failure Pattern
8. Colocate Pattern
9. Valet Key Pattern
10. CDN Pattern
11. Multisite Deployment Pattern

BostonAzure.org
Boston Azure Cloud User Group
Focused on Microsoft’s Public Cloud Platform
Roles: Architect, Dev, IT Pro, DevOps (“WazOps”)
Talks, Demos, Tools, Hands-on, special events, …
Monthly, 6:00-8:30 PM in Boston area (free)
Follow on
More info or to join our Meetup.com group:

Business Card

My name is Bill Wilder
blog.codingoutloud.com
Find this slide deck here!

Questions? Comments? More information?