“Try not. Do, or do not. There is no try.” - Yoda Yoda finally admits he does not understand exception handling...

1 “Try not. Do, or do not. There is no try.” - Yoda Yoda finally admits he does not understand exception handling...

2 Code Reuse: A practice in which other people get to use the code that I wrote.

3 “There are 2 hard problems in computer science: caching, naming things, and off-by-1 errors…” - (Source unknown)

4 How Architecting for the Cloud is Different Align your application’s architecture with the architecture of the cloud… NH Azure/.NET/Cloud Group 15-May-2013 (6:00 PM) Boston Azure User Group Bill Wilder

5 My name is Bill Wilder

6 Who is Bill Wilder?

7 I will ass-u-me… 1. You know what “the cloud” is 2. You have an inkling about Amazon Web Services and Windows Azure cloud platforms 3. You understand that such cloud platforms include compute services [like hosted virtual machines (VMs), in both IaaS and PaaS modes], SQL and NoSQL database services, file storage services, messaging, DNS, management, etc. 4. You are interested in understanding cloud-native applications and why that’s better than deploying an old-school app to the cloud “as is”

8 Roadmap for rest of talk… 1. Lightning-fast overview of Windows Azure 2. Cover three specific patterns for building cloud-native applications 3. Mention some other patterns along the way Q&A during talk is okay (time permitting) Q&A at end with any remaining time Okay to reach out through or twitter ?

9 Windows Azure Portal General information Management Portal

10 NIST Terminology SaaS = Software as a Service (BYO users) PaaS = Platform as a Service (BYO apps) IaaS = Infrastructure as a Service (BYO VMs) Simplicity Complexity Flexibility Rigidity Power? Power?

11 So Architecting for the (Windows Azure, AWS, GAE, …) Cloud is Different… WHY DID THEY (Microsoft, Amazon, Google, …) DO THIS TO US? But Why?

12 Know the rules “If I had asked people what they wanted, they would have said faster horses.” - Henry Ford Faster horses would not have addressed the horse manure problem … late 1800s.. 150k horses in NYC x 20 lbs manure/day/horse = 3 million lbs of manure per day

13 Know the rules “If I had asked IT departments what they wanted, they would have said IaaS.” - Henry Cloud

14 Cloud Platform Characteristics Scaling – or “resource allocation” – is horizontal – and ∞ (“illusion of infinite resources”) Resources are easily added or released – self-service portal or API; cloud scaling is automatable Pay only for currently allocated resources – costs are operational, granular, controllable, and transparent Optimized for cost-efficiency – cloud services are MT, hardware is commodity – MTTR over MTTF Rich, robust functionality is simply accessible – like an iceberg

15 Cloud-Native Application Characteristics Application architecture is aligned with the cloud platform architecture – uses the platform in the most natural way – lets the platform do the heavy lifting

16 Cloud-Native Application Characteristics Application architecture is aligned with the cloud platform architecture – uses the platform in the most natural way – lets the platform do the heavy lifting GO WITH THE FLOW Cloud (Azure) ≠ hosting Don’t fight it!

17 The term “cloud” is nebulous… The definition of “Cloud” is nebulous…


19 What is different about the cloud? What's different about the public cloud?

20 1/9th above water → TTM & Sleeping well = SOA

21 MTBF MTTR commodity hardware + multitenant services = cost-efficient cloud failure is routine (so you better be good at handling it) Architectural Assumptions

22 “Try not. Do, or do not. There is no try.” - Yoda try { foo.ThisCanThrow(); } catch (Exception ex) { // … } Yoda not a good cloud developer would make
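The joke has a practical side: in a cloud where failure is routine, "try" (and try again) is the normal case. A minimal Python sketch of retrying a transient failure with exponential backoff follows. `TransientError`, `call_with_retry`, and the delay values are hypothetical names and numbers chosen for illustration, not anything taken from the deck or from a cloud SDK.

```python
import time


class TransientError(Exception):
    """Hypothetical marker for failures worth retrying (timeouts, throttling)."""


def call_with_retry(operation, max_attempts=4, base_delay=0.1, sleep=time.sleep):
    """Run operation(), retrying transient failures with exponential backoff."""
    for attempt in range(1, max_attempts + 1):
        try:
            return operation()
        except TransientError:
            if attempt == max_attempts:
                raise  # out of retries: surface the failure to the caller
            # Wait base_delay, then 2x, then 4x, ... before the next attempt.
            sleep(base_delay * 2 ** (attempt - 1))
```

Only errors the caller has classified as transient are retried; anything else propagates immediately, which keeps real bugs from being hidden behind a retry loop.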

23 Loosely Coupled & Eventually Consistent Data & Workflow Architecture

24 This bar is always open *and* has an API Pay by the Drink $

25 ∞ Resource allocation (scaling) is: – Horizontal – Bi-directional – Automatable The “illusion of infinite resources” Resource Allocation
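One way to picture "bi-directional" and "automatable" is a rule that maps the current backlog to a target instance count. The Python sketch below only computes that target; applying it through a platform management API is left out. The function name and the thresholds (`per_worker`, `minimum`, `maximum`) are assumptions made up for illustration.

```python
def desired_workers(queue_length, per_worker=100, minimum=2, maximum=20):
    """Return how many worker instances the current backlog calls for.

    Scales out as the backlog grows and back in as it shrinks,
    always staying within [minimum, maximum].
    """
    target = -(-queue_length // per_worker)  # ceiling division
    return max(minimum, min(maximum, target))
```

The `maximum` cap is what keeps the "illusion of infinite resources" from turning into an unbounded bill.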

26 Integrated Surface Area

27 Simple idea, simple app Two-tiers: web tier (one server) + database What’s the problem? But… what’s WRONG with this architecture? Different ≠ WRONG. Use the right tool for the job. Some apps are simply not a good fit for cloud. ?

28 Simple idea, simple app Two-tiers: web tier (one server) + database What can go wrong? We’ll reexamine: 1. Scaling the web tier 2. Scaling the service tier 3. Scaling the data tier 4. Handling failure 5. Operational efficiency (scale the app, not the team!)

29 Horizontal Scaling Compute Pattern pattern 1 of 3

30 What’s the difference between performance and scale? ?

31 Common Terminology: Scaling Up/Down → Vertical Scaling Scaling Out/In → Horizontal “Scaling” → But really is Horizontal Resource Allocation Architectural Decision – Big decision… hard to change Scale Up (and Scale Down??) vs. Horizontal Resourcing

32 Vertical Scaling (“Scaling Up”). Resources that can be “Scaled Up” Memory: speed, amount CPU: speed, number of CPUs Disk: speed, size, multiple controllers Bandwidth: higher capacity pipe … and it sure is EASY Downsides of Scaling Up Hard Upper Limit HIGH END HARDWARE → HIGH END CO$T Lower value than “commodity hardware” May have no other choice (architectural)

33 Scaling Horizontally: Adding Boxes Autonomous nodes for scalability (stateless web servers, shared nothing DBs, your custom code in QCW) Autonomous nodes *and* Homogeneous nodes for operational simplicity *and* Anonymous nodes: don’t get emotionally involved! This is how the CLOUD works *and* This is how YOUR CLOUD-NATIVE APP WORKS

34 Load Balancer (Cloud Service) Managed VMs (Cloud Service) Example: Web Tier

35 1. Auto-Scale Bidirectional 2. Nodes can fail Auto-Scale is only one cause Handle shutdown signals Stateless (“like a taxi”) vs. Sticky Sessions Stateless nodes vs. Stateless apps N+1 rule vs. occasional downtime (UX) Horizontal Scaling Considerations
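The "like a taxi" idea can be sketched as follows: any node can pick up any request because the session lives in a shared store rather than in a node's memory. This Python sketch uses in-memory stand-ins throughout; `SessionStore`, `WebNode`, and `RoundRobinBalancer` are hypothetical names, and a real deployment would put the store in an external cache or table service.

```python
import itertools


class SessionStore:
    """Stand-in for an external store shared by all web nodes."""

    def __init__(self):
        self._data = {}

    def get(self, session_id):
        return self._data.get(session_id, {"cart": []})

    def put(self, session_id, session):
        self._data[session_id] = session


class WebNode:
    """A stateless node: it keeps no per-user state between requests."""

    def __init__(self, name, store):
        self.name = name
        self.store = store

    def add_to_cart(self, session_id, item):
        session = self.store.get(session_id)
        session["cart"] = session["cart"] + [item]
        self.store.put(session_id, session)
        return self.name, session["cart"]


class RoundRobinBalancer:
    """Hands requests to nodes in turn, with no session affinity."""

    def __init__(self, nodes):
        self._cycle = itertools.cycle(list(nodes))

    def route(self):
        return next(self._cycle)
```

Two consecutive requests for the same session land on different nodes and still see the same cart, which is what makes sticky sessions unnecessary and node loss survivable.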

36 How many users does your cloud-native application need before it needs to be able to horizontally scale? ?

37 Queue-Centric Workflow Pattern (QCW for short) pattern 2 of 3

38 Extend example into Service Tier QCW enables applications where the UI and back-end services are Loosely Coupled (Compare to CQRS at end if there is interest)

39 QCW Example: User Uploads Photo Web Server Compute Service Reliable Queue Reliable Storage

40 QCW WE NEED: Compute (VM) resources to run our code Reliable Queue to communicate Durable/Persistent Storage

41 Where does Windows Azure fit?

42 QCW [on Windows Azure] WE NEED: Compute (VM) resources to run our code Web Roles (IIS) and Worker Roles (w/o IIS) Reliable Queue to communicate Azure Storage Queues Durable/Persistent Storage Azure Storage Blobs & Tables; WASD

43 QCW on Azure: User Uploads a Photo Web Role (IIS) Web Role (IIS) Worker Role Worker Role Azure Queue Azure Blob UX implications: user does not wait for thumbnail (architecture!) push pull

44 QCW enables Responsive UX Response to interactive users is as fast as a work request can be persisted Time consuming work done asynchronously Comparable total resource consumption, arguably better subjective UX UX challenge – how to express Async to users? – Communicate Progress – Display Final results – Long Polling/Web Sockets (e.g., SignalR or

45 QCW enables Scalable App Decoupled front/back provides insulation – Blocking is Bane of Scalability – Order processing partner doing maintenance – Twitter down – server unreachable – Internet connectivity interruption Loosely coupled, concern-independent scaling – (see next slide) – Get Scale Units right – Key to optimizing operational CO$T$

46 General Case: Many Roles, Many Queues [Diagram: Web Roles (IIS, Public, Admin), Worker Roles (Type 1, Type 2), and Queues (Type 1, Type 2, Type 3)] Scaling best when Investment ∝ Benefit Optimize for CO$T EFFICIENCY Logical vs. Physical Architecture depends on current scale

47 Reliable Queue & 2-step Delete Web Role (IIS) Queue Worker Role
// Web Role: enqueue a work request
var url = "";
queue.AddMessage( new CloudQueueMessage( url ) );
// Worker Role, step 1: get the message (hidden from other readers for 10 seconds)
var invisibilityWindow = TimeSpan.FromSeconds( 10 );
CloudQueueMessage msg = queue.GetMessage( invisibilityWindow );
// … do some processing, then …
// Worker Role, step 2: delete the message
queue.DeleteMessage( msg );
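To see why the delete is a separate second step, it can help to model the queue itself. The Python sketch below is a toy in-memory stand-in, not the Azure Storage client: `InMemoryQueue` and its method names are hypothetical, and the clock is passed in as `now` so the behavior is easy to follow.

```python
import itertools


class InMemoryQueue:
    """Toy queue with an invisibility window; time is passed in explicitly."""

    def __init__(self):
        self._messages = []  # message dicts, oldest first
        self._ids = itertools.count(1)

    def add_message(self, body):
        self._messages.append(
            {"id": next(self._ids), "body": body,
             "visible_at": 0.0, "dequeue_count": 0}
        )

    def get_message(self, now, invisibility_window):
        """Step 1: hand out the first visible message and hide it for a while."""
        for msg in self._messages:
            if msg["visible_at"] <= now:
                msg["visible_at"] = now + invisibility_window
                msg["dequeue_count"] += 1
                return msg
        return None

    def delete_message(self, msg):
        """Step 2: remove the message once processing has succeeded."""
        self._messages = [m for m in self._messages if m["id"] != msg["id"]]
```

If a worker dies between step 1 and step 2, nothing is lost: the message becomes visible again once the window expires and another worker picks it up, with a higher dequeue count.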

48 QCW requires Idempotent Perform idempotent operation more than once, end result same as if we did it once Example with Thumbnailing (easy case) App-specific concerns dictate approaches – Compensating action, Last write wins, etc. PARTNERSHIP: division of responsibility between cloud platform & app – Far cry from database transaction
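The thumbnailing "easy case" can be sketched in Python as below. The output name is derived from the input name, so a retried message overwrites the same thumbnail instead of creating a second one. A plain dict stands in for blob storage, and slicing bytes stands in for real image resizing; both are assumptions for illustration, as are the function names.

```python
def thumbnail_key(photo_key):
    """Derive the output name from the input name so retries overwrite."""
    return "thumbs/" + photo_key


def make_thumbnail(blobs, photo_key):
    """Idempotent: running this twice leaves `blobs` as if it ran once."""
    source = blobs[photo_key]
    blobs[thumbnail_key(photo_key)] = source[:8]  # placeholder for resizing
    return thumbnail_key(photo_key)
```

An operation such as "append a row" or "increment a counter" would not have this property, which is where the app-specific approaches on the slide (compensating action, last write wins) come in.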

49 QCW expects Poison Messages A Poison Message cannot be processed – Error condition for non-transient reason – Use dequeue count property Be proactive – Falling off the queue may kill your system Determine a Max Retry policy per queue – Delete, put on “bad” queue, alert human, …
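A max-retry policy based on the dequeue count might look like the Python sketch below. The message is assumed to be a dict carrying `body` and `dequeue_count`; `MAX_DEQUEUE_COUNT`, `handle`, and the list used as a "bad" queue are hypothetical choices for illustration.

```python
MAX_DEQUEUE_COUNT = 3  # hypothetical per-queue retry limit


def handle(msg, process, bad_queue):
    """Return True if the caller should delete msg from the main queue."""
    if msg["dequeue_count"] > MAX_DEQUEUE_COUNT:
        # Seen too many times: treat as poison and set aside for a human.
        bad_queue.append(msg["body"])
        return True
    try:
        process(msg["body"])
    except Exception:
        # Leave it on the queue; it reappears after the invisibility window.
        return False
    return True
```

Checking the count before processing means a message that crashes the worker outright still gets caught on a later delivery.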

50 QCW requires “Plan for Failure” VM restarts will happen – Hardware failure, O/S patching, crash (bug) Bake in handling of restarts into our apps – Restarts are routine: system “just keeps working” – Idempotent support is important – Event Sourcing (commonly seen with CQRS) may help Not an exception case! Expect it! Consider N+1 Rule
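The N+1 Rule reduces to a small piece of arithmetic: size for the load, then add one instance so that losing any single node is a non-event. The Python sketch below uses hypothetical parameter names and example numbers.

```python
import math


def instances_needed(peak_requests_per_sec, capacity_per_instance, spare=1):
    """Instances required to carry the peak load, plus `spare` extras."""
    if capacity_per_instance <= 0:
        raise ValueError("capacity_per_instance must be positive")
    return math.ceil(peak_requests_per_sec / capacity_per_instance) + spare
```

For example, a peak of 900 requests per second at 400 per instance needs three instances to carry the load, so N+1 means running four.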

51 What’s Up? Reliability as EMERGENT PROPERTY Columns: Typical Site, Any 1 Role Inst, Overall System. Rows: Operating System Upgrade; Application Code Update; Scale Up, Down, or In; Hardware Failure; Software Failure (Bug); Security Patch

52 Aside: Is QCW same as CQRS? Short answer: “no” CQRS – Command Query Responsibility Segregation Commands change state Queries ask for current state Any operation is one or the other Sometimes includes Event Sourcing Sometimes modeled using Domain Driven Design (DDD)

53 What about the DATA? You: Azure Web Roles and Azure Worker Roles – Taking user input, dispatching work, doing work – Follow a decoupled queue-in-the-middle pattern – Stateless compute nodes Cloud: “Hard Part”: persistent, scalable data – Azure Queue & Blob Services – Three copies of each byte – Blobs are geo-replicated – Busy Signal Pattern

54 Database Sharding Pattern pattern 3 of 3

55 Extend example into Data Tier What happens when demands on data tier grow? The Database Sharding Pattern a little about reliability – a lot about scale and performance

56 Foursquare is a Social Network

57 Foursquare #Fail October 4, 2010 – trouble begins… After 17 hours of downtime over two days… “Oct. 5 10:28 p.m.: Running on pizza and Red Bull. Another long night.” WHAT WENT WRONG?

58 What is Sharding? Problem: one database can’t handle all the data – Too big, not performant, needs geo distribution, … Solution: split data across multiple databases – One Logical Database, multiple Physical Databases Each Physical Database Node is a Shard Most scalable is Shared Nothing design – May require some denormalization (duplication)

59 All shards have the same schema SHARDS

60 Sharding is Difficult What defines a shard? (Where to put stuff?) – Example – use country of origin: customer_us, customer_fr, customer_cn, customer_ie, … – Use same approach to find records (can use lookup) What happens if a shard gets too big? – Rebalancing shards can get complex – Foursquare case study is interesting How to query / join / transact across shards Cache coherence, connection pool management – Roll-your-own challenge
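The country-of-origin example can be sketched in Python as a lookup from shard key to physical database. Dicts stand in for the databases here. The shard names come from the slide; `SHARD_MAP`, `shard_for`, and `ShardedCustomers` are hypothetical names for illustration.

```python
SHARD_MAP = {
    "us": "customer_us",
    "fr": "customer_fr",
    "cn": "customer_cn",
    "ie": "customer_ie",
}


def shard_for(country_code):
    """Map a shard key (country of origin) to a physical database name."""
    try:
        return SHARD_MAP[country_code.lower()]
    except KeyError:
        raise LookupError(f"no shard for country {country_code!r}") from None


class ShardedCustomers:
    """One logical customer database spread over several physical shards."""

    def __init__(self):
        self.shards = {name: {} for name in SHARD_MAP.values()}

    def save(self, customer_id, country_code, record):
        self.shards[shard_for(country_code)][customer_id] = record

    def find(self, customer_id, country_code):
        # With the shard key, exactly one shard is consulted.
        return self.shards[shard_for(country_code)].get(customer_id)

    def find_everywhere(self, customer_id):
        # Without the shard key, every shard has to be asked (fan-out).
        hits = []
        for name, shard in self.shards.items():
            if customer_id in shard:
                hits.append((name, shard[customer_id]))
        return hits
```

The same function is used to place a record and to find it again, which is the "use same approach to find records" point. The fan-out method shows what cross-shard queries cost when the key is not known.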

61 Where does Windows Azure fit?

62 Windows Azure SQL Database (WASD) is SQL Server Except…
Common: “Just change the connection string…”
SQL Server Specific (for now): Full Text Search, Transparent Data Encryption (TDE), many more…
WASD Specific, Limitations: 150 GB size limit, Busy Signal Pattern
WASD Specific, Extra Capabilities: Managed Service, Highly Available, Rental model, Federations
Additional information on Differences:

63 Windows Azure SQL Database Federations for Sharding Single “master” database – “Query Fanout” makes partitions transparent – Instead of customer_us, customer_fr, etc… we are back to customer database Handles redistributing shards Handles cache coherence Simplifies connection pooling No MERGE (yet); SPLIT only Bonus feature for Multitenant Applications USE FEDERATION myfed (myfedkey = 911) WITH FILTERING=ON RESET connectivity-model-for-federated-data.aspx

64 Foursquare #Fail Foursquare was implementing database sharding in the application layer. WASD Federations makes this unnecessary. WHAT WENT WRONG?

65 My database instance is limited to 150 GB. ∞ ∞ ∞ Does that mean the cloud doesn’t really offer the illusion of infinite resources? ?

66 Pre-Cloud vs. Cloud-Native (architectural concerns)
Old-School | Cloud-Native
Control | Efficiency
Stable/Static Hardware | Dynamic/∞ Resources
Fixed/CapEx | Variable/OpEx
Vertical Scaling | Horizontal Resourcing
Maximize MTBF | Minimize MTTR
Data Storage = RDBMS | Scenario-specific Storage
Manage Infrastructure | Managed Infrastructure

67 Pre-Cloud vs. Cloud-Native
Lessons | Being Cloud-Native
1:15,000 | Efficiency
Auto-Scaling via API | Dynamic/∞ Resources
Pay-As-You-Go | Variable/OpEx
Stateless, Autonomous | Horizontal Resourcing
N+1, Idempotent | Minimize MTTR
SQL, NoSQL, Blob | Scenario-specific Storage
VM, Storage, LB, DR | Managed Infrastructure

68 Know the rules “Know the rules well, so you can break them effectively.” - Dalai Lama XIV

69 Integrated Surface Area

70 Practical Impact
– If the web tier is going to a cloud service (Web Role), ensure that session state is externalized (avoid keeping session state in local server memory).
– Ensure all logging is done to a durable location (since a fail or scale event could make the local hard drive go away); often this is Windows Azure Diagnostics (WAD).
– Often pre-cloud apps have too much logic in the web tier (including spiky or memory-intensive bits that drive web servers nuts). Some of it may belong in a service tier: separate "web tier" code from "business service" code. A bonus consideration is whether these tiers should communicate directly (REST or SOAP call) or over a queue (Queue-Centric Workflow).
– Ensure retry logic and proper exception handling are in place for all database access and network service access.
– You will need to do a new sizing exercise based on the new layout (which VM sizes for which tiers, and how to scale).
– Licensing can be fun if using non-cloud-friendly licenses, especially if the most natural distributed architecture also unnaturally multiplies license costs.
– Are there any non-standard configurations needed? That might indicate a need for Startup Tasks.
– Logging is often weak or lacking in pre-cloud apps, making it harder to debug in a distributed environment once there's an issue.
– Build/deploy automation can often use some work.
– An auto-scale monitor (WASABi or one of the services) is usually new, so each app node needs to ensure it can close down gracefully, since it may be scaled away (or failed away).
– If the app is going to be updated in place, the system needs to be able to support running mixed versions in the same cloud service.
– Use cloud services where operating system services were used: for example, Blob Storage for durable file storage, a Caching Role or Table Storage for externalizing session state, Media Services if you are dealing with media, CDN, Traffic Manager, etc.
– If planning to use SQL Azure, deal with sharding. This might mean schema changes, more so if using Federations than with roll-your-own sharding.
– User identity is one of the biggest cliffs to walk over. The first time you have an app in the cloud you need a way to authenticate, with WAAD and ADFS being a couple of the options. This also obviously tends to involve company roles beyond that of a specific app dev team.
– While we're on the topic of identity, modernizing to use claims-based authorization is a big shift for some apps, but it makes integrating with the cloud-native identity plumbing easier.
– Every node in a cloud service shares a public IP address, so apps that depend on having multiple IP addresses (domains) need to consider multiple cloud services or using just port numbers.
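Closing down gracefully usually comes down to checking a stop flag between work items, so that a node being scaled away finishes what it started and takes nothing new. The Python sketch below assumes the platform's shutdown notification sets a `threading.Event`. How that notification arrives is platform-specific and not shown, and `run_worker` is a hypothetical name.

```python
import threading


def run_worker(get_work, do_work, stop_requested):
    """Process items until asked to stop; never abandon an item half-way.

    get_work() returns the next item, or None when there is nothing left.
    stop_requested is a threading.Event set when shutdown is signalled.
    """
    done = []
    while not stop_requested.is_set():
        item = get_work()
        if item is None:
            break
        done.append(do_work(item))  # the current item always runs to completion
    return done
```

Because the flag is checked only between items, any work item not yet picked up stays on the queue for another node to handle.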

71 Cloud Architecture Patterns book Primer Chapters 1. Scalability 2. Eventual Consistency 3. Multitenancy and Commodity Hardware 4. Network Latency

72 Cloud Architecture Patterns book Pattern Chapters 1. Horizontally Scaling Compute Pattern 2. Queue-Centric Workflow Pattern 3. Auto-Scaling Pattern 4. MapReduce Pattern 5. Database Sharding Pattern 6. Busy Signal Pattern 7. Node Failure Pattern 8. Colocate Pattern 9. Valet Key Pattern 10. CDN Pattern 11. Multisite Deployment Pattern

73 Questions? Comments? More information? ?

74 Business Card

75 Boston Azure cloud user group Focused on Microsoft’s Public Cloud Platform Monthly, 6:00-8:30 PM in Boston area – Food; wifi; free; great topics; growing community Follow on More info or to join our group:

76 Looking for … consulting help with Windows Azure Platform? someone to bounce Azure or cloud questions off? a speaker for your user group or company technology event? Just Ask! Bill community inquiries: business inquiries: book: Contact Me Find this slide deck here



