
1 Windows Azure for Research Roger Barga, Architect Cloud Computing Futures, MSR

2 The Million Server Datacenter

3 HPC and Clouds – Select Comparisons Node and system architectures Communication fabric Storage systems and analytics Physical plant and operations Programming models (rest of tutorial)

4 HPC Node Architecture Moore’s “Law” favored commodity systems Specialized processors and systems faltered “Killer micros” and industry standard blades led Inexpensive clusters now dominate

5 HPC Interconnects Ethernet for low end (cost sensitive) High end expectations {Nearly} flat networks and very large switches Operating system bypass for low latency (microseconds)

6 Modern Data Center Network [Network topology diagram] Internet → CR → AR (Layer 3); AR → S → LB → racks of servers (Layer 2), with GigE and 10 GigE links. Key: CR (L3 Border Router), AR (L3 Access Router), S (L2 Switch), LB (Load Balancer), A (20 Server Rack/TOR).

7 HPC Storage Systems Local disk Scratch or non-existent Secondary storage SAN and parallel file systems Hundreds of TBs (at most) Tertiary storage Tape robot(s) 3-5 GB/s bandwidth ~60 PB capacity

8 HPC and Clouds – Select Comparisons Node and system architectures Communication fabric Storage systems and analytics Physical plant and operations Programming models (rest of tutorial)

9 A Tour Around Windows Azure

10 Azure in Action, Manning Press Programming Windows Azure, O’Reilly Press Bing: Channel 9 Windows Azure Bing: Windows Azure Platform Training Kit – November 2010 Update

11 Application Model Comparison Ad Hoc Application Model: machines running IIS / ASP.NET, machines running Windows Services, machines running SQL Server.

12 Application Model Comparison Ad Hoc Application Model: machines running IIS / ASP.NET, Windows Services, and SQL Server. Windows Azure Application Model: Web Role instances, Worker Role instances, Azure Storage (Blob / Queue / Table), and SQL Azure.

13 Key Components Fabric Controller: manages hardware and virtual machines for services. Compute: Web Roles (web application front end), Worker Roles (utility compute), VM Roles (custom compute role; you own and customize the VM). Storage: Blobs (binary objects), Tables (entity storage), Queues (role coordination), SQL Azure (SQL in the cloud).

14 Key Components Fabric Controller Think of it as an automated IT department “Cloud Layer” on top of: Windows Server 2008 A custom version of Hyper-V called the Windows Azure Hypervisor Allows for automated management of virtual machines

15 Key Components Fabric Controller Think of it as an automated IT department: a "cloud layer" on top of Windows Server 2008 and a custom version of Hyper-V called the Windows Azure Hypervisor, allowing automated management of virtual machines. Its job is to provision, deploy, monitor, and maintain applications in data centers. Applications have a "shape" and a "configuration": the service definition describes the shape of a service (role types, role VM sizes, external and internal endpoints, local storage), and the service configuration configures it (instance count, storage keys, application-specific settings).

16 Key Components Fabric Controller Manages "nodes" and "edges" in the "fabric" (the hardware): power-on automation devices, routers/switches, hardware load balancers, physical servers, virtual servers. State transitions: current state → goal state; it does whatever is needed to reach and maintain the goal state. It's a perfect IT employee: never sleeps, never asks for a raise, and always does what you tell it to do in the configuration definition and settings.

17 Creating a New Project

18 Windows Azure Compute

19 Key Components – Compute Web Roles Web Front End Cloud web server Web pages Web services You can create the following types: ASP.NET web roles ASP.NET MVC 2 web roles WCF service web roles Worker roles CGI-based web roles

20 Key Components – Compute Worker Roles Utility compute on Windows Server 2008; background processing. Each role can define an amount of local storage: protected space on the local drive, considered volatile storage. May communicate with outside services: Azure Storage, SQL Azure, other web services. Can expose external and internal endpoints.

21 Suggested Application Model Using queues for reliable messaging

22 Scalable, Fault Tolerant Applications Queues are the application glue: they decouple parts of the application so each can be scaled independently, enable resource allocation through different priority queues and backend servers, and mask faults in worker roles (reliable messaging). A sketch of the pattern follows.
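A minimal sketch of this pattern with the classic Microsoft.WindowsAzure.StorageClient library; the "orders" queue name and development-storage connection string are illustrative assumptions, not from the talk.

```csharp
// Sketch: a web role enqueues work tickets; a worker role processes them.
using System;
using Microsoft.WindowsAzure;
using Microsoft.WindowsAzure.StorageClient;

public static class QueueGlue
{
    static CloudQueue GetQueue()
    {
        var account = CloudStorageAccount.Parse("UseDevelopmentStorage=true");
        var queue = account.CreateCloudQueueClient().GetQueueReference("orders");
        queue.CreateIfNotExist();
        return queue;
    }

    // Web role front end: drop a work ticket and return immediately.
    public static void Enqueue(string orderId)
    {
        GetQueue().AddMessage(new CloudQueueMessage(orderId));
    }

    // Worker role: poll, process, then delete (reliable messaging).
    public static void ProcessOne()
    {
        var queue = GetQueue();
        var msg = queue.GetMessage(TimeSpan.FromSeconds(30)); // visibility timeout
        if (msg == null) return;                              // queue is empty
        // ... do the work here; if this instance crashes before the
        // delete, the message reappears and another worker retries ...
        queue.DeleteMessage(msg);
    }
}
```

If the worker dies mid-processing, the message becomes visible again after the timeout, which is exactly the fault-masking the slide describes.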

23 Key Components – Compute VM Roles Customized Role You own the box How it works: Download “Guest OS” to Server 2008 Hyper-V Customize the OS as you need to Upload the differences VHD Azure runs your VM role using Base OS Differences VHD

24 Application Hosting

25 ‘Grokking’ the service model Imagine white-boarding out your service architecture with boxes for nodes and arrows describing how they communicate The service model is the same diagram written down in a declarative format You give the Fabric the service model and the binaries that go with each of those nodes The Fabric can provision, deploy and manage that diagram for you Find hardware home Copy and launch your app binaries Monitor your app and the hardware In case of failure, take action. Perhaps even relocate your app At all times, the ‘diagram’ stays whole

26 Automated Service Management Provide code + service model Platform identifies and allocates resources, deploys the service, manages service health Configuration is handled by two files ServiceDefinition.csdef ServiceConfiguration.cscfg
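As an illustration of those two files, here is a trimmed, hypothetical pair (role names, endpoint, and setting values are invented for the sketch; real files generated by Visual Studio carry more detail).

```xml
<!-- ServiceDefinition.csdef: the "shape" of the service (illustrative sketch) -->
<ServiceDefinition name="MyService"
    xmlns="http://schemas.microsoft.com/ServiceHosting/2008/10/ServiceDefinition">
  <WebRole name="WebFrontEnd" vmsize="Small">
    <InputEndpoints>
      <InputEndpoint name="HttpIn" protocol="http" port="80" />
    </InputEndpoints>
    <ConfigurationSettings>
      <Setting name="DataConnectionString" />
    </ConfigurationSettings>
  </WebRole>
</ServiceDefinition>

<!-- ServiceConfiguration.cscfg: the settings that fill in that shape -->
<ServiceConfiguration serviceName="MyService"
    xmlns="http://schemas.microsoft.com/ServiceHosting/2008/10/ServiceConfiguration">
  <Role name="WebFrontEnd">
    <Instances count="2" />
    <ConfigurationSettings>
      <Setting name="DataConnectionString" value="UseDevelopmentStorage=true" />
    </ConfigurationSettings>
  </Role>
</ServiceConfiguration>
```

Note the split: the definition declares what settings exist; the configuration supplies values and the instance count, so you can change scale without redeploying the package.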

27 Service Definition

28 Service Configuration

29 GUI Double click on Role Name in Azure Project

30 Deploying to the cloud We can deploy from the portal or from script. VS builds two files: an encrypted package of your code, and your config file. You must create an Azure account, then a service, and then you deploy your code. Deployment can take up to 20 minutes (which is better than six months).

31 Service Management API REST-based API to manage your services; X509 certs for authentication. Lets you create, delete, change, upgrade, swap, and more. Lots of community- and MSFT-built tools exist around the API, and it's easy to roll your own; a sketch follows.
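A hedged sketch of calling the API from .NET: the subscription ID, certificate path/password, and the x-ms-version header value are placeholders (the required version string depends on the operation and era), and the certificate must already be uploaded to the portal.

```csharp
// Sketch: list hosted services via the Service Management REST API.
using System;
using System.IO;
using System.Net;
using System.Security.Cryptography.X509Certificates;

class ListServices
{
    static void Main()
    {
        // Placeholder values: substitute your own subscription and cert.
        string subscriptionId = "your-subscription-id";
        var cert = new X509Certificate2(@"c:\certs\management.pfx", "password");

        string url = "https://management.core.windows.net/"
                     + subscriptionId + "/services/hostedservices";
        var request = (HttpWebRequest)WebRequest.Create(url);
        request.ClientCertificates.Add(cert);               // X509 authentication
        request.Headers.Add("x-ms-version", "2010-10-28");  // API version header

        using (var response = request.GetResponse())
        using (var reader = new StreamReader(response.GetResponseStream()))
        {
            Console.WriteLine(reader.ReadToEnd());          // XML list of services
        }
    }
}
```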

32 The Secret Sauce – The Fabric The Fabric is the 'brain' behind Windows Azure. 1. Process the service model: determine resource requirements, create role images. 2. Allocate resources. 3. Prepare nodes: place role images on nodes, configure settings, start roles. 4. Configure load balancers. 5. Maintain service health: if a role fails, restart the role based on policy; if a node fails, migrate the role based on policy.

33 Storage Replicated, Highly Available, Load Balanced

34 Durable Storage, At Massive Scale Blob: massive files, e.g. videos, logs. Drive: use standard file system APIs. Tables: non-relational, but with few scale limits; use SQL Azure for relational data. Queues: facilitate loosely-coupled, reliable systems.

35 Blob Features and Functions Store large objects (up to 1 TB in size). You can have as many containers and blobs as you want. Standard REST interface: PutBlob (inserts a new blob, overwrites an existing blob), GetBlob (get the whole blob or a specific range), DeleteBlob, CopyBlob, SnapshotBlob, LeaseBlob. Each blob has an address of the form http://&lt;account&gt;.blob.core.windows.net/&lt;container&gt;/&lt;blob&gt;.
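For example, a minimal upload/download round trip with the classic StorageClient library (the container and blob names are illustrative):

```csharp
// Sketch: put, get, and delete a blob with the classic StorageClient library.
using System;
using Microsoft.WindowsAzure;
using Microsoft.WindowsAzure.StorageClient;

class BlobBasics
{
    static void Main()
    {
        var account = CloudStorageAccount.Parse("UseDevelopmentStorage=true");
        var container = account.CreateCloudBlobClient().GetContainerReference("videos");
        container.CreateIfNotExist();

        var blob = container.GetBlobReference("clip.txt");
        blob.UploadText("hello azure");            // PutBlob (overwrites if present)
        Console.WriteLine(blob.DownloadText());    // GetBlob
        blob.Delete();                             // DeleteBlob
    }
}
```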

36 Containers Similar to a top-level folder. Unlimited capacity; can only contain blobs. Each container has an access level: Private (the default; requires the account key to access), Full public read, or Public read only.

37 Two Types of Blobs Under the Hood Block Blob Targeted at streaming workloads Each blob consists of a sequence of blocks Each block is identified by a Block ID Size limit 200GB per blob Page Blob Targeted at random read/write workloads Each blob consists of an array of pages Each page is identified by its offset from the start of the blob Size limit 1TB per blob

38 Blocks You can upload a file in 'blocks', each with an ID, then commit those blocks in any order into a blob. The final blob is limited to 200 GB and up to 50,000 blocks. You can modify a blob by inserting, updating, and removing blocks. Blocks live for a week before being GC'd if not committed to a blob. Optimized for streaming.
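A sketch of the block protocol with the StorageClient library; the block size and names are illustrative, and note that block IDs must be base64-encoded strings of equal length within a blob.

```csharp
// Sketch: upload a blob as uncommitted blocks, then commit the block list.
using System;
using System.Collections.Generic;
using System.IO;
using System.Text;
using Microsoft.WindowsAzure;
using Microsoft.WindowsAzure.StorageClient;

class BlockUpload
{
    static void Main()
    {
        var account = CloudStorageAccount.Parse("UseDevelopmentStorage=true");
        var container = account.CreateCloudBlobClient().GetContainerReference("uploads");
        container.CreateIfNotExist();
        var blob = container.GetBlockBlobReference("bigfile.dat");

        var blockIds = new List<string>();
        byte[] data = new byte[1024 * 1024];        // one 1 MB block (illustrative)
        for (int i = 0; i < 4; i++)
        {
            // Fixed-width IDs keep every base64 block ID the same length.
            string blockId = Convert.ToBase64String(
                Encoding.UTF8.GetBytes(i.ToString("d6")));
            blob.PutBlock(blockId, new MemoryStream(data), null); // uncommitted
            blockIds.Add(blockId);
        }
        blob.PutBlockList(blockIds);   // commit the blocks, in order, as the blob
    }
}
```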

39 Pages Similar to block blobs, but optimized for random read/write operations; they provide the ability to write to a range of bytes in a blob. Call Put Blob to set the max size, then call Put Page. All pages must align to 512-byte page boundaries. Writes to page blobs happen in place and are immediately committed to the blob. The maximum size for a page blob is 1 TB.

40 BLOB Leases Creates a 1 minute exclusive write lock on a BLOB. Operations: Acquire, Renew, Release, Break. Must have the lease id to perform operations. Can check LeaseStatus property. Currently can only be done through REST.

41 Windows Azure Drive Provides a durable NTFS volume for Windows Azure applications to use. Use existing NTFS APIs to access a durable drive; durability and survival of data on application failover; enables migrating existing NTFS applications to the cloud. A Windows Azure Drive is a Page Blob; for example, mount a Page Blob as X:\. All writes to the drive are made durable to the Page Blob, so the drive is durable through standard Page Blob replication, and it persists as a Page Blob even when not mounted.

42 Windows Azure Drive API Create Drive - Creates a Page Blob formatted as a single partition NTFS volume VHD. Initialize Cache – Allows an application to specify the location and size of the local data cache for all Windows Azure Drives mounted for that VM instance. Mount Drive – Takes a formatted Page Blob and mounts it to a drive letter for the Windows Azure application to start using. Get Mounted Drives – Returns the list of mounted drives. It consists of a list of the drive letter and Page Blob URLs for each mounted drive. Unmount Drive – Unmounts the drive and frees up the drive letter. Snapshot Drive – Allows the client application to create a backup of the drive (Page Blob). Copy Drive – Provides the ability to copy a drive or snapshot to another drive (Page Blob) name to be used as a read/writable drive.

43 BLOB Guidance Manage connection strings/keys in cscfg. Do not share keys; wrap access with a service. Have a strategy for accounts and containers. You can assign a custom domain to your storage account. There is no method to detect container existence: call FetchAttributes() and catch the error if it doesn't exist, as sketched below.
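A sketch of that existence check (exception type and status-code property per the classic StorageClient library):

```csharp
// Sketch: detect container existence by calling FetchAttributes()
// and catching the not-found error.
using System.Net;
using Microsoft.WindowsAzure.StorageClient;

static class ContainerProbe
{
    public static bool Exists(CloudBlobContainer container)
    {
        try
        {
            container.FetchAttributes();   // HEAD-style request for metadata
            return true;
        }
        catch (StorageClientException e)
        {
            if (e.StatusCode == HttpStatusCode.NotFound)
                return false;              // container does not exist
            throw;                         // some other storage error
        }
    }
}
```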

44 Table Structure Account: MovieData. Table name: Movies (entities: Star Wars, Star Trek, Fan Boys). Table name: Customers (entities: Brian H. Prince, Jason Argonaut, Bill Gates). Hierarchy: Account → Table → Entity. Tables store entities; entity schema can vary within the same table.

45 Windows Azure Tables Provides structured storage. Massively scalable tables: billions of entities (rows) and TBs of data; can use thousands of servers as traffic grows. Highly available and durable: data is replicated several times. Familiar and easy-to-use API: WCF Data Services and OData; .NET classes and LINQ; REST with any platform or language.

46 Is not relational Cannot: create foreign key relationships between tables; perform server-side joins between tables; create custom indexes on the tables; run server-side aggregates, e.g. no Count(). All entities must have the following properties: Timestamp, PartitionKey, RowKey.
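A sketch of an entity carrying the three required properties, using the WCF Data Services-based classes from the classic SDK; the Movie type, its ReleaseDate property, and the "Movies" table name are illustrative.

```csharp
// Sketch: a table entity plus an insert and a point query via the
// classic StorageClient data services classes.
using System;
using System.Linq;
using Microsoft.WindowsAzure;
using Microsoft.WindowsAzure.StorageClient;

public class Movie : TableServiceEntity  // supplies PartitionKey, RowKey, Timestamp
{
    public Movie() { }
    public Movie(string category, string title)
    {
        PartitionKey = category;   // e.g. "Action"
        RowKey = title;            // e.g. "Fast & Furious"
    }
    public int ReleaseDate { get; set; }
}

class TableBasics
{
    static void Main()
    {
        var account = CloudStorageAccount.Parse("UseDevelopmentStorage=true");
        var tables = account.CreateCloudTableClient();
        tables.CreateTableIfNotExist("Movies");

        var ctx = tables.GetDataServiceContext();
        ctx.AddObject("Movies",
            new Movie("Action", "Fast & Furious") { ReleaseDate = 2009 });
        ctx.SaveChanges();

        // Point query: PartitionKey + RowKey is the most efficient lookup.
        var movie = ctx.CreateQuery<Movie>("Movies")
            .Where(m => m.PartitionKey == "Action" && m.RowKey == "Fast & Furious")
            .FirstOrDefault();
        Console.WriteLine(movie != null ? movie.ReleaseDate.ToString() : "not found");
    }
}
```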

47 Windows Azure Queues Queues are performance-efficient, highly available, and provide reliable message delivery. Simple, asynchronous work dispatch. Programming semantics ensure that a message is processed at least once. Access is provided via REST.

48 Storage Partitioning Understanding partitioning is key to understanding performance. Partitioning differs for each data type (blobs, entities, queues), but every data object has a partition key; the partition key is the unit of scale and controls entity locality. A partition is served by a single server, and the system load balances partitions based on traffic pattern to meet your traffic needs. Load balancing can take a few minutes to kick in, and it can take a couple of seconds for a partition to become available on a different server. Use exponential backoff on "Server Busy": it may mean the limits of a single partition have been reached.

49 Partition Keys In Each Abstraction Entities: TableName + PartitionKey; entities with the same PartitionKey value are served from the same partition. Blobs: container name + blob name; every blob and its snapshots are in a single partition. Messages: queue name; all messages for a single queue belong to the same partition.

50 Replication Guarantee All Azure Storage data exists in three replicas; replicas are created as needed. A write operation is not complete until it has been written to all three replicas. Reads are load balanced only to replicas that are in sync.

51 Scalability Targets Storage account: capacity up to 100 TB; transactions up to a few thousand requests per second; bandwidth up to a few hundred megabytes per second. Single queue/table partition: up to 500 transactions per second. Single blob partition: throughput up to 60 MB/s. To go above these numbers, partition between multiple storage accounts and partitions. When a limit is hit, the app will see '503 Server Busy'; applications should implement exponential backoff.

52 Partitions and Partition Ranges The Movies table, sorted by (PartitionKey, RowKey), is split into partition ranges that can be served by different servers:

Partition range 1:
PartitionKey (Category) | RowKey (Title)           | Timestamp | ReleaseDate
Action                  | Fast & Furious           | …         | 2009
Action                  | The Bourne Ultimatum     | …         | 2007
…                       | …                        | …         | …
Animation               | Open Season 2            | …         | 2009
Animation               | The Ant Bully            | …         | 2006

Partition range 2:
PartitionKey (Category) | RowKey (Title)           | Timestamp | ReleaseDate
Comedy                  | Office Space             | …         | 1999
…                       | …                        | …         | …
SciFi                   | X-Men Origins: Wolverine | …         | 2009
…                       | …                        | …         | …
War                     | Defiance                 | …         | 2008

53 Key Selection: Things to Consider Scalability: distribute load as much as possible; hot partitions can be load balanced; PartitionKey is critical for scalability. Query efficiency and speed: avoid frequent large scans; parallelize queries; point queries are most efficient. Entity group transactions: transaction semantics across a single partition, with fewer round trips.

54 Expect Continuation Tokens – Seriously! A query response can stop short and return a continuation token when it reaches the maximum of 1000 rows in a response, when it hits the end of a partition range boundary, or when it reaches the maximum of 5 seconds to execute the query.
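With the classic SDK, converting a query to a CloudTableQuery makes the client follow continuation tokens for you; a hedged sketch (the Movie entity and table name are illustrative, as above):

```csharp
// Sketch: a range query that transparently follows continuation tokens
// by converting to a CloudTableQuery via AsTableServiceQuery().
using System;
using System.Linq;
using Microsoft.WindowsAzure;
using Microsoft.WindowsAzure.StorageClient;

public class Movie : TableServiceEntity
{
    public int ReleaseDate { get; set; }
}

class RangeQuery
{
    static void Main()
    {
        var account = CloudStorageAccount.Parse("UseDevelopmentStorage=true");
        var ctx = account.CreateCloudTableClient().GetDataServiceContext();

        var query = (from m in ctx.CreateQuery<Movie>("Movies")
                     where m.PartitionKey == "Action"
                     select m)
                    .AsTableServiceQuery();  // handles continuation tokens

        // Enumeration issues as many requests as needed; each response is
        // capped by the 1000-row / partition-boundary / 5-second limits.
        foreach (var movie in query)
            Console.WriteLine(movie.RowKey);
    }
}
```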

55 Tables Recap Select a PartitionKey and RowKey that help scale: avoid "append only" patterns; distribute load by using a hash etc. as a prefix. Tables are efficient for frequently used queries, support batch transactions, and distribute load. Always handle continuation tokens: expect them for any range query. "OR" predicates are not optimized: execute the queries that form the "OR" predicates as separate queries. On "server busy", implement a back-off strategy for retries: the system load balances partitions to meet traffic needs, and the load on a single partition may have exceeded the limits. WCF Data Services: use a new context for each logical operation; AddObject/AttachTo can throw an exception if the entity is already being tracked; a point query throws an exception if the resource does not exist (use IgnoreResourceNotFoundException).

56 Queues Their Unique Role in Building Reliable, Scalable Applications You want roles that work closely together but are not bound together: tight coupling leads to brittleness, and decoupling aids scaling and performance. A queue can hold an unlimited number of messages; messages must be serializable as XML and are limited to 8 KB in size. Commonly used with the work-ticket pattern. Why not simply use a table?

57 Queue Terminology

58 Message Lifecycle [Diagram: a web role calls PutMessage to enqueue (Msg 1 … Msg 4); a worker role calls GetMessage with a visibility timeout, processes the message, then calls RemoveMessage. The slide shows the raw REST exchange: a POST to enqueue the message body, a GET whose response carries the message ID, insertion and expiration times, pop receipt, and time-next-visible, and a DELETE that presents the pop receipt.]

59 Truncated Exponential Back Off Polling Consider a back-off polling approach: each empty poll increases the interval by 2x, up to some maximum; a successful poll resets the interval back to 1.
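A minimal sketch of that loop; the 1-second floor and 60-second ceiling are illustrative choices.

```csharp
// Sketch: truncated exponential back-off polling of a queue.
using System;
using System.Threading;
using Microsoft.WindowsAzure.StorageClient;

static class BackoffPoller
{
    public static void PollForever(CloudQueue queue, Action<CloudQueueMessage> handle)
    {
        TimeSpan min = TimeSpan.FromSeconds(1), max = TimeSpan.FromSeconds(60);
        TimeSpan interval = min;
        while (true)
        {
            var msg = queue.GetMessage();
            if (msg != null)
            {
                handle(msg);
                queue.DeleteMessage(msg);
                interval = min;                     // success: reset to the floor
            }
            else
            {
                Thread.Sleep(interval);             // empty poll: wait, then
                interval = TimeSpan.FromTicks(      // double, truncated at max
                    Math.Min(interval.Ticks * 2, max.Ticks));
            }
        }
    }
}
```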

60 Removing Poison Messages [Producers P1, P2 enqueue; consumers C1, C2 dequeue.] 1. C1: GetMessage(Q, 30 s) → msg 1. 2. C2: GetMessage(Q, 30 s) → msg 2.

61 Removing Poison Messages (continued) 1. C1: GetMessage(Q, 30 s) → msg 1. 2. C2: GetMessage(Q, 30 s) → msg 2. 3. C2 consumed msg 2. 4. DeleteMessage(Q, msg 2). 5. C1 crashed. 6. msg 1 becomes visible again 30 s after dequeue. 7. C2: GetMessage(Q, 30 s) → msg 1.

62 Removing Poison Messages (continued) 1. C1: Dequeue(Q, 30 sec) → msg 1. 2. C2: Dequeue(Q, 30 sec) → msg 2. 3. C2 consumed msg 2. 4. Delete(Q, msg 2). 5. C1 crashed. 6. msg 1 visible 30 s after dequeue. 7. C2: Dequeue(Q, 30 sec) → msg 1. 8. C2 crashed. 9. msg 1 visible 30 s after dequeue. 10. C1 restarted. 11. C1: Dequeue(Q, 30 sec) → msg 1. 12. DequeueCount exceeds the threshold. 13. Delete(Q, msg 1).

63 Queues Recap Make message processing idempotent: then there is no need to deal with failures. Do not rely on order: invisible messages result in out-of-order delivery. Use DequeueCount to remove poison messages: enforce a threshold on a message's dequeue count, as sketched below. Messages > 8 KB: use a blob to store the message data, with a reference in the message; batch messages where possible, and garbage collect orphaned blobs. Use the message count to scale: dynamically increase or reduce workers.
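A sketch of the poison-message guard; the threshold of 3 and the separate poison queue are illustrative choices, since the recap only says to enforce a threshold.

```csharp
// Sketch: use DequeueCount to sideline messages that keep failing.
using Microsoft.WindowsAzure.StorageClient;

static class PoisonGuard
{
    const int MaxDequeues = 3;   // illustrative threshold

    public static void ProcessOne(CloudQueue queue, CloudQueue poisonQueue)
    {
        var msg = queue.GetMessage();
        if (msg == null) return;

        if (msg.DequeueCount > MaxDequeues)
        {
            // The message has failed repeatedly: park it for diagnosis
            // instead of retrying forever.
            poisonQueue.AddMessage(new CloudQueueMessage(msg.AsString));
            queue.DeleteMessage(msg);
            return;
        }
        // ... process msg; keep the work idempotent ...
        queue.DeleteMessage(msg);
    }
}
```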

64 Windows Azure Storage Takeaways


66 Picking the Right VM Size Having the correct VM size can make a big difference in costs. Fundamental choice: fewer, larger VMs vs. many smaller instances. If you scale better than linearly across cores, larger VMs could save you money, but it is pretty rare to see linear scaling across 8 cores. More instances may provide better uptime and reliability (more failures are needed to take your service down). The only real right answer: experiment with multiple sizes and instance counts to measure and find what is ideal for you.

67 Using Your VM to the Maximum Remember: 1 role instance == 1 VM running Windows, and 1 role instance != one specific task for your code. You're paying for the entire VM, so why not use it? A common mistake is splitting code into multiple roles, each not fully using its CPU. Balance using up CPU vs. having free capacity in times of need. There are multiple ways to use your CPU to the fullest.

68 Exploiting Concurrency Spin up additional processes, each with a specific task or as a unit of concurrency; this may not be ideal if the number of active processes exceeds the number of cores. Use multithreading aggressively. In networking code, correct usage of NT IO Completion Ports will let the kernel schedule the precise number of threads. In .NET 4, use the Task Parallel Library for both data parallelism and task parallelism, as in the sketch below.
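A small .NET 4 Task Parallel Library sketch showing both flavors (the work items are placeholders):

```csharp
// Sketch: data parallelism and task parallelism with the .NET 4 TPL.
using System;
using System.Threading.Tasks;

class TplDemo
{
    static void Main()
    {
        // Data parallelism: the same operation over every element,
        // spread across the available cores.
        int[] workItems = { 1, 2, 3, 4, 5, 6, 7, 8 };
        Parallel.ForEach(workItems, item =>
            Console.WriteLine("processed {0}", item));

        // Task parallelism: independent operations running concurrently.
        var t1 = Task.Factory.StartNew(() => Console.WriteLine("resize images"));
        var t2 = Task.Factory.StartNew(() => Console.WriteLine("update index"));
        Task.WaitAll(t1, t2);
    }
}
```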

69 Finding Good Code Neighbors Typically code falls into one or more of these categories: memory-intensive, CPU-intensive, network IO-intensive, storage IO-intensive. Find code that is intensive with different resources to live together. Example: distributed network caches are typically network- and memory-intensive; they may be a good neighbor for storage IO-intensive code.

70 Scaling Appropriately Monitor your application and make sure you're scaled appropriately (not over-scaled). Spinning VMs up and down automatically is good at large scale, but remember that VMs take a few minutes to come up and cost ~$3 a day (give or take) to keep running, and being too aggressive in spinning down VMs can result in a poor user experience. It is a trade-off between the risk of failure or poor user experience from not having excess capacity, and the cost of idling VMs: performance vs. cost.

71 Storage Costs Understand your application's storage profile and how storage billing works. Make service choices based on your app profile: e.g., SQL Azure has a flat fee while Windows Azure Tables charges per transaction, so the service choice can make a big cost difference. Caching and compression help a lot with storage costs.

72 Saving Bandwidth Costs Bandwidth costs are a huge part of any popular web app's billing profile. Sending fewer things over the wire often means getting fewer things from storage, so saving bandwidth costs often leads to savings in other places; sending fewer things also means your VM has time to do other tasks. All of these tips have the side benefit of improving your web app's performance and user experience.

73 Compressing Content 1. Gzip all output content: all modern browsers can decompress on the fly, and compared to Compress, Gzip has much better compression and freedom from patented algorithms. 2. Trade off compute costs for storage size. 3. Minimize image sizes: use Portable Network Graphics (PNGs); crush your PNGs; strip needless metadata; make all PNGs palette PNGs. Also minify JavaScript and CSS.
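One common way to gzip ASP.NET output when the client advertises support; a sketch, typically wired up in Global.asax, and production code should also vary caches by Accept-Encoding.

```csharp
// Sketch: gzip ASP.NET responses when the browser advertises gzip support.
using System.IO.Compression;
using System.Web;

public static class GzipFilter
{
    // Call from Application_BeginRequest in Global.asax.
    public static void Apply(HttpContext context)
    {
        string accept = context.Request.Headers["Accept-Encoding"] ?? "";
        if (accept.Contains("gzip"))
        {
            // Chain a compressing stream onto the response output.
            context.Response.Filter = new GZipStream(
                context.Response.Filter, CompressionMode.Compress);
            context.Response.AppendHeader("Content-Encoding", "gzip");
        }
    }
}
```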

74 Best Practices Summary Doing 'less' is the key to saving costs. Measure everything. Know your application profile in and out.

75 Cloud Computing for eScience Applications

76 NCBI BLAST BLAST (Basic Local Alignment Search Tool) is the most important software in bioinformatics: it identifies similarity between bio-sequences. It is computationally intensive, with a large number of pairwise alignment operations; a BLAST run can take 700-1000 CPU hours. Sequence databases are growing exponentially: GenBank doubled in size in about 15 months.

77 Opportunities for Cloud Computing It is easy to parallelize BLAST. Segment the input: segment processing (querying) is pleasingly parallel. Segment the database (e.g., mpiBLAST): needs special result-reduction processing. Large data volumes: a normal BLAST database can be as large as 10 GB, so with 100 nodes the aggregate transfer from storage can reach 1 TB; and the output of BLAST is usually much larger than the input.

78 AzureBLAST Parallel BLAST engine on Azure. Query-segmentation data-parallel pattern: split the input sequences, query partitions in parallel, merge results together when done. Follows the general suggested application model (Web Role + Queue + Worker Roles), with special considerations including batch job management and task parallelism on an elastic cloud. Wei Lu, Jared Jackson, and Roger Barga, "AzureBlast: A Case Study of Developing Science Applications on the Cloud," in Proceedings of the 1st Workshop on Scientific Cloud Computing (Science Cloud 2010), ACM, 21 June 2010.

79 AzureBLAST Task-Flow A simple split/join pattern: a splitting task fans out BLAST tasks, and a merging task joins the results. Leverage the multiple cores of an instance via the "-a" argument of NCBI-BLAST: 1, 2, 4, 8 for small, medium, large, and extra-large instance sizes. Task granularity: a large partition causes load imbalance; a small partition causes unnecessary overheads (NCBI-BLAST startup overhead, data transfer overhead). Best practice: use test runs to profile, and set the partition size to mitigate the overhead. The visibilityTimeout value for each BLAST task is essentially an estimate of the task run time: too small causes repeated computation; too large causes an unnecessarily long wait in case of an instance failure.

80 Micro-Benchmarks Inform Design Task size vs. performance: 100 sequences per partition is the best choice, benefiting from the warm-cache effect. Instance size vs. performance: super-linear speedup with larger worker instances, primarily due to memory capacity. Task size/instance size vs. cost: the extra-large instance generated the best and most economical throughput, fully utilizing the resources.

81 AzureBLAST [Task-flow diagram: splitting task → BLAST tasks in parallel → merging task]

82 AzureBLAST Job Portal An ASP.NET program hosted by a web role instance: submit jobs; track job status and logs; authentication/authorization based on Live ID. An accepted job is stored in the job registry table, for fault tolerance and to avoid in-memory state. [Components: Web Portal, Web Service, Job registration, Job Scheduler, Scaling Engine, Job Registry.]

83 Demonstration

84 R. palustris as a platform for H2 production Eric Shadt (SAGE); Sam Phattarasukol (Harwood Lab, UW). Blasted ~5,000 proteins (700K sequences): against all NCBI non-redundant proteins, completed in 30 min; against ~5,000 proteins from another strain, completed in less than 30 sec. AzureBLAST significantly saved computing time.

85 All-Against-All Experiment Discovering Homologs Discover the interrelationships of known protein sequences via an "all against all" query: the database is also the input query. The protein database is large (4.2 GB), with 9,865,668 sequences to be queried in total: theoretically, 100 billion sequence comparisons! Performance estimation, based on a sampling run on one extra-large Azure instance: it would require 3,216,731 minutes (6.1 years) on one desktop. Experiments at this scale are usually infeasible for most scientists.

86 Our Approach Allocated a total of ~4000 instances: 475 extra-large VMs (8 cores per VM) across four datacenters: US (2), Western and North Europe. 8 deployments of AzureBLAST, each with its own co-located storage service. Divided the 10 million sequences into multiple segments, each submitted to one deployment as one job for execution; each segment consists of smaller partitions. When load imbalance occurred, the load was redistributed manually.

87 End Result The total size of the output result is ~230 GB; the number of total hits is 1,764,579,487. Started on March 25th; the last task completed on April 8th (10 days of compute). But based on our estimates, the real working instance time should be 6-8 days. Look into the log data to analyze what took place.

88 Understanding Azure by analyzing logs A normal log record is a matched pair: "Executing the task …" followed by "Execution of task … is done, it took N mins". Otherwise, something is wrong (e.g., the task failed to complete):

3/31/2010 6:14   RD00155D3611B0   Executing the task …
3/31/2010 6:25   RD00155D3611B0   Execution of task is done, it took 10.9 mins
3/31/2010 6:25   RD00155D3611B0   Executing the task …
3/31/2010 6:44   RD00155D3611B0   Execution of task is done, it took 19.3 mins
3/31/2010 6:44   RD00155D3611B0   Executing the task …
3/31/2010 7:02   RD00155D3611B0   Execution of task is done, it took … mins
3/31/2010 8:22   RD00155D3611B0   Executing the task …
3/31/2010 9:50   RD00155D3611B0   Executing the task …
3/31/2010 …:12   RD00155D3611B0   Execution of task is done, it took 82 mins

Note the task started at 8:22: it has no completion record before the next "Executing" entry.

89 Surviving System Upgrades North Europe data center, 34,256 tasks processed in total. All 62 compute nodes lost tasks and then came back in groups of ~6 nodes, each group gone for ~30 mins: this is an update domain at work.

90 Surviving Storage Failures West Europe datacenter; 30,976 tasks were completed before the job was killed. 35 nodes experienced blob-write failures at the same time. A reasonable guess: the fault domain is working.

91 MODISAzure: Computing Evapotranspiration (ET) in the Cloud "You never miss the water till the well has run dry" (Irish proverb)

92 Computing Evapotranspiration (ET) Evapotranspiration (ET) is the release of water to the atmosphere by evaporation from open water bodies and transpiration, or evaporation through plant membranes, by plants. Penman-Monteith (1964), with:
ET = water volume evapotranspired (m^3 s^-1 m^-2)
Δ = rate of change of saturation specific humidity with air temperature (Pa K^-1)
λ_v = latent heat of vaporization (J/g)
R_n = net radiation (W m^-2)
c_p = specific heat capacity of air (J kg^-1 K^-1)
ρ_a = dry air density (kg m^-3)
δq = vapor pressure deficit (Pa)
g_a = conductivity of air, inverse of r_a (m s^-1)
g_s = conductivity of plant stoma, inverse of r_s (m s^-1)
γ = psychrometric constant (γ ≈ 66 Pa K^-1)
Estimating resistance/conductivity across a catchment can be tricky. Lots of inputs: big data reduction. Some of the inputs are not so simple.
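The equation itself did not survive the transcript; the standard Penman-Monteith form consistent with the variable definitions above is (my reconstruction, worth checking against the original slide):

```latex
ET \;=\; \frac{\Delta R_n + \rho_a\, c_p\, \delta q\, g_a}
              {\bigl(\Delta + \gamma\,(1 + g_a/g_s)\bigr)\, \lambda_v}
```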

93 ET Synthesizes Imagery, Sensors, Models and Field Data NASA MODIS imagery source archives: 5 TB (600K files). FLUXNET curated sensor dataset: 30 GB (960 files). FLUXNET curated field dataset: 2 KB (1 file). NCEP/NCAR: ~100 MB (4K files). Vegetative clumping: ~5 MB (1 file). Climate classification: ~1 MB (1 file). 20 US years = 1 global year.

94 MODISAzure: Four Stage Image Processing Pipeline Data collection (map) stage: downloads requested input tiles from NASA ftp sites; includes geospatial lookup for non-sinusoidal tiles that will contribute to a reprojected sinusoidal tile. Reprojection (map) stage: converts source tile(s) to intermediate-result sinusoidal tiles using simple nearest-neighbor or spline algorithms. Derivation reduction stage: the first stage visible to the scientist; computes ET in our initial use. Analysis reduction stage: an optional second stage visible to the scientist; enables production of science analysis artifacts such as maps, tables, and virtual sensors. [Pipeline diagram: scientists submit requests through the AzureMODIS Service Web Role Portal; a request queue feeds the download queue (source imagery download sites), the reprojection queue, and the reduction #1 and reduction #2 queues; scientific results are downloaded at the end.]

95 MODISAzure: Architectural Big Picture (1/2) The ModisAzure Service is the Web Role front door: it receives all user requests and queues each request to the appropriate Download, Reprojection, or Reduction job queue. The Service Monitor is a dedicated Worker Role: it parses all job requests into tasks (recoverable units of work) and persists the execution status of all jobs and tasks in Tables. [Diagram: Request → MODISAzure Service (Web Role) → Job Queue → Service Monitor (Worker Role), which parses and persists JobStatus and TaskStatus, then dispatches to the Task Queue.]

96 MODISAzure: Architectural Big Picture (2/2) All work is actually done by the GenericWorker (Worker Role): it dequeues tasks created by the Service Monitor, retries failed tasks 3 times, maintains all task status, and reads and writes data storage. [Diagram: Service Monitor (Worker Role) parses and persists TaskStatus → Task Queue → GenericWorker (Worker Role) → Data Storage.]

97 Example Pipeline Stage: Reprojection Service Each entity in the job table specifies a single reprojection job request; each entity in the task table specifies a single reprojection task (i.e., a single tile). Query the SwathGranuleMeta table to get geo-metadata (e.g., boundaries) for each swath tile; query the ScanTimeList table to get the list of satellite scan times that cover a target tile. [Diagram: Reprojection Request → Job Queue → Service Monitor (Worker Role), which persists ReprojectionJobStatus and parses and persists ReprojectionTaskStatus → Task Queue → GenericWorker (Worker Role), which reads swath source data storage and writes reprojection data storage.]

98 Costs for 1 US Year ET Computation Computational costs are driven by data scale and the need to run reductions multiple times; storage costs are driven by data scale and the 6-month project duration. Both are small with respect to the people costs, even at graduate-student rates!

Data collection stage: … GB, 60K files, 10 MB/sec, 11 hours, <10 workers; $50 upload, $450 storage
Reprojection stage: 400 GB, 45K files, 3500 hours, … workers; $420 CPU, $60 download
Derivation reduction stage: 5-7 GB, 5.5K files, 1800 hours, … workers; $216 CPU, $1 download, $6 storage
Analysis reduction stage: <10 GB, ~1K files, 1800 hours, … workers; $216 CPU, $2 download, $9 storage

Total: $1420

99 Observations and Experience Clouds are the largest-scale computer centers ever constructed, and they have the potential to be important to both large- and small-scale science problems. Equally important, they can increase participation in research, providing needed resources to users and communities without ready access. Clouds are suitable for "loosely coupled" data-parallel applications and can support many interesting "programming patterns", but tightly coupled, low-latency applications do not perform optimally on clouds today. They provide valuable fault tolerance and scalability abstractions, and act as an amplifier for familiar client tools and on-premise compute. Cloud services to support research provide considerable leverage for both individual researchers and entire communities of researchers.

100 Resources: Cloud Research Community Site Getting-started steps for developers; available research services; use cases on Azure for research; event announcements; detailed tutorials; technical papers. Email us with questions.

101 Resources: AzureScope Simple benchmarks illustrating basic performance for compute and storage services; benchmarks for reference algorithms; best-practice tips; code samples. Email us with questions.


103 Demonstration

104 Azure in Action, Manning Press Programming Windows Azure, O'Reilly Press Bing: Channel 9 Windows Azure Bing: Windows Azure Platform Training Kit – November 2010 Update

