Presentation is loading. Please wait.

Presentation is loading. Please wait.

This work is licensed under a Creative Commons Attribution 3.0 United States License.Creative Commons Attribution 3.0 United States License Programming.

Similar presentations


Presentation on theme: "This work is licensed under a Creative Commons Attribution 3.0 United States License.Creative Commons Attribution 3.0 United States License Programming."— Presentation transcript:

1 This work is licensed under a Creative Commons Attribution 3.0 United States License.Creative Commons Attribution 3.0 United States License Programming the Cloud Cloud Services for Science Tony Hey Corporate Vice President Microsoft External Research

2 This work is licensed under a Creative Commons Attribution 3.0 United States License.Creative Commons Attribution 3.0 United States License Commander of the British Empire

3 This work is licensed under a Creative Commons Attribution 3.0 United States License.Creative Commons Attribution 3.0 United States License Advanced Research Tools and Services Community and Geographic Outreach

4 This work is licensed under a Creative Commons Attribution 3.0 United States License.Creative Commons Attribution 3.0 United States License 1.Thousand years ago – Experimental Science –Description of natural phenomena 2.Last few hundred years – Theoretical Science –Newton’s Laws, Maxwell’s Equations… 3.Last few decades – Computational Science –Simulation of complex phenomena 4.Today – Data-Intensive Science –Scientists overwhelmed with data sets from many different sources Data captured by instruments Data generated by simulations Data generated by sensor networks  eScience is the set of tools and technologies to support data federation and collaboration For analysis and data mining For data visualization and exploration For scholarly communication and dissemination (With thanks to Jim Gray) With thanks to Jim Gray Astronomy has been one of the first disciplines to embrace data-intensive science with the Virtual Observatory (VO), enabling highly efficient access to data and analysis tools at a centralized site. The image shows the Pleiades star cluster form the Digitized Sky Survey combined with an image of the moon, synthesized within the WorldWide Telescope service. Science must move from data to information to knowledge

5 This work is licensed under a Creative Commons Attribution 3.0 United States License.Creative Commons Attribution 3.0 United States License Our goal is to accelerate research by collaborating with academic communities to use computer science research technologies We also aim to help scientists spend less time on IT issues and more time on discovery by creating open tools and services based on Microsoft platforms and productivity software

6 This work is licensed under a Creative Commons Attribution 3.0 United States License.Creative Commons Attribution 3.0 United States License A Definition: – Cloud Computing means using a remote data center to manage scalable, reliable, on-demand access to applications – Providing Applications and Infrastructure over the Internet – Scalable means: Possibly millions of simultaneous users of the app. Exploiting thousand-fold parallelism in the app. – Reliable means on-demand; 5 “nines” available right now – Applications span the continuum from client to the cloud Three New Aspects to Cloud Computing: – Illusion of infinite computing resources available on demand – Elimination of an upfront commitment by cloud users – Ability to pay for use of computing resources on a short- term basis as needed

7 This work is licensed under a Creative Commons Attribution 3.0 United States License.Creative Commons Attribution 3.0 United States License Range in size from “edge” facilities to mega scale. Unprecedented economies of scale Approximate costs for a small size center (1K servers) and a larger, 50K server center. Each data center is 11.5 times the size of a football field TechnologyCost in small- sized Data Center Cost in Large Data Center Ratio Network$95 per Mbps/ month $13 per Mbps/ month 7.1 Storage$2.20 per GB/ month $0.40 per GB/ month 5.7 Administration~140 servers/ Administrator >1000 Servers/ Administrator 7.1 Data courtesy of James Hamilton

8 This work is licensed under a Creative Commons Attribution 3.0 United States License.Creative Commons Attribution 3.0 United States License Conquering complexity – Building racks of servers & complex cooling systems all separately is not efficient. – Package and deploy into bigger units – 3 Sockets: Power, Cooling, Bandwidth

9 This work is licensed under a Creative Commons Attribution 3.0 United States License.Creative Commons Attribution 3.0 United States License This work is licensed under a Creative Commons Attribution 3.0 United States License.Creative Commons Attribution 3.0 United States License

10 This work is licensed under a Creative Commons Attribution 3.0 United States License.Creative Commons Attribution 3.0 United States License A Supercomputer is designed to scale a single application for a single user. – Optimized for peak performance of hardware – Batch operation is not “on- demand” – Reliability is secondary If MPI fails, application crashes Build check-pointing into application – Most Data Center applications run continuously (as services) Most Cloud Applications are immediate, scalable and persistent The Cloud is also a platform for massive data analysis – Not a replacement for leading edge supercomputers The Programming model must support scalability in two dimensions – Thousands of simultaneous users of the same applications – Applications that require thousands of cores for each use * Dan Reed

11 This work is licensed under a Creative Commons Attribution 3.0 United States License.Creative Commons Attribution 3.0 United States License Infrastructure as a Service (IaaS) Provide a way to host virtual machines on demand Platform as a Service (PaaS) You write an Application to Cloud APIs and the platform manages and scales it for you. Software as a Service (SaaS) Delivery of software to the desktop from the Cloud Infrastructure as a Service Platform as a Service Software as a Service

12 This work is licensed under a Creative Commons Attribution 3.0 United States License.Creative Commons Attribution 3.0 United States License This work is licensed under a Creative Commons Attribution 3.0 United States License.Creative Commons Attribution 3.0 United States License Microsoft Azure.NET CLR/Windows Only Choice of Language Some Auto Failover/ Scale (but needs declarative application properties) Microsoft Azure.NET CLR/Windows Only Choice of Language Some Auto Failover/ Scale (but needs declarative application properties) Google App Engine Traditional Web Apps Auto Scaling and Provisioning Google App Engine Traditional Web Apps Auto Scaling and Provisioning Force.Com SalesForce Biz Apps Auto Scaling and Provisioning Force.Com SalesForce Biz Apps Auto Scaling and Provisioning Amazon AWS VMs Look Like Hardware No Limit on App Model User Must Implement Scalability and Failover Amazon AWS VMs Look Like Hardware No Limit on App Model User Must Implement Scalability and Failover Constraints in the App Model More Constrained Less Constrained Automated Management Services More AutomationLess Automation

13 This work is licensed under a Creative Commons Attribution 3.0 United States License.Creative Commons Attribution 3.0 United States License Azure Services (storage) Load Balancer Public Internet Worker Role(s) Front-end Web Role Front-end Web Role Switches Highly-available Fabric Controller Highly-available Fabric Controller In-band communication – software control Load-balancers Abstract Programming Model:

14 This work is licensed under a Creative Commons Attribution 3.0 United States License.Creative Commons Attribution 3.0 United States License This work is licensed under a Creative Commons Attribution 3.0 United States License.Creative Commons Attribution 3.0 United States License Roles are mostly stateless processes running on a core Web Roles provide web service access to the app by the users. Web roles generate tasks for worker roles Worker Roles do “heavy lifting” and manage data in tables/blobs Communication is through queues. The number of role instances should dynamically scale with load Scalability Queue length directly reflects how well backend processing is keeping up with overall workload Queues decouple different parts of the application, making it easier to scale parts of the application independently Flexible resource allocation, different priority queues and separation of backend servers to process different queues

15 This work is licensed under a Creative Commons Attribution 3.0 United States License.Creative Commons Attribution 3.0 United States License Consists of a (large) group of machines all managed by software called the fabric controller The fabric controller is replicated across a group of five to seven machines and owns all of the resources in the fabric Because it can communicate with a fabric agent on every computer, the controller is aware of every Windows Azure application running on the fabric

16 This work is licensed under a Creative Commons Attribution 3.0 United States License.Creative Commons Attribution 3.0 United States License The simplest way to store data in Windows Azure storage is to use blobs – A blob contains binary data – A storage account can have one or more containers, each of which holds one or more blobs Tables hold some number of entities. – An entity contains zero or more properties Queues provide scalability – Queues provide a buffer to absorb traffic bursts – Reduce the impact of individual component failures Blobs can be big—up to 50 gigabytes each They can also have associated metadata

17 This work is licensed under a Creative Commons Attribution 3.0 United States License.Creative Commons Attribution 3.0 United States License Investigating the use of commercial Cloud services for scientific research Example Applications: PhyloD, computationally-intensive science application that was not previously available as a service. Moving to the Cloud so researchers around the world can have access. MatLab, client application making use of Azure blob storage. Matlab could also be hosted in the Cloud, with the appropriate licensing. Excel, demonstrating seamless interaction between familiar client tools and the Cloud, where data is stored in Azure tables and Azure computations can be invoked for analysis. The data is from sensors distributed throughout one of our Data Centers.

18 This work is licensed under a Creative Commons Attribution 3.0 United States License.Creative Commons Attribution 3.0 United States License Statistical tool used to analyze DNA of HIV from large studies of infected patients PhyloD was developed by Microsoft Research and has had highly impact Small but important group of researchers – 100’s of HIV and HepC researchers actively use it – 1000’s of research communities rely on these results Typical job, 10 – 20 CPU hours with extreme jobs requiring 1K – 2K CPU hours – Very CPU efficient – Requires a large number of test runs for a given job (1 – 10M tests) – Highly compressed data per job ( ~100 KB per job) Highlights Azure’s potential for agile deployment of science-related services that scale Cover of PLoS Biology November 2008 Courtesy of Roger Barga

19 This work is licensed under a Creative Commons Attribution 3.0 United States License.Creative Commons Attribution 3.0 United States License This work is licensed under a Creative Commons Attribution 3.0 United States License.Creative Commons Attribution 3.0 United States License Web role copies input tree, predictor and target files to blob storage, enqueues INITIAL work item and updates tracking tables. JobCreated INITIALCreated Blob Storage Work Item Queue Tracking Tables (INITIAL)

20 This work is licensed under a Creative Commons Attribution 3.0 United States License.Creative Commons Attribution 3.0 United States License This work is licensed under a Creative Commons Attribution 3.0 United States License.Creative Commons Attribution 3.0 United States License Worker role copies the input files to its local storage, computes p-values for a subset of the allele-codon pairs, copies the partial results back to blob storage and updates the tracking tables.

21 This work is licensed under a Creative Commons Attribution 3.0 United States License.Creative Commons Attribution 3.0 United States License This work is licensed under a Creative Commons Attribution 3.0 United States License.Creative Commons Attribution 3.0 United States License Web role serves the final results from blob storage and status reports from tracking tables.

22 This work is licensed under a Creative Commons Attribution 3.0 United States License.Creative Commons Attribution 3.0 United States License This work is licensed under a Creative Commons Attribution 3.0 United States License.Creative Commons Attribution 3.0 United States License

23 This work is licensed under a Creative Commons Attribution 3.0 United States License.Creative Commons Attribution 3.0 United States License Led by Newcastle University (Prof. Watson) and supported by External Research. Goal: Investigate applicability of Clouds for scientific research Build a working prototype for a thin slice (use-cases in chemo-informatics) Utilize Microsoft technologies to build science-related services Investigating additional Scientific Cloud Services to raise abstraction level for applications

24 This work is licensed under a Creative Commons Attribution 3.0 United States License.Creative Commons Attribution 3.0 United States License 24 Ocean Science data on Azure SDS-relational Two terabytes of coastal and model data Computational finance data on SDS-relational BATS, daily tick data for stocks (10 years) XBRL call report for banks (10,000 banks) Currently working with IRIS to store select seismic data on Azure. IRIS consortium based in Seattle (NSF) collects and distributes global seismological data. Data sets requested by researchers worldwide Includes HD videos, seismograms, images, data from major seismic events. High research value worldwide, frequently used in academia and lectures Consume in any language, any tool, any platform in S+S scenarios

25 This work is licensed under a Creative Commons Attribution 3.0 United States License.Creative Commons Attribution 3.0 United States License A knowledge ecosystem: – A richer authoring experience – An ecosystem of services – Semantic storage – Open, Collaborative, Interoperable, and Automatic Data/information is inter- connected through machine- interpretable information (e.g. paper X is about star Y) Social networks are a special case of ‘data meshes’ Attribution: Chris BizerChris Bizer

26 This work is licensed under a Creative Commons Attribution 3.0 United States License.Creative Commons Attribution 3.0 United States License scholarly communications domain-specific services instant messaging identity document store blogs & social networking mail notification search books citations search books citations visualization and analysis services storage/data services compute services virtualization compute services virtualization Project management Reference management knowledge management knowledge discovery Vision of Future Research Environment with both Software + Services

27 This work is licensed under a Creative Commons Attribution 3.0 United States License.Creative Commons Attribution 3.0 United States License Dryad; DraydLINQ Computational Biology Toolkit Enables and accelerates fundamental advances in biology F# Collaboration with the academic and research community on F#’s typed functional and object-oriented programming on the.NET platform Plug-ins for Office Ontology Add-in for Word Article Authoring Add-in for Word Chem4Word – Chemistry Drawing in Word Microsoft Electronic Journals Service Open XML Document Viewer Software Engineering Tools Spec#: Program verifier for C# extended with design by contract VCC: Program verifier for Concurrent C PEX: automatic unit testing tool for.NET CHESS: Unit testing tools for concurrent Win32 executable and.NET Windows Azure: http://www.windowsazure.comhttp://www.windowsazure.com

28 This work is licensed under a Creative Commons Attribution 3.0 United States License.Creative Commons Attribution 3.0 United States License This work is licensed under a Creative Commons Attribution 3.0 United States License.Creative Commons Attribution 3.0 United States License


Download ppt "This work is licensed under a Creative Commons Attribution 3.0 United States License.Creative Commons Attribution 3.0 United States License Programming."

Similar presentations


Ads by Google