Data Laden Clouds: Trends and Insights
Roger Barga, PhD, Architect, eXtreme Computing Group, MSR


2 Presentation Outline
Technology trends: riding the exponentials; convergent invisibility; NUIs and computing on behalf; client+cloud experiences
Opportunity for data and analytics
Cloud infrastructure challenges: packaging, hardware, software, security
Thoughts on the future

3 It's Easy To Forget That Not Very Long Ago …
There were few or no experiences with web sites, …, spam, phishing, computer viruses, e-commerce, digital photography, or telephony
Cell phones were rare and expensive
A portable cassette player was still cool
HiFi was more common than WiFi
A friend was someone you actually knew
The future depends on vision and context …

4 Computing Eras: Paucity To Plethora
Mainframe Era; Pre-PC Era (1980); PC Era (1995); Internet Era (2000); Consumer Era (Today+)
21st century: implicit and natural computing
Increasingly natural interfaces
Embedded intelligence in everyday objects
Ubiquitous network access and cloud services

5 What Has Changed?
System-on-a-chip designs: powerful mobile devices
Graphics processing units: high-quality graphics
Explosive data growth: ubiquitous sensors and media
Inexpensive embedded computing: everyday smart objects, …
Wireless spectrum pressure: mobile device growth
New software models: social networks, clients+clouds, …

6 Orders of Magnitude Always Matter
Multidisciplinary challenges are the present and future … and the tools must empower, not frustrate
A computation task has four characteristic demands:
Networking: delivering questions and answers
Computation: transforming information to produce new information
Data access: access to the information needed by the computation
Data storage: long-term storage of information
The ratios among these and their costs are critical
New applications and systems will arise … if we create the right environment

7 Imagine a Future Where …
Your car drives and navigates for you … and also parks the car (already a feature on some cars)
Your sound system only plays music you love … because it knows about every song you've ever heard
Your phone only rings when you want to answer … because it knows your emotional state and social context
All your family memories are recorded automatically … via MEMS-based sensors and solid-state storage
Your body calls an ambulance when you're ill … via implanted, biologically powered diagnostic sensors
Your DNA sample and lifestyle determine personalized treatment … because genotype-phenotype models are specific
Your office adjusts its behavior to your needs … because it knows what you want to do

8 Successful Technologies Are Invisible


10 Enhanced GUI versus Speech, Handwriting, Single Touch


12 What Is An Application?
An FFT? No, it's an algorithm
A rendering pipeline? No, it's a software library
A feature recognition system? No, it's a building block
Our notion of application is increasingly complex: many integrated and interoperating components
Our tools must enable creativity accordingly, creating experiences
[Image: Microsoft Kinect]

13 Working at your command Working on your behalf

14 Create the Experience

15 The Future of Experiences

16 What's A Cloud?
Computer Rooms: Cloud COGS Matter

17 Microsoft's Data Center Evolution And Economics

18 Discovery and Innovation in 2020
In the last two decades, advances in computing technology, from processing speed to network capacity and the Internet, have revolutionized the way scientists work. From sequencing genomes to monitoring the Earth's climate, many recent scientific advances would not have been possible without a parallel increase in computing power. With revolutionary technologies such as the quantum computer edging towards reality, what will the relationship between computing and science bring us over the next 15 years?

19 Lack of Broad Access
[Chart: of roughly 70M scientists and engineers, about 55M (80%) have little to no access to high-performance, data-intensive capacity; the remaining ~20% comprises groups of ~14M and ~1M]

20 New Bytes of Information in 2010: … × 10^21 (Source: IDC, as reported in The Economist, Feb 25)

21 Sources: The Economist, Feb 10; IDC
By 2016, the new Large Synoptic Survey Telescope in Chile will acquire 140 terabytes in 5 days, more than Sloan acquired in 10 years
In 2000, the Sloan Digital Sky Survey collected more data in its 1st week than had been collected in the entire history of astronomy
The Large Hadron Collider at CERN generates 40 terabytes of data every second
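The rates quoted on this slide differ by several orders of magnitude. A quick back-of-the-envelope conversion, using only the slide's figures and assuming decimal units (1 TB = 10^12 bytes), makes the gap concrete:

```python
# Convert the slide's data-volume figures into sustained byte rates.
# Assumption: decimal units, 1 TB = 10**12 bytes.
TB = 10**12
SECONDS_PER_DAY = 86_400


def bytes_per_second(total_bytes: float, seconds: float) -> float:
    """Average sustained rate needed to produce total_bytes in the given time."""
    return total_bytes / seconds


# LSST (per the slide): 140 TB acquired in 5 days.
lsst_rate = bytes_per_second(140 * TB, 5 * SECONDS_PER_DAY)

# LHC (per the slide): 40 TB generated every second.
lhc_rate = bytes_per_second(40 * TB, 1)

if __name__ == "__main__":
    print(f"LSST: {lsst_rate / 10**6:,.0f} MB/s")
    print(f"LHC:  {lhc_rate / 10**12:,.0f} TB/s")
    print(f"LHC/LSST ratio: {lhc_rate / lsst_rate:,.0f}x")
```

On these figures LSST needs roughly 324 MB/s sustained, while the LHC's raw rate is about five orders of magnitude higher.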

22 Economics of Storage
Year  | Disk storage (per GB) | Web storage (per GB)
2000  | $44.56                | $1,250
Today | $0.07                 | $0.15
Source: Wired Magazine, April 2010; figures in USD
But remember … free storage is like free puppies

23 Social Implications of the Data Deluge
Hypothesis-driven: I have an idea, let me verify it.
Exploratory: what correlations can I glean?
Different tools and techniques
Rapid exploration of alternatives
Data volume and complexity are assets … and challenges
Simplicity really matters

24 Let Researchers Be Researchers…


26 Seamless Client Plus Cloud
Magnify client power via the cloud: data analysis and computation
Persist and share data in the cloud: multidisciplinary data fusion and scale
Leverage via client tools/metaphors: analysis acceleration (Excel, toolkits, codes); remote rendering and client visualization; data provenance, collaboration, …
[Diagram: client connected to cloud compute, blob storage, and table storage]

27 NCBI BLAST
BLAST (Basic Local Alignment Search Tool): one of the most important software tools in bioinformatics
Identifies similarity between biological sequences
Computationally intensive: a large number of pairwise alignment operations
A typical large BLAST run could take 700 to 1,000 CPU hours
For most biologists, there are two choices for running large jobs: build a local cluster, or submit jobs to NCBI or EBI (long job queue times)

28 R. palustris as a Platform for H2 Production
Identify key drivers for producing hydrogen, a promising alternative fuel: understand R. palustris well enough to be able to improve its H2 production
Characterize a population of strains and use integrative genomics approaches to dissect the molecular networks of H2 production
BLAST to query 16 strains to sort out genetic relationships; each strain has an estimated ~5,000 proteins
Jobs were kicked off NCBI clusters before completion
Against NCBI non-redundant proteins: ~30 min; against ~5,000 proteins from another strain: < 30 sec
Publishable result in one day for roughly $150
Eric Schadt, Pac Bio, and Sam Phattarasukol, Harwood Lab, UW

29 NCBI BLAST on Windows Azure
Parallel BLAST engine on Azure
Query-segmentation data-parallel pattern: split the input sequences, query the partitions in parallel, and merge the results when done
Follows the general suggested application model for Windows Azure: Web Role + Queue + Worker
With three special considerations: batch job management, task parallelism on an elastic cloud, and large data-set management
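The Web Role + Queue + Worker model and the query-segmentation pattern can be illustrated with Python's standard library: a producer (standing in for the web role) splits the input and enqueues one message per partition, worker threads (standing in for worker roles) pull and process partitions, and the results are merged at the end. This is a single-process sketch under stated assumptions, not the AzureBLAST code; `run_blast_on_partition` is a hypothetical placeholder for invoking NCBI-BLAST on one partition.

```python
import queue
import threading
from typing import Dict, List


def run_blast_on_partition(partition: List[str]) -> List[str]:
    """Hypothetical stand-in for running NCBI-BLAST on one query partition.
    It only tags each sequence so the data flow is visible."""
    return [f"hits-for:{seq}" for seq in partition]


def split(sequences: List[str], size: int) -> List[List[str]]:
    """Query segmentation: cut the input into partitions of at most `size` sequences."""
    return [sequences[i:i + size] for i in range(0, len(sequences), size)]


def parallel_blast(sequences: List[str], partition_size: int, n_workers: int) -> List[str]:
    tasks = queue.Queue()
    results: Dict[int, List[str]] = {}
    lock = threading.Lock()

    # "Web role": enqueue one message per partition, tagged with its position.
    for index, part in enumerate(split(sequences, partition_size)):
        tasks.put((index, part))

    def worker() -> None:
        # "Worker role": keep pulling partitions until the queue is empty.
        while True:
            try:
                index, part = tasks.get_nowait()
            except queue.Empty:
                return
            output = run_blast_on_partition(part)
            with lock:
                results[index] = output

    threads = [threading.Thread(target=worker) for _ in range(n_workers)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()

    # Merge step: concatenate partition outputs in the original input order.
    merged: List[str] = []
    for index in sorted(results):
        merged.extend(results[index])
    return merged


if __name__ == "__main__":
    seqs = [f"seq{i}" for i in range(10)]
    print(parallel_blast(seqs, partition_size=3, n_workers=4))
```

Keying results by partition index lets the merge restore input order even though workers finish in arbitrary order.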

30 AzureBLAST Task-Flow
A simple split/join pattern
Leverage the multiple cores of one instance: the -num_threads argument of NCBI-BLAST
Task granularity: a large partition causes load imbalance; a small partition causes unnecessary overheads (NCBI-BLAST overhead, data-transfer overhead)
Best practice: use test runs to profile, and set the partition size to mitigate the overhead
[Diagram: a splitting task fans out to BLAST tasks, whose results feed a merging task]
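The granularity trade-off can be seen in a deliberately simple model. Assume (illustrative, not measured) that per-sequence compute costs vary, that every task pays a fixed overhead standing in for NCBI-BLAST start-up and data transfer, and that idle workers pull the next partition from a shared queue. Large partitions leave some workers idle near the end; small partitions multiply the fixed overhead.

```python
import heapq
from typing import List, Sequence


def partition(costs: Sequence[float], size: int) -> List[Sequence[float]]:
    """Split per-sequence costs into consecutive partitions of at most `size`."""
    return [costs[i:i + size] for i in range(0, len(costs), size)]


def makespan(costs: Sequence[float], size: int, n_workers: int,
             per_task_overhead: float) -> float:
    """Completion time when workers greedily pull partitions from a shared queue.

    Illustrative assumptions: each task costs a fixed overhead plus the sum
    of its sequences' compute costs; tasks leave the queue in order."""
    free_at = [0.0] * n_workers           # time at which each worker is next idle
    for part in partition(costs, size):
        start = heapq.heappop(free_at)    # earliest-idle worker takes the task
        heapq.heappush(free_at, start + per_task_overhead + sum(part))
    return max(free_at)


if __name__ == "__main__":
    # Illustrative workload: 2,000 queries whose cost varies with length.
    costs = [1.0 + (i % 7) * 3.0 for i in range(2000)]
    for size in (1, 10, 50, 100, 250, 500):
        t = makespan(costs, size, n_workers=16, per_task_overhead=20.0)
        print(f"{size:>4} sequences/task -> makespan {t:,.0f}")
```

Sweeping the partition size in such a model mirrors the slide's advice: profile with test runs rather than guessing, because the best size depends on the overhead and on how uneven the sequences are.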

31 Micro-Benchmarks Inform Design
Task size vs. performance: benefit of the warm cache effect; 100 sequences per partition is the best choice
Instance size vs. performance: super-linear speedup with larger worker instances, primarily due to the memory capacity
Task size/instance size vs. cost: the extra-large instance generated the best and the most economical throughput by fully utilizing the resource
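Comparing instance sizes on cost rather than raw speed reduces to cost per unit of work: hourly price divided by measured throughput. The sketch below assumes you supply your own benchmark numbers; the instance labels, prices, and throughputs in the usage example are made-up placeholders, not Azure prices or results from this study.

```python
from dataclasses import dataclass
from typing import Iterable


@dataclass
class BenchmarkRun:
    instance: str          # instance size label (placeholder names)
    tasks_per_hour: float  # throughput measured in a test run
    price_per_hour: float  # hourly price of the instance

    @property
    def cost_per_task(self) -> float:
        return self.price_per_hour / self.tasks_per_hour


def most_economical(runs: Iterable[BenchmarkRun]) -> BenchmarkRun:
    """Return the configuration with the lowest cost per completed task."""
    return min(runs, key=lambda r: r.cost_per_task)


if __name__ == "__main__":
    # Made-up numbers for illustration only.
    runs = [
        BenchmarkRun("small", tasks_per_hour=10, price_per_hour=1.0),
        BenchmarkRun("extra-large", tasks_per_hour=100, price_per_hour=8.0),
    ]
    best = most_economical(runs)
    print(best.instance, round(best.cost_per_task, 3))
```

In the made-up numbers, the larger instance costs 8x as much per hour but delivers 10x the throughput; that kind of super-linear speedup is what makes it the cheaper choice per task.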

32 All-Against-All Experiment: Discovering Homologs
BLAST Uniref100, the non-redundant protein sequence database, to discover the interrelationships of known protein sequences
All-against-all query: the database is also the input query
The protein database is large (4.2 GB)
A total of 9,865,668 sequences to be queried
Theoretically, about 10^14 (roughly 100 trillion) sequence comparisons!
Performance estimation: estimated completion in 3,216,731 minutes (6.1 years) on one 8-core VM
One of the biggest BLAST jobs as far as we know
This scale of experiment is usually infeasible for most researchers
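The scale figures for this experiment can be checked with simple arithmetic: an all-against-all query compares every sequence with every sequence in the database, so the pair count grows with the square of the database size, and the minute estimate converts directly to years (365-day years assumed).

```python
N_SEQUENCES = 9_865_668        # Uniref100 sequences queried (from the slide)
ESTIMATED_MINUTES = 3_216_731  # estimated runtime on one 8-core VM (from the slide)

# Every query sequence is compared against every database sequence.
comparisons = N_SEQUENCES ** 2

# Convert minutes to years, assuming 365-day years.
MINUTES_PER_YEAR = 60 * 24 * 365
years = ESTIMATED_MINUTES / MINUTES_PER_YEAR

if __name__ == "__main__":
    print(f"pairwise comparisons: {comparisons:.2e}")
    print(f"estimated runtime: {years:.1f} years")
```

This yields about 9.7 × 10^13 comparisons, on the order of 10^14, and confirms the 6.1-year estimate.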

33 Our Approach
Allocated a total of ~4,000 instances (cores): 475 extra-large VMs with 8 cores per VM, or 3,800 cores
8 deployments of AzureBLAST, each with its own co-located storage service
Divided 10 million sequences into multiple segments, each submitted to one deployment as one job for execution
300,000 tasks on 3,500 cores on Azure (70,000 bp or 35 sequences per task)
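The job layout on this slide (divide the sequences into segments, one per deployment, then cut each segment into small tasks) can be sketched as follows. The even split across the 8 deployments and the fixed 35 sequences per task are simplifying assumptions; the slide gives ~35 sequences per task only as an approximate figure.

```python
import math
from typing import List


def split_evenly(n_items: int, n_parts: int) -> List[int]:
    """Sizes of n_parts contiguous segments that differ by at most one item."""
    base, extra = divmod(n_items, n_parts)
    return [base + 1 if i < extra else base for i in range(n_parts)]


def plan_job(n_sequences: int, n_deployments: int, seqs_per_task: int) -> List[int]:
    """Number of tasks each deployment receives for its segment."""
    return [math.ceil(size / seqs_per_task)
            for size in split_evenly(n_sequences, n_deployments)]


if __name__ == "__main__":
    tasks = plan_job(9_865_668, n_deployments=8, seqs_per_task=35)
    print(tasks, sum(tasks))
```

With these assumptions the plan comes to 281,880 tasks, the same order as the roughly 300,000 tasks reported.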

34 Cloud System Upgrades
North Europe data center: a total of 34,256 tasks processed
All 62 nodes lost tasks and then came back in groups: this is an update domain at work (~30 minutes, ~6 nodes in one group)
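The observed pattern, with nodes dropping out and returning a group at a time, matches a rolling upgrade across update domains. A minimal model follows; the round-robin assignment and the choice of 10 domains are assumptions (the slide gives neither), picked only because 62 nodes at ~6 per group implies roughly that many.

```python
from typing import Dict, Iterator, List, Tuple


def assign_update_domains(nodes: List[str], n_domains: int) -> Dict[int, List[str]]:
    """Round-robin placement of nodes into update domains (an assumption)."""
    domains: Dict[int, List[str]] = {d: [] for d in range(n_domains)}
    for i, node in enumerate(nodes):
        domains[i % n_domains].append(node)
    return domains


def rolling_upgrade(domains: Dict[int, List[str]],
                    all_nodes: List[str]) -> Iterator[Tuple[int, List[str]]]:
    """Yield (domain, nodes still serving) as each domain is upgraded in turn.
    Tasks running on the offline nodes are lost and must be retried."""
    for d in sorted(domains):
        offline = set(domains[d])
        yield d, [n for n in all_nodes if n not in offline]


if __name__ == "__main__":
    nodes = [f"node{i:02d}" for i in range(62)]
    domains = assign_update_domains(nodes, n_domains=10)
    for d, serving in rolling_upgrade(domains, nodes):
        print(f"upgrading domain {d}: {len(domains[d])} offline, {len(serving)} serving")
```

Only one domain is offline at a time, so capacity dips by a group of nodes per step rather than disappearing all at once.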

35 Failures Happen
West Europe data center: 30,976 tasks were completed, and then the job was killed
Nodes experienced blob-writing failures at the same time
Reasonable guess: the fault domain is working
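When a node fails mid-task, the task must run again elsewhere. One common way to get this in a queue-based design, assumed here rather than stated on the slide, is a visibility timeout: receiving a message hides it for a while, and unless the worker deletes it after finishing, it reappears for another worker. The in-memory class below is a hypothetical illustration of that behavior using a logical clock, not the Azure Queue API.

```python
from typing import Dict, Optional, Tuple


class VisibilityTimeoutQueue:
    """Toy queue: a received message stays invisible for `timeout` time units
    and becomes visible again unless it is deleted first."""

    def __init__(self, timeout: int) -> None:
        self.timeout = timeout
        self.now = 0
        self._next_id = 0
        # message id -> (payload, time at which it becomes visible)
        self._messages: Dict[int, Tuple[str, int]] = {}

    def put(self, payload: str) -> None:
        self._messages[self._next_id] = (payload, self.now)
        self._next_id += 1

    def receive(self) -> Optional[Tuple[int, str]]:
        """Return the oldest visible message and hide it until now + timeout."""
        for msg_id in sorted(self._messages):
            payload, visible_at = self._messages[msg_id]
            if visible_at <= self.now:
                self._messages[msg_id] = (payload, self.now + self.timeout)
                return msg_id, payload
        return None

    def delete(self, msg_id: int) -> None:
        """Called only after the task's results have been stored."""
        self._messages.pop(msg_id, None)

    def advance(self, dt: int) -> None:
        self.now += dt

    def __len__(self) -> int:
        return len(self._messages)


if __name__ == "__main__":
    q = VisibilityTimeoutQueue(timeout=30)
    q.put("task-1")
    first = q.receive()   # worker A takes the task ...
    q.advance(31)         # ... and its node fails before deleting it
    retry = q.receive()   # the task is visible again for worker B
    print(first, retry)
    q.delete(retry[0])    # worker B finishes and deletes it
    print(len(q))
```

Because a task is deleted only after its output is safely written, a whole fault domain can fail and its in-flight tasks simply reappear once their timeouts expire.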

36 Release of BLAST on Windows Azure
Open-source release of NCBI BLAST on Windows Azure (CTP)
Installation guide and user's guide; developer's guide in preparation
Support for the release for the next year: feature requests, fixes, …
Free access to NCBI reference data sets on Windows Azure, with automatic updates
Software can be installed and used immediately, customized for your institution (logos, private database, group databases), and extended from source
Releasing the result data from the all-against-all run: BLAST of Uniref100, the non-redundant protein sequence database, discovering the interrelationships of known protein sequences
Available Dec. 1st, 2010.

37 Microsoft Client+Cloud Partnership
Azure cloud services: storage and computing
Tier-one support: hardware and Azure software
Hosted data sets: multidisciplinary data analysis
Technical engagement team: community collaborations, application support
One step of a worldwide program

38 HPC and Clouds: Twins Separated At Birth
Similar technology issues: node and system architectures; communication fabrics; storage systems and analytics; physical plant and operations; programming models; reliability and resilience
Differing culture and sociology: design and operations; management and philosophy

39 Cloud/HPC Hardware Comparison
Predominant differences today: network architecture and SAN storage
Dan Reed's hypothesis: convergence is coming

Attribute        | HPC                   | Cloud
Processor        | High-end x86          | x86
Memory/node      | 1-8 GB                | 8 GB+
Local disk       | Scratch only          | Permanent storage
SAN storage      | Common                | Rare
Tertiary storage | Common                | Rare
Interconnect     | InfiniBand or 10 GigE | 1 GigE/10 GigE
Network          | Flat                  | Hierarchical

40 Cloud Scaling: Lessons for HPC Exascale
Environmental responsibility: managing under a 100 MW envelope
Adaptive systems management
Provisioning 100,000 servers: hardware at most one week after delivery; software at most a few hours
Resilience during a blackout/disaster: data center failure; service rollover for 20M customers
Programming the entire facility: power, environmentals, provisioning; component tracking, resilience, …

41 Rethinking Node Architecture
Consistency: weak consistency is good
Component failure: failure as a first-class object
Systemic resilience: upgrade during operation; never go down
[Diagram: Windows Azure platform stack: applications; Windows Azure (fabric, compute, storage, config), SQL Azure, .NET Services, Live Services, others; Windows Mobile, Windows Vista/XP, Windows Server]

42 Rethinking LAN/WAN Networking
Break the LAN hierarchy: multiple paths, commodity components, high bisection bandwidth
We build WAN islands, not continents: isolated facilities with limited connectivity
Change the landscape: serious, multiple-terabit WANs; many lambdas entering a facility; fused node/LAN/WAN infrastructure

43 Rethinking Packaging and Cooling
People and hardware need not mix
Hardware cooling standards are conservative: reliable at high temperature/humidity
Optimize for efficiency
Cooling is (often) unnecessary: design for ambient environments
Energy reliability is (often) unnecessary: design for power outages
Use larger building blocks: accept component failures
[Chart: temperature vs. humidity]

44 Rethinking Reliability: Fail In Place
Models: truncated life models; performance and failure data; Markov performability models; TCO and provisioning
Factory-sealed units (FRUs), over-provisioned for failure
Dynamic reconfiguration; real-time, adaptive control
[Chart: performance vs. elapsed time, showing a utility threshold and the desired lifetime]
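A back-of-the-envelope version of fail-in-place provisioning: assume, for illustration only, that servers in a sealed unit fail independently at a constant rate λ with no repairs, so the expected surviving fraction after time t is exp(-λt). If the unit is over-provisioned by a factor p (p servers installed for every one needed), it stays above the needed capacity while p·exp(-λt) ≥ 1, that is, for t ≤ ln(p)/λ. This uses expected values only and is far simpler than the Markov performability models named on the slide.

```python
import math


def fail_in_place_lifetime(overprovision: float, annual_failure_rate: float) -> float:
    """Years until the expected number of live servers drops to the number needed.

    Assumes independent, exponentially distributed failures with a constant
    annual rate and no repairs (the unit is sealed). overprovision is the
    ratio of installed servers to needed servers and must be >= 1."""
    if overprovision < 1:
        raise ValueError("overprovision must be at least 1")
    if annual_failure_rate <= 0:
        raise ValueError("annual_failure_rate must be positive")
    return math.log(overprovision) / annual_failure_rate


def required_overprovision(desired_years: float, annual_failure_rate: float) -> float:
    """Inverse question: how much spare capacity a desired lifetime needs."""
    return math.exp(annual_failure_rate * desired_years)


if __name__ == "__main__":
    rate = 0.05  # assumed hazard rate of 0.05 per year, not measured data
    print(f"{fail_in_place_lifetime(1.2, rate):.2f} years with 20% spare capacity")
    print(f"{required_overprovision(3, rate):.3f}x capacity for a 3-year life")
```

The logarithm is the key design insight in this toy model: each extra year of sealed life costs a multiplicative, not additive, amount of spare hardware.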

45 Rethinking Energy Provisioning
Power redundancy is a major cost: batteries to supply up to 15 minutes
Use multiple sites, based on energy cost and carbon footprint: electrical grid, solar, wind, fuel cell, …
Workload dispatching based on models: real-time optimization and prediction; workload demand; weather and seasonal models; auction-based energy pricing
Infrastructure: UPS, optical fiber, and computing
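Dispatching work to sites by energy cost can be sketched as a greedy allocation: fill the cheapest site first, up to its capacity, then the next cheapest. The site names, prices, and capacities below are hypothetical, and a real dispatcher would also fold in the demand, weather, and carbon models listed on this slide.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class Site:
    name: str             # hypothetical site label
    price_per_mwh: float  # current (e.g., auction-based) energy price
    capacity_mw: float    # power available for computing at this site


def dispatch(demand_mw: float, sites: List[Site]) -> Dict[str, float]:
    """Greedily place demand at the cheapest sites first.

    Raises ValueError if total capacity cannot cover the demand."""
    plan: Dict[str, float] = {}
    remaining = demand_mw
    for site in sorted(sites, key=lambda s: s.price_per_mwh):
        if remaining <= 0:
            break
        take = min(site.capacity_mw, remaining)
        plan[site.name] = take
        remaining -= take
    if remaining > 0:
        raise ValueError(f"{remaining} MW of demand cannot be placed")
    return plan


def hourly_cost(plan: Dict[str, float], sites: List[Site]) -> float:
    """Energy cost per hour of a dispatch plan."""
    prices = {s.name: s.price_per_mwh for s in sites}
    return sum(mw * prices[name] for name, mw in plan.items())


if __name__ == "__main__":
    sites = [Site("A", 60.0, 40.0), Site("B", 35.0, 30.0), Site("C", 80.0, 50.0)]
    plan = dispatch(60.0, sites)
    print(plan, hourly_cost(plan, sites))
```

With prices that are linear in MW and capacity as the only constraint, filling the cheapest sites first minimizes cost; re-running the dispatch whenever prices or forecasts change gives the real-time element.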

46 Some Research/Design Thoughts
Draw the right bounding box: it defines the problem you solve
Understand your workload: use only the hardware you need
Metrics reward and punish: choose carefully what you measure
Embrace component failure: hardware is cheap and readily recyclable
Machines and people do not mix well: consider sealing hardware at the factory
Engage multidisciplinary solutions: mechanical, electrical, economic, social, …
Culture shapes behavior: implicit versus explicit costs

47 End-to-End Perspective

48 © 2010 Microsoft Corporation. All rights reserved. Microsoft, Windows, Windows Vista and other product names are or may be registered trademarks and/or trademarks in the U.S. and/or other countries. The information herein is for informational purposes only and represents the current view of Microsoft Corporation as of the date of this presentation. Because Microsoft must respond to changing market conditions, it should not be interpreted to be a commitment on the part of Microsoft, and Microsoft cannot guarantee the accuracy of any information provided after the date of this presentation. MICROSOFT MAKES NO WARRANTIES, EXPRESS, IMPLIED OR STATUTORY, AS TO THE INFORMATION IN THIS PRESENTATION. Questions?...
