
1 Data Laden Clouds Trends and Insights
Roger Barga, PhD, Architect, eXtreme Computing Group, Microsoft Research (MSR)

2 Presentation Outline
Technology trends
• Riding the exponentials
• Convergent invisibility
• NUIs and computing on behalf
• Client+Cloud experiences
Opportunity for data and analytics
Cloud infrastructure challenges
• Packaging, hardware, software, security
Thoughts on the future

3 It’s Easy To Forget That Not Very Long Ago …
There were few or no experiences with …
• Web sites, email, spam, phishing, computer viruses
• E-commerce, digital photography or telephony
Cell phones were rare and expensive
A portable cassette player was still cool
HiFi was more common than WiFi
A “friend” was someone you actually knew
The future depends on vision and context …

4 Computing Eras: Paucity To Plethora
Mainframe Era → Pre-PC Era (1980) → PC Era (1995) → Internet Era (2000) → Consumer Era (Today+)
21st-century implicit and natural computing:
• Increasingly natural interfaces
• Embedded intelligence in everyday objects
• Ubiquitous network access and cloud services

5 What Has Changed?
System-on-a-chip designs: powerful mobile devices
Graphics processing units: high-quality graphics
Explosive data growth: ubiquitous sensors and media
Inexpensive embedded computing: everyday smart objects, …
Wireless spectrum pressure: mobile device growth
New software models: social networks, clients+clouds, …
[Slide diagram: notional system-on-a-chip for server, desktop, and mobile – LPIA x86 cores, out-of-order core, 1 MB cache, DRAM controller, GPU, PCIe, network-on-chip, custom acceleration]

6 Orders of Magnitude Always Matter
Multidisciplinary challenges are the present and future … and the tools must empower, not frustrate
A computation task has four characteristic demands:
• Networking – delivering questions and answers
• Computation – transforming information to produce new information
• Data access – access to information needed by the computation
• Data storage – long-term storage of information
The ratios among these and their costs are critical
New applications and systems will arise … if we create the right environment

7 Imagine a Future Where …
Your car drives and navigates for you … and also parks itself (already a feature on some cars)
Your sound system only plays music you love … because it knows about every song you’ve ever heard
Your phone only rings when you want to answer … because it knows your emotional state and social context
All your family memories are recorded automatically … via MEMS-based sensors and solid-state storage
Your body calls an ambulance when you’re ill … via implanted, biologically powered diagnostic sensors
Your DNA sample and lifestyle determine personalized treatment … because genotype–phenotype models are specific
Your office adjusts its behavior to your needs … because it knows what you want to do

8 Successful Technologies Are Invisible

9 From Killer Apps to Technology Maturity
Slide graphic: a timeline of phases – DOS; GUI (1985–present: desktop publishing, multimedia, spreadsheets, word processors); Internet (1993–present: web browsers); Client+Cloud (2006–present: location-based apps, social networks); NUI (2012–…: anticipatory, human-centric).
Historically, a software innovation unlocks new potential, but it takes a first “killer app” to drive adoption and carry the innovation to maturity. End-users drive adoption when something new offers a clear benefit. Technology phases build on each other, and recently they have been arriving faster.
Phases:
• DOS – the last version of DOS as a separate product shipped with Windows 95. Apps: spreadsheets, word processors.
• GUI (1985–present) – Windows 1.0 shipped in 1985. Apps: desktop publishing, multimedia.
• Internet (1993–present) – Mosaic launched in 1993, Netscape Navigator in 1994, IE 1.0 and 2.0 in 1995. Apps: email, web browsers.
• Client + Cloud (2006–present) – in 2006, Amazon launched its Elastic Compute Cloud (EC2) as a commercial web service, the first widely accessible cloud computing infrastructure service. Apps: location-based apps, social networks.
• NUI (2012–…) – manycore processors ship in 2012. Apps: not yet clear, but they will be human-centric and immersive.
Apps like social networking and mobile location-based applications are driving the development of client+cloud. Natural user interfaces and their first killer apps should take off around 2012, when manycore desktop machines reach the market. We are on the verge of a transformation in the roles computers play in our lives and in how we interact with them. How do our computing systems need to evolve?

10 Enhanced GUI Versus NUI
Slide graphic labels: expressions, voice, speech, multi-touch, environment, tasks, context, handwriting, gestures, single touch.
From enhanced GUI to NUI: GUI isn’t going away completely. Some interfaces that have been called “natural” in the past were really just fancy ways to access a GUI (voice, handwriting, single touch). We are moving to a more human model of interacting with machines – NUI. It gives machines more human-like senses so people don’t have to think or act like machines to use them. It is more than voice, handwriting, and touch: it is synchronous multimodal sensing of gestures, face, environment, and context. The essential part of NUI, the emulation of the human senses and their integration, happens on the client. What we are looking at is a wholesale integration across multi-sensory interaction models to create a more completely natural way for people to interact with the computer. Let’s look at an example of how a NUI application might look on this platform.

11

12 What Is An Application?
Slide image: Microsoft Kinect.
An FFT? No, it’s an algorithm.
A rendering pipeline? No, it’s a software library.
A feature recognition system? No, it’s a building block.
Our notion of “application” is increasingly complex: many integrated and interoperating components. Our tools must enable creativity accordingly, creating experiences.

13 Working at Your Command, Working on Your Behalf
If we have clients that can provide the NUI experience, we can write new types of applications to the client+cloud platform that assume NUI and can use social networks, real-time data sources, and other ancillary data streams to create experiences that didn’t work well, or were impossible, in the GUI environment. The app on the composite platform creates the value.
• Computers provide an anticipatory, more expressive, more human response
• The system understands your social graph and the world you live in
• It anticipates your needs across activities and devices, thanks to cloud services and data
• Powerful clients allow low-latency expressive feedback and rich, immersive 3D interaction
• The system converges around you: managing and mining your social net, deriving synthetic reputation, synthesizing metadata
This is human-centric computing.

14 The Cloud + The Clients Create the Experience
There are four classes of client form factors:
• Fixed – you go to it and use it
• Portable – take it somewhere, then use it
• Specialty – purpose-built, optimized for a specific use
• Mobile – use on the go
Note the difference between mobile and portable, the growing diversity of specialty devices, and the resurgence of fixed form factors, in particular large displays, 3D displays, and room-scale environments.
Independent evolution of client and cloud: the client and the cloud have been evolving on independent paths, both becoming more complex. These two largely divergent universes talk to one another through protocols, but they have still been considered separate entities. That is changing – we need to move to a composite platform.
Slide graphic labels: Fixed, Portable, Specialty, Mobile, Intelligent Objects.

15 The Future of Experiences
Slide graphic: the composite platform – client (anticipatory processing, sensory inputs, augmented interaction), cloud (trust & security, public data services, private data services), hybrid (adaptive behavior, context awareness, environment awareness).
The future of applications: a NUI application seamlessly takes advantage of the composite platform. As an example, we might see this type of structure:
• Computed locally on the client: anticipatory processing, sensory inputs, augmented interaction
• Computed in the cloud: trust & security services, public data services, private data services
• Computed in either place (hybrid): adaptive behavior, context awareness, environment awareness
What does the client+cloud platform enable?

16 What’s A Cloud? Computer Rooms: Cloud COGS Matter

17 Microsoft’s Data Center Evolution And Economics
Timeline (2005–2010):
• Data center co-location (2005): deployment scale unit server; driver capacity
• Generation 1 (Quincy and San Antonio): scale unit rack; driver density & deployment
• Generation 2 (Chicago and Dublin): scale unit containers; driver scalability & sustainability
• Generation 3 (modular data center): scale unit IT PAC; driver time to market
• Generation 4: scale unit facility PAC; driver lower TCO

18 Discovery and Innovation in 2020
In the last two decades advances in computing technology, from processing speed to network capacity and the Internet, have revolutionized the way scientists work. From sequencing genomes to monitoring the Earth's climate, many recent scientific advances would not have been possible without a parallel increase in computing power – and with revolutionary technologies such as the quantum computer edging towards reality, what will the relationship between computing and science bring us over the next 15 years?

19 Lack of Broad Access to High-Performance, Data-Intensive Capacity
Slide graphic: a pyramid of roughly 70M scientists and engineers worldwide – about 1M power users and 14M mid-tier users (roughly 20%) have regular access to high-performance, data-intensive capacity, while about 55M (roughly 80%) have little to no access.
Tagline: Today, only a fraction of scientists and researchers enjoy regular access to high-performance computational and data-intensive resources.
Narrative: As scientists and engineers seek to take advantage of ever-increasing amounts of richer data to drive new discoveries, it is not unusual to hit bottlenecks in available high-performance computational and data-intensive resources. Today, only a fraction of scientists and engineers enjoy regular access to high-performance and data-intensive computing resources to process and analyze massive amounts of data and run models and simulations quickly. The reality for most of the scientific community is that speed to discovery is often hampered: they must either queue up for access to limited resources or pare down the scope of their research to fit the available processing power. It is no wonder the U.S. Council on Competitiveness reports that 53% of organizations are forced to scale down their advanced problems “to fit” within their technology limitations, and that 57% of companies have problems they cannot solve with existing computers.
Background notes on the domain-specialist bands:
• Top band (~1M power users): titles tend to reflect leadership positions, often earned through years of hands-on expertise (hence “senior” or “chief”), though leaders may delegate to highly capable junior staff. Representative government example: Senior Research Scientist (Terminal Ballistics), Department of the Army, U.S. Army Research, Development and Engineering Command.
• Middle band (~14M): titles tend to reflect domains and roles, with engineering expertise particularly common. Example: Materials Research Engineer/Physical Scientist, Department of Energy.
• Bottom band (~55M non-HPC users): titles tend to be support-oriented or tactical – analyst, assistant, consultant, support. Example: Research Analyst, Department of the Treasury, Office of the Comptroller of the Currency.

20 1.2 × 10^21 New Bytes of Information in 2010
Source: IDC, as reported in The Economist, Feb 25, 2010
Tagline: Computational performance is critical in a world with an insatiable and growing appetite for data.
Narrative: As cameras, digital sensors, and precision instruments of all kinds have proliferated over the last generation, a massive deluge of data has begun. Science, government, and industry alike now face the problem of too much information: a vast surplus of data that ironically threatens to handicap progress. Industry analyst firm IDC estimates that approximately 1.2 zettabytes (1.2 × 10^21 bytes) of digital data will be created in 2010, and that the amount of digital information increases tenfold every five years. For context, 1 zettabyte = 1 million petabytes.
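A quick arithmetic check of these figures, as a minimal Python sketch; the per-year growth rate is my own inference from the slide’s “tenfold every five years” claim, not an IDC number:

```python
# Rough check of the IDC figures quoted on this slide.
ZETTABYTE = 10 ** 21   # bytes
PETABYTE = 10 ** 15    # bytes

new_bytes_2010 = 1.2 * ZETTABYTE
print(new_bytes_2010 / (10 ** 6 * PETABYTE))   # 1.2 -> 1 ZB really is 1 million PB

# "Tenfold every five years" implies a compound annual growth rate of
# 10**(1/5) - 1, i.e. roughly 58% per year.
annual_growth = 10 ** (1 / 5) - 1
print(f"{annual_growth:.0%}")                  # ~58%

# Under that assumption, roughly 12 ZB of new data would be created in 2015.
print(f"{new_bytes_2010 * 10 / ZETTABYTE:.0f} ZB projected for 2015")
```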

21 Data Deluge Fun Facts: Science
• In 2000, the Sloan Digital Sky Survey collected more data in its first week than had been collected in the entire history of astronomy.
• By 2016, the new Large Synoptic Survey Telescope in Chile will acquire 140 terabytes every 5 days – more than Sloan acquired in 10 years.
• The Large Hadron Collider at CERN generates 40 terabytes of data every second.
Tagline: We witness this “data deluge” across science, engineering, and business every day.
Sources: The Economist, Feb ’10; IDC

22 Economics of Storage
Cost per gigabyte, 2000 versus 2010 (Wired Magazine, April 2010; figures in USD):
• Disk storage: $44.56 → $0.07 (roughly 640 times cheaper)
• Web storage: $1,250 → $0.15 (roughly 8,300 times cheaper)
Tagline: Advances in storage technology over the last 10 years have significantly decreased the cost of storing data, lowering an economic barrier and allowing data to proliferate.
Narrative: In just the last decade, innovations in data storage technologies have lowered transaction costs and made it far more cost-effective to access and archive massive amounts of data across a variety of media. Compare the price per gigabyte of hard-drive and web storage ten years ago with today’s prices: it is orders of magnitude more cost-effective. However, data storage advances have profoundly outstripped computational progress: while storing data at the petabyte scale and beyond is becoming practical, the ability to process it remains elusive.
But remember: free storage is like free puppies.

23 Social Implications of the Data Deluge
Hypothesis-driven: “I have an idea, let me verify it.”
Exploratory: “What correlations can I glean?”
Different tools and techniques
Rapid exploration of alternatives
Data volume and complexity are assets … and challenges
Simplicity really matters

24 Let Researchers Be Researchers…
Most researchers do not want to be system administrators
They don’t want to learn to use supercomputers; they want to focus on their research
They use standard tools: spreadsheets, statistical packages, desktop visualization
Programming = modifying a few parameters in a trusted scripting language

25 Let Researchers Be Researchers…
BUT …
The data deluge means they must solve problems many times the capacity of their desktop machines
Research is now interdisciplinary
Sharing access to large data collections and analysis tools is the future
A paradigm shift is coming

26 Seamless Client Plus Cloud
Slide graphic labels: Compute, Blob Storage, Table Storage.
Magnify client power via the cloud
• Data analysis and computation
• Persist and share data in the cloud
• Multidisciplinary data fusion and scale
Leverage via client tools and metaphors
• Analysis acceleration (Excel, toolkits, codes)
• Remote rendering and client visualization
• Data provenance, collaboration, …

27 NCBI BLAST
BLAST (Basic Local Alignment Search Tool)
• One of the most important software tools in bioinformatics
• Identifies similarity between biological sequences
• Computationally intensive: a large number of pairwise alignment operations
• A typical large BLAST run can take 700–1,000 CPU hours
For most biologists there are two choices for running large jobs:
• Build a local cluster
• Submit jobs to NCBI or EBI (long job queue times)

28 R. palustris as a platform for H2 production
Identify key drivers for producing hydrogen, a promising alternative fuel – understand R. palustris well enough to be able to improve its H2 production
Characterize a population of strains and use integrative genomics approaches to dissect the molecular networks of H2 production
Use BLAST to query 16 strains and sort out their genetic relationships; each strain has an estimated ~5,000 proteins
Jobs submitted to the NCBI clusters were kicked off before completion
Against the NCBI non-redundant protein database: ~30 minutes; against ~5,000 proteins from another strain: under 30 seconds
Publishable result in one day for roughly $150
Credit: Eric Schadt (Pacific Biosciences) and Sam Phattarasukol (Harwood Lab, UW)

29 NCBI BLAST on Windows Azure
A parallel BLAST engine on Azure using a query-segmentation, data-parallel pattern:
• Split the input sequences
• Query the partitions in parallel
• Merge the results when done
Follows the generally suggested application model for Windows Azure: Web Role + Queue + Worker, with three special considerations:
• Batch job management
• Task parallelism on an elastic cloud
• Large data-set management
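The slide names the pattern but shows no code, so here is a minimal, platform-neutral sketch of the query-segmentation idea. It assumes an NCBI BLAST+ `blastp` binary is installed; the process pool stands in for the Azure queue plus worker roles, and names such as `split_fasta` and `run_partition` are illustrative, not part of AzureBLAST.

```python
# Sketch of the query-segmentation data-parallel pattern:
# split the input FASTA, run each partition in parallel, merge the results.
import subprocess
from concurrent.futures import ProcessPoolExecutor
from pathlib import Path

def split_fasta(fasta_path, seqs_per_partition=100):
    """Split a FASTA file into partitions of at most N query sequences."""
    partitions, current, count = [], [], 0
    for line in Path(fasta_path).read_text().splitlines():
        if line.startswith(">"):
            if count and count % seqs_per_partition == 0:
                partitions.append("\n".join(current))
                current = []
            count += 1
        current.append(line)
    if current:
        partitions.append("\n".join(current))
    return partitions

def run_partition(args):
    """Worker: run blastp on one partition (stand-in for an Azure worker role)."""
    index, fasta_text, db = args
    query = Path(f"partition_{index}.fasta")
    out = Path(f"partition_{index}.out")
    query.write_text(fasta_text)
    subprocess.run(
        ["blastp", "-query", str(query), "-db", db,
         "-out", str(out), "-outfmt", "6"],
        check=True,
    )
    return out.read_text()

def blast_all(fasta_path, db, seqs_per_partition=100, workers=8):
    """Splitter/merger: fan partitions out to workers, merge when done."""
    parts = split_fasta(fasta_path, seqs_per_partition)
    jobs = [(i, p, db) for i, p in enumerate(parts)]
    with ProcessPoolExecutor(max_workers=workers) as pool:
        results = pool.map(run_partition, jobs)
    return "".join(results)   # merged tabular BLAST output
```

On Azure, the partition list would go into a queue, `run_partition` would be the body of a worker role, and the merged output would land in blob storage, matching the Web Role + Queue + Worker model the slide describes.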

30 AzureBLAST Task Flow: A Simple Split/Join Pattern
• A splitting task, parallel worker tasks, and a merging task
• Leverage the multiple cores of each instance via the “-num_threads” argument of NCBI BLAST
• Task granularity: large partitions cause load imbalance; small partitions add unnecessary overhead (NCBI BLAST startup overhead, data-transfer overhead)
• Best practice: run test jobs to profile, and set the partition size to mitigate the overhead
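The granularity trade-off described above can be made concrete with a toy timing model; the overhead and per-sequence times below are assumptions for illustration, not AzureBLAST measurements (the `-num_threads` flag, however, is a real NCBI BLAST+ option):

```python
# Toy model of task granularity on one worker: every task pays a fixed
# NCBI-BLAST startup and data-transfer overhead, so tiny partitions waste
# time, while huge partitions risk load imbalance across workers.
def seconds_per_sequence(seqs_per_task, per_seq_secs=2.0, overhead_secs=60.0):
    return (overhead_secs + seqs_per_task * per_seq_secs) / seqs_per_task

for n in (1, 10, 100, 1000):
    print(f"{n:>5} seqs/task: {seconds_per_sequence(n):6.1f} s per sequence")

# Inside each task, use every core of the instance, e.g.:
#   blastp -query partition.fasta -db nr -num_threads 8 -out partition.out
```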

31 Micro-Benchmarks Inform Design
Task size vs. performance: there is a benefit from the warm-cache effect, and 100 sequences per partition is the best choice. The smallest task, containing a single sequence, is an order of magnitude slower per sequence than a large task containing 100 sequences; beyond roughly 100 sequences per partition the instance is saturated and throughput stays constant.
Instance size vs. performance: super-linear speedup with larger worker instances, primarily due to memory capacity.
Task size and instance size vs. cost: the extra-large instance generated the best and most economical throughput because it fully utilizes the resource.
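A sketch of the cost comparison behind the last point; the relative throughputs and hourly prices are placeholder assumptions of mine (the slide reports only that extra-large instances were the most economical), not measured AzureBLAST numbers:

```python
# Compare cost per unit of BLAST throughput across instance sizes.
# Throughput and price figures below are invented for illustration only;
# the super-linear scaling reflects the memory effect noted on the slide.
instances = {
    # name: (cores, relative throughput, assumed price per hour in USD)
    "small":       (1,  1.0, 0.12),
    "large":       (4,  4.5, 0.48),
    "extra-large": (8, 10.0, 0.96),
}

for name, (cores, throughput, price_per_hour) in instances.items():
    print(f"{name:>12}: {price_per_hour / throughput:.3f} $ per unit of throughput")
```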

32 All-Against-All Experiment
Discovering homologs
• BLAST against Uniref100, a non-redundant protein sequence database, to discover the interrelationships of known protein sequences
• “All against all” query: the database is also the input query
• The protein database is large (4.2 GB), with a total of 9,865,668 sequences to be queried – theoretically, 100 billion sequence comparisons
• Performance estimate: roughly 3,216,731 minutes (6.1 years) on a single 8-core VM
• One of the biggest BLAST jobs we know of; an experiment at this scale is usually infeasible for most researchers
Notes: Researchers at Seattle Children’s Hospital, interested in protein interactions, wanted to know more about the interrelationships of known protein sequences. Because of the sheer number of known proteins, nearly 10 million, this was a very difficult question for even the most state-of-the-art computer. When the researchers first approached the XCG team to ask whether AzureBLAST could help, initial estimates showed that a single computer would take over six years to find the results. Using AzureBLAST, the 10 million protein sequences were split into groups that were distributed across the Azure cloud – so many sequences that they had to be distributed to data centers in multiple countries, spanning different continents. In the end the results were found in about one week, and this has been the largest research project run on Azure to date.

33 Our Approach
• Allocated a total of ~4,000 weighted instances: 475 extra-large VMs (8 cores per VM) across 8 deployments of AzureBLAST, since each deployment can have at most 500 weighted instances
• Each deployment has its own co-located storage service
• Divided the 10 million sequences into multiple segments, each submitted to one deployment as one job for execution
• About 300,000 tasks ran on 3,500 cores on Azure (roughly 70,000 bp, or 35 sequences, per task)
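The figures on this slide and the previous one are consistent with a simple scale-out estimate. The sketch below is plain arithmetic (no Azure calls); its only assumption is near-perfect parallel efficiency, which is why it lands a little under the roughly one week the run actually took.

```python
# Back-of-the-envelope check of the all-against-all numbers.
single_vm_minutes = 3_216_731          # estimated runtime on one 8-core VM (slide 32)
cores_per_vm = 8
total_core_minutes = single_vm_minutes * cores_per_vm

print(single_vm_minutes / (60 * 24 * 365))   # ~6.1 years, as stated

cores_used = 3_500                     # the slide's figure (475 XL VMs would ideally give 3,800)
ideal_days = total_core_minutes / cores_used / (60 * 24)
print(ideal_days)                      # ~5.1 days of perfectly parallel work,
                                       # consistent with "about one week" once
                                       # queueing and deployment overhead are added
```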

34 Cloud System Upgrades
North Europe data center, 34,256 tasks processed in total. All 62 nodes lost their tasks and then came back together – this is an update domain rolling through (~30 minutes, ~6 nodes per group).

35 Failures Happen
West Europe data center; 30,976 tasks were completed before the job was killed. 35 nodes experienced blob-write failures at the same time – a reasonable guess is that a fault domain was at work.

36 Release of BLAST on Windows Azure
Open-source release of NCBI BLAST on Windows Azure (CTP), available December 1st, 2010
• Installation guide and user’s guide available; developer’s guide in preparation
• The release will be supported for the next year: feature requests, fixes, …
• Free access to NCBI reference data sets on Windows Azure, with automatic updates
• The software can be installed and used immediately, customized for your institution (logos, private databases, group databases), and the source can be extended
• Releasing the result data from the “all-against-all” run: BLAST against Uniref100, a non-redundant protein sequence database, to discover the interrelationships of known protein sequences

37 Microsoft Client+Cloud Partnership
Azure cloud services: storage and computing
Tier-one support: hardware and Azure software
Hosted data sets: multidisciplinary data analysis
Technical engagement team: community collaborations, application support
One step of a worldwide program

38 HPC and Clouds: Twins Separated At Birth
Similar technology issues:
• Node and system architectures
• Communication fabrics
• Storage systems and analytics
• Physical plant and operations
• Programming models
• Reliability and resilience
Differing culture and sociology:
• Design and operations
• Management and philosophy

39 Cloud/HPC Hardware Comparison
Attribute: HPC vs. Cloud
• Processor: high-end x86 vs. x86
• Memory per node: 1-8 GB vs. 8 GB+
• Local disk: scratch only vs. permanent storage
• SAN storage: common vs. rare
• Tertiary storage
• Interconnect: InfiniBand or 10 GigE vs. 1 GigE/10 GigE
• Network: flat vs. hierarchical
The predominant differences today are network architecture and SAN storage. Dan Reed’s hypothesis: convergence is coming.

40 Cloud Scaling: Lessons for HPC Exascale
Environmental responsibility: managing under a 100 MW envelope
Adaptive systems management: provisioning 100,000 servers – hardware within at most one week after delivery, software within at most a few hours
Resilience during a blackout or disaster: data center failure, service rollover for 20M customers
Programming the entire facility: power, environmentals, provisioning, component tracking, resilience, …

41 Rethinking Node Architecture
Consistency: weak consistency is good
Component failure: failure as a first-class object
Systemic resilience: upgrade during operation, never go down
Slide graphic labels: .NET Services, Fabric, Storage, Config, Compute, Application, Windows Azure, Applications, SQL Azure, Live Services, Windows Server, Windows Vista/XP, Windows Mobile, Others

42 Rethinking LAN/WAN Networking
Break the LAN hierarchy: multiple paths, commodity components, high bisection bandwidth
We build WAN islands, not continents: isolated facilities with limited connectivity
Change the landscape: serious, multi-terabit WANs; many lambdas entering a facility; fused node/LAN/WAN infrastructure

43 Rethinking Packaging and Cooling
People and hardware need not mix
Hardware cooling standards are conservative: hardware is reliable at high temperature and humidity
Optimize for efficiency: cooling is (often) unnecessary – design for ambient environments
Energy reliability is (often) unnecessary: design for power outages
Use larger building blocks and accept component failures
Slide graphic: temperature/humidity operating envelope

44 Rethinking Reliability: Fail In Place
Slide graphic: truncated-life models feed performance and failure data into Markov performability models, which drive TCO and provisioning decisions; the key curve is performance versus elapsed time against a utility threshold over the desired lifetime.
• Factory-sealed units (FRUs)
• Over-provisioned for failure
• Dynamic reconfiguration
• Real-time, adaptive control
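A minimal sketch of the over-provisioning side of fail-in-place, assuming a constant per-server failure rate over a service-free lifetime; the rate, lifetime, and capacity target are illustrative values of mine, not Microsoft’s models:

```python
# Fail-in-place sizing: with no field repairs, live capacity decays over the
# unit's lifetime, so the factory-sealed unit is over-provisioned up front.
import math

annual_failure_rate = 0.05      # assumed: 5% of servers fail per year
lifetime_years = 4              # assumed service-free lifetime of the sealed unit
required_live_servers = 1000    # capacity still needed at end of life

# Expected fraction of servers surviving the full lifetime (exponential model).
surviving_fraction = math.exp(-annual_failure_rate * lifetime_years)

servers_to_install = math.ceil(required_live_servers / surviving_fraction)
print(f"{surviving_fraction:.2f}")   # ~0.82
print(servers_to_install)            # ~1222 installed to keep 1000 live at year 4
```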

45 Rethinking Energy Provisioning
Power redundancy is a major cost: batteries to supply up to 15 minutes
Use multiple sites, chosen by energy cost and carbon footprint: electrical grid, solar, wind, fuel cell, …
Workload dispatching based on models: real-time optimization and prediction of workload demand, weather and seasonal models, auction-based energy pricing
Infrastructure: UPS, optical fiber, and computing
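One way to read “workload dispatching based on models” is a per-interval choice of the cheapest site under a combined price and carbon objective. A toy sketch, with entirely invented site data and weights:

```python
# Toy dispatcher: send the next batch of flexible work to the site with the
# lowest weighted cost of energy price and carbon intensity. All figures invented.
sites = {
    # name: (energy price in $/kWh, carbon intensity in gCO2/kWh)
    "quincy":  (0.045, 120),   # e.g. a hydro-heavy grid
    "chicago": (0.070, 520),
    "dublin":  (0.090, 380),
}
carbon_weight = 0.0002  # how strongly carbon counts relative to dollars

def cheapest_site(sites, carbon_weight):
    return min(sites, key=lambda name: sites[name][0] + carbon_weight * sites[name][1])

print(cheapest_site(sites, carbon_weight))   # "quincy" under these assumptions
```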

46 Some Research/Design Thoughts
Draw the right bounding box: it defines the problem you solve
Understand your workload: use only the hardware you need
Metrics reward and punish: choose carefully what you measure
Embrace component failure: hardware is cheap and readily recyclable
Machines and people do not mix well: consider sealing hardware at the factory
Engage multidisciplinary solutions: mechanical, electrical, economic, social, …
Culture shapes behavior: implicit versus explicit costs

47 End-to-End Perspective
Slide graphic labels: New Experiences, Holistic Design, Intelligent Management, End-to-End Trust, Global Policy Framework.
Securing the client+cloud platform: the client+cloud platform faces a variety of challenges. How can we give users the confidence to embrace client+cloud computing and enable wide-scale adoption?
• Claims-based identity: move from personal composite identity to claims-based identity for people, machines, and programs
• End-to-end trust: ensure that trust can be verified from the end device, to the user, to applications and services, to the cloud and back
• Active defense: move from passive defense (moats and walls) to active defense – detecting crimes as they happen, authenticating claims, auditing, and tracing cybercriminals
• Machine quarantine: be able to quarantine compromised machines, like a WHO for the internet
• Global policy framework: all of this needs to happen in an international framework that addresses cybercrime and requires all organizations to meet cyber-security standards
The secure client+cloud platform is fully realized when we add the concept of natural user interfaces (NUI).

48 Questions?
© 2010 Microsoft Corporation. All rights reserved. Microsoft, Windows, Windows Vista and other product names are or may be registered trademarks and/or trademarks in the U.S. and/or other countries. The information herein is for informational purposes only and represents the current view of Microsoft Corporation as of the date of this presentation. Because Microsoft must respond to changing market conditions, it should not be interpreted to be a commitment on the part of Microsoft, and Microsoft cannot guarantee the accuracy of any information provided after the date of this presentation. MICROSOFT MAKES NO WARRANTIES, EXPRESS, IMPLIED OR STATUTORY, AS TO THE INFORMATION IN THIS PRESENTATION.

