
1 The Stanford Platform Laboratory
John Ousterhout and Guru Parulkar, Stanford University
http://platformlab.stanford.edu/

2 Platform Lab Faculty
John Ousterhout (Faculty Director), Guru Parulkar (Executive Director), Mendel Rosenblum, Keith Winstein, Bill Dally, Phil Levis, Sachin Katti, Christos Kozyrakis, Nick McKeown

3 New Platforms Enable New Applications
1980s:
– Platform: relational database
– Applications: enterprise applications (e.g. ERP systems)
1990s:
– Platform: HTTP + HTML + JavaScript
– Applications: online commerce
2000s:
– Platform: GFS + MapReduce
– Applications: large-scale analytics
2010s:
– Platform: smartphones + GPS
– Applications: Uber and many others

4 What is a Platform?
A general-purpose substrate
– Makes it easier to build applications or higher-level platforms
– Solves significant problems
– Usually introduces some restrictions
Software and/or hardware
Example: the MapReduce computational model
– Simplifies construction of applications that use hundreds of servers to compute on large datasets
– Hides communication latency: data transferred in large blocks
– Automatically handles failures & slow servers
– Restrictions: two levels of computation, sequential data access
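To make the MapReduce example concrete, here is a minimal word-count sketch in Python, written for illustration only: it runs on one machine, whereas the real platform distributes the map and reduce phases across hundreds of servers and handles failures and stragglers for you.

```python
# Minimal sketch of the two-level MapReduce model: a map phase that scans data
# sequentially, and a reduce phase that combines values grouped by key.

from collections import defaultdict
from typing import Iterable

def map_phase(documents: Iterable[str]):
    """Map: emit (word, 1) for every word, reading each document sequentially."""
    for doc in documents:
        for word in doc.split():
            yield word, 1

def reduce_phase(pairs):
    """Reduce: group intermediate pairs by key and sum the counts."""
    groups = defaultdict(int)
    for word, count in pairs:
        groups[word] += count
    return dict(groups)

if __name__ == "__main__":
    docs = ["the quick brown fox", "the lazy dog", "the fox"]
    print(reduce_phase(map_phase(docs)))   # {'the': 3, 'quick': 1, 'brown': 1, ...}
```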

5 Platform Lab Vision
Platforms, Large Systems, Collaboration
Create the next generation of platforms to stimulate new classes of applications

6 Drivers for Next-Generation Platforms
Approaching physical limits
– Can we have layers of abstraction without giving up performance?
Heterogeneity and specialization
– General-purpose systems are fundamentally inefficient
– Can we find a small set of specialized components that are highly efficient and, taken together, provide a general-purpose set of abstractions?
Raising the floor of developer productivity
– How do we create abstractions that are extremely easy to use, while still providing high enough performance to meet applications' needs?
Scalability and elasticity
– How do we achieve high throughput and low latency with horizontal scaling?
– How do we achieve elasticity, for example from 1K to 1M users, without reimplementation?

7 Initial Focus Platform: Swarm Pod

8 Swarm Pod
[Diagram: swarms of devices (self-driving cars, drones, cell phones, the Internet of Things) connected through wired/wireless networks and switches/routers to a next-generation datacenter pod]

9 Changing Technologies and Requirements
[Same diagram as the previous slide, annotated with the changes driving the Swarm Pod:]
– More devices online
– Need for better visibility and control
– Large nonvolatile memories
– Low-latency interconnects
– Increasing core density
– Specialized components
– Collaboration between devices

10 Swarm Pod Research Topics
[Research topics overlaid on the same diagram:]
– Scalable control planes
– Low-latency software stack
– New memory/storage systems
– IX operating system
– RAMCloud storage system
– Programmable network fabrics
– Self-incentivizing networks

11 The Low-Latency Datacenter
Phase 1 of the datacenter revolution: scale
– How can one application harness thousands of servers?
– New platforms such as MapReduce and Spark
– But based on high-latency technologies: 1990s networking (300-500µs round trips), disks (10ms access time)
Phase 2 of the datacenter revolution: low latency
– New networking hardware: 5-10µs round trips today, 2-3µs in the future
– New nonvolatile memory technologies: storage access times < 10µs
– Low latency will enable new applications
How does low latency affect system architecture?

12 Eliminating Layers
Existing software stacks are highly layered
– Great for software structuring
– But layer crossings add latency
– Software latency has been hidden by slow networks and disks
Can't achieve low latency with today's stacks
– Death by a thousand cuts: no single place to optimize
– Networks: complex OS protocol stacks, marshaling/serialization costs
– Storage systems: OS file system overheads
Low-latency systems will require a new software stack
– Can layers be reduced without making systems unmanageable?
– Must eliminate layer crossings
– What are the new APIs?

13 The RAMCloud Storage System
A new class of storage for low-latency datacenters:
– All data in DRAM at all times
– Large scale: 1,000-10,000 servers
– Low latency: 5-10µs remote access
– Durability/availability equivalent to replicated disk
1000x improvements in latency and throughput relative to disk-based storage
Goal: enable new data-intensive applications
[Architecture diagram: 1,000-100,000 application servers (application + client library) connected over the datacenter network to 1,000-10,000 storage servers (master + backup), managed by a coordinator]
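For a feel of how an application might use such a store, here is a hypothetical client sketch in Python. The real RAMCloud client API is a C++ library and differs from this; the class and method names below are invented purely for illustration, and the "remote" tables are simulated locally.

```python
# Hypothetical client sketch (not the actual RAMCloud API): every read and write
# targets DRAM on a remote storage server, so each call is a ~5-10 microsecond
# remote operation rather than a millisecond-scale disk access.

class RamCloudClient:
    """Illustrative stand-in for a client session with the cluster coordinator."""

    def __init__(self, coordinator_locator: str):
        self.coordinator_locator = coordinator_locator
        self.tables = {}          # table name -> {key: value}; local stand-in for remote masters

    def create_table(self, name: str) -> None:
        self.tables.setdefault(name, {})

    def write(self, table: str, key: bytes, value: bytes) -> None:
        # In the real system the master also replicates the write to backups for durability.
        self.tables[table][key] = value

    def read(self, table: str, key: bytes) -> bytes:
        return self.tables[table][key]

client = RamCloudClient("coordinator-host:port")
client.create_table("profiles")
client.write("profiles", b"user:42", b'{"name": "Ada"}')
print(client.read("profiles", b"user:42"))
```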

14 New RAMCloud Projects
New software stack layers for the low-latency datacenter:
New remote procedure call (RPC) system
– Homa: a new transport protocol with receiver-managed flow and congestion control that minimizes buffering (sketched below)
– Microsecond-scale latency
– 1M connections/server
New thread scheduling mechanism
– Threads scheduled by the application, not the OS
– The OS allocates cores to applications and manages competing apps
– The same mechanism extends to VMMs: the hypervisor allocates cores to guest OSes
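The receiver-managed idea can be sketched roughly as follows. This is a deliberately simplified model, not the Homa protocol itself: the grant size is an invented illustrative value, and the real protocol also deals with unscheduled bytes, priority levels, and overcommitment.

```python
# Simplified model of receiver-managed scheduling: the receiver decides, grant by
# grant, how many bytes each sender may transmit, favoring the message with the
# fewest remaining bytes. This bounds buffering and lets short messages finish fast.

import heapq

GRANT_SIZE = 10_000   # bytes granted per scheduling decision (illustrative value)

def schedule(messages):
    """Return the order of grants issued by the receiver for the given messages."""
    heap = [(length, name) for name, length in messages.items()]
    heapq.heapify(heap)
    grants = []
    while heap:
        remaining, name = heapq.heappop(heap)   # message with fewest remaining bytes
        grants.append((name, min(GRANT_SIZE, remaining)))
        if remaining > GRANT_SIZE:
            heapq.heappush(heap, (remaining - GRANT_SIZE, name))
    return grants

print(schedule({"A": 25_000, "B": 8_000}))
# [('B', 8000), ('A', 10000), ('A', 10000), ('A', 5000)]: B is fully granted before A
```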

15 Reimagining Memory and Storage
New nonvolatile memories are coming soon
Example: Intel/Micron 3D XPoint devices:
– 1-10µs access time?
– 10 TB capacity?
– DIMM form factor: directly addressable
What are the right abstractions for shared storage?
– Files have high overheads for OS lookups and protection checks
– Does paging make sense again?
– A single-level store?
Relationship between data and computation:
– Move data to computation, or vice versa?

16 Hollowing Out of the OS
[Diagram: functionality moving out of the operating system into the hypervisor and the application: device drivers, kernel-bypass networking, direct storage access, thread scheduling, physical memory management]
Does a radical OS redesign/simplification make sense?

17 Next logical step in SDN: Take programmability all the way down to the wire

18 Status quo
[Diagram: switch OS with a run-time API and driver on top of a fixed-function ASIC; the ASIC dictates to the software, "This is roughly how I process packets …"]
– Prone to bugs
– Very long and unpredictable lead time

19 Turning the tables
[Diagram: the same switch OS, run-time API, and driver on top of a PISA device (Protocol-Independent Switch Architecture); now the software tells the hardware, in P4, "This is precisely how you must process packets"]

20 P4 and PISA
[Diagram: P4 code is compiled onto the target: a programmable parser feeding a pipeline of match-action stages (L2, IPv4, IPv6, and ACL tables, each pairing a match table with an action macro) and the queues]
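The Python sketch below models, loosely and in software, what the compiled pipeline does: a parser extracts header fields, then each stage matches on a field and applies an action installed through the run-time API. This is not P4 code; the table contents, field names, and the crude prefix match are invented for illustration.

```python
# Toy software model of a parse-then-match-action pipeline. The control plane
# would populate the tables at run time; here they are hard-coded.

def parse(packet: dict) -> dict:
    """Programmable-parser stand-in: extract the fields later stages match on."""
    return {"dst_mac": packet.get("dst_mac"), "dst_ip": packet.get("dst_ip")}

# Table entries map a match key to an action; actions here are plain functions.
l2_table = {"aa:bb:cc:dd:ee:01": lambda pkt: pkt.update(egress_port=1)}
ipv4_table = {"10.0.0.0/24": lambda pkt: pkt.update(next_hop="10.0.0.254")}

def pipeline(packet: dict) -> dict:
    headers = parse(packet)
    if headers["dst_mac"] in l2_table:                    # L2 match-action stage
        l2_table[headers["dst_mac"]](packet)
    for prefix, action in ipv4_table.items():             # IPv4 stage
        network = prefix.split("/")[0].rsplit(".", 1)[0]  # crude /24 match; real LPM omitted
        if headers["dst_ip"] and headers["dst_ip"].startswith(network + "."):
            action(packet)
    return packet

print(pipeline({"dst_mac": "aa:bb:cc:dd:ee:01", "dst_ip": "10.0.0.5"}))
```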

21 Current Research Projects
1. P4 as a front end to configure OVS (with Ben Pfaff and Princeton)
– Approach 1: statically compile the P4 program to replace parsing and matching in OVS
– Approach 2: compile P4 to eBPF and dynamically load it into the kernel
– Early results suggest no performance penalty for programmability; in some cases it is faster
2. Domino: a higher-level language (with MIT)
– C-like, run-to-completion, includes stateful processing; the compiler generates P4 code
3. PIFO: a hardware abstraction for programmable packet scheduling algorithms (sketched below)
4. xFabric: calculating flow rates based on a programmer's utility function
5. PERC: fast congestion control by proactive, direct calculation of flow rates in the forwarding plane
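As referenced in project 3, a PIFO (push-in first-out queue) admits packets at arbitrary positions determined by a programmable rank but dequeues only from the head. The toy software model below captures that behavior; the real abstraction is a hardware primitive.

```python
# Toy PIFO: push at any rank, always pop the smallest rank. Different scheduling
# algorithms are expressed purely by how the rank is computed.

import heapq
import itertools

class PIFO:
    def __init__(self):
        self._heap = []
        self._seq = itertools.count()   # tie-breaker keeps FIFO order for equal ranks

    def push(self, packet, rank):
        heapq.heappush(self._heap, (rank, next(self._seq), packet))

    def pop(self):
        rank, _, packet = heapq.heappop(self._heap)
        return packet

q = PIFO()
q.push("pkt-A", rank=5)      # rank could be a virtual finish time (WFQ) or a deadline (EDF)
q.push("pkt-B", rank=2)
q.push("pkt-C", rank=5)
print(q.pop(), q.pop(), q.pop())   # pkt-B pkt-A pkt-C
```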

22 xFabric: Programmable Datacenter Fabric
Applications declare their resource preferences
– e.g. lowest latency, bandwidth allocation
Network operators declare their resource usage policies
The challenge is to automate optimal resource allocation for diverse applications at datacenter scale
As a platform, xFabric will ensure optimal resource allocations that meet application requirements while respecting operator policies (see the sketch below)
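As a concrete but heavily simplified example of the kind of rate computation involved, the sketch below allocates max-min fair rates on a single link by water-filling. The actual xFabric and PERC work computes rates from operator- and programmer-specified utility functions across an entire fabric, so treat this only as an illustration of direct rate calculation.

```python
# Water-filling on one link: repeatedly hand every unsatisfied flow an equal share
# of the remaining capacity. Rates are computed directly instead of being
# discovered through congestion signals.

def max_min_rates(capacity: float, demands: dict) -> dict:
    rates = {f: 0.0 for f in demands}
    active = set(demands)
    remaining = capacity
    while active and remaining > 1e-9:
        share = remaining / len(active)
        satisfied = set()
        for f in active:
            give = min(share, demands[f] - rates[f])
            rates[f] += give
            remaining -= give
            if rates[f] >= demands[f] - 1e-9:
                satisfied.add(f)
        if not satisfied:          # every remaining flow absorbed a full share; done
            break
        active -= satisfied
    return rates

print(max_min_rates(10.0, {"flowA": 2.0, "flowB": 8.0, "flowC": 8.0}))
# flowA gets its full 2.0; flowB and flowC split the remaining 8.0 equally (4.0 each)
```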

23 Scalable Control Plane: Generalized and Customizable
Separation of the control plane is a common trend across networks and systems
– SDN, storage systems, the MapReduce scheduler, …
Control plane design presents challenges
– Scale, throughput and latency metrics, abstractions that are easy to use
We have been building control planes for specific systems
– ONOS, SoftRAN, and the RAMCloud coordinator
Can we design a generalized scalable control plane with a common foundation that can be customized for different contexts?
Goal: design a new platform that makes it significantly easier to develop diverse control planes with the required functionality and performance

24 Key Performance Requirements
Control apps operate on a global network/system view and state
– High volume of state: ~500GB-2TB
– High throughput: ~500K-20M ops/second, ~100M state ops/second
– Low latency to events: 1-10s of ms
A distributed platform is required to meet these metrics; a difficult challenge!
High throughput | Low latency | Consistency | High availability

25 Generalized and Customizable Scalable Control Plane
Northbound abstractions/APIs (C/C++, declarative programming, REST); strongly consistent, transactional semantics?
– Interface to apps; provides different APIs; customized for the context
Distributed core: a cluster of servers, 10-100Gbps, low-latency RPC, distributed state-management primitives
– Distributed and context-independent
Southbound abstraction: plug-ins for different contexts (OpenFlow/NetConf, RAN protocol?, RPC)
– Interface to the data plane (switches, eNBs, servers, storage)
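A minimal sketch of the generalize-then-customize structure on this slide, with every class and method name invented for illustration: a context-independent core exposes state-management primitives, while northbound and southbound plug-ins adapt it to a particular context (SDN, RAN, storage). The real core would be replicated and distributed rather than a single dictionary.

```python
# Illustrative structure only: context-independent core + pluggable edges.

class DistributedCore:
    """Stand-in for the distributed, context-independent state store."""
    def __init__(self):
        self._state = {}                      # would be partitioned/replicated in practice
    def put(self, key, value): self._state[key] = value
    def get(self, key): return self._state.get(key)

class SouthboundPlugin:
    """Interface to one kind of data plane (e.g. OpenFlow switches)."""
    def collect(self):                        # report data-plane state into the core
        return {"switch:1/port:2": "up"}
    def apply(self, directive):               # push a decision back to the data plane
        print("programming data plane:", directive)

class ControlPlane:
    def __init__(self, core, southbound):
        self.core, self.southbound = core, southbound
    def sync(self):
        for key, value in self.southbound.collect().items():
            self.core.put(key, value)
    def northbound_api(self, key):            # what a control app would call
        return self.core.get(key)

cp = ControlPlane(DistributedCore(), SouthboundPlugin())
cp.sync()
print(cp.northbound_api("switch:1/port:2"))   # 'up'
```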

26 Platform Lab Vision
Platforms, Large Systems, Collaboration
New platforms enable new applications

27 Large Systems
Why universities should do large systems projects:
– Companies don't have time to evaluate and find the best approach
– Universities can lead the market
– They produce better graduates
Goal for the Platform Lab:
– Create an environment where large systems projects flourish

28 Collaboration
The convergence of computing, storage, and networking is very important for future infrastructure
The Swarm Pod and its target applications require expertise in many systems areas
The Platform Lab has brought together professors and their students with expertise in different systems areas to collaborate on the challenges at this convergence

29 Expected Results
It is difficult to know the specifics at this point, but our expectations and history suggest the Platform Lab will lead to:
– Influential ideas and architectural directions
– Real systems or platforms with communities of users
– Graduates with strong systems skill sets
– Impact on the practice of computing and networking
– Commercial impact through ideas, open-source systems, and startups
Across several areas of systems: hardware & software; computing, networking, and storage; different layers of the system; application domains; …

30 Engagement Model
Regular Members – Event-Based Interactions
– Regular reviews and retreats
– Early access to results
– Access to faculty and students
Premium Members – Active Collaboration
– Work together with committed engineers/researchers
– Be part of the architecture, design, implementation, and evaluation of platforms
– Company staff participate in regular meetings, including weekly meetings

31 Questions? Reactions?

32 Thank You!

33 Example Target Platforms
– Low-latency datacenter (Dally, Katti, Kozyrakis, Levis, Ousterhout)
– RAMCloud (Ousterhout, Rosenblum)
– Scalable control planes (Katti, Ousterhout, Parulkar)
– Programmable network fabrics (Katti, Levis, McKeown, Ousterhout, Parulkar)
– New memory/storage systems for the 21st century (Dally, Kozyrakis, Levis)
– Cloud query planner (Winstein, Levis)
– Self-incentivizing networks (Winstein, ??)

34 21st Century Abstractions for Memory and Storage
Memory abstractions and storage hierarchies are obsolete for today's workloads and technologies
– The old assumptions (memory is limited, temporal locality, moving data to computation is efficient) no longer hold for many of today's apps and environments
Goals: revisit memory/storage abstractions and implementations
– Heterogeneous: combinations of DRAM and SCM
– Aggressive memory sharing among apps across a server, cluster, and datacenter
– Support for QoS, near-data processing, and security

35 21st Century Abstractions for Memory and Storage
Proposed design ideas:
– A single-level store based on objects or segments that spans apps, memory technologies, and servers
– Objects have logical attributes: persistence, indexing, …
– Objects have physical attributes: encryption, replication requirements, …
– Apps/users specify logical attributes; compilers and runtime systems manage the mapping and perform background optimizations (see the sketch below)
Develop hardware & software platforms for a single-level store:
– Efficient hardware structures for fast access
– Compiler & system software for efficient mapping within and across servers
– APIs and storage-representation schemes
– Security and privacy support
– Cluster-wide management and optimization
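A toy rendering of the logical/physical split described above, with hypothetical names throughout: the application states logical attributes, and a stand-in runtime policy decides the physical placement.

```python
# Illustrative object model: apps declare what they need (logical attributes),
# the runtime decides where and how data lives (physical attributes).

from dataclasses import dataclass, field

@dataclass
class LogicalAttributes:
    persistent: bool = False          # must survive crashes
    indexed: bool = False             # needs secondary-index lookups

@dataclass
class PhysicalAttributes:
    tier: str = "DRAM"                # e.g. DRAM, SCM, remote server
    replicas: int = 1
    encrypted: bool = False

@dataclass
class StoreObject:
    key: str
    value: bytes
    logical: LogicalAttributes = field(default_factory=LogicalAttributes)
    physical: PhysicalAttributes = field(default_factory=PhysicalAttributes)

def place(obj: StoreObject) -> StoreObject:
    """Toy placement policy: the runtime, not the app, picks physical attributes."""
    if obj.logical.persistent:
        obj.physical = PhysicalAttributes(tier="SCM", replicas=3)
    return obj

profile = place(StoreObject("user:42", b"...", LogicalAttributes(persistent=True)))
print(profile.physical)   # PhysicalAttributes(tier='SCM', replicas=3, encrypted=False)
```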

36 Logically Centralized Control Plane
– Provides a global network view
– Makes it easy to program control, management, and configuration apps
– Enables new apps

37 Scalable Control Plane: Perfect Platform for the Laboratory
Requires overcoming all of the fundamental challenges identified earlier:
Physical limits
– To deliver on performance
Heterogeneity and specialization
– Target environments are diverse, from hardware to apps
Scalability and elasticity
– Most control plane scenarios need scalability and elasticity
Raising the floor of developer productivity
– Typically devops/netops people write apps for the controllers; programming abstractions have to suit them

38 Target Platforms: Low Latency Datacenter

39 Evolution of Datacenters
Phase 1: manage scale
– 10,000-100,000 servers within a 50m radius
– 1 PB of DRAM
– 100 PB of disk storage
– Challenge: how can one application harness thousands of servers? Answer: MapReduce, etc.
But communication latency was high:
– 300-500µs round-trip times
– Must process data sequentially to hide latency (e.g. MapReduce)
– Interactive applications limited in functionality

40 Why Does Latency Matter?
Large-scale apps struggle with high latency
– Random-access data rate has not scaled!
– Facebook: can only make 100-150 internal requests per page
[Diagram: a traditional application (UI, app logic, data structures) on a single machine sees << 1µs latency; a web application split across application servers and storage servers in a datacenter sees 0.5-10ms latency]
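To put the Facebook figure in perspective (a back-of-the-envelope calculation using only the numbers on this slide): at 0.5ms per storage round trip, 150 sequential requests already consume roughly 75ms of a page's latency budget, whereas at 5µs per round trip the same 150 requests cost about 0.75ms, leaving headroom for orders of magnitude more dependent accesses per page.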

41 Goal: Scale and Latency
Enable a new class of applications:
– Large-scale graph algorithms (machine learning?)
– Collaboration at scale?
[Diagram: same comparison as the previous slide, but with datacenter storage access reduced from 0.5-10ms to 5-10µs]

42 Large-Scale Collaboration
The "region of consciousness": the data one application touches on behalf of one user
– Gmail: email for one user
– Facebook: 50-500 friends
– Morning commute: 10,000-100,000 cars

43 Low-Latency Datacenter
Goal: build new hardware and software infrastructure that operates at microsecond-scale latencies
Build on the RAMCloud RPC implementation:
– Reduce software overhead down from 2µs
– Support throughput as well as latency
– Reduce per-connection state to support 1M connections/server in the future

44 Target Platforms: RAMCloud

45 RAMCloud
A storage system for low-latency datacenters:
– General-purpose
– All data always in DRAM (not a cache)
– Durable and available
– Scale: 1,000+ servers, 100+ TB
– Low latency: 5-10µs remote access

46 RAMCloud: Distributed Storage with Low Latency
[Architecture diagram: 1,000-100,000 application servers (application + client library) connect over the datacenter network to 1,000-10,000 storage servers (master + backup), managed by a coordinator with a standby coordinator and external storage (ZooKeeper)]
– High-speed networking: 5µs round trip, full bisection bandwidth
– Commodity servers, 64-256GB per server
Build higher-level abstractions for ease of use while preserving or improving performance

47 RAMCloud Performance
Using Infiniband networking (24Gb/s, kernel bypass); other networking is also supported, but slower
Reads:
– 100B objects: 4.7µs
– 10KB objects: 10µs
– Single-server throughput (100B objects): 900 Kops/sec
– Small-object multi-reads: 2M objects/sec
Durable writes:
– 100B objects: 13.5µs
– 10KB objects: 35µs
– Small-object multi-writes: 400-500K objects/sec

48 RAMCloud Next Steps
Support higher-level features/abstractions:
– Secondary indexes
– Multi-object transactions
– Graph operations
Without compromising scale and latency (as much as possible)

