SEDCL: Stanford Experimental Data Center Laboratory.

Tackle Data Center Scaling Challenges with Stanford’s research depth and breadth

Data Center Scaling
Networks of data centers and web services are the key building blocks of future computing.
Factors contributing to data center scaling challenges:
– Explosive growth of data with no locality of any kind
– Legal requirements to back up data in geographically separated locations, a big concern for the financial industry
– Emergence of mobile and cloud computing
– Massive “interactive” web applications
– Energy as a major new factor and constraint
– Increasing capex and opex pressures
Continued innovation is critical to sustaining growth.

Stanford Research Themes
RAMCloud: main-memory-based persistent storage
– Extremely low-latency RPC
Networking:
– Large, high-bandwidth, low-latency network fabric
– Scalable, error-free packet transport
– Software-defined data center networking with OpenFlow
Servers and computing:
– Error- and failure-resilient design
– Energy-aware and energy-proportional design
– Virtualization and mobile VMs

Major research topics of SEDCL
RAMCloud: scalable DRAM-based storage
– Scalable nvRAM
– All data in DRAM all the time
Interconnect fabric
– Bufferless networks: low-latency, high-bandwidth network
Packet transport
– Reliable delivery of packets: R2D2 (L2.5)
– Congestion management: QCN (IEEE 802.1Qau), ECN-HAT, DCTCP
– Programmable bandwidth partitioning for multi-tenanted DCs: AF-QCN
– Low-latency 10GBASE-T
Related projects
– OpenFlow
– Energy-aware and energy-proportional design

Experimentation is Key to Success
Many promising ideas and technologies
– Will need iterative evaluation at scale with real applications
Interactions of subsystems and mechanisms are not clear
– Experimentation is the best way to understand the interactions
Difficult to experiment with the internal mechanisms of a DC
– No experimental facilities exist, and that is a big barrier to innovation
Ongoing efforts to enable experimentation
– Facebook, Microsoft, NEC, Yahoo!, Google, Cisco, Intel, …

Overview of Research Projects
RAMCloud
Packet transport mechanisms
– Rapid and reliable data delivery: R2D2 (L2.5)
– ECN-HAT, DCTCP: collaboration with Microsoft
Data center switching fabric
– Extremely low latency, low errors and congestion (bufferless)
– High port density with very large bisection bandwidth
→ Project just initiated

RAMCloud Overview
Lead: John Ousterhout
Storage for datacenters
– Commodity servers
– 64 GB DRAM/server
All data always in RAM
Durable and available
Low-latency access: 5 µs RPC
High throughput: 1M ops/sec/server
[Figure: application servers and storage servers in a datacenter]
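The latency and throughput targets on this slide can be related with Little's law (average number of requests in flight = arrival rate × time each request spends in the system). A back-of-the-envelope sketch, assuming a server running at its full 1M ops/sec and each request occupying the full 5 µs RPC time:

```latex
L = \lambda W = \left(10^{6}\ \tfrac{\text{ops}}{\text{s}}\right) \times \left(5 \times 10^{-6}\ \text{s}\right) = 5
```

Under these assumptions, each storage server would have only about five requests in flight at any moment on average.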

RAMCloud Research Issues
Data durability and availability
Low-latency RPC: 5 microseconds
– Need suitable network!
Data model
Concurrency/consistency model
Data distribution, scaling
Automated management
Multi-tenancy
Client-server functional distribution

Layer 2.5: Motivation and use cases

L2.5 Research Issues
Determine a simple signaling method
– Simplify (or get rid of) headers/tags for L2.5 encapsulation
Develop and refine the basic algorithm for TCP
– In the kernel
– In hardware (NICs)
Develop the algorithm for storage (FC, FCoE)
Deploy in a large testbed
Collaborate on standardization

DCTCP
DCTCP: TCP for data centers
– Operates with really small buffers
– Optimized for low latency
– Uses ECN marking
With Mohammad Alizadeh and with Greenberg et al. at Microsoft
Influenced by ECN-HAT (with Abdul Kabbani)
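The ECN marking that DCTCP relies on happens at the switch. As a minimal sketch, assuming a single FIFO output queue whose length is counted in packets, the marking rule from the DCTCP paper (set the Congestion Experienced bit on an arriving packet when the instantaneous queue occupancy exceeds a threshold K) could look like this. `EcnQueue`, `Packet`, and `k_packets` are hypothetical names chosen for illustration.

```python
from collections import deque
from dataclasses import dataclass
from typing import Optional


@dataclass
class Packet:
    seq: int
    ce: bool = False  # ECN Congestion Experienced bit


class EcnQueue:
    """Hypothetical FIFO output queue with DCTCP-style threshold marking."""

    def __init__(self, k_packets: int, capacity: int):
        self.k = k_packets        # marking threshold K, in packets
        self.capacity = capacity  # total buffer size, in packets
        self.queue = deque()

    def enqueue(self, pkt: Packet) -> bool:
        """Queue a packet; return False if it is dropped because the buffer is full."""
        if len(self.queue) >= self.capacity:
            return False
        # Mark on the instantaneous occupancy seen at arrival, not an average.
        if len(self.queue) > self.k:
            pkt.ce = True
        self.queue.append(pkt)
        return True

    def dequeue(self) -> Optional[Packet]:
        """Remove and return the head-of-line packet, or None if the queue is empty."""
        return self.queue.popleft() if self.queue else None
```

Marking on the instantaneous queue, rather than on an averaged queue as RED does, is what lets senders react early while the queue is still short.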

DCTCP: Transport Optimized for Data Centers
1. High throughput
– Creating multi-bit feedback at TCP sources
2. Low latency (milliseconds matter)
– Small buffer occupancies due to early and aggressive ECN marking
3. Burst tolerance
– Sources react before packets are dropped
– Large buffer headroom for bursts
Approach:
1. Use the full information in the stream of ECN marks
2. Adapt quickly and in proportion to the level of congestion
[Figure: packet buffer with marking threshold K (Mark / Don’t Mark regions)]
[Table: ECN marks vs. window reduction; TCP cuts its window by 50% in both example cases, while DCTCP cuts it by 40% in one case and by 5% in the other]
[Figure: DCTCP reduces variability and reduces queuing; panels: Incast, Queue buildup]
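The two ideas above (use every ECN mark, and cut the window in proportion to congestion) can be sketched on the sender side. This is a minimal sketch assuming the update rules from the DCTCP paper: once per window of data, the sender updates α ← (1 − g)·α + g·F, where F is the fraction of packets in that window that were marked, and when marks were seen it sets cwnd ← cwnd·(1 − α/2). The class and function names are hypothetical, window growth and per-ACK bookkeeping are omitted, and the mark patterns used to exercise it are illustrative rather than recovered from the slide.

```python
from dataclasses import dataclass
from typing import Sequence


def tcp_cut_fraction(marks: Sequence[bool]) -> float:
    """Standard ECN-enabled TCP: any mark in a window halves cwnd."""
    return 0.5 if any(marks) else 0.0


@dataclass
class DctcpSender:
    """Minimal sketch of a DCTCP sender's reaction to ECN marks (growth omitted)."""

    cwnd: float          # congestion window, in packets
    g: float = 1 / 16    # EWMA gain; 1/16 is the value suggested in the DCTCP paper
    alpha: float = 0.0   # running estimate of the fraction of marked packets

    def on_window_acked(self, marks: Sequence[bool]) -> float:
        """Process one window of echoed marks; return the fraction of cwnd removed."""
        if not marks:
            return 0.0
        frac_marked = sum(marks) / len(marks)
        # alpha <- (1 - g) * alpha + g * F, updated once per window of data
        self.alpha = (1 - self.g) * self.alpha + self.g * frac_marked
        if frac_marked == 0:
            return 0.0  # no congestion signal in this window, so no cut
        # Cut in proportion to the estimated extent of congestion.
        cut = self.alpha / 2
        self.cwnd *= 1 - cut
        return cut
```

With g = 1 the estimate α equals the latest window's marked fraction, so a window with 8 of 10 packets marked yields a 40% cut and one with 1 of 10 marked yields a 5% cut, matching the percentages on the slide, while `tcp_cut_fraction` returns 0.5 for both. With the default g = 1/16, α instead converges toward the marked fraction over several windows, smoothing out short bursts of marks.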

Research Themes and Teams
Networking Virtualization: Server and network Energy Aware M. Rosenblum B. Prabhakar P. Levis C. Kozyrakis WEB App Framework N. McKeown B. Prabhakar G. Parulkar J. Ousterhout N. McKeown Resilient Systems M. Rosenblum S. Mitra N. McKeown Storage J. Ousterhout M. Rosenblum D. Mazières