Copyright © 2005 Department of Computer Science 1 Solving the TCP-incast Problem with Application-Level Scheduling Maxim Podlesny, University of Waterloo.

Motivation
Emerging IT paradigms
– Data centers, grid computing, HPC, multi-core
– Cluster-based storage systems, SAN, NAS
– Large-scale data management "in the cloud"
– Data manipulation via "services-oriented computing"
Cost and efficiency advantages from IT trends, economy of scale, specialization marketplace
Performance advantages from parallelism
– Partition/aggregation, MapReduce, BigTable, Hadoop
– Think RAID at Internet scale! (1000x)

Problem Statement
High-speed, low-latency network (RTT ≤ 0.1 ms)
Highly-multiplexed link (e.g., 1000 flows)
Highly-synchronized flows on bottleneck link
Limited switch buffer size (e.g., 32 KB)
TCP retransmission timeouts
TCP throughput degradation
How to provide high goodput for data center applications?

Related Work
E. Krevat et al., "On Application-based Approaches to Avoiding TCP Throughput Collapse in Cluster-based Storage Systems", Proceedings of SuperComputing 2007
A. Phanishayee et al., "Measurement and Analysis of TCP Throughput Collapse in Cluster-based Storage Systems", Proceedings of FAST 2008
Y. Chen et al., "Understanding TCP Incast Throughput Collapse in Datacenter Networks", WREN 2009
V. Vasudevan et al., "Safe and Effective Fine-grained TCP Retransmissions for Datacenter Communication", Proceedings of ACM SIGCOMM 2009
M. Alizadeh et al., "Data Center TCP", Proceedings of ACM SIGCOMM 2010
A. Shpiner et al., "A Switch-based Approach to Throughput Collapse and Starvation in Data Centers", IWQoS 2010

Summary of Related Work
Data centers have specific network characteristics
TCP-incast throughput collapse problem emerges
Possible solutions:
– Tweak TCP timers and/or parameters for this environment
– Redesign (or replace!) TCP in this environment
– Rewrite applications for this environment (Facebook)
– Increase switch buffer sizes (extra queueing delay!)
– Smart edge coordination for uploads/downloads

Data Center System Model
[Figure: N servers (1, 2, 3, ..., N) connected through a switch to a client; small switch buffer B, link capacity C, packet size S_DATA]
Logical data block (S) (e.g., 1 MB)
Server Request Unit (SRU) (e.g., 32 KB)

Performance Comparisons
Internet vs. data center network:
– Internet propagation delay: 10-100 ms
– Data center propagation delay: 0.1 ms
– Packet size 1 KB, link capacity 1 Gbps -> packet transmission time is about 0.01 ms
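The transmission-time figure on this slide can be checked with a few lines of arithmetic. The sketch below assumes 1 KB means 1000 bytes and 1 Gbps means 10^9 bits per second; the function name is illustrative only. Under those assumptions the result is 0.008 ms, which the slide rounds to 0.01 ms.

```python
def transmission_time_ms(packet_bytes, link_bps):
    """Time to serialize one packet onto the link, in milliseconds."""
    return packet_bytes * 8 / link_bps * 1000.0


packet_bytes = 1000        # 1 KB packet, taken as 1000 bytes (assumption)
link_bps = 1_000_000_000   # 1 Gbps link

t_tx = transmission_time_ms(packet_bytes, link_bps)
print(f"packet transmission time: {t_tx:.3f} ms")

# Express each propagation delay from the slide in units of packet times.
for prop_ms in (0.1, 10.0, 100.0):
    print(f"propagation delay {prop_ms} ms = {prop_ms / t_tx:.0f} packet times")
```

In the data center case the propagation delay is only about a dozen packet times, so a small window per flow already fills the path, whereas on the Internet the same link would need thousands of packets in flight.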

Analysis Overview (1 of 2)
Determine maximum TCP flow concurrency (n) that can be supported without any packet loss
Arrange the servers into k groups of (at most) n servers each, by staggering the group scheduling

Analysis Overview (2 of 2)
Determine maximum TCP flow concurrency (n) that can be supported without any packet loss
– Determine flow size in packets (based on SRU and MSS)
– Determine maximum outstanding packets per flow (W_max)
– Determine max flow concurrency (based on B and W_max)
Arrange the servers into k groups of (at most) n servers each, by staggering the group scheduling
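Two of these steps are plain arithmetic and can be sketched directly: the flow size in packets is the SRU size divided by the MSS, rounded up, and N servers in groups of at most n give k = ceil(N / n) groups. The slides do not preserve the expression that derives n from B and W_max, so n is taken as an input here. Function names and the sample values (other than the 32 KB SRU and 1 KB packet) are hypothetical.

```python
def flow_size_packets(sru_bytes, mss_bytes):
    """Number of packets needed to carry one SRU (ceiling division)."""
    return (sru_bytes + mss_bytes - 1) // mss_bytes


def group_count(num_servers, max_concurrency):
    """k: number of groups when each group holds at most n servers."""
    return (num_servers + max_concurrency - 1) // max_concurrency


def group_members(num_servers, max_concurrency):
    """Split servers 1..N into consecutive groups of at most n servers."""
    servers = list(range(1, num_servers + 1))
    return [servers[i:i + max_concurrency]
            for i in range(0, num_servers, max_concurrency)]


# 32 KB SRU and 1 KB packets, as in the system model; N and n are made up.
print(flow_size_packets(32 * 1024, 1024))
print(group_count(10, 4))
print(group_members(10, 4))
```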

Determining W_max
Recall TCP slow start dynamics:
– Initial TCP congestion window (cwnd) is 1 packet
– Acks cause cwnd to double every RTT (1, 2, 4, 8, 16…)
Consider TCP transfer of an arbitrary SRU (e.g., 21 packets)
Determine peak power-of-2 cwnd value (W_A)
Determine "residual window" for the last RTT (W_B)
W_max depends on both W_A and W_B (e.g., W_A + W_B/2)
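The slow-start bookkeeping can be sketched as follows, assuming an idealized loss-free slow start with an initial window of 1 packet that doubles each RTT. For the 21-packet example, the full windows 1 + 2 + 4 + 8 carry 15 packets, so W_A = 8 and the residual window is W_B = 6. The final combination reads the slide's example as W_A + (W_B / 2); that reading is an interpretation of the transcript, not a confirmed formula, and the function name is hypothetical.

```python
def slow_start_windows(flow_packets):
    """Return (W_A, W_B) for an idealized slow start.

    W_A is the largest full power-of-2 window sent, and W_B is the number
    of packets left over for the final, partially filled RTT.
    """
    if flow_packets < 1:
        raise ValueError("flow must contain at least one packet")
    cwnd, sent, peak = 1, 0, 0
    while sent + cwnd <= flow_packets:
        sent += cwnd      # a full window goes out in this RTT
        peak = cwnd
        cwnd *= 2         # acks double the window for the next RTT
    return peak, flow_packets - sent


w_a, w_b = slow_start_windows(21)
# Example combination from the slide, read as W_A + W_B / 2 (assumption).
w_max_example = w_a + w_b / 2
print(w_a, w_b, w_max_example)
```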

Scheduling Details
Using lossless scheduling of server responses: maximum n servers responding simultaneously, with k groups of responding servers scheduled
Server i (1 <= i <= N) starts responding at: [formula not preserved in the transcript]
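The start-time formula itself is not preserved in this transcript. As an illustration only, one simple way to stagger groups is to place server i in group floor((i - 1) / n) and delay each group by a fixed slot length. Both the grouping rule and the `slot` parameter are assumptions made for this sketch, not the author's expression.

```python
def start_time(i, n, slot):
    """Start offset for server i (1-based) under an assumed staggering rule.

    Servers are taken in groups of n; each group starts one slot after the
    previous one. The slot length is a hypothetical parameter, for instance
    an estimate of the SRU completion time.
    """
    if i < 1 or n < 1:
        raise ValueError("i and n must be positive")
    group = (i - 1) // n
    return group * slot


# 10 servers, at most 4 responding at once, 0.5 time units per group slot.
print([start_time(i, 4, 0.5) for i in range(1, 11)])
```

With this rule, at most n servers share any start time, which matches the constraint that no more than n servers respond simultaneously as long as each group finishes within its slot.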

Theoretical Results
Maximum goodput of an application in a data center with lossless scheduling is: [formula not preserved in the transcript]
where:
S - size of a logical data block
T - actual completion time of an SRU
[symbol not preserved] - SRU completion time used for scheduling
k - how many groups of servers to use
d_max - real system scheduling variance

Summary and Conclusion
Application-level scheduling for TCP-incast throughput collapse
Main idea: scheduling responses of servers so that there are no losses
Maximum goodput with lossless scheduling
Non-monotonic goodput, highly-sensitive to network configuration parameters
