Fine-Grained Latency and Loss Measurements in the Presence of Reordering Myungjin Lee, Sharon Goldberg, Ramana Rao Kompella, George Varghese.


1 Fine-Grained Latency and Loss Measurements in the Presence of Reordering Myungjin Lee, Sharon Goldberg, Ramana Rao Kompella, George Varghese

2 Trend toward low-latency networks
• Low latency is one of the important metrics in network design
• Switch vendors are introducing switches that provide low latency
• Financial data centers are beginning to demand more stringent latency guarantees

3 Benefits of low-latency networks
• An automated trading program can buy shares more cheaply
• A cluster application can run thousands more instructions
[Figure: a financial service provider network connecting a content provider and a brokerage, with the claim "Our network provides an end-to-end latency SLA of a few μs"]

4 But…
• Guaranteeing low latency in data centers is hard
  - Congestion must be kept below a certain level
• Reason 1: there are no traffic models for different applications
  - This prevents managers from predicting which applications will cause problems
• Reason 2: a new application's behavior is often unforeseen until it is actually deployed
  - E.g., the TCP incast problem [SIGCOMM '09]

5 Latency and loss measurements are crucial
• Latency and loss must be measured on a continuous basis
  - To detect problems
  - To fix them: re-routing offending applications, upgrading links, etc.
• Goal: provide fine-grained, end-to-end, aggregate latency and loss measurements in data center environments
[Figure: end-to-end latency and loss measurements between Router A (content provider side) and Router B (brokerage side)]

6 Measurement model
• Packets can be delivered out of order because of multiple paths
• Packet filtering identifies the packet stream between A and B
• Time synchronization between A and B: IEEE 1588, GPS clocks, etc.
• No header changes: regular packets carry no timestamps
[Figure: Router A (content provider side) and Router B (brokerage side) connected through the financial service provider network by multiple paths, causing out-of-order delivery; a filter selects the measured packet stream]

7 Measurement model
• Interval message: a special "sync" control packet that marks off a measurement interval
  - Injected by measurement modules at an edge (e.g., Router A)
• Measurement interval: the set of packets "bookended" by a pair of interval messages
[Figure: Router A injects interval messages into the stream toward Router B; the packets between a pair of interval messages form one measurement interval]
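The following minimal Python sketch shows how a receiver might group its packet stream into measurement intervals. It rests on a hypothetical simplification: the stream is a list of packet timestamps interleaved with a `SYNC` marker that stands in for an interval message. Real interval messages are control packets, not strings.

```python
from typing import Iterable, List, Optional, Union

SYNC = "SYNC"  # hypothetical stand-in for an interval message


def split_into_intervals(stream: Iterable[Union[str, int]]) -> List[List[int]]:
    """Group packets into measurement intervals bookended by SYNC markers.

    Packets seen before the first marker, or after the last one, do not
    belong to a complete interval and are not returned.
    """
    intervals: List[List[int]] = []
    current: Optional[List[int]] = None  # no interval is open yet
    for item in stream:
        if item == SYNC:
            if current is not None:
                intervals.append(current)  # close the open interval
            current = []  # the same marker opens the next interval
        elif current is not None:
            current.append(item)
    return intervals


# Packet 7 precedes the first interval message; packet 9 follows the last one.
print(split_into_intervals([7, SYNC, 1, 2, SYNC, 5, SYNC, 9]))  # [[1, 2], [5]]
```

Each marker both closes one interval and opens the next, so consecutive intervals share a bookend.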

8 Existing solutions
• Active probes
  - Problem: not effective because they require a very high probe rate
• Storing timestamps and packet digests locally
  - Problem: significant communication overhead
  - Packet sampling: trades accuracy for lower overhead
• Lossy Difference Aggregator (LDA) [Kompella, SIGCOMM '09]
  - State-of-the-art solution, but it assumes FIFO packet delivery
  - Problem: not suitable when packets can be reordered

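9 LDA in the packet loss case
• Key point: only useful buckets may be used for estimation
  - A useful bucket is one updated by the same set of packets at A and B
  - Bad packets: lost packets, which corrupt buckets
• Example: each bucket stores a packet count and a timestamp sum; Router A sends packets at times 1, 2, 5 and 7, and the packet sent at time 2 is lost
  - One bucket holds the packets sent at 1 and 5: count 2 and timestamp sum 6 at A; count 2 and timestamp sum 3 + 9 = 12 at B (useful)
  - The other bucket holds the packets sent at 2 and 7: count 2 and timestamp sum 9 at A; count 1 and timestamp sum 11 at B (corrupted by the loss)
  - True delay = ((3 − 1) + (11 − 7) + (9 − 5)) / 3 ≈ 3.3
  - Estimated delay = (12 − 6) / 2 = 3
  - Estimation error ≈ 9%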
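The LDA example can be replayed with a short Python sketch. The sketch makes two assumptions. First, bucket indices are hard-coded to match the example, whereas LDA derives them from a packet hash. Second, the helper names `build_buckets` and `lda_estimate` are hypothetical. Computed with the exact true delay of 10/3, the relative error is exactly 10%; the 9% above comes from rounding the true delay to 3.3.

```python
from typing import Dict, List, Tuple

Bucket = Tuple[int, int]  # (packet count, timestamp sum)


def build_buckets(packets: List[Tuple[int, int]]) -> Dict[int, Bucket]:
    """Aggregate (bucket index, timestamp) pairs into per-bucket count and sum."""
    buckets: Dict[int, Bucket] = {}
    for index, timestamp in packets:
        count, total = buckets.get(index, (0, 0))
        buckets[index] = (count + 1, total + timestamp)
    return buckets


def lda_estimate(a: Dict[int, Bucket], b: Dict[int, Bucket]) -> float:
    """Average delay over buckets whose packet counts match at A and B."""
    delay_sum = 0
    count = 0
    for index, (count_a, sum_a) in a.items():
        count_b, sum_b = b.get(index, (0, 0))
        if count_a == count_b:  # LDA's usefulness test looks at counts only
            delay_sum += sum_b - sum_a
            count += count_a
    return delay_sum / count if count else float("nan")


# Hypothetical hard-coded bucket indices; LDA would hash each packet instead.
a = build_buckets([(1, 1), (1, 5), (2, 2), (2, 7)])  # sent at A
b = build_buckets([(1, 3), (1, 9), (2, 11)])  # received at B; packet sent at 2 lost
estimate = lda_estimate(a, b)
true_delay = ((3 - 1) + (11 - 7) + (9 - 5)) / 3
print(estimate)  # 3.0
```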

10 LDA in the packet loss + reordering case
• Problem: LDA confounds loss and reordering
  - Matching packet counts in buckets at A and B is insufficient
  - Reordered packets are also bad packets
  - Result: significant error in loss and aggregate latency estimation
• Example: the same packets as in the loss case, except that a reordered packet arrives at B at time 13 and lands in the bucket corrupted by the loss
  - That bucket's count at B becomes 2 (timestamp sum 11 + 13 = 24), so it matches the count of 2 at A
  - True delay ≈ 3.3
  - Estimated delay = (12 + 24 − 6 − 9) / 4 = 5.25
  - Estimation error ≈ 59%
[Figure: the no-reordering and reordering cases side by side, annotated "Freeze buckets after update"]
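This Python sketch shows why the count-only test fails. It takes the per-bucket (count, timestamp sum) values from the reordering example as given and applies LDA's rule. Both buckets pass the rule, so the corrupted bucket pollutes the estimate. The bucket indices 1 and 2 are illustrative labels.

```python
# (packet count, timestamp sum) per bucket index; indices are illustrative.
a_buckets = {1: (2, 1 + 5), 2: (2, 2 + 7)}
# At B the reordered packet (arrival time 13) fills the bucket that lost a
# packet, so both buckets now have matching counts.
b_buckets = {1: (2, 3 + 9), 2: (2, 11 + 13)}

delay_sum = 0
count = 0
for index, (count_a, sum_a) in a_buckets.items():
    count_b, sum_b = b_buckets[index]
    if count_a == count_b:  # the count-only test accepts BOTH buckets
        delay_sum += sum_b - sum_a
        count += count_a

estimate = delay_sum / count  # (12 + 24 - 6 - 9) / 4
true_delay = ((3 - 1) + (11 - 7) + (9 - 5)) / 3
print(estimate)  # 5.25
print(abs(estimate - true_delay) / true_delay)  # relative error, about 0.575
```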

11 Quick fix of LDA: per-path LDA
• Let LDA operate on a per-path basis
• Exploit the fact that ECMP does not reorder packets within a flow
• Issues
  - (1) Associating a flow with a path is difficult
  - (2) Not scalable: potentially millions of separate TCP flows must be handled

12 Packet reordering in IP networks
• Today's trend
  - No reordering among packets in a flow
  - No reordering across flows between two interfaces
• New trend: data centers exploit path diversity
  - ECMP splits flows across multiple equal-cost paths
  - Reordering can occur across flows
• Future direction: switches may allow reordering within a switch for improved load balancing and utilization
  - Reordering-tolerant TCP for use in data centers
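To see why ECMP keeps each flow in order but can reorder packets across flows, consider a minimal sketch of hash-based path selection. The sketch assumes ECMP hashes the 5-tuple and takes the result modulo the number of paths. SHA-256 is only a stand-in for the vendor-specific hardware hash a real switch would use, and `ecmp_path` is a hypothetical name.

```python
import hashlib
from typing import Tuple

# (source IP, destination IP, source port, destination port, protocol)
FiveTuple = Tuple[str, str, int, int, int]


def ecmp_path(flow: FiveTuple, num_paths: int) -> int:
    """Choose one of num_paths equal-cost paths from a hash of the 5-tuple.

    Every packet of a flow carries the same 5-tuple, so it always takes the
    same path (no reordering within the flow). Different flows can take
    different paths and therefore be reordered relative to each other.
    SHA-256 here is an illustrative stand-in for a switch's hardware hash.
    """
    key = "|".join(str(field) for field in flow).encode()
    digest = hashlib.sha256(key).digest()
    return int.from_bytes(digest[:8], "big") % num_paths


flow = ("10.0.0.1", "10.0.1.1", 40000, 80, 6)
print(ecmp_path(flow, 4) == ecmp_path(flow, 4))  # True: same flow, same path
paths = {ecmp_path(("10.0.0.1", "10.0.1.1", port, 80, 6), 4) for port in range(40000, 40100)}
print(sorted(paths))  # flows between the same two hosts spread over several paths
```

Per-path LDA would need exactly this flow-to-path mapping at the measurement point. That mapping is the association the slides call difficult.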

13 Proposed approach: FineComb
• Objectives
  - Detect and correct unusable buckets
  - Control the number of unusable buckets
• Key ideas
  1) Incremental stream digests: detect unusable buckets
  2) Stash recovery: make corrupted buckets useful again by correcting them
  3) Packet sampling: control the number of bad packets included
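The slides do not say how packets are sampled, so the Python sketch below assumes one common choice: hash-based sampling on invariant packet bytes. Under this scheme Router A and Router B independently keep exactly the same packets. At sampling rate p, only about a fraction p of the bad packets reaches the buckets. The function name `sampled` and the packet byte strings are hypothetical.

```python
import hashlib


def sampled(packet_bytes: bytes, p: float) -> bool:
    """Keep a packet iff its 64-bit hash falls below p * 2^64.

    The decision depends only on the packet's invariant bytes (an assumed
    sampling scheme), so Router A and Router B select the same packets.
    """
    h = int.from_bytes(hashlib.sha256(packet_bytes).digest()[:8], "big")
    return h < p * 2**64


# Hypothetical packet contents.
packets = [f"flow-3/seq-{i}".encode() for i in range(10_000)]
kept_at_a = [pkt for pkt in packets if sampled(pkt, 0.1)]
kept_at_b = [pkt for pkt in packets if sampled(pkt, 0.1)]
print(kept_at_a == kept_at_b)  # True: both sides agree
print(len(kept_at_a))  # close to 0.1 * 10,000
```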

14 Incremental stream digests (ISDs)
• An ISD is $H(\mathrm{pkt}_1) \oplus H(\mathrm{pkt}_2) \oplus \cdots \oplus H(\mathrm{pkt}_k)$, where $\oplus$ is an invertible, commutative operator (e.g., XOR)
• Property 1: low collision probability
  - Two different packet streams hash to different values
  - Allows corrupted buckets to be detected
• Property 2: invertibility
  - A packet digest can easily be added to or subtracted from an ISD
  - The basis of stash recovery
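Both ISD properties can be checked directly with XOR as the operator. The Python sketch below uses truncated SHA-256 as the packet hash H. That choice is an assumption made for illustration, not necessarily the hash FineComb uses.

```python
import hashlib
from functools import reduce
from typing import Iterable


def h(packet: bytes) -> int:
    """64-bit packet digest (truncated SHA-256, an illustrative choice of H)."""
    return int.from_bytes(hashlib.sha256(packet).digest()[:8], "big")


def isd(packets: Iterable[bytes]) -> int:
    """Incremental stream digest: XOR of the digests of all packets."""
    return reduce(lambda acc, pkt: acc ^ h(pkt), packets, 0)


stream = [b"pkt1", b"pkt2", b"pkt3"]
digest = isd(stream)

# Commutativity: the order in which packets arrive does not matter.
print(isd(reversed(stream)) == digest)  # True
# Invertibility: XOR-ing a packet's digest again removes that packet.
print(digest ^ h(b"pkt3") == isd(stream[:2]))  # True
# Low collision probability: a stream missing a packet gets a different ISD.
print(isd(stream[:2]) != digest)  # True
```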

15 ISDs handle loss and reordering
• ISDs detect buckets corrupted by loss and reordering
• A bucket is usable only if both its packet count and its ISD match between A and B
[Figure: the loss + reordering example with a hash value per packet and an ISD per bucket; in the bucket hit by loss and reordering the packet counts match but the ISDs do not; true delay ≈ 3.3]
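The usability test can be sketched in Python on the loss + reordering example. Packet identifiers stand in for invariant packet content and are hypothetical: "p2" is the lost packet and "r" is the reordered packet arriving at time 13. The sketch also assumes an XOR-based ISD over truncated SHA-256 digests.

```python
import hashlib
from dataclasses import dataclass


def h(packet_id: str) -> int:
    """64-bit digest of a packet's invariant content (truncated SHA-256)."""
    return int.from_bytes(hashlib.sha256(packet_id.encode()).digest()[:8], "big")


@dataclass
class Bucket:
    count: int = 0
    ts_sum: int = 0
    isd: int = 0

    def add(self, packet_id: str, timestamp: int) -> None:
        self.count += 1
        self.ts_sum += timestamp
        self.isd ^= h(packet_id)  # XOR-based ISD update


def usable(a: Bucket, b: Bucket) -> bool:
    """A bucket is usable only if both its count and its ISD match."""
    return a.count == b.count and a.isd == b.isd


# Hypothetical packet IDs: "p2" is lost; reordered "r" lands in the same bucket at B.
a1, a2, b1, b2 = Bucket(), Bucket(), Bucket(), Bucket()
a1.add("p1", 1)
a1.add("p5", 5)
a2.add("p2", 2)
a2.add("p7", 7)
b1.add("p1", 3)
b1.add("p5", 9)
b2.add("p7", 11)
b2.add("r", 13)

print(usable(a1, b1))  # True
print(a2.count == b2.count, usable(a2, b2))  # True False: counts match, ISDs do not
```

This is the case where the count-only check in LDA is fooled. The ISD comparison rejects the bucket.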

16 Latency and loss estimation
• Average latency estimation (over usable buckets only)
  - Delay sum = (12 − 6) + (0 − 0) = 6
  - Count = 2 + 0 = 2
  - Average latency = 6 / 2 = 3.0
• Loss estimation
  - Loss count sum = 3; total packets = 7
  - Loss rate = 3 / 7 ≈ 0.43
[Figure: packet count, timestamp sum and ISD for each bucket at Router A and Router B]
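The two estimators can be written as a short Python sketch. The per-bucket values are hypothetical, chosen so that the output matches the 3.0 and 3/7 above. Latency uses only usable buckets (count and ISD both match). Loss is summed over all buckets, which the sketch assumes is valid only after reordering effects have been removed, for example by stash recovery.

```python
from typing import List, NamedTuple


class BucketPair(NamedTuple):
    """Count, timestamp sum and ISD of one bucket index at A and at B."""
    count_a: int
    sum_a: int
    isd_a: int
    count_b: int
    sum_b: int
    isd_b: int

    def usable(self) -> bool:
        return self.count_a == self.count_b and self.isd_a == self.isd_b


def average_latency(pairs: List[BucketPair]) -> float:
    """Delay sum over usable buckets divided by their packet count."""
    good = [p for p in pairs if p.usable()]
    count = sum(p.count_a for p in good)
    delay_sum = sum(p.sum_b - p.sum_a for p in good)
    return delay_sum / count if count else float("nan")


def loss_rate(pairs: List[BucketPair]) -> float:
    """Packets missing at B divided by packets sent by A, over all buckets."""
    sent = sum(p.count_a for p in pairs)
    lost = sum(p.count_a - p.count_b for p in pairs)
    return lost / sent


# Hypothetical bucket values.
pairs = [
    BucketPair(2, 6, 0x09, 2, 12, 0x09),  # usable
    BucketPair(0, 0, 0x00, 0, 0, 0x00),  # usable (empty)
    BucketPair(2, 9, 0x2E, 1, 11, 0x3A),  # not usable
    BucketPair(3, 15, 0x1F, 1, 8, 0x05),  # not usable
]
print(average_latency(pairs))  # 3.0
print(round(loss_rate(pairs), 2))  # 0.43
```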

17 Stash recovery
• Stash: a set of (timestamp, bucket index, hash value) tuples for packets that are potentially reordered
• (−) stash
  - Contains packets potentially added to a receiver (Router B)
  - During recovery, their digests are subtracted from bad buckets at the receiver
• (+) stash
  - Contains packets potentially missing at a receiver (Router B)
  - During recovery, their digests are added to bad buckets at the receiver

18 Stash recovery
• A bad bucket can be recovered if and only if it was corrupted only by reordered packets
• Reordered packets are then not counted as lost packets
  - This increases loss estimation accuracy
[Figure: a bucket at A whose ISD does not match the corresponding bucket at B; all subsets of B's (−) stash entries are tried, and subtracting the right subset makes the ISDs match]
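The "all subsets" search can be sketched in Python for the (−) stash. A (+) stash would be the mirror image, adding entries instead of subtracting them. The sketch makes three assumptions. First, the ISD uses XOR, so subtracting a digest is another XOR. Second, the bucket values, stash entries and small hex digests are hypothetical. Third, each bucket has only a few stash entries, since the search grows as 2^n.

```python
from itertools import combinations
from typing import List, NamedTuple, Optional, Sequence


class Bucket(NamedTuple):
    pkt_count: int
    ts_sum: int
    isd: int


class StashEntry(NamedTuple):
    timestamp: int
    bucket: int
    digest: int


def subtract(b: Bucket, entries: Sequence[StashEntry]) -> Bucket:
    """Remove stash entries from a bucket (XOR is its own inverse)."""
    pkt_count, ts_sum, isd = b
    for e in entries:
        pkt_count -= 1
        ts_sum -= e.timestamp
        isd ^= e.digest
    return Bucket(pkt_count, ts_sum, isd)


def recover(a: Bucket, b: Bucket, index: int,
            minus_stash: List[StashEntry]) -> Optional[Bucket]:
    """Try all subsets of the (-) stash entries for this bucket index.

    Returns the corrected bucket at B if some subset makes both the count and
    the ISD match A's bucket, or None if the bucket cannot be recovered.
    """
    candidates = [e for e in minus_stash if e.bucket == index]
    for k in range(len(candidates) + 1):
        for subset in combinations(candidates, k):
            fixed = subtract(b, subset)
            if fixed.pkt_count == a.pkt_count and fixed.isd == a.isd:
                return fixed
    return None


# Hypothetical values: B's bucket 1 also absorbed one reordered packet.
a = Bucket(pkt_count=2, ts_sum=12, isd=0x2E)
b = Bucket(pkt_count=3, ts_sum=34, isd=0x3E)
minus_stash = [StashEntry(12, 1, 0x04), StashEntry(15, 1, 0x10)]
fixed = recover(a, b, 1, minus_stash)
print(fixed)  # Bucket(pkt_count=2, ts_sum=19, isd=46)
```

Removing the entry with digest 0x04 leaves ISD 0x3A, which does not match. Removing the entry with digest 0x10 restores 0x2E, so the bucket becomes usable again. The corrected bucket yields a delay of (19 − 12) / 2 = 3.5 for its two packets.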

19 Sizing buckets and stashes
• Known loss and reordering rates
  - Given a fixed storage size, we obtain the optimal packet sampling rate (p*)
  - We provision the stash and buckets based on p*
• Unknown loss and reordering rates
  - Use multiple banks, each optimized for a different set of loss and reordering rates
• Details can be found in our paper

20 Accuracy of latency estimation
[Figure: average relative error vs. reordering rate for FineComb (ISD + stash) and FineComb- (ISD only), annotated "1000x difference"]
• Packet loss rate = 0.01%, #packets = 5M, true mean delay = 10 μs

21 Accuracy of loss estimation
[Figure: average relative error vs. reordering rate]
• Packet loss rate = 0.01%, #packets = 5M
• The stash helps obtain accurate loss estimates

22 Summary
• Data centers require end-to-end, fine-grained latency and loss measurements
• We proposed a data structure called FineComb
  - Resilient to packet loss and reordering
  - Incremental stream digests detect corrupted buckets
  - The stash recovers buckets corrupted only by reordered packets
• Evaluation shows that FineComb achieves higher accuracy than LDA in latency and loss estimation

23 Thank you! Questions?

24 Backup

25 Microscopic loss estimation
[Figure: average relative error vs. reordering rate]

26 Handling unknown loss and reordering rates
[Figure: average relative error vs. reordering rate; LDA with 2 banks vs. FineComb with 4 banks, same memory size]

