Presentation on theme: "LASTor: A Low-Latency AS-Aware Tor Client"— Presentation transcript:
1 LASTor: A Low-Latency AS-Aware Tor Client
Masoud Akhoondi, Curtis Yu, Harsha V. Madhyastha

Hello everyone. My name is Masoud Akhoondi, from UC Riverside. I am going to present our paper, "LASTor: A Low-Latency AS-Aware Tor Client". In this talk I will show how we can improve both latency and anonymity with client-side modifications alone. I collaborated on this work with Curtis and our advisor, Harsha.
2 Tor (The Onion Router)
Anonymity
- Each hop only knows previous and next hop on a path
Low latency communication
- 90% of Tor traffic is interactive [Mccoy08]
400,000 users
2700 relays
[Figure: path from source S through relays R1, R2, R3 to destination D]

Tor stands for The Onion Router. Tor has 300,000 users and around 2700 volunteer relays distributed around the world. Tor uses source routing: when a source wants to communicate with a destination, it picks three relays and then extends a path through these relays to the destination using onion routing.

Tor has two goals. The first goal is anonymity: it protects the privacy of its users. Each hop knows only the previous and next hop on a path; for example, R3 knows only R2 and D in this path. This is a property of onion routing. The second goal is low-latency communication, because 90% of Tor traffic is interactive, for example browsing the web.

But Tor does not meet both of these goals perfectly. In the next couple of slides I will talk about how well the current Tor client achieves each of them.
3 How are latencies on Tor?
Experiment:
- Sources: 50 PlanetLab nodes spread across globe
- Destinations: Top 200 websites
5x inflation in median

First I will talk about latency. To quantify the extent of the latency overhead, we measured the latency of visiting the top 200 websites from 50 PlanetLab nodes, once over Tor and once directly over the Internet. This figure shows the latency in both cases: the y-axis is the CDF of (source, destination) pairs and the x-axis is latency. "No Tor" in this figure means that we measured the latency to the destination directly. We see that using Tor inflates the median latency by 5x.
4 Profiling attack on Tor
Green AS (Autonomous System) can eavesdrop on both end segments of path [Murdoch07]
[Figure: source S, entry segment, entry relay, Relay 2, exit relay, exit segment, destination D]

Now let's talk about the other goal of Tor, anonymity. I will show you that Tor does not achieve this goal well. Recall that onion routing guarantees that every hop on the path knows only the previous and next hop. However, this does not provide perfect anonymity, because of the profiling attack, which works as follows.

I define two terms here: the entry segment is the part of the path from the client to the first relay, and the exit segment is the part of the path from the third relay to the destination. Every path goes through a number of ASes (Autonomous Systems). For the purposes of this talk, we can consider an AS equivalent to an ISP. If there is a common AS on both segments, for example the green AS, it can profile traffic and de-anonymize the path. In other words, the green AS can figure out who is talking to whom.
5 How severe is profiling attack?
65% of relays are in 20% of all ASes
Non-uniform distribution of relays across ASes

To quantify how severe the profiling attack on Tor is, we analyzed the distribution of relays across ASes. We mapped each relay to its AS and plotted this figure. It shows that the distribution of relays across ASes is not uniform: for instance, 65% of relays are in 20% of the ASes that contain relays. So the probability of choosing an entry relay and an exit relay in the same AS is pretty high.
6 Potential solution for these problems
Measure latencies and routes from each relay to all end-hosts [Sherr09, Alsabah11, Mittall11]
- Requires modification of relays
- Non-trivial to implement
- None of these proposals deployed yet

Now the question is: what are the solutions for these problems? One potential solution is to measure latencies and routes from each relay to all end hosts, and then choose a path with low latency and no common ASes on the entry and exit segments. But this requires modifying relays to capture, distribute, and update this information. Several proposals have taken this approach. I want to point out that none of these proposals for changing relays has been incorporated into the real Tor network.
7 LASTor: A low-latency AS-aware Tor client
Main insight: Client modifications suffice
- Improve poor latency for interactive communications
- Mitigate profiling attack

What is this paper all about? The key insight of our work is that client modifications suffice for both of these goals: improving latency and mitigating the profiling attack. We developed LASTor, a modified version of the Tor client, which improves both the latency and the anonymity of Tor.
8 Main insight: Client modifications suffice
Improve poor latency for interactive communications
- Solution: Modified path selection to reduce latency
Mitigate profiling attack
- Solution: AS-aware path selection

To improve latency, we propose a path selection algorithm. To mitigate the profiling attack, we propose an algorithm that predicts which paths have common ASes on them, and LASTor avoids choosing those paths. In the first half of my talk, I will demonstrate how we improve latency.
9 Sources of latency on Tor
Goal: Improve latency
Queuing and processing delay
- Congestion in relays [Panchenko09]
Propagation delay
- Long paths
[Figure: path between source S and destination D]

To improve latency, we need to know the sources of delay on Tor. One source of delay is queuing and processing delay caused by congestion in relays. Another source is propagation delay, especially when the path is very long. For example, in this figure, although the source and destination are close to each other, all packets traverse the globe several times before they reach the destination. Next, we investigate which of these factors is dominant.
10 Shortest path vs. Default Tor
Goal: Improve latency
Sources: 50 PlanetLab nodes spread across globe
Destinations: Top 200 websites
Map relays to geographical locations
50% improvement in median
Shorter paths can greatly reduce latency
Path should not be deterministic
Weighted Shortest Path (WSP)

We repeated the experiment, measuring the latency of visiting the top 200 websites from 50 PlanetLab nodes. This time we also ran a shortest path algorithm: each source maps relays to geographical locations, finds the shortest path from itself to the destination that goes through three relays, and then measures the latency on this path.

This figure shows the result of this experiment. "No Tor" means measuring latency directly over the Internet, "SP Tor" is a modified version of the Tor client that runs the shortest path algorithm, and "Default Tor" is the default Tor client. We see a significant improvement in latency with the shortest path algorithm; for example, it reduces the median latency by 50%. So shorter paths can greatly improve latency, which means propagation delay is one of the significant factors.

But we cannot afford for the path to be deterministic, since an adversary could then de-anonymize the path between a source and a destination. That is why we introduce Weighted Shortest Path.
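As a rough illustration of the shortest path experiment on this slide, the following sketch picks the three-relay path with the smallest total geographic length. The haversine great-circle formula, the brute-force search, and every name and coordinate below are assumptions made for illustration; the talk only states that relays are mapped to geographical locations.

```python
import math
from itertools import permutations

EARTH_RADIUS_KM = 6371.0  # mean Earth radius, used by the haversine formula


def great_circle_km(a, b):
    """Haversine distance in km between two (latitude, longitude) points in degrees."""
    lat1, lon1 = map(math.radians, a)
    lat2, lon2 = map(math.radians, b)
    h = (math.sin((lat2 - lat1) / 2) ** 2
         + math.cos(lat1) * math.cos(lat2) * math.sin((lon2 - lon1) / 2) ** 2)
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(h))


def path_length_km(src, relay_points, dst):
    """Total length of src -> relay 1 -> relay 2 -> relay 3 -> dst."""
    hops = [src, *relay_points, dst]
    return sum(great_circle_km(p, q) for p, q in zip(hops, hops[1:]))


def shortest_three_relay_path(src, dst, relay_locations):
    """Brute-force search over all ordered triples of distinct relays."""
    return min(
        permutations(relay_locations, 3),
        key=lambda names: path_length_km(
            src, [relay_locations[n] for n in names], dst),
    )


# Hypothetical relay coordinates; three lie between source and destination.
relays = {"r1": (0.0, 2.0), "r2": (0.0, 5.0), "r3": (0.0, 8.0), "far": (50.0, 100.0)}
best = shortest_three_relay_path((0.0, 0.0), (0.0, 10.0), relays)
```

Because this choice is deterministic for a given source and destination, it illustrates exactly the property the talk warns about before introducing WSP.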
11 Weighted Shortest Path (WSP)
Goal: Improve latency
WSP computes length of all possible paths
Probability of choosing is inversely proportional to its length

Path    Length  Prob.
Upper   8       0.56
Lower   10      0.44

[Figure: upper and lower paths between source and destination, with labeled edge lengths]

WSP calculates the length of all possible paths between a given source and destination, and then chooses a path with probability inversely proportional to its length. For example, in this figure the upper path has length 8 and the lower path has length 10, so the probability of choosing the upper path is higher than that of the lower path. At a high level, WSP provides lower latency by preferring shorter paths but, unlike shortest path, it is probabilistic, so an attacker cannot infer the path. However, if we use this naive version of WSP, an adversary can mount an attack on it.
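The probabilities in the table follow directly from the rule that a path's weight is inversely proportional to its length. A minimal sketch of that computation is below; the function names and the dictionary representation of paths are assumptions for illustration, not LASTor's actual code.

```python
import random


def wsp_probabilities(path_lengths):
    """Map each path name to a probability proportional to 1 / length."""
    weights = {name: 1.0 / length for name, length in path_lengths.items()}
    total = sum(weights.values())
    return {name: weight / total for name, weight in weights.items()}


def choose_path(path_lengths, rng=random):
    """Draw one path at random according to the WSP probabilities."""
    probs = wsp_probabilities(path_lengths)
    names = list(probs)
    return rng.choices(names, weights=[probs[n] for n in names], k=1)[0]


# The two paths from the slide: lengths 8 and 10.
probs = wsp_probabilities({"upper": 8, "lower": 10})
# probs["upper"] is 0.125 / 0.225, about 0.56; probs["lower"] is about 0.44
```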
12 An Attack on WSP
Goal: Improve latency
Attacker controls a relay

                    Original prob.  Prob. after attack
Compromised paths   0.56            0.8
Other paths         0.44            0.2

[Figure: example paths with labeled edge lengths and replicated attacker relays]

For example, an attacker who controls a relay replicates it 5 times, say by running several virtual machines. We see that the probability of choosing a compromised path increases. In general, if an adversary runs several malicious relays close to the direct line between source and destination, the chance of choosing a compromised path increases by some order of magnitude.
13 Solution: Clustering of relays
Goal: Improve latency
Run WSP using clusters of relays
For chosen cluster-level path, randomly pick a relay in each cluster
[Figure: example paths with labeled edge lengths]

To mitigate this attack, we cluster relays that are located geographically close to each other. Then we run WSP on clusters of relays instead of individual relays and, for the chosen cluster-level path, randomly pick a relay in each cluster.
14 Solution: Clustering of relays
Goal: Improve latency

                    Prob.
Compromised paths   0.56
Other paths         0.44

[Figure: example paths with labeled edge lengths]

With clustering of relays, the probability of choosing a compromised path returns to its original value. Even though an adversary replicates malicious relays in one location, the probability of choosing a compromised path is unchanged. As a result, the burden on the adversary increases: he has to run relays in multiple locations in order to attract more traffic. Another advantage of clustering relays is that it reduces the running time of finding a path from the order of seconds to a few hundred milliseconds.
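The effect of clustering can be seen in a toy model where every candidate path is identified by a single relay, its location, and its path length. This is a simplified sketch under stated assumptions: the one-relay-per-path model, the mean-length cluster weight, and all names are hypothetical, and its numbers are not meant to reproduce the 0.8 shown in the talk's figure.

```python
from collections import defaultdict


def relay_level_compromise_prob(relays):
    """relays: list of (location, path_length, is_malicious) tuples.
    Plain WSP: every relay counts as its own path with weight 1 / length."""
    weights = [(1.0 / length, bad) for _, length, bad in relays]
    total = sum(w for w, _ in weights)
    return sum(w for w, bad in weights if bad) / total


def cluster_level_compromise_prob(relays):
    """Clustered WSP: one weight per location, then a uniform pick inside it."""
    clusters = defaultdict(list)
    for location, length, bad in relays:
        clusters[location].append((length, bad))
    # Assumption: a cluster's length is the mean length of its members.
    weight = {
        loc: 1.0 / (sum(length for length, _ in members) / len(members))
        for loc, members in clusters.items()
    }
    total = sum(weight.values())
    prob = 0.0
    for loc, members in clusters.items():
        bad_fraction = sum(1 for _, bad in members if bad) / len(members)
        prob += weight[loc] / total * bad_fraction
    return prob


before = [("near", 8, True), ("far", 10, False)]
after = [("near", 8, True)] * 5 + [("far", 10, False)]  # attacker runs 5 copies

p_before = relay_level_compromise_prob(before)      # about 0.56
p_attack = relay_level_compromise_prob(after)       # about 0.86 in this toy model
p_clustered = cluster_level_compromise_prob(after)  # back to about 0.56
```

In this model, replicas placed in one location share a single cluster weight, so the adversary gains nothing unless relays are added in other locations.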
15 Weighted Shortest Path (WSP)
Goal: Improve latency
Preprocessing
- Cluster all relays
Path selection
- Computes length of possible paths using clusters
- Choose a path with a probability inversely proportional to its length
- Pick a relay randomly in each chosen cluster
Other issues (see paper)
- Handling multi-location destinations
- Choosing entry relays

In this slide I will review how WSP works. It has a preprocessing step in which it clusters all relays. Then it chooses a cluster-level path with probability inversely proportional to its length. Finally, it picks one random relay in each cluster.

There are other issues for which I refer you to the paper. We implemented WSP to handle multi-location destinations. For example, Google's web servers are in several different geographical locations around the world, so when a client wants to compute the end-to-end distance between source and destination, it needs to know where the destination is. WSP also handles the selection of entry relays. I leave it to you to read the paper for details.
16 WSP reduces latency
Goal: Improve latency
50 PlanetLab nodes to top 200 websites
25% improvement in median
20% improvement in 80th percentile

Now that I have explained how WSP works, let's see how well it works. We developed LASTor, a modified version of the Tor client that uses WSP, and measured the latency of visiting the top 200 websites from 50 PlanetLab nodes. We see a significant improvement in latency with WSP: it improves the median latency by 25% and the 80th-percentile latency by 20%. I want to stress that all of this improvement is available today with client-side modifications alone. The key insight that enables this is that geographical distance is a good measure of latency, so we do not need to measure latencies from relays.
17 Tunable path selection in LASTor
Goal: Improve latency
Modify WSP to consider user's preference towards:
- Anonymity
- Latency
Single parameter α configured by user:
- Modified weight w to w^(1-α) where 0 ≤ α ≤ 1
[Figure: α scale running from lowest latency to highest anonymity at α = 1]

Since WSP's preference for shorter paths naturally reduces the entropy of path selection, not all users may wish to give up entropy for lower latency. That is why we made WSP tunable with a single parameter, alpha, which changes the weights of paths to reflect the user's preference for anonymity or latency. For example, if alpha is equal to 1, the weights and probabilities of all paths are the same (highest anonymity). If alpha is 0, shorter paths have higher chances (lower latency). To quantify the effect of alpha on latency and anonymity, we ran some experiments.
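To make the role of α concrete, the sketch below raises each path weight to the power (1 - α) before normalizing. It assumes the base weight w is 1 / length, as in the WSP description earlier in the talk; the function name and inputs are hypothetical.

```python
def tunable_probabilities(path_lengths, alpha):
    """Path probabilities with weight w ** (1 - alpha), where w = 1 / length."""
    if not 0.0 <= alpha <= 1.0:
        raise ValueError("alpha must lie in [0, 1]")
    weights = {
        name: (1.0 / length) ** (1.0 - alpha)
        for name, length in path_lengths.items()
    }
    total = sum(weights.values())
    return {name: weight / total for name, weight in weights.items()}


lengths = {"upper": 8, "lower": 10}
low_latency = tunable_probabilities(lengths, alpha=0.0)     # plain WSP
in_between = tunable_probabilities(lengths, alpha=0.5)
high_anonymity = tunable_probabilities(lengths, alpha=1.0)  # uniform over paths
```

With α = 0 the exponent is 1 and the weights are the plain WSP weights; with α = 1 every weight becomes 1, so all paths are equally likely.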
18 Tunable path selection in LASTor
Goal: Improve latency
Lower α, lower latency
Higher α, higher anonymity
Gini coefficient: measure of inequality in a distribution
- 0: perfect equality
- 1: maximal inequality

We chose different values of alpha from 0 to 1. The left figure shows the effect of alpha on latency. Again we measured the latency of visiting the top 200 websites from 50 PlanetLab nodes. This figure shows that lower values of alpha lead to lower latency.

To quantify the anonymity of different values of alpha, we use a metric called the Gini coefficient, which measures inequality in a distribution: 0 is perfect equality and 1 is maximal inequality. The right figure shows the Gini coefficient for different values of alpha. We see that higher values of alpha give higher anonymity, in other words a lower Gini coefficient.
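For reference, here is one standard way to compute a Gini coefficient over a list of non-negative values, such as selection probabilities. Which distribution the talk feeds into the metric is not stated in this transcript, so the inputs below are purely illustrative.

```python
def gini(values):
    """Gini coefficient of non-negative values, using the sorted-rank formula."""
    xs = sorted(values)
    n = len(xs)
    total = sum(xs)
    if n == 0 or total == 0:
        raise ValueError("need at least one positive value")
    ranked_sum = sum(rank * x for rank, x in enumerate(xs, start=1))
    return 2.0 * ranked_sum / (n * total) - (n + 1.0) / n


uniform = gini([0.25, 0.25, 0.25, 0.25])  # every option equally likely
skewed = gini([0.0, 0.0, 0.0, 1.0])       # one option always chosen
```

With a finite number n of values, the most unequal case gives (n - 1) / n rather than exactly 1, so the second example evaluates to 0.75.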
19 Main insight: Client modifications suffice
Improve poor latency for interactive communications
- Solution: Modified path selection to reduce latency
Mitigate profiling attack
- Solution: AS-aware path selection

Now we are done with the first goal of LASTor, improving latency. Let's talk about the second goal of LASTor, mitigating the profiling attack.
20 Profiling attack on a path
Goal: AS-aware
Goal: Detect common ASes on entry and exit segments
Green AS (Autonomous System) can eavesdrop on both end segments of path [Murdoch07]
[Figure: source S, entry segment, entry relay, Relay 2, exit relay, exit segment, destination D]

To remind you, if there is a common AS on both the entry and exit segments, that AS can de-anonymize the path. Our goal here is to detect paths with common ASes on the entry and exit segments and avoid choosing those paths. First, let's talk about how Tor mitigates this attack.
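Once the ASes on each segment are known or predicted, the check itself is a set intersection. The sketch below assumes each segment is given as a collection of AS numbers; the function names and the private-use AS numbers are hypothetical.

```python
def common_ases(entry_segment_ases, exit_segment_ases):
    """ASes that appear on both the entry segment and the exit segment."""
    return set(entry_segment_ases) & set(exit_segment_ases)


def is_vulnerable(entry_segment_ases, exit_segment_ases):
    """True if at least one AS could observe both ends of the path."""
    return bool(common_ases(entry_segment_ases, exit_segment_ases))


# Hypothetical segments using private-use AS numbers.
entry_segment = [64512, 64513, 64520]  # client -> entry relay
exit_segment = [64530, 64513, 64540]   # exit relay -> destination
shared = common_ases(entry_segment, exit_segment)
```

A client would skip any candidate path for which `is_vulnerable` returns True.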
21 Simple heuristic does not work
Goal: AS-aware
Default Tor ensures no two Tor relays in same /16
False negative: fraction of paths with common AS not detected
57% of common AS instances are missed

To mitigate this attack, Tor avoids choosing an entry relay and an exit relay that share the same /16 IP prefix. This figure shows the false negative rate of this heuristic. We define a false negative as the fraction of paths with a common AS that are not detected. For example, for the median (source, destination) pair, 57% of paths with common ASes are missed. So we need a solution for predicting the ASes on each segment.
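The /16 heuristic can be expressed with Python's standard `ipaddress` module. This is only an illustrative sketch for IPv4 addresses, not Tor's implementation, and the addresses are made-up examples.

```python
import ipaddress


def same_slash16(ip_a, ip_b):
    """True if two IPv4 addresses fall inside the same /16 prefix."""
    net_a = ipaddress.ip_network(f"{ip_a}/16", strict=False)
    net_b = ipaddress.ip_network(f"{ip_b}/16", strict=False)
    return net_a == net_b


def heuristic_allows(entry_ip, exit_ip):
    """The simple heuristic: reject relay pairs that share a /16."""
    return not same_slash16(entry_ip, exit_ip)


allowed = heuristic_allows("192.0.2.1", "198.51.100.7")  # different /16 prefixes
rejected = heuristic_allows("192.0.2.1", "192.0.77.9")   # both in 192.0.0.0/16
```

The check only looks at the two relay addresses, which is why it can miss an AS that sits on the routes to and from those relays.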
22 Need for predicting AS paths
Goal: AS-aware
Approach 1: Measure routes from relays to all end hosts
- Need to modify relays
Approach 2: Infer AS-level routes
- Several techniques exist [Mao05, Madhyastha06, Madhyastha09, Lee11]
- At best 70% accuracy
[Figure: exit relay and destination D]

There are two approaches to this problem. One approach is to measure routes from relays to all end hosts, but this requires modifying relays. The other is to infer AS-level routes; several techniques exist, but their accuracy is at best 70%. Given that AS path prediction is hard, we instead predict a set of ASes.
23 Our solution: AS set prediction
Goal: AS-aware
Predict ASes on all paths compliant with routing policies
[Figure: exit relay and destination D]

We predict all ASes that traffic might go through on paths that are compliant with routing policies.
24 Our solution: AS set prediction
Goal: AS-aware
Predict ASes on all paths compliant with routing policies
[Figure: exit relay and destination D]

But this way we leave more room for error.
25 Our solution: AS set prediction
Goal: AS-aware
Input [13MB initially, 1.5MB weekly]
- Topology graph at AS-level
- Estimate of AS path length
- Compact representation of routing policies: triple of (AS1, AS2, AS3) where AS1 → AS2 → AS3
Algorithm
- Modified version of Dijkstra's algorithm
Output
- Set of ASes on policy-compliant routes

The AS set prediction algorithm downloads three inputs: a topology graph of AS-level routing, an estimate of AS path length, and a compact representation of routing policies. The size of this data is 13 MB initially and 1.5 MB weekly thereafter. Then we run a modified version of Dijkstra's algorithm to find the set of ASes that are on routes compliant with routing policies and whose path length is equal to the input path length. I refer you to the paper for the details of this algorithm. Let's take a look at the accuracy of AS set prediction.
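The idea of an AS set can be illustrated with a brute-force sketch that enumerates paths instead of using the modified Dijkstra's algorithm from the paper. It is an assumption-laden toy: the adjacency-dictionary graph, the interpretation of a triple as "these three consecutive ASes are allowed", the path length counted in ASes, and all AS numbers are hypothetical.

```python
def predict_as_set(graph, allowed_triples, src, dst, path_length):
    """Union of ASes over all loop-free paths from src to dst that contain
    exactly path_length ASes and whose every run of three consecutive ASes
    is an allowed triple. Brute force, for illustration only."""
    result = set()

    def extend(path):
        if len(path) == path_length:
            if path[-1] == dst:
                result.update(path)
            return
        for nxt in graph.get(path[-1], ()):
            if nxt in path:
                continue  # keep paths loop-free
            if len(path) >= 2 and (path[-2], path[-1], nxt) not in allowed_triples:
                continue  # violates the routing policy triples
            extend(path + [nxt])

    extend([src])
    return result


# Hypothetical AS-level topology with private-use AS numbers.
graph = {64512: [64513, 64514], 64513: [64515], 64514: [64515]}
strict = {(64512, 64513, 64515)}
relaxed = strict | {(64512, 64514, 64515)}

narrow = predict_as_set(graph, strict, 64512, 64515, path_length=3)
wide = predict_as_set(graph, relaxed, 64512, 64515, path_length=3)
```

Predicting a set rather than a single path means an AS is flagged if it lies on any policy-compliant route, which trades some extra rejected paths for fewer missed common ASes.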
26 AS set based prediction is accurate
Goal: AS-aware
11% of common AS instances are missed
57% of common AS instances are missed
False negative: fraction of paths with common AS not detected
Any path selection algorithm can use AS set prediction to avoid profiling attack

We see in the figure that AS set prediction reduces the false negative rate. For example, it reduces the median false negative rate from 57% to 11%. I want to point out that any path selection algorithm, including the current Tor algorithm, can use AS set prediction.
27 LASTor Latency
50 PlanetLab nodes to top 200 websites

Finally, we compare the latencies of different algorithms. WSP refers to the weighted shortest path algorithm, WSP + AS sets refers to WSP enhanced with AS set prediction, and Default Tor is the default Tor algorithm. WSP + AS sets increases latency compared with WSP, but it is still better than the default Tor client, and it is safer for users.
28 Summary
Demonstrated client side changes are sufficient for:
- Lower latency
- Higher anonymity
Designed and implemented LASTor
- Reduces median latency by 25%
- Reduces median false negative of common AS from 57% to 11%

To summarize this talk: client-side modification improves the latency and anonymity of Tor. LASTor reduces median latency by 25%, and it reduces the median false negative rate for paths with a common AS from 57% to 11%.
30 How does Tor work? (Onion Routing)
- 300,000 users
[Figure: client, entry relay (guard), middle relay, exit relay, server; relays R1 to R5]

How does Tor work? Tor uses a traditional concept called onion routing. Let me explain it through an example. This client wants to connect to the server. The Tor client software routes Internet traffic through a worldwide volunteer network of relays; Tor currently has 2700 relays. There is a key associated with each relay.

The Tor client usually chooses three relays and then extends a path through them. For example, here the client chooses R1, R2, and R3. The first relay in the circuit is called the entry relay or guard, the second is called the middle relay, and the last is called the exit relay. The client encrypts its data first with the public key of the exit relay, then with the public key of the middle relay, and finally with the public key of the entry relay. When packets reach the entry relay, it decrypts them and relays them to the middle relay. The same thing happens at the middle relay. Finally, the exit relay forwards the packets to the destination.
31 Is distance a good estimation of latency?
Choose two different paths:
- WSP(latency)
- WSP(distance)
Measure latency on these two paths
50 PlanetLab nodes as source and top 200 websites as destination
There is no significant difference between these two metrics

Since we are not going to modify Tor relays, we do not know the latency between a source and all candidate relays, and so forth, so we need an estimate of latency. We ran another experiment to find out how good an estimate of latency distance is.
32 Accuracy of AS-set prediction algorithm
Goal: AS-aware

This diagram shows the accuracy of the prediction algorithm. The figure on the left shows the false …. We compare this algorithm with iPlane, which is one of the existing tools for predicting AS paths. The /8 and /16 prefixes are the current Tor heuristics for avoiding this attack. The figure on the right shows the false negativ…
33 Attack on WSP
Goal: Improve latency
Clustering of relays reduces:
- Probability of the attack
- Running time of WSP
50% reduction
Adversary replicates 10% most popular relays 25 times
Compute probability of the chosen path traversing a malicious relay

We conducted an experiment to demonstrate the improved resilience of WSP to this attack. We emulate an adversary. For each relay, we compute the probability of the chosen path traversing that relay. This is an upper bound on the fraction of cases in which the chosen path will traverse a relay controlled by the adversary. It can be decreased further if we choose larger clusters. We reduced the running time of this algorithm to a few hundred milliseconds.