Enhancing Tor’s Performance using Real-time Traffic Classification
By Hugo Bateman

Overview
- Background
- The problem
- DiffTor
- How it works
- Experiments
- Results
- Criticisms
Background
DiffTor
- Framework for classifying encrypted Tor circuits based upon the applications they serve
- Online classification: real-time
- Offline classification: using relay logs
Differentiating Applications
- Circuit lifetimes
- Amount of data transferred
- Cell inter-arrival time
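As a rough illustration of the three attributes above, the following sketch derives them from the arrival times of the cells seen on one circuit. It is not DiffTor's code: the function name `circuit_features`, the input format (a list of timestamps in seconds), the use of first-to-last cell as the circuit lifetime, and the fixed 512-byte cell size are all assumptions made for this example.

```python
from statistics import mean

# Assumption for this sketch: every cell is a fixed 512 bytes.
CELL_SIZE_BYTES = 512


def circuit_features(cell_times):
    """Summarise one circuit from the arrival times (in seconds) of its cells.

    Hypothetical helper: the lifetime is approximated as the span between
    the first and last observed cell, not the true create/teardown times.
    """
    if not cell_times:
        raise ValueError("need at least one cell timestamp")
    times = sorted(cell_times)
    # Gaps between consecutive cells; empty when only one cell was seen.
    gaps = [later - earlier for earlier, later in zip(times, times[1:])]
    return {
        "lifetime_s": times[-1] - times[0],
        "bytes_transferred": len(times) * CELL_SIZE_BYTES,
        "mean_inter_arrival_s": mean(gaps) if gaps else 0.0,
    }
```

A bulk circuit would be expected to show a large byte count and small inter-arrival gaps, while an interactive circuit would show fewer cells with longer pauses between bursts.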
Class Definitions
- Bulk transfer: downloads and uploads large volumes of data; greedy; no time constraints
- Interactive: requires interaction between client and server; time sensitive
- Streaming: bulk transfer with time and quality constraints
Classification Algorithms
- Naive Bayes: probabilistic classification algorithm based on Bayes’ theorem
- Bayesian networks: graphical representations used to model the dependency relationships between attributes and classes
- Decision trees: functional tree classification and logistic model tree classification
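To make the Naive Bayes entry concrete, here is a minimal sketch of a Gaussian Naive Bayes classifier in plain Python. The Gaussian likelihood is an assumption of this example (the slides do not say how numeric attributes are modelled), and the function names `train_gaussian_nb` and `predict` are hypothetical. Each class is scored as log prior plus the sum of per-attribute log likelihoods, which is Bayes' theorem under the "naive" assumption that attributes are independent given the class.

```python
import math
from collections import defaultdict


def train_gaussian_nb(samples, labels):
    """Estimate a prior and per-attribute mean/variance for each class."""
    by_class = defaultdict(list)
    for x, y in zip(samples, labels):
        by_class[y].append(x)
    model = {}
    for label, rows in by_class.items():
        n = len(rows)
        columns = list(zip(*rows))
        means = [sum(col) / n for col in columns]
        # A small floor keeps the variance strictly positive.
        variances = [
            sum((v - m) ** 2 for v in col) / n + 1e-9
            for col, m in zip(columns, means)
        ]
        model[label] = (n / len(samples), means, variances)
    return model


def predict(model, x):
    """Return the class with the highest log-posterior for attributes x."""
    def log_posterior(prior, means, variances):
        score = math.log(prior)
        for v, m, var in zip(x, means, variances):
            score += -0.5 * math.log(2 * math.pi * var) - (v - m) ** 2 / (2 * var)
        return score

    return max(model, key=lambda label: log_posterior(*model[label]))
```

For example, training on a handful of made-up (mean inter-arrival time, cell count) pairs labelled "bulk" and "interactive" and then calling `predict` on a new pair returns whichever class explains those values better.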
Experiments
- BitTorrent client: downloads torrents from a popular torrent site
- Browsing client: picks a random URL from the list of the top 100 URLs reported by Alexa, then repeats
- Streaming client: streams random videos selected using keywords
Data Collection
- Collected over a period of 6 weeks
- Offline data set: 200 circuits (122 browsing, 49 BitTorrent, 28 streaming)
- Online data set
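Evaluation Metrics
- Accuracy: (TP + TN) / N
- Precision: TP / (TP + FP)
- Recall: TP / (TP + FN)
- F-measure: (2 * Precision * Recall) / (Precision + Recall)
Here TP, TN, FP and FN are the counts of true positives, true negatives, false positives and false negatives, and N is the total number of classified samples.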
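The four formulas above translate directly into code. The sketch below computes them from raw confusion counts for a single class; the function name `evaluation_metrics` and the choice to return 0.0 when a denominator is zero are assumptions of this example, not something the slides specify.

```python
def evaluation_metrics(tp, fp, tn, fn):
    """Compute accuracy, precision, recall and F-measure from confusion counts."""
    n = tp + fp + tn + fn
    if n == 0:
        raise ValueError("at least one sample is required")
    # Assumption: an undefined ratio (zero denominator) is reported as 0.0.
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    if precision + recall:
        f_measure = 2 * precision * recall / (precision + recall)
    else:
        f_measure = 0.0
    return {
        "accuracy": (tp + tn) / n,
        "precision": precision,
        "recall": recall,
        "f_measure": f_measure,
    }
```

With illustrative counts of 40 true positives, 10 false positives, 45 true negatives and 5 false negatives, accuracy is 85/100 = 0.85, precision is 40/50 = 0.8 and recall is 40/45, about 0.889.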
Offline Results
Online Results
Live Tor Experiment
- Naive Bayes classifier deployed in a release of the Tor source code
- No streaming class
- Interactive client modified to download a 300 KB file in a loop
Results
Criticisms
- No mention of the effect on performance
- Very little explanation of throttling: how it is done and how different methods could affect overall network performance
- Unrealistic browsing client