Enhancing Tor’s Performance using Real-time Traffic Classification By Hugo Bateman.

Overview
- Background
- The problem
- DiffTor
- How it works
- Experiments
- Results
- Criticisms

Background

DiffTor
- Framework for classifying encrypted Tor circuits based upon the applications they serve
- Online classification: real-time
- Offline classification: using relay logs

Differentiating Applications
- Circuit lifetimes
- Amount of data transferred
- Cell inter-arrival time
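As an illustration of how these three attributes could be derived for a single circuit, here is a minimal sketch. The function name `circuit_features`, the input format (a list of cell arrival times in seconds), and the 512-byte cell size used to estimate volume are assumptions for the example, not details taken from the slides.

```python
from statistics import mean

# Assumption: fixed-size cells of 512 bytes, used only to estimate data volume.
CELL_SIZE_BYTES = 512


def circuit_features(cell_times):
    """Summarise one circuit from the arrival times (in seconds) of its cells."""
    if not cell_times:
        raise ValueError("need at least one cell")
    times = sorted(cell_times)
    # Gaps between consecutive cells; empty when the circuit saw a single cell.
    gaps = [later - earlier for earlier, later in zip(times, times[1:])]
    return {
        "lifetime_s": times[-1] - times[0],
        "bytes": len(times) * CELL_SIZE_BYTES,
        "mean_inter_arrival_s": mean(gaps) if gaps else 0.0,
    }


# Hypothetical circuit with four cells.
features = circuit_features([0.0, 0.5, 1.0, 2.0])
print(features)
```

A bulk circuit would be expected to show a long lifetime, a large byte count and small gaps, while an interactive one would show the opposite pattern.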

Class Definitions
- Bulk transfer
  - Download and upload larger volumes of data
  - Greedy
  - No time constraints
- Interactive
  - Require interaction between client and server
  - Time sensitive
- Streaming
  - Bulk transfer
  - Time and quality constraints

Classification Algorithms
- Naive Bayes: probabilistic classification algorithm that is based on Bayes’ theorem
- Bayesian networks: graphical representations that are used to model the dependency relationships between attributes and classes
- Decision trees: functional tree classification and logistic model tree classification
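To make the Naive Bayes entry concrete, the following is a small Gaussian Naive Bayes sketch written from scratch. It is not the classifier used in DiffTor. The Gaussian likelihood, the function names, and the toy training rows (lifetime in seconds, megabytes transferred, mean inter-arrival time in seconds) are all assumptions made up for illustration.

```python
import math
from collections import defaultdict


def train_gaussian_nb(samples, labels):
    """Estimate a prior and per-attribute mean/variance for each class."""
    by_class = defaultdict(list)
    for row, label in zip(samples, labels):
        by_class[label].append(row)
    model = {}
    for label, rows in by_class.items():
        stats = []
        for column in zip(*rows):
            mu = sum(column) / len(column)
            var = sum((v - mu) ** 2 for v in column) / len(column)
            stats.append((mu, var + 1e-9))  # small floor avoids division by zero
        model[label] = (len(rows) / len(samples), stats)
    return model


def predict(model, row):
    """Return the class with the highest log posterior under Bayes' theorem."""
    best_label, best_score = None, -math.inf
    for label, (prior, stats) in model.items():
        score = math.log(prior)
        for value, (mu, var) in zip(row, stats):
            # Log of the Gaussian density; attributes are treated as independent.
            score += -0.5 * math.log(2 * math.pi * var) - (value - mu) ** 2 / (2 * var)
        if score > best_score:
            best_label, best_score = label, score
    return best_label


# Made-up training circuits: (lifetime_s, megabytes, mean_inter_arrival_s).
samples = [(600, 50, 0.01), (900, 80, 0.02), (30, 0.5, 0.5), (60, 1.0, 0.8)]
labels = ["bulk", "bulk", "interactive", "interactive"]
model = train_gaussian_nb(samples, labels)
print(predict(model, (700, 60, 0.015)))
print(predict(model, (40, 0.7, 0.6)))
```

The "naive" part is the independence assumption inside `predict`: each attribute contributes its own log-likelihood term, and the terms are simply added.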

Experiments
- BitTorrent client: downloads torrents from a popular torrent site
- Browsing client: picks a random URL from the list of the top 100 URLs reported by Alexa, then repeats
- Streaming client: streams random videos using keywords

Data Collection
- Collected over a period of 6 weeks
- Offline data set: 200 circuits, 122 browsing, 49 BitTorrent, 28 streaming
- Online data set

Evaluation Metrics
- Accuracy: (TP + TN) / N
- Precision: TP / (TP + FP)
- Recall: TP / (TP + FN)
- F-measure: (2 * Precision * Recall) / (Precision + Recall)
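The four formulas above can be computed directly from confusion-matrix counts for one class. This is a minimal sketch; the function name and the example counts are hypothetical and are not results from the experiments.

```python
def evaluation_metrics(tp, fp, tn, fn):
    """Compute accuracy, precision, recall and F-measure from raw counts."""
    n = tp + fp + tn + fn
    if n == 0:
        raise ValueError("no samples")
    # Guard against empty denominators when a class is never predicted or never present.
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    if precision + recall:
        f_measure = 2 * precision * recall / (precision + recall)
    else:
        f_measure = 0.0
    return {
        "accuracy": (tp + tn) / n,
        "precision": precision,
        "recall": recall,
        "f_measure": f_measure,
    }


# Hypothetical counts for one class, e.g. "bulk" versus everything else.
m = evaluation_metrics(tp=40, fp=10, tn=45, fn=5)
print(m)
```

With more than two classes, these counts are usually taken per class (one class against the rest) and then reported per class or averaged.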

Offline Results

Online Results

Live Tor Experiment
- Naive Bayes classifier deployed in release of Tor source code
- No streaming class
- Interactive client modified to download 300kb file in a loop

Results

Criticisms
- No mention of the effect on performance
- Very little explanation of throttling, how it’s done and how different methods could impact overall network performance
- Unrealistic browsing client