SIGMETRICS'09: Inferring Undesirable Behavior from P2P Traffic Analysis (Ruben Torres, Mohammad Hajjat, Sanjay Rao, Marco Mellia, Maurizio Munafo)

Slide 1: Inferring Undesirable Behavior from P2P Traffic Analysis
Ruben Torres*, Mohammad Hajjat*, Sanjay Rao*, Marco Mellia†, Maurizio Munafo†
* Internet Systems Lab, Department of ECE, Purdue University, USA
† Department of Electronics and Telecommunications, Politecnico di Torino, Italy

Slide 2: Rapid Evolution of P2P Networks
- Peer-to-peer (P2P) systems are huge and complex, with millions of participants.
  - Over 60% of network traffic is due to P2P systems.
- P2P is used for many different applications:
  - File sharing: BitTorrent, eMule.
  - VoIP: Skype.
  - Video streaming: PPLive.
- P2P has matured to the point that there are commercial offerings.

Slide 3: Undesirable Behavior in P2P Networks
- Most research to date focuses on the design and characterization of P2P systems.
- We shift attention to the impact P2P systems may have on the Internet.
- Our focus is on identifying undesirable behavior:
  - Patterns that are unexpected, unintended, or unwanted by developers, users, or network operators.
- Potential for undesirable behavior arises from:
  - Millions of users.
  - Completely distributed operation.
  - Software bugs.
  - Malicious clients.
  - Security vulnerabilities.

Slide 4: Our Contributions
- One of the first works to show that undesirable behavior exists and is prevalent and significant:
  - Evidence of DDoS attacks exploiting P2P clients.
  - Significant waste of ISP resources.
  - Impact on application and user performance.
- Problems are exposed using a traffic trace from a large ISP with more than 5 million customers.
- One of the first systematic approaches to uncovering undesirable behavior in P2P systems.

Slide 5: Talk Outline
- Dataset
- Methodology
- Results

Slide 7: Setup
- Traces obtained from a large European ISP.
- The ISP provides ADSL (20/1 Mbps down/up) or fiber (10/10 Mbps) access.
- Extensive use of NAT in the ISP:
  - At the peering point (most clients in the ISP have private IP addresses).
  - Home NATs.
- Packet traces collected at a PoP within the ISP network, which serves more than 2,000 customers.

Slide 8: eMule Traffic Is Predominant in the PoP
- eMule is a popular P2P file-sharing application.
- Over an entire 3-month period:
  - 60-70% of inbound traffic to the PoP is due to eMule.
  - 95% of outbound traffic is due to eMule.
- eMule consists of two networks:
  - Kad: a decentralized, DHT-based network; UDP-based and mainly used for file search.
  - ED2K: a centralized, tracker-based network; TCP-based and used for both search and data exchange.

Slide 9: Systems Analyzed
1. Generic eMule, which we refer to as Kad.
2. A version of eMule customized for the ISP, which we refer to as KadU.
   - A modified version of Kad developed by users in the ISP to avoid performance problems caused by the NAT at the edge of the network.
   - Difference: KadU clients only contact other clients within the ISP.
- The two systems are analyzed separately because they have different characteristics (e.g., KadU clients perform much better).

Slide 10: High-Level Statistics of the Dataset
- 25-hour dataset analyzed.
- 478 KadU clients inside the PoP contact 229,000 KadU clients inside the ISP.
- 136 Kad clients inside the PoP contact more than 300,000 Kad clients in the Internet.
- 815,000 ED2K TCP connections.
- More than 8 million Kad/KadU UDP flows.

Slide 11: Traffic Classification and Sample Generation
- Pipeline: packet trace → per-flow classification using Tstat → per-host aggregation of flows over a 5-minute period → samples, each described by a set of metrics.
- Tstat is a passive sniffer with Deep Packet Inspection (DPI) capabilities.
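
The per-host aggregation step can be sketched as follows. This is a minimal illustration under stated assumptions, not Tstat's output format: the `FlowRecord` fields and the `aggregate_samples` helper are hypothetical, and each flow is assumed to be already classified (e.g., as Kad, KadU, or ED2K) and attributed to a host inside the PoP.

```python
from collections import defaultdict
from dataclasses import dataclass

WINDOW_SECONDS = 5 * 60  # 5-minute aggregation period, as on the slide


@dataclass
class FlowRecord:
    # Hypothetical per-flow record; field names are illustrative, not Tstat's log format.
    host: str            # PoP host the flow is attributed to
    start: float         # flow start time in seconds since the start of the trace
    app: str             # label assigned by per-flow classification, e.g. "kad"
    proto: str           # "tcp" or "udp"
    bytes_sent: int
    bytes_received: int


def aggregate_samples(flows, app, window=WINDOW_SECONDS):
    """Group the flows of one application into per-host, per-window samples.

    Returns a dict mapping (host, window_index) to the list of flows in that sample.
    """
    samples = defaultdict(list)
    for flow in flows:
        if flow.app != app:
            continue  # each system (Kad, KadU, ED2K) is analyzed separately
        window_index = int(flow.start // window)
        samples[(flow.host, window_index)].append(flow)
    return dict(samples)
```

Keying samples by (host, window) means a host active for the whole 25-hour trace contributes up to 300 samples, which is why the dataset yields tens of thousands of samples.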

Slide 12: Metrics
- More than 50 metrics obtained from flow records.
  - Both TCP and UDP flows are considered.
  - Metrics distinguish whether the flow initiator is inside or outside the PoP.
- Examples:
  - Flow: average flow duration.
  - Data transfer: bits per second (bps) sent and received.
  - Destinations: number of distinct destination IP addresses.
  - Failures: failure ratio (TCP only).
- Metrics were chosen because they are:
  - Intuitively important.
  - Used in the past in the context of P2P systems.
  - Able to capture specific behaviors of interest to us.
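
A sketch of how the example metrics could be computed for one sample (one host, one 5-minute window). The `Flow` record and `sample_metrics` function are hypothetical, and the split by whether the initiator is inside or outside the PoP is omitted for brevity.

```python
from dataclasses import dataclass

WINDOW_SECONDS = 5 * 60  # length of one sample, in seconds


@dataclass
class Flow:
    # Hypothetical flow record; field names are illustrative only.
    dst_ip: str
    proto: str            # "tcp" or "udp"
    duration: float       # seconds
    bytes_sent: int
    bytes_received: int
    failed: bool = False  # TCP only: the connection was never established


def sample_metrics(flows, window=WINDOW_SECONDS):
    """Compute a few example metrics for one host's flows in one window."""
    if not flows:
        raise ValueError("a sample needs at least one flow")
    tcp = [f for f in flows if f.proto == "tcp"]
    return {
        "avg_flow_duration": sum(f.duration for f in flows) / len(flows),
        # Bits per second averaged over the whole window, not per flow.
        "bps_sent": 8 * sum(f.bytes_sent for f in flows) / window,
        "bps_received": 8 * sum(f.bytes_received for f in flows) / window,
        "distinct_dst_ips": len({f.dst_ip for f in flows}),
        # Failure ratio is defined for TCP only; None when the sample has no TCP flows.
        "tcp_failure_ratio": sum(f.failed for f in tcp) / len(tcp) if tcp else None,
    }
```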

Slide 13: Challenges
- Very little is known about what kinds of undesirable behavior may exist.
- It is hard to clearly distinguish between normal and unwanted behavior:
  - P2P traffic patterns are very heterogeneous across users.
- Techniques relying on detecting abrupt changes may not work, since undesirable behavior can:
  - Be exhibited by the majority of the samples.
  - Last throughout the observation period (e.g., due to an implementation bug in the P2P system).

Slide 14: Our Approach
- We use clustering techniques and manual inspection to identify undesirable behavior.
- Clustering:
  - Tens of thousands of samples and more than 50 metrics.
  - Clustering reduces the analysis from individual samples to a handful of clusters.
- Domain knowledge and manual inspection:
  - Select regions of interest.
  - Interpret the results.

Slide 15: Clustering with DBSCAN
- DBSCAN is a density-based clustering technique:
  - Dense regions of points are considered a cluster.
  - Low-density regions are considered noise.
- Parameter tuning and sensitivity are discussed in the paper.
[Figure: number of samples vs. average packet size (bytes), showing Cluster1, Cluster2, Cluster3, and noise.]
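
A minimal pure-Python DBSCAN over a single metric (one value per sample) illustrates the idea. The `eps` (neighborhood radius) and `min_pts` (density threshold) names follow the usual DBSCAN formulation; the values the authors used are discussed in the paper and are not reproduced here. This quadratic-time sketch is for illustration, not for tens of thousands of samples.

```python
NOISE = -1  # label for samples in low-density regions


def dbscan(values, eps, min_pts):
    """Label each value with a cluster id (0, 1, ...) or NOISE."""
    n = len(values)
    labels = [None] * n  # None = not visited yet

    def neighbors(i):
        # All samples within eps of sample i, including i itself.
        return [j for j in range(n) if abs(values[i] - values[j]) <= eps]

    cluster = -1
    for i in range(n):
        if labels[i] is not None:
            continue
        seeds = neighbors(i)
        if len(seeds) < min_pts:
            labels[i] = NOISE  # may later turn out to be a border point
            continue
        cluster += 1  # i is a core point: start a new cluster
        labels[i] = cluster
        queue = list(seeds)
        while queue:
            j = queue.pop()
            if labels[j] == NOISE:
                labels[j] = cluster  # previously noise: now a border point
                continue
            if labels[j] is not None:
                continue  # already assigned to a cluster
            labels[j] = cluster
            j_neighbors = neighbors(j)
            if len(j_neighbors) >= min_pts:
                queue.extend(j_neighbors)  # j is a core point: keep expanding
    return labels
```

For example, `dbscan([1, 1.1, 1.2, 1.3, 10, 10.1, 10.2, 50], eps=0.5, min_pts=3)` yields two clusters and labels the isolated value 50 as noise.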

Slide 16: Selecting Regions of Interest: Metrics with More than One Cluster
- Metrics with more than one cluster plus noise:
  - A cluster and/or the noise region is selected as interesting.
[Figure: number of samples vs. average packet size (bytes), showing Cluster1, Cluster2, Cluster3, and noise; one region corresponds to clients that only send control messages.]

Slide 17: Selecting Regions of Interest: Metrics with One Cluster
- Metrics with one cluster plus noise:
  - The noise region is typically selected as interesting.
[Figure: number of samples vs. bits per second sent (×10^5); Cluster1 contains normal clients, while the noise region contains very active clients.]
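
For a metric with one cluster plus noise, the cluster's value range gives a data-driven threshold, and noise above it (e.g., very active clients) forms the interesting region. The sketch below assumes labels from a DBSCAN-style clusterer with -1 for noise; `interesting_region` is a hypothetical helper, not the authors' code.

```python
NOISE = -1  # label used for noise samples


def interesting_region(values, labels):
    """Split the noise of a one-cluster metric into samples above and below the cluster.

    values: one metric value per sample; labels: cluster id per sample (NOISE for noise).
    The cluster's [low, high] range acts as a data-driven threshold.
    """
    cluster_ids = {label for label in labels if label != NOISE}
    if len(cluster_ids) != 1:
        raise ValueError("expected exactly one cluster plus noise")
    in_cluster = [v for v, label in zip(values, labels) if label != NOISE]
    low, high = min(in_cluster), max(in_cluster)
    noise = [i for i, label in enumerate(labels) if label == NOISE]
    return {
        "cluster_range": (low, high),
        "noise_above": [i for i in noise if values[i] > high],  # e.g. very active clients
        "noise_below": [i for i in noise if values[i] < low],
    }
```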

Slide 18: Correlating Interesting Samples
- Once samples in interesting regions are identified, infer undesirable behavior.
- Find the hosts that generate the interesting samples:
  - If a few hosts, the anomalous behavior is a property of those hosts.
  - If many hosts, the behavior is general to the application.
- Find correlations across metrics:
  - Rely on domain knowledge to identify them.
  - Ongoing work explores techniques such as association rule mining.
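
The "few hosts vs. many hosts" test can be sketched by measuring how much of the interesting region the top hosts account for. The slide gives no numeric cutoff, so `max_hosts` and `few_hosts_share` below are hypothetical thresholds chosen only for illustration.

```python
from collections import Counter


def host_concentration(sample_hosts, max_hosts=5, few_hosts_share=0.9):
    """Classify whether interesting samples come from a few hosts or many.

    sample_hosts: the host that generated each sample in the interesting region.
    Returns (share of samples from the top `max_hosts` hosts, verdict).
    Both thresholds are illustrative assumptions.
    """
    if not sample_hosts:
        raise ValueError("no samples in the interesting region")
    counts = Counter(sample_hosts)
    top_share = sum(n for _, n in counts.most_common(max_hosts)) / len(sample_hosts)
    if top_share >= few_hosts_share:
        return top_share, "property of a few hosts"
    return top_share, "general to the application"
```

A case like the ED2K tracker finding, where two hosts generate 94% of the interesting samples, falls on the "few hosts" side under these thresholds.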

Slide 19: Talk Outline
- Dataset
- Methodology
- Results
  - Generic Observations
  - Key Findings

Slide 20: Preliminary Results
- For Kad:
  - Most metrics have one cluster plus noise.
  - 8 metrics have two clusters plus noise.
  - 2 metrics have three clusters plus noise.
- Similar results for KadU.
- Sensitivity study:
  - Night period and day period.
  - One-week trace.
  - Obtained very similar results.

Slide 21: Distribution of Samples in the Interesting Region
[Figure: fraction of samples in the interesting region vs. fraction of hosts generating those samples; highlighted metric: number of destination ports in range 0-1024 that receive a Kad flow.]
- For some metrics, a few hosts account for the abnormal behavior.
- For others, the abnormality is spread across many hosts (circled in the figure).

Slide 23: DDoS Attacks Exploiting Kad
- We considered UDP flows classified as Kad with destination port in the range 0-1024.
  - More than 50% of these flows are sent to port 53 (DNS).
    - More than 90% of these flows are unanswered.
- The top destinations were reported to be under attack.
- Unanswered UDP flows are those for which the destination never replies.
[Figure: fraction of unanswered UDP flows vs. destination port number, highlighting port 53 and port 4672 (the default Kad port).]
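
The per-port breakdown behind this finding can be sketched as below. The input format, (destination port, answered) pairs for Kad-classified UDP flows, and the `port_breakdown` name are assumptions for illustration; the 0-1024 range is taken from the slide.

```python
from collections import Counter


def port_breakdown(flows, low=0, high=1024):
    """Per destination port in [low, high]: share of flows and fraction unanswered.

    flows: (dst_port, answered) pairs for UDP flows classified as Kad, where
    answered is False if the destination never replied.
    """
    in_range = [(port, answered) for port, answered in flows if low <= port <= high]
    total = Counter(port for port, _ in in_range)
    unanswered = Counter(port for port, answered in in_range if not answered)
    n = len(in_range)
    return {
        port: (count / n, unanswered[port] / count)  # (share of flows, unanswered fraction)
        for port, count in total.items()
    }
```

A pattern like the one reported, where port 53 dominates the share of flows and nearly all of them go unanswered, stands out immediately in this table; flows to the default Kad port 4672 fall outside the range and are excluded.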

Slide 24: DDoS Attacks Exploiting P2P Systems
- Redirection attacks:
  - Malicious clients inject fake membership information about a victim into the system.
  - Innocent clients send normal protocol messages to the victim.
- The research community has been somewhat aware of the problem: Bellovin [2001], Ross [2006].
  - These works showed the theoretical feasibility of the attack.
- Our work is one of the first to show that such attacks are prevalent in the wild.
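
A toy model of the redirection attack shows why it is cheap for the attacker: one fake contact per peer turns every peer's routine protocol traffic into load on the victim. `Peer`, `inject_fake_contact`, and `run_round` are hypothetical and do not model the Kad wire protocol or how fake entries actually propagate; the IP addresses are documentation examples.

```python
from dataclasses import dataclass, field


@dataclass
class Peer:
    # Toy peer with a contact list of (ip, port) endpoints; not the real Kad protocol.
    name: str
    contacts: list = field(default_factory=list)


def inject_fake_contact(peers, victim):
    """Malicious step: advertise the victim's endpoint to peers as if it were a peer."""
    for peer in peers:
        peer.contacts.append(victim)


def run_round(peers):
    """Innocent step: each peer sends one normal protocol message to every contact.

    Returns the number of messages received per endpoint.
    """
    received = {}
    for peer in peers:
        for endpoint in peer.contacts:
            received[endpoint] = received.get(endpoint, 0) + 1
    return received
```

Each round of normal activity repeats the load, so the victim keeps receiving traffic as long as the fake entries stay in the peers' contact lists.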

Slide 25: Unnecessary P2P Traffic in KadU and Kad
[Figure: samples of the metric "fraction of unanswered flows out of total incoming UDP flows" form Cluster1, Cluster2, and noise; in Cluster2, most incoming UDP flows are unanswered. An accompanying plot shows the fraction of samples in the interesting region vs. the fraction of hosts generating them.]

Slide 26: Unnecessary P2P Traffic in KadU and Kad
- Large amount of wasted traffic:
  - 28% of all UDP flows incoming to the PoP are unanswered; 65% of these are due to Kad and KadU.
  - 30% of all TCP connections incoming to the PoP fail; 50% of these are due to KadU.
- Two causes:
  - Stale membership information.
  - Nodes behind NAT.
- Staleness can be extremely long-lived (e.g., tens of hours).
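
As a rough worked example, and assuming the 65% and 50% figures are shares of the unanswered UDP flows and failed TCP connections respectively, the wasted Kad/KadU traffic expressed as a share of all incoming flows is:

```latex
\begin{aligned}
\frac{\text{unanswered Kad/KadU UDP flows}}{\text{all incoming UDP flows}} &\approx 0.28 \times 0.65 = 0.182 \approx 18\% \\
\frac{\text{failed KadU TCP connections}}{\text{all incoming TCP connections}} &\approx 0.30 \times 0.50 = 0.15 = 15\%
\end{aligned}
```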

Slide 27: Malicious P2P Trackers in the ED2K Network
- Metric: average number of TCP connections per destination IP.
- 94% of interesting samples are generated by two hosts.
- These hosts open many short-lived connections to two trackers reported as malicious, which:
  - Never responded to requests and closed the connections.
  - Are likely deployed by copyright agencies (e.g., RIAA, IFPI).
- Similar findings by Banerjee [2008] and Siganos [2009].
[Figure: number of samples vs. average connections per destination; Cluster1 and a noise region in which clients contact the same destination more than once in 5 minutes.]
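
The metric itself is simple: connections divided by distinct destinations within one 5-minute sample. Values above 1 mean the host contacted some destination more than once, which is the noise region in the figure. Both functions below are hypothetical helpers written for illustration.

```python
from collections import Counter


def avg_connections_per_destination(dst_ips):
    """dst_ips: destination IP of each TCP connection one host opened in a 5-minute sample."""
    if not dst_ips:
        return 0.0
    return len(dst_ips) / len(set(dst_ips))


def repeated_destinations(dst_ips):
    """Destinations contacted more than once in the sample, with their connection counts."""
    return {ip: n for ip, n in Counter(dst_ips).items() if n > 1}
```

A host that keeps reconnecting to two unresponsive trackers produces a high value driven by just those two destinations, which `repeated_destinations` makes visible.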

Slide 28: Generalizing to Other Systems
- Findings in BitTorrent:
  - A very significant amount of unnecessary P2P traffic is present, as in KadU.
- Findings in Direct Connect:
  - A possible DDoS attack exploiting DC++: many TCP connections sent to port 80 of real web servers.
- More findings in the paper.
- Ongoing work studies traces from other networks.

Slide 29: Summary
- One of the first works to systematically study P2P traffic to identify undesirable behavior.
- Showed various types of undesirable behavior of P2P systems in the wild:
  - DDoS attacks on external servers that exploit the system.
  - Wasted resources.
  - Degraded performance of the P2P system itself (e.g., malicious trackers).
- Showed the potential of a systematic approach to uncover this behavior.
- Our initial analysis suggests that the results hold across a range of other P2P systems.

Slide 30: Questions?

Slide 31: Backup Slides

Slide 32: Encrypted Traffic in eMule
[Figure: timeline showing eMule 0.49a released 11.05.2008 12:29, eMule 0.49b released 1.08.2008 20:25, and the period of our trace collection.]

Slide 33: Why DBSCAN?
- It does not rely on assumptions about the shape of the clusters.
- It has the concept of a noise region.
- The number of clusters does not need to be known ahead of time.
- In principle, however, any technique can be used; we just need a coarse way to cluster samples.

Slide 34: DBSCAN Parameter Sensitivity
- We adjust the parameters to match our intuition of where the clusters should be when manually inspecting each metric.
  - We try to keep the noise region small, but not too small (at most 6% of the samples in our study).
- We have an automated way to obtain clusters; more details are in the paper.
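
One way to automate the "small but not too small" rule is to sweep the DBSCAN radius and keep the smallest value whose noise fraction is at most 6%. This is a hypothetical procedure, not the automated method described in the paper; `noise_fraction` and `choose_eps` are illustrative names. It relies on the DBSCAN definition that a sample is noise exactly when it is neither a core point nor within `eps` of one.

```python
def noise_fraction(values, eps, min_pts):
    """Fraction of samples DBSCAN would label as noise for a single metric.

    A sample is noise if it is not a core point (fewer than min_pts samples
    within eps, itself included) and is not within eps of any core point.
    """
    counts = [sum(abs(v - w) <= eps for w in values) for v in values]
    core = [v for v, c in zip(values, counts) if c >= min_pts]
    noise = sum(
        1
        for v, c in zip(values, counts)
        if c < min_pts and not any(abs(v - k) <= eps for k in core)
    )
    return noise / len(values)


def choose_eps(values, eps_grid, min_pts, max_noise=0.06):
    """Smallest eps in eps_grid whose noise fraction is at most max_noise, or None."""
    for eps in sorted(eps_grid):
        if noise_fraction(values, eps, min_pts) <= max_noise:
            return eps
    return None
```

Because the noise fraction never increases as `eps` grows, taking the smallest qualifying `eps` keeps the noise region as large as the 6% bound allows instead of shrinking it to nothing.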

Slide 35: Clustering on a Single Metric Instead of Multiple Metrics
- Interpreting clusters may be harder with multiple metrics:
  - Typical metric distributions are very skewed.
  - Metric distributions have different supports.
- Single-metric clustering still helps:
  - It provides an automatic way to obtain thresholds for the interesting region.
  - It yields first-cut observations.
- This is a first step; ongoing work addresses multi-metric analysis.

Slide 36: Do You Think You Found All the Behavior, or Is There More?
- We expect there is more (so there is more work to do), but we expect we have caught the first-order issues.
- This is the first attempt in this direction; we do not have an exhaustive list of undesirable behavior.
  - There may be other behavior to find when the application or network setup changes.
  - For example, the buddy problem, which relates more to the architecture of Kad.

Slide 37: How Can This Be Automated and Generalized to Different Networks?
- This work is a first step, pointing to the importance of the problem.
- Now that this is established, we could look at better ways to detect:
  - Changes over time.
  - Changes across networks.
- For a class of P2P systems, the same list of undesirable behaviors could be reused.

