
1 RPI Confidential - DO NOT DISTRIBUTE 1 Discovering Communities Among File Shares Aaron Sherman

2 Outline
- Background Information
- Problem Statement
- Data Collection
- Model
- Results

3 Background Information
- Flatlan (www.flatlan.com) is used on LANs to find shared files; it gives a flat view of the hierarchy of files on a LAN.
- Napster, Kazaa, and Gnutella are used to share files across the Internet.

4 Background (continued)
Advantages of Flatlan over other P2P systems:
- LANs are very fast compared to the Internet as a whole.
- Each network has its own server, which allows very simple and effective scaling and a high level of customization.
- No software to download (for the end user).
- May be more legal than other P2P models.

5 Background (continued)
How files are shared:
- File sharing among users follows a power law: roughly 80% of the files are shared by 20% of the users.
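One way to check the 80/20 claim against a file listing is to compute the share of all files held by the top 20% of users. The sketch below is only an illustration: the function name `top_share`, the user names, and the per-user file counts are all hypothetical and do not come from the collected data.

```python
def top_share(files_per_user, top_fraction=0.2):
    """Return the fraction of all files shared by the top `top_fraction` of users."""
    counts = sorted(files_per_user.values(), reverse=True)
    total = sum(counts)
    if total == 0:
        return 0.0
    # Number of users in the top group (at least one user).
    k = max(1, round(len(counts) * top_fraction))
    return sum(counts[:k]) / total


# Hypothetical data: 10 users, two of whom share most of the files.
shares = {
    "alice": 500, "bob": 300,
    "carol": 25, "dave": 25, "erin": 25, "frank": 25,
    "grace": 25, "heidi": 25, "ivan": 25, "judy": 25,
}
print(top_share(shares))
```

With these made-up counts the top two users hold 800 of 1000 files, which matches the rough 80/20 split described on the slide.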

6 Problem Statement
- We want to model the sharing of files as a graph and find patterns.
- Every graph is generated from one or more keyword searches on a Flatlan database.
- A node in a graph represents a computer that is sharing at least one file that matched the search criteria.
- An edge is drawn between two nodes if they share at least one file that is “identical”.
- Identical: (FileName1, Size1) = (FileName2, Size2)
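The graph definition above can be sketched in a few lines. This is a minimal sketch, not the tool used in the study: the record layout `(host, file_name, size)`, the function name `build_share_graph`, the assignment of files to hosts, and the byte sizes are all assumptions made for illustration.

```python
from collections import defaultdict
from itertools import combinations


def build_share_graph(records):
    """Build (nodes, edges) from (host, file_name, size) search results.

    Two hosts are joined by an edge when they share at least one file
    with the same (file_name, size) pair.
    """
    nodes = set()
    holders = defaultdict(set)  # (file_name, size) -> hosts sharing that file
    for host, file_name, size in records:
        nodes.add(host)
        holders[(file_name, size)].add(host)

    edges = set()
    for hosts in holders.values():
        # Every pair of hosts holding the same file gets one undirected edge.
        for a, b in combinations(sorted(hosts), 2):
            edges.add((a, b))
    return nodes, edges


# Hypothetical search results; who holds which file is invented.
records = [
    ("alice", "Eminem - The Real Slim Shady.mp3", 6570000),
    ("bob", "Eminem - The Real Slim Shady.mp3", 6570000),
    ("bob", "Eminem - The Marshall Mathers LP - 07 - The Way I Am.mp3", 4420000),
    ("dave", "Eminem - The Marshall Mathers LP - 07 - The Way I Am.mp3", 4420000),
    ("carol", "(Eminem) The Real Slim Shady.mp3", 6400000),
]
nodes, edges = build_share_graph(records)
print(sorted(edges))
```

In this example carol stays unconnected: her copy has a different name and size, so it does not count as identical to the copy held by alice and bob.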

7 Example
[Figure: example graph with nodes Alice, Bob, Carol, and Dave. File labels: "Eminem - The Real Slim Shady.mp3, 6.57MB"; "Eminem - The Marshall Mathers LP - 07 - The Way I Am.mp3, 4.42MB"; "(Eminem) The Real Slim Shady.mp3, 6.4MB"]

8 Other Design Issues
- //128.113.147.101/share/music/Eminem/Eminem - The Marshall Mathers/18- criminal.mp3: is that valid?
- Should word stemming be allowed?
- What about artists with similar names, such as Doors vs. 3 Doors Down?
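The first and third questions can be made concrete with a small keyword matcher. The sketch below is an assumption about how such matching might work, not Flatlan's actual search: the helper names `tokens` and `matches`, the `filename_only` switch, and the example paths are hypothetical, and no stemming is performed.

```python
import re


def tokens(text):
    """Lowercase a string and split it into alphanumeric words."""
    return re.findall(r"[a-z0-9]+", text.lower())


def matches(keyword_phrase, path, filename_only=True):
    """Return True if the phrase occurs as consecutive whole words.

    With filename_only=True only the last path component is searched,
    so a hit on a directory name does not count as a match.
    """
    target = path.rsplit("/", 1)[-1] if filename_only else path
    words = tokens(target)
    phrase = tokens(keyword_phrase)
    if not phrase:
        return False
    n = len(phrase)
    return any(words[i:i + n] == phrase for i in range(len(words) - n + 1))


# Hypothetical paths shaped like the one on the slide.
by_directory = "//host/share/music/Eminem/18-criminal.mp3"
other_band = "//host/share/music/3 doors down - kryptonite.mp3"

print(matches("eminem", by_directory))                       # file name only
print(matches("eminem", by_directory, filename_only=False))  # whole path
print(matches("doors", other_band))                          # similar-name hit
```

The last call shows the similar-name problem: a search for "doors" also matches a "3 doors down" track, so the searcher has to decide whether such hits belong in the graph.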

9 Problem Statement (continued)
- Study the properties of the resulting graphs.
- Study the evolution of these graphs (over a period of three weeks).
- Study the differences between several networks.
- Try to guess the impact of other P2P.
- Propose a simple model of altruism from the given data.

10 Data Collected
- Collected data over a period of three weeks
  - RPI, Cable Modem Network, Bryant, WNEC
  - Syracuse and U. Texas for a shorter time
- Only looked at the mp3 sub-database
- Searched for popular artists in several genres
  - Popular, Rap, Metal, Classic Rock, Classical…
  - Eminem, Metallica, Madonna, Mozart, Beatles

11 Major Observations
- All the graphs basically look the same, regardless of music genre, date collected, or location.
- There is a central group of well-connected nodes, with a “tree” of other nodes branching off them.
- Some bigger graphs have more than one connected component.
- There are many unconnected nodes, or nodes with few neighbors.
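Observations such as "more than one connected component" and "many unconnected nodes" can be checked mechanically. The following is a minimal sketch using a plain depth-first search; the function name `connected_components` and the toy graph are hypothetical and only mimic the shape described above (a well-connected core, a short tree hanging off it, and isolated nodes).

```python
def connected_components(nodes, edges):
    """Return the connected components of an undirected graph as a list of sets."""
    adjacency = {node: set() for node in nodes}
    for a, b in edges:
        adjacency[a].add(b)
        adjacency[b].add(a)

    seen = set()
    components = []
    for start in sorted(nodes):
        if start in seen:
            continue
        # Depth-first search from an unvisited node collects one component.
        component = set()
        stack = [start]
        while stack:
            node = stack.pop()
            if node in component:
                continue
            component.add(node)
            stack.extend(adjacency[node] - component)
        seen |= component
        components.append(component)
    return components


# Hypothetical graph: core triangle a-b-c, a chain c-d-e, isolated f and g.
toy_nodes = {"a", "b", "c", "d", "e", "f", "g"}
toy_edges = {("a", "b"), ("b", "c"), ("a", "c"), ("c", "d"), ("d", "e")}

components = connected_components(toy_nodes, toy_edges)
isolated = sorted(next(iter(c)) for c in components if len(c) == 1)
print(len(components), isolated)
```

Counting components of size one gives the number of unconnected nodes; the largest component would correspond to the central group plus its tree.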

12 Mozart at RPI

13 Observed Results
- There are a large number of small cliques (a file is usually shared by a group of common friends).
- Popular songs make up many of the edges.
- There are many unconnected nodes:
  - People often rename tracks, and then the songs are no longer “the same”.
  - People have unique songs.
- RPI students may like classical music more than students at other schools.

14 Variations in Songs
eminem - 13 - drug ballad.mp3,48052771
eminem - 13 - drug ballad.mp3,48074531
eminem - 13 - drug ballad.mp3,71966721
eminem - 13 - drug ballad.mp3,71985021
eminem - 13 - superman.mp3,84085023
eminem - 13 - superman.mp3,840863015
eminem - 13 - superman.mp3,84090881
eminem - 14 - amityville.mp3,40751031
eminem - 14 - amityville.mp3,40793681
eminem - 14 - amityville.mp3,61049481
eminem - 14 - hailie's song.mp3,51797302
eminem - 14 - hailie's song.mp3,76963001
eminem - 14 - hailies song.mp3,769642816
eminem - 14 - hailies song.mp3,76984321
eminem - 14 - rock bottom.mp3,34251761
eminem - 14 - rock bottom.mp3,34281871

15 Distribution of Repeated Files

16 File Name Repeats

17 Distribution of Matched Keywords for Eminem at RPI

18 Why Do People Share?
- To give to the community: altruism.
- Because they can: technical superiority over peers.
- To trade files: this requires “everyone” to share something so that “everyone” can benefit. Think for the group (Nash?).

19 Conclusion
- Live demonstration with W3Pal
- Questions and Answers