Automated Social Hierarchy Detection through Email Network Analysis (SNAKDD07) Ryan Rowe, Germán Creamer, Shlomo Hershkop, Salvatore J. Stolfo


1 Automated Social Hierarchy Detection through Email Network Analysis (SNAKDD07)
Ryan Rowe, Germán Creamer, Shlomo Hershkop, Salvatore J. Stolfo
Advisor: Dr. Koh Jia-Ling
Reporter: Che-Wei, Liang
Date: 2008/12/11

2 Outline
Introduction
SNA algorithm
Results and Discussion
Conclusions and Future Work

3 Introduction
The recent bankruptcy scandals at US companies such as Enron and WorldCom have increased the need to analyze electronic information
  – To define risk and identify any conflict of interest among the entities of a corporate household
Identifying the relationships between entities, or the corporate hierarchy, is not a straightforward task
  – These relationships can be extracted by analyzing email communication data

4 SNA Algorithm
For each email user
  – Analyze and calculate several statistics for each feature of each user
Construct an email network graph
  – Vertices represent accounts; edges represent communication between two accounts
  – Analyze cliques and other graph-theoretic properties
  – Combine the results into a social score
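
The graph construction step on this slide can be sketched as follows. This is a minimal illustration, not the EMT implementation: the function name `build_email_graph`, the `(sender, recipient)` input format, and the `min_messages` threshold are all assumptions introduced here.

```python
from collections import defaultdict


def build_email_graph(messages, min_messages=1):
    """Build an undirected graph from (sender, recipient) pairs.

    Vertices are accounts; an edge joins two accounts that exchanged at
    least `min_messages` emails (a hypothetical threshold).
    """
    counts = defaultdict(int)
    for sender, recipient in messages:
        if sender == recipient:
            continue  # ignore self-addressed mail
        counts[frozenset((sender, recipient))] += 1

    graph = defaultdict(set)
    for pair, n in counts.items():
        if n >= min_messages:
            a, b = tuple(pair)
            graph[a].add(b)
            graph[b].add(a)
    return dict(graph)


messages = [
    ("alice", "bob"),
    ("bob", "alice"),
    ("alice", "carol"),
    ("dave", "dave"),
]
graph = build_email_graph(messages)
```

Storing the graph as a dictionary of neighbor sets keeps the later clique and centrality computations simple, at the cost of discarding message direction.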

5 SNA Algorithm
Two sets of statistics about a user’s “importance”
  – Average response time
      The average time elapsed between a user sending an email and later receiving an email back from that recipient
      A received email is considered a “response” if it follows a sent email within three days
  – Cliques (maximal complete subgraphs)
      Find all cliques in the graph
      Assumption: users associated with a larger set and higher frequency of cliques will be ranked higher
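
A minimal sketch of the average response time statistic, assuming each mailbox is available as lists of `(timestamp, correspondent)` tuples. The function name, the input layout, and the choice to count only the first reply inside the window are assumptions made for illustration; only the three-day window comes from the slide.

```python
from datetime import datetime, timedelta

RESPONSE_WINDOW = timedelta(days=3)  # three-day window from the slide


def average_response_time(sent, received, window=RESPONSE_WINDOW):
    """Average delay between a sent email and the first reply from its recipient.

    sent:     list of (timestamp, recipient) tuples for one user
    received: list of (timestamp, sender) tuples for the same user
    Returns a timedelta, or None if no reply falls inside the window.
    """
    delays = []
    for sent_at, recipient in sent:
        replies = [
            t for t, sender in received
            if sender == recipient and sent_at < t <= sent_at + window
        ]
        if replies:
            delays.append(min(replies) - sent_at)
    if not delays:
        return None
    return sum(delays, timedelta()) / len(delays)


sent = [
    (datetime(2001, 1, 1, 9, 0), "bob"),
    (datetime(2001, 1, 2, 9, 0), "carol"),
]
received = [
    (datetime(2001, 1, 1, 11, 0), "bob"),    # reply after 2 hours
    (datetime(2001, 1, 8, 9, 0), "carol"),   # 6 days later, outside the window
]
avg = average_response_time(sent, received)
```

Here only the reply from "bob" counts as a response, so the average is two hours.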

6 Cliques (figure slide; the image is not included in the transcript)

7 Communication Networks
Number of cliques
  – The number of cliques that contain the account
Raw clique score
  – A score computed using the size of the account’s clique set
Weighted clique score
  – A score computed using the “importance” of the people in each clique
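
The three clique statistics can be sketched as below. The slide does not give the scoring formulas, so the ones used here are assumptions for illustration: each clique of size n contributes 2^(n-1) to the raw score, and the weighted score multiplies that term by the mean of a hypothetical per-user `importance` value. Cliques are enumerated with the basic Bron-Kerbosch algorithm, which is a standard technique and not necessarily what the authors used.

```python
def maximal_cliques(graph):
    """Enumerate maximal cliques with basic Bron-Kerbosch (no pivoting).

    graph: dict mapping each vertex to the set of its neighbors.
    """
    cliques = []

    def expand(r, p, x):
        if not p and not x:
            cliques.append(r)  # r cannot be extended: it is maximal
            return
        for v in list(p):
            expand(r | {v}, p & graph[v], x & graph[v])
            p = p - {v}
            x = x | {v}

    expand(set(), set(graph), set())
    return cliques


def clique_scores(graph, importance=None, min_size=2):
    """Per-account clique count, raw score, and weighted score (assumed formulas)."""
    importance = importance or {}
    stats = {v: {"count": 0, "raw": 0, "weighted": 0.0} for v in graph}
    for clique in maximal_cliques(graph):
        if len(clique) < min_size:
            continue
        size_term = 2 ** (len(clique) - 1)  # assumption: larger cliques count exponentially more
        mean_importance = sum(importance.get(u, 1.0) for u in clique) / len(clique)
        for v in clique:
            stats[v]["count"] += 1
            stats[v]["raw"] += size_term
            stats[v]["weighted"] += size_term * mean_importance
    return stats


# Triangle a-b-c plus a pendant edge c-d.
graph = {
    "a": {"b", "c"},
    "b": {"a", "c"},
    "c": {"a", "b", "d"},
    "d": {"c"},
}
cliques = sorted(sorted(c) for c in maximal_cliques(graph))
scores = clique_scores(graph)
```

In this toy graph, account "c" belongs to both maximal cliques and therefore receives the highest raw score.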

8 Communication Networks
Degree centrality
  – $\deg(v_i) = \sum_j a_{ij}$, where $a_{ij}$ is an entry of the adjacency matrix $A$ of $G$
Clustering coefficient
  – How close the vertex and its neighbors are to being a clique
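
A short sketch of both measures on an unweighted, undirected graph stored as neighbor sets (an assumed representation). With 0/1 adjacency entries, the row sum in the degree formula is simply the number of neighbors. The clustering coefficient is computed with the usual local definition: the fraction of possible links among a vertex's neighbors that actually exist.

```python
from itertools import combinations


def degree_centrality(graph, v):
    """Row sum of the 0/1 adjacency matrix, i.e. the number of neighbors of v."""
    return len(graph[v])


def clustering_coefficient(graph, v):
    """Fraction of pairs of v's neighbors that are themselves connected."""
    neighbors = graph[v]
    k = len(neighbors)
    if k < 2:
        return 0.0  # no neighbor pairs to compare
    links = sum(1 for a, b in combinations(neighbors, 2) if b in graph[a])
    return 2.0 * links / (k * (k - 1))


# Triangle a-b-c plus a pendant edge c-d.
graph = {
    "a": {"b", "c"},
    "b": {"a", "c"},
    "c": {"a", "b", "d"},
    "d": {"c"},
}
deg_c = degree_centrality(graph, "c")
cc_a = clustering_coefficient(graph, "a")
cc_c = clustering_coefficient(graph, "c")
```

Vertex "a" and its neighbors form a complete triangle, so its coefficient is 1, while "c" has three neighbors of which only one pair is linked.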

9 Communication Networks
Mean of shortest path lengths from a specific vertex to all vertices in the graph G
  – Computed from the distances $d_{ij} \in D$, where $D$ is the geodesic distance matrix of $G$
Betweenness centrality
  – Proportion of all geodesics between the other vertices that include vertex $v_i$
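
One way to obtain a row of the geodesic distance matrix is a breadth-first search from the vertex of interest. The sketch below assumes an unweighted, connected graph stored as neighbor sets, and it assumes the mean is taken over all n vertices including the source itself (distance 0); the slide does not state the normalization.

```python
from collections import deque


def geodesic_distances(graph, source):
    """Breadth-first search: shortest hop count from source to each reachable vertex."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        u = queue.popleft()
        for w in graph[u]:
            if w not in dist:
                dist[w] = dist[u] + 1
                queue.append(w)
    return dist


def mean_shortest_path(graph, v):
    """Mean geodesic distance from v, averaged over all n vertices (assumed normalization)."""
    dist = geodesic_distances(graph, v)
    return sum(dist.values()) / len(graph)


# Path graph a - b - c - d.
graph = {
    "a": {"b"},
    "b": {"a", "c"},
    "c": {"b", "d"},
    "d": {"c"},
}
mean_a = mean_shortest_path(graph, "a")
mean_b = mean_shortest_path(graph, "b")
```

A vertex in the middle of the path has a smaller mean distance than an endpoint, which matches the intuition that central accounts are closer to everyone else.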

10 Communication Networks
“Hubs-and-authorities” importance
  – Calculates the “hubs-and-authorities” importance of each vertex
  – J. Kleinberg. Authoritative sources in a hyperlinked environment. Journal of the ACM, 46, 1999.
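
The hubs-and-authorities computation from Kleinberg's paper can be sketched as a simple power iteration. The sketch assumes a directed graph in which an edge runs from sender to recipient; that mapping onto email data, the function name `hits`, and the fixed iteration count are assumptions made here for illustration.

```python
import math


def hits(links, iterations=50):
    """Hubs-and-authorities scores by power iteration.

    links: dict mapping each node to the set of nodes it points to.
    Returns (hub, authority) dicts, each normalized to unit Euclidean length.
    """
    nodes = set(links) | {t for targets in links.values() for t in targets}
    hub = {n: 1.0 for n in nodes}
    auth = {n: 1.0 for n in nodes}
    for _ in range(iterations):
        # Authority update: sum the hub scores of the nodes pointing at each node.
        auth = {n: 0.0 for n in nodes}
        for src, targets in links.items():
            for t in targets:
                auth[t] += hub[src]
        # Hub update: sum the authority scores of the nodes each node points to.
        hub = {n: sum(auth[t] for t in links.get(n, ())) for n in nodes}
        a_norm = math.sqrt(sum(x * x for x in auth.values())) or 1.0
        h_norm = math.sqrt(sum(x * x for x in hub.values())) or 1.0
        auth = {n: x / a_norm for n, x in auth.items()}
        hub = {n: x / h_norm for n, x in hub.items()}
    return hub, auth


# Both "a" and "b" send mail to "c".
links = {"a": {"c"}, "b": {"c"}, "c": set()}
hub_scores, auth_scores = hits(links)
```

In this toy example "c" is the only authority, and "a" and "b" are equally good hubs.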

11 Social Score
Social score
  – Rank users from most important to least important
  – Group users that have similar social scores and clique connectivity
  – Determine n different levels of social hierarchy within which to place all the users

12 Compute Social Score
Scale and normalize each statistic
Social score
  – A score between 0 and 100
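
The transcript does not preserve the formulas on this slide. A minimal sketch of one plausible scheme is min-max scaling of each statistic to the range 0 to 100, followed by a weighted average. The scaling method, the function names, the weights, and the sample values below are all assumptions for illustration.

```python
def normalize(values):
    """Min-max scale a {user: value} mapping to the range 0..100."""
    lo, hi = min(values.values()), max(values.values())
    if hi == lo:
        return {u: 0.0 for u in values}  # no spread: every user gets the same score
    return {u: 100.0 * (x - lo) / (hi - lo) for u, x in values.items()}


def social_scores(metrics, weights):
    """Weighted average of the normalized statistics, giving a score in 0..100.

    metrics: {statistic name: {user: raw value}}
    weights: {statistic name: non-negative weight}
    """
    scaled = {name: normalize(vals) for name, vals in metrics.items()}
    total_weight = sum(weights[name] for name in metrics)
    users = next(iter(metrics.values())).keys()
    return {
        u: sum(weights[name] * scaled[name][u] for name in metrics) / total_weight
        for u in users
    }


# Hypothetical raw statistics for three accounts.
metrics = {
    "degree": {"alice": 10, "bob": 4, "carol": 1},
    "cliques": {"alice": 3, "bob": 3, "carol": 0},
}
weights = {"degree": 1.0, "cliques": 1.0}
scores = social_scores(metrics, weights)
ranking = sorted(scores, key=scores.get, reverse=True)
```

A statistic where a smaller value indicates higher importance, such as average response time, would need to be inverted before it is passed to a scheme like this one.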

13 Results and Discussion
Using EMT
  – Java-based email analysis engine built on a database back end
  – The JUNG library is used for the degree and centrality measures
Present the analysis of the North American West Power Traders division of Enron Corporation

17 Conclusions and Future Work
The Enron dataset provides an excellent starting point of real-world data
By varying the feature weights, it is possible to
  – Pick out the most important individual
  – Group individuals with similar social qualities
  – Draw an organization chart that approximates the real social hierarchy

18 Conclusions and Future Work
The concept of average response time can be reworked by considering the order of responses
Common email usage times for each user can be considered in order to adjust the received time of an email
New grouping and division algorithms are being considered
Graph edges should be taken into account when arranging users into different levels

