Published by Jaheim Edge. Modified about 1 year ago.

1
Finding Tribes: Identifying Close-Knit Individuals from Employment Patterns
Lisa Friedland and David Jensen
Presented by Nick Mattei

2
Introduction
Tribes – groups with similar traits in a large graph
Distinguish those that work together and move together intentionally

3
Relationship Knowledge Discovery
Exploit connections among individuals to identify patterns and make predictions
Discover underlying dependencies
Links must be inferred

4
Graph Mining
Discover Hidden Group Structures
Animal Herds, Webpages, Employees
Time Series Analysis
Co-integration (Economics)
Security and Intrusion Detection
Dynamic Networks

5
Motivation
National Association of Securities Dealers
Fraud
Collusion
4.8 Million Records
2.5 Million Reps at 560,000 Firms
100 Years of Data

6
Complications
Jobs not necessarily in order (or singletons)
20% of employees hold more than one job at a time
10% begin multiple jobs (up to 16) on one day
Leave gaps between employment
Mergers and acquisitions

7
Model

8
Finding Anomalously Related Entities
Input: Bipartite Graph: G = (R ∪ A, E)
Entities: R = {r1, r2, …, rn} (People)
Attributes: A = {a1, a2, …, am} (Orgs.)
Entities should connect several attributes
Model co-occurrence rates of pairs of attributes
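
As a rough illustration of the input on this slide, the sketch below stores the bipartite graph as a list of (entity, attribute) edges and counts how many entities connect each pair of attributes. The function name `cooccurrence_counts` and the toy edge list are assumptions made for this example, not part of the presentation.

```python
from collections import Counter
from itertools import combinations


def cooccurrence_counts(edges):
    """Count, for each unordered attribute pair, how many entities link to both.

    `edges` is an iterable of (entity, attribute) pairs, i.e. the edge set E
    of the bipartite graph. The representation is an assumption of this sketch.
    """
    attributes_of = {}
    for entity, attribute in edges:
        attributes_of.setdefault(entity, set()).add(attribute)

    pair_counts = Counter()
    for attributes in attributes_of.values():
        # Sorting makes each unordered pair appear under one canonical key.
        for a, b in combinations(sorted(attributes), 2):
            pair_counts[(a, b)] += 1
    return pair_counts


# Hypothetical toy graph: three people (r1..r3), three organizations (a1..a3).
edges = [
    ("r1", "a1"), ("r1", "a2"),
    ("r2", "a1"), ("r2", "a2"),
    ("r3", "a2"), ("r3", "a3"),
]
counts = cooccurrence_counts(edges)
print(counts)
```

Dividing these raw counts by the number of entities at one of the two attributes would give a co-occurrence rate of the kind the later slides estimate.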

9
Algorithm

10
Simple Model
Measures
JOBS = number of shared jobs in the sequence
YEARS = number of years of overlap
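
A minimal sketch of the two simple measures, under stated assumptions: each job is a hypothetical `(branch, start_year, end_year)` tuple, and two jobs count as shared only when they are at the same branch and their year ranges overlap. The slide does not spell out these details, so treat them as one possible reading.

```python
def shared_jobs(rep_a, rep_b):
    """Return (branch, overlap_years) for every pair of overlapping jobs
    that the two reps held at the same branch."""
    shared = []
    for branch_a, start_a, end_a in rep_a:
        for branch_b, start_b, end_b in rep_b:
            if branch_a != branch_b:
                continue
            overlap = min(end_a, end_b) - max(start_a, start_b)
            if overlap > 0:
                shared.append((branch_a, overlap))
    return shared


def jobs_score(rep_a, rep_b):
    # JOBS: number of shared jobs.
    return len(shared_jobs(rep_a, rep_b))


def years_score(rep_a, rep_b):
    # YEARS: total years of overlap across the shared jobs.
    return sum(overlap for _, overlap in shared_jobs(rep_a, rep_b))


# Hypothetical careers: both reps pass through branches A and B together.
rep_a = [("A", 1990, 1995), ("B", 1995, 2000), ("C", 2000, 2004)]
rep_b = [("A", 1992, 1995), ("B", 1995, 1998), ("D", 1998, 2004)]
print(jobs_score(rep_a, rep_b), years_score(rep_a, rep_b))
```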

11
Example Sequences

12
Probabilistic Model
X = P(BrA -> BrB -> BrC -> BrD) = p_A * t_AB * t_BC * t_CD
Estimate:
p_i = P(start at branch i) = (# reps ever at i) / (# reps in database)
t_ij = P(rep moves from i to j | ever at i) = (# reps who leave i to go to j) / (# reps ever at i)
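
The estimates on this slide can be sketched as follows, assuming each career is an ordered list of branch names (a simplification: the real data has dates, gaps, and simultaneous jobs). The function names and the toy careers are illustrative assumptions, not the authors' code.

```python
from collections import Counter


def estimate_prob_model(careers):
    """Estimate start probabilities p and direct-transition rates t."""
    n_reps = len(careers)
    ever_at = Counter()
    direct_moves = Counter()
    for sequence in careers:
        for branch in set(sequence):
            ever_at[branch] += 1
        # Count each rep at most once per direct transition i -> j.
        for move in set(zip(sequence, sequence[1:])):
            direct_moves[move] += 1
    p = {branch: count / n_reps for branch, count in ever_at.items()}
    t = {(i, j): count / ever_at[i] for (i, j), count in direct_moves.items()}
    return p, t


def sequence_probability(sequence, p, t):
    """Probability of a branch sequence: p of the first branch times the
    product of the direct-transition rates along the path."""
    prob = p.get(sequence[0], 0.0)
    for i, j in zip(sequence, sequence[1:]):
        prob *= t.get((i, j), 0.0)
    return prob


# Hypothetical careers over branches A, B, C.
careers = [["A", "B", "C"], ["A", "B"], ["A", "C"], ["B", "C"]]
p, t = estimate_prob_model(careers)
x = sequence_probability(["A", "B", "C"], p, t)
print(x)
```

A low value of X for a sequence that two reps share suggests the shared path is unlikely under independent movement.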

13
Probabilistic Model
Null Hypothesis of Independent Movement
Movement Not Random
Split and Merge Markov Chains

14
Probabilistic Model (Different Paths)
t_ij becomes v_ij
v_ij = P(move to branch j at any point after branch i | currently at i) = (# reps who go to branch j at any point after working at i) / (# reps ever at i)
Now each v_ij >= t_ij, and the probabilities no longer sum to 1.
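
To make the "any point after" relaxation concrete, here is a small sketch that counts every later branch in a career rather than only the next one. It assumes careers are ordered lists of branch names; `later_move_rates` and the toy data are names invented for this example.

```python
from collections import Counter


def later_move_rates(careers):
    """Rate at which reps ever at branch i appear at branch j at any later
    point in their career (not only as the next job)."""
    ever_at = Counter()
    later_moves = Counter()
    for sequence in careers:
        for branch in set(sequence):
            ever_at[branch] += 1
        pairs = set()
        for index, i in enumerate(sequence):
            for j in sequence[index + 1:]:
                if j != i:
                    pairs.add((i, j))
        # Each rep contributes at most once per (i, j) pair.
        for pair in pairs:
            later_moves[pair] += 1
    return {(i, j): count / ever_at[i] for (i, j), count in later_moves.items()}


# Hypothetical careers over branches A, B, C.
careers = [["A", "B", "C"], ["A", "B"], ["A", "C"], ["B", "C"]]
v = later_move_rates(careers)
total_from_a = sum(rate for (i, _), rate in v.items() if i == "A")
print(v[("A", "C")], total_from_a)
```

In this toy data only one of the three reps at A moves directly to C, while two reach C eventually, so the relaxed rate exceeds the direct one, and the rates out of A add up to more than 1.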

15
Probabilistic Model (Different Paths)
v_ij becomes w_ij
w_ij = P(move to branch j at any point simultaneous to or after branch i | currently at i) = (# reps who start at j at any point simultaneous to or after starting at i) / (# reps ever at i)
Now less precise with respect to direct transitions, but more general
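
A sketch of the further relaxation on this slide, assuming each job is a hypothetical `(branch, start_year)` pair so that simultaneous starts can be represented. The only change from a strict "later" count is the `>=` comparison on start dates.

```python
from collections import Counter


def concurrent_or_later_rates(careers):
    """Rate at which reps ever at branch i start at branch j at the same
    time as, or after, starting at i."""
    ever_at = Counter()
    pair_counts = Counter()
    for jobs in careers:
        for branch in {branch for branch, _ in jobs}:
            ever_at[branch] += 1
        pairs = {
            (branch_i, branch_j)
            for branch_i, start_i in jobs
            for branch_j, start_j in jobs
            if branch_i != branch_j and start_j >= start_i
        }
        for pair in pairs:
            pair_counts[pair] += 1
    return {(i, j): count / ever_at[i] for (i, j), count in pair_counts.items()}


# Hypothetical data: the first rep starts at A and B in the same year,
# so that rep counts toward both (A, B) and (B, A).
careers = [
    [("A", 2000), ("B", 2000)],
    [("A", 2001), ("B", 2003)],
    [("B", 1999), ("A", 2002)],
]
w = concurrent_or_later_rates(careers)
print(w)
```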

16
PROB-TIMEBINS
Bins of 1 year or more
10 people worked at each branch in a bin period
p_iX = (# reps ever at i during time X) / (# reps in DB)
y_iXjY = (# reps ever at i during time X and at j during time Y, where Y >= X) / (# reps ever at i during time X)
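
The time-binned estimates might be computed along these lines. This sketch assumes each rep's career has already been reduced to a set of `(branch, bin_index)` cells, and it only pairs distinct branches; both choices are assumptions, and the bin-sizing rule from the slide is not implemented here.

```python
from collections import Counter


def timebin_rates(careers):
    """Estimate p[(i, X)] and y[(i, X, j, Y)] from per-rep (branch, bin) cells."""
    n_reps = len(careers)
    at_cell = Counter()
    joint = Counter()
    for stints in careers:
        cells = set(stints)
        for cell in cells:
            at_cell[cell] += 1
        for i, x in cells:
            for j, y in cells:
                # Keep pairs where bin Y is the same as or later than bin X.
                if i != j and y >= x:
                    joint[(i, x, j, y)] += 1
    p = {cell: count / n_reps for cell, count in at_cell.items()}
    y_rates = {
        key: count / at_cell[(key[0], key[1])] for key, count in joint.items()
    }
    return p, y_rates


# Hypothetical data with two branches and two time bins (0 and 1).
careers = [
    [("A", 0), ("B", 1)],
    [("A", 0), ("B", 0)],
    [("A", 1), ("B", 1)],
    [("B", 0), ("A", 1)],
]
p, y_rates = timebin_rates(careers)
print(p[("A", 0)], y_rates[("A", 0, "B", 1)])
```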

17
PROB-NOTIME
Ignores order of job moves
Use original p_i
z_ij = raw number of reps who are at both branches i and j during their career
Transition probability from i to j = (z_ij / # reps ever at i) != (z_ij / # reps ever at j) = transition probability from j to i
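
The asymmetry noted on this slide can be seen in a short sketch: the count of reps at both branches is symmetric, but dividing by a different branch size gives a different rate in each direction. Careers are assumed to be plain lists of branch names, and the function name is invented for this example.

```python
from collections import Counter
from itertools import combinations


def notime_transition_rates(careers):
    """Order-free transition rates: z_ij divided by the size of the source branch."""
    ever_at = Counter()
    z = Counter()
    for branches in careers:
        visited = set(branches)
        for branch in visited:
            ever_at[branch] += 1
        # z is symmetric, so store each unordered pair once.
        for i, j in combinations(sorted(visited), 2):
            z[(i, j)] += 1
    rates = {}
    for (i, j), count in z.items():
        rates[(i, j)] = count / ever_at[i]
        rates[(j, i)] = count / ever_at[j]
    return rates


# Hypothetical data: four reps were ever at A, three at B, two at both.
careers = [["A", "B"], ["A", "B"], ["A"], ["A"], ["B", "C"]]
rates = notime_transition_rates(careers)
print(rates[("A", "B")], rates[("B", "A")])
```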

18
Tribe Size

19
Pairs

20
Commonality of Job Sequence

21
Disclosure Scores

22
Homogeneity and Mobility

23

24

25
Discussion
JOBS, PROB, PROB-TIMEBINS, and PROB-NOTIME create tribes with higher-than-average disclosure scores
PROB creates more cross-zip-code results
PROB-TIMEBINS has a higher phi-squared than all the others
PROB favors large firms

26
Discussion
JOBS and YEARS compute larger connected components
JOBS and PROB find the same number of tribes but pick different groups as tribes

27
Conclusions
With no explicit knowledge we can discover:
Job transitions
Geography
Career track

28
Conclusions
Needed:
Ongoing process
Multiple affiliations
Arbitrary times
Time is a paradox in this domain

29
Thanks!
Time for:
Questions
Comments
Smart Remarks
