Fraud Detection with Machine Learning: A Case Study from Sift Science


1 Fraud Detection with Machine Learning: A Case Study from Sift Science
#GHC14 Katherine Loh, Sift Science, October 9th, 2014

2 What is Sift Science?
Sift Science fights fraud using large-scale machine learning
Mostly payments fraud, such as stolen credit cards

3 What is Fraud? Chargebacks

4 What is Fraud? Chargebacks From stolen credit cards

5 What is Fraud? Chargebacks From stolen credit cards
Teams dedicated to fighting chargebacks

6 What is Fraud? Chargebacks From stolen credit cards
Teams dedicated to fighting chargebacks Goods lost & fees (~$20)

7 What is Fraud? Chargebacks Spamming users

8 What is Fraud? Chargebacks Spamming users Fake listings

9 What is Fraud? Chargebacks Spamming users Fake listings
Promo program abuse

10 How does Sift help?
Site reports page, transaction, and custom events to the Sift API
We build up a model of the site's users in real time
Site may give guidance by labeling some users as "bad" or "not bad"
Site consumes scores through the API or workflow tools

11 TL;DR
Site sends data to Sift
Sift calculates fraud scores
Site consumes fraud scores
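
The send-data, score, consume loop above can be sketched with a toy in-memory service. This is only an illustration of the data flow: `ToyFraudService`, its method names, the `auth_result` field, and the placeholder scoring rule are all hypothetical and are not Sift's API or model.

```python
from collections import defaultdict


class ToyFraudService:
    """Hypothetical in-memory stand-in for a scoring service (not Sift's API)."""

    def __init__(self):
        self.events = defaultdict(list)  # user_id -> list of event dicts
        self.labels = {}                 # user_id -> True if labeled "bad"

    def track(self, user_id, event_type, **properties):
        # Step 1: the site reports an event for a user.
        self.events[user_id].append({"type": event_type, **properties})

    def label(self, user_id, is_bad):
        # Optional guidance from the site: "bad" (True) or "not bad" (False).
        self.labels[user_id] = is_bad

    def score(self, user_id):
        # Step 2: placeholder scoring rule, NOT a real fraud model.
        # Labeled users score 1.0 or 0.0; others score by failed-transaction share.
        if user_id in self.labels:
            return 1.0 if self.labels[user_id] else 0.0
        txns = [e for e in self.events[user_id] if e["type"] == "transaction"]
        if not txns:
            return 0.0
        failed = sum(1 for e in txns if e.get("auth_result") == "failure")
        return failed / len(txns)


service = ToyFraudService()
service.track("user_1", "page_view", url="/checkout")
service.track("user_1", "transaction", auth_result="failure")
service.track("user_1", "transaction", auth_result="success")

# Step 3: the site consumes the score, here with an assumed review threshold.
decision = "review" if service.score("user_1") >= 0.5 else "accept"
```

The threshold of 0.5 is arbitrary; a site would pick its own cutoff based on how much manual review it can afford.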

12 Supervised ML
Human judgments on historical data (labels)
Statistical analysis of training data
Model finds correlations between input data and observed labels
Bad or Not Bad?
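
As a minimal sketch of supervised learning on labeled history, the following trains a small logistic regression with stochastic gradient descent in plain Python. The two binary features and the toy rows are invented for illustration; the talk does not describe its training procedure at this level.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def predict(weights, bias, x):
    """Estimated probability that the example is "bad"."""
    return sigmoid(bias + sum(w * xi for w, xi in zip(weights, x)))


def train_logistic(rows, labels, epochs=500, lr=0.5):
    """Fit weights with per-example gradient steps on the log loss."""
    weights = [0.0] * len(rows[0])
    bias = 0.0
    for _ in range(epochs):
        for x, y in zip(rows, labels):
            err = predict(weights, bias, x) - y  # gradient of log loss w.r.t. the logit
            bias -= lr * err
            weights = [w - lr * err * xi for w, xi in zip(weights, x)]
    return weights, bias


# Toy training data (invented): [is_new_account, has_many_card_updates]
rows = [[1, 1], [1, 0], [0, 1], [0, 0], [1, 1], [0, 0]]
labels = [1, 0, 0, 0, 1, 0]  # 1 = "bad", 0 = "not bad"

weights, bias = train_logistic(rows, labels)
```

After training, `predict(weights, bias, [1, 1])` is above 0.5 and the other three combinations fall below it, which is the correlation present in the toy labels.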

13 Real Time!
Scores are necessary to process orders
Must include latest events & labels
Median score latency is under 200 ms

14 How Large is Large?
1,000+ websites
700 events / second (at peak)
350M+ IP addresses
Roughly $3B of transaction volume analyzed each month
1,000+ features
Millions of fraud patterns

15 SIFT

16 SIFT

17 Magic Algorithms
Naïve Bayes
Logistic Regression
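
To make the first of these concrete, here is a minimal Bernoulli Naïve Bayes sketch over sets of binary features, with Laplace smoothing. The feature names and training examples are made up, and nothing here is claimed to match Sift's actual implementation.

```python
import math
from collections import Counter, defaultdict


def train_naive_bayes(examples):
    """examples: list of (set_of_feature_names, label)."""
    class_counts = Counter()
    feature_counts = defaultdict(Counter)  # label -> feature -> count
    vocab = set()
    for features, label in examples:
        class_counts[label] += 1
        for f in features:
            feature_counts[label][f] += 1
            vocab.add(f)
    return class_counts, feature_counts, vocab


def log_posteriors(model, features):
    """Unnormalized log P(label | features) for each label."""
    class_counts, feature_counts, vocab = model
    total = sum(class_counts.values())
    scores = {}
    for label, n in class_counts.items():
        score = math.log(n / total)  # class prior
        for f in vocab:
            # Laplace-smoothed probability that feature f is present in this class.
            p = (feature_counts[label][f] + 1) / (n + 2)
            score += math.log(p if f in features else 1 - p)
        scores[label] = score
    return scores


def classify(model, features):
    scores = log_posteriors(model, features)
    return max(scores, key=scores.get)


# Toy labeled examples (invented feature names).
examples = [
    ({"night", "new_account"}, "bad"),
    ({"night", "many_cards"}, "bad"),
    ({"daytime"}, "not bad"),
    ({"daytime", "new_account"}, "not bad"),
    ({"daytime"}, "not bad"),
]
model = train_naive_bayes(examples)
```

With this toy data, `classify(model, {"night", "many_cards"})` returns `"bad"` and `classify(model, {"daytime"})` returns `"not bad"`.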

18 Network vs Customer Models
Customers start on our "Network Model"
With 20 "bad" labels, they move to a customer-specific model
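
The switch-over rule can be written as a one-line check. The function name and return values below are hypothetical; only the threshold of 20 "bad" labels comes from the slide, and reading it as "20 or more" is an assumption.

```python
BAD_LABEL_THRESHOLD = 20  # from the slide


def choose_model(bad_label_count, threshold=BAD_LABEL_THRESHOLD):
    """Return which model serves a customer, given how many "bad" labels they sent."""
    return "customer" if bad_label_count >= threshold else "network"


new_customer = choose_model(0)
established_customer = choose_model(35)
```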

19 One User, One Purchase
IP Address:
Billing Name: Katherine Loh
Billing Address: San Francisco, CA
Address:
Credit Card: 4567xxxxxxxxxxxx
Item Purchased: Sleeping Bag
Cost: USD
Authorization Result: Success

20 One User, Over Time
Timeline: Account created, Updated credit card info, Updated settings, Purchased Item, Updated credit card info, Purchased Item, Purchased Item
IP Address:
Billing Name: Katherine Loh
Billing Address: San Francisco, CA
Address:
Credit Card: 6543xxxxxxxxxxxx
Item Purchased: Sleeping Bag
Cost: USD
Authorization Result: Success

21 One User, Over Time
Account is 4 hours old

22 One User, Over Time
2 credit card updates in user's history
Account is 4 hours old

23 One User, Over Time
2 credit card updates in user's history
3 transactions in the last hour
Account is 4 hours old
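
The three observations on these slides (account age, credit card updates, transactions in the last hour) can be derived from the user's event timeline. The sketch below assumes events arrive as `(timestamp, event_type)` tuples; the event type names and the timestamps are invented so that the result matches the slide.

```python
from datetime import datetime, timedelta


def user_history_features(events, now):
    """events: list of (timestamp, event_type) tuples for one user."""
    created = min(ts for ts, kind in events if kind == "account_created")
    one_hour = timedelta(hours=1)
    return {
        "account_age_hours": (now - created) / one_hour,
        "credit_card_updates": sum(
            1 for _, kind in events if kind == "update_credit_card"
        ),
        "transactions_last_hour": sum(
            1
            for ts, kind in events
            if kind == "purchase" and timedelta(0) <= now - ts <= one_hour
        ),
    }


# Hypothetical timestamps following the slide's timeline order.
now = datetime(2014, 10, 9, 16, 0)
events = [
    (now - timedelta(hours=4), "account_created"),
    (now - timedelta(hours=3, minutes=50), "update_credit_card"),
    (now - timedelta(hours=3), "update_settings"),
    (now - timedelta(minutes=55), "purchase"),
    (now - timedelta(minutes=40), "update_credit_card"),
    (now - timedelta(minutes=20), "purchase"),
    (now - timedelta(minutes=5), "purchase"),
]
features = user_history_features(events, now)
```

For this timeline `features` holds an account age of 4.0 hours, 2 credit card updates, and 3 transactions in the last hour.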

24 One Site, Many Users
[Figure: event timelines along a time axis for users taylor@siftscience.com and jtan123@gmail.com]

25 One Site, Many Users
[Figure: user timelines along a time axis; x = marked bad by site owner]

26 One Site, Many Users
[Figure: user timelines along a time axis with x marks; annotation: Transacted from same IP]

27 One Site, Many Users
[Figure: timelines for taylor@siftscience.com and jtan123@gmail.com along a time axis with x marks; annotations: Similar addresses, Transacted from same IP]
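
One way to use the "transacted from same IP" link is to find users who share an IP address with a user the site owner marked bad. This is a sketch under assumed input shapes; the user ids are hypothetical and the IPs come from the reserved documentation ranges.

```python
from collections import defaultdict


def users_linked_to_bad(transactions, bad_users):
    """transactions: iterable of (user_id, ip).

    Returns unlabeled users who transacted from an IP also used by a bad user.
    """
    users_by_ip = defaultdict(set)
    for user_id, ip in transactions:
        users_by_ip[ip].add(user_id)

    linked = set()
    for users in users_by_ip.values():
        if users & bad_users:            # at least one bad user on this IP
            linked |= users - bad_users  # everyone else on that IP is linked
    return linked


transactions = [
    ("user_a", "203.0.113.5"),
    ("user_b", "203.0.113.5"),
    ("user_c", "198.51.100.7"),
    ("user_d", "198.51.100.7"),
    ("user_e", "192.0.2.44"),
]
bad_users = {"user_a"}
linked = users_linked_to_bad(transactions, bad_users)
```

Here only `user_b` is linked. A shared IP is a signal for a model to weigh, not proof of fraud, since offices and mobile carriers also share addresses.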

28 Many Sites, Many Users Site 1 Site 2 Site 3

29 Many Sites, Many Users
[Figure: user timelines on Site 1, Site 2, and Site 3; annotation: Transacted from same IP]

30 Features
Event features
State features
Temporal features
Graph features

31 Event Features
Properties of user's most recent event
Credit card type, billing zip code, shipping type
Billing address, shipping address, product SKU
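
A minimal sketch of turning one event into categorical features follows. The field names and the `key=value` string encoding are assumptions, and the billing-versus-shipping comparison is an added illustration rather than something the slide states.

```python
def event_features(event):
    """Turn properties of one event dict into a set of categorical feature strings."""
    keys = ["card_type", "billing_zip", "shipping_type", "product_sku"]
    features = {f"{key}={event[key]}" for key in keys if event.get(key) is not None}

    # Derived feature (assumption): do billing and shipping addresses match?
    if event.get("billing_address") and event.get("shipping_address"):
        same = event["billing_address"] == event["shipping_address"]
        features.add(f"billing_matches_shipping={same}")
    return features


# Hypothetical event; missing fields are simply skipped.
event = {
    "card_type": "visa",
    "billing_zip": "94105",
    "shipping_type": "overnight",
    "billing_address": "1 Example St, San Francisco, CA",
    "shipping_address": "9 Sample Ave, Miami, FL",
}
features = event_features(event)
```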

32 State Features
Properties of user's current state
Broad Attributes: Country, time of day, browser type
Identity Features: IP address, device fingerprint, cookie, name

33 Temporal Features
Properties of user's time series up to that point
Velocities: Number of purchases in the past hour? IP addresses?
Sequence Features: Last 5 actions taken? Last few geo locations?
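
Velocities and sequence features can both be computed from a chronologically ordered event list. The sketch below assumes each event is a dict with `ts`, `type`, and `ip` keys; those names, the one-hour window default, and the sample data are illustrative.

```python
from datetime import datetime, timedelta


def temporal_features(events, now, window=timedelta(hours=1), last_n=5):
    """events: chronologically ordered list of dicts with 'ts', 'type', 'ip'."""
    recent = [e for e in events if timedelta(0) <= now - e["ts"] <= window]
    return {
        # Velocities: counts within the trailing window.
        "purchases_in_window": sum(1 for e in recent if e["type"] == "purchase"),
        "distinct_ips_in_window": len({e["ip"] for e in recent}),
        # Sequence feature: the most recent action types, oldest first.
        "last_actions": tuple(e["type"] for e in events[-last_n:]),
    }


now = datetime(2014, 10, 9, 12, 0)
events = [
    {"ts": now - timedelta(hours=3), "type": "account_created", "ip": "203.0.113.5"},
    {"ts": now - timedelta(minutes=50), "type": "purchase", "ip": "203.0.113.5"},
    {"ts": now - timedelta(minutes=30), "type": "update_credit_card", "ip": "198.51.100.7"},
    {"ts": now - timedelta(minutes=10), "type": "purchase", "ip": "198.51.100.7"},
]
features = temporal_features(events, now, last_n=2)
```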

34 Graph Features
How the user relates to others on the site and on other sites
Number of other users using the same shipping address
Similarity of this user with the seller of the item (for an online marketplace)
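
The first graph feature, the number of other users using the same shipping address, can be sketched as below. The `normalize` helper is a deliberately crude assumption (lowercasing and whitespace collapsing); real address matching needs far more care.

```python
from collections import defaultdict


def normalize(address):
    """Crude normalization (assumption): lowercase and collapse whitespace."""
    return " ".join(address.lower().split())


def shared_address_counts(orders):
    """orders: iterable of (user_id, shipping_address).

    Returns user_id -> number of OTHER users who shipped to any of the same addresses.
    """
    users_by_address = defaultdict(set)
    for user_id, address in orders:
        users_by_address[normalize(address)].add(user_id)

    neighbors = defaultdict(set)
    for users in users_by_address.values():
        for user in users:
            neighbors[user] |= users - {user}
    return {user: len(others) for user, others in neighbors.items()}


# Hypothetical orders.
orders = [
    ("user_a", "12 Example Rd, Springfield"),
    ("user_b", "12 example rd,  springfield"),
    ("user_c", "12 Example Rd, Springfield"),
    ("user_d", "77 Sample Ln, Shelbyville"),
]
counts = shared_address_counts(orders)
```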

35 Graph Features
[Figure: example graphs labeled "normal" and "less normal"]

36 Evaluating Features

37 Evaluating Features

38 Evaluating Features

39 Normal Users Eat Lunch

40 Fraudsters Skip Lunch

41 Fraudsters Are Night Owls
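
A time-of-day pattern like this can be checked by computing the share of "bad" users per hour of day. The sketch below assumes `(hour, is_bad)` observations; the sample data is invented purely to exercise the function and does not reproduce the talk's charts.

```python
from collections import Counter


def fraud_rate_by_hour(observations):
    """observations: iterable of (hour_of_day, is_bad). Returns hour -> fraction bad."""
    totals, bad = Counter(), Counter()
    for hour, is_bad in observations:
        totals[hour] += 1
        bad[hour] += int(is_bad)
    return {hour: bad[hour] / totals[hour] for hour in sorted(totals)}


# Invented observations: (local hour of the event, labeled bad?)
observations = [
    (3, True), (3, True), (3, False), (3, True),
    (12, False), (12, False), (12, False), (12, True),
    (20, False), (20, True),
]
rates = fraud_rate_by_hour(observations)
```

A feature is worth keeping when the rate differs clearly between buckets, as it does between hour 3 and hour 12 in this made-up sample.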

42 Fraudsters Don Multiple Identities

43 Lessons Learned Keep customers happy

44 Happy Customers?
accurate scores
great customer support
easy to use product
???
customer happiness

45 Lessons Learned Keep customers happy Results must be understandable

46

47

48

49

50

51 Lessons Learned
Keep customers happy
Results must be understandable
Humans expect stability and speed

52 Lessons Learned
Keep customers happy
Results must be understandable
Humans expect stability and speed
External knowledge changes over time

53 Data Changes Over Time
User labels
Exchange rates
IP/Geo data
New features
New models

54 Lessons Learned
Keep customers happy
Results must be understandable
Humans expect stability and speed
External knowledge changes over time
Noise is everywhere

55 Noise is EVERYWHERE
Wrong labels
Duplicate labels
Bad integrations
Incomplete integrations
Missing fields
Bugs
System downtime
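
Some of this noise can be handled defensively before training. The sketch below covers three of the cases: missing fields, duplicate labels, and conflicting labels that may indicate a wrong one. The tuple layout and the "latest label wins" policy are assumptions, not the approach described in the talk.

```python
def clean_labels(raw_labels):
    """raw_labels: list of (user_id, is_bad, timestamp).

    Returns (labels, conflicts): the latest label per user, and the set of users
    whose labels disagree across submissions.
    """
    latest = {}       # user_id -> (is_bad, timestamp)
    seen_values = {}  # user_id -> set of label values seen
    for user_id, is_bad, ts in raw_labels:
        if user_id is None or is_bad is None:
            continue  # missing fields: skip rather than guess
        seen_values.setdefault(user_id, set()).add(is_bad)
        if user_id not in latest or ts >= latest[user_id][1]:
            latest[user_id] = (is_bad, ts)  # duplicates collapse to the newest

    labels = {user: value for user, (value, _) in latest.items()}
    conflicts = {user for user, values in seen_values.items() if len(values) > 1}
    return labels, conflicts


# Hypothetical raw label feed with duplicates, a conflict, and a missing field.
raw_labels = [
    ("user_a", True, 1),
    ("user_a", True, 2),    # duplicate label
    ("user_b", True, 1),
    ("user_b", False, 3),   # conflicting label; the later one wins
    (None, True, 4),        # missing user id
    ("user_c", None, 5),    # missing label value
]
labels, conflicts = clean_labels(raw_labels)
```

Users in `conflicts` are good candidates for manual review, since either label could be the wrong one.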

56 Questions?

57 Got Feedback? Rate and Review the session using the GHC Mobile App
To download visit

