Fraud Detection with Machine Learning: A Case Study from Sift Science
Katherine Loh, Sift Science
October 9th, 2014 (#GHC14)
What is Sift Science?
- Sift Science fights fraud using large-scale machine learning
- Mostly payments fraud, such as stolen credit cards
What is Fraud? Chargebacks
- From stolen credit cards
- Teams dedicated to fighting chargebacks
- Goods lost & fees (~$20)
What is Fraud?
- Chargebacks
- Spamming users
- Fake listings
- Promo program abuse
How does Sift help?
- Site reports page, transaction, and custom events to the Sift API
- We build up a model of the site’s users in real time
- Site may give guidance by labeling some users as “bad” or “not bad”
- Site consumes scores through the API or workflow tools
TL;DR
- Site sends data to Sift
- Sift calculates fraud scores
- Site consumes fraud scores
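The send-events, get-scores loop above can be sketched with a toy in-memory service. Everything here is hypothetical: `ToyScoringService`, its method names, and the scoring rule are illustrative stand-ins, not Sift's actual API or model.

```python
from collections import defaultdict


class ToyScoringService:
    """Hypothetical in-memory stand-in for a fraud-scoring API."""

    def __init__(self):
        self.events = defaultdict(list)  # user_id -> list of event dicts
        self.labels = {}                 # user_id -> True if labeled "bad"

    def track(self, user_id, event_type, **fields):
        """The site reports a page, transaction, or custom event."""
        self.events[user_id].append({"type": event_type, **fields})

    def label(self, user_id, is_bad):
        """The site optionally labels a user as bad (True) or not bad (False)."""
        self.labels[user_id] = is_bad

    def score(self, user_id):
        """Return a score in [0, 1]. Placeholder rule, not a real model."""
        if self.labels.get(user_id):
            return 1.0
        n_transactions = sum(
            1 for e in self.events[user_id] if e["type"] == "transaction"
        )
        return min(1.0, 0.1 * n_transactions)


service = ToyScoringService()
service.track("user_1", "page_view", url="/checkout")
service.track("user_1", "transaction", amount_usd=50.00)
current_score = service.score("user_1")  # the site consumes this score
```

The point of the sketch is the shape of the integration: events flow in continuously, labels are optional guidance, and a score can be requested at any time.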
Supervised ML
- Human judgments on historical data (labels)
- Statistical analysis of training data
- Model finds correlations between input data and observed labels
- Bad or Not Bad?
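As a minimal sketch of supervised learning on labeled history, the block below fits a tiny logistic regression by stochastic gradient descent. The two binary features and the six training rows are made-up toy data, and the function names are hypothetical; this is not the production training procedure.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def train_logistic_regression(rows, labels, epochs=500, learning_rate=0.1):
    """Fit weights and bias with plain stochastic gradient descent on log loss."""
    weights = [0.0] * len(rows[0])
    bias = 0.0
    for _ in range(epochs):
        for x, y in zip(rows, labels):
            p = sigmoid(bias + sum(w * xi for w, xi in zip(weights, x)))
            error = p - y  # gradient of log loss with respect to the logit
            weights = [w - learning_rate * error * xi for w, xi in zip(weights, x)]
            bias -= learning_rate * error
    return weights, bias


def predict_bad_probability(weights, bias, x):
    return sigmoid(bias + sum(w * xi for w, xi in zip(weights, x)))


# Toy data (assumed): [account_is_new, multiple_card_updates]; label 1 = "bad".
toy_rows = [[1, 1], [1, 0], [0, 1], [0, 0], [1, 1], [0, 0]]
toy_labels = [1, 0, 0, 0, 1, 0]
toy_weights, toy_bias = train_logistic_regression(toy_rows, toy_labels)
```

Here the model learns that the two signals together correlate with the "bad" label, which is the "finds correlations" step in miniature.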
Real Time!
- Scores are necessary to process orders
- Must include latest events & labels
- Median score latency is under 200ms
How Large is Large?
- 1,000+ websites
- 700 events / second (at peak)
- 350M+ IP addresses
- Roughly $3B of transaction volume analyzed each month
- 1,000+ features
- Millions of fraud patterns
Magic Algorithms
- Naïve Bayes
- Logistic Regression
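To make the first of these concrete, here is a small Naïve Bayes classifier over categorical features with add-one smoothing. The feature names, values, and training examples are invented for illustration, and the helper names are hypothetical.

```python
import math
from collections import Counter, defaultdict


def train_naive_bayes(examples):
    """examples: list of (features_dict, label), label is 'bad' or 'not_bad'."""
    label_counts = Counter()
    value_counts = defaultdict(Counter)  # (label, feature) -> value counts
    vocab = defaultdict(set)             # feature -> values seen in training
    for features, label in examples:
        label_counts[label] += 1
        for name, value in features.items():
            value_counts[(label, name)][value] += 1
            vocab[name].add(value)
    return label_counts, value_counts, vocab


def bad_probability(model, features):
    """Posterior probability of 'bad' under the naive independence assumption."""
    label_counts, value_counts, vocab = model
    total = sum(label_counts.values())
    log_scores = {}
    for label, count in label_counts.items():
        log_p = math.log(count / total)  # class prior
        for name, value in features.items():
            seen = value_counts[(label, name)][value]
            # Add-one smoothing, with one extra slot for never-seen values.
            log_p += math.log((seen + 1) / (count + len(vocab[name]) + 1))
        log_scores[label] = log_p
    top = max(log_scores.values())
    unnormalized = {k: math.exp(v - top) for k, v in log_scores.items()}
    return unnormalized.get("bad", 0.0) / sum(unnormalized.values())


# Toy training set (assumed values, not real data).
toy_examples = [
    ({"card_type": "prepaid", "hour": "night"}, "bad"),
    ({"card_type": "prepaid", "hour": "night"}, "bad"),
    ({"card_type": "credit", "hour": "noon"}, "not_bad"),
    ({"card_type": "credit", "hour": "noon"}, "not_bad"),
    ({"card_type": "credit", "hour": "night"}, "not_bad"),
]
toy_model = train_naive_bayes(toy_examples)
```

Working in log space avoids underflow when many features are multiplied together, which matters once the feature count reaches the thousands.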
Network vs Customer Models
- Customers start on our “Network Model”
- With 20 “bad” labels, they move to a customer-specific model
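The switch-over rule can be written down in a few lines. The threshold of 20 comes from the slide; the function name and the choice of "at least 20" as the exact comparison are assumptions.

```python
BAD_LABEL_THRESHOLD = 20  # from the slide: 20 "bad" labels


def choose_model(bad_label_count):
    """Pick which model scores a customer (assumes 'at least 20' triggers the move)."""
    if bad_label_count >= BAD_LABEL_THRESHOLD:
        return "customer"
    return "network"
```

A new customer with no labels would be scored by the shared network model until enough of its own labels accumulate.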
One User, One Purchase
- IP Address: 203.189.24.290
- Billing Name: Katherine Loh
- Billing Address: San Francisco, CA
- Email Address: katherine@siftscience.com
- Credit Card: 4567xxxxxxxxxxxx
- Item Purchased: Sleeping Bag
- Cost: 50.00 USD
- Authorization Result: Success
One User, Over Time
Event timeline:
- Account created
- Updated credit card info
- Updated settings
- Purchased Item
- Updated credit card info
- Purchased Item
- Purchased Item
Derived signals:
- Account is 4 hours old
- 2 credit card updates in user’s history
- 3 transactions in the last hour
Purchase details:
- IP Address: 203.189.24.290
- Billing Name: Katherine Loh
- Billing Address: San Francisco, CA
- Email Address: katherine@siftscience.com
- Credit Card: 6543xxxxxxxxxxxx
- Item Purchased: Sleeping Bag
- Cost: 50.00 USD
- Authorization Result: Success
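The three derived signals on this slide can be computed directly from a user's event timeline. The sketch below assumes events arrive as `(timestamp, event_type)` tuples; the timestamps and event-type names are invented so that the output matches the slide.

```python
from datetime import datetime, timedelta


def derive_user_features(events, now):
    """events: list of (timestamp, event_type) tuples, in any order."""
    created = min(ts for ts, kind in events if kind == "account_created")
    one_hour_ago = now - timedelta(hours=1)
    return {
        "account_age_hours": (now - created).total_seconds() / 3600,
        "credit_card_updates": sum(
            1 for _, kind in events if kind == "credit_card_updated"
        ),
        "transactions_last_hour": sum(
            1 for ts, kind in events if kind == "purchase" and ts >= one_hour_ago
        ),
    }


# Hypothetical timestamps chosen to reproduce the slide's numbers.
now = datetime(2014, 10, 9, 16, 0)
timeline = [
    (datetime(2014, 10, 9, 12, 0), "account_created"),
    (datetime(2014, 10, 9, 12, 10), "credit_card_updated"),
    (datetime(2014, 10, 9, 12, 20), "settings_updated"),
    (datetime(2014, 10, 9, 15, 10), "purchase"),
    (datetime(2014, 10, 9, 15, 20), "credit_card_updated"),
    (datetime(2014, 10, 9, 15, 30), "purchase"),
    (datetime(2014, 10, 9, 15, 50), "purchase"),
]
user_features = derive_user_features(timeline, now)
```

No single event in this timeline looks unusual; the signal only appears once the history is aggregated.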
One Site, Many Users
- Users on a shared timeline: taylor@siftscience.com, jtan123@gmail.com, beyonce@gmail.com, b.yonce@gmail.com, katherine@siftscience.com
- x = marked bad by site owner (two users are marked)
- Links between users: transacted from same IP, similar email addresses
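Both kinds of link on this slide can be found with standard-library tools. In the sketch below, the IP assignments are invented (the slide does not say which users share an IP), the 0.8 similarity threshold is an arbitrary assumption, and the function names are hypothetical.

```python
from collections import defaultdict
from difflib import SequenceMatcher
from itertools import combinations


def users_sharing_ip(transactions):
    """transactions: iterable of (email, ip). Returns {ip: emails} for shared IPs."""
    emails_by_ip = defaultdict(set)
    for email, ip in transactions:
        emails_by_ip[ip].add(email)
    return {
        ip: sorted(emails) for ip, emails in emails_by_ip.items() if len(emails) > 1
    }


def similar_email_pairs(emails, threshold=0.8):
    """Pairs whose local parts (before '@') have a similarity ratio >= threshold."""
    pairs = []
    for a, b in combinations(sorted(set(emails)), 2):
        ratio = SequenceMatcher(None, a.split("@")[0], b.split("@")[0]).ratio()
        if ratio >= threshold:
            pairs.append((a, b))
    return pairs


# Emails are from the slide; the IP addresses are made up for illustration.
toy_transactions = [
    ("taylor@siftscience.com", "203.0.113.5"),
    ("jtan123@gmail.com", "203.0.113.9"),
    ("beyonce@gmail.com", "203.0.113.9"),
    ("b.yonce@gmail.com", "203.0.113.77"),
    ("katherine@siftscience.com", "203.0.113.12"),
]
shared = users_sharing_ip(toy_transactions)
lookalikes = similar_email_pairs(email for email, _ in toy_transactions)
```

Once one user in a linked group is marked bad, these links give the model a reason to look harder at the others.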
Many Sites, Many Users
- Site 1, Site 2, Site 3
- Link across sites: transacted from same IP
Features
- Event features
- State features
- Temporal features
- Graph features
Event Features
- Properties of user’s most recent event
- Credit card type, billing zip code, shipping type
- Billing address, shipping address, product SKU
State Features
- Properties of user’s current state
- Broad Attributes: Country, time of day, browser type
- Identity Features: IP address, device fingerprint, cookie, name
Temporal Features
- Properties of user’s time series up to that point
- Velocities: Number of purchases in the past hour? IP addresses?
- Sequence Features: Last 5 actions taken? Last few geo locations?
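Velocity and sequence features both reduce to simple operations over a time-ordered event list. This sketch assumes each event is a dict with `time`, `action`, and `ip` keys; those field names and the helper names are hypothetical.

```python
from datetime import datetime, timedelta


def velocity(events, now, window, key=None):
    """Count events in the window ending at `now`, or distinct values of `key`."""
    recent = [e for e in events if now - window <= e["time"] <= now]
    if key is None:
        return len(recent)
    return len({e[key] for e in recent})


def last_actions(events, n=5):
    """The user's n most recent actions, oldest first."""
    ordered = sorted(events, key=lambda e: e["time"])
    return [e["action"] for e in ordered[-n:]]


# Made-up events for illustration.
t0 = datetime(2014, 10, 9, 15, 0)
toy_events = [
    {"time": t0 - timedelta(hours=3), "action": "login", "ip": "203.0.113.5"},
    {"time": t0 - timedelta(minutes=50), "action": "purchase", "ip": "203.0.113.5"},
    {"time": t0 - timedelta(minutes=20), "action": "purchase", "ip": "203.0.113.9"},
    {"time": t0 - timedelta(minutes=5), "action": "purchase", "ip": "203.0.113.9"},
]
purchases_past_hour = velocity(
    [e for e in toy_events if e["action"] == "purchase"], t0, timedelta(hours=1)
)
ips_past_hour = velocity(toy_events, t0, timedelta(hours=1), key="ip")
recent_actions = last_actions(toy_events, n=3)
```

The same `velocity` helper answers both questions on the slide: how many purchases, and how many distinct IP addresses, in the past hour.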
Graph Features
- How the user relates to others on the site and on other sites
- Number of other users using the same shipping address
- Similarity of this user with the seller of the item (for an online marketplace)
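The shipping-address feature is a one-hop count in a user-to-address graph. The sketch below assumes orders arrive as `(user_id, shipping_address)` pairs and uses a deliberately crude address normalizer; both the names and the normalization rule are illustrative assumptions.

```python
from collections import defaultdict


def normalize_address(address):
    """Crude normalization (assumed): lowercase and collapse whitespace."""
    return " ".join(address.lower().split())


def other_users_at_same_address(orders):
    """orders: iterable of (user_id, shipping_address).

    Returns {user_id: number of other users sharing any of that user's addresses}.
    """
    users_by_address = defaultdict(set)
    for user_id, address in orders:
        users_by_address[normalize_address(address)].add(user_id)
    neighbors = defaultdict(set)
    for users in users_by_address.values():
        for user in users:
            neighbors[user] |= users - {user}
    return {user: len(others) for user, others in neighbors.items()}


# Made-up orders for illustration.
toy_orders = [
    ("alice", "1 Main St, Springfield"),
    ("bob", "1 main st,  springfield"),
    ("carol", "1 Main St, Springfield"),
    ("dave", "9 Elm Ave, Shelbyville"),
]
address_neighbors = other_users_at_same_address(toy_orders)
```

A high count is not proof of fraud (think of a shared office or dormitory), which is why it is one feature among many rather than a rule.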
Graph Features
[Figure: two example user graphs, labeled "normal" and "less normal"]
Evaluating Features
Normal Users Eat Lunch
Fraudsters Skip Lunch
Fraudsters Are Night Owls
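One simple way to evaluate a time-of-day feature like the ones above is to compare the fraction of "bad" labels across buckets. The bucket boundaries, the function names, and the records below are all synthetic assumptions; they show the computation only and do not reproduce the talk's data.

```python
from collections import defaultdict


def time_of_day(hour):
    """Map an hour (0-23) to a coarse bucket. Boundaries are assumptions."""
    if 0 <= hour < 6:
        return "night"
    if 12 <= hour < 14:
        return "lunch"
    return "other"


def bad_rate_by_bucket(records, bucket_of):
    """records: iterable of (feature_value, is_bad). Returns {bucket: bad fraction}."""
    totals = defaultdict(int)
    bad = defaultdict(int)
    for value, is_bad in records:
        bucket = bucket_of(value)
        totals[bucket] += 1
        bad[bucket] += int(is_bad)
    return {bucket: bad[bucket] / totals[bucket] for bucket in totals}


# Synthetic (hour_of_day, is_bad) records, for illustration only.
synthetic_records = [
    (2, True), (3, True), (4, False), (5, True),
    (12, False), (13, False), (12, False), (13, True),
    (9, False), (16, False), (19, True), (21, False),
]
rates = bad_rate_by_bucket(synthetic_records, time_of_day)
```

A feature is worth keeping when its buckets show clearly different bad rates on real labeled data.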
Fraudsters Don Multiple Identities
Lessons Learned
- Keep customers happy
Happy Customers?
[Diagram: accurate scores, great support, easy-to-use product, and ??? all feed into customer happiness]
Lessons Learned
- Keep customers happy
- Results must be understandable
- Humans expect stability and speed
- External knowledge changes over time
Data Changes Over Time
- User labels
- Exchange rates
- IP/Geo data
- New features
- New models
Lessons Learned
- Keep customers happy
- Results must be understandable
- Humans expect stability and speed
- External knowledge changes over time
- Noise is everywhere
Noise is EVERYWHERE
- Wrong labels
- Duplicate labels
- Bad integrations
- Incomplete integrations
- Missing fields
- Bugs
- System downtime
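A defensive cleaning pass handles a few of these noise sources before labels reach training. The sketch below drops labels with missing fields and collapses duplicates or conflicts to the most recent label per user; the "latest label wins" policy, the field names, and the function name are assumptions rather than the talk's stated approach.

```python
def clean_labels(raw_labels):
    """raw_labels: list of dicts with 'user_id', 'is_bad', and 'time' keys.

    Returns {user_id: is_bad}, keeping the most recent complete label per user.
    """
    latest = {}
    for label in raw_labels:
        # Skip records from incomplete integrations (missing fields).
        if any(label.get(k) is None for k in ("user_id", "is_bad", "time")):
            continue
        current = latest.get(label["user_id"])
        # Assumed policy: a later label overrides an earlier one.
        if current is None or label["time"] >= current["time"]:
            latest[label["user_id"]] = label
    return {user_id: label["is_bad"] for user_id, label in latest.items()}


# Made-up labels showing a duplicate, a correction, and a missing field.
toy_labels_raw = [
    {"user_id": "u1", "is_bad": True, "time": 1},
    {"user_id": "u1", "is_bad": True, "time": 2},    # duplicate label
    {"user_id": "u2", "is_bad": True, "time": 1},
    {"user_id": "u2", "is_bad": False, "time": 5},   # later correction
    {"user_id": None, "is_bad": True, "time": 3},    # missing user_id
    {"user_id": "u3", "time": 4},                    # missing is_bad
]
cleaned = clean_labels(toy_labels_raw)
```

Wrong labels that are never corrected cannot be caught this way; those have to be tolerated by the model itself.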
Questions? katherine@siftscience.com