David Choffnes, Northeastern University Jingjing Ren, Northeastern University Ashwin Rao, University of Helsinki Martina Lindorfer, Vienna Univ. of Technology.


1 ReCon: Revealing and Controlling PII Leaks in Mobile Network Systems
David Choffnes, Northeastern University; Jingjing Ren, Northeastern University; Ashwin Rao, University of Helsinki; Martina Lindorfer, Vienna Univ. of Technology; Arnaud Legout, INRIA Sophia-Antipolis
DTL Workshop, Nov. 2015
Sponsored by:

2 Motivation
- Mobile devices
  - Rich sensors
  - Ubiquitous connectivity
- Key questions
  - What personal information is transmitted?
  - To whom does it go?
  - What can average users do about it?

3 How Frequently Is PII Leaked?
- Basic tracking is common
- Significant fraction of very personal information leaked across all platforms
- PII leakage is pervasive!
[Figure: Fraction of top 100 apps leaking PII (tested in September 2015)]

4 How to Detect PII Leaks in Mobile?
- At the OS
  - Information flow analysis (static/dynamic/hybrid)
  - OK solutions, but not perfect or easily deployable
- In the network
  - Independent of OS, app store
  - Easy to detect if you know what PII to search for
What if you don’t know the PII a priori?
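The easy case on this slide, searching network traffic for PII you already know, can be sketched as below. This is a minimal illustration, not ReCon's code: the function name `find_known_pii`, the sample flow, and the sample PII values are all hypothetical.

```python
from urllib.parse import unquote_plus


def find_known_pii(flow_text, known_pii):
    """Return the labels of known PII values that appear in a flow.

    flow_text: the plaintext request (hypothetical example input).
    known_pii: mapping of label -> value the user has told us about.
    """
    # Undo URL encoding so "jane%40example.com" matches "jane@example.com".
    decoded = unquote_plus(flow_text).lower()
    return sorted(
        label for label, value in known_pii.items() if value.lower() in decoded
    )


# Hypothetical PII values and flow, for illustration only.
known_pii = {"email": "jane@example.com", "device_id": "ABCDEF123456"}
flow = "GET /track?e=jane%40example.com&os=android HTTP/1.1"
print(find_known_pii(flow, known_pii))
```

This only works when the PII values are known ahead of time, which is the limitation the slide's closing question points at.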

5 ReCon: Automatically Identifying PII Leaks
- Hypothesis: PII leaks have distinguishing characteristics
  - Is it just simple key/value pairs (e.g., “user=R3C0N”)? Nope, this leads to high FP/FN rates
  - Need to learn the structure of PII leaks
- Approach: Build ML classifiers to reliably detect leaks
  - Does not require knowing PII in advance
  - Resilient to changes in PII leak formats over time
- We built ReCon
  - Machine learning to reveal PII leaks from mobile devices
  - Software middleboxes to intercept and control leaks
  - Works on all major platforms (iOS, Android, Windows Phone)
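To see why a simple key/value rule produces both false positives and false negatives, consider the sketch below. The key list and the two example queries are hypothetical and chosen only to show the two failure modes; they are not taken from the study.

```python
from urllib.parse import parse_qsl

# Hypothetical list of key names a naive rule might treat as PII.
SUSPICIOUS_KEYS = {"user", "email", "imei"}


def key_rule_flags(query):
    """Flag a query string if any key name is on the suspicious list."""
    return any(key.lower() in SUSPICIOUS_KEYS for key, _ in parse_qsl(query))


# False positive: the key looks like PII, the value is not personal.
print(key_rule_flags("user=guest&page=2"))

# False negative: a real email address hides behind an unremarkable key.
print(key_rule_flags("u=jane%40example.com"))
```

A rule keyed only on names ignores the value and the surrounding context, which is the structure the slide says a classifier needs to learn.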

6 ReCon: Viewing detected leaks
- PII Category: Device Identifiers, Contact Information, User Identifiers, Credentials
- User Feedback: Correct, Incorrect, Not sure, Not about me

7 Where They Know You’ve Been
- Location information is hard to digest using text alone
- Where They Know You’ve Been (WTKYB) shows just how pervasive location tracking is
- Creepiness factor to help users care more about privacy(?)

8 Mitigating PII Leaks
- ReCon gives users control over leaks
- Example simple strategies
  - Block PII
  - Modify PII
  - Randomize identifiers
  - Coarsen locations
- Advanced mitigation (under dev)
  - Mock user profiles
  - Provide k-anonymity
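The simple strategies on this slide (block, randomize identifiers, coarsen locations) could look roughly like the following when applied to a query string. This is a sketch under assumptions: the function names, the `actions` mapping, and the one-decimal coarsening level are illustrative choices, not ReCon's actual rewriting logic.

```python
import uuid
from urllib.parse import parse_qsl, urlencode


def coarsen_location(lat, lon, decimals=1):
    """Round coordinates to fewer decimals (assumed granularity: 1 decimal)."""
    return round(lat, decimals), round(lon, decimals)


def mitigate_query(query, actions):
    """Rewrite a query string.

    actions maps a key name to "block" (drop the pair) or
    "randomize" (replace the value with a fresh random identifier).
    Keys without an action pass through unchanged.
    """
    rewritten = []
    for key, value in parse_qsl(query, keep_blank_values=True):
        action = actions.get(key)
        if action == "block":
            continue
        if action == "randomize":
            value = uuid.uuid4().hex
        rewritten.append((key, value))
    return urlencode(rewritten)


# Hypothetical request parameters, for illustration only.
print(mitigate_query("id=XYZ-device-1&email=a%40b.c&v=2",
                     {"id": "randomize", "email": "block"}))
print(coarsen_location(42.3398, -71.0892))
```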

9 How does ReCon work?
- Key challenges for ML-based PII detection
- Which classifier do we use?
  - C4.5 Decision Tree is best trade-off between speed and accuracy
- How do we train the classifier?
  - Use traces from real users and controlled experiments
  - Break flows into separate words that may indicate a leak
  - Feature selection for scalability
- How well are we doing?
  - Controlled experiments
  - In the wild: Only the users themselves know for sure!
  - Crowdsourced reinforcement
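C4.5 picks each split by gain ratio (information gain divided by the split's own entropy). The sketch below computes it for a binary "word is present in the flow" feature, which is how a tree built on bag-of-words features would decide which word to test first. The tiny training set is invented for illustration, and this is not the Weka implementation the talk used.

```python
import math


def entropy(labels):
    """Binary entropy (in bits) of a list of 0/1 labels."""
    if not labels:
        return 0.0
    p = sum(labels) / len(labels)
    return -sum(q * math.log2(q) for q in (p, 1 - p) if q > 0)


def gain_ratio(flows, labels, word):
    """Gain ratio of splitting on whether `word` occurs in each flow."""
    has = [y for words, y in zip(flows, labels) if word in words]
    lacks = [y for words, y in zip(flows, labels) if word not in words]
    n = len(labels)
    conditional = len(has) / n * entropy(has) + len(lacks) / n * entropy(lacks)
    gain = entropy(labels) - conditional
    # Split information: entropy of the partition sizes themselves.
    split_info = entropy([1] * len(has) + [0] * len(lacks))
    return gain / split_info if split_info > 0 else 0.0


# Hypothetical flows as word sets; label 1 = contains a PII leak.
flows = [{"email", "x"}, {"email", "y"}, {"x"}, {"y"}]
labels = [1, 1, 0, 0]
vocabulary = sorted(set().union(*flows))
best = max(vocabulary, key=lambda w: gain_ratio(flows, labels, w))
print(best)
```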

10 Key Results: ReCon accuracy
- How accurate is ReCon?
  - 99% overall accuracy from controlled experiments
  - FPR: 2.2%, FNR: 3.5%
- Why?
  - Per-domain classifiers
  - Decision tree captures non-trivial cases
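For reference, the three metrics quoted on this slide follow the standard confusion-matrix definitions sketched below. The counts in the example are made up to show the arithmetic and are not the study's data.

```python
def rates(tp, fp, tn, fn):
    """Return (accuracy, false positive rate, false negative rate).

    tp/fn count flows that really leak PII; fp/tn count flows that do not.
    """
    accuracy = (tp + tn) / (tp + fp + tn + fn)
    fpr = fp / (fp + tn)  # share of clean flows wrongly flagged
    fnr = fn / (fn + tp)  # share of leaking flows that were missed
    return accuracy, fpr, fnr


# Hypothetical counts, for illustration only.
accuracy, fpr, fnr = rates(tp=90, fp=5, tn=95, fn=10)
print(f"accuracy={accuracy:.3f} FPR={fpr:.3f} FNR={fnr:.3f}")
```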

11 Key Results: ReCon Has Good Coverage
- How does it compare to other solutions?
  - ReCon finds significantly more PII than information flow analysis (IFA) solutions
  - ReCon successfully identifies missing leaks after retraining
[Figure: Fraction of total leaks found]

12 Key Results: User study
- IRB-approved user study
  - 24 iOS, 13 Android devices
  - 20/26 responses: system useful & behavior change
  - 165 cases of credential leaks, 94 verified
  - Average leaks: iOS > Android
- Unexpected, suspicious leaks
  - Recipe/cooking app tracks location
  - Video/Game/News app leaks gender
- And more… Check out http://recon.meddle.mobi

13 Summary
- ReCon: Provides transparency/control over PII leaks
  - Relies only on access to network traffic (OS independent)
  - Machine learning to automatically identify PII leaks
  - Crowdsourced reinforcement with user feedback
- Works today! Check out http://recon.meddle.mobi
Sponsor:
Questions? David Choffnes, choffnes@ccs.neu.edu

14 Backups

15 Encryption and ReCon
- What is your answer for increasing use of encryption?
- ReCon needs access only to plaintext flows
  - mcTLS, BlindBox
  - Route to trusted middlebox that can do MITM
    - Works for most apps, but usually not logins
  - Haystack (on Android device)

16 Encryption: What is leaked?
- Leaks over SSL (not much)
  - Send PII to trackers over SSL (100 apps/device): 6 iOS, 2 Android, 1 Windows
- Problem with SSL
  - Certificate pinning
  - Not working with VPN enabled
- Obfuscation
  - Little evidence in controlled experiment using IFA

17 Other applications of ReCon
- K-anonymity
- Explicit sharing
  - Allow users to control how much is shared with third parties
- Obfuscation
  - Retrain classifiers to identify obfuscated leaks
  - Use static/dynamic analysis tools that are resilient to evasion techniques

18 Deployment models
- ReCon only needs access to network flows
- VPN proxy (current deployment): tunnel to proxy server
  - Currently supported by all mobile OSes
  - Can run VMs anywhere in the world
- Raspberry Pi
  - In home network
  - Enables HTTPS decryption with minimal additional risks
- On device
  - Haystack on Android
- In network
  - Awazza and other APN/middlebox deployment models

19 Methodology Details
- Controlled experiments as ground truth
- Text classification approaches
  - Problem: Given a network flow, does it contain PII or not?
  - Feature extraction: bag-of-words model
    - Example: Example.com /someevent?x=1&y=2 {"z":"xx@y"}
    - Words: someevent, x, 1, y, 2, z, xx@y
- Per-domain classifiers (e.g., Google-Analytics)
  - Faster (compared to one-for-all)
  - More accurate
- Library: Weka
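The bag-of-words step on this slide can be reproduced with a small tokenizer. The delimiter set below is an assumption chosen so that the slide's example yields the slide's word list; the talk does not state the exact delimiters. The `group_by_domain` helper is likewise a hypothetical illustration of how training data might be partitioned for per-domain classifiers.

```python
import re
from collections import defaultdict

# Assumed delimiters: URL and JSON punctuation plus whitespace.
_DELIMS = re.compile(r'[/?&=:{}",\s]+')


def flow_words(path_and_query, body=""):
    """Split a request path/query and body into bag-of-words tokens."""
    text = f"{path_and_query} {body}"
    return [word for word in _DELIMS.split(text) if word]


def group_by_domain(flows):
    """Collect word sets per destination domain (one classifier per domain).

    flows: iterable of (domain, path_and_query, body) tuples.
    """
    grouped = defaultdict(list)
    for domain, path_and_query, body in flows:
        grouped[domain.lower()].append(set(flow_words(path_and_query, body)))
    return dict(grouped)


# The example from the slide.
print(flow_words("/someevent?x=1&y=2", '{"z":"xx@y"}'))
# → ['someevent', 'x', '1', 'y', '2', 'z', 'xx@y']
```

Each per-domain list of word sets would then be turned into binary feature vectors and handed to the classifier for that domain.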

20 Why Run ReCon?
- User incentives
  - Control over data leaks!
  - Blocking unwanted content
  - k-anonymity for increased privacy

