NetworkProfiler: Towards Automatic Fingerprinting of Android Apps Shuaifu Dai, Alok Tongaonkar, Xiaoyin Wang, Antonio Nucci, and Dawn Song Presented by:

1 NetworkProfiler: Towards Automatic Fingerprinting of Android Apps Shuaifu Dai, Alok Tongaonkar, Xiaoyin Wang, Antonio Nucci, and Dawn Song Presented by: Junaed Bin Halim

2 Outline Goal Motivation System Overview Evaluation Limitations Related Work Conclusion/Question

3 Goal What? – Develop a systematic tool Automatically generate network profiles In HTTP traffic – To Identify Apps How? – By detecting fingerprints / signatures

4 Motivation Why do we need to identify applications? – To classify traffic generated by the applications for better network management. – Operators gain clear visibility into their network Better security: Intrusion detection Better throughput: Prioritize real-time video over downloads etc. What is traffic classification? – Categorize network traffic according to various parameters, e.g., port number or protocol

5 Motivation (contd.) Why only Android apps? – Smartphone usage is increasing 488m smartphones vs 415m PCs in 2011 – Users install applications (apps) on their smartphones (avg 26 ~ 41) Most applications generate network traffic – Researchers prefer Android over iOS (openness, availability of tools etc.) Why HTTP traffic? – >80% of smartphone traffic is HTTP.

6 Observation An app can have many different network behaviors Important to cover as many network behaviors as possible Key Idea: Identify the invariant parts of the flows belonging to an app

7 Network Profiler System Overview

8 Fingerprint Extractor : Parser Each HTTP request is composed of 3 parts – m: method – p: page pc: page component fn: file name – q: query k-v: key-value pair
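The decomposition on this slide can be sketched with Python's standard library. This is only an illustration, not the paper's parser: the function name `parse_request`, the returned field names, and the rule "the last path segment is a file name if it contains a dot" are all assumptions made here.

```python
from urllib.parse import urlsplit, parse_qsl

def parse_request(request_line):
    """Split an HTTP request line into method, page parts, and query pairs."""
    method, target, _version = request_line.split(" ", 2)
    parts = urlsplit(target)
    segments = [s for s in parts.path.split("/") if s]
    # Assumption: a trailing segment with a dot is the file name (fn);
    # everything before it is a page component (pc).
    if segments and "." in segments[-1]:
        filename, components = segments[-1], segments[:-1]
    else:
        filename, components = "", segments
    query = dict(parse_qsl(parts.query, keep_blank_values=True))
    return {
        "method": method,               # m
        "page_components": components,  # pc
        "filename": filename,           # fn
        "query": query,                 # k-v pairs
    }

parsed = parse_request("GET /ads/v2/fetch.php?app_id=123&fmt=json HTTP/1.1")
```

Here `parsed["query"]` holds the key-value pairs that the later fingerprinting steps inspect.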

9 Fingerprint Extractor: Clusterer Uses agglomerative clustering to group HTTP requests by similarities. How to find similarity? – Use Jaccard index as a measure of similarity
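For reference, the Jaccard index of two sets is the size of their intersection divided by the size of their union. A minimal sketch follows; treating two empty sets as identical is a convention chosen here, not something the slides specify.

```python
def jaccard(a, b):
    """Jaccard index |A ∩ B| / |A ∪ B| of two collections, in [0, 1]."""
    a, b = set(a), set(b)
    if not a and not b:
        return 1.0  # assumption: two empty sets count as identical
    return len(a & b) / len(a | b)

# Two pages sharing one of three distinct path components.
similarity = jaccard({"ads", "v2"}, {"ads", "v3"})
```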

10 Fingerprint Extractor: Clusterer (2) Cluster – Distance between pages, d_p: 1 - similarity – Distance between queries, d_q: 1 - similarity – Distance between headers: (d_p + d_q)/2 – Same cluster if the distance is within the threshold [threshold = 0.6] – Merge cluster A and B if cluster C is similar to both.
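A rough sketch of this clustering step is given below, under several assumptions: requests are represented as dicts with a set of page components and a set of query keys, two requests belong together when their averaged distance is at most 0.6, and clusters are merged single-linkage style (any sufficiently close pair joins two clusters). The slide does not spell out the linkage rule or the direction of the threshold comparison, so both are guesses.

```python
THRESHOLD = 0.6  # value from the slide; the "<=" comparison is an assumption

def jaccard(a, b):
    a, b = set(a), set(b)
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)

def request_distance(r1, r2):
    """Average of the page distance and the query distance."""
    d_page = 1 - jaccard(r1["page"], r2["page"])
    d_query = 1 - jaccard(r1["query_keys"], r2["query_keys"])
    return (d_page + d_query) / 2

def cluster(requests, threshold=THRESHOLD):
    """Agglomerative clustering: start with singletons, merge until stable."""
    clusters = [[r] for r in requests]
    merged = True
    while merged:
        merged = False
        for i in range(len(clusters)):
            for j in range(i + 1, len(clusters)):
                close = any(
                    request_distance(a, b) <= threshold
                    for a in clusters[i]
                    for b in clusters[j]
                )
                if close:
                    clusters[i].extend(clusters[j])
                    del clusters[j]
                    merged = True
                    break
            if merged:
                break
    return clusters

r1 = {"page": {"ads", "fetch.php"}, "query_keys": {"app_id", "fmt"}}
r2 = {"page": {"ads", "fetch.php"}, "query_keys": {"app_id"}}
r3 = {"page": {"video", "stream"}, "query_keys": {"v"}}
groups = cluster([r1, r2, r3])
```

With these inputs `r1` and `r2` are 0.25 apart and end up together, while `r3` is at distance 1.0 from both and stays alone.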

11 Fingerprint Extractor: Generation Build state machine for each cluster Merge state machines that contain the same hosts
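The slide gives no detail on how the state machines are built, so the following is only one possible reading: each cluster's requests are turned into token sequences, a prefix tree (trie) serves as a simple deterministic state machine accepting those sequences, and machines that share a host are merged. All names, the token representation, and the `"$"` accept marker are hypothetical.

```python
ACCEPT = "$"  # hypothetical marker; assumes no real token equals "$"

def build_machine(token_sequences):
    """Build a trie of nested dicts that accepts each token sequence."""
    root = {}
    for seq in token_sequences:
        node = root
        for tok in seq:
            node = node.setdefault(tok, {})
        node[ACCEPT] = {}
    return root

def merge_machines(a, b):
    """Return a new trie accepting everything that a or b accepts."""
    out = {k: merge_machines(v, {}) for k, v in a.items()}
    for k, v in b.items():
        out[k] = merge_machines(out.get(k, {}), v)
    return out

def accepts(machine, seq):
    node = machine
    for tok in seq:
        if tok not in node:
            return False
        node = node[tok]
    return ACCEPT in node

def merge_by_host(clusters):
    """clusters: iterable of (host, token_sequences) pairs."""
    by_host = {}
    for host, seqs in clusters:
        by_host[host] = merge_machines(by_host.get(host, {}), build_machine(seqs))
    return by_host

machines = merge_by_host([
    ("ads.example.com", [["GET", "ads", "fetch.php"]]),
    ("ads.example.com", [["GET", "ads", "click.php"]]),
    ("api.example.org", [["POST", "login"]]),
])
```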

12 Fingerprint Extractor: Generation(2) Query-values: – Some have the app name embedded Extract keywords from manifest file – Any unique keyword is sufficient Third-party traffic: – Presence of app_id or key
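The keyword idea can be illustrated as follows. The sketch assumes the manifest has already been decoded and that the package name and app label are the keyword sources; the stop-word list, the length cutoff, and both function names are hypothetical. It does not check that a keyword is unique across apps, which the slide requires and which would need the keyword sets of all other apps.

```python
# Hypothetical stop-words that appear in many package names.
GENERIC = {"com", "org", "net", "android", "app", "www"}

def manifest_keywords(package_name, app_label):
    """Collect candidate keywords from a package name and an app label."""
    words = set(package_name.lower().split("."))
    words.update(app_label.lower().split())
    return {w for w in words if w not in GENERIC and len(w) > 2}

def matching_values(query, keywords):
    """Return the query pairs whose value contains any keyword."""
    return {
        k: v for k, v in query.items()
        if any(w in v.lower() for w in keywords)
    }

keywords = manifest_keywords("com.example.weatherpro", "Weather Pro")
hits = matching_values({"app": "WeatherPro-1.2", "fmt": "json"}, keywords)
```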

13 Droid Driver Executes Android apps and collects the network traces Consists of two components – Random Tester For traffic to the app provider or a third party – Directed Tester For traffic to a CDN or others Runs either component for an app

14 Droid Driver: Random Tester Runs the app randomly – Application events are generated at random – For applications that generate Traffic to the app server Third-party traffic – Admob, Google DoubleClick – Omniture, Google Analytics Efficient
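The slides do not say how the random events are produced. One widely available way to generate random UI events on Android is the `monkey` tool shipped with the platform; the sketch below only builds such a command line and is not claimed to be what the Random Tester does. The helper name and default values are hypothetical.

```python
def monkey_command(package, events=500, seed=None, throttle_ms=200):
    """Build an `adb shell monkey` command that sends random UI events."""
    cmd = ["adb", "shell", "monkey", "-p", package,
           "--throttle", str(throttle_ms)]
    if seed is not None:
        cmd += ["-s", str(seed)]  # fixed seed makes the event stream repeatable
    cmd += ["-v", str(events)]
    return cmd

cmd = monkey_command("com.example.app", events=100, seed=7)
```

Passing the list to `subprocess.run(cmd, check=True)` would execute it on a connected device or emulator while a packet capture records the resulting traffic.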

15 Droid Driver: Directed Tester Not every app has a unique ID in its traffic – In some cases, the unique ID is a developer ID (Angry Birds, ESPN) Directed Tester – Consists of 3 components Path Recorder Heuristic Path Generator Path Replayer

16 Droid Driver: Directed Tester(2) Path Recorder – Records user events in an emulator Heuristic Path Generator – Generates unexplored paths Path Replayer – Forces the app to execute a given path – Captures the network trace
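As a rough illustration of generating unexplored paths, the sketch below assumes the app's screens and transitions are available as an adjacency dict and that recorded paths are lists of screen names. It enumerates cycle-free paths from the entry screen that were not recorded. The graph representation, the length bound, and the function name are assumptions; the actual heuristics are not described on the slide.

```python
def unexplored_paths(graph, start, explored, max_len=4):
    """List cycle-free paths from `start` that are not in `explored`."""
    seen = {tuple(p) for p in explored}
    found = []

    def walk(path):
        if len(path) > 1 and tuple(path) not in seen:
            found.append(list(path))
        if len(path) == max_len:
            return
        for nxt in graph.get(path[-1], []):
            if nxt not in path:  # skip transitions that would revisit a screen
                walk(path + [nxt])

    walk([start])
    return found

graph = {
    "Main": ["List", "Settings"],
    "List": ["Detail"],
    "Detail": [],
    "Settings": [],
}
recorded = [["Main", "List"]]
todo = unexplored_paths(graph, "Main", recorded)
```

Each path in `todo` would then be handed to a replayer that drives the app along it and captures the traffic.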

17 Evaluation Downloaded 90k apps 70k use the internet For 2 different types of traffic – Ad Traffic Identified ad libraries from the manifest files of 32k apps – 25k use 1 ad library – 4k, 1k, 600, and 400 apps use 2, 3, 4, and 5 ad libraries – Fewer than 300 use more than 5 – Non-Ad Traffic Considered 6 apps only – YouTube, Flixster, ESPN ScoreCenter, CNET News, Pandora, and Zedge

18 Evaluation: Ad Traffic


20 Evaluation: Non-Ad Traffic Manually generated seed action paths. Used Directed Testing to generate traffic. All ad traffic was excluded. Remaining traffic was annotated with the name of the app.

21 Results All applications were successfully identified in their experiment – For which a network profile was generated – Not all were verified

22 Limitations Only identifies apps that generate network traffic – Most applications do these days Only works for HTTP traffic – Does not work for HTTPS – Does not work for apps that use proprietary protocols (Skype etc.) Uses supervised learning – Applications must be known prior to classification. – Needs new signatures if the app developer changes the HTTP request structure

23 Related Work Several works tried to classify traffic – Packet inspection Port based – Historically many applications utilize “well-known” ports – Classifier looks only at the port in TCP SYN packets – Not all applications have registered a port with IANA Payload based – Payload is visible, known to the classifier – Does not work if payload is obfuscated/encrypted – Packet inspection is computationally expensive
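Port-based classification amounts to a table lookup on the destination port, as the minimal sketch below shows. The table holds a handful of IANA-registered ports chosen for illustration, and the function name is hypothetical.

```python
# A few IANA-registered ports; a real classifier would use the full registry.
WELL_KNOWN_PORTS = {22: "ssh", 25: "smtp", 53: "dns", 80: "http", 443: "https"}

def classify_by_port(dst_port):
    """Label a flow using only its destination port."""
    return WELL_KNOWN_PORTS.get(dst_port, "unknown")

labels = [classify_by_port(p) for p in (80, 443, 5228)]
```

The lookup also shows why this approach cannot tell apps apart: two different Android apps that both talk HTTP on port 80 receive the same label.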

24 Related Work (contd.) Classification based on statistical traffic properties – empirical models of connection characteristics, such as bytes, duration, arrival periodicity – flow duration, packet inter-arrival time and packet size and byte profile – distributions of packet lengths and packet inter-arrival times – Etc.
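The kind of per-flow statistics listed here can be computed from packet timestamps and sizes alone. The sketch below assumes a flow is a non-empty, time-sorted list of `(timestamp_seconds, size_bytes)` tuples; the feature names are chosen for illustration.

```python
from statistics import mean

def flow_features(packets):
    """Summarize one flow; `packets` must be non-empty and sorted by time."""
    times = [t for t, _ in packets]
    sizes = [s for _, s in packets]
    gaps = [b - a for a, b in zip(times, times[1:])]  # inter-arrival times
    return {
        "packets": len(packets),
        "bytes": sum(sizes),
        "duration": times[-1] - times[0],
        "mean_size": mean(sizes),
        "mean_gap": mean(gaps) if gaps else 0.0,
    }

features = flow_features([(0.0, 100), (0.5, 300), (2.0, 200)])
```

A feature vector like this is what a statistical or machine-learning classifier would take as input, without ever looking at the payload.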

25 Related Work (contd.) Machine Learning – Based on statistical properties of the traffic – Supervised Learning (Classification) – Unsupervised Learning (Clustering) Different works use different ML algorithms See: “A Survey of Techniques for Internet Traffic Classification using Machine Learning” - Thuy T.T. Nguyen, Grenville Armitage

26 Related Work: Examples Discoverer: 2007 – Automatically reverse engineers the protocol message formats of an application from its network trace Application session: group of messages Message format specification: sequence of fields Common field semantics: length, offset, pointer, cookie, endpoint-address etc. – Discoverer derives the message format specification using clustering

27 Related Work: Examples EarlyBird: 2004, Polygraph: 2005, Hamsa : 2006 – Detects previously unknown worms and viruses – Generates signatures of worms by identifying common byte flows in the network traffic

28 Related Work (contd.) Intrusion detection – 2 approaches Signature based Anomaly based This paper uses signature based application classification Anomaly-based detection – Monitors system activity to classify

29 Anomaly Based Detection Triggers alarm when some type of unusual behavior occurs on the network. – Anything that deviates from “normal” is unusual Heuristic based Example: – Protocol anomaly: HTTP traffic on a non-standard port – Application anomaly: A segment of binary code in a user password. – Statistical anomaly: Too much UDP compared to TCP traffic.
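Two of the example anomalies can be expressed as simple rules. In the sketch below, a flow is a dict with hypothetical fields `app_protocol`, `dst_port`, `transport`, and `bytes`; treating port 80 as the only standard HTTP port and 50% as the UDP share limit are arbitrary illustrative choices.

```python
HTTP_PORTS = {80}  # assumption: only port 80 counts as standard for HTTP

def protocol_anomaly(flow):
    """HTTP traffic seen on a non-standard port."""
    return flow["app_protocol"] == "http" and flow["dst_port"] not in HTTP_PORTS

def statistical_anomaly(flows, max_udp_share=0.5):
    """UDP accounts for more than `max_udp_share` of all observed bytes."""
    total = sum(f["bytes"] for f in flows)
    if total == 0:
        return False
    udp = sum(f["bytes"] for f in flows if f["transport"] == "udp")
    return udp / total > max_udp_share

flows = [
    {"app_protocol": "http", "dst_port": 8443, "transport": "tcp", "bytes": 1000},
    {"app_protocol": "dns", "dst_port": 53, "transport": "udp", "bytes": 9000},
]
alerts = (protocol_anomaly(flows[0]), statistical_anomaly(flows))
```

Fixed limits like these are exactly where false alarms come from: what counts as "normal" differs from network to network.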

30 Signature Based vs Anomaly Based Signature Based: – Strength: Precise if signatures are correctly generated – Weakness: Requires prior knowledge about the signatures Anomaly Based: – Strength: Has the potential to detect new or unknown attacks – Weakness: Often results in false alarms due to the difficulty in modeling the “norm”

31 Related Work: Application Profiling ProfileDroid: 2012 – Profiles applications at 4 layers: Static layer, User layer, Operating system layer, and Network layer Network layer metrics: Traffic intensity, Origin of traffic, CDN + Cloud traffic, Google traffic, Third-party traffic, Incoming vs outgoing traffic, Number of distinct traffic sources, Ratio of HTTP to HTTPS traffic Relies completely on users running apps to generate traffic

32 Problems with existing works Not Scalable Requires user’s involvement / not automatic Coupled with the underlying TCP/Application layer protocol

33 Inter-component control flow graph Used to specify control flow in Android applications Model components: – Activity – Service – Broadcast receivers External Signals: User Events Internal Signals: Generated by method calls See: http://danious.files.wordpress.com/2013/05/dominguezthesis2.pdf

34 Inter-component control flow graph (contd.)


36 Why this paper in CSCE 715? Network operators can provide better security for their network – Block malicious traffic – Apply traffic engineering Is that all?

37 The smartphone apps you use reveal your personality – Cornell University Study, 2011 Appthusiasts Appcentrics Live Wires Creators Connectors Apprentices – App market research firm Flurry Analytics also confirms this http://www.news.cornell.edu/stories/2011/02/trevor-pinch-links-app-usage-personality-types http://sachendra.wordpress.com/2011/05/11/the-smartphone-apps-you-use-reveal-your-personality/ http://wallstcheatsheet.com/stocks/can-your-apple-device-app-usage-reveal-your-personality.html/

38 Conclusion NetworkProfiler can identify applications with high precision – Uses network trace generated by the apps – Needs to know the patterns of generated traffic beforehand – Works only for known applications DirectedTesting can automate traffic generation from all paths of an application

39 Questions?

