Siegfried Rasthofer, Steven Arzt, Marc Miltenberger, Eric Bodden

Harvesting Runtime Values in Android Applications That Feature Anti-Analysis Techniques
Siegfried Rasthofer, Steven Arzt, Marc Miltenberger, Eric Bodden Presented by Moana Stirling

Distinguishing malware from benign software
The Google Play store contains over 2.8 million apps. 99% of mobile malware is on Android. How do you decide what is safe and what isn’t? Runtime values can often reveal malware, e.g. a blacklisted phone number that a message is sent to. However, many malware samples attempt to obfuscate these values.

To refresh some information covered in this course: the Google Play store contains over 2.8 million apps, and 99% of all mobile malware targets Android. Distinguishing which of these apps are malicious and which are benign is an important task, and doing so automatically and at scale is a major part of the challenge. For example, Google Bouncer, the tool intended to identify malware submitted to the Play store, cannot identify some sophisticated malware; better automatic detection tools could help here. So how can we begin to determine which apps are safe and which are malicious? Runtime values often reveal malicious intent. For example, if an app sends a message to a phone number from a known list of premium-rate (charged) numbers used by malware, the appearance of that number indicates that the app is malware. Other security-relevant runtime values include the targets of reflective method calls and the URLs that data is sent to. However, these values are often difficult to extract even from benign software, and several notorious malware families make this harder by storing the values encrypted and decrypting them only at runtime.

Example Situation
Here’s a simplified example situation from the article. The main purpose of this code is to send an SMS message to a phone number, which happens in lines 6-7. However, this purpose is obfuscated: in lines 6-7 the class and method that send the message are called via reflection, and in lines 4-5 the actual class and method names are stored as encrypted strings.
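The slide’s code itself is not reproduced in this transcript. As a hedged illustration of the same pattern, the following Java sketch hides a reflective SMS call behind encrypted class and method names. Everything in it is hypothetical: FakeSmsManager is a simplified static stand-in for android.telephony.SmsManager (whose real sendTextMessage takes five arguments), and the shift-by-one cipher and phone number are invented for illustration.

```java
import java.lang.reflect.Method;
import java.util.ArrayList;
import java.util.List;

// Hypothetical, simplified stand-in for android.telephony.SmsManager
// so the sketch runs on a plain JVM.
class FakeSmsManager {
    static final List<String> SENT = new ArrayList<>();

    public static void sendTextMessage(String number, String text) {
        SENT.add(number + ":" + text);
    }
}

class ObfuscatedSender {
    // Encrypted literals: a scan of the bytecode sees only these opaque strings,
    // never "FakeSmsManager" or "sendTextMessage".
    static final String ENC_CLASS = "GblfTntNbobhfs";
    static final String ENC_METHOD = "tfoeUfyuNfttbhf";

    // Toy cipher: every character was shifted up by one, so decryption shifts it back.
    static String decrypt(String s) {
        StringBuilder sb = new StringBuilder(s.length());
        for (char c : s.toCharArray()) {
            sb.append((char) (c - 1));
        }
        return sb.toString();
    }

    static void send(String number, String text) throws Exception {
        // Class and method names only exist after runtime decryption.
        Class<?> clazz = Class.forName(decrypt(ENC_CLASS));
        Method method = clazz.getMethod(decrypt(ENC_METHOD), String.class, String.class);
        // Reflective call: a static call graph has no edge to sendTextMessage here.
        method.invoke(null, number, text);
    }

    public static void main(String[] args) throws Exception {
        send("+1-555-0100", "hello");
        System.out.println(FakeSmsManager.SENT);
    }
}
```

A static tool that only resolves constant reflection targets sees `Class.forName(decrypt(...))` and cannot tell which method is called; the name only becomes visible once `decrypt` actually runs.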

Current Approaches
Manual deduction – Analyst must understand and reconstruct the decrypt method; time-consuming and tedious. Static Analysis – Many current approaches do not support reflection or only support constant target strings. Other approaches are often inadequate in other ways. Dynamic Analysis – Generally impossible to cover all execution paths of code. Current approaches do not scale well and can take hours. Vulnerable to logic/time bombs.

Manually deducing the purpose of this code requires identifying the decrypt method and reconstructing it before the reflectively called class and method can be determined, which is a time-consuming and tedious process. Many current static analysis approaches cannot handle reflection or support only constant target strings, which are rare in malware. Other approaches fail in other respects, especially when native code is used to produce values. Ultimately, static analysis can never provide a complete picture of runtime values. In dynamic analysis, one of the greatest issues is that it is generally impossible to cover all code execution paths, either automatically or manually, in finite time, which makes complete dynamic analysis difficult or impossible. Current approaches that attempt complete dynamic analysis do not scale well and can take hours even on medium-sized apps. Additionally, dynamic analysis is vulnerable to so-called logic bombs and time bombs. Logic bombs prevent malicious code from being executed when the app determines that it is running in a testing environment. Time bombs hold back malicious code for long periods of time or until a significant or unusual event, such as rebooting the phone, has occurred.

Example Logic Bomb
In the example code, lines 1-2 contain a logic bomb that prevents the malicious code from running when the app determines that it is running inside an emulator, a common environment for testing apps.
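A hedged sketch of such an emulator check follows. DeviceInfo, LogicBombApp and the fingerprint test are hypothetical: on Android the fingerprint would come from android.os.Build, and a "generic" build-fingerprint prefix is only one heuristic commonly seen in emulator-detection code; real samples usually combine several checks.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical device description; on Android this would come from android.os.Build.
final class DeviceInfo {
    final String fingerprint;

    DeviceInfo(String fingerprint) {
        this.fingerprint = fingerprint;
    }
}

final class LogicBombApp {
    final List<String> actions = new ArrayList<>();

    // Assumed heuristic: treat "generic" build fingerprints as an emulator.
    static boolean looksLikeEmulator(DeviceInfo device) {
        return device.fingerprint.startsWith("generic");
    }

    void onCreate(DeviceInfo device) {
        if (looksLikeEmulator(device)) {
            actions.add("benign");    // behave innocently inside an analysis sandbox
            return;
        }
        actions.add("malicious");     // only reached on what looks like a real phone
    }
}
```

A dynamic analyser that runs the app in an emulator only ever observes the "benign" branch, which is exactly the blind spot described above.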

The Problem: How to extract these values?
The problem, then, is how to extract these runtime values and, more specifically, how to do so automatically.

HARVESTER
A tool aimed at fully automatically extracting runtime values from Android applications. Designed to extract these values even from state-of-the-art obfuscated code. Logging point: <v, s>, a variable or field access v and a statement s such that v is in scope at s. Value of interest: the concrete runtime value of v at a logging point <v, s>.

This article proposes HARVESTER, a tool designed to automatically extract runtime values from Android applications, specifically those protected by state-of-the-art obfuscation. HARVESTER returns the values of interest at user-defined logging points; defining these logging points is the only user interaction required. A logging point is a code statement together with a variable in scope at that statement. A value of interest is a concrete value of that variable retrieved at runtime at the logging point. For example, here is a statement s and a couple of possible logging points. Logging points could also be set for arg2, arg4 and arg5; however, in the sendTextMessage method these arguments do not provide any security-relevant information.

Statement s: sendTextMessage(targetNumber, arg2, messageText, arg4, arg5)
Possible logging points: <targetNumber, s>, <messageText, s>
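The two definitions above can be written down as plain data types. This is a hypothetical Java model with invented class names; HARVESTER itself reads its logging points from a configuration file whose format is not described in this transcript.

```java
import java.util.List;

// A logging point <v, s>: variable or field access v, in scope at statement s.
final class LoggingPoint {
    final String variable;
    final String statement;

    LoggingPoint(String variable, String statement) {
        this.variable = variable;
        this.statement = statement;
    }

    @Override
    public String toString() {
        return "<" + variable + ", " + statement + ">";
    }
}

// A value of interest: the concrete value of v observed at runtime at <v, s>.
final class ValueOfInterest {
    final LoggingPoint point;
    final Object value;

    ValueOfInterest(LoggingPoint point, Object value) {
        this.point = point;
        this.value = value;
    }

    @Override
    public String toString() {
        return point + " = " + value;
    }
}

final class LoggingPoints {
    // The two security-relevant logging points for
    // s: sendTextMessage(targetNumber, arg2, messageText, arg4, arg5).
    // arg2, arg4 and arg5 are skipped because they reveal nothing security-relevant.
    static List<LoggingPoint> forSendTextMessage(String statement) {
        return List.of(
                new LoggingPoint("targetNumber", statement),
                new LoggingPoint("messageText", statement));
    }
}
```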

Harvester workflow
HARVESTER combines static and dynamic analysis techniques. To begin, HARVESTER reads the APK and a configuration file listing the logging points. In Part A, the static analysis, a slice is computed from the APK for each logging point. A slice is essentially a portion of the program from which sections have been removed such that its execution is the same as the original’s with respect to some slicing criterion; in HARVESTER, the slicing criteria are the logging points. Each slice therefore consists only of the code required to reach a logging point and compute the values of interest there. In Part B, the dynamic analysis, the slices are used to construct a reduced APK that contains only the code needed to compute the values of interest, plus an executor activity that invokes the computed slices and reports the values of interest. This APK is then executed and the runtime values are retrieved. Part C is an optional step that enables HARVESTER to be used in combination with existing analysis tools or to handle reflection.
Part A: Static analysis; produces the slices from the APK needed to compute the values of interest.
Part B: Dynamic analysis; a reduced APK is constructed from the slices, then executed, and the runtime values are retrieved.
Part C: Optional; can be used to combine HARVESTER with existing analysis tools or to handle reflection.
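To make the idea of a slice concrete, here is a toy backward slicer for straight-line code. It follows only data dependences (which statement defines a variable that the criterion still needs); HARVESTER’s real slicer works on whole APKs via Soot and also has to handle control flow and method calls, so treat this as an illustration of the concept rather than its implementation. Stmt and BackwardSlicer are hypothetical names.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// One straight-line statement: it defines `def` and reads the variables in `uses`.
final class Stmt {
    final String text;
    final String def;
    final Set<String> uses;

    Stmt(String text, String def, String... uses) {
        this.text = text;
        this.def = def;
        this.uses = new HashSet<>(Arrays.asList(uses));
    }
}

final class BackwardSlicer {
    // Toy backward slice w.r.t. the criterion variable at the end of the program:
    // walk backwards and keep a statement only if it defines a variable that is
    // still needed, then add the variables it reads to the needed set.
    static List<Stmt> slice(List<Stmt> program, String criterionVar) {
        Set<String> needed = new HashSet<>();
        needed.add(criterionVar);
        List<Stmt> kept = new ArrayList<>();
        for (int i = program.size() - 1; i >= 0; i--) {
            Stmt s = program.get(i);
            if (needed.remove(s.def)) {
                needed.addAll(s.uses);
                kept.add(s);
            }
        }
        Collections.reverse(kept);  // restore original program order
        return kept;
    }
}
```

For a program `ad = showAd(); enc = "..."; clazz = decrypt(enc); ts = now()` and criterion `clazz`, only the `enc` and `clazz` statements survive; the ad and timestamp code is irrelevant to the value of interest and is dropped.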

Revisiting the example code: above is the original code and below is the sliced version produced by Part A. The values of interest are clazz, method and messageText, as highlighted in the sliced code. Starting at the top, lines 1-2 of the original code, the emulator-check logic bomb, have been removed from the sliced code. This is because the slicing criteria (the logging points) are only reachable if that branch is not taken. Next, line 3 of the original code shows another feature of the slicing process. Conditional statements that depend on the execution environment and would not otherwise be removed, in this case the check for whether the phone’s SIM is from the US, are replaced by simple Boolean variables. This means that the slice’s values of interest can be produced for every environment, regardless of the actual testing environment, simply by changing the value of this variable. Here the condition is replaced in line 1 of the sliced code by the variable EXECUTOR_1. Finally, line 4 of the sliced code shows where HARVESTER reports the values of interest, just before the logging point.
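A hedged Java rendering of the shape of the sliced code described above. The actual slice is not reproduced in the transcript, so the encrypted strings, the message texts chosen in each branch, and the report helper are invented for illustration; only the overall structure (no emulator check, EXECUTOR_1 in place of the SIM-country test, reporting just before the logging point) follows the slide.

```java
import java.util.ArrayList;
import java.util.List;

// Java rendering of the shape of a HARVESTER slice; contents are illustrative.
final class SlicedExample {
    // Hypothetical executor parameter that replaced the "is the SIM from the US?" check.
    static boolean EXECUTOR_1;

    // Values reported by the slice, in the order they were observed.
    static final List<String> REPORTED = new ArrayList<>();

    // Stand-in for HARVESTER's reporting code inserted before the logging point.
    static void report(String name, String value) {
        REPORTED.add(name + "=" + value);
    }

    // Toy decryption kept in the slice because the values of interest depend on it.
    static String decrypt(String s) {
        StringBuilder sb = new StringBuilder(s.length());
        for (char c : s.toCharArray()) {
            sb.append((char) (c - 1));
        }
        return sb.toString();
    }

    // No emulator check here: it cannot influence the values of interest.
    static void slice0() {
        // Environment-dependent condition replaced by the executor parameter.
        String messageText = EXECUTOR_1 ? "US message" : "non-US message"; // invented texts
        String clazz = decrypt("TntNbobhfs");       // "SmsManager"
        String method = decrypt("tfoeUfyuNfttbhf");  // "sendTextMessage"
        report("clazz", clazz);
        report("method", method);
        report("messageText", messageText);
        // The original reflective sendTextMessage call (the logging point) would follow here.
    }
}
```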

PART B: Dynamic Execution of Reduced APK
No user interaction required. Reduced app does not contain any GUI elements. Assembles each slice into a single new method. Executor activity calls each method sequentially immediately after app starts.

During the dynamic analysis phase no user interaction is required. The reduced app also contains no GUI elements, which makes exercising those elements through a user or an automated test driver unnecessary. Each slice from the previous phase is assembled into a single method, and the executor activity added to the new APK calls these methods one after another. Each method is run for all possible combinations of the slice’s parameters. For example, this figure shows the execution of the earlier example code: in MainActivity(), the executor activity, EXECUTOR_1 is set to false; the method for the code slice is called and its values reported; EXECUTOR_1 is then set to true, and the method is called again.
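The executor’s job of running each slice for every combination of its Boolean parameters can be sketched as a loop over bit masks. ExecutorActivity and BoolSlice are hypothetical names, not HARVESTER’s generated code. The sketch also makes the cost visible: a slice with n replaced conditions is run 2^n times.

```java
// Hypothetical executor: runs one slice for every assignment of its
// EXECUTOR_i Boolean parameters, starting with all parameters false.
final class ExecutorActivity {
    interface BoolSlice {
        void run(boolean[] executorValues);
    }

    static int runAllCombinations(BoolSlice slice, int numParams) {
        int runs = 0;
        for (int mask = 0; mask < (1 << numParams); mask++) {
            boolean[] values = new boolean[numParams];
            for (int i = 0; i < numParams; i++) {
                values[i] = (mask & (1 << i)) != 0; // bit i decides EXECUTOR_(i+1)
            }
            slice.run(values);
            runs++;
        }
        return runs;
    }
}
```

With a single parameter this reproduces the sequence in the figure: the slice runs once with EXECUTOR_1 false and once with it true.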

PART C: Runtime Value Injection
Optional step; can be used to combine HARVESTER with existing tools or handle reflection. Static analysis techniques require a call graph; HARVESTER can remove reflective method calls that interfere with this. This is how HARVESTER itself handles reflective calls.

Static analysis tools require the construction of a call graph; however, call graph construction cannot resolve reflective method calls, so the resulting call graph is incomplete. HARVESTER can aid these off-the-shelf tools by replacing the reflective calls. That is what the enhanced APK produced here is: an APK in which the reflective calls have been replaced by ordinary method calls using the values resolved during dynamic execution. To the best of the researchers’ knowledge, this is the first fully automated approach that implements such value injection. HARVESTER itself handles reflective calls through the same mechanism. On the first pass of Part A, an incomplete call graph is constructed. Data values are then extracted in Part B, and in Part C the reflective calls are substituted with regular method calls using those values. Iterating this process yields a more complete call graph and more extracted values, as shown in the figure where the workflow loops back from Part C to Part A.
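The loop from Part C back to Part A behaves like a fixpoint computation: each round can resolve reflective calls that only became reachable in the previous round. The following is a hypothetical Java model, not HARVESTER’s code: method names are plain strings, the `reflective` map plays the role of the targets Part B would harvest at runtime, and injection simply adds an ordinary call-graph edge.

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.HashMap;
import java.util.HashSet;
import java.util.LinkedHashSet;
import java.util.Map;
import java.util.Set;

final class InjectionLoop {
    // direct: ordinary call edges known statically.
    // reflective: method containing a reflective call -> target that Part B would
    // harvest at runtime (hypothetical stand-in for the dynamic phase).
    static Set<String> resolve(String entry,
                               Map<String, Set<String>> direct,
                               Map<String, String> reflective) {
        Map<String, Set<String>> graph = new HashMap<>();
        direct.forEach((caller, callees) -> graph.put(caller, new HashSet<>(callees)));

        Set<String> reachable;
        boolean changed;
        do {
            changed = false;
            reachable = reach(entry, graph);               // Part A: current call graph
            for (String m : reachable) {
                String target = reflective.get(m);         // Part B: harvested value
                if (target != null
                        && graph.computeIfAbsent(m, k -> new HashSet<>()).add(target)) {
                    changed = true;                        // Part C: inject a direct edge
                }
            }
        } while (changed);
        return reachable;
    }

    // Methods reachable from the entry point over the current edges.
    static Set<String> reach(String entry, Map<String, Set<String>> graph) {
        Set<String> seen = new LinkedHashSet<>();
        Deque<String> work = new ArrayDeque<>();
        work.push(entry);
        while (!work.isEmpty()) {
            String m = work.pop();
            if (seen.add(m)) {
                for (String callee : graph.getOrDefault(m, Set.of())) {
                    work.push(callee);
                }
            }
        }
        return seen;
    }
}
```

With `onCreate -> helper` as the only direct edge, and reflective calls `helper -> decryptAndSend` and `decryptAndSend -> sendTextMessage`, the first round only sees onCreate and helper; sendTextMessage becomes reachable only after two injection rounds, which is why iterating matters.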

Evaluation: HARVESTER’s precision and recall
Precision and recall evaluated on 12 complex examples, manually verified results. HARVESTER detected at least one value for 86.6% of all logging points. All values HARVESTER discovered were actual runtime values, 100% precision on this dataset. Recall was evaluated based on the extracted SMS numbers, SMS messages and shell commands. HARVESTER extracted all runtime values for these categories, 100% recall for these categories.

Precision and recall were evaluated on 12 complex example apps. This small number was chosen because the results were verified manually. HARVESTER detected at least one value of interest for 745 of the 860 logging points (86.6%). The article attributes the missed fraction to limitations of HARVESTER’s approach rather than to incorrect results. Every value HARVESTER reported for a logging point was verified as an actual runtime value, giving 100% precision on this dataset, which indicates that HARVESTER’s techniques are very accurate whenever logging points can be reached. Recall was evaluated on the extracted SMS numbers, SMS messages and shell commands of the test data, as these values are among the most important for malware investigation. HARVESTER extracted all runtime values in these categories, a 100% recall for these categories, which strongly suggests that HARVESTER is a useful tool for identifying current malware.
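The percentages above follow from the counts in the notes and the standard definitions of precision and recall. TP, FP and FN denote true positives, false positives and false negatives; the transcript does not give the per-category counts, only that FP = 0 overall and FN = 0 for the three recall categories.

```latex
\begin{aligned}
\text{logging points with a value} &= \frac{745}{860} \approx 0.866 = 86.6\% \\
\text{logging points without a value} &= \frac{860 - 745}{860} = \frac{115}{860} \approx 13.4\% \\
\text{precision} &= \frac{TP}{TP + FP} = \frac{TP}{TP + 0} = 100\% \\
\text{recall (SMS numbers, SMS texts, shell commands)} &= \frac{TP}{TP + FN} = \frac{TP}{TP + 0} = 100\%
\end{aligned}
```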

Evaluation: HARVESTER compared to other tools and efficiency of HARVESTER.
The table shows the results of comparing HARVESTER with dynamic analysis tools on 150 malware samples. HARVESTER was compared to the static analysis tool SAAF on 6,100 malware samples; SAAF was unable to extract many values of interest. When tested on 16,799 apps, HARVESTER took about 2.5 minutes per app.

HARVESTER was directly compared to several dynamic analysis tools, as shown in this table. HARVESTER covered about 3-4 times as many logging points as the other tools. The article suggests that this is because logic bombs based on emulator detection prevented the other tools from reaching the majority of logging points. HARVESTER was also compared to a static analysis tool, SAAF, which also performed quite poorly in comparison. For example, SAAF was unable to extract any values of interest from several of the apps used in the precision and recall testing, and was generally unable to extract SMS messages. In terms of efficiency, when HARVESTER was tested on 16,799 apps it took about 2.5 minutes per app to complete its analysis. This indicates that HARVESTER is a good candidate for mass analyses, such as automatic testing of app store submissions.
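To put the 2.5-minute figure in perspective for mass analysis, here is a back-of-the-envelope estimate, assuming purely sequential runs on a single machine (the transcript does not describe the hardware or any parallelism used):

```latex
\begin{aligned}
16{,}799 \text{ apps} \times 2.5\,\text{min/app} &= 41{,}997.5\,\text{min} \\
&\approx 700\,\text{h} \approx 29.2\,\text{days (one machine, sequential)} \\
\text{with } k \text{ machines in parallel} &\approx \frac{700}{k}\,\text{h}
\end{aligned}
```

So, for example, 10 machines would need roughly 70 hours, which is why a per-app cost in the order of minutes matters for app-store-scale screening.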

Summary
HARVESTER is a strong automatic tool for extracting runtime values from apps using current state-of-the-art obfuscation.

To summarise, HARVESTER is a very strong tool for extracting runtime values from obfuscated apps and has clear potential to help distinguish malicious apps from benign ones, especially when dealing with the obfuscation popular among “current” malware.

Criticism
No experimental comparison to other hybrid analysis tools. Recall measurement based on few categories; about 13% of all security-relevant logging points could not be reached. Still vulnerable to time bombs, especially in a mass-analysis setting. Unable to retrieve logging points in native code, which is commonly used by malware authors.

HARVESTER does provide significant improvements over many current static and dynamic analysis tools. However, HARVESTER was not compared to other hybrid approaches experimentally, only theoretically. An experimental comparison with other hybrid approaches would likely give a better view of where HARVESTER fits into the analysis-tool landscape. It is also somewhat unusual that the article bases its recall assessment on only a few of the categories of logging points used in the experiment. Claiming 100% recall on this basis, when about 13% (115 of 860) of all security-relevant logging points were not reached, shows that HARVESTER is a strong candidate for detecting malware that uses SMS messages or shell commands, but gives no indication of its usefulness in detecting other forms of malware. HARVESTER incorporates a timeout mechanism with a default of 10 minutes. If an app places a time bomb in the computation of its values of interest, it can increase the time needed to process the slice HARVESTER generates and potentially trigger a timeout. This is not a major issue when analysing specific apps, as the timeout can be increased manually, but in massive automated analysis, such as in a system like Google Bouncer, it could result in malware going unidentified. Of course, such a time bomb would also affect the size and runtime of the original app. HARVESTER can handle native code within slices, as long as the logging points themselves are not in native code; if they are, those logging points will be missed.
The article asserts that this is currently not a major issue, as data handled in native code is often leaked back into the Dalvik bytecode. However, malware authors already tend to encode information in native code, and it seems fairly easy for them to move their logging points into native code and so avoid detection by HARVESTER. This issue is not easily overcome, because HARVESTER is based on Soot, a framework for analysing and transforming Java and Dalvik bytecode. HARVESTER has other limitations too; for example, it has issues with inter-component communication: if a value is computed in one activity and then sent to another, where it reaches a logging point, HARVESTER cannot reconstruct this. However, the article points out feasible solutions to these problems; for example, HARVESTER can be extended with other Soot-based tools that handle inter-component communication. While the time-bomb approach imposes a significant trade-off on the app’s creator, moving the logging points into native code does not, which points to a serious potential weakness of HARVESTER in the future. So to conclude: HARVESTER is a very powerful tool that works very well at detecting current malware; however, there are significant potential vulnerabilities that could be exploited by future malware.
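The time-bomb concern in the criticism can be illustrated with a per-slice time budget. This is a hedged sketch using java.util.concurrent; SliceRunner and the sleeping slice are hypothetical, and beyond its 10-minute default the transcript does not describe how HARVESTER’s timeout is implemented. A slice that sleeps past its budget simply yields no value, which in a mass-analysis setting means a missed logging point.

```java
import java.util.Optional;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;

final class SliceRunner {
    // Runs one slice with a time budget; returns its value, or empty if none arrived in time.
    static Optional<Object> runWithTimeout(Callable<Object> slice, long timeout, TimeUnit unit) {
        ExecutorService pool = Executors.newSingleThreadExecutor();
        try {
            Future<Object> result = pool.submit(slice);
            return Optional.ofNullable(result.get(timeout, unit));
        } catch (TimeoutException e) {
            return Optional.empty();            // time bomb outlasted the budget: value missed
        } catch (ExecutionException e) {
            return Optional.empty();            // slice crashed: no value either
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            return Optional.empty();
        } finally {
            pool.shutdownNow();                 // interrupt a slice that is still running
        }
    }

    public static void main(String[] args) {
        // Hypothetical time bomb: the value is only computed after a long delay.
        Callable<Object> timeBomb = () -> {
            Thread.sleep(5_000);
            return "+1-555-0100";
        };
        // Budget deliberately smaller than the delay, so no value is reported.
        System.out.println(runWithTimeout(timeBomb, 100, TimeUnit.MILLISECONDS));
    }
}
```

Raising the budget recovers the value for a single app, but across thousands of apps every such delay multiplies the total analysis time, which is the trade-off discussed above.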

Thanks for listening. Questions?
