Borja Sanz, Igor Santos, Carlos Laorden, Xabier Ugarte-Pedrero and Pablo Garcia Bringas The 9th Annual IEEE Consumer Communications and Networking Conference.

1 On the Automatic Categorisation of Android Applications. Borja Sanz, Igor Santos, Carlos Laorden, Xabier Ugarte-Pedrero and Pablo Garcia Bringas. The 9th Annual IEEE Consumer Communications and Networking Conference - Security and Content Protection, 2012. Presenter: 張文銓

2 OUTLINE: INTRODUCTION, APPS FEATURE EXTRACTION, FEATURES EXTRACTION METHOD, MACHINE LEARNING CLASSIFIERS, CONCLUSIONS

3 INTRODUCTION Whereas for Apple devices the AppStore is the single official way to obtain applications, Android allows users to install applications downloaded from markets or directly from the Internet. Shabtai et al. trained machine learning models using as features the counts of elements, attributes or namespaces of the parsed apk. They obtained 89% accuracy when classifying applications into only 2 categories: tools or games.

4 APPS FEATURE EXTRACTION Goal: classify Android applications into several categories using features extracted both from the Android Market and from the application itself.

5 APPS FEATURE EXTRACTION We have collected 820 applications, which have been classified into 7 categories.

6 APPS FEATURE EXTRACTION Phases: 1. We describe the process of extracting features from Android .apk files. 2. We show that the approach can achieve high accuracy rates.

7 Extracting Features From Android .apk We retrieve several features from the applications: 1. Strings contained in the application. 2. Information extracted from the Android Market through an open-source, unofficial API called android-market-api: (1) rating, (2) number of ratings and (3) size of the application. 3. Permissions of the application.

8 Extracting Features From Android .apk Permissions are stored in an XML file inside each application, named “AndroidManifest.xml”. This file declares the execution requirements of the application, such as the version of the operating system it requires or the libraries it uses.
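The permission lookup described on this slide can be sketched in a few lines of Python. This is only an illustration, not the authors' tooling: it assumes the manifest has already been decoded to plain-text XML (inside an .apk it is stored in a binary form), and the sample manifest, package name and the helper name `extract_permissions` are hypothetical.

```python
import xml.etree.ElementTree as ET
from typing import List

# Namespace that android:* attributes belong to in AndroidManifest.xml.
ANDROID_NS = "http://schemas.android.com/apk/res/android"


def extract_permissions(manifest_xml: str) -> List[str]:
    """Return the permission names declared in a decoded manifest."""
    root = ET.fromstring(manifest_xml)
    name_attr = "{%s}name" % ANDROID_NS
    return [
        elem.attrib[name_attr]
        for elem in root.iter("uses-permission")
        if name_attr in elem.attrib
    ]


# Hypothetical manifest, for illustration only.
SAMPLE_MANIFEST = """<manifest
    xmlns:android="http://schemas.android.com/apk/res/android"
    package="com.example.demo">
  <uses-sdk android:minSdkVersion="8" />
  <uses-permission android:name="android.permission.INTERNET" />
  <uses-permission android:name="android.permission.READ_CONTACTS" />
  <application android:label="Demo" />
</manifest>
"""

permissions = extract_permissions(SAMPLE_MANIFEST)
print(permissions)
```

Each permission name can then serve as one binary feature (present or absent) for the application, which is one plausible encoding; the slides do not state the encoding actually used.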

9 FEATURES EXTRACTION METHOD The general steps we have followed for each application are: 1. We extract the permissions and the resources from the application. 2. We disassemble the sample. 3. We extract the strings from the disassembled sample. 4. We obtain data from the Android Market.

10 FEATURES EXTRACTION METHOD To extract every string, we search for the operation code “const-string”, which identifies the strings of the application. We then weight the strings using Term Frequency (TF), a weight widely used in information retrieval and text mining.
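A minimal sketch of this step, under stated assumptions: the disassembly is taken to be smali-style text in which string literals appear as `const-string vN, "..."`, strings are split into lowercase alphanumeric terms, and TF is the count of a term divided by the total number of terms in the application. The slides name TF but give no formula or tokenisation, so those choices, the sample input and the function names are illustrative only.

```python
import re
from collections import Counter
from typing import Dict, List

# Matches const-string (and const-string/jumbo) instructions and captures
# the quoted literal, allowing escaped characters inside it.
CONST_STRING_RE = re.compile(
    r'const-string(?:/jumbo)?\s+[vp]\d+,\s*"((?:[^"\\]|\\.)*)"'
)


def extract_strings(disassembly: str) -> List[str]:
    """Collect every string literal loaded by a const-string instruction."""
    return CONST_STRING_RE.findall(disassembly)


def term_frequency(strings: List[str]) -> Dict[str, float]:
    """TF of each term: its count divided by the total number of terms."""
    terms = [t.lower() for s in strings for t in re.findall(r"[A-Za-z0-9]+", s)]
    counts = Counter(terms)
    total = sum(counts.values())
    if total == 0:
        return {}
    return {term: n / total for term, n in counts.items()}


# Hypothetical disassembled fragment, for illustration only.
SAMPLE_DISASSEMBLY = '''
    const-string v0, "Loading game"
    invoke-virtual {p0, v0}, Lcom/example/Demo;->log(Ljava/lang/String;)V
    const-string v1, "Game over"
    const-string/jumbo v2, "score"
'''

strings = extract_strings(SAMPLE_DISASSEMBLY)
tf = term_frequency(strings)
print(strings)
print(tf)
```

In this toy fragment the term "game" occurs twice among five terms, so its TF is 0.4.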

11 MACHINE LEARNING CLASSIFIERS Machine-learning algorithms can commonly be divided into three types depending on the training data: supervised learning (Bayesian Networks, Decision Trees, K-Nearest Neighbour (KNN), Support Vector Machines (SVM)); unsupervised learning (association rule analysis); and semi-supervised learning.

12 MACHINE LEARNING CLASSIFIERS Bayesian Networks, which are based on the Bayes Theorem. Algorithm: Tree Augmented Naïve (TAN) [28]. Decision Trees: Random Forest [19]. K-Nearest Neighbour: we performed experiments for k = 1, k = 2 and k = 5 to train KNN. Support Vector Machines (SVM). [19] L. Breiman, “Random forests,” Machine Learning, vol. 45, no. 1, pp. 5–32, 2001. [28] D. Geiger, M. Goldszmidt, G. Provan, P. Langley, and P. Smyth, “Bayesian network classifiers,” in Machine Learning, 1997, pp. 131–163.
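To make the role of k concrete, here is a small pure-Python K-Nearest Neighbour sketch evaluated for k = 1, 2 and 5. It is not the implementation used in the experiments: the two-dimensional feature vectors, the "games" and "tools" labels, the Euclidean distance and the tie-breaking rule are all assumptions chosen for illustration.

```python
import math
from collections import Counter
from typing import List, Sequence, Tuple

Sample = Tuple[Sequence[float], str]


def knn_predict(train: List[Sample], query: Sequence[float], k: int) -> str:
    """Majority vote among the k training samples closest to the query."""
    if k < 1 or not train:
        raise ValueError("k must be >= 1 and the training set non-empty")
    # Sort by Euclidean distance and keep the k nearest samples.
    neighbours = sorted(train, key=lambda s: math.dist(s[0], query))[:k]
    votes = Counter(label for _, label in neighbours)
    best = max(votes.values())
    # On a tie, prefer the tied label whose sample is nearest to the query.
    for _, label in neighbours:
        if votes[label] == best:
            return label
    raise AssertionError("unreachable: neighbours is never empty here")


# Hypothetical training data: (feature vector, category).
TRAIN: List[Sample] = [
    ((0.9, 0.1), "games"),
    ((0.8, 0.2), "games"),
    ((0.7, 0.0), "games"),
    ((0.1, 0.9), "tools"),
    ((0.2, 0.8), "tools"),
    ((0.0, 0.7), "tools"),
]

query = (0.75, 0.15)
predictions = {k: knn_predict(TRAIN, query, k) for k in (1, 2, 5)}
print(predictions)
```

A small k follows the closest samples tightly, while a larger k smooths the decision over more neighbours, which is why several values are usually compared.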

13 MACHINE LEARNING CLASSIFIERS To evaluate each classifier’s capability, we measured the Area Under the ROC (Receiver Operating Characteristic) Curve (AUC) [31]. Best: Bayes TAN, 0.93. Second: Random Forest, 0.9. [31] Y. Singh, A. Kaur, and R. Malhotra, “Comparative analysis of regression and machine learning methods for predicting fault proneness models.”
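For readers unfamiliar with the metric, the sketch below computes AUC for a binary problem as the fraction of (positive, negative) pairs in which the positive sample receives the higher score, counting ties as one half. The labels and scores are made up, and the slides do not say how AUC was aggregated over the 7 categories, so this shows only the two-class case.

```python
from typing import Sequence


def auc(labels: Sequence[int], scores: Sequence[float]) -> float:
    """AUC as the probability that a positive outranks a negative."""
    positives = [s for y, s in zip(labels, scores) if y == 1]
    negatives = [s for y, s in zip(labels, scores) if y == 0]
    if not positives or not negatives:
        raise ValueError("AUC needs at least one positive and one negative")
    wins = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                wins += 1.0   # positive correctly ranked above negative
            elif p == n:
                wins += 0.5   # ties count as half a win
    return wins / (len(positives) * len(negatives))


# Hypothetical classifier scores: 1 = sample belongs to the category.
labels = [1, 1, 1, 0, 0, 0]
scores = [0.9, 0.8, 0.4, 0.5, 0.3, 0.2]

auc_value = auc(labels, scores)
print(round(auc_value, 3))
```

An AUC of 1.0 means every positive is ranked above every negative, and 0.5 corresponds to random ranking, which gives a scale for reading the 0.93 and 0.9 reported above.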

14 CONCLUSIONS 1. There are other features of the applications that could be used to improve the detection. 2. Although these features are not effective at preventing malware from being uploaded to the market, they can prevent the installation of malware on the smartphone. 3. With enough samples, the approach could distinguish good apps from bad apps.

