Borja Sanz, Igor Santos, Carlos Laorden, Xabier Ugarte-Pedrero and Pablo Garcia Bringas The 9th Annual IEEE Consumer Communications and Networking Conference.

Presentation transcript:

On the Automatic Categorisation of Android Applications
Borja Sanz, Igor Santos, Carlos Laorden, Xabier Ugarte-Pedrero and Pablo Garcia Bringas
The 9th Annual IEEE Consumer Communications and Networking Conference - Security and Content Protection, 2012
Presenter: 張文銓

OUTLINE
1. INTRODUCTION
2. APPS FEATURE EXTRACTION
3. FEATURES EXTRACTION METHOD
4. MACHINE LEARNING CLASSIFIERS
5. CONCLUSIONS

INTRODUCTION
For Apple devices, the App Store is the only official way to obtain applications, whereas Android allows users to install applications downloaded from markets or directly from the Internet.
Shabtai et al. trained machine learning models using as features the counts of elements, attributes or namespaces of the parsed apk. They obtained 89% accuracy when classifying applications into only 2 categories: tools or games.

APPS FEATURE EXTRACTION
We classify Android applications into several categories using features extracted both from the Android Market and from the application itself.

APPS FEATURE EXTRACTION
We have collected 820 applications, which have been classified into 7 categories.

APPS FEATURE EXTRACTION
Phases:
1. We describe the process of extracting features from the Android .apk files.
2. We show that the approach can achieve high accuracy rates.

Extracting Features From Android .apk
We retrieve several features from the applications:
1. Strings contained in the application.
2. Information extracted from the Android Market using an open-source, non-official API called android-market-api: (1) rating, (2) number of ratings and (3) size of the application.
3. Permissions of the application.

Extracting Features From Android .apk
Permissions are stored in an XML file inside each application, named “AndroidManifest.xml”. This file declares the execution requirements of the application, such as the version of the operating system that it requires or the libraries it uses.
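As a rough illustration of the permission step, the sketch below reads the `<uses-permission>` entries from a manifest. It assumes the manifest has already been decoded to plain-text XML (inside an .apk it is stored in a binary form), and the sample manifest and the helper name `extract_permissions` are hypothetical, not taken from the slides.

```python
import xml.etree.ElementTree as ET

# Namespace bound to the "android:" attribute prefix in Android manifests.
ANDROID_NS = "http://schemas.android.com/apk/res/android"


def extract_permissions(manifest_xml):
    """Return the permission names declared via <uses-permission> elements.

    Assumes manifest_xml is plain-text XML (already decoded from the .apk).
    """
    root = ET.fromstring(manifest_xml)
    name_attr = "{%s}name" % ANDROID_NS
    return [
        elem.attrib[name_attr]
        for elem in root.iter("uses-permission")
        if name_attr in elem.attrib
    ]


# Hypothetical manifest used only for illustration.
SAMPLE_MANIFEST = """<manifest xmlns:android="http://schemas.android.com/apk/res/android"
          package="com.example.demo">
    <uses-sdk android:minSdkVersion="8" />
    <uses-permission android:name="android.permission.INTERNET" />
    <uses-permission android:name="android.permission.READ_CONTACTS" />
</manifest>
"""

permissions = extract_permissions(SAMPLE_MANIFEST)
print(permissions)
```

Each permission name could then serve as one binary feature (present or absent) for the classifiers, although the slides do not state how the permissions were encoded.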

FEATURES EXTRACTION METHOD
The general steps we followed for each application are:
1. We extract the permissions and the resources from the application.
2. We disassemble the sample.
3. We extract the strings from the disassembled sample.
4. We obtain data from the Android Market.

FEATURES EXTRACTION METHOD
To extract every string, we search for the operation code “const-string”, which identifies the strings of the application.
We process the strings using Term Frequency (TF), a weight widely used in information retrieval and text mining.
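A minimal sketch of the string step follows. It assumes the disassembly is smali-style text in which each string load appears as `const-string vN, "..."`, and it uses the common normalised definition of TF (a term's count divided by the total number of terms in the application). The slides do not give the exact tokenisation or formula, so the regular expressions, the word-level tokeniser and the sample input are all illustrative assumptions.

```python
import re
from collections import Counter

# Matches smali-style lines such as:  const-string v0, "Hello"
# (and the const-string/jumbo variant); captures the quoted literal.
CONST_STRING_RE = re.compile(
    r'const-string(?:/jumbo)?\s+[vp]\d+,\s*"((?:[^"\\]|\\.)*)"'
)


def extract_strings(disassembly):
    """Return every string literal loaded by a const-string instruction."""
    return CONST_STRING_RE.findall(disassembly)


def term_frequency(strings):
    """Map each lowercase word to count / total number of words (assumed TF)."""
    tokens = [t.lower() for s in strings for t in re.findall(r"[A-Za-z]+", s)]
    counts = Counter(tokens)
    total = sum(counts.values())
    if total == 0:
        return {}
    return {term: n / total for term, n in counts.items()}


# Hypothetical disassembly fragment used only for illustration.
SAMPLE_DISASSEMBLY = '''
    const-string v0, "Start game"
    const-string v1, "Game over"
    invoke-virtual {v2, v0}, Landroid/widget/TextView;->setText(Ljava/lang/CharSequence;)V
    const-string/jumbo v3, "High score"
'''

strings = extract_strings(SAMPLE_DISASSEMBLY)
tf = term_frequency(strings)
print(strings)
print(tf)
```

Here "game" occurs twice among six words, so its weight is 2/6, and the weights of all terms sum to 1.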

MACHINE LEARNING CLASSIFIERS
Machine-learning algorithms can commonly be divided into three different types depending on the training data:
1. Supervised learning: Bayesian Networks, Decision Trees, K-Nearest Neighbour (KNN), Support Vector Machines (SVM).
2. Unsupervised learning: association rule analysis.
3. Semi-supervised learning.

MACHINE LEARNING CLASSIFIERS
1. Bayesian Networks, which are based on the Bayes Theorem. Algorithm: Tree Augmented Naïve (TAN) [28].
2. Decision Trees: Random Forest [19].
3. K-Nearest Neighbour: we performed experiments for k = 1, k = 2 and k = 5 to train KNN.
4. Support Vector Machines (SVM).
[19] L. Breiman, “Random forests,” Machine learning, vol. 45, no. 1, pp. 5–32.
[28] D. Geiger, M. Goldszmidt, G. Provan, P. Langley, and P. Smyth, “Bayesian network classifiers,” in Machine Learning, 1997, pp. 131–163.
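To make the KNN experiments concrete, here is a small sketch of a k-nearest-neighbour classifier evaluated for k = 1, 2 and 5. It is not the authors' implementation: the Euclidean distance, the two-dimensional feature vectors and the "game"/"tool" labels are assumptions chosen for illustration.

```python
import math
from collections import Counter


def knn_predict(train, query, k):
    """Classify query by majority vote among its k nearest training points.

    train is a list of (feature_vector, label) pairs. Distance is Euclidean
    (an assumption; the slides do not name the metric). On a tied vote the
    label encountered first, i.e. that of the nearer neighbour, is returned.
    """
    neighbours = sorted(train, key=lambda item: math.dist(item[0], query))[:k]
    votes = Counter(label for _, label in neighbours)
    return votes.most_common(1)[0][0]


# Hypothetical training set: two made-up features per application.
TRAIN = [
    ((0.9, 0.1), "game"),
    ((0.8, 0.2), "game"),
    ((0.7, 0.3), "game"),
    ((0.1, 0.9), "tool"),
    ((0.2, 0.8), "tool"),
    ((0.3, 0.6), "tool"),
]

query = (0.75, 0.2)
predictions = {k: knn_predict(TRAIN, query, k) for k in (1, 2, 5)}
print(predictions)
```

With real data the feature vectors would combine the string, permission and market features, and the choice of k would be compared on held-out applications.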

MACHINE LEARNING CLASSIFIERS
To evaluate each classifier’s capability, we measured the Area Under the ROC (Receiver Operating Characteristic) Curve (AUC) [31].
Best: Bayes TAN. Second: Random Forest, 0.9.
[31] Y. Singh, A. Kaur, and R. Malhotra, “Comparative analysis of regression and machine learning methods for predicting fault proneness models.”
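For readers unfamiliar with the metric, the sketch below computes AUC for a binary problem using its rank interpretation: the probability that a randomly chosen positive example receives a higher score than a randomly chosen negative one, with ties counted as one half. The labels and scores are invented for illustration. With 7 categories a one-vs-rest average would be one option, but the slides do not say how the multi-class AUC was obtained.

```python
def roc_auc(labels, scores):
    """Area under the ROC curve for binary labels (1 = positive, 0 = negative).

    Computed as the fraction of (positive, negative) pairs in which the
    positive example has the higher score; tied pairs count as 0.5.
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative example")
    wins = sum(
        1.0 if p > n else 0.5 if p == n else 0.0
        for p in pos
        for n in neg
    )
    return wins / (len(pos) * len(neg))


# Hypothetical classifier scores for six applications.
labels = [1, 1, 1, 0, 0, 0]
scores = [0.9, 0.8, 0.4, 0.7, 0.3, 0.2]

auc = roc_auc(labels, scores)
print(round(auc, 3))
```

In this example eight of the nine positive/negative pairs are ranked correctly, giving an AUC of 8/9. A value of 1.0 means perfect ranking and 0.5 corresponds to random guessing.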

CONCLUSIONS
1. There are other application features that could be used to improve detection.
2. Although these features are not effective at preventing malware from being uploaded to the market, they can prevent malware from being installed on the smartphone.
3. Given enough samples, the approach could distinguish good apps from bad apps.