Published by Ashley Watson. Modified over 8 years ago.
1
Finding the Hidden Scenes Behind Android Applications
Joey Allen
Mentor: Xiangyu Niu
CURENT REU Program: Final Presentation
7/16/2014
2
Previous Work
- Crawled the Google Play Store
- Scraped descriptions, authors, and categories of applications
- Applied the LDA model to:
  - Descriptions
  - Permissions
- Applied the author-topic model to:
  - Descriptions
3
APPIC Framework
Figure 1. Flow Chart of APPIC Framework.
1. The user requests to download app A.
2. The description, category, and permissions are filtered.
3. The category is assigned to C_a.
4. Embedded topic models auto-tag the description, S_a, and the permissions, p_a.
5. C_a, S_a, and p_a are compared.
6. If they all match, the app is considered safe.
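The final comparison step can be sketched as below. This is only an illustration under a simplifying assumption: each of the three signals (category, description tag, permission tag) is reduced to a single label. The `AppRecord` class, its field names, and the example labels are hypothetical and do not come from the APPIC implementation.

```python
from dataclasses import dataclass


@dataclass
class AppRecord:
    """Hypothetical container for the three signals that are compared."""
    category: str          # category listed for the app
    description_tag: str   # tag inferred from the description
    permission_tag: str    # tag inferred from the permissions


def is_considered_safe(app: AppRecord) -> bool:
    """Return True only when all three labels agree (assumed rule)."""
    return app.category == app.description_tag == app.permission_tag


consistent = AppRecord("Weather", "Weather", "Weather")
suspicious = AppRecord("Weather", "Weather", "Communication")
print(is_considered_safe(consistent))  # True
print(is_considered_safe(suspicious))  # False
```

A real system would likely compare sets of tags rather than single labels; the exact matching rule is not specified on the slide.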
4
LDA Model
Latent Dirichlet Allocation (LDA) is a generative probabilistic model for collections of discrete data such as text corpora [1].
The LDA model learns topics, each of which is a distribution over words.
The words in a document can then be compared against the set of topics, and the best-matching topic can be chosen as the document's category.
Figure 2. Graphical Representation of the LDA Model [1].
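To make the idea of comparing a document's words to a set of topics concrete, here is a minimal sketch. It is not LDA inference: it assumes the topics are already fixed and simply picks the topic under which the words are most likely. The topic names, word probabilities, and the `UNSEEN` smoothing constant are all made up for illustration.

```python
import math

# Hypothetical topics: each maps words to probabilities (made-up numbers).
TOPICS = {
    "Weather": {"forecast": 0.4, "rain": 0.3, "radar": 0.2, "share": 0.1},
    "Social": {"friends": 0.4, "share": 0.3, "chat": 0.2, "forecast": 0.1},
}
UNSEEN = 1e-6  # assumed small probability for words a topic never emits


def best_topic(words, topics=TOPICS):
    """Pick the topic with the highest log-likelihood for the given words."""
    def score(dist):
        return sum(math.log(dist.get(w, UNSEEN)) for w in words)
    return max(topics, key=lambda name: score(topics[name]))


print(best_topic(["rain", "forecast", "radar"]))  # Weather
print(best_topic(["chat", "friends", "share"]))   # Social
```

In full LDA, each document has a mixture over topics rather than a single best topic, and the topic-word distributions are learned from the corpus.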
5
Author-Topic Model
The author-topic model is a generative model for documents that extends LDA to include authorship information [2].
Each author is associated with a distribution over topics, and each topic with a distribution over words.
Figure 3. Graphical Model of the Author-Topic Model [2].
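The generative story behind the author-topic model can be sketched as follows: for each word, pick one of the document's authors uniformly, draw a topic from that author's topic distribution, then draw a word from that topic's word distribution. The author names, topics, and probabilities below are hypothetical values chosen only for illustration.

```python
import random

# Hypothetical parameters (made-up numbers for illustration only).
AUTHOR_TOPICS = {  # per-author distribution over topics
    "dev_a": {"Games": 0.9, "Tools": 0.1},
    "dev_b": {"Games": 0.2, "Tools": 0.8},
}
TOPIC_WORDS = {  # per-topic distribution over words
    "Games": {"play": 0.5, "level": 0.3, "score": 0.2},
    "Tools": {"file": 0.5, "backup": 0.3, "scan": 0.2},
}


def draw(dist, rng):
    """Sample one key from a {key: probability} mapping."""
    keys = list(dist)
    return rng.choices(keys, weights=[dist[k] for k in keys], k=1)[0]


def generate_document(authors, n_words, rng):
    """Generate words following the author-topic generative story."""
    words = []
    for _ in range(n_words):
        author = rng.choice(authors)                 # author chosen uniformly
        topic = draw(AUTHOR_TOPICS[author], rng)     # topic from that author
        words.append(draw(TOPIC_WORDS[topic], rng))  # word from that topic
    return words


doc = generate_document(["dev_a", "dev_b"], n_words=8, rng=random.Random(0))
print(doc)
```

Fitting the model runs this story in reverse: given the observed words and authors, it estimates the two sets of distributions.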
6
Calculating Results
CI = Correct Inference; II = Incorrect Inference
1. The user reads the application description.
2. The user compares APPIC's tags with the author's tags.
3. One counter is updated according to the outcome:
   - APPIC finds the app in the wrong category: CI + 1
   - APPIC incorrectly categorizes the application: II + 1
   - APPIC and the author both incorrectly categorize the app: II + 1
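The counting scheme can be sketched as a simple tally. Two things here are assumptions rather than details from the slide: the string keys used to name the outcomes are hypothetical, and the accuracy formula CI / (CI + II) is a guess at how the counters are turned into a percentage.

```python
from collections import Counter

# Outcome labels paraphrased from the slide; the string keys are hypothetical.
OUTCOME_TO_COUNTER = {
    "appic_finds_wrong_category": "CI",  # APPIC finds app in wrong category
    "appic_incorrect": "II",             # APPIC incorrectly categorizes app
    "appic_and_author_incorrect": "II",  # both categorize the app incorrectly
}


def tally(outcomes):
    """Count correct (CI) and incorrect (II) inferences."""
    counts = Counter({"CI": 0, "II": 0})
    for outcome in outcomes:
        counts[OUTCOME_TO_COUNTER[outcome]] += 1
    return counts


def accuracy(counts):
    """Assumed metric: CI / (CI + II); returns 0.0 when nothing was judged."""
    total = counts["CI"] + counts["II"]
    return counts["CI"] / total if total else 0.0


judgements = ["appic_finds_wrong_category"] * 3 + ["appic_incorrect"]
counts = tally(judgements)
print(counts["CI"], counts["II"], accuracy(counts))  # 3 1 0.75
```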
7
LDA Results (Descriptions)
8
LDA Results (Permissions)
9
AT Results (Descriptions)
10
Comparison of Results

Topic Model      Results
LDA (3 Tags)     83%
LDA (2 Tags)     64%
Author-topic     58%
PLDA [3]         88%

Topic Model      Results
LDA (4 Tags)     34%
PLDA [3]         77%
11
Conclusion
- LDA performed better than the author-topic (AT) model at categorizing descriptions.
- Using more tags increases accuracy but decreases efficiency.
- The AT model was not as accurate at categorizing applications, but it is useful for finding authors that create similar apps.
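One way to see why more tags can raise accuracy is a top-k check: an app counts as correctly categorized when its listed category appears among the k highest-scoring tags. This reading of "tags" is an assumption, and the scores and categories below are invented for illustration; they are not the presentation's data.

```python
def top_k_tags(topic_scores, k):
    """Return the k highest-scoring tags for one app."""
    return sorted(topic_scores, key=topic_scores.get, reverse=True)[:k]


def top_k_accuracy(apps, k):
    """Fraction of apps whose listed category is among the top-k tags."""
    hits = sum(1 for scores, label in apps if label in top_k_tags(scores, k))
    return hits / len(apps)


# Hypothetical (tag scores, listed category) pairs.
APPS = [
    ({"Games": 0.6, "Tools": 0.3, "Social": 0.1}, "Games"),
    ({"Games": 0.5, "Tools": 0.4, "Social": 0.1}, "Tools"),
    ({"Games": 0.2, "Tools": 0.5, "Social": 0.3}, "Social"),
    ({"Games": 0.1, "Tools": 0.2, "Social": 0.7}, "Games"),
]

for k in (1, 2, 3):
    print(k, top_k_accuracy(APPS, k))  # 0.25, then 0.75, then 1.0
```

Each extra tag widens the set a category can match against, so accuracy cannot go down as k grows, while every additional tag adds comparison work.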
12
Future Work
- Find a better method to calculate accuracy.
- Learn a different method to categorize permissions.
- Study dependencies between permissions and descriptions.
- Modify the AT model.
13
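Author-Topic Model (Modified)
[Figure: graphical model of the modified author-topic model. Symbols shown: α, β, θ, ϕ, x, z, w, p_d, with plates A, T, N_d, and D. Labels shown: D = document; w = word; z = topic; topic distribution over words; distribution of permissions over topics; permissions; uniform distribution of documents over permissions.]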
14
References
[1] D. M. Blei, A. Y. Ng, and M. I. Jordan, “Latent Dirichlet allocation,” Journal of Machine Learning Research, vol. 3, pp. 993–1022, 2003.
[2] M. Rosen-Zvi, T. Griffiths, M. Steyvers, and P. Smyth, “The author-topic model for authors and documents,” in Proceedings of the 20th Conference on Uncertainty in Artificial Intelligence, 2004, pp. 487–494.
[3] Y. Yang, J. S. Sun, and M. W. Berry, “APPIC: Finding The Hidden Scene Behind Description Files for Android Apps.”