Dynodroid: An Input Generation System for Android Apps

Dynodroid: An Input Generation System for Android Apps
Aravind Machiry, Rohan Tahiliani, Mayur Naik
Georgia Institute of Technology

The Growth of Smartphones and Tablets
- 1 million new Android devices activated every day
- 750 million total (March 2013)

The Growth of Mobile Apps
- 30K new apps on Google Play per month
- 1 million total (July 2013)

The Growth of Mobile Apps
- 1.5 billion downloads from Google Play per month
- 50 billion total (July 2013)

The Life of a Mobile App
- Stages: development and testing; pre-deployment certification; post-deployment adaptation
- Concerns at every stage: reliability, security, performance
- New software engineering problems in all stages => need new program analysis-based tools

Program Analysis for Mobile Apps
- Static analysis: program analysis using program text
  - Hindered by features common in mobile apps: large SDK, obfuscated and native code, concurrency, IPC, databases, GUIs, ...
- Dynamic analysis: program analysis using program runs
  - Needs test inputs yielding high app coverage
  - The focus of our work

Desiderata for an Input Generation System
- Robust: handles real-world apps
- Black-box: does not need sources or the ability to decompile binaries
- Versatile: exercises important app functionality
- Automated: reduces manual effort
- Efficient: avoids generating redundant inputs

Our Contributions
- Design of a system, Dynodroid, that satisfies the five desired criteria
- Open-source implementation of Dynodroid on the dominant Android platform
- Evaluation of Dynodroid on real-world apps against state-of-the-art approaches

Our Approach
- View an app as an event-driven program: a run is an alternating sequence of states and events, s0 -e1-> s1 -e2-> s2 -e3-> s3 ...
- Broadly two kinds of events (sketched below):
  - UI event: LongTap(245, 310), Drag(0, 0, 245, 310), ...
  - System event: BatteryLow, SmsReceived("hello"), ...
- Assumption: fixed concrete data in each event and environment (sdcard, network, etc.); may cause loss of coverage
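To make the event model concrete, here is a minimal, self-contained Java sketch of the two kinds of events; the class names and fields are illustrative assumptions, not Dynodroid's actual API:

```java
// Illustrative event model; not Dynodroid's actual types.
interface Event {}

// UI events carry fixed, concrete screen coordinates, e.g. LongTap(245, 310).
final class LongTap implements Event {
    final int x, y;
    LongTap(int x, int y) { this.x = x; this.y = y; }
}

final class Drag implements Event {
    final int fromX, fromY, toX, toY;
    Drag(int fromX, int fromY, int toX, int toY) {
        this.fromX = fromX; this.fromY = fromY;
        this.toX = toX; this.toY = toY;
    }
}

// System events model broadcasts such as BatteryLow or SmsReceived("hello"),
// again with fixed, concrete payload data.
final class SystemEvent implements Event {
    final String action;   // e.g. "android.intent.action.BATTERY_LOW"
    final String payload;  // e.g. "hello" for an incoming SMS
    SystemEvent(String action, String payload) {
        this.action = action; this.payload = payload;
    }
}
```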

Relevant Events
- Key challenge: the large number of possible events (e.g., 108 system events in Android Gingerbread)
- Insight #1: in any state, few events are relevant => the vast majority of events are no-ops
- Insight #2: relevant events can be identified by lightly instrumenting the SDK once and for all => does not require instrumenting the app (see the sketch below)
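Insight #2 can be pictured with a small sketch: the SDK methods through which an app registers interest in events (for broadcasts, Context.registerReceiver) are natural choke points at which a lightly instrumented SDK can record which system events are currently relevant. The following self-contained Java sketch illustrates the idea; RelevanceLog and its methods are hypothetical names, not Dynodroid's actual instrumentation:

```java
import java.util.Collections;
import java.util.List;
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical hook that an instrumented SDK could call whenever an app
// registers a broadcast receiver; illustrative only.
final class RelevanceLog {
    private static final Set<String> relevantActions =
        ConcurrentHashMap.newKeySet();

    // Called from the (instrumented) registerReceiver implementation with
    // the intent actions the newly registered receiver listens for.
    static void onRegisterReceiver(List<String> actions) {
        relevantActions.addAll(actions);
    }

    // The observer reads this set to decide which system events to offer
    // to the selector; all other system events would be no-ops here.
    static Set<String> currentlyRelevant() {
        return Collections.unmodifiableSet(relevantActions);
    }
}
```

Because the hook lives in the SDK, it is written once and works for every app, which is why no per-app instrumentation is needed.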

Observe-Select-Execute Algorithm
- Pipeline: from each state si, the observer computes the set Ei of relevant events, the selector picks one event ei+1 from Ei, and the executor applies it to reach state si+1 (a loop sketch follows)
- Statelessness does not cause any coverage loss in principle, provided:
  - the observer treats the "restart app" event as always relevant
  - the selector is fair
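A minimal Java sketch of this loop follows; the Observer, Selector, and Executor interfaces are illustrative assumptions used to express the algorithm, not Dynodroid's actual API:

```java
import java.util.Set;

// Illustrative interfaces for the three components.
interface Observer { Set<String> relevantEvents(); }  // always includes "restart app"
interface Selector { String select(Set<String> relevant); }
interface Executor { void execute(String event); }

final class ObserveSelectExecute {
    private final Observer observer;
    private final Selector selector;
    private final Executor executor;

    ObserveSelectExecute(Observer o, Selector s, Executor e) {
        this.observer = o; this.selector = s; this.executor = e;
    }

    // Run the observe-select-execute loop for a fixed event budget
    // (the evaluation in this talk used 2,000 events per run).
    void run(int maxEvents) {
        for (int i = 0; i < maxEvents; i++) {
            Set<String> relevant = observer.relevantEvents();  // observe
            String event = selector.select(relevant);          // select
            executor.execute(event);                           // execute
        }
    }
}
```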

Event Selection Algorithms
- Frequency: selects the event that has been selected least often
  - Drawback: deterministic => unfair
- UniformRandom: selects an event uniformly at random
  - Drawback: does not exploit domain knowledge; no distinction between UI vs. system events, the contexts in which an event occurs, or frequent vs. rare events
- BiasedRandom: combines the benefits of the above without the drawbacks

BiasedRandom Event Selection Algorithm
- Global map G(e, S) tracks the number of times event e has been selected in context S
  - Context = the set of events relevant when e is selected
- Local map L(e) is computed to select the next event from the relevant set S:
  - Initialize: L(e) to 0 for each e in S
  - Repeat: pick an e in S uniformly at random; if L(e) = G(e, S), increment G(e, S) and return e; otherwise increment L(e)
- Hallmark: no starvation (see the sketch below)
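A compact Java sketch of this selection procedure as described on the slide (a paraphrase for illustration, not Dynodroid's actual implementation):

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Random;
import java.util.Set;

// Sketch of the BiasedRandom selector as described on the slide.
final class BiasedRandomSelector {
    // G(e, S): how many times event e was selected in context S, where a
    // context is the set of events that were relevant at selection time.
    private final Map<Set<String>, Map<String, Integer>> global = new HashMap<>();
    private final Random rng = new Random();

    // Precondition: relevant is non-empty (the observer always offers at
    // least the "restart app" event).
    String select(Set<String> relevant) {
        Set<String> context = new HashSet<>(relevant); // snapshot used as map key
        Map<String, Integer> g = global.computeIfAbsent(context, c -> new HashMap<>());
        // L(e): per-call counters, implicitly 0 for each relevant event.
        Map<String, Integer> local = new HashMap<>();
        List<String> events = new ArrayList<>(relevant);
        while (true) {
            String e = events.get(rng.nextInt(events.size()));
            int le = local.getOrDefault(e, 0);
            int ge = g.getOrDefault(e, 0);
            if (le == ge) {        // e has "caught up" with its global count
                g.put(e, ge + 1);  // increment G(e, S) and select e
                return e;
            }
            local.put(e, le + 1);  // otherwise increment L(e) and retry
        }
    }
}
```

Events with a low global count pass the L(e) = G(e, S) check sooner, so rarely-selected events are favored, yet every relevant event retains a nonzero chance of selection: the no-starvation hallmark.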

Implementation of Dynodroid
- Implemented for Android 2.3.4 (Gingerbread), which covered 50% of all Android devices (March 2013)
- Modified ~50 lines of the SDK => easy to port to other Android versions
- Heavily uses off-the-shelf tools:
  - HierarchyViewer to observe UI events
  - MonkeyRunner to execute UI events
  - ActivityManager (am) to execute system events (example below)
  - Emma to measure source code coverage
- Comprises 16 KLOC of Java
- Open-source: http://dyno-droid.googlecode.com
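As a flavor of how a system event can be injected through the am tool, here is a hedged Java sketch that broadcasts an intent action over adb from the host machine; the helper class is illustrative, and Dynodroid's own executor may drive am differently:

```java
import java.io.IOException;

// Illustrative helper that injects a system event by broadcasting an intent
// action via ActivityManager (am); not Dynodroid's actual executor.
final class SystemEventExecutor {
    // Example: sendBroadcast("android.intent.action.BATTERY_LOW")
    static void sendBroadcast(String action) throws IOException, InterruptedException {
        Process p = new ProcessBuilder("adb", "shell", "am", "broadcast", "-a", action)
                .inheritIO()
                .start();
        int exit = p.waitFor();
        if (exit != 0) {
            throw new IOException("am broadcast failed with exit code " + exit);
        }
    }
}
```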

Demo: Dynodroid on Photostream App

Evaluation Study 1: App Code Coverage
- 50 open-source apps from F-Droid
  - SLOC ranging from 16 to 22K, mean of 2.7K
- Evaluated approaches:
  - Dynodroid (various configurations)
  - Monkey fuzz testing tool
  - Expert human users: ten graduate students at Georgia Tech, all familiar with Android development

Testing Approaches Used in Our Evaluation

  Approach                    #Events    #Runs
  Dynodroid - Frequency       2,000      1
  Dynodroid - UniformRandom   2,000      3
  Dynodroid - BiasedRandom    2,000      3
  Monkey                      10,000     3
  Humans                      No limit   >= 2

Dynodroid vs. Monkey
Both Dynodroid and Monkey cover 4-81% of code per app, for an average of 47%. Dynodroid exclusively covers 0-46% of code, for an average of 8%, which is attributable to system events that only Dynodroid can trigger. Monkey exclusively covers 0-61% of code, for an average of 6%, which is attributable to the richer set of UI events that Monkey can trigger (see Table 2 vs. Table 7). Another reason is that Dynodroid generates only straight (Drag) gestures, whereas Monkey combines short sequences of such gestures into more complex (e.g., circular) gestures. Finally, Dynodroid uses fixed parameter values for UI events whereas Monkey uses random values, giving Monkey a superior ability to exercise custom widgets. In terms of the total code covered for each app, however, Dynodroid outperforms Monkey, achieving higher coverage for 30 of the 50 apps.

Dynodroid vs. Humans
Both Dynodroid and Human cover 4-91% of code per app, for an average of 51%. Dynodroid exclusively covers 0-26% of code, for an average of 4%, and Human exclusively covers 0-43% of code, for an average of 7%. In terms of the total code covered for each app, Human easily outperforms Dynodroid, achieving higher coverage for 34 of the 50 apps. This is not surprising: the users in our study were expert Android users, could provide intelligent text inputs and event sequences, and, most importantly, could inspect Emma's coverage reports and attempt to trigger events to cover any missed code. But all ten users in our study also reported tediousness during testing, how easy it was to miss combinations of events, and that it was especially mundane to click through the options in app settings one by one. Dynodroid could automate most of the testing effort of Human, as measured by what we call the automation degree: the ratio of the coverage achieved by the intersection of Dynodroid and Human to the total coverage achieved by Human.

Automation Degree = C(Dynodroid ∩ Human) / C(Human)
Range = 8-100% across our 50 apps, Average = 83%, S.D. = 21%

These observations justify Dynodroid's vision of synergistically combining human and machine. Dynodroid already supports intelligent text inputs: a user with knowledge of an app can specify the text it should use (instead of random text) in a specific text-box widget prior to execution, or can pause its event generation upon reaching the relevant screen, key in the input, and let it resume (as described in Section 3).

Sample Feedback from Participants
- "Tried to cancel download to raise exception."
- "Human cannot trigger change to AudioFocus."
- "Many, many options and lots of clicking but no actions really involved human intelligence."
- "There are too many combinations of state changes (play -> pause, etc.) for a human to track."

Dynodroid without vs. with System Events
- Both configurations cover 47% of code on average; Dynodroid without system events exclusively covers 2%, and Dynodroid with system events exclusively covers 8.3%.
- Data: https://docs.google.com/spreadsheet/ccc?key=0AoXwT5D6qkNmdERUaGUzOVQwbThIcHFkUE1DeGNzRGc#gid=0

Dynodroid without System Events vs. Monkey
- Both cover 43% of code on average; Dynodroid without system events exclusively covers 6%, and Monkey exclusively covers 10%.
- Data: https://docs.google.com/spreadsheet/ccc?key=0AoXwT5D6qkNmdERUaGUzOVQwbThIcHFkUE1DeGNzRGc#gid=0

Minimum Number of Events to Peak Coverage
- Monkey requires 20X more events than BiasedRandom
- Frequency and UniformRandom require 2X more events than BiasedRandom

Evaluation Study 2: Bugs Found in Apps
- 1,000 most popular free apps from Google Play
- Conservative notion of bug: FATAL EXCEPTION (app forcibly terminated)
- The popularity metric is a score given to each app by Google that depends on various factors such as number of downloads and ratings; these apps are uniformly distributed over all 31 app categories in Google Play

Bugs Found in 50 F-Droid Apps

  App Name                       Bugs  Kind   Description
  PasswordMakerProForAndroid     1     Null   Improper handling of user data.
  com.morphoss.acal                           Dereferencing null returned by an online service.
  hu.vsza.adsdroid               2
  cri.sanity
  com.zoffcc.applications.aagtl
  org.beide.bomber                     Array  Game indexes an array with improper index.
  com.addi

Bugs Found in 1,000 Google Play Apps

  App Name                          Bugs  Kind  Description
  com.ibm.events.android.usopen     1     Null  Null pointer check missed in onCreate() of an activity.
  com.nullsoft.winamp               2           Improper handling of RSS feeds read from online service.
  com.almalence.night
  com.avast.android.mobilesecurity              Receiver callback fails to check for null in optional data.
  com.aviary.android.feather

Limitations
- Does not exercise inter-app communication
  - Such communication occurs via key-value maps ("Bundle" objects); these maps could be synthesized symbolically
- Uses fixed, concrete data for events (e.g., geo-location, touch-screen coordinates); such data could be randomized or inferred symbolically
- Requires instrumenting the platform SDK => limited to a particular SDK version, but the instrumentation is lightweight enough to implement for other versions

Related Work
- Model-based testing: GUITAR [ASE'12], EXSYST [ICSE'12], ...
- Fuzz testing: Monkey, ...
- Symbolic execution: Acteve [FSE'12], Symdroid, ...

Conclusion
- Proposed a practical system for generating relevant inputs to mobile apps, satisfying the five desirable criteria we identified: robust, black-box, versatile, automated, efficient
- Showed its effectiveness on real-world apps:
  - Significantly automates tasks that users consider tedious
  - Yields significantly more concise inputs than fuzz testing
  - Exposed a handful of crashing bugs

Thank You! http://pag.gatech.edu/dynodroid