Tao Xie University of Illinois at Urbana-Champaign 0

Tao Xie University of Illinois at Urbana-Champaign 0 taoxie@illinois.edu

Mobile App Markets Apple App Store Google Play Microsoft Windows Phone

App Store beyond Mobile Apps!

What If Formal Specs Are Written?! 3 APP DEVELOPERS APP USERS App Functional Requirements App Security Requirements User Functional Requirements User Security Requirements informal: app description, etc. permission list, etc.

Informal App Functional Requirements: App Description 4 App Code App Code App Permissions App Permissions

App Security Requirements: Permission List 5

What If Formal Specs Are Written?! 6 APP DEVELOPERS APP USERS App Functional Requirements App Security Requirements User Functional Requirements User Security Requirements informal: app description, etc. permission list, etc.

Example Andriod App: Angry Birds 7

What If Formal Specs Are Written?! 8 APP DEVELOPERS APP USERS App Functional Requirements App Security Requirements User Functional Requirements User Security Requirements In reality, few of these requirements are (formally) specified!!  Hope?!: Bring human into the loop: user perception + judgment informal: app description, etc. permission list, etc.

Our Yin-Yang View on Mobile App Security 9 App Description App Code App Code App Permissions App Permissions User-Perceived Information App Security Behavior o Reason about user-perceived info, e.g., WHYPER ( ) o Push app security behavior across the boundary (  ) o Check consistency across the boundary (  ) o Reduce user judgment effort ( ) App UIs, App categories, App metadata, User forums, … [functional] [security]

o Apple (Market’s Responsibility) o Apple performs manual inspection o Google (User’s Responsibility) o Users approve permissions for security/privacy o Bouncer (static/dynamic malware analysis) o Windows Phone (Hybrid) o Permissions / manual inspection Assuring Market Security/Privacy 10

o Previous approaches look at permissions  code (runtime behaviors) o What does the users expect? o GPS Tracker: record and send location o Phone-Call Recorder: record audio during phone call Need More Than Program Analysis 11 App Description App Code App Code App Permissions App Permissions

o User expectations o user perception + user judgment o Focus on permission  app descriptions o permissions (protecting user understandable resources) should be discussed Vision “Bridging the gap between user expectation  app behaviors” 12 App Description Sentence Permission Linkage

WHYPER Overview 13 Pandita et al. WHYPER: Towards Automating Risk Assessment of Mobile Applications. USENIX Security 2013 http://web.engr.illinois.edu/~taoxie/publications/usenixsec13-whyper.pdf Enhance user experience while installing apps Enforce functionality disclosure on developers Complement program analysis to ensure justifications

Example Sentence in App Desc. 14 E.g., “Also you can share the yoga exercise to your friends via Email and SMS.” –Implication of using the contact permission –Permission sentences Keyword-based search on application descriptions

Problems with Ctrl + F Confounding effects: – Certain keywords such as “contact” have a confounding meaning – E.g., “... displays user contacts,...” vs “... contact me at abc@xyz.com”.abc@xyz.com Semantic inference: – Sentences often describe a sensitive operation without actually referring to keywords – E.g., “share yoga exercises with your friends via Email and SMS” 15

Natural Language Processing Natural Language Processing (NLP) techniques help computers understand NL artifacts In general, NLP is still difficult NLP on domain specific sentences with specific styles is feasible – Text2Policy: extraction of security policies from use cases [FSE 12] – APIInfer: inferring contracts from API docs [ICSE 12] – WHYPER: domain knowledge from API docs [USENIX Security 13]

Overview of WHYPER 17 APP Description APP Permission Semantic Graphs Preprocessor Intermediate Representation Generator Semantic Engine NLP Parser Semantic Graph Generator API Docs Annotated Description FOL Representation WHYPER Domain Knowledge

Preprocessor 18 Period Handling –Decimals, ellipsis, shorthand notations (Mr., Dr.) Sentence Boundaries –Tabs, bullet points, delimiters (:) –Symbols (*,-) and enumeration sentence Named Entity Handling –E.g., “Pandora internet radio” Abbreviation Handling –E.g., “Instant Message (IM)”

Intermediate-Representation Generator AlsoyoucanshareyogaexercisetoyourfriendsviaEmailandSMS VB RB PRPMDNNDTNNNNSPRPNNP the Also you can share exercise your friends Email SMS yoga advmod nsubj aux dobj det nn prep_to poss prep_via conj_and the 19 share to you yoga exercise owned you via friends and Email SMS Predicate Governing Entity Dependent Entity

Semantic Engine share to you yoga exercise owned you via friends and Email SMS Email share WordNet Similarity 20 Inferred from API Docs Governing Entity Dependent Entity

Systematic approach to infer graphs o Identify resource associated with the permissions from the API class name oContactsContract.Contacts o Inspect the member variables and member methods to identify actions and subordinate resources oContactsContract.CommonDataKinds.Email Semantic-Graph Generator 21

Evaluation 22 Subjects – Permissions: READ_CONTACTS READ_CALENDAR RECORD_AUDIO – 581 application descriptions – 9,953 sentences Evaluation setup – Manual annotation of the sentences – WHYPER for identifying permission sentences – Comparison to keyword-based searching

Evaluation Results Precision and recall of WHYPER – Average precision (82.8%) and recall (81.5%) Comparison to keyword-based searching – Improving precision (41.6%) and recall (-1.2%) – E.g., microphone-blow into and call-record 23 PermissionKeywords READ_CONTACTS contact, data, number, name, email READ_CALENDAR calendar, event, date, month, day, year RECORD_AUDIO record, audio, voice, capture, microphone

Access Control Policies (ACP) in Requirements Document Access control is often governed by security policies called Access Control Policies (ACP) –I–Includes rules to control which principals have access to which resources A policy rule includes four elements –S–Subject – HCP –A–Action – edit –R–Resource - patient's account –E–Effect - deny “The Health Care Personnel (HCP) does not have the ability to edit the patient's account.” ex.

Overview of Text2Policy A HCP should not change patient’s account. An [subject: HCP] should not [action: change] [resource: patient’s account]. ACP Rule Effect SubjectActionResource HCP UPDATE - change patient’s account deny Linguistic Analysis Model-Instance Construction Transformation Xiao et al. Automated Extraction of Security Policies from Natural-Language Software Documents. FSE 2012. http://web.engr.illinois.edu/~taoxie/publications/fse12-nlp.pdf

Example Technical Challenges in ACP Extraction Semantic Structure Variance – different ways to specify the same rule Negative Meaning Implicitness – verb could have negative meaning ACP 1: An HCP cannot change patient’s account. ACP2: An HCP is disallowed to change patient’s account.

Road Ahead: Yin-Yang View 27 App Description App Code App Code App Permissions App Permissions User-Perceived Information App Security Behavior o Reason about user-perceived info, e.g., WHYPER ( ) o Push app security behavior across the boundary (  ) o Check consistency across the boundary (  ) o Reduce user judgment effort ( ) App UIs, App categories, App metadata, User forums, … [functional] [security]

Text Analytics for Mobile App Security and Beyond 28 App Description App Code App Code App Permissions App Permissions taoxie@illinois.edu App UIs, App categories, App metadata, User forums, … Acknowledgments: Supported in part by NSA Science of Security (SoS) Lablet, NSF SaTC, NSF SHF, NSF CAREER

Problems with Ctrl + F o Confounding effects: o Certain keywords such as “contact” have a confounding meaning. o For instance, “... displays user contacts,...” vs “... contact me at abc@xyz.com”.abc@xyz.com o Semantic Inference: o Sentences often describe a sensitive operation such as reading contacts without actually referring to keyword “contact”. o For instance, “share yoga exercises with your friends via email, sms”. 30

NLP techniques help computers understand NL artifacts NLP is still difficult NLP on domain specific sentences with specific styles is feasible Natural Language Processing (NLP) 31

RQ1 Results: Effectiveness of WHYPER Low FPs and FNs out of 9,061 sentences, only 129 are flagged as FPs among 581 applications, 109 applications (18.8%) contain at least one FP among 581 applications, 86 applications (14.8%) contain at least one FN PermissionSISI TPFPFNTNPrec.RecallF-ScoreAcc READ_CONTACTS20418618492,93091.279.284.897.9 READ_CALENDAR28824147422,42283.785.284.596.8 RECORD_AUDIO25919564503,47075.379.677.497.0 TOTAL7516221291419,06182.881.582.297.3 32

Incorrect parsing “MyLink Advanced provides full synchronization of all Microsoft Outlook emails (inbox, sent, outbox and drafts), contacts, calendar, tasks and notes with all Android phones via USB” Synonym analysis “You can now turn recordings into ringtones.” Result Analysis (False Positives) 33

Incorrect parsing Incorrect identification of sentence boundaries and limitations of underlying NLP infrastructure Limitations of Semantic Graphs Manual Augmentation microphone-blow into and call-record significant improvement of Delta Recalls: -6.6% to 0.6% Automatic mining from user comments and forums Result Analysis (False Negatives) 34

Overview of Text2Policy A HCP should not change patient’s account. An [subject: HCP] should not [action: change] [resource: patient’s account]. ACP Rule Effect SubjectActionResource HCP UPDATE - change patient’s account deny Linguistic Analysis Model-Instance Construction Transformation

Linguistic Analysis Incorporate syntactic and semantic analysis – syntactic structure -> noun group, verb group, etc. – semantic meaning -> subject, action, resource, negative meaning, etc. Provide New techniques for model extraction – Identify ACP and AS sentences – Infer semantic meaning

Common Techniques Shallow parsing Domain dictionary Anaphora resolution An HCP can view patient’s account. He is disallowed to change the patient’s account. SubjectMain Verb GroupObject NPPNP UPDATE HCP VG

Technical Challenges (TC) in ACP Extraction TC1: Semantic Structure Variance – different ways to specify the same rule TC2: Negative Meaning Implicitness – verb could have negative meaning ACP 1: An HCP cannot change patient’s account. ACP2: An HCP is disallowed to change patient’s account.

Semantic-Pattern Matching Address TC1 Semantic Structure Variance Compose pattern based on grammatical function An HCP is disallowed to change the patient’s account. ex. passive voiceto-infinitive phrase followed by

Negative-Expression Identification Address TC2 Negative Meaning Implicitness Negative expression – “not” in subject: – “not” in verb group: Negative meaning words in main verb group No HCP can edit patient’s account. ex. HCP can not edit patient’s account. HCP can never edit patient’s account. ex. An HCP is disallowed to change the patient’s account.

AS: Syntactic-Pattern Matching Syntactic elements – Subject, Main verb, Object Subject and Object Checking – subject is a not a user or object is not a resource Filtering negative-meaning sentences – Negative sentences tend not to describe ASs The prescription list should include medication, the name of the doctor... ex.

Overview of Text2Policy A HCP should not change patient’s account. An [subject: HCP] should not [action: change] [resource: patient’s account]. ACP Rule Effect SubjectActionResource HCP UPDATE - change patient’s account deny Linguistic Analysis Model-Instance Construction Transformation

ACP Model-Instance Construction Identify subject, action, and resource: – Subject: HCP – Action: change – Resource: patient’s account Infer effect: – Negative Expression: none – Negative Verb: disallow – Inferred Effect: deny An HCP is disallowed to change the patient’s account. ex. ACP Rule Effect SubjectActionResource HCP UPDATE - change patient’s account deny

AS Model-Instance Construction Use case patterns – industry use cases [DSN’09] – public use cases Model-Instance Construction The patient views access log. ex. Action Step ActorActionResource patient OUTPUT – view access log

Technical Challenges in Action-Step Extraction TC4: Transitive Subject TC5: Perspective Variance AS 1:He edits the account. AS 2: The system updates the account. AS 3: The system displays the updated account. HCP HCP views the updated account.

Subject Flow Tracking Address TC4 Transitive Subject Apply data flow to track non-system subject: AS 1: The HCP edits the account. AS 2: The system updates the account. Tracking Only system as subject replaced with HCP as subject

Perspective Conversion Address TC5 Perspective Variance Apply data flow to track non-system subject: AS 1: The HCP edits the account. AS 2: The system shows the updated account. Tracking Only system as subject and action is output Converting to “HCP views the updated account”

Evaluation – RQs RQ1: How effectively does Text2Policy identify ACP sentences in NL documents? RQ2: How effectively does Text2Policy extract ACP rules from ACP sentences? RQ3: How effectively does Text2Policy extract action steps from action-step sentences?

Evaluation – Subject iTrust open source project – http://agile.csc.ncsu.edu/iTrust/wiki/ – 448 use-case sentences (37 use cases) – preprocessed use cases Collected ACP sentences – 100 ACP sentences – From 17 sources (published papers and websites) A module of an IBMApp (financial domain) – 25 use cases

RQ1 ACP Sentence Identification Apply Text2Policy to identify ACP sentences in iTrust use cases and IBMApp use cases Text2Policy effectively identifies ACP sentences with precision and recall more than 88% Precision on IBMApp use cases is better – proprietary use cases are often of higher quality compared to open-source use cases

Evaluation – RQ2 Accuracy of Policy Extraction Apply Text2Policy to extract ACP rules from ACP sentences Text2Policy effectively extracts ACP model instances with accuracy above 86%

Evaluation – RQ3 Accuracy of Action-Step Extraction Apply Text2Policy to extract action steps from iTrust and IBMApp use cases Text2Policy effectively extracts AS model instances with accuracy above 81% Limitations: – Subordinate conjunction or else and long phrases

Detected Inconsistencies No violation between ASs against the extracted ACPs Inconsistent names used for referring to the same entity (e.g., user) across different use cases editor used in UC 4 of iTrust use cases actually refers to HCP, admin, and all users in UCs 1, 2, and 4 ex.

Summary Natural Language Processing (NLP) for domain- specific purposes is feasible – Challenging for general documents – Feasible for domain-specific sentences with specific styles New techniques are required – Addressing unique challenges in software engineering http://research.csc.ncsu.edu/ase/projects/text2policy/

Tao Xie University of Illinois at Urbana-Champaign 0

Similar presentations

Presentation on theme: "Tao Xie University of Illinois at Urbana-Champaign 0"— Presentation transcript:

Similar presentations

About project

Feedback

Log in

Auth with social network:

Tao Xie University of Illinois at Urbana-Champaign 0

Similar presentations

Presentation on theme: "Tao Xie University of Illinois at Urbana-Champaign 0"— Presentation transcript:

Similar presentations

About project

Feedback