A Large Scale Exploratory Analysis of Software Vulnerability Life Cycles Muhammad Shahzad Dept. of Computer Science and Engineering Michigan State University.

1 A Large Scale Exploratory Analysis of Software Vulnerability Life Cycles
Muhammad Shahzad
Dept. of Computer Science and Engineering, Michigan State University
Joint work with Muhammad Zubair Shafiq and Alex X. Liu

2 Software Vulnerabilities (ICSE 2012, Zürich)
A software vulnerability is a weakness in software that allows attackers to compromise the security of a system.
An exploit is a means of taking advantage of a software vulnerability to compromise the security of a system.
  ─ In the form of a piece of software or a sequence of commands.
A patch is a means of fixing the vulnerability so that the exploit becomes ineffective.
Vulnerability lifecycle

3 Why Study Software Vulnerability Lifecycle
Software vendors are adversely affected by vulnerability announcements.
  ─ Lost money: vendors lose 0.63% in market value on the disclosure date [Telang and Wattal 2007]
  ─ Lost reputation
Goal: to know how the software industry is doing with respect to vulnerabilities

4 Data Set
Sources
  ─ National Vulnerability Database (NVD)
  ─ Open Source Vulnerability Database (OSVDB)
  ─ Vulnerability data by Frei et al. (FVDB)
46310 vulnerabilities
  ─ 9667 vulnerabilities with patch dates
  ─ 15456 vulnerabilities with exploit dates
Software vendors
  ─ Over 11 thousand vendors and 17 thousand products

5 Vulnerability Information
Risk Score: low, medium, or high
  ─ Assigned by the Common Vulnerability Scoring System (CVSS)
Access Vector: Local, Adjacent Network, Network
  ─ Where attackers can launch the attack from
Access Complexity: low, medium, or high
  ─ Complexity of the attack that exploits a vulnerability
Integrity Impact: none, partial, or complete
  ─ Impact of the attack that exploits a vulnerability
Disclosure date: when a vulnerability is disclosed
Exploit date: when an exploit becomes available
Patch date: when the patch becomes available
Text description of the vulnerability

6 Vulnerability Disclosure Rate

7 Access Vector

8 Access Complexity

9 Integrity Impact

10 Evolution of Different Types of Vulnerabilities

11 Vulnerability Clustering
The data set does not include vulnerability types.
The total number of vulnerability types is unknown.
Solution: use clustering algorithms to determine the vulnerability types and how many there are.
  ─ Extracted relevant keywords from the text descriptions
  ─ Keywords used as features for clustering
  ─ Obtained 7 clusters
    ● EXE (Executables)
    ● DoS (Denial of Service)
    ● BO (Buffer Overflow)
    ● SQL injection
    ● XSS (Cross Site Scripting)
    ● PHP
    ● Misc

12 Vulnerability Evolution by Type

13 Evolution of Exploitation Behavior

14 t_ed = Exploit Date - Disclosure Date
t_ed < 0
  ─ 2.8% of vulnerabilities
t_ed = 0
  ─ 88.2% of vulnerabilities
t_ed > 0
  ─ 9% of vulnerabilities
  ─ Sub-ranges
    ● 0 < t_ed ≤ 7: exploit released within a week after disclosure
    ● 7 < t_ed ≤ 30: exploit released after a week but within a month
    ● t_ed > 30: exploit released more than a month after disclosure
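
The ranges above can be sketched as a small bucketing function. This is only an illustration, assuming that dates are available as calendar days and that the lag is measured in whole days; the function name, bucket labels, and sample dates are hypothetical and do not come from the slides. The same function applies to patch dates (t_pd).

```python
from datetime import date


def lag_bucket(event: date, disclosure: date) -> str:
    """Classify an exploit (or patch) date relative to the disclosure date."""
    t = (event - disclosure).days  # t_ed or t_pd, in days
    if t < 0:
        return "before disclosure"
    if t == 0:
        return "on disclosure"
    if t <= 7:
        return "within a week"
    if t <= 30:
        return "within a month"
    return "after a month"


# Hypothetical record: exploit published five days after disclosure.
print(lag_bucket(date(2010, 3, 6), date(2010, 3, 1)))
```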

15 Evolution of Aggregate Exploitation Behavior

16 Evolution of Exploitation Behavior by Vendor

17 Evolution of Exploitation Behavior by Product

18 Evolution of Patching Behavior

19 t_pd = Patch Date - Disclosure Date
t_pd < 0
  ─ 10.1% of vulnerabilities
    ● Greater than the corresponding 2.8% for t_ed < 0
t_pd = 0
  ─ 62.2% of vulnerabilities
    ● Lower than the 88.2% for t_ed = 0
t_pd > 0
  ─ 27.7% of vulnerabilities
  ─ Sub-ranges
    ● 0 < t_pd ≤ 7: patch released within a week after disclosure
    ● 7 < t_pd ≤ 30: patch released after a week but within a month
    ● t_pd > 30: patch released more than a month after disclosure

20 Evolution of Aggregate Patching Behavior

21 Evolution of Patching Behavior by Vendor

22 Evolution of Patching Behavior by Product

23 Conclusions
The number of vulnerabilities disclosed each year has stopped increasing since 2006
The percentage of remotely exploitable vulnerabilities has gradually increased to over 80%
The access complexity of vulnerabilities has also been increasing
Closed source vendors are faster at patching vulnerabilities
Since 2008, vendors have become very agile in patching vulnerabilities
Still, the average time for hackers to exploit a vulnerability is shorter than the time for vendors to patch it.

24 Questions?

25 BACKUP SLIDES

26 Evolution of Exploitation Behavior by Type

27 Evolution of Patching Behavior by Type

28 Data Sources
http://nvd.nist.gov/
www.osvdb.org/

29 Interesting Patterns Mined Using Association Rules
Attributes used for association rule mining
  ─ Vendor name, product name, vulnerability type, risk, t_ed, t_pd
For Microsoft, the majority of high risk vulnerabilities are exploited on the disclosure date
  ─ vnd=Microsoft type=XSS risk=H → t_ed=0
For Sun’s Solaris, medium risk vulnerabilities are exploited within a week of disclosure
  ─ vnd=Sun prod=Solaris risk=M → 0<t_ed≤7
For Mozilla, we saw interesting rules stating that hackers are very quick to exploit vulnerabilities that have not been patched but very slow to exploit patched ones
  ─ vnd=Mozilla prod=Firefox type=BO t_pd=0 → t_ed>30
  ─ vnd=Mozilla prod=Firefox type=BO 7<t_pd≤30 → t_ed=0

30 Interesting Patterns Mined Using Association Rules
Microsoft is quicker at patching vulnerabilities in Windows than in its other products
  ─ vnd=Microsoft prod=Windows type=BO → t_pd=0
  ─ vnd=Microsoft prod=IE type=BO → t_pd>30
In the case of Mozilla, BO and EXE vulnerabilities are patched very quickly
  ─ vnd=Mozilla prod=SeaMonkey type=BO → t_pd=0
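
A rule of the form shown on these two slides is usually judged by its support and confidence. The slides do not say which mining algorithm or thresholds were used, so the following is only a minimal sketch of how those two numbers could be computed for a single rule. The function name `rule_stats` and the records are made up for illustration and are not taken from the data set.

```python
def rule_stats(records, antecedent, consequent):
    """Return (support, confidence) of the rule antecedent -> consequent."""
    def matches(rec, cond):
        return all(rec.get(k) == v for k, v in cond.items())

    n_ante = sum(matches(r, antecedent) for r in records)
    n_both = sum(
        matches(r, antecedent) and matches(r, consequent) for r in records
    )
    support = n_both / len(records) if records else 0.0
    confidence = n_both / n_ante if n_ante else 0.0
    return support, confidence


# Made-up records; attribute names mirror the rule notation on the slides.
records = [
    {"vnd": "A", "type": "BO", "t_pd": "0"},
    {"vnd": "A", "type": "BO", "t_pd": "0"},
    {"vnd": "A", "type": "BO", "t_pd": ">30"},
    {"vnd": "B", "type": "XSS", "t_pd": "0"},
]
support, confidence = rule_stats(
    records, {"vnd": "A", "type": "BO"}, {"t_pd": "0"}
)
print(support, confidence)
```

Support is the share of all records that satisfy both sides of the rule. Confidence is the share of records matching the left-hand side that also match the right-hand side.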

31 Implications
Observations from this study have important implications for
  ─ Software Design
  ─ Code Development Practices
  ─ Customer assessment of vendors and products

32 Software Design
Analysis of access requirements, functionality, and risk level
  ─ Can reveal inherent flaws in the software design process
  ─ For example, if a particular software series has abundant BO vulnerabilities
    ● This shows a lack of sanity checks in socket and read processes
DoS vulnerabilities
  ─ In Solaris: 38.85% of all exploited vulnerabilities
  ─ In OS X: only 11.7% of all exploited vulnerabilities
  ─ Solaris is more susceptible to DoS attacks
  ─ Solaris developers need to take additional steps to avoid DoS attacks

33 Code Development Practices
Analysis of the life cycles of vulnerabilities can reveal insights into code development and testing practices
  ─ For example, we observed that the percentage of vulnerabilities with t_pd > 0 is significantly greater for open source vendors than for closed source vendors
  ─ Shows that open source software has fewer resources dedicated to security than closed source software

34 Customer Assessment of Vendors and Products
This analysis can be used in product assessment, certification, and security recommendations to customers
For example,
  ─ Sun should be preferred if the vendor's patch response is of prime importance
  ─ Mac OS X should be used if the customer's infrastructure has low tolerance for DoS attacks
  ─ Solaris should be used if the customer wants to be robust against BO attacks

35 Proposed Methodology
Preprocess the data
  ─ Extract relevant keywords from the text description
  ─ Represent each vulnerability in terms of the keywords
Data Mining
  ─ Cluster the vulnerabilities
  ─ Identify the types of vulnerabilities in each cluster
Post processing
  ─ Assign each vulnerability a type

36 Preprocessing
Attributes are required for clustering
Representative keywords in the text can act as attributes
  ─ Take all words in all text descriptions
  ─ Compare the words with everyday news articles
  ─ Remove the matching words
  ─ Manually go through the remaining words
  ─ Remove the words that are non-technical
  ─ Leaves us with 608 keywords

37 Preprocessing
Each vulnerability is a data point
  ─ 608 binary attributes

    CVE-xxxx-yyyy   Denial  Service  Buffer  …  Overflow
                    0       0        1       …  1
                    1       0        0       …  1
                    0       1        0       …  0
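
The automatic part of the preprocessing on the two slides above can be sketched as follows, assuming a simple lowercase word tokenizer and a ready-made set of everyday words. Both are assumptions, as are the function names and the sample descriptions. The manual pruning of non-technical words is not modelled here.

```python
import re


def tokenize(text):
    """Lowercase alphabetic words in a description, as a set."""
    return set(re.findall(r"[a-z]+", text.lower()))


def candidate_keywords(descriptions, everyday_words):
    """All words in the descriptions that do not occur in everyday text."""
    vocab = set()
    for d in descriptions:
        vocab |= tokenize(d)
    return sorted(vocab - set(everyday_words))


def to_binary_vector(description, keywords):
    """One 0/1 attribute per keyword: 1 if the keyword occurs in the text."""
    words = tokenize(description)
    return [1 if k in words else 0 for k in keywords]


# Made-up descriptions and a made-up everyday-word list.
descriptions = [
    "Buffer overflow in the parser allows denial of service",
    "SQL injection in the login page",
]
everyday = {"in", "the", "allows", "of", "page", "login"}

keywords = candidate_keywords(descriptions, everyday)
vectors = [to_binary_vector(d, keywords) for d in descriptions]
print(keywords)
print(vectors)
```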

38 Clustering: Scheme
Selection of clustering scheme
  ─ Same vulnerability type
  ─ Different vendors
  ─ E.g., Buffer Overflow vulnerabilities
    ● Can be subdivided into: Apple BO, Microsoft BO
Hierarchical clustering is more suitable than partitional clustering
  ─ Ward
    ● Less susceptible to noise
    ● Does not break large clusters
    ● Ensures that SSE is small

39 Clustering: Distance Measure
Desired: Jaccard
  ─ Not implemented in Weka, problems in Matlab
Used: Hamming
  ─ Not implemented in Weka, available in Matlab
Euclidean not used
  ─ Asymmetric data
Cosine not used
  ─ Values in many cases become very small but non-zero
  ─ Matlab does not handle them and raises an error
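
To make the difference between the two measures concrete, here is a small sketch of both on binary keyword vectors. It assumes the normalized form of the Hamming distance (fraction of differing positions), and the vectors are made up.

```python
def hamming(a, b):
    """Fraction of positions at which two equal-length binary vectors differ."""
    return sum(x != y for x, y in zip(a, b)) / len(a)


def jaccard(a, b):
    """1 - |intersection| / |union| over the positions set to 1."""
    both = sum(1 for x, y in zip(a, b) if x and y)
    either = sum(1 for x, y in zip(a, b) if x or y)
    return 1 - both / either if either else 0.0


# Two made-up vulnerabilities sharing one keyword out of six.
u = [1, 1, 0, 0, 0, 0]
v = [1, 0, 1, 0, 0, 0]
print(hamming(u, v), jaccard(u, v))
```

Hamming counts the shared zeros as agreement, while Jaccard ignores them. With hundreds of mostly-zero attributes, that is the usual reason to prefer Jaccard for asymmetric binary data.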

40 Clustering: Challenges
Hierarchical clustering uses a proximity matrix
  ─ 46261 by 46261
  ─ Requires about 15.9GB RAM in Matlab
Solution
  ─ Sampling
  ─ 10 files randomly generated
    ● 5% sampling rate
If the dataset has valid clusters, each random file should generate the same centroids

41
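
The memory figure can be checked with a little arithmetic, assuming the full square matrix is stored as 8-byte doubles (an assumption, though it reproduces the stated figure). The sampling helper is a hypothetical sketch of drawing 10 independent 5% samples. The slides do not say how the files were actually generated.

```python
import random

n = 46261
bytes_per_double = 8  # assumption: dense matrix of 64-bit floats
full_matrix_gib = n * n * bytes_per_double / 2**30


def draw_samples(n_points, rate=0.05, n_samples=10, seed=0):
    """Draw n_samples independent random subsets of point indices."""
    rng = random.Random(seed)
    k = round(n_points * rate)
    return [rng.sample(range(n_points), k) for _ in range(n_samples)]


samples = draw_samples(n)
sample_matrix_mib = len(samples[0]) ** 2 * bytes_per_double / 2**20

print(f"full matrix: {full_matrix_gib:.1f} GiB")
print(f"one 5% sample: {len(samples[0])} points, {sample_matrix_mib:.1f} MiB")
```

Because the matrix grows with the square of the number of points, a 5% sample needs only 0.25% of the memory.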

42 Clustering: Centroids
608 attributes
  ─ Value of each attribute: 0 or 1
  ─ Data points lie at the vertices of the 608-dimensional unit hypercube
Take one cluster at a time and find the centroid
  ─ The value of each of the 608 attributes lies in [0,1]
  ─ A value close to 1 means the keyword occurred in a large number of data points of the cluster, and vice versa
  ─ Get the attributes whose values are greater than 0.8
    ● They appeared in the description of over 80% of vulnerabilities in the cluster
  ─ E.g., in one cluster
    ● Denial, Service: represent DoS attacks
We get the centroids
  ─ Dominant keywords represent the type of the cluster
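
The centroid and threshold step can be sketched as below. The four keywords and the cluster members are made up, and the helper names are hypothetical. Only the 0.8 threshold comes from the slide.

```python
def centroid(vectors):
    """Per-attribute mean of a list of equal-length binary vectors."""
    n = len(vectors)
    return [sum(col) / n for col in zip(*vectors)]


def dominant_keywords(vectors, keywords, threshold=0.8):
    """Keywords whose centroid value exceeds the threshold."""
    return [k for k, c in zip(keywords, centroid(vectors)) if c > threshold]


# Made-up cluster of five vulnerabilities over four keywords.
keywords = ["denial", "service", "buffer", "overflow"]
cluster = [
    [1, 1, 0, 0],
    [1, 1, 1, 0],
    [1, 1, 0, 0],
    [1, 1, 0, 1],
    [1, 1, 0, 0],
]
print(centroid(cluster))
print(dominant_keywords(cluster, keywords))
```

Here "denial" and "service" dominate, so this cluster would be labelled DoS.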

43 Clustering: Number of clusters
No universal way of determining the exact number of clusters
Visualize the dendrogram
  ─ Decide on an appropriate number of clusters

44 Hierarchical Clustering
[Figure: dendrogram of the hierarchical clustering. Recoverable labels: SQL, Misc, XSS, EXE, DoS, BO, PHP, C-EXE, A-EXE, US-EXE, A-BO, Local]

45 Clustering: Remaining Samples
This analysis was on 1 sample
Did the same analysis on the remaining 9 samples
Centroids obtained from all 10 samples are shown next

46 Clustering: Intensity Plot of Proximity Matrix

47 Final Clustering
We have all 7 centroids
  ─ Assign each of the 46261 points to the nearest centroid
  ─ Size of each cluster after assigning points

    PHP     SQL     BO      XSS     EXE     DoS     Misc
    8.32%   11.2%   10.2%   12.3%   7.25%   14.2%   36.6%
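
The final assignment step can be sketched as a nearest-centroid lookup. The slide does not state which distance was used for this step, so squared Euclidean distance is an assumption here, and the two centroids and three points are made up.

```python
from collections import Counter


def nearest_centroid(point, centroids):
    """Name of the centroid closest to the point (squared Euclidean, assumed)."""
    def sq_dist(a, b):
        return sum((x - y) ** 2 for x, y in zip(a, b))

    return min(centroids, key=lambda name: sq_dist(point, centroids[name]))


# Made-up centroids over four keywords and three binary data points.
centroids = {
    "DoS": [0.9, 0.9, 0.1, 0.1],
    "BO": [0.1, 0.1, 0.9, 0.9],
}
points = [[1, 1, 0, 0], [0, 0, 1, 1], [1, 0, 0, 0]]

labels = [nearest_centroid(p, centroids) for p in points]
shares = {k: 100 * c / len(labels) for k, c in Counter(labels).items()}
print(labels)
print(shares)
```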

48 Post Processing
Evolution of different types of vulnerabilities
Evolution of different types by vendor
Evolution of exploitation behavior of hackers
Evolution of patching behavior of vendors

