Fabian Yamaguchi, University of Göttingen; Markus Lottmann, Technische Universität Berlin; Konrad Rieck, University of Göttingen. 28th ACSAC (December 2012)

Generalized Vulnerability Extrapolation using Abstract Syntax Trees
Outstanding paper award

Outline
Introduction
Vulnerability Extrapolation
Evaluation
Limitations
2013/1/29, A Seminar at Advanced Defense Lab

Introduction
The discovery of vulnerabilities in source code is a central issue of computer security.
Much of the existing research, however, is limited to specific conditions and types of vulnerabilities.
The discovery of vulnerabilities in practice still mainly rests on tedious manual auditing that requires considerable time and expertise.
Instead of striving for an automated solution, we aim at rendering manual auditing more effective by guiding the search for vulnerabilities.

Contributions
Generalized vulnerability extrapolation
Structural comparison of code
Evaluation and case studies

Vulnerability Extrapolation
The concept of vulnerability extrapolation builds on the observation that source code often contains several vulnerabilities linked to the same flawed programming patterns.
Given a known vulnerability, it is thus often possible to discover previously unknown vulnerabilities by finding functions that share a similar code structure.
Two advantages of this approach:
1. It is a general approach that is not limited to any specific vulnerability type.
2. The extrapolation does not hinge on any involved analysis machinery.

Schematic Overview

Robust AST Extraction
Our parser is based on a single grammar definition for the ANTLR parser generator [23] and is publicly available.
Figure legend: API node, syntax node.

Embedding of ASTs in a Vector Space
We describe the AST of each function in our code base using a set of subtrees S.
We experiment with the following three definitions of the set:
API nodes: The set S simply consists of all individual API nodes.
API subtrees: The set S is defined as all subtrees of depth D in the code base that contain at least one API node.
API/S subtrees: The set S consists of all subtrees of depth D containing at least one API or syntax node.
In the following we fix the depth of subtrees to D = 3.
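The three feature sets can be illustrated on a toy tree. The sketch below is only an illustration, not the authors' implementation: the `Node` class, the `kind` tags, the string serialization of subtrees, and the reading of "depth D" as "at most D levels counted from the subtree root" are all assumptions made for the example.

```python
from dataclasses import dataclass, field

@dataclass
class Node:
    label: str
    kind: str = "other"              # hypothetical tags: "api", "syntax", "other"
    children: list = field(default_factory=list)

def walk(node):
    """Yield every node of the tree in pre-order."""
    yield node
    for child in node.children:
        yield from walk(child)

def truncate(node, depth):
    """Serialize the subtree rooted at node, keeping at most `depth` levels."""
    if depth == 1 or not node.children:
        return node.label
    inner = " ".join(truncate(c, depth - 1) for c in node.children)
    return f"({node.label} {inner})"

def contains_kind(node, depth, kinds):
    """True if a node of one of `kinds` lies within the top `depth` levels."""
    if node.kind in kinds:
        return True
    if depth == 1:
        return False
    return any(contains_kind(c, depth - 1, kinds) for c in node.children)

def features(root, mode, depth=3):
    """Return the features of one function under the chosen definition of S."""
    if mode == "api_nodes":
        return [n.label for n in walk(root) if n.kind == "api"]
    kinds = {"api"} if mode == "api_subtrees" else {"api", "syntax"}
    return [truncate(n, depth) for n in walk(root)
            if contains_kind(n, depth, kinds)]

# Toy AST of one function: an `if` guarding a memcpy call, then a return.
root = Node("func", children=[
    Node("if", "syntax", [
        Node("cond", children=[Node("len")]),
        Node("memcpy", "api", [Node("arg")]),
    ]),
    Node("return", "syntax", [Node("x")]),
])

for mode in ("api_nodes", "api_subtrees", "api_s_subtrees"):
    print(mode, features(root, mode))
```

In this toy tree the API/S variant yields one feature more than the API-subtree variant, because the `return` subtree contains a syntax node but no API node.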

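Converting ASTs to Vectors
The functions of the code base are collected in a sparse matrix M with |S| rows (one per subtree) and |X| columns (Function 1, Function 2, ..., Function |X|).
Most entries of M are zero; the non-zero entries are weighted by w_s.
w_s: TF-IDF weighting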
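A small worked example may make the construction of M concrete. The following is a sketch under assumptions: it takes each entry to be the number of occurrences of subtree s in function x multiplied by an inverse-document-frequency weight w_s = log(|X| / df(s)), which is one common TF-IDF variant and not necessarily the exact weighting used by the authors. The function names and feature lists are hypothetical.

```python
import math
from collections import Counter

def tfidf_matrix(functions):
    """Build M with one row per subtree and one column per function.

    `functions` maps a function name to the list of subtree features
    extracted from its AST. Returns (subtrees, names, M).
    """
    names = sorted(functions)
    counts = {x: Counter(functions[x]) for x in names}
    subtrees = sorted({s for c in counts.values() for s in c})
    # Document frequency: number of functions containing each subtree.
    df = Counter(s for c in counts.values() for s in c)
    # Assumed weighting: w_s = log(|X| / df(s)).
    idf = {s: math.log(len(names) / df[s]) for s in subtrees}
    M = [[counts[x][s] * idf[s] for x in names] for s in subtrees]
    return subtrees, names, M

# Hypothetical code base with three functions.
functions = {
    "parse_a": ["memcpy", "malloc", "memcpy"],
    "parse_b": ["memcpy", "strlen"],
    "sched":   ["pthread_create"],
}

subtrees, names, M = tfidf_matrix(functions)
for s, row in zip(subtrees, M):
    print(f"{s:15s}", [round(v, 3) for v in row])
```

With this weighting, a subtree that occurs in every function receives weight zero, so only subtrees that discriminate between functions contribute to the vectors.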

Identification of Structural Patterns
However, we cannot yet compare functions with respect to more involved patterns.
For example, the code base of a server application may contain functions related to network communication, message parsing and thread scheduling. It would be better to compare the functions with respect to these functionalities rather than looking at the plain subtrees of the ASTs.
Latent semantic analysis is a classic technique of natural language processing (NLP) that is used for identifying topics in text documents.
It determines dominant directions in the vector space.
We refer to these directions of related subtrees as structural patterns.

Obtaining Directions
We obtain these d directions by performing a singular value decomposition (SVD) of the matrix M.
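The decomposition step can be sketched with NumPy. This is an illustrative example, not the authors' code: the toy matrix, the helper name `structural_patterns`, and the choice d = 2 are assumptions. It keeps the d largest singular values, so that M is approximated by U diag(s) V^T with the structural patterns in the columns of U and one row of V per function.

```python
import numpy as np

def structural_patterns(M, d):
    """Truncated SVD of M (|S| x |X|), keeping the d dominant directions.

    Returns U (|S| x d), the singular values s (length d), and
    V (|X| x d), whose rows represent the functions.
    """
    U, s, Vt = np.linalg.svd(M, full_matrices=False)
    return U[:, :d], s[:d], Vt[:d, :].T

# Toy matrix: 4 subtrees (rows) x 3 functions (columns); values are made up.
M = np.array([
    [1.0, 0.9, 0.0],
    [0.8, 1.0, 0.0],
    [0.0, 0.1, 1.0],
    [0.0, 0.0, 0.7],
])

U, s, V = structural_patterns(M, d=2)
print("U:", U.shape, "s:", s.shape, "V:", V.shape)
print("approximation error:", np.linalg.norm(M - U @ np.diag(s) @ V.T))
```

NumPy returns the singular values in descending order, so slicing the first d columns selects the dominant directions.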

Extrapolation of Vulnerabilities
Three activities can be performed to assist code auditing.
Vulnerability extrapolation: Finding structurally similar functions is as simple as comparing the rows of V using a suitable measure, such as the cosine distance.
Code base decomposition: The matrix U, which stores the most prevalent structural patterns in its columns, gives important insight into the structure of the code base.
Detection of unusual functions
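The extrapolation step amounts to a nearest-neighbour ranking over the rows of V. The sketch below assumes V is already available as a list of rows; the function names and the values in V are hypothetical and serve only to show the ranking.

```python
import math

def cosine_distance(a, b):
    """Return 1 - cos(a, b); zero vectors are treated as maximally distant."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 1.0
    return 1.0 - dot / (norm_a * norm_b)

def extrapolate(V, names, seed, top=3):
    """Rank all other functions by cosine distance to the seed function."""
    seed_vec = V[names.index(seed)]
    ranked = sorted(
        (cosine_distance(seed_vec, row), name)
        for row, name in zip(V, names)
        if name != seed
    )
    return [name for _, name in ranked[:top]]

# Hypothetical rows of V: one row per function, one column per pattern.
names = ["decode_a", "decode_b", "sched_run"]
V = [
    [0.9, 0.1],
    [0.8, 0.2],
    [0.0, 1.0],
]

# Starting from the known-vulnerable function, list its closest neighbours.
print(extrapolate(V, names, "decode_a"))
```

An auditor would then inspect the functions at the top of the ranking first, since they share the most structure with the known vulnerability.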

Evaluation
For the evaluation we consider four popular open-source projects.
LibTIFF is a library for processing images in the TIFF format.
1,292 functions and 52,650 lines of code.
Version 3.8.1 of the library contains a stack-based buffer overflow in the parsing of TLV elements (CVE-2006-3459).
Candidate functions are all parsers for TLV elements.

Evaluation (cont.)
Pidgin is a client for instant messaging implementing several communication protocols.
11,505 functions and 272,866 lines of code.
Version 2.10.0 of the client contains a vulnerability in the implementation of the AIM protocol (CVE-2011-4601).
Candidate functions are all AIM protocol handlers converting incoming binary messages to strings.

Evaluation (cont.)
FFmpeg is a library for conversion of audio and video streams.
6,941 functions with a total of 298,723 lines of code.
During the decoding of video frames in version 0.6, indices are incorrectly computed (CVE-2010-3429).
Candidate functions are all video decoding routines that write decoded video frames to a pixel buffer.

Evaluation (cont.)
Asterisk is a framework for Voice-over-IP communication.
8,155 functions and 283,883 lines of code.
Version 1.6.1.0 of the framework contains a vulnerability (CVE-2011-2529) that allows a remote attacker to corrupt memory of the server via a crafted packet.
Candidate functions are all functions reading incoming packets from UDP/TCP sockets.
We thoroughly inspect each code base and manually label all candidate functions, that is, all functions that potentially contain the same vulnerability. This manual analysis process required several weeks of work.

Quantitative Evaluation
The number of extracted structural patterns is not a critical parameter for vulnerability extrapolation.
In the following case studies, we fix this parameter to 70.

Quantitative Evaluation (cont.)

Qualitative Evaluation (Case Study)
In a case study with FFmpeg and Pidgin, we now demonstrate the practical merit of vulnerability extrapolation and show how our method plays the key role in identifying 8 zero-day vulnerabilities.
We have conducted two further studies with Pidgin and Asterisk, uncovering 2 more zero-day vulnerabilities. For the sake of brevity, however, we omit these case studies here.

Case Study: FFmpeg
CVE-2010-3429
3 further vulnerabilities, 2 of which were zero-day

Case Study: FFmpeg

Case Study: Pidgin
CVE-2011-4601
9 further vulnerabilities, 6 of which were zero-day

Case Study: Pidgin

Limitations
The method only identifies potentially vulnerable code. Due to Rice's theorem, however, a generic discovery of vulnerabilities is impossible anyway.
The method depends on the existence of a starting vulnerability.
Complex flaws that span several functions across a code base can be difficult to detect for our method.

Q & A
