Slide 1: Fortify Performance Guide
HP ESP User Group – Spring 2015
Simon Corlett, HP Fortify Technical Account Manager
Slide 2: WebInspect Performance
Slide 3: Automated Black Box Testing – The WebInspect Agent

Automated black box testing – limitations:
- The same code paths are tested multiple times
- Request and response data is not precise enough for developers
- False positive and false negative rates
- Hybrid correlation between static and dynamic scans is impossible or very difficult
- Actual application behavior goes unobserved

Interactive Application Security Testing (IAST) – benefits:
- Find more
- Find faster
- Fix faster

[Diagram: WebInspect scanning an application running on a web server, with and without the WebInspect Agent installed]

Speaker notes: It is estimated that currently 60% of large organizations use some form of automated dynamic testing, and many of you here likely use WebInspect for this very purpose. While this testing is an important part of your application security process, it's important to be aware that it has limitations. Even with the complex algorithms we employ in WebInspect today, the same code path may be tested multiple times depending on the entry point. And while request and response data might be a smoking gun for security analysts, it's not exactly developer friendly: a developer must take that data and translate it into the code they believe was responsible for the behavior. Not ideal. To avoid missed results (false negatives), a scanner might flag several items for review by the auditor, leading to an excess of false positives; a security engineer or developer must then wade through each of these to figure out which are legitimate and need to be fixed. Additionally, scanners have no visibility into what the application is actually doing under the covers, so they can easily miss parts of the application simply because they are not aware those parts exist. This is where Interactive Application Security Testing comes in. It involves a tool that sits on the web server and integrates with the runtime of the application – at Fortify we call this tool the WebInspect Agent. The agent allows us to see what is going on at a deep level, down to the code being executed, which can pay huge dividends in the outcome of a scan. It allows us to find more issues, find them faster, and fix them faster.
Slide 4: Find More
- Two-way communication directly with the WebInspect client to suggest attacks
- Finds categories of vulnerabilities that would otherwise stay hidden
- Support for both Java and .NET applications

[Diagram: site map revealed by the agent – Index, About, Account Details, Deposit, Withdraw, Message Center, Send Message, Read Message, Admin, Backup]

Speaker notes: So how do we find more vulnerabilities? The agent uses a two-way communication channel with the WebInspect client to transfer back additional information about the success of an attack. This channel uses port 80 or 443, so firewall rules do not need to be adjusted to allow the communication. Because we're on the inside, it also gives us a means to identify pages or sections of the application that would not normally be picked up during a WebInspect crawl of the site. Certain vulnerability categories have escaped the realm of dynamic scanning since the beginning; the agent now allows us to find those vulnerabilities and return information to the developers on how to fix them. Among them are OAuth vulnerabilities, certain session-type vulnerabilities, unused parameters, and privacy violations such as writing credentials or Social Security numbers to logs or a database unencrypted. The WebInspect Agent currently supports both .NET and Java in passive mode, but only Java in active mode.
Slide 5: Find Faster
Two modes of the agent to tailor the scan: passive mode and active mode. Active mode uses deduplication and check avoidance, and is up to 35% faster.

Speaker notes: The agent's new active mode uses a host of techniques to improve scan time. One of the more interesting, and simpler to explain, is deduplication. Positioned inside the runtime of the application, the agent can see what is happening at a deep level: it can tell when the same class or function will be executed by an attack, even when the attacks come from vastly different areas of the application. If the scanner has already thoroughly tested that call, the agent can tell the scanner to skip additional attacks and move on. In this way it reduces the number of attacks sent and the number of responses that must be analyzed. The other way the agent reduces scan time is check avoidance. If the scanner sends several attacks of a specific check type and the agent can determine that the application has been built to properly handle those attacks, the agent will tell the scanner to stop sending attacks for that check type for the duration of the scan. Again the scanner sends fewer attacks, reducing the overall scan time.
Slide 6: Deduplication
Use deep, runtime-level information to identify code paths that are similar and avoid running checks against each one.

[Diagram: Forum Page 1, Forum Page 2, and the Forum Admin Area all execute the same code – "Same code, don't evaluate"]
Slide 7: Fix Faster
- Stack traces available for certain checks
- Fewer false positives mean less noise

Speaker notes: Finally, the agent allows you to fix issues faster. As WebInspect detects certain vulnerabilities in the application, the agent is able to provide a stack trace showing exactly which lines of code were executed to produce the issue. This saves auditors and developers an awful lot of time when it comes to remediation. The fact that we can also use the agent to verify whether attacks are successful means that your WebInspect results will contain significantly fewer false positives, making it easier to focus on remediating the genuine issues.
Slide 8: Any questions?
Slide 9: SCA Performance
Slide 10: The 3 Stages of SCA Analysis
1. Clean: sourceanalyzer -b BuildID -clean
2. Translate: sourceanalyzer -b BuildID ...
3. Scan: sourceanalyzer -b BuildID -scan -f results.fpr

Speaker notes: Before we get going, it's important to understand how an SCA scan works. It breaks down into three stages. Clean removes any previous translations from the specified build ID. Translate creates .nst (normalized syntax tree) files for the code; these are stored under "C:\Users\user\AppData\Local\Fortify\sca5.16\build\BuildID" on Windows or ".fortify/sca5.16/build/BuildID" on Linux/Unix. Scan analyzes all the .nst files on that build ID and produces the .fpr results file.
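As a minimal sketch of the three stages for a Java project (the MyApp build ID and the src/ and lib/ paths are hypothetical):

  sourceanalyzer -b MyApp -clean
  sourceanalyzer -b MyApp -cp "lib/*.jar" "src/**/*.java"
  sourceanalyzer -b MyApp -scan -f MyApp.fpr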
Slide 11: Factors Affecting SCA Performance
- Code type
- Size of the codebase
- Ancillary languages used (e.g. JSP, JavaScript, HTML)
- Type of vulnerabilities (i.e. which analyzer finds them)
- Number of vulnerabilities
- Complexity of the codebase

Speaker notes: Because source code varies so widely, accurate predictions of how much memory and time a static scan will take are all but impossible. A whole host of factors affect SCA performance, as listed above. The most difficult of these to measure is code complexity; even after a great deal of research on the subject, it has been suggested that the problem is unsolvable in the general case. So the best we can do is offer general guidelines based on anecdotal evidence and our real-world experience.
Slide 12: Rough Guide for 3.x

Size (LOC)          | <100K                   | 100K–500K               | 500K–1M                 | 1M+
Java / .NET / C/C++ | 32-bit machine, 2GB RAM | 32-bit machine, 4GB RAM | 64-bit machine, 8GB RAM | 64-bit machine, 16GB RAM

Note: If the app is more than 20% JavaScript, please use the next highest recommendation.

Speaker notes: Our experience so far has allowed us to draw up this table for users on the Fortify v3 releases. As you can see, for any sizeable app we expect you to be using 64-bit hardware. Because SCA is a single-threaded process in the v3 releases, these scans are going to take significant time. It's also worth noting that JavaScript is notoriously memory-intensive to translate and scan; for any application with large amounts of JavaScript, please use the next highest recommendation.
Slide 13: Rough Guide for 4.x

Complexity   | CPU cores | RAM   | Average scan time | Example
Simple       | 2         | 4GB   | 0.5 hours         | A system that runs on a server or desktop in a stand-alone manner, like a batch job or a command-line utility
Medium       | 4         | 16GB  | 4 hours           | A standalone system that works with complex computer models, like a tax calculation or scheduling system
Complex      | 8         | 64GB  | 2 days            | A three-tiered business system with transactional data processing, like a financial system or a commercial website
Very complex | 16        | 256GB | 4 days            | A system that serves up content, like an application server, database server, or content management system

Note: At present SCA is unable to make effective use of more than 16 CPU cores.

Speaker notes: For our v4 releases we've tried to improve on these recommendations and base them on the complexity of the application being scanned. As SCA v4.0 introduced parallel scanning, you can now also make full use of all the cores on your machine to improve scan time. From my work in the Support team, the majority of applications I see being scanned tend to fall within the Medium category shown here. It's also worth noting that SCA currently can't make effective use of more than 16 cores; assigning more to a parallel scan will likely slow the scan down rather than speed it up.
Slide 14: Tips on Improving Performance
Slide 15: Mobile Build Sessions
On the translation machine ("Machine T"):
  sourceanalyzer -b BuildID <translation commands>
  sourceanalyzer -b BuildID -export-build-session build-session.mbs
Transfer build-session.mbs from "Machine T" to the scan machine ("Machine S"), then:
  sourceanalyzer -import-build-session build-session.mbs
  sourceanalyzer -b BuildID -scan -f results.fpr

Speaker notes: The best way to boost performance is to provide as many resources as possible to the scan. Within an SCA scan, the actual scan stage is much more memory-intensive than the translation, because it's at the scan stage that SCA essentially creates a complete model of the application and traces all possible flows of data through it. As you can imagine, even a small application can have significantly complex dataflows. Now, while the translation stage requires all dependencies to be present, the scan stage is platform-independent. This means it's often easiest to perform the translation on a developer's build machine and then perform the scan on a dedicated machine, ideally one with much greater resources. You do this by creating a mobile build session (the -export-build-session command shown above, or the older -make-mobile option), transferring the .mbs file to the dedicated scan machine, importing it there, and kicking off the scan. This process is similar to how CloudScan works.
Slide 16: CloudScan
  sourceanalyzer -b BuildID <translation commands>
  cloudscan -url <controller URL> start -b BuildID -scan <tuning options>

[Diagram: CloudScan clients submit MBS files to the CloudScan controller, which hands scans to CloudScan workers and publishes results to the SSC server]

Speaker notes: CloudScan automates the mobile build session process. It allows you to perform the translation on the local developer machine and then submit it to the CloudScan setup. Once submitted, the CloudScan controller passes the scan to an available scanning box (you can have as many of these dedicated scanning boxes as you wish). Once complete, the results can either be pushed to the SSC server or passed back down to the client machine from which the scan was submitted. This saves you from having to move the build session and kick off the scan manually on the dedicated box. CloudScan should not be confused with Fortify on Demand, which is our software-as-a-service offering.
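As an illustrative sketch of the two steps (the controller URL, MyApp build ID, and tuning option here are all hypothetical; check the exact cloudscan client syntax against your CloudScan documentation):

  sourceanalyzer -b MyApp -cp "lib/*.jar" "src/**/*.java"
  cloudscan -url http://cloudscan-controller.example.com:8080/cloud-ctrl start -b MyApp -scan -Xmx16G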
Slide 17: Memory Tuning

Java heap exhaustion
  Errors: "There is not enough memory available to complete analysis. For details on making more memory available, please consult the user manual." / java.lang.OutOfMemoryError: Java heap space / java.lang.OutOfMemoryError: GC overhead limit exceeded
  Resolution: increase -Xmx

Java permanent generation exhaustion
  Error: java.lang.OutOfMemoryError: PermGen space
  Resolution: increase -XX:MaxPermSize

Native heap exhaustion
  Error: # A fatal error has been detected by the Java Runtime Environment: # java.lang.OutOfMemoryError: requested ... bytes for GrET ...
  Resolution: decrease -Xmx

Stack overflow
  Error: java.lang.StackOverflowError
  Resolution: increase -Xss

Speaker notes: As we've discussed, static analysis is by its very nature a resource-intensive process. The amount of physical RAM required for a scan depends on the complexity of the code itself, and as this is unknown until the first attempt to scan an application, it's possible you will encounter OutOfMemory errors during analysis. These errors usually fall into one of the categories above. Java heap exhaustion is by far the most common; when it occurs you'll see one of the listed errors output from SCA. While there are a few tactics you can use to work around it, which we'll discuss later, the only true way to resolve it is to increase the memory allocated to the scan by passing a larger -Xmx value. Java also maintains a separate memory region from the main heap called the permanent generation; in some rare cases this region fills up during a scan, causing an OutOfMemoryError. To resolve this, increase the value of -XX:MaxPermSize, which defaults to only 64MB; note that increasing MaxPermSize increases the overall memory required by the scan. Native heap exhaustion is a very rare scenario in which the JVM is able to allocate its memory regions on startup but is left with so few resources for its native operations that it crashes; the solution is actually to decrease -Xmx to leave the JVM room for those native operations. Finally, you may experience StackOverflowErrors. These often occur when SCA has to process large data structures, commonly in SQL scans. To resolve them, increase the value of -Xss, which defaults to just 1MB; increasing it to 8MB or 16MB usually resolves the problem. This will slow the scan down, though, so keep it as low as possible without failing.
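A minimal sketch of each resolution on the command line (the MyApp build ID and memory sizes are illustrative; tune them to your hardware):

  # Heap exhaustion: raise the main heap
  sourceanalyzer -b MyApp -scan -Xmx32G -f MyApp.fpr
  # PermGen exhaustion: raise the permanent generation as well
  sourceanalyzer -b MyApp -scan -Xmx32G -XX:MaxPermSize=256M -f MyApp.fpr
  # StackOverflowError (e.g. large SQL): raise the per-thread stack
  sourceanalyzer -b MyApp -scan -Xmx32G -Xss8M -f MyApp.fpr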
Slide 18: CPUs, Parallel Processing & Multi-Threading

Default SCA execution (multi-threaded pre-analysis, analysis, post-analysis, and FPR generation in a single JVM):
  Minimum: 1 core, 4GB RAM
  Recommended: 8+ cores, 32GB+ RAM

Parallel analysis mode (-j <# worker processes>; a master JVM plus analyzer worker JVMs):
  Minimum: 4 cores, 16GB RAM
  Recommended: 8+ cores, 64GB+ RAM
  Max # of threads: com.fortify.sca.ThreadCount
  Master heap size: -Xmx
  Worker heap size: com.fortify.sca.RmiWorkerMaxHeap

Speaker notes: Prior to HP Fortify v4.0, SCA operated as a single-threaded process confined to one core. v4.x, however, introduces major changes to the SCA scan phase. Multi-threaded execution is now used during pre-analysis, where we construct the data structures used by the analyzers; post-analysis, where we conduct whole-program analysis and generate the final issues; and FPR generation, where we bundle up the source and write the issues to the FPR file itself. Parallel processing can be triggered during the main analysis stage, when the SCA analyzers run, and lets you reduce scan times by harnessing the multiple cores, memory, and processing power of your machine. To trigger it, pass the -j argument to SCA along with the number of worker processes you want. This number doesn't include the master process, so if you have 4 cores you should pass -j 3 to use them all. While parallel processing will speed up your scan, it will also use significantly more memory, as each separate process requires its own chunk of RAM. By default the workers use the same amount of memory as the master, namely whatever is passed to the scan with -Xmx; ideally we recommend allocating the master twice as much as each worker. You can set the worker heap separately with the RmiWorkerMaxHeap property, either on the command line (prefixed with -D) or in the fortify-sca.properties file. For the other phases, SCA will use as many threads as the machine makes available; we recommend leaving this as is, but if you do need to reduce the number of threads you can set the ThreadCount property. Our recommendations for parallel processing are:
- Small to medium projects (<500K LOC): scan in default mode, allow SCA to spin up threads on all cores, and set the SCA Java heap to all physical memory minus about 1.5GB for the OS.
- Medium to large projects (>500K LOC): scan in parallel mode, allow the multi-threaded phases to use all cores, use 5 to 10 workers for the parallel analysis, set the master heap to 2x the worker heap, and take care not to overcommit physical memory.
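A minimal sketch for an 8-core, 32GB machine (the MyApp build ID and heap sizes are illustrative, and the G-suffixed value for RmiWorkerMaxHeap is an assumption; check the accepted format in the SCA User Guide):

  # 7 workers plus the master; the master gets twice the per-worker heap
  sourceanalyzer -b MyApp -scan -j 7 -Xmx6G -Dcom.fortify.sca.RmiWorkerMaxHeap=3G -f MyApp.fpr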
Slide 19: Keeping Tainted Information in Memory
com.fortify.sca.DisableSwapTaintProfiles=true
Note: drastically increases memory consumption!

Speaker notes: To save memory, SCA writes taint information to disk by default and swaps it back in when needed. This property forces SCA to keep all scan data in memory instead, which can drastically improve performance: in one case a scan that was taking 33 hours to complete dropped to 13 hours once this was set. Obviously this comes at the cost of higher memory usage. If SCA has enough memory available, this option will improve scan time; otherwise it may cause the analysis to run out of memory and produce no results.
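A minimal sketch (the MyApp build ID and heap size are hypothetical; only sensible when the machine has RAM to spare):

  sourceanalyzer -b MyApp -scan -Xmx48G -Dcom.fortify.sca.DisableSwapTaintProfiles=true -f MyApp.fpr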
Slide 20: Scan Quality vs. Performance
Slide 21: Breaking Down Codebases

Option 1 – translate in chunks, scan as a single project: reuse the build ID across separate translations (Trans 1 + Trans 2 + Trans 3 → one Scan → FPR).

Option 2 – scan a single binary or object (C/C++):
  C:\>sourceanalyzer -b BuildID -show-build-tree
  Debug/Sample.exe
    Debug/Sample.lib
      Debug/Sample.obj
        Sample.cpp
        stdafx.h
      Debug/stdafx.obj
        stdafx.cpp
  C:\>sourceanalyzer -b BuildID -bin Debug/Sample.obj -scan -f out.fpr

Option 3 – run separate scans and combine them with the -append option (Trans 1 + Trans 2 + Trans 3 → Scan 1, Scan 2, Scan 3 → FPR).

Speaker notes: If resources are an issue, one option is to break the scan down into more manageable chunks: fewer resources are required and scan times are quicker, but there will be missing dataflow. In rare cases the translation may be more intensive than the scan, or may have to be conducted on a developer machine with limited resources; here you can run each translation separately against the same build ID and then scan everything together, though no dataflow is tracked between the separate translations. Another option is to break the scan itself into separate chunks and use -append; again, this misses dataflow between the separate scans. A crafty trick, however, is to translate everything together, run a separate scan for each analyzer (dataflow uses the most resources), and append the results into one FPR to give you a complete scan; see the sketch below. Depending on the language, it is also possible to scan just part of the project rather than the entire thing, which can be useful if there is only one component you need to focus on.
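A sketch of the "separate scan per analyzer" trick (the MyApp build ID is hypothetical, and this assumes -append merges results into the FPR named by -f):

  sourceanalyzer -b MyApp -scan -analyzers dataflow -f MyApp.fpr
  sourceanalyzer -b MyApp -scan -analyzers semantic,controlflow,structural,configuration,content,buffer -append -f MyApp.fpr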
Slide 22: Quick Scan & The Limiters
Passing -quick applies the limiter values set in fortify-sca-quickscan.properties, trading quality for speed:

- com.fortify.sca.limiters.ConstraintPredicateSize (default 50000, quick scan 10000): skips calculations defined as very complex in the buffer analyzer to improve scan time.
- com.fortify.sca.limiters.BufferConfidenceInconclusiveOnTimeout (default true, quick scan false)
- com.fortify.sca.limiters.MaxChainDepth (default 5, quick scan 4): controls the maximum call depth through which the dataflow analyzer tracks tainted data. Increasing this value increases the coverage of dataflow analysis and results in longer analysis times.
- com.fortify.sca.limiters.MaxTaintDefForVar (default 1000, quick scan 500): sets the complexity limit for dataflow precision backoff. Dataflow incrementally decreases the precision of analysis for functions that exceed this complexity metric at a given precision level.
- com.fortify.sca.limiters.MaxTaintDefForVarAbort (default 4000, quick scan 1000): sets a hard limit on function complexity. If the complexity of a function exceeds this limit at the lowest precision level, the analyzer will not analyze that function.

Speaker notes: The depth of analysis SCA performs sometimes depends on the available resources. SCA uses a complexity metric to trade those resources off against the number of vulnerabilities it can find; sometimes this means giving up on a particular function when it doesn't look like SCA has enough resources available. This is normally when you will see a "Function too complex" warning in the resulting FPR and the log file. When this message appears, it doesn't necessarily mean the function has been completely ignored. For example, the dataflow analyzer will typically visit a function many times before analysis is complete, and may not run into the complexity limit on the early visits (since its model of other functions is less developed then); anything learned from those early visits is reflected in the results. That said, we do let the user control the "give up" point via SCA properties called limiters. Different analyzers have different limiters, and a predefined set of them is applied with the -quick option. The list above shows a handful of the limiters and their effects; for the full set, see the SCA User Guide. It's also worth noting that by default SSC will not accept quick scans – this has to be changed with a per-project setting. Changing the limiters on an individual basis does not have this effect, however, and those FPRs can be uploaded to SSC as normal.
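A minimal sketch of both approaches (the MyApp build ID is hypothetical):

  # Quick scan: apply all the quick-scan limiter values at once
  sourceanalyzer -b MyApp -scan -quick -f MyApp-quick.fpr
  # Or adjust a single limiter for a normal scan that SSC will still accept
  sourceanalyzer -b MyApp -scan -Dcom.fortify.sca.limiters.MaxChainDepth=4 -f MyApp.fpr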
Slide 23: Scanning Complex Functions
Warning: Function <name> is too complex for <analyser> analysis and will be skipped (<identifier>)

The <analyser> is one of dataflow, control flow, or null pointer; the <identifier> tells you which limit was hit and what to adjust:
- m (out of memory): increase -Xmx
- s (stack size too small): increase -Xss
- t (taken too much time): increase com.fortify.sca.CtrlflowMaxFunctionTime (control flow) or com.fortify.sca.NullPtrMaxFunctionTime (null pointer)
- l (too many distinct locations, dataflow): adjust com.fortify.sca.limiters.MaxTaintDefForVar, com.fortify.sca.limiters.MaxTaintDefForVarAbort, and com.fortify.sca.limiters.MaxFieldDepth

Note: Increasing limiters will likely also increase scan time.

Speaker notes: When SCA hits one of the predefined limiters, you'll see a warning like this printed to the SCA logs. As I've said, this does not mean SCA has failed to scan that function, just that it reached the point where it decided to give up and move on. If you have the resources available, you can increase the appropriate limiters to perform a deeper scan; the list above shows which limiters to adjust based on the warning thrown. Note that increasing any of the limiters may surface deep issues that wouldn't otherwise have been reported, but it will increase the scan time. The default limiters are set at the level our development and research teams feel is appropriate for the vast majority of users.
Slide 24: Limiting Analysers and Languages

Limiting analysers (white list): -analyzers on the command line, or the com.fortify.sca.DefaultAnalyzers property.
  Analysers: dataflow, semantic, controlflow, configuration, structural, content, buffer

Disabling languages (black list): -disable-languages on the command line, or the com.fortify.sca.DisabledLanguages property.
  Languages: abap, actionscript, any_sql, asp, c, cfml, cobol, cpp, csharp, html, java, javascript, jsp, llvm, objc, php, plsql, python, tsql, vb, vb6, vbscript

Speaker notes: On occasion you may find that a significant amount of the scan time is spent either running one particular analyzer or analyzing one particular language, and that this analyzer or language is not of great interest to your security requirements. In these cases you can limit which analyzers run and which languages are translated. These options configure SCA to run only a subset of the analysers, or to scan only a subset of languages. Note the difference in direction: with -analyzers you pass a list of the analysers you do want to run, whereas with -disable-languages you pass a list of the languages you do not want to scan.
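A minimal sketch of each (the MyApp build ID and src/ path are hypothetical; language names are as listed above):

  # White list: run only the dataflow and controlflow analysers
  sourceanalyzer -b MyApp -scan -analyzers dataflow,controlflow -f MyApp.fpr
  # Black list: skip JavaScript and HTML at translation time
  sourceanalyzer -b MyApp -disable-languages javascript,html "src/**/*"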
Slide 25: Scan Size vs. Performance
Slide 26: Filtering Results

Filter files: pass -filter filter.txt at scan time, where filter.txt lists categories, instance IDs, and rule IDs to exclude, e.g.:
  #List of categories, IID's and Rule ID's
  Poor Logging Practice
  60AC727CCEEDE041DE984E7CE
  823FE039-A7FE-4AAD-B976-9EC53FFE4A59

Scan-time filters: -project-template ProjectTemplate.xml -Dcom.fortify.sca.FilterSet=OWASP_Filter_set
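A minimal sketch of a filtered scan (the MyApp build ID is hypothetical; the filter file contents are as on the slide):

  sourceanalyzer -b MyApp -scan -filter filter.txt -f MyApp.fpr
  # Or apply a filter set from an issue template during the scan
  sourceanalyzer -b MyApp -scan -project-template ProjectTemplate.xml -Dcom.fortify.sca.FilterSet=OWASP_Filter_set -f MyApp.fpr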
Slide 27: Creating FPRs without Source Code
Bring down scan time and reduce the FPR size with:
- -Dcom.fortify.sca.FPRDisableMetatable=true (an undocumented property)
- -disable-source-bundling (a normal scan includes both source and snippets)
- -disable-source-bundling plus -Dcom.fortify.sca.FVDLDisableSnippets=true

Speaker notes: Once a scan has completed, you can often find yourself dealing with an excessively large and unwieldy results file. This makes life very difficult if the FPR needs to be distributed to other members of your team to audit, or if large amounts of memory are needed just to open it in Audit Workbench. There are, however, a number of ways to alleviate this pain. There's a hidden property called FPRDisableMetatable; setting it means we won't write the data to the FPR recording which functions were scanned and whether they were covered by our rules. Many customers don't actually use this functionality, so it can be an acceptable loss. Note, though, that there is currently a bug with FPRDisableMetatable: it also removes the archive used to view the source on the SSC server (the source will still be viewable in Audit Workbench). Removing the bundled source and snippets creates lightweight FPRs. It's possible to link these back to the source in AWB, but no source will be shown in SSC, so collaborative auditing in the SSC GUI isn't really an option. It's also possible to download FPRs from SSC that do not contain the source, so it can often be advantageous to perform a scan, upload it to SSC, and then grab your lightweight FPR from SSC; while this won't contain the full source archive, it will still contain snippets.
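A minimal sketch of a lightweight scan (the MyApp build ID is hypothetical):

  # No bundled source archive and no code snippets in the FPR
  sourceanalyzer -b MyApp -scan -disable-source-bundling -Dcom.fortify.sca.FVDLDisableSnippets=true -f MyApp-light.fpr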
Slide 28: Opening Large FPRs
Set these in <SCA Install Directory>\Core\config\fortify.properties:

- com.fortify.DisableProgramInfo=true – disables the code navigation features within AWB.
- com.fortify.model.IssueCutOffStartIndex=<number> (inclusive) and com.fortify.model.IssueCutOffEndIndex=<number> (exclusive) – specify a subset of issues to load. E.g. to see the first 100 issues, set IssueCutOffEndIndex=101; because IssueCutOffStartIndex defaults to 0, it can be left out.
- com.fortify.model.IssueCutOffByCategoryStartIndex=<number> (inclusive) and com.fortify.model.IssueCutOffByCategoryEndIndex=<number> (exclusive) – as above, but applied per category. E.g. to see the first 5 issues of every category, set IssueCutOffByCategoryEndIndex=6.
- com.fortify.RestrictIssueLoading=true – restricts the data held in memory, but may cause poor performance.
- com.fortify.model.MinimalLoad=true – loads only the bare minimum information from the FPR. This also restricts use of the functions view and may prevent the source being loaded from within the FPR.
- com.fortify.model.MaxEngineErrorCount=<number> (available from v4.20) – limits the number of errors loaded with the FPR. For projects with a large number of scan warnings, this can significantly reduce both AWB load time and the memory required to open the FPR.
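A minimal sketch of a fortify.properties fragment for opening a very large FPR (the values are illustrative, not recommendations):

  com.fortify.DisableProgramInfo=true
  com.fortify.model.IssueCutOffByCategoryEndIndex=101
  com.fortify.model.MaxEngineErrorCount=1000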
Slide 29: Monitoring Long Running Scans
Slide 30: SCAState
SCAState [options] <SCA process ID>

Options: --all, --heap-dump, --thread-dump, --full-thread-dump, --program-info, -properties, -scaversion, -timers, -vminfo

Finding the SCA process ID:
- Windows: open a command prompt, run tasklist, and locate the java.exe PID
- Linux/Unix: run ps aux | grep sourceanalyzer
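A minimal sketch on Windows (the PID 4242 is hypothetical; use the one tasklist reports for the scan's java.exe):

  tasklist | findstr java.exe
  SCAState -timers -vminfo 4242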
Slide 31: JConsole and JVisualVM

[Diagram: JConsole and Java VisualVM attach to the running SCA process over JMX (port 9090); an HPROF dump is inspected with HAT]

JConsole and JVisualVM
Set the configuration:
  export SCA_VM_OPTS="-Dcom.sun.management.jmxremote -Dcom.sun.management.jmxremote.port=9090 -Dcom.sun.management.jmxremote.ssl=false -Dcom.sun.management.jmxremote.authenticate=false"
Run the scan as usual:
  sourceanalyzer -b BuildID -scan -f myResults.fpr
Attach with either tool:
  jconsole localhost:9090
  jvisualvm --openjmx localhost:9090

HPROF & HAT
Run the scan with HPROF enabled:
  sourceanalyzer -b BuildID -scan -f myResults.fpr -Xrunhprof:cpu=samples,interval=1,depth=10,format=b,file=java.hprof.bin,heap=dump
View the results with HAT:
  jhat -J-Xmx4G ./java.hprof.bin
Slide 32: Any questions?
Slide 33: Thank you