Exploitable Path Advanced Topics

This is the third and final blog on Exploitable Path – a unique feature that allows our customers to prioritize vulnerabilities in open-source libraries. In the first blog, we introduced the concept of Exploitable Path and its importance. The conclusion was that a vulnerability in a library is considered exploitable when:

- The vulnerable method in the library is called, directly or indirectly, from the user’s code.
- An attacker can supply carefully crafted input that reaches this method and triggers the vulnerability.

In the second blog, we discussed some of the challenges in developing such a feature, and our unique approach. Mainly:

- Using a query language over the CxSAST engine to abstract queries over source code. This allows a more language-agnostic approach, so that Exploitable Path works for every programming language supported by CxSAST.
- Running the various CxSAST queries required to build a full call graph of a user’s source code and its libraries’ source code. By crossing this call graph with vulnerability data, we can determine whether a vulnerability is exploitable.
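
To make the idea of crossing a call graph with vulnerability data more concrete, here is a minimal Python sketch. It is not CxSAST query code: the call graph is assumed to be available as a plain dictionary, and the method names and the `find_call_path` helper are hypothetical.

```python
from collections import deque


def find_call_path(call_graph, entry_points, vulnerable_method):
    """Breadth-first search for a call path from user code to a vulnerable method.

    call_graph maps a method name to the list of methods it calls.
    Returns the first path found, or None if the method is unreachable.
    """
    queue = deque((entry, [entry]) for entry in entry_points)
    visited = set(entry_points)
    while queue:
        method, path = queue.popleft()
        if method == vulnerable_method:
            return path
        for callee in call_graph.get(method, ()):
            if callee not in visited:
                visited.add(callee)
                queue.append((callee, path + [callee]))
    return None


# Hypothetical call graph covering user code ("app.*") and one library ("lib.*").
call_graph = {
    "app.main": ["app.handle_comment"],
    "app.handle_comment": ["lib.render"],
    "lib.render": ["lib.parse"],
    "lib.unused_helper": ["lib.legacy_parse"],
}

print(find_call_path(call_graph, ["app.main"], "lib.parse"))
# ['app.main', 'app.handle_comment', 'lib.render', 'lib.parse']
print(find_call_path(call_graph, ["app.main"], "lib.legacy_parse"))
# None
```

In this toy graph, a vulnerability in `lib.parse` is reachable from the user’s code, while one in `lib.legacy_parse` is not, because nothing in the user’s code leads to it.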

In this last blog in the series, we will cover more advanced topics we faced during the development of Exploitable Path.

Challenge no. 1 – Supporting Multiple Library Versions

The public data on a CVE usually contains affected versions, but how can we use this information to support Exploitable Path across versions? Meaning, if the source code of a library changes between various versions, how can we have the required data for Exploitable Path for each of those versions?
Let’s assume we have a user’s source code that uses a single open-source library. This library contains a vulnerability, and using MITRE, we can figure out the affected versions.
To be able to assess if the vulnerability is exploitable, we need the following for each version of the library:

- A call graph of the library’s code. This can be generated automatically using CxSAST.
- Whether the current version is vulnerable.
  - If it is, the inner method in which the exploitation occurs is also required.

Now the question is: how can we find this inner method for each vulnerable version? Going over each version manually is not practical, especially since a library can have hundreds of versions.
The first part of the solution is to find the inner method that’s vulnerable. Usually, a vulnerability is tied to a specific method (or methods) responsible for a certain piece of logic. Pull requests and commits for the relevant CVE help our analysts uncover the relevant method.
Next, we generate a fingerprint of the fix – if a version contains the fix, we can mark it as not vulnerable to this CVE. This is where our powerful static code analysis tool comes into play again, making it easy to re-assess hundreds of library versions for the vulnerability.
Re-assessing the affected versions of a vulnerability is crucial. As it turns out, this data on public websites like MITRE is often imprecise. Versions that are marked as vulnerable can be safe, and vice versa. This can be the result of human error, or even of a slight difference in the version tags between the public registry and the git repository in which the library is developed. By searching for the fingerprint of the fix, we can ensure the quality and accuracy of our vulnerability data.
Using this in-depth analysis process, the vulnerable method is marked for every affected version, eventually resulting in a very accurate Exploitable Path scan.
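
The fingerprint idea can be illustrated with a small Python sketch. It rests on a simplifying assumption that is ours, not a description of the actual tooling: the fingerprint is just the set of whitespace-normalized lines added by the fix commit, and a version "contains the fix" when all of those lines appear in its source. The helper names, the version numbers, and the sample `parse` code are hypothetical.

```python
import re


def normalize(line):
    """Collapse whitespace so formatting-only changes do not affect matching."""
    return re.sub(r"\s+", " ", line).strip()


def fix_fingerprint(added_lines):
    """Build a fingerprint from the lines a fix commit added (blank lines ignored)."""
    return {normalize(line) for line in added_lines if normalize(line)}


def contains_fix(source, fingerprint):
    """Return True if every fingerprint line appears somewhere in the source."""
    lines = {normalize(line) for line in source.splitlines()}
    return fingerprint <= lines


def versions_with_fix(sources_by_version, fingerprint):
    """Map each version to whether the fix fingerprint is present in it."""
    return {
        version: contains_fix(source, fingerprint)
        for version, source in sources_by_version.items()
    }


# Hypothetical fix: a length check added to the vulnerable method.
fingerprint = fix_fingerprint([
    "    if len(content) > MAX_LEN:",
    "        raise ValueError('input too long')",
])

# Hypothetical source of the same method in two library versions.
sources = {
    "1.0.0": "def parse(content):\n    return tokenize(content)\n",
    "1.1.0": (
        "def parse(content):\n"
        "    if len(content) > MAX_LEN:\n"
        "        raise ValueError('input too long')\n"
        "    return tokenize(content)\n"
    ),
}

has_fix = versions_with_fix(sources, fingerprint)
print(has_fix)  # {'1.0.0': False, '1.1.0': True}
```

A line-based match like this is deliberately naive. Note also that a missing fix does not by itself prove a version is vulnerable, since the version may predate the vulnerable code, so the result still has to be combined with knowledge of when the vulnerability was introduced.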

Challenge no. 2 – Data Flow

Just because your code calls a vulnerable method, that doesn’t mean you are automatically at risk. To assess the risk properly (and avoid false positives), it’s crucial to have both a call graph and a DFG (Data Flow Graph) of the code.
Let’s start with an example, and assume that a method called parse(content) has a DoS (Denial of Service) vulnerability that is triggered by the right input. If parse() is only called with a constant value, meaning parse(CONSTANT_VALUE), there is no attack surface for an attacker to exploit it and cause a DoS. On the other hand, if a user of the application controls the input parameter of parse(), it’s a different story. For example, this input can be a comment or other data provided by the user. In such a case, the attacker can easily craft the required input and exploit the vulnerability.
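
The two situations look like this in code. This is only an illustration: `parse` below is a harmless, hypothetical stand-in for a vulnerable library method, and `render_banner` and `render_comment` are made-up callers in the user’s code.

```python
DEFAULT_BANNER = "(welcome)"


def parse(content):
    """Hypothetical stand-in for a vulnerable parser.

    Returns the number of leading "(" characters. Imagine that the cost of
    the real method grows sharply with this nesting depth.
    """
    depth = 0
    for ch in content:
        if ch != "(":
            break
        depth += 1
    return depth


def render_banner():
    # parse() only ever sees a constant: an attacker cannot influence it.
    return parse(DEFAULT_BANNER)


def render_comment(comment):
    # parse() sees user-supplied data: an attacker chooses the nesting depth.
    return parse(comment)


print(render_banner())           # 1
print(render_comment("(((hi"))   # 3
```

Both callers reach the same vulnerable method, so a call graph alone would flag both. Only the data flow tells them apart.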
The reality is more complex, as there are various ways data can be transferred in code:

- Input parameters
- Global or class members
- The return value of another method invocation

Also, not every piece of data that reaches a method is relevant for exploitation. For example, a method parseRequest(HttpRequest request, Config config) may be exploitable only through the HttpRequest.Content member of the request parameter.
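
One way to picture this is to record, per vulnerable method, which inputs matter. The sketch below assumes a hypothetical `EXPLOIT_RELEVANT_INPUTS` table and a hypothetical `is_call_exploitable` helper, and the path `config.Timeout` is invented for the example. It only shows the idea of filtering by relevant input.

```python
# Hypothetical table: vulnerable method -> argument paths that matter for exploitation.
EXPLOIT_RELEVANT_INPUTS = {
    "parseRequest": {"request.Content"},
}


def is_call_exploitable(method, attacker_controlled_paths):
    """Return True if an attacker controls at least one input that matters."""
    relevant = EXPLOIT_RELEVANT_INPUTS.get(method, set())
    return bool(relevant & set(attacker_controlled_paths))


# The attacker controls only a configuration value: not relevant here.
print(is_call_exploitable("parseRequest", {"config.Timeout"}))   # False
# The attacker controls the request content: relevant.
print(is_call_exploitable("parseRequest", {"request.Content"}))  # True
```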
Now we understand the importance, but how do you incorporate DFG in the process of assessing a vulnerability? To be more specific, how can we know that a vulnerability is exploitable from a data flow point of view?
First, we use CxSAST to build a DFG. We start at the vulnerable method and trace each piece of data back to its origin. Eventually we’ll reach one of the following cases:

- A constant value. This is not exploitable, of course.
- An input parameter of a method that is not called by other methods. This is a potential data flow compromise because, in the context of a static code scan, we don’t know how the method is invoked.
- A call to a built-in method of the language, such as open() in Python.
- A call to a method of a different library whose source code is not available.

The last two cases are the most interesting ones, and we handle them with two complementary approaches:

- As a rule of thumb, mark those methods as potential data flow compromises, since the inner implementation is unknown.
- Mark specific methods as definite data flow compromises, for example, methods that read contents from a database, a pipe, or a file. The same goes for parsing HTTP packets, pulling a message from a message queue, etc.

These two approaches are the basis for DFG support in assessing a vulnerability for exploitability.
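
The backtracking described above can be sketched in a few lines of Python. This is a simplified model under our own assumptions, not the CxSAST implementation: the `DataNode` structure, the `Verdict` levels, and the entries in `DEFINITE_SOURCES` are hypothetical, and a real DFG carries far more detail.

```python
from dataclasses import dataclass, field
from enum import Enum


class Verdict(Enum):
    SAFE = 0       # data originates from constants only
    POTENTIAL = 1  # origin unknown: possible data flow compromise
    DEFINITE = 2   # origin is a known source of external data


@dataclass
class DataNode:
    kind: str   # "constant", "parameter", "external_call", or "derived"
    name: str
    inputs: list = field(default_factory=list)  # upstream nodes, used by "derived"


# Hypothetical external methods marked as definite data flow compromises.
DEFINITE_SOURCES = {"http.read_body", "queue.receive", "db.fetch_row"}


def trace_back(node, seen=None):
    """Walk backwards from a value and return the worst verdict among its origins."""
    seen = set() if seen is None else seen
    if id(node) in seen:  # guard against cycles in the graph
        return Verdict.SAFE
    seen.add(id(node))
    if node.kind == "constant":
        return Verdict.SAFE
    if node.kind == "parameter":  # parameter of a method with no known callers
        return Verdict.POTENTIAL
    if node.kind == "external_call":  # built-in or library without source code
        if node.name in DEFINITE_SOURCES:
            return Verdict.DEFINITE
        return Verdict.POTENTIAL
    # "derived": the value is computed from other values, so follow each of them.
    return max(
        (trace_back(upstream, seen) for upstream in node.inputs),
        key=lambda verdict: verdict.value,
        default=Verdict.SAFE,
    )


banner = DataNode("constant", "DEFAULT_BANNER")
body = DataNode("external_call", "http.read_body")
clock = DataNode("external_call", "time.now")
merged = DataNode("derived", "banner + body", [banner, body])

print(trace_back(banner))  # Verdict.SAFE
print(trace_back(clock))   # Verdict.POTENTIAL
print(trace_back(merged))  # Verdict.DEFINITE
```

Taking the worst verdict among all origins is a conservative choice: a value is treated as attacker-influenced as soon as any one of its inputs is.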

Summary

In this blog we covered two additional advanced topics in Exploitable Path. We started with the problem of supporting various library versions, and how this is solved using the in-depth analysis process. Then, we discussed the integration of DFG in the vulnerability evaluation process, and how to backtrack the flow of data in the code.
With CxSCA, Checkmarx enables your organization to address open source vulnerabilities earlier in the SDLC and cut down on manual processes by reducing false positives and background noise, so you can deliver secure software faster and at scale. For a free demonstration of CxSCA, please contact us here.
