Gradiant is part of the BIECO project, which is developing different tools for vulnerability detection in response to the increase in cybersecurity attacks detected in recent years. The boost in teleworking has given attackers a vast field in which to act that is, in most cases, insufficiently protected. Clear examples are the attack on the Spanish National Public Employment Service (SEPE) in 2021, which left its computer system paralysed, and the ransomware that blocked Media Markt’s servers in the middle of the Black Friday campaign.
To guarantee the security of an organisation, and therefore prevent an attacker from breaking into its system, it is essential to assess that security correctly. This is where the CIA triad becomes fundamental: the assessment involves checking compliance with the properties of Confidentiality, Integrity and Availability. The biggest problem we may face when carrying out such an assessment is the appearance of software vulnerabilities.
For this reason, it is important to identify these vulnerabilities at an early stage of the system development life cycle, and to improve the tools and processes used to detect, assess and mitigate them. As part of the BIECO project, Gradiant is developing different tools to enable such assessment in the future. In this article, we explain some of the most widely used techniques for building vulnerability detection tools based on static code analysis.
Vulnerability detection techniques
As a general rule, developers do not notice vulnerabilities during normal system operation, which has prompted the search for methods that help to detect them. These techniques can be classified into three main groups: anomaly detection techniques, vulnerable code pattern recognition techniques and Vulnerability Prediction Models (VPM).
Anomaly detection-based techniques
Anomaly detection-based techniques look for events that are considered unusual, i.e. those that differ from the majority of the data. These techniques focus on a syntactic and semantic analysis of the source code to extract features or rules, from which a model of the normal behaviour of the system is obtained. Most of the studies reviewed use data mining techniques, association rules or even information from previous versions of the code to generate these rules. Although useful for detecting anomalies, such techniques can give false positives, since a deviation from the learned rules, such as unusual behaviour or an inappropriate use of an API, is not necessarily a vulnerability. They also do not report the type of vulnerability detected, so they offer only a general overview of where problems may lie.
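To make the idea of mined rules more concrete, the following is a minimal sketch, not the method of any particular study. It assumes Python source as input and uses a deliberately simple rule shape, "a function that calls A usually also calls B", learned from the code base itself. The names `calls_per_function`, `find_anomalies` and `SAMPLE`, as well as the support and confidence thresholds, are illustrative assumptions.

```python
import ast
from collections import Counter
from itertools import permutations


def calls_per_function(source: str) -> dict[str, set[str]]:
    """Map each function name to the set of plain function names it calls."""
    calls = {}
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, ast.FunctionDef):
            calls[node.name] = {
                call.func.id
                for call in ast.walk(node)
                if isinstance(call, ast.Call) and isinstance(call.func, ast.Name)
            }
    return calls


def find_anomalies(source: str, min_support: int = 3,
                   min_confidence: float = 0.75) -> list[tuple[str, str, str]]:
    """Mine rules 'calls A => calls B' and report the functions that break them."""
    calls = calls_per_function(source)
    # How many functions call each name, and each ordered pair of names.
    single = Counter(name for used in calls.values() for name in used)
    pairs = Counter(p for used in calls.values() for p in permutations(used, 2))
    anomalies = []
    for (a, b), together in pairs.items():
        # Keep only rules that are frequent and hold in most functions.
        if together >= min_support and together / single[a] >= min_confidence:
            anomalies += [(fn, a, b) for fn, used in calls.items()
                          if a in used and b not in used]
    return sorted(anomalies)


# Hypothetical code base: acquire() is normally paired with release().
SAMPLE = """
def f1(): acquire(); work(); release()
def f2(): acquire(); release()
def f3(): acquire(); log(); release()
def f4(): acquire(); work()
"""

print(find_anomalies(SAMPLE))  # [('f4', 'acquire', 'release')]
```

The sketch also shows both limitations mentioned above: `f4` is reported only because it differs from the majority, which may be a false positive, and the report says nothing about what kind of vulnerability, if any, is involved.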
Vulnerable code pattern recognition techniques
Vulnerable code pattern recognition techniques use Machine Learning (ML) to automatically identify code patterns from characteristics extracted from the code. As with anomaly detection, a syntactic and semantic analysis of the code is performed; the difference is that here the models and patterns describe vulnerable code segments rather than normal behaviour. The vast majority of the techniques analysed build a graph representation of the code and convert it into a vector, or embedding, which feeds the selected ML algorithms. This approach provides better results than anomaly detection and allows an approximate localisation of the detected vulnerability. However, the reviewed studies still do not report the type of vulnerability.
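As an illustration of the "code to embedding to ML algorithm" pipeline, here is a small sketch under several assumptions: the input is Python code, the embedding is a simple count of parent-to-child edges in the abstract syntax tree plus the names of called functions, and the learner is a nearest-centroid classifier with cosine similarity, chosen for brevity in place of the models used in the studies reviewed. The training snippets and their labels are toy data invented for this example.

```python
import ast
import math
from collections import Counter


def embed(source: str) -> Counter:
    """Embed code as counts of AST edges (parent type > child type) and call names."""
    features = Counter()
    for parent in ast.walk(ast.parse(source)):
        for child in ast.iter_child_nodes(parent):
            features[f"{type(parent).__name__}>{type(child).__name__}"] += 1
        if isinstance(parent, ast.Call) and isinstance(parent.func, ast.Name):
            features[f"call:{parent.func.id}"] += 1
    return features


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse feature vectors."""
    dot = sum(a[key] * b[key] for key in a.keys() & b.keys())
    norm = (math.sqrt(sum(v * v for v in a.values()))
            * math.sqrt(sum(v * v for v in b.values())))
    return dot / norm if norm else 0.0


def train(samples: list[tuple[str, str]]) -> dict[str, Counter]:
    """Nearest-centroid model: add up the embeddings of each label."""
    centroids: dict[str, Counter] = {}
    for source, label in samples:
        centroids.setdefault(label, Counter()).update(embed(source))
    return centroids


def predict(centroids: dict[str, Counter], source: str) -> str:
    """Return the label whose centroid is most similar to the code's embedding."""
    vector = embed(source)
    return max(centroids, key=lambda label: cosine(centroids[label], vector))


# Toy labelled data (assumption): dynamic evaluation of input is "vulnerable".
TRAINING = [
    ("eval(input())", "vulnerable"),
    ("exec(request_data)", "vulnerable"),
    ('eval(user_text + "1")', "vulnerable"),
    ("int(input())", "safe"),
    ("print(len(name))", "safe"),
    ("total = a + b", "safe"),
]

model = train(TRAINING)
print(predict(model, "eval(raw_data)"))     # vulnerable
print(predict(model, "print(int(value))"))  # safe
```

Because the prediction is made per code fragment, the fragment itself gives the approximate location of the finding, but the label remains a plain vulnerable or safe verdict with no vulnerability type attached.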
Vulnerability Prediction Models
Finally, there are Vulnerability Prediction Models (VPMs), which combine ML algorithms with different software metrics. Techniques used in these models include dependency analysis, code metrics and text mining. VPMs use these metrics to determine which components are most likely to contain a vulnerability, so that those components can be located within the system. Like the previous techniques, these models do not give a precise location or the type of vulnerability detected, but they can be very useful for narrowing down possible problem areas for a later, more detailed analysis.
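A VPM can be sketched in a few lines, assuming two very simple metrics per component (the number of branching constructs as a code metric and the number of imported names as a rough dependency metric) and a logistic regression fitted by gradient descent. The metric choice, the `HISTORY` records of past components and the `COMPONENTS` under analysis are all hypothetical and serve only to show the shape of the approach.

```python
import ast
import math

BRANCH_NODES = (ast.If, ast.For, ast.While, ast.Try, ast.BoolOp)


def metrics(source: str) -> tuple[int, int]:
    """Return (branching constructs, imported names) for one component."""
    nodes = list(ast.walk(ast.parse(source)))
    branches = sum(isinstance(n, BRANCH_NODES) for n in nodes)
    imports = sum(len(n.names) for n in nodes
                  if isinstance(n, (ast.Import, ast.ImportFrom)))
    return branches, imports


def sigmoid(z: float) -> float:
    return 1.0 / (1.0 + math.exp(-z))


def train(history, epochs: int = 2000, lr: float = 0.1):
    """Fit logistic regression weights (w1, w2, b) by batch gradient descent."""
    w1 = w2 = b = 0.0
    n = len(history)
    for _ in range(epochs):
        g1 = g2 = gb = 0.0
        for (x1, x2), label in history:
            error = sigmoid(w1 * x1 + w2 * x2 + b) - label
            g1 += error * x1
            g2 += error * x2
            gb += error
        w1 -= lr * g1 / n
        w2 -= lr * g2 / n
        b -= lr * gb / n
    return w1, w2, b


def rank(components: dict[str, str], model) -> list[tuple[str, float]]:
    """Order components from most to least likely to contain a vulnerability."""
    w1, w2, b = model
    scored = []
    for name, source in components.items():
        x1, x2 = metrics(source)
        scored.append((name, sigmoid(w1 * x1 + w2 * x2 + b)))
    return sorted(scored, key=lambda item: item[1], reverse=True)


# Hypothetical history: ((branches, imports), 1 if a vulnerability was found).
HISTORY = [((0, 1), 0), ((1, 0), 0), ((1, 1), 0),
           ((4, 3), 1), ((5, 2), 1), ((6, 4), 1)]

# Hypothetical components to prioritise.
COMPONENTS = {
    "utils": "def add(a, b):\n    return a + b\n",
    "parser": (
        "import os, sys, re\n"
        "for item in data:\n"
        "    if item and ready:\n"
        "        while busy:\n"
        "            pass\n"
    ),
}

model = train(HISTORY)
for name, risk in rank(COMPONENTS, model):
    print(f"{name}: estimated risk {risk:.2f}")
```

The output is a ranking of whole components, which is exactly the granularity described above: useful for deciding where to look first, but with no line-level location and no vulnerability type.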
A vulnerability detection technique for each objective
As we have seen, there are three main groups of techniques to use when creating vulnerability detection tools. Anomaly detection or VPM techniques are suitable for detecting blocks of source code that are prone to be vulnerable. Anomaly detection techniques can also be used to find bugs and possibly faulty APIs. Meanwhile, techniques based on vulnerable code pattern recognition are the most promising for detecting vulnerabilities in greater detail. There is also a fourth, very interesting line of research: combining vulnerability detection techniques. One example is using pattern recognition methods together with VPMs, so that the VPM narrows down the parts of the code under analysis and the accuracy of the results improves.
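The combined approach can be pictured as a two-stage pipeline. The sketch below is only an outline of that architecture: `risk_score` stands in for a trained VPM and `detect` for a pattern recognition model, and both are replaced here by trivial placeholder functions (line count and a search for `eval(`) that are assumptions made purely so the example runs.

```python
from typing import Callable


def two_stage_scan(
    components: dict[str, str],
    risk_score: Callable[[str], float],
    detect: Callable[[str], list[int]],
    top_k: int = 2,
) -> dict[str, list[int]]:
    """Stage 1 ranks components by risk; stage 2 inspects only the top_k."""
    shortlist = sorted(components,
                       key=lambda name: risk_score(components[name]),
                       reverse=True)[:top_k]
    return {name: detect(components[name]) for name in shortlist}


def placeholder_risk(source: str) -> float:
    """Stand-in for a VPM: longer components are treated as riskier."""
    return float(len(source.splitlines()))


def placeholder_detect(source: str) -> list[int]:
    """Stand-in for a pattern recogniser: line numbers that contain eval(."""
    return [number for number, line in enumerate(source.splitlines(), 1)
            if "eval(" in line]


# Hypothetical components.
COMPONENTS = {
    "api": "x = input()\ny = eval(x)\nprint(y)\n",
    "math": "z = 1 + 1\n",
    "cfg": "import json\ndata = json.loads(text)\n",
}

print(two_stage_scan(COMPONENTS, placeholder_risk, placeholder_detect))
# {'api': [2], 'cfg': []}
```

The design choice is that the expensive, fine-grained detector never sees the components that the cheaper first stage considers unlikely to be vulnerable, here `math`.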
This work was supported by the BIECO project (www.bieco.org) which received funding from the European Union’s Horizon 2020 research and innovation programme under grant agreement No. 952702.
Author: Eva Sotos Martínez, researcher-engineer in the Security & Privacy department at Gradiant