When William Gibson wrote Neuromancer in 1984, years before the World Wide Web was born, he not only anticipated a hyperconnected world, using the term “cyberspace”, but also foresaw the security problems that come with it. In the novel, Case, the antihero and hacker (a “console cowboy” in Gibson’s text), is hired to break the security of a critical system belonging to a large corporation. Technically advanced attacks of this kind, carried out through cyberspace and well funded and sponsored by governments and large corporations, are far from unknown today; in fact, we all know them as Advanced Persistent Threats (APT).
In Gibson’s novel, corporations protect their critical systems with tools that detect and neutralize attacks, known as Intrusion Countermeasures Electronics (ICE). Their present-day equivalents would be IDS (Intrusion Detection Systems) and IPS (Intrusion Prevention Systems). In Neuromancer, the advanced versions of ICE, known as Black ICE, are controlled by Artificial Intelligence and can not only stop attacks but also kill the person responsible for them. Although in cybersecurity the goal is not to kill the attacker but to identify and stop the attack and extract intelligence from it, was Gibson right? How important is artificial intelligence in the cybersecurity field and, in particular, in intrusion detection systems?
Artificial Intelligence in cybersecurity
Cybersecurity refers to the set of techniques used to protect the integrity of networks and nodes against attacks, damage, or unauthorized access or modification. Among the most significant of these technologies are intrusion detection systems (IDS), which exist both for networks and for individual network nodes (hosts). Today there are three kinds of IDS, depending on the underlying detection technology: signature-based, anomaly-based, and hybrid systems.
Signature-based systems are used mainly to detect known attacks, such as a virus that contains a specific file or an SQL injection attack that contains a specific known text string. These systems require their signatures to be updated constantly and, of course, they cannot detect unknown (zero-day) attacks. Anomaly-based systems identify patterns of behaviour that deviate from “normality”. They are more powerful than signature-based systems because they can detect unknown attacks. Another important advantage is that “normality” is defined separately for each system, which makes it harder for attackers to prepare tools that go unnoticed by the IDS. Hybrid systems combine both signature-based and anomaly-based detection. In practice, purely anomaly-based systems are rare.
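To make the difference between the approaches concrete, here is a minimal Python sketch of a hybrid check. The signature list, the single requests-per-minute feature and the three-standard-deviation threshold are illustrative assumptions, not how any particular IDS works; real anomaly-based systems model many features, often with machine learning rather than a single z-score.

```python
import statistics

# Illustrative signatures only: substrings typical of SQL injection probes.
SIGNATURES = ["' OR '1'='1", "UNION SELECT"]


def signature_match(payload: str) -> bool:
    """Signature-based check: True if the payload contains a known attack string.
    By construction it cannot catch an attack that has no signature (a zero-day)."""
    upper = payload.upper()
    return any(sig.upper() in upper for sig in SIGNATURES)


class AnomalyDetector:
    """Anomaly-based check on one numeric feature: learns this system's own
    baseline ("normality") and flags values more than k standard deviations
    away from the baseline mean."""

    def __init__(self, k: float = 3.0) -> None:
        self.k = k
        self.mean = 0.0
        self.std = 0.0

    def fit(self, normal_values: list[float]) -> None:
        self.mean = statistics.mean(normal_values)
        self.std = statistics.pstdev(normal_values)

    def is_anomalous(self, value: float) -> bool:
        if self.std == 0:
            return value != self.mean
        return abs(value - self.mean) / self.std > self.k


def hybrid_alert(payload: str, requests_per_minute: float,
                 detector: AnomalyDetector) -> bool:
    """Hybrid IDS: alert if either the signature check or the anomaly check fires."""
    return signature_match(payload) or detector.is_anomalous(requests_per_minute)


if __name__ == "__main__":
    detector = AnomalyDetector(k=3.0)
    detector.fit([20, 22, 19, 21, 23, 20, 18, 22])  # baseline requests/minute
    print(hybrid_alert("id=1' OR '1'='1", 21, detector))  # True  (signature)
    print(hybrid_alert("id=42", 300, detector))           # True  (anomaly)
    print(hybrid_alert("id=42", 21, detector))            # False
```

Because the baseline is learned from each system's own traffic, the same request rate can be normal on one server and anomalous on another, which is the property the paragraph above credits with making evasion harder.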
Anomaly detection can be carried out with different techniques, such as purely statistical estimation and data mining, but artificial intelligence techniques such as machine learning (ML) are very promising tools. The technology is still maturing, which is why machine learning techniques can still show a high false alarm rate, making the work of security analysts more difficult. Nevertheless, research is advancing quickly in the right direction, steadily improving sensitivity: a high rate of detected anomalies combined with a low rate of false alarms.
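The two figures being traded off here, detection rate (sensitivity) and false alarm rate, can be computed from labelled events as in the short sketch below. The function name and the 0/1 label encoding are our own illustrative choices.

```python
def detection_metrics(y_true: list[int], y_pred: list[int]) -> tuple[float, float]:
    """Return (detection_rate, false_alarm_rate) for binary labels,
    where 1 means attack/alert and 0 means normal/no alert."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)  # attacks detected
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)  # attacks missed
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)  # false alarms
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)  # normal events left alone
    detection_rate = tp / (tp + fn) if tp + fn else 0.0    # sensitivity (recall)
    false_alarm_rate = fp / (fp + tn) if fp + tn else 0.0  # false positive rate
    return detection_rate, false_alarm_rate


if __name__ == "__main__":
    y_true = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0]
    y_pred = [1, 1, 1, 0, 1, 0, 0, 0, 0, 0]
    print(detection_metrics(y_true, y_pred))  # detection rate 0.75, false alarm rate ~0.167
```

A false alarm rate that looks small still means a heavy workload at scale: with a hypothetical 1,000,000 benign events per day and a false alarm rate of 0.1%, analysts would face about 1,000 false alerts per day.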
Red ICE, research at Gradiant
Drawing on Gradiant’s experience in designing machine learning systems and other artificial intelligence solutions, and in developing technologies to improve cybersecurity, our team has just completed a proof of concept of a machine learning engine to validate how ML techniques can be applied to intrusion detection. The first step was to decide which sources of information we would use as input to the system.
The second step was to define a common normalization language into which the different data sources would be transformed. The chosen language is inspired by the Apache Spot Open Data Model, an initiative that is taking the first steps towards a common taxonomy for describing the security telemetry data used to detect threats.
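A normalization step of this kind can be sketched as a set of per-source parsers that all emit the same record type. Both input formats and every field name below are hypothetical; they are not the actual Apache Spot Open Data Model schema, only an illustration of the idea of mapping heterogeneous telemetry into one shape.

```python
from dataclasses import dataclass
from datetime import datetime, timezone


@dataclass
class Event:
    """Hypothetical normalized record; NOT the real Apache Spot ODM schema."""
    timestamp: datetime
    sensor: str      # which data source produced the event
    src_ip: str
    dst_ip: str
    bytes_out: int   # bytes sent by src_ip
    bytes_in: int    # bytes received by src_ip


def from_firewall_csv(line: str) -> Event:
    """Assumed firewall format: epoch_seconds,src_ip,dst_ip,bytes_sent,bytes_received"""
    ts, src, dst, sent, received = line.strip().split(",")
    return Event(datetime.fromtimestamp(int(ts), tz=timezone.utc),
                 "firewall", src, dst, int(sent), int(received))


def from_proxy_json(record: dict) -> Event:
    """Assumed proxy format: {"time": ISO 8601 string, "client", "server", "up", "down"}"""
    return Event(datetime.fromisoformat(record["time"]), "proxy",
                 record["client"], record["server"],
                 int(record["up"]), int(record["down"]))


if __name__ == "__main__":
    events = [
        from_firewall_csv("1494417600,10.0.0.5,198.51.100.20,1200,5400"),
        from_proxy_json({"time": "2017-05-10T12:00:00+00:00", "client": "10.0.0.5",
                         "server": "198.51.100.20", "up": 800, "down": 20000}),
    ]
    for e in events:  # both sources now share one schema
        print(e.sensor, e.timestamp.isoformat(), e.src_ip, e.bytes_out, e.bytes_in)
```

Once every source is mapped into the same record type, the downstream detection algorithms only need to understand one format, and adding a new source means writing one more parser.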
Once normalized, the data can be fed into Red ICE, the machine learning engine designed by Gradiant. This engine incorporates several machine learning algorithms aimed at intrusion detection. One example is an algorithm that models the role of each host in the network, that is, whether the host is primarily a data producer or a data consumer. The algorithm then looks for changes in those roles to detect possible intrusions.
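How Red ICE quantifies a host’s role is not described here. One simple option, assumed purely for illustration, is a producer-consumer ratio per host and time window, (bytes sent - bytes received) / (bytes sent + bytes received), which is +1 for a pure producer and -1 for a pure consumer; a large shift between windows is then flagged as a role change. The flow tuple format and the threshold of 1.0 are hypothetical.

```python
from collections import defaultdict


def pcr_by_host(flows):
    """flows: iterable of (src_ip, dst_ip, nbytes) tuples for one time window.
    Returns a producer-consumer ratio per host in [-1, 1]:
    +1 = only sends data (pure producer), -1 = only receives (pure consumer)."""
    sent = defaultdict(int)
    recv = defaultdict(int)
    for src, dst, nbytes in flows:
        sent[src] += nbytes
        recv[dst] += nbytes
    hosts = set(sent) | set(recv)
    return {h: (sent[h] - recv[h]) / (sent[h] + recv[h])
            for h in hosts if sent[h] + recv[h] > 0}


def role_changes(baseline, current, threshold=1.0):
    """Hosts present in both windows whose ratio moved by more than `threshold`."""
    return sorted(h for h in baseline.keys() & current.keys()
                  if abs(current[h] - baseline[h]) > threshold)


if __name__ == "__main__":
    # Baseline: workstation 10.0.0.5 mostly downloads from server 10.0.0.9.
    baseline = pcr_by_host([("10.0.0.9", "10.0.0.5", 50000),
                            ("10.0.0.5", "10.0.0.9", 1000)])
    # Current window: the same workstation suddenly uploads heavily to an outside host.
    current = pcr_by_host([("10.0.0.5", "203.0.113.7", 80000),
                           ("10.0.0.9", "10.0.0.5", 2000)])
    print(role_changes(baseline, current))  # ['10.0.0.5']
```

In this toy example the workstation flips from consumer to producer, the kind of role change that could indicate data exfiltration, while the server keeps its producer role and is not flagged.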
Another feature supported by Gradiant’s Red ICE is the summarization of event logs. Intrusion detection systems typically raise thousands of alerts every day, and with that volume of information important alerts may go unnoticed by the network administrator. Tuning the IDS configuration or post-processing its output are costly tasks that require expert knowledge. To address this, we applied pattern mining and text clustering techniques to summarize the alerts and highlight the outliers among them.
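The specific pattern mining and text clustering algorithms are not detailed here. As a much simpler stand-in, the sketch below masks variable fields (IP addresses and numbers) so that alerts collapse into templates, counts each template, and reports the rarest ones as candidate outliers. The alert strings and the `rare_max` cut-off are invented for the example.

```python
import re
from collections import Counter

# Mask variable fields so alerts that differ only in them share one template.
_IP = re.compile(r"\b\d{1,3}(?:\.\d{1,3}){3}\b")
_NUM = re.compile(r"\b\d+\b")


def template(alert: str) -> str:
    """Replace IPv4 addresses first, then any remaining numbers, with placeholders."""
    return _NUM.sub("<NUM>", _IP.sub("<IP>", alert))


def summarize(alerts, rare_max=1):
    """Group alerts by template and return (template counts, rare templates).
    Templates seen at most `rare_max` times are reported as candidate outliers."""
    counts = Counter(template(a) for a in alerts)
    rare = [t for t, c in counts.items() if c <= rare_max]
    return counts, rare


if __name__ == "__main__":
    alerts = [
        "SCAN port probe from 10.0.0.7 to 10.0.0.1 port 22",
        "SCAN port probe from 10.0.0.8 to 10.0.0.1 port 23",
        "SCAN port probe from 10.0.0.9 to 10.0.0.1 port 80",
        "POLICY outbound SSH to 203.0.113.7 port 2222",
    ]
    counts, rare = summarize(alerts)
    for tpl, n in counts.most_common():
        print(n, tpl)
    print("outliers:", rare)  # outliers: ['POLICY outbound SSH to <IP> port <NUM>']
```

Three near-identical scan alerts collapse into one line with a count, while the single unusual outbound connection stands out instead of being buried among them.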
The alerts were also analysed with process mining techniques. The algorithm generates process models that represent attack strategies. The incoming data stream can then be analysed in real time to detect an ongoing attack process and react to it before the attack is completed.
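Process mining covers many model types (Petri nets, process trees and others). The sketch below uses the simplest one, a directly-follows graph learned from example attack traces, and then watches a live alert stream per host, warning once a host has made a few consecutive transitions that appear in the model, ideally before the final stage is reached. The alert type names, the traces and `min_steps` are hypothetical, and this is not a description of Red ICE’s actual algorithm.

```python
from collections import defaultdict


def directly_follows(traces):
    """Learn a directly-follows graph from attack traces: counts of each pair
    (a, b) where alert type b immediately follows a in some trace."""
    dfg = defaultdict(int)
    for trace in traces:
        for a, b in zip(trace, trace[1:]):
            dfg[(a, b)] += 1
    return dict(dfg)


class EarlyWarning:
    """Follows each host's alert stream and returns True once the host has made
    `min_steps` consecutive transitions that exist in the learned graph."""

    def __init__(self, dfg, min_steps=2):
        self.dfg = dfg
        self.min_steps = min_steps
        self.last = {}   # host -> previous alert type
        self.steps = {}  # host -> consecutive model-consistent transitions

    def observe(self, host, alert_type):
        prev = self.last.get(host)
        if prev is not None and (prev, alert_type) in self.dfg:
            self.steps[host] = self.steps.get(host, 0) + 1
        else:
            self.steps[host] = 0  # first alert or off-model transition: restart
        self.last[host] = alert_type
        return self.steps[host] >= self.min_steps


if __name__ == "__main__":
    attacks = [["scan", "bruteforce", "login", "exfil"],
               ["scan", "exploit", "login", "exfil"]]
    monitor = EarlyWarning(directly_follows(attacks), min_steps=2)
    stream = [("10.0.0.7", "scan"), ("10.0.0.8", "login"),
              ("10.0.0.7", "bruteforce"), ("10.0.0.8", "scan"),
              ("10.0.0.7", "login")]
    for host, alert in stream:
        if monitor.observe(host, alert):
            print("early warning:", host, "after", alert)  # fires for 10.0.0.7 after login
```

The warning for 10.0.0.7 fires at the "login" stage, one step before the modelled "exfil" stage, which is the point of analysing the stream in real time rather than after the fact.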
The next steps to extend the system’s capabilities focus mainly on adding more information sources, including network flows, network packet captures, antivirus data and system activity logs. The good results obtained in the proof of concept indicate that applying Artificial Intelligence techniques to cybersecurity is worthwhile. To continue this line of research, Gradiant, as part of a strong European consortium, has just applied for European Union funding through the H2020 research programme.
The future
Walking around any computer security fair, it is easy to see that the main vendors, especially emerging ones, are betting on Artificial Intelligence to improve the capabilities of their products, whether antivirus software, intrusion detection systems, security information and event management (SIEM) systems, firewalls, spam filters, etc., although in practice these are usually small features built around the core of their products. Because they can learn, Artificial Intelligence algorithms are well suited to a problem such as cyber-attacks, which constantly evolve to evade detection tools. We therefore seek to reverse the classic cybersecurity paradigm in which defensive measures always lag behind the attackers. The application of AI to cybersecurity is an emerging field, also fuelled, of course, by the marketing buzz around Machine Learning, Deep Learning, etc. We can therefore expect AI to be applied in a whole new generation of products that improve security and reduce the costs of managing it.