NeuroCTI - a custom LLM for CTI - benchmarking, successes, failures and lessons learned (updates)
2024-10-22, Europe - Main Room

LLMs have proven highly practical for summarising and extracting information from unstructured Cyber Threat Intelligence (CTI) reports. However, most models were not trained specifically to understand CTI. We will present a custom LLM, fine-tuned for CTI purposes. Of course, that only makes sense alongside a CTI text benchmark dataset. Creating these two systems is a challenging journey, with setbacks guaranteed. We will share our findings.


(This is an update from the FIRSTCON24 talk)

Over the past year, many CTI practitioners and companies have experimented with LLMs for extracting information from unstructured CTI reports. Often, the dream is to automate the analyst's job of correctly identifying and copying TTPs, threat actors and relationships from a report and converting them into STIX.

Alas, off-the-shelf LLMs often fail at this task (GPT-4-turbo was already pretty good at the time of submission). But there is another caveat: IT security requirements often demand that data remains on-premise, or at least on a virtual server which is fully and exclusively under the control of the organisation's IT team. For that we need local LLMs (as opposed to cloud-based SaaS/FaaS solutions such as the OpenAI API). But how can we achieve good results with local LLMs? Can we beat OpenAI?

To address the CTI text summarisation and information extraction problem, we

  1. propose an open source CTI LLM benchmark dataset which can be used to compare different LLMs and prompts,
  2. present a fine-tuned custom CTI LLM ("neuroCTI"),
  3. evaluate it (as well as other LLMs) against the benchmark dataset and
  4. finally, serve the model via ollama and integrate it with MISP.
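
To make the evaluation step more concrete, here is a minimal sketch of how one might query a locally served model and score its extraction output against a labelled report. It is not the benchmark's actual scoring code: the set-based precision/recall/F1 metric, the function names `query_ollama` and `score_extraction`, and the idea that each benchmark entry carries a list of expected items (for example ATT&CK technique IDs) are all assumptions made for illustration. The only external interface used is Ollama's local `/api/generate` HTTP endpoint on its default port.

```python
import json
import urllib.request

# Ollama's default local endpoint; adjust if the server runs elsewhere.
OLLAMA_URL = "http://localhost:11434/api/generate"


def query_ollama(model, prompt, url=OLLAMA_URL):
    """Send one non-streaming prompt to a locally served model and return its text."""
    payload = json.dumps(
        {"model": model, "prompt": prompt, "stream": False}
    ).encode("utf-8")
    request = urllib.request.Request(
        url, data=payload, headers={"Content-Type": "application/json"}
    )
    with urllib.request.urlopen(request) as response:
        return json.loads(response.read().decode("utf-8"))["response"]


def score_extraction(predicted, expected):
    """Precision, recall and F1 over case-insensitive sets of extracted items.

    Hypothetical metric: the talk does not specify how the benchmark scores answers.
    """
    pred = {p.strip().lower() for p in predicted if p.strip()}
    gold = {e.strip().lower() for e in expected if e.strip()}
    true_positives = len(pred & gold)
    precision = true_positives / len(pred) if pred else 0.0
    recall = true_positives / len(gold) if gold else 0.0
    if precision + recall == 0:
        f1 = 0.0
    else:
        f1 = 2 * precision * recall / (precision + recall)
    return {"precision": precision, "recall": recall, "f1": f1}
```

In use, one would call `query_ollama` with whatever model tag the local server exposes, parse the answer into a list of items, and pass that list together with the expected items to `score_extraction`. Averaging the resulting F1 over all benchmark entries gives one number per model and prompt, which is enough to compare a local model against a cloud-hosted one.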

The model is freely available for local deployments.

See also: slides (8.3 MB)

Aaron likes to be at the forefront of tech developments because he feels it's important to understand trends and tech on a deep level in order to anticipate changes and to shape and guide them in a positive direction which serves humanity. Less dystopia, more positive utopia, please.
In a past life, he worked at CERT.at, the national CERT of Austria. He has also worked on mesh networks and medical AI.