Explainability Should Be a Solved Problem in Security Products
How LLMs finally solve the 'black box' problem for security products
Almost every security product is a decision-making machine. Give it data from a few different sources and it tells you if something is safe, a threat, or somewhere in between. Accuracy is a known problem in security products, but the biggest complaints users have are often not about accuracy at all - they’re about explainability.
Security products are often labelled ‘black boxes’ that struggle to explain why they’ve made a decision. As a result, they’re hard to assess in a trial, difficult to trust day to day, and can be extremely frustrating when things go wrong.
One of the biggest improvements we can make to security products is better explainability. I hear some people say that modern AI (Large Language Models, or LLMs) is going to make explainability worse, but the reality should be the opposite - LLMs should finally be the answer to solving explainability in security products.
Rules → Machine Learning → LLMs
To explain why LLMs solve the explainability problem, first let me give a very quick history of the technology used in security products.
VERY roughly, the technology used to make decisions in security products can be placed into three buckets:
Rules: If-this-then-that decisions that rely on pre-defined logic or lists of known threats. The majority of security products still rely mostly on rule-based engines, even if their marketing says otherwise. Rules are great at catching the most obvious threats with high accuracy (e.g. a malicious URL that has been seen many times before) and provide excellent explainability, because they are so simple (see the rule sketch after this list). However, rules struggle to accurately assess complex issues as they fail to capture the full context.
2010s-era Machine Learning: Probabilistic models trained to predict whether something is a threat based on anywhere from a handful to around a hundred data points. The move to cloud enabled machine learning to become useful in security sometime around the early/mid 2010s, and it has had a significant impact across endpoint, network, and email security in particular. These models were able to capture a lot more context than simple rules and could therefore catch more advanced or novel threats with greater accuracy. However, due to their probabilistic nature, they suffered from significant explainability issues - a score comes out, but no reasoning comes with it (see the model sketch after this list). How can you trust a tool if you don’t know why it’s made a decision? It’s like working with a teammate who will only respond ‘trust me’ when asked why they let something go.
2022+ era ‘AI’ (i.e. Large Language Models): Systems backed by Large Language Models that tackle problems in a more ‘human-like’ way. I’ve heard a lot of concerns about the explainability of newer ‘AI’ products. LLMs are by definition probabilistic, so, on the face of it, it’s natural to be concerned about explainability. The reality should turn out differently: LLMs actually give us the answer to the poor explainability that has plagued machine learning based products in the past.
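To make the contrast between these buckets concrete, here’s a minimal sketch of a rule-based check. The blocklist, rule logic, and message fields are made up purely for illustration, but the key property is real: the decision and the explanation are the same thing - the rule that fired.

```python
# A minimal sketch of a rule-based check: explainability comes for free,
# because the decision is just the rule that fired.
# The blocklist and message fields below are illustrative only.

KNOWN_BAD_DOMAINS = {"evil-login-portal.com", "free-gift-cards.net"}

def evaluate_rules(message: dict) -> tuple[str, str]:
    """Return (verdict, reason) for a message with 'sender_domain' and 'url_domains'."""
    if message["sender_domain"] in KNOWN_BAD_DOMAINS:
        return "block", f"Sender domain {message['sender_domain']} is on the blocklist"
    for domain in message["url_domains"]:
        if domain in KNOWN_BAD_DOMAINS:
            return "block", f"Link to known-bad domain {domain}"
    return "allow", "No rules matched"

verdict, reason = evaluate_rules(
    {"sender_domain": "partner.example.com", "url_domains": ["evil-login-portal.com"]}
)
print(verdict, "-", reason)  # block - Link to known-bad domain evil-login-portal.com
```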
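And here’s a rough sketch of the 2010s-era approach, assuming a scikit-learn style classifier over a handful of hypothetical features. The features and training data are invented purely to show the shape of the problem: the model returns a probability, but no reasoning comes with it.

```python
# A toy version of a 2010s-era detection model: a probabilistic classifier
# trained on a few numeric features. The features and data are made up to
# illustrate the pattern, not a real detection model.
import numpy as np
from sklearn.ensemble import GradientBoostingClassifier

# Hypothetical features per email: [num_links, sender_age_days, has_attachment, spf_pass]
X = np.array([
    [12, 2, 1, 0],
    [1, 900, 0, 1],
    [8, 5, 1, 0],
    [0, 1200, 0, 1],
])
y = np.array([1, 0, 1, 0])  # 1 = threat, 0 = safe

model = GradientBoostingClassifier().fit(X, y)

# The output is just a probability. There is no narrative of *why* this email
# scored what it did - that is the explainability gap described above.
print(model.predict_proba([[10, 3, 1, 0]])[0][1])
```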
LLMs, Agents, & Explainability
LLMs work through next token prediction. In simple terms, you give them an input and they give you the most probable response, written as text.
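As a rough illustration, here’s what a single call looks like using the OpenAI Python SDK. The model name and prompts are placeholders, and it assumes an API key is configured in the environment. The response is just the most probable text continuation, but that text naturally tends to include the model’s reasoning.

```python
# A minimal sketch of a single LLM call via the OpenAI Python SDK.
# Assumes OPENAI_API_KEY is set; the model name is a placeholder.
from openai import OpenAI

client = OpenAI()

response = client.chat.completions.create(
    model="gpt-4o",  # placeholder model name
    messages=[
        {"role": "system", "content": "You are a security analyst. Explain your reasoning."},
        {"role": "user", "content": "Is an email from billing@paypa1-support.com asking the user to 'verify their account' suspicious?"},
    ],
)

# The reply is the most probable text continuation - which, helpfully,
# usually spells out the model's thinking in plain language.
print(response.choices[0].message.content)
```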
For a single call to an LLM, how exactly it decides on each token (~word) can be opaque, but it does explain its thinking clearly in the text it produces.
Take this interaction with ChatGPT - I asked it a vague, open-ended question (as many security questions are) and it explained its thinking clearly. It literally spells it out for us. This verbosity helps with explainability but doesn’t entirely solve the problem. What if the answer was wrong? Do we really get much insight into its thinking? Can we understand how the weights in the model and the training data led to this conclusion?
This is where I think a lot of people are getting confused. They are looking at single calls to LLMs like this (which is how a lot of LLM-based products in security work at the moment) and asking: how can we really understand them?
When you’re only making a single call to an LLM, like in the case above, it’s too easy for the system to make mistakes yet sound highly confident. Most of the time, too much is being asked in a single call to the LLM: the system lacks the time to think, properly consider all the context, and come to a good answer. It’s part of the reason why we’ve seen so many early LLM products in security fall flat.
The real wins come when we move away from single calls to LLMs and towards Agents (more on these here). Agents are pipelines of LLMs that are called one after another without human input. Agents commonly combine 30-40 calls to LLMs and analyze many different data sources. The best agents step methodically through a decision, breaking it down into smaller questions and challenging their own assumptions along the way.
Combine the step-by-step approach of Agents with the verbose nature of LLMs and you get incredible explainability. Understanding why an Agent made a decision is like reading a transcript of a long board meeting - it walks through the thinking step by step, describes all the variables in detail, captures debate from many angles, and explains how it reached a decision. The biggest challenge is actually collapsing all this thinking back down to something a human can quickly understand.
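Here’s a hedged sketch of what that agent pattern can look like in code. The sub-questions, prompts, and helper names are invented for illustration, and a real pipeline would involve far more calls, tools, and data sources, but it shows the shape: decompose, answer, challenge, then collapse the trail into a short verdict.

```python
# A sketch of the agent pattern: break a decision into smaller questions,
# answer each with its own LLM call, challenge the draft conclusion, then
# summarise the full trail. Prompts and helper names are illustrative only.
from openai import OpenAI

client = OpenAI()

def ask(prompt: str) -> str:
    """One LLM call; each prompt and answer becomes part of the audit trail."""
    response = client.chat.completions.create(
        model="gpt-4o",  # placeholder model name
        messages=[{"role": "user", "content": prompt}],
    )
    return response.choices[0].message.content

def investigate(alert: str) -> dict:
    trail = {}  # every intermediate question and answer is kept for explainability

    # 1. Break the decision into smaller questions and answer each one.
    sub_questions = [
        "Who is the sender and do they normally contact this recipient?",
        "What is the email asking the recipient to do?",
        "Do any links or attachments look suspicious, and why?",
    ]
    for question in sub_questions:
        trail[question] = ask(f"Alert context:\n{alert}\n\nQuestion: {question}")

    # 2. Draft a conclusion, then explicitly challenge it.
    draft = ask(f"Given these findings:\n{trail}\n\nIs this a threat? Explain step by step.")
    challenge = ask(f"Here is a draft conclusion:\n{draft}\n\nArgue the opposite case. What might it have missed?")
    trail["draft_conclusion"] = draft
    trail["challenge"] = challenge

    # 3. Collapse the whole trail into something a human can read quickly.
    trail["final_verdict"] = ask(
        f"Findings:\n{trail}\n\nGive a final verdict (threat / safe) and a three-sentence justification."
    )
    return trail

# Example usage (hypothetical alert text):
# report = investigate("User received an email from 'IT Support' linking to reset-password-now.net")
# print(report["final_verdict"])
```

Because every intermediate question and answer is retained, the ‘board meeting transcript’ falls out of the pipeline for free; the final summarisation step is what collapses it back down for a human.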
Explainability Should Soon Be a Solved Problem in Security
We’re still so early with AI in security that things will take time to change. The good news is that LLMs and Agents give us a path towards exceptional explainability as well as high accuracy.
The interesting corollary to the fact that Agents/LLMs provide great explainability is that explainability can be used as a signal to separate fact from fiction in the ‘AI’ marketing claims of security vendors. If a product tells you it’s based on modern AI (i.e. LLMs/Agents) yet it has terrible explainability, it’s a pretty good signal that they are stretching the truth.
Explainability has been one of the most challenging things about building machine learning products for a long time. As LLMs improve explainability, users of security products should be able to assess them more easily during trials, trust them more readily day to day, and spend less time checking their homework. It’ll take some time, but explainability should soon be a solved problem in security products.