Prompt injection is a problem that may never be fixed, warns NCSC

Prompt injection is shaping up to be one of the most stubborn problems in AI security, and the UK’s National Cyber Security Centre (NCSC) has warned that it may never be “fixed” in the way SQL injection was.

Two years ago, the NCSC said prompt injection might turn out to be the “SQL injection of the future.” Apparently, they have come to realize it’s even worse.

Prompt injection works because AI models can’t tell the difference between the app’s instructions and the attacker’s instructions, so they sometimes obey the wrong one.

To avoid this, AI providers set up their models with guardrails: tools that help developers stop agents from doing things they shouldn’t, either intentionally or unintentionally. For example, if you tried to tell an agent to explain how to produce anthrax spores at scale, guardrails would ideally detect that request as undesirable and refuse to acknowledge it.

Getting an AI to go outside those boundaries is often referred to as jailbreaking. Guardrails are the safety systems that try to keep AI models from saying or doing harmful things. Jailbreaking is when someone crafts one or more prompts to get around those safety systems and make the model do what it’s not supposed to do. Prompt injection is a specific way of doing that: An attacker hides their own instructions inside user input or external content, so the model follows those hidden instructions instead of the original guardrails.

The danger grows when Large Language Models (LLMs), like ChatGPT, Claude or Gemini, stop being chatbots in a box and start acting as “autonomous agents” that can move money, read email, or change settings. If a model is wired into a bank’s internal tools, HR systems, or developer pipelines, a successful prompt injection stops being an embarrassing answer and becomes a potential data breach or fraud incident.

We’ve already seen several methods of prompt injection emerge. For example, researchers found that posting embedded instructions on Reddit could potentially get agentic browsers to drain the user’s bank account. Or attackers could use specially crafted dodgy documents to corrupt an AI. Even seemingly harmless images can be weaponized in prompt injection attacks.

Why we shouldn’t compare prompt injection with SQL injection

The temptation to frame prompt injection as “SQL injection for AI” is understandable. Both are injection attacks that smuggle harmful instructions into something that should have been safe. But the NCSC stresses that this comparison is dangerous if it leads teams to assume that a similar one‑shot fix is around the corner.

The comparison to SQL injection attacks alone was enough to make me nervous. The first documented SQL injection exploit was in 1998 by cybersecurity researcher Jeff Forristal, and we still see them today, 27 years later.

SQL injection became manageable because developers could draw a firm line between commands and untrusted input, and then enforce that line with libraries and frameworks. With LLMs, that line simply does not exist inside the model: Every token is fair game for interpretation as an instruction. That is why the NCSC believes prompt injection may never be totally mitigated and could drive a wave of data breaches as more systems plug LLMs into sensitive back‑ends.

Does this mean we have set up our AI models wrong? Maybe. Under the hood of an LLM, there’s no distinction made between data or instructions; it simply predicts the most likely next token from the text so far. This can lead to “confused deputy attacks.”

The NCSC warns that as more organizations bolt generative AI onto existing applications without designing for prompt injection from the start, the industry could see a surge of incidents similar to the SQL injection‑driven breaches of 10—15 years ago. Possibly even worse, because the possible failure modes are uncharted territory for now.

What can users do?

The NCSC provides advice for developers to reduce the risks of prompt injection. But how can we, as users, stay safe?

Take advice provided by AI agents with a grain of salt. Double-check what they’re telling you, especially when it’s important.
Limit the powers you provide to agentic browsers or other agents. Don’t let them handle large financial transactions or delete files. Take warning from this story where a developer found their entire D drive deleted.
Only connect AI assistants to the minimum data and systems they truly need, and keep anything that would be catastrophic to lose out of their control.
Treat AI‑driven workflows like any other exposed surface and log interactions so unusual behavior can be spotted and investigated.

We don’t just report on threats—we help safeguard your entire digital identity

Cybersecurity risks should never spread beyond a headline. Protect your, and your family’s, personal information by using identity protection.