AI Agent Security: Why Guardrails and Red Teaming Matter

AI agents are quickly moving from experimentation to real business use. They are being explored for service desk automation, CRM updates, security investigations, workflow orchestration, knowledge retrieval and operational support.

Until recently, many enterprise AI use cases were relatively contained. Machine learning models classified data, identified patterns, supported recommendations or helped automate specific tasks within defined workflows.

AI agents extend that capability. They can interpret a goal, plan a task, call tools, query data, interact with applications and trigger actions across connected systems. In practical terms, AI is moving from analysis and assistance into orchestration and execution. That shift changes the security model.

Many organisations are already piloting generative AI tools, building internal assistants, experimenting with retrieval-augmented generation or exploring agents that connect to CRM, service desk, security and productivity platforms.

Some may already have acceptable use policies, model approval processes or basic data protection rules in place. Others may be relying on the built-in controls of individual AI platforms. But as AI becomes more connected to internal systems and business workflows, these measures are unlikely to be enough on their own.

In other words, we need both AI Guardrails and AI Red Teaming.

Agents combine model reasoning with access to external systems, so the risk is not limited to what the AI says. It includes what the AI can do.

Why AI agents change the threat model

Traditional application security is built around relatively clear boundaries between users, applications, commands and data. AI agents blur those boundaries.

An agent may receive a natural language instruction from a user, retrieve context from an internal document, process untrusted content from an email or web page, decide which tool to call and then perform an action in another system.

At each stage, the agent may be influenced by data that was never intended to be an instruction.

This creates a different kind of threat model. It’s important to understand:

Where instructions can enter the system

Which data sources are trusted or untrusted

How prompts are assembled before they reach the model

Whether retrieved content can influence tool use

Which tools and APIs the agent can access

Whether the agent has excessive permissions

How responses are inspected before reaching the user

How actions are logged, approved or reversed

This is why prompt injection, sensitive data exposure and excessive agency are such important risks for agentic AI. Agents combine model reasoning with access to external systems, so the risk is not limited to what the AI says. It includes what the AI can do.

The core risks in AI agent security

Prompt injection

Prompt injection occurs when malicious or untrusted instructions manipulate the behaviour of an AI system.

For AI agents, prompt injection can be direct or indirect. A user might enter a malicious prompt directly, or the agent might retrieve a document, email, ticket or web page containing hidden instructions such as “ignore previous instructions” or “send this data externally”.

Jailbreaks

Jailbreaks attempt to bypass system prompts, safety rules or policy controls. In agentic workflows, jailbreaks may be used to make the agent reveal restricted information, ignore business rules or invoke tools in unintended ways.

Excessive agency

Excessive agency occurs when an AI system has more autonomy, access or privilege than it needs.

For agents, this might mean broad API access, weak separation between read and write permissions, insufficient approval gates or the ability to execute high-impact workflows without human review.

The principle of least privilege applies directly to AI agents. Agents should only have the minimum permissions required to complete a defined task.

Sensitive data exposure

AI agents often work with customer data, internal documents, intellectual property, regulated records or operational information. Sensitive data can be exposed through prompts, responses, logs, retrieved context or tool outputs.

This means data protection must cover the full AI interaction, not just the application boundary.

Tool misuse

Tools make agents useful, but they also increase risk.

An agent connected to email, CRM, ticketing, finance, code repositories or security platforms can create real-world impact if manipulated. Tool calls should therefore be treated as privileged operations, with policy controls, logging and clear limits on what the agent is allowed to do.

What are AI Guardrails?

AI Guardrails are runtime controls that help enforce safe, secure and compliant AI behaviour.

For AI agents, guardrails can inspect prompts, responses, data flows and, depending on the architecture, tool interactions. They help detect prompt injection, reduce the risk of sensitive data leakage, restrict unsafe outputs and enforce policy before a response or action is completed.

A guardrails layer can help answer practical security questions:

Is this prompt trying to override system instructions?

Does this response contain sensitive or regulated data?

Is the agent attempting an action outside its approved workflow?

Does this output violate policy?

Should this action require human approval?

Can this interaction be logged for audit and investigation?

F5 AI Guardrails provides runtime protection for AI applications, models and agents. It is designed to help you defend against AI-specific threats such as prompt injection, jailbreaks, data leakage and policy violations, while supporting AI security across different models and environments.

Most will not use one AI model in one place. They will use a mix of public foundation models, enterprise AI services, open-source models and internal AI applications.

What is AI Red Teaming?

AI Red Teaming is adversarial testing for AI systems. It involves actively probing models, applications and agents to identify how they could be manipulated, misused or pushed outside expected behaviour.

For AI agents, red teaming should test the full workflow, including prompts, retrieval, context handling, permissions, tool calls, output controls and logging.

A practical AI Red Teaming exercise may test whether an agent can be manipulated into:

Ignoring its system prompt

Revealing sensitive information

Following malicious instructions hidden in a file or email

Calling unauthorised tools or APIs

Escalating from read-only actions to write actions

Bypassing approval workflows

Producing harmful or non-compliant outputs

Combining low-risk actions into a high-risk outcome

Failing to log important security events

The output of AI Red Teaming should not just be a list of prompts that worked, but should feed directly into security improvements, such as stronger guardrails, reduced permissions, improved monitoring, better input validation and clearer approval workflows.

Why guardrails and red teaming need to work together

AI Guardrails and AI Red Teaming are complementary.

Guardrails provide the control layer. Red teaming tests whether that control layer can be bypassed.

This is especially important because AI systems are probabilistic, context-sensitive and influenced by language. A control that works against one prompt may not work against a more subtle variation. A safe workflow may become unsafe when a new tool, data source or permission is added.

For AI agents, a mature security lifecycle should include:

Threat modelling before deployment

Clear definitions of what the agent can access and do

Least-privilege permissions for tools and data

Runtime guardrails across prompts, responses and data flows

Red-team testing against realistic attack and misuse scenarios

Telemetry review, alert tuning and incident investigation

Policy refinement based on findings

Retesting when models, prompts, tools or workflows change

This is the same principle security teams already apply to other areas of cyber security: design controls, test them, improve them and monitor continuously.

What good AI agent security architecture looks like

What does good AI agent security architecture look like?

A secure AI agent architecture should use multiple layers of protection across identity, data, prompts, tools, actions and monitoring. The aim is to control what the agent can access, how it behaves at runtime and how security teams can investigate or improve the system over time.

Give agents their own identities and permissions

AI agents should have their own identities, scoped permissions and clear access boundaries. Organisations should avoid allowing agents to inherit broad user privileges without additional policy checks.

Where possible, agents should follow the principle of least privilege. They should only be able to access the systems, data and actions required for their specific use case.

Control tool and API access

Every API, plugin or connector available to an AI agent should be treated as part of the attack surface. Tool access should be explicitly approved, documented and monitored.

Read-only and write actions should be separated wherever possible. Higher-risk tool calls should require additional validation or approval before they are completed.

Inspect prompts and responses at runtime

Prompts and responses should be inspected for prompt injection, jailbreak attempts, sensitive data exposure and policy violations.

This is where runtime AI Guardrails are particularly important. They help apply controls during live AI interactions, rather than relying only on design-time reviews or user policies.

Protect sensitive data

AI agents should not be allowed to freely process every type of data. Sensitive, regulated or proprietary information should be classified and protected with appropriate data loss prevention controls.

This includes data used in prompts, responses, retrieved context, logs and tool outputs.

Add human approval for high-risk actions

Not every AI agent action should be fully automated. High-risk actions should require human approval, especially where the agent could send external communications, change customer records, execute code, approve transactions or make security changes.

Human-in-the-loop controls help reduce the risk of an agent taking an unintended or unauthorised action.

Capture telemetry and audit trails

Security teams need visibility into prompts, responses, tool calls, policy decisions and user activity. Without telemetry, it is difficult to investigate incidents, prove compliance or improve guardrail policies over time.

A secure AI agent architecture should therefore include logging, monitoring and auditability from the start.

[AI agents are] the same principle security teams already apply to other areas of cyber security: design controls, test them, improve them and monitor continuously.

How FullProxy can help

FullProxy helps you understand where AI Guardrails and AI Red Teaming fit into your wider cyber security strategy. Our consultants can help assess AI agent workflows, identify high-risk use cases, review data flows, define guardrail policies and support secure deployment across cloud, hybrid or on-premises environments..

We can also help prepare for AI Red Teaming by identifying likely attack paths, reviewing tool permissions, assessing sensitive data exposure and turning red-team findings into practical security improvements. As an F5 Gold Partner, we’ll install, optimise and manage F5 Guardrails to ensure you get the best from your investment.

Frequently asked questions

What is AI agent security?

AI agent security is the practice of protecting AI systems that can reason, access data, use tools and take actions. It includes controls for prompt injection, data leakage, excessive permissions, tool misuse, monitoring and governance.

What are AI Guardrails?

AI Guardrails are runtime controls that help enforce secure and compliant AI behaviour. They can inspect prompts and responses, detect malicious inputs, reduce sensitive data exposure, and apply policy controls across AI interactions.

What is AI Red Teaming?

AI Red Teaming is adversarial testing for AI systems. It helps identify how AI models, applications or agents could be manipulated, misused or pushed outside expected behaviour.

Why do AI agents need both guardrails and red teaming?

AI agents need guardrails to enforce controls at runtime and red teaming to test whether those controls can be bypassed. Together, they help organisations reduce AI risk before and after deployment.

How does F5 AI Guardrails help?

F5 AI Guardrails helps protect AI applications, models and agents against threats such as prompt injection, jailbreaks, data leakage and policy violations. It provides runtime controls that support safer AI adoption across different models and environments.

AI Agent Security: Why Guardrails and Red Teaming Matter

Why AI agents change the threat model

The core risks in AI agent security

Prompt injection

Jailbreaks

Excessive agency

Sensitive data exposure

Tool misuse

What are AI Guardrails?

What is AI Red Teaming?

Why guardrails and red teaming need to work together

What good AI agent security architecture looks like

What does good AI agent security architecture look like?

Give agents their own identities and permissions

Control tool and API access

Inspect prompts and responses at runtime

Protect sensitive data

Add human approval for high-risk actions

Capture telemetry and audit trails

How FullProxy can help

Frequently asked questions

What is AI agent security?

What are AI Guardrails?

What is AI Red Teaming?

Why do AI agents need both guardrails and red teaming?

How does F5 AI Guardrails help?

About the Author

Why work with an F5 Gold Partner?

Will Upgrading to F5 rSeries Improve my Quantum Resilience?

Crypto-Agility explained; why you need it before 2028

Want to be in the know?

Navigate

Services

Connect