AI agents are quickly moving from experimentation to real business use. They are being explored for service desk automation, CRM updates, security investigations, workflow orchestration, knowledge retrieval and operational support.

Until recently, many enterprise AI use cases were relatively contained. Machine learning models classified data, identified patterns, supported recommendations or helped automate specific tasks within defined workflows.
AI agents extend that capability. They can interpret a goal, plan a task, call tools, query data, interact with applications and trigger actions across connected systems. In practical terms, AI is moving from analysis and assistance into orchestration and execution. That shift changes the security model.
Many organisations are already piloting generative AI tools, building internal assistants, experimenting with retrieval-augmented generation or exploring agents that connect to CRM, service desk, security and productivity platforms.
Some may already have acceptable use policies, model approval processes or basic data protection rules in place. Others may be relying on the built-in controls of individual AI platforms. But as AI becomes more connected to internal systems and business workflows, these measures are unlikely to be enough on their own.
In other words, we need both AI Guardrails and AI Red Teaming.
Agents combine model reasoning with access to external systems, so the risk is not limited to what the AI says. It includes what the AI can do.
Why AI agents change the threat model
Traditional application security is built around relatively clear boundaries between users, applications, commands and data. AI agents blur those boundaries.
An agent may receive a natural language instruction from a user, retrieve context from an internal document, process untrusted content from an email or web page, decide which tool to call and then perform an action in another system.
At each stage, the agent may be influenced by data that was never intended to be an instruction.
This creates a different kind of threat model. It’s important to understand:
- Where instructions can enter the system
- Which data sources are trusted or untrusted
- How prompts are assembled before they reach the model
- Whether retrieved content can influence tool use
- Which tools and APIs the agent can access
- Whether the agent has excessive permissions
- How responses are inspected before reaching the user
- How actions are logged, approved or reversed
This is why prompt injection, sensitive data exposure and excessive agency are such important risks for agentic AI. Agents combine model reasoning with access to external systems, so the risk is not limited to what the AI says. It includes what the AI can do.
The core risks in AI agent security
Prompt injection
Prompt injection occurs when malicious or untrusted instructions manipulate the behaviour of an AI system.
For AI agents, prompt injection can be direct or indirect. A user might enter a malicious prompt directly, or the agent might retrieve a document, email, ticket or web page containing hidden instructions such as “ignore previous instructions” or “send this data externally”.
Jailbreaks
Jailbreaks attempt to bypass system prompts, safety rules or policy controls. In agentic workflows, jailbreaks may be used to make the agent reveal restricted information, ignore business rules or invoke tools in unintended ways.
Excessive agency
Excessive agency occurs when an AI system has more autonomy, access or privilege than it needs.
For agents, this might mean broad API access, weak separation between read and write permissions, insufficient approval gates or the ability to execute high-impact workflows without human review.
The principle of least privilege applies directly to AI agents. Agents should only have the minimum permissions required to complete a defined task.
Sensitive data exposure
AI agents often work with customer data, internal documents, intellectual property, regulated records or operational information. Sensitive data can be exposed through prompts, responses, logs, retrieved context or tool outputs.
This means data protection must cover the full AI interaction, not just the application boundary.
Tool misuse
Tools make agents useful, but they also increase risk.
An agent connected to email, CRM, ticketing, finance, code repositories or security platforms can create real-world impact if manipulated. Tool calls should therefore be treated as privileged operations, with policy controls, logging and clear limits on what the agent is allowed to do.
What are AI Guardrails?
AI Guardrails are runtime controls that help enforce safe, secure and compliant AI behaviour.
For AI agents, guardrails can inspect prompts, responses, data flows and, depending on the architecture, tool interactions. They help detect prompt injection, reduce the risk of sensitive data leakage, restrict unsafe outputs and enforce policy before a response or action is completed.
A guardrails layer can help answer practical security questions:
- Is this prompt trying to override system instructions?
- Does this response contain sensitive or regulated data?
- Is the agent attempting an action outside its approved workflow?
- Does this output violate policy?
- Should this action require human approval?
- Can this interaction be logged for audit and investigation?
F5 AI Guardrails provides runtime protection for AI applications, models and agents. It is designed to help you defend against AI-specific threats such as prompt injection, jailbreaks, data leakage and policy violations, while supporting AI security across different models and environments.
Most will not use one AI model in one place. They will use a mix of public foundation models, enterprise AI services, open-source models and internal AI applications.
What is AI Red Teaming?
AI Red Teaming is adversarial testing for AI systems. It involves actively probing models, applications and agents to identify how they could be manipulated, misused or pushed outside expected behaviour.
For AI agents, red teaming should test the full workflow, including prompts, retrieval, context handling, permissions, tool calls, output controls and logging.
A practical AI Red Teaming exercise may test whether an agent can be manipulated into:
- Ignoring its system prompt
- Revealing sensitive information
- Following malicious instructions hidden in a file or email
- Calling unauthorised tools or APIs
- Escalating from read-only actions to write actions
- Bypassing approval workflows
- Producing harmful or non-compliant outputs
- Combining low-risk actions into a high-risk outcome
- Failing to log important security events
The output of AI Red Teaming should not just be a list of prompts that worked, but should feed directly into security improvements, such as stronger guardrails, reduced permissions, improved monitoring, better input validation and clearer approval workflows.
Why guardrails and red teaming need to work together
AI Guardrails and AI Red Teaming are complementary.
Guardrails provide the control layer. Red teaming tests whether that control layer can be bypassed.
This is especially important because AI systems are probabilistic, context-sensitive and influenced by language. A control that works against one prompt may not work against a more subtle variation. A safe workflow may become unsafe when a new tool, data source or permission is added.
For AI agents, a mature security lifecycle should include:
- Threat modelling before deployment
- Clear definitions of what the agent can access and do
- Least-privilege permissions for tools and data
- Runtime guardrails across prompts, responses and data flows
- Red-team testing against realistic attack and misuse scenarios
- Telemetry review, alert tuning and incident investigation
- Policy refinement based on findings
- Retesting when models, prompts, tools or workflows change
This is the same principle security teams already apply to other areas of cyber security: design controls, test them, improve them and monitor continuously.
What good AI agent security architecture looks like
What does good AI agent security architecture look like?
A secure AI agent architecture should use multiple layers of protection across identity, data, prompts, tools, actions and monitoring. The aim is to control what the agent can access, how it behaves at runtime and how security teams can investigate or improve the system over time.
Give agents their own identities and permissions
AI agents should have their own identities, scoped permissions and clear access boundaries. Organisations should avoid allowing agents to inherit broad user privileges without additional policy checks.
Where possible, agents should follow the principle of least privilege. They should only be able to access the systems, data and actions required for their specific use case.
Control tool and API access
Every API, plugin or connector available to an AI agent should be treated as part of the attack surface. Tool access should be explicitly approved, documented and monitored.
Read-only and write actions should be separated wherever possible. Higher-risk tool calls should require additional validation or approval before they are completed.
Inspect prompts and responses at runtime
Prompts and responses should be inspected for prompt injection, jailbreak attempts, sensitive data exposure and policy violations.
This is where runtime AI Guardrails are particularly important. They help apply controls during live AI interactions, rather than relying only on design-time reviews or user policies.
Protect sensitive data
AI agents should not be allowed to freely process every type of data. Sensitive, regulated or proprietary information should be classified and protected with appropriate data loss prevention controls.
This includes data used in prompts, responses, retrieved context, logs and tool outputs.
Add human approval for high-risk actions
Not every AI agent action should be fully automated. High-risk actions should require human approval, especially where the agent could send external communications, change customer records, execute code, approve transactions or make security changes.
Human-in-the-loop controls help reduce the risk of an agent taking an unintended or unauthorised action.
Capture telemetry and audit trails
Security teams need visibility into prompts, responses, tool calls, policy decisions and user activity. Without telemetry, it is difficult to investigate incidents, prove compliance or improve guardrail policies over time.
A secure AI agent architecture should therefore include logging, monitoring and auditability from the start.
[AI agents are] the same principle security teams already apply to other areas of cyber security: design controls, test them, improve them and monitor continuously.
How FullProxy can help
FullProxy helps you understand where AI Guardrails and AI Red Teaming fit into your wider cyber security strategy. Our consultants can help assess AI agent workflows, identify high-risk use cases, review data flows, define guardrail policies and support secure deployment across cloud, hybrid or on-premises environments..
We can also help prepare for AI Red Teaming by identifying likely attack paths, reviewing tool permissions, assessing sensitive data exposure and turning red-team findings into practical security improvements. As an F5 Gold Partner, we’ll install, optimise and manage F5 Guardrails to ensure you get the best from your investment.
Frequently asked questions
What is AI agent security?
AI agent security is the practice of protecting AI systems that can reason, access data, use tools and take actions. It includes controls for prompt injection, data leakage, excessive permissions, tool misuse, monitoring and governance.
What are AI Guardrails?
AI Guardrails are runtime controls that help enforce secure and compliant AI behaviour. They can inspect prompts and responses, detect malicious inputs, reduce sensitive data exposure, and apply policy controls across AI interactions.
What is AI Red Teaming?
AI Red Teaming is adversarial testing for AI systems. It helps identify how AI models, applications or agents could be manipulated, misused or pushed outside expected behaviour.
Why do AI agents need both guardrails and red teaming?
AI agents need guardrails to enforce controls at runtime and red teaming to test whether those controls can be bypassed. Together, they help organisations reduce AI risk before and after deployment.
How does F5 AI Guardrails help?
F5 AI Guardrails helps protect AI applications, models and agents against threats such as prompt injection, jailbreaks, data leakage and policy violations. It provides runtime controls that support safer AI adoption across different models and environments.