Why “Universal Guardrails” for AI Agents Sound Appealing But Rarely Work in Practice

Why guardrails that work inside a single system fall short in real agent workflows, and how risk emerges across systems as agents move, connect, and carry context.

Across security and engineering teams, the same question is being asked with increasing urgency: how do we put guardrails around AI agents?

The concern is well placed. In most security architectures, guardrails are a foundational control. They define boundaries, enforce policy, and provide a reliable way to constrain how systems behave. If a workflow passes through a known control point, guardrails can inspect activity and prevent actions that fall outside acceptable limits.

It is natural to apply that model to agentic systems.

Organizations want a way to ensure that agents operate within clear boundaries, that tool usage is controlled, and that behavior remains aligned with policy as these systems move into production.

The difficulty is that agents do not operate within a single control plane. Their behavior extends across tools, systems, and environments, which changes where and how guardrails need to apply. The question is no longer simply how to define guardrails, but how to ensure they reflect how agents actually behave in practice.

A common view is that stronger guardrails at the model or orchestration layer will address most of the risk in agentic systems. That holds within contained environments. Guardrails can constrain tool usage, enforce policy, and reduce obvious failure modes.

The limitation is where they are applied.

This model assumes agent behavior is fully expressed within a single system. In practice, agents operate across multiple systems, combining context, calling external tools, and carrying decisions forward across steps. The behavior that matters often emerges outside the point where those guardrails are defined. The issue is not whether guardrails work. It is whether they are positioned where behavior actually occurs.

Guardrails in Theory and in Practice

Architectures for AI agents are often drawn in a reassuringly simple way.

A central orchestration layer sits in the middle. Prompts enter from one side. Tools sit on the other. Guardrails surround the system to ensure the agent behaves safely.

The model is logical. If every decision passes through a single orchestrator, guardrails can inspect prompts, evaluate tool calls, and enforce policies before actions occur.

User Prompt → Agent Reasoning → Orchestrator → Guardrails → Tool Call → Response

Within a contained system, this approach works well.

For example, a coding agent reviewing pull requests inside a development platform may be restricted to reading repository code, running predefined tests, and generating suggested changes. Guardrails in the orchestration layer can ensure the agent only calls approved tools, cannot modify protected branches, and cannot access external services. Because the workflow remains inside the development environment, those controls can reliably govern behavior.
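A minimal sketch of what such an orchestration-layer check might look like is shown below. The tool names, branch names, and the ToolCall shape are hypothetical and chosen only for illustration; real platforms express these rules in their own configuration.

```python
from dataclasses import dataclass

# Hypothetical tool and branch names, for illustration only.
APPROVED_TOOLS = {"read_repo", "run_tests", "suggest_change"}
PROTECTED_BRANCHES = {"main", "release"}


@dataclass(frozen=True)
class ToolCall:
    tool: str
    branch: str = ""
    writes: bool = False    # True if the call would modify the branch
    external: bool = False  # True if the call leaves the development environment


def check_tool_call(call: ToolCall) -> tuple[bool, str]:
    """Return (allowed, reason) for one tool call inside one system."""
    if call.external:
        return False, "external services are not reachable from this environment"
    if call.tool not in APPROVED_TOOLS:
        return False, f"tool '{call.tool}' is not on the approved list"
    if call.writes and call.branch in PROTECTED_BRANCHES:
        return False, f"branch '{call.branch}' is protected"
    return True, "allowed"
```

Note that the check sees one call at a time, inside one environment. That is enough here because the whole workflow stays in that environment.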

The gap is not in how these guardrails are implemented. It is in what they are able to see.

They govern a single execution path, while real agent behavior unfolds across many.

Architecture and Reality

Most enterprise agent systems do not remain inside a single platform.

A workflow that begins in a development environment may extend into SaaS applications, cloud infrastructure, APIs, and external services. Each system enforces its own controls, but the workflow itself spans several environments.

Agents are often described as platform-bound. In practice, they are anything but.

This transition rarely happens all at once. An agent starts with a narrow role and a small set of tools. As it proves useful, capabilities are added. New integrations are introduced. More context is made available to improve performance.

Each step is reasonable. Each change improves utility.

Over time, the agent’s effective scope expands beyond the system where it was originally defined. Context retrieved in one environment influences behavior in another. Actions taken in one system trigger consequences in the next.

From the perspective of any individual platform, the behavior appears valid. Across the full workflow, the path becomes harder to reason about.

Figure: an agent workflow crossing four systems, each with a guardrail of local scope only.

User / Trigger: Email · PR · Ticket · Alert
Agent (Origin System): IDE · Copilot · Automation Platform (no chain-level view)
System 1: Dev Environment / Repo (reads code, config). Guardrail: local scope only (partial context).
System 2: Cloud / CI Pipeline (runs tests, builds). Guardrail: local scope only (no shared state).
System 3: SaaS / Copilot / CRM (writes, updates, communicates). Guardrail: local scope only (context gap).
System 4: External API / Service (debugging, summarization, enrichment). Guardrail: local scope only.
Action / Output: delivered in a different system.

There is no single point of failure. The system evolves from contained to distributed without a clear moment where governance is reconsidered.

Guardrails Do Not Travel

Guardrails are defined within systems. They are not designed to operate across them. Agent workflows, however, routinely cross those boundaries.

Different frameworks implement guardrails in different ways. Some rely on prompt constraints. Others inspect tool calls. Some operate as middleware, while others exist primarily during development.

In a coding environment, an agent running through tools like Claude Code typically relies on prompt-level constraints, repository scoping, and local configuration. Enforcement depends heavily on how the agent interprets instructions in that context.

In a cloud or SaaS environment, such as a Copilot built in Microsoft Copilot Studio, guardrails are enforced through identity, connectors, and predefined action scopes tied to services like Microsoft Graph or internal APIs. Control is strongest within that ecosystem.

Both approaches are effective within their respective boundaries.

The challenge appears when a workflow spans both.

An agent may generate output in a development environment that is passed into a SaaS copilot. A cloud-based agent may call external APIs or trigger developer workflows. Each system enforces its own controls, but those controls do not extend across the full sequence of actions.

Even when guardrails are correctly implemented in each system, they remain blind to how decisions connect across them.

The result is fragmented governance. What is constrained in one environment may be unconstrained in the next, even though the workflow is continuous.

Where This Appears in Practice

The gap becomes clearer when looking at how agents operate in real environments.

In development workflows, a coding agent may retrieve code from a repository, run validation checks, trigger a build pipeline, and call an external debugging service. Each action is authorized, yet the workflow extends beyond the environment where guardrails were originally defined. Context from the repository may appear in a debugging request sent to an external service. The request is legitimate. The data flow is not always intended.

In business workflows, an agent handling a customer request may retrieve account data, reference internal documentation, and call an external service to summarize or transform the response. Each step is permitted. If internal context is carried into that external call, sensitive information may leave the organization without any single control being violated.

These outcomes do not come from a single incorrect action. They emerge from how agents combine context, tools, and decisions across systems.
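The business-workflow case can be sketched in a few lines. The example below is an illustration under stated assumptions, not a description of any product: the Step record, the system names, and the context labels are all hypothetical, and it assumes each step can report which context it read and which context it sent. Every step passes its own system's check, yet following the chain shows internal context reaching an external call.

```python
from dataclasses import dataclass, field


@dataclass
class Step:
    system: str
    action: str
    external: bool = False
    reads: set = field(default_factory=set)  # context labels picked up in this step
    sends: set = field(default_factory=set)  # context labels included in this step's request


def locally_allowed(step: Step, allowed: dict) -> bool:
    """Per-system check: is this action permitted in this system?"""
    return step.action in allowed.get(step.system, set())


def cross_system_leaks(steps):
    """Return (system, action, labels) for each external step that sends
    context read earlier in the same workflow."""
    seen = set()
    findings = []
    for step in steps:
        if step.external:
            leaked = step.sends & seen
            if leaked:
                findings.append((step.system, step.action, sorted(leaked)))
        seen |= step.reads
    return findings


# Hypothetical systems, actions, and labels.
ALLOWED = {
    "crm": {"get_account"},
    "wiki": {"read_doc"},
    "summarizer_api": {"summarize"},
}
workflow = [
    Step("crm", "get_account", reads={"account_data"}),
    Step("wiki", "read_doc", reads={"internal_docs"}),
    Step("summarizer_api", "summarize", external=True,
         sends={"account_data", "draft_reply"}),
]

print(all(locally_allowed(s, ALLOWED) for s in workflow))  # True
print(cross_system_leaks(workflow))
# [('summarizer_api', 'summarize', ['account_data'])]
```

The per-system check and the chain-level check answer different questions. The first asks whether an action was allowed; the second asks what a sequence of allowed actions carried with it.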

Where Guardrails Stop Providing Full Coverage

Guardrails remain effective within the environments where they are defined.

They can validate prompts, constrain tool usage, and enforce clear boundaries inside a given system. That remains necessary. The limitation appears when workflows extend beyond those boundaries.

Enterprise agents routinely interact with external APIs, cloud services, and specialized tools. Once that happens, no single control point governs the entire sequence of actions.

The guardrails continue to function locally. They simply do not capture how the workflow unfolds across systems. From a security perspective, the system appears controlled while behavior remains only partially understood.

The question becomes less about whether an action was allowed, and more about how a series of allowed actions produced an outcome.

Governance Must Follow the Workflow

As agents move into real enterprise use, governance needs to reflect how they actually operate.

Agents act across development environments, SaaS platforms, cloud infrastructure, and external services. Each environment enforces its own controls, but the behavior that matters emerges across them.

Security teams need to understand how decisions unfold step by step, which tools are invoked, how context moves between systems, and where authority is exercised.
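One way to picture that requirement is a trace that follows the workflow rather than any one system. The sketch below is a rough illustration only: the TraceEvent fields, the WorkflowTrace class, and the idea of a shared workflow ID passed between systems are assumptions made for the example, not an established standard or a specific vendor's format.

```python
import json
import uuid
from dataclasses import dataclass, field, asdict
from typing import List


@dataclass
class TraceEvent:
    workflow_id: str  # shared across every system the workflow touches
    step: int         # position in the sequence of decisions
    system: str       # where the action happened
    tool: str         # which tool was invoked
    identity: str     # whose authority the call was made under
    context_in: List[str] = field(default_factory=list)   # context received
    context_out: List[str] = field(default_factory=list)  # context passed on


class WorkflowTrace:
    def __init__(self):
        self.workflow_id = str(uuid.uuid4())
        self.events = []

    def record(self, system, tool, identity, context_in=(), context_out=()):
        """Append one step to the trace and return the recorded event."""
        event = TraceEvent(self.workflow_id, len(self.events) + 1, system,
                           tool, identity, list(context_in), list(context_out))
        self.events.append(event)
        return event

    def systems_crossed(self):
        """Systems in the order the workflow first reached them."""
        ordered = []
        for event in self.events:
            if event.system not in ordered:
                ordered.append(event.system)
        return ordered

    def to_json(self):
        return json.dumps([asdict(e) for e in self.events], indent=2)
```

A trace like this does not block anything by itself. It records the four things named above (steps, tools, context movement, and authority) in one place, so that a sequence of locally valid actions can be reviewed as a single path.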

Without that, organizations are governing architecture diagrams rather than operational behavior.

The Next Phase of Agentic Governance

Early approaches to agent governance focused on model safety and prompt control. That reflected how these systems were first introduced: contained environments with clear boundaries.

That context no longer holds.

Agents operate across tools, systems, and services, carrying context forward and making decisions over time. The behavior that matters emerges across those interactions, not within any single component. The boundary of control has moved. It no longer sits inside a single system, and it cannot be enforced from a single point.

Improving guardrails within individual platforms remains important. It strengthens local control and reduces obvious failure modes. It does not provide a complete view of how agents operate once workflows extend beyond those boundaries.

Governance needs to follow how agents actually behave: across systems, across tools, and across time. That requires visibility into decision paths, context usage, and how outcomes are produced in practice.

This is where a different class of control becomes necessary: one that can observe and interpret behavior across environments, where those decisions actually occur.
