AI agents are starting to operate like powerful employees — making decisions, accessing tools, handling data — except they don't need sleep and they don't follow the org chart. That's created a gap in how companies think about security. Most organizations built safeguards around individual prompts, assuming that would be enough. It wasn't. The first documented AI-orchestrated espionage campaign exposed that approach as insufficient. Now security leaders are asking a harder question: How do you govern something that acts autonomously?
The answer emerging from recent guidance is straightforward in principle, difficult in practice: treat agents the way you'd treat a powerful, semi-autonomous employee. Give them narrow jobs. Lock down their access. Verify everything they touch.
Narrow the scope before you deploy
Start with identity. An agent should run as a specific user, in a specific part of your organization, with permissions tied to that user's actual role. No shortcuts that let an agent act "on behalf of" someone else across departments or tenants. If an agent needs to do something high-impact — transfer funds, delete records, grant access — require explicit human approval before it happens. This isn't friction for friction's sake. It's the difference between an agent that can cause damage and one that can't.
Then constrain the tools it can reach. Treat your agent's toolchain like a supply chain: pin specific versions of external tools, require approval before new tools are added, and forbid the agent from automatically chaining tools together unless your policy explicitly allows it. Each tool should be bound to specific tasks and credentials, rotated regularly, and auditable. The agent doesn't get a master key. It gets narrowly scoped access for each job it's supposed to do.
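A toy version of that toolchain policy, with pinned versions, human approval for additions, and chaining off by default (`ToolRegistry` is a hypothetical name, a sketch rather than a real library):

```python
class ToolRegistry:
    """Supply-chain-style gate for an agent's tools: pinned versions, no implicit chaining."""

    def __init__(self, allow_chaining: bool = False):
        self.pinned = {}                    # tool name -> approved, pinned version
        self.allow_chaining = allow_chaining

    def approve(self, name: str, version: str) -> None:
        """A human approves a tool at one specific version; nothing else is callable."""
        self.pinned[name] = version

    def check(self, name: str, version: str) -> bool:
        """The agent may only call a tool whose version matches the pin exactly."""
        return self.pinned.get(name) == version

    def check_chain(self, names: list) -> bool:
        """Chaining multiple tools is forbidden unless policy explicitly allows it."""
        if len(names) > 1 and not self.allow_chaining:
            return False
        return all(n in self.pinned for n in names)
```

Credential rotation and per-task binding would sit behind the same gate; the essential design choice is that the allowlist is default-deny.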
Assume external data is hostile
AI agents often pull information from external sources — databases, documents, web content — to inform their decisions. Treat all of it as potentially compromised until you've verified otherwise. Gate what enters the agent's memory or retrieval systems. Review new sources before they're added. If untrusted context is present, disable persistent memory. Tag every piece of data with its source so you can trace decisions back to where the information came from.
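Sketched in code, the gate above amounts to two rules: every chunk carries a provenance tag, and untrusted context switches persistent memory off. `TaggedChunk` and `MemoryGate` are illustrative names assumed for this example:

```python
from dataclasses import dataclass

@dataclass
class TaggedChunk:
    text: str
    source: str    # provenance tag, so decisions trace back to where data came from
    trusted: bool  # set True only after the source has been reviewed

class MemoryGate:
    """Gates what enters the agent's memory; untrusted context disables persistence."""

    def __init__(self):
        self.memory = []
        self.persistent = True

    def ingest(self, chunk: TaggedChunk) -> None:
        if not chunk.trusted:
            # Untrusted context is present: stop persisting and drop what was stored.
            self.persistent = False
            self.memory.clear()
        if self.persistent:
            self.memory.append(chunk)
```

The strict behavior here (one untrusted chunk poisons the whole session) is a deliberate assumption; a real system might quarantine per-source instead.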
When the agent produces output, don't let it execute automatically. Put a validator in between the agent and the real world. If the output involves sensitive data, mask or tokenize it until the moment an authorized person actually needs to see it — then log that reveal. Data privacy isn't something you bolt on at the end. It's baked into how the agent operates.
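A minimal sketch of that mask-then-log-the-reveal pattern follows. The regex, the `REVEAL_LOG` list, and the function names are all assumptions for illustration; a production system would use a real tokenization service and structured audit logging:

```python
import re

REVEAL_LOG = []  # every unmasking of sensitive data is recorded here

def mask_output(text: str) -> str:
    """Mask anything that looks like a long account number (illustrative pattern only)."""
    return re.sub(r"\b\d{9,}\b", "[MASKED]", text)

def reveal(original: str, viewer: str, authorized: set) -> str:
    """Return the real value only to an authorized person, and log that reveal."""
    if viewer not in authorized:
        raise PermissionError(f"{viewer} is not authorized to view this data")
    REVEAL_LOG.append(viewer)  # the reveal itself becomes an auditable event
    return original
```

The validator sits between agent output and execution: downstream consumers see `[MASKED]` by default, and the true value appears only at the moment an authorized person asks for it.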
Instrument everything, then prove it
Don't deploy an agent and assume your one-time security test covered all the risks. Build continuous evaluation into the system from the start. Instrument agents with deep observability so you can see what they're doing. Run regular red-team exercises with adversarial test suites. Back it all up with robust logging.
Maintain a living inventory of every agent in your organization — what it does, what tools it can access, what data it touches, who approved it. Record every approval decision, every access to sensitive data, every high-impact action. When the board asks "Can you prove this is secure?" you hand them evidence, not assurances.
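One row of such an inventory could be as simple as the sketch below (`AgentRecord` is a hypothetical name; the fields mirror the list above):

```python
from dataclasses import dataclass, field

@dataclass
class AgentRecord:
    """One entry in a living inventory of deployed agents."""
    name: str
    purpose: str           # what it does
    tools: list            # what tools it can access
    data_scopes: list      # what data it touches
    approved_by: str       # who approved it
    audit_log: list = field(default_factory=list)

    def record(self, event: str) -> None:
        """Append an auditable event: approvals, sensitive-data access, high-impact actions."""
        self.audit_log.append(event)
```

The value is less in the data structure than in the discipline: every agent has exactly one such record, and the audit log is the evidence you hand the board.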
The shift here is subtle but consequential. Security teams are moving from "How do we control what the model says?" to "How do we control what the agent does?" The first approach failed because it focused on the wrong boundary. The second works because it treats agents like what they actually are: powerful systems that need the same governance framework as any other high-privilege user in your organization.