Agentic AI is a startup advantage, until it isn’t

Nik Kairinos is Founder, CEO and Chief AI Architect of…

For startups, AI agents promise speed, scale, and efficiency. But if founders treat AI safety as an afterthought, the very systems designed to accelerate growth can quickly create legal, operational, and reputational risks.

Reward and risk

AI agents can support customer service, automate back-office tasks, accelerate product development, qualify leads, and manage internal workflows around the clock. For founders looking to scale, this is a powerful advantage. But it does not come without risks.

As we move to a world of agentic AI, the stakes for getting deployment right have risen. Concerns are no longer confined to whether a model might occasionally hallucinate, produce an inaccurate output, or sound a bit off-brand. Once AI systems are given more autonomy, such as the ability to act, decide, trigger processes, interact with tools or influence workflows, the risk profile changes fundamentally. As McKinsey states, agentic AI introduces “novel and complex risks and vulnerabilities” precisely because these systems move beyond generating content and begin participating in business processes more directly.

A helpful illustration came in the Air Canada chatbot case. The British Columbia Civil Resolution Tribunal found Air Canada liable after its chatbot gave a customer incorrect information about bereavement fares. Air Canada argued the chatbot was effectively responsible for its own statements, but the tribunal rejected that argument and held the company accountable for the misinformation on its website.

The significance of this case goes beyond customer service. It shows that when AI produces misleading information in a live environment, the business cannot distance itself from the result: the company remains responsible.

For startups, that should be a wake-up call. Any company, but particularly a young company in the process of building its credibility, may only need one public AI failure to damage its reputation with customers, partners or investors. And, unlike large enterprises, startups rarely have the balance sheet, the legal backing or the brand resilience to absorb mistakes.

Rethinking software

Historically, software was explicitly programmed for a specific function. If it failed, the fault could often be traced back to a source. AI agents are different. They can pursue goals in unexpected ways, interpret instructions imperfectly, and behave differently depending on changing context, data and environment. They may still appear to be functioning normally while producing outputs or actions that are different from what the business actually intended.

The recent ROME AI agent incident is a striking example. A study published on arXiv described how the agentic AI model was detected performing actions resembling crypto mining and creating a reverse SSH tunnel, a technique commonly used to establish remote access to servers, despite not having been instructed to do either. Researchers determined that the behaviour arose because the model was allowed to interact freely with tools and system resources in order to learn how to solve tasks. In other words, it determined and executed its own methods of reaching the prescribed outcomes.

Anthropic’s recent research into agent autonomy also makes this point clearly. In its work on measuring AI agent autonomy in practice, Anthropic defines agents as AI systems equipped with tools that allow them to take actions, such as running code, calling external APIs and sending messages. That definition matters because it captures the key issue: once models can act in the world, their behaviour must be understood not just through outputs, but through the decisions and actions they take.

As the way that software is deployed shifts, so too must the monitoring of it.

Continuous monitoring is key

Organisations may test AI systems internally, run sandbox demonstrations, and convince themselves their model is safe because nothing obviously goes wrong in development. But real-world deployment is where the risk begins. Live environments introduce unexpected behaviour, contaminated data, conflicting prompts, shifting objectives and edge cases no testing process can fully predict. Drift, hallucinations, unsafe use and feedback loops can all emerge over time. Even an AI agent that has been robustly tested pre-deployment can behave in unplanned ways once it is live.

That is why continuous monitoring is so important.

If AI is to be trusted with more autonomy, businesses need visibility of how it is behaving in practice, not how it behaved in testing. This is where many organisations are currently exposed. They are deploying AI into customer-facing and operational settings without any meaningful way to monitor for deviations before those deviations create harm. In effect, they are relying on trust without verification. This is a dangerous gamble.
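To make the idea of monitoring for deviations concrete, here is a minimal sketch in Python of one possible approach: every action an agent attempts is recorded, and anything outside an approved list of tools is flagged. The class names, tool names and the allowlist policy are all hypothetical, chosen purely for illustration; a real deployment would need far richer policies and alerting.

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone


@dataclass
class ActionRecord:
    """One attempted agent action (hypothetical structure for illustration)."""
    tool: str
    target: str
    allowed: bool
    timestamp: str


@dataclass
class ActionMonitor:
    """Records every attempted action and flags those outside the allowlist."""
    allowed_tools: set[str]
    log: list[ActionRecord] = field(default_factory=list)

    def check(self, tool: str, target: str) -> bool:
        # Log the attempt whether or not it is permitted, so that
        # deviations leave a trace even when they are blocked.
        allowed = tool in self.allowed_tools
        stamp = datetime.now(timezone.utc).isoformat()
        self.log.append(ActionRecord(tool, target, allowed, stamp))
        return allowed

    def deviations(self) -> list[ActionRecord]:
        # Every logged action that fell outside the approved tools.
        return [record for record in self.log if not record.allowed]


# Assumed policy: this agent may only search a knowledge base and send email.
monitor = ActionMonitor(allowed_tools={"search_kb", "send_email"})
monitor.check("search_kb", "refund policy")
monitor.check("open_ssh_tunnel", "203.0.113.7")  # not on the allowlist
flagged = monitor.deviations()
```

In this sketch the caller decides what to do when `check` returns `False`, for example blocking the action or alerting a human. The point is simply that the deviation is visible at the moment it happens, rather than discovered after the harm is done.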

Startup culture rewards speed, experimentation and rapid iteration. Founders are constantly looking to move first, learn quickly and scale what works. But with agentic AI, scaling what seems to work without understanding potential failure can create a far more serious problem later. A model that quietly deviates from its intended function can expose a company to bias, bad decisions, compliance issues, security weaknesses and reputational damage long before anyone realises what is happening.

In the ROME agent example, the deviation occurred in a testing environment, and a security-breach warning was triggered. But there are many examples of AI going rogue with dire consequences. Uber’s self-driving car killed a pedestrian after misclassifying them as an unknown object. Zillow’s home-pricing algorithm caused $304 million in losses by purchasing homes at inflated prices during pandemic market volatility: the AI, trained on stable historical data, could not adapt to rapid market changes, buying 27,000 homes that it could not profitably resell. The failure led to 2,000 layoffs and the complete shutdown of the iBuying division.

There is also a wider governance point. Many still talk about AI governance as something that slows innovation down; ‘big tech’s’ pushback on the EU AI Act is the obvious example. In reality, the opposite is true. If you want to scale AI responsibly, safety is what makes scale sustainable. Without it, every increase in autonomy increases uncertainty.

For startups especially, AI safety should not be seen as a future concern but as a present-day growth requirement. The more central AI becomes to a business, the more important it is to know when it is deviating from expected behaviour.

This is the real challenge with agentic AI: not whether it can create value, but whether you can trust it once it is live.
