Why AI cannot replace operational judgement in product teams

A strategic and impact-driven Chief Product Officer with over 7…

At some point, every founder discovers that their product works perfectly in the system and imperfectly in reality. That gap is not a bug. It’s a judgement problem.

Founders constantly operate in environments where information is incomplete, signals conflict with one another, and reality evolves faster than reporting systems can capture. This is where operational judgement becomes essential.

For me, operational judgement is not simply experience or intuition. It is the ability to understand how a system actually behaves rather than how it is designed to behave – the ability to make sound decisions when the system does not tell the full story.

Most teams today are surrounded by more data than ever, and that data often appears clear. The challenge is that the clarity is often misleading.

Having worked on operational platforms used across franchise and retail networks with more than 1,000 locations, I have repeatedly seen organisations where compliance scores were high, and reports suggested healthy execution. At first glance, there appeared to be no problem. But digging deeper revealed that different locations interpreted the same operational standards differently. Reporting suggested that processes were being followed consistently, while actual execution varied significantly between sites.

An AI model analysing system data would likely conclude that compliance levels were high and operational standards were being executed correctly. Yet field observations revealed a different reality: different locations produced almost identical operational metrics while operating in fundamentally different ways. One location followed standards as intended. Another developed informal workarounds that generated similar reporting outcomes but created long-term operational risk. Recognising that gap proved critical. Once these execution differences became visible, we were able to redesign the operating model, reducing risk while also achieving 2x faster audits and improving issue-resolution rates from 80% to 95%.

I observed similar situations in workforce optimisation projects. Workforce planning models can appear highly efficient on paper because staffing calculations are based on historical workloads, timing data, and operational assumptions. However, local operational realities often evolve faster than the model itself. Teams’ informal practices and store-specific conditions influence productivity in ways that are difficult to capture through data alone. As a result, the model may appear efficient while the reality it represents is gradually changing.

AI is good at assessing dashboards, but understanding operational reality remains beyond its capabilities. This is why operational judgement matters. It helps leaders distinguish between what is measurable and what is meaningful.

What AI doesn’t capture

One of the most important lessons I have learned in product management is that many of the signals that determine success or failure never appear in dashboards, reports, or datasets.

AI is extremely effective at identifying patterns in recorded information. But some of the most influential factors shaping product adoption and operational outcomes are either weakly represented in data or completely invisible to the system.

When organisations introduce new processes or technology, employees rarely follow the intended workflow exactly as designed. They create shortcuts, informal communication channels, and workarounds. I saw this repeatedly in large retail and franchise environments. One team would genuinely follow operational standards. Another would achieve similar reporting outcomes through local workarounds developed over time. Two locations can have nearly identical operational metrics while operating in completely different ways. The dashboard cannot distinguish between them. In one case, making that distinction visible – and redesigning workflows around it – reduced reporting preparation from 60 minutes to 6 and brought task completion rates close to full.

Another factor that AI struggles to understand is organisational incentives. People don’t optimise for the objective written in the policy document. They optimise for what they believe they’re actually being measured on. Reporting quality can appear to improve while real performance stays flat – because teams have learned how to satisfy the measurement system.

Local management behaviour is another signal that remains largely invisible to AI. Managers communicate expectations, interpret priorities, and respond to operational pressure in their own ways. Those differences often directly affect execution quality, even when locations follow the same formal processes and report similar metrics.

I observed this while working with a large franchise network, where sites operating under the same standards consistently delivered vastly different outcomes. A closer look revealed that the variation was driven less by the process itself and more by how local managers interpreted and enforced it. Once the organisation reduced that variability through a more consistent operational framework, standards execution increased from 76% to 89%. The experience showed that the leadership behaviours driving these outcomes can be highly influential, yet they are almost impossible to capture in structured data.

Many important problems begin long before they become visible in reporting. Declining trust, growing resistance, hidden operational friction – these things appear first as subtle signals rather than measurable trends. Experienced product leaders learn to catch them early, before they show up in reports. Recognising these weak signals proves to be far more valuable than waiting for performance metrics to deteriorate months later.

An AI system that ignores these factors would still produce plausible output, creating a false sense of confidence. It can identify patterns, recommend actions, and highlight correlations. What it cannot reliably determine is whether the underlying assumptions behind those recommendations reflect operational reality. This is why human judgement remains non-negotiable.

The real competitive advantage for startups isn’t AI

Within a few years, AI itself may no longer be a meaningful competitive advantage. Every product team will have access to similar models. Every founder will be able to generate ideas faster, analyse customer feedback faster, and build prototypes faster.

Today, many discussions about AI focus on productivity. That makes sense because productivity gains are easy to observe. Teams can complete tasks in less time and process more information than ever before. However, we are slowly approaching a point where productivity becomes abundant.

When everyone can move faster, speed stops being a differentiator. The real divide won’t be between organisations that use AI and organisations that don’t. It will be between teams that use AI to sharpen their thinking and teams that delegate judgement to it. The first group gets stronger; the second gets faster but not necessarily wiser.

The future of product leadership will not be defined by who can generate the most insights. It will be defined by who can correctly interpret which insights matter and which assumptions are wrong. That is the essence of operational judgement, and it is one of the reasons why I believe it will remain fundamentally human for a long time.
