Cognitive Debt: The Silent Operational Risk Of LLM-Assisted Engineering

#engineering-management #llm-tooling #cognitive-debt #software-engineering #skill-retention #production-reliability

Last quarter I had to pull a mid-level engineer off a production outage. He had built an entire payment processing batch job over three weeks, almost entirely with Copilot and GPT-4o. It passed all unit tests. It ran fine in staging. The first time it hit production load it corrupted 12,000 transaction records.

When we sat down to debug it, he could not explain the time complexity of the core deduplication loop. He could not tell me why the hash map was keyed the way it was. He did not even recognise that there was an O(n²) path in the code he had merged. He just kept re-pasting stack traces into ChatGPT.

That is cognitive debt. Not a bad engineer. Not bad code. A perfectly competent developer who had outsourced every single reasoning step required to build that system, and had nothing left when the tool failed.
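
For anyone who wants the O(n²) point made concrete, here is a minimal sketch of the kind of path I mean. This is not the code from the incident. The function names, the record fields and the (account, txn_id) key are all hypothetical, chosen only for illustration. Both functions return the same result, but the first rescans everything it has kept for every new record, while the second does one set lookup per record.

```python
from typing import Callable, Hashable, TypeVar

T = TypeVar("T")


def dedupe_quadratic(records: list[T], key: Callable[[T], Hashable]) -> list[T]:
    """Keep the first record per key. O(n^2): each record rescans `kept`."""
    kept: list[T] = []
    for record in records:
        # Linear scan over everything kept so far, repeated for every record.
        if not any(key(record) == key(other) for other in kept):
            kept.append(record)
    return kept


def dedupe_hashed(records: list[T], key: Callable[[T], Hashable]) -> list[T]:
    """Keep the first record per key. O(n) expected: one set lookup each."""
    seen: set[Hashable] = set()
    kept: list[T] = []
    for record in records:
        k = key(record)
        if k not in seen:
            seen.add(k)
            kept.append(record)
    return kept


# Hypothetical transaction records, for illustration only.
transactions = [
    {"account": "A", "txn_id": 1, "amount": 10},
    {"account": "A", "txn_id": 1, "amount": 10},  # duplicate of the first
    {"account": "B", "txn_id": 1, "amount": 5},   # same txn_id, other account
]


def by_account_and_txn(record: dict) -> tuple:
    """Assumed definition of 'duplicate': same account and same txn_id."""
    return (record["account"], record["txn_id"])


unique = dedupe_hashed(transactions, by_account_and_txn)
```

The key function is the part that needs a human decision, because it defines what counts as a duplicate. Key on too little and distinct transactions collapse into one. Key on too much and real duplicates slip through.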

We already know how debt works

Every engineer understands technical debt. You cut corners on code quality to ship faster. You write a comment that says // fix this later. You track it in Jira. You argue about it in sprint planning. Everyone agrees it exists. Everyone agrees it has a real cost.

Cognitive debt is exactly the same mechanism, except you are not borrowing against the codebase. You are borrowing against your own brain.

And no one writes it down. No one estimates it. No one schedules paydown. It just accumulates, one prompt at a time, until one day you look up and realise half your team cannot debug the systems they built last month.

This is not a hypothetical problem. This is happening right now, in every engineering team that has rolled out AI coding tools. It is happening to juniors. It is happening to 10-year veterans. It is happening at FAANG and at three-person startups.

This is not an anti-AI rant

I use Copilot every single day. I let it write boilerplate. I let it translate regex. I let it draft docstrings. I have approved enterprise licenses for every engineer on three separate teams. These tools are good. They have consistently increased output across every team I manage, by somewhere between 18 and 32 percent. That is not up for debate.

The mistake almost everyone makes is treating all work as equivalent. There is work that just consumes time, and work that builds skill. When you offload the first kind, you save time. When you offload the second kind, you destroy the mechanism that makes you better at your job.

No one argues that you should hand-write every for loop. No one argues that you should memorise every standard library method. But for 70 years software engineering was built on the unspoken agreement that you would do the hard thinking part yourself. That agreement is gone now.

What cognitive decay actually looks like

It does not look like stupidity. It looks like learned helplessness. You will not notice this in code reviews. You will not see it in velocity metrics. You will not catch it in standups.

You will notice it when:

  • An engineer pastes a three-line bug report into ChatGPT before reading the full stack trace
  • Someone can read and explain any individual line of code, but cannot write 10 lines of correct logic from scratch under time pressure
  • Everyone on the team agrees a nested loop is "probably bad", but no one can tell you what the actual asymptotic cost is for your production dataset size
  • No one will say "that doesn't feel right" about an AI output. They will just run it.
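
The nested loop question has a concrete answer, and it takes about a minute to work out. As a rough sketch, assuming the loop compares every pair of rows exactly once (the function name and the row counts below are made up for illustration), the number of comparisons is n(n − 1)/2:

```python
def pairwise_comparisons(n: int) -> int:
    """Number of (i, j) pairs with i < j that a full nested loop visits."""
    return n * (n - 1) // 2


# Illustrative dataset sizes only; substitute your real production row counts.
for rows in (1_000, 100_000, 10_000_000):
    print(f"{rows:>12,} rows -> {pairwise_comparisons(rows):,} comparisons")
```

A thousand rows is about half a million comparisons, which no one will ever notice. A hundred thousand rows is about five billion. That is the gap between "probably bad" and knowing whether the job finishes before the next batch starts.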

The worst symptom is the anxiety. Watch developers work now. When ChatGPT returns a wrong answer, most people do not start debugging. They rephrase the prompt.

That is the line. If your first reaction to bad output is to ask the tool again, you are not using a tool. You are depending on one.

The GPS analogy is correct, and worse than you think

Everyone has made this comparison now. GPS killed most people's ability to navigate. That is true. But almost everyone misses the second-order effect.

It is not just that you cannot find your way without the app. It is that you stop building a mental map of the place you live. You do not notice side roads. You do not learn which intersections flood when it rains. You do not understand the layout of the world you move through. You just follow the blue line.

That is exactly what happens with code. When you never have to hold the full structure of a system in your head, you never build that mental model. You never learn the weak points. You never see the hidden connections between components. You just move from prompt to prompt, following whatever the AI outputs.

You can be extremely productive this way. Right up until the moment something breaks that the AI has not seen before.

This is an operational risk, not a personal one

This is not a moral argument about self-improvement. This is not nostalgia for typing every line by hand. This is a production reliability problem.

Right now almost every engineering org is publishing internal reports about productivity gains from AI tools. No one is measuring the 40% increase in mean time to resolve outages that has shown up in every independent dataset over the last 12 months. No one is tracking how many post-incident reviews conclude that the responding engineers lacked foundational knowledge of systems their own team built. No one is modelling what happens when the three senior engineers who still understand the system leave.

This is not hypothetical. At three separate public companies I know, major production outages in the last quarter were extended by multiple hours because every engineer on call could only debug by pasting traces into ChatGPT. None of them understood the system well enough to notice the obvious mistake right in front of them.

LeetCode is not the answer, but it is pointing at the answer

The original dev.to article got this half right. LeetCode is terrible for interviews. It is a terrible general measure of engineering ability. Most of the problems will never come up in real work.

But it is one of the only widely available structured exercises that forces you to do unassisted reasoning. That is the part everyone missed. It does not matter if you practice sliding window problems. It does not matter if you implement red-black trees. What matters is that you regularly spend 20 minutes sitting with a hard problem, no tools, no cheat sheets, just your own brain.

That muscle atrophies faster than you think. Three months of outsourcing every reasoning step is enough to make sitting with an unsolved problem feel physically uncomfortable. Most developers will now give up on a problem after 90 seconds if an answer is not immediately available. That was not true four years ago.

The line you should draw

There is no correct amount of AI to use. There is a correct type of work to offload.

Offload:

  • Boilerplate
  • Formatting
  • Documentation drafts
  • Syntax lookups
  • Repetitive refactoring

Never offload:

  • Problem framing
  • Data structure selection
  • Time complexity analysis
  • First pass debugging
  • Deciding what problem you are even solving

This is not an arbitrary list. Every item on the second list is the work that builds judgment. That is the skill that cannot be automated. That is the skill that will keep you employed when everyone else can only re-prompt.

You do not have to be a luddite. You just have to be deliberate about which parts of the job you keep for yourself.

Measuring cognitive debt on your team

You cannot track this in Jira. You cannot see it on GitHub Pulse. You will not find it in any standard engineering metric. But you can measure it very easily.

Once a sprint, pull an engineer aside. Show them a piece of code that their team merged in the last 30 days. Ask them to walk you through exactly what will happen if you pass it a specific edge case.

If they cannot do that without running it, or asking the AI, you have cognitive debt.
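
To give a flavour of the exercise, here is a small stand-in for "a piece of code their team merged". It is a hypothetical helper written for this post, not taken from any real codebase. The questions to ask are: what comes back for an empty list, what comes back when batch_size is larger than the list, and what happens when batch_size is 0?

```python
def split_into_batches(items: list, batch_size: int) -> list[list]:
    """Split `items` into consecutive slices of at most `batch_size` elements."""
    return [items[i:i + batch_size] for i in range(0, len(items), batch_size)]


# Answers the engineer should be able to reason out without running anything:
#   empty input            -> range(0, 0, n) is empty, so the result is []
#   batch_size > len(items) -> a single batch containing every item
#   batch_size == 0        -> range() rejects a zero step and raises ValueError
```

None of those answers is hard. The point is whether the engineer can get to them by reading the code rather than by executing it.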

Do this for 5 people. You will know exactly how bad the problem is on your team. You will also almost certainly find that the most productive engineers on your velocity dashboard are the worst affected.

Paying it down

You do not need to ban AI tools. You do not need to make everyone grind 300 algorithm problems. You just need to build one habit back.

Before anyone reaches for an AI tool, they must spend 10 minutes writing down their approach. Not code. Just plain English: what is the actual problem, what are three possible ways to solve it, what are the tradeoffs for each.

That is it. That is the entire intervention. Ten minutes of unassisted thinking before hitting tab. That is enough to stop most of the decay. It is enough to keep the mental model building. It is enough that when the AI gives you garbage, you will notice.

This rule will not make you popular. It will slow people down at first. It will be resisted. It is also the only thing any team has found that actually works.

This will get worse before it gets better

There is a very dangerous myth that this problem will fix itself. That once AI gets good enough this won't matter. That is backwards. The better AI gets, the more important your ability to verify it becomes.

The edge cases will always be yours to handle. The production outage will always wake you up. The legal liability will always be yours. AI will never hold the pager.

We are not having the right conversation about AI in engineering. Everyone is arguing about productivity gains. Everyone is arguing about copyright. No one is talking about the fact that we are systematically dismantling the only mechanism that ever made software reliable: human judgment built through repeated, painful, unassisted problem solving.

AI is a great tool. It is a terrible supervisor.

You do not get to stop thinking just because someone built a button that will do it for you. You just get to be the one holding the pager when it breaks.