Why AI Coding Tools Still Fail in Production
Granola MCP (Sponsor)
Take your meeting context to new places
If you’re already using Claude or ChatGPT for complex work, you know the drill: you feed it research docs, spreadsheets, project briefs... and then manually copy-paste meeting notes to give it the full picture.
What if your AI could just access your meeting context automatically?
Granola’s new Model Context Protocol (MCP) integration connects your meeting notes to your AI app of choice.
Ask Claude to review last week’s client meetings and update your CRM. Have ChatGPT extract tasks from multiple conversations and organize them in Linear. Turn meeting insights into automated workflows without missing a beat.
Perfect for engineers, PMs, and operators who want their AI to actually understand their work.
Use the code SCOOP
The biggest frustration with AI coding tools today isn’t that they can’t write code. It’s that you still can’t safely delegate to them. Modern AI can generate functions, APIs, and even entire services in seconds, but once that code enters a real production environment, the cracks start to show.
Hallucinated dependencies, subtle logic errors, broken edge cases, and inconsistent outputs force developers into a continuous loop of verification and cleanup. Instead of eliminating work, AI shifts it into a different phase—one that often erodes the efficiency it promises.
What’s becoming clear across the tech industry is that the limitation isn’t raw capability. It’s reliability. High-performing engineering teams are no longer treating AI as an autonomous developer. Instead, they are designing workflows that assume AI will make mistakes and building systems that catch those mistakes early. The result is not blind trust, but controlled productivity.
AI as a Fast but Unreliable Collaborator
AI code generation predicts plausibility; it was not designed to validate truth. This is why developers often encounter code that looks correct, compiles cleanly, and still fails in real-world scenarios. Hallucinated APIs and non-existent libraries are not rare anomalies but expected behavior when constraints are too loose.
To address this, engineering teams are reframing AI’s role. Instead of treating it as a senior engineer, they treat it as a fast but error-prone junior contributor. Every output is assumed to be a draft that must pass through structured validation before it can be trusted.
A typical workflow now looks like this:
[Prompt]
↓
[AI Generates Code]
↓
[Human Review Layer]
↓
[Tests + Linters]
↓
[Refactor / Fix]
↓
[Merge or Reject]

This pipeline formalizes skepticism. Rather than relying on intuition, teams enforce guardrails directly in prompts and tooling. Schema-driven generation—such as anchoring outputs to OpenAPI specifications or strict interfaces—limits the model’s ability to invent structures.
Combined with explicit constraints like restricting dependencies to existing imports, this approach narrows the space where hallucinations can occur. The result is not perfect accuracy, but predictable behavior within defined boundaries.
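As a minimal sketch of what these guardrails can look like in code (assuming a Python project, a ruff/pytest toolchain, and a hypothetical generate() model call, none of which the workflow prescribes):

import pathlib
import subprocess
import tempfile

ALLOWED_IMPORTS = {"dataclasses", "json", "typing"}  # restrict to existing deps

PROMPT_TEMPLATE = (
    "Implement the function described below.\n"
    "Rules:\n"
    "- Use ONLY these imports: {imports}\n"
    "- Match this exact signature: {signature}\n"
    "- Return only executable code, with no explanations.\n"
)

def validate(code: str) -> bool:
    # Write the draft to a scratch file, then gate it on linting and tests.
    draft = pathlib.Path(tempfile.mkdtemp()) / "draft.py"
    draft.write_text(code)
    lint = subprocess.run(["ruff", "check", str(draft)])
    tests = subprocess.run(["pytest", "--quiet"])
    return lint.returncode == 0 and tests.returncode == 0

def generate_and_gate(generate, signature: str) -> str | None:
    prompt = PROMPT_TEMPLATE.format(
        imports=", ".join(sorted(ALLOWED_IMPORTS)), signature=signature
    )
    draft = generate(prompt)                   # every output is an untrusted draft
    return draft if validate(draft) else None  # merge or reject

The point is not the specific tools but the shape: generation is wrapped in a gate, and nothing reaches the codebase without passing it.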
Why AI Breaks on Complex Tasks
Another major friction point is context management. AI models struggle to maintain coherence across long or complex interactions. Instructions are forgotten, earlier decisions are ignored, and large codebases exceed context limits. This makes multi-step tasks particularly fragile, often forcing developers to restart or manually reintroduce context.
A natural response is to shift away from conversational workflows toward structured, scoped interactions. Instead of asking AI to handle entire systems, teams break work into smaller, clearly defined units, often limited to a single file or function. This keeps the model focused and reduces the risk of drift.
At the same time, context is being externalized. Persistent files like AI_CONTEXT.md store architecture decisions, naming conventions, and constraints. Rather than relying on memory within a chat session, developers inject only the relevant context into each prompt.
This approach can be visualized as:
[AI_CONTEXT.md]
+
[Focused Prompt Scope]
+
[Small Code Chunk]
↓
[AI Task Execution]

For more complex workflows, teams introduce explicit step-by-step plans and reset context between stages. Each step builds on verified outputs rather than relying on fragile conversational continuity. This transforms AI from a “memory-based assistant” into a deterministic tool operating within controlled inputs.
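A sketch of this pattern, assuming a repository-level AI_CONTEXT.md and a hypothetical ask_model() client; the file names and plan steps are illustrative. Each stage gets a fresh, self-contained prompt instead of depending on chat history:

from pathlib import Path

CONTEXT = Path("AI_CONTEXT.md").read_text()  # decisions, conventions, constraints

def ask_model(prompt: str) -> str:
    raise NotImplementedError("plug in your model client here")  # assumption

def run_stage(task: str, code_chunk: str) -> str:
    # Fresh prompt per stage: persistent context + only the code this step needs.
    prompt = (
        f"Project context:\n{CONTEXT}\n\n"
        f"Task (this stage only): {task}\n\n"
        f"Relevant code:\n{code_chunk}\n"
    )
    return ask_model(prompt)

# An explicit plan; each stage starts from the verified output of the last.
plan = [
    "add input validation to parse_order()",
    "write unit tests covering the new validation",
    "update the docstring to match the new behavior",
]
source = Path("orders.py").read_text()
for step in plan:
    source = run_stage(step, source)  # validate before feeding the next stage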
The Hidden Constraint on Scale
While accuracy and context get most of the attention, cost is an equally important factor shaping how AI is used in engineering workflows. Large prompts, repeated context injection, and verbose outputs can quickly increase token usage, turning even simple tasks into expensive operations.
To manage this, teams are adopting cost-aware prompting strategies. Instead of sending entire files or logs, they extract only the relevant portions and summarize the rest. They also constrain outputs to minimal formats, such as diffs or patches, rather than full rewrites.
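One way to sketch this, assuming a naive regex-based extractor and a diff-only output instruction (both illustrative choices, not a fixed recipe):

import re

def extract_function(source: str, name: str) -> str:
    # Naive extraction: send one top-level def, not the whole file.
    pattern = rf"^def {name}\(.*?(?=^\S|\Z)"
    match = re.search(pattern, source, re.DOTALL | re.MULTILINE)
    return match.group(0) if match else source  # fall back to the full file

def build_patch_prompt(snippet: str, instruction: str) -> str:
    return (
        f"{instruction}\n\n"
        f"Code:\n{snippet}\n\n"
        "Respond with a unified diff only. Do not rewrite unrelated lines."
    )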
The difference in efficiency is significant:
Full File Input
↓
High Token Usage
↓
Increased Cost

Focused Snippet
+
Clear Constraints
↓
Lower Tokens
↓
Efficient Output

Another effective pattern is tiered model usage. Smaller, more cost-efficient models handle routine tasks like formatting and simple refactoring, while more advanced models are reserved for complex reasoning or architectural decisions. This layered approach ensures that resources are used where they provide the most value, rather than being wasted on low-impact operations.
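A tiered router can be as small as a lookup table. A sketch, with the task labels and tier names as assumptions:

ROUTES = {
    "format":   "small",  # cheap tier: formatting, renames, trivial refactors
    "refactor": "small",
    "design":   "large",  # expensive tier: architecture, complex reasoning
    "debug":    "large",
}

def route(models: dict, task_type: str, prompt: str) -> str:
    # Pin known routine work to the cheap tier; default anything
    # ambiguous to the capable one.
    tier = ROUTES.get(task_type, "large")
    return models[tier](prompt)

Here models is just a mapping from tier name to a callable model client; the value of the pattern is that routine work is pinned to the cheap tier by default.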
Managing Overengineering and “AI Slop”
Even when AI produces correct code, it often introduces unnecessary complexity. Developers frequently encounter outputs filled with extra abstractions, redundant files, or verbose explanations that don’t align with the existing codebase. This phenomenon—often referred to as “AI slop”—creates additional cleanup work and reduces maintainability.
The root cause is not just the model, but the lack of constraints in how it is instructed. When prompts are vague, outputs tend to be overly elaborate. To counter this, teams are enforcing strict simplicity rules directly in their prompts. They explicitly require outputs to match existing styles, avoid introducing new abstractions, and return only executable code.
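These rules are easy to encode as a reusable prompt preamble. A sketch, with the exact wording as an illustrative assumption:

SIMPLICITY_RULES = (
    "- Match the existing code style; do not reformat untouched lines.\n"
    "- Do not introduce new abstractions, files, or dependencies.\n"
    "- No placeholder comments or prose; return only executable code.\n"
    "- Prefer the smallest change that satisfies the requirement.\n"
)

def constrained_prompt(task: str) -> str:
    # Prepend the same guardrails to every code-generation request.
    return f"Rules:\n{SIMPLICITY_RULES}\nTask: {task}"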
The impact of this shift is clear:
Vague Prompt
↓
Overengineered Output

Constrained Prompt
↓
Minimal, Production-Ready Code

Even with improved prompts, AI-generated code is rarely final. Post-processing has become a standard part of the workflow, with teams running formatters, linters, and manual reviews to ensure consistency. In this model, AI is not replacing engineering discipline—it is integrated into it.
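A sketch of an automated post-processing pass, assuming black and ruff as the formatter/linter pair; any equivalent tools fit the same pattern:

import subprocess

def postprocess(path: str) -> bool:
    # Normalize formatting first, then lint; manual review still follows.
    subprocess.run(["black", "--quiet", path], check=True)
    result = subprocess.run(["ruff", "check", "--fix", path])
    return result.returncode == 0  # False: issues left for human review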
The Learning Curve
The rapid evolution of AI tools has introduced another challenge: constant change. New frameworks, agents, and best practices emerge at a pace that can overwhelm teams and lead to fragmented workflows. Without structure, organizations risk falling into tool sprawl and inconsistent practices.
The most effective teams are addressing this by standardizing their approach. Instead of experimenting endlessly, they select a small set of reliable tools and define clear guidelines for how they should be used. This reduces cognitive overhead and allows teams to build deeper expertise.
Alongside this, internal playbooks are becoming essential. These documents capture prompt templates, validation processes, and security guidelines, turning AI usage into a repeatable system rather than an ad hoc experiment. Over time, this enables a structured progression from basic assistance to more advanced workflows.
This evolution can be represented as:
Ad Hoc Usage
↓
Defined Guidelines
↓
Standardized Stack
↓
Automated Pipelines

Rather than rushing into full automation, successful teams adopt AI incrementally, layering complexity only after strong validation mechanisms are in place.
Toward a Sustainable AI-Coding Workflow
The broader pattern across all these challenges is clear: AI is not eliminating the need for good engineering practices—it is amplifying them. Trust must be engineered through validation, not assumed. Context must be structured and externalized, not left to memory. Cost must be managed intentionally, not ignored. Output must be constrained, not taken at face value.
AI is not yet a fully autonomous developer, and treating it as one leads to frustration. But when used deliberately—within well-defined boundaries—it becomes a powerful force multiplier. The teams seeing real gains are not those using AI the most aggressively, but those integrating it the most thoughtfully, turning its limitations into design constraints that ultimately improve how software is built.



Nice article!
With so much effort being required to make sure that the AI is functioning correctly and producing usable code, you'd start to think it's much better to have humans continue to do it, lol - software engineering has been steadily maturing over the decades, with established practices that have been working so well - only to be upended by unreliable new ones. Slow and good beats fast and bad.