Driving AI Agents with Specifications

Vibe-coding with AI speeds up app creation but is a recipe for technical debt. Spec-driven AI coding solves that dilemma, creating well-made software in fast development cycles.

Aug 05, 2025

Figure 1. An oil-painting-style picture of Leonardo da Vinci; AI art by ChatGPT. Creativity is important, but AI needs direction.

Introduction – Context is Everything

There are many challenges with using AI, and the first hurdle is figuring out what to say to an AI to get the relevant, correct output you want.

It’s also true that AI has gotten particularly good at taking certain high-level prompts and spinning them into a marvelously detailed and correct response. I gave Gemini 2.5 Pro the prompt “Create a Space Invaders game that I can play in a web browser,” and from that single prompt it created a playable Space Invaders game in 600 lines of HTML and JavaScript. It was great. It created the real thing.

There’s a catch, though. The AI-generated Space Invaders game is so good because Space Invaders is a well-known, established game design with clear implicit expectations. Dozens of Space Invaders implementations are on GitHub, and it’s all in Gemini’s training data. Gemini’s output was regurgitation, not invention.

This satisfying experience of easily conjuring up games in minutes and apps on command has led to the huge popularity of ‘vibe coding.’ If Gemini 2.5 Pro and other AI coding models are good enough to create complete games from a single prompt, why not take it further? Prompt the AI iteratively and let the AI do the heavy lifting to make even more, with minimal push-back from the user.

Creating something unique or off the beaten path of typical apps is more of a challenge. The AI needs help to understand explicitly what you are asking for. Invent an imaginary game name and watch the AI struggle.

Side note: I did try this. I fed this prompt to ChatGPT o4-mini: “Please create the classic Flying Wombat Circus video game using HTML and Javascript that I can play in my browser.” I got 110 lines of code that created a game that consisted of a falling red square. The saying goes, “Garbage In, Garbage Out,” but in this case it seems to be, “Vagueness in, uselessness out.”

The lesson: Prompt with precision to get precisely what you want out of AI.

Lovable Embraces the Vibe

There's a new kind of coding I call "vibe coding", where you fully give in to the vibes, embrace exponentials, and forget that the code even exists. … I'm building a project or webapp, but it's not really coding - I just see stuff, say stuff, run stuff, and copy paste stuff, and it mostly works. – Andrej Karpathy

Andrej Karpathy gave the name ‘vibe-coding’ to the act of letting AI take the lead in writing code. As AI keeps getting better, it’s a good bet to lean on the AI and see what is possible; it can do more than many expect.

Going from zero to passable-working-prototype in a matter of minutes has made vibe-coding popular, but it’s starting to influence more than the “throwaway weekend projects” that Andrej Karpathy used it for.

Microsoft reports that about 30% of their software is now written by AI. Madhu Guru of Google says on X that AI code generation is shifting Google’s software development culture from write-first to build-first:

Now, when time to vibe-code prototype ≈ time to write PRD, PMs can SHOW not tell. Role profiles are blurring, creativity and building are happening in parallel.

Elena Verna, who leads Growth at the vibe-coding company Lovable, said in a LinkedIn post:

At Lovable, we shortcut entire PM lifecycle:
-> No PM organization … Engineers do all of the PMing. …
-> We don't do PRDs. We just build.
-> Alignment meetings don't exist. We just have a quick convo, a Linear ticket, an occasional short doc.
-> And if we get it wrong? We fix it. The cost of a mistake < the cost of delay.

... Vibe coding enables you to go from the idea to prototype (or even production) faster than you can schedule a meeting. Just build it.

The head of “Growth” at the fastest-growing vibe-coding platform on the planet might have a biased perspective, but it’s a compelling point that AI code generation is slicing cycle times drastically.

However, “We just build” glosses over many details. What is said in the ‘quick convo’ and to whom? What’s in that Linear ticket and “occasional short doc”? What planning or design do you do? How often do you get it ‘wrong’ and who fixes it? In other words, how do you go from defining product intent to a production code base that expresses it?

Here’s a detail from Madhu Guru – they don’t ship vibe-coded software:

vibe coded prototypes from PMs are purely inputs to the larger internal teams.

The PM-written prototype conveys the sense of the product, but they don’t ship vibe-coded prototypes; the task of translating that into a robust, production-worthy product lies ahead. The vibe-prototype is your PRD, not your first product version.

Similarly, Intuit Mailchimp accelerated development by up to 40% using AI "vibe coding" tools for rapid prototyping. They found that human oversight and providing business context are crucial, as AI augments engineers but still needs human expertise to develop production-ready code.

The Problem with Vibe-Coding

“AI coding is like a brand new credit card here that is going to allow us to accumulate technical debt in ways we were never able to do before.” – MIT Professor Armando Solar-Lezama

Vibe-coding is brilliant if used right, but it can also get you into big trouble. The Val Town blog points out the problem with vibe coding: technical debt.

We already have a phrase for code that nobody understands: Legacy code.
Legacy code is universally despised, and for good reasons. But why? You have the code, right? Can't you figure it out from there?
Wrong. Code that nobody understands is tech debt. It takes a lot of time to understand unfamiliar code enough to debug it, let alone introduce new features without also introducing bugs.
Programming is fundamentally theory building, not producing lines of code. …
When you vibe code, you are incurring tech debt as fast as the LLM can spit it out. Which is why vibe coding is perfect for prototypes and throwaway projects: It's only legacy code if you have to maintain it!

Vibe-coding without knowing what code the AI generated is a recipe for technical debt. The result is code that nobody understands, that carries hidden assumptions, and that implements features in unknown ways.

That’s why the ‘low-hanging’ use case for AI coding is making prototypes, iterating fast yet avoiding tech debt by discarding the code.

To avoid technical debt, you can use vibe-coding only for throwaway code or you can understand the code that AI generates.

A GitClear study of codebases, published earlier this year, shows trade-offs between speed and quality in AI-assisted development. The study found that “code churn” is increasing dramatically, along with a significant increase in copy-pasted code:

Perhaps more concerning for long-term maintainability is the change in the composition of code additions. The study found that “copy/pasted code” is increasing at a faster rate than “updated,” “deleted,” or “moved” code.

The article sharing the study suggested ways to avoid pitfalls of AI code generation. They include clear quality guidelines and stronger automated testing requirements for AI generated code, as well as educating teams on the strengths and limitations of AI coding assistants. They also suggest:

Creating feedback loops that help developers improve their prompting techniques.

Spec-Based Context for AI Coding

Generating robust production code requires maintaining a good understanding of the software’s requirements, design, and code. The way to maintain that understanding and a good process while taking advantage of AI’s speed and automation is to drive the AI coding with specifications and co-develop with AI.

Spec-driven AI coding starts by defining what you want to build in natural language. Tell the AI what you want built by giving details in a clear software specification, for example as a product requirements document (PRD) in a Markdown text file.

AI coding assistants are getting better at working with such a flow. Most AI coding assistants now have some kind of Architect or planning mode that can turn detailed specifications into steps for AI coding development.

Amazon Kiro is explicitly supporting this, as a self-declared specification-driven agentic IDE that is “designed to bring structure to vibe-coding.” Kiro has a system prompt geared towards writing clean code and being strict about doing only what is asked:

Write only the ABSOLUTE MINIMAL amount of code needed to address the requirement, avoid verbose implementations and any code that doesn't directly contribute to the solution.

Other AI coding assistants have learned that it’s not helpful to ‘hallucinate’ unasked features. Obviously, that means users who want features will have to be specific and ask for them. A PRD makes it clear.

Requirements, Designs and Plans for Driving AI Code Generation

A crucial part of software engineering involves understanding the difference between product requirements (what to do), design (how to structure the code), and plan (the steps to do it). In software development, you would do requirements first, then design, then plan; all would be separate documents.

So too with AI coding. A user can co-develop these documents with the AI, having the AI assist in turning requirements into a design, and both of them into a plan of to-do items. AI coding assistants are also able to convert a requirement specification into a step-by-step plan maintained in a todo.md file.

The requirements, design, and plan (a simple todo.md is enough) can then be provided as separate Markdown files in the AI’s context for code generation. Context engineering ties it all together by providing additional supporting elements and guidance on how to use the product requirements document (PRD), design document, and plan.
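As a rough illustration of how the three documents might be assembled into one context, here is a minimal Python sketch. The file names prd.md and design.md, the section titles, and the function name build_context are assumptions made for this example; only todo.md is named above, and no particular AI coding assistant is claimed to work this way.

```python
from typing import Dict

# Hypothetical file names and section titles; only todo.md is named in the article.
SPEC_ORDER = [
    ("prd.md", "Product requirements (what to build)"),
    ("design.md", "Design (how to structure the code)"),
    ("todo.md", "Plan (the steps to do it)"),
]


def build_context(docs: Dict[str, str], task: str) -> str:
    """Join the spec documents into one prompt: requirements, then design, then plan."""
    missing = [name for name, _ in SPEC_ORDER if not docs.get(name, "").strip()]
    if missing:
        # Refuse to build a context with gaps rather than let the AI guess.
        raise ValueError("Missing or empty spec documents: " + ", ".join(missing))
    sections = [f"## {title}\n\n{docs[name].strip()}" for name, title in SPEC_ORDER]
    sections.append(f"## Current task\n\n{task.strip()}")
    return "\n\n".join(sections)


if __name__ == "__main__":
    # Illustrative contents; in practice these would be read from the Markdown files.
    example_docs = {
        "prd.md": "The player steers a wombat through circus hoops.",
        "design.md": "Single HTML file with one canvas and a game loop.",
        "todo.md": "- [ ] Draw the wombat\n- [ ] Add hoops",
    }
    print(build_context(example_docs, "Implement the first unchecked item in the plan."))
```

Keeping the documents separate and joining them in a fixed order means an update to any one of them flows into the next prompt without rewriting the others.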

Harper Reed’s blog shares his approach to iterating specs and plans for software development:

tl:dr; Brainstorm spec, then plan a plan, then execute using LLM codegen. Discrete loops. Then magic. ✩₊˚.⋆☾⋆⁺₊✧

He first co-develops the spec with the AI. Then he has a detailed prompt that walks the AI through first creating a detailed plan, then turning it into a series of specific prompts (“break it down into small, iterative chunks”) for the AI to generate code in discrete steps.
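The idea of feeding a plan to the AI in discrete steps can be sketched in a few lines of Python. This is not Harper Reed’s actual tooling; it assumes the plan is kept as a Markdown checklist (lines such as "- [ ] step"), and the function names parse_plan and next_step_prompt as well as the prompt wording are hypothetical.

```python
import re
from typing import List, Optional, Tuple

# Matches Markdown checklist items such as "- [ ] Add scoring" or "* [x] Set up project".
CHECKBOX = re.compile(r"^\s*[-*]\s+\[( |x|X)\]\s+(.*)$")


def parse_plan(todo_markdown: str) -> List[Tuple[bool, str]]:
    """Return (done, description) pairs for every checklist line in the plan."""
    steps = []
    for line in todo_markdown.splitlines():
        match = CHECKBOX.match(line)
        if match:
            done = match.group(1).lower() == "x"
            steps.append((done, match.group(2).strip()))
    return steps


def next_step_prompt(todo_markdown: str) -> Optional[str]:
    """Build a prompt for the first unchecked step, or None when the plan is finished."""
    for done, description in parse_plan(todo_markdown):
        if not done:
            # Assumed wording: ask for one small chunk at a time.
            return f"Implement only this step from the plan, then stop: {description}"
    return None


if __name__ == "__main__":
    plan = "# Plan\n- [x] Set up project\n- [ ] Add player movement\n- [ ] Add scoring\n"
    print(next_step_prompt(plan))
```

Each generated prompt covers one small chunk, so the code can be reviewed and the checklist updated before the next step is requested.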

The AI coding assistant co-develops requirements, design, and plans with the user, then writes code, all while following good software engineering practices. Iterations are done by evaluating code and updating all documents to stay coordinated. Done right, specification-driven AI coding creates well-made software in fast development cycles.

A McKinsey report on AI-enabled software development points out that applying AI across the whole software development life cycle promises to cut cycle times while improving quality:

By integrating all forms of AI into the end-to-end software product development life cycle (PDLC), companies can empower product managers (PMs), engineers, and their teams to spend more time on higher-value work and less on routine tasks.

The productivity gains possible will drive adoption of AI-enabled software development processes, which will include the techniques mentioned.

Beyond Coding – Spec-Driven AI Prompting

“Every prompt that fails does so because intent wasn’t clearly communicated.” - Nate Jones

While an AI agent like Cursor can generate code from a plain English prompt, its usefulness hinges on the quality of the prompt description. A vague idea will produce useless code; a precise, well-structured description will yield a functional result.

AI code generation’s strength and weakness are the same – speed. You can cut cycle times while speedily generating technical debt. To avoid technical debt, define the software being developed through a specification. Software development with AI means evolving the specification in the AI’s context to direct the AI coding agent to generate code that reflects the desired product.

The implications of spec-driven prompting go beyond coding. Any AI agent workflow will be defined by the requirements and design for that workflow. The only way to ensure the AI agent executes the workflow desired is to have a clear specification of that in the AI agent’s context.

The critical skill in driving agentic systems is the ability to translate abstract human intent into a precise prompt – a machine-readable specification – that gives the AI the directions it needs.

This applies even to personal AI prompts. In trip planning, a vague "plan my trip" prompt leaves intent unstated, which leads to the AI guessing and possibly hallucinating the wrong details. It won’t work. Either give the AI enough details or, if unsure of the details, work with the AI to co-develop a plan:

“I am planning a trip, what details should I provide so that you can help set up my flights, hotels, and itinerary?”

A “plan-then-execute” pattern will yield an effective specification for the AI agent to do its job. If you do know what you want specifically, spell it out. You will get better results from AI by expressing your intent clearly in the prompt, providing a specification of your desired result.
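The “plan-then-execute” pattern for the trip example can be sketched as a small Python check: if the specification has gaps, the prompt asks the AI to gather details first; otherwise it spells out the full request. The TripSpec fields, function names, and prompt wording are all assumptions chosen for illustration.

```python
from dataclasses import dataclass, fields
from typing import List, Optional


@dataclass
class TripSpec:
    """A hypothetical trip specification; the fields are illustrative only."""
    destination: Optional[str] = None
    dates: Optional[str] = None
    budget: Optional[str] = None
    travelers: Optional[int] = None


def missing_details(spec: TripSpec) -> List[str]:
    """List the fields the user has not filled in yet."""
    return [f.name for f in fields(spec) if getattr(spec, f.name) is None]


def build_prompt(spec: TripSpec) -> str:
    """Plan first when the spec has gaps; execute only when it is complete."""
    missing = missing_details(spec)
    if missing:
        return (
            "I am planning a trip. Before you plan anything, ask me for: "
            + ", ".join(missing) + "."
        )
    return (
        f"Plan a trip to {spec.destination} for {spec.travelers} traveler(s), "
        f"dates {spec.dates}, budget {spec.budget}. "
        "Propose flights, hotels, and an itinerary."
    )


if __name__ == "__main__":
    print(build_prompt(TripSpec(destination="Lisbon")))
    print(build_prompt(TripSpec("Lisbon", "May 3-10", "$3000", 2)))
```

The same check works for any agent workflow: name the details the task requires, and do not ask the AI to execute until they are all stated.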
