<?xml version="1.0" encoding="utf-8"?><feed xmlns="http://www.w3.org/2005/Atom" xml:lang="en"><generator uri="https://jekyllrb.com/" version="4.4.1">Jekyll</generator><link href="https://devplan.com/feed.xml" rel="self" type="application/atom+xml" /><link href="https://devplan.com/" rel="alternate" type="text/html" hreflang="en" /><updated>2026-05-19T16:44:31-07:00</updated><id>https://devplan.com/feed.xml</id><title type="html">Devplan</title><subtitle>Devplan is the coordination layer for AI-driven product teams. Connect GitHub, Jira, Slack, Zoom, and Notion into a living knowledge graph that surfaces insights, projects, and a roadmap that updates itself — so everyone stays in sync, automatically.</subtitle><author><name>Devplan Team</name><email>info@devplan.com</email></author><entry><title type="html">Context Engineering: Why Your Prompts Aren’t the Problem</title><link href="https://devplan.com/blog/context-engineering-is-the-real-variable/" rel="alternate" type="text/html" title="Context Engineering: Why Your Prompts Aren’t the Problem" /><published>2026-03-23T06:00:00-07:00</published><updated>2026-03-23T06:00:00-07:00</updated><id>https://devplan.com/blog/context-engineering-is-the-real-variable</id><content type="html" xml:base="https://devplan.com/blog/context-engineering-is-the-real-variable/"><![CDATA[<h2 id="the-teams-winning-with-ai-didnt-write-better-prompts">The Teams Winning With AI Didn’t Write Better Prompts</h2>

<p>Engineering teams shipping substantial code volumes with AI are not using different models or writing better prompts. They have built a better operating environment for their agents. Every team has access to the same frontier models, so the gap is not about capability. It comes from what teams put in front of those models: how context is structured, how memory is managed, how feedback loops close, and whether agents get a coherent picture of what they are trying to achieve.</p>

<h2 id="what-context-engineering-actually-is">What Context Engineering Actually Is</h2>

<p>Context engineering involves strategically populating the context window with precisely calibrated information for each step. Andrej Karpathy defined it as “the delicate art and science of filling the context window with just the right information for the next step.” Shopify CEO Tobi Lutke framed it as “the art of providing all the context for the task to be plausibly solvable by the LLM.” Google’s engineering team emphasized treating “context as a first-class system with its own architecture, lifecycle, and constraints.”</p>

<p>Context engineering differs fundamentally from prompt engineering. Most people think of context as a container: you put materials in, the model reads them, and you get output. That misses what actually happens. A model has no memory beyond its context window. Everything about your task, codebase, preferences, history, and constraints has to be in that window right now. If it is not, the model guesses.</p>

<p>Two engineers face tasks of identical complexity on the same model. Engineer A writes a clean, specific prompt. Engineer B writes a mediocre prompt but maintains a CLAUDE.md file in the repository root, has loaded relevant code examples, and has a document explaining team conventions. Engineer B consistently gets better output. The prompt barely mattered; the context did the work.</p>

<h2 id="why-context-failures-look-like-model-failures">Why Context Failures Look Like Model Failures</h2>

<p>Research on AI agent failures shows that teams routinely misattribute the root cause. When output disappoints, the reflex is to blame the model, swap models, or refine the prompt. The actual cause almost always lies in the context.</p>

<p>Anthropic’s engineering team observed that “most agent failures are not model failures. They are context failures.” The guidance emphasizes thoughtfulness—maintaining context that remains informative yet concise. This represents a design constraint rather than a prompting problem.</p>

<p>Birgitta Böckeler at Thoughtworks noted that “the number of options to configure and enrich a coding agent’s context has exploded,” with reliable output coming from teams treating configuration as genuine engineering rather than afterthought.</p>

<h2 id="the-four-layers-of-context">The Four Layers of Context</h2>

<p>Context contains distinct layers serving different functions. Teams achieving reliable large-scale output work across all four.</p>

<p><strong>Layer 1: The Spec Layer</strong></p>

<p>This layer is the most underused in software development, yet it most closely mirrors how strong human engineering teams work. Senior engineers read the PRD, the acceptance criteria, and the technical spec before writing code. Models should too.</p>

<p>A well-constructed PRD provides the “why”: what problem is being solved, who benefits, and what success looks like. Acceptance criteria are more valuable still: explicit, testable statements that define when the work is complete. Technical specs translate all of this into implementation decisions, such as which services are involved, which data models change, and how the work fits the architecture.</p>

<p>When teams omit this layer, models fill the gap with assumptions, often producing technically correct but contextually wrong output.</p>

<p>Example acceptance criteria:</p>

<pre><code class="language-plain">## Feature: User CSV export

**What it does:** Allows users to export transaction history as CSV.

**Acceptance criteria:**
- Export button appears only for paid plan users
- CSV includes: date, description, amount, category, status
- Amounts as raw numbers (no currency symbols)
- Empty state: export headers-only CSV, no error
- File name format: transactions-YYYY-MM-DD.csv using user's local timezone

**Out of scope:**
</code></pre>

<p>A handful of acceptance criteria make clear what completion means and what is excluded, and they head off the scope creep that commonly shows up in AI-generated code.</p>
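
<p>Because every criterion is testable, each one can double as an executable check that the agent runs before calling the work done. The sketch below is a hypothetical Python version: <code>Transaction</code>, <code>export_csv</code>, and <code>export_filename</code> are illustrative names rather than part of any real codebase, the paid-plan check is left out, and a fixed UTC-8 offset stands in for the user’s time zone.</p>

<pre><code class="language-python">import csv
import io
from dataclasses import dataclass
from datetime import datetime, timedelta, timezone, tzinfo

HEADERS = ["date", "description", "amount", "category", "status"]


@dataclass
class Transaction:
    date: str
    description: str
    amount: float  # kept as a raw number, never a formatted currency string
    category: str
    status: str


def export_csv(transactions: list[Transaction]) -> str:
    """Render transactions as CSV; an empty list yields a headers-only file."""
    buf = io.StringIO()
    writer = csv.writer(buf)
    writer.writerow(HEADERS)
    for t in transactions:
        writer.writerow([t.date, t.description, t.amount, t.category, t.status])
    return buf.getvalue()


def export_filename(now_utc: datetime, user_tz: tzinfo) -> str:
    """Name the file after the user's local date, not the server's UTC date."""
    local = now_utc.astimezone(user_tz)
    return f"transactions-{local:%Y-%m-%d}.csv"


# Checks written directly from the acceptance criteria.
assert export_csv([]) == "date,description,amount,category,status\r\n"
row = export_csv([Transaction("2026-03-01", "Coffee", 4.5, "Food", "posted")])
assert "$" not in row and ",4.5," in row
late_utc = datetime(2026, 3, 2, 3, 0, tzinfo=timezone.utc)
pacific = timezone(timedelta(hours=-8))
assert export_filename(late_utc, pacific) == "transactions-2026-03-01.csv"
</code></pre>

<p>The last check is the kind of detail a model tends to get wrong without the spec: at 03:00 UTC it is still the previous day for a user eight hours behind.</p>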

<p>The spec layer compounds over time. A living feature catalog documenting what was built, why, and how it interconnects becomes invaluable context. Without it, each task starts from scratch. With it, models understand the system they are operating in.</p>

<p><strong>Layer 2: The Knowledge Layer</strong></p>

<p>This layer is everything you inject to tell the model about your specific situation. In coding work, it is typically a CLAUDE.md file in the repository root that the agent loads automatically. It records architectural decisions, naming conventions, preferred and prohibited libraries, and how common patterns are handled.</p>

<p>This layer should read like documentation for a brilliant new hire, not a list of commands. The goal is to give the model enough background to make good decisions on its own, rather than to script its behavior.</p>

<p>Compare approaches:</p>

<pre><code class="language-plain">Weak:
You are a helpful coding assistant. Write clean code.

Strong:
You are a senior backend engineer in a TypeScript monorepo.
The codebase uses Zod for validation, Prisma for database access,
and enforces strict service/route handler separation.
Never use any as a type. Always handle errors explicitly.
</code></pre>

<p>Same model, completely different output. The difference is not the instruction; it is the context.</p>

<p><strong>Layer 3: Conversation History</strong></p>

<p>Models read everything that came earlier in a conversation, and it shapes every subsequent response. A history of vague requests calibrates the model to that same level of vagueness. Starting a fresh conversation for each distinct task keeps context clean, while one extended conversation degrades quality as cluttered, contradictory history accumulates. Long conversations also consume context window space, and once the window fills, earlier content can be dropped without any indication that it happened.</p>
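
<p>The silent-drop effect is easy to reproduce. A minimal sketch, assuming a harness that keeps only the newest turns fitting a fixed budget; word counts stand in for tokens, and <code>fit_history</code> is a hypothetical helper rather than any specific tool’s behavior.</p>

<pre><code class="language-python">def fit_history(turns: list[str], budget: int) -> list[str]:
    """Keep the newest turns whose combined word count fits the budget.

    Word count is a stand-in for tokens. Older turns are dropped silently:
    nothing in the returned history says they ever existed.
    """
    kept: list[str] = []
    used = 0
    for turn in reversed(turns):  # walk from newest to oldest
        cost = len(turn.split())
        if used + cost > budget:
            break
        kept.append(turn)
        used += cost
    kept.reverse()  # restore chronological order
    return kept


history = [
    "Use snake_case for all new modules",  # 6 words, an early constraint
    "Add a CSV export endpoint",  # 5 words
    "Now fix the flaky login test please",  # 7 words
]
print(fit_history(history, budget=12))
# ['Add a CSV export endpoint', 'Now fix the flaky login test please']
</code></pre>

<p>The naming convention from the first turn is gone, and the model has no way to know it was ever stated.</p>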

<p><strong>Layer 4: The Retrieved Layer</strong></p>

<p>This layer covers anything pulled in dynamically for a specific task: search results, code files, documentation snippets, tool outputs. A common mistake is retrieving too much. More context is not better context. Irrelevant context actively hurts by diluting the signal and introducing new sources of confusion.</p>

<p>Retrieval precision matters critically. A directly relevant 200-line function outperforms a 2,000-line file that is 90% noise. Teams dumping entire documentation sites into context while wondering why models confuse unrelated features have created retrieval problems, not model problems.</p>

<p>Consider a payment processing bug. Retrieving “everything payment-related” means 14 files and 3,800 lines covering webhooks, invoicing, refunds, and subscriptions, and the model has to reason across all of it to find the one relevant function. The same task with precise retrieval pulls the specific handler, the two utility functions it calls, and its expected error type. Ninety lines. The model goes straight to the problem.</p>
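
<p>One way to get from “everything payment-related” to the handler and its direct dependencies is to retrieve along the call graph instead of by keyword. A sketch, assuming a precomputed call-graph index; the function names and line counts are hypothetical and only loosely mirror the example above.</p>

<pre><code class="language-python"># Hypothetical index: function name -> (lines of code, functions it calls).
CALL_GRAPH = {
    "handle_payment_failed": (40, ["parse_amount", "lookup_customer"]),
    "parse_amount": (18, []),
    "lookup_customer": (22, ["db_query"]),
    "db_query": (60, []),
    "send_invoice_email": (300, []),
    "process_refund": (450, []),
}


def retrieve_precise(entry: str, depth: int = 1) -> list[str]:
    """Return the entry function plus its callees up to `depth` hops away."""
    selected = [entry]
    frontier = [entry]
    for _ in range(depth):
        next_frontier = []
        for fn in frontier:
            for callee in CALL_GRAPH[fn][1]:
                if callee not in selected:
                    selected.append(callee)
                    next_frontier.append(callee)
        frontier = next_frontier
    return selected


def line_count(names: list[str]) -> int:
    return sum(CALL_GRAPH[n][0] for n in names)


precise = retrieve_precise("handle_payment_failed")
print(precise, line_count(precise))
# ['handle_payment_failed', 'parse_amount', 'lookup_customer'] 80
print(line_count(list(CALL_GRAPH)))  # keyword-style "everything": 890
</code></pre>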

<h2 id="the-six-failure-modes-that-repeat-across-teams">The Six Failure Modes That Repeat Across Teams</h2>

<p><strong>Skipping the spec layer entirely.</strong> Handing models a vague task description and expecting them to infer intent is the single biggest source of poor output. A PRD need not be long. Even a one-page document with clear acceptance criteria dramatically improves output quality.</p>

<p><strong>Writing the knowledge layer instruction-style.</strong> The knowledge layer should read as documentation: background, context, conventions. Models respond to it differently than to a list of commands, and the difference shows in the output.</p>

<p><strong>Not managing conversation length.</strong> Running everything through one long conversation commonly causes quality to drop mid-session. Starting a fresh conversation for each distinct task is quality control, not mere workflow preference.</p>

<p><strong>Retrieval without curation.</strong> Pulling every potentially relevant file is not the same as pulling the right files.</p>

<p><strong>Ignoring implicit structure.</strong> Context order and format matter. Google’s ADK team documented that when context is flooded with irrelevant data, models fixate on past patterns instead of the immediate instructions. Placing the most important constraints at the end of the input consistently improves output.</p>
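
<p>A sketch of assembling the layers in a fixed order, with the task and the hard constraints placed last. <code>build_context</code> and the section labels are illustrative assumptions, not a format any particular agent tool requires.</p>

<pre><code class="language-python">def build_context(
    spec: str,
    knowledge: str,
    retrieved: list[str],
    task: str,
    constraints: list[str],
) -> str:
    """Assemble context layers; the most important constraints go at the end."""
    sections = [
        ("Project knowledge", knowledge),
        ("Spec and acceptance criteria", spec),
        ("Relevant code", "\n\n".join(retrieved)),
        ("Task", task),
        # Last position: the constraints the model must not lose sight of.
        ("Hard constraints", "\n".join(f"- {c}" for c in constraints)),
    ]
    # Skip empty layers rather than sending blank headings.
    return "\n\n".join(f"## {title}\n{body}" for title, body in sections if body)


ctx = build_context(
    spec="CSV export: headers-only file when there are no transactions.",
    knowledge="TypeScript monorepo; Zod for validation; Prisma for DB access.",
    retrieved=["function exportCsv(rows: Row[]): string { /* existing code */ }"],
    task="Implement the empty-state behavior.",
    constraints=["Never use any as a type", "Handle errors explicitly"],
)
print(ctx.splitlines()[-1])  # - Handle errors explicitly
</code></pre>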

<p><strong>Conflating model capability with context quality.</strong> When output disappoints, most blame the model. Usually, context explains the problem. Before concluding a model cannot accomplish something, rebuild context from scratch and retry.</p>

<h2 id="how-to-build-this-in-four-weeks">How to Build This in Four Weeks</h2>

<p><strong>Week one: observe, don’t optimize.</strong> Document every failure: what was requested, what went wrong, what knowledge it lacked. This list seeds your knowledge layer.</p>

<p><strong>Week two: build the knowledge layer.</strong> Turn the week-one patterns into a CLAUDE.md that addresses them directly. Cover the domain knowledge the model repeatedly misses: naming conventions, architectural constraints, library preferences, off-limits areas. Write it as documentation, not as instructions.</p>

<p><strong>Week three: get serious about specs.</strong> Before any meaningful task, write at least a brief PRD and acceptance criteria. Add a technical spec for architecturally complex work. This need not be formal; a markdown file with a few sections is enough. Writing it forces a clarity that translates directly into better output.</p>

<p><strong>Week four: test retrieval.</strong> Be deliberate about what external content you retrieve and how much of it. Run the same task with different retrieval strategies and compare output quality. Less usually turns out to be more.</p>

<p>Afterward, iterate. The knowledge layer is a living document. Update it whenever you catch a new failure pattern.</p>

<p>This month-long process lays the foundation. After it, you stop fighting the model and start collaborating with it.</p>

<h2 id="what-this-has-to-do-with-specs">What This Has to Do With Specs</h2>

<p>The spec layer is not peripheral to context engineering; it is the most essential layer.</p>

<p>Spec-driven development’s core insight: agents perform only as well as their instructions, and the most important instructions are not those typed into chat boxes. They are pre-session documents—the PRD, acceptance criteria, technical constraints, architectural decisions.</p>

<p>When these documents live in the repository, they are machine-readable and agents can access them. When they live in Slack threads or Google Docs, they effectively do not exist from the agent’s perspective.</p>

<p>Teams shipping reliably at scale treat specs not as human documentation but as the primary mechanism through which human intent becomes legible to agents. This reframing affects how carefully these are written, how precisely completion is defined, and how consistently they are maintained as codebases evolve.</p>

<p>The prompt is not the variable. The context is.</p>

<h2 id="faq">FAQ</h2>

<p><strong>What is context engineering?</strong> Context engineering is deliberately designing everything flowing into a model’s context window—not merely the prompt, but task descriptions, conversation history, retrieved documents, tool outputs, memory artifacts, specs, and connecting structures.</p>

<p><strong>How is context engineering different from prompt engineering?</strong> Prompt engineering focuses on crafting individual well-worded instructions. Context engineering addresses the entire information environment in which models operate. Prompts exist within context; context engineering determines what context contains, how it is structured, and how it evolves across sessions.</p>

<p><strong>Why does context quality matter more than model quality?</strong> All teams have access to the same frontier models. The differentiator is not capability but environment. A model given poor context produces poor output regardless of its capability; the same model given rich, well-structured context produces substantially better output from identical raw capability.</p>

<p><strong>What is the most underused layer of context?</strong> The spec layer. Most teams skip it, handing models vague task descriptions. Even short PRDs with clear acceptance criteria dramatically improve output quality by eliminating gaps models would fill with their own assumptions.</p>

<p><strong>What is context pollution?</strong> Context pollution involves excessive irrelevant, redundant, or conflicting information within the context window. It distracts models and degrades reasoning accuracy. Teams often mistakenly treat greater context as superior context. Retrieval precision—pulling most relevant information rather than complete dumps—represents one of the highest-leverage improvements available.</p>

<p><strong>How do specs relate to context engineering?</strong> Specs are the primary mechanism through which human intent becomes legible to agents. In well-designed context systems, specs function not as human documentation but as machine-readable repository files that agents read to understand what to build, what constitutes completion, and what constraints to respect. Poor specs commonly underlie poor agent output.</p>]]></content><author><name>Devplan Team</name></author><category term="Insights" /><summary type="html"><![CDATA[Winning teams don't write better prompts — they engineer better context. How to structure your information environment for maximum AI agent performance.]]></summary><media:thumbnail xmlns:media="http://search.yahoo.com/mrss/" url="https://devplan.com/assets/og/og-default.png" /><media:content medium="image" url="https://devplan.com/assets/og/og-default.png" xmlns:media="http://search.yahoo.com/mrss/" /></entry><entry><title type="html">The Harness Is Everything: Why Your AI Coding Agent Keeps Failing</title><link href="https://devplan.com/blog/the-harness-is-everything/" rel="alternate" type="text/html" title="The Harness Is Everything: Why Your AI Coding Agent Keeps Failing" /><published>2026-03-17T06:00:00-07:00</published><updated>2026-03-17T06:00:00-07:00</updated><id>https://devplan.com/blog/the-harness-is-everything</id><content type="html" xml:base="https://devplan.com/blog/the-harness-is-everything/"><![CDATA[<h2 id="what-is-an-ai-agent-harness">What Is an AI Agent Harness?</h2>

<p>An AI agent harness is the complete designed environment in which a language model operates. It includes the tools the agent can call, how information is formatted and delivered into context, how history is compressed and managed across sessions, the guardrails that catch mistakes before they cascade, and the scaffolding that lets an agent hand off coherent work to its future self.</p>

<p>A harness is not a system prompt. It is not a wrapper around an API call. It is not a longer prompt or a better model. It is the infrastructure layer that determines what any model can actually accomplish, regardless of which model you use.</p>

<p>Ninad Pathak at Firecrawl published one of the most thorough breakdowns of the concept recently, covering the core components in detail: the tool layer, memory architecture, context compression, and verification loops.</p>

<p>The distinction between harness and model matters because most teams spend their time optimizing the wrong thing. They iterate on prompts, swap models, adjust temperature. The teams shipping reliably at scale are investing in environment design.</p>

<h2 id="why-ai-coding-agents-fail-and-it-is-not-the-model">Why AI Coding Agents Fail (And It Is Not the Model)</h2>

<p>The pattern is consistent across every serious team that has documented this publicly. When an AI coding agent produces bad output, the root cause almost always traces back to one of four environment failures.</p>

<p><strong>Context flooding.</strong> The agent receives too much information at once. Irrelevant data competes for attention with relevant data, and output quality degrades across every subsequent step.</p>

<p><strong>Missing feedback loops.</strong> The agent writes code but cannot observe whether it actually works from a user’s perspective. It optimizes for proxy metrics that do not reflect real correctness.</p>

<p><strong>No persistent state.</strong> The agent has no reliable way to know what was done in a previous session, what counts as done, or what the current project state actually is.</p>

<p><strong>Stale or informal context.</strong> Requirements, architectural decisions, and constraints live in Slack threads, Google Docs, or people’s heads. From the agent’s perspective, they do not exist.</p>

<p>Each of these is a harness problem, not a model problem. Kyle at HumanLayer makes this case with unusual directness in his piece on harness engineering for coding agents. His argument, drawn from a year of watching coding agents fail in production: bad agent output is almost never a model problem. It is a configuration problem.</p>

<h2 id="the-research-that-proved-it-the-64-gap">The Research That Proved It: A Threefold Gap</h2>

<p>The clearest empirical proof of this came from research that the harness engineering community keeps coming back to. Researchers tested the same model on identical coding tasks (real GitHub issues from popular open source repositories) using two different environments.</p>

<p>With a standard bash shell interface, the system resolved 3.97% of issues. With a purpose-built agent harness, the same model resolved 12.47%. That is more than three times as many issues resolved, roughly a 214% relative improvement, from environment design alone. Same model. Same task. Same compute.</p>

<p>The harness achieved this through four specific decisions.</p>

<p><strong>Capped search results.</strong> Standard search commands can return thousands of lines. When agents get flooded, they thrash, issuing more searches, accumulating noise, and filling context with irrelevant data. The harness capped results at 50 and forced refinement when exceeded.</p>
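
<p>A minimal sketch of a capped search tool, assuming a plain substring search over in-memory files. The cap of 50 comes from the description above; <code>capped_search</code> and its message wording are hypothetical, not the original harness’s implementation.</p>

<pre><code class="language-python">MAX_RESULTS = 50  # the cap described above


def capped_search(files: dict[str, str], term: str, cap: int = MAX_RESULTS) -> str:
    """Search file contents for `term` without flooding the agent's context."""
    hits = [
        f"{path}:{lineno}: {line.strip()}"
        for path, text in files.items()
        for lineno, line in enumerate(text.splitlines(), start=1)
        if term in line
    ]
    if len(hits) > cap:
        # Instead of dumping every match, ask the agent to narrow the query.
        return (
            f"{len(hits)} matches for {term!r} (more than {cap}). "
            "Refine the search with a more specific term or a single directory."
        )
    if not hits:
        return f"No matches for {term!r}."
    return "\n".join(hits)


repo = {"app/pay.py": "def charge():\n    return 1\n", "app/log.py": "x = 1\n" * 80}
print(capped_search(repo, "charge"))  # app/pay.py:1: def charge():
print(capped_search(repo, "x = 1"))  # reports 80 matches and asks for a narrower query
</code></pre>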

<p><strong>A stateful file viewer with line numbers.</strong> The viewer maintained position across interactions and prepended explicit line numbers to every visible line. When an agent needs to edit specific lines, it should read those numbers directly rather than count them.</p>
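
<p>A sketch of the same idea: a viewer that keeps its position between calls and prefixes every visible line with its number. <code>FileViewer</code>, the window size, and the method names are assumptions for illustration.</p>

<pre><code class="language-python">class FileViewer:
    """Shows a fixed-size window of a file and remembers where it is."""

    def __init__(self, text: str, window: int = 5):
        self.lines = text.splitlines()
        self.window = window
        self.top = 0  # index of the first visible line, kept across calls

    def show(self) -> str:
        visible = self.lines[self.top : self.top + self.window]
        # Explicit 1-based line numbers so the agent never has to count.
        return "\n".join(
            f"{self.top + i + 1}: {line}" for i, line in enumerate(visible)
        )

    def goto(self, line_number: int) -> str:
        # Clamp so the window always stays inside the file.
        last_top = max(len(self.lines) - self.window, 0)
        self.top = min(max(line_number - 1, 0), last_top)
        return self.show()

    def scroll_down(self) -> str:
        return self.goto(self.top + self.window + 1)


viewer = FileViewer("\n".join(f"line {n}" for n in range(1, 13)), window=5)
print(viewer.show().splitlines()[0])  # 1: line 1
print(viewer.scroll_down().splitlines()[0])  # 6: line 6
print(viewer.scroll_down().splitlines()[-1])  # 12: line 12
</code></pre>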

<p><strong>An editor with integrated linting.</strong> Every edit triggered an automatic linter. Syntax errors were caught and rejected before being applied, with a clear error message. Without this, agents introduce a syntax error, run tests, see a seemingly unrelated failure, and spend ten steps chasing a ghost.</p>
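
<p>A minimal sketch of lint-on-edit for Python files, using the built-in <code>compile()</code> as a stand-in syntax check; a real harness would run the project’s actual linter. <code>apply_edit</code> and its messages are hypothetical.</p>

<pre><code class="language-python">def apply_edit(source: str, start: int, end: int, replacement: str) -> tuple[str, str]:
    """Replace 1-based lines start..end, but only if the result still parses.

    Returns (new_source, message). On a syntax error the original source is
    returned unchanged, so the broken edit never reaches the file.
    """
    lines = source.splitlines()
    candidate_lines = lines[: start - 1] + replacement.splitlines() + lines[end:]
    candidate = "\n".join(candidate_lines) + "\n"
    try:
        compile(candidate, "edited_file.py", "exec")  # syntax check only; nothing runs
    except SyntaxError as err:
        return source, f"Edit rejected: {err.msg} at line {err.lineno}. File unchanged."
    return candidate, "Edit applied."


original = "def total(xs):\n    return sum(xs)\n"
_, msg = apply_edit(original, 2, 2, "    return sum(xs")
print(msg.startswith("Edit rejected"))  # True
new_src, msg = apply_edit(original, 2, 2, "    return sum(xs) * 2")
print(msg)  # Edit applied.
</code></pre>

<p>The rejection message goes straight back to the agent, so the next step is fixing the edit rather than chasing a confusing test failure.</p>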

<p><strong>Context compression.</strong> Older observations were collapsed into single-line summaries. The agent could always see recent, relevant state without being buried in the full uncompressed history of every command it had ever run.</p>
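
<p>A sketch of the compression step, assuming each observation is stored as a command plus its full output. <code>compress_history</code>, the number of observations kept verbatim, and the summary format are illustrative choices, not the original harness’s format.</p>

<pre><code class="language-python">def compress_history(
    observations: list[tuple[str, str]], keep_full: int = 2
) -> list[str]:
    """Collapse all but the last `keep_full` observations to one-line summaries.

    Each observation is (command, output). Recent ones stay verbatim so the
    agent sees current state; older ones shrink to a single line each.
    """
    cutoff = max(len(observations) - keep_full, 0)
    compressed = []
    for i, (command, output) in enumerate(observations):
        if i >= cutoff:
            compressed.append(f"$ {command}\n{output}")
        else:
            n_lines = len(output.splitlines())
            compressed.append(f"$ {command}  [output collapsed, {n_lines} lines]")
    return compressed


obs = [
    ("grep -rn charge app/", "app/pay.py:1: def charge():\n" * 30),
    ("open app/pay.py", "1: def charge():\n2:     return 1"),
    ("python -m pytest", "1 passed"),
]
for entry in compress_history(obs):
    print(entry.splitlines()[0])
# $ grep -rn charge app/  [output collapsed, 30 lines]
# $ open app/pay.py
# $ python -m pytest
</code></pre>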

<p>This is the clearest proof in the literature that the bottleneck in AI agent performance is almost never the model. It is the environment.</p>

<h2 id="how-anthropic-solved-the-long-running-agent-problem">How Anthropic Solved the Long-Running Agent Problem</h2>

<p>Anthropic’s engineering team, building Claude Code, encountered a harder version of the same problem: tasks too large to complete inside a single context window.</p>

<p>Most real software projects do not fit in any context window. A production web application has hundreds of files, thousands of functions, a test suite, configuration, and dependencies. Human engineers navigate this through external memory, documentation, and accumulated context built over time. An agent starting a fresh session has none of that.</p>

<p>Internal experiments revealed two failure patterns consistent enough to become the design spec for their harness architecture.</p>

<p><strong>Attempting to do too much at once.</strong> Given a prompt like “build a clone of claude.ai,” the agent would try to one-shot the entire application, implementing features without completing or testing any of them, running out of context mid-implementation, and leaving the next session to start with a half-built app and no documentation of what state it was in.</p>

<p><strong>Declaring victory too early.</strong> After some features had been built, a subsequent agent would look around, see progress, and conclude the job was done. Not because it was unintelligent, but because it had no structured way to know what done actually meant for this project.</p>

<h2 id="the-initializer-and-coding-agent-architecture">The Initializer and Coding Agent Architecture</h2>

<p>The solution was a two-part harness. An initializer agent runs once and creates three things.</p>

<p>An <strong>init.sh</strong> script that reliably starts the development environment. Every subsequent session begins by running this script. The tokens saved on environment setup across dozens of sessions accumulate significantly.</p>

<p>A <strong>structured feature list</strong>: over 200 specific end-to-end feature descriptions, each initially marked as failing. This file is the project’s ground truth. An agent starting a new session reads it and knows exactly what has and has not been built, so it cannot look at working code and conclude the job is done; the feature list tells it the truth. The list is deliberately stored as JSON rather than Markdown, because models are less likely to casually overwrite JSON files. The rigid structure resists the kind of editing you do not want.</p>
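
<p>A sketch of how a session might use such a file. The JSON shape (<code>id</code>, <code>description</code>, <code>passes</code>) and the helper names are assumptions for illustration; the point is that “done” is read from the file, and a feature flips to passing only through an explicit verification step.</p>

<pre><code class="language-python">import json
from typing import Optional

FEATURES_JSON = """
[
  {"id": 1, "description": "User can start a new chat", "passes": false},
  {"id": 2, "description": "Chat history persists after reload", "passes": false}
]
"""


def next_feature(features: list[dict]) -> Optional[dict]:
    """The first failing feature is the one this session should work on."""
    return next((f for f in features if not f["passes"]), None)


def project_done(features: list[dict]) -> bool:
    # "Done" means every listed feature passes, not "the code looks finished".
    return all(f["passes"] for f in features)


def mark_passing(features: list[dict], feature_id: int, verified_end_to_end: bool) -> None:
    """Flip only the passes flag, and only after end-to-end verification."""
    if not verified_end_to_end:
        raise ValueError("refusing to mark an unverified feature as passing")
    for f in features:
        if f["id"] == feature_id:
            f["passes"] = True


features = json.loads(FEATURES_JSON)
print(next_feature(features)["description"])  # User can start a new chat
mark_passing(features, 1, verified_end_to_end=True)
print(project_done(features))  # False, because feature 2 still fails
</code></pre>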

<p>A <strong>claude-progress.txt</strong> file updated at the end of every session. Combined with git history, it gives every future agent a fast orientation without burning context on archaeology.</p>
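
<p>A minimal sketch of the read-at-start, append-at-end pattern. The file name matches the text; the entry format and both helpers are hypothetical, and the git commit that accompanies each update is left out.</p>

<pre><code class="language-python">import tempfile
from datetime import date
from pathlib import Path


def read_progress(path: Path, last_n: int = 3) -> str:
    """Session start: load only the most recent entries to orient quickly."""
    if not path.exists():
        return "No progress recorded yet."
    entries = path.read_text(encoding="utf-8").strip().split("\n\n")
    return "\n\n".join(entries[-last_n:])


def append_progress(path: Path, done: str, next_step: str, day: date) -> None:
    """Session end: record what changed and where the next session should start."""
    entry = f"[{day.isoformat()}]\nDone: {done}\nNext: {next_step}\n\n"
    with path.open("a", encoding="utf-8") as fh:
        fh.write(entry)


with tempfile.TemporaryDirectory() as tmp:
    progress = Path(tmp) / "claude-progress.txt"
    print(read_progress(progress))  # No progress recorded yet.
    append_progress(progress, "Feature 1 verified in browser", "Start feature 2", date(2026, 3, 17))
    print(read_progress(progress).splitlines()[0])  # [2026-03-17]
</code></pre>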

<p>The coding agent that runs in every subsequent session has a tighter mandate: work on one feature at a time, leave the environment clean, and update the progress file and git history before the session ends.</p>

<h2 id="the-feedback-loop-failure-nobody-talks-about">The Feedback Loop Failure Nobody Talks About</h2>

<p>Anthropic also documented a failure mode that shows up in virtually every agentic coding project: agents marking features complete without verifying them end-to-end.</p>

<p>An agent writes code, runs a unit test, sees it pass, marks the feature done. But the feature does not work when a real user interacts with it through a browser. The gap between unit test success and real-world functionality is something human engineers navigate by actually running the application. An agent without browser automation cannot make that shift.</p>

<p>The fix was giving agents access to browser automation tools so the agent could navigate the application, click buttons, fill forms, and verify real user flows. The improvement was substantial.</p>

<p>The principle generalizes: the quality of an agent’s work is bounded by the quality of its feedback loops. If the agent cannot observe the consequences of its actions in the domain that matters, it will optimize for proxies that do not correlate with correctness.</p>

<h2 id="how-openai-shipped-a-million-lines-with-no-manual-code">How OpenAI Shipped a Million Lines With No Manual Code</h2>

<p>OpenAI’s Codex team started a repository with one constraint: no human-written code. Everything including application logic, tests, CI configuration, documentation, and observability tooling would be written by agents. Humans would steer. Agents would execute.</p>

<p>The result: approximately one million lines of code and roughly 1,500 merged pull requests, produced by three engineers averaging 3.5 PRs per engineer per day. As the team grew, per-engineer throughput increased. The codebase backs a real internal product with hundreds of daily users.</p>

<p>The central observation from their writeup: the engineering job changed entirely. When you are not writing code, you are designing environments, specifying intent, and building feedback loops. When something failed, the fix was almost never “try harder.” It was almost always “what structural piece of the environment is missing that is causing this failure?”</p>

<h2 id="the-repository-as-system-of-record">The Repository as System of Record</h2>

<p>One of the most consequential decisions was making the repository the source of truth for everything an agent needed to know. Anything in a Slack thread or a Google Doc is invisible to the agent. If the agent cannot access it in context, it effectively does not exist.</p>

<p>Early on, the team tried the one big AGENTS.md approach, a single large instruction file containing everything. It failed in four consistent ways. A giant instruction file crowds out the actual task and relevant code. When everything is marked important, nothing is. A monolithic manual rots instantly as the codebase evolves. And a single blob is nearly impossible to verify for freshness or coverage.</p>

<p>The solution was a structured docs/ directory as the system of record, with a short AGENTS.md of roughly 100 lines serving as a map to deeper truth elsewhere. Progressive disclosure: agents start with a small, stable entry point and are pointed toward more when they need it, rather than overwhelmed upfront.</p>
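
<p>A sketch of progressive disclosure as a lookup: the agent always receives the short map and pulls a deeper document only when its task touches that topic. The topics, paths, document contents, and <code>context_for</code> are hypothetical, and keyword matching is a deliberately simple stand-in for however a real agent decides what to open.</p>

<pre><code class="language-python"># The short entry point (AGENTS.md-style map), always loaded.
AGENTS_MAP = {
    "architecture": "docs/architecture.md",
    "database": "docs/database.md",
    "testing": "docs/testing.md",
}

# Stand-in for the docs/ directory on disk.
DOCS = {
    "docs/architecture.md": "Layers: types, then services, then routes.",
    "docs/database.md": "All queries go through the repository layer.",
    "docs/testing.md": "Every feature needs an end-to-end test.",
}


def context_for(task: str) -> str:
    """Return the map plus only the documents whose topic the task mentions."""
    index = "\n".join(f"- {topic}: {path}" for topic, path in AGENTS_MAP.items())
    pulled = [DOCS[path] for topic, path in AGENTS_MAP.items() if topic in task.lower()]
    return "\n\n".join([index, *pulled])


print(context_for("Add a database index for exports"))
</code></pre>

<p>The map stays small enough to load every time, and each deeper document can be kept accurate on its own.</p>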

<h2 id="mechanical-architecture-enforcement">Mechanical Architecture Enforcement</h2>

<p>When agents are opening 3.5 PRs per engineer per day, human code review cannot be the primary quality mechanism. The solution was encoding architectural constraints as mechanical checks that run at the point of violation rather than days later in a PR comment.</p>

<p>Custom linters enforced dependency directions, boundary crossing, and interface consistency. The key principle: enforce invariants, not implementations. Care deeply about structural rules. Do not dictate how a specific function is built, as long as it satisfies its behavioral contract. Every linter error message was formatted specifically for injection into agent context, including the rule violated, the violation found, and the remediation steps, all in one actionable message.</p>
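
<p>A minimal sketch of an invariant-style check using Python’s standard <code>ast</code> module. It enforces only the direction of imports between layers, not how any module is written, and its message carries the rule, the violation, and the fix in one block. The layer names and message format are assumptions for illustration.</p>

<pre><code class="language-python">import ast

# Hypothetical layering, lowest first: a module may import its own layer or lower ones.
LAYERS = ["types", "services", "routes"]


def layer_of(module: str) -> str:
    return module.split(".")[0]


def check_imports(module_name: str, source: str) -> list[str]:
    """Return agent-ready messages for any import that points up the stack."""
    own = layer_of(module_name)
    own_rank = LAYERS.index(own)
    allowed = ", ".join(LAYERS[: own_rank + 1])
    errors = []
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, ast.Import):
            targets = [alias.name for alias in node.names]
        elif isinstance(node, ast.ImportFrom) and node.module:
            targets = [node.module]
        else:
            continue
        for target in targets:
            layer = layer_of(target)
            # Only direction is enforced; imports outside the layer list are ignored.
            if layer in LAYERS and LAYERS.index(layer) > own_rank:
                errors.append(
                    f"RULE: {own} may import only from: {allowed}.\n"
                    f"VIOLATION: {module_name} line {node.lineno} imports {target}.\n"
                    f"FIX: move what {module_name} needs from {target} "
                    f"into the {own} layer or below."
                )
    return errors


print(check_imports("services.billing", "from routes.api import handler\n")[0])
</code></pre>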

<h2 id="where-the-term-harness-engineering-came-from">Where the Term “Harness Engineering” Came From</h2>

<p>The term started spreading earlier this year. Charlie Guo’s piece synthesized converging practices from teams at OpenAI, Stripe, and others.</p>

<p>The core observation Guo made, and that the research supports, is that harness engineering is a discipline in the same way that infrastructure engineering is a discipline. It is not about any single tool or technique. It is about treating the agent’s environment as a first-class engineering concern rather than an afterthought to the model.</p>

<h2 id="the-five-patterns-that-repeat-across-every-high-performing-harness">The Five Patterns That Repeat Across Every High-Performing Harness</h2>

<p>Across all of these systems and teams, five design patterns appear consistently. They are not coincidences; they are engineering solutions to problems that reliably emerge when deploying agents at scale.</p>

<p><strong>Progressive disclosure.</strong> Give the agent the minimum it needs to orient itself, plus pointers to find more when it needs it. A short, focused entry point that maps to deeper context outperforms a comprehensive dump every time. It is also dramatically easier to keep accurate.</p>

<p><strong>Git worktree isolation.</strong> One agent, one worktree. Every serious orchestration system uses this. Git worktrees give each agent its own working directory, branch, and environment. Changes are validated in isolation before touching the main codebase.</p>

<p><strong>Spec first, repository as system of record.</strong> If it is not in the repository, it does not exist from the agent’s perspective. Specifications, requirements, architectural decisions, and constraints must be encoded into machine-readable files before execution begins. Documentation is no longer just for human readers. It is the mechanism through which human intent becomes legible to agents.</p>

<p><strong>Mechanical architecture enforcement.</strong> Encode architectural constraints as automated checks that run at the point of violation. Enforce invariants, not implementations. Allow significant freedom within them. The linter catches the violation and the error message remediates it. Human review focuses on judgment calls, not structural drift.</p>

<p><strong>Integrated feedback loops.</strong> Close the gap between action and consequence as tightly as possible. Syntax errors caught at edit time. Runtime errors surfaced through observability tools the agent can query. UI bugs caught through browser automation the agent can drive. For agents, errors not caught immediately accumulate in context and degrade every subsequent reasoning step.</p>

<h2 id="what-this-means-for-how-you-build">What This Means for How You Build</h2>

<p>When something is not working in your agent system, the harness mindset produces a different diagnostic than the default one.</p>

<p>Instead of “how do I write a better prompt?” ask “what information does the agent need that it currently cannot access?”</p>

<p>Instead of “why is the model making this mistake?” ask “what feedback loop is missing that would catch this before it propagates?”</p>

<p>Instead of “why is the agent not doing what I told it?” ask “what constraint in the environment is preventing it?”</p>

<p>This shift changes where engineering effort goes. A prompt fix solves one specific failure mode. A harness improvement prevents a category of failure modes, permanently, across every future session.</p>

<h2 id="the-minimal-harness-for-a-real-project">The Minimal Harness for a Real Project</h2>

<p>You do not need a full observability stack to benefit from this thinking. Four components cover most of it.</p>

<p><strong>A persistent progress file.</strong> The agent reads it at session start and writes it at session end. This alone prevents the “declare victory too early” failure and ensures continuity across context window boundaries.</p>

<p><strong>A structured task list with verifiable completion criteria.</strong> Not a vague project description. A specific, enumerated list of user-visible behaviors testable end-to-end. Status updates only after verification.</p>

<p><strong>Version control as a first-class session requirement.</strong> Every session ends with a commit and an updated progress file. Clean state is not a nice-to-have.</p>

<p><strong>Browser automation if you are building for the web.</strong> The difference between an agent that can only read code and one that can use the application it is building is the same as the difference between a developer who reads code and one who runs it.</p>

<h2 id="the-uncomfortable-bottom-line">The Uncomfortable Bottom Line</h2>

<p>If execution is a commodity, and the evidence suggests it increasingly is, the long-term competitive advantage in AI-driven development is not the model. It is the harness.</p>

<p>The teams that have figured this out built custom development environments for their specific codebases and domains. They built harness architectures enabling months of coherent incremental progress. They demonstrated dramatically better results from the same models through environment design alone. None of those advantages came from the model. They came from the environment.</p>

<p>The model is what thinks. The harness is what it thinks about.</p>

<h2 id="faq">FAQ</h2>

<p><strong>What is an AI agent harness?</strong> An AI agent harness is the complete designed environment in which a language model operates, including its tools, context structure, memory management, feedback loops, and session scaffolding. It determines what the model can actually accomplish, independent of the model’s raw capability.</p>

<p><strong>Why do AI coding agents fail on complex projects?</strong> The most common failure modes are context flooding (too much irrelevant information degrading output quality), missing feedback loops (the agent cannot observe whether its work actually functions), no persistent state across sessions, and requirements that exist outside the repository where the agent cannot access them.</p>

<p><strong>What is the difference between a harness and a prompt?</strong> A prompt is the input you send to the model in a single interaction. A harness is the entire system that determines what context the model receives, what tools it can use, how errors are caught, how state persists across sessions, and what constraints are enforced automatically. Prompts live inside harnesses.</p>

<p><strong>How does spec-driven development relate to harness engineering?</strong> Specs are the primary mechanism through which human intent becomes legible to agents. In a well-designed harness, specs are not just documentation for humans. They are machine-readable files in the repository that the agent reads to understand what to build, what counts as done, and what constraints to respect. Poor specs are one of the most common root causes of poor agent output.</p>

<p><strong>What is the minimal harness I can build today?</strong> Start with four things: a progress file the agent reads and writes each session, a structured feature list with verifiable completion criteria, git commits as a required end-of-session step, and browser automation if you are building a web product. That covers the majority of failure modes most teams run into.</p>]]></content><author><name>Devplan Team</name><email>info@devplan.com</email></author><category term="Insights" /><summary type="html"><![CDATA[Why your AI coding agent keeps failing: environment design, not model selection, determines agent success. How to build a harness that works.]]></summary><media:thumbnail xmlns:media="http://search.yahoo.com/mrss/" url="https://devplan.com/assets/og/og-default.png" /><media:content medium="image" url="https://devplan.com/assets/og/og-default.png" xmlns:media="http://search.yahoo.com/mrss/" /></entry><entry><title type="html">What Separates Good AI Dev Teams From Great Ones</title><link href="https://devplan.com/blog/what-separates-good-ai-dev-teams-from-great-ones/" rel="alternate" type="text/html" title="What Separates Good AI Dev Teams From Great Ones" /><published>2026-02-19T06:00:00-08:00</published><updated>2026-02-19T06:00:00-08:00</updated><id>https://devplan.com/blog/what-separates-good-ai-dev-teams-from-great-ones</id><content type="html" xml:base="https://devplan.com/blog/what-separates-good-ai-dev-teams-from-great-ones/"><![CDATA[<p>Steve Yegge recently published an observation that resonated across engineering teams: developers using AI coding tools most heavily experience the highest burnout, drowning in review queues and running faster just to stay in place.</p>

<p>Yet this outcome isn’t inevitable. While every team has access to the same tools—Cursor, Claude Code, Copilot—some teams are pulling ahead while others struggle. The gap isn’t in the AI tools themselves, but in what happens between the initial idea and the first prompt.</p>

<h2 id="where-most-teams-are-losing-time-they-dont-know-theyre-losing">Where Most Teams Are Losing Time They Don’t Know They’re Losing</h2>

<p>The typical workflow loses fidelity at each handoff. An idea gets written loosely in Notion or Jira. An engineer interprets it and prompts an AI agent. The agent fills remaining gaps with statistical guesses. Code enters review, where senior engineers must reconstruct intent and send it back. This cycle repeats two or three times.</p>

<p>Every handoff drops context. The original intent becomes increasingly distant from what the agent was actually told. The code that emerges looks functional but isn’t quite right, forcing experienced engineers to spend extensive time figuring out where it missed.</p>

<p>This review burden doesn’t stem from AI writing poor code—it comes from the gap between intent and specification.</p>

<h2 id="what-great-teams-do-differently">What Great Teams Do Differently</h2>

<p>High-performing teams recognize that leverage lives in preparation, not in the IDE. They treat the handoff from idea to agent as the most critical moment in development.</p>

<p>These teams ensure agents receive complete specifications before generation begins:</p>

<ul>
  <li>Real acceptance criteria, not implied expectations</li>
  <li>Explicit constraints</li>
  <li>Edge cases fully addressed</li>
  <li>Context grounded in actual codebase architecture</li>
</ul>
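
<p>A quick way to tell whether a spec is ready for an agent is to confirm, before generation starts, that each of these sections actually has content. Here is a minimal sketch. The field names and the file path are hypothetical.</p>

<div class="language-typescript highlighter-rouge"><div class="highlight"><pre class="highlight"><code>// Hypothetical spec shape mirroring the four items above. An empty section is
// a gap the agent will fill with a guess.

interface AgentSpec {
  acceptanceCriteria: string[];
  constraints: string[];
  edgeCases: string[];
  codebaseContext: string[]; // modules, files, or conventions the change touches
}

function missingSections(spec: AgentSpec): string[] {
  const missing: string[] = [];
  if (spec.acceptanceCriteria.length === 0) missing.push("acceptance criteria");
  if (spec.constraints.length === 0) missing.push("constraints");
  if (spec.edgeCases.length === 0) missing.push("edge cases");
  if (spec.codebaseContext.length === 0) missing.push("codebase context");
  return missing;
}

// A typical ticket: an outcome plus a file pointer, and nothing else.
const ticket: AgentSpec = {
  acceptanceCriteria: ["User sees an error when payment fails"],
  constraints: [],
  edgeCases: [],
  codebaseContext: ["src/checkout/PaymentForm.tsx"],
};
console.log(missingSections(ticket).join(", ")); // constraints, edge cases
</code></pre></div></div>

<p>A check like this catches only empty sections, not weak ones. It works as a gate before generation, not as a replacement for review.</p>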

<p>With proper specifications, output aligns with intent on the first pass. Review becomes a straightforward check against written criteria rather than an archaeological dig. Senior engineers focus on high-leverage work. Shipping accelerates not through faster generation, but through closer initial alignment.</p>

<p>Building such specifications by hand takes real effort: pulling in codebase context, thinking through edge cases, and writing precise acceptance criteria all demand discipline. Most teams skip this layer and treat it as overhead. That is the critical mistake.</p>

<h2 id="from-rough-idea-to-agent-ready-spec">From Rough Idea to Agent-Ready Spec</h2>

<p>The fix is a layer that bridges idea and execution. You start with rough intent and refine it into something an agent can work from. Codebase context loads automatically and grounds the spec in the system’s actual architecture, while supporting materials such as designs and research fold in alongside it.</p>

<p>The result is more than documentation. It is a specification grounded in the architecture that covers edge cases and gives agents the constraints they need to make sound decisions without guessing.</p>

<h2 id="the-cost-of-not-having-this-layer">The Cost of Not Having This Layer</h2>

<p>Teams lacking a specification layer don’t immediately recognize the damage. Problems accumulate gradually:</p>

<ul>
  <li>Review cycles run longer than necessary</li>
  <li>Senior engineers spend more and more time reconstructing AI intent instead of doing high-leverage work</li>
  <li>Junior developers lack clear contribution points, becoming passive observers</li>
  <li>Technical debt grows silently as agents guess at unspecified gaps</li>
  <li>Shipping velocity plateaus or drops despite faster generation speeds</li>
</ul>

<p>None of this announces itself as a crisis; it simply makes everything incrementally slower and harder. Meanwhile, teams with proper AI coding processes keep compounding their advantage.</p>

<h2 id="comparison-with-and-without-specification-layers">Comparison: With and Without Specification Layers</h2>

<table>
  <thead>
    <tr>
      <th>Without a Spec Layer</th>
      <th>Spec-Driven Approach</th>
    </tr>
  </thead>
  <tbody>
    <tr>
      <td>Ideas scattered across Notion, Jira, Slack</td>
      <td>Single location from idea through execution</td>
    </tr>
    <tr>
      <td>Codebase context exists only in developers’ minds</td>
      <td>Codebase context pulled in automatically</td>
    </tr>
    <tr>
      <td>Specs remain vague, with gaps filled in by the agent</td>
      <td>Structured specs grounded in actual architecture</td>
    </tr>
    <tr>
      <td>First-pass output technically acceptable but contextually misaligned</td>
      <td>First-pass output aligned with intent</td>
    </tr>
    <tr>
      <td>Multiple review rounds with senior engineers pulled in</td>
      <td>Single-pass review against written criteria</td>
    </tr>
    <tr>
      <td>Cognitive load falls on senior reviewers</td>
      <td>Cognitive load falls on the spec writer, at any level</td>
    </tr>
    <tr>
      <td>High technical debt from guessed solutions</td>
      <td>Low technical debt, since gaps are specified up front</td>
    </tr>
    <tr>
      <td>Context drops between tools</td>
      <td>Spec and execution remain integrated</td>
    </tr>
    <tr>
      <td>Trajectory: slower and harder</td>
      <td>Trajectory: faster and cleaner</td>
    </tr>
  </tbody>
</table>

<h2 id="the-move-great-teams-are-making-now">The Move Great Teams Are Making Now</h2>

<p>AI coding is the future of software development; that much is settled. The open question is which teams will build processes that let them use AI at scale, and which will be left cleaning up the debt from skipping this layer.</p>

<p>Leading teams don’t work harder or spend more on tools. They have discovered that the leverage in AI generation lies not in the tools themselves but in what comes before generation: codebase context, genuine acceptance criteria, constraints that reflect the actual system design, and an execution environment where the intent in the spec translates directly into generated code.</p>

<p>This process transforms good AI teams into exceptional ones.</p>]]></content><author><name>Devplan Team</name><email>info@devplan.com</email></author><category term="Insights" /><summary type="html"><![CDATA[The difference between good and great AI development teams isn't better tools — it's mastering the specification phase before any code gets written.]]></summary><media:thumbnail xmlns:media="http://search.yahoo.com/mrss/" url="https://devplan.com/assets/og/og-default.png" /><media:content medium="image" url="https://devplan.com/assets/og/og-default.png" xmlns:media="http://search.yahoo.com/mrss/" /></entry><entry><title type="html">Why Intent Is the New Bottleneck in AI Development</title><link href="https://devplan.com/blog/why-intent-is-the-new-bottleneck/" rel="alternate" type="text/html" title="Why Intent Is the New Bottleneck in AI Development" /><published>2026-01-21T06:00:00-08:00</published><updated>2026-01-21T06:00:00-08:00</updated><id>https://devplan.com/blog/why-intent-is-the-new-bottleneck</id><content type="html" xml:base="https://devplan.com/blog/why-intent-is-the-new-bottleneck/"><![CDATA[<h2 id="velocity-without-direction-is-just-expensive-rework">Velocity without direction is just expensive rework</h2>

<p>AI made execution cheap. A working feature can come together in an afternoon. But most teams are finding that speed alone doesn’t translate into shipping the right thing, and the data backs that up.</p>

<p>Bain’s 2025 Technology Report found that teams using AI assistants see only “10 to 15 percent productivity gains,” and the time saved rarely turns into business value. Their research also showed that writing and testing code accounts for just 25 to 35 percent of the development lifecycle. Speeding up that one slice without fixing the inputs just moves the bottleneck somewhere harder to see.</p>

<p>A frontend lead audited a feature that an agent had completed overnight. It worked. The buttons clicked. The data saved. But when he looked at the code, the agent had imported three different date-parsing libraries to handle a single timestamp and hard-coded the timezone to UTC-8 because the prompt didn’t specify otherwise.</p>

<p>The code wasn’t broken. But it was heavy, wrong in ways that wouldn’t surface until someone tried to extend it, and expensive to fix after the fact. He spent the next two days untangling dependencies that didn’t need to exist, which is roughly how long it would have taken to write the feature from scratch.</p>

<p>This pattern shows up constantly. One developer described giving up on a project after three months: “Every time I want to change a little thing, I kill 4 days debugging other things that go south.” The agent keeps fixing symptoms because it doesn’t know the root cause. The developer doesn’t know the root cause either, because they didn’t write the code.</p>

<p>The uncomfortable truth is that teams are spending less time typing and more time auditing code they didn’t author.</p>

<table>
  <thead>
    <tr>
      <th>What the agent sees</th>
      <th>What the agent doesn’t know</th>
    </tr>
  </thead>
  <tbody>
    <tr>
      <td>“Handle payment errors”</td>
      <td>Payment retries are legally prohibited for this transaction type</td>
    </tr>
    <tr>
      <td>Timestamp field in the schema</td>
      <td>Team uses UTC everywhere, never local timezones</td>
    </tr>
    <tr>
      <td>Multiple date libraries in package.json</td>
      <td>Only day.js is approved, the others are legacy</td>
    </tr>
    <tr>
      <td>Redux in older components</td>
      <td>Team migrated to Zustand six months ago</td>
    </tr>
    <tr>
      <td>No tests in the file</td>
      <td>Testing is required, the previous dev just skipped it</td>
    </tr>
  </tbody>
</table>

<p>The agent doesn’t know why you chose boring technology over clever technology, why you picked Postgres over Mongo, or why the payment flow needs to be idempotent. It ships its best guess, and its best guess is statistically reasonable but architecturally wrong for your specific system.</p>

<p>If this sounds like your team, the fix isn’t better prompting. It’s giving agents structured context before they start writing.</p>

<h2 id="where-intent-goes-to-die">Where intent goes to die</h2>

<p>Intent doesn’t disappear all at once. It leaks out at specific points in the workflow, and each leak compounds downstream.</p>

<table>
  <thead>
    <tr>
      <th>Where it leaks</th>
      <th>What happens</th>
      <th>What the agent does</th>
    </tr>
  </thead>
  <tbody>
    <tr>
      <td>Planning</td>
      <td>Ticket describes outcome but not constraints</td>
      <td>Agent treats ambiguity as a design decision</td>
    </tr>
    <tr>
      <td>Context transfer</td>
      <td>Decisions live in Slack, Notion, and people’s heads</td>
      <td>Agent has no access, fills in blanks</td>
    </tr>
    <tr>
      <td>Accumulation</td>
      <td>Undocumented patterns pile up in the codebase</td>
      <td>Next agent treats accidental patterns as intentional</td>
    </tr>
  </tbody>
</table>

<p>The first leak happens in planning. A PM writes a ticket that says “User sees error on failed payment.” That ticket contains an outcome but not the constraints around it. It doesn’t say which error component to use, whether the system should retry, or what the logging behavior should be. A human engineer would ask follow-up questions. An AI agent treats the ambiguity as a design decision and makes one.</p>

<p>A team running a checkout flow learned this the hard way. Their agent added a “Retry” button to a payment screen for a transaction type that legally cannot be retried. The prompt didn’t say “no retries,” so the agent optimized for UX and guessed wrong. The feature passed QA because the testers were checking functionality, not legal compliance. It made it to staging before someone from the payments team caught it.</p>

<p>The second leak happens in context transfer. Architecture decisions, past trade-offs, and team preferences live in Slack threads, Notion docs, and people’s heads. None of that reaches the agent. A paper on vibe coding documented what happens when constraints are absent: a team asked an AI to fix display issues, and it responded by rewriting state management, adding new API endpoints, and creating debugging panels. The codebase grew by hundreds of lines. The root cause, a simple API mismatch, stayed unfixed because the agent lacked the constraint that would have pointed it to the actual problem.</p>

<p>The third leak is cumulative. Every project that runs without structured intent makes the next one harder, because the codebase now contains decisions nobody documented and patterns nobody chose deliberately. Six months later, a new agent working on a related feature treats those accidental patterns as intentional architecture and builds on top of them.</p>

<p>Anthropic’s engineering team wrote about a version of this problem: if a human cannot definitively say which tool to use for a task, an AI agent will not do better. The fix isn’t better models. It’s closing the gap between the person who understands the reasoning and the system that executes the code.</p>

<h2 id="spec-driven-development-is-the-missing-layer">Spec-driven development is the missing layer</h2>

<p>Spec-driven development has been getting a lot of attention since mid-2025, with GitHub’s Spec Kit, JetBrains’ Junie, AWS Kiro, and Augment all building some version of it. The core idea is the same across all of them: write a structured specification before any code gets written, and use that spec as the source of truth that agents work from.</p>

<p>The concept isn’t new. As Martin Fowler’s team at ThoughtWorks pointed out, specs have been used in software engineering for decades, from model-driven development to behavior-driven development. What’s different now is the audience. Specs used to be written for future developers. In AI-assisted development, specs are written for machines, and machines need a different kind of clarity than humans do.</p>

<p>Humans need explanations. Agents need prohibitions.</p>

<p>A good spec for an agent includes three layers that most PRDs skip entirely:</p>

<p><strong>Decision logs that include the losers.</strong> Not just “we chose Postgres” but “we chose Postgres over Mongo because we need ACID compliance for the payment ledger.” If you don’t feed that constraint to the agent next week, it will write code that assumes eventual consistency. Architecture Decision Records have been around since 2011, but the format needs to shift. The audience is no longer a human who can infer intent from context. It’s a machine that will do exactly what you don’t tell it not to do.</p>
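
<p>Here is one way a decision log entry could carry its losers in a form an agent can consume. The record shape, the <code class="language-plaintext highlighter-rouge">ADR-007</code> id, and the rendering are all illustrative assumptions, not a standard ADR format.</p>

<div class="language-typescript highlighter-rouge"><div class="highlight"><pre class="highlight"><code>// Hypothetical decision record that keeps the rejected options and the reasons.

interface Alternative {
  option: string;
  rejectedBecause: string;
}

interface Decision {
  id: string;
  chose: string;
  because: string;
  rejected: Alternative[];
}

// Render the record as explicit rules, including what NOT to do.
function toAgentRules(d: Decision): string[] {
  const rules = [`${d.id}: Use ${d.chose} because ${d.because}.`];
  for (const alt of d.rejected) {
    rules.push(`${d.id}: Do not use ${alt.option}; rejected because ${alt.rejectedBecause}.`);
  }
  return rules;
}

const ledgerStore: Decision = {
  id: "ADR-007",
  chose: "Postgres",
  because: "the payment ledger needs ACID compliance",
  rejected: [{ option: "Mongo", rejectedBecause: "ledger code must not assume eventual consistency" }],
};

console.log(toAgentRules(ledgerStore).join("\n"));
// ADR-007: Use Postgres because the payment ledger needs ACID compliance.
// ADR-007: Do not use Mongo; rejected because ledger code must not assume eventual consistency.
</code></pre></div></div>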

<p><strong>Hard constraints that act as guardrails.</strong> These are the things that cannot change: no new npm packages without approval, use the internal UI library for all buttons, no external API calls from client-side code, payment flows must be idempotent. These constraints stop an agent from fixing one thing and breaking three others, which is the failure mode that showed up with the date-parsing libraries.</p>
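
<p>Hard constraints hold up best when a machine enforces them. The first one can be sketched as follows, assuming the approved list lives somewhere the harness can read. The list and the names are hypothetical. A check like this could run in CI or in a pre-commit hook and fail whenever the result is non-empty.</p>

<div class="language-typescript highlighter-rouge"><div class="highlight"><pre class="highlight"><code>// Hypothetical guardrail for "no new npm packages without approval":
// compare declared dependencies against an approved list.

interface PackageJson {
  dependencies?: { [name: string]: string };
  devDependencies?: { [name: string]: string };
}

function unapprovedPackages(pkg: PackageJson, approved: string[]): string[] {
  const declared = Object.keys(pkg.dependencies ?? {}).concat(
    Object.keys(pkg.devDependencies ?? {}),
  );
  return declared.filter((name) => approved.indexOf(name) === -1);
}

// Example manifest (version ranges are placeholders).
const manifest: PackageJson = {
  dependencies: { dayjs: "^1.11.0", moment: "^2.30.0", zustand: "^4.5.0" },
};

const violations = unapprovedPackages(manifest, ["dayjs", "zustand", "react"]);
console.log(violations); // only "moment" is flagged
</code></pre></div></div>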

<p><strong>Specificity about edge case behavior.</strong> Instead of “user sees error,” the spec says “if API returns 400, display Toast Component ID ERR_400, do not auto-retry, log to Sentry with payment_id.” Ambiguity in a spec is functionally the same as a prompt injection. It tells the agent “use your judgment,” and the agent’s judgment is a statistical average of every codebase it was trained on, not yours.</p>
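
<p>A spec this precise is already close to code. As an illustration only, the 400 rule can be written as a pure function that decides what should happen. It throws, rather than improvising, for any status the spec does not cover. The <code class="language-plaintext highlighter-rouge">PaymentErrorPlan</code> shape is hypothetical, and the code calls no real Toast or Sentry API.</p>

<div class="language-typescript highlighter-rouge"><div class="highlight"><pre class="highlight"><code>// Hypothetical encoding of one spec line:
// "if API returns 400, display Toast Component ID ERR_400, do not auto-retry,
//  log to Sentry with payment_id". Rendering the toast and calling Sentry are
// left to the real codebase; this only decides what should happen.

interface PaymentErrorPlan {
  toastId: string;
  autoRetry: boolean;
  logTo: "sentry";
  logFields: { payment_id: string };
}

function planForPaymentError(status: number, paymentId: string): PaymentErrorPlan {
  if (status === 400) {
    return { toastId: "ERR_400", autoRetry: false, logTo: "sentry", logFields: { payment_id: paymentId } };
  }
  // The spec says nothing about other statuses, so refuse to guess.
  throw new Error(`No spec for payment error status ${status}`);
}

console.log(planForPaymentError(400, "pay_123").toastId); // ERR_400
</code></pre></div></div>

<p>Throwing on unspecified statuses is a deliberate choice in this sketch. It turns a gap in the spec into a visible failure instead of a silent guess.</p>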

<p>Here’s what the difference looks like in practice:</p>

<table>
  <thead>
    <tr>
      <th>Ticket</th>
      <th>Spec</th>
    </tr>
  </thead>
  <tbody>
    <tr>
      <td>“Handle error states on checkout”</td>
      <td>If API returns 400: display <code class="language-plaintext highlighter-rouge">&lt;Toast id="ERR_400"&gt;</code>, do not retry, log to Sentry with <code class="language-plaintext highlighter-rouge">payment_id</code></td>
    </tr>
    <tr>
      <td>“Add user authentication”</td>
      <td>Use JWT with RS256 signing, refresh token rotation, 15-min access token TTL, store refresh token in httpOnly cookie</td>
    </tr>
    <tr>
      <td>“Improve page load speed”</td>
      <td>Lazy-load below-fold images, split vendor bundle from app bundle, target LCP under 2.5s on mobile 4G</td>
    </tr>
    <tr>
      <td>“Fix the date display bug”</td>
      <td>All dates render in UTC, use day.js only, format as <code class="language-plaintext highlighter-rouge">YYYY-MM-DD HH:mm</code> in all admin views</td>
    </tr>
  </tbody>
</table>

<p>The left column is what most agents receive. The right column is what they need.</p>
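
<p>Rows like the date one translate directly into checks. Here is a sketch that assumes a TypeScript codebase. The spec mandates day.js, so production code would use it. A small dependency-free reference like this one can still pin down the exact output a test expects, including the timezone conversion that the hard-coded UTC-8 example earlier got wrong.</p>

<div class="language-typescript highlighter-rouge"><div class="highlight"><pre class="highlight"><code>// Dependency-free reference for the spec row "All dates render in UTC,
// format as YYYY-MM-DD HH:mm". The spec says day.js is the only approved
// date library, so this is meant as a test oracle, not production code.

function pad2(n: number): string {
  return (n >= 10 ? "" : "0") + String(n);
}

function formatAdminDate(date: Date): string {
  const year = date.getUTCFullYear();
  const month = pad2(date.getUTCMonth() + 1); // getUTCMonth() is zero-based
  const day = pad2(date.getUTCDate());
  const hours = pad2(date.getUTCHours());
  const minutes = pad2(date.getUTCMinutes());
  return `${year}-${month}-${day} ${hours}:${minutes}`;
}

// 23:30 in UTC-8 on March 22 is 07:30 UTC on March 23.
console.log(formatAdminDate(new Date("2026-03-22T23:30:00-08:00"))); // 2026-03-23 07:30
</code></pre></div></div>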

<h2 id="making-the-intent-layer-stick">Making the intent layer stick</h2>

<p>Individual specs help on a per-project basis, but the real value shows up when the context compounds across projects. A spec for Feature A that documents why you rejected a particular approach becomes context that Feature B’s agent can reference three months later. The system remembers what was tried, what was rejected, and why, so each project makes the next one better.</p>

<p>This is where most teams hit a wall. Static docs rot. A Notion page written in January is outdated by March because nobody updates it when the architecture changes. The spec layer needs to be connected to the actual codebase and updated as decisions are made, not maintained as a separate artifact that drifts from reality.</p>

<p>An engineering manager at a 30-person SaaS company described the before and after. Before, her team’s agents were producing code that technically worked but kept introducing patterns the team had explicitly moved away from. An agent would use Redux in a component because the older parts of the codebase still had Redux, even though the team had migrated to Zustand six months earlier. Nobody had told the agent, and the codebase itself sent mixed signals.</p>

<p>After implementing structured specs with their codebase context attached, the agents started following the team’s actual conventions. Not because the model got smarter, but because the inputs got better. The specs told the agent which patterns to follow and which to ignore, and the codebase analysis gave it the information to distinguish between the two.</p>

<p>The pattern she described is exactly what Bain’s research predicted. The companies seeing 25 to 30 percent productivity gains aren’t the ones with better models. They’re the ones that redesigned the workflow around the model, feeding it structured context instead of raw tickets and hoping for the best.</p>

<h2 id="what-to-do-this-week">What to do this week</h2>

<p>You don’t need to overhaul your workflow to start closing the intent gap. Pick one project that’s about to kick off and try these three things:</p>

<p><strong>Write the decision log before the first line of code.</strong> Document what you chose, what you rejected, and why. Include the constraints that aren’t obvious from the ticket. “We need ACID compliance” is more useful to an agent than “use Postgres.”</p>

<p><strong>Define five hard constraints for your codebase.</strong> These are the rules that never change: approved libraries, required components, forbidden patterns. Put them somewhere the agent can access them, not in a Slack message from four months ago.</p>

<p><strong>Rewrite one vague ticket as a spec.</strong> Take a ticket that says something like “handle error states” and expand it with specific component IDs, retry behavior, and logging requirements. Run the agent against the spec instead of the ticket and compare the output.</p>

<p>If the agent produces better code from the spec than from the ticket, you’ve found your bottleneck. It was never the model. It was the input.</p>

<h2 id="the-teams-that-get-this-right-will-compound-the-advantage">The teams that get this right will compound the advantage</h2>

<p>Whether your team blends the PM and engineer roles or keeps them separate doesn’t matter as much as whether intent survives the handoff. Both approaches work when there’s a layer that carries context across people and tools.</p>

<p>The good news is that this isn’t a massive process overhaul. It starts with writing better inputs. A spec that includes constraints, decision history, and edge case behavior gives every agent run a better starting point, and each project that captures those decisions makes the next one faster.</p>

<p>The teams adopting spec-driven development now are building a compounding asset. Every documented decision, every logged constraint, every structured spec feeds into the next project. Six months from now, their agents are working from a rich, accurate picture of how the system works and why. That gap between teams who structure their intent and teams who don’t will only widen as agents take on more of the build.</p>

<p>The shift is small but the payoff is real: less time re-explaining the same things every sprint, less time auditing code that missed the point, and more time spent on the work that actually moves the product forward.</p>]]></content><author><name>Devplan Team</name><email>info@devplan.com</email></author><category term="Insights" /><summary type="html"><![CDATA[AI makes coding fast. But intent gets lost in handoffs, creating well-built solutions to the wrong problems. Here's how to fix the real bottleneck.]]></summary><media:thumbnail xmlns:media="http://search.yahoo.com/mrss/" url="https://devplan.com/assets/og/og-default.png" /><media:content medium="image" url="https://devplan.com/assets/og/og-default.png" xmlns:media="http://search.yahoo.com/mrss/" /></entry><entry><title type="html">The Shift to Spec-Driven Development</title><link href="https://devplan.com/blog/the-shift-to-spec-driven-development/" rel="alternate" type="text/html" title="The Shift to Spec-Driven Development" /><published>2025-10-22T06:00:00-07:00</published><updated>2025-10-22T06:00:00-07:00</updated><id>https://devplan.com/blog/the-shift-to-spec-driven-development</id><content type="html" xml:base="https://devplan.com/blog/the-shift-to-spec-driven-development/"><![CDATA[<h2 id="the-problem">The Problem</h2>

<p>AI is now writing real code, but it still has no idea what we actually want. It produces results with confidence, even when they are misaligned or just plain wrong. In production environments, that overconfidence turns small mistakes into costly problems and wasted cycles.</p>

<p>AI coding tools rely on what we give them as input. When we hand them basic requirements without structure or boundaries, they have to infer architecture, dependencies, and intent. And they often guess incorrectly.</p>

<p>As codebases grow, the problem compounds. Agents hallucinate APIs, misread structure, or fix one issue by breaking three others. Teams waste hours reviewing AI output, rewriting code, and patching misunderstandings that never should have happened.</p>

<p>This is not a tooling problem. It is a <strong>context problem.</strong></p>

<h2 id="the-solution">The Solution</h2>

<p>Spec-Driven Development gives AI and humans a shared language for intent.</p>

<p>Specifications become the primary artifact, and code becomes their expression. Each project acts as a container for features and knowledge, holding the entire context for that area of the product or platform.</p>

<p>A Living Project contains two core specs:</p>

<ul>
  <li><strong>Requirements Spec</strong> – captures user intent, goals, acceptance criteria, and success metrics.</li>
  <li><strong>Tech Design Spec</strong> – maps product requirements to system design: APIs, data models, dependencies, integrations, and constraints.</li>
</ul>

<p>Inside each project, features represent releasable slices of functionality broken into engineering tasks small enough to map to a single pull request.</p>

<p>Organizations maintain a system-wide platform spec, a living document representing the current state of the entire product, initially generated from deep codebase understanding and updated automatically as projects evolve.</p>
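
<p>To make the structure concrete, here is a hypothetical TypeScript model of it. The field names are illustrative, not Devplan’s actual schema. The sketch also assumes that a feature counts as “releasable” once every one of its tasks has a merged pull request, and the platform roll-up is a deliberately simplified stand-in.</p>

<div class="language-typescript highlighter-rouge"><div class="highlight"><pre class="highlight"><code>// Hypothetical model of a Living Project; field names are illustrative only.

interface RequirementsSpec {
  goals: string[];
  acceptanceCriteria: string[];
  successMetrics: string[];
}

interface TechDesignSpec {
  apis: string[];
  dataModels: string[];
  dependencies: string[];
  constraints: string[];
}

interface EngineeringTask {
  title: string;
  pullRequest: string | null; // small enough to map to a single PR
  merged: boolean;
}

interface Feature {
  name: string; // a releasable slice of functionality
  tasks: EngineeringTask[];
}

interface LivingProject {
  name: string;
  requirements: RequirementsSpec;
  techDesign: TechDesignSpec;
  features: Feature[];
}

// Assumption for this sketch: a feature is releasable once every task has a merged PR.
function isReleasable(feature: Feature): boolean {
  return feature.tasks.length > 0 && feature.tasks.every((t) => t.pullRequest !== null && t.merged);
}

// A toy roll-up of project state toward a platform-level view.
function platformSummary(projects: LivingProject[]): string[] {
  return projects.map((p) => {
    const ready = p.features.filter(isReleasable).length;
    return `${p.name}: ${ready}/${p.features.length} features releasable`;
  });
}

const checkout: LivingProject = {
  name: "Checkout",
  requirements: {
    goals: ["Pay in one step"],
    acceptanceCriteria: ["Order is confirmed after a successful payment"],
    successMetrics: ["Checkout conversion rate"],
  },
  techDesign: {
    apis: ["POST /payments"],
    dataModels: ["Payment"],
    dependencies: ["payments-service"],
    constraints: ["Payment flows must be idempotent"],
  },
  features: [
    { name: "Card payments", tasks: [{ title: "Payment form", pullRequest: "#101", merged: true }] },
    { name: "Saved cards", tasks: [{ title: "Card vault", pullRequest: null, merged: false }] },
  ],
};

console.log(platformSummary([checkout])); // one entry: "Checkout: 1/2 features releasable"
</code></pre></div></div>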

<h2 id="core-principles">Core Principles</h2>

<ol>
  <li><strong>Specs as the Source of Truth</strong> – Functional and technical specs define the system. Code is their reflection.</li>
  <li><strong>Continuous Spec Integration</strong> – Specs evolve through updates, review, and versioning like code.</li>
  <li><strong>System Coherence</strong> – Project-level specs roll into a platform spec, keeping the product aligned.</li>
  <li><strong>Human Judgment, Machine Execution</strong> – Builders approve specs and guide direction; AI executes reliably.</li>
</ol>

<h2 id="why-this-matters">Why This Matters</h2>

<p>The software industry has reached a breaking point. Complexity has grown faster than our ability to manage it. Teams operate in fragmented systems, with tickets in one tool, designs in another, specs in a third, and code in a fourth, while AI sits awkwardly on top, trying to connect the dots.</p>

<p>Spec-Driven Development creates a single layer of truth between human decision-making and AI execution, turning planning and implementation into a continuous, data-driven loop.</p>

<h2 id="the-future-of-building">The Future of Building</h2>

<p>A new contributor type is emerging: builders. They think like product managers and engineers but use different tools. Instead of handing off tickets, they define intent and guide AI systems to bring that intent to life in code.</p>

<p>As AI takes on more of the coding, builders spend their time on what humans are uniquely good at: understanding users, reasoning about systems, and making creative and strategic decisions. Specs become the shared language in that collaboration, and AI becomes the translation layer that turns ideas into software.</p>]]></content><author><name>Devplan Team</name><email>info@devplan.com</email></author><category term="Insights" /><summary type="html"><![CDATA[Why specifications have become the primary artifact in AI-assisted development — and how Spec-Driven Development gives teams a shared language for intent.]]></summary><media:thumbnail xmlns:media="http://search.yahoo.com/mrss/" url="https://devplan.com/assets/og/og-default.png" /><media:content medium="image" url="https://devplan.com/assets/og/og-default.png" xmlns:media="http://search.yahoo.com/mrss/" /></entry><entry><title type="html">How Planning Impacts AI Coding</title><link href="https://devplan.com/blog/how-planning-impacts-ai-coding/" rel="alternate" type="text/html" title="How Planning Impacts AI Coding" /><published>2025-08-07T07:00:00-07:00</published><updated>2025-08-07T07:00:00-07:00</updated><id>https://devplan.com/blog/how-planning-impacts-ai-coding</id><content type="html" xml:base="https://devplan.com/blog/how-planning-impacts-ai-coding/"><![CDATA[<h2 id="intro">Intro</h2>

<p>The development community holds varying opinions on AI’s real-world engineering impact. Some report massive productivity improvements, while others find reviewing AI-written code slows them down. This experiment measured how proper planning affects AI-assisted coding productivity.</p>

<h2 id="experiment">Experiment</h2>

<p>We tested whether carefully prepared requirements at the feature level produce better results than quick hand-written prompts. The task, based on an open-source repository, was implemented twice by each agent: once with simple high-level requirements, once with detailed specifications.</p>

<p>Simple requirements included:</p>

<ul>
  <li>GitHub repository change analysis functionality</li>
  <li>Automated periodic analysis for enrolled repositories</li>
  <li>Persisted reports available through API</li>
  <li>Reports viewable in the UI</li>
</ul>

<p>Detailed requirements covered implementation aspects, design patterns, and architecture decisions. All agents received guidance when stuck but no additional requirement information during implementation.</p>

<h2 id="criteria">Criteria</h2>

<p>Solutions were evaluated across four dimensions:</p>

<ul>
  <li><strong>Correctness</strong>: Implementation alignment with proper design</li>
  <li><strong>Quality</strong>: Code maintainability and adherence to standards</li>
  <li><strong>Autonomy</strong>: How independently agents reached final solutions</li>
  <li><strong>Completeness</strong>: Satisfaction of explicit requirements</li>
</ul>

<p>Each dimension was scored from 1 to 5. Because the goal is running many agents in parallel, consistent scores across all dimensions matter more than a single high score.</p>

<h2 id="results">Results</h2>

<table>
  <thead>
    <tr>
      <th>Solution</th>
      <th>Correctness</th>
      <th>Quality</th>
      <th>Autonomy</th>
      <th>Completeness</th>
      <th>Mean ± SD</th>
      <th>Improvement with planning</th>
    </tr>
  </thead>
  <tbody>
    <tr>
      <td>Claude, Short</td>
      <td>2</td>
      <td>3</td>
      <td>5</td>
      <td>3</td>
      <td>3.75 ± 1.5</td>
      <td>20%</td>
    </tr>
    <tr>
      <td>Claude, Planned</td>
      <td>4+</td>
      <td>4</td>
      <td>5</td>
      <td>4+</td>
      <td>4.5 ± 0.4</td>
      <td>—</td>
    </tr>
    <tr>
      <td>Cursor, Short</td>
      <td>2-</td>
      <td>2</td>
      <td>5</td>
      <td>3</td>
      <td>3.4 ± 1.9</td>
      <td>20%</td>
    </tr>
    <tr>
      <td>Cursor, Planned</td>
      <td>5-</td>
      <td>4-</td>
      <td>4</td>
      <td>4+</td>
      <td>4.1 ± 0.5</td>
      <td>—</td>
    </tr>
    <tr>
      <td>Junie, Short</td>
      <td>1+</td>
      <td>2</td>
      <td>5</td>
      <td>3</td>
      <td>2.9 ± 1.6</td>
      <td>34%</td>
    </tr>
    <tr>
      <td>Junie, Planned</td>
      <td>4</td>
      <td>3</td>
      <td>4+</td>
      <td>—</td>
      <td>3.9 ± 0.6</td>
      <td>—</td>
    </tr>
  </tbody>
</table>
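<p>The summary columns can be reproduced with a few lines of arithmetic. The TypeScript sketch below is an assumption-laden illustration: the article does not say how the "+" and "-" modifiers were converted (here they move a score by a quarter point) or whether the SD is the sample or population form (here it is the sample form). <code>parseScore</code>, <code>summarize</code>, and <code>relativeImprovement</code> are hypothetical helpers. The improvement is taken as the planned mean divided by the short-prompt mean, minus one, applied to the reported means.</p>

```typescript
// Assumption: a trailing "+" or "-" moves a score by a quarter point.
// The article does not say how these modifiers were converted.
function parseScore(raw: string): number {
  const base = parseInt(raw, 10);
  if (isNaN(base)) throw new Error(`not a numeric score: ${raw}`);
  const suffix = raw.slice(-1);
  if (suffix === "+") return base + 0.25;
  if (suffix === "-") return base - 0.25;
  return base;
}

function mean(xs: number[]): number {
  return xs.reduce((sum, x) => sum + x, 0) / xs.length;
}

// Sample standard deviation (n - 1 denominator); whether the table
// uses the sample or population form is also an assumption.
function sampleSd(xs: number[]): number {
  const m = mean(xs);
  const squares = xs.reduce((sum, x) => sum + (x - m) * (x - m), 0);
  return Math.sqrt(squares / (xs.length - 1));
}

// Formats one row's per-dimension scores as "mean ± SD".
function summarize(raw: string[]): string {
  const xs = raw.map(parseScore);
  return `${mean(xs).toFixed(2)} ± ${sampleSd(xs).toFixed(2)}`;
}

// Relative gain of the planned run over the short-prompt run.
function relativeImprovement(shortMean: number, plannedMean: number): number {
  return plannedMean / shortMean - 1;
}

// Reported means for Claude: 3.75 (short) and 4.5 (planned).
console.log(Math.round(relativeImprovement(3.75, 4.5) * 100)); // 20
```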

<h2 id="key-observations">Key Observations</h2>

<p><strong>High-quality planning significantly improves correctness and quality.</strong> AI assistants need clearly prepared product and technical requirements to deliver intended results and follow guidelines.</p>

<p><strong>Planning reduces score dispersion.</strong> With detailed, unambiguous requirements, results were more consistent across all AI assistants. Different agents often chose similar approaches, suggesting that with good specs, any capable coding assistant can deliver solid results.</p>

<p><strong>Smaller tasks work more autonomously.</strong> Claude Code completed the detailed requirements without nudging, while Cursor and Junie needed additional guidance. Breaking work into smaller chunks makes autonomous completion more likely.</p>

<p><strong>Code reviews are major bottlenecks.</strong> Getting six AI runs near completion proved easier than reviewing two PRs. As AI coding scales, teams need larger features completed autonomously.</p>

<h2 id="recommendations-for-parallel-ai-execution">Recommendations for Parallel AI Execution</h2>

<ol>
  <li>
    <p><strong>Prepare detailed specifications</strong> outlining scope, acceptance criteria, test coverage, database changes, and architectural decisions. Remove ambiguity ahead of time. AI handles code placement well but needs guardrails for production-ready output.</p>
  </li>
  <li>
    <p><strong>Keep execution right-sized.</strong> Tasks should complete autonomously without constant oversight. Purpose-built tools help generate appropriately scoped tasks for parallel execution across multiple agents.</p>
  </li>
  <li>
    <p><strong>Review every change.</strong> Even with proper planning, code rarely reaches production-ready status on first pass. Expect AI to reach approximately 80% completion, requiring manual refinement before merging.</p>
  </li>
</ol>]]></content><author><name>Devplan Team</name><email>info@devplan.com</email></author><category term="Insights" /><summary type="html"><![CDATA[Rigorous upfront planning dramatically improves AI coding quality. See how structured specs affect output across leading AI coding assistants.]]></summary><media:thumbnail xmlns:media="http://search.yahoo.com/mrss/" url="https://devplan.com/assets/og/og-default.png" /><media:content medium="image" url="https://devplan.com/assets/og/og-default.png" xmlns:media="http://search.yahoo.com/mrss/" /></entry><entry><title type="html">Using Devplan in Practice</title><link href="https://devplan.com/blog/using-devplan-in-practice/" rel="alternate" type="text/html" title="Using Devplan in Practice" /><published>2025-08-07T06:00:00-07:00</published><updated>2025-08-07T06:00:00-07:00</updated><id>https://devplan.com/blog/using-devplan-in-practice</id><content type="html" xml:base="https://devplan.com/blog/using-devplan-in-practice/"><![CDATA[<p>This walkthrough explains how we use Devplan in day-to-day development. More than 90% of the code we ship runs through Devplan, which makes it the foundation for executing quickly and getting real value from AI-assisted development.</p>

<p>The goals are to create a repeatable, scalable system where AI can:</p>

<ul>
  <li>Get to a working solution independently</li>
  <li>Execute tasks in parallel</li>
  <li>Require minimal human oversight</li>
</ul>

<p>Without Devplan, the overhead of managing AI workflows can cancel out the benefits. With it, the advantages are tremendous.</p>

<h2 id="1-define-product--technical-specs-with-devplan-agents">1. Define Product &amp; Technical Specs with Devplan Agents</h2>

<p>Projects start with Devplan’s agents helping define requirements. They ask clarifying questions, flag ambiguity, and scope work properly—grounded in codebase knowledge, past projects, and company structure.</p>

<p>This step is critical: well-targeted questions surface misalignments and hidden assumptions that would otherwise cause failed runs or repeated follow-ups. By the end, you have a clean, well-scoped project with its ambiguities resolved.</p>

<h2 id="2-break-the-project-down-into-right-sized-features">2. Break the Project Down into Right-Sized Features</h2>

<p>Devplan automatically breaks each project into individual features or user stories, with one prompt per feature.</p>

<p>Your job is light validation:</p>

<ul>
  <li>Are features correctly sized (ideally half-day to 5-day chunks)?</li>
  <li>Are there too many or too few?</li>
  <li>Do acceptance criteria make sense?</li>
</ul>
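<p>The size and acceptance-criteria checks above are mechanical enough to automate, leaving only the judgment calls to you. The TypeScript sketch below is a hypothetical illustration: the <code>PlannedFeature</code> shape and <code>reviewFeatures</code> helper are not Devplan's data model, and only the half-day to five-day band comes from this article. Whether a plan has too many or too few features is left as a human call.</p>

```typescript
// Hypothetical feature record; not Devplan's real data model.
interface PlannedFeature {
  title: string;
  estimateDays: number;
  acceptanceCriteria: string[];
}

// Size band from the article: roughly half a day to five days.
const MIN_DAYS = 0.5;
const MAX_DAYS = 5;

// Flags features that look too small, too large, or lack acceptance
// criteria. Whether the overall count is right stays a human decision.
function reviewFeatures(features: PlannedFeature[]): string[] {
  const issues: string[] = [];
  for (const f of features) {
    if (f.estimateDays < MIN_DAYS) {
      issues.push(`${f.title}: under ${MIN_DAYS} days, consider merging`);
    } else if (f.estimateDays > MAX_DAYS) {
      issues.push(`${f.title}: over ${MAX_DAYS} days, consider splitting`);
    }
    if (f.acceptanceCriteria.length === 0) {
      issues.push(`${f.title}: no acceptance criteria`);
    }
  }
  return issues;
}

// Example plan with one well-sized feature and one that needs rework.
const plan: PlannedFeature[] = [
  { title: "Repo change analysis", estimateDays: 3, acceptanceCriteria: ["Report lists changed files"] },
  { title: "Scheduled analysis", estimateDays: 8, acceptanceCriteria: [] },
];

console.log(reviewFeatures(plan));
```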

<p>Thanks to planning in Step 1, this typically takes less than two minutes.</p>

<h2 id="3-run-prompts-into-your-ai-ide-manual-vs-devplan-cli">3. Run Prompts into Your AI IDE (Manual vs. Devplan CLI)</h2>

<p>Once features and prompts are ready, run them in your AI coding tool of choice: Claude Code, Cursor, Junie, and so on.</p>

<p><strong>Approach 1: Manual Execution</strong></p>

<p>Per feature:</p>

<ol>
  <li>Download the generated prompt and format it for your IDE</li>
  <li>Clone your repository or create a new worktree</li>
  <li>Open your IDE manually in the correct folder</li>
  <li>Prompt the AI to begin coding</li>
</ol>

<p>Doing this 6–10 times per day becomes tedious, repetitive, and error-prone.</p>

<p><strong>Approach 2 (recommended): Automated Execution with Devplan CLI</strong></p>

<p>With the Devplan CLI, that overhead disappears. Spin up a feature-ready workspace with one command:</p>

<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>devplan clone -c XX -p YYYY -y -i cursor -f ZZZZ
</code></pre></div></div>

<p>This one-liner:</p>

<ul>
  <li>Creates a scoped cloned folder for the feature</li>
  <li>Launches your IDE in the correct context</li>
  <li>Automatically references the correct prompt file</li>
</ul>

<p>Then tell your AI agent: “Implement current feature.”</p>

<p>Before the CLI, we lost time and energy getting into each feature and switching between the terminal, prompts, and IDEs. Parallel execution felt clunky, and small errors led to broken states. With the CLI, feature execution is fast, consistent, and repeatable, which is what makes scale possible.</p>

<h2 id="4-review-and-polish-the-output">4. Review and Polish the Output</h2>

<p>This is the last human step before shipping. The amount of work drops dramatically if planning and prompting were done well.</p>

<p>Once the AI has written code:</p>

<ul>
  <li>Manually review the output</li>
  <li>Fix issues or edge cases</li>
  <li>Test to ensure it meets standards</li>
</ul>

<p>Without this system, far fewer AI-generated features could complete per day. Devplan turns isolated prompts into a real production workflow.</p>

<p>Devplan makes planning for AI-assisted development <strong>8–10x faster</strong> than manually managing specs, prompts, repos, and execution. Overall coding execution is <strong>2–3x faster</strong>. More importantly, it makes the workflow scalable.</p>

<h2 id="requirements-adjustments">Requirements Adjustments</h2>

<p>When an AI-coding agent goes sideways, it’s often easier to restart with corrected requirements. This workflow allows full restarts in minutes or seconds.</p>

<p>Go back to Step 1 and update the PRD or tech design doc. Then regenerate features and prompts with a single click in the Build Plan. Finally, use the CLI to restart with updated requirements—usually under 2 minutes total.</p>

<p>Centralizing requirements means every change persists, even if the repo is replaced or you switch AI IDEs. By contrast, changes made only in rule files won’t carry over to the next feature and may be lost if you switch tools.</p>

<h2 id="conclusion">Conclusion</h2>

<p>Some articles suggest AI may be a net loss for productivity. Indeed, without smart usage or good tooling, this may be true. For professional engineers who are already efficient, minimizing overhead while empowering AI is critical. Every minute of overhead and context switch matters. Used well, AI can make engineers more productive and the job itself more enjoyable.</p>]]></content><author><name>Devplan Team</name><email>info@devplan.com</email></author><category term="Product" /><summary type="html"><![CDATA[How Devplan powers real AI-assisted development: structured specs, automated workflows, and efficient feature execution at scale.]]></summary><media:thumbnail xmlns:media="http://search.yahoo.com/mrss/" url="https://devplan.com/assets/og/og-default.png" /><media:content medium="image" url="https://devplan.com/assets/og/og-default.png" xmlns:media="http://search.yahoo.com/mrss/" /></entry><entry><title type="html">How to Use AI to Code for Beginners</title><link href="https://devplan.com/blog/how-to-use-ai-to-code-for-beginners/" rel="alternate" type="text/html" title="How to Use AI to Code for Beginners" /><published>2025-07-06T06:00:00-07:00</published><updated>2025-07-06T06:00:00-07:00</updated><id>https://devplan.com/blog/how-to-use-ai-to-code-for-beginners</id><content type="html" xml:base="https://devplan.com/blog/how-to-use-ai-to-code-for-beginners/"><![CDATA[<p>You’ve got an idea that’s been sitting in your notes or bouncing around your brain for months. You may have dabbled with code or played with ChatGPT, but haven’t yet built a complete, working product.</p>

<p>Whether you’re a designer, PM, marketer, ops lead, or curious professional, you don’t need to become a software engineer to build a working MVP. You just need a clear plan and the right tools.</p>

<p>This guide will help you:</p>

<ul>
  <li>Define what you’re building</li>
  <li>Set up a modern web development environment</li>
  <li>Understand how web apps work</li>
  <li>Start coding with real tools</li>
  <li>Deploy your project live</li>
  <li>Know where to go when you get stuck</li>
</ul>

<h2 id="step-0-define-what-youre-building-first">Step 0: Define What You’re Building First</h2>

<p>Before coding, you need a plan. Devplan breaks this down into 3 parts:</p>

<h3 id="1-prd-product-requirements-document">1. PRD (Product Requirements Document)</h3>

<p>Write what your product does from the user’s perspective. Focus on functionality, not implementation. Example:</p>

<ul>
  <li>“Users can sign up with email and password”</li>
  <li>“They see a personalized dashboard after login”</li>
</ul>

<h3 id="2-technical-design">2. Technical Design</h3>

<p>Define what components, pages, logic, and data are needed to implement the PRD. This gives structure to your code before you write it.</p>

<h3 id="3-build-plan">3. Build Plan</h3>

<p>Devplan turns your tech design into scoped tasks. Each task comes with a pre-written AI coding prompt. You’ll use these directly inside your IDE (Cursor is recommended).</p>

<p>After generating your plan:</p>

<ul>
  <li>Copy/paste the prompts into Cursor</li>
  <li>Or download the plan and run through them as you go</li>
</ul>

<h2 id="tech-stack-overview">Tech Stack Overview</h2>

<p>Here’s the modern web dev stack:</p>

<table>
  <thead>
    <tr>
      <th>Layer</th>
      <th>Tool</th>
    </tr>
  </thead>
  <tbody>
    <tr>
      <td>Frontend framework</td>
      <td>Next.js</td>
    </tr>
    <tr>
      <td>UI styling</td>
      <td>Tailwind CSS</td>
    </tr>
    <tr>
      <td>Language</td>
      <td>TypeScript</td>
    </tr>
    <tr>
      <td>Runtime</td>
      <td>Node.js</td>
    </tr>
    <tr>
      <td>Package manager</td>
      <td>npm</td>
    </tr>
    <tr>
      <td>Deployment</td>
      <td>Vercel</td>
    </tr>
    <tr>
      <td>Editor</td>
      <td>Cursor (AI-native IDE)</td>
    </tr>
    <tr>
      <td>Version control</td>
      <td>Git + GitHub</td>
    </tr>
  </tbody>
</table>

<p>This is a professional-grade stack used by teams at real startups. You’re not building a toy app.</p>

<h2 id="step-1-set-up-your-environment">Step 1: Set Up Your Environment</h2>

<p>Open your terminal. You’ll be using it often—it’s where most real development happens.</p>

<h3 id="1-install-nodejs--npm">1. Install Node.js + npm</h3>

<p>Go to <a href="https://nodejs.org/">https://nodejs.org</a>, download the installer, and install it as you would any other app. npm comes bundled with Node.js, so there is nothing extra to install.</p>

<p>Check that it worked:</p>

<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>node -v
npm -v
</code></pre></div></div>

<p>You should see version numbers.</p>

<h3 id="2-install-git">2. Install Git</h3>

<p>Download from <a href="https://git-scm.com/downloads">https://git-scm.com/downloads</a>. Install with default settings.</p>

<p>Check it:</p>

<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>git --version
</code></pre></div></div>

<h2 id="step-2-install-and-use-cursor">Step 2: Install and Use Cursor</h2>

<p><a href="https://www.cursor.sh/">Cursor</a> is a developer environment based on VS Code with built-in AI. It’s perfect for working with Devplan’s AI prompts and helping you through roadblocks.</p>

<p>Install it and open the app.</p>

<p>Inside Cursor:</p>

<ul>
  <li>Create or open your project folder</li>
  <li>Use the built-in terminal (View &gt; Terminal or <code class="language-plaintext highlighter-rouge">Ctrl+`</code>)</li>
  <li>Use the AI agent panel (right-hand side) to paste in prompts or ask for help</li>
</ul>

<h2 id="step-3-create-a-new-project-with-nextjs">Step 3: Create a New Project with Next.js</h2>

<p>In the Cursor terminal, run the Next.js scaffold command, <code class="language-plaintext highlighter-rouge">npx create-next-app@latest</code>, and choose these options:</p>

<ul>
  <li>TypeScript → Yes</li>
  <li>Tailwind CSS → Yes</li>
  <li>App Router → Yes</li>
  <li>Use a <code class="language-plaintext highlighter-rouge">src/</code> directory → No</li>
  <li>Customize the import alias → No</li>
  <li>Install dependencies → Yes</li>
</ul>

<p>When complete:</p>

<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>cd your-app-name
npm run dev
</code></pre></div></div>

<h3 id="whats-happening-here">What’s happening here?</h3>

<ul>
  <li>You’re starting a local server on your machine</li>
  <li>Your app runs at <code class="language-plaintext highlighter-rouge">http://localhost:3000</code> — this is only visible to you</li>
  <li>Next.js watches your files — every time you save, it auto-refreshes the browser</li>
</ul>

<p>Open <code class="language-plaintext highlighter-rouge">app/page.tsx</code>, change some text, and save. Watch the browser update instantly.</p>

<h2 id="step-4-get-comfortable-with-the-command-line">Step 4: Get Comfortable with the Command Line</h2>

<p>You’ll be using terminal commands a lot. Here are a few you’ll use regularly:</p>

<table>
  <thead>
    <tr>
      <th>Command</th>
      <th>Purpose</th>
    </tr>
  </thead>
  <tbody>
    <tr>
      <td><code class="language-plaintext highlighter-rouge">cd folder-name</code></td>
      <td>Change directory</td>
    </tr>
    <tr>
      <td><code class="language-plaintext highlighter-rouge">ls</code> (or <code class="language-plaintext highlighter-rouge">dir</code> on Windows)</td>
      <td>List files</td>
    </tr>
    <tr>
      <td><code class="language-plaintext highlighter-rouge">npm install</code></td>
      <td>Install dependencies</td>
    </tr>
    <tr>
      <td><code class="language-plaintext highlighter-rouge">npm run dev</code></td>
      <td>Start local dev server</td>
    </tr>
    <tr>
      <td><code class="language-plaintext highlighter-rouge">Ctrl+C</code></td>
      <td>Stop the current process</td>
    </tr>
    <tr>
      <td><code class="language-plaintext highlighter-rouge">clear</code></td>
      <td>Clean up the terminal screen</td>
    </tr>
  </tbody>
</table>

<p>When things go wrong, most errors will show up in the terminal. Read it carefully—it usually tells you what broke.</p>

<h2 id="step-5-start-building-features-in-cursor">Step 5: Start Building Features in Cursor</h2>

<p>Once your Devplan Build Plan is ready:</p>

<ol>
  <li>Open Cursor and your project folder</li>
  <li>Go to your Devplan task list</li>
  <li>Copy the AI prompt for the first task</li>
  <li>Paste it into Cursor’s AI panel</li>
  <li>Cursor will generate code in the files it thinks need changes. Review the changes before accepting them</li>
  <li>Use the dev server to check progress at <code class="language-plaintext highlighter-rouge">http://localhost:3000</code></li>
</ol>

<p>Repeat for each task in your plan.</p>

<p>Tips:</p>

<ul>
  <li>If something breaks or errors show up, copy the error from the browser or terminal, paste it into the AI panel, and say “I just added this code and now I’m getting this error. What’s wrong?”</li>
  <li>Let the AI do some of the heavy lifting, but try to read and understand the code it writes.</li>
</ul>

<h2 id="step-6-common-errors-and-fixes">Step 6: Common Errors and Fixes</h2>

<table>
  <thead>
    <tr>
      <th>Error</th>
      <th>Cause</th>
      <th>Fix</th>
    </tr>
  </thead>
  <tbody>
    <tr>
      <td><code class="language-plaintext highlighter-rouge">Command not found: npx</code></td>
      <td>Node.js not installed properly</td>
      <td>Reinstall Node.js, restart Cursor</td>
    </tr>
    <tr>
      <td><code class="language-plaintext highlighter-rouge">EADDRINUSE: Port 3000</code></td>
      <td>Dev server already running</td>
      <td>Stop with <code class="language-plaintext highlighter-rouge">Ctrl+C</code>, try again</td>
    </tr>
    <tr>
      <td>Red squiggles in Cursor</td>
      <td>Lint/type errors</td>
      <td>Hover and let Cursor suggest a fix</td>
    </tr>
    <tr>
      <td><code class="language-plaintext highlighter-rouge">Cannot find module</code></td>
      <td>Import path or file doesn’t exist</td>
      <td>Double-check file names and paths</td>
    </tr>
    <tr>
      <td>Tailwind styles don’t apply</td>
      <td>Misconfigured setup</td>
      <td>Restart dev server after config changes</td>
    </tr>
    <tr>
      <td>Broken layout</td>
      <td>CSS or HTML errors</td>
      <td>Use devtools in browser to inspect</td>
    </tr>
    <tr>
      <td><code class="language-plaintext highlighter-rouge">npm install</code> errors</td>
      <td>Conflicting dependencies</td>
      <td>Delete <code class="language-plaintext highlighter-rouge">node_modules</code>, run <code class="language-plaintext highlighter-rouge">npm install</code> again</td>
    </tr>
    <tr>
      <td><code class="language-plaintext highlighter-rouge">ReferenceError</code>, <code class="language-plaintext highlighter-rouge">undefined</code>, etc.</td>
      <td>JS bugs</td>
      <td>Read stack trace, debug in browser console or Cursor agent</td>
    </tr>
    <tr>
      <td>App crashes on build</td>
      <td>Mismatched imports or component nesting</td>
      <td>Use <code class="language-plaintext highlighter-rouge">console.log()</code> to trace it, ask Cursor to help debug</td>
    </tr>
    <tr>
      <td>Confused by what a file is doing</td>
      <td>Too much AI-generated code</td>
      <td>Ask Cursor: “Explain what this file does and how it works”</td>
    </tr>
  </tbody>
</table>

<h2 id="step-7-deploy-to-vercel">Step 7: Deploy to Vercel</h2>

<p>Once your app works locally, deploy it:</p>

<h3 id="1-push-your-project-to-github">1. Push your project to GitHub</h3>

<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>git init
git add .
git commit -m "initial commit"
gh repo create your-app-name --public --source=. --remote=origin --push
</code></pre></div></div>

<p>If you don’t have GitHub CLI (<code class="language-plaintext highlighter-rouge">gh</code>) installed, you can create a repo on the GitHub site and push it manually.</p>

<h3 id="2-deploy-to-vercel">2. Deploy to Vercel</h3>

<ul>
  <li>Go to <a href="https://vercel.com/">https://vercel.com</a></li>
  <li>Sign in with GitHub</li>
  <li>Import your repo</li>
  <li>Click <strong>Deploy</strong></li>
</ul>

<p>Vercel gives you a public URL in seconds. Push to GitHub again anytime you want to update the live site.</p>

<h2 id="summary-what-you-just-set-up">Summary: What You Just Set Up</h2>

<table>
  <thead>
    <tr>
      <th>Step</th>
      <th>Outcome</th>
    </tr>
  </thead>
  <tbody>
    <tr>
      <td>Devplan</td>
      <td>Structured PRD, Tech Design, and Build Plan with AI prompts</td>
    </tr>
    <tr>
      <td>Cursor</td>
      <td>AI-native IDE with terminal + prompt-based code generation</td>
    </tr>
    <tr>
      <td>Next.js app</td>
      <td>Full frontend app running locally at <code class="language-plaintext highlighter-rouge">localhost:3000</code></td>
    </tr>
    <tr>
      <td>Command line</td>
      <td>Used to install, run, and debug</td>
    </tr>
    <tr>
      <td>Vercel deploy</td>
      <td>Live app online, ready to share</td>
    </tr>
  </tbody>
</table>

<h2 id="final-notes">Final Notes</h2>

<p>You will hit errors. That’s part of it. But now you have:</p>

<ul>
  <li>A plan (Devplan)</li>
  <li>An assistant (Cursor)</li>
  <li>A working stack (Next.js, Tailwind, Node, Git)</li>
  <li>A feedback loop (localhost → fix → refresh → repeat)</li>
</ul>

<p>Take it one feature at a time. Each small thing you ship teaches you how real software is built.</p>]]></content><author><name>Devplan Team</name><email>info@devplan.com</email></author><category term="Engineering" /><summary type="html"><![CDATA[A step-by-step guide to building your first web app with AI coding tools — from idea to working product, no experience required.]]></summary><media:thumbnail xmlns:media="http://search.yahoo.com/mrss/" url="https://devplan.com/assets/og/og-default.png" /><media:content medium="image" url="https://devplan.com/assets/og/og-default.png" xmlns:media="http://search.yahoo.com/mrss/" /></entry><entry><title type="html">Outcome-Based Agile</title><link href="https://devplan.com/blog/outcome-based-agile/" rel="alternate" type="text/html" title="Outcome-Based Agile" /><published>2025-06-23T06:00:00-07:00</published><updated>2025-06-23T06:00:00-07:00</updated><id>https://devplan.com/blog/outcome-based-agile</id><content type="html" xml:base="https://devplan.com/blog/outcome-based-agile/"><![CDATA[<p>Most teams today operate in tension between agile process and business pressure. Customers want dates. Sales and marketing need launch timelines. Leadership needs coordination. And engineers just want clarity, not chaos.</p>

<p>But too often, teams are buried in ceremonies, documentation, and shifting priorities, while the core question of “<strong>what are we delivering, and when?</strong>” remains hard to answer. The result is misalignment, burnout, and wasted energy trying to connect team activity with business expectations. Traditional Agile tends to emphasize sprint plans, story points, and velocity, but these are not the language of customers or the business. Outcome-Based Agile puts outcomes and deliverables at the center, making impact, not activity, the measure of success.</p>

<p>Outcome-Based Agile is a pragmatic delivery model based on decades of experience at top companies where teams plan and deliver a meaningful outcome in a set timeframe. It gives the business predictability and gives teams the flexibility to build smart and tie their work directly to customer impact.</p>

<p>The most successful projects don’t wing it. They <strong>invest in planning upfront</strong>, then <strong>execute fast and clean</strong>. That’s the core principle here: define the outcome, shape the scope, then build with confidence while product, engineering, and go-to-market teams stay in sync.</p>

<h2 id="what-outcome-based-agile-is">What Outcome-Based Agile Is</h2>

<p>In Outcome-Based Agile, the project, not the sprint, is the unit of planning. Impact-driven projects speak the language of the business and deliver the outcomes customers care about. This is why most major tech companies track projects rather than sprints, even when a given team works in sprints. Projects make outcomes visible, provide containers for planning and tracking, and align cross-functional teams like sales, marketing, and support around a shared timeline. Teams still ship incrementally throughout the project behind feature flags, so they can test and validate early, but launches are what teams drive toward. Smaller issues and standalone tickets still fit into the plan, but they are typically tracked independently of projects and get dedicated time (e.g. 20% of time for non-project work).</p>
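<p>The 20% example translates directly into a capacity split. The TypeScript sketch below is a minimal illustration: the <code>splitCapacity</code> helper, the team size, and the period length are hypothetical, and only the 20% default share comes from the paragraph above.</p>

```typescript
// Splits a team's capacity for one planning period between project work
// and non-project work (bugs, standalone tickets). The 20% default comes
// from the article's example; team size and period are hypothetical.
interface CapacitySplit {
  projectDays: number;
  nonProjectDays: number;
}

function splitCapacity(
  engineers: number,
  workingDays: number,
  nonProjectShare = 0.2,
): CapacitySplit {
  if (nonProjectShare < 0 || nonProjectShare > 1) {
    throw new Error("nonProjectShare must be between 0 and 1");
  }
  const total = engineers * workingDays; // engineer-days in the period
  const nonProjectDays = total * nonProjectShare;
  return { projectDays: total - nonProjectDays, nonProjectDays };
}

// 5 engineers over a 10-working-day period: 50 engineer-days in total.
console.log(splitCapacity(5, 10)); // { projectDays: 40, nonProjectDays: 10 }
```

<p>Only the project share feeds into the project plan and its estimates; the remainder is reserved up front so small tickets don’t silently erode launch dates.</p>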

<p>Simply put, Outcome-Based Agile is:</p>

<ul>
  <li>A clear set of customer-aligned deliverables with target timeframes</li>
  <li>A defined set of features and scope in each that can flex during execution</li>
  <li>Continuous development shipped incrementally and launched aligned with the business</li>
</ul>

<h2 id="how-it-works">How It Works</h2>

<ol>
  <li><strong>Set the Outcome.</strong> Define the business impact. <em>Example: “Increase activation by 15%.”</em></li>
  <li><strong>Explore the Problem Space.</strong> Evaluate different solutions for this problem space. <em>Example: “Streamline our onboarding process.” or “Add demo mode.” or “Add inline tutorials.”</em></li>
  <li><strong>Plan the Scope.</strong> Write a PRD or product brief for the chosen solution(s). Create prototypes with AI to illustrate the solution in action. As a team, align on specific requirements, project scope, UX flows and technical approach. Break work into user stories, and create high-level estimates.</li>
  <li><strong>UX and Tech Design.</strong> The team creates and reviews the UX and tech design. Refine the PRD, break user stories into tasks, update estimates, and prepare prompts for AI coding.</li>
  <li><strong>Build + QA Continuously.</strong> Engineers and AI agents execute together on scoped stories. Updates are shipped behind flags to test safely before launch.</li>
  <li><strong>Stakeholder Visibility.</strong> Real-time updates based on git check-ins, plus team demos, show progress, risks, and trade-offs. Business stakeholders stay aligned, not surprised.</li>
  <li><strong>Launch + Measure.</strong> Launch the deliverable in coordination with marketing and sales. Track the impact and capture learnings.</li>
</ol>
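<p>Step 7 only works if the outcome from step 1 is measurable. The TypeScript sketch below checks a target like “Increase activation by 15%.” It assumes the target means a relative lift over the baseline rate rather than an increase in percentage points, which the example does not specify. The helper names and the signup counts are hypothetical.</p>

```typescript
// Checks whether an outcome target such as "Increase activation by 15%"
// was met. Assumption: the target is a relative lift over the baseline
// rate, not an increase in percentage points.
function activationRate(activated: number, signups: number): number {
  if (signups <= 0) throw new Error("signups must be positive");
  return activated / signups;
}

function relativeLift(baseline: number, current: number): number {
  return (current - baseline) / baseline;
}

// A small epsilon keeps floating-point noise from failing an exact hit.
function metTarget(baseline: number, current: number, targetLift: number): boolean {
  return relativeLift(baseline, current) >= targetLift - 1e-9;
}

// Hypothetical numbers: 40% activation before launch, 47% after.
const before = activationRate(400, 1000);
const after = activationRate(470, 1000);
console.log(relativeLift(before, after).toFixed(3));
console.log(metTarget(before, after, 0.15));
```

<p>Under the percentage-point reading, the same numbers would fall short (7 points versus 15), so it is worth agreeing on the definition when the outcome is set.</p>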

<h2 id="why-it-works">Why It Works</h2>

<ul>
  <li>Aligns delivery with business impact</li>
  <li>Ships continuously, but launches intentionally</li>
  <li>Keeps teams focused, autonomous, and oriented toward outcomes</li>
</ul>

<h2 id="how-devplan-supports-it">How Devplan Supports It</h2>

<p>Devplan gives modern teams the structure to run Outcome-Based Agile with AI-native workflows:</p>

<ul>
  <li><strong>Contextually-aware agents for PRDs and user story creation</strong></li>
  <li><strong>Data-driven automated estimates and confidence scores with risk identification</strong></li>
  <li><strong>Built-in prototype support with design guidance</strong></li>
  <li><strong>Agent-guided technical design for key architecture decisions</strong></li>
  <li><strong>Task breakdown optimized for AI agents</strong></li>
  <li><strong>CLI-based developer workflow that pulls detailed instructions into the IDE</strong></li>
  <li><strong>Automated stakeholder updates</strong> <em>(coming soon)</em></li>
</ul>

<p>Outcome-Based Agile is already how high-functioning, established teams at top companies build and launch. We just gave it a name and supercharged it with AI inside of Devplan.</p>]]></content><author><name>Devplan Team</name><email>info@devplan.com</email></author><category term="Product" /><summary type="html"><![CDATA[A pragmatic alternative to sprint theater: how to run a delivery process that prioritizes business outcomes and customer impact over velocity metrics.]]></summary><media:thumbnail xmlns:media="http://search.yahoo.com/mrss/" url="https://devplan.com/assets/og/og-default.png" /><media:content medium="image" url="https://devplan.com/assets/og/og-default.png" xmlns:media="http://search.yahoo.com/mrss/" /></entry><entry><title type="html">Introducing the Devplan CLI</title><link href="https://devplan.com/blog/introducing-the-devplan-cli/" rel="alternate" type="text/html" title="Introducing the Devplan CLI" /><published>2025-05-19T06:00:00-07:00</published><updated>2025-05-19T06:00:00-07:00</updated><id>https://devplan.com/blog/introducing-the-devplan-cli</id><content type="html" xml:base="https://devplan.com/blog/introducing-the-devplan-cli/"><![CDATA[<p>The Devplan CLI is your new command-line companion for bringing structured, AI-assisted product development directly into your workflow.</p>

<p>Whether you’re using Cursor or JetBrains Junie, the Devplan CLI gives you the power to bridge product planning and real code execution, right inside your AI-enabled IDE.</p>

<p>You already know the pain of jumping between docs, tickets, Slack threads, and your editor. The CLI eliminates that mess. Go from rough idea to production-ready feature in record time, with confidence that what you’re building is scoped, aligned, and complete.</p>

<h2 id="what-is-it">What Is It?</h2>

<p>The Devplan CLI is a command-line interface built in Go that connects to Devplan’s backend via secure, protobuf-based APIs. It pulls custom rules files along with detailed project requirements, test cases, architectural guidance, and edge cases built specifically for coding agents, and delivers the output directly into your IDE.</p>

<p>With the CLI, you can:</p>

<ul>
  <li>Fetch scoped work directly from your Devplan projects</li>
  <li>Inject coding agent-ready instructions into your IDE of choice</li>
  <li>Keep features and requirements in sync from product definition to engineering development</li>
</ul>

<p>In short, it’s the orchestration layer between product thinking and code execution.</p>

<h2 id="how-do-you-use-it">How Do You Use It?</h2>

<p>Here’s how to use it:</p>

<ul>
  <li><strong>Authenticate once</strong>, stay logged in securely</li>
  <li><strong>Select your IDE, company, project, and feature</strong> interactively</li>
  <li><strong>Sync with your Git repo</strong> and bring in context-aware tasks</li>
  <li><strong>Focus on a feature</strong>, and Devplan guides you with scoped instructions</li>
  <li><strong>Pull clean, AI-generated plans</strong> directly into your local dev workflow</li>
</ul>

<h2 id="getting-started-in-60-seconds">Getting Started in 60 Seconds</h2>

<h3 id="1-install-the-cli">1. Install the CLI</h3>

<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>/bin/bash -c "$(curl -fsSL https://app.devplan.com/api/cli/install)"
</code></pre></div></div>

<h3 id="2-authenticate-with-devplan">2. Authenticate with Devplan</h3>

<p>Run the CLI’s <code class="language-plaintext highlighter-rouge">auth</code> command. It sets up your credentials securely and gives the CLI access to your Devplan projects.</p>

<h3 id="3-initialize-in-your-repo">3. Initialize in Your Repo</h3>

<p>Navigate to your local repository and start the CLI there. A terminal UI then prompts you to pick your company, project, feature, and IDE.</p>

<h3 id="4-pull-instructions-into-your-ide">4. Pull Instructions into Your IDE</h3>

<p>In your coding agent’s input, type the command that picks up and runs the latest scoped feature instructions.</p>

<h2 id="full-list-of-commands">Full List of Commands</h2>

<table>
  <thead>
    <tr>
      <th>Command</th>
      <th>Description</th>
    </tr>
  </thead>
  <tbody>
    <tr>
      <td><code class="language-plaintext highlighter-rouge">auth</code></td>
      <td>Authenticate with Devplan to enable all other functionality securely.</td>
    </tr>
    <tr>
      <td><code class="language-plaintext highlighter-rouge">clean</code></td>
      <td>Clean up individual repositories from your Devplan workspace.</td>
    </tr>
    <tr>
      <td><code class="language-plaintext highlighter-rouge">clone</code></td>
      <td>Clone a repository and immediately focus on a feature for fast onboarding.</td>
    </tr>
    <tr>
      <td><code class="language-plaintext highlighter-rouge">focus</code></td>
      <td>Focus on a specific feature from your selected project.</td>
    </tr>
    <tr>
      <td><code class="language-plaintext highlighter-rouge">help</code></td>
      <td>View help content and usage examples for any command.</td>
    </tr>
    <tr>
      <td><code class="language-plaintext highlighter-rouge">self</code></td>
      <td>Display information about your currently authenticated user.</td>
    </tr>
    <tr>
      <td><code class="language-plaintext highlighter-rouge">update</code></td>
      <td>Update your CLI to the latest version with one command.</td>
    </tr>
    <tr>
      <td><code class="language-plaintext highlighter-rouge">version</code></td>
      <td>Print the current CLI version (useful for debugging or CI logs).</td>
    </tr>
  </tbody>
</table>
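<p>The <code class="language-plaintext highlighter-rouge">version</code> row mentions CI logs. Below is a minimal sketch of how a CI step might record the installed version. It rests on assumptions: that the installer places the binary on your <code class="language-plaintext highlighter-rouge">PATH</code> as <code class="language-plaintext highlighter-rouge">devplan</code> (the binary name is not confirmed in this post), and that <code class="language-plaintext highlighter-rouge">devplan_version_banner</code> is a hypothetical helper, not part of the CLI itself.</p>

<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>#!/usr/bin/env bash
# Hypothetical CI helper: print the Devplan CLI version for the build log.
# Assumption: the binary is on PATH as "devplan"; pass a different name as
# the first argument if your install differs.
devplan_version_banner() {
  local bin="${1:-devplan}"
  local bin_path
  # command -v prints the command's location and fails if it is not found.
  if bin_path="$(command -v "$bin")"; then
    printf 'Devplan CLI: %s\n' "$("$bin_path" version)"
  else
    printf 'Devplan CLI not found on PATH (looked for %s)\n' "$bin"
  fi
}

devplan_version_banner "$@"
</code></pre></div></div>

<p>Called with no arguments, the helper looks for <code class="language-plaintext highlighter-rouge">devplan</code>. If the binary is missing, it prints a notice instead of failing, so the logging step alone won’t break a build.</p>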

<h2 id="try-it-now">Try It Now</h2>

<p>If you’re already using Devplan, just run the install command above, then authenticate and initialize in your repository. You’ll be moving your next feature into production faster than ever.</p>

<p>We’ll be sharing more CLI tips, workflows, and advanced usage patterns soon!</p>]]></content><author><name>Devplan Team</name><email>info@devplan.com</email></author><category term="Announcements" /><summary type="html"><![CDATA[The Devplan CLI bridges product planning and AI code execution — bring structured, context-aware development directly into your IDE workflow.]]></summary><media:thumbnail xmlns:media="http://search.yahoo.com/mrss/" url="https://devplan.com/assets/og/og-default.png" /><media:content medium="image" url="https://devplan.com/assets/og/og-default.png" xmlns:media="http://search.yahoo.com/mrss/" /></entry></feed>