How Planning Impacts AI Coding

Intro

Developers disagree about AI's real-world impact on engineering. Some report massive productivity improvements, while others find that reviewing AI-written code slows them down. This experiment measured how proper planning affects AI-assisted coding productivity.

Experiment

We tested whether carefully prepared requirements at the feature level produce better results than quick hand-written prompts. The task, based on an open-source repository, was implemented twice by each agent: once with simple high-level requirements, once with detailed specifications.

Simple requirements included:

Analysis of changes in a GitHub repository
Automated periodic analysis of enrolled repositories
Persisted reports available through an API
Reports viewable in the UI

Detailed requirements covered implementation aspects, design patterns, and architecture decisions. All agents received guidance when stuck but no additional requirement information during implementation.

Criteria

Solutions were evaluated across four dimensions:

Correctness: How well the implementation aligns with proper design
Quality: Code maintainability and adherence to standards
Autonomy: How independently agents reached final solutions
Completeness: Satisfaction of explicit requirements

Scores ranged from 1 to 5. For parallel execution, consistent scores across all four dimensions are more valuable than a high score in any single dimension.

Results

Solution	Correctness	Quality	Autonomy	Completeness	Mean ± SD	Improvement
Claude, Short	2	3	5	3	3.75 ± 1.5	20%
Claude, Planned	4+	4	5	4+	4.5 ± 0.4	—
Cursor, Short	2-	2	5	3	3.4 ± 1.9	20%
Cursor, Planned	5-	4-	4	4+	4.1 ± 0.5	—
Junie, Short	1+	2	5	3	2.9 ± 1.6	34%
Junie, Planned	4	3	4+	—	3.9 ± 0.6	—
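
The Improvement column appears to be the relative gain of each agent's planned mean over its short mean. The sketch below recomputes it under that assumption, using only the means reported in the table; the names MEANS, relative_improvement, and improvements are illustrative and not part of the original experiment.

```python
# Mean scores copied from the "Mean ± SD" column of the table above.
MEANS = {
    "Claude": {"short": 3.75, "planned": 4.5},
    "Cursor": {"short": 3.4, "planned": 4.1},
    "Junie": {"short": 2.9, "planned": 3.9},
}


def relative_improvement(short_mean, planned_mean):
    """Return the gain of planned over short, as a percentage of short."""
    return (planned_mean - short_mean) / short_mean * 100.0


def improvements(means):
    """Map each agent name to its relative improvement in percent."""
    return {
        agent: relative_improvement(m["short"], m["planned"])
        for agent, m in means.items()
    }


if __name__ == "__main__":
    for agent, pct in improvements(MEANS).items():
        print(f"{agent}: {pct:.1f}%")
```

This gives roughly 20.0%, 20.6%, and 34.5%, which lines up with the reported 20%, 20%, and 34% if the table truncates rather than rounds. That truncation is a guess, not something the article states.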

Key Observations

High-quality planning significantly improves correctness and quality. AI assistants need clearly prepared product and technical requirements to deliver intended results and follow guidelines.

Planning reduces score dispersion. Results became more consistent across all AI assistants with detailed, unambiguous requirements. Different agents often chose similar approaches, suggesting any capable coding assistant works well with proper specs.

Smaller tasks complete more autonomously. Claude Code completed the detailed requirements without nudging, while Cursor and Junie required additional guidance. Breaking work into smaller chunks makes autonomous completion more likely.

Code review is a major bottleneck. Getting six AI runs near completion proved easier than reviewing two PRs. As AI coding scales, teams need larger features completed autonomously.

Recommendations for Parallel AI Execution

Prepare detailed specifications outlining scope, acceptance criteria, test coverage, database changes, and architectural decisions. Remove ambiguity ahead of time. AI handles code placement well but needs guardrails for production-ready output.
Keep execution right-sized. Tasks should complete autonomously without constant oversight. Purpose-built tools help generate appropriately scoped tasks for parallel execution across multiple agents.
Review every change. Even with proper planning, code is rarely production-ready on the first pass. Expect AI to get a change approximately 80% of the way there, with manual refinement needed before merging.
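
The first recommendation can be sketched as a check that runs before a task is handed to an agent. This is a minimal illustration, not a tool used in the experiment: the FeatureSpec class and missing_sections helper are hypothetical, and the field names simply mirror the spec sections listed above.

```python
from dataclasses import dataclass, field, fields
from typing import List


@dataclass
class FeatureSpec:
    """Hypothetical container for the spec sections named above."""
    scope: str = ""
    acceptance_criteria: List[str] = field(default_factory=list)
    test_coverage: List[str] = field(default_factory=list)
    database_changes: List[str] = field(default_factory=list)
    architectural_decisions: List[str] = field(default_factory=list)


def missing_sections(spec):
    """Return the names of sections that are still empty.

    A section that genuinely does not apply should say so explicitly,
    for example database_changes=["none"], so that an empty value
    always means "not yet decided".
    """
    return [f.name for f in fields(spec) if not getattr(spec, f.name)]


if __name__ == "__main__":
    draft = FeatureSpec(
        scope="Periodic change analysis for enrolled repositories",
        acceptance_criteria=["Reports are persisted and served through the API"],
    )
    print(missing_sections(draft))
```

Making an agent run wait until missing_sections returns an empty list is one simple way to remove ambiguity ahead of time, since every section then holds either content or an explicit "none".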