How Planning Impacts AI Coding
Intro
The development community holds varying opinions on AI’s real-world engineering impact. Some report massive productivity improvements, while others find reviewing AI-written code slows them down. This experiment measured how proper planning affects AI-assisted coding productivity.
Experiment
We tested whether carefully prepared requirements at the feature level produce better results than quick hand-written prompts. The task, based on an open-source repository, was implemented twice by each agent: once with simple high-level requirements, once with detailed specifications.
Simple requirements included:
- GitHub repository change analysis functionality
- Automated periodic analysis for enrolled repositories
- Persisted reports available through API
- Reports viewable in the UI
Detailed requirements covered implementation aspects, design patterns, and architecture decisions. All agents received guidance when stuck but no additional requirement information during implementation.
Criteria
Solutions were evaluated across four dimensions:
- Correctness: Implementation alignment with proper design
- Quality: Code maintainability and adherence to standards
- Autonomy: How independently agents reached final solutions
- Completeness: Satisfaction of explicit requirements
Each dimension was scored from 1 to 5. For running agents in parallel, consistent scores across all four dimensions matter more than a high score in any single one.
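To make the consistency point concrete, here is a minimal sketch that summarizes a run as mean and standard deviation. The two score lists are hypothetical, not taken from the experiment, and the use of the sample standard deviation is an assumption, since the article does not say which variant was used.

```python
from statistics import mean, stdev


def summarize(scores):
    """Return (mean, sample standard deviation) of per-dimension scores."""
    return mean(scores), stdev(scores)


# Hypothetical scores in the order:
# correctness, quality, autonomy, completeness.
spiky = [5, 5, 5, 1]    # strong in three dimensions, weak in one
steady = [4, 4, 4, 4]   # no standout score, but no weak spot either

spiky_mean, spiky_sd = summarize(spiky)
steady_mean, steady_sd = summarize(steady)

print(f"spiky:  {spiky_mean:.1f} ± {spiky_sd:.1f}")
print(f"steady: {steady_mean:.1f} ± {steady_sd:.1f}")
```

Both runs have the same mean, so only the spread separates them. Under the criterion above, the steady run is the better candidate for unattended parallel execution.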
Results
| Solution | Correctness | Quality | Autonomy | Completeness | Mean ± SD | Improvement (Planned vs. Short) |
|---|---|---|---|---|---|---|
| Claude, Short | 2 | 3 | 5 | 3 | 3.75 ± 1.5 | 20% |
| Claude, Planned | 4+ | 4 | 5 | 4+ | 4.5 ± 0.4 | — |
| Cursor, Short | 2- | 2 | 5 | 3 | 3.4 ± 1.9 | 20% |
| Cursor, Planned | 5- | 4- | 4 | 4+ | 4.1 ± 0.5 | — |
| Junie, Short | 1+ | 2 | 5 | 3 | 2.9 ± 1.6 | 34% |
| Junie, Planned | 4 | 3 | 4+ | — | 3.9 ± 0.6 | — |
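The article does not define the Improvement column. It appears to be the relative increase of each agent's Planned mean over its Short mean. The sketch below checks that reading against the means in the table; the formula is an assumption, and the function name is illustrative.

```python
def relative_improvement(short_mean, planned_mean):
    """Percentage increase of the planned mean over the short mean."""
    return (planned_mean - short_mean) / short_mean * 100


# Mean scores copied from the results table: (short, planned).
means = {
    "Claude": (3.75, 4.5),
    "Cursor": (3.4, 4.1),
    "Junie": (2.9, 3.9),
}

improvements = {
    agent: relative_improvement(short, planned)
    for agent, (short, planned) in means.items()
}

for agent, pct in improvements.items():
    print(f"{agent}: {pct:.1f}%")
```

Under this assumption the results land within about one percentage point of the reported 20%, 20%, and 34%. The small gap for Cursor may come from the published means being rounded.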
Key Observations
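High-quality planning improves correctness and quality. Correctness rose from 2 or below in every short run to 4 or above in every planned run, and quality improved by roughly one to two points. AI assistants need clearly prepared product and technical requirements to deliver the intended results and follow guidelines.
Planning reduces score dispersion. The standard deviation across dimensions fell from 1.5-1.9 in the short runs to 0.4-0.6 in the planned runs, so results became more consistent across all AI assistants once requirements were detailed and unambiguous. Different agents often chose similar approaches, which suggests that any capable coding assistant works well with proper specs.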
Smaller tasks run more autonomously. Claude Code completed the detailed requirements without nudging, while Cursor and Junie required additional guidance. Breaking work into smaller chunks makes autonomous completion more likely.
Code review is a major bottleneck. Getting six AI runs near completion proved easier than reviewing two PRs. As AI coding scales, teams need larger features completed autonomously.
Recommendations for Parallel AI Execution
- Prepare detailed specifications outlining scope, acceptance criteria, test coverage, database changes, and architectural decisions. Remove ambiguity ahead of time. AI handles code placement well but needs guardrails for production-ready output.
- Keep execution right-sized. Tasks should complete autonomously without constant oversight. Purpose-built tools help generate appropriately scoped tasks for parallel execution across multiple agents.
- Review every change. Even with proper planning, code rarely reaches production-ready status on first pass. Expect AI to reach approximately 80% completion, requiring manual refinement before merging.
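The specification checklist in the first recommendation can be sketched as a small data structure that flags empty sections before a task is handed to an agent. The class, field, and function names here are hypothetical and not from the experiment; they mirror the items listed in that recommendation.

```python
from dataclasses import dataclass, field, fields


@dataclass
class FeatureSpec:
    """Hypothetical spec record; one field per checklist item."""
    scope: str = ""
    acceptance_criteria: list[str] = field(default_factory=list)
    test_coverage: list[str] = field(default_factory=list)
    database_changes: list[str] = field(default_factory=list)
    architectural_decisions: list[str] = field(default_factory=list)


def missing_sections(spec):
    """Return the names of sections that are still empty."""
    return [f.name for f in fields(spec) if not getattr(spec, f.name)]


# A draft built from two of the simple requirements in the experiment.
draft = FeatureSpec(
    scope="Periodic change analysis for enrolled GitHub repositories",
    acceptance_criteria=["Reports are persisted and available through the API"],
)

print(missing_sections(draft))
```

A section that truly does not apply, such as database changes for a UI-only task, is better recorded explicitly (for example as `["none"]`) than left empty, so that an empty section always means an open question.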