- Research
- Specification
- Blueprint
- Roadmap
Research
Step 1 - Context
- Description
- User
- Job to be done
- Research
  - APIs
  - Code libraries
Specification
Step 2 - o4-mini
Ask me one question at a time so we can develop a thorough, step-by-step spec for an idea. Each question should build on my previous answers, and our end goal is to have a detailed specification I can hand off to a developer. Let’s do this iteratively and dig into every relevant detail.
Remember, only one question at a time.
Here’s the idea:
IDEA
Step 3
Now that we’ve wrapped up the brainstorming process, can you compile our findings into a comprehensive, developer-ready specification? Include all relevant requirements, architecture choices, data handling details, error handling strategies, and a testing plan so a developer can immediately begin implementation.
Blueprint
Step 4 - Claude 3.7
Draft a detailed, step-by-step blueprint for building this project. Then, once you have a solid plan, break it down into small, iterative chunks that build on each other. Look at these chunks and then go another round to break it into small steps. Review the results and make sure that the steps are small enough to be implemented safely with strong testing, but big enough to move the project forward. Iterate until you feel that the steps are right sized for this project.
From here you should have the foundation to provide a series of prompts for a code-generation LLM that will implement each step in a test-driven manner. Prioritize best practices, incremental progress, and early testing, ensuring no big jumps in complexity at any stage. Make sure that each prompt builds on the previous prompts, and ends with wiring things together. There should be no hanging or orphaned code that isn’t integrated into a previous step.
Make sure to separate each prompt section and use prompting best practices. Use markdown.
Each prompt should be tagged as text using code tags. The goal is to output prompts, but context is important as well.
Make sure to write out all the prompts, shorten them if needed to ensure we have every prompt needed to build this entire project. Each prompt should stand alone and not reference other prompts.
Roadmap
Step 5
Can you make a `todo.md` that I can use as a checklist? Be thorough.
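A slice of the kind of checklist this prompt tends to produce (the contents below are illustrative, not from a real project):

```text
# todo.md

## Phase 1: Project setup
- [ ] Initialize repository and CI pipeline
- [ ] Set up test framework and a failing smoke test

## Phase 2: Core features
- [ ] Implement data model (with unit tests)
- [ ] Wire model into the API layer
```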
Effective use of GitHub Copilot agent (Pro)
| Action / Mode | Base Model (0 cost) | When to Escalate (Signal) | Escalation Model (Cost) |
|---|---|---|---|
| 1. Assign story / issue to Copilot | GPT-4o / GPT-4.1 / GPT-5 mini | Misunderstands scope, wrong architecture, misses constraints | Gemini 2.5 Pro (×1) or Claude Sonnet 4.5 (×1) |
| 2. Resolve CI/CD workflow failures | GPT-5 mini / GPT-4.1 | Log analysis is shallow, wrong root cause, circular fixes | Claude Sonnet 4.5 (×1) or Gemini 2.5 Pro (×1) |
| 3. Interactive debugging (MCP, local) | GPT-4o / GPT-5 mini | Fails to track state, can’t reason across functions/files | Claude Sonnet 4.5 (×1) or GPT-5 (×1) |
| 4. Product planning & backlog management | GPT-4o / GPT-4.1 | Output is generic, poor prioritization, weak structure | Gemini 2.5 Pro (×1) or Claude Sonnet 4.5 (×1) |
| 5. Test suite development | GPT-4o / Grok Code Fast (×0.25) | Misses edge cases, bad mocks, shallow coverage | GPT-5 mini (0) or Claude Sonnet 4.5 (×1) |
| 6. Large refactor / multi-file changes | GPT-5 mini / GPT-4.1 | Loses global coherence, breaks invariants, wrong decomposition | GPT-5 (×1) or Claude Opus 4.5 (×3) |
| 7. Architecture / system design | GPT-4o / GPT-5 mini | Lacks tradeoff analysis, no dependency reasoning | Gemini 2.5 Pro (×1) or Claude Opus 4.5 (×3) |
| 8. Repetitive boilerplate / scaffolding | Grok Code Fast (×0.25) | Needs domain reasoning or cross-module awareness | GPT-4o (0) or GPT-5 mini (0) |
Below is a task-by-task, zero-multiplier playbook:
- which zero-cost model to default to
- why it’s the best fit
- and how to prompt it so it behaves like a specialist instead of a chatbot.
🧭 Zero-Multiplier Model Map + Prompting Tactics
| Action / Mode | Best 0× Model | Why This One Wins | How To Prompt It (Pattern) |
|---|---|---|---|
| 1. Assign story / issue to Copilot | GPT-4.1 | Best balance of instruction-following + code gen | “You are implementing exactly this story. Restate requirements, list files to change, then implement. Do not invent scope.” |
| 2. Resolve CI/CD workflow failures | GPT-5 mini | Better multi-step reasoning over logs/config | “Given this error and this workflow, enumerate 3 likely causes, rank them, and propose a minimal fix for the top one.” |
| 3. Interactive debugging (MCP, local) | GPT-5 mini | Tracks state & logic better across functions | “Simulate the execution path. At each step, state the variable values. Find the first divergence from expected behavior.” |
| 4. Product planning & backlog management | GPT-4o | Best at structuring, summarizing, clustering ideas | “Given this context, produce: (1) Epics, (2) Prioritized stories, (3) Acceptance criteria, (4) Dependencies.” |
| 5. Test suite development | GPT-4.1 | Strong at code patterns + test idioms | “Given this function, list edge cases first, then generate table-driven tests covering them.” |
| 6. Large refactor / multi-file changes | GPT-5 mini | Better coherence across multiple components | “First propose a refactor plan with steps and invariants. Wait. Then apply step 1 only.” |
| 7. Architecture / system design | GPT-4o | Best at synthesis and narrative structure | “Design 3 options. For each: tradeoffs, risks, migration path. Then recommend one.” |
| 8. Boilerplate / scaffolding | GPT-4.1 | Fast, consistent, low hallucination | “Generate only the skeleton. No business logic. Follow this existing style.” |
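To make the edge-case-first, table-driven pattern from row 5 concrete, here is a minimal Python sketch of the output you would ask for. The `clamp` function and its cases are invented for illustration, not part of any real project:

```python
def clamp(value, low, high):
    """Clamp value into the closed range [low, high] (illustrative function under test)."""
    if low > high:
        raise ValueError("low must not exceed high")
    return max(low, min(value, high))

# Edge cases enumerated first, then encoded as a table of (args, expected) rows.
CASES = [
    ((5, 0, 10), 5),    # value already in range
    ((-1, 0, 10), 0),   # below the lower bound
    ((99, 0, 10), 10),  # above the upper bound
    ((0, 0, 0), 0),     # degenerate single-point range
]

def run_table():
    for args, expected in CASES:
        got = clamp(*args)
        assert got == expected, f"clamp{args} = {got}, expected {expected}"

if __name__ == "__main__":
    run_table()
```

The point of the table layout is that adding a newly discovered edge case costs one line, which keeps the model honest about coverage.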
🎯 Why These Pairings Work (Mental Models)
- GPT-4.1 = “Senior implementer”. Great at: code, tests, scaffolding, following instructions precisely.
- GPT-4o = “Product + architecture synthesizer”. Great at: structuring messy thoughts, plans, backlogs, options, narratives.
- GPT-5 mini = “Junior architect / debugger”. Great at: step-by-step reasoning, tracing logic, cross-file understanding.
🧠 Prompting Patterns That 2–3× Their Effectiveness
- Force a Two-Phase Response: “Phase 1: Analyze and plan. Phase 2: Execute. Do not mix.” This dramatically improves coherence, especially for GPT-5 mini.
- Always Ask for Structure Before Code. Bad: “Fix this.” Good: “First explain what is wrong and propose a minimal fix. Then implement.”
- Use Role + Constraints: “You are a cautious senior engineer. Constraints: do not change public APIs; do not add dependencies; do not refactor unrelated code.” This reduces “helpful but expensive” rewrites.
- Use Invariants for Refactors: “Invariants: all tests must still pass; behavior X must remain unchanged; only touch these files.” This keeps free models from going wild.
- Use Edge-Case-First for Tests: “First list all edge cases. Then generate tests. Then show coverage gaps.”
- Use Ranking & Elimination for Debugging: “List 5 hypotheses. Eliminate 3. Deep dive on the top 2. Propose a minimal experiment.” This turns GPT-5 mini into a systematic debugger.
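Several of these patterns compose naturally into a single prompt. A full example might look like this (the task details are placeholders):

```text
You are a cautious senior engineer.

Constraints:
- Do not change public APIs
- Do not add dependencies
- Do not refactor unrelated code

Phase 1: Analyze the bug report below and propose a minimal fix. Do not write code yet.
Phase 2: After I confirm the plan, implement the fix with table-driven tests.

Bug report: <paste here>
```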
🪜 The Escalation Ladder (Internalized)
For any task:
- Start with the best 0× model for that task.
- If it loses state, loops, or misses global structure, escalate to Claude Sonnet / Gemini / GPT-5.
🧠 The Core Idea
Premium models are not for “better answers”; they are for “bigger cognitive workspaces”. If you structure the problem well, the 0× models handle 80–90% of your workload.