Spec Before Code: How I Built a 9-Skill Framework for AI-Driven Development

The Problem Nobody Talks About

AI coding agents are genuinely impressive. They can scaffold a React app, write tests, debug race conditions, and explain architecture decisions — all in plain English.

But here's what happens after the first session ends.

You come back the next day, open a new conversation, and the agent has no idea what you built yesterday. No memory of the folder structure you agreed on. No recollection of the TypeScript conventions you locked in. No awareness that you already have a storage abstraction layer and absolutely do not want anything touching localStorage directly.

You describe the project again. The agent makes reasonable assumptions. Some of them are wrong. By the third feature, the codebase is inconsistent — three different patterns across three files, a scope change absorbed silently mid-task, and a bug introduced because nobody wrote a regression test before the fix.

This isn't a Claude problem. This is a structural problem. Agents improvise. They forget. They drift.

I got tired of it. So I built a fix.

What Is a Skill?

Think of it like handing a new assistant a printed instruction manual before they start work.

The manual says: here's what to ask first. Here's what not to touch. Here's how to know when you're done.

A skill is that manual. It's a named, step-by-step workflow that loads into Claude's context at runtime — replacing improvisation with a repeatable, auditable process. Nine skills. Each one handles a specific moment in the development lifecycle. Together they form a complete framework for building with AI agents without losing control.

The Nine Skills

01 brainstorm: Shape the idea before writing a line of spec

Every project starts with a fuzzy idea. brainstorm: turns it into a validated, buildable brief through structured discovery.

It won't let you skip the hard questions. What is the actual problem? Who has it? If this could only do one thing perfectly, what would it be? What does v1 genuinely need — not what would be nice, what's actually required?

Two paths: a personal project flow (8 steps) and a client project flow (13 steps). The client path adds stakeholders, measurable success metrics, constraints, IP ownership, and delivery definition — because those are the questions that cause chaos mid-build if you skip them.

A fast path exists for well-formed ideas. If your message already contains the problem, user, core value, features, stack, and constraints — it detects this and goes straight to BRIEF.md.
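
As a rough illustration of that fast-path check: the skill itself is a workflow loaded into the agent's context, not a function, so the TypeScript below is only a sketch of the decision it makes. The `BriefDraft` shape and every field name in it are hypothetical, chosen to mirror the six items listed above.

```typescript
// Hypothetical shape of what brainstorm: needs before it can write BRIEF.md.
// Field names are illustrative assumptions, not part of the framework.
interface BriefDraft {
  problem?: string;
  user?: string;
  coreValue?: string;
  features?: string[];
  stack?: string;
  constraints?: string;
}

const REQUIRED_FIELDS = [
  "problem",
  "user",
  "coreValue",
  "features",
  "stack",
  "constraints",
] as const;

// Returns the fields that are still missing or empty.
function missingBriefFields(draft: BriefDraft): string[] {
  return REQUIRED_FIELDS.filter((field) => {
    const value = draft[field];
    if (value === undefined) return true;
    return Array.isArray(value) ? value.length === 0 : value.trim() === "";
  });
}

// The fast path applies only when nothing is missing.
function canTakeFastPath(draft: BriefDraft): boolean {
  return missingBriefFields(draft).length === 0;
}
```

Anything returned by `missingBriefFields` is what the discovery conversation would still need to ask about.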

The output is a single document that every downstream skill reads. Nothing gets asked twice.

The question it always asks: If this could only do ONE thing perfectly, what would it be? Not a feature — the outcome.

02 architect: Lock every code pattern before a single line of spec exists

Without deliberate architecture decisions, agents fall back on reasonable but unconsidered defaults. Folder structure just happens. State management is inconsistent across features. By Phase 2 it's too late to fix without a full refactor.

architect: is a recommendation-first conversation. It presents options, and you either say "sounds good" or override. Folder structure, state management, TypeScript settings, component library, testing philosophy, and git conventions are all decided explicitly before any code exists.

The output is ARCHITECTURE.md — the agent's constitution for the project. Every task from Phase 1 through Phase 3 is measured against it. The review: skill uses it as its primary reference for drift detection.

Good architecture on a small project costs 20 minutes. Bad architecture on a small project costs days of refactoring.

03 new: Generate the full project kit in one shot

After brainstorm: and architect:, new: reads both documents and generates everything the agent needs to work autonomously.

SPEC.md with acceptance criteria
An interactive HTML wireframe covering all screens
PLAN.md with phases, data model, and API contract
TASKS.md with phase gates built in from day one
CLAUDE.md at the project root — the file the agent reads at the start of every single session
A CODEBASE.md placeholder that gets populated after Phase 1 completes

The key is TASKS.md. It's the only file you ever need to open. Plain English throughout. Updated after every task. One word to keep building: next.

After new: runs, the agent is never blank again. Every future session starts by reading five files in order. Context is preserved automatically.

04 feature: Spec before code, every time, for every addition

Features should never be added informally. "Just quickly add dark mode" mid-task produces a codebase that doesn't match its documentation, a data model that wasn't planned for, and a change log that has no record of why things shifted.

feature: runs spec before code every time. It checks for an existing incomplete feature with the same name before creating a duplicate. It reads the current SPEC.md to ensure the new feature doesn't contradict existing acceptance criteria. It asks a graduated UI question — full wireframe, brief note, or skip — so you're not generating wireframes for backend-only work.

Every task gets a size estimate (S/M/L) and a total hours estimate upfront. Every task has exact file paths, acceptance criteria, and a self-verify step. A conformance checklist at the end must be fully ticked before the feature is shippable.

The output is a dated folder in vibe/features/ with three files: FEATURE_SPEC.md, FEATURE_PLAN.md, and FEATURE_TASKS.md. TASKS.md and CLAUDE.md are updated so the agent knows what's active next session.
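
To make the shape of that output concrete, here is a minimal sketch of a feature task and the three generated files. The type names, the field names, and the `YYYY-MM-DD-slug` folder naming are assumptions for illustration. The text above only says the folder is dated.

```typescript
type Size = "S" | "M" | "L";

// Hypothetical task record carrying what each task is described as having.
interface FeatureTask {
  title: string;
  size: Size;
  filePaths: string[];          // exact files the task may touch
  acceptanceCriteria: string[];
  selfVerify: string;           // how the agent checks its own work
}

interface ChecklistItem {
  label: string;
  ticked: boolean;
}

// Assumed folder naming: vibe/features/<date>-<slug>/
function featureFiles(date: string, slug: string): string[] {
  const folder = `vibe/features/${date}-${slug}`;
  return ["FEATURE_SPEC.md", "FEATURE_PLAN.md", "FEATURE_TASKS.md"].map(
    (name) => `${folder}/${name}`
  );
}

// A feature is shippable only when the conformance checklist is fully ticked.
function isShippable(checklist: ChecklistItem[]): boolean {
  return checklist.length > 0 && checklist.every((item) => item.ticked);
}
```

Treating an empty checklist as not shippable is a choice made in this sketch, on the reasoning that a feature with nothing to conform to has not been specified yet.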

05 design: Logic first. Design after. Never mixed.

The line between logic and aesthetics blurs when you're building fast. A developer styles a component while building its state management. The design tokens are inconsistent. DESIGN_SYSTEM.md doesn't exist. The next session invents new colors from scratch.

design: enforces a hard separation. The vibe agent owns logic, data flow, spec compliance, and tests. design: owns aesthetics, layout, feel, and interactions. These never mix. If a design change requires a logic change — it stops, flags it, logs it, and refuses to proceed.

It reads DESIGN_SYSTEM.md first every time and detects conflicts between DESIGN_SYSTEM.md and SPEC.md before starting. For three or more components, it generates a design plan and waits for a single approval before executing everything in order.

Dark mode is a dual-token system. Every color needs a light and dark value, implemented at the token level — not by overriding component styles. That's the difference between a maintainable design system and a pile of one-off overrides.
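
A minimal sketch of what token-level dark mode can look like, assuming CSS custom properties as the delivery mechanism. The token names, hex values, and the `[data-theme="dark"]` selector are placeholders, not taken from any real DESIGN_SYSTEM.md.

```typescript
// Each token carries both values; components only ever reference the token name.
type DualToken = { light: string; dark: string };
type Theme = "light" | "dark";

// Hypothetical tokens; names and values are placeholders.
const tokens: Record<string, DualToken> = {
  "color-surface": { light: "#ffffff", dark: "#121212" },
  "color-text": { light: "#1a1a1a", dark: "#f5f5f5" },
};

// Emits one CSS custom property per token for the requested theme.
function cssVariables(
  theme: Theme,
  source: Record<string, DualToken> = tokens
): string {
  return Object.entries(source)
    .map(([name, value]) => `--${name}: ${value[theme]};`)
    .join("\n");
}

// Dark mode swaps token values in one place; component styles stay untouched.
function themeStylesheet(source: Record<string, DualToken> = tokens): string {
  const light = cssVariables("light", source);
  const dark = cssVariables("dark", source);
  return `:root {\n${light}\n}\n[data-theme="dark"] {\n${dark}\n}`;
}
```

Because `DualToken` requires both fields, a color with only a light value fails to type-check, which is one way to enforce the rule that every color has a light and a dark value.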

06 bug: Regression test first. Root cause confirmed. Smallest fix.

The first thing bug: does is triage. Not diagnose, not fix — triage.

Trivial: root cause is immediately obvious, fix is 1–2 lines in one file, no regression risk. Three tasks, no folder created, inline in TASKS.md. Done in minutes.
Significant: root cause needs investigation, multiple files involved, regression risk. Full workflow kicks in. A vibe/bugs/[date-slug]/ folder is created. BUG_SPEC.md, BUG_PLAN.md, BUG_TASKS.md generated. Diagnosis before fix. Regression test written and confirmed to fail before a single line of fix code is written.
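
The triage rule above can be sketched as a small decision function. The `BugReport` fields are hypothetical stand-ins for what the agent judges during triage, and the date-slug is passed in as-is because its exact format is not specified here.

```typescript
// Hypothetical inputs to the triage decision; field names are assumptions.
interface BugReport {
  rootCauseObvious: boolean;
  filesTouched: number;
  linesChanged: number;
  regressionRisk: boolean;
}

type Triage = "trivial" | "significant";

// Trivial only when every condition holds; anything else is significant.
function triage(bug: BugReport): Triage {
  const trivial =
    bug.rootCauseObvious &&
    bug.filesTouched === 1 &&
    bug.linesChanged <= 2 &&
    !bug.regressionRisk;
  return trivial ? "trivial" : "significant";
}

// Files each path writes to, following the description above.
function artifactsFor(result: Triage, dateSlug: string): string[] {
  if (result === "trivial") {
    return ["vibe/TASKS.md"]; // three inline tasks, no folder created
  }
  const folder = `vibe/bugs/${dateSlug}`;
  return [
    `${folder}/BUG_SPEC.md`,
    `${folder}/BUG_PLAN.md`,
    `${folder}/BUG_TASKS.md`,
  ];
}
```

The function defaults to "significant" whenever any condition is in doubt, which matches the intent of running the full workflow unless a bug is clearly trivial.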

The principle: smallest possible change. If you notice other bugs while fixing this one — log them separately. Never fix other bugs "while you're there."

07 change: Impact shown before any file is touched

Scope changes are inevitable. They're also the most common cause of silent codebase drift.

Someone says "actually also add a note field to check-ins" mid-task. The agent does it. The storage model changes without a retrofit task. SPEC.md still shows the old acceptance criteria. DECISIONS.md has no record. The review: skill finds a data model that doesn't match the spec with no explanation.

change: stops everything. Assesses impact. Shows you the build stage, what files change, which tasks need retrofitting, what already-built code is affected, and the estimated effort. Waits for your confirmation. Never touches a file without it.

Every change gets a D-ID in DECISIONS.md. If you later say "revert the note field change we made last week" — change: finds the D-ID, identifies what was built since then, and adds retrofit tasks to undo it. The history is always there.
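
One way to picture the D-ID mechanism, as a sketch under stated assumptions: the `D-001` style format, the record shapes, and the use of ISO dates are all hypothetical. The text above only says each change gets a D-ID and that work built since then can be identified.

```typescript
// Hypothetical record of one entry in DECISIONS.md.
interface Decision {
  id: string;       // assumed format "D-" plus a zero-padded number
  summary: string;
  loggedAt: string; // ISO date, e.g. "2025-01-15"
}

// Hypothetical record of a finished task.
interface CompletedTask {
  id: string;
  completedAt: string; // ISO date
}

// Next sequential ID after the highest one already in the log.
function nextDecisionId(log: Decision[]): string {
  const highest = log.reduce(
    (max, decision) => Math.max(max, Number(decision.id.slice(2))),
    0
  );
  return `D-${String(highest + 1).padStart(3, "0")}`;
}

// Tasks completed on or after the decision date: candidates for retrofit
// when that decision is reverted. ISO dates compare correctly as strings.
function tasksBuiltSince(
  decision: Decision,
  tasks: CompletedTask[]
): CompletedTask[] {
  return tasks.filter((task) => task.completedAt >= decision.loggedAt);
}
```

A revert would then look up the decision by ID, call `tasksBuiltSince`, and turn the result into retrofit tasks rather than editing files directly.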

08 review: Mandatory gate. No phase progresses without passing.

Issues caught at Phase 1 are cheap: fixed once, they never propagate. Issues caught at Phase 3 are expensive: the fix has to be retrofitted across every feature. Issues caught at deploy are very expensive because they carry production risk.

review: runs automated checks first. Tests, linter, TypeScript, npm audit. Machines verify before the agent touches anything.

Then architecture drift. Any violation of ARCHITECTURE.md is automatically P0 — the most severe finding. File path and line number required. No vague feedback. If it can't be backed by evidence, it doesn't go in the report.

P0 — blocks the gate, generates fix tasks in TASKS.md immediately
P1 — logged to a backlog, must be resolved before deploy
P2/P3 — logged and escalating; unresolved P1s escalate to P0 on next review

Final phase review: P0 and P1 both zero before deploy. No exceptions. Score formula: 10.0 − 1.0 per P0 − 0.5 per P1 − 0.2 per P2.
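
The score formula and gate rules above translate directly into a few lines. The `FindingCounts` shape and function names are illustrative, and the sketch does not clamp the score at zero because the formula as stated does not say whether it should.

```typescript
// Number of findings at each severity from one review run.
interface FindingCounts {
  p0: number;
  p1: number;
  p2: number;
  p3: number; // logged, but not part of the stated score formula
}

// Score = 10.0 - 1.0 per P0 - 0.5 per P1 - 0.2 per P2.
function reviewScore(counts: FindingCounts): number {
  const raw = 10 - 1.0 * counts.p0 - 0.5 * counts.p1 - 0.2 * counts.p2;
  return Math.round(raw * 10) / 10; // one decimal, avoids float noise
}

// Phase gates block on any P0; the final gate also requires zero P1s.
function gatePasses(counts: FindingCounts, isFinalPhase: boolean): boolean {
  if (counts.p0 > 0) return false;
  return isFinalPhase ? counts.p1 === 0 : true;
}
```

For example, a review with one P0, two P1s, and three P2s scores 10 − 1.0 − 1.0 − 0.6 = 7.4 and fails the gate because of the P0.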

09 progress: One command. Full picture.

A read-only ASCII dashboard generated from TASKS.md and git history. No files modified. Always safe to run.

Overall percentage complete with phase-by-phase progress bars
Phase gate status — reviewed, in progress, pending, or blocked
Active work with individual task checkboxes and a ← next marker
Backlog count, active bugs, scope changes logged, commits this week
Last git commit and next task in plain English
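
A sketch of how the percentage and progress bars could be derived from task checkboxes. It assumes TASKS.md marks tasks with markdown-style `- [x]` and `- [ ]` lines, which is an assumption about the file format. The bar characters and width are arbitrary choices.

```typescript
// Counts done and total tasks from checkbox lines in the file's text.
// Assumes "- [x]" for done and "- [ ]" for open, one task per line.
function countTasks(tasksText: string): { done: number; total: number } {
  const done = (tasksText.match(/^\s*- \[x\]/gim) ?? []).length;
  const open = (tasksText.match(/^\s*- \[ \]/gm) ?? []).length;
  return { done, total: done + open };
}

// Renders a fixed-width ASCII bar followed by a rounded percentage.
function progressBar(done: number, total: number, width = 10): string {
  const ratio = total === 0 ? 0 : Math.min(1, Math.max(0, done / total));
  const filled = Math.round(ratio * width);
  const percent = Math.round(ratio * 100);
  return `[${"#".repeat(filled)}${"-".repeat(width - filled)}] ${percent}%`;
}
```

Both functions only read their input and return strings or counts, in keeping with the read-only nature of the dashboard.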

For a non-technical audience it's the clearest possible answer to "where are we?" For you it's the sanity check before a new session.

The File System: Two Audiences

The entire framework is built around one principle: one file for you, everything else for the agent.

vibe/TASKS.md is the only file you ever need to open. Plain English. Updated after every single task. Tells you exactly what was just built, what's next, and what the phase gates look like.

Everything else — ARCHITECTURE.md, SPEC.md, SPEC_INDEX.md, PLAN.md, CODEBASE.md, DECISIONS.md, DESIGN_SYSTEM.md, the features and bugs folders — is for the agent. You never need to open any of it.

The agent reads five files at the start of every session, in order:

CLAUDE.md
vibe/CODEBASE.md
vibe/ARCHITECTURE.md
vibe/SPEC_INDEX.md
vibe/TASKS.md

Then it confirms the first task before writing any code. Context is fully restored. Every decision made in previous sessions is available. The agent never starts blind again.
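
The read order above can be expressed as a fixed list plus a loader that refuses to proceed on a partial context. This is a minimal sketch: the function name is hypothetical, file contents are passed in as a map so the example has no file-system dependency, and failing hard on a missing file is a choice made here rather than documented behaviour.

```typescript
// The five files, in the order the agent reads them.
const SESSION_READ_ORDER = [
  "CLAUDE.md",
  "vibe/CODEBASE.md",
  "vibe/ARCHITECTURE.md",
  "vibe/SPEC_INDEX.md",
  "vibe/TASKS.md",
] as const;

// Concatenates file contents in read order; `files` maps path to content.
// Throws if any file is missing (an assumption of this sketch).
function buildSessionContext(files: Map<string, string>): string {
  const sections: string[] = [];
  for (const path of SESSION_READ_ORDER) {
    const content = files.get(path);
    if (content === undefined) {
      throw new Error(`Missing context file: ${path}`);
    }
    sections.push(`=== ${path} ===\n${content}`);
  }
  return sections.join("\n\n");
}
```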

The Four Principles Underneath Everything

Spec before code. No task starts without acceptance criteria. The agent re-reads the spec after every task and ticks every criterion before marking done.
Context preservation. Agents forget between sessions. The framework maintains five documents that are read at every session start — so the agent never re-discovers what exists or what decisions have been made.
Incremental verifiable progress. One task at a time. Confirm → build → verify → commit → signal done. No batching. No running ahead.
Drift prevention. DECISIONS.md logs every deviation. change: stops the build and assesses impact. review: checks architecture at every phase gate. Nothing is absorbed silently.
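
The one-task-at-a-time loop from the third principle can be sketched as a fixed step sequence with a guard against batching. The step identifiers and status values are illustrative names, not part of the framework.

```typescript
// Confirm, build, verify, commit, signal done: always in this order.
const TASK_STEPS = ["confirm", "build", "verify", "commit", "signal-done"] as const;
type TaskStep = (typeof TASK_STEPS)[number];

// Returns the step that follows `current`, or null after the last step.
function nextStep(current: TaskStep): TaskStep | null {
  const next = TASK_STEPS[TASK_STEPS.indexOf(current) + 1];
  return next ?? null;
}

type TaskStatus = "todo" | "in-progress" | "done";

// No batching: a new task may start only when nothing else is in progress.
function canStartNextTask(statuses: TaskStatus[]): boolean {
  return statuses.every((status) => status !== "in-progress");
}
```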

What Actually Changed

Before the framework, a typical session looked like this: describe the project from scratch, agent makes assumptions, some wrong, patterns inconsistent across features, scope change absorbed silently, bug fixed by guessing, no regression test, same bug reappears.

After the framework: every session starts by reading five documents, so the agent already knows the project. Every feature starts with a spec. Every pattern decision is documented and enforced. Every scope change is logged with full impact assessment. Every bug has a regression test written before the fix. Every phase has a gate.

The codebase is consistent. The history is auditable. The agent is a collaborator, not a guesser.

Try It

The full interactive presentation is live at vibe-skill-presentation.vercel.app — 17 slides covering all nine skills with real conversation examples and the complete end-to-end project arc from first brainstorm to a reviewed, ship-ready build.

If you're building anything serious with AI agents, the framework is worth understanding. Even if you only adopt two or three skills, the discipline around spec-before-code and drift prevention will change how you work.

Spec before code. Context preserved. Always.