Every blog post on this site was built by an agent. The TTS system that reads this to you was upgraded by an agent. The Cambridge YLE curriculum for CubLearn — 22 files, 8 game lessons, 14 pronunciation exercises — shipped in a single afternoon, driven by agents.
The footer of each post says:
Built with agentic engineering. Verified by a QA agent. Reviewed by a human. Shipped to production.
That is not marketing copy. That is the actual workflow. This series documents it from the ground up — what it means, how it works, and what it takes to make agents ship code you can trust.
Why “Vibe Coding” Is Not the Destination
Vibe coding works. For demos. For MVPs nobody will maintain. For the first three days of a side project before you realize the codebase is held together with hallucinated APIs and hardcoded values.
The Stack Overflow 2025 survey asked 49,000 developers about AI coding tools. The numbers tell the story:
- 84% use or plan to use AI coding tools
- 45% say debugging AI code takes longer than writing it themselves
- 29% actually trust the output
That gap — 84% adoption, 29% trust — is the problem vibe coding creates. Developers are using AI, but spending more time cleaning up after it than it saves them.
The answer is not to use AI less. It is to use it differently.
What Agentic Engineering Actually Is
Agentic engineering means engineering the system that the agent operates within — not just prompting the agent and hoping.
The shift is from:
- “AI, write me code” (vibe coding)
To:
- “AI, here is the specification. Here are the constraints. Here is the verification suite. Now build it — and prove it works.”
The difference is not the prompt. It is everything around the prompt: the harness, the plan mode, the QA loop, the automated tests, the deployment gates.
The agent writes the code. You design the system that makes the code worth writing.
The Workflow in Practice
Here is exactly how this portfolio site gets built and updated:
1. Specification First
Before any agent touches code, the task is defined:
- What feature are we adding?
- What files will be modified?
- What does success look like (tests, behavior, output)?
- What must NOT break?
This lives in CLAUDE.md — the persistent project memory that loads into every session. Key decisions, architecture choices, naming conventions, “do not touch” warnings. 400 lines of context that keep the agent from re-learning the codebase every session.
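To make that concrete, here is a hypothetical excerpt of what such a file can contain. The entries below are invented for illustration; the real CLAUDE.md is project-specific:

```markdown
<!-- Illustrative CLAUDE.md excerpt (invented example, not the actual file) -->
## Architecture
- Astro static site; serverless endpoints live under functions/api/
## Conventions
- TypeScript strict mode; no em-dashes or smart quotes in generated prose
## Do not touch
- TTSPlayer.astro audio element IDs: the player script depends on them
```

The point is not the specific rules. It is that the agent reads them on every session start, so decisions made once stay made.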
2. Plan Mode — Research Before Code
Claude Code’s plan mode is non-negotiable. Before writing a single line, the agent:
- Reads all relevant files (grep, read, knowledge graph traversal)
- Maps the call chain (what calls what, what breaks if X changes)
- Produces a written plan with specific file paths and function names
- Lists everything that could go wrong
The plan gets reviewed. Not always by me — more on that in Part 3.
3. Code Generation — Constrained and Monitored
With an approved plan, the coding agent executes. Key constraints in our workflow:
- Modular architecture — each module is small enough to fit in one context window. The agent edits one module at a time.
- Coding conventions — enforced via CLAUDE.md and custom linting hooks. The agent cannot ship code that violates our style (snake_case vs camelCase, async patterns, error handling conventions).
- Hook system — pre-commit hooks run automatically. The agent cannot bypass them.
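One of those style rules can be sketched as a content check a pre-commit hook would run. This is a hypothetical helper, not the site's actual hook; it flags text containing em-dashes or smart quotes, two of the conventions mentioned later in this post:

```typescript
// Hypothetical pre-commit style check (illustrative, not the real hook):
// reject source text containing em-dashes or smart quotes.
const FORBIDDEN = /[\u2014\u2018\u2019\u201C\u201D]/; // em-dash, curly quotes

function violatesStyle(source: string): boolean {
  return FORBIDDEN.test(source);
}

// A git pre-commit hook would run this over each staged file and exit
// non-zero on a match, blocking the commit for agent and human alike.
```

Because the check runs in the hook, not in the prompt, the agent cannot talk its way past it.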
4. QA Agent Review — The Brutal Filter
After code generation, a dedicated QA agent reviews the output against a structured checklist:
- Test coverage — does the code have tests? Do they actually test the behavior?
- Wiring verification — is every new function actually called somewhere in the application flow?
- Convention compliance — does the code follow our established patterns?
- Edge cases — what happens with empty input? Null values? Network failure?
- Plan fidelity — does the implementation match what was planned?
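The wiring check is the least obvious item on that list, so here is a deliberately naive sketch of the idea (illustrative only; a real QA agent would walk the AST, not run regexes): collect declared function names, then flag any that are never referenced outside their own declaration.

```typescript
// Naive wiring verification sketch: find functions that are declared
// but never called anywhere else in the given files.
function findUnwiredFunctions(files: Record<string, string>): string[] {
  const all = Object.values(files).join("\n");
  const declared = [...all.matchAll(/function\s+([A-Za-z_$][\w$]*)/g)].map(
    (m) => m[1],
  );
  return declared.filter((name) => {
    // One match is the declaration itself; more than one means it is used.
    const refs = all.match(new RegExp(`\\b${name}\\b`, "g")) ?? [];
    return refs.length <= 1;
  });
}
```

Dead code like this is the classic failure mode of generated changes: everything compiles, every test passes, and the feature still does not exist because nothing calls it.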
The QA agent has rejected implementations 4-5 times before approving them. Its review runs 10x slower than raw code generation. The quality difference is dramatic.
5. Human Review — The 5-Minute Scan
After QA approval, I review the diff. Not line by line — I trust the QA agent for that. I am looking for:
- Does this make architectural sense?
- Is anything surprising in the diff?
- Are there any security or data concerns the QA agent might miss?
This takes 5 minutes for most changes. Sometimes 30 seconds.
6. CI/CD — The Automated Gate
GitHub CI runs on every push:
- TypeScript compilation
- Unit tests
- Integration tests (where applicable)
- Build validation
If any step fails, nothing deploys. The agent’s code must pass the same gates as human code.
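Expressed as a GitHub Actions workflow, that gate might look like this. The job name and npm scripts below are assumptions for illustration, not the repo's actual config:

```yaml
# Sketch of the CI gate (assumed names and commands)
name: ci
on: push
jobs:
  verify:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - uses: actions/setup-node@v4
        with:
          node-version: 20
      - run: npm ci
      - run: npx tsc --noEmit   # TypeScript compilation
      - run: npm test           # unit and integration tests
      - run: npm run build      # build validation
```

Any failing step stops the pipeline, so a deploy never starts from an unverified commit.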
7. Smoke Tests on Deploy — The Final Check
After Cloudflare Pages deploys:
- Automated smoke tests hit key endpoints
- Docker logs are checked for runtime errors
- Core user journeys are verified (blog post renders, TTS works, newsletter subscribe works)
If smoke tests fail: automatic rollback.
If they pass: traffic routes to the new deployment.
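A smoke-test runner for this step can be sketched as follows. The paths and rollback wiring are illustrative, not the site's actual scripts; the fetch function is injectable so the check can be exercised against a stub:

```typescript
// Post-deploy smoke test sketch (illustrative paths and wiring).
type Fetcher = (url: string) => Promise<{ ok: boolean; status: number }>;

async function smokeTest(
  baseUrl: string,
  paths: string[],
  fetchFn: Fetcher = fetch,
): Promise<boolean> {
  for (const path of paths) {
    const res = await fetchFn(baseUrl + path);
    if (!res.ok) {
      console.error(`Smoke test failed: ${path} -> ${res.status}`);
      return false; // the deploy script would trigger a rollback here
    }
  }
  return true; // all key endpoints respond; route traffic to the new deploy
}
```

The deploy script calls this against the fresh deployment and only swaps traffic on a `true` result.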
A Real Example: Upgrading TTS in 30 Minutes
Here is what this workflow looks like in practice. Last night, I asked an agent to upgrade the site’s Text-to-Speech from Journey voices to Chirp3-HD voices with 30-day caching.
The spec I provided:
- Upgrade to Google Chirp3-HD voices (en-US-Chirp3-HD-Charon, vi-VN-Chirp3-HD-Charon)
- Add Cloudflare Cache API with SHA-256 hash keys and 30-day TTL
- Auto-detect page language (document.documentElement.lang) for voice selection
- Update TTSPlayer.astro with 4 targeted patches
What the agent did:
- Read the existing functions/api/tts.ts (108 lines)
- Read functions/api/tts-config.ts (37 lines)
- Read the relevant sections of TTSPlayer.astro (2328 lines, searched for Journey, voice =, /api/tts)
- Generated 3 new files and applied 4 surgical patches
- Verified: no em-dashes, no smart quotes, no breaking changes to the audio player interface
Total wall-clock time: 28 minutes from spec to pushed commit.
The old approach (pre-agentic workflow): Read docs, write code, test locally, debug CORS headers, fix the base64 decode, fix the language code extraction, finally push. Probably 3-4 hours.
What This Series Covers
This is Part 1 of a 5-part series. Here is what is coming:
Part 2 — “Plan Mode or Bust”: Why every coding session must start with a research phase. How plan mode works in Claude Code. Real examples of plans that got rejected and why.
Part 3 — “The QA Agent”: The methodology behind automated code verification. What the QA agent checks that humans miss. The wiring verification problem (code that gets written but never called). Real rejection logs.
Part 4 — “Evidence-Based Debugging”: How to give your agent eyes. Chrome DevTools MCP, Docker log streaming, Serena semantic code understanding. The difference between an agent that guesses and an agent that measures.
Part 5 — “From Agent to Production”: The full CI/CD pipeline for agentic code. Smoke tests, rollback strategy, deployment gates. How to sleep while your agents ship.
The Real Metric
I used to spend 60-70% of my AI coding time on testing and debugging.
Now I spend it on specification and review.
The agents do more. I direct more. The software is better.
That is the promise of agentic engineering. Not magic — engineering.
This is Part 1 of the “Built with Agentic Engineering” series. Next: Plan Mode or Bust — Why Agents Must Think Before They Code