The goal of agentic engineering is not to write code faster. It is to ship reliable software with less human intervention per deployment.

This is the final post in the series. We have covered the plan, the code, the QA review, and the debugging tools. Now: how does all of this actually reach production?

Here is the complete deployment pipeline for two real projects — this portfolio site (luonghongthuan.com) and CubLearn — with the exact configuration, the failure modes we have hit, and the monitoring that catches what the QA agent misses.


The Full Pipeline

SPEC
  |
  v
Plan Mode (agent reads codebase, produces plan)
  |
  v
QA Review Pass 1 (plan audit -- reject or approve)
  |
  v
Code Generation (coding agent executes plan)
  |
  v
QA Review Passes 2-4 (wiring + tests + conventions)
  |
  v
Human Review (5-minute diff scan)
  |
  v
git push -> GitHub
  |
  v
GitHub CI (TypeScript compile + unit tests + build)
  |
  +-- FAIL: notification + block deploy
  |
  v (pass)
Cloudflare Pages Build (Astro build + Wrangler deploy)
  |
  +-- FAIL: notification + no traffic switch
  |
  v (pass)
Smoke Tests (automated endpoint checks)
  |
  +-- FAIL: automatic rollback to previous deployment
  |
  v (pass)
Traffic Switch (new deployment receives 100% of traffic)
  |
  v
Log Monitoring (Wrangler + Docker + alerts for error spikes)

Every step is automated except the human review, which takes 5 minutes. The rest runs without me.


GitHub CI: The First Gate

Before anything deploys, GitHub CI validates the code.

Configuration for luonghongthuan.com (.github/workflows/ci.yml):

name: CI
on:
  push:
    branches: [main]
  pull_request:
    branches: [main]

jobs:
  validate:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4

      - name: Setup Node.js
        uses: actions/setup-node@v4
        with:
          node-version: '20'
          cache: 'npm'

      - name: Install dependencies
        run: npm ci

      - name: TypeScript check
        run: npx tsc --noEmit

      - name: Lint
        run: npm run lint

      - name: Build validation
        run: npm run build
        env:
          SKIP_CONTENT_VALIDATION: 'true'

      - name: Unit tests
        run: npm test

What each step catches:

  • TypeScript check: Type errors that the agent introduced (wrong return types, missing null checks, incorrect generic parameters)
  • Lint: Convention violations that slipped past QA review (unused imports, any types, console.log in production code)
  • Build validation: Astro-specific errors (broken content collection schemas, invalid frontmatter, missing component props)
  • Unit tests: Behavioral regressions in utility functions

The CI run takes approximately 3 minutes. If it fails, the push is marked as failed and no deployment happens.

What CI does NOT catch:

  • UI rendering issues (no browser in CI)
  • Integration failures with external services
  • Edge cases in Cloudflare Workers that only appear under real network conditions
  • Performance regressions

This is why smoke tests exist.


Cloudflare Pages Build

For luonghongthuan.com, Cloudflare Pages is connected to the GitHub repository and builds automatically on every push to main.

Build configuration in the Cloudflare Pages dashboard:

  • Build command: npm run build
  • Output directory: dist/
  • Node version: 20

The build produces:

  • Static HTML for all blog posts and pages
  • Cloudflare Workers for API functions (functions/api/)
  • Assets bundled and hashed for cache busting

Build failure modes we have hit:

  1. Markdown encoding errors — em-dashes and smart quotes in markdown content caused CF Workers 1101 errors at build time. Fixed by adding a pre-build markdown linter.

  2. Missing environment variables — the TTS function requires GOOGLE_TTS_API_KEY. If this secret is not set in Cloudflare’s dashboard, the build succeeds but the function fails at runtime. Fixed by adding a startup check in the function.

  3. Stale cache serving old content — after a successful build, Cloudflare CDN sometimes serves the old cached version. Fixed by verifying content after deploy with a cache-busted request (?v=${timestamp}).
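Failure mode 2 is the kind of thing a few lines of defensive code prevent. A minimal sketch of the startup check, with illustrative names and handler shape (not the production code):

```typescript
// Hypothetical startup guard for the TTS Pages Function.
// The Env shape and onRequestPost signature are illustrative.
type Env = { GOOGLE_TTS_API_KEY?: string };

// Returns an explicit 500 when the secret is missing, instead of a
// confusing failure deep inside the Google TTS call.
function missingSecretResponse(env: Env): Response | null {
  if (!env.GOOGLE_TTS_API_KEY) {
    return new Response(
      JSON.stringify({ error: 'GOOGLE_TTS_API_KEY is not configured' }),
      { status: 500, headers: { 'Content-Type': 'application/json' } },
    );
  }
  return null;
}

export async function onRequestPost(ctx: { request: Request; env: Env }) {
  const guard = missingSecretResponse(ctx.env);
  if (guard) return guard; // fail fast, loudly
  // ... proceed with the TTS request ...
  return new Response('ok');
}
```

The point is that a missing secret now fails on the first request with a message that names the secret, instead of surfacing as an opaque runtime error.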


Smoke Tests: The Final Verification

After every deployment, automated smoke tests run against the live site.

My smoke test suite for luonghongthuan.com:

// scripts/smoke-tests.ts
// (runTest and triggerRollback helpers omitted here for brevity)
const BASE_URL = 'https://luonghongthuan.com';

const tests = [
  // Core pages
  { url: '/', check: (html: string) => html.includes('Luong Hong Thuan') },
  { url: '/en/blog/', check: (html: string) => html.includes('blog-post') },
  { url: '/vi/blog/', check: (html: string) => html.includes('bai-viet') },

  // API endpoints
  {
    url: '/api/tts-config',
    check: (json: any) => json.provider !== undefined,
  },
  {
    url: '/api/tts',
    method: 'POST',
    body: { text: 'hello', voice: 'en-US-Chirp3-HD-Charon', lang: 'en' },
    check: (res: Response) => res.headers.get('Content-Type') === 'audio/mpeg',
  },

  // TTS cache verification
  {
    url: '/api/tts',
    method: 'POST',
    body: { text: 'smoke test phrase', voice: 'en-US-Chirp3-HD-Charon', lang: 'en' },
    check: (res: Response) => {
      const first = res.headers.get('X-TTS-Cache');
      // First request should be MISS
      return first === 'MISS';
    },
  },
  {
    url: '/api/tts',
    method: 'POST',
    body: { text: 'smoke test phrase', voice: 'en-US-Chirp3-HD-Charon', lang: 'en' },
    check: (res: Response) => {
      // Second request with same content should be HIT
      return res.headers.get('X-TTS-Cache') === 'HIT';
    },
  },

  // Newsletter
  {
    url: '/api/subscribe',
    method: 'POST',
    body: { email: 'smoke-test@example.com', dry_run: true },
    check: (json: any) => json.success === true,
  },
];

async function runSmoke() {
  let failed = 0;
  for (const test of tests) {
    const result = await runTest(test);
    if (!result.pass) {
      console.error(`FAIL: ${test.url} -- ${result.error}`);
      failed++;
    }
  }
  if (failed > 0) {
    console.error(`${failed} smoke tests failed -- triggering rollback`);
    await triggerRollback();
    process.exit(1);
  }
  console.log('All smoke tests passed');
}
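runSmoke() assumes two helpers that are not shown above. Here is a minimal sketch of runTest, with an injectable fetch so it can be exercised offline; the payload-dispatch logic is my assumption about how the suite's mixed check signatures (HTML string, parsed JSON, raw Response) are handled:

```typescript
type SmokeTest = {
  url: string;
  method?: string;
  body?: unknown;
  check: (payload: any) => boolean;
};

const BASE_URL = 'https://luonghongthuan.com';

async function runTest(
  test: SmokeTest,
  fetchFn: typeof fetch = fetch, // injectable for offline testing
): Promise<{ pass: boolean; error?: string }> {
  try {
    const res = await fetchFn(BASE_URL + test.url, {
      method: test.method ?? 'GET',
      headers: test.body ? { 'Content-Type': 'application/json' } : undefined,
      body: test.body ? JSON.stringify(test.body) : undefined,
    });
    if (!res.ok) return { pass: false, error: `HTTP ${res.status}` };

    // Hand each check the payload shape it expects: parsed JSON, HTML text,
    // or the raw Response (for header checks like X-TTS-Cache).
    const type = res.headers.get('Content-Type') ?? '';
    const payload = type.includes('application/json')
      ? await res.json()
      : type.includes('text/html')
        ? await res.text()
        : res;
    const pass = test.check(payload);
    return { pass, error: pass ? undefined : 'check returned false' };
  } catch (err) {
    return { pass: false, error: String(err) };
  }
}
```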

Smoke tests run automatically via a Cloudflare Workers cron trigger, 5 minutes after each deployment completes.

If smoke tests fail, the triggerRollback() call uses the Cloudflare Pages API to revert to the previous deployment. No traffic stays on the failing version for much longer than the 5-minute test delay.
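A sketch of what triggerRollback() can look like against the Cloudflare Pages REST API. The endpoint paths follow Cloudflare's published v4 API; the environment variable names are my assumptions:

```typescript
const CF_API = 'https://api.cloudflare.com/client/v4';

// Pure selection logic, kept separate so it is testable without the network.
// The API returns deployments newest-first; [0] is the live (failing) one.
function pickPreviousDeployment<T>(deployments: T[]): T {
  if (deployments.length < 2) {
    throw new Error('no previous deployment to roll back to');
  }
  return deployments[1];
}

async function triggerRollback(): Promise<void> {
  const account = process.env.CF_ACCOUNT_ID!;    // assumption: set in the env
  const project = process.env.CF_PAGES_PROJECT!; // assumption: set in the env
  const headers = { Authorization: `Bearer ${process.env.CF_API_TOKEN!}` };
  const base = `${CF_API}/accounts/${account}/pages/projects/${project}/deployments`;

  const list = (await (await fetch(base, { headers })).json()) as {
    result: { id: string }[];
  };
  const previous = pickPreviousDeployment(list.result);

  // Roll back by re-promoting the previous deployment.
  await fetch(`${base}/${previous.id}/rollback`, { method: 'POST', headers });
}
```

Keeping the "which deployment do we roll back to" decision in a pure function means the riskiest part of the rollback path has unit-test coverage.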


Log Monitoring: What Happens After Deploy

Even when smoke tests pass, production can surprise you. Log monitoring is the ongoing safety net.

Wrangler log streaming:

# Real-time logs from Cloudflare Workers
wrangler tail --format=pretty

# Filter to just errors
wrangler tail --format=json | jq 'select(.outcome == "exception")'

Alerts I have configured:

  • Error rate > 1% of requests in any 5-minute window — Slack notification
  • TTS API errors (Google 429, 503) — immediate Slack notification
  • Cache MISS rate > 80% over 1 hour — investigation trigger (may indicate cache eviction problem)
  • Build time > 120 seconds — investigation trigger (may indicate content growth issue)
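The first alert rule reduces to a small pure function. A sketch, with an assumed LogEntry shape (timestamps in milliseconds, outcome as reported by wrangler tail); the Slack plumbing is deployment-specific and not shown:

```typescript
type LogEntry = { timestamp: number; outcome: 'ok' | 'exception' };

// True when exceptions exceed `threshold` of requests in the trailing window.
function errorRateExceeded(
  entries: LogEntry[],
  windowMs = 5 * 60 * 1000, // 5-minute window
  threshold = 0.01,         // 1% error rate
  now = Date.now(),
): boolean {
  const recent = entries.filter((e) => now - e.timestamp <= windowMs);
  if (recent.length === 0) return false;
  const errors = recent.filter((e) => e.outcome === 'exception').length;
  return errors / recent.length > threshold;
}
```

In practice this would be fed from wrangler tail --format=json and would post to Slack when it flips to true.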

What monitoring has caught:

  1. A Vietnamese blog post with corrupted encoding causing the TTS function to return 502 on every request for that page. Caught 11 minutes after deploy (before most users encountered it). Fixed in 20 minutes.

  2. The Cloudflare CDN cache serving stale HTML after a deploy. Monitoring showed cf-cache-status: HIT on pages that should have been MISS after the new deploy. Triggered a manual cache purge.


CubLearn: The Education App Pipeline

CubLearn has a more complex pipeline because it runs as a full Next.js application (not static site).

Stack: Next.js 16, React 19, Cloudflare Workers (API layer), Cloudflare D1 (database)

Pipeline:

Push to main
  |
  v
GitHub CI
  - TypeScript check (Next.js + custom game-engine package)
  - ESLint + Prettier
  - Unit tests (game engine, content utilities)
  - E2E tests (Playwright -- key user journeys)
  |
  v
Cloudflare Workers deploy (API layer)
  - D1 database migrations run automatically
  |
  v
Next.js build + Cloudflare Pages deploy (frontend)
  |
  v
Smoke tests
  - App loads and renders
  - Game engine initializes
  - Lesson content loads
  - Pronunciation exercises work
  |
  v
Traffic switch
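The "migrations run automatically" step is a one-liner in CI. A sketch as a GitHub Actions step, assuming the database is named cublearn-db and the usual Wrangler auth secrets (both names illustrative):

```yaml
- name: Apply D1 migrations
  run: npx wrangler d1 migrations apply cublearn-db --remote
  env:
    CLOUDFLARE_API_TOKEN: ${{ secrets.CLOUDFLARE_API_TOKEN }}
    CLOUDFLARE_ACCOUNT_ID: ${{ secrets.CLOUDFLARE_ACCOUNT_ID }}
```

Wrangler picks up the API token and account ID from those environment variables, so no interactive login is needed in CI.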

The YLE integration deployment:

When we added 8 game lessons and 3 memory card sets (from Part 2 of this series), here is what the CI logs showed:

TypeScript check: PASS (0 errors)
ESLint: PASS
Unit tests: 47 passed, 0 failed
  -- english-lessons.test.ts: 12 tests (all new YLE lessons validated)
  -- memory-pairs.test.ts: 8 tests (all new card sets validated)
  -- pronunciation.test.ts: 6 tests (all new exercises validated)
E2E tests: 3 passed (game loads, quiz works, memory game works)
Build: PASS (47.2 seconds)
Deploy: PASS
Smoke tests: PASS

Total from push to live: 4 minutes 20 seconds.


The Human Role in This Pipeline

I said at the start: every step is automated except the human review. Let me be specific about what that means.

The human review is NOT:

  • Line-by-line code review (that is the QA agent’s job)
  • Test verification (that is CI’s job)
  • Bug hunting (that is the QA agent + smoke tests)

The human review IS:

  • Sanity check: does the diff match what was requested?
  • Scope check: is anything in the diff that was NOT in the spec?
  • Judgment: is there a reason an automated system would not catch this?
  • Authorization: I am the one who decides this ships

The 5-minute review is about judgment, not inspection. The inspection has already happened.

When the review is done, I push. The pipeline takes it from there.


Start Here: The Minimum Viable Pipeline

You do not need everything in this post to start. The minimum viable pipeline is:

  1. GitHub CI with TypeScript check and build validation — catches most breaking changes before they deploy
  2. One smoke test on your most critical endpoint — proves the application is alive after deploy
  3. git revert discipline — when something breaks, revert immediately, fix in a branch

That is enough to be meaningfully safer than pushing directly to production.
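Item 2 can be under twenty lines. A sketch with a placeholder URL and marker string; fetchFn is injectable only so the function can be tested offline:

```typescript
// Minimal single-endpoint smoke check; URL and marker are placeholders
// for your own critical endpoint and a string it must contain.
async function smokeCheck(
  url: string,
  marker: string,
  fetchFn: typeof fetch = fetch,
): Promise<boolean> {
  try {
    const res = await fetchFn(url);
    if (!res.ok) return false;    // non-2xx means the app is not healthy
    const body = await res.text();
    return body.includes(marker); // proves real content, not an error page
  } catch {
    return false;                 // a network failure counts as a failure
  }
}

// Run only when configured, and exit nonzero so CI can block on failure.
if (process.env.SMOKE_URL) {
  smokeCheck(process.env.SMOKE_URL, process.env.SMOKE_MARKER ?? '').then((ok) => {
    if (!ok) process.exit(1);
  });
}
```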

Add QA agent review, monitoring, and automatic rollback as your project grows. Build the pipeline as you learn what breaks.

The goal is not a perfect pipeline on day one. The goal is a pipeline that gets better every time something goes wrong.


The Series: Complete

This is the final post in “Built with Agentic Engineering.”

We started with the manifesto: what agentic engineering means and why vibe coding is not the destination.

We covered plan mode: why research before code is not optional, how specifications prevent stubs, and how the QA agent audits plans before a line is written.

We covered the QA agent: wiring verification, test coverage auditing, convention compliance, and the real data on what it costs versus what it saves.

We covered evidence-based debugging: Chrome DevTools MCP, Docker logs, Serena semantic understanding, and why measuring beats guessing.

And we covered the pipeline: CI, build, smoke tests, monitoring, rollback.

The line at the bottom of every post means something real:

Built with agentic engineering. Verified by a QA agent. Reviewed by a human. Shipped to production.

It is not a workflow for everyone or every project. But if you are building software that matters — software that people will rely on, that has to work every time, that someone will maintain for years — it is worth building the system that builds the software.

The agent is capable. The system is the craft.


This is Part 5 of the “Built with Agentic Engineering” series. Previous: Evidence-Based Debugging. Start from the beginning: The New Way Software Gets Made.
