Every time I think I know what Cloudflare offers, I open the dashboard and find something I’d never noticed. Last month it was AutoRAG — a managed RAG pipeline that didn’t exist six months ago. Before that it was Browser Rendering, a headless Chrome service you can call from a Worker. The month before that, AI Gateway appeared and silently became one of the most useful proxy tools I use.

The problem isn’t that Cloudflare hides these services. It’s that there are now 40+ products across the dashboard, and it’s genuinely hard to know which ones are relevant to what you’re building, which are free, and which require a credit card.

This post is my answer to that problem. It’s a reference guide — the kind you bookmark and come back to when you’re starting a new project or wondering “does Cloudflare have something for this?” It covers three use cases:

  • Self-hosting — exposing homelab services safely without opening ports
  • Web app development — building full-stack apps entirely on Cloudflare’s edge
  • AI research — running inference, vector search, and RAG pipelines without leaving the ecosystem

I’ve written detailed implementation guides for specific Cloudflare services separately. This post links to those rather than repeating the setup steps. Think of it as the map; the other posts are the turn-by-turn directions.


The Free Tier Philosophy

Cloudflare’s business is selling network services to enterprises. The free tier isn’t charity — it’s a customer acquisition strategy. But the side effect is real: the free limits are production-grade, not demo-grade.

AWS gives you 750 hours of a t2.micro per month for one year, then shuts it off. GCP gives you $300 in credits that expire in 90 days. Both free tiers exist specifically to pressure you toward paid before you’re ready.

Cloudflare’s free tier is different. 100,000 Workers requests per day is enough to handle a real personal app. 10GB of R2 storage with zero egress fees is enough for years of a blog’s media library. 10,000 AI inference “neurons” per day is enough to run a dozen LLM experiments. None of these expire.

The upgrade decision is straightforward: if you hit the limits, the Workers Paid plan at $5/month unlocks dramatically more capacity across almost every service in the ecosystem. Most personal projects never get there.


Service Directory

Every Cloudflare service relevant to self-hosting, web apps, and AI research — in one table.

| Service | Category | Free Limit | Paid From | What it’s for |
|---|---|---|---|---|
| Pages | Web hosting | Unlimited bandwidth, 500 builds/mo | $20/mo Pro | Static site hosting with Git CI/CD |
| Workers | Serverless | 100K requests/day, 10ms CPU | $5/mo | Edge functions, API routes, middleware |
| KV | Key-value store | 100K reads/day, 1K writes/day, 1GB | $5/mo base | Session storage, config, feature flags |
| D1 | SQLite database | 5M rows read/day, 100K writes/day, 1GB | $5/mo base | Relational data for Workers apps |
| R2 | Object storage | 10GB, 1M Class A ops/mo | $0.015/GB above free | Files, images, blobs — zero egress fees |
| Queues | Message queue | 1M operations/mo | $5/mo base | Background jobs, webhook processing |
| Durable Objects | Stateful edge | Basic (Workers Paid required) | $5/mo | WebSocket state, real-time coordination |
| Hyperdrive | DB accelerator | Workers Paid required | $5/mo | Pool connections to existing Postgres/MySQL |
| Analytics Engine | Custom analytics | Free (limited) | Paid above | Time-series event tracking from Workers |
| Email Routing | Email forwarding | Free | Free | Forward you@yourdomain.com to any inbox |
| Stream | Video CDN | 1K min storage free | $5/mo + usage | Video hosting and adaptive streaming |
| Images | Image optimization | Free resize transforms | $5/mo | On-the-fly image resizing via URL |
| Turnstile | CAPTCHA | Free, unlimited | Free | Bot-proof forms without user friction |
| Tunnel | Secure ingress | Free | Free | Expose homelab services — no open ports |
| Zero Trust Access | Identity/auth | 50 users free | $3/user/mo | Auth layer in front of any URL |
| WARP | VPN client | Free personal | $3/user/mo | Encrypted DNS + private network access |
| Gateway | DNS/HTTP filtering | 3 locations free | $3/user/mo | DNS-level malware and ad blocking |
| WAF | Firewall | 5 custom rules free | $20/mo Pro | Block attack patterns by rule |
| DDoS | L3/L4/L7 protection | Always-on, free | Free | Automatic volumetric attack mitigation |
| DNS | Nameserver | Free | Free | Authoritative DNS with Anycast routing |
| SSL/TLS | Certificates | Free, auto-renewing | Free | HTTPS for any domain |
| Registrar | Domain registration | At-cost (~$10.11/yr .com) | At-cost | Buy/transfer domains at wholesale price |
| Web Analytics | Traffic stats | Free | Free | Cookie-free, no-JavaScript-required analytics |
| Workers AI | AI inference | 10K neurons/day | $0.011/1K neurons | Run 50+ models at the edge |
| AI Gateway | AI proxy | Free | Free (currently) | Unified proxy for all AI providers |
| Vectorize | Vector database | 30M dimensions free | Paid above | Semantic search, RAG retrieval layer |
| Browser Rendering | Headless Chrome | 2 concurrent sessions | $5/mo base | Scraping, PDF generation from Workers |
| AutoRAG | Managed RAG | Limited preview | TBD | Full document ingestion + RAG pipeline |

Use Case 1: Self-Hosting and the Homelab Stack

Running services at home — Jellyfin, Home Assistant, Gitea, Proxmox, Paperless-ngx — used to require a stack of configuration glue: dynamic DNS, nginx reverse proxy, certbot for SSL, UFW rules, and careful port forwarding on the router. Cloudflare replaces most of that with three free services: Tunnel, Access, and Gateway.

Cloudflare Tunnel — The Foundation

Tunnel creates an outbound-only encrypted connection from your server to Cloudflare’s edge. Traffic flows in from the internet, reaches Cloudflare, and is forwarded to your server over that established connection. Your router never has an open port.

What you eliminate: port forwarding, DDNS, certbot, nginx reverse proxy config. Cloudflare handles SSL automatically for any hostname you route through the tunnel.

Free tier: unlimited tunnels, unlimited routes, no bandwidth cap.

I’ve written a full setup guide including Docker Compose configs and the Access policy integration: Cloudflare Tunnel Changed How I Run My Homelab.

Zero Trust Access — The Authentication Layer

Tunnel exposes your services to the internet. Access decides who can reach them. Before a request hits your Gitea or Home Assistant, it hits an Access policy that requires the user to authenticate.

Free tier: up to 50 users, which covers any homelab or small team setup.

Authentication options configured entirely from the dashboard:

  • Email OTP — simplest; sends a one-time code to a verified address. No app required.
  • GitHub OAuth — good if you want to restrict to specific GitHub accounts.
  • Google OAuth — works with any Google account or a specific Google Workspace domain.

The decision of what to protect and how aggressively depends on the service:

| Service | Expose Publicly? | Use Access? | Auth Method |
|---|---|---|---|
| Blog / Portfolio | Yes | No | — |
| Uptime Kuma status page | Yes | No | — (read-only is fine) |
| Jellyfin | No | Yes | Email OTP |
| Gitea | No | Yes | GitHub OAuth |
| Home Assistant | No | Yes | Email OTP |
| Vaultwarden | No | Yes | Email OTP + app 2FA |
| Paperless-ngx | No | Yes | Email OTP |
| Proxmox UI | Never publicly | Never | LAN or WARP only |

The Proxmox case is deliberate. The hypervisor management UI is the most sensitive service in a homelab. I don’t route it through Tunnel at all — I access it on the local network or through WARP.

WARP / Cloudflare One — The VPN Layer

WARP is Cloudflare’s client-based VPN alternative. Install it on your phone or laptop, and your device’s DNS goes through Cloudflare’s network. The Zero Trust flavor integrates with your Access policies to give you private network access without public hostnames.

Practical homelab use: access Proxmox’s management interface from your phone when you’re away from home. WARP connects you into the private network, so Proxmox never needs a public URL.

When Tailscale is better: if you want a purely private mesh network between devices with no public-facing components at all, Tailscale’s WireGuard-based approach is simpler and purpose-built for that case. Cloudflare WARP + Tunnel is better when you want some services to be publicly accessible (through Tunnel + Access) and some to be private (through WARP).

Gateway — DNS-Level Filtering

Gateway is Cloudflare’s managed DNS resolver that can block malware domains, ad networks, and specific content categories before a request ever leaves your network.

Configuration: set your router’s upstream DNS to Cloudflare’s Gateway IP, and every device on the network benefits. On the free plan you get 3 locations and basic malware blocking.

Practical value for a homelab: blocks known-bad domains before they reach your exposed services. A DNS-level block doesn’t require the WAF to fire — the request never gets to your origin. It’s a lightweight first line of defense.


Use Case 2: Full-Stack Web App Development

This is where Cloudflare’s ecosystem is strongest. You can build a complete, production-grade web application — hosting, serverless compute, relational database, key-value cache, object storage, message queue, analytics — without leaving the ecosystem or opening a separate cloud account.

The Core Stack: Pages + Workers + D1 + KV + R2

I’ve covered this stack in detail in The Complete Cloudflare Stack for Developer Portfolios. The short version:

  • Pages hosts your static site, deploys on every git push, and gives you unlimited bandwidth at no cost.
  • Workers / Pages Functions run server-side logic at the edge — form handlers, API routes, middleware.
  • D1 is a SQLite database that lives alongside your Workers. 5M rows read per day, 100K writes, 1GB storage on the free tier.
  • KV is a distributed key-value store for session data, feature flags, or anything you need to read fast from anywhere in the world. 100K reads per day free.
  • R2 is S3-compatible object storage with zero egress fees. Store images, attachments, and generated files. 10GB free.
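
As a taste of the binding pattern, here’s a minimal KV-backed session counter. The binding name (SESSIONS) and key scheme are my own inventions, and the KV namespace is typed structurally so the logic reads as plain TypeScript — a sketch, not the full API surface:

```typescript
// Minimal sketch of KV as a session store. The binding name (SESSIONS in
// wrangler config) and the key scheme are assumptions for illustration.
type KVLike = {
  get(key: string): Promise<string | null>;
  put(key: string, value: string, opts?: { expirationTtl?: number }): Promise<void>;
};

export async function touchSession(
  kv: KVLike,
  sessionId: string
): Promise<{ id: string; visits: number }> {
  const raw = await kv.get(`session:${sessionId}`);
  const session: { id: string; visits: number } = raw
    ? JSON.parse(raw)
    : { id: sessionId, visits: 0 };
  session.visits += 1;
  // Expire idle sessions after an hour (TTL is a KV-native feature)
  await kv.put(`session:${sessionId}`, JSON.stringify(session), { expirationTtl: 3600 });
  return session;
}
```

Inside a Worker you’d pass `env.SESSIONS` as the first argument; the structural type matches the real KV namespace methods used here.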

For the full setup guide with code examples, TypeScript types, and wrangler configuration: The Complete Cloudflare Stack for Developer Portfolios.

Queues — Async Background Jobs

Workers limit CPU time per invocation: 10ms on the free tier, 30 seconds on Workers Paid (time spent waiting on I/O doesn’t count toward it). Even so, slow or failure-prone work — sending an email, processing an uploaded file, calling an external API that’s sluggish — doesn’t belong in the request path. You need to hand it off to a background process.

Cloudflare Queues is the answer. A producer Worker publishes a message to a Queue. A consumer Worker picks it up, processes it, and acknowledges it. If the consumer fails, the message is retried automatically.

Free tier: 1 million queue operations per month.

// Producer: publish to queue from a request handler
export default {
  async fetch(req: Request, env: Env): Promise<Response> {
    const { email } = await req.json();
    await env.EMAIL_QUEUE.send({ type: 'welcome_email', to: email });
    return new Response('Queued', { status: 202 });
  }
};

// Consumer: wrangler.jsonc declares this Worker as a queue consumer
export default {
  async queue(batch: MessageBatch<{ type: string; to: string }>, env: Env) {
    for (const msg of batch.messages) {
      await sendEmail(msg.body.to, msg.body.type);
      msg.ack();
    }
  }
};

The wrangler.jsonc binding connects them. The producer doesn’t wait for the email to send — it returns immediately, and the consumer processes the work asynchronously.
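
One refinement worth knowing: alongside `ack()`, each message also has `retry()`, which requeues just that message instead of failing the whole batch. A sketch of that pattern, with the message shape typed structurally and the handler standing in for whatever work your consumer does:

```typescript
// Per-message ack/retry over a Queues batch. MessageLike mirrors the shape
// Cloudflare hands a queue consumer; the handler is a stand-in for real work.
type MessageLike<T> = { body: T; ack(): void; retry(): void };

export async function processBatch<T>(
  messages: MessageLike<T>[],
  handler: (body: T) => Promise<void>
): Promise<{ ok: number; retried: number }> {
  let ok = 0;
  let retried = 0;
  for (const msg of messages) {
    try {
      await handler(msg.body);
      msg.ack(); // acknowledged messages are not redelivered
      ok++;
    } catch {
      msg.retry(); // only this message goes back on the queue
      retried++;
    }
  }
  return { ok, retried };
}
```

This keeps one poison message from forcing redelivery of its healthy neighbors.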

Durable Objects — Stateful Edge Compute

Workers are stateless by design. Every invocation starts fresh. For most request/response patterns, that’s exactly right. But some problems need state that’s consistent across many concurrent clients: a live visitor counter, a collaborative cursor system, a WebSocket hub for real-time notifications.

Durable Objects solve this by giving you a single, globally addressable JavaScript object with in-memory state and durable storage. All requests to a specific Durable Object are routed to the same physical instance, so state is always consistent.

When to use Durable Objects vs. D1:

  • D1 for traditional database queries — insert/select/update patterns, user records, blog posts.
  • Durable Objects for real-time coordination — WebSocket connections that need shared state, live counters, collaborative editing.

Practical examples:

  • A live view counter for blog posts that’s accurate to the second (not a cached approximation)
  • A rate limiter that’s consistent across all edge locations
  • A WebSocket server for a real-time chat feature

Durable Objects require Workers Paid ($5/month). There’s no additional charge beyond the base plan for basic use — duration-based billing only kicks in above specific thresholds.
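
To make the model concrete, here’s a view-counter sketch. The real class receives a DurableObjectState (with storage on `state.storage`) plus an env; here the storage is typed structurally so the counting logic stands alone, and the names are mine:

```typescript
// A live view counter in the Durable Object style. Every request for a given
// object ID reaches the same instance, so `count` is globally consistent.
type StorageLike = {
  get(key: string): Promise<number | undefined>;
  put(key: string, value: number): Promise<void>;
};

export class ViewCounter {
  private count: number | undefined;

  constructor(private storage: StorageLike) {}

  async fetch(_req: Request): Promise<Response> {
    if (this.count === undefined) {
      // Lazily restore persisted state after an eviction or restart
      this.count = (await this.storage.get('count')) ?? 0;
    }
    this.count += 1;
    await this.storage.put('count', this.count);
    return new Response(String(this.count));
  }
}
```

The in-memory `count` gives fast reads between requests; the `put` makes it survive restarts. That combination is the whole pitch.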

Hyperdrive — Turbocharge Your Existing Database

If you already have a Postgres database on Neon, Supabase, or a VPS, connecting to it from Workers at the edge creates a cold TCP connection on every request. Database connection establishment is slow — easily 100–300ms depending on geography.

Hyperdrive maintains a warm connection pool at the edge. Instead of your Worker connecting to your database server, it connects to Hyperdrive, which already has connections pooled and ready. The result is dramatically lower latency for database-heavy Workers.

When to use: you have an existing Postgres/MySQL database you don’t want to migrate to D1 — maybe because it has complex queries, existing data, or stored procedures that D1 can’t replicate.

Requires: Workers Paid ($5/month).

Analytics Engine — Custom Event Tracking

KV is too slow for high-write event streams. D1 is a transactional database — not optimized for “write one event per page view” patterns. Analytics Engine is Cloudflare’s purpose-built time-series event store, designed to receive writes from Workers without bottlenecking your request handlers.

// Track an event from any Worker
env.ANALYTICS.writeDataPoint({
  blobs: ['blog_post_read', postSlug],
  doubles: [1],
  indexes: ['blog']
});

Practical uses:

  • Custom page view tracking without JavaScript on the client side (the Worker logs the event on every request)
  • API endpoint usage counts
  • Error rate monitoring without a third-party APM

The data is queryable via the Cloudflare API using a SQL-like syntax. It’s not a replacement for a full analytics platform, but for custom metrics it’s simpler and cheaper than anything else in the ecosystem.

Email Routing — Free Email Forwarding

If your domain is registered through or managed by Cloudflare, Email Routing gives you you@yourdomain.com for free. Messages sent to your domain address are forwarded to any personal inbox — Gmail, Fastmail, iCloud, wherever.

Setup takes under five minutes in the dashboard. Cloudflare adds the required MX records automatically.

Bonus: Email Routing lets you write a Worker that handles incoming messages programmatically. Parse the email, extract data, post it to a webhook, auto-reply, or forward selectively based on content. It’s essentially a serverless email processing pipeline.
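
A sketch of the selective-forwarding idea: keep the routing decision in a pure function, and let the Worker’s email handler apply it via message.forward(). The addresses and rules below are placeholders, not a real configuration:

```typescript
// Decide where an incoming message should go based on sender and subject.
// Addresses and rules here are placeholders for illustration only.
type Route = { action: 'forward'; to: string } | { action: 'drop' };

export function routeEmail(from: string, subject: string): Route {
  if (from.endsWith('@newsletter.example')) return { action: 'drop' };
  if (/invoice|receipt/i.test(subject)) {
    return { action: 'forward', to: 'bookkeeping@example.com' };
  }
  return { action: 'forward', to: 'me@example.com' };
}

// In the Worker's email handler, this drives the actual forwarding:
//   async email(message, env, ctx) {
//     const route = routeEmail(message.from, message.headers.get('subject') ?? '');
//     if (route.action === 'forward') await message.forward(route.to);
//   }
```

Keeping the rules in a pure function also means you can unit-test your routing without sending a single email.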

For the DNS and domain setup prerequisites, see Why I Registered My Domain Through Cloudflare.

Stream and Images — When Scale Demands It

Stream handles video hosting and adaptive streaming. Useful if you’re building a video platform and don’t want to pay Vimeo’s rates or host the bandwidth yourself. $5/month base plus $1 per 1,000 minutes stored and delivered.

Cloudflare Images handles on-the-fly image resizing via URL parameters. ?width=400&format=webp turns any stored image into a WebP-optimized thumbnail. $5/month for the first 5,000 images stored.

Honest assessment for personal projects: if you’re running a blog or small app, R2 for storage and a plain <img> tag with browser-native lazy loading is sufficient. Stream and Images shine at scale — multiple formats, multiple resolutions, high request volume — where the CDN integration pays for itself in bandwidth savings.


Use Case 3: AI Research and Inference

I started using Cloudflare for AI workloads almost by accident. Workers AI appeared in the dashboard with a selection of models and a one-API-call interface. I was curious. Within a week it had replaced my local Ollama setup for quick inference experiments because it required zero infrastructure.

Workers AI — Edge Inference Without a GPU

Workers AI gives you access to 50+ models — including Llama 3.1 8B, Mistral 7B, Whisper for transcription, SDXL for image generation, and several embedding models — via the same env.AI binding pattern as D1 and KV.

Free tier: 10,000 neurons per day. Neurons are Cloudflare’s compute unit — roughly correlated to inference time, not token count. For casual experimentation, 10K neurons per day is generous.

Above free: $0.011 per 1,000 neurons.

The API is OpenAI-compatible. If you’ve written code targeting OpenAI’s /v1/chat/completions endpoint, it works with Workers AI by changing the base URL:

// Using Cloudflare's built-in AI binding (simplest)
const result = await env.AI.run('@cf/meta/llama-3.1-8b-instruct', {
  messages: [
    { role: 'system', content: 'You are a helpful assistant.' },
    { role: 'user', content: userPrompt }
  ]
});

// Or using the OpenAI SDK pointed at the Workers AI base URL
// (requires: npm install openai)
import OpenAI from 'openai';

const client = new OpenAI({
  apiKey: env.CLOUDFLARE_API_TOKEN,
  baseURL: `https://api.cloudflare.com/client/v4/accounts/${env.CF_ACCOUNT_ID}/ai/v1`
});

Good for: classification, summarization, content moderation, embedding generation, transcription, quick prototyping.

Not great for: tasks requiring frontier-model reasoning. Llama 3.1 8B is capable but not GPT-4 or Claude Sonnet. For complex reasoning or tool use, routing through AI Gateway to OpenAI/Anthropic (see next section) is the right move.

AI Gateway — Unified Proxy for All Your AI Calls

The problem AI Gateway solves: you’re calling OpenAI from one part of your app, Anthropic from another, Workers AI for embeddings, and Hugging Face for a fine-tuned model. There’s no unified place to see all the requests, no retry logic, no cost visibility, and no way to cache identical prompts.

AI Gateway sits between your code and any AI provider. You point your SDK at the AI Gateway URL instead of the provider URL. Everything flows through it.

Currently free across all plans (pricing may change as the product matures).

Supported providers:

| Provider | Supported |
|---|---|
| OpenAI | Yes |
| Anthropic (Claude) | Yes |
| Cloudflare Workers AI | Yes |
| Google Gemini | Yes |
| Hugging Face | Yes |
| Cohere | Yes |
| Groq | Yes |
| Azure OpenAI | Yes |

What you get:

  • Request logging — every prompt and response is logged. You can replay, search, and inspect them in the dashboard.
  • Semantic caching — identical or near-identical prompts return cached responses immediately. No inference cost, no latency.
  • Rate limiting — protect against runaway AI costs from a bug in your app.
  • Fallback providers — if OpenAI rate-limits you, AI Gateway can automatically retry the same request against Workers AI or another provider.

For AI researchers running multiple experiments, the logging alone is worth it. You get a complete audit trail of every prompt you ran without instrumenting your own code.

// Route through AI Gateway by changing the base URL
import OpenAI from 'openai';

const client = new OpenAI({
  apiKey: env.OPENAI_API_KEY,
  baseURL: `https://gateway.ai.cloudflare.com/v1/${env.CF_ACCOUNT_ID}/${env.GATEWAY_ID}/openai`
});
// Everything else stays the same
// Everything else stays the same

Vectorize — Vector Database for Semantic Search and RAG

Vectorize is Cloudflare’s managed vector database. Unlike Pinecone, Weaviate, or Chroma, there’s nothing to self-host, no separate API key, and no separate service to integrate. It uses the same env.MY_INDEX binding pattern as everything else.

Free tier: 30 million vector dimensions. At 768 dimensions (a common embedding size), that’s roughly 39,000 embeddings. Enough for a meaningful RAG knowledge base.

The primary use case is RAG — retrieval-augmented generation. The full pipeline stays inside Cloudflare:

  1. Ingestion Worker: receive a document, chunk it, generate embeddings with Workers AI (@cf/baai/bge-small-en-v1.5), store in Vectorize.
  2. Query Worker: receive a question, embed it with the same model, find the closest vectors in Vectorize, use the retrieved documents as context, call Workers AI for a final answer.
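
The chunking in step 1 can be as simple as a sliding window over the text. A minimal character-window chunker — the window and overlap sizes are arbitrary starting points, not recommendations:

```typescript
// Split a document into overlapping character windows for embedding.
// 1,000 chars with 200 overlap is an arbitrary starting point, not a rule.
export function chunkText(text: string, size = 1000, overlap = 200): string[] {
  if (size <= overlap) throw new Error('size must exceed overlap');
  const chunks: string[] = [];
  for (let start = 0; start < text.length; start += size - overlap) {
    chunks.push(text.slice(start, start + size));
    if (start + size >= text.length) break; // last window reached the end
  }
  return chunks;
}
```

The overlap ensures a sentence falling on a boundary still appears whole in at least one chunk.
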
// Store an embedding
const embedding = await env.AI.run('@cf/baai/bge-small-en-v1.5', {
  text: documentChunk
});
await env.VECTOR_INDEX.upsert([{
  id: documentId,
  values: embedding.data[0],
  metadata: { source: url, chunk: chunkIndex }
}]);

// Query for similar content
const queryEmbedding = await env.AI.run('@cf/baai/bge-small-en-v1.5', {
  text: userQuestion
});
const results = await env.VECTOR_INDEX.query(queryEmbedding.data[0], { topK: 5 });
// Pass results.matches as context to Workers AI

The entire RAG pipeline — embedding, storage, retrieval, generation — without leaving Cloudflare’s network or managing any infrastructure.
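
Between retrieval and generation there’s one step the snippets gloss over: assembling the retrieved chunks into a prompt. Assuming you stored each chunk’s text and source in the vector metadata at upsert time (the upsert above stores source; storing text alongside it is my addition), it might look like:

```typescript
// Build a grounded prompt from Vectorize matches. Assumes chunk text was
// stored in metadata at upsert time; the score threshold is an assumption.
type MatchLike = {
  id: string;
  score: number;
  metadata: { text: string; source: string };
};

export function buildRagPrompt(
  question: string,
  matches: MatchLike[],
  minScore = 0.7
): string {
  const context = matches
    .filter((m) => m.score >= minScore)
    .map((m, i) => `[${i + 1}] ${m.metadata.text} (source: ${m.metadata.source})`)
    .join('\n');
  return `Answer using only the context below. Cite sources by number.\n\nContext:\n${context}\n\nQuestion: ${question}`;
}
```

The score threshold matters: passing weak matches as context is how RAG systems confidently answer from irrelevant documents.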

For a deeper look at RAG architecture, see RAG in Production.

Browser Rendering — Headless Chrome at the Edge

Browser Rendering gives you a Puppeteer-compatible API that runs a headless Chrome browser inside a Worker. No Selenium server, no managed browser farm, no infrastructure to maintain.

Requires: Workers Paid ($5/month), which includes 2 concurrent browser sessions at no extra charge.

For AI researchers, the key use case is web scraping for RAG ingestion. Dynamic pages that require JavaScript execution can’t be fetched with a simple fetch() call — you need a real browser to render them first.

import puppeteer from '@cloudflare/puppeteer';

export default {
  async fetch(req: Request, env: Env): Promise<Response> {
    const { url } = await req.json<{ url: string }>();
    const browser = await puppeteer.launch(env.BROWSER);
    const page = await browser.newPage();
    await page.goto(url, { waitUntil: 'networkidle0' });
    const content = await page.evaluate(() => document.body.innerText);
    await browser.close();
    // Feed `content` into your Vectorize ingestion pipeline
    return Response.json({ content });
  }
};

Other uses: generate PDF reports from rendered HTML, take screenshots for visual regression testing, test pages that require authentication.

AutoRAG — Managed RAG Pipeline (Preview)

AutoRAG is Cloudflare’s attempt to package the Workers AI + Vectorize RAG pipeline into a turnkey product. Upload documents, configure a source (R2 bucket, website URL), and get a RAG API endpoint back. Chunking, embedding, storage, and retrieval are all managed.

Status: limited preview as of February 2026. Not generally available.

Why it matters when it ships: the manual Workers AI + Vectorize pipeline above requires wiring up embeddings, chunking logic, metadata storage, and query handling. AutoRAG handles all of that. For researchers who want to experiment with RAG without building the infrastructure, it removes the setup cost entirely.

Watch this space. Cloudflare ships fast.


Use Case → Service Mapping

If you have a specific problem and aren’t sure which Cloudflare service to reach for:

| I want to… | Use this |
|---|---|
| Host a static site with CI/CD | Pages |
| Build a serverless API or webhook handler | Workers |
| Store structured relational data | D1 |
| Cache session data or config values | KV |
| Store user-uploaded files without egress fees | R2 |
| Send emails from a Worker | Email Routing + Worker handler |
| Receive email at my custom domain | Email Routing |
| Process background jobs asynchronously | Queues |
| Handle real-time WebSocket connections with state | Durable Objects |
| Connect my existing Postgres or MySQL database | Hyperdrive |
| Track custom analytics events from Workers | Analytics Engine |
| Host video content | Stream |
| Resize and optimize images on the fly | Images |
| Protect forms from bots without annoying CAPTCHAs | Turnstile |
| Expose a homelab service without opening ports | Tunnel |
| Add authentication in front of any URL | Zero Trust Access |
| Give my team private network access remotely | WARP / Cloudflare One |
| Block malware and ads at the DNS level | Gateway |
| Run AI inference without a GPU or API key | Workers AI |
| Proxy and log all my AI API calls | AI Gateway |
| Build a vector search or RAG system | Vectorize |
| Scrape dynamic web pages from a Worker | Browser Rendering |
| Get a fully managed RAG pipeline | AutoRAG (preview) |

Pricing: Free vs. Paid

Two paid tiers cover most upgrade decisions:

Workers Paid — $5/month unlocks the developer platform:

| Metric | Free | Workers Paid |
|---|---|---|
| Workers requests | 100K/day | 10M/month + $0.30/million |
| Workers CPU time | 10ms/invocation | 30s/invocation |
| KV reads | 100K/day | 10M/month |
| KV writes | 1K/day | 1M/month |
| D1 rows read | 5M/day | 25M/day |
| D1 rows written | 100K/day | 50M/month |
| D1 storage | 1GB | 5GB |
| R2 storage | 10GB | 10GB + $0.015/GB |
| R2 Class A operations | 1M/month | 1M/month + $4.50/million |
| Queues operations | 1M/month | 1M/month + $0.40/million |
| Workers AI neurons | 10K/day | 10K/day + $0.011/1K |
| Vectorize dimensions | 30M free | 30M + usage-based above |
| Durable Objects | Not included | Included |
| Hyperdrive | Not included | Included |
| Browser Rendering | Not included | Included (2 sessions) |

Zero Trust Teams — $3/user/month removes the 50-user cap on Tunnel, Access, and Gateway, and adds more granular policy options. For a personal homelab with fewer than 50 people who need access, the free tier covers everything indefinitely.

The honest math: personal projects almost never hit free tier limits. Even if a blog with 10,000 monthly visitors somehow generated 10,000 Workers requests per day, that’s still only a tenth of the 100K/day free limit. The $5/month Workers Paid upgrade is worth considering when you’re building an actual application with multiple users and background job requirements.
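
A back-of-envelope helper for the upgrade decision, using only the request pricing from the table above (base $5/month, 10M requests included, $0.30 per extra million — other metered products excluded):

```typescript
// Estimate a monthly Workers Paid bill from request volume alone.
// Figures match the pricing table above; other metered usage is excluded.
export function workersMonthlyCost(requestsPerMonth: number): number {
  const base = 5; // Workers Paid base plan, USD
  const included = 10_000_000;
  const extraMillions = Math.max(0, requestsPerMonth - included) / 1_000_000;
  return base + extraMillions * 0.3;
}
```

At 12M requests/month that works out to about $5.60 — the overage pricing is gentle enough that the base plan dominates the bill for a long time.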


My Personal Stack — What I Actually Use

Self-hosting: Tunnel + Zero Trust Access, both on the free tier. Gitea, Jellyfin, and my monitoring dashboard sit behind Access with email OTP. Proxmox never touches Tunnel — I access it via WARP when I’m away from home or on LAN when I’m not.

Portfolio: Pages + Pages Functions + KV (contact form submissions) + Turnstile (bot protection). Zero additional cost beyond the domain.

AI experimentation: Workers AI for quick inference tests (I keep a Worker deployed that accepts a prompt and returns a response — faster than opening a chat interface for simple tests), AI Gateway to log what I’m actually sending to OpenAI, and a small Vectorize index for a private notes RAG experiment.

What I’m not using yet: Queues, Durable Objects, and AutoRAG. My personal projects don’t need async job queues — the contact form is the most complex “background” operation I have, and it’s fast enough to handle synchronously. AutoRAG is still in preview. I’m watching it.

My monthly Cloudflare bill:

| Service | Cost/month |
|---|---|
| Domain registration | $0.84 ($10.11/year) |
| Pages hosting | $0 |
| Tunnel + Access (free tier) | $0 |
| Workers (free tier) | $0 |
| KV, D1, R2 (free tier) | $0 |
| Workers AI (free tier) | $0 |
| AI Gateway | $0 |
| Vectorize (free tier) | $0 |
| Total | ~$0.84/month |

What Cloudflare Is Not Great For

Long-running processes. Workers have a 10ms CPU cap on the free tier and a 30-second cap on the paid tier. A data processing job that takes five minutes can’t run in a Worker. You can work around this with Queues — break the work into chunks, each handled by a separate message — but it’s more complex than a traditional server. For jobs that genuinely need to run for minutes, a VPS or homelab machine is the right tool.

GPU-intensive ML workloads. Workers AI runs on Cloudflare’s managed GPU infrastructure. You don’t choose the hardware, you can’t fine-tune models, and you can’t load custom weights. For actual ML research — training, fine-tuning, running inference on custom models — you want Runpod, Lambda Labs, or Vast.ai. Cloudflare Workers AI is great for inference on pre-existing models; it’s not a research compute platform.

Persistent server processes. If you need a long-lived process — a Discord bot, a WebSocket server that maintains state for thousands of concurrent connections at scale, a background scheduler — Cloudflare’s architecture isn’t the right fit. Durable Objects help with stateful coordination, but they’re designed for edge use cases, not general-purpose server processes. A VPS is simpler for this.

Strict data residency requirements. Cloudflare’s network is global by design. If your use case requires that data stays within a specific geographic region (EU data residency for GDPR, or US-only storage for government compliance), verify Cloudflare’s compliance documentation before committing. They do offer data locality controls in some products, but it requires research.


Where to Go Next

Cloudflare has quietly become a full-platform company. The free tier alone covers a professional portfolio, a safely-exposed homelab, and serious AI inference experimentation — for less than a dollar a month (the domain).

If you’re new to Cloudflare, start with the two highest-value entry points: Tunnel + Access for safely exposing your homelab (the setup guide is linked above), and Pages + Workers for your first full-stack app (covered in the portfolio stack guide).

Once those are running, Workers AI is the natural next experiment if you do any AI work. Create a Worker, bind the AI model, and you have a serverless inference endpoint in ten minutes.

One last note: Cloudflare ships constantly. AutoRAG, AI Gateway, Browser Rendering, and Vectorize all appeared in the past two years. The service table above will look different in twelve months. Worth revisiting when you’re planning your next project.

