#cloud

5 posts

Apr 3, 2026 · 6 min read

The LLM Cost War: Qwen3.6-Plus, Gemini Flash-Lite, and the Dawn of Commodity AI

Alibaba just released its third proprietary model in days. Google's Gemini Flash-Lite costs $0.25 per million tokens. NVIDIA's Nemotron runs 2.2x faster than GPT-OSS-120B. The LLM cost war has arrived — here's what it means for architects choosing AI infrastructure in 2026.

Mar 29, 2026 · 5 min read

Gemini 3.1 Flash-Lite Is Free: What It Actually Means for Developer Economics

Google just made Gemini Code Assist free and priced Flash-Lite at $0.25/M tokens. After 15 years building production systems, here's what this cost collapse really changes about how we build.

ai google gemini +3

Mar 28, 2026 · 5 min read

The AI Cost Collapse: How to Architect Smart at Under $1/M Tokens

GPT-4 level AI cost $30/M tokens in 2023. Today it's under $1. Here's the technical architecture that lets you capture 90%+ of that savings without sacrificing quality.

ai architecture cost-optimization +2

Mar 28, 2026 · 12 min read

Build a Full-Stack Startup for $20/Month: The 2026 Free-Tier Stack

A comprehensive guide to the modern free-tier tech stack that lets you build, deploy, and scale a startup for roughly $20/month. No servers. No DevOps team. No funding required. Just an idea and WiFi.

startup free-tier architecture +5

Oct 20, 2024 · 4 min read

Setting Up Azure OpenAI for Enterprise: What Nobody Tells You

We integrated Azure OpenAI into a client's enterprise platform. Here's the practical stuff — networking, content filtering, cost surprises, and what the docs don't cover.

ai azure enterprise +1

← All posts