#research

10 posts

Jun 10, 2026 · 5 min read

TurboQuant: Google's 100x KV Cache Breakthrough and What It Means for Long-Context AI

Google's TurboQuant research reduces KV cache memory overhead by ~100x using a two-step algorithm combining PolarQuant vector rotation and Johnson-Lindenstrauss compression. This could make 2M-token context models economically feasible for many more teams.

Mar 10, 2026 · 22 min read Part 8

Production Voice AI for Research at Scale: Deployment and Go-Live — From Docker Compose to 200 Concurrent Sessions

The complete deployment guide: Docker multi-stage builds, Kubernetes orchestration, CI/CD with GitHub Actions, zero-downtime deploys, go-live checklist, production monitoring with Prometheus/Grafana, and the operational runbook that keeps voice AI running at scale.

voice-ai s2s research +7

Mar 8, 2026 · 17 min read Part 7

Production Voice AI for Research at Scale: Multi-Language Voice AI — When Your Agent Needs to Think in Japanese

Multi-language voice AI for research: language detection, provider routing (Gemini Live for 30+ languages, OpenAI Realtime for English), locale-aware VAD tuning, i18n prompt packs, and cross-language analysis pipelines.

voice-ai s2s research +5

Mar 6, 2026 · 13 min read Part 6

Production Voice AI for Research at Scale: What Breaks at 200 Concurrent Sessions

Scaling from 10 sessions/week to 200 concurrent. The enrichment bottleneck (30,000 API calls), session recovery for dropped WebRTC connections, provider failover, and the operational metrics that keep it all visible.

voice-ai s2s research +5

Mar 4, 2026 · 12 min read Part 5

Production Voice AI for Research at Scale: The Real Cost

Real-time per-minute cost tracking, provider comparison (OpenAI Realtime ~$0.053/min vs Gemini Live ~$0.029/min), budget enforcement with soft/hard limits, and the self-hosting math that saves 90% on transport.

voice-ai s2s research +4

Mar 2, 2026 · 13 min read Part 4

Production Voice AI for Research at Scale: From Recording to Insight

The 3-stage automatic pipeline that turns raw interview recordings into enriched, queryable research data in 3-7 minutes. Transcription, enrichment, analysis — with the transcript batching trick that cut DB load by 80%.

voice-ai s2s research +5

Mar 1, 2026 · 7 min read Part 4

AI Workflow Mastery: NotebookLM — Turning Documents into Knowledge

The definitive guide to Google NotebookLM: Audio Overviews, Mind Maps, Deep Research, and 5 practical workflows for business owners, developers, writers, and content creators.

ai workflow notebooklm +3

Feb 28, 2026 · 11 min read Part 3

Production Voice AI for Research at Scale: Multi-Phase State Machines

Research interviews follow structured protocols with distinct phases. How to build an LLM-driven state machine with next_phase() function calling and dynamic instruction swapping via set_chat_ctx().

voice-ai s2s research +5

Feb 26, 2026 · 12 min read Part 2

Production Voice AI for Research at Scale: Zombie Agents, Pre-Warming, and the 5 Bugs That Cost Us Weeks

The production pain points nobody warns you about: zombie agents, metadata latency, pre-warming for 1-2s time-to-first-voice, VAD tuning for research respondents, and provider quirks.

voice-ai s2s research +4

Feb 24, 2026 · 10 min read Part 1

Production Voice AI for Research at Scale: The Architecture Nobody Warns You About

Why research interviews need server-side voice agents, the three-tier architecture, room metadata as configuration transport, and the 100-500ms propagation latency nobody tells you about.

voice-ai s2s research +4
