Testing AI Agents Is Nothing Like Testing Regular Code
How to test LLM outputs, validate RAG retrieval quality, and verify vector search accuracy using DeepEval, Ragas, and Testcontainers. Part 2 of the AI-powered quality engineering series.