Your agents have a problem you might not have noticed yet. They are goldfish.
Every time the pipeline runs, the SSE agent writes code as if it has never written code before. The QC agent invents test strategies from scratch. The TA agent makes architecture decisions without remembering that it made the exact same decision last Tuesday — and that decision turned out to be wrong.
This is not a model limitation. It is architectural. We built a pipeline where state lives in TeamState, flows through the graph, and disappears when the run completes. The agents have working memory but no long-term memory. They have intelligence but no experience.
In human teams, experience separates a junior from a senior. The senior developer has seen a hundred failed deployments, a thousand code review comments. They pattern-match against history and get faster with every project.
This article gives your agents the same capability. We build three types of memory, a reflection system that learns from every completed task, a skills library that accumulates proven patterns, and an inter-agent knowledge sharing protocol. By the end, your agents will get measurably better with every run.
1. The Three Types of Agent Memory
Cognitive science distinguishes several kinds of human memory; three of them map remarkably well onto agent systems.
Working memory is what you are thinking about right now. Fast, limited, temporary. In our system: TeamState — messages, task context, iteration counts. It exists for one pipeline execution and vanishes.
Episodic memory is your record of past experiences. “Last time I used MongoDB for relational queries, it was a disaster.” In our system: a ChromaDB vector store where agents write reflections after tasks. When a similar task arrives later, the agent retrieves relevant past experiences.
Semantic memory is general knowledge abstracted from specific experiences. “Relational data belongs in PostgreSQL.” In our system: a skills library with confidence scores that increase as evidence accumulates.
The flow is directional. Working memory produces raw material during a run. Reflection distills it into episodic memories. Recurring patterns get promoted to semantic memory. Each layer is slower to write but longer-lasting and more valuable.
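To make the tiering concrete, here is a toy, stdlib-only sketch of that promotion flow. The strings and the three-occurrence threshold are invented for illustration; the real pipeline matches by embedding similarity, not exact strings.

```python
from collections import Counter

# Working memory: raw per-run material (vanishes after the run).
run_outputs = [
    "forgot to mock the database connection in integration tests",
    "forgot to mock the database connection in integration tests",
    "chose PostgreSQL for relational data; worked well",
    "forgot to mock the database connection in integration tests",
]

# Reflection distills each run's output into an episodic entry.
episodic = [f"lesson: {o}" for o in run_outputs]

# Recurring episodic patterns get promoted to semantic memory (skills).
counts = Counter(episodic)
semantic = [lesson for lesson, n in counts.items() if n >= 3]

print(semantic)  # only the thrice-seen mocking lesson graduates
```

A pattern seen three times graduates to the semantic tier; one-off observations stay episodic, which is exactly the shape of the skills-promotion logic built in section 4.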
2. The Memory Store: ChromaDB for Episodic Memory
ChromaDB is an open-source vector database that stores documents alongside embeddings. You give it text, it vectorizes it, and later you query by similarity. Exactly what episodic memory needs.
The data model:
# memory/models.py
from pydantic import BaseModel, Field
from datetime import datetime
from enum import Enum
from uuid import uuid4
class MemoryType(str, Enum):
REFLECTION = "reflection"
ERROR_PATTERN = "error_pattern"
DECISION_OUTCOME = "decision_outcome"
REVIEW_FEEDBACK = "review_feedback"
class AgentMemory(BaseModel):
"""A single episodic memory entry."""
memory_id: str = Field(default_factory=lambda: str(uuid4()))
agent_role: str # "sse", "qc", "ta", etc.
memory_type: MemoryType
task_id: str # which pipeline run produced this
content: str # the actual memory text
context: dict = Field(default_factory=dict) # structured metadata
created_at: datetime = Field(default_factory=datetime.utcnow)
relevance_score: float = 0.0 # updated on retrieval
@property
def search_text(self) -> str:
"""Combined text used for embedding and similarity search."""
return f"[{self.agent_role}] [{self.memory_type.value}] {self.content}"
Now the store itself:
# memory/store.py
from datetime import datetime
import chromadb
from chromadb.config import Settings
from memory.models import AgentMemory, MemoryType
class AgentMemoryStore:
"""Persistent episodic memory backed by ChromaDB."""
def __init__(self, persist_dir: str = "./data/memory"):
self.client = chromadb.PersistentClient(
path=persist_dir,
settings=Settings(anonymized_telemetry=False),
)
self.collection = self.client.get_or_create_collection(
name="agent_memories",
metadata={"hnsw:space": "cosine"},
)
    def store(self, memory: AgentMemory) -> str:
        """Store a memory. Returns the memory ID."""
        self.collection.add(
            ids=[memory.memory_id],
            documents=[memory.search_text],
            metadatas=[{
                "agent_role": memory.agent_role,
                "memory_type": memory.memory_type.value,
                "task_id": memory.task_id,
                "created_at": memory.created_at.isoformat(),
                # Numeric copy for range filters: Chroma's $lt/$gt
                # compare numbers, not ISO strings.
                "created_at_ts": memory.created_at.timestamp(),
            }],
        )
        return memory.memory_id
    def recall(
        self,
        query: str,
        agent_role: str | None = None,
        memory_type: MemoryType | None = None,
        n_results: int = 5,
    ) -> list[dict]:
        """Retrieve memories similar to the query."""
        # Chroma requires an explicit $and when combining conditions.
        conditions = []
        if agent_role:
            conditions.append({"agent_role": agent_role})
        if memory_type:
            conditions.append({"memory_type": memory_type.value})
        where_filter = None
        if len(conditions) == 1:
            where_filter = conditions[0]
        elif conditions:
            where_filter = {"$and": conditions}
        results = self.collection.query(
            query_texts=[query],
            n_results=n_results,
            where=where_filter,
        )
        return [
            {
                "content": results["documents"][0][i],
                "distance": results["distances"][0][i],
                "metadata": results["metadatas"][0][i],
            }
            for i in range(len(results["documents"][0]))
        ]
    def forget_before(self, cutoff: datetime) -> int:
        """Remove memories older than cutoff. Returns count removed."""
        # Filter on the numeric timestamp; $lt does not work on ISO strings.
        old = self.collection.get(
            where={"created_at_ts": {"$lt": cutoff.timestamp()}},
        )
        if old["ids"]:
            self.collection.delete(ids=old["ids"])
        return len(old["ids"])
The recall method takes a natural-language query and returns the most similar memories. When the SSE agent is about to implement a FastAPI endpoint, it queries “implementing REST API endpoint with authentication” and gets back memories from previous runs where it did something similar — including what went wrong.
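One detail worth internalizing before acting on recall results: with `hnsw:space` set to cosine, Chroma returns cosine *distance* (1 − similarity), so smaller numbers mean closer matches. A small standalone helper makes the thresholding explicit — the 0.3 cutoff is a tunable assumption, and the hit dicts below are fabricated examples:

```python
def cosine_similarity_from_distance(distance: float) -> float:
    # Chroma's cosine space reports distance = 1 - cosine_similarity.
    return 1.0 - distance

def is_relevant(distance: float, threshold: float = 0.3) -> bool:
    # Distance below 0.3 ~= similarity above 0.7 -- a tunable cutoff.
    return distance < threshold

# Fabricated recall results for illustration.
hits = [
    {"content": "REST endpoint auth bug", "distance": 0.12},
    {"content": "CSS layout note", "distance": 0.85},
]
relevant = [h for h in hits if is_relevant(h["distance"])]
print([h["content"] for h in relevant])  # only the close match survives
```

The same cutoff shows up again in section 4, where pattern promotion only counts memories within the relevance threshold.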
3. Task Reflection: Learning After Every Run
Memory without reflection is just logging. A log says “test failed with exit code 1.” A memory says “the test failed because I forgot to mock the database connection, and this is the third time I have made that mistake with integration tests.”
Reflection happens after a task completes. Each agent examines what it produced, what feedback it received, and what the outcome was. Then it writes a structured reflection into episodic memory.
# memory/reflection.py
from memory.models import AgentMemory, MemoryType
from memory.store import AgentMemoryStore
from langchain_anthropic import ChatAnthropic
from pydantic import BaseModel
class TaskReflection(BaseModel):
"""Structured reflection output from an agent."""
what_went_well: str
what_went_wrong: str
key_decision: str
decision_outcome: str # "positive", "negative", "neutral"
pattern_noticed: str | None = None
improvement_for_next_time: str
REFLECTION_PROMPT = """You are {agent_role}, reflecting on a completed task.
## Task Summary
{task_summary}
## Your Output
{agent_output}
## Feedback Received
{feedback}
## Final Outcome
{outcome}
Reflect on this task honestly. What went well? What went wrong?
What was the key decision you made, and how did it turn out?
Do you notice any recurring pattern? What would you do differently next time?
Respond as JSON with exactly these keys: what_went_well, what_went_wrong,
key_decision, decision_outcome ("positive" | "negative" | "neutral"),
pattern_noticed (a string, or null), improvement_for_next_time."""
class ReflectionEngine:
"""Generates and stores post-task reflections."""
def __init__(self, memory_store: AgentMemoryStore):
self.memory = memory_store
self.llm = ChatAnthropic(
model="claude-sonnet-4-20250514",
temperature=0.3,
)
async def reflect(
self,
agent_role: str,
task_id: str,
task_summary: str,
agent_output: str,
feedback: str,
outcome: str,
) -> TaskReflection:
"""Generate a reflection and store it as episodic memory."""
prompt = REFLECTION_PROMPT.format(
agent_role=agent_role,
task_summary=task_summary,
agent_output=agent_output[:2000], # truncate to fit context
feedback=feedback,
outcome=outcome,
)
response = await self.llm.ainvoke([{"role": "user", "content": prompt}])
reflection = TaskReflection.model_validate_json(response.content)
# Store the main reflection
self.memory.store(AgentMemory(
agent_role=agent_role,
memory_type=MemoryType.REFLECTION,
task_id=task_id,
content=(
f"Task: {task_summary}\n"
f"Well: {reflection.what_went_well}\n"
f"Wrong: {reflection.what_went_wrong}\n"
f"Improve: {reflection.improvement_for_next_time}"
),
context={
"decision": reflection.key_decision,
"decision_outcome": reflection.decision_outcome,
},
))
# If a pattern was noticed, store it separately
if reflection.pattern_noticed:
self.memory.store(AgentMemory(
agent_role=agent_role,
memory_type=MemoryType.ERROR_PATTERN,
task_id=task_id,
content=reflection.pattern_noticed,
))
return reflection
The reflection step runs after the PM marks the task complete. Every participating agent reflects:
# graph/reflection_node.py
async def reflection_node(state: TeamState) -> dict:
engine = ReflectionEngine(memory_store=get_memory_store())
for role in ["po", "ba", "ta", "qc", "sse", "tl"]:
output_key = f"{role}_output"
if not state.get(output_key):
continue
await engine.reflect(
agent_role=role, task_id=state["task_id"],
task_summary=state.get("clarified_requirement", ""),
agent_output=str(state[output_key])[:2000],
feedback=state.get("qc_feedback", "No feedback"),
outcome=state.get("phase", "unknown"),
)
return {"reflection_complete": True}
4. The Skills Library: Semantic Memory
Episodic memories are raw experiences — useful but noisy. The skills library is the curated layer above them: proven patterns validated across multiple experiences.
# memory/skills.py
from pydantic import BaseModel, Field
from datetime import datetime
from uuid import uuid4
from memory.models import MemoryType
from memory.store import AgentMemoryStore
class Skill(BaseModel):
"""A learned skill with confidence tracking."""
skill_id: str = Field(default_factory=lambda: str(uuid4()))
name: str # "fastapi_auth_endpoint"
description: str # what this skill does
agent_role: str # which agent owns this
category: str # "api_design", "testing", etc.
template: str # the actual pattern/template
confidence: float = 0.5 # 0.0 to 1.0
times_used: int = 0
times_succeeded: int = 0
created_at: datetime = Field(default_factory=datetime.utcnow)
updated_at: datetime = Field(default_factory=datetime.utcnow)
source_task_ids: list[str] = Field(default_factory=list)
approved: bool = False # requires human approval
@property
def success_rate(self) -> float:
if self.times_used == 0:
return 0.0
return self.times_succeeded / self.times_used
    def record_usage(self, succeeded: bool, task_id: str):
        """Update stats after the skill is used."""
        self.times_used += 1
        if succeeded:
            self.times_succeeded += 1
        self.source_task_ids.append(task_id)
        # Smoothed success rate with a Beta(1,1) pseudo-count prior: an
        # unused skill stays at 0.5, and a single outcome nudges the
        # estimate instead of snapping it to 0.0 or 1.0.
        self.confidence = (self.times_succeeded + 1) / (self.times_used + 2)
        self.updated_at = datetime.utcnow()
class SkillsLibrary:
"""Persistent store for learned skills."""
def __init__(self, db_path: str = "./data/skills.json"):
self.db_path = db_path
self.skills: dict[str, Skill] = {}
self._load()
    def _load(self):
        import json
        from pathlib import Path
        path = Path(self.db_path)
        if path.exists():
            data = json.loads(path.read_text())
            self.skills = {k: Skill.model_validate(v) for k, v in data.items()}
    def _save(self):
        import json
        from pathlib import Path
        data = {k: v.model_dump(mode="json") for k, v in self.skills.items()}
        Path(self.db_path).parent.mkdir(parents=True, exist_ok=True)
        Path(self.db_path).write_text(json.dumps(data, indent=2))
def add_skill(self, skill: Skill) -> str:
"""Add a new skill. Returns skill ID."""
self.skills[skill.skill_id] = skill
self._save()
return skill.skill_id
def find_skills(self, agent_role: str | None = None,
category: str | None = None,
min_confidence: float = 0.0,
approved_only: bool = True) -> list[Skill]:
"""Find skills matching criteria, sorted by confidence."""
results = [
s for s in self.skills.values()
if (not agent_role or s.agent_role == agent_role)
and (not category or s.category == category)
and s.confidence >= min_confidence
and (not approved_only or s.approved)
]
return sorted(results, key=lambda s: s.confidence, reverse=True)
def record_usage(self, skill_id: str, succeeded: bool, task_id: str):
if skill_id in self.skills:
self.skills[skill_id].record_usage(succeeded, task_id)
self._save()
def promote_from_reflections(
self,
memory_store: AgentMemoryStore,
agent_role: str,
pattern_query: str,
min_occurrences: int = 3,
) -> Skill | None:
"""
Check if a pattern appears frequently enough in episodic memory
to be promoted to a skill.
"""
memories = memory_store.recall(
query=pattern_query,
agent_role=agent_role,
memory_type=MemoryType.ERROR_PATTERN,
n_results=10,
)
# Filter by similarity threshold
relevant = [m for m in memories if m["distance"] < 0.3]
if len(relevant) >= min_occurrences:
skill = Skill(
name=f"auto_{agent_role}_{len(self.skills)}",
description=f"Pattern detected from {len(relevant)} similar experiences",
agent_role=agent_role,
category="auto_detected",
template=pattern_query,
confidence=0.5,
approved=False, # needs human review
source_task_ids=[
m["metadata"]["task_id"] for m in relevant
],
)
self.add_skill(skill)
return skill
return None
Skills start at 0.5 confidence. Every successful use pushes confidence up; every failure pushes it down. Over enough runs, reliable skills float to the top and unreliable ones sink. An agent looking for guidance gets the highest-confidence relevant skill first.
The approved flag prevents the system from learning bad habits. Auto-detected skills start unapproved. A human reviews them before they enter active rotation. A pattern that appeared three times might exist because the agent made the same mistake three times, not because it found a good approach.
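The confidence arithmetic is worth sanity-checking in isolation. One robust way to track it is a pseudo-count (Beta-prior) update, sketched here independently of the Skill class — the function below is my illustration, not part of the pipeline's modules:

```python
def smoothed_confidence(successes: int, uses: int) -> float:
    # Beta(1,1) prior: one phantom success and one phantom failure,
    # so an unused skill sits at 0.5 and single outcomes move it gently.
    return (successes + 1) / (uses + 2)

print(smoothed_confidence(0, 0))   # 0.5  -- no evidence yet
print(smoothed_confidence(1, 1))   # ~0.67 -- one success
print(smoothed_confidence(0, 1))   # ~0.33 -- one failure
print(smoothed_confidence(9, 10))  # ~0.83 -- strong track record
```

A plain running average would snap to 0.0 or 1.0 after the very first use; the pseudo-counts keep early estimates anchored near the neutral 0.5 starting point, which is exactly the behavior the prose above describes.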
5. Injecting Memory into Agent Prompts
Memory that is never consulted is useless. Before each agent runs, we query both episodic memory and the skills library, then inject relevant context into the prompt.
# memory/injection.py
class MemoryInjector:
"""Retrieves and formats memories for agent prompt injection."""
def __init__(
self,
memory_store: AgentMemoryStore,
skills_library: SkillsLibrary,
):
self.memory = memory_store
self.skills = skills_library
def build_memory_context(
self,
agent_role: str,
task_description: str,
max_memories: int = 3,
max_skills: int = 2,
) -> str:
"""Build a memory context block for prompt injection."""
sections = []
# 1. Retrieve relevant episodic memories
memories = self.memory.recall(
query=task_description,
agent_role=agent_role,
n_results=max_memories,
)
if memories:
sections.append("## Relevant Past Experiences")
for i, mem in enumerate(memories, 1):
sections.append(f"{i}. {mem['content']}")
# 2. Retrieve relevant skills
skills = self.skills.find_skills(
agent_role=agent_role,
min_confidence=0.6,
approved_only=True,
)[:max_skills]
if skills:
sections.append("\n## Proven Skills Available")
for skill in skills:
sections.append(
f"- **{skill.name}** (confidence: {skill.confidence:.0%}): "
f"{skill.description}\n"
f" Template: {skill.template[:300]}"
)
if not sections:
return ""
return (
"\n---\n"
"# Memory Context (from past runs)\n"
"Use these past experiences and skills to inform your work. "
"Do not follow them blindly — adapt to the current task.\n\n"
+ "\n".join(sections)
+ "\n---\n"
)
def inject_memory_into_prompt(
base_prompt: str,
agent_role: str,
task_description: str,
injector: MemoryInjector,
) -> str:
"""Append memory context to an agent's base prompt."""
memory_context = injector.build_memory_context(
agent_role=agent_role,
task_description=task_description,
)
if memory_context:
return base_prompt + "\n" + memory_context
return base_prompt
The injection happens in each agent node function, right before the LLM call. The SSE agent’s node, for example, creates a MemoryInjector, calls inject_memory_into_prompt with the base system prompt and current task description, then passes the enhanced prompt to the LLM. The agent now sees not just its role definition and the current task, but also “last time you built a similar API, you forgot input validation on the PATCH endpoint” and “proven skill: always add request body validation middleware (confidence: 87%).”
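The final prompt shape matters more than the plumbing. Here is a stub of what the enhanced prompt looks like, with the assembly logic reduced to plain strings — the memory and skill texts are invented for illustration:

```python
BASE_PROMPT = "You are the SSE agent. Implement the assigned task."

def assemble_prompt(base: str, memories: list[str], skills: list[str]) -> str:
    """Append a memory-context block only when there is something to say."""
    if not memories and not skills:
        return base
    parts = ["\n---\n# Memory Context (from past runs)"]
    if memories:
        parts.append("## Relevant Past Experiences")
        parts += [f"{i}. {m}" for i, m in enumerate(memories, 1)]
    if skills:
        parts.append("## Proven Skills Available")
        parts += [f"- {s}" for s in skills]
    return base + "\n".join(parts) + "\n---\n"

prompt = assemble_prompt(
    BASE_PROMPT,
    memories=["Last API task: forgot input validation on the PATCH endpoint"],
    skills=["api_input_validation (confidence: 87%)"],
)
print(prompt)
```

Note the empty-context early return: on a cold start with no memories, agents receive exactly the prompt they would have gotten without the memory system.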
6. Inter-Agent Knowledge Sharing
Some lessons apply across the team. When the QC agent discovers that “endpoints without rate limiting always fail load testing,” that knowledge should reach the TA and SSE agents. The sharing mechanism uses a proposal-and-approval pattern:
# memory/sharing.py
from uuid import uuid4
from pydantic import BaseModel
from enum import Enum
from memory.skills import Skill, SkillsLibrary
class ShareStatus(str, Enum):
PROPOSED = "proposed"
APPROVED = "approved"
REJECTED = "rejected"
class KnowledgeShare(BaseModel):
"""A knowledge item proposed for cross-agent sharing."""
share_id: str
source_agent: str
target_agents: list[str]
knowledge: str
evidence_task_ids: list[str]
status: ShareStatus = ShareStatus.PROPOSED
class KnowledgeBroker:
"""Manages cross-agent knowledge sharing with approval gates."""
def __init__(self, skills_library: SkillsLibrary):
self.skills = skills_library
self.pending: list[KnowledgeShare] = []
def propose(self, source_agent: str, target_agents: list[str],
knowledge: str, evidence_task_ids: list[str]) -> KnowledgeShare:
share = KnowledgeShare(
share_id=str(uuid4()), source_agent=source_agent,
target_agents=target_agents, knowledge=knowledge,
evidence_task_ids=evidence_task_ids,
)
self.pending.append(share)
return share
def approve(self, share_id: str) -> bool:
"""Approve a share and create skills for target agents."""
share = next((s for s in self.pending if s.share_id == share_id), None)
if not share:
return False
share.status = ShareStatus.APPROVED
for target in share.target_agents:
self.skills.add_skill(Skill(
name=f"shared_from_{share.source_agent}",
description=share.knowledge, agent_role=target,
category="shared_knowledge", template=share.knowledge,
confidence=0.6, approved=True,
source_task_ids=share.evidence_task_ids,
))
return True
After reflection, agents propose shares automatically based on a role-to-beneficiary map:
# Role -> who benefits from their discoveries
SHARE_TARGETS = {
"qc": ["sse", "ta"], # QC findings help SSE and TA
"tl": ["sse", "qc"], # TL review patterns help SSE and QC
"sse": ["ta", "qc"], # SSE impl patterns help TA and QC
"ta": ["sse", "devops"], # TA arch decisions help SSE and DevOps
}
async def propose_cross_agent_shares(
reflection: TaskReflection, agent_role: str,
task_id: str, broker: KnowledgeBroker,
):
if not reflection.pattern_noticed:
return
targets = SHARE_TARGETS.get(agent_role, [])
if targets:
broker.propose(agent_role, targets,
reflection.pattern_noticed, [task_id])
The human reviews pending shares in the dashboard. “QC agent noticed that all APIs without input validation fail test cases. Share with SSE and TA?” Approve, and both agents get a new skill.
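The propose-then-approve lifecycle is easier to verify without the pydantic and persistence layers. Here is a stripped-down, stdlib-only version of the state transition — class and field names are simplified stand-ins for the article's KnowledgeShare and KnowledgeBroker:

```python
from dataclasses import dataclass, field
from uuid import uuid4

@dataclass
class Share:
    source: str
    targets: list
    knowledge: str
    status: str = "proposed"
    share_id: str = field(default_factory=lambda: str(uuid4()))

class MiniBroker:
    def __init__(self):
        self.pending: list = []
        self.granted: dict = {}  # agent role -> list of shared knowledge

    def propose(self, source: str, targets: list, knowledge: str) -> Share:
        share = Share(source, targets, knowledge)
        self.pending.append(share)
        return share

    def approve(self, share_id: str) -> bool:
        share = next((s for s in self.pending if s.share_id == share_id), None)
        if share is None:
            return False
        share.status = "approved"
        for target in share.targets:
            self.granted.setdefault(target, []).append(share.knowledge)
        return True

broker = MiniBroker()
s = broker.propose("qc", ["sse", "ta"],
                   "endpoints without rate limiting fail load tests")
broker.approve(s.share_id)
print(broker.granted["sse"])  # the QC lesson is now visible to SSE
```

The essential property is that knowledge never reaches a target agent while the share is still in the `proposed` state — the approval call is the only path into `granted`.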
7. Learning from External Sources
Agents should not only learn from their own experience. A ResearchTool lets agents query external documentation and best-practice guides during execution.
# tools/research.py
from langchain_core.tools import tool
from langchain_community.utilities import GoogleSearchAPIWrapper

# Note: @tool cannot decorate an instance method -- it would expose
# `self` as a tool argument -- so the tool is a module-level function
# closing over a shared search wrapper.
_search = GoogleSearchAPIWrapper(k=3)

@tool
async def research_best_practice(query: str) -> str:
    """Search for best practices on a topic. Use when encountering
    unfamiliar problems or verifying your approach."""
    results = _search.results(query, num_results=3)
    formatted = [
        f"**{r['title']}**\n{r['snippet']}\nSource: {r['link']}"
        for r in results
    ]
    return "\n\n---\n\n".join(formatted) if formatted else "No results found."
The TA and SSE agents get this tool. The key constraint: research results are ephemeral. They inform the current run but do not enter the memory store. Only reflections and approved skills persist. This prevents the system from memorizing random Stack Overflow answers. The agent uses research to make a decision, the outcome is reflected on, and only proven patterns graduate to long-term memory.
8. Wiring It All Into the Graph
Here is how memory integrates with the LangGraph pipeline:
# graph/builder.py (updated with memory nodes)
from langgraph.graph import StateGraph, END
def build_team_graph() -> StateGraph:
graph = StateGraph(TeamState)
# ── Existing agent nodes (from Parts 5-9) ──
graph.add_node("po", po_node)
graph.add_node("ba", ba_node)
graph.add_node("ta", ta_node)
graph.add_node("qc", qc_node)
graph.add_node("sse", sse_node) # now uses memory injection
graph.add_node("tl", tl_node)
graph.add_node("devops", devops_node)
graph.add_node("pm", pm_node)
# ── New memory nodes ──
graph.add_node("reflect", reflection_node)
graph.add_node("skill_check", skill_promotion_node)
# ── Existing edges (abbreviated) ──
graph.set_entry_point("po")
graph.add_edge("po", "ba")
graph.add_edge("ba", "ta"); graph.add_edge("ba", "qc") # parallel
graph.add_edge("ta", "sse"); graph.add_edge("qc", "sse")
graph.add_edge("sse", "tl")
graph.add_conditional_edges("tl", tl_router)
graph.add_edge("devops", "pm")
# ── Memory edges (post-pipeline) ──
graph.add_edge("pm", "reflect")
graph.add_edge("reflect", "skill_check")
graph.add_edge("skill_check", END)
return graph.compile()
async def skill_promotion_node(state: TeamState) -> dict:
"""Check if any episodic patterns should be promoted to skills."""
memory_store = get_memory_store()
skills_lib = get_skills_library()
# Check common pattern categories
patterns_to_check = [
("sse", "input validation on API endpoints"),
("sse", "error handling in async functions"),
("qc", "edge cases in authentication flows"),
("ta", "database selection for relational data"),
]
promoted = []
for agent_role, pattern in patterns_to_check:
skill = skills_lib.promote_from_reflections(
memory_store=memory_store,
agent_role=agent_role,
pattern_query=pattern,
)
if skill:
promoted.append(skill.name)
return {"promoted_skills": promoted}
The pipeline now has a tail: after the PM declares the task complete, the reflection node runs, every agent writes its memories, and the skill promotion node checks whether any patterns have occurred often enough to become skills.
9. Measuring Improvement
“The agents are getting better” is not a useful claim without metrics. Here is how we track whether the memory system works.
# metrics/improvement.py
from dataclasses import dataclass
@dataclass
class ImprovementMetrics:
"""Track agent improvement across pipeline runs."""
task_id: str
run_number: int
qc_pass_on_first_try: bool = False
tl_review_pass_on_first: bool = False
iteration_count: int = 0
memories_recalled: int = 0
skills_applied: int = 0
total_tokens_used: int = 0
def compute_improvement_trend(history: list[ImprovementMetrics]) -> dict:
"""Compare early runs vs recent runs."""
if len(history) < 4:
return {"status": "insufficient_data"}
mid = len(history) // 2
early, recent = history[:mid], history[mid:]
def avg(items, attr):
vals = [getattr(i, attr) for i in items]
return sum(vals) / len(vals) if vals else 0
return {
"first_try_pass_rate": {"early": avg(early, "qc_pass_on_first_try"),
"recent": avg(recent, "qc_pass_on_first_try")},
"avg_iterations": {"early": avg(early, "iteration_count"),
"recent": avg(recent, "iteration_count")},
"avg_tokens": {"early": avg(early, "total_tokens_used"),
"recent": avg(recent, "total_tokens_used")},
}
The metrics you want trending in the right direction:
- First-try pass rate should increase. Agents learning from past failures should pass QC and TL review on the first attempt more often.
- Iteration count should decrease. Fewer rejection cycles means higher-quality output from the start.
- Token usage should stabilize or decrease. Better skills mean less fumbling, fewer retry tokens wasted.
Track these across runs. If the lines are flat, your memory system is not working. If first-try pass rate is climbing and iteration count is falling, your agents are genuinely learning.
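A quick synthetic run shows the trend computation end to end. The history values below are fabricated to simulate an improving team, and the dataclass is trimmed to the fields the trend actually uses:

```python
from dataclasses import dataclass

@dataclass
class Metrics:
    qc_pass_on_first_try: bool
    iteration_count: int
    total_tokens_used: int

def avg(items, attr):
    # Summing booleans yields the pass *rate* for free.
    vals = [getattr(i, attr) for i in items]
    return sum(vals) / len(vals) if vals else 0

# Fabricated history: early runs struggle, recent runs improve.
history = [
    Metrics(False, 3, 52_000), Metrics(False, 2, 47_000),
    Metrics(True, 1, 38_000), Metrics(True, 1, 35_000),
]
mid = len(history) // 2
early, recent = history[:mid], history[mid:]

print("first-try pass:", avg(early, "qc_pass_on_first_try"), "->",
      avg(recent, "qc_pass_on_first_try"))   # 0.0 -> 1.0
print("avg iterations:", avg(early, "iteration_count"), "->",
      avg(recent, "iteration_count"))        # 2.5 -> 1.0
```

Rising pass rate, falling iteration count: that is the signature of a memory system doing its job.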
10. Practical Considerations
Memory Hygiene
Not all memories are worth keeping. Filter before storing — reflections that say “everything went fine” add noise without signal. Reject reflections where what_went_wrong == "Nothing" and improvement_for_next_time is shorter than 20 characters.
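That filter is only a few lines. A sketch — the "Nothing" variants and the 20-character cutoff are the heuristics from above, tune them to taste:

```python
def worth_storing(what_went_wrong: str, improvement: str) -> bool:
    """Reject content-free reflections before they reach the memory store."""
    no_failure = what_went_wrong.strip().lower() in {"nothing", "n/a", "none", ""}
    no_lesson = len(improvement.strip()) < 20
    return not (no_failure and no_lesson)

print(worth_storing("Nothing", "Keep it up"))  # False -- pure noise
print(worth_storing("Forgot to mock the DB connection",
                    "Add a conftest fixture that mocks the DB"))  # True
```

Run this gate inside ReflectionEngine.reflect before the store call and the vector store stays free of "everything went fine" filler.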
Memory Decay
Old memories become less relevant. Run forget_before() monthly with a 90-day cutoff to prevent stale patterns from polluting retrieval.
Cost Control
Six agents reflecting per run means 6 extra LLM calls. Use a cheaper model for reflections (Claude Haiku or GPT-4o-mini) and batch embedding calls where possible.
The Cold Start Problem
On the first run, there are no memories. The pipeline works identically to the non-memory version from previous parts. This is fine — memory is additive, not a dependency. To accelerate cold start, seed the skills library before the first run:
def seed_initial_skills(library: SkillsLibrary):
"""Pre-load common skills to avoid cold start."""
library.add_skill(Skill(
name="api_input_validation",
description="Always validate request bodies with Pydantic models",
agent_role="sse",
category="api_design",
template="Use Pydantic BaseModel for all request/response schemas",
confidence=0.8,
approved=True,
))
library.add_skill(Skill(
name="test_isolation",
description="Each test case must be independent and idempotent",
agent_role="qc",
category="testing",
template="Use fixtures for setup/teardown. Never depend on test order.",
confidence=0.9,
approved=True,
))
What We Built
This article added three layers to the agent system:
- Episodic memory via ChromaDB — agents store reflections after every task and retrieve similar experiences before future tasks.
- Semantic memory via the Skills Library — recurring patterns are promoted to skills with confidence scores and human approval gates.
- Knowledge sharing via the KnowledgeBroker — one agent’s lessons can be shared with the rest of the team through a proposal-and-approval workflow.
The result is a system that genuinely improves over time — not through fine-tuning, but through accumulated experience, the same mechanism that makes human teams better with practice.
In Part 11, we tackle deployment: containerizing the multi-agent system, orchestration infrastructure, and running the pipeline in production with proper observability. The agents are smart. Now we need them to be reliable.