I blocked off a Friday afternoon to review a new open-source agent framework. Three hours later I was still at my desk, rereading the same files — not because the code was hard, but because I kept stopping to make sure I hadn’t misread.

The moment that made me pause was in run_agent.py. I was expecting a standard orchestration loop: call the model, get a response, execute tools, repeat. That loop is there. But it sits inside a larger runtime that runs seven other loops simultaneously, each operating at a completely different timescale, each feeding its outputs into the others.

That is not a chatbot. That is an operating system for intelligence.

This post is my attempt to explain what Nous Research actually built with Hermes Agent — and why the architectural choice matters for anyone building AI agents in 2026.


The Problem with Every Other Agent Framework

Before we get into Hermes, let me describe the standard model you see in LangChain, CrewAI, and even OpenAI Assistants.

All of them run a single temporal loop:

User prompt → Construct context → Call LLM → Execute tools → Return response

That loop is stateless by default. Every time you start a new session, the agent begins from zero. It doesn’t remember that you prefer concise outputs. It can’t recall the workaround it found last Tuesday for the same API error. It doesn’t get faster at the tasks you give it repeatedly.

The agent is just as naive on day 90 as it was on day 1.

This is not a flaw in the implementations — it’s a design assumption. These frameworks were built to help developers compose AI-powered pipelines. They’re excellent at that. But they were not designed to model an agent that accumulates experience over time.

Hermes was.


The 8 Temporal Loops

Nous Research released Hermes Agent in February 2026. The core insight is simple to state but hard to implement: instead of one loop running at human conversation speed, the agent runs eight loops at different timescales. The slower loops manage and improve the faster ones.

Here is how they work.

Loop 1 — Core Agent (milliseconds to minutes): The Heartbeat

This is the familiar loop. The central AIAgent class processes input, constructs a prompt, routes to the provider, calls the model, executes tools, and displays the response. Session state is persisted to SQLite after every turn.

Everything else exists to make this loop faster and smarter over time.

Loop 2 — Ralph /goal (minutes to hours): Long-term Goal Arbiter

When you run /goal "migrate all tests to pytest", Hermes enters a judge-checked pursuit loop. After every turn, an auxiliary judge model evaluates whether the goal’s acceptance criteria are met. If not, and if the turn budget remains (default: 20 turns, configurable via goal_max_turns), the worker continues autonomously. Goals persist across sessions — /goal resume picks up exactly where you left off.

This is the loop that handles work that can’t fit in a single conversation.

Loop 3 — Self-Improvement (post-task): Saves Skills to Files

After any task that involved five or more tool calls, error recovery, user corrections, or a non-obvious workflow, Hermes runs a post-task analysis. If the workflow was reusable, it serializes the procedure as a Markdown file with YAML front matter into ~/.hermes/skills/.

This is not just logging. Skills are living documents. The agent continuously patches them using targeted string replacements when it discovers a better approach. Hermes ships with 91 built-in skills; the community has published 520+ more.

Loop 4 — Curator (every 7 days): Cleans the Skill Repository

As skills accumulate, quality degrades if left unmanaged. The Curator runs on a weekly cycle, grades every skill in the repository, consolidates overlapping procedures, and prunes ones that have become obsolete or counterproductive. This was introduced in v0.12.0 and is what prevents the skill library from becoming a graveyard of stale procedures.

Loop 5 — Memory (per session): Records Cross-Session Habits

Two files — MEMORY.md (~2,200 chars, ~800 tokens) and USER.md (~1,375 chars, ~500 tokens) — are loaded at session start via a frozen-snapshot pattern that preserves Anthropic’s prefix caching. When the agent learns something stable about your preferences or environment, it writes to disk immediately.

The critical design detail: memory writes do not mutate the current session’s system prompt. Changes appear in the next session. This prevents mid-session context corruption and keeps the cache hit rate high, which reduces input token costs by roughly 75% on multi-turn conversations.

Loop 6 — Kanban (every 60 seconds): Orchestrates Parallel Tasks

The dispatcher loop runs inside the gateway process and ticks every 60 seconds. Each tick runs a six-phase cycle:

  1. Reclaim stale claims (15-minute TTL)
  2. Crash detection via kill(pid, 0) POSIX probes
  3. Auto-decompose triage tasks (capped at 3 per tick)
  4. Dependency promotion — moves tasks from todo to ready when parents complete
  5. Atomic worker spawning via hermes -p <profile> chat
  6. Sleep until next tick

Tasks have seven statuses: triage | todo | ready | running | blocked | done | archived. Workers communicate via kanban_show(), kanban_heartbeat(), and kanban_complete() — never via raw CLI commands. A circuit breaker auto-blocks after 2 consecutive failures.

Loop 7 — Context Compression (automatic): Keeps API Costs Stable

Two compressors operate at different thresholds. The agent compressor fires at 50% of the context window and uses accurate API-reported token counts. The gateway hygiene compressor fires at 85% as a safety net, using rough token estimates.

The four-phase algorithm: replace old tool results over 200 chars with placeholders (no LLM call needed), segment messages into a protected head, summarizable middle, and recent tail, generate a structured summary via an auxiliary LLM (budget: 20% of content tokens, min 2K, max 12K), then assemble the compressed output.

The critical detail: subsequent compressions update the previous summary rather than regenerating from scratch. This prevents context quality degradation over long sessions. You can run 2-hour coherent agent sessions within a 100K token window.

Loop 8 — Sub-agents (parallel): Multiplies Throughput

A parent agent can spawn child agents — default cap is 3, configurable via kanban.max_in_progress_per_profile. Each child runs with isolated context, a restricted toolset, and a separate terminal session. Workers receive HERMES_KANBAN_TASK and HERMES_KANBAN_WORKSPACE environment variables for scope isolation.

A built-in safety gate prevents workers from mutating tasks they don’t own: "worker is scoped to task t_abcd; refusing to mutate t_efgh." Run history with structured handoffs allows retry workers to see prior attempt errors without re-running failed work.


How the Loops Feed Each Other

The compound effect is mechanical. There is no magic here — just well-designed data flow between loops operating at different timescales.

flowchart TD
    L1[Loop 1: Core Agent\nms–min] --> L3[Loop 3: Self-Improvement\npost-task]
    L3 --> SKILLS[(Skill Files\n~/.hermes/skills/)]
    SKILLS --> L1
    SKILLS --> L4[Loop 4: Curator\n7-day cycle]
    L4 --> SKILLS
    L1 --> L5[Loop 5: Memory\nper-session]
    L5 --> MEMORY[(MEMORY.md\nUSER.md)]
    MEMORY --> L1
    L2[Loop 2: /goal\nmin–hours] --> L1
    L1 --> L7[Loop 7: Compression\nautomatic]
    L7 --> L1
    L6[Loop 6: Kanban\n60s tick] --> L8[Loop 8: Sub-agents\nparallel]
    L8 --> L1
    L8 --> L3

    style L1 fill:#1e40af,color:#fff
    style L2 fill:#7c3aed,color:#fff
    style L3 fill:#059669,color:#fff
    style L4 fill:#d97706,color:#fff
    style L5 fill:#dc2626,color:#fff
    style L6 fill:#0891b2,color:#fff
    style L7 fill:#9333ea,color:#fff
    style L8 fill:#be185d,color:#fff

The compound effect works like this:

  1. L1 executes work → triggers L3 when complexity threshold is met
  2. L3 saves skills → future identical tasks skip the tool-exploration phase
  3. Skills reduce token cost — 50 skills index to ~630 tokens via progressive disclosure vs. 25K+ if inlined — so L7 fires less aggressively, coherence lasts longer
  4. L5 memory preserves project context → less steering overhead per turn
  5. L6 Kanban enables parallel work → multiple L1 instances run simultaneously → skills accumulate faster
  6. L8 sub-agents each run their own L1/L3/L5 loops → skill outputs are shared across the parent’s repository
  7. L2 /goal drives multi-turn persistence → more complex tasks complete → higher-value skills get generated
  8. L4 Curator prunes low-quality skills → retrieval precision improves → less noise in future context
gantt
    title 8 Loops at Different Timescales
    dateFormat  X
    axisFormat  %s

    section Milliseconds
    Loop 1 - Core Agent heartbeat       :active, l1, 0, 1

    section Seconds
    Loop 7 - Context compression        :active, l7, 0, 5

    section Minutes
    Loop 2 - /goal arbiter check        :active, l2, 0, 30
    Loop 6 - Kanban dispatcher tick     :active, l6, 0, 60

    section Hours
    Loop 2 - /goal full pursuit         :active, l2b, 0, 240
    Loop 8 - Sub-agent session          :active, l8, 0, 180

    section Sessions
    Loop 5 - Memory write               :active, l5, 0, 1440
    Loop 3 - Skill save post-task       :active, l3, 0, 1440

    section Weeks
    Loop 4 - Curator prune cycle        :active, l4, 0, 10080

The agent that ran a migration task today is measurably faster at migration-like tasks next week — not because the model changed, but because the procedure is now a skill that costs 40 tokens to load instead of requiring 15 tool calls to rediscover.


The 40% Speed Claim — What the Evidence Actually Shows

The widely-cited figure is that Hermes with 20+ accumulated skills completes similar tasks 40% faster than a fresh instance. Third-party writeups attribute this to Nous Research’s internal benchmarks.

I could not locate the primary source — no methodology document, no GitHub release note, no published benchmark. Treat the specific number as unverified.

What is confirmed and mechanically sound:

  • Progressive disclosure architecture: 50 skills load as ~630 tokens (an index of titles and descriptions). Full content is fetched only for matched skills. This means skill accumulation does not linearly increase per-turn costs.
  • Procedure elimination: when a skill matches the current task, the agent skips the tool-exploration phase entirely. For a task that required 8 tool calls to figure out the first time, the second execution runs in 1 call (load skill, execute procedure).
  • Context stability: memory files inject stable project context without requiring re-explanation each session.

The mechanism for speed improvement is real and well-documented even if the exact 40% number is unverified marketing.


Framework Comparison

DimensionLangChain / LangGraphCrewAIOpenAI AssistantHermes Agent
Mental modelOrchestration graphRole-based crewManaged threadsPersistent brain
StateStateless by defaultStateless between runsPlatform-managedFully persistent: SQLite + files
Self-improvementNoneNoneNoneCore design: skills, memory, curator
Multi-agentVia LangGraph nodesNative via rolesLimitedNative: Kanban + sub-agent spawning
DeploymentFramework for embeddingFramework for embeddingSaaS API onlyLocal terminal, Docker, SSH, Modal
ControlMaximum (explicit graph)MediumMinimumMedium-high (local, inspectable)
Best fitPrecise RAG pipelinesStructured multi-role workQuick prototypingAgents improving over weeks/months

The core distinction: LangChain and CrewAI are frameworks for building agents. Hermes is an agent you run. They serve different needs and are largely complementary — LangChain can handle precise retrieval inside a Hermes tool; CrewAI can coordinate multiple Hermes sessions as specialist workers with distinct memory namespaces.


What the Source Code Actually Looks Like

The refactoring history is instructive. The original codebase was a 16,083-line monolith. After the refactor, the same functionality lives in 3,821 lines across 14 modules — a 76% reduction.

Reading the modules in order, the design becomes legible:

  • run_agent.py — the core loop; surprisingly compact given what it orchestrates
  • context_manager.py — the compression logic; the two-threshold design is where most of the production robustness lives
  • kanban/dispatcher.py — the six-phase tick loop; the POSIX kill(pid, 0) crash detection is the kind of detail you only add after a real production failure
  • skills/ — the skill store; the YAML front matter schema makes it clear these are meant to be human-readable and hand-editable, not just machine artifacts

The codebase reads as written by people who had run this in production and paid the cost of the mistakes. The circuit breaker in the Kanban loop. The frozen-snapshot memory pattern. The two-compressor architecture with different thresholds. These are not academic design choices — they are scar tissue from real usage.


What This Means for Agent Design in 2026

The architectural lesson from Hermes is not “run eight loops.” It is: agents that improve require durable behavioral artifacts.

A stateless agent is bounded. Its ceiling is the quality of its prompt and the capability of its model. A stateful agent with well-designed feedback loops is unbounded in principle — it gets better at your specific workflows, in your specific environment, over your specific usage patterns.

The practical implications for anyone building agents:

  1. Design for durability first. Skills, memory, and structured task state are not features to add later. They determine whether your agent gets better or stays static.
  2. Multiple timescales are necessary. You need fast loops for execution, medium loops for goal pursuit, and slow loops for maintenance. The slow loops are what prevent quality degradation over time.
  3. Compression is not optional. Any agent intended to run long sessions needs a principled context management strategy. The two-threshold approach in Hermes is worth studying.
  4. Sub-agents should be scoped, not autonomous. The Kanban safety gate — refusing to mutate foreign tasks — is a design pattern that prevents the cascade failures that autonomous multi-agent systems are prone to.

Conclusion: Not a Chatbot. An AI Operating System.

After three hours with the Hermes source code, the framing that stuck with me is this: every AI agent framework I have used before treats intelligence as a function to call. Hermes treats intelligence as a process to run.

The difference shows up at month three. A LangChain pipeline built in February 2026 runs exactly as it did in February 2026. A Hermes instance running the same tasks in May 2026 has accumulated dozens of skills specific to your codebase, your team’s conventions, your recurring task patterns. It routes around known failure modes. It skips re-exploration of solved problems. It is, measurably, not the same agent it was three months ago.

That is the compound effect of eight loops running at eight different timescales, feeding each other continuously.

Most AI agents are very fast at forgetting everything they learn. Hermes was designed to remember.


References: NousResearch/hermes-agent · Official Docs · Kanban Docs · Context Compression Docs · NVIDIA Blog

Export for reading

Comments