If you have read my earlier post on Hermes Agent’s eight temporal loops, you already know that the framework’s self-improvement story hinges on two loops: Loop 3 saves skills after task completion, and Loop 6 retrieves them at the start of future tasks. Between those two events sits a decision problem that most Hermes users never think about until it bites them: how do you tag a skill so you can actually find it later?

I spent several days running experiments with Hermes’s skill library, deliberately breaking my tagging system and rebuilding it. This post is what I learned. It is a practical guide to designing a tag taxonomy that works at scale — not just for ten skills, but for the hundreds you will accumulate after a few months of daily use.


Why Tagging Is the Foundation of Skill Retrieval

Hermes persists reusable skills as .md files under ~/.hermes/skills/. Each file contains the skill description, the code or pattern, and — critically — a YAML frontmatter block with tags. When Loop 6 runs at task start, it performs a two-pass retrieval:

  1. Tag match: skills whose tags overlap with inferred task tags are loaded first.
  2. Semantic similarity: among tag-matched skills, the top-N by embedding similarity are injected into context.

The semantic step is powerful but bounded: if a skill’s tags never match the inferred task tags, the semantic step never sees it. This means poor tagging creates retrieval dead zones — skills that exist but are never found. All the self-improvement Loop 3 produces is wasted.
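
The dead-zone effect is easy to reproduce in a few lines. The sketch below is not Hermes's implementation: the `Skill` class and `retrieve` function are hypothetical, and a plain word-overlap score stands in for embedding similarity. It only illustrates how a tag filter in front of a ranking step hides an otherwise perfect match.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Skill:
    name: str
    title: str
    tags: frozenset[str]


def similarity(a: str, b: str) -> float:
    """Stand-in for embedding similarity: Jaccard overlap of lowercase words."""
    wa, wb = set(a.lower().split()), set(b.lower().split())
    return len(wa & wb) / len(wa | wb) if wa | wb else 0.0


def retrieve(skills: list[Skill], task_tags: set[str], task_text: str, top_n: int = 3) -> list[Skill]:
    # Pass 1: keep only skills that share at least one tag with the task.
    candidates = [s for s in skills if s.tags & task_tags]
    # Pass 2: rank the survivors by similarity to the task description.
    candidates.sort(key=lambda s: similarity(s.title, task_text), reverse=True)
    return candidates[:top_n]


library = [
    Skill("backoff", "Handle API rate limits with exponential backoff",
          frozenset({"lang/python", "pattern/retry"})),
    # Same title, but its only tag never matches the task: a retrieval dead zone.
    Skill("backoff-untagged", "Handle API rate limits with exponential backoff",
          frozenset({"misc"})),
]
found = retrieve(library, {"pattern/retry"}, "handle API rate limits")
print([s.name for s in found])  # ['backoff']
```

Both skills have identical titles, so the ranking step would score them equally. The second one is never ranked at all because the tag pass has already dropped it.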

The good news is that fixing this is entirely within your control. You design the taxonomy; Hermes follows it.


The Tagging Flow: End to End

Here is what happens from the moment a task completes to the moment a skill is retrieved on a future task:

flowchart TD
    A[Task completes - Loop 3 fires] --> B[Hermes generates skill summary]
    B --> C{User reviews\nor auto-save?}
    C -- Auto-save --> D[Tags inferred from task context]
    C -- User reviews --> E[User edits tags in skill file]
    D --> F[Skill written to ~/.hermes/skills/]
    E --> F
    F --> G[Loop 4 Curator runs periodically]
    G --> H{Duplicate or\nlow-quality?}
    H -- Yes --> I[Curator merges or flags]
    H -- No --> J[Skill retained as-is]
    I --> F
    J --> K[New task begins - Loop 6 fires]
    K --> L[Task intent analyzed for tag candidates]
    L --> M[Tag index scanned for matches]
    M --> N[Candidate skills ranked by semantic similarity]
    N --> O[Top-N skills injected into context]
    O --> P[Agent executes task with skill context]

The critical hand-off is at steps D and E: the tags assigned at save time determine what the tag index looks like at retrieval time. Everything downstream is automatic. The only moment you can influence the system is when the skill is first written.
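
To see why save-time tags shape everything downstream, it helps to picture the tag index as an inverted map from tag to skill names. The following is a minimal sketch under assumptions: `parse_tags` and `build_tag_index` are hypothetical helpers, not Hermes internals, and the parser only handles the block-style `tags:` list used in the examples in this post.

```python
from collections import defaultdict

SAMPLE_SKILL = """
---
title: "Handle API rate limits with exponential backoff and jitter"
tags:
  - lang/python
  - pattern/retry
created: 2026-05-14
---
Skill body goes here.
"""


def parse_tags(skill_text: str) -> list[str]:
    """Extract tags from a block-style `tags:` list in the leading frontmatter."""
    lines = skill_text.strip().splitlines()
    if not lines or lines[0].strip() != "---":
        return []
    tags, in_tags = [], False
    for line in lines[1:]:
        stripped = line.strip()
        if stripped == "---":  # end of frontmatter
            break
        if stripped == "tags:":
            in_tags = True
        elif in_tags and stripped.startswith("- "):
            tags.append(stripped[2:].strip())
        else:
            in_tags = False  # any other key ends the tag list
    return tags


def build_tag_index(skills: dict[str, str]) -> dict[str, set[str]]:
    """Map each tag to the set of skill names that carry it."""
    index: dict[str, set[str]] = defaultdict(set)
    for name, text in skills.items():
        for tag in parse_tags(text):
            index[tag].add(name)
    return dict(index)


index = build_tag_index({"handle-rate-limit-exponential-backoff": SAMPLE_SKILL})
print(sorted(index))  # ['lang/python', 'pattern/retry']
```

A tag that was never written into the frontmatter has no key in this map, which is why no later step can recover it.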


Flat vs. Hierarchical Taxonomies

The first design decision is whether to use flat tags or a hierarchy.

Flat tags look like this:

tags: [python, api, rate-limiting, retry, exponential-backoff]

Hierarchical tags look like this:

tags: [lang/python, domain/api-integration, pattern/retry, pattern/exponential-backoff]

Here is an honest comparison:

| Dimension | Flat | Hierarchical |
| --- | --- | --- |
| Initial setup cost | Low | Medium |
| Tag collision risk | High (e.g., “api” matches everything) | Low (namespacing prevents collisions) |
| Retrieval precision at 50 skills | Good | Good |
| Retrieval precision at 200+ skills | Degrades | Stays stable |
| Loop 6 tag inference accuracy | Variable | More consistent |
| Curator merge decisions | Harder (no structure) | Easier (merges within namespace) |

My recommendation: start flat if you are new to Hermes. Switch to hierarchical once you hit around fifty skills. The transition is a one-time effort — a sed pass over your skills directory plus an afternoon of cleanup.
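
If sed feels too blunt for the migration, the same pass can be sketched in Python. The mapping below is hypothetical and covers only the flat tags from the example above; your own table will be longer. Anything without a mapping is returned separately so it can be reviewed by hand.

```python
# Hypothetical mapping from flat tags to namespaced ones; adjust to your library.
FLAT_TO_NAMESPACED = {
    "python": "lang/python",
    "api": "domain/api-integration",
    "retry": "pattern/retry",
    "rate-limiting": "pattern/rate-limiting",
    "exponential-backoff": "pattern/exponential-backoff",
}


def migrate_tags(tags: list[str]) -> tuple[list[str], list[str]]:
    """Return (migrated tags, flat tags with no mapping that need manual review)."""
    migrated, unmapped = [], []
    for tag in tags:
        if "/" in tag:  # already namespaced, keep as-is
            new = tag
        elif tag in FLAT_TO_NAMESPACED:
            new = FLAT_TO_NAMESPACED[tag]
        else:
            unmapped.append(tag)
            continue
        if new not in migrated:  # drop duplicates, keep original order
            migrated.append(new)
    return migrated, unmapped


new_tags, review = migrate_tags(["python", "api", "retry", "useful"])
print(new_tags)  # ['lang/python', 'domain/api-integration', 'pattern/retry']
print(review)    # ['useful']
```

Keeping unmapped tags out of the result, instead of guessing a namespace for them, is what makes the afternoon of cleanup a short one.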

For the rest of this post, I will use hierarchical tags in examples, because they are what I now use and they scale better.


Designing Your Tag Taxonomy

A practical taxonomy has four namespaces. You do not need more.

1. lang/ — Programming language or runtime

lang/python
lang/typescript
lang/bash
lang/sql
lang/rust

This is the most important namespace. Language is the fastest filter. A Python skill is useless during a TypeScript task — loading it wastes context tokens.

2. domain/ — The problem domain

domain/api-integration
domain/database
domain/auth
domain/devops
domain/data-processing
domain/testing
domain/llm

Domain is your second filter. It captures what the skill is about, independent of how it is implemented.

3. pattern/ — The reusable technique or pattern

pattern/retry
pattern/pagination
pattern/rate-limiting
pattern/caching
pattern/streaming
pattern/error-handling
pattern/mocking

This is where retrieval precision comes from. Two skills can share lang/python and domain/api-integration but differ on pattern/retry vs. pattern/pagination. The pattern tag is what tells Loop 6 which one to surface.

4. tool/ — External libraries, frameworks, or services

tool/httpx
tool/sqlalchemy
tool/pytest
tool/openai-sdk
tool/anthropic-sdk
tool/redis
tool/stripe

Tool tags are optional but high-value for library-specific patterns. If you have a skill for handling Anthropic SDK streaming errors, tagging it tool/anthropic-sdk means it appears whenever you are working with that SDK — even if the domain and pattern do not align perfectly.
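
A taxonomy this small is easy to enforce mechanically. Here is a minimal sketch of a validator, assuming the four namespaces above and lowercase hyphenated tag names; `invalid_tags` is a hypothetical helper, not a Hermes command.

```python
import re

ALLOWED_NAMESPACES = {"lang", "domain", "pattern", "tool"}
# namespace/name, where name is lowercase words joined by single hyphens
TAG_RE = re.compile(r"^([a-z]+)/([a-z0-9]+(?:-[a-z0-9]+)*)$")


def invalid_tags(tags: list[str]) -> list[str]:
    """Return the tags that are malformed or use an unknown namespace."""
    bad = []
    for tag in tags:
        match = TAG_RE.match(tag)
        if match is None or match.group(1) not in ALLOWED_NAMESPACES:
            bad.append(tag)
    return bad


print(invalid_tags(["lang/python", "api", "topic/retry", "tool/openai-sdk"]))
# ['api', 'topic/retry']
```

Running a check like this before a skill is saved catches stray flat tags and invented namespaces while they are still cheap to fix.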


Naming Conventions for Skills

The skill filename and the title field inside the file both feed into Loop 6’s semantic search. Apply these conventions:

Filenames: use verb-object-qualifier format.

# Good
handle-rate-limit-exponential-backoff.md
paginate-cursor-based-api.md
mock-openai-responses-pytest.md

# Bad
rate_limit.md
api_stuff.md
useful_pattern.md

Titles: full phrases, not keywords.

# Good
title: "Handle API rate limits with exponential backoff and jitter"

# Bad
title: "Rate limit"

The title is what the embedding model encodes. Vague titles produce vague embeddings and weak semantic matches.

Description: one sentence describing when to apply the skill, not what it does.

# Good
description: "Use when calling any third-party REST API that returns 429 responses under burst load."

# Bad
description: "Exponential backoff implementation."

The “when to use” framing is important because Loop 6 is trying to answer exactly that question: should I load this skill right now?
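
These three conventions can be checked with a small lint function. The sketch below is an assumption-laden illustration: `lint_skill_naming` is a hypothetical helper, the four-word minimum for titles is my own threshold, and the "Use when" prefix check simply mirrors the examples above.

```python
import re

# Lowercase words joined by hyphens, at least two words, ending in .md
KEBAB_RE = re.compile(r"^[a-z0-9]+(?:-[a-z0-9]+)+\.md$")


def lint_skill_naming(filename: str, title: str, description: str) -> list[str]:
    """Return a list of naming problems; an empty list means the skill passes."""
    problems = []
    if not KEBAB_RE.match(filename):
        problems.append("filename should be hyphen-separated words, e.g. verb-object-qualifier.md")
    if len(title.split()) < 4:  # assumed threshold for "full phrase"
        problems.append("title should be a full phrase, not a keyword")
    if not description.lower().startswith("use when"):
        problems.append('description should say when to apply the skill ("Use when ...")')
    return problems


print(len(lint_skill_naming("rate_limit.md", "Rate limit", "Exponential backoff implementation.")))
# 3
```

The check is deliberately shallow. It cannot tell whether a title is informative, only whether it is long enough to have a chance of being so.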


Real Examples: Well-Tagged vs. Poorly-Tagged Skills

Example 1: Rate-Limit Retry (Well Tagged)

---
title: "Handle API rate limits with exponential backoff and jitter"
description: "Use when calling any third-party REST API that returns 429 responses under burst load."
tags:
  - lang/python
  - domain/api-integration
  - pattern/retry
  - pattern/rate-limiting
  - pattern/exponential-backoff
  - tool/httpx
created: 2026-05-14
---

```python
import asyncio
import random

import httpx


async def request_with_backoff(client: httpx.AsyncClient, url: str, max_retries: int = 5) -> dict:
    """GET a URL, retrying on HTTP 429 with exponential backoff plus jitter."""
    for attempt in range(max_retries):
        resp = await client.get(url)
        if resp.status_code != 429:
            # Any other error status raises; a success returns the parsed body.
            resp.raise_for_status()
            return resp.json()
        # Wait 1s, 2s, 4s, ... plus up to 1s of random jitter before retrying.
        wait = (2 ** attempt) + random.uniform(0, 1)
        await asyncio.sleep(wait)
    raise RuntimeError(f"Exceeded {max_retries} retries for {url}")
```

This skill will surface when Loop 6 infers tags like `domain/api-integration`, `pattern/retry`, `pattern/rate-limiting`, or `tool/httpx`. Four independent tag paths lead here.

Example 2: Same Skill, Poorly Tagged

---
title: "retry"
description: "retry logic"
tags:
  - python
  - api
created: 2026-05-14
---

The tags python and api are so broad that they apply to virtually every skill in the library and match nearly every task. The skill competes with everything else in the tag pass, and the semantic match on a one-word title like “retry” is weak. In a library of 100 skills, this version will likely never be retrieved.
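
One way to spot tags like these is to measure how much of the library each tag covers. The sketch below uses a hypothetical four-skill library and a hypothetical `tag_selectivity` helper; a tag that covers most of the library barely narrows the candidate set.

```python
from collections import Counter


def tag_selectivity(library: dict[str, list[str]]) -> dict[str, float]:
    """Fraction of skills carrying each tag; values near 1.0 barely filter anything."""
    counts = Counter(tag for tags in library.values() for tag in set(tags))
    total = len(library)
    return {tag: n / total for tag, n in counts.items()}


# Hypothetical library mixing broad flat tags with specific namespaced ones.
library = {
    "backoff": ["python", "api", "pattern/retry"],
    "paginate": ["python", "api", "pattern/pagination"],
    "mock": ["python", "api", "tool/pytest"],
    "upsert": ["python", "domain/database"],
}
coverage = tag_selectivity(library)
print(coverage["python"], coverage["api"], coverage["pattern/retry"])  # 1.0 0.75 0.25
```

In this toy library a match on python keeps all four skills as candidates, while a match on pattern/retry keeps one.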

Example 3: Database Pagination (Well Tagged)

---
title: "Paginate large SQLAlchemy query results without loading entire table"
description: "Use when iterating over more than 10k rows in a SQLAlchemy ORM query to avoid memory exhaustion."
tags:
  - lang/python
  - domain/database
  - pattern/pagination
  - pattern/memory-efficiency
  - tool/sqlalchemy
created: 2026-05-22
---

Notice the pattern/memory-efficiency tag. That is not about SQL or ORM — it is about why you would use this pattern. When a user works on a data export job and Loop 6 infers pattern/memory-efficiency, this skill appears even if the user never mentioned SQLAlchemy.


How the Curator (Loop 4) Uses Tags

Loop 4, the Curator, runs periodically to merge duplicate skills and flag low-quality ones. Its merge heuristic is tag-based: two skills with identical tag sets and high title similarity are merge candidates.

This creates an important side effect: your tag design determines merge behavior. If you are inconsistent with naming — sometimes pattern/retry, sometimes pattern/retries, sometimes retry-logic — the Curator cannot identify duplicates that should be merged. You end up with fragmented variants of the same skill, each slightly different, none of them the canonical version.

To help the Curator:

  • Use singular forms consistently: pattern/retry not pattern/retries
  • Never abbreviate: domain/api-integration not domain/api
  • Commit to your namespace separator before you start: I use /, others use : or --

Run hermes curator --dry-run periodically to see what the Curator would merge. If it is not finding obvious duplicates, your tag consistency is probably off.
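
The effect of one inconsistent tag on merging can be shown with a small sketch of the heuristic described above: identical tag sets plus high title similarity. This is not the Curator's actual code. `merge_candidates` is hypothetical, `difflib.SequenceMatcher` stands in for whatever similarity measure Hermes uses, and the 0.8 threshold is an assumption.

```python
from difflib import SequenceMatcher
from itertools import combinations


def merge_candidates(skills: dict[str, tuple[str, list[str]]], threshold: float = 0.8) -> list[tuple[str, str]]:
    """Pair up skills with identical tag sets and similar titles."""
    pairs = []
    for (a, (title_a, tags_a)), (b, (title_b, tags_b)) in combinations(skills.items(), 2):
        if set(tags_a) != set(tags_b):
            continue  # a single differing tag rules the pair out
        ratio = SequenceMatcher(None, title_a.lower(), title_b.lower()).ratio()
        if ratio >= threshold:
            pairs.append((a, b))
    return pairs


skills = {
    "a": ("Handle API rate limits with exponential backoff", ["lang/python", "pattern/retry"]),
    "b": ("Handle API rate limits with exponential backoff and jitter", ["lang/python", "pattern/retry"]),
    # Same title as "b", but the plural tag hides it from the merge check.
    "c": ("Handle API rate limits with exponential backoff and jitter", ["lang/python", "pattern/retries"]),
}
print(merge_candidates(skills))  # [('a', 'b')]
```

Skills b and c have the same title, yet only a and b are paired, because pattern/retries makes the tag sets differ.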


Building a Personal Skill Library Structure

After accumulating skills for a few months, I organize my ~/.hermes/skills/ directory into subdirectories that mirror the domain/ namespace:

~/.hermes/skills/
├── api-integration/
│   ├── handle-rate-limit-exponential-backoff.md
│   ├── paginate-cursor-based-api.md
│   └── parse-openapi-spec-for-endpoints.md
├── database/
│   ├── paginate-sqlalchemy-large-query.md
│   └── upsert-postgres-on-conflict.md
├── devops/
│   ├── parse-github-actions-workflow-yaml.md
│   └── build-minimal-docker-image-python.md
├── llm/
│   ├── stream-anthropic-response-with-tool-use.md
│   └── implement-prompt-caching-anthropic.md
└── testing/
    ├── mock-openai-responses-pytest.md
    └── snapshot-test-json-api-response.md

Hermes does not require this structure — it recursively scans the skills directory. But the subdirectory layout matches my domain namespace, which means when I manually browse my skill library, the filesystem structure and the tag structure are in sync. That consistency pays dividends when I am adding new skills by hand.
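
Keeping the two structures in sync is easier with a quick check. The sketch below assumes the layout shown above, where each skill sits in a folder named after one of its domain/ tags; `misplaced` is a hypothetical helper that works on relative paths and tag lists you have already collected.

```python
from pathlib import PurePosixPath


def misplaced(skills: dict[str, list[str]]) -> list[str]:
    """Return relative paths whose parent folder is not one of the skill's domain/ tags."""
    out = []
    for rel_path, tags in skills.items():
        folder = PurePosixPath(rel_path).parent.name
        domains = {t.split("/", 1)[1] for t in tags if t.startswith("domain/")}
        if folder not in domains:
            out.append(rel_path)
    return out


print(misplaced({
    "database/upsert-postgres-on-conflict.md": ["lang/sql", "domain/database"],
    "devops/paginate-cursor-based-api.md": ["lang/python", "domain/api-integration"],
}))
# ['devops/paginate-cursor-based-api.md']
```

A skill with no domain/ tag, or one stored at the top level of the skills directory, is also reported, which is usually what you want.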


Retrieval Performance Tips

A few practical optimizations I have validated:

Keep the tag count between four and eight. Below four, the tag index has too few paths into the skill. Above eight, you introduce noise and the tag match becomes less selective. Four to eight is the sweet spot.

Always include at least one pattern/ tag. The pattern namespace is what differentiates skills within the same domain and language. It is the tag most likely to uniquely identify a skill.

Use tool/ tags liberally for SDK or library-specific code. Library APIs are narrow and change over time, so skills written against them are only relevant when that library is in play. When Loop 6 infers tool/stripe from an import in the current file, every Stripe-tagged skill in your library becomes a retrieval candidate. This is high-precision context injection.

Do not tag for topic; tag for retrieval intent. Ask yourself: what would I be working on if I needed this skill? Tags should describe that context, not describe the skill itself. The title and description already describe the skill. Tags describe when it is relevant.
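
The first two tips are mechanical enough to check automatically. A minimal sketch, assuming the four-to-eight range and the pattern/ rule above; `lint_tag_set` is a hypothetical helper.

```python
def lint_tag_set(tags: list[str], min_tags: int = 4, max_tags: int = 8) -> list[str]:
    """Warn when a tag set is too small, too large, or lacks a pattern/ tag."""
    warnings = []
    unique = set(tags)
    if len(unique) < min_tags:
        warnings.append(f"only {len(unique)} tags: too few paths into the skill")
    elif len(unique) > max_tags:
        warnings.append(f"{len(unique)} tags: the tag match becomes less selective")
    if not any(tag.startswith("pattern/") for tag in unique):
        warnings.append("no pattern/ tag: nothing separates this skill from its neighbours")
    return warnings


well_tagged = ["lang/python", "domain/api-integration", "pattern/retry",
               "pattern/rate-limiting", "pattern/exponential-backoff", "tool/httpx"]
print(lint_tag_set(well_tagged))            # []
print(len(lint_tag_set(["python", "api"])))  # 2
```

The two inputs are the tag sets from Examples 1 and 2. The well-tagged skill passes cleanly and the poorly tagged one trips both rules.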


What Good Looks Like at Scale

A well-designed skill library at 200 skills should behave like this: you start a task involving a GitHub Actions workflow, and before you type a single character of code, Hermes has already loaded your parse-github-actions-workflow-yaml skill and your handle-rate-limit-exponential-backoff skill (because you are calling the GitHub API). You did not ask for either. They arrived because the tags were right.

That is the promise of Loops 3 and 6. They make the agent smarter over time without any active effort on your part, but only if the tagging system underneath them is well-designed. The loops are the engine. The taxonomy is the fuel. You do not notice the difference on day one, but by month three, a well-tagged library feels like a collaborator that knows your codebase. A poorly tagged one feels like a search engine that never quite finds what you need.

Spend an hour now on your taxonomy design. Your future self will thank you.
