YantrikDB Hermes Plugin — Persistent Memory for Hermes Agents

Hermes is an open-source agent runtime. The yantrikdb-hermes-plugin gives Hermes agents persistent memory via YantrikDB — embedded mode by default, no separate server, no token-mint step, no cluster.

If you’re running a Hermes agent and want it to remember across conversations, this is the smallest possible integration: install the plugin, set three environment variables, restart the agent.

Installation

pip install yantrikdb-hermes-plugin

Pulls ~10 MB total (yantrikdb engine + plugin + bundled embedder). No torch, no transformers, no ONNX runtime.

Configuration (3 lines)

Add to ~/.hermes/.env:

YANTRIKDB_MODE=embedded
YANTRIKDB_DB_PATH=~/.hermes/memory.db
YANTRIKDB_NAMESPACE=default

That’s it. Verify the agent picked it up:

hermes memory status
# → yantrikdb available ✓
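
If the status check does not report the plugin, it can help to confirm that the three settings are actually present in the file. The following is a minimal sketch of such a check in plain Python. parse_env and check_config are hypothetical helpers written for this illustration, not part of the plugin, and the parsing covers only simple KEY=VALUE lines. Whether the plugin itself expands ~ in YANTRIKDB_DB_PATH is not stated here, so the script only prints what the path would resolve to.

```python
from pathlib import Path

# The three variables named in the configuration section above.
REQUIRED = ("YANTRIKDB_MODE", "YANTRIKDB_DB_PATH", "YANTRIKDB_NAMESPACE")


def parse_env(text: str) -> dict:
    """Parse simple KEY=VALUE lines, skipping blank lines and # comments."""
    values = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        values[key.strip()] = value.strip().strip('"').strip("'")
    return values


def check_config(values: dict) -> list:
    """Return a list of problems; an empty list means nothing is missing."""
    problems = [f"missing {name}" for name in REQUIRED if not values.get(name)]
    mode = values.get("YANTRIKDB_MODE")
    # "embedded" and "http" are the two modes this page describes.
    if mode and mode not in ("embedded", "http"):
        problems.append(f"unknown YANTRIKDB_MODE: {mode}")
    return problems


if __name__ == "__main__":
    env_file = Path("~/.hermes/.env").expanduser()
    values = parse_env(env_file.read_text()) if env_file.exists() else {}
    for problem in check_config(values):
        print(problem)
    if values.get("YANTRIKDB_DB_PATH"):
        # Show where "~" resolves to for the current user.
        print("database path:", Path(values["YANTRIKDB_DB_PATH"]).expanduser())
```

An empty problem list only means the file is complete; hermes memory status remains the actual confirmation that the agent loaded the plugin.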

What it adds to a Hermes agent

The plugin registers three tools the agent can call autonomously:

Tool	What it does
`yantrikdb_remember`	Store a memory with importance and domain tags. ~0.08 s on the first call (engine warmup), sub-ms after.
`yantrikdb_recall`	Semantic search across stored memories. Returns ranked results with `why_retrieved` explanations the agent can integrate.
`yantrikdb_stats`	Namespace stats — active memories, conflicts, decay state. Useful for the agent to introspect its own memory.

Default ranking uses YantrikDB’s full scoring: similarity × importance × decay × graph proximity. The why_retrieved field tells the agent why a memory surfaced (e.g., ["semantically similar (0.62)", "important (decay=0.98)", "graph-connected via Alice"]) — this lets the agent give natural-language explanations of recall, rather than just dumping vector hits.
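
To make the multiplicative ranking concrete, here is a toy sketch in Python. It is not YantrikDB's implementation: the Candidate fields, the 0..1 ranges, the 0.7 importance threshold, and the plain product are all assumptions made for illustration. Only the four factor names and the wording of the why_retrieved strings come from the description above.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Candidate:
    """Hypothetical view of one stored memory at query time."""
    text: str
    similarity: float       # semantic similarity to the query, assumed 0..1
    importance: float       # importance given when the memory was stored
    decay: float            # 1.0 when fresh, lower as the memory ages
    graph_proximity: float  # closeness to entities mentioned in the query
    linked_entity: Optional[str] = None


def score(c: Candidate) -> float:
    """Plain product of the four factors: one weak factor lowers the total."""
    return c.similarity * c.importance * c.decay * c.graph_proximity


def why_retrieved(c: Candidate) -> List[str]:
    """Build explanation strings in the style of the example above."""
    reasons = [f"semantically similar ({c.similarity:.2f})"]
    if c.importance >= 0.7:  # threshold is an assumption for this sketch
        reasons.append(f"important (decay={c.decay:.2f})")
    if c.linked_entity:
        reasons.append(f"graph-connected via {c.linked_entity}")
    return reasons


def rank(candidates: List[Candidate]) -> List[Candidate]:
    """Order candidates from highest to lowest combined score."""
    return sorted(candidates, key=score, reverse=True)


alice = Candidate("Alice prefers weekly summaries", 0.62, 0.9, 0.98, 0.8, "Alice")
build = Candidate("The build uses Python 3.11", 0.70, 0.4, 0.60, 0.5)

top = rank([build, alice])[0]
print(top.text)
print(why_retrieved(top))
```

In this toy data the Alice memory ranks first even though its raw similarity is lower, because importance, freshness, and graph proximity all feed into the product. That is the behaviour the formula is meant to capture, as opposed to ranking on vector similarity alone.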

Embedded vs HTTP mode

The plugin supports two backends:

Mode	When to use	Latency	Setup cost
`embedded` (default)	Single agent, local data, no replication needed	Sub-ms recall	3 env vars
`http`	Cluster deployment, replication, multi-agent shared memory	~10-30 ms	YantrikDB server + token mint + URL

For most Hermes agent setups (one agent, local memory), embedded is the right choice. For multi-agent systems or production with replication, switch to HTTP and run a YantrikDB server.
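
The decision in the table can be written down as a small rule. The sketch below is illustrative only: choose_mode and its parameters are hypothetical names, not part of the plugin, and it encodes nothing beyond the criteria listed above.

```python
def choose_mode(agent_count: int = 1,
                needs_replication: bool = False,
                cluster_deployment: bool = False) -> str:
    """Return the YANTRIKDB_MODE value suggested by the table above."""
    # Shared memory across agents, replication, or a cluster all need a server.
    if agent_count > 1 or needs_replication or cluster_deployment:
        return "http"
    # One agent with local data: the embedded default is enough.
    return "embedded"


print(choose_mode())
print(choose_mode(agent_count=3))
```

The returned string is the value to put in YANTRIKDB_MODE. HTTP mode additionally requires the server, token, and URL setup listed in the table.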

Example: agent with persistent memory

# Hermes agent run
python run_agent.py --base_url https://api.deepseek.com/v1 --model deepseek-chat

Without the plugin: every conversation starts blank.

With the plugin: the agent autonomously calls yantrikdb_remember on decisions, preferences, and facts during the conversation, and yantrikdb_recall at the start of subsequent turns. A real DeepSeek session was verified end-to-end on 2026-05-09. It made three yantrikdb_remember calls, one yantrikdb_recall call (correctly ranked), and one yantrikdb_stats call, all sub-millisecond on the embedded backend.

The agent’s natural-language explanation of why it recalled what it recalled uses YantrikDB’s why_retrieved annotations directly — no extra prompting needed.

What you don’t get from the plugin

The plugin ships YantrikDB’s core memory primitives (record / recall / stats). It does not ship:

Skill management (/v1/skills/* endpoints) — skills are server-side in YantrikDB, and Hermes has its own filesystem-based skill catalog that stays canonical. See the Skill as Memory paper for why this separation is deliberate.
Schema validation for agent-written content — embedded mode has no admission control. Agent-written records are accepted as-is.
Raft replication — single-machine embedded backend by definition.
Knowledge graph operations — relate / entity_profile are available via the engine API but the plugin exposes only the three core tools by default.

For these, run the YantrikDB server and set YANTRIKDB_MODE=http.