Hermes Deployment

A fresh direction for the agent layer at Dungeon Books, evaluated against marty-roadmap. Triggered by reviewing Hermes Agent on 2026-05-03.

Decision frame

The product goal is to automate Dungeon’s back office and augment Carrie + Panat. It is not “build a multi-tenant agent platform for partner shops”; that’s a future possibility, not today’s job. With that reframe, marty-roadmap’s plan to spend ~6 weeks rebuilding the agent loop, persona system, observability, scheduled-job primitive, and tool registry from scratch is wrong-sized. Hermes already ships those primitives.

The previous Anthropic-first posture is also obsolete for back-office automation. Running Kimi K2.6 on Fireworks is ~5-7x cheaper and ~2x faster than Sonnet, with strong agentic tool use. Hermes is provider-agnostic by design, so it now aligns with the right choice instead of fighting it.

Architecture

[Tailscale tailnet]
  └── Proxmox host
        ├── HA VM (existing)
        ├── CosmosOS (existing)
        └── Hermes LXC (new)
              ├── Hermes Agent runtime
              ├── MCP servers (bundled)
              │     ├── policies
              │     ├── hi-events
              │     ├── square (read-only)
              │     ├── hardcover
              │     └── github (escalation)
              └── ~/.hermes/
                    ├── config.yaml
                    ├── .env (Fireworks key)
                    ├── SOUL.md
                    ├── memories/ (cross-session)
                    ├── sessions/ (SQLite + FTS5)
                    └── skills/

[External]
  ├── Fireworks API (Kimi K2.6)
  ├── Discord (gateway adapter)
  ├── Square API (via MCP)
  ├── Hi.Events API (via MCP)
  ├── Hardcover API (via MCP)
  └── github.com/dungeonbooks/ops (escalation target)

Single LXC, single Hermes process, multiple profiles. No public ingress — admin access via Tailscale. Outbound to API providers.

LXC sizing

1 vCPU, 2 GB RAM, 16 GB disk. Debian 12. Scale on signal, not prediction.
Hermes is mostly idle waiting on API responses; LLM compute is on Fireworks.
Tool execution (file ops, MCP calls) is bursty but light.
Bump if: local browser/Playwright usage, voice mode with local TTS/STT, heavy code-sandbox use.
Self-hosted Langfuse later goes in a separate LXC (~2 GB).

Provider config

Fireworks API direct, Kimi K2.6:

# ~/.hermes/config.yaml
model:
  provider: openai-compatible
  base_url: https://api.fireworks.ai/inference/v1
  model: accounts/fireworks/models/kimi-k2-instruct-0905  # verify slug

# ~/.hermes/.env
FIREWORKS_API_KEY=...

$30 in existing Fireworks credits is the starting runway. Watch the dashboard after the first real session to confirm that prompt-cache hits land at the cached rate ($0.16/M) rather than the uncached rate ($0.95/M). Hermes’ chat-completions mode should trigger Fireworks caching automatically; verify, don’t assume.
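A rough way to see why the cache hit rate matters: the effective input price is a weighted average of the two rates above. This sketch uses only those two per-million input prices; output-token pricing isn't given in this note, so it is left out and the token counts are input-only.

```python
# Rough runway arithmetic for Fireworks prompt caching.
# Rates are the per-million *input* token prices quoted above; output tokens are
# priced separately and are NOT modeled here.
CACHED_PER_M = 0.16    # USD per 1M cached input tokens
UNCACHED_PER_M = 0.95  # USD per 1M uncached input tokens


def blended_rate(hit_rate: float) -> float:
    """USD per 1M input tokens when `hit_rate` of input tokens are cache hits."""
    if not 0.0 <= hit_rate <= 1.0:
        raise ValueError("hit_rate must be between 0 and 1")
    return hit_rate * CACHED_PER_M + (1.0 - hit_rate) * UNCACHED_PER_M


def input_tokens_for_budget(budget_usd: float, hit_rate: float) -> float:
    """Millions of input tokens a budget buys at a given cache hit rate."""
    return budget_usd / blended_rate(hit_rate)


for hit in (0.0, 0.5, 0.8):
    print(f"hit rate {hit:.0%}: ${blended_rate(hit):.3f}/M, "
          f"~{input_tokens_for_budget(30, hit):.0f}M input tokens for $30")
```

At a 0% hit rate, $30 covers roughly 32M input tokens; at 80% it covers roughly 94M.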

Firepass ($7/week unlimited K2.5 Turbo) is invite-only and not currently available. Stays open as a future config flip if access arrives.

Profiles

Two to start:

marty — customer-facing Discord bot. SOUL.md = burnt-out wizard book voice. Allowed tools: hardcover, scryfall, manapool, RSS, policies (read), escalation. Allowed channels: Dungeon Books Discord guild.
ops — back-office Carrie/Panat tool. SOUL.md = direct, professional, terse. Allowed tools: square (read), hi-events, hardcover, postgres-domain, github (escalation), filesystem (vault). Allowed channels: CLI + private Discord channel.

Profiles isolate config, memory, sessions, and credentials, and they run concurrently in the same LXC.
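Conceptually, each profile is a deny-by-default allowlist over tools and channels. The sketch below models that with a hypothetical `Profile` dataclass and made-up tool and channel identifiers (`square.read`, `discord:ops-private`, and so on). It is not Hermes’ config schema, just a way to make the two profiles’ boundaries explicit and testable.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Profile:
    """Hypothetical in-memory view of one profile's allowlists (not Hermes' schema)."""
    name: str
    tools: frozenset[str]
    channels: frozenset[str]


# Identifiers below are illustrative placeholders mirroring the two profiles above.
PROFILES = {
    "marty": Profile(
        name="marty",
        tools=frozenset({"hardcover", "scryfall", "manapool", "rss",
                         "policies.read", "escalation"}),
        channels=frozenset({"discord:dungeon-books-guild"}),
    ),
    "ops": Profile(
        name="ops",
        tools=frozenset({"square.read", "hi-events", "hardcover",
                         "postgres-domain", "github.escalation", "filesystem.vault"}),
        channels=frozenset({"cli", "discord:ops-private"}),
    ),
}


def authorize(profile_name: str, channel: str, tool: str) -> bool:
    """Deny by default: an unknown profile, channel, or tool all return False."""
    profile = PROFILES.get(profile_name)
    if profile is None:
        return False
    return channel in profile.channels and tool in profile.tools
```

The useful property is the negative case: marty in the public guild can never reach Square, and ops tools never answer in the public guild.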

SOUL.md

Two SOUL.md files, one per profile (~/.hermes/profiles/marty/SOUL.md, ~/.hermes/profiles/ops/SOUL.md).

Marty SOUL.md — port from existing marty/src/ai_client.py system prompt. Lowercase voice, chill, sells books. Add the support-mode register (capitalized, formal) as documented in interim-marty-support-mode-decided-2026-05-03. Persona-switch announced (“switching to support mode…”).
Ops SOUL.md — write fresh. Direct, no whimsy, references policies and the vault. Knows the difference between Carrie and Panat as users (Hermes’ user modeling will pick this up over time).

Persona drift on Kimi K2.6 vs Sonnet is the main risk. Mitigation: golden eval set per profile, replayed on every SOUL.md change.
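A golden eval set can start as plain data plus a replay loop. This is a minimal sketch under assumptions: `GoldenCase` and its fields are hypothetical, the checks are simple substring and lowercase tests rather than model-graded judgments, and `respond` stands in for however the profile under test is invoked (CLI, API, or a recorded transcript).

```python
from dataclasses import dataclass, field
from typing import Callable


@dataclass
class GoldenCase:
    """One replayable case. Field names are illustrative, not a Hermes format."""
    prompt: str
    must_include: list[str] = field(default_factory=list)
    must_not_include: list[str] = field(default_factory=list)
    lowercase_voice: bool = False  # marty's default register; off for support mode


def check(case: GoldenCase, reply: str) -> list[str]:
    """Return failure reasons for one reply; an empty list means the case passed."""
    failures = []
    lowered = reply.lower()
    for phrase in case.must_include:
        if phrase.lower() not in lowered:
            failures.append(f"missing: {phrase!r}")
    for phrase in case.must_not_include:
        if phrase.lower() in lowered:
            failures.append(f"forbidden: {phrase!r}")
    if case.lowercase_voice and reply != reply.lower():
        failures.append("voice: expected all-lowercase reply")
    return failures


def replay(cases: list[GoldenCase],
           respond: Callable[[str], str]) -> dict[str, list[str]]:
    """Run every case through `respond` (the profile under test); keep only failures."""
    report = {}
    for case in cases:
        failures = check(case, respond(case.prompt))
        if failures:
            report[case.prompt] = failures
    return report
```

Seeded from the Carrie interview, a non-empty report after a SOUL.md edit is the signal to look before shipping.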

MCP servers

Bundle in the same LXC. Each is a small Python service (FastMCP or similar) reading tenant config (Square creds, etc.) from .env. Wire in this order:

policies — reads references/policies/ from the vault (synced or git-cloned into the LXC). Tools: list_policies(), get_policy(slug). Mirrors the index-file pattern from src/tools/policy/. Simplest, no external API. Good shakedown of the MCP wiring.
hi-events — wraps the Hi.Events API. Tools: find_ticket(email_or_name), ticket_status(id). The original Carrie pain point (refund window lookups). See hi-events-api.
square — read-only Square wrapper. Tools: lookup_member(query), recent_orders(member_id), inventory_check(sku), daily_summary(date). See square-integration.
hardcover — book data. Already shaped right in current Marty repo; port as MCP. Tools: search_books(query), book_details(id).
github — escalation target. Tools: open_issue(title, body, labels) against dungeonbooks/ops. Either shells out to gh CLI or uses the GitHub API directly. See pattern-1-escalation-as-liberal-paternalism.
postgres-domain (later) — members, orders, opt-out, conversation history. Defer until Square + Hi.Events cover the immediate cases.
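The policies server is the simplest of these, and its two tools reduce to a directory read. This sketch assumes one `<slug>.md` file per policy under `references/policies/`, titled by its first line (the real layout, including any index file, may differ). The slug check keeps `get_policy` from reading outside that directory.

```python
from pathlib import Path

# Assumption: one markdown file per policy, named <slug>.md, in the vault checkout.
POLICY_DIR = Path("references/policies")


def list_policies(policy_dir: Path = POLICY_DIR) -> list[dict[str, str]]:
    """Index of available policies: slug plus the first non-blank line as a title."""
    index = []
    for path in sorted(policy_dir.glob("*.md")):
        first_line = next(
            (ln.strip() for ln in path.read_text(encoding="utf-8").splitlines() if ln.strip()),
            path.stem,
        )
        # Drop a leading markdown heading marker ("# Title" -> "Title").
        index.append({"slug": path.stem, "title": first_line.lstrip("# ").strip()})
    return index


def get_policy(slug: str, policy_dir: Path = POLICY_DIR) -> str:
    """Full text of one policy. Rejects slugs that could escape the directory."""
    if not slug or "/" in slug or "\\" in slug or slug.startswith("."):
        raise ValueError(f"invalid policy slug: {slug!r}")
    path = policy_dir / f"{slug}.md"
    if not path.is_file():
        raise KeyError(slug)
    return path.read_text(encoding="utf-8")
```

With the official MCP Python SDK, these could be exposed by creating `mcp = FastMCP("policies")` (from `mcp.server.fastmcp`), decorating each function with `@mcp.tool()`, and calling `mcp.run()`; that wiring is not shown or tested here.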

Policies and Hi.Events come first because together they collapse the most painful manual workflow (Carrie’s refund lookups).
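When a refund lookup falls outside policy, it ends in the github server’s `open_issue(title, body, labels)`. If that server calls the GitHub REST API directly rather than shelling out to `gh`, the underlying call is a POST to `/repos/{owner}/{repo}/issues`. A standard-library sketch follows; where the token comes from and the `escalation` label are assumptions.

```python
import json
import urllib.request

GITHUB_API = "https://api.github.com"


def build_open_issue_request(repo: str, title: str, body: str,
                             labels: list[str], token: str) -> urllib.request.Request:
    """Build (but do not send) a GitHub REST 'create an issue' request.

    `token` is assumed to come from the profile's .env; the variable name is up to you.
    """
    payload = {"title": title, "body": body, "labels": labels}
    return urllib.request.Request(
        url=f"{GITHUB_API}/repos/{repo}/issues",
        data=json.dumps(payload).encode("utf-8"),
        method="POST",
        headers={
            "Accept": "application/vnd.github+json",
            "Authorization": f"Bearer {token}",
            "X-GitHub-Api-Version": "2022-11-28",
            "Content-Type": "application/json",
        },
    )


def open_issue(repo: str, title: str, body: str, labels: list[str], token: str) -> str:
    """Send the request and return the new issue's web URL."""
    req = build_open_issue_request(repo, title, body, labels, token)
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)["html_url"]
```

The `gh` route is roughly `gh issue create --repo dungeonbooks/ops --title ... --body ... --label ...`; either way, the skill only has to supply title, body, and labels.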

Skills (first batch)

Each is a ~/.hermes/skills/<name>/SKILL.md with frontmatter + procedure markdown. Not Python.

policy-quote — given a customer question, pick the right slug from the policies index, fetch via MCP, answer in-voice. Quote policy directly, never promise exceptions.
escalate-issue — structured escalation. Builds title + body with conversation context, opens issue via github MCP, optionally pings via notify_human. Used when policy is exceeded, ambiguous, or the customer pushes for an exception.
ticket-lookup — Hi.Events ticket query, returns refund-window status, suggests policy-quote (event-ticket-policy) or escalate-issue based on result.
square-day-summary — daily reconciliation. Recent orders, top items, anomalies. Output: short markdown report. Cron-attached (see below).
trending-books-research (later) — extends rss-as-the-entry-point-for-agent-research-jobs. Reads trending feeds, cross-references inventory + sales velocity, drafts an order-recommendation issue. Runs weekly, output is an escalation.
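The ticket-lookup decision can be written as a small pure function, which also makes it easy to cover in the eval set. Everything here is hypothetical: the refund rule is modeled as “refunds close `refund_cutoff` before the event starts”, but the actual cutoff has to come from the event-ticket-policy text, and `wants_exception` stands in for however the skill detects a customer pushing for an exception.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import Optional


@dataclass
class TicketLookup:
    status: str                        # "open", "closed", or "unknown"
    next_skill: str                    # "policy-quote" or "escalate-issue"
    policy_slug: Optional[str] = None  # which policy to quote, if any


def route_ticket(event_start: Optional[datetime], now: datetime,
                 refund_cutoff: timedelta, wants_exception: bool = False) -> TicketLookup:
    """Pick the follow-up skill for a refund question (hypothetical rule)."""
    if event_start is None:
        # Ticket not found or event date missing: ambiguous, so hand to a human.
        return TicketLookup("unknown", "escalate-issue")
    if now < event_start - refund_cutoff:
        return TicketLookup("open", "policy-quote", "event-ticket-policy")
    if wants_exception:
        # Outside the window and the customer is pushing: never promise, escalate.
        return TicketLookup("closed", "escalate-issue")
    return TicketLookup("closed", "policy-quote", "event-ticket-policy")
```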

The skill format relies on Hermes’ progressive disclosure: the listing fits in ~3k tokens, and full content loads only on selection. Cheap to have many.
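The disclosure split can be sketched as two functions: one builds the compact listing from each SKILL.md’s frontmatter, and one loads a full body only when a skill is selected. Assumptions: frontmatter uses `name` and `description` keys, and the toy parser below handles only flat `key: value` lines. Real frontmatter is YAML and deserves a real YAML parser.

```python
from pathlib import Path


def split_frontmatter(text: str) -> tuple[dict[str, str], str]:
    """Split '---'-delimited frontmatter (flat `key: value` lines only) from the body."""
    lines = text.splitlines()
    if not lines or lines[0].strip() != "---":
        return {}, text
    meta: dict[str, str] = {}
    for i, line in enumerate(lines[1:], start=1):
        if line.strip() == "---":
            return meta, "\n".join(lines[i + 1:])
        key, sep, value = line.partition(":")
        if sep:
            meta[key.strip()] = value.strip()
    return {}, text  # no closing delimiter: treat everything as body


def skill_listing(skills_dir: Path) -> str:
    """Compact listing (name + description) shown up front; bodies stay on disk."""
    entries = []
    for skill_md in sorted(skills_dir.glob("*/SKILL.md")):
        meta, _ = split_frontmatter(skill_md.read_text(encoding="utf-8"))
        name = meta.get("name", skill_md.parent.name)
        entries.append(f"- {name}: {meta.get('description', '')}")
    return "\n".join(entries)


def load_skill(skills_dir: Path, name: str) -> str:
    """Full procedure, loaded only once the agent selects this skill."""
    text = (skills_dir / name / "SKILL.md").read_text(encoding="utf-8")
    _, body = split_frontmatter(text)
    return body
```

Only `skill_listing` output is paid for on every turn, which is why adding skills stays cheap.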

Cron jobs

Daily 6am ET — square-day-summary skill, output to #ops Discord channel.
Weekly Monday 8am ET — trending-books-research skill (once shipped), output as an issue in dungeonbooks/ops.
Weekly Friday 10am ET — RSS digest skill replacing existing feeds.py plumbing. Migrate after Discord adapter parity is reached.

Hermes cron creates a fresh agent per job and attaches the skill. State doesn’t leak between runs.
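Hermes’ scheduler is what actually fires these jobs; the sketch below only pins down what a wall-clock schedule like “daily 6am ET” or “Monday 8am ET” means, since ET shifts between UTC-5 and UTC-4. It is a hypothetical helper, not Hermes’ cron API.

```python
from datetime import datetime, timedelta, tzinfo
from typing import Optional


def next_run(now: datetime, hour: int, tz: tzinfo,
             weekday: Optional[int] = None) -> datetime:
    """Next run at `hour`:00 local time in `tz`, daily or on `weekday` (Mon=0).

    For "ET", pass zoneinfo.ZoneInfo("America/New_York"): ZoneInfo computes the
    UTC offset per datetime, so adding days keeps the run at the same local hour
    across DST changes.
    """
    local = now.astimezone(tz)
    candidate = local.replace(hour=hour, minute=0, second=0, microsecond=0)
    if weekday is not None:
        # Move forward to the requested weekday (0 days if it is already today).
        candidate += timedelta(days=(weekday - candidate.weekday()) % 7)
    if candidate <= local:
        # Today's (or this week's) slot already passed: take the next one.
        candidate += timedelta(days=1 if weekday is None else 7)
    return candidate
```

The function works with any `tzinfo`; a fixed offset is fine for testing, but production schedules should use the named zone so DST is handled.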

Discord transport

Hermes has a built-in Discord adapter. The current Marty repo has Discord-specific richness (MTG card embeds, ISBN→bookshop.org links, threaded RPG news digests) the generic adapter won’t replicate.

Hybrid for now.

Keep the marty repo running in its current Discord guild role for embed-rich book/MTG features (#book-recs, #mtg-cards, #rpg-news).
New ops Hermes profile takes a separate Discord channel (#ops or DMs to Panat/Carrie). No embed work needed, just text + escalation links.
New marty Hermes profile (the rewrite) ships only when it can match or exceed the existing repo’s embed quality. Probably 4-6 weeks out, possibly never if hybrid stays useful.

This avoids the trap of breaking the things current-Marty does well in pursuit of a unified agent. Two surfaces, different jobs.

Migration from existing Marty repo

Goal: don’t break what works. Port what fits.

Stays in current marty repo (for now):

Discord transport for #book-recs, #mtg-cards, #rpg-news.
MTG card embed renderer (Scryfall + price lookups).
ISBN→bookshop.org affiliate validation.
Existing RSS digest plumbing (feeds.py) until cron-attached skill version proves out.
Postgres conversations table for current Discord users.

Moves to Hermes:

Customer support flows (policies, refunds, escalation) — new surface, not a port.
Back-office ops (reconciliation, member lookup, ticket lookup) — new surface, not in marty today.
Scheduled research jobs (trending books) — new, was planned in marty-roadmap but never built.
Carrie’s day-to-day chat with the ops layer — new surface.

Consolidate eventually, if the Hermes Discord adapter plus custom embed-skill work matures enough to replace current-Marty’s Discord features. Defer until there’s a reason.

Observability

Hermes’ built-in trajectory + session storage is enough for weeks 1-2.
Self-hosted Langfuse on a second LXC when the questions get sharper than “what did the agent do” — e.g., per-skill cost attribution, eval-set regression tracking, escalation throughput. See observability.
Fireworks dashboard for raw token-spend monitoring. Confirm cache hit rates after week 1 of usage.
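Per-skill cost attribution is mostly bookkeeping once each model call is tagged with the skill that made it. A sketch over a hypothetical `Usage` record (neither Hermes’ session schema nor Langfuse’s): input rates default to the cached/uncached prices from Provider config, and the output rate is a required argument because this note doesn’t state it.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass
class Usage:
    """One model call's token usage, tagged with the originating skill (hypothetical)."""
    skill: str
    cached_input: int
    uncached_input: int
    output: int


def cost_by_skill(calls: list[Usage], output_per_m: float,
                  cached_per_m: float = 0.16,
                  uncached_per_m: float = 0.95) -> dict[str, float]:
    """Total USD per skill; all rates are USD per 1M tokens."""
    totals: dict[str, float] = defaultdict(float)
    for c in calls:
        totals[c.skill] += (c.cached_input * cached_per_m
                            + c.uncached_input * uncached_per_m
                            + c.output * output_per_m) / 1_000_000
    return dict(totals)
```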

Backup strategy

Everything stateful lives in ~/.hermes/:

config.yaml, .env, SOUL.md files — git these (minus .env).
memories/MEMORY.md, USER.md — irreplaceable accumulated context. Snapshot nightly.
sessions/*.db — SQLite session history. Snapshot nightly.
skills/*/SKILL.md — git these.
logs/ — discardable.

Proxmox snapshot the LXC volume nightly. Retain 14 days. The thing that hurts to lose is what Hermes has learned about Carrie and the shop over time.
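The Proxmox volume snapshot covers everything; an application-level copy of the irreplaceable parts is a cheap second layer. A plain file copy of a SQLite database that is being written can capture a torn state, while SQLite’s online backup API (`sqlite3.Connection.backup` in Python’s standard library) produces a consistent copy. A sketch, assuming the `memories/` and `sessions/*.db` layout above and a dated folder per night; the backup root is up to you.

```python
import shutil
import sqlite3
from datetime import date
from pathlib import Path


def snapshot(hermes_home: Path, backup_root: Path, today: date) -> Path:
    """Copy memories/ and a consistent copy of each sessions/*.db into backup_root/<date>/."""
    dest = backup_root / today.isoformat()
    (dest / "sessions").mkdir(parents=True, exist_ok=True)
    memories = hermes_home / "memories"
    if memories.is_dir():
        shutil.copytree(memories, dest / "memories", dirs_exist_ok=True)
    sessions = hermes_home / "sessions"
    db_paths = sorted(sessions.glob("*.db")) if sessions.is_dir() else []
    for db_path in db_paths:
        src = sqlite3.connect(db_path)
        dst = sqlite3.connect(dest / "sessions" / db_path.name)
        try:
            # Online backup API: consistent even while Hermes has the DB open.
            src.backup(dst)
        finally:
            dst.close()
            src.close()
    return dest


def prune(backup_root: Path, today: date, keep_days: int = 14) -> None:
    """Delete dated snapshot folders that are `keep_days` or more days old."""
    for child in backup_root.iterdir():
        try:
            taken = date.fromisoformat(child.name)
        except ValueError:
            continue  # not a dated snapshot folder; leave it alone
        if (today - taken).days >= keep_days:
            shutil.rmtree(child)
```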

Sequencing

Week 1 — stand up the substrate.

Create LXC, install Hermes, configure Fireworks K2.6.
One profile (ops), one MCP server (policies), one skill (policy-quote).
Talk to it via CLI. Get the loop working end-to-end. No Discord yet.

Week 2 — back-office MVP.

hi-events and square MCP servers.
Skills: escalate-issue, ticket-lookup, square-day-summary.
dungeonbooks/ops repo created for escalation target.
First cron job: daily Square summary to #ops.

Week 3 — customer-facing.

Hermes Discord adapter wired for ops profile (private channel only).
Second profile (marty) configured but not yet on Discord.
Eval harness: golden cases from the Carrie interview, replayed against both profiles.

Week 4 — research jobs.

Trending-books RSS skill, weekly cron.
hardcover MCP, port Scryfall logic if/when needed for skills.
Decide: does the new marty profile go live on Discord, or does current-Marty stay?

Week 5+ — consolidation decisions.

Langfuse if needed.
Postgres-domain MCP if needed.
Migration path for current-Marty Discord features only if hybrid friction emerges.

What this changes about marty-roadmap

Replaces: Claude Agent SDK refactor (weeks 1-2). Hermes is the agent loop.
Replaces: custom src/tools/escalation/, src/tools/policy/, src/tools/square/, src/tools/hi_events/. These become MCP servers + Hermes skills.
Replaces: persona-as-YAML primitive. SOUL.md per profile is the equivalent.
Replaces: custom Langfuse wiring as a week-1 priority. Hermes’ built-in observability covers v1.
Preserves: escalation-as-liberal-paternalism pattern. The shape is the same; implementation moves to MCP + skill.
Preserves: scheduled-research-job pattern. Hermes cron + skills is the cleaner version of the RSS-extension plan.
Defers indefinitely: persona-as-YAML as platform tenancy primitive, multi-tenant deployment story, Phil-onboarding-readiness. Out of scope until partner-shop demand materializes. The platform-vs-application reframe in the roadmap was a fork in the road; deploying Hermes commits to “application for Dungeon Books today, platform questions later.”

Open questions

Persona quality on K2.6 vs Sonnet for Marty’s voice. Eval-test before declaring parity.
Hermes Discord adapter feature parity. Does it support threading the way marty does? Custom embeds? Streaming responses? Find out before committing migration.
Cross-thread memory tier on Nous Portal vs Hermes’ built-in. Built-in is sufficient on persistent infra (Proxmox snapshots), but worth re-evaluating if memory becomes a product surface (e.g., Carrie wanting to query “what did we decide last week”).
Where does the current marty Postgres conversations data go? If/when the Hermes marty profile takes over Discord, do we migrate conversation history into Hermes sessions, or start fresh? Probably start fresh — different schema, different memory model.
Operator UX for editing skills and SOUL.md. Today: SSH into the LXC, edit markdown. Eventually: Carrie should be able to tweak Marty’s voice or add a policy without engineer help. Defer until the shape is stable.
What happens if Fireworks deprecates K2.6 mid-deployment? Mitigation: the model is a one-line config change, and routing through OpenRouter or Moonshot is the obvious fallback.

Cross-references

marty-roadmap — the prior plan this supersedes for back-office and customer-support work.
agent-coordinated-operations — broader thesis. Hermes deployment is the first concrete instance.
symphony-pattern-agent-control-plane — escalation target (dungeonbooks/ops) is the first node of this loop.
policies — the policy directory the policies MCP serves.
hi-events-api — Hi.Events MCP source.
square-integration — Square MCP source.
marty — current state of the Marty Discord bot.
2026-05-02 — Carrie interview that surfaced the support-routing gap; eval set seed.
