Pyramus & Thisbe: Hermes Puppet Theater
Submission for the Hermes Agent Creative Hackathon (Nous Research × Kimi Moonshot). Due EOD Sunday 2026-05-03.
Context
Multiple Hermes Agent instances take on the roles of Shakespeare’s rude mechanicals (Quince, Bottom, Flute, Snout, Snug, Starveling) and stage The Most Lamentable Comedy and Most Cruel Death of Pyramus and Thisbe as a papercraft puppet show in the style of Paper Mario: The Thousand-Year Door. A Kimi agent plays the Athenian court audience providing live commentary.
The piece is metatheatrical commentary on the AI industry’s own theatrics. The humor is Shakespeare’s original humor: agents that refuse to trust theatrical illusion, over-literalize everything, break character to reassure the audience. The crude-on-purpose puppetry makes the conceit legible in a demo video.
Core thesis: the stage is the only coordination surface. There is no director. Agents watch the shared event log (the theater) and take their cues from the script plus what other agents have just done. This models actual theater: rehearse, then perform without direction.
Qualifies for both the Main Track (5k) and the Kimi Track.
Creative premise
The mechanicals as agents, diegetically: They have Quince’s script (actual Shakespeare P&T text from Act 5 Sc 1). Each agent’s persona corrupts delivery and staging in character-specific ways:
- Quince — playwright/director, precious about his text, mispunctuates his own prologue
- Bottom — ham, wants to play every role, improvises “improvements,” upstages
- Flute — reluctantly plays Thisbe, reads stage directions aloud, rushes lines
- Snout — plays Wall, narrates his own wall-ness, no faith in audience imagination
- Snug — plays Lion, announces he’s Snug-the-joiner before roaring so as not to frighten ladies
- Starveling — plays Moonshine, over-explains his lantern/dog/thornbush
Failure modes are baked into persona, not prompted as “make mistakes.” Comedy emerges from earnest agents colliding.
Kimi audience (Theseus, Hippolyta, Philostrate): Reacts in real-time. Theseus indulgent, Hippolyta mocking (“This is the silliest stuff that ever I heard”), Philostrate deadpan. Natural heckling loop — canonical to the play.
Model assignment (heterogeneous by design)
Each agent literally is a different open/agentic model. The submission story: six+ earnest heterogeneous agents misperforming Shakespeare together, coordinating only via the stage.
| Agent | Model (production takes) |
|---|---|
| Quince | Z.ai GLM 5 |
| Bottom | OpenAI GPT-5.4 (verbose, suits the ham) |
| Flute | Google Gemini 3 Flash |
| Snout | Mistral Small 4 |
| Snug | MiniMax M2.7 |
| Starveling | Anthropic Sonnet 4.6 |
| Court (Kimi audience) | Moonshot Kimi K2.5 (free; satisfies Kimi Track) |
Prototype phase: all agents on Mistral Small 4 for fast iteration. Swap in heterogeneous cast for curated takes.
Models accessed via OpenRouter (single API surface). Nous Portal API ($10 credits) used for any Nous-hosted models. All models routed through Hermes Agent’s config-driven provider system (hermes model).
Expected total token spend across all takes: well under $20.
System architecture
              ┌───────────────────────────┐
              │  Stage MCP server (Python)│
              │  ───────────────────────  │
              │  observe(since, timeout)  │◄────┐
              │  act(tool, args)          │     │ long-polling
              │  event log (JSONL)        │     │ + tool calls
              │  WebSocket broadcast      │     │
              └─────────┬─────────────────┘     │
                        │                       │
    ┌───────────────────┴──────────────┐        │
    │ 7 isolated Hermes Agent instances│────────┘
    │ HERMES_HOME=~/.hermes-{role}     │
    │ each with own model + persona    │
    │ (Quince, Bottom, Flute, Snout,   │
    │  Snug, Starveling, Court)        │
    └──────────────────────────────────┘
                        │
                        ▼ WebSocket
          ┌────────────────────────────┐
          │ Browser stage (vanilla JS) │
          │ DOM puppets + CSS anim     │
          │ animalese audio per say    │
          │ typewriter bubble reveal   │
          └────────────────────────────┘
The stage is the only coordination surface. No director process. Each agent loops: observe() → decide → optionally act() → repeat. Long-polling observe(since, timeout=30s) lets an idle agent block on the server until something happens, so waiting burns no tokens.
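The observe → decide → act loop can be sketched in a few lines of Python. This is a minimal in-memory illustration, not the Stage MCP server itself: the Stage class, the decide callback (a stand-in for the model call), and the use of a "curtain" tool name as the stop signal are all assumptions made for the example. In the real system the loop would live inside each Hermes Agent's own tool loop.

```python
import threading
import time


class Stage:
    """In-memory stand-in for the shared event log (hypothetical class)."""

    def __init__(self):
        self._events = []
        self._cond = threading.Condition()

    def act(self, agent_id, tool, args):
        """Append one event and wake every blocked observer."""
        with self._cond:
            event = {
                "event_id": len(self._events) + 1,  # ids are 1-based and dense
                "timestamp": time.time(),
                "agent_id": agent_id,
                "tool": tool,
                "args": args,
            }
            self._events.append(event)
            self._cond.notify_all()
            return event

    def observe(self, since=0, timeout=30.0):
        """Return events with event_id > since, blocking up to `timeout` seconds."""
        with self._cond:
            self._cond.wait_for(lambda: len(self._events) > since, timeout=timeout)
            return self._events[since:]


def run_agent(stage, agent_id, decide, timeout=30.0):
    """observe -> decide -> optionally act, until a curtain event is seen."""
    cursor = 0
    while True:
        events = stage.observe(since=cursor, timeout=timeout)
        if events:
            cursor = events[-1]["event_id"]
        if any(e["tool"] == "curtain" for e in events):
            return cursor
        action = decide(agent_id, events)  # assumption: returns (tool, args) or None
        if action is not None:
            tool, args = action
            stage.act(agent_id, tool, args)
```

An observe call that times out returns an empty list, so a quiet stage just sends the agent back into the wait with the same cursor.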
Determinism: every event appended to JSONL with {event_id, timestamp, agent_id, tool, args}. Stage renderer is a pure function of log → visual state. Enables replay, curation, post-hoc editing.
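As a rough illustration of "renderer as a pure function of the log", the sketch below appends events as JSON lines and folds them into puppet state. The function names, the shape of the state dict, and the assumption that a say event animates the puppet whose id equals the event's agent_id are all hypothetical; only the event fields and tool names come from this plan.

```python
import json


def append_event(path, event):
    """Append one event to the log as a single JSON line."""
    with open(path, "a", encoding="utf-8") as f:
        f.write(json.dumps(event, sort_keys=True) + "\n")


def load_events(path):
    """Read the whole log back as a list of event dicts."""
    with open(path, encoding="utf-8") as f:
        return [json.loads(line) for line in f if line.strip()]


def render_state(events):
    """Fold the log into visual state. No clocks, no randomness, no I/O."""
    puppets = {}
    for ev in sorted(events, key=lambda e: e["event_id"]):
        tool, args = ev["tool"], ev["args"]
        if tool == "enter":
            puppets[args["puppet_id"]] = {
                "side": args["side"], "x": None, "y": None, "bubble": None,
            }
        elif tool == "exit":
            puppets.pop(args["puppet_id"], None)
        elif tool == "move" and args["puppet_id"] in puppets:
            puppets[args["puppet_id"]].update(x=args["x"], y=args["y"])
        elif tool == "say" and ev["agent_id"] in puppets:
            # assumption: the speaker's puppet_id equals its agent_id
            puppets[ev["agent_id"]]["bubble"] = args["text"]
    return puppets
```

Because the fold sorts by event_id and reads nothing but the events, replaying the same file always yields the same state, which is what makes curation and post-hoc edits of a take safe.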
Generation is offline. Run many takes, curate the funniest/most coherent one, render the curated log for the demo video. No liveness fragility.
Hermes Agent integration (concrete)
From repo exploration (/home/panat/projects/dungeonbooks/hermes-agent):
- Isolation: HERMES_HOME env var (hermes_constants.py:11-18). One directory per agent. All config/memory/sessions/skills scoped to that dir.
- Programmatic driving: run_agent.py exports AIAgent.run_conversation(prompt), which runs the full model→tools loop until the agent decides to stop. Each instance is booted once with its character brief; the agent’s own tool loop carries the performance.
- MCP: each agent’s config.yaml under mcp_servers points at our Stage MCP server (stdio or HTTP). Stage tools appear as native tools to the agent.
- No cron, no delegate, no subagents needed: the long-polling observe + internal agent loop is sufficient.
Discord gateway and ACP protocol exist but are unused here. Submission is local-only.
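The per-role isolation can be sketched as a small helper that builds one environment per cast member. This only illustrates the HERMES_HOME=~/.hermes-{role} convention from the architecture diagram; the helper names and lowercase role ids are assumptions, and the command used to launch each agent is deliberately left out because it is not specified here.

```python
import os
from pathlib import Path

# Seven instances: six mechanicals plus the court (role ids are illustrative).
ROLES = ["quince", "bottom", "flute", "snout", "snug", "starveling", "court"]


def hermes_home(role, base=None):
    """Directory that scopes one agent's config, memory, sessions, and skills."""
    root = Path(base) if base is not None else Path.home()
    return root / f".hermes-{role}"


def role_env(role, base=None, environ=None):
    """Copy of the environment with HERMES_HOME pointed at this role's directory."""
    env = dict(os.environ if environ is None else environ)
    env["HERMES_HOME"] = str(hermes_home(role, base))
    return env
```

Each dict returned by role_env could be passed as the env argument of subprocess.Popen when starting that role's process, so no two agents ever share state on disk.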
Agent skills (Stage MCP tool surface)
Minimal — enough to stage the play, not so much agents get lost.
| Tool | Args | Effect |
|---|---|---|
| observe | since, timeout=30s | Long-polling read. Returns events since cursor, or blocks until new events/timeout |
| say | text, emotion? | Dialogue bubble above puppet + animalese audio |
| move | puppet_id, x, y, duration | Slide puppet on stage |
| enter | puppet_id, side (L/R) | Puppet slides in from wing |
| exit | puppet_id, side | Puppet slides out |
| gesture | puppet_id, type (bow, point, cower, die) | Canned animation |
| face | puppet_id, direction (L/R) | Flip sprite |
| prop | prop_id, action (show/hide/give/take) | Wall, mulberry tree, sword, mantle |
| aside | text | Bubble directed at audience, not other puppets |
| narrate | text | Fourth-wall break (Snout and Starveling lean on this) |
Kimi court gets heckle(text, from).
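One way to keep the tool surface small and checkable is to write the table down as data and validate every act against it before it reaches the log. The sketch below is an assumption about how that might look, not the server's actual schema: the TOOLS and ENUMS dicts, the validate function, the "court" agent id, and the choice to treat exit's side as L/R are all illustrative.

```python
# Required and optional argument names per tool, transcribed from the table.
TOOLS = {
    "observe": {"required": {"since"}, "optional": {"timeout"}},
    "say": {"required": {"text"}, "optional": {"emotion"}},
    "move": {"required": {"puppet_id", "x", "y", "duration"}, "optional": set()},
    "enter": {"required": {"puppet_id", "side"}, "optional": set()},
    "exit": {"required": {"puppet_id", "side"}, "optional": set()},
    "gesture": {"required": {"puppet_id", "type"}, "optional": set()},
    "face": {"required": {"puppet_id", "direction"}, "optional": set()},
    "prop": {"required": {"prop_id", "action"}, "optional": set()},
    "aside": {"required": {"text"}, "optional": set()},
    "narrate": {"required": {"text"}, "optional": set()},
    "heckle": {"required": {"text", "from"}, "optional": set()},
}

# Closed value sets; exit's side is assumed to mirror enter's L/R.
ENUMS = {
    ("enter", "side"): {"L", "R"},
    ("exit", "side"): {"L", "R"},
    ("face", "direction"): {"L", "R"},
    ("gesture", "type"): {"bow", "point", "cower", "die"},
    ("prop", "action"): {"show", "hide", "give", "take"},
}


def validate(agent_id, tool, args):
    """Return a list of problems with a tool call; an empty list means it is valid."""
    spec = TOOLS.get(tool)
    if spec is None:
        return [f"unknown tool: {tool}"]
    errors = []
    if tool == "heckle" and agent_id != "court":
        errors.append("heckle is reserved for the court")
    given = set(args)
    missing = spec["required"] - given
    unknown = given - spec["required"] - spec["optional"]
    if missing:
        errors.append("missing args: " + ", ".join(sorted(missing)))
    if unknown:
        errors.append("unknown args: " + ", ".join(sorted(unknown)))
    for name, value in args.items():
        allowed = ENUMS.get((tool, name))
        if allowed is not None and value not in allowed:
            errors.append(f"{name} must be one of {sorted(allowed)}")
    return errors
```

Returning a list of problems rather than raising lets the server hand the whole list back to the agent as a tool result, so a confused mechanical can correct itself on its next turn.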
Coordination concerns + mitigations
- Deadlock (everyone waiting for someone): Script provides explicit cues — every line has a next-speaker implied in the text. Quince opens unconditionally with prologue. Bottom’s persona includes “ad-lib if silent too long” as a safety valve.
- Collision (simultaneous acts): Stage MCP serializes by arrival. Second writer sees their event landed after someone else’s and re-evaluates on next observe. Stepping-on is in-genre comedy.
- Cost runaway: Long-polling observe (no tokens burned during wait), short “pass” generations, hard take budget (500 events or 10 min wall-time), curtain event as terminal signal.
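Serialization by arrival and the hard take budget can live in the same critical section. The following is a minimal sketch under stated assumptions: the Take class, the "stage" agent id used for the forced curtain, and the injectable clock are hypothetical; the 500-event and 10-minute limits and the curtain-as-terminal rule come from the list above.

```python
import threading
import time


class Take:
    """One performance attempt: orders acts by arrival and enforces the budget."""

    def __init__(self, max_events=500, max_seconds=10 * 60, clock=time.monotonic):
        self._lock = threading.Lock()
        self._clock = clock
        self._started = clock()
        self._max_events = max_events
        self._max_seconds = max_seconds
        self.events = []
        self.closed = False

    def submit(self, agent_id, tool, args):
        """Append an act in arrival order; returns None once the curtain is down."""
        with self._lock:
            if self.closed:
                return None
            out_of_events = len(self.events) >= self._max_events - 1
            out_of_time = self._clock() - self._started >= self._max_seconds
            if out_of_events or out_of_time:
                # Budget exhausted: the stage drops the curtain in place of the act.
                agent_id, tool, args = "stage", "curtain", {"reason": "budget"}
            event = {
                "event_id": len(self.events) + 1,
                "timestamp": time.time(),
                "agent_id": agent_id,
                "tool": tool,
                "args": args,
            }
            self.events.append(event)
            if tool == "curtain":
                self.closed = True
            return event
```

The returned event carries the event_id the act actually received, so a second writer can see that someone else landed first and re-evaluate on its next observe.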
Visual system
- Reference: Paper Mario: TTYD — proscenium, curtain, painted backdrop, foreground audience silhouettes, flat-cutout characters in shallow z-stack.
- Tech: Vanilla JS + CSS transforms, absolutely-positioned <img> cutouts. No Pixi, no WebGL. Crude is the aesthetic.
- Idle animation: subtle paper-wobble (2-3° rotation, 4s loop, staggered per puppet), the cheapest trick that sells the papercraft illusion.
- Dialogue bubbles: anchored to puppet position, typewriter character-by-character reveal.
- Audio: animalese.js — Animal Crossing-style per-character blips synced to typewriter reveal. Per-puppet pitch mapping gives each mechanical a distinct vocal texture. No TTS.
- Scene transitions: curtain drop + backdrop swap.
Asset pipeline
Generator: Midjourney. Single consistent style pass (one seed/style reference applied to all generations) for cohesive paper-cutout look.
Style prompt spine: “paper cutout puppet, flat, visible paper texture and edge shadows, hand-painted gouache, Paper Mario Thousand-Year Door aesthetic, [subject] on transparent background”
Asset manifest (~30):
Puppets (~15):
- Quince (neutral, reading scroll, gesturing)
- Bottom (as himself, as Pyramus)
- Flute (as himself, as Thisbe in a dress he hates)
- Snout (as himself, as Wall — cardboard wall held in front)
- Snug (as himself, as Lion — lion mask held up)
- Starveling (as himself, as Moonshine — lantern + dog + thornbush)
Court audience (3): Theseus, Hippolyta, Philostrate — stylized foreground silhouettes.
Props (~6): mulberry tree, sword, Thisbe’s mantle (clean + bloodied), lantern, scroll, curtain.
Backdrops (~3): Athenian great hall, painted forest, curtain down.
All generated in days 1-2.
Scope
Must ship:
- Prologue (Quince mispunctuates)
- Wall scene (Snout over-explains)
- Lion entrance (Snug reassures audience)
- Pyramus’s death (Bottom hams)
- Thisbe’s death (Flute rushes)
- Kimi court heckling throughout
Stretch:
- Rehearsal prelude scene
- Multiple-takes montage in video (showing emergent variation)
Timeline (16 days → 2026-05-03)
| Days | Milestone |
|---|---|
| 1-2 | Asset generation (Midjourney), style lock, manifest complete |
| 3-4 | Stage MCP server (observe long-polling, act tools, event log, WebSocket broadcast) |
| 4-5 | Browser stage: DOM puppets, CSS transitions, bubbles, typewriter, animalese, scene backdrops |
| 6-8 | 7 Hermes Agent configs (HERMES_HOME each), character briefs, Stage MCP wired, single-agent smoke test |
| 8-10 | Full cast runs — prototype on Mistral Small, then heterogeneous models. Iterate character briefs |
| 10-12 | Curate takes. Post-hoc edits to event log if needed |
| 13-14 | Final render, screen capture, edit demo video |
| 15 | Tweet + writeup, open-source repo, buffer |
| 16 | Submit (tweet @NousResearch + Discord post) |
Submission artifact
- Demo video (2-4 min): cold-open mid-performance with court heckling, then reveal the agent system (shared event log visualized, different models labeled per puppet), closing beat on metatheatrical commentary.
- Tweet tagging @NousResearch, Kimi usage proven on-screen (K2.5 model tag visible in court heckling).
- Open-source repo at submission: Stage MCP server + browser stage + 7 character briefs + Hermes configs. Anyone with Hermes Agent installed can reproduce.
Verification
- Replay curated event log → stage renders deterministically, no drift.
- Fresh run → all agents complete one scene without deadlock; at least one take yields full performance through curtain.
- Kimi court heckles at least 3× per scene.
- Screen capture: clean 1080p @ 30fps.
- Kimi K2.5 visible on-screen (Kimi Track eligibility).
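Two of these checks (a take runs through to curtain, and the court heckles at least 3 times per scene) can be automated against a recorded log. The sketch below assumes a hypothetical "scene" event with a name argument marking scene boundaries, which this plan does not define, and treats the last event in a complete take as the curtain.

```python
def heckles_per_scene(events):
    """Count heckle events between scene markers (the 'scene' event is hypothetical)."""
    counts = {}
    scene = None
    for ev in events:
        if ev["tool"] == "scene":
            scene = ev["args"]["name"]
            counts.setdefault(scene, 0)
        elif ev["tool"] == "heckle" and scene is not None:
            counts[scene] += 1
    return counts


def check_take(events, min_heckles=3):
    """Return a list of verification failures; an empty list means the take passes."""
    problems = []
    if not events or events[-1]["tool"] != "curtain":
        problems.append("take does not end on a curtain event")
    for scene, n in heckles_per_scene(events).items():
        if n < min_heckles:
            problems.append(f"scene {scene!r}: {n} heckles, expected at least {min_heckles}")
    return problems
```

Running check_take over every recorded take gives a quick filter before the slower, human step of picking the funniest one.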
Hosting
Local only for hackathon. Future: stage-only replay could go on Railway (stream a curated log, no live agents) — abuse-resistant since there’s nothing to prompt.