2026-05-04 — Square ManagerBot test, Guild boundary, Edelweiss inventory drift, Concourse take-home delivered

Summary

Square launched ManagerBot (agentic merchant assistant in the Seller Dashboard, open beta). First read was that it invalidated parts of the Guild roadmap around inventory and insights. Pushed back on that, then ran a live three-question test on the shop’s data to map the actual capability boundary. Result: Guild not invalidated; ManagerBot becomes free QA + a sharper demo contrast. Side effect: dead-stock query surfaced what looks like Edelweiss Omnibus pushing wrong inventory counts.

What ManagerBot does and doesn’t

Captured fully in managerbot-boundary. Headline:

  • Does: sales/items/customer trends, POs from packing lists, count updates, bulk catalog edits, schedules, email campaigns, labor-vs-revenue analysis, “Pulse” proactive brief. Dashboard-only, no API.
  • Doesn’t: ISBN, publisher, pub date, format, series, distributor, return windows, cost/COGS, membership cohorts, anything customer-facing.

Live test results

Q1: “Items not sold in 180 days, still in stock”

Aced it. 586 items, ranked by units. Spotted clusters, but the ranking buried books under sidelines.

Q2: “Of those, which are books? Top 20 with cost value”

Identified books cleanly, parsed authors from item names, spotted Amanda Leigh duplication. Failed honestly on cost: “no access to COGS.” Showed retail 1,100.

Top dead books had high units that look wrong. Triggered the Edelweiss suspicion.

What it missed

Cost (no publisher discount model), series awareness, publisher grouping, pub-date / return-window flag, format-aware action. It also gave the wrong recommendation (“bundle dead books”); the bookstore playbook is return/markdown.

Strategic read

  • Guild not invalidated. Guild is member-facing; ManagerBot is merchant-side.
  • Generic merchant-dashboard products are dead. Pulse occupies that surface. Cut anything framed as “better Square dashboard for owners.”
  • Bookstore-specific inventory/insights still wide open. Square has no domain layer.
  • ManagerBot = free QA oracle for Guild. Any Guild metric must reconcile to ManagerBot on shared primitives. Mismatches mean a Guild bug or a Square data quirk worth knowing.
  • Demo writes itself. Run three questions on a pilot shop’s account: one ManagerBot aces, one partial, one clean failure. Then show Guild’s answer to the failure. See managerbot-boundary for the demo sequence.
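A minimal sketch of the reconcile rule, assuming the ManagerBot side is typed in by hand from the Dashboard (it has no API). The metric names and numbers are hypothetical, not from the shop’s data.

```python
def reconcile(guild, managerbot, tolerance=0):
    """Compare metrics both tools can compute and return the mismatches.

    guild / managerbot: dicts of metric name -> number. Only names present
    in both are compared, since ManagerBot has no bookstore-domain metrics.
    """
    mismatches = {}
    for name in sorted(guild.keys() & managerbot.keys()):
        delta = guild[name] - managerbot[name]
        if abs(delta) > tolerance:
            mismatches[name] = {
                "guild": guild[name],
                "managerbot": managerbot[name],
                "delta": delta,
            }
    return mismatches


# Hypothetical values for illustration only.
guild_metrics = {"dead_stock_items_180d": 120, "units_sold_30d": 412, "series_gaps": 9}
managerbot_metrics = {"dead_stock_items_180d": 120, "units_sold_30d": 415}

report = reconcile(guild_metrics, managerbot_metrics)
```

Here report flags only units_sold_30d. Each flagged metric is then either a Guild bug or a Square data quirk to write down.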

Edelweiss Omnibus inventory drift (new)

ManagerBot’s dead-book list had unit counts that look too high. Suspect the Omnibus → Square sync is double-counting, or adding to the existing count instead of overwriting it as the authoritative value. Captured in edelweiss-omnibus-inventory-drift. Needs a physical count on 3-5 suspect titles to confirm direction and mechanism.

This matters beyond hygiene: if Square’s counts are unreliable, Guild’s analytics layer reads wrong data and the entire “bookstore-aware insights on top of Square” pitch breaks on first demo. Inventory authority needs to be settled before Phil pilot.
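A toy illustration of the two suspected sync behaviors, to show why the physical count matters. The function names and the "Omnibus overwrites" rule are assumptions for the sketch; the real sync mechanism is still unconfirmed.

```python
def sync_additive(square_count, omnibus_count):
    # Suspected bug: each sync adds the feed's count to what Square already has.
    return square_count + omnibus_count


def sync_authoritative(square_count, omnibus_count):
    # Assumed intended behavior: the feed's count overwrites Square's.
    return omnibus_count


def run_syncs(sync, omnibus_count, runs, start=0):
    count = start
    for _ in range(runs):
        count = sync(count, omnibus_count)
    return count


def drift(physical_count, square_count):
    # Positive means Square overstates stock; negative means it understates.
    return square_count - physical_count


# Hypothetical title: 4 copies on the shelf, 4 in the feed, 3 sync runs.
additive = run_syncs(sync_additive, omnibus_count=4, runs=3)
authoritative = run_syncs(sync_authoritative, omnibus_count=4, runs=3)
```

Under the additive model the count grows with every run; under the overwrite model it stays at the feed value. A shelf count compared with drift() shows which direction Square is off, which is the first thing the physical check needs to settle.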

Things learned

  • “Square just invalidated my roadmap” was wrong on inspection. The reflex panic on big-vendor announcements is rarely the right read — the vendor’s product almost always operates one abstraction level below where the domain product lives.
  • ManagerBot’s failure modes are useful product input, not just competitive intel. The cost gap is the highest-ROI thing Guild can fix immediately (publisher discount schedule + Square cost field → real dollar exposure).
  • Running competitor’s tool on real shop data is faster than reading their docs. Three queries gave a sharper boundary than the help article and the marketing post combined.
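The cost gap reduces to simple arithmetic. A sketch, assuming a per-publisher discount off list price with a Square cost field taking precedence when present. The publishers, discount rates and fallback are hypothetical placeholders, not real trade terms.

```python
from decimal import Decimal

# Hypothetical discount schedule: fraction off list price, by publisher.
DISCOUNTS = {"Publisher A": Decimal("0.46"), "Publisher B": Decimal("0.40")}
DEFAULT_DISCOUNT = Decimal("0.40")  # assumed fallback for unknown publishers


def unit_cost(retail, publisher, square_cost=None):
    """Prefer an explicit Square cost; otherwise estimate from the schedule."""
    if square_cost is not None:
        return Decimal(square_cost)
    discount = DISCOUNTS.get(publisher, DEFAULT_DISCOUNT)
    return (Decimal(retail) * (1 - discount)).quantize(Decimal("0.01"))


def dollar_exposure(units, retail, publisher, square_cost=None):
    """Dollars tied up in stock: units on hand times unit cost."""
    return units * unit_cost(retail, publisher, square_cost)


estimated = dollar_exposure(units=5, retail="20.00", publisher="Publisher A")
from_square = dollar_exposure(units=5, retail="20.00", publisher="Publisher A",
                              square_cost="9.50")
```

With these made-up inputs, five units at $20.00 retail and a 46% discount come to $54.00 of exposure, against the $100.00 retail figure ManagerBot would show.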

Action items

  • (deferred) Physical count on Teo’s Durumi, Malevolent Eight, Revenge Next Door — confirm Edelweiss drift
  • (deferred) Decide inventory authority (Square vs Omnibus) and the sync direction
  • Watch Square dev changelog for ManagerBot API/webhook surface

Reassigned / closed

  • ManagerBot demo capture for Phil pitch — cut
  • Publisher discount schedule data model — scope unclear (Guild vs Emporium). Cost/COGS truth lives in Medusa product records on the Emporium side; Guild only reads it for member-facing analytics. Likely Emporium-owned, pending decision. See emporium / re-inventory-strategy.

Carried over from 2026-05-03

  • Marty PR #27 merged
  • Anthropic prompt caching on Marty system prompt — shipped (PR #32, squash-merged)
  • events.md flipped to publish: true (verified on dungeonbooks/docs main); orders.md still publish: false
  • Discord pinned posts — not now

2026-05-04 evening — Marty Anthropic prompt caching

Summary

Shipped prompt caching on Marty’s Claude calls (PR #32, merged). ~4-5x input-token cost reduction on multi-turn Discord threads, no behavior change, no reliability tradeoff. This is the low-cost win that yesterday’s abandoned Kimi K2.5 migration was supposed to be.

What shipped

  • system= converted from string to a list of two content blocks: (1) base prompt + docs index, (2) customer + time context. Both tagged with cache_control: {"type": "ephemeral"}.
  • Last tool entry tagged with cache_control so the tools block caches too.
  • claude_usage structured log on every Claude call, including cache_creation_input_tokens and cache_read_input_tokens for visibility.
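Roughly what the request shape looks like, as a sketch rather than the code in PR #32. The helper names and the tool definitions are illustrative; only the two-block layout and the cache_control tags come from the notes above.

```python
def build_system_blocks(base_prompt, docs_index, customer_context, current_time):
    """Return system= as two content blocks, each ending in a cache breakpoint."""
    return [
        {   # Block 1: static across turns and customers.
            "type": "text",
            "text": f"{base_prompt}\n\n{docs_index}",
            "cache_control": {"type": "ephemeral"},
        },
        {   # Block 2: changes per customer and whenever the time string changes.
            "type": "text",
            "text": f"{customer_context}\nCurrent time: {current_time}",
            "cache_control": {"type": "ephemeral"},
        },
    ]


def tag_last_tool(tools):
    """Copy the tool list and put a breakpoint on the final entry only."""
    if not tools:
        return []
    tagged = [dict(tool) for tool in tools]
    tagged[-1]["cache_control"] = {"type": "ephemeral"}
    return tagged


# Placeholder tool definitions; the real schemas live in the Marty repo.
tools = [
    {"name": "get_doc", "description": "Fetch a docs page", "input_schema": {"type": "object"}},
    {"name": "hardcover_api", "description": "Query Hardcover", "input_schema": {"type": "object"}},
]

request_kwargs = {
    "system": build_system_blocks("You are Marty.", "docs index...", "customer: guest", "12:00"),
    "tools": tag_last_tool(tools),
}
```

request_kwargs would be passed along with model and messages to the Messages API call. Copying the tool dicts keeps the module-level tool list unchanged between calls.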

Static prefix is ~3954 tokens (system prompt + inlined dungeonbooks/docs index + 5 tool defs).

Live verification (marty dev, 9-message Discord thread)

Turn 1 (cold):       write 3954, read    0
Turn 2 (same min):   write    0, read 3954
Turn 3 (time tick):  write   73, read 3881   ← block 1 still hits
Turns 4-7:           write    0, read 3954
Turn 8 (time tick):  write   73, read 3881
Turn 9:              write    0, read 3954

Tool calls (get_doc, hardcover_api) still fire correctly. No fabricated policies on the same prompt suite that broke Kimi K2.5 yesterday.
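A quick check of the ~4-5x claim against the nine logged turns, using the price multipliers noted under Things learned (1.25x for writes, ~10% for reads). This counts only the cached prefix; message and output tokens are ignored, so treat it as a rough estimate.

```python
# (write, read) cached-prefix tokens per turn, from the dev thread above.
TURNS = [(3954, 0), (0, 3954), (73, 3881), (0, 3954), (0, 3954),
         (0, 3954), (0, 3954), (73, 3881), (0, 3954)]

WRITE_MULT = 1.25  # cache write price relative to base input
READ_MULT = 0.10   # cache read price relative to base input


def prefix_cost_ratio(turns, write_mult=WRITE_MULT, read_mult=READ_MULT):
    """Uncached prefix cost divided by cached prefix cost, in base-token units."""
    uncached = sum(write + read for write, read in turns)
    cached = sum(write * write_mult + read * read_mult for write, read in turns)
    return uncached / cached


ratio = prefix_cost_ratio(TURNS)
```

For this thread the ratio comes out at about 4.3x on the prefix, consistent with the headline number. Longer threads with fewer time ticks would push it higher.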

Why two breakpoints, not one

First implementation had a single breakpoint on block 2 (the contextual one). First smoke run showed turn 4 missing the cache entirely (write 3954) when current_time happened to tick to a new minute between turns. With one breakpoint, any change to block 2 busts the whole prefix.

Second pass put a breakpoint on block 1 too. Now when block 2’s current_time changes, block 1 still hits cache (~3881 tokens) and only block 2 (~73 tokens) rewrites. Three breakpoints used (tools + block 1 + block 2), out of max 4.

Lesson: cache breakpoint placement matters more than just “turn it on.” If the tail varies even subtly, breakpoints higher in the prefix preserve the win.
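A toy model of the effect, not the provider’s actual cache: each breakpoint stores a hash of everything before it, and a request reads from the longest stored prefix. TTL and tool blocks are ignored, and index 0 below stands in for tools + block 1 combined.

```python
import hashlib


def simulate(turns, breakpoints):
    """Toy prefix cache.

    turns: list of block lists [(text, tokens), ...].
    breakpoints: indexes of blocks tagged with cache_control.
    Returns (write, read) token counts per turn.
    """
    cache = set()
    results = []
    for blocks in turns:
        running = hashlib.sha256()
        total = 0
        prefixes = []  # (hash of everything up to this breakpoint, tokens so far)
        for i, (text, tokens) in enumerate(blocks):
            running.update(text.encode() + b"\x00")
            total += tokens
            if i in breakpoints:
                prefixes.append((running.hexdigest(), total))
        # Read from the longest prefix already cached; write the remainder.
        read = max((t for key, t in prefixes if key in cache), default=0)
        write = prefixes[-1][1] - read if prefixes else 0
        cache.update(key for key, _ in prefixes)
        results.append((write, read))
    return results


def turn(minute):
    # Index 0: static prefix (3881 tokens). Index 1: contextual block
    # (73 tokens) whose text includes the current minute.
    return [("static prompt + docs index", 3881),
            (f"customer ctx, time=12:{minute:02d}", 73)]


thread = [turn(0), turn(0), turn(1), turn(1)]
one_breakpoint = simulate(thread, breakpoints={1})
two_breakpoints = simulate(thread, breakpoints={0, 1})
```

With one breakpoint the minute tick on the third turn rewrites all 3954 tokens. With two, it writes 73 and reads 3881, matching the pattern in the live log.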

Things learned

  • The cache_creation_input_tokens / cache_read_input_tokens fields in response.usage are the only honest signal for whether caching is actually working. Logged at INFO so cache hit-rate is visible in prod without flipping levels.
  • Test assertions against system as a string break when you switch to content blocks. One-line shim — "\n\n".join(b["text"] for b in system) — keeps existing substring checks working.
  • The 5-min TTL covers Discord help-channel cadence by design. Customers asking follow-up questions in a thread re-use the cache. Cold opens always pay the write cost (1.25x base) but read price (~10% of base) covers it after the first reuse.
  • Pre-existing Redis/DB test failures were ruled out as unrelated once I confirmed they reproduce on main without my changes. Saves time vs. trying to fix them inside an unrelated PR.
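A sketch of what a claude_usage log line could look like. The logger name, the record layout and the cache_hit_ratio field are assumptions for illustration; the four token fields are the ones reported in response.usage, where input_tokens is taken to cover only the uncached part of the prompt.

```python
import json
import logging
from types import SimpleNamespace

logger = logging.getLogger("marty.claude")  # logger name is an assumption


def log_claude_usage(usage):
    """Emit one structured INFO line per Claude call and return the record."""
    write = getattr(usage, "cache_creation_input_tokens", 0) or 0
    read = getattr(usage, "cache_read_input_tokens", 0) or 0
    uncached = getattr(usage, "input_tokens", 0) or 0
    total_in = write + read + uncached
    record = {
        "event": "claude_usage",
        "input_tokens": uncached,
        "output_tokens": getattr(usage, "output_tokens", 0) or 0,
        "cache_creation_input_tokens": write,
        "cache_read_input_tokens": read,
        # Share of input tokens served from cache on this call.
        "cache_hit_ratio": round(read / total_in, 3) if total_in else 0.0,
    }
    logger.info(json.dumps(record))
    return record


# Stand-in for response.usage on a warm turn; the 120/85 figures are made up.
usage = SimpleNamespace(input_tokens=120, output_tokens=85,
                        cache_creation_input_tokens=0,
                        cache_read_input_tokens=3954)
record = log_claude_usage(usage)
```

The getattr fallbacks keep the logger from raising if a response comes back without the cache fields.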

Action items

  • Apply same caching pattern to tools/utils/query_optimizer.py if its system prompt grows past 1024 tokens (currently smaller, deferred)
  • Extract LLM_MODEL constant — model name still hardcoded at three call sites + query_optimizer
  • Watch billing dashboard tomorrow to confirm input-token cost actually drops on real Discord traffic

Cross-references

  • 2026-05-03 — Kimi K2.5 abandonment that made this the obvious next move
  • dungeonbooks/marty PR #32

2026-05-04 late — Concourse take-home delivered

Summary

Shipped the Concourse take-home. Two click-through prototypes (Win32-styled MSI installer, fleet health dashboard with five mocked tenants), Hugo site at the repo root, Cloudflare Workers Static Assets deploy, Cloudflare Access gate for @concoursetech.com. Sent the gated URL to Rapolas. Started building Friday afternoon, finished Monday evening.

What shipped

  • Installer prototype: single-page Win32-styled MSI wizard with five visible states cycled by an “inject error” demo bar. Copy and error codes (E-01 through E-07) verbatim from architecture section 6.1. Pre-flight failures route to testing pane, install-time failures route to installing pane.
  • Fleet Health Dashboard prototype: per-tenant table plus drill-in for Brooklyn. Five mocked NY/NJ libraries spanning the alert spectrum (NYPL red on no-heartbeat, Brooklyn yellow on sync lag, Queens yellow on cert expiry, Jersey City + Hoboken green). Inline SVG sparklines, no chart library. Seeded mulberry32 PRNG so the snapshot is stable across reloads.
  • Hugo site at repo root using Hugo Book theme as a submodule. Module mounts pull existing root-level markdown into Hugo’s content tree without moving files.
  • Cloudflare Workers Builds: package.json with hugo-extended (downloads the binary on install), postinstall hook for submodule init, wrangler.jsonc pinning the build command. Workers Builds doesn’t recurse submodules by default and Wrangler’s auto-detected npx hugo doesn’t work without the package.
  • Cloudflare Access gate: PIN-to-email, allows specific emails plus anyone with @concoursetech.com.
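For reference, the seeded-data idea behind the dashboard mock, written as a Python port (the prototype itself runs in the browser, so this is not the shipped code). The seed, sample count and SVG dimensions are arbitrary choices for the sketch.

```python
def mulberry32(seed):
    """Python port of the mulberry32 PRNG; returns a function yielding floats in [0, 1)."""
    mask = 0xFFFFFFFF
    state = seed & mask

    def next_float():
        nonlocal state
        state = (state + 0x6D2B79F5) & mask
        t = state
        t = ((t ^ (t >> 15)) * (t | 1)) & mask
        t ^= (t + ((t ^ (t >> 7)) * (t | 61))) & mask
        return ((t ^ (t >> 14)) & mask) / 4294967296

    return next_float


def sparkline_svg(values, width=120, height=24):
    """Render values as an inline SVG polyline scaled to the box."""
    lo, hi = min(values), max(values)
    span = (hi - lo) or 1  # avoid dividing by zero on a flat series
    step = width / (len(values) - 1) if len(values) > 1 else 0
    points = " ".join(
        f"{i * step:.1f},{height - (v - lo) / span * height:.1f}"
        for i, v in enumerate(values)
    )
    return (f'<svg xmlns="http://www.w3.org/2000/svg" width="{width}" height="{height}">'
            f'<polyline fill="none" stroke="currentColor" points="{points}"/></svg>')


rand = mulberry32(42)  # seed value is arbitrary
lag_samples = [round(rand() * 100, 1) for _ in range(12)]
svg = sparkline_svg(lag_samples)
```

Because the generator is seeded, the same samples and the same SVG come out on every run, which is the property that keeps the snapshot stable across reloads.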

Things learned

  • For Workers Static Assets on *.workers.dev, Access at the edge is the security boundary. JWT validation in the worker is defense-in-depth for Tunnels or external-origin setups, not for static-only workers.dev hosting. Wasted ~30 min wiring jose before stopping to think about where the actual attack surface is.
  • Hugo Book’s BookPortableLinks param resolves [architecture.md](architecture.md) to /architecture/ on the rendered site while leaving the link valid on raw GitHub view. Setting it to true (boolean) instead of 'warning' keeps resolution on while suppressing missing-page warnings on static asset paths.
  • git filter-branch --tree-filter "sed -i ...; true" is the right tool for blanket phrase replacement across history. Author and committer dates are preserved by default. After main is rewritten, git rebase --onto new-main old-main feature-branch moves stacked work cleanly.
  • Rewrite history instead of adding a follow-up commit when fixing AI tells in prose. The follow-up commit advertises that the earlier draft was wrong, which is exactly what take-home reviewers should not see. Cherry-pick + overwrite + amend with original metadata is the pattern.
  • Copilot PR review on gh repos catches real things: hardcoded snapshot timestamp instead of reading from the shared dataset, Math.random() making mock data non-deterministic on refresh, absolute paths breaking GitHub source view. Useful even with no human reviewer in the loop.
  • Hugo’s locale config key replaced languageCode in 0.158+. Copilot flagged the change as wrong; the deprecation warning on languageCode was the rebuttal. Worth verifying tooling complaints against actual tool behavior before applying the suggested “fix.”

What got cut along the way

  • JWT validation worker (started, killed before commit): would have added a runtime-validation step for the Access JWT. Unnecessary for *.workers.dev.
  • PDFs of the markdown: Hugo’s print-to-PDF is sufficient if needed; not building a separate compile pipeline.
  • A landing page at prototypes/index.html (committed and immediately reverted): Hugo owns the homepage now.

Action items

  • Wait for Rapolas’s response. He may send a GitHub username to review source; have gh api invite ready.
  • Debrief prep on call day: re-read the four docs and write up the three decisions most likely to get pushed on (single-node OpenBao for pilot, OpenBao consolidation vs cloud-managed KMS, Railway compliance gaps for federal customers).

2026-05-04 cross-references