2026-05-04 — Square ManagerBot test, Guild boundary, Edelweiss inventory drift, Concourse take-home delivered
Summary
Square launched ManagerBot (agentic merchant assistant in the Seller Dashboard, open beta). First read was that it invalidated parts of the Guild roadmap around inventory and insights. Pushed back on that, then ran a live three-question test on the shop’s data to map the actual capability boundary. Result: Guild not invalidated; ManagerBot becomes free QA + a sharper demo contrast. Side effect: dead-stock query surfaced what looks like Edelweiss Omnibus pushing wrong inventory counts.
What ManagerBot does and doesn’t
Captured fully in managerbot-boundary. Headline:
- Does: sales/items/customer trends, POs from packing lists, count updates, bulk catalog edits, schedules, email campaigns, labor-vs-revenue analysis, “Pulse” proactive brief. Dashboard-only, no API.
- Doesn’t: ISBN, publisher, pub date, format, series, distributor, return windows, cost/COGS, membership cohorts, anything customer-facing.
Live test results
Q1: “Items not sold in 180 days, still in stock”
Aced it. 586 items, ranked by units. Spotted clusters. Buried books under sidelines.
Q2: “Of those, which are books? Top 20 with cost value”
Identified books cleanly, parsed authors from item names, spotted Amanda Leigh duplication. Failed honestly on cost: “no access to COGS.” Showed retail 1,100.
Top dead books had high units that look wrong. Triggered the Edelweiss suspicion.
What it missed
Missing: cost (no publisher discount model), series awareness, publisher grouping, pub-date / return-window flag, format-aware action. It also gave a wrong recommendation (“bundle dead books”, where the bookstore playbook is return/markdown).
Strategic read
- Guild not invalidated. Guild is member-facing; ManagerBot is merchant-side.
- Generic merchant-dashboard products are dead. Pulse occupies that surface. Cut anything framed as “better Square dashboard for owners.”
- Bookstore-specific inventory/insights still wide open. Square has no domain layer.
- ManagerBot = free QA oracle for Guild. Any Guild metric must reconcile to ManagerBot on shared primitives. Mismatches mean a Guild bug or a Square data quirk worth knowing.
- Demo writes itself. Run three questions on a pilot shop’s account: one ManagerBot aces, one partial, one clean failure. Then show Guild’s answer to the failure. See managerbot-boundary for the demo sequence.
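Rough sketch of the "reconcile on shared primitives" check. Everything here is an assumption: the metric names are made up, and since ManagerBot is dashboard-only with no API, its numbers would have to be transcribed by hand. Only the 586 figure comes from the Q1 result; the other values are placeholders.

```python
from math import isclose


def reconcile(guild: dict, managerbot: dict, rel_tol: float = 0.0) -> dict:
    """Return {metric: (guild_value, managerbot_value)} for shared metrics that disagree.

    Metrics present on only one side are ignored: ManagerBot has no
    membership or cost primitives, so those cannot be cross-checked.
    """
    shared = guild.keys() & managerbot.keys()
    return {
        name: (guild[name], managerbot[name])
        for name in sorted(shared)
        if not isclose(guild[name], managerbot[name], rel_tol=rel_tol)
    }


# Hypothetical metric names and values (only 586 is from the live test).
guild_metrics = {
    "dead_stock_items_180d": 586,
    "dead_stock_units": 1210,
    "member_cohort_size": 42,  # Guild-only, never compared
}
managerbot_metrics = {
    "dead_stock_items_180d": 586,
    "dead_stock_units": 1320,
}

mismatches = reconcile(guild_metrics, managerbot_metrics)
for name, (ours, theirs) in mismatches.items():
    print(f"{name}: guild={ours} managerbot={theirs}")
```

A non-empty result is the trigger to go look: either a Guild bug or a Square data quirk.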
Edelweiss Omnibus inventory drift (new)
ManagerBot’s dead-book list had unit counts that look too high. Suspect the Omnibus → Square sync is either double-counting or adding to existing counts instead of overwriting them as the authoritative value. Captured in edelweiss-omnibus-inventory-drift. Needs a physical count on 3-5 suspect titles to confirm direction and mechanism.
This matters beyond hygiene: if Square’s counts are unreliable, Guild’s analytics layer reads wrong data and the entire “bookstore-aware insights on top of Square” pitch breaks on first demo. Inventory authority needs to be settled before the Phil pilot.
Things learned
- “Square just invalidated my roadmap” was wrong on inspection. The reflex panic on big-vendor announcements is rarely the right read — the vendor’s product almost always operates one abstraction level below where the domain product lives.
- ManagerBot’s failure modes are useful product input, not just competitive intel. The cost gap is the highest-ROI thing Guild can fix immediately (publisher discount schedule + Square cost field → real dollar exposure).
- Running a competitor’s tool on real shop data is faster than reading their docs. Three queries gave a sharper boundary than the help article and the marketing post combined.
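Sketch of the cost-gap arithmetic, assuming unit cost is derived as retail × (1 − publisher discount) unless an explicit cost field is present. The publisher names and discount rates are placeholders, not real schedule data, and where this model lives (Guild vs Emporium) is still undecided.

```python
from decimal import Decimal

# Hypothetical discount schedule: fraction off list price, keyed by publisher.
DISCOUNTS = {
    "Publisher A": Decimal("0.46"),
    "Publisher B": Decimal("0.40"),
}


def unit_cost(retail: Decimal, publisher: str, cost_override=None) -> Decimal:
    """Prefer an explicit cost field; otherwise derive cost from retail and discount."""
    if cost_override is not None:
        return cost_override
    derived = retail * (Decimal("1") - DISCOUNTS[publisher])
    return derived.quantize(Decimal("0.01"))


def exposure(units: int, retail: Decimal, publisher: str, cost_override=None) -> Decimal:
    """Dollars tied up in stock on hand for one title."""
    return units * unit_cost(retail, publisher, cost_override)


# 5 units at $20.00 retail, 46% discount: 5 * 10.80
derived_exposure = exposure(5, Decimal("20.00"), "Publisher A")
# Explicit cost field wins over the schedule: 3 * 9.50
override_exposure = exposure(3, Decimal("20.00"), "Publisher B", Decimal("9.50"))
print(derived_exposure, override_exposure)
```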
Action items
- (deferred) Physical count on Teo’s Durumi, Malevolent Eight, Revenge Next Door — confirm Edelweiss drift
- (deferred) Decide inventory authority (Square vs Omnibus) and the sync direction
- Watch Square dev changelog for ManagerBot API/webhook surface
Reassigned / closed
- ManagerBot demo capture for Phil pitch — cut
- Publisher discount schedule data model — scope unclear (Guild vs Emporium). Cost/COGS truth lives in Medusa product records on the Emporium side; Guild only reads it for member-facing analytics. Likely Emporium-owned, pending decision. See emporium / re-inventory-strategy.
Carried over from 2026-05-03
- Marty PR #27 merged
- Anthropic prompt caching on Marty system prompt — shipped (PR #32, squash-merged)
- `events.md` flipped to `publish: true` (verified on `dungeonbooks/docs` main); `orders.md` still `publish: false`
- Discord pinned posts: not now
2026-05-04 evening — Marty Anthropic prompt caching
Summary
Shipped prompt caching on Marty’s Claude calls (PR #32, merged). ~4-5x input-token cost reduction on multi-turn Discord threads, no behavior change, no reliability tradeoff. The cheap-cost move that yesterday’s abandoned Kimi K2.5 migration was supposed to be.
What shipped
- `system=` converted from string to a list of two content blocks: (1) base prompt + docs index, (2) customer + time context. Both tagged with `cache_control: {"type": "ephemeral"}`.
- Last tool entry tagged with `cache_control` so the tools block caches too.
- `claude_usage` structured log on every Claude call, including `cache_creation_input_tokens` and `cache_read_input_tokens` for visibility.
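Minimal sketch of the request shape described above, not the actual Marty code. `build_cached_request` and its argument names are hypothetical; the block layout and `cache_control` tagging follow the description in this entry.

```python
from copy import deepcopy


def build_cached_request(base_prompt: str, context: str, tools: list) -> dict:
    """Build `system` and `tools` kwargs with cache breakpoints.

    Block 1 (stable): base prompt + docs index.
    Block 2 (volatile): customer + time context.
    """
    system = [
        {"type": "text", "text": base_prompt, "cache_control": {"type": "ephemeral"}},
        {"type": "text", "text": context, "cache_control": {"type": "ephemeral"}},
    ]
    tagged_tools = deepcopy(tools)  # do not mutate the caller's tool definitions
    if tagged_tools:
        # Tagging only the last tool caches the whole tools block.
        tagged_tools[-1]["cache_control"] = {"type": "ephemeral"}
    return {"system": system, "tools": tagged_tools}


# Tool names from this log; the schemas are trimmed placeholders.
tools = [
    {"name": "get_doc", "description": "Fetch a doc", "input_schema": {"type": "object"}},
    {"name": "hardcover_api", "description": "Book lookup", "input_schema": {"type": "object"}},
]
request = build_cached_request("BASE PROMPT + DOCS INDEX", "customer=... time=...", tools)

# Assumed call site, shown as a comment so the sketch runs without the SDK:
# client.messages.create(model=..., max_tokens=..., messages=..., **request)
```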
Static prefix is ~3954 tokens (system prompt + inlined dungeonbooks/docs index + 5 tool defs).
Live verification (marty dev, 9-message Discord thread)
Turn 1 (cold): write 3954, read 0
Turn 2 (same min): write 0, read 3954
Turn 3 (time tick): write 73, read 3881 ← block 1 still hits
Turns 4-7: write 0, read 3954
Turn 8 (time tick): write 73, read 3881
Turn 9: write 0, read 3954
Tool calls (get_doc, hardcover_api) still fire correctly. No fabricated policies on the same prompt suite that broke Kimi K2.5 yesterday.
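Back-of-envelope check of the cost claim against the turn log above, using the 1.25x write and ~0.1x read multipliers noted under Things learned. This covers only the cached static prefix; message and output tokens are left out, so the real bill will differ.

```python
# (cache_write_tokens, cache_read_tokens) per turn, copied from the 9-turn log.
TURNS = [
    (3954, 0),     # turn 1, cold
    (0, 3954),     # turn 2
    (73, 3881),    # turn 3, time tick
    (0, 3954),     # turn 4
    (0, 3954),     # turn 5
    (0, 3954),     # turn 6
    (0, 3954),     # turn 7
    (73, 3881),    # turn 8, time tick
    (0, 3954),     # turn 9
]
WRITE_MULT = 1.25  # cache write price relative to base input price
READ_MULT = 0.10   # cache read price relative to base input price

# Without caching, every prefix token is billed at base price on every turn.
uncached = sum(write + read for write, read in TURNS)
# With caching, writes cost more and reads cost far less.
cached = sum(write * WRITE_MULT + read * READ_MULT for write, read in TURNS)
reduction = uncached / cached

print(f"uncached={uncached} cached={cached:.1f} reduction={reduction:.2f}x")
# uncached is 35586 base-token units, cached is about 8273.6, reduction about 4.3x
```

That lands at the low end of the ~4-5x estimate for this thread; longer threads with fewer time ticks should do better.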
Why two breakpoints, not one
First implementation had a single breakpoint on block 2 (the contextual one). First smoke run showed turn 4 missing the cache entirely (write 3954, read 0) when current_time happened to tick to a new minute between turns. With one breakpoint, any change to block 2 busts the whole prefix.
Second pass put a breakpoint on block 1 too. Now when block 2’s current_time changes, block 1 still hits cache (~3881 tokens) and only block 2 (~73 tokens) rewrites. Three breakpoints used (tools + block 1 + block 2), out of max 4.
Lesson: cache breakpoint placement matters more than just “turn it on.” If the tail varies even subtly, breakpoints higher in the prefix preserve the win.
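Toy simulation of why the second breakpoint matters. This is a simplified model, not the real cache: entries exist only at breakpoints, a hit needs an exact prefix match, and TTL is ignored. Tools and block 1 are collapsed into one 3881-token segment because the log does not break them out separately. The `current_time` values are made up.

```python
def simulate(turns, breakpoint_after):
    """Return [(write_tokens, read_tokens)] per turn.

    turns: list of requests, each a list of (text, tokens) blocks.
    breakpoint_after: set of block indices that carry cache_control.
    """
    cache = set()
    results = []
    for blocks in turns:
        prefixes = []  # (prefix_key, cumulative_tokens) at each breakpoint
        key, total = (), 0
        for i, (text, tokens) in enumerate(blocks):
            key += (text,)
            total += tokens
            if i in breakpoint_after:
                prefixes.append((key, total))
        # Longest cached prefix is read; the rest up to the last breakpoint is written.
        read = max((t for k, t in prefixes if k in cache), default=0)
        write = prefixes[-1][1] - read if prefixes else 0
        cache.update(k for k, _ in prefixes)
        results.append((write, read))
    return results


STATIC = ("tools + base prompt + docs index", 3881)


def ctx(minute):
    return (f"customer + current_time={minute}", 73)


# Turn 3 lands in a new minute, so block 2 changes.
turns = [[STATIC, ctx("12:00")], [STATIC, ctx("12:00")], [STATIC, ctx("12:01")]]

one_breakpoint = simulate(turns, breakpoint_after={1})
two_breakpoints = simulate(turns, breakpoint_after={0, 1})
print(one_breakpoint)   # [(3954, 0), (0, 3954), (3954, 0)]
print(two_breakpoints)  # [(3954, 0), (0, 3954), (73, 3881)]
```

The two-breakpoint output matches turns 1-3 of the live log; the one-breakpoint output reproduces the full rewrite seen in the first smoke run.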
Things learned
- The `cache_creation_input_tokens` / `cache_read_input_tokens` fields in `response.usage` are the only honest signal for whether caching is actually working. Logged at INFO so cache hit-rate is visible in prod without flipping levels.
- Test assertions against `system` as a string break when you switch to content blocks. One-line shim, `"\n\n".join(b["text"] for b in system)`, keeps existing substring checks working.
- The 5-min TTL covers Discord help-channel cadence by design. Customers asking follow-up questions in a thread re-use the cache. Cold opens always pay the write cost (1.25x base) but read price (~10% of base) covers it after the first reuse.
- Pre-existing Redis/DB test failures on `main` cleared once I confirmed they reproduce on main without my changes. Saves time vs. trying to fix them inside an unrelated PR.
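Sketch of the two helpers implied above: the test shim and the usage log. Function names and the logger name are hypothetical; the four usage field names are the ones mentioned in this entry plus the standard input/output counts. `SimpleNamespace` stands in for a real `response.usage` object.

```python
import logging
from types import SimpleNamespace

logger = logging.getLogger("marty.claude")  # hypothetical logger name

USAGE_FIELDS = (
    "input_tokens",
    "output_tokens",
    "cache_creation_input_tokens",
    "cache_read_input_tokens",
)


def system_text(system) -> str:
    """Flatten `system` to one string, whether it is a str or a list of text blocks."""
    if isinstance(system, str):
        return system
    return "\n\n".join(block["text"] for block in system)


def log_claude_usage(usage) -> dict:
    """Log token counts at INFO; missing or None fields are reported as 0."""
    fields = {name: getattr(usage, name, None) or 0 for name in USAGE_FIELDS}
    logger.info("claude_usage %s", fields)
    return fields


# Existing substring checks keep working against the flattened text.
blocks = [{"type": "text", "text": "base prompt"}, {"type": "text", "text": "customer ctx"}]
flattened = system_text(blocks)

# Stand-in for response.usage on a warm turn with a time tick.
fake_usage = SimpleNamespace(
    input_tokens=120,
    output_tokens=80,
    cache_creation_input_tokens=73,
    cache_read_input_tokens=3881,
)
logged = log_claude_usage(fake_usage)
```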
Action items
- Apply same caching pattern to `tools/utils/query_optimizer.py` if its system prompt grows past 1024 tokens (currently smaller, deferred)
- Extract `LLM_MODEL` constant: model name still hardcoded at three call sites + query_optimizer
- Watch billing dashboard tomorrow to confirm input-token cost actually drops on real Discord traffic
Cross-references
- 2026-05-03 — Kimi K2.5 abandonment that made this the obvious next move
- `dungeonbooks/marty` PR #32
2026-05-04 late — Concourse take-home delivered
Summary
Shipped the Concourse take-home. Two click-through prototypes (Win32-styled MSI installer, fleet health dashboard with five mocked tenants), Hugo site at the repo root, Cloudflare Workers Static Assets deploy, Cloudflare Access gate for @concoursetech.com. Sent the gated URL to Rapolas. Started building Friday afternoon, finished Monday evening.
What shipped
- Installer prototype: single-page Win32-styled MSI wizard with five visible states cycled by an “inject error” demo bar. Copy and error codes (E-01 through E-07) verbatim from architecture section 6.1. Pre-flight failures route to testing pane, install-time failures route to installing pane.
- Fleet Health Dashboard prototype: per-tenant table plus drill-in for Brooklyn. Five mocked NY/NJ libraries spanning the alert spectrum (NYPL red on no-heartbeat, Brooklyn yellow on sync lag, Queens yellow on cert expiry, Jersey City + Hoboken green). Inline SVG sparklines, no chart library. Seeded mulberry32 PRNG so the snapshot is stable across reloads.
- Hugo site at repo root using Hugo Book theme as a submodule. Module mounts pull existing root-level markdown into Hugo’s content tree without moving files.
- Cloudflare Workers Builds: package.json with `hugo-extended` (downloads the binary on install), postinstall hook for submodule init, wrangler.jsonc pinning the build command. Workers Builds doesn’t recurse submodules by default and Wrangler’s auto-detected `npx hugo` doesn’t work without the package.
- Cloudflare Access gate: PIN-to-email, allows specific emails plus anyone with `@concoursetech.com`.
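Illustration of the seeded-PRNG plus inline-sparkline idea from the dashboard bullet, ported to Python for consistency with the rest of these notes. The prototype itself is presumably JavaScript; this follows the commonly circulated mulberry32 steps with 32-bit masking and has not been checked bit-for-bit against the prototype's output. `sparkline_svg` and its dimensions are hypothetical.

```python
def mulberry32(seed: int):
    """Return a generator function yielding floats in [0, 1), fully determined by seed."""
    state = seed & 0xFFFFFFFF

    def next_float() -> float:
        nonlocal state
        state = (state + 0x6D2B79F5) & 0xFFFFFFFF
        t = state
        t = ((t ^ (t >> 15)) * (t | 1)) & 0xFFFFFFFF
        t ^= (t + ((t ^ (t >> 7)) * (t | 61))) & 0xFFFFFFFF
        return (t ^ (t >> 14)) / 4294967296

    return next_float


def sparkline_svg(values, width: int = 120, height: int = 24) -> str:
    """Render values as a bare SVG polyline, no chart library."""
    lo, hi = min(values), max(values)
    span = (hi - lo) or 1.0  # avoid dividing by zero on a flat series
    step = width / (len(values) - 1) if len(values) > 1 else 0
    points = " ".join(
        f"{i * step:.1f},{height - (v - lo) / span * height:.1f}"
        for i, v in enumerate(values)
    )
    return (
        f'<svg xmlns="http://www.w3.org/2000/svg" width="{width}" height="{height}">'
        f'<polyline fill="none" stroke="currentColor" points="{points}"/></svg>'
    )


# Same seed, same series: the mocked snapshot does not change between reloads.
rng_a, rng_b = mulberry32(42), mulberry32(42)
series_a = [rng_a() for _ in range(12)]
series_b = [rng_b() for _ in range(12)]
svg = sparkline_svg(series_a)
```

Swapping in an unseeded random source would redraw the chart on every refresh, which is the problem the seed avoids.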
Things learned
- For Workers Static Assets on `*.workers.dev`, Access at the edge is the security boundary. JWT validation in the worker is defense-in-depth for Tunnels or external-origin setups, not for static-only workers.dev hosting. Wasted ~30 min wiring jose before stopping to think about where the actual attack surface is.
- Hugo’s `BookPortableLinks` resolves `[architecture.md](architecture.md)` to `/architecture/` on the rendered site while leaving the link valid on raw GitHub view. Setting it to `true` (boolean) instead of `'warning'` keeps resolution on while suppressing missing-page warnings on static asset paths.
- `git filter-branch --tree-filter "sed -i ...; true"` is the right tool for blanket phrase replacement across history. Author and committer dates preserve by default. After main is rewritten, `git rebase --onto new-main old-main feature-branch` moves stacked work cleanly.
- Rewrite history instead of adding a follow-up commit when fixing AI tells in prose. The follow-up commit advertises that the earlier draft was wrong, which is exactly what take-home reviewers should not see. Cherry-pick + overwrite + amend with original metadata is the pattern.
- Copilot PR review on `gh` repos catches real things: hardcoded snapshot timestamp instead of reading from the shared dataset, `Math.random()` making mock data non-deterministic on refresh, absolute paths breaking GitHub source view. Useful even with no human reviewer in the loop.
- Hugo’s `locale` config key replaced `languageCode` in 0.158+. Copilot flagged the change as wrong; the deprecation warning on `languageCode` was the rebuttal. Worth verifying tooling complaints against actual tool behavior before applying the suggested “fix.”
What got cut along the way
- JWT validation worker (started, killed before commit): would have added a runtime-validation step for the Access JWT. Unnecessary for `*.workers.dev`.
- PDFs of the markdown: Hugo’s print-to-PDF is sufficient if needed; not building a separate compile pipeline.
- A landing page at `prototypes/index.html` (committed and immediately reverted): Hugo owns the homepage now.
Action items
- Wait for Rapolas’s response. He may send a GitHub username to review source; have `gh api` invite ready.
- Debrief prep on call day: re-read the four docs and write up the three decisions most likely to get pushed on (single-node OpenBao for pilot, OpenBao consolidation vs cloud-managed KMS, Railway compliance gaps for federal customers).
2026-05-04 cross-references
- managerbot-boundary — full capability map
- edelweiss-omnibus-inventory-drift — the new operational issue
- guild
- square-loyalty-gap-analysis
- square-shared-inventory-pain
- phil-victory-point — pilot demo sequence applies here
- `ptaranat/concourse-takehome` (private repo)
- https://concourse-takehome.panat-taranat.workers.dev/