2026-05-18 — Emporium on Railway, Brief 1.7 shipped

Summary

Got medusa + payload + storefront deployed to Railway staging. Brief 1.7 done. Single branch chore/railway-staging-deploy, PR #14. Along the way: bumped Medusa 2.13 → 2.15, attempted a swap to the @rokmohar/medusa-plugin-meilisearch plugin, reverted that swap after weighing single-maintainer risk against ~300 LoC saved, generated Payload migrations, fixed a handful of deploy-time landmines, and added category-breadcrumb support to the storefront’s search dropdown.

Staging URLs:

Storefront: https://storefront-staging-6366.up.railway.app/
Medusa admin: https://medusa-staging-7818.up.railway.app/app
Payload admin: https://payload-staging.up.railway.app/admin

What actually happened

Started simple — pnpm dev boots everything locally, build commands existed, install pnpm deps + filter to the package, point at Railway, ship. Should have been a couple hours.

Turned into a long evening because Medusa’s production deploy pattern is not “run medusa start from source dir.” medusa build emits a self-contained npm package at backend/.medusa/server/ — its own package.json, its own deps, the admin bundle pre-compiled at public/admin/. Production is supposed to cd into that and npm install from scratch. Took two failed deploys to surface the “Could not find index.html” error and another to find the docs page that explained the bundle pattern.

Then npm install in the bundle timed out at 4 minutes (Railway’s pre-deploy timeout). Switched to pnpm there — same install, 16 seconds. pnpm’s CAS store + hardlink-based install vs npm’s copy-everything is a real production difference, not just a dev nicety. Moved the install into the build phase so subsequent deploys cache.

Meilisearch ESM was the other unblock. Custom module was importing Meilisearch from a v0.57 ESM-only package; CJS backend can’t require() ESM. medusa develop (swc/ts-node) tolerated it, medusa build (tsc) did not. Bug hidden behind dev-mode forgiveness for months.

The plugin detour (and revert)

When the ESM build error hit, I framed “switch to the @rokmohar/medusa-plugin-meilisearch plugin” as the forward path. Sold it as “stop maintaining our own glue, plugin has broader event coverage (variant, tag, collection, category upserts), 300 LoC deleted, MIT, actively maintained — released literally today.”

What I undersold:

The plugin’s pin of @tanstack/react-query to exactly 5.64.2 (we have ^5.96.1) wasn’t just a peer warning; it was a maintenance commitment to stay on an old version.
Plugin’s engines.node is >=22; our backend declares >=20. Copilot caught this — would have broken on a Node-20 deploy.
Single maintainer. The MIT license plus a pinned lockfile mitigates that, but rewriting after abandonment is more expensive than writing now.
“300 LoC saved” was overstated — we still maintain config (fields, searchable attrs, etc.). Trade code for config-against-a-black-box. Net positive but less clean than pitched.
Lost the admin “Sync All Products” UI we’d built.

Then on May 11 the TanStack credential compromise happened. After that, taking on extra pressure to stay pinned to a specific older version of a TanStack package was actively unattractive. Reverted the plugin entirely in cfc5990. Forward fix: dynamic import in service.ts (await import("meilisearch") inside a lazy getter; type via import("meilisearch", { with: { "resolution-mode": "import" }}).Meilisearch).
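
A minimal sketch of the lazy-getter shape described above, not the actual service.ts. The lazy helper and FakeSearchClient are hypothetical names; the stand-in class is there only so the sketch runs without the real package. One assumption worth stating: with module set to commonjs, tsc rewrites import() into require(), so this pattern relies on a module setting (node16/nodenext, for example) that leaves the dynamic import intact.

```typescript
// Memoizing lazy getter: `loader` runs on the first call only, and every
// later call reuses the same promise.
function lazy<T>(loader: () => Promise<T>): () => Promise<T> {
  let cached: Promise<T> | undefined;
  return () => {
    if (cached === undefined) {
      cached = loader().catch((err: unknown) => {
        cached = undefined; // forget a failed load so the next call can retry
        throw err;
      });
    }
    return cached;
  };
}

// Hypothetical stand-in for the ESM-only client, so this sketch runs on its own.
class FakeSearchClient {
  host: string;
  constructor(host: string) {
    this.host = host;
  }
}

let loadCount = 0;

const getClient = lazy(async () => {
  loadCount += 1;
  // In the real service this would be roughly:
  //   const { Meilisearch } = await import("meilisearch");
  //   return new Meilisearch({ host, apiKey });
  return new FakeSearchClient("http://localhost:7700");
});
```

Callers do const client = await getClient() wherever they previously used a module-level instance. Nothing ESM-only is touched at require() time, which is the point of the fix.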

Improvements over the original module while we were in there:

retrieveFromIndex uses a proper type predicate instead of .filter(Boolean) that didn’t narrow nulls.
updateSettings() now actually runs — called at the top of syncProductsStep. The original had the method but never called it, so the index inherited Meilisearch defaults (every field searchable, which is the noise we kept tripping on).
Sync workflow query adds categories.parent_category_id, so the storefront’s buildCategoryChain helper can render breadcrumb order (“Books / Fantasy” not “Fantasy / Books”).
Searchable attrs tightened to title, subtitle, description, variants.sku, metadata.{author,publisher,series}. No more metadata blob (so “en” stopped matching every book) and no categories/tags in search (they’re facets).
meilisearch JS SDK bumped to ^0.58.
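
Why parent_category_id matters for breadcrumb order, as a rough sketch. The storefront’s real buildCategoryChain isn’t reproduced here; this version, the CategoryRef shape, and the ids in the example are assumptions. It also assumes each product sits in a single branch of the tree. The inline filter doubles as an example of the type-predicate narrowing mentioned for retrieveFromIndex.

```typescript
interface CategoryRef {
  id: string;
  name: string;
  parent_category_id: string | null;
}

// Orders a product's categories root-first by walking up from the leaf.
function buildCategoryChain(categories: CategoryRef[]): CategoryRef[] {
  const byId = new Map(categories.map((c) => [c.id, c] as const));

  // Type predicate narrows (string | null)[] to string[]; .filter(Boolean) would not.
  const parentIds = new Set(
    categories
      .map((c) => c.parent_category_id)
      .filter((id): id is string => id !== null),
  );

  // The leaf is the category that no other category names as its parent.
  const leaf = categories.find((c) => !parentIds.has(c.id));

  const chain: CategoryRef[] = [];
  const seen = new Set<string>(); // guards against a cyclic parent reference
  let current = leaf;
  while (current && !seen.has(current.id)) {
    seen.add(current.id);
    chain.unshift(current);
    current =
      current.parent_category_id !== null
        ? byId.get(current.parent_category_id)
        : undefined;
  }
  return chain;
}

// Index order is arbitrary; the chain comes out root-first regardless.
const breadcrumb = buildCategoryChain([
  { id: "cat_fantasy", name: "Fantasy", parent_category_id: "cat_books" },
  { id: "cat_books", name: "Books", parent_category_id: null },
])
  .map((c) => c.name)
  .join(" / ");
```

Without parent_category_id in the indexed document there is nothing to sort on, and the dropdown can only print categories in whatever order the index returns them.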

The Medusa 2.15 admin react-query trap

After restoring the admin Meilisearch sync page, it crashed at render with “No QueryClient set, use QueryClientProvider to set one.” The page used useMutation directly from @tanstack/react-query. Medusa 2.15’s admin shell has its own isolated QueryClient instance — custom routes can’t see it via direct react-query imports. Replaced with plain useState + async handler. Could probably use Medusa’s admin SDK hooks (whatever they expose) but the plain version is simpler and dependency-free.
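
Roughly what “plain useState + async handler” looks like, with the handler pulled out of React so the sketch runs standalone. SyncState, createSyncHandler, and the endpoint in the wiring comment are hypothetical names, not the page’s actual code.

```typescript
type SyncState =
  | { status: "idle" }
  | { status: "loading" }
  | { status: "success" }
  | { status: "error"; message: string };

// Builds a click handler that reports progress through `setState`.
// `triggerSync` is whatever call starts the reindex; it is injected so the
// handler needs no query client and no React import.
function createSyncHandler(
  setState: (next: SyncState) => void,
  triggerSync: () => Promise<void>,
): () => Promise<void> {
  return async () => {
    setState({ status: "loading" });
    try {
      await triggerSync();
      setState({ status: "success" });
    } catch (err) {
      const message = err instanceof Error ? err.message : String(err);
      setState({ status: "error", message });
    }
  };
}

// Wiring inside the admin route component would look roughly like this
// (the endpoint path is an assumption):
//   const [state, setState] = useState<SyncState>({ status: "idle" });
//   const onClick = createSyncHandler(setState, async () => {
//     const res = await fetch("/admin/meilisearch/sync", { method: "POST" });
//     if (!res.ok) throw new Error(`Sync failed: ${res.status}`);
//   });
```

The trade-off against useMutation is that there is no caching or retry logic here, which a one-button sync page doesn’t need.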

The monorepo question

User asked mid-session: is the monorepo still worth it? Apps don’t share code. No internal packages. No cross-app refactors. Each deploy fights pnpm + nixpacks + Medusa’s bundle layout. Three repos would lose almost nothing day-to-day.

Park it. Deploy pain is mostly one-time. The cost of splitting (three CI configs, three Dependabots, lost atomic commits when we actually want them) outweighs the recurring annoyance once Railway’s wired.

Database naming sloppy on staging

Postgres has railway (medusa data) and payload. Tried to rename railway → medusa mid-session, got stuck (POSTGRES_DB is init-only), reverted. Decided: prod gets named databases (medusa, payload) + manually-configured env vars (MEDUSA_DATABASE_URL, PAYLOAD_DATABASE_URL) from day one. Staging stays sloppy. Works, just ugly.

Meilisearch image bump (v1.12 → v1.44) and the storage incompat

User bumped the Meilisearch image to v1.44 (latest, released that day). Container immediately crash-looped on “Your database version (1.12.8) is incompatible with your current engine version (1.44.0).” I predicted this but underweighted it: a 32-minor-version leap on a service with a persistent volume was a higher risk than I sold it as. Wiped the volume (data is derivative; Medusa is the source of truth), booted clean on v1.44, hit Sync All Products from admin, reindexed in seconds.
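
A pre-bump check along these lines would have flagged the problem before the container did. This is a sketch under an assumption: that on-disk data is only safe to reuse when major and minor versions match, with patch-level differences being fine. The log above only confirms the 1.12.8 vs 1.44.0 case; parseVersion and needsDumpOrReindex are hypothetical helpers.

```typescript
interface Version {
  major: number;
  minor: number;
  patch: number;
}

// Parses "1.12.8" or "v1.12.8"; throws on anything else.
function parseVersion(raw: string): Version {
  const match = /^v?(\d+)\.(\d+)\.(\d+)$/.exec(raw.trim());
  if (!match) {
    throw new Error(`Unrecognized version: ${raw}`);
  }
  return {
    major: Number(match[1]),
    minor: Number(match[2]),
    patch: Number(match[3]),
  };
}

// Assumption: a volume written by one major.minor cannot be opened by another.
function needsDumpOrReindex(dbVersion: string, engineVersion: string): boolean {
  const db = parseVersion(dbVersion);
  const engine = parseVersion(engineVersion);
  return db.major !== engine.major || db.minor !== engine.minor;
}

// The staging case from this session.
const stagingBumpIsRisky = needsDumpOrReindex("1.12.8", "1.44.0");
const minorGap = parseVersion("1.44.0").minor - parseVersion("1.12.8").minor;
```

When the check returns true and the index is derivative, wipe-and-reindex is the cheap path; otherwise plan a dump from the old version and an import into the new one.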

Meilisearch memory

10GB RSS in the Railway metric. Not a leak — Meilisearch starts one actix worker per CPU core, and Railway’s host shows 48 vCPUs to the container. 48 workers × per-worker overhead + LMDB mmap window = the number you see. Set MEILI_MAX_INDEXING_MEMORY=512Mb and MEILI_MAX_INDEXING_THREADS=2 as guardrails — throttles indexing bursts, doesn’t cap worker count (no env var for that). Volume attached so Railway won’t let us cap CPU/memory at service level. For prod: Meilisearch Cloud or a properly sized self-host. Railway with 48 cores visible to the process is not the right home long-term.

Things I want to remember

Medusa production pattern: cd .medusa/server && pnpm install --prod && pnpm run db:migrate && pnpm run start. The bundle is a flat npm package. Move install to build phase, only migrate in pre-deploy. Pre-Deploy runs while old container still serves; if migrations fail the deploy aborts and prod stays up. Start runs after the swap. Do not put heavy install or migrations in Start.
MEDUSA_BACKEND_URL is baked into the admin bundle at build time. Missing it = admin tries to fetch from localhost in production. “Failed to fetch” with localhost in DevTools Network tab is the tell.
NEXT_PUBLIC_* is baked at build time, full stop. Changing them needs a redeploy.
Payload’s push: true is dev-only. Production needs payload migrate:create (424-line schema file in our case), committed, then payload migrate in pre-deploy.
Static prerender + useSearchParams() = build error in Next 16. <Suspense> or export const dynamic = 'force-dynamic'.
Search facets ≠ search targets. Categories and tags belong in filterableAttributes, not searchableAttributes.
Medusa 2.15 admin doesn’t share its QueryClient. Custom admin routes can’t use @tanstack/react-query directly. Use plain state or Medusa’s admin SDK hooks.
Meilisearch storage isn’t version-portable across many minors. Plan version bumps with a dump/restore (or wipe-and-reindex if data is derivative).
Don’t paste live DB URLs into chat. Did. Need to rotate POSTGRES_PASSWORD.
Don’t put PR slugs in commit messages, and don’t @org in PR comments. Commits travel beyond PRs (cherry-picks, mirrors); PR-level state goes stale. Bare @org pings the real GitHub org. Backtick-wrap package names like `@tanstack/react-query` to neutralize.
The “we get more event coverage” pitch for a plugin doesn’t pay off if we don’t yet have the data shape that needs that coverage. Plugin’s variant/tag/collection subscribers are nice in theory; in practice we ship with 15 products in one branch per category and don’t exercise any of those code paths. Justify dependencies by current needs, not theoretical future ones.
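
The facets-versus-search-targets split from the list above, written out as a settings object. The searchable list is the one recorded earlier in this log; the filterable attribute names (categories.id, tags.value) are assumptions about the document shape, and facetsInSearch is a hypothetical guard, not something the module ships.

```typescript
interface IndexSettings {
  searchableAttributes: string[];
  filterableAttributes: string[];
}

const productIndexSettings: IndexSettings = {
  // Fields a typed query is matched against.
  searchableAttributes: [
    "title",
    "subtitle",
    "description",
    "variants.sku",
    "metadata.author",
    "metadata.publisher",
    "metadata.series",
  ],
  // Fields used only to filter and facet results (names are assumptions).
  filterableAttributes: ["categories.id", "tags.value"],
};

// Returns searchable attributes that live under a facet field, e.g.
// "categories.name" when "categories.id" is filterable.
function facetsInSearch(settings: IndexSettings): string[] {
  const root = (attr: string) => attr.split(".")[0];
  const facetRoots = new Set(settings.filterableAttributes.map(root));
  return settings.searchableAttributes.filter((attr) => facetRoots.has(root(attr)));
}

const leaked = facetsInSearch(productIndexSettings);
```

The object is what would be handed to the index’s updateSettings call at the top of the sync step. An empty leaked array means no category or tag text can match a free-text query.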

Decision frame for production

After this PR merges, prod prep is the next milestone. Two things gating it:

CI/CD setup. No CI on this repo yet. Need at minimum: typecheck, lint, build all three apps on PR. Probably GitHub Actions (default since we’re on GitHub already). Should also run pnpm --filter backend build (catches the meilisearch ESM-style issues that medusa develop hides) and pnpm --filter payload exec payload migrate smoke test against an ephemeral Postgres. Storefront: next build.
Prod environment. Separate Railway project (emporium-prod), separate Postgres (named medusa and payload from the start), separate Stripe live keys, separate Meilisearch — and seriously consider Meilisearch Cloud over Railway self-host given the 48-vCPU visibility issue. Plan Postgres password rotation and env-var management cleanly from the start (no chat leakage this time).

Not prod-ready until both are in place. Don’t skip CI to ship faster — every fix in this PR was something pnpm --filter backend build would have caught on PR if CI had been running.

Action items (status as of end-of-day)

Merge PR #14 to main (fc9f3ad squash on main)
Switch all three Railway services from chore/railway-staging-deploy back to main after merge
Rotate POSTGRES_PASSWORD on staging Postgres (leaked in chat during debugging)
PR #15 opened for oxlint/oxfmt swap; awaiting merge
Set up GitHub Actions CI — plan written today, see below
Plan prod environment — separate Railway project with named DBs from day one, manually-configured env vars (MEDUSA_DATABASE_URL, PAYLOAD_DATABASE_URL), Meilisearch Cloud or self-host with sized CPU/memory, Stripe live mode keys
Populate Payload content (Footer global, static pages) for Carrie’s review — Brief 4.2
Pick next feature brief in parallel — Phase 2 catch-up (homepage, staff picks/reading lists pages, events pages, wishlist, gift cards) or jump to Square sync per extended-catalog-and-preorders

Evening session — lint/format toolchain swap (eslint/prettier → oxlint/oxfmt)

After staging was up and PR #14 was through Copilot’s three review passes, jumped into the lint/format toolchain. Branch chore/lint-format-setup, PR #15.

What got done

Dropped eslint + every plugin/config in storefront, eslint + prettier in payload. ~74 packages gone from the lockfile.
Workspace-root .oxlintrc.json and .oxfmtrc.json. Root scripts: lint, lint:fix, format, format:check.
oxlint runs in 350ms across 487 files. oxfmt runs in 700ms across 530. Subjectively unmissable difference vs eslint+prettier.
524-file reformat to oxfmt defaults: semis, double quotes, tighter line packing. Solace’s no-semi/single-quote style is gone — easy to restore via .oxfmtrc.json if anyone hates it.
Cleared all errors (a11y on click-without-keyboard, missing useEffect deps, a real <img> vs next/image case) and all warnings (467 → 0) via a mix of rule scoping and source fixes.

Real bugs oxlint caught that eslint had been silent on

storefront/src/lib/data/customer.ts: is_default_shipping: formData.get("...") === "on" || "true" ? true : false was unconditionally true. The expression parses as (formData.get("...") === "on" || "true") ? true : false, and when the comparison is false the OR falls through to the non-empty string "true", which is truthy. Addresses were saved as default-shipping regardless of the checkbox state. Live in main until today. oxc/const-comparisons + no-unneeded-ternary surfaced it.
side-menu/index.tsx: redundant item.icon && item.icon short-circuit. Dead code, caught by oxc/const-comparisons.
cart.ts: dead inner const authHeaders redeclaration shadowing the outer in the region-update branch. no-shadow.

That validates the oxlint swap on its own: a strict linter caught a live production bug in the first pass after install.
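
The is_default_shipping bug reduced to two functions. The buggy one keeps the original expression; the fixed one assumes the intent was to accept either "on" or "true" from the form. The function names are hypothetical, and the parameter is typed string | null for brevity where FormData.get() can also return a File.

```typescript
// Original shape: `===` binds tighter than `||`, which binds tighter than `?:`,
// so this is (value === "on" || "true") ? true : false. When the comparison is
// false, the OR evaluates to the string "true", which is truthy.
function isDefaultShippingBuggy(value: string | null): boolean {
  return value === "on" || "true" ? true : false;
}

// Assumed intent: compare the value against each accepted string.
function isDefaultShippingFixed(value: string | null): boolean {
  return value === "on" || value === "true";
}

// An unchecked checkbox is simply absent from the form data, so get() yields null.
const uncheckedBuggy = isDefaultShippingBuggy(null); // true: the bug
const uncheckedFixed = isDefaultShippingFixed(null); // false
const checkedFixed = isDefaultShippingFixed("on"); // true
```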

What didn’t get fixed (tuned-out warnings)

Inherited Solace code surfaces 185 react-perf hits (inline arrows in JSX props), 72 no-explicit-any, 38 no-array-index-key, etc. Refactoring all of those during a tooling-swap PR is the wrong shape. Turned the noisy categories off via config; they’re not deleted from oxlint, just not failing the build. Ratchet later.

CLAUDE.md does say “avoid inline object literals in JSX props” — so we do care about the react-perf rules conceptually. Honest plan: leave them off for inherited code, turn them on with a files: override scoped to new code, and clean up Solace components when we touch them for the rebrand.

Lessons worth pinning

The absence of CI is now concretely expensive to me, not just abstractly. Every fix in PR #14 was something pnpm --filter <pkg> build would have caught at PR time. PR #15 caught a live bug that should have been caught at PR time too. Next PR is CI.
524-file reformats happen. Once. Make peace with the diff. They look terrifying but the actual changes are mechanical and reviewer-time spent on them is wasted. Squash-merge is the right call.
InstanceType<typeof import("x", { with: ... }).Class> is the safe TS type-import-from-ESM pattern. The direct import(...).Member form broke between branches; the InstanceType wrapper is equivalent and more portable.
Tune rules to match your codebase reality, not the linter’s defaults. Custom widget patterns (Radix radios, button-styled checkboxes, span role=link breadcrumbs) all use ARIA roles intentionally. Blanket prefer-tag-over-role is wrong for any codebase with headless UI; turn it off.
Don’t put PR slugs in commit messages, don’t @org in PR comments. Saved this as a feedback memory after I did both during the staging deploy session.

CI + tests plan

Wrote [[../plans/ci-and-tests-plan]] tonight. TL;DR:

4 PRs: scaffolding (lint/format-check/typecheck/build) → backend integration tests → payload tests → storefront e2e.
GH Actions, service containers for postgres/redis/meili, docker-compose for storefront e2e.
Skip turbo. Three packages with no inter-package code deps don’t unlock turbo’s value. pnpm + actions/cache keyed by lockfile hash gets ~80% of turbo’s win.
Critical paths to test: search dropdown, add-to-cart → payment intent render, account login, category page render, JSON-LD on PDP. Plus backend workflow tests for sync-products, plugin reindex (the race-condition fix needs a guard), Stripe webhook signature verify.

Tomorrow (2026-05-19)

Pick up by:

Merging PR #15 (oxlint/oxfmt swap) — 11k diff but it’s the format pass, can’t usefully be reviewed line-by-line
Switching all three Railway services back to main
Starting the CI scaffolding PR (the first of the four PRs in the plan)
Rotating the staging POSTGRES_PASSWORD

Image storage decision (deferred, captured tonight)

Staging product thumbnails currently flash white/grey because Medusa has no file provider configured. Falls back to local container disk — uploaded files land at https://medusa-staging-7818.up.railway.app/static/<filename> and get wiped on every backend redeploy. The add-product-thumbnails.js script can re-upload them in seconds, so day-to-day fine for staging, just ugly until first paint.

Long-term: Cloudflare R2 + Cloudflare CDN. Already templated in .env.example (S3_FILE_URL, S3_ACCESS_KEY_ID, S3_BUCKET, S3_ENDPOINT), just not wired in medusa-config.ts and not set on staging. R2 wins over Railway buckets because:

Zero egress fees. Big deal for an image-heavy product catalog
$0.015/GB/mo storage
Cloudflare CDN free in front, global edge cache
Image Resizing add-on can serve appropriately-sized thumbnails per device on the fly
Industry standard, portable if we ever leave Railway

Rough scope of a “wire R2” PR:

Create R2 bucket + scoped API token in Cloudflare dashboard
pnpm --filter backend add @medusajs/file-s3 (or whichever Medusa v2 S3 file provider is current)
Wire into medusa-config.ts modules
Set S3_* env vars on local + medusa-staging
Custom domain images.dungeonbooks.com → R2 bucket via Cloudflare (optional but clean)
Re-run add-product-thumbnails.js on staging — now persists across deploys

Not blocking anything; do it before prod cutover so we don’t migrate live data.

Seed data drift / content reconciliation

Started poking at this end-of-day, then shipped the category piece tonight.

Drift surface found by querying the local DB vs reading seed.ts:

Categories drifted via admin UI, never reflected back into seed.ts. Local DB had “Comics” and “Gifts” as top-level (good); seed source still said “Comics & Graphic Novels” and “Stationery & Gifts”. Also a redundant soft-deleted “Comics” sub-category under the Comics parent (same handle as parent → URL collision risk).
Other drift items worth a separate pass when Carrie’s available:
- Product metadata corrections, pricing fixes, new titles
- Payload globals / pages still mostly empty on staging (Footer, Homepage, Announcements, Pages)
- Bookshop.org / Libro.fm affiliate URLs need to come from Carrie

Shipped (chore/reconcile-categories branch):

seed.ts updated to the new taxonomy:
- Top: Books, RPG, Comics, Gifts, Zines, Board Games
- Books subs: Sci-Fi, Fantasy, Horror, Literary Fiction, Non-Fiction
- RPG subs: D&D, Pathfinder, OSR, Other Systems, Accessories
- Comics subs: Graphic Novels, Manga (dropped the redundant “Comics” sub)
One-shot idempotent script reconcile-categories.ts. Renames if old name present, deletes redundant Comics sub if exists with 0 products, emits meilisearch.sync at the end. Verified locally — no-ops cleanly against the already-reconciled local DB.
PR not opened yet (waiting on CI scaffolding to land first so this gets a clean check).
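
What “idempotent” means for the reconcile script, sketched as a pure planner rather than the real reconcile-categories.ts. The old and new names come from the drift notes above; the Category shape, planReconcile, and applyPlan are assumptions, and the final meilisearch.sync emit is left out.

```typescript
interface Category {
  id: string;
  name: string;
  parentId: string | null;
  productCount: number;
}

type Action =
  | { kind: "rename"; id: string; to: string }
  | { kind: "delete"; id: string };

// Old top-level names from seed.ts mapped to the reconciled taxonomy.
const RENAMES = new Map<string, string>([
  ["Comics & Graphic Novels", "Comics"],
  ["Stationery & Gifts", "Gifts"],
]);

// Computes only the work still outstanding, so a second run plans nothing.
function planReconcile(categories: Category[]): Action[] {
  const actions: Action[] = [];
  const finalName = (c: Category) =>
    c.parentId === null ? RENAMES.get(c.name) ?? c.name : c.name;

  for (const c of categories) {
    if (finalName(c) !== c.name) {
      actions.push({ kind: "rename", id: c.id, to: finalName(c) });
    }
  }

  // Redundant "Comics" sub under the Comics parent: delete only when empty.
  const comicsParent = categories.find(
    (c) => c.parentId === null && finalName(c) === "Comics",
  );
  for (const c of categories) {
    const redundant =
      comicsParent !== undefined &&
      c.parentId === comicsParent.id &&
      c.name === "Comics";
    if (redundant && c.productCount === 0) {
      actions.push({ kind: "delete", id: c.id });
    }
  }
  return actions;
}

// In-memory stand-in for the database writes, used to show the second run is a no-op.
function applyPlan(categories: Category[], actions: Action[]): Category[] {
  const deleted = new Set<string>();
  const renamed = new Map<string, string>();
  for (const a of actions) {
    if (a.kind === "delete") deleted.add(a.id);
    else renamed.set(a.id, a.to);
  }
  return categories
    .filter((c) => !deleted.has(c.id))
    .map((c) => ({ ...c, name: renamed.get(c.id) ?? c.name }));
}
```

Planning against the current state, instead of replaying a fixed list of writes, is what makes the script safe to run on staging even if someone has already fixed part of the taxonomy by hand.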

Staging needs the script run after merging:

cd /app/backend/.medusa/server && pnpm exec medusa exec ./src/scripts/reconcile-categories.js

The deferred content reconciliation (product fixes, Payload paste, affiliate URLs) is a Carrie-coordination job — capturing as a follow-up brief, not blocking.

Cross-references

task-briefs — Brief 1.7 done; Phase 2 / Phase 3 / Phase 4 ahead
roadmap — Phase 1 effectively complete
platform-eval-medusa-official-audit — for the .medusa/server production bundle pattern
ci-and-tests-plan — written tonight, drives 2026-05-19 onward
