2026-05-18 — Emporium on Railway, Brief 1.7 shipped
Summary
Got medusa + payload + storefront deployed to Railway staging. Brief 1.7 done. Single branch chore/railway-staging-deploy, PR #14. Along the way: bumped Medusa 2.13 → 2.15, attempted a swap to the @rokmohar/medusa-plugin-meilisearch plugin, reverted that swap after weighing single-maintainer risk against ~300 LoC saved, generated Payload migrations, fixed a handful of deploy-time landmines, added category-breadcrumb support to the storefront’s search dropdown.
Staging URLs:
- Storefront: https://storefront-staging-6366.up.railway.app/
- Medusa admin: https://medusa-staging-7818.up.railway.app/app
- Payload admin: https://payload-staging.up.railway.app/admin
What actually happened
Started simple — pnpm dev boots everything locally, build commands existed, install pnpm deps + filter to the package, point at Railway, ship. Should have been a couple hours.
Turned into a long evening because Medusa’s production deploy pattern is not “run medusa start from source dir.” medusa build emits a self-contained npm package at backend/.medusa/server/ — its own package.json, its own deps, the admin bundle pre-compiled at public/admin/. Production is supposed to cd into that and npm install from scratch. Took two failed deploys to surface the “Could not find index.html” error and another to find the docs page that explained the bundle pattern.
Then npm install in the bundle timed out at 4 minutes (Railway’s pre-deploy timeout). Switched to pnpm there — same install, 16 seconds. pnpm’s CAS store + hardlink-based install vs npm’s copy-everything is a real production difference, not just a dev nicety. Moved the install into the build phase so subsequent deploys cache.
Meilisearch ESM was the other unblock. Custom module was importing Meilisearch from a v0.57 ESM-only package; CJS backend can’t require() ESM. medusa develop (swc/ts-node) tolerated it, medusa build (tsc) did not. Bug hidden behind dev-mode forgiveness for months.
The plugin detour (and revert)
When the ESM build error hit, I framed “switch to the @rokmohar/medusa-plugin-meilisearch plugin” as the forward path. Sold it as “stop maintaining our own glue, plugin has broader event coverage (variant, tag, collection, category upserts), 300 LoC deleted, MIT, actively maintained — released literally today.”
What I undersold:
- The plugin’s pin of `@tanstack/react-query` to exactly 5.64.2 (we have ^5.96.1) wasn’t just a peer warning; it was a maintenance commitment to stay on an old version.
- Plugin’s `engines.node` is `>=22`; our backend declares `>=20`. Copilot caught this; it would have broken on a Node-20 deploy.
- Single maintainer. MIT in lockfile mitigates, but rewriting after abandonment is more expensive than writing now.
- “300 LoC saved” was overstated: we still maintain config (`fields`, searchable attrs, etc.). Trade code for config-against-a-black-box. Net positive but less clean than pitched.
- Lost the admin “Sync All Products” UI we’d built.
Then on May 11 the TanStack credential compromise happened. Adding pressure to pin to a specific older version of a TanStack package became actively unattractive. Reverted the plugin entirely in cfc5990. Forward fix: dynamic-import in service.ts (await import("meilisearch") inside a lazy getter; type via import("meilisearch", { with: { "resolution-mode": "import" }}).Meilisearch).
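A minimal sketch of the lazy-getter idea, not the actual `service.ts`. `lazyModule` and `loadFakeSdk` are hypothetical names, and the loader is a stand-in so the snippet runs without the SDK installed. It also assumes the build keeps `import()` as a real dynamic import rather than down-levelling it to `require()`, which depends on the TypeScript `module` setting.

```typescript
// Memoize an async loader so the underlying import() runs at most once.
// Note: a rejected promise is cached too; reset `cached` on failure if retries matter.
function lazyModule<T>(loader: () => Promise<T>): () => Promise<T> {
  let cached: Promise<T> | undefined;
  return () => {
    if (cached === undefined) {
      cached = loader();
    }
    return cached;
  };
}

// Stand-in loader (assumption) so the sketch is self-contained.
// In a CJS service the loader would be: () => import("meilisearch")
let loadCount = 0;
const loadFakeSdk = lazyModule(async () => {
  loadCount += 1;
  return { name: "fake-sdk" };
});
```

Every caller awaits `loadFakeSdk()` and shares one promise, so the ESM package is only pulled in the first time search is actually used.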
Improvements over the original module while we were in there:
- `retrieveFromIndex` uses a proper type predicate instead of `.filter(Boolean)` that didn’t narrow nulls.
- `updateSettings()` now actually runs, called at the top of `syncProductsStep`. The original had the method but never called it, so the index inherited Meilisearch defaults (every field searchable, which is the noise we kept tripping on).
- Sync workflow query adds `categories.parent_category_id`, so the storefront’s `buildCategoryChain` helper can render breadcrumb order (“Books / Fantasy” not “Fantasy / Books”).
- Searchable attrs tightened to `title`, `subtitle`, `description`, `variants.sku`, `metadata.{author,publisher,series}`. No more `metadata` blob (so “en” stopped matching every book) and no categories/tags in search (they’re facets).
- `meilisearch` JS SDK bumped to ^0.58.
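For the type-predicate point, a small sketch of the difference. `ProductHit`, `isPresent` and the sample data are made up for illustration; the real `retrieveFromIndex` is not shown here.

```typescript
type ProductHit = { id: string; title: string };

// User-defined type guard: tells the compiler that null/undefined are filtered out.
function isPresent<T>(value: T | null | undefined): value is T {
  return value !== null && value !== undefined;
}

const lookups: Array<ProductHit | null> = [
  { id: "prod_1", title: "Example A" },
  null,
  { id: "prod_2", title: "Example B" },
];

// With the guard the result is ProductHit[].
// With .filter(Boolean) the compiler still sees Array<ProductHit | null>.
const hits: ProductHit[] = lookups.filter(isPresent);
const titles = hits.map((hit) => hit.title);
```

The runtime behaviour is the same either way; the guard only changes what the compiler knows, which is what removes the downstream null checks.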
The Medusa 2.15 admin react-query trap
After restoring the admin Meilisearch sync page, it crashed at render with “No QueryClient set, use QueryClientProvider to set one.” The page used useMutation directly from @tanstack/react-query. Medusa 2.15’s admin shell has its own isolated QueryClient instance — custom routes can’t see it via direct react-query imports. Replaced with plain useState + async handler. Could probably use Medusa’s admin SDK hooks (whatever they expose) but the plain version is simpler and dependency-free.
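A framework-free sketch of the state transitions behind the plain handler. In the real page `setState` would come from `useState` and `triggerSync` would call the backend; `SyncState` and `runSync` are hypothetical names, not the page's actual code.

```typescript
type SyncState =
  | { status: "idle" }
  | { status: "loading" }
  | { status: "success" }
  | { status: "error"; message: string };

// Drives the same loading/success/error states a mutation hook would expose,
// without touching any QueryClient.
async function runSync(
  triggerSync: () => Promise<void>,
  setState: (next: SyncState) => void,
): Promise<void> {
  setState({ status: "loading" });
  try {
    await triggerSync();
    setState({ status: "success" });
  } catch (err) {
    const message = err instanceof Error ? err.message : String(err);
    setState({ status: "error", message });
  }
}
```

The button's `onClick` would call `runSync` and render from the current state, so the only dependency is React itself.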
The monorepo question
User asked mid-session: is the monorepo still worth it? Apps don’t share code. No internal packages. No cross-app refactors. Each deploy fights pnpm + nixpacks + Medusa’s bundle layout. Three repos would lose almost nothing day-to-day.
Park it. Deploy pain is mostly one-time. The cost of splitting (three CI configs, three Dependabots, lost atomic commits when we actually want them) outweighs the recurring annoyance once Railway’s wired.
Database naming sloppy on staging
Postgres has railway (medusa data) and payload. Tried to rename railway → medusa mid-session, got stuck (POSTGRES_DB is init-only), reverted. Decided: prod gets named databases (medusa, payload) + manually-configured env vars (MEDUSA_DATABASE_URL, PAYLOAD_DATABASE_URL) from day one. Staging stays sloppy. Works, just ugly.
Meilisearch image bump (v1.12 → v1.44) and the storage incompat
User bumped the Meilisearch image to v1.44 (latest, released that day). Container immediately crash-looped on “Your database version (1.12.8) is incompatible with your current engine version (1.44.0).” I predicted this but underweighted it: a 32-minor-version leap on a service with a persistent volume was a higher risk than I sold it as. Wiped the volume (data is derivative; Medusa is source of truth), booted clean on v1.44, hit Sync All Products from admin, reindexed in seconds.
Meilisearch memory
10GB RSS in the Railway metric. Not a leak — Meilisearch starts one actix worker per CPU core, and Railway’s host shows 48 vCPUs to the container. 48 workers × per-worker overhead + LMDB mmap window = the number you see. Set MEILI_MAX_INDEXING_MEMORY=512Mb and MEILI_MAX_INDEXING_THREADS=2 as guardrails — throttles indexing bursts, doesn’t cap worker count (no env var for that). Volume attached so Railway won’t let us cap CPU/memory at service level. For prod: Meilisearch Cloud or a properly sized self-host. Railway with 48 cores visible to the process is not the right home long-term.
Things I want to remember
- Medusa production pattern: `cd .medusa/server && pnpm install --prod && pnpm run db:migrate && pnpm run start`. The bundle is a flat npm package. Move install to build phase, only migrate in pre-deploy. Pre-Deploy runs while old container still serves; if migrations fail the deploy aborts and prod stays up. Start runs after the swap. Do not put heavy install or migrations in Start.
- `MEDUSA_BACKEND_URL` is baked into the admin bundle at build time. Missing it = admin tries to fetch from localhost in production. “Failed to fetch” with localhost in DevTools Network tab is the tell.
- `NEXT_PUBLIC_*` is baked at build time, full stop. Changing them needs a redeploy.
- Payload’s `push: true` is dev-only. Production needs `payload migrate:create` (424-line schema file in our case), committed, then `payload migrate` in pre-deploy.
- Static prerender + `useSearchParams()` = build error in Next 16. `<Suspense>` or `export const dynamic = 'force-dynamic'`.
- Search facets ≠ search targets. Categories and tags belong in `filterableAttributes`, not `searchableAttributes`.
- Medusa 2.15 admin doesn’t share its QueryClient. Custom admin routes can’t use `@tanstack/react-query` directly. Use plain state or Medusa’s admin SDK hooks.
- Meilisearch storage isn’t version-portable across many minors. Plan version bumps with a dump/restore (or wipe-and-reindex if data is derivative).
- Don’t paste live DB URLs into chat. Did. Need to rotate `POSTGRES_PASSWORD`.
- Don’t put PR slugs in commit messages, and don’t `@org` in PR comments. Commits travel beyond PRs (cherry-picks, mirrors); PR-level state goes stale. Bare `@org` pings the real GitHub org. Backtick-wrap package names like `@tanstack/react-query` to neutralize.
- The “we get more event coverage” pitch for a plugin doesn’t pay off if we don’t yet have the data shape that needs that coverage. Plugin’s variant/tag/collection subscribers are nice in theory; in practice we ship with 15 products in one branch per category and don’t exercise any of those code paths. Justify dependencies by current needs, not theoretical future ones.
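To make the facets-vs-targets note concrete, a sketch of the settings shape plus a small guard. The searchable list is the one from this log; the filterable field names (`categories.handle`, `tags.value`) and the helper `facetsLeakedIntoSearch` are assumptions for illustration, not what the module actually sends.

```typescript
type IndexSettings = {
  searchableAttributes: string[];
  filterableAttributes: string[];
};

const productIndexSettings: IndexSettings = {
  // Fields a typed query is matched against.
  searchableAttributes: [
    "title",
    "subtitle",
    "description",
    "variants.sku",
    "metadata.author",
    "metadata.publisher",
    "metadata.series",
  ],
  // Facets only. Field names here are hypothetical.
  filterableAttributes: ["categories.handle", "tags.value"],
};

// Returns any facet field that has also been listed as a search target.
function facetsLeakedIntoSearch(settings: IndexSettings): string[] {
  const searchable = new Set(settings.searchableAttributes);
  return settings.filterableAttributes.filter((attr) => searchable.has(attr));
}
```

Meilisearch itself allows a field to be both searchable and filterable; the guard just encodes the house rule above so a later edit can't quietly undo it.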
Decision frame for production
After this PR merges, prod prep is the next milestone. Two things gating it:
- CI/CD setup. No CI on this repo yet. Need at minimum: typecheck, lint, build all three apps on PR. Probably GitHub Actions (default since we’re on GitHub already). Should also run `pnpm --filter backend build` (catches the meilisearch ESM-style issues that `medusa develop` hides) and a `pnpm --filter payload exec payload migrate` smoke test against an ephemeral Postgres. Storefront: `next build`.
- Prod environment. Separate Railway project (`emporium-prod`), separate Postgres (named `medusa` and `payload` from the start), separate Stripe live keys, separate Meilisearch, and seriously consider Meilisearch Cloud over Railway self-host given the 48-vCPU visibility issue. Plan Postgres password rotation and env-var management cleanly from the start (no chat leakage this time).
Not prod-ready until both are in place. Don’t skip CI to ship faster — every fix in this PR was something pnpm --filter backend build would have caught on PR if CI had been running.
Action items (status as of end-of-day)
- Merge PR #14 to main (`fc9f3ad` squash on main)
- Switch all three Railway services from `chore/railway-staging-deploy` back to `main` after merge
- Rotate `POSTGRES_PASSWORD` on staging Postgres (leaked in chat during debugging)
- PR #15 opened for oxlint/oxfmt swap; awaiting merge
- Set up GitHub Actions CI: plan written today, see below
- Plan prod environment: separate Railway project with named DBs from day one, manually-configured env vars (`MEDUSA_DATABASE_URL`, `PAYLOAD_DATABASE_URL`), Meilisearch Cloud or self-host with sized CPU/memory, Stripe live mode keys
- Populate Payload content (Footer global, static pages) for Carrie’s review (Brief 4.2)
- Pick next feature brief in parallel: Phase 2 catch-up (homepage, staff picks/reading lists pages, events pages, wishlist, gift cards) or jump to Square sync per extended-catalog-and-preorders
Evening session — lint/format toolchain swap (eslint/prettier → oxlint/oxfmt)
After staging was up and PR #14 was through Copilot’s three review passes, jumped into the lint/format toolchain. Branch chore/lint-format-setup, PR #15.
What got done
- Dropped eslint + every plugin/config in storefront, eslint + prettier in payload. ~74 packages gone from the lockfile.
- Workspace-root `.oxlintrc.json` and `.oxfmtrc.json`. Root scripts: `lint`, `lint:fix`, `format`, `format:check`.
- `oxlint` runs in 350ms across 487 files. `oxfmt` runs in 700ms across 530. Subjectively unmissable difference vs eslint+prettier.
- 524-file reformat to oxfmt defaults: semis, double quotes, tighter line packing. Solace’s no-semi/single-quote style is gone; easy to restore via `.oxfmtrc.json` if anyone hates it.
- Cleared all errors (a11y on click-without-keyboard, missing useEffect deps, a real `<img>` vs next/image case) and all warnings (467 → 0) via a mix of rule scoping and source fixes.
Real bugs oxlint caught that eslint had been silent on
- `storefront/src/lib/data/customer.ts`: `is_default_shipping: formData.get("...") === "on" || "true" ? true : false` was unconditionally true. The comparison binds tighter than `||`, so the condition is `(… === "on") || "true"`, and the non-empty string `"true"` makes it truthy even when the comparison fails. Addresses were saved as default-shipping regardless of the checkbox state. Live in main until today. oxc/const-comparisons + no-unneeded-ternary surfaced it.
- `side-menu/index.tsx`: redundant `item.icon && item.icon` short-circuit. Dead code, caught by oxc/const-comparisons.
- `cart.ts`: dead inner `const authHeaders` redeclaration shadowing the outer in the region-update branch. no-shadow.
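The `is_default_shipping` bug is easy to reproduce in isolation. This is a sketch, not the actual diff: the function names are hypothetical, and accepting the string `"true"` alongside `"on"` in the fixed version is an assumption about what the original author intended.

```typescript
// Same shape as the original expression: ((value === "on") || "true") ? true : false.
// "true" is a non-empty string, so the condition is truthy for every input.
function isDefaultShippingBuggy(value: string | null): boolean {
  const condition = value === "on" || "true";
  return condition ? true : false;
}

// Compare against each accepted string explicitly; no ternary needed.
function isDefaultShippingFixed(value: string | null): boolean {
  return value === "on" || value === "true";
}
```

An unchecked checkbox is simply absent from the form data, so the handler sees `null`; that is the input where the two versions disagree.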
That validates the oxlint swap on its own: a stricter linter caught a bug that was live on main in the first pass after install.
What didn’t get fixed (tuned-out warnings)
Inherited Solace code surfaces 185 react-perf hits (inline arrows in JSX props), 72 no-explicit-any, 38 no-array-index-key, etc. Refactoring all of those during a tooling-swap PR is the wrong shape. Turned the noisy categories off via config; they’re not deleted from oxlint, just not failing the build. Ratchet later.
CLAUDE.md does say “avoid inline object literals in JSX props” — so we do care about the react-perf rules conceptually. Honest plan: leave them off for inherited code, turn them on with a files: override scoped to new code, and clean up Solace components when we touch them for the rebrand.
Lessons worth pinning
- CI absence is now expensive to me, not abstractly. Every fix in PR #14 was something `pnpm --filter <pkg> build` would have caught at PR time. PR #15 caught a live bug that should have been caught at PR time too. Next PR is CI.
- 524-file reformats happen. Once. Make peace with the diff. They look terrifying but the actual changes are mechanical and reviewer-time spent on them is wasted. Squash-merge is the right call.
- `InstanceType<typeof import("x", { with: ... }).Class>` is the safe TS type-import-from-ESM pattern. The direct `import(...).Member` form broke between branches; the InstanceType wrapper is equivalent and more portable.
- Tune rules to match your codebase reality, not the linter’s defaults. Custom widget patterns (Radix radios, button-styled checkboxes, span role=link breadcrumbs) all use ARIA roles intentionally. Blanket `prefer-tag-over-role` is wrong for any codebase with headless UI; turn it off.
- Don’t put PR slugs in commit messages, don’t `@org` in PR comments. Saved this as a feedback memory after I did both during the staging deploy session.
CI + tests plan
Wrote [[../plans/ci-and-tests-plan]] tonight. TL;DR:
- 4 PRs: scaffolding (lint/format-check/typecheck/build) → backend integration tests → payload tests → storefront e2e.
- GH Actions, service containers for postgres/redis/meili, docker-compose for storefront e2e.
- Skip turbo. 3 packages with no inter-package code deps doesn’t unlock turbo’s value. pnpm + `actions/cache` keyed by lockfile hash gets ~80% of turbo’s win.
- Critical paths to test: search dropdown, add-to-cart → payment intent render, account login, category page render, JSON-LD on PDP. Plus backend workflow tests for sync-products, plugin reindex (the race-condition fix needs a guard), Stripe webhook signature verify.
Tomorrow (2026-05-19)
Pick up by:
- Merging PR #15 (oxlint/oxfmt swap) — 11k diff but it’s the format pass, can’t usefully be reviewed line-by-line
- Switching all three Railway services back to `main`
- Starting the CI scaffolding PR (the first of the four PRs in the plan)
- Rotating the staging `POSTGRES_PASSWORD`
Image storage decision (deferred, captured tonight)
Staging product thumbnails currently flash white/grey because Medusa has no file provider configured. Falls back to local container disk — uploaded files land at https://medusa-staging-7818.up.railway.app/static/<filename> and get wiped on every backend redeploy. The add-product-thumbnails.js script can re-upload them in seconds, so day-to-day fine for staging, just ugly until first paint.
Long-term: Cloudflare R2 + Cloudflare CDN. Already templated in .env.example (S3_FILE_URL, S3_ACCESS_KEY_ID, S3_BUCKET, S3_ENDPOINT), just not wired in medusa-config.ts and not set on staging. R2 wins over Railway buckets because:
- Zero egress fees. Big deal for an image-heavy product catalog
- $0.015/GB/mo storage
- Cloudflare CDN free in front, global edge cache
- Image Resizing add-on can serve appropriately-sized thumbnails per device on the fly
- Industry standard, portable if we ever leave Railway
Rough scope of a “wire R2” PR:
- Create R2 bucket + scoped API token in Cloudflare dashboard
- `pnpm --filter backend add @medusajs/file-s3` (or whichever Medusa v2 S3 file provider is current)
- Wire into `medusa-config.ts` modules
- Set `S3_*` env vars on local + medusa-staging
- Custom domain `images.dungeonbooks.com` → R2 bucket via Cloudflare (optional but clean)
- Re-run `add-product-thumbnails.js` on staging; uploads now persist across deploys
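A rough sketch of the "wire into `medusa-config.ts`" step, written as a plain function so it runs on its own. The `resolve` strings and option keys follow the shape I believe the Medusa v2 S3 file provider docs use; treat them as assumptions and verify against the current docs before copying. `S3_SECRET_ACCESS_KEY` and `S3_REGION` are assumed names that this log does not list in `.env.example`, and `buildFileModule` / `requireEnv` are hypothetical helpers.

```typescript
type Env = Record<string, string | undefined>;

// Fail at config time instead of at first upload.
function requireEnv(env: Env, key: string): string {
  const value = env[key];
  if (value === undefined || value === "") {
    throw new Error(`Missing required env var: ${key}`);
  }
  return value;
}

// Builds one entry for the `modules` array. Resolve strings are assumptions.
function buildFileModule(env: Env) {
  return {
    resolve: "@medusajs/medusa/file",
    options: {
      providers: [
        {
          resolve: "@medusajs/medusa/file-s3",
          id: "s3",
          options: {
            file_url: requireEnv(env, "S3_FILE_URL"),
            access_key_id: requireEnv(env, "S3_ACCESS_KEY_ID"),
            secret_access_key: requireEnv(env, "S3_SECRET_ACCESS_KEY"), // assumed name
            region: env["S3_REGION"] ?? "auto", // R2's S3 API uses the region "auto"
            bucket: requireEnv(env, "S3_BUCKET"),
            endpoint: requireEnv(env, "S3_ENDPOINT"),
          },
        },
      ],
    },
  };
}
```

In the config file this would presumably be used as `modules: [buildFileModule(process.env)]`, so a missing variable fails the boot instead of silently falling back to local disk.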
Not blocking anything; do it before prod cutover so we don’t migrate live data.
Seed data drift / content reconciliation
Started poking at this end-of-day, then shipped the category piece tonight.
Drift surface found by querying the local DB vs reading seed.ts:
- Categories drifted via admin UI, never reflected back into `seed.ts`. Local DB had “Comics” and “Gifts” as top-level (good); seed source still said “Comics & Graphic Novels” and “Stationery & Gifts”. Also a redundant soft-deleted “Comics” sub-category under the Comics parent (same handle as parent → URL collision risk).
- Other drift items worth a separate pass when Carrie’s available:
  - Product metadata corrections, pricing fixes, new titles
  - Payload globals / pages still mostly empty on staging (Footer, Homepage, Announcements, Pages)
  - Bookshop.org / Libro.fm affiliate URLs need to come from Carrie
Shipped (chore/reconcile-categories branch):
- `seed.ts` updated to the new taxonomy:
  - Top: Books, RPG, Comics, Gifts, Zines, Board Games
  - Books subs: Sci-Fi, Fantasy, Horror, Literary Fiction, Non-Fiction
  - RPG subs: D&D, Pathfinder, OSR, Other Systems, Accessories
  - Comics subs: Graphic Novels, Manga (dropped the redundant “Comics” sub)
- One-shot idempotent script `reconcile-categories.ts`. Renames if old name present, deletes redundant Comics sub if exists with 0 products, emits `meilisearch.sync` at the end. Verified locally: no-ops cleanly against the already-reconciled local DB.
- PR not opened yet (waiting on CI scaffolding to land first so this gets a clean check).
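What "idempotent" means for the script, sketched against in-memory data rather than the Medusa category service. Types, helper names and IDs are all hypothetical, and the final `meilisearch.sync` emit is left out; only the rename pairs come from the drift described above.

```typescript
type Category = { id: string; name: string; parentId: string | null; productCount: number };
type Action =
  | { kind: "rename"; id: string; to: string }
  | { kind: "delete"; id: string };

// Old top-level name -> new name.
const RENAMES = new Map<string, string>([
  ["Comics & Graphic Novels", "Comics"],
  ["Stationery & Gifts", "Gifts"],
]);

// Name a category will have once renames are applied.
function targetName(cat: Category): string {
  return cat.parentId === null ? RENAMES.get(cat.name) ?? cat.name : cat.name;
}

// Plan only what still needs doing; an already-reconciled list yields no actions.
function planReconcile(categories: Category[]): Action[] {
  const byId = new Map(categories.map((c) => [c.id, c] as const));
  const actions: Action[] = [];
  for (const cat of categories) {
    const parent = cat.parentId === null ? undefined : byId.get(cat.parentId);
    if (targetName(cat) !== cat.name) {
      actions.push({ kind: "rename", id: cat.id, to: targetName(cat) });
    } else if (parent && cat.name === "Comics" && targetName(parent) === "Comics" && cat.productCount === 0) {
      actions.push({ kind: "delete", id: cat.id }); // redundant empty sub under Comics
    }
  }
  return actions;
}

function applyActions(categories: Category[], actions: Action[]): Category[] {
  const deleted = new Set(actions.filter((a) => a.kind === "delete").map((a) => a.id));
  const renamed = new Map<string, string>();
  for (const a of actions) if (a.kind === "rename") renamed.set(a.id, a.to);
  return categories
    .filter((c) => !deleted.has(c.id))
    .map((c) => ({ ...c, name: renamed.get(c.id) ?? c.name }));
}
```

Planning against the output of a previous run returns an empty list, which is the property that makes it safe to run on staging after it has already run locally.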
Staging needs the script run after merging:
cd /app/backend/.medusa/server && pnpm exec medusa exec ./src/scripts/reconcile-categories.js
The deferred content reconciliation (product fixes, Payload paste, affiliate URLs) is a Carrie-coordination job — capturing as a follow-up brief, not blocking.
Cross-references
- task-briefs: Brief 1.7 done; Phase 2 / Phase 3 / Phase 4 ahead
- roadmap: Phase 1 effectively complete
- platform-eval-medusa-official-audit: for the `.medusa/server` production bundle pattern
- ci-and-tests-plan: written tonight, drives 2026-05-19 onward