Built the whole Ingram catalog pipeline end to end today. Scrape iPage, push covers to R2, queue the books in a Postgres table, then pull them into Medusa as products from a button in the admin. Got the 25 SFF top sellers in, categorized, and published, and the product pages are live.

What we built

  • Revised the iPage scraper. We don’t need browser automation at all; a captured session cookie plus plain HTTP does everything (saved search grid, product detail, cover image). Dropped the whole Browserbase plan.
  • Catalog repo is the producer: scraper (Python, run with uv) scrapes the saved search, uploads each cover to R2, and upserts a normalized row per book into a book queue table on Railway Postgres. Redesigned the schema so the columns map almost one to one onto a Medusa product.
  • Emporium is the consumer. The importer lives in the Medusa backend (not the catalog repo) so a future admin button can reuse it. There’s now a Catalog Import settings page with Preview, Import as draft, Import as published, and Publish all, plus a queue table.
  • Wired R2 (S3) file storage into both the Medusa backend and Payload CMS. One shared bucket, separate prefixes: covers/raw for the scraper, medusa/ for admin uploads, cms/media for Payload.
  • Cookie refresh helper so reruns aren’t painful: paste a Copy-as-cURL and it writes .env and validates the session.
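
A rough sketch of the cookie-extraction step in that helper, assuming the pasted command is in the bash form browser devtools emit (-H 'cookie: ...' or -b '...'). The function name is made up for illustration. The real helper lives in the Python catalog repo and also writes .env and validates the session, which is not shown here.

```typescript
// Pull the Cookie header value out of a pasted "Copy as cURL (bash)" command.
// Hypothetical helper. Handles the two forms devtools commonly emit:
//   -H 'cookie: a=1; b=2'   and   -b 'a=1; b=2'   (or --cookie '...').
// Assumes the cookie value itself contains no quote characters.
function extractCookieFromCurl(curlCommand: string): string | null {
  // Header form: the quoted header must start with "cookie:" (case-insensitive).
  const headerForm = curlCommand.match(/-H\s+(['"])cookie:\s*([\s\S]*?)\1/i);
  if (headerForm) {
    return headerForm[2]?.trim() ?? null;
  }
  // Flag form: -b / --cookie followed by a quoted cookie string.
  const flagForm = curlCommand.match(/(?:^|\s)(?:-b|--cookie)\s+(['"])([\s\S]*?)\1/);
  if (flagForm) {
    return flagForm[2]?.trim() ?? null;
  }
  // No cookie at all, e.g. the command was copied from a static asset request.
  return null;
}
```

A null result is the signal to go copy a different request, which matches the static-asset gotcha in the notes.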

What we learned

  • iPage’s auth cookies are HttpOnly, so document.cookie silently drops them. Grab the cookie from Copy-as-cURL on a real app request (a .action or /ipage URL), not a static asset.
  • Medusa soft-deletes products (deleted_at stays). But the unique indexes on handle and sku are partial (WHERE deleted_at IS NULL), so a soft-deleted product frees up its handle. Verified it; a clean re-import works after a wipe.
  • Decided hardcover and paperback stay as separate products, not variants of one. They differ in pages and content, and a format can have its own variants later (a signed paperback). Handles are ISBN-suffixed so the two editions don’t collide.
  • A genre-mapping bug: BISAC writes LitRPG as “LitRPG (Literary Role-Playing Game)”, and the parenthetical tripped our “literary” rule, so the whole Dungeon Crawler Carl series got tagged Literary Fiction. Strip parentheticals before matching.
  • The storefront DYNAMIC_SERVER_USAGE error on product pages was the big rabbit hole. Root cause: the nav reads cart and login cookies, so pages can’t be statically prerendered. The Solace starter avoids this implicitly because its [countryCode] segment is a request-time param, which makes every page dynamic. Our fork hardcoded “us” and dropped that segment, so the pages became static-eligible and broke. Fix was to force-dynamic the (main) layout. The starter isn’t doing anything clever (no PPR); it’s just dynamic too.
  • PPR (static shell + dynamic cart island) does run self-hosted on Railway, it’s not a Vercel thing, but it needs Next canary, so not worth it right now.
  • iPage’s detail page has review quotes and a bio note but no real synopsis, so there’s no clean description to scrape. Left descriptions empty for staff to write.
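
The LitRPG fix, sketched in TypeScript for consistency with the other examples in this log (the real mapper is in the Python scraper). The rule table, the "LitRPG" label, and the function names are illustrative assumptions. The point is only that parentheticals are stripped before any rule runs, so rule order stops mattering for this case.

```typescript
// Ordered (pattern, genre) rules. The literary rule is deliberately first
// to show that the fix does not depend on rule order.
const GENRE_RULES: ReadonlyArray<readonly [RegExp, string]> = [
  [/\bliterary\b/i, "Literary Fiction"],
  [/\blitrpg\b/i, "LitRPG"],
];

// Remove "(...)" groups so explanatory glosses cannot trip keyword rules.
function stripParentheticals(subject: string): string {
  return subject.replace(/\s*\([^)]*\)/g, "").trim();
}

// Return the first matching genre, or null when no rule applies.
function mapGenre(bisacSubject: string): string | null {
  const cleaned = stripParentheticals(bisacSubject);
  for (const [pattern, genre] of GENRE_RULES) {
    if (pattern.test(cleaned)) {
      return genre;
    }
  }
  return null;
}
```

With the stripping step removed, "LitRPG (Literary Role-Playing Game)" would hit the literary rule first, which is the bug described above.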

Shipping it

  • Opened PRs for both repos and ran them through Copilot review, then addressed and replied to every comment.
    • catalog: made the migration non-destructive (no drops, create ... if not exists), skip grid rows with no ISBN, update scraped_at on re-scrape, and exit non-zero on a mid-run session expiry.
    • emporium: the import now takes a pg advisory lock so two concurrent runs (double-click, overlapping cron) can’t process the same rows and duplicate products. Validated by firing two imports at once: one created 10, the other skipped, 10 products not 20. Also made the Payload media migration idempotent, gated the Payload S3 plugin on all the S3 vars (not just the bucket), and gave the cover image a real alt.
  • Found a churn bug: content_hash included stock, and Ingram inventory changes constantly, so every re-scrape re-queued nearly every book (imported rows flipped back to pending). Dropped stock from the hash. The churn was doubly pointless because the importer is create-only, so re-importing existing books doesn’t update them anyway (update-in-place is backlogged).
  • Tooling so it isn’t a mouthful to run: a catalog CLI (cookie / scrape / status) and a justfile (just cookie, just scrape, just status, just migrate). Cookie refresh takes a pasted Copy-as-cURL (no clipboard reading) and you finish with a blank line instead of Ctrl-D.
  • Held off on concurrent/async fetch. At the concurrency iPage tolerates it’s a wash with threads, and the real win is at scale where you’d redesign anyway (pagination, a faster parser for the GIL-bound BeautifulSoup, rate-limit backoff). The saved search is ~13,619 pages, so scale is a real future thing.
  • Wrote a staging to prod cutover runbook (plans/staging-to-prod-cutover.md). Big one: prod gets its own R2 bucket, not the shared staging one (blast radius, a staging cleanup or “delete all products” can wipe live covers if shared). Plus the env diff: Stripe live keys, separate DBs/Redis, fresh secrets, CORS/URLs, separate Meilisearch.
  • Merged both PRs.
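
A minimal sketch of the content_hash fix: hash a canonical form of the row with volatile fields left out. The row shape and field names are hypothetical, not the real queue schema. FNV-1a is used only to keep the example dependency-free, and the real scraper (Python) would presumably use a proper digest.

```typescript
// Hypothetical queue row; field names are illustrative only.
type BookRow = {
  isbn: string;
  title: string;
  priceCents: number;
  coverKey: string;
  stock: number;
};

// Fields that change on nearly every scrape and must not affect the hash.
const VOLATILE_FIELDS: ReadonlyArray<keyof BookRow> = ["stock"];

// Stable serialization: sorted keys, volatile fields excluded.
function canonicalize(row: BookRow): string {
  const keys = (Object.keys(row) as Array<keyof BookRow>)
    .filter((key) => !VOLATILE_FIELDS.includes(key))
    .sort();
  return JSON.stringify(keys.map((key) => [key, row[key]]));
}

// 32-bit FNV-1a, a stand-in for a real digest such as SHA-256.
function fnv1a(text: string): string {
  let hash = 0x811c9dc5;
  for (let i = 0; i < text.length; i++) {
    hash ^= text.charCodeAt(i);
    hash = Math.imul(hash, 0x01000193) >>> 0;
  }
  return hash.toString(16).padStart(8, "0");
}

function contentHash(row: BookRow): string {
  return fnv1a(canonicalize(row));
}
```

A stock-only change now leaves the hash untouched, so the row is not re-queued. A price or cover change still alters the canonical form.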

Next

Tomorrow: product pages first, then line up CI as the prod gate.

Product pages need a pass (separate PR):

  • Cover images are badly cropped; the image component crops the tall book covers.
  • Surface the metadata we already import: author, publisher, dimensions, weight, page count, etc.

Build/infra:

  • The monorepo is getting big. Split each service into its own Dockerfile that builds quickly and minimally. The storefront image currently bundles the whole workspace (~550MB); want Next standalone output and per-package scoped installs.

CI + tests (the prod gate):

  • This work added a lot of code with no CI. Without it, build-time errors like the ones that shipped in PR #14 will recur. Plan already drafted in plans/ci-and-tests-plan.md.

Importer update-in-place:

  • Right now it’s create-only (skips existing SKUs). For re-scrapes to actually push changed data (price, cover) into existing products, the importer needs an update path. Today’s content_hash fix just stopped the noise.

Catalog import UI at scale:

  • Fine for 25 titles, but once there are hundreds in the queue it gets unwieldy. Figure out how to handle already-added titles: a toggle to hide them, pagination, maybe purge imported rows from the queue DB. Open question, needs thought.

Scraper scale (my own track):

  • Pagination across the full saved search (~13,619 pages), a configurable --search-id, and concurrent fetch. Design it holistically with a faster parser and rate-limiting.

Same day, later: Docker per-service split + Railway staging cutover (PR #18) + everything that followed

Split the monorepo into per-service container builds and cut Railway staging over from RAILPACK auto-detect to Dockerfile builds. This was the “build/infra” backlog item from the catalog day, picked ahead of product pages. PR #18.

What we built

  • Three multi-stage Dockerfiles (backend/, payload/, storefront/), build context = repo root so the pnpm lockfile and every workspace package.json are reachable for a scoped, frozen install (pnpm install --frozen-lockfile --filter <svc>...).
  • Next standalone output (output: "standalone" + outputFileTracingRoot at the monorepo root) for storefront and payload, so the runtime images carry only the traced/pruned node_modules. Sizes: storefront ~285MB (was ~550MB), payload ~360MB, backend ~653MB.
  • docker-compose.app.yml: builds and runs all three against the existing infra compose for local dev. One command, no rebuild-per-URL, no Windows hosts edit.
  • Per-service railway.toml (config-as-code): DOCKERFILE builder, dockerfilePath into the subdir, watch paths, start/predeploy. Cleaner than json because we can comment build/predeploy/start in one place.
  • Root .dockerignore, pnpm image:* scripts, docs/docker-deploy.md.

What we learned

  • Sitemap was the one build-time backend coupling. app/sitemap.ts was ISR (revalidate=3600), so Next executed it at build and fetched products/categories/pages. A decoupled image/CI build has no backend, so the build died on /sitemap.xml. Fixed with force-dynamic (same class as the catalog day’s DYNAMIC_SERVER_USAGE rabbit hole). Nothing else fetches at build (pages already force-dynamic, no generateStaticParams).
  • The backend image is the biggest because Medusa has no standalone tracing. The other two prune to ~85MB of deps over base; Medusa’s .medusa/server prod install ships the whole tree (655MB node_modules): admin React deps (@medusajs/dashboard + react-aria/react-stately ~95M, even though the admin SPA is already prebuilt to public/admin at 8.8M), both swc platform variants (~60M, half unused on glibc), date-fns 37M, figlet 21M, esbuild 19M. Not bigger than the old Nixpacks build though, multi-stage already drops devDeps and build tools.
  • Docker Desktop networking bit us locally. --network host binds inside the Desktop VM, not WSL2/Windows localhost, so containers “ran up properly” but localhost:8000 gave nothing. Fix: bridge + -p publishing, infra by service name, and the storefront shares the backend’s network namespace (network_mode: service:backend) so its build-baked localhost:9000 works from both the browser and its own SSR.
  • Payload vs Medusa migrations split, because of the pruned image:
    • Payload’s CLI is build-only and absent from the standalone image, so payload migrate can’t run as a predeploy. Solved properly with the postgres adapter’s prodMigrations option: the migrations array (already exported from payload/src/migrations/index.ts) is imported into the config, so it’s bundled into the standalone trace, and Payload applies pending migrations on boot in production. No CLI, no image bloat. Verified the migration name compiles into the server chunks.
    • Medusa’s CLI is a runtime dep inside .medusa/server, so its migrate works as a predeploy (./node_modules/.bin/medusa db:migrate in the toml).

Railway gotchas (the real meat)

  • Metal builder rejects BuildKit cache mounts whose id lacks its cache-key prefix (flag --mount=type=cache ... is missing the cacheKey prefix). Removed the cache mounts; they were a local speedup and Railway caches the install layer anyway.
  • Root Directory must stay at the repo root (Docker context needs the lockfile); dockerfilePath points into the subdir. This is the opposite of Railway’s usual monorepo advice (set root to the subdir).
  • Watch paths: changing service settings (branch, config file) does NOT count as a file change, so the first Dockerfile deploy kept skipping with “No changes to watched files.” Watch paths only gate git-push auto-deploys. A manual/forced deploy (Deployments tab redeploy, Command Palette “deploy latest commit”, or railway up) bypasses them. Codified the watch paths into the tomls so they’re version-controlled, including root package.json (every image copies it).
  • Skipped builds is a footgun for the storefront specifically. It reuses the cached image when source is unchanged and only applies env at runtime. The storefront bakes NEXT_PUBLIC_* at build, so changing one of those vars triggers an auto-redeploy that SKIPS the build, goes green, and silently serves the stale baked value. Keep skipped builds OFF for storefront; safe (and useful) for medusa/payload, whose env is read at runtime.
  • Had to clear the stale RAILPACK Build/Start/Pre-Deploy commands from the dashboard per service (they pointed at the old cd backend/.medusa/server && ... layout and failed), so the Dockerfile CMD + toml are the single source of truth.

Dep pinning + the TanStack scare

  • Copilot flagged the pnpm install --prod in the backend server-deps stage running without a lockfile. Real: the workspace pins @tanstack/react-query to 5.96.1 but the fresh server install pulled 5.100.14. The .medusa/server manifest is generated by medusa build and has no companion lockfile, so a frozen install isn’t directly available. Pinned the three caret-ranged direct deps (react-query, meilisearch, pg) to exact; Medusa’s own packages were already exact. Pinned react-query upward to 5.100.14 (build and runtime now agree).
  • Mid-pin, checked the TanStack npm supply-chain compromise postmortem (attack May 11). We’re unaffected: our TanStack footprint is Query/Table/Virtual (all in the unaffected list), no Router/Start (the compromised repo), and all malicious versions were pulled from npm. The takeaway reinforced the direction: the attack vector was malicious lifecycle scripts on install + floating to a freshly-published version, and our exact pins + frozen lockfile + pnpm onlyBuiltDependencies (install scripts blocked except a 4-package allowlist in the backend image) is exactly that defense.
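
A small check in the spirit of the pinning pass: list every dependency whose range is not an exact version. This is a sketch with a hypothetical function name, not something that exists in the repo, and it only looks at direct deps (transitives still need a lockfile).

```typescript
// Exact semver like 5.100.14, optionally with a prerelease tag (1.2.3-beta.1).
const EXACT_VERSION = /^\d+\.\d+\.\d+(?:-[0-9A-Za-z.-]+)?$/;

// Return the names of dependencies whose range can float (^, ~, *, tags, etc.).
function findFloatingDeps(dependencies: Record<string, string>): string[] {
  return Object.entries(dependencies)
    .filter(([, range]) => !EXACT_VERSION.test(range))
    .map(([name]) => name)
    .sort();
}
```

Run against the dependencies block of the generated server manifest, a non-empty result would mean a fresh install can resolve to something newer than what was built against.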

Shipping it

  • PR #18, ran through Copilot. One substantive comment (the lockfile/determinism one), addressed and replied.
  • Staging cutover, one service at a time: payload green, medusa green (/health, /app 200, predeploy migrate ran). Storefront deployed after forcing past the watch-path skip. As of writing, a medusa redeploy from the pin push is still in flight; merge is gated on it landing green.

Deferred / tech debt

  • CI as the prod gate (still the big one). No CI yet. Pre-existing failures already on main from PR #17 landing without it: a pnpm lint error + warning in payload/src/payload.config.ts (filename shadowing) and pnpm format:check failures in payload.config.ts and the backend/.../catalog-import files. Fold those fixes into the CI PR. CI should also build the Docker images on PRs to catch the build-time errors that shipped in #14. Turborepo is optional polish (speeds CI/local; doesn’t touch Railway’s isolated Docker builds), lower priority than just having CI, since Railway never gates PRs.
  • Backend transitive-dep determinism. Pinned direct deps, but .medusa/server’s fresh prod install still floats transitives (no lockfile). Full fix is a pnpm deploy prod tree (lockfile-pinned) with the built .medusa/server layered on top. Deferred to the CI/reproducibility work where it can be verified. This is what I told Copilot.
  • Backend image slimming (optional, ~653MB). Candidates: drop the unused musl swc variant (33M), try node:24-alpine (-100MB, needs native-module validation), test whether esbuild/swc/dashboard-source deps are actually loaded at runtime (admin is prebuilt) for maybe ~450MB. All need a boot + admin smoke test.
  • Payload migration race. prodMigrations runs on boot; Payload has a migration lock table, but verify before scaling payload past 1 replica. Fine on staging (single replica).
  • Prod cutover. This was staging only. Prod is still on RAILPACK (or not set up). The staging-to-prod runbook from the catalog day was never actually committed; redo it. Prod needs its own R2 bucket (blast radius), live Stripe keys, separate DBs/Redis/Meili, fresh secrets, CORS/URLs.
  • Storefront skipped-builds guardrail. Documented, but it relies on remembering to keep it off / force-rebuild on NEXT_PUBLIC_* change. Worth a more durable guard.
  • pnpm 11 upgrade (deferred). Major bump 10.8.1 → 11. Not urgent (10.8.1 works); a naive corepack use pnpm@11 would break us because we lean on exactly what 11 changes:
    • pnpm settings no longer read from package.json → move overrides (react 19), peerDependencyRules (Medusa), onlyBuiltDependencies into pnpm-workspace.yaml.
    • .npmrc becomes auth/registry only → move public-hoist-pattern[]=@medusajs/* and engine-strict into pnpm-workspace.yaml (or @medusajs hoisting breaks).
    • onlyBuiltDependencies → new allowBuilds map, and strictDepBuilds=true now FAILS (not warns) on unapproved build scripts (sharp/esbuild/unrs-resolver); also patch the backend Dockerfile’s generated-manifest allowlist.
    • minimumReleaseAge=1440 (1 day) default; store v11 (SQLite index, re-fetches).
    • Widen payload’s engines.pnpm range (currently ^9 || ^10) to allow 11; Node 24 is fine (11 needs 22+).
    • Touches every install surface (local, CI, the 3 Docker images, Railway via packageManager). Do as its own PR with full verification + a staging build. Guide: https://pnpm.io/migration ; 11.0 notes: https://pnpm.io/blog/releases/11.0

Update (same session): CI lint+format gate shipped (#19, build jobs intentionally dropped — Railway/staging verifies builds); hot-reload dev compose added (#20). Then discovered the Medusa admin doesn’t work in the dockerized medusa develop (Vite can’t resolve admin source paths in the container; native works), so pnpm dev was repointed to the native loop (#22) — native is the daily driver, docker dev compose is optional. Product-page covers + metadata also shipped (rode into #20).

Inventory model (Carrie + Panat): online = order-on-demand via Ingram. Decision: mirror Ingram’s scraped warehouse stock for availability, capped at 50 (no ghost inventory) — means the import should switch from manage_inventory:false to manage_inventory:true + inventory levels min(stock,50), with a re-scrape sync (deferred, “larger question”; eventually unify Square in-store + Ingram in one place). Storefront side done: purchasability centralized in storefront/src/lib/util/inventory.ts so order-on-demand books aren’t hidden from shop/bestsellers/quick-add. Full model in the project_inventory_model memory.
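
The cap rule as a tiny function, a sketch for when the import switches to manage_inventory: true. The function and constant names are hypothetical. Treating missing or negative stock as zero is an assumption, not something decided above.

```typescript
// Cap from the inventory decision: never mirror more than 50 units.
const MAX_MIRRORED_STOCK = 50;

// Map Ingram's scraped warehouse stock to the level we would store.
function mirroredInventoryLevel(ingramStock: number): number {
  // Assumption: unknown, NaN, or negative stock is treated as none available.
  if (!Number.isFinite(ingramStock) || ingramStock <= 0) {
    return 0;
  }
  return Math.min(Math.floor(ingramStock), MAX_MIRRORED_STOCK);
}
```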

Same day, later: product pages + a Next 16 caching rabbit hole (PR #23)

Landed #18 through #22 green, pointed the three Railway services back to main, then finally took on the product pages, the item the catalog and infra work kept pushing back. Turned into a deep caching investigation. One branch, feat/product-page-polish, PR #23. All storefront, no schema or backend changes, so it deploys as the storefront service alone.

Product polish (the original goal)

  • Author names format “Last, First” (and “;”-separated multiples) into natural “First Last”. Ingram gives them inverted.
  • Description moved above Details, made white/readable, right column widens on large screens.
  • Covers: fixed the mobile cover not displaying (an items-center collapsed the carousel) and the zoom dialog square-cropping (object-cover to object-contain).
  • Variant option selector renders text labels for non-color options (Format was blank swatches; the Solace base only handled color via image/hex).
  • Full book dimensions in Details. We were showing width x spine-depth and dropping the tall measurement. Ingram’s length is the tall one; now W x H x D, width first, unlabeled (5.5 x 8.1 x 0.9). People can read it.
  • Price-less products treated as out of stock everywhere: filtered from listings, “Unavailable” label (no pulsing skeleton, no layout shift), add-to-cart disabled, OutOfStock in JSON-LD.
  • Cached-image skeleton clears on remount, so browser Back to a grid no longer shows a lingering gray pulse (ref callback checks img.complete during commit).
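
Two of the formatting rules above, sketched as pure functions. Names and signatures are hypothetical; the actual storefront helpers may differ. Assumptions: multiple authors are joined with ", ", and names with suffixes ("King, Martin Luther, Jr.") are not handled.

```typescript
// "Last, First; Last, First" -> "First Last, First Last".
function formatAuthors(raw: string): string {
  return raw
    .split(";")
    .map((part) => part.trim())
    .filter((part) => part.length > 0)
    .map((name) => {
      const comma = name.indexOf(",");
      if (comma === -1) return name; // already natural order or a single name
      const last = name.slice(0, comma).trim();
      const first = name.slice(comma + 1).trim();
      return first ? `${first} ${last}` : last;
    })
    .join(", ");
}

// Ingram's "length" is the tall measurement, so it goes in the H slot.
type IngramDimensions = { width: number; length: number; depth: number };

// W x H x D, width first, unlabeled, one decimal place at most.
function formatDimensions(dims: IngramDimensions): string {
  return [dims.width, dims.length, dims.depth]
    .map((value) => String(Math.round(value * 10) / 10))
    .join(" x ");
}
```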

The Next 16 caching rabbit hole (the meat)

Three separate bugs, all invisible in next dev --turbopack because dev skips both type-checking and the data cache. Only a prod next build surfaces them.

  • Cart dropdown and bag badge didn’t update after add/remove. Next 16 split cache invalidation: revalidateTag(tag, "max") only purges, no read-your-own-writes in the same Server Action response. The new updateTag(tag) does it. Switched all cart and customer mutations.
  • Storefront hammered the backend on every render. The big one. The starter passed { next: { tags } } as the second arg to Medusa SDK methods, but that arg is the headers position, so the cache options went out as an HTTP header named next (value [object Object]) and never reached fetch. Tags/revalidate were always a no-op; it only looked cached because Next 14 cached all GETs by default, and Next 15/16 dropped that. On top of that, (main)/layout.tsx had dynamic = "force-dynamic", which forces fetchCache: "force-no-store" and disables the data cache for the whole subtree. Fix: replace force-dynamic with await connection() (dynamic render without poisoning the cache), and wrap catalog reads in unstable_cache (caches the result regardless of how the SDK fetches; products 60s, taxonomy 3600s, tag-invalidatable). Verified by doing a local prod build and counting: warm product-page loads went from 8 backend hits to 1.
  • Guest mode. getCustomer only returned null after a 401 round-trip; short-circuit when there’s no auth cookie. With the catalog cache, a logged-out warm load is essentially just the cart fetch.
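
Why the misplaced argument was silent, shown with a simplified stand-in. This is not the Medusa SDK's code. It only mimics a method whose second positional argument is headers, which is the shape described above. Everything type-checks loosely and nothing throws. The options just leave as a junk header.

```typescript
type QueryParams = Record<string, unknown>;
type HeaderBag = Record<string, unknown>;

// Stand-in for an SDK list method: (query, headers). Hypothetical name.
function listProductsStandIn(query: QueryParams = {}, headers: HeaderBag = {}) {
  const sentHeaders: Record<string, string> = {};
  for (const [key, value] of Object.entries(headers)) {
    // Header values get stringified on the way out.
    sentHeaders[key] = String(value);
  }
  return { query, sentHeaders };
}

// What the starter effectively did: cache options in the headers slot.
const misplacedCall = listProductsStandIn(
  { limit: 12 },
  { next: { tags: ["products"] } },
);
// misplacedCall.sentHeaders is { next: "[object Object]" }: a meaningless
// header, and no cache options ever reach the underlying fetch.
```

Because the request still succeeds, the only visible symptom is the backend hit count, which is how it was eventually found.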

Saved the whole thing to the reference_storefront_caching_next16 memory. Meta-lesson: run next build before merge; dev hid a type error and a silent cache no-op that would both have reached Railway.

Other fixes

  • Quick-add was broken (“Missing required parameters”). The tile reads useParams().countryCode, but this fork dropped the [countryCode] segment and hardcodes “us”, so the param is always undefined. Same orphaned multi-region assumption as the catalog day’s DYNAMIC_SERVER_USAGE thing. Defaulted both add-to-cart paths to “us”. Proper fix is a single country/region default instead of “us” scattered across six files plus two useParams reads; tracked.
  • Tax was inclusive, should be exclusive. A $32 book was charging $30.01 + tax instead of $32 + $2.12; Medusa was backing the tax out of the list price. Store/region admin toggle, not code. Confirmed NJ taxes trade books at the standard 6.625% (only newspapers, periodicals, school textbooks exempt). Fixed on staging. Prod still needs it.
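
The arithmetic behind the two tax modes, as a sketch in integer cents using the $32 list price and the 6.625% NJ rate from the entry above. Function names are illustrative, and rounding to the nearest cent is an assumption about how the totals are computed.

```typescript
const NJ_SALES_TAX_RATE = 0.06625;

type TaxBreakdown = { netCents: number; taxCents: number; totalCents: number };

// Inclusive: the list price already contains tax, so tax is backed out of it.
function taxInclusive(listCents: number, rate: number): TaxBreakdown {
  const netCents = Math.round(listCents / (1 + rate));
  return { netCents, taxCents: listCents - netCents, totalCents: listCents };
}

// Exclusive: tax is added on top of the list price.
function taxExclusive(listCents: number, rate: number): TaxBreakdown {
  const taxCents = Math.round(listCents * rate);
  return { netCents: listCents, taxCents, totalCents: listCents + taxCents };
}
```

For a 3200-cent book, inclusive mode yields a 3001-cent net price inside a 3200 total, while exclusive mode adds 212 cents of tax for a 3412 total.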

Shipping #23

  • Opened via Graphite, ran through Copilot. Two real bugs caught: addToCartCheapestVariant’s reduce threw on a price-less variant (filter to purchasable first); getProductsList computed nextPage from the post-filter page length so it almost always returned null (base on backend count; the /shop pagination uses the MeiliSearch count, a separate path, so it was fine). Two partially-right ones about updateTag tags matching nothing: the tag is decorative since cart/customer reads are intentionally uncached and per-user, but the refresh still works via updateTag’s re-render. Removed the dead tag header from retrieveCart, replied.
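
The two real bugs from that review, reduced to their shape. Types and names here are simplified and hypothetical, not the actual addToCartCheapestVariant / getProductsList code.

```typescript
type Variant = { id: string; priceCents: number | null };
type PricedVariant = Variant & { priceCents: number };

const isPriced = (variant: Variant): variant is PricedVariant =>
  typeof variant.priceCents === "number";

// Filter to purchasable variants first so the reduce never sees a
// price-less variant and never runs on an empty list without a seed.
function cheapestPurchasable(variants: Variant[]): PricedVariant | null {
  const priced = variants.filter(isPriced);
  if (priced.length === 0) return null;
  return priced.reduce((min, v) => (v.priceCents < min.priceCents ? v : min));
}

// Next page comes from the backend's total count, not from how many
// items survived client-side filtering on the current page.
function nextPageOffset(totalCount: number, offset: number, limit: number): number | null {
  const next = offset + limit;
  return next < totalCount ? next : null;
}
```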

Deferred / tech debt (rolled up across the day)

  • Prod tax toggle. Staging done, prod outstanding. Money bug if missed.
  • Backend to storefront revalidate webhook. A Medusa subscriber that POSTs a storefront /revalidate route on product change, for instant catalog updates and to raise the 60s TTL.
  • Centralize the country/region default. One constant or env instead of “us” scattered + useParams reads. Deletes the quick-add bug class.
  • Inventory import switch (still): manage_inventory: true + level min(Ingram, 50) + stock location + re-scrape sync. The larger inventory question in project_inventory_model.
  • CI build gate, reinforced. Today is direct evidence: dev hid a type error and the cache no-op. A next build / tsc step on PRs catches that class before staging. Still no CI; the lint/format gate (#19) shipped but not build verification.
  • Smaller: weight row in Details (import stores grams, just convert), markdown in descriptions, import-side draft+warn for price-less books, backend image slimming, payload migration race, pnpm 11 (full notes above).

Next

Done: #23 merged to main; prod tax-inclusive toggle flipped off, so staging and prod both compute tax exclusive now. Suggested order from here:

  1. CI build gate (do this first). Dropped (next session, 2026-05-25). Reconsidered and killed it. The reasoning above was reactive and half-wrong: the #23 type error is caught by any next build, including Railway’s — so it was never reaching prod, only a question of timing. The #23 cache no-op (the bug that actually mattered) compiles and builds clean; no build step, CI or Railway, catches it — it was found by manually counting backend hits. So “build verification catches that whole class” is false for the bug that bit us. Decisive point: the real workflow is that I switch the Railway staging services onto the feature branch and build it against real infra before merge. So staging already is the pre-merge build check — a CI build job is fully redundant, not just marginal. Lint/format gate (#19) stays; no next build/tsc job added. (The behavioral class — silent cache no-ops, wrong-arg SDK calls — wants a smoke/integration test or a next build-plus-hit-count check, not a plain build gate. Left for if it recurs.)
  2. Backend to storefront revalidate webhook. A Medusa subscriber that POSTs a storefront /revalidate route (revalidateTag) on product change. Instant catalog updates, and lets the 60s product TTL go up. The clean close to the caching work.
  3. Centralize the country/region default. One constant or env instead of “us” scattered across ~6 files plus the useParams().countryCode reads in client components. Mechanical, low-risk, deletes the quick-add bug class.
  4. Inventory import switch (the strategic chunk, wants a dedicated session). manage_inventory: true + inventory level min(Ingram stock, 50) + stock location + a re-scrape sync. Pairs with importer update-in-place (currently create-only, so re-scrapes don’t push changed price/cover into existing products). The core order-on-demand model, project_inventory_model; eventually unify Square in-store + Ingram in one place.

Smaller backlog: weight row in Details (import stores grams, just convert), markdown in descriptions, import-side draft+warn for price-less books so Carrie sees them, catalog import UI at scale (hundreds of queued titles), scraper pagination across the full saved search.

Same day, much later: the PR cascade, Dashlane rabbit hole, Tailwind 4, and a codemod gotcha

Picked up from “Next” item #3 (country/region default) and basically swept the rest of the small/medium backlog plus the strategic moves the catalog and infra days kept punting on. Twelve PRs in one session, all stacked off main and merged one at a time. Order ended up reflecting risk: smallest correctness wins first, then the strategic ones, with Tailwind 4 last because its rebase blast radius was largest.

What landed (PRs in merge order)

  • #24 country/region default. Centralized the scattered "us" literals and the always-undefined useParams().countryCode reads into DEFAULT_COUNTRY_CODE in lib/constants. Dropped the dead countryCode arg from signout(). Killed the whole “Missing required parameters” quick-add bug class.
  • #25 tsconfig modernize. target: es5 → es2022, removed baseUrl (replaced with a catch-all "*": ["./src/*"] paths entry since ~79 bare types/* imports + a CSS side-effect import + an app/* import rely on it), fixed two Playwright POM subclasses that redeclared an inherited Locator field (TS2612 from modern class-field semantics). Cleared the TS 6 hard deprecation errors.
  • #26 deps currency. @medusajs 2.15.2 → 2.15.3 on backend; aligned storefront js-sdk/types/icons/ui-preset 2.14.1 → 2.15.3 and @medusajs/ui 4.1.8 → 4.1.13; Next 16.2.4 → 16.2.6 across storefront + payload (payload was on 16.2.1 — found in the dev log mid-investigation). Bumping the @medusajs React components leaked @types/react@18 into the React-19 storefront’s type resolution, so the fix is a workspace pnpm override forcing @types/react/@types/react-dom to 19 — the runtime is already on React 19 globally via the existing react override, so this just makes the types match what’s actually installed. Verified medusa build (server + React admin) clean under that.
  • #27 CI actions bump. actions/checkout + actions/setup-node + pnpm/action-setup v4 → v6 (Node 24 runtime; v4 was on the deprecated Node 20).
  • #28 revalidate webhook. Closed the #23 caching arc. Medusa subscriber on product.* / product-variant.* / product-category.* / product-collection.* events POSTs STOREFRONT_URL/api/revalidate with an x-revalidate-secret header; the storefront route validates and revalidateTag(tag, "max"). Best-effort, 3s AbortSignal.timeout, no-ops if env unset. Verified end-to-end on staging: edited a product, change appeared in <30s (inside the 60s TTL → confirmed the webhook purged, not the timer). Real value isn’t the 60s product window — it’s the 1-hour taxonomy window collapsing to instant; category renames stop being invisible for up to an hour. (Honest self-correction: I initially oversold the necessity. Carrie’s question of “is this even needed” pushed the right answer — at this scale, taxonomy is the only genuine win; products were always polish.)
  • #29 tsconfig catch-all removal. Followed up #25 by replacing the catch-all "*" with explicit types/* / styles/* / app/* mappings. Also dropped the dead @pages/* alias. Verified the build still resolves the bare CSS import (which tsc can’t see and was the kind of thing #25’s catch-all silently let through).
  • #31 blog filter navigation loop. Pre-existing bug surfaced by #30 visual QA: a useEffect in blog/components/filters/index.tsx called router.push(?…) on every searchParams change, which produced new searchParams and re-fired the effect — /blog GET storm in dev. Guarded to only push when stripping the redundant all-posts category; later (per Copilot) switched to router.replace(createUrl(pathname, params)) so back doesn’t land on the canonical form and re-trigger.
  • #32 order confirmation address — US format. The order-confirmation page rendered EU order ({postal_code} {city}, {COUNTRY} with province appended). Rewrote order/components/shipping-details to a multi-line US format via a small AddressBlock (name / [company] / street / [address_2] / City, Province ZIP / country / email, phone), fixed the literal undefined that showed when address_2 was empty, and DRY’d the shipping+billing duplication.
  • #33 nav drop “Shop all”. The Shop and Collections hover dropdowns rendered a “Shop all” button that linked back to /shop — same destination as the parent nav item. Guarded the button to render only for actual category dropdowns (which gives a useful “Shop all {Category}” destination); skip for the two top-level cases. Copilot caught a related bug while I was there: my condition was vacuously true when activeItem was null (which the caller passes), so guarded that too.
  • #34 checkout + account form polish (the big one — became three concerns in one PR).
    • Field reorder to US convention across all five address forms (checkout shipping/billing, checkout new/edit address, account modal): swapped postal_code → city → country → state to city → state → postal_code → country.
    • Added the Apt, suite, etc. (optional) (address_2) line between Address and City in every form. Wired through Yup schemas (3), Formik initialValues (checkout + account default + use-checkout-forms’s internal FormikErrorsType), and setAddresses in cart.ts (which was hardcoding address_2: "").
    • Address-summary multi-line US format on the checkout review (addresses/index.tsx) — same pattern as #32, plus fixed a copy-paste bug where the billing summary referenced shipping_address.province.
    • Robustness pass after Copilot review (10 comments): added the missing address_2 to checkoutFormValidationSchema.shipping_address; normalized initialValues to use explicit per-field cart?.X?.field ?? "" instead of cart?.X || { defaults } (so omitted optional keys can’t make inputs uncontrolled); added ?? "" coercion to every value= for address_2; added address_2 to setFormAddress; coerced formData.get(...)?.toString() ?? "" in setAddresses.
  • #35 strip explanatory comments across recent PRs. Carrie’s call: too many of my one-to-three-line “why” comments in the recent PRs. Eight-file sweep that removed the comment additions from 33. Logic untouched, 17 deletions, 0 insertions of consequence.
  • #30 Tailwind 4. Held last because its rebase blast radius was largest (touches ~88 component files for class renames). Full official @tailwindcss/upgrade codemod, manual peerDependencyRules override (@medusajs/ui-preset>tailwindcss: "4" + same for @tailwindcss/forms) since Medusa’s preset still peers ^3.4.3. Verified storefront next build clean; built CSS (118 KB) includes utilities, the Medusa preset theme tokens (--bg-base, ui-fg-base), our custom colors (bg-primary, hover:bg-hover), and every used keyframe — confirming the preset and plugins load through @config. Medusa flags v4 unofficial; visual QA on staging is the real gate, no regressions caught.
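
The sending half of the #28 webhook, sketched without the Medusa subscriber wiring. STOREFRONT_URL, the /api/revalidate path, the x-revalidate-secret header, and the 3s timeout are from the notes above. The REVALIDATE_SECRET variable name, the JSON body shape, and the function names are assumptions.

```typescript
type RevalidateEnv = { STOREFRONT_URL?: string; REVALIDATE_SECRET?: string };

type RevalidateRequest = {
  url: string;
  init: { method: "POST"; headers: Record<string, string>; body: string };
};

// Build the request, or return null when env is unset (the no-op case).
function buildRevalidateRequest(env: RevalidateEnv, tag: string): RevalidateRequest | null {
  if (!env.STOREFRONT_URL || !env.REVALIDATE_SECRET) return null;
  const base = env.STOREFRONT_URL.replace(/\/+$/, "");
  return {
    url: `${base}/api/revalidate`,
    init: {
      method: "POST",
      headers: {
        "content-type": "application/json",
        "x-revalidate-secret": env.REVALIDATE_SECRET,
      },
      body: JSON.stringify({ tag }), // body shape is an assumption
    },
  };
}

// Best-effort send: never throws, gives up after 3 seconds.
async function notifyStorefront(env: RevalidateEnv, tag: string): Promise<boolean> {
  const request = buildRevalidateRequest(env, tag);
  if (!request) return false;
  try {
    const response = await fetch(request.url, {
      ...request.init,
      signal: AbortSignal.timeout(3000),
    });
    return response.ok;
  } catch {
    return false; // a storefront outage must not break the product update
  }
}
```

Splitting the builder from the sender keeps the env and no-op logic checkable without a network call.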

The Dashlane autofill rabbit hole (the meaty learning)

Background: Dashlane was filling “New Jersey” into the Company field on checkout, and the icon wasn’t appearing on address inputs. The user (correctly) pushed back twice when I tried to blame their saved Dashlane data: Dashlane works on every other site, so the problem is our form.

Diagnostic: Dashlane exposes its field classification in a data-dashlane-classification attribute that it injects post-hydration. Comparing ours vs Shopify on the same field:

  • Shopify company: data-dashlane-classification="company,company_name" — clean single class.
  • Ours company: data-dashlane-classification="address,region,extra,company,company_name" — multi-classified including address and region.

That’s why state lands in company: Dashlane sees the field as both company AND region, picks region, and dumps NJ there.
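
For comparing classifications like the two above, a small sketch that diffs our attribute value against a known-good reference. The helper names are made up for illustration; only the attribute name and the two sample values come from the session.

```typescript
// Split a data-dashlane-classification value into its tokens.
function parseClassification(attr: string | null | undefined): string[] {
  return (attr ?? "")
    .split(",")
    .map((token) => token.trim())
    .filter((token) => token.length > 0)
}

// Tokens present on our field but absent from the reference field.
function extraTokens(ours: string | null, reference: string | null): string[] {
  const ref = new Set(parseClassification(reference))
  return parseClassification(ours).filter((token) => !ref.has(token))
}

const shopify = "company,company_name"
const ours = "address,region,extra,company,company_name"
const extras = extraTokens(ours, shopify)
```

In the browser console the input value would come from something like document.querySelector('[name="shipping_address.company"]')?.getAttribute("data-dashlane-classification"). Any non-empty result is a field Dashlane might fill with the wrong data.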

Two attempts that didn’t move the classifier:

  1. WHATWG shipping / billing section prefix on autocomplete (autocomplete="shipping organization", etc.). Spec-correct, what Shopify uses, but didn’t change Dashlane’s classification at all — re-checked the live element, same multi-classification.
  2. section-customer grouping prefix on top of that. Tried specifically to see if it would give one-click multi-field fill. No change in classification, no change in click count. Reverted.

The real cause: Dashlane heuristically tokenizes the name attribute and weighs it heavily, ignoring autocomplete for classification. Our name="shipping_address.company" contains the substring "address", which tags the field with address-y classifications. The dotted form is a Formik nested-path convention, baked into how the parent component reads/writes values.shipping_address.X.

The fix: decouple the DOM name from the Formik path.

  • DOM: name="ship-organization", name="ship-address-line1", etc. — canonical autocomplete tokens with a ship- / bill- section prefix that doesn’t contain "address".
  • Formik: untouched. State stays as shipping_address.X / billing_address.X.
  • Bridge: replace onChange={handleChange} (which routes by e.target.name) with explicit onChange={(e) => formik.setFieldValue("shipping_address.first_name", e.target.value)} per field.
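
A rough sketch of the decoupling, kept framework-free so it runs on its own. Only ship-organization and ship-address-line1 are names from the session; the rest of the mapping follows the WHATWG autocomplete tokens and is an assumption, as are the helper names domName and bridgeChange.

```typescript
type Section = "ship" | "bill"

// Formik field -> autocomplete-style token (assumed mapping beyond the two
// names recorded in the notes).
const FIELD_TO_TOKEN: Record<string, string> = {
  first_name: "given-name",
  last_name: "family-name",
  company: "organization",
  address_1: "address-line1",
  address_2: "address-line2",
  city: "address-level2",
  province: "address-level1",
  postal_code: "postal-code",
}

// DOM name: section prefix plus token, with no dotted Formik path.
function domName(section: Section, field: string): string {
  const token = FIELD_TO_TOKEN[field]
  if (!token) throw new Error(`No DOM name mapped for field "${field}"`)
  return `${section}-${token}`
}

// Compatible with Formik's setFieldValue(field, value).
type SetFieldValue = (path: string, value: string) => unknown

// Routes a change event to an explicit Formik path instead of e.target.name.
function bridgeChange(setFieldValue: SetFieldValue, formikPath: string) {
  return (e: { target: { value: string } }): void => {
    setFieldValue(formikPath, e.target.value)
  }
}
```

On an input this would read name={domName("ship", "company")} with onChange={bridgeChange(formik.setFieldValue, "shipping_address.company")}, so Dashlane sees ship-organization while Formik state keeps its nested shape.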

addresses/index.tsx builds FormData programmatically from Object.entries(formik.values.shipping_address) on submit, so cart.ts setAddresses keeps reading shipping_address.X keys — no backend wiring touched. Plumbed setFieldValue through to BillingAddress (it was only receiving handleChange).
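
The submit side can be sketched as below. This is not the actual addresses/index.tsx code; addressEntries is a hypothetical helper that produces the key/value pairs one would append to FormData, keyed the way setAddresses already expects.

```typescript
type AddressValues = Record<string, string | null | undefined>

// Turn Formik's nested address values into "prefix.field" pairs.
// Missing values become "" so every key is always present.
function addressEntries(
  prefix: "shipping_address" | "billing_address",
  address: AddressValues
): Array<[string, string]> {
  return Object.entries(address).map(
    ([field, value]): [string, string] => [`${prefix}.${field}`, value ?? ""]
  )
}

const pairs = addressEntries("shipping_address", {
  first_name: "Ada",
  address_2: undefined,
  province: "NJ",
})
```

Each pair then goes through formData.append(key, value). Because the keys are derived from Formik state rather than from the DOM name attributes, renaming the inputs does not change what the server action receives.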

After the swap: classification dropped to clean company,company_name, Dashlane icon showed up on the address fields, state landed in State, NJ stopped going to Company. The fix is for the form structure, but it lives at the DOM-name layer — orthogonal to Formik nesting, orthogonal to Medusa wiring.

Multi-click vs single-click remained a Dashlane vault-organization thing (Identity vs Address are separate record types in its vault). Not pursued — the real bug was correctness, not click count.

Tailwind 4 codemod gotcha (#30 rebase)

After the rest of the PRs merged, #30 was 1 ahead / 6 behind main with conflicts in every component file the polish PRs had touched. Manually merging 90+ class-rename conflicts against the polish edits is a bad path; the right move was to reset the branch to current main and re-run the codemod fresh. The codemod is deterministic, so the output ends up equivalent to the original migration.

First attempt produced only 5 file changes (config + postcss + globals.css + lockfile + peer override); the per-file class renames just didn’t happen. Build worked (Tailwind 4 understands v3 syntax via back-compat), CSS was correct, but the original #30’s 90+ file rename diff was gone. Pushed it; the user (correctly) flagged “what happened to all the files we changed?”

Cause: @tailwindcss/upgrade gates its per-file rename pass on the installed tailwindcss version in node_modules. With v4 already installed there from the previous codemod run, the tool sees “v4 is here, migration already happened” and skips the source pass. git reset --hard doesn’t touch node_modules, so the stale v4 install poisoned the fresh attempt.

Fix:

  1. git reset --hard origin/main (restore source).
  2. pnpm install (this downgrades node_modules/tailwindcss to whatever main’s lockfile pins — 3.4.x).
  3. Now run npx --yes @tailwindcss/upgrade --force — full migration, 90+ file diff, including adding @config '../../tailwind.config.js' to globals.css (the partial run skipped this too, which is why the partial result wouldn’t have worked without the @config I had to add manually).
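
A possible preflight guard for the trap above, assuming the diagnosis holds (the codemod skips the source pass when v4 is already installed). The function names are hypothetical; the check itself is only a semver major comparison.

```typescript
// Extract the major version from a string like "3.4.17" or "4.1.0".
function installedMajor(version: string): number {
  const match = /^(\d+)\./.exec(version.trim())
  if (!match) throw new Error(`Unrecognized version string: "${version}"`)
  return Number(match[1])
}

// True when node_modules already holds Tailwind 4 or later, i.e. the state
// in which the fresh codemod run only touched config files.
function staleInstallDetected(installedVersion: string): boolean {
  return installedMajor(installedVersion) >= 4
}

const beforeReinstall = staleInstallDetected("4.1.0")
const afterReinstall = staleInstallDetected("3.4.17")
```

The input would be the version field from node_modules/tailwindcss/package.json. If the guard returns true, run pnpm install against main's lockfile before invoking the codemod.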

Captured the gotcha in a memory (reference_tailwind_upgrade_codemod_gotcha) so future sessions don’t repeat it.

Strategic decisions captured

  • Solace is unmaintained, no upstream. No solace remote, no maintainer, no future updates from them. The storefront is permanently owned code. Practical consequence: keeping structural closeness to the starter has zero value, so modernizing freely is pure upside (the [countryCode] cruft, the SDK cache bug, the form name flatten — all justified). The only meaningful “upstream” to track is @medusajs/* npm packages. Captured in project_platform_decision.
  • Tailwind 4 is feasible, not blocked. I initially called it blocked based on @medusajs/ui-preset’s ^3.4.3 peer dep + Medusa’s React 18 era. The user pointed me at medusa#11040 which showed the defaultTheme break is already fixed in ui-preset@2.15.3, and the Medusa UI standalone docs document a v4 path (marked unofficial but supported). Migration works through @config + peer override.
  • Inventory deferred OFF the prod-readiness path. Spent a chunk thinking through the order-on-demand model (online = order-on-demand via Ingram, Carrie fulfills per order; mirror Ingram stock capped at 50). The cap is arbitrary; the real call is what counts as “available” — on-hand only, or include on-order (incoming)? On-hand-only conflicts with the goal of pre-ordering popular forthcoming books (those have onHand=0). The 3-way status (in-stock / pre-orderable / genuinely-unavailable) is the right model, but the only “shouldn’t sell” case (out of print) is the least-detectable in Ingram’s data. Net: at launch scale, occasional unfulfillable orders are manageable; getting it wrong blocks real sales (including pre-orders). Ship prod always-purchasable; revisit with real order data as a Carrie merchandising decision. Decision doc: notes/plans/order-on-demand-inventory.md. Memory: project_inventory_model updated to reflect “off prod path, not just backlogged”.
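
For when the inventory question comes back, the 3-way status could be sketched like this. It is not what ships at launch (prod stays always-purchasable). The outOfPrint flag is an assumption and, per the note above, the hardest signal to get from Ingram's data. MIRROR_CAP reflects the 50-unit cap that was discussed.

```typescript
type Availability = "in-stock" | "pre-orderable" | "unavailable"

interface IngramStock {
  onHand: number
  // Assumed flag; Ingram's data may not expose this reliably.
  outOfPrint: boolean
}

const MIRROR_CAP = 50

// Out of print is the only "shouldn't sell" case; zero on-hand stock alone
// still counts as orderable so forthcoming titles can be pre-ordered.
function classifyAvailability(stock: IngramStock): Availability {
  if (stock.outOfPrint) return "unavailable"
  if (stock.onHand > 0) return "in-stock"
  return "pre-orderable"
}

// Quantity to mirror into the store, capped and never negative.
function mirroredQuantity(stock: IngramStock): number {
  return Math.max(0, Math.min(stock.onHand, MIRROR_CAP))
}
```

Treating onHand = 0 as pre-orderable rather than unavailable is the design choice that keeps popular forthcoming books sellable.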

Memory updates (so future sessions don’t redo this thinking)

  • feedback_minimal_comments escalated significantly. Previous version said “very few, very short comments: only the non-obvious why, in one line.” The user told me to stop adding comments 5 times this session, all within a single PR. Updated to: default is zero comments. Code + commit message + PR description is the whole story. Even one-line comments get cut. The only acceptable comments are for genuinely gnarly code, and even then ask first.
  • reference_tailwind_upgrade_codemod_gotcha new. The node_modules-state trap above. Indexed in MEMORY.md.
  • project_platform_decision extended with the Solace-unmaintained finding and its consequence.
  • reference_workflow extended with the staging-is-the-pre-merge-build-check fact (Panat switches Railway services onto the feature branch before merging, so a CI build job is redundant; this corrected my original assumption that Railway only builds main).
  • reference_docs fixed — the vault moved to notes/ from docs/ (docs/ is now for policies only).

What’s left

  • Prod cutover (still). Tax toggle on prod, prod CORS lockdown (staging is *, fine for staging, not for prod), separate R2 bucket, live Stripe, separate DBs/Redis/Meili, fresh secrets. The runbook in notes/plans/staging-to-prod-cutover.md should be reviewed against where staging actually landed today.
  • Inventory switch — parked per the decision doc. Revisit with real order data and Carrie.
  • E2e coverage — accumulated gaps tracked at notes/plans/e2e-test-coverage.md (Copilot flagged the address-block assertion on /order/confirmed during #32 review; data-testid="shipping-address-summary" and "billing-address-summary" already added). Decision pending on whether to add e2e to CI.
  • Optional canonical Tailwind cleanups. The codemod normalized !px-14 → px-14! and border-[1px] → border across the codebase, but anything similar in newer code would re-introduce v3 syntax. Not urgent (v4 back-compat covers it).
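
If the cleanup ever gets done, a checker along these lines could flag v3-style important modifiers (leading ! on the utility) and suggest the v4 form (trailing !). It is a sketch with a known limitation: it splits on the last colon, so arbitrary values that contain a colon are not handled. Both function names are made up.

```typescript
// Move a v3 leading "!" (after any variant prefixes) to the v4 trailing
// position. Classes without a leading "!" are returned unchanged.
function toV4Important(cls: string): string {
  const idx = cls.lastIndexOf(":")
  const variants = idx === -1 ? "" : cls.slice(0, idx + 1)
  const utility = idx === -1 ? cls : cls.slice(idx + 1)
  if (!utility.startsWith("!")) return cls
  return `${variants}${utility.slice(1)}!`
}

// List the classes in a className string that still use the v3 form.
function findV3Important(className: string): string[] {
  return className
    .split(/\s+/)
    .filter((cls) => cls.length > 0 && toV4Important(cls) !== cls)
}

const flagged = findV3Important("flex !px-14 md:!mt-2 border px-2!")
```

Running this over className strings in new components would catch regressions without relying on the full codemod.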

Next session (2026-05-26)

Order by what unblocks launch:

  1. Merge #30 (Tailwind 4) after staging visual QA. Point storefront staging at the branch, click through nav, product, cart, checkout, account, dark-mode toggle, Medusa UI components (buttons/drawers/selects/tables). Medusa flags v4 unofficial, so eyes are the gate. Merge once satisfied — last open PR; unblocks downstream.
  2. Prod cutover — the actual launch blocker. notes/plans/staging-to-prod-cutover.md predates today’s churn and needs a pass (deps current, tax toggle, revalidate env, CORS, Tailwind 4). Then execute:
    • Tax toggle: flip on prod (mirror staging).
    • CORS lockdown: staging is *, prod needs explicit STORE_CORS / AUTH_CORS origins.
    • Fresh secrets: regen REVALIDATE_SECRET, JWT_SECRET, COOKIE_SECRET for prod.
    • Separate R2 bucket (blast radius), separate DBs/Redis/Meili, live Stripe keys + webhook endpoint.
    • Point Railway prod services at main.
    • Smoke: complete order, admin login, search, 3DS checkout, revalidate webhook fires.
  3. Shopify-style form layout refinement (the screenshot Carrie shared). Country at top of shipping, First/Last on one row, City/State/ZIP on one row, Email into a separate “Contact” section. Pure layout — small PR. State-as-dropdown is the bigger follow-up: needs a US states list + Select + country-conditional behavior; do as its own PR.
  4. E2e coverage decision (notes/plans/e2e-test-coverage.md). Decide Playwright-in-CI vs local-only. Then sweep the known gap (address-block assertion on /order/confirmed — testids already in place from #32).
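
For the CORS lockdown in step 2, a small pre-deploy check could look like this. It assumes STORE_CORS and AUTH_CORS are comma-separated lists of plain origins (no regex patterns) and that prod origins should be https. The helper names are hypothetical.

```typescript
// Split a comma-separated CORS value into trimmed, non-empty origins.
function parseCorsOrigins(value: string | undefined): string[] {
  return (value ?? "")
    .split(",")
    .map((origin) => origin.trim())
    .filter((origin) => origin.length > 0)
}

// Return human-readable problems; an empty list means the value looks
// acceptable for prod under the assumptions above.
function corsProblems(name: string, value: string | undefined): string[] {
  const origins = parseCorsOrigins(value)
  const problems: string[] = []
  if (origins.length === 0) problems.push(`${name} is empty`)
  for (const origin of origins) {
    if (origin.includes("*")) {
      problems.push(`${name} contains wildcard origin "${origin}"`)
    } else if (!origin.startsWith("https://")) {
      problems.push(`${name} origin "${origin}" is not https`)
    }
  }
  return problems
}

const stagingStyle = corsProblems("STORE_CORS", "*")
const prodStyle = corsProblems("STORE_CORS", "https://shop.example.com, https://www.example.com")
```

The example.com origins are placeholders. In practice the inputs would be process.env.STORE_CORS and process.env.AUTH_CORS, checked as part of the smoke pass before pointing prod at main.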

Parked, not tomorrow: inventory switch (order-on-demand-inventory.md — off prod path), pnpm 11 (its own session, breaking), smaller backlog (weight row, markdown in descriptions, importer update-in-place, backend image slim, etc. — after launch).

Suggested cadence: #30 merge in the first hour (fast); bulk of the day on prod cutover (real unblock); layout polish if there’s time.