CI + tests setup plan

Why now

This is the gate before prod. PR #14 (Railway staging deploy) shipped a handful of issues that pnpm --filter <pkg> build would have caught at PR time if CI had been running:

Custom Meilisearch module’s ESM import that only medusa develop (swc/ts-node) tolerated
Two strict-null-check errors in workflow steps that medusa build (tsc) rejected
Payload’s missing migration generation (push: true is dev-only)
Storefront’s Solace-template next.config.js referencing undefined env vars, which Next 16 rejects
useSearchParams() in /reset-password without Suspense (Next 16 build error)

PR #15 (oxlint/oxfmt swap) caught a production bug that ESLint had been silent on: is_default_shipping: formData.get("is_default_shipping") === "on" || "true" ? true : false always evaluated to true. The expression parses as (formData.get("is_default_shipping") === "on" || "true") ? true : false, so whenever the comparison is false the OR falls through to the non-empty string "true", which is truthy. This was live in main: addresses were saved as default shipping regardless of the checkbox. CI running oxlint on PRs would have caught it when it landed.
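The precedence problem is easy to reproduce in isolation. The sketch below is illustrative only: the helper names are hypothetical, the field value is passed in directly rather than read from a FormData, and the fixed variant assumes the intent was to accept either "on" or "true".

```typescript
// What formData.get("is_default_shipping") yields for a checkbox field:
// "on" when the box is ticked, null when it is not.
type FieldValue = string | null;

// Buggy form, with the implicit grouping spelled out.
// `v === "on" || "true" ? true : false` parses as
// `((v === "on") || "true") ? true : false`.
function isDefaultShippingBuggy(value: FieldValue): boolean {
  const orResult = value === "on" || "true"; // true, or the non-empty string "true"
  return orResult ? true : false; // both outcomes are truthy
}

// Assumed intent: compare the value against each accepted string.
function isDefaultShippingFixed(value: FieldValue): boolean {
  return value === "on" || value === "true";
}

console.log(isDefaultShippingBuggy(null)); // true (the bug)
console.log(isDefaultShippingFixed(null)); // false
```

A unit test over both inputs (ticked and unticked) would have failed on the buggy form, which is the kind of check the lint rule made unnecessary here.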

Past the build/lint layer, we have zero automated verification of feature behavior. The 15 seeded products, the search index, the checkout flow, the admin auth — all manually clicked. We’ll keep regressing things on every refactor until there are guardrails.

What exists today

Package	Tooling	State
backend	jest configured (`test:unit`, `test:integration:http`, `test:integration:modules`)	Zero tests written
payload	playwright (`test:e2e`), vitest (`test:int`)	One stub test asserting the Payload blank-template homepage copy (“Welcome to your new project”)
storefront	playwright in deps, `e2e/` directory	All Solace template stubs — files containing only commented-out axios code or `export {}`

So the test infrastructure exists, but meaningful test coverage is zero.

What to actually test

Triaged by ROI for our app surface, not by general coverage targets:

Backend (jest integration tests)

syncProductsWorkflow: given products in DB with various statuses, indexes published ones to Meilisearch with the expected document shape (id, handle, title, subtitle, description, thumbnail, categories[{id,name,handle,parent_category_id}], tags, variants[{id,sku,title}], metadata). Verifies the field-mapping refinements in PR #14 don’t regress.
deleteProductsFromMeilisearchStep: given a list of product IDs, removes them from the index. Also: undo behavior re-indexes from the snapshot it took.
Meilisearch module updateSettings: idempotent against both a fresh index (the task enqueued by createIndex must finish before the settings sync runs) and a populated one. This is the race-condition fix from Copilot review round 3; without a test it will regress.
Stripe webhook handler: requests with a valid signature are accepted, requests with an invalid signature get a 400. Don’t test the actual payment flow (that’s Stripe’s job); test our verification.
Seed script idempotency: re-running seed.ts against an already-seeded DB doesn’t duplicate products. The script claims to be idempotent; verify it.
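For the syncProductsWorkflow test, the document-shape assertion could be factored into a small helper so a failure names the exact field that drifted. This is a sketch under assumptions: the key lists are copied from the shape listed above, while the helper names and the sample values are hypothetical and not taken from the codebase.

```typescript
// Key lists taken from the document shape described in the plan.
const EXPECTED_KEYS = [
  "id", "handle", "title", "subtitle", "description", "thumbnail",
  "categories", "tags", "variants", "metadata",
] as const;
const CATEGORY_KEYS = ["id", "name", "handle", "parent_category_id"] as const;
const VARIANT_KEYS = ["id", "sku", "title"] as const;

// Reports keys that are missing from, or unexpected in, a single object.
function keyProblems(label: string, value: unknown, expected: readonly string[]): string[] {
  if (typeof value !== "object" || value === null || Array.isArray(value)) {
    return [`${label}: not an object`];
  }
  const actual = Object.keys(value);
  const missing = expected.filter((k) => !actual.includes(k)).map((k) => `${label}: missing ${k}`);
  const extra = actual.filter((k) => !expected.includes(k)).map((k) => `${label}: unexpected ${k}`);
  return [...missing, ...extra];
}

// Hypothetical helper: collects every shape problem in one indexed document.
function documentShapeProblems(doc: unknown): string[] {
  const problems = keyProblems("document", doc, EXPECTED_KEYS);
  if (typeof doc !== "object" || doc === null) return problems;
  const record = doc as Record<string, unknown>;
  const nested: Array<[string, readonly string[]]> = [
    ["categories", CATEGORY_KEYS],
    ["variants", VARIANT_KEYS],
  ];
  for (const [field, keys] of nested) {
    const items = record[field];
    if (!Array.isArray(items)) {
      if (field in record) problems.push(`${field}: not an array`);
      continue;
    }
    items.forEach((item, i) => {
      problems.push(...keyProblems(`${field}[${i}]`, item, keys));
    });
  }
  return problems;
}

// Placeholder document; every value here is made up for illustration.
const sampleDocument = {
  id: "prod_1", handle: "sample-book", title: "Sample Book", subtitle: null,
  description: "Placeholder", thumbnail: null,
  categories: [{ id: "pcat_1", name: "Books", handle: "books", parent_category_id: null }],
  tags: ["fiction"],
  variants: [{ id: "variant_1", sku: "SKU-1", title: "Paperback" }],
  metadata: {},
};

console.log(documentShapeProblems(sampleDocument)); // []
```

In the jest test, the document fetched back from the Meilisearch index would take the place of sampleDocument, and the assertion would be that the returned list is empty.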

Payload (vitest integration + playwright smoke)

Migration smoke: spin up empty postgres, run payload migrate, assert all expected tables exist. Catches if someone forgets to regenerate migrations after adding a collection.
First-user creation through admin: 200 on POST to /admin/api/users with valid creds, 4xx without. Playwright optional here — the API check is enough.
Globals read via REST API: /api/globals/footer returns the expected shape. The storefront pulls from this in the live footer (PR #13).
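The assertion half of the migration smoke test reduces to a set difference between the tables we expect and the tables postgres reports. A minimal sketch, assuming the rows come from a query against information_schema.tables; the table names in EXPECTED_TABLES are placeholders and should be replaced with whatever the actual Payload config produces.

```typescript
// Row shape returned by:
//   SELECT table_name FROM information_schema.tables WHERE table_schema = 'public'
interface TableRow {
  table_name: string;
}

// Placeholder names (assumptions); mirror the real collections and globals here.
const EXPECTED_TABLES: readonly string[] = ["users", "media", "footer", "payload_migrations"];

// Returns the expected tables that are absent from the query result.
function missingTables(expected: readonly string[], rows: readonly TableRow[]): string[] {
  const present = new Set(rows.map((row) => row.table_name));
  return expected.filter((name) => !present.has(name));
}

// Example: a database where the migration for one table was never generated.
const rows: TableRow[] = [
  { table_name: "users" },
  { table_name: "media" },
  { table_name: "payload_migrations" },
];
console.log(missingTables(EXPECTED_TABLES, rows)); // ["footer"]
```

Returning the missing names, rather than a boolean, means the failure message tells the author which collection or global still needs a generated migration.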

Storefront (playwright e2e against a docker-composed stack)

Search dropdown: type “ursula” → results appear, click one → PDP loads. Hits the Meilisearch path end-to-end.
Add to cart → checkout to payment intent: full flow up to the Stripe Element render. Doesn’t submit payment (use Stripe’s test mode at most; ideally stop before).
Account login: register → log in → land on /account. Verifies the auth/customer paths.
Category page render: /categories/books renders with the seeded products. Catches storefront-Medusa wiring regressions.
JSON-LD on PDP: parse the page’s script[type="application/ld+json"] and assert the Product schema validates (we just shipped this in PR #12).
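The JSON-LD check can be kept independent of the browser by working on the page HTML as a string. The sketch below is a minimal structural check, not full schema.org validation: the function names are hypothetical, and the required fields (an @type of "Product" and a non-empty name) are an assumption about what the test should enforce.

```typescript
// Pulls every JSON-LD block out of an HTML string and parses it.
function extractJsonLd(html: string): unknown[] {
  const pattern = /<script[^>]*type=["']application\/ld\+json["'][^>]*>([\s\S]*?)<\/script>/gi;
  const blocks: unknown[] = [];
  let match: RegExpExecArray | null;
  while ((match = pattern.exec(html)) !== null) {
    const body = match[1];
    if (body !== undefined) blocks.push(JSON.parse(body));
  }
  return blocks;
}

// Minimal Product check: correct @type and a non-empty name.
function isProductJsonLd(block: unknown): boolean {
  if (typeof block !== "object" || block === null) return false;
  const record = block as Record<string, unknown>;
  const name = record["name"];
  return record["@type"] === "Product" && typeof name === "string" && name.length > 0;
}

// Made-up page fragment for illustration.
const sampleHtml =
  '<html><head><script type="application/ld+json">' +
  '{"@context":"https://schema.org","@type":"Product","name":"Sample Book"}' +
  "</script></head><body></body></html>";

const blocks = extractJsonLd(sampleHtml);
console.log(blocks.some(isProductJsonLd)); // true
```

In the Playwright test, the HTML would come from page.content() after navigating to a seeded product page, and the assertion would be that at least one block passes isProductJsonLd.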

That’s 13 tests. Not coverage-completionist; just the surfaces that have actual logic + integration risk. Add more as features ship.

CI architecture

GitHub Actions, single workflow file, jobs run in parallel where possible.

.github/workflows/ci.yml

jobs:
  install:
    # pnpm install --frozen-lockfile, cache by lockfile hash
    # uploads node_modules as artifact (or relies on actions/cache)

  lint:
    needs: install
    # pnpm lint

  format-check:
    needs: install
    # pnpm format:check

  typecheck:
    needs: install
    strategy:
      matrix:
        package: [backend, payload, storefront]
    # pnpm --filter <pkg> exec tsc --noEmit (or build, since both apps need build to pass)

  build:
    needs: install
    strategy:
      matrix:
        package: [backend, payload, storefront]
    # pnpm --filter <pkg> build

  test-backend:
    needs: install
    services:
      postgres:
        image: postgres:16
      redis:
        image: redis:7
      meilisearch:
        image: getmeili/meilisearch:v1.12
    # pnpm --filter backend test:unit
    # pnpm --filter backend test:integration:http
    # pnpm --filter backend test:integration:modules

  test-payload:
    needs: install
    services:
      postgres:
        image: postgres:16
    # pnpm --filter payload test:int
    # (defer e2e until storefront is ready)

  test-storefront-e2e:
    needs: [build]
    services:
      postgres:
        image: postgres:16
      redis:
        image: redis:7
      meilisearch:
        image: getmeili/meilisearch:v1.12
    steps:
      - run backend in background
      - run seed
      - run payload migrate
      - run storefront build + start in background
      - pnpm --filter storefront test-e2e

Notes

Install caching: pnpm keeps a global content-addressable store and hardlinks packages from it. Cache the store (~/.local/share/pnpm/store) with actions/cache, keyed by the pnpm-lock.yaml hash, so every job restores the same cache.
Service containers: GitHub Actions service containers work for postgres/redis/meilisearch. Connection on localhost:port. For storefront e2e we need backend + payload also running — those aren’t service containers (no prebuilt image), so spin them up in a step as background processes.
Concurrency: cancel-in-progress on the same PR (concurrency.group: ${{ github.ref }}). Don’t pay for redundant CI runs when someone pushes 3 commits in a row.
Required checks: lint, format-check, all typechecks, all builds. Tests are required once we have them passing reliably (start advisory, ratchet to required).
Node version: workspace declares >=24. Matrix isn’t useful — pin one version, bump deliberately.

Ephemeral postgres for backend tests

Two approaches:

GH Actions service container (postgres:16) + run migrations at job start. Simpler, runs in CI only.
testcontainers (@testcontainers/postgresql) in the test file itself. Works identically in CI and locally. Slow if you start a container per test, but you can scope one container to the test file’s lifetime.

Recommendation: service container in CI. Locally, devs use the already-running docker-compose postgres. Don’t introduce testcontainers complexity until we actually need it (e.g. parallel test runs that need isolated DBs).

How to split the work into PRs

Doing this as one giant PR repeats the PR #14/15 mistake of unreviewable diffs. Splits:

1. CI scaffolding (no tests): .github/workflows/ci.yml with install + lint + format-check + typecheck + build jobs. Required checks on PR. This PR catches all the build-level regressions immediately. Smallest, highest-leverage.
2. Backend integration tests: add jest setup (testcontainers or service-container-aware config), write the 5 backend tests. Wire test-backend job.
3. Payload tests: migration smoke + globals API check. Wire test-payload job.
4. Storefront e2e: the playwright suite + the e2e job that orchestrates the stack. Largest of the four because it needs the full stack running.

Ship PR 1 immediately. Land PRs 2-4 as features get touched (write the test alongside the feature change, not as a backfill PR).

Open questions

Storefront e2e: docker-compose in CI or rebuild the stack manually in the job? docker-compose takes more setup in Actions but matches local dev exactly. The manual service-container + background-process approach is more brittle. Leaning docker-compose, since it is already what local dev uses.
Meilisearch in CI: do we pin v1.12 (matches local) or v1.44 (matches staging post-bump)? Staging matches what we actually run, but local dev is more representative of what devs hit. Probably v1.12 in CI; staging proved we can wipe-and-reindex on bump anyway.
Stripe in tests: don’t hit live or test-mode Stripe. Verify what we sign; mock the rest. Or skip end-to-end checkout testing in CI entirely (Stripe’s test mode is for manual QA, not CI).
Visual regression: skip for now. Playwright supports screenshot diffs but they’re brittle and add maintenance. Revisit if we get burned by visual bugs.
Coverage thresholds: no. Coverage is a vanity metric on a small codebase — write the tests that matter, don’t game a percentage.

Decision asks before starting

Confirm split-into-4-PRs approach (vs. one big test-infra PR)
Pick Meilisearch version for CI (v1.12 vs v1.44)
Confirm docker-compose-in-CI for storefront e2e (vs. raw service containers + background processes)

Cross-references

2026-05-18 — staging deploy + tooling swap context, includes “every bug was build-time catchable” thesis
task-briefs — Brief 4.1 (QA + Cross-Browser Testing) maps to the storefront e2e PR
