Catalog cron jobs

What’s worth running on a schedule vs on demand. A cron job earns its place when:

  1. The task is naturally periodic (data freshness, monitoring)
  2. The cost of stale data exceeds the cost of doing the work
  3. No human decision is needed per run

Strong candidates, ordered by leverage

Hourly

1. Stock-only sync.

  • The single highest-value cron. Stock changes hourly at Ingram; stale Medusa inventory means customers buy books that aren’t actually shippable.
  • Already designed in medusa-inventory (the lightweight just sync-stock command).
  • Cheap: ~15s per 100 books, no detail-page fetches if narrowed to stock fields.
  • Updates queue DB stock column → triggers a follow-on Medusa inventory_level update for any book whose total on_hand changed.
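
The "only books whose total on_hand changed" filter can be sketched as below. This is a minimal illustration, not the medusa-inventory implementation: the function name and the dict-of-totals shape are assumptions, and the real follow-on step would call Medusa for each returned row.

```python
from typing import Dict, List, Tuple


def changed_stock(
    previous: Dict[str, int], current: Dict[str, int]
) -> List[Tuple[str, int, int]]:
    """Return (isbn13, old_total, new_total) for each book whose total changed.

    Assumption: both dicts map isbn13 -> total on_hand. Books missing from
    `current` are left alone, since the feed said nothing about them.
    """
    changes = []
    for isbn13, new_total in sorted(current.items()):
        old_total = previous.get(isbn13, 0)
        if new_total != old_total:
            changes.append((isbn13, old_total, new_total))
    return changes
```

Only the returned rows would need a Medusa inventory_level update, which keeps the hourly run cheap when most stock is unchanged.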

Daily (early morning, before store hours)

2. Pre-order release-day scan.

  • Query: where sync_status = 'imported' and pub_date = current_date.
  • For each match: trigger “now available” workflow. Concretely: storefront banner, email anyone with a wishlist match, flip product state from preorder→available, post to Discord.
  • Pure DB query, runs in seconds.
  • The kind of thing customers will praise without realizing why.
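
A sketch of the release-day query, using sqlite3 as a stand-in for the queue DB. The table name (queue) and the isbn13/title columns are assumptions; sync_status and pub_date come from the query above. Dates are assumed to be stored as ISO strings.

```python
import sqlite3
from datetime import date
from typing import List, Optional, Tuple


def releases_today(
    conn: sqlite3.Connection, today: Optional[date] = None
) -> List[Tuple[str, str]]:
    """Return (isbn13, title) for imported books whose pub_date is today."""
    today = today or date.today()
    # Pass the date as a parameter so the job can be replayed for a past day.
    return conn.execute(
        "SELECT isbn13, title FROM queue "
        "WHERE sync_status = 'imported' AND pub_date = ? "
        "ORDER BY isbn13",
        (today.isoformat(),),
    ).fetchall()
```

Taking the date as an argument rather than relying on current_date makes a missed run easy to re-run by hand.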

3. Pipeline heartbeat.

  • Ingest one known stable ISBN end-to-end (auth → CSV → ean_id → upsert).
  • If it fails, Discord ping with the error.
  • Catches: Imperva changes, iPage HTML drift, expired cookies, R2 outages, Medusa downtime. All the failure modes that otherwise surface only on the next real manual run.
  • ~30 seconds. Worth it.
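
One way to structure the heartbeat so the Discord ping names the stage that broke. This is a sketch under assumptions: the stage callables are hypothetical placeholders for the real auth, CSV, ean_id and upsert steps, and the returned string is what would be posted.

```python
from typing import Callable, List, Optional, Tuple

Stage = Tuple[str, Callable[[], None]]


def run_heartbeat(stages: List[Stage]) -> Optional[str]:
    """Run stages in order; return an error message for the first failure.

    Returns None when every stage succeeds. Later stages are skipped after
    a failure because they depend on the earlier ones.
    """
    for name, step in stages:
        try:
            step()
        except Exception as exc:  # any failure should reach the alert
            return f"heartbeat failed at {name}: {type(exc).__name__}: {exc}"
    return None
```

Naming the failed stage in the message separates "Imperva blocked auth" from "Medusa is down" without anyone opening logs.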

Weekly (Sunday or Monday morning)

4. Full ingest of curated lists with diff notification.

  • Already in catalog-followon-ideas.
  • Run SFF (206002) + Indie Vault + Epic Fantasy + BookTok-{current month}.
  • Diff vs current queue state. Discord post:

    “This week’s catalog: +14 new books, 3 went OOP, 5 returned to stock, here’s the list.”

  • Turns the catalog from “needs operator attention” to “tell me when there’s something to look at.”
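
The diff behind that Discord post could look roughly like this. The snapshot shape (isbn13 -> status) and the status strings "in_stock", "out_of_stock" and "oop" are assumptions for illustration; the real queue may encode availability differently.

```python
from typing import Dict, List


def weekly_diff(before: Dict[str, str], after: Dict[str, str]) -> Dict[str, List[str]]:
    """Compare two isbn13 -> status snapshots of the queue."""
    known = [isbn for isbn in after if isbn in before]
    return {
        "new": sorted(set(after) - set(before)),
        "went_oop": sorted(
            i for i in known if after[i] == "oop" and before[i] != "oop"
        ),
        "returned": sorted(
            i for i in known if after[i] == "in_stock" and before[i] != "in_stock"
        ),
    }


def summary(diff: Dict[str, List[str]]) -> str:
    """Format the one-line headline for the weekly post."""
    return (
        f"This week's catalog: +{len(diff['new'])} new books, "
        f"{len(diff['went_oop'])} went OOP, "
        f"{len(diff['returned'])} returned to stock."
    )
```

The ISBN lists in the diff would follow the headline in the post, so the operator can act on it without querying the DB.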

5. Hardcover rating refresh.

  • Once the sff-destination-strategy Hardcover integration ships.
  • Walk all imported products, refetch ratings + top reviews, update local cache.
  • Low value per call, high cumulative value as Hardcover’s data grows.

Worth flagging but skipping for now

  • Cookie refresh as its own cron. The cookie expires in hours, so any cron run more often than the cookie’s lifetime needs a fresh one. Two paths: (a) every cron starts with just login, (b) lazy refresh on the first call that sees session expired. (b) is simpler and works until we’re hitting iPage from cron more often than the cookie lives. Not urgent.
  • Going-out-of-print detector. Useful eventually, but real OOP events are rare; books already in the store need a sweep only once every few months. Manual works for now.
  • Recategorize. Only relevant after a rules change, which is human-triggered. Cron’ing it would just be noise.
  • Cover-change detection (catalog-followon-ideas entry). Not yet built, no urgency.
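
For the cookie item above, option (b) can be as small as a retry wrapper. A sketch only: SessionExpired and the login callable are hypothetical names standing in for however the iPage client signals and repairs an expired session.

```python
from typing import Callable, TypeVar

T = TypeVar("T")


class SessionExpired(Exception):
    """Hypothetical error raised when iPage rejects the stored cookie."""


def with_lazy_login(call: Callable[[], T], login: Callable[[], None]) -> T:
    """Try the call; on an expired session, log in once and retry once."""
    try:
        return call()
    except SessionExpired:
        login()
        # A second SessionExpired propagates: a fresh login that still fails
        # is a real problem, not a stale cookie.
        return call()
```

Retrying exactly once keeps a broken login from looping, and the failure still reaches whatever alerting wraps the job.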

Operational scaffolding the crons need

Three things matter as much as the jobs themselves:

Observability. Every cron must report to Discord (success summary on a quiet channel, failure alert on a louder one). Silent failures rot until someone notices.
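
A minimal sketch of that reporting rule as a wrapper every job goes through. The channel names "quiet" and "alerts" and the post callable are assumptions; in practice post would send to the matching Discord webhook.

```python
from typing import Callable


def report(job: str, run: Callable[[], str], post: Callable[[str, str], None]) -> str:
    """Run a job and post its outcome: summary to 'quiet', failure to 'alerts'.

    `run` returns a short success summary. `post(channel, message)` is
    injected so the wrapper can be tested without touching Discord.
    """
    try:
        result = run()
    except Exception as exc:
        post("alerts", f"{job} FAILED: {type(exc).__name__}: {exc}")
        raise  # let the scheduler record the failed run as well
    post("quiet", f"{job} ok: {result}")
    return result
```

Re-raising after the alert means the scheduler’s own run history stays accurate even if the Discord post is missed.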

Idempotency. Every job must be safe to retry. The existing pipeline already is — upsert by isbn13, no destructive side effects. New jobs must preserve this.
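
To make the retry-safety concrete, here is an upsert keyed on isbn13, shown with sqlite3 as a stand-in. The table and column names other than isbn13 are assumptions, and the real pipeline’s database and schema may differ.

```python
import sqlite3

SCHEMA = "CREATE TABLE queue (isbn13 TEXT PRIMARY KEY, title TEXT, stock INTEGER)"


def upsert_book(conn: sqlite3.Connection, isbn13: str, title: str, stock: int) -> None:
    """Insert the book, or overwrite its fields if the isbn13 already exists."""
    conn.execute(
        "INSERT INTO queue (isbn13, title, stock) VALUES (?, ?, ?) "
        "ON CONFLICT(isbn13) DO UPDATE SET "
        "title = excluded.title, stock = excluded.stock",
        (isbn13, title, stock),
    )
```

Running the same call twice leaves one row with the latest values, which is what makes a blind retry after a failed cron run safe.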

Where they run. Two options:

  • Railway cron alongside the catalog service — simplest. Single config, shared env vars, same DB credentials, same Discord webhook. Recommended.
  • GitHub Actions scheduled workflows — cheaper for low-frequency jobs but loses shared env / DB access and adds CI-runtime variability. Only worth it if Railway cron pricing surprises us.

Suggested order

Roughly ranked by leverage vs effort:

  1. Pipeline heartbeat (daily). Smallest, gives observability for everything that comes after. Build first.
  2. Stock-only sync (hourly). Highest customer-facing impact. Depends on medusa-inventory work landing first.
  3. Weekly full ingest + Discord diff. Highest operator-facing impact. Independent of inventory work.
  4. Pre-order release-day scan (daily). Cheap delight feature. Useful once we have wishlists / customer notification infra.
  5. Hardcover rating refresh (weekly). Depends on Hardcover integration (sff-destination-strategy) shipping first.

Cross-references