Ingram iPage pipeline

Automated pipeline to populate Emporium’s available-to-order catalog from Ingram iPage, replacing the manual CSV/Omnibus import proposed in extended-catalog-and-preorders

Problem

The extended-catalog-and-preorders plan assumed a weekly CSV export from Edelweiss Omnibus → Medusa CLI import. In practice, Omnibus is slow to browse and its CSV export is inconsistent. Ingram iPage has real-time stock data, a saved-search feature for SFF top sellers, and better availability signals. We can get richer metadata (per-DC stock, binding, page count, exact SRP) and cover images without waiting for Bookshop.org or Omnibus APIs.

The risk is that iPage is a legacy web app. Scraping it means browser automation, 2FA via email (Okta), and a hover-image endpoint that requires an authenticated session. We need a design that keeps customer data completely separate from the scraping process.

Approach

Three layers, zero trust between the scraper and the customer database.

Architecture

[Ingram iPage]
    |
    v
[Scraper]  -- authenticated browser session (Browserbase)
    |
    v
[Queue DB]  -- separate Railway Postgres, no customer data
    |
    v
[R2 / Railway Buckets]  -- cover images, S3-compatible
    |
    v
[Medusa Import Script]  -- runs inside Emporium backend
    |
    v
[Medusa DB]  -- customer orders, payments, addresses

Credential separation:

  • Scraper gets write-only access to the queue DB and R2 bucket.
  • Medusa gets read-only access to the queue DB.
  • Medusa DB credentials never leave the backend.
  • If the scraper session or credentials are compromised, the attacker sees unimported book titles and cover JPGs. No customer data.

Queue DB schema

Database: catalog-import (separate from medusa and payload)

Table: ingram_queue

CREATE TABLE ingram_queue (
    id               SERIAL PRIMARY KEY,
    isbn             TEXT NOT NULL,
    title            TEXT NOT NULL,
    author           TEXT,
    publisher        TEXT,
    publication_date TEXT,
    binding          TEXT,
    page_count       INT,
    srp_price        NUMERIC(10,2),      -- list price in cents or decimal
    weight           NUMERIC(10,2),      -- shipping calc weight in lbs or kg
    cover_url        TEXT,               -- public R2 URL (e.g. covers.dungeonbooks.com)
    category_hint    TEXT,               -- sci-fi, fantasy, horror, etc.
    stock_json       JSONB,              -- per-DC on-hand / on-order snapshot
    imported_at      TIMESTAMP,
    failed_reason    TEXT,
    created_at       TIMESTAMP DEFAULT NOW()
);
 
CREATE UNIQUE INDEX idx_queue_isbn_unimported
    ON ingram_queue(isbn)
    WHERE imported_at IS NULL;

Scraper (browser automation)

Runs weekly or on-demand via Hermes browser tools (Browserbase cloud). Not committed to the Emporium repo yet; lives in the agent context until we stabilize it.

Inputs:

  • IPAGE_USERNAME=lunalin
  • IPAGE_PASSWORD (agent secret)
  • IPAGE_ACCOUNT=SCRIPT WIZARDS LLC (20AR561)
  • IPAGE_SAVED_SEARCH_ID=206002 (SFF sorted by Ingram Demand)
  • QUEUE_DB_URL
  • R2_ENDPOINT, R2_ACCESS_KEY_ID, R2_SECRET_ACCESS_KEY, R2_BUCKET_NAME, R2_PUBLIC_URL

Flow:

  1. Login to iPage via old login form (avoids Okta redirect loop).
  2. Request 2FA code via email; read Hermes inbox; submit code.
  3. JS-navigate (not browser_navigate) to saved search results to preserve session.
  4. Extract product rows: ISBN, title, author, publisher, binding, pages, price, weight.
  5. Trigger stock popups and read per-DC availability (PA, TN, IN, OR).
  6. For each title, fetch cover from hoverImage.jsp?ean=<ISBN>&size=640&howerType=Y using active session.
  7. PUT cover to R2: covers/raw/<isbn>.jpg.
  8. Insert row into queue DB with cover_url = '<R2_PUBLIC_URL>/covers/raw/<isbn>.jpg'.

Session survival rule: browser_navigate kills the Browserbase session. Navigation inside the page must use window.location.href = ... via browser_console.

Known iPage quirks:

  • Hover endpoint has a typo: howerType=Y (not hoverType).
  • Cover fetch blocked by CORS in browser context; Node fetch inside the scraper script works fine.
  • Zero on-hand with high on-order (e.g. ACOTAR #6, 199,999 on-order) is normal for not-yet-released titles.

Medusa import script

A standalone script in backend/src/scripts/import-ingram-queue.ts (or a scheduled job in src/jobs/). Connects to both databases.

Flow:

  1. SELECT * FROM ingram_queue WHERE imported_at IS NULL LIMIT 25.
  2. For each row:
    • Create or update product in Medusa via createProductsWorkflow.
    • Set manage_inventory: false (available to order, not on shelf).
    • Tag procurement_type: special_order.
    • Price = SRP (no markup for now).
    • Pass the R2 cover URL directly; Medusa downloads and stores it in its own file module.
    • Assign category using category_hint (fantasy / sci-fi / horror / etc.).
  3. UPDATE ingram_queue SET imported_at = NOW() WHERE id = $id.
  4. Log failures to failed_reason.

Cover handling:

  • Medusa’s file module fetches the image from R2 at import time.
  • No iPage session needed in the backend.
  • Fallback: if R2 URL is missing, try Open Library cover API (https://covers.openlibrary.org/b/isbn/<isbn>-L.jpg).

Storage (covers)

Option A: Railway Buckets

  • Fastest to set up inside the Railway project.
  • Egress costs accumulate if storefront serves images directly.
  • Fine for staging / early testing.

Option B: Cloudflare R2

  • Zero egress, global CDN.
  • Recommended for production.
  • Public bucket URL or custom domain (covers.dungeonbooks.com) via CNAME.

Bucket layout:

covers/
  raw/
    9781639739134.jpg
    9781250347374.jpg
    ...

Future: add /processed/ for web-optimized variants if needed.

Open questions

  • R2 bucket setup: bucket name will be dungeonbooks on Cloudflare R2; Panat to create and provide R2_ACCESS_KEY_ID / R2_SECRET_ACCESS_KEY
  • Railway Postgres for queue DB: spin up a new catalog-import DB (separate project or same Railway project, isolated from medusa / payload)
  • Medusa job scheduling: run import as a CLI script triggered manually, or a scheduled job inside the Medusa worker?
  • Category assignment heuristic: iPage BISAC categories are noisy. Do we map FIC009000 → “Fantasy” automatically, or maintain a manual mapping sheet?
  • Zero-stock titles: some top-sellers have 0 on-hand at all DCs but high on-order. Do we still list these as “available to order” with a longer ETA, or exclude them until on-hand > 0?
  • SRP pricing consistency: confirm with Carrie that SRP with no markup is the pricing policy for special-order titles.
  • Existing seed script: backend/src/scripts/seed-ingram-catalog.ts was drafted during a session and lives in the agent’s /tmp/emporium/ copy. It hardcodes 25 titles. Do we keep it as a one-off fallback, or replace it entirely with the queue-based pipeline?

Status

Draft — not yet approved. Replaces the Omnibus CSV path in extended-catalog-and-preorders for the SFF top-seller slice.