Ingram iPage pipeline

Automated pipeline to populate Emporium’s available-to-order catalog from Ingram iPage, replacing the manual CSV/Omnibus import proposed in extended-catalog-and-preorders

Problem

The extended-catalog-and-preorders plan assumed a weekly CSV export from Edelweiss Omnibus → Medusa CLI import. In practice, Omnibus is slow to browse and its CSV export is inconsistent. Ingram iPage has real-time stock data, a saved-search feature for SFF top sellers, and better availability signals. We can get richer metadata (per-DC stock, binding, page count, exact SRP) and cover images without waiting for Bookshop.org or Omnibus APIs.

The risk is that iPage is a legacy web app. Scraping it means browser automation, 2FA via email (Okta), and a hover-image endpoint that requires an authenticated session. We need a design that keeps customer data completely separate from the scraping process.

Approach

Three layers, zero trust between the scraper and the customer database.

Architecture

[Ingram iPage]
    |
    v
[Scraper]  -- authenticated browser session (Browserbase)
    |
    v
[Queue DB]  -- separate Railway Postgres, no customer data
    |
    v
[R2 / Railway Buckets]  -- cover images, S3-compatible
    |
    v
[Medusa Import Script]  -- runs inside Emporium backend
    |
    v
[Medusa DB]  -- customer orders, payments, addresses

Credential separation:

Scraper gets write-only access to the queue DB and R2 bucket.
Medusa gets read-only access to the queue DB.
Medusa DB credentials never leave the backend.
If the scraper session or credentials are compromised, the attacker sees unimported book titles and cover JPGs. No customer data.

Queue DB schema

Database: catalog-import (separate from medusa and payload)

Table: ingram_queue

CREATE TABLE ingram_queue (
    id               SERIAL PRIMARY KEY,
    isbn             TEXT NOT NULL,
    title            TEXT NOT NULL,
    author           TEXT,
    publisher        TEXT,
    publication_date TEXT,
    binding          TEXT,
    page_count       INT,
    srp_price        NUMERIC(10,2),      -- list price in cents or decimal
    weight           NUMERIC(10,2),      -- shipping calc weight in lbs or kg
    cover_url        TEXT,               -- public R2 URL (e.g. covers.dungeonbooks.com)
    category_hint    TEXT,               -- sci-fi, fantasy, horror, etc.
    stock_json       JSONB,              -- per-DC on-hand / on-order snapshot
    imported_at      TIMESTAMP,
    failed_reason    TEXT,
    created_at       TIMESTAMP DEFAULT NOW()
);
 
CREATE UNIQUE INDEX idx_queue_isbn_unimported
    ON ingram_queue(isbn)
    WHERE imported_at IS NULL;

Scraper (browser automation)

Runs weekly or on-demand via Hermes browser tools (Browserbase cloud). Not committed to the Emporium repo yet; lives in the agent context until we stabilize it.

Inputs:

IPAGE_USERNAME=lunalin
IPAGE_PASSWORD (agent secret)
IPAGE_ACCOUNT=SCRIPT WIZARDS LLC (20AR561)
IPAGE_SAVED_SEARCH_ID=206002 (SFF sorted by Ingram Demand)
QUEUE_DB_URL
R2_ENDPOINT, R2_ACCESS_KEY_ID, R2_SECRET_ACCESS_KEY, R2_BUCKET_NAME, R2_PUBLIC_URL

Flow:

Login to iPage via old login form (avoids Okta redirect loop).
Request 2FA code via email; read Hermes inbox; submit code.
JS-navigate (not browser_navigate) to saved search results to preserve session.
Extract product rows: ISBN, title, author, publisher, binding, pages, price, weight.
Trigger stock popups and read per-DC availability (PA, TN, IN, OR).
For each title, fetch cover from hoverImage.jsp?ean=<ISBN>&size=640&howerType=Y using active session.
PUT cover to R2: covers/raw/<isbn>.jpg.
Insert row into queue DB with cover_url = '<R2_PUBLIC_URL>/covers/raw/<isbn>.jpg'.

Session survival rule: browser_navigate kills the Browserbase session. Navigation inside the page must use window.location.href = ... via browser_console.

Known iPage quirks:

Hover endpoint has a typo: howerType=Y (not hoverType).
Cover fetch blocked by CORS in browser context; Node fetch inside the scraper script works fine.
Zero on-hand with high on-order (e.g. ACOTAR #6, 199,999 on-order) is normal for not-yet-released titles.

Medusa import script

A standalone script in backend/src/scripts/import-ingram-queue.ts (or a scheduled job in src/jobs/). Connects to both databases.

Flow:

SELECT * FROM ingram_queue WHERE imported_at IS NULL LIMIT 25.
For each row:
- Create or update product in Medusa via createProductsWorkflow.
- Set manage_inventory: false (available to order, not on shelf).
- Tag procurement_type: special_order.
- Price = SRP (no markup for now).
- Pass the R2 cover URL directly; Medusa downloads and stores it in its own file module.
- Assign category using category_hint (fantasy / sci-fi / horror / etc.).
UPDATE ingram_queue SET imported_at = NOW() WHERE id = $id.
Log failures to failed_reason.

Cover handling:

Medusa’s file module fetches the image from R2 at import time.
No iPage session needed in the backend.
Fallback: if R2 URL is missing, try Open Library cover API (https://covers.openlibrary.org/b/isbn/<isbn>-L.jpg).

Storage (covers)

Option A: Railway Buckets

Fastest to set up inside the Railway project.
Egress costs accumulate if storefront serves images directly.
Fine for staging / early testing.

Option B: Cloudflare R2

Zero egress, global CDN.
Recommended for production.
Public bucket URL or custom domain (covers.dungeonbooks.com) via CNAME.

Bucket layout:

covers/
  raw/
    9781639739134.jpg
    9781250347374.jpg
    ...

Future: add /processed/ for web-optimized variants if needed.

Open questions

R2 bucket setup: bucket name will be dungeonbooks on Cloudflare R2; Panat to create and provide R2_ACCESS_KEY_ID / R2_SECRET_ACCESS_KEY
Railway Postgres for queue DB: spin up a new catalog-import DB (separate project or same Railway project, isolated from medusa / payload)
Medusa job scheduling: run import as a CLI script triggered manually, or a scheduled job inside the Medusa worker?
Category assignment heuristic: iPage BISAC categories are noisy. Do we map FIC009000 → “Fantasy” automatically, or maintain a manual mapping sheet?
Zero-stock titles: some top-sellers have 0 on-hand at all DCs but high on-order. Do we still list these as “available to order” with a longer ETA, or exclude them until on-hand > 0?
SRP pricing consistency: confirm with Carrie that SRP with no markup is the pricing policy for special-order titles.
Existing seed script: backend/src/scripts/seed-ingram-catalog.ts was drafted during a session and lives in the agent’s /tmp/emporium/ copy. It hardcodes 25 titles. Do we keep it as a one-off fallback, or replace it entirely with the queue-based pipeline?

Status

Draft — not yet approved. Replaces the Omnibus CSV path in extended-catalog-and-preorders for the SFF top-seller slice.

extended-catalog-and-preorders — parent plan, approved by Carrie 2026-04-29
emporium — store rebuild index
re-inventory-strategy — ISBN scan and catalog cleanup context
rebuild-spec — system diagram, Square sync, phased plan
bookstore-supply-chain — context on Ingram, Edelweiss, Bookshop.org

Quartz 4

Explorer

ingram-ipage-pipeline

Ingram iPage pipeline

Problem

Approach

Architecture

Queue DB schema

Scraper (browser automation)

Medusa import script

Storage (covers)

Open questions

Status

Graph View

Table of Contents

Backlinks

Quartz 4

Explorer

ingram-ipage-pipeline

Ingram iPage pipeline

Problem

Approach

Architecture

Queue DB schema

Scraper (browser automation)

Medusa import script

Storage (covers)

Open questions

Status

Related

Graph View

Table of Contents

Backlinks