Ingram iPage pipeline
Automated pipeline to populate Emporium’s available-to-order catalog from Ingram iPage, replacing the manual CSV/Omnibus import proposed in extended-catalog-and-preorders
Problem
The extended-catalog-and-preorders plan assumed a weekly CSV export from Edelweiss Omnibus → Medusa CLI import. In practice, Omnibus is slow to browse and its CSV export is inconsistent. Ingram iPage has real-time stock data, a saved-search feature for SFF top sellers, and better availability signals. We can get richer metadata (per-DC stock, binding, page count, exact SRP) and cover images without waiting for Bookshop.org or Omnibus APIs.
The risk is that iPage is a legacy web app. Scraping it means browser automation, 2FA via email (Okta), and a hover-image endpoint that requires an authenticated session. We need a design that keeps customer data completely separate from the scraping process.
Approach
Three layers, zero trust between the scraper and the customer database.
Architecture
[Ingram iPage]
|
v
[Scraper] -- authenticated browser session (Browserbase)
|
v
[Queue DB] -- separate Railway Postgres, no customer data
|
v
[R2 / Railway Buckets] -- cover images, S3-compatible
|
v
[Medusa Import Script] -- runs inside Emporium backend
|
v
[Medusa DB] -- customer orders, payments, addresses
Credential separation:
- Scraper gets write-only access to the queue DB and R2 bucket.
- Medusa gets read-only access to the queue DB.
- Medusa DB credentials never leave the backend.
- If the scraper session or credentials are compromised, the attacker sees unimported book titles and cover JPGs. No customer data.
Queue DB schema
Database: catalog-import (separate from medusa and payload)
Table: ingram_queue
CREATE TABLE ingram_queue (
id SERIAL PRIMARY KEY,
isbn TEXT NOT NULL,
title TEXT NOT NULL,
author TEXT,
publisher TEXT,
publication_date TEXT,
binding TEXT,
page_count INT,
srp_price NUMERIC(10,2), -- list price in cents or decimal
weight NUMERIC(10,2), -- shipping calc weight in lbs or kg
cover_url TEXT, -- public R2 URL (e.g. covers.dungeonbooks.com)
category_hint TEXT, -- sci-fi, fantasy, horror, etc.
stock_json JSONB, -- per-DC on-hand / on-order snapshot
imported_at TIMESTAMP,
failed_reason TEXT,
created_at TIMESTAMP DEFAULT NOW()
);
CREATE UNIQUE INDEX idx_queue_isbn_unimported
ON ingram_queue(isbn)
WHERE imported_at IS NULL;Scraper (browser automation)
Runs weekly or on-demand via Hermes browser tools (Browserbase cloud). Not committed to the Emporium repo yet; lives in the agent context until we stabilize it.
Inputs:
IPAGE_USERNAME=lunalinIPAGE_PASSWORD(agent secret)IPAGE_ACCOUNT=SCRIPT WIZARDS LLC (20AR561)IPAGE_SAVED_SEARCH_ID=206002(SFF sorted by Ingram Demand)QUEUE_DB_URLR2_ENDPOINT,R2_ACCESS_KEY_ID,R2_SECRET_ACCESS_KEY,R2_BUCKET_NAME,R2_PUBLIC_URL
Flow:
- Login to iPage via old login form (avoids Okta redirect loop).
- Request 2FA code via email; read Hermes inbox; submit code.
- JS-navigate (not
browser_navigate) to saved search results to preserve session. - Extract product rows: ISBN, title, author, publisher, binding, pages, price, weight.
- Trigger stock popups and read per-DC availability (PA, TN, IN, OR).
- For each title, fetch cover from
hoverImage.jsp?ean=<ISBN>&size=640&howerType=Yusing active session. PUTcover to R2:covers/raw/<isbn>.jpg.- Insert row into queue DB with
cover_url = '<R2_PUBLIC_URL>/covers/raw/<isbn>.jpg'.
Session survival rule: browser_navigate kills the Browserbase session. Navigation inside the page must use window.location.href = ... via browser_console.
Known iPage quirks:
- Hover endpoint has a typo:
howerType=Y(nothoverType). - Cover fetch blocked by CORS in browser context; Node
fetchinside the scraper script works fine. - Zero on-hand with high on-order (e.g. ACOTAR #6, 199,999 on-order) is normal for not-yet-released titles.
Medusa import script
A standalone script in backend/src/scripts/import-ingram-queue.ts (or a scheduled job in src/jobs/). Connects to both databases.
Flow:
SELECT * FROM ingram_queue WHERE imported_at IS NULL LIMIT 25.- For each row:
- Create or update product in Medusa via
createProductsWorkflow. - Set
manage_inventory: false(available to order, not on shelf). - Tag
procurement_type: special_order. - Price = SRP (no markup for now).
- Pass the R2 cover URL directly; Medusa downloads and stores it in its own file module.
- Assign category using
category_hint(fantasy / sci-fi / horror / etc.).
- Create or update product in Medusa via
UPDATE ingram_queue SET imported_at = NOW() WHERE id = $id.- Log failures to
failed_reason.
Cover handling:
- Medusa’s file module fetches the image from R2 at import time.
- No iPage session needed in the backend.
- Fallback: if R2 URL is missing, try Open Library cover API (
https://covers.openlibrary.org/b/isbn/<isbn>-L.jpg).
Storage (covers)
Option A: Railway Buckets
- Fastest to set up inside the Railway project.
- Egress costs accumulate if storefront serves images directly.
- Fine for staging / early testing.
Option B: Cloudflare R2
- Zero egress, global CDN.
- Recommended for production.
- Public bucket URL or custom domain (
covers.dungeonbooks.com) via CNAME.
Bucket layout:
covers/
raw/
9781639739134.jpg
9781250347374.jpg
...
Future: add /processed/ for web-optimized variants if needed.
Open questions
- R2 bucket setup: bucket name will be
dungeonbookson Cloudflare R2; Panat to create and provideR2_ACCESS_KEY_ID/R2_SECRET_ACCESS_KEY - Railway Postgres for queue DB: spin up a new
catalog-importDB (separate project or same Railway project, isolated frommedusa/payload) - Medusa job scheduling: run import as a CLI script triggered manually, or a scheduled job inside the Medusa worker?
- Category assignment heuristic: iPage BISAC categories are noisy. Do we map
FIC009000→ “Fantasy” automatically, or maintain a manual mapping sheet? - Zero-stock titles: some top-sellers have 0 on-hand at all DCs but high on-order. Do we still list these as “available to order” with a longer ETA, or exclude them until on-hand > 0?
- SRP pricing consistency: confirm with Carrie that SRP with no markup is the pricing policy for special-order titles.
- Existing seed script:
backend/src/scripts/seed-ingram-catalog.tswas drafted during a session and lives in the agent’s/tmp/emporium/copy. It hardcodes 25 titles. Do we keep it as a one-off fallback, or replace it entirely with the queue-based pipeline?
Status
Draft — not yet approved. Replaces the Omnibus CSV path in extended-catalog-and-preorders for the SFF top-seller slice.
Related
- extended-catalog-and-preorders — parent plan, approved by Carrie 2026-04-29
- emporium — store rebuild index
- re-inventory-strategy — ISBN scan and catalog cleanup context
- rebuild-spec — system diagram, Square sync, phased plan
- bookstore-supply-chain — context on Ingram, Edelweiss, Bookshop.org