iPage Catalog Scrape - Session Recovery

Date: 2026-05-25
Status: Paused - need gateway restart to pick up Browserbase config
Next Step: Restart Hermes gateway, then relogin to iPage once, scrape SFF saved search top 25

Config Changes (NEED GATEWAY RESTART)

~/.hermes/config.yaml patched:

  1. Removed duplicate cloud_provider: browser-use (line 114) so browserbase actually wins
  2. Bumped browser.inactivity_timeout from 120s to 1800s (30min)

Resulting browser block:

browser:
  inactivity_timeout: 1800
  command_timeout: 30
  cloud_provider: browserbase  # no longer overridden!

iPage Login

Major Discovery: Same-Origin Fetch for Covers

Previously, hoverImage.jsp killed the session on navigation. New approach: fetch covers from the image servlet with a same-origin fetch().

Working cover servlet:

https://ipage.ingramcontent.com/ipage/servlet/ibg.common.titledetail.imageloader?ean={ean}&size=640&howerType=Y
  • howerType=Y is REQUIRED (yes it’s misspelled “hower”)
  • size=640 for full-resolution cover
  • Can fetch via same-origin fetch() from browser console without killing session!
  • Returns ~230KB JPEG for a typical paperback
  • Page thumbnail (no params) is only ~124x187px, which is too small

Tested successfully on product page for 9781481497305:

let url = 'https://ipage.ingramcontent.com/ipage/servlet/ibg.common.titledetail.imageloader?ean=9781481497305&size=640&howerType=Y';
let r = await fetch(url, {credentials:'include'});
let blob = await r.blob(); // blob.size=230024 type=image/jpeg
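
A minimal sketch of the same fetch wrapped in reusable helpers. The names coverUrl and fetchCover are hypothetical, and the content-type check assumes that a failed or expired session returns something other than an image, which these notes do not confirm.

```javascript
// Servlet path copied from the working URL above.
const COVER_SERVLET =
  'https://ipage.ingramcontent.com/ipage/servlet/ibg.common.titledetail.imageloader';

// Build the cover URL for a 13-digit EAN. howerType=Y is always included
// because the notes above say the servlet requires it.
function coverUrl(ean, size = 640) {
  if (!/^\d{13}$/.test(String(ean))) {
    throw new Error('expected a 13-digit EAN, got: ' + ean);
  }
  const params = new URLSearchParams({
    ean: String(ean),
    size: String(size),
    howerType: 'Y',
  });
  return COVER_SERVLET + '?' + params.toString();
}

// Browser-only: fetch the cover with the session cookies attached.
async function fetchCover(ean) {
  const r = await fetch(coverUrl(ean), { credentials: 'include' });
  if (!r.ok) throw new Error('cover fetch failed: HTTP ' + r.status);
  const blob = await r.blob();
  // Assumption: anything that is not an image means the request did not work.
  if (!blob.type.startsWith('image/')) {
    throw new Error('unexpected content type: ' + blob.type);
  }
  return blob;
}
```

Usage from the console on any iPage page: let blob = await fetchCover('9781481497305');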

Session-safe navigation pattern:

// Navigate by assigning location.href
window.location.href = 'https://ipage.ingramcontent.com/ipage/product/search/savedSearches.action?action=select&searchId=206002';

// Navigation replaces the page's JS context, so run the check below as a
// separate evaluation 2-3 seconds later, repeating until it returns true.
document.readyState === 'complete';
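
Because a navigation discards any script still running in the page, the repeat loop has to live on the driver side rather than in the page itself. The following is a sketch of such a loop, not the tooling actually used here: pollUntil is a hypothetical helper, and evaluateInPage in the usage comment stands in for whatever call the driver offers to evaluate an expression in the page.

```javascript
// Call check() repeatedly until it returns a truthy value or the timeout
// elapses. Resolves to true on success and false on timeout.
async function pollUntil(check, { intervalMs = 500, timeoutMs = 15000 } = {}) {
  const deadline = Date.now() + timeoutMs;
  for (;;) {
    let ok = false;
    try {
      ok = await check();
    } catch (e) {
      // An evaluation can fail while the page is mid-navigation; retry.
    }
    if (ok) return true;
    if (Date.now() >= deadline) return false;
    await new Promise((resolve) => setTimeout(resolve, intervalMs));
  }
}

// Hypothetical usage from the driver, after triggering the navigation:
//   const ready = await pollUntil(
//     () => evaluateInPage("document.readyState === 'complete'")
//   );
//   if (!ready) throw new Error('page did not finish loading');
```

Returning false on timeout instead of throwing leaves the retry-or-abort decision to the caller.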

Test Product: 9781481497305 (The Tangled Lands)

Scraped successfully from product detail page:

Field        Value
Title        The Tangled Lands (Reprint)
Authors      Bacigalupi, Paolo / Buckell, Tobias S
Publisher    S&s/Saga Press
Binding      Paperback
SRP          $18.99
Pub Date     November 20, 2018
ISBN-10      1481497308
ISBN-13      9781481497305
BISACs       Fiction | Dystopian ; Fiction | Fantasy | Epic
Dewey        813.6
LC Call      PS3602.A3447
LCCN         2017011760
Height       0.9"
Length       8.9"
Width        5.9"
Weight       0.7 lbs
Pages        304
Carton Qty   40
Returnable   Yes

Additional Information Scraper JS (Tested Working)

This walks every table on the product detail page, flattens each one to text with a recursive helper, and keeps the last table that contains the Additional Information block:

// Recursively collect the text of a node and its descendants
function getText(node) {
  var txt = '';
  for (var i = 0; i < node.childNodes.length; i++) {
    var c = node.childNodes[i];
    if (c.nodeType === 3) txt += c.textContent;          // text node
    else if (c.nodeType === 1) txt += ' ' + getText(c);  // element node
  }
  return txt.trim();
}
var tables = document.querySelectorAll('table');
var info = '';
for (var i = 0; i < tables.length; i++) {
  var text = getText(tables[i]);
  // Keep the last matching table; nested tables come later in document order
  if (text.indexOf('Physical Info') !== -1 || text.indexOf('BISAC Categories') !== -1 || (text.indexOf('Additional Information') !== -1 && text.length > 50)) {
    info = text;
  }
}
info; // the console prints the value of the last expression

Returns clean text like:

BISAC Categories: - Fiction | Dystopian - Fiction | Fantasy | Epic
LC Subjects: - Imaginary places - Magic - Environmental degradation - Dictators - Fantasy fiction
Dewey: 813.6
LC Call Number: PS3602.A3447
LCCN: 2017011760
Features: Price on Product
Physical Info: 0.9" H x 8.9" L x 5.9" W (0.7 lbs) 304 pages
Carton Quantity: 40
Number of Units in Package: 1
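
The Physical Info line can be split into numeric fields before the upsert. This is a sketch based only on the single sample above: parsePhysicalInfo and its field names are hypothetical, and other titles may format this line differently (for example, with a missing weight), in which case the function returns null.

```javascript
// Parse a line such as:
//   Physical Info: 0.9" H x 8.9" L x 5.9" W (0.7 lbs) 304 pages
// Accepts straight (") or curly (\u201D) inch marks.
function parsePhysicalInfo(text) {
  const re = new RegExp(
    'Physical Info:\\s*([\\d.]+)["\\u201D]\\s*H\\s*x\\s*' +
      '([\\d.]+)["\\u201D]\\s*L\\s*x\\s*' +
      '([\\d.]+)["\\u201D]\\s*W\\s*' +
      '\\(([\\d.]+)\\s*lbs\\)\\s*(\\d+)\\s*pages'
  );
  const m = re.exec(text);
  if (!m) return null; // format not recognized; keep the raw text instead
  return {
    heightIn: Number(m[1]),
    lengthIn: Number(m[2]),
    widthIn: Number(m[3]),
    weightLbs: Number(m[4]),
    pages: Number(m[5]),
  };
}
```

Because the pattern is not anchored to the start of the string, it can be run directly on the full info string returned by the scraper.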

Saved Search: SFF

  • ID: 206002
  • Direct URL: https://ipage.ingramcontent.com/ipage/product/search/savedSearches.action?action=select&searchId=206002
  • Description: Science Fiction & Fantasy sorted by Ingram Demand

Database

  • Host: zephyr.proxy.rlwy.net:41163
  • DB: railway
  • User: postgres
  • Table: ingram_queue
  • SSL: rejectUnauthorized: false (Railway proxy requirement)
  • Password in .env is redacted by Hermes; the actual password is known from a prior chat (ROTATE AFTER PROJECT)

Next Steps (Post-Restart)

  1. Restart Hermes gateway → Browserbase sessions should persist
  2. Log into iPage once (lunalin + 2FA)
  3. Navigate via JS to SFF saved search
  4. Scrape top 25 metadata + stock via in-page JS (extract product IDs from the results grid)
  5. For each ISBN, open the product detail page via window.location.href
  6. Scrape Additional Information (weight/dims/pages/BISAC)
  7. Fetch 640px cover via same-origin fetch() → convert the blob to base64
  8. Upload covers to R2 (dungeonbooks bucket)
  9. Upsert all into ingram_queue with imported_at = NULL
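
Step 9 could be sketched as a small query builder. Everything about the schema here is an assumption: the column names (isbn13, title) are hypothetical, and ON CONFLICT (isbn13) only works if ingram_queue has a unique constraint on that column. The returned { text, values } object is the query-config shape that node-postgres accepts in query().

```javascript
// Build a parameterized upsert for ingram_queue.
// Assumptions: a unique constraint on isbn13, and row keys that match real
// column names. New rows get imported_at = NULL; existing rows keep theirs.
function buildUpsert(row) {
  const columns = Object.keys(row);
  for (const c of columns) {
    // Column names are interpolated into SQL, so restrict them to identifiers.
    if (!/^[a-z_][a-z0-9_]*$/.test(c)) throw new Error('bad column name: ' + c);
  }
  if (!columns.includes('isbn13')) throw new Error('row needs an isbn13 key');
  const updates = columns
    .filter((c) => c !== 'isbn13')
    .map((c) => c + ' = EXCLUDED.' + c);
  if (updates.length === 0) throw new Error('row has nothing to update');
  const placeholders = columns.map((_, i) => '$' + (i + 1));
  return {
    text:
      'INSERT INTO ingram_queue (' + columns.join(', ') + ', imported_at) ' +
      'VALUES (' + placeholders.join(', ') + ', NULL) ' +
      'ON CONFLICT (isbn13) DO UPDATE SET ' + updates.join(', '),
    values: columns.map((c) => row[c]),
  };
}
```

Leaving imported_at out of the UPDATE clause is a deliberate choice in this sketch: re-scraping a title that was already imported would not put it back in the queue.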

Open Questions

  • Base64 size limit: a 230KB JPEG becomes ~307KB of base64. Console JSON output may truncate strings that large. Need a chunking strategy, or a direct R2 upload from the browser console (R2 CORS behavior unknown).
  • Batch efficiency: opening 25 product detail pages sequentially via JS is slow (~5s each, about 2min total). Consider parallelizing across multiple tabs, or check whether Additional Information is available per row from a hidden XHR endpoint.
  • Session cookie export still blocked: the session cookies are HttpOnly, so document.cookie cannot read them and backend node-fetch can't reuse the browser session. Covers must be downloaded inside the browser context or via a separate Camofox pass with cookie export.
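
For the first open question, one possible chunking approach is sketched below. The helper names are hypothetical, and the chunk size in the usage note is a placeholder, since the actual console truncation limit has not been measured.

```javascript
// Base64 encodes every 3 bytes as 4 characters, padding the final group.
function base64Length(byteLength) {
  return 4 * Math.ceil(byteLength / 3);
}

// Split a string into pieces of at most chunkSize characters.
function chunkString(str, chunkSize) {
  if (!Number.isInteger(chunkSize) || chunkSize <= 0) {
    throw new Error('chunkSize must be a positive integer');
  }
  const chunks = [];
  for (let i = 0; i < str.length; i += chunkSize) {
    chunks.push(str.slice(i, i + chunkSize));
  }
  return chunks;
}

// Browser-only: read a Blob as base64 using FileReader.
function blobToBase64(blob) {
  return new Promise((resolve, reject) => {
    const reader = new FileReader();
    reader.onerror = () => reject(reader.error);
    reader.onload = () => {
      // reader.result is a data URL; keep only the part after the comma.
      const dataUrl = String(reader.result);
      resolve(dataUrl.slice(dataUrl.indexOf(',') + 1));
    };
    reader.readAsDataURL(blob);
  });
}
```

In the console this would look like: let chunks = chunkString(await blobToBase64(blob), 50000); with each chunk read out in a separate evaluation and the pieces joined on the receiving side. For the tested cover, base64Length(230024) gives 306700 characters, matching the ~307KB estimate.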