iPage Catalog Scrape - Session Recovery
Date: 2026-05-25
Status: Paused - need gateway restart to pick up Browserbase config
Next Step: Restart Hermes gateway, then log in to iPage once, scrape SFF saved search top 25
Config Changes (NEED GATEWAY RESTART)
~/.hermes/config.yaml patched:
- Removed duplicate cloud_provider: browser-use (line 114) so browserbase actually wins
- Bumped browser.inactivity_timeout from 120s to 1800s (30 min)
browser:
  inactivity_timeout: 1800
  command_timeout: 30
  cloud_provider: browserbase # no longer overridden!
iPage Login
- Portal: https://ipage.ingramcontent.com
- User: lunalin / Account 20AR561
- 2FA: Email to luna@dungeonbooks.com (via himalaya CLI)
- Post-login: always use JS window.location.href = '...' for navigation. browser_navigate kills the session.
Major Discovery: Same-Origin Fetch for Covers
Previously hoverImage.jsp killed the session on navigation. New approach:
Working cover servlet:
https://ipage.ingramcontent.com/ipage/servlet/ibg.common.titledetail.imageloader?ean={ean}&size=640&howerType=Y
- howerType=Y is REQUIRED (yes, it's misspelled "hower")
- size=640 for full-resolution cover
- Can fetch via same-origin fetch() from browser console without killing session!
- Returns ~230KB JPEG for a typical paperback
- Page thumbnail (no params) is only ~124x187px, too small
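The parameters listed above can be assembled by a small helper so every cover request carries size=640 and howerType=Y. This is a minimal sketch: the names COVER_SERVLET, isValidEan13 and coverUrl are made up for illustration, and the EAN-13 check-digit guard is an added precaution rather than something the servlet is known to require.

```javascript
// Base path of the cover servlet, copied from the notes above.
const COVER_SERVLET =
  'https://ipage.ingramcontent.com/ipage/servlet/ibg.common.titledetail.imageloader';

// True when `ean` is a 13-digit string with a valid EAN-13 check digit.
function isValidEan13(ean) {
  if (!/^\d{13}$/.test(ean)) return false;
  let sum = 0;
  for (let i = 0; i < 12; i++) {
    // Weights alternate 1, 3, 1, 3, ... starting from the leftmost digit.
    sum += Number(ean[i]) * (i % 2 === 0 ? 1 : 3);
  }
  return (10 - (sum % 10)) % 10 === Number(ean[12]);
}

// Build the full-resolution cover URL for one EAN.
// howerType=Y is kept verbatim: that is the servlet's own spelling.
function coverUrl(ean, size = 640) {
  if (!isValidEan13(ean)) throw new Error('Not a valid EAN-13: ' + ean);
  const url = new URL(COVER_SERVLET);
  url.searchParams.set('ean', ean);
  url.searchParams.set('size', String(size));
  url.searchParams.set('howerType', 'Y');
  return url.toString();
}
```

coverUrl('9781481497305') yields the same URL used in the tested fetch for The Tangled Lands.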
Tested successfully on product page for 9781481497305:
let url = 'https://ipage.ingramcontent.com/ipage/servlet/ibg.common.titledetail.imageloader?ean=9781481497305&size=640&howerType=Y';
let r = await fetch(url, {credentials:'include'});
let blob = await r.blob(); // blob.size=230024 type=image/jpeg
Session-safe navigation pattern:
// GO somewhere
window.location.href = 'https://ipage.ingramcontent.com/ipage/product/search/savedSearches.action?action=select&searchId=206002';
// After 2-3 seconds, check readyState before scraping
// repeat until document.readyState === 'complete'
Test Product: 9781481497305 (The Tangled Lands)
Scraped successfully from product detail page:
| Field | Value |
|---|---|
| Title | The Tangled Lands (Reprint) |
| Authors | Bacigalupi, Paolo / Buckell, Tobias S |
| Publisher | S&s/Saga Press |
| Binding | Paperback |
| SRP | $18.99 |
| Pub Date | November 20, 2018 |
| ISBN-10 | 1481497308 |
| ISBN-13 | 9781481497305 |
| BISACs | Fiction \| Dystopian ; Fiction \| Fantasy \| Epic |
| Dewey | 813.6 |
| LC Call | PS3602.A3447 |
| LCCN | 2017011760 |
| Height | 0.9” |
| Length | 8.9” |
| Width | 5.9” |
| Weight | 0.7 lbs |
| Pages | 304 |
| Carton Qty | 40 |
| Returnable | Yes |
Additional Information Scraper JS (Tested Working)
This recursively walks all tables on the product detail page and finds the Additional Information block:
function getText(node) {
  var txt = '';
  for (var i = 0; i < node.childNodes.length; i++) {
    var c = node.childNodes[i];
    if (c.nodeType === 3) txt += c.textContent;
    else if (c.nodeType === 1) txt += ' ' + getText(c);
  }
  return txt.trim();
}
var tables = document.querySelectorAll('table');
var info = '';
for (var i = 0; i < tables.length; i++) {
  var text = getText(tables[i]);
  if (text.indexOf('Physical Info') !== -1 || text.indexOf('BISAC Categories') !== -1 || (text.indexOf('Additional Information') !== -1 && text.length > 50)) {
    info = text;
  }
}
info;
Returns clean text like:
BISAC Categories: - Fiction | Dystopian - Fiction | Fantasy | Epic
LC Subjects: - Imaginary places - Magic - Environmental degradation - Dictators - Fantasy fiction
Dewey: 813.6
LC Call Number: PS3602.A3447
LCCN: 2017011760
Features: Price on Product
Physical Info: 0.9" H x 8.9" L x 5.9" W (0.7 lbs) 304 pages
Carton Quantity: 40
Number of Units in Package: 1
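The Physical Info line in the output above still has to be split into numbers before it can be stored. A possible sketch follows, assuming the line always has the H x L x W (lbs) pages shape shown in the sample; parsePhysicalInfo and parseCartonQty and their field names are hypothetical, and other layouts on iPage have not been checked.

```javascript
// Parse 'Physical Info: 0.9" H x 8.9" L x 5.9" W (0.7 lbs) 304 pages'.
// Returns null when the expected pattern is not found in `text`.
function parsePhysicalInfo(text) {
  // Accept straight (") or curly (”) inch marks; both appear in these notes.
  const re = /([\d.]+)["”]\s*H\s*x\s*([\d.]+)["”]\s*L\s*x\s*([\d.]+)["”]\s*W\s*\(([\d.]+)\s*lbs\)\s*(\d+)\s*pages/;
  const m = re.exec(text);
  if (!m) return null;
  return {
    heightIn: parseFloat(m[1]),
    lengthIn: parseFloat(m[2]),
    widthIn: parseFloat(m[3]),
    weightLbs: parseFloat(m[4]),
    pages: parseInt(m[5], 10),
  };
}

// Pull the integer after 'Carton Quantity:'; null when the label is absent.
function parseCartonQty(text) {
  const m = /Carton Quantity:\s*(\d+)/.exec(text);
  return m ? parseInt(m[1], 10) : null;
}
```

Both functions search the whole string, so they can be fed the full info text from the scraper rather than a single line.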
Saved Search: SFF
- ID: 206002
- Direct URL: https://ipage.ingramcontent.com/ipage/product/search/savedSearches.action?action=select&searchId=206002
- Description: Science Fiction & Fantasy sorted by Ingram Demand
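Getting to this saved search without losing the session means assigning window.location.href from inside the page, then checking document.readyState in a later console evaluation (the original JS context is gone once the page unloads). A rough sketch, with hypothetical helper names; the same-origin guard is an added precaution, not something the notes establish as required.

```javascript
const IPAGE_ORIGIN = 'https://ipage.ingramcontent.com';

// Build the direct URL for a saved search ID (206002 for SFF).
function savedSearchUrl(searchId) {
  const url = new URL('/ipage/product/search/savedSearches.action', IPAGE_ORIGIN);
  url.searchParams.set('action', 'select');
  url.searchParams.set('searchId', String(searchId));
  return url.toString();
}

// Navigate by assigning location.href instead of calling browser_navigate.
// `win` is the page's window object; passing it in keeps this testable.
function navigateInPage(win, targetUrl) {
  if (new URL(targetUrl).origin !== win.location.origin) {
    throw new Error('Refusing cross-origin navigation: ' + targetUrl);
  }
  win.location.href = targetUrl;
}

// Call from a fresh console evaluation 2-3 seconds after navigating;
// repeat until it returns true, then start scraping.
function isPageReady(doc) {
  return doc.readyState === 'complete';
}
```

In the browser console this would be used as navigateInPage(window, savedSearchUrl(206002)), followed a few seconds later by isPageReady(document).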
Database
- Host: zephyr.proxy.rlwy.net:41163
- DB: railway
- User: postgres
- Table: ingram_queue
- SSL: rejectUnauthorized: false (Railway proxy requirement)
- Password in .env is redacted by Hermes; actual pw known from prior chat (ROTATE AFTER PROJECT)
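The settings above map onto a connection config object like the one below. This is only a sketch: it assumes the backend uses node-postgres (pg), which the notes do not confirm, and the PGPASSWORD variable name and buildPgConfig helper are hypothetical. The password is read from the environment so it never lands in these notes.

```javascript
// Build a connection config from the values listed above.
// `env` is normally process.env; the password is never hard-coded.
function buildPgConfig(env) {
  if (!env.PGPASSWORD) throw new Error('PGPASSWORD is not set');
  return {
    host: 'zephyr.proxy.rlwy.net',
    port: 41163,
    database: 'railway',
    user: 'postgres',
    password: env.PGPASSWORD,
    // Railway proxy requirement from the notes: skip certificate verification.
    ssl: { rejectUnauthorized: false },
  };
}

// Usage, assuming the pg package is installed:
//   const { Pool } = require('pg');
//   const pool = new Pool(buildPgConfig(process.env));
```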
Next Steps (Post-Restart)
- Restart Hermes gateway → Browserbase sessions should persist
- Log into iPage once (lunalin + 2FA)
- Navigate via JS to SFF saved search
- Scrape top 25 metadata + stock via in-page JS (extract product IDs grid)
- For each ISBN, open product detail via JS href
- Scrape Additional Information (weight/dims/pages/BISAC)
- Fetch 640px cover via same-origin fetch() → convert to base64 blob
- Upload covers to R2 (dungeonbooks bucket)
- Upsert all into ingram_queue with imported_at = NULL
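The fetch-then-base64 step in the list above could look roughly like this. It is an untested sketch: bytesToBase64 and fetchCoverBase64 are hypothetical names, and only the fetch call with credentials: 'include' has actually been run against iPage. The bytes are converted in slices because passing a whole 230KB array to String.fromCharCode in one call can exceed the engine's argument limit.

```javascript
// Convert a Uint8Array to a base64 string.
function bytesToBase64(bytes) {
  const CHUNK = 0x8000; // slice size per fromCharCode call
  let binary = '';
  for (let i = 0; i < bytes.length; i += CHUNK) {
    binary += String.fromCharCode.apply(null, bytes.subarray(i, i + CHUNK));
  }
  // Encode once at the end so padding only appears on the final group.
  return btoa(binary);
}

// Fetch one cover inside the logged-in page and return it as base64.
async function fetchCoverBase64(url) {
  const r = await fetch(url, { credentials: 'include' });
  if (!r.ok) throw new Error('Cover fetch failed: HTTP ' + r.status);
  const buf = await r.arrayBuffer();
  return bytesToBase64(new Uint8Array(buf));
}
```

It may also be worth checking the response Content-Type is image/jpeg before encoding, in case an expired session returns a login page instead of a cover.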
Open Questions
- Base64 blob size limit: 230KB JPEG = ~307KB base64. Console JSON output may truncate large strings. Need chunking strategy or direct R2 upload from browser console (CORS unknown on R2).
- Batch efficiency: Opening 25 product detail pages sequentially via JS is slow (~5s each = 2min). Consider parallelizing by opening multiple tabs? Or see if Additional Info is available via hidden XHR endpoint per row.
- Session cookie export still blocked: the session cookie is HttpOnly, so document.cookie can't read it. Backend node-fetch can't reuse the browser session. Covers must be downloaded inside the browser context or via a separate Camofox pass with cookie export.
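For the base64 size question above, one possible chunking strategy is to split the encoded string into fixed-size pieces and read them out of the console one at a time. A minimal sketch with hypothetical helper names; the 50000-character chunk size in the usage note is a guess, since the actual truncation threshold of the console output is not known.

```javascript
// Length of the base64 encoding (with padding) of `byteCount` bytes.
function base64Length(byteCount) {
  return 4 * Math.ceil(byteCount / 3);
}

// Split `str` into consecutive pieces of at most `size` characters.
function chunkString(str, size) {
  if (!Number.isInteger(size) || size <= 0) {
    throw new RangeError('size must be a positive integer');
  }
  const chunks = [];
  for (let i = 0; i < str.length; i += size) {
    chunks.push(str.slice(i, i + size));
  }
  return chunks;
}
```

base64Length(230024) gives 306700 characters for the test cover, which matches the ~307KB estimate; at 50000 characters per chunk that is 7 console reads per cover, to be rejoined with chunks.join('') on the receiving side.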