Big day on catalog. Stack of four catalog PRs cleared, plus an emporium PR for BISAC routing and a new Recategorize admin button. Live-ran the 100-book ingest twice (16.9s then 15.3s after the refactor). Discovered an architectural unlock midway through that simplified the whole pipeline. Also burned a couple hours on an availability-filter experiment that turned out to be the wrong direction; reverted cleanly.
What we shipped
- #5 catalog: BISAC parsing fix. iPage’s “INGRAM Categories” panel (Topical, Sex & Gender, Ethnic Orientation) was getting folded into the `bisac` column because `_extract_field` didn’t stop at that header. Last BISAC entry got “INGRAM Categories:” glued on, and subsequent Ingram facets were treated as additional BISAC paths. One-line fix in `next_headers`. Also dropped the dead sync `fetch_product_detail` left over from the async refactor.
- #6 catalog: skip cover fetch+upload when cover_url already in queue DB. Dumb version of the cover optimization. Zero new state, zero schema change. Trade-off: late cover reveals (ACOTAR 6 placeholder→real art) won’t be picked up until we add cover-change detection. `--refetch-covers` is the escape hatch.
- #7 catalog: ean_id pipeline + `enrich` subcommand. The real architectural win. iPage’s `productdetail?ean_id=<isbn>` returns the same parsed page as the session-bound `queryString=<lastSearch>&R=<productId>&dNo=0` form. This lets us drop the lastSearch token plumbing entirely. `parse_sff_grid` is gone (~70 lines). Discovery now goes through `download_csv` (the CSV tool we already had). New `just enrich <isbn>+` command for direct re-enrichment: closes the loop on the BISAC retag without needing a full SFF scrape.
- #8 catalog: rename refactor. `packages/scraper/` → `packages/ipage/`, `ipage_extract.py` → `pipeline.py`, `set_cookie.py` → `cookie.py`, `scrape`/`rescrape` CLI verbs → `ingest`/`enrich`, dropped `SAVED_SEARCH_ID` constant. The package name now telegraphs that this is the iPage adapter (with room for sibling source packages later). The `ingest`+`enrich` pair reads cleanly.
- emporium #36: BISAC importer fixes. Per-path classification with top-level discrimination: `Fiction | Alternative History` can no longer leak the word “history” into the Non-Fiction rule. Added Romance + Young Adult + Magical Realism categories with the right routing rules. Added the BISG canonical-top-levels whitelist as defense against scraper noise. Built the Recategorize admin button (with Preview) that runs `categoryNamesFor` against all imported books and applies the diff via `batchLinkProductsToCategoryWorkflow`.
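For reference, the per-path classification idea from emporium #36 can be sketched roughly as follows. This is an illustrative Python sketch, not the emporium importer itself: the function name `category_names_for` (echoing `categoryNamesFor`), the rule table, and the whitelist subset are all assumptions made up for the example. The point it shows is that each rule is keyed on the path's top level, so a sub-path word like “History” under `Fiction` never reaches the Non-Fiction rule.

```python
import re

# Hypothetical subset of the BISG canonical top levels; the real whitelist is longer.
BISG_TOP_LEVELS = {"Fiction", "Young Adult Fiction", "History", "Science"}

# Hypothetical routing rules: (required top level, sub-path pattern, store category).
# A pattern of None means "any path under this top level".
RULES = [
    ("Fiction", re.compile(r"\bFantasy\b"), "Fantasy"),
    ("Fiction", re.compile(r"\bRomance\b"), "Romance"),
    ("Fiction", re.compile(r"\bMagical Realism\b"), "Magical Realism"),
    ("Fiction", re.compile(r"\bLiterary\b"), "Literary Fiction"),
    ("Young Adult Fiction", None, "Young Adult"),
    ("History", None, "Non-Fiction"),
]


def category_names_for(bisac_paths):
    """Classify each BISAC path on its own and return the union of categories."""
    names = set()
    for path in bisac_paths:
        parts = [part.strip() for part in path.split("|")]
        top, rest = parts[0], " | ".join(parts[1:])
        # Whitelist check: anything with an unknown top level is scraper noise.
        if top not in BISG_TOP_LEVELS:
            continue
        for rule_top, pattern, category in RULES:
            # Top-level discrimination: a rule only sees paths under its own top level.
            if top != rule_top:
                continue
            if pattern is None or pattern.search(rest):
                names.add(category)
    return sorted(names)
```

Under these assumed rules, `category_names_for(["Fiction | Alternative History"])` returns an empty list, while a path such as `History | Europe` routes to Non-Fiction.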
What we learned
- `productdetail?ean_id=<isbn>` is the cleanest iPage URL form. No session-bound `lastSearch` token required. Stable URL, scriptable, doesn’t depend on a grid prime. The grid checkbox’s `productId` IS the `ttl_id` directly (single 8-digit ID, no dash-splitting needed). This is what made the whole rename + restructure possible: discovery and enrichment can now be cleanly separated.
- The SFF availability filter is opaque and over-strict. Carrie set up the saved search to filter to “available and in-stock from my warehouse.” Pulled the filtered top 100, found 16 books were excluded vs the queue. After classification + cross-checking with live stock fetches: 8 were preorders (legitimate exclusion), 4 were OOS-but-on-order (debatable), and 4 had positive on-hand somewhere. Between Two Fires had 19 at PA primary + 410 network-wide + 3075 on order, definitely a healthy stock position, but the filter excluded it anyway. Conclusion: filter rule is something like “min N at primary DC” but the threshold isn’t visible to us. Deleted 15 from Medusa thinking we should trust the filter. Restored all 16 after Carrie realized the filter was a bad idea. Right model is: use SFF top-N unfiltered for discovery, track real inventory in Medusa, let storefront UI handle availability presentation.
- Magical Realism deserves its own category. BISAC code FIC061000 spans two completely different reading experiences: Latin American literary tradition (García Márquez, Allende) and cozy/contemporary fantasy with magical undercurrents (T. Kingfisher’s A Wizard’s Guide to Defensive Baking). Routing it to Literary Fiction or Fantasy alone mis-shelves half the books. New “Magical Realism” category in store + dedicated routing rule. Babel ends up in Literary Fiction + Magical Realism + Fantasy via three separate BISAC paths, which is correct.
- Literary Fiction needs a narrow rule. Old rule matched “Literary”, “Classics”, “Coming of Age”, “Absurdist.” False-flagged Fahrenheit 451, Princess Bride, Lost Book of Lancelot. Tightened to `/Literary/` only. Classics is a flag (Brave New World is SF, Sherlock Holmes is mystery); Coming of Age is a story shape (Throne of Glass is coming-of-age too); neither is a literary commitment.
- `updateProductsWorkflow` in Medusa v2 doesn’t accept `category_ids` on update. Only on create. The right API for mutating product-category links is `batchLinkProductsToCategoryWorkflow`, which is category-centric: per category, lists of products to add/remove. First retag attempt failed for all 48 books with “Entity ‘Product’ does not have property ‘category_ids’”. Pivoted the implementation to batch per-category diffs. Also surfaces a `missingCategories` field in the toast so the operator knows which rule-emitted names don’t yet exist in the live store and need to be added in admin first.
- `gt sync` should be run from the live branch, not main. When the stack’s bottom merges and you `gt sync` from main, it deletes the merged branches but doesn’t re-parent live descendants, so they end up orphaned pointing at dead parents. From the live branch, sync auto-restacks onto trunk and drops the merged-into-trunk commits.
- Medusa locations are physical+address-bound, no “virtual” concept. Per the docs. Each location requires a real address. One sales channel → one fulfillment location for orders placed through it. No automatic multi-location routing. For our drop-ship arrangement with Ingram: single location named “Ingram (Drop-Ship)” with Ingram’s actual La Vergne TN address is the honest model. Don’t mirror the 4-DC structure: those are Ingram’s internal routing, not anything we operate.
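The pivot to per-category diffs can be sketched like this. It is a minimal Python illustration under stated assumptions, not the Recategorize implementation (which lives in emporium and calls `batchLinkProductsToCategoryWorkflow`): the function name `build_category_diffs`, the input shapes, and the `add`/`remove` payload layout are hypothetical, modeled only on the “per category, lists of products to add/remove” description above. Whether the real button also removes stale links is not stated here; this sketch assumes it does.

```python
from collections import defaultdict


def build_category_diffs(current, desired, live_categories):
    """Turn product-centric category sets into category-centric add/remove lists.

    current / desired: dict mapping product id -> set of category names.
    live_categories: dict mapping category name -> category id in the live store.
    Returns (diffs, missing): diffs maps category id -> {"add": [...], "remove": [...]},
    missing lists rule-emitted names that do not exist in the live store yet.
    """
    diffs = defaultdict(lambda: {"add": [], "remove": []})
    missing = set()
    for product_id in sorted(desired):
        want = desired[product_id]
        have = current.get(product_id, set())
        for name in sorted(want - have):
            if name not in live_categories:
                # Cannot link to a category that does not exist; report it instead.
                missing.add(name)
                continue
            diffs[live_categories[name]]["add"].append(product_id)
        for name in sorted(have - want):
            if name in live_categories:
                diffs[live_categories[name]]["remove"].append(product_id)
    return dict(diffs), sorted(missing)
```

Each entry in `diffs` would then correspond to one category-level call, and `missing` is the kind of list the `missingCategories` toast field is meant to surface.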
What’s next
Wrote five plans late in the day, in roughly increasing strategic scope:
- medusa-inventory: single Ingram drop-ship stock location, sync-stock command, hourly cadence. Implementation plan for fixing “what’s actually orderable” in the storefront.
- ipage-as-api: the realization that `packages/ipage/` is now an undocumented Ingram API, and four exposure surfaces (Discord bot → MCP server → skill → HTTP API), with a build order.
- catalog-cron-jobs: what’s worth running on a schedule (heartbeat, hourly stock sync, weekly ingest+diff, daily pre-order release-day scan, weekly Hardcover refresh) vs on demand. Plus the operational scaffolding (observability via Discord, idempotency, where they run).
- catalog-followon-ideas: the broader brainstorm (weekly automated ingest, recommendations engine, Bookshop.org and Amazon BookTok scrapes, going-out-of-print early warning, back-office agent, white-label opportunity). Tactical follow-on list with priority order.
- sff-destination-strategy: the why behind everything else. What it means to be the one-stop SF/F destination, why Goodreads is dead (and Hardcover is the answer), discovery features Amazon can’t compete on (subgenre taxonomy, series management, curator picks, vibe search), buying features Amazon won’t do (special editions, signings, subscription boxes). Frames the gap between existing plans and the vision. The doc to hand a new collaborator.
Open threads:
- #8 rename refactor — merged.
- Indie Vault — Carrie to grab the URL from iPage’s Hot List widget; potentially a better discovery source than SFF top-N for an indie store. Reserved-inventory program Ingram curates for indies.
- SFF filter decision — drop the in-stock filter entirely on saved search 206002, let Medusa inventory handle availability. Validated this morning by the 16-book mass-deletion-then-restore experiment.
- Cover-change detection — captured in catalog-followon-ideas, only urgent if placeholder→final-art rotation surfaces as a visible storefront problem.
Numbers from today’s runs:
- 100-book live ingest pre-refactor: 16.9s, all 100 enriched, 100 covers uploaded
- 100-book live ingest post-refactor (CSV discovery + ean_id enrichment): 15.3s, all 100 enriched, 100% cover reuse via skip-existing
- 4000 books returned by the SFF saved-search CSV (vs 500 via the grid pageSize); iPage’s download servlet caps at 4000 rows
- 44 books in Epic Fantasy curated list, 17 in BookTok-May
- Cleaned BISAC for ~77 books in queue, of which 48 actually changed category memberships after the importer rules tightened