Pync — CTO Technical Plan

Infrastructure, architecture, hiring, and risk assessment for 300 → 5,000 user scale

Confidential — Internal

Executive Summary

Pync is technically well-positioned to scale from the 300-user pilot to the 5,000-user target over 6 months. The existing Spring Boot backend and React Native app are adequate for the pilot phase. The critical risk areas are:

On-device ML fitting the Apollo3 memory budget with acceptable accuracy
Transitioning from contractor-built backend to NYC-hired engineering team without quality regression
Building fleet management and observability infrastructure that consumer hardware startups commonly underinvest in

This document outlines the 180-day technical plan, key architectural decisions, hiring sequence, budget implications, and open risks that require founder alignment.

Product Context

Assumptions driving technical decisions:

Target hardware: Apollo3 Blue MCU collar + WiFi-connected dock. BLE between collar and phone/dock, MQTT from dock to backend.
Pricing (confirmed from pitch deck 2026-04-20): $249 retail hardware ($219 affiliate, $189 wholesale). Subscription tiers: Hello Pync $9.99/mo ($109.89/yr), Closer Pync $15.99/mo ($169.99/yr). One-month free trial.
Pilot: 350 units in Thailand. Expansion path beyond pilot not yet finalized. Note: Pync Inc. is NYC-registered and Chewy has completed US product evaluations, so US distribution is closer than initial plan assumed.
Team structure: engineering hub NYC, APAC satellite office (Singapore or Bangkok), Thailand-based firmware and hardware collaboration.
Revenue model: hardware margin + recurring subscription for health and activity insights.

Technical strategy flows from the above. Changes to these assumptions require re-evaluation of architecture choices.

Strategic Technical Decisions

Infrastructure Platform

Railway (Metal tier, Singapore region) for production hosting through the first 25,000 users. AWS migration deferred until compliance, fleet-scale device management, or specific managed services justify the operational cost.

Rationale: Railway offers 3-5x better unit economics at current scale, cleaner egress cost structure, and faster iteration. AWS becomes compelling only with specific needs Railway cannot meet (HIPAA compliance, AWS IoT Core fleet scale, or complex multi-region architecture).

Backend Strategy: Strangler Pattern, Not Rewrite

The inherited Spring Boot monolith remains in place and continues to serve production traffic through the transition. New functionality ships as Go services behind a clean API boundary. The monolith is carved down over 12-18 months as high-volume and high-change paths migrate.

Migration priority order:

Device telemetry ingest (MQTT + batch uploads) — highest volume, cleanest boundary
Device registration, pairing, and OTA orchestration — tightly coupled to hardware team
Real-time notifications and WebSocket fanout — Go concurrency advantage
ML inference serving (Python FastAPI) — natural boundary with ML pipeline

Remaining in Spring indefinitely: user accounts, billing integration, subscription management. These are stable domains where rewrite introduces risk without meaningful upside.

Data Layer

PostgreSQL (operational data) + TimescaleDB (time-series telemetry) co-located. Redis for caching and real-time state. S3 for raw IMU dumps, audio clips, firmware artifacts, and ML training data.

Retention policy: 7 days hot, 90 days warm with compression, pre-aggregated rollups indefinitely, raw data to S3 Parquet for ML retraining. This avoids unbounded database growth while preserving training data.
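
The retention windows above can be sketched as a simple age-to-tier mapping. This is a minimal illustration, not the production policy: the function name and tier labels are hypothetical, and it assumes the warm tier covers ages from 7 up to 90 days, which the policy does not state explicitly.

```python
from datetime import datetime, timedelta, timezone

HOT_WINDOW = timedelta(days=7)    # uncompressed rows, per the retention policy
WARM_WINDOW = timedelta(days=90)  # compressed rows; assumed to mean "up to 90 days old"


def storage_tier(sample_time: datetime, now: datetime) -> str:
    """Return the tier a raw telemetry sample belongs to at time `now`.

    Tier names are illustrative labels, not identifiers from any real system.
    """
    age = now - sample_time
    if age < HOT_WINDOW:
        return "hot"
    if age < WARM_WINDOW:
        return "warm"
    # Past 90 days the raw row is kept only in S3 Parquet;
    # pre-aggregated rollups remain in the database indefinitely.
    return "archive"


if __name__ == "__main__":
    now = datetime(2026, 1, 1, tzinfo=timezone.utc)
    for days in (1, 30, 120):
        print(days, storage_tier(now - timedelta(days=days), now))
```

In practice TimescaleDB compression and retention policies would enforce these windows inside the database; the function only makes the boundaries explicit for review.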

Device Connectivity

Collar speaks BLE to phone and dock only. Dock speaks MQTT over WiFi to backend (HiveMQ Cloud). Phone speaks HTTPS REST to backend with batch telemetry uploads. This hybrid architecture minimizes collar battery cost while providing near-continuous telemetry when dog is home (via dock).

Observability

Self-hosted LGTM stack (Loki, Grafana, Tempo, Mimir) on Railway for infrastructure observability, leveraging Railway’s private networking to eliminate egress costs. OpenTelemetry as instrumentation layer across all services. PostHog for product analytics, feature flags, and session replay. Memfault for firmware observability. Datadog credits applied surgically for synthetic monitoring and database monitoring only.

Payments and Billing

Stripe across three entities (Delaware for US, Thai entity for Thailand, Singapore entity for global). Paddle was evaluated but not selected, because the Singapore entity can serve as merchant of record for international subscriptions at a lower fee structure.

ML Architecture

Dual-system design: server-side PyTorch training on cloud GPU, on-device TFLM inference on Apollo3. Quantization pipeline with automated accuracy regression testing. Server-side inference available for premium subscription features that exceed on-device capability.

Critical dependency: the TCN + Transformer + Cross-Attention model must fit Apollo3’s ~664KB free flash budget after INT8 quantization while maintaining useful accuracy. This has not yet been validated in production. Product claims and roadmap timelines should be contingent on this validation.
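
A first-pass flash fit check can be sketched as below. It assumes roughly one byte per weight after INT8 quantization and treats "~664KB" as 664 × 1024 bytes; the overhead figure (graph structure, quantization parameters) and the parameter counts in the example are hypothetical placeholders, not measurements of the actual model.

```python
FLASH_BUDGET_BYTES = 664 * 1024  # ~664KB free flash; assumes KB means 1024 bytes


def model_flash_bytes(param_count: int, overhead_bytes: int = 0,
                      bytes_per_param: int = 1) -> int:
    """Estimate the flash footprint of a quantized model.

    bytes_per_param=1 corresponds to INT8 weights. overhead_bytes is a
    caller-supplied allowance for everything that is not weights.
    """
    return param_count * bytes_per_param + overhead_bytes


def flash_headroom(param_count: int, overhead_bytes: int = 0) -> int:
    """Bytes left in the budget; a negative value means the model does not fit."""
    return FLASH_BUDGET_BYTES - model_flash_bytes(param_count, overhead_bytes)


if __name__ == "__main__":
    # Hypothetical sizes, for illustration only.
    for params in (400_000, 700_000):
        print(params, flash_headroom(params, overhead_bytes=64 * 1024))
```

This covers flash only. RAM for activations is a separate constraint, and accuracy after quantization still has to be measured on the real pipeline.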

Key Risk Register

Risk	Severity	Mitigation
On-device ML model does not fit Apollo3 memory budget with useful accuracy	High	Validate end-to-end pipeline in first 60 days. If blocked, reduce v1 feature scope to motion-only classification.
Contractor handoff incomplete or low-quality code inherited	Medium-High	Documentation deliverables gated to final payment. Independent code audit pre-handoff. Network agency bridge for transition.
NYC senior engineering hiring slower than 6-month plan requires	Medium-High	Parallelize hiring pipeline from day one. Consider agency bridge for months 1-3. Accept some hires push to month 9-12.
BLE-only product positioning weak at $250 vs cellular competitors	Medium	Lean into health/behavior intelligence positioning, not GPS tracking. Subscription value justifies hardware price.
Subscription attach rate below 40% projection	Medium	Hard feature walls at paid tier. Extended trial with strong onboarding. Monitor attach rate weekly from launch.
Firmware OTA failures brick production devices	High	Dual-bank partition with tested rollback. Staged rollout with automatic halt on crash rate threshold. Memfault coredump analysis.
Regulatory exposure from audio capture and GPS location data	Medium	Process audio on-device, transmit features only. Privacy review before launch. Explicit consent flows. Data retention policy enforced in code.
Railway reliability or cost issues at scale	Low-Medium	OpenTelemetry instrumentation ensures portability. Migration path to AWS documented but not executed.
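
The OTA mitigation in the register (staged rollout with automatic halt on a crash rate threshold) can be sketched as a small decision function. The stage fractions, the 2% threshold, and all names here are hypothetical placeholders; real values would come from fleet data and the firmware team.

```python
from dataclasses import dataclass

STAGES = [0.01, 0.10, 0.50, 1.00]  # hypothetical fraction of fleet per stage
CRASH_RATE_HALT = 0.02             # hypothetical halt threshold (2% of updated devices)


@dataclass
class StageResult:
    devices_updated: int
    devices_crashed: int


def next_action(stage_index: int, result: StageResult) -> str:
    """Decide what the rollout controller does after observing one stage.

    Returns "wait", "halt", "advance", or "done".
    """
    if result.devices_updated == 0:
        return "wait"  # no data yet, so neither advance nor halt
    crash_rate = result.devices_crashed / result.devices_updated
    if crash_rate > CRASH_RATE_HALT:
        return "halt"  # stop the rollout; devices fall back via the second bank
    if stage_index + 1 >= len(STAGES):
        return "done"
    return "advance"


if __name__ == "__main__":
    print(next_action(0, StageResult(50, 0)))    # healthy first stage
    print(next_action(1, StageResult(500, 25)))  # 5% crash rate
```

Keeping the decision a pure function of observed counts makes it easy to test without hardware; the crash counts themselves would come from firmware observability.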

Hiring Plan

First 12 months of senior engineering hires, NYC-based unless noted:

Month	Role	Location	Base Comp (USD)
2-3	Senior Backend Engineer (Go)	NYC	$240k-280k
3-4	Senior Firmware Engineer (Embedded C)	NYC or Remote US	$230k-280k
4-6	Senior On-Device ML Engineer	NYC	$280k-340k
5-7	Senior Mobile Engineer (React Native)	NYC	$220k-270k
7-9	Backend Engineer (Mid-Senior)	NYC or APAC	$160k-220k
9-12	SRE / Platform Engineer	NYC	$220k-280k

Fully loaded annual cost (base + equity + benefits + infrastructure) for engineering team of 6-7 senior hires: approximately $2.5-3.5M. APAC satellite office adds an additional estimated $400-800k depending on team composition and local cost structure.

Critical hire order: backend engineer first, to start strangler pattern execution. The firmware engineer is the hardest role to fill, so that search should start in parallel. The on-device ML engineer is both hard to hire and critical; consider a contractor engagement (Edge Impulse Professional Services or similar) to unblock v1 while the search continues.

Infrastructure Budget

Estimated monthly infrastructure cost through 5,000 users:

Component	Estimated Monthly Cost	Notes
Railway compute (app + LGTM)	$600-1,200	Scales with service count and request volume
TimescaleDB Cloud	$200-500	Singapore region, managed Postgres + Timescale
HiveMQ Cloud (MQTT broker)	$300-700	APAC region, pay per connection
S3 / R2 (object storage)	$100-400	Firmware, backups, raw telemetry archive
Memfault (firmware observability)	$500-1,500	Scales with device count
Clerk (authentication)	$100-300	Scales with MAU
PostHog Cloud (analytics + flags)	$0-500	Free tier covers early stage
Sentry or error tracking	$100-300	Or consolidated into LGTM
Stripe fees	2.9% + $0.30 per transaction	Pass-through, scales with revenue
CI/CD (GitHub Actions)	$200-500	Scales with build volume
Misc SaaS (Linear, Notion, etc.)	$300-500	Team tooling
TOTAL (excluding Stripe)	$2,400-6,400	Approximately $0.50-$1.30 per user per month at 5,000 users

Datadog credits ($100k) applied against synthetic monitoring, database monitoring, and select Datadog-only features over 12-24 months. Observability is primarily self-hosted LGTM for cost reasons.
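
The TOTAL row and the per-user figure in the table above can be reproduced with a short calculation. The ranges are copied from the table; the dictionary layout and function names are only illustrative.

```python
# (low, high) monthly cost in USD, copied from the infrastructure budget table.
MONTHLY_COST_USD = {
    "Railway compute (app + LGTM)": (600, 1200),
    "TimescaleDB Cloud": (200, 500),
    "HiveMQ Cloud": (300, 700),
    "S3 / R2": (100, 400),
    "Memfault": (500, 1500),
    "Clerk": (100, 300),
    "PostHog Cloud": (0, 500),
    "Sentry or error tracking": (100, 300),
    "CI/CD (GitHub Actions)": (200, 500),
    "Misc SaaS": (300, 500),
}


def total_range(costs: dict[str, tuple[int, int]]) -> tuple[int, int]:
    """Sum the low and high ends of every line item (Stripe fees excluded)."""
    low = sum(lo for lo, _ in costs.values())
    high = sum(hi for _, hi in costs.values())
    return low, high


def per_user_range(costs: dict[str, tuple[int, int]], users: int) -> tuple[float, float]:
    """Monthly infrastructure cost per user at a given user count."""
    low, high = total_range(costs)
    return low / users, high / users


if __name__ == "__main__":
    print(total_range(MONTHLY_COST_USD))
    print(per_user_range(MONTHLY_COST_USD, 5000))
```

At 5,000 users this gives $0.48 to $1.28 per user per month, which the table rounds to $0.50-$1.30.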

180-Day Execution Plan

Days 1-30: Stabilize and Observe

Complete contractor handoff with documentation deliverables, credential rotation, and independent code audit
Install foundational observability: error tracking, LGTM stack, OpenTelemetry instrumentation on existing backend
Direct pilot user contact — all 300 users if possible — to validate product assumptions and surface technical issues not in metrics
Validate Apollo3 memory and ML model fit via firmware team collaboration
Begin senior backend engineer hiring search

Days 31-60: Foundation

First senior backend engineer onboarded
Architecture decision records (ADRs) written for all strategic decisions
Staging environment stood up if not already present
CI/CD pipelines established for backend and mobile
Firmware signing key management verified and secured
First strangler carve-out underway: MQTT telemetry ingest service in Go

Days 61-90: First Service in Production

Go telemetry ingest service deployed to production behind feature flag, mirrored with existing Spring implementation
TimescaleDB migration for telemetry tables complete
Firmware observability via Memfault integrated and rolled out to pilot fleet
Senior firmware engineer hiring in active pipeline
On-device ML engineer search active; contractor engagement signed if the search is behind schedule
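
While the Go ingest service runs mirrored against the Spring implementation, the two paths can be compared before cutover. The sketch below is one possible way to do that, assuming each path's output can be keyed by a message ID; the record shape and names are hypothetical.

```python
def mismatch_rate(legacy: dict[str, dict], candidate: dict[str, dict]) -> float:
    """Fraction of message IDs where the two ingest paths disagree.

    A message counts as a mismatch if the stored records differ or if it
    is present in only one of the two outputs.
    """
    ids = set(legacy) | set(candidate)
    if not ids:
        return 0.0
    mismatches = sum(1 for i in ids if legacy.get(i) != candidate.get(i))
    return mismatches / len(ids)


if __name__ == "__main__":
    # Hypothetical records keyed by message ID.
    spring_out = {
        "m1": {"device": "c-001", "steps": 120},
        "m2": {"device": "c-002", "steps": 45},
    }
    go_out = {
        "m1": {"device": "c-001", "steps": 120},
        "m2": {"device": "c-002", "steps": 44},  # disagreement
    }
    print(mismatch_rate(spring_out, go_out))
```

A sustained mismatch rate of zero over a representative traffic window is one reasonable gate for the cutover planned in the following phase; the exact window and tolerance are left open here.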

Days 91-120: Scale Preparation

Telemetry ingest cutover complete; Spring telemetry path deprecated
Load testing infrastructure built; system validated at 2x projected 5,000-user load
Device provisioning and OTA orchestration service designed and in development
Mobile app versioning policy enforced; OpenAPI contract testing in CI
Hardware-in-the-loop test rig design complete; build underway

Days 121-150: Pre-Launch Hardening

External security penetration test engaged
Privacy and compliance review complete across data flows
Subscription billing integration production-ready (Stripe)
Backup and disaster recovery procedures tested
Incident response playbook documented and rehearsed

Days 151-180: Scale Launch

5,000-user capacity validated end-to-end
Paid subscriptions live with feature gating
Fleet management dashboards operational for customer support and engineering
SRE / platform engineer hiring underway
APAC satellite office established if timeline allows

Decisions Requested From Founders

The following require founder alignment within the first 30 days to avoid blocking execution:

Subscription tier definition. Specific feature split between free, $15, and $20 tiers. Attach rate projections are aspirational until this is defined.
Target market sequence. Thailand → SEA → US, or Thailand → US directly? This scopes compliance and infrastructure regions.
Subscription pricing model. Annual vs monthly, trial length, refund policy, churn projections.
Audio capture scope. In or out for v1? Regulatory, privacy, and technical implications flow from this.
Capital plan. 6-month runway at projected burn must be confirmed. If raise is imminent, technical diligence preparation begins now.
Dock product specifications. Still pending. Blocking several architectural decisions including MQTT need and edge ML scope.
Manufacturing and firmware release coordination. Hardware production timeline affects backend readiness requirements.

Closing

The technical path to 5,000 users in 6 months is achievable with the plan outlined above. The principal risks are execution (hiring pace, contractor handoff quality, on-device ML validation) rather than architectural. Railway scales adequately, Apollo3 is a reasonable silicon choice for this product category, and the strangler pattern approach contains inherited code risk.

The gap between the current pilot state and 5,000-user readiness is primarily observability, fleet management, and the on-device ML pipeline. None are fundamentally unsolved problems, but all require dedicated senior engineering attention and should not be deferred.

Founder alignment on the decisions listed above is the critical unblock. Everything else is execution.
