Pync — CTO Technical Plan
Infrastructure, architecture, hiring, and risk assessment for 300 → 5,000 user scale
Confidential — Internal
Executive Summary
Pync is well-positioned technically to scale from the 300-user pilot to 5,000 users over 6 months. The existing Spring Boot backend and React Native app are adequate for the pilot phase. The critical risk areas are:
- On-device ML fitting the Apollo3 memory budget with acceptable accuracy
- Transitioning from contractor-built backend to NYC-hired engineering team without quality regression
- Building fleet management and observability infrastructure that consumer hardware startups commonly underinvest in
This document outlines the 180-day technical plan, key architectural decisions, hiring sequence, budget implications, and open risks that require founder alignment.
Product Context
Assumptions driving technical decisions:
- Target hardware: Apollo3 Blue MCU collar + WiFi-connected dock. BLE between collar and phone/dock, MQTT from dock to backend.
- Pricing (confirmed from pitch deck 2026-04-20): $219 affiliate, $9.99/mo (…), $15.99/mo ($169.99/yr). One-month free trial.
- Pilot: 350 units in Thailand. The expansion path beyond the pilot is not yet finalized. Note: Pync Inc. is NYC-registered and Chewy has completed US product evaluations, so US distribution is closer than the initial plan assumed.
- Team structure: engineering hub NYC, APAC satellite office (Singapore or Bangkok), Thailand-based firmware and hardware collaboration.
- Revenue model: hardware margin + recurring subscription for health and activity insights.
Technical strategy flows from the above. Changes to these assumptions require re-evaluation of architecture choices.
Strategic Technical Decisions
Infrastructure Platform
Railway (Metal tier, Singapore region) for production hosting through the first 25,000 users. AWS migration deferred until compliance, fleet-scale device management, or specific managed services justify the operational cost.
Rationale: Railway offers 3-5x better unit economics at current scale, cleaner egress cost structure, and faster iteration. AWS becomes compelling only with specific needs Railway cannot meet (HIPAA compliance, AWS IoT Core fleet scale, or complex multi-region architecture).
Backend Strategy: Strangler Pattern, Not Rewrite
The inherited Spring Boot monolith remains in place and continues to serve production traffic through the transition. New functionality ships as Go services behind a clean API boundary. The monolith is carved down over 12-18 months as high-volume and high-change paths migrate.
Migration priority order:
- Device telemetry ingest (MQTT + batch uploads) — highest volume, cleanest boundary
- Device registration, pairing, and OTA orchestration — tightly coupled to hardware team
- Real-time notifications and WebSocket fanout — Go concurrency advantage
- ML inference serving (Python FastAPI) — natural boundary with ML pipeline
Remaining in Spring indefinitely: user accounts, billing integration, subscription management. These are stable domains where rewrite introduces risk without meaningful upside.
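The routing decision at the heart of the strangler pattern can be sketched as a longest-prefix lookup at the edge: migrated paths go to the new services, and everything else falls through to the Spring monolith. This is a minimal sketch; the path prefixes and service names are hypothetical placeholders, since the plan does not define the API surface, and in practice this table would likely live in gateway or reverse-proxy config rather than application code.

```python
from dataclasses import dataclass
from typing import Sequence

# Default target: anything not yet carved out stays on the Spring monolith.
SPRING_MONOLITH = "spring-monolith"


@dataclass(frozen=True)
class Route:
    prefix: str   # URL path prefix owned by a migrated service (hypothetical)
    backend: str  # logical service name (hypothetical)


# Hypothetical carve-outs, listed in the plan's migration priority order.
MIGRATED_ROUTES = (
    Route("/v1/telemetry", "go-telemetry-ingest"),
    Route("/v1/devices", "go-device-service"),
    Route("/v1/notifications", "go-notifications"),
    Route("/v1/inference", "ml-inference-fastapi"),
)


def resolve_backend(path: str, routes: Sequence[Route] = MIGRATED_ROUTES) -> str:
    """Pick the service for `path` by longest matching prefix, else the monolith."""
    matches = [
        r for r in routes
        # Match whole path segments so "/v1/telemetryx" does not hit "/v1/telemetry".
        if path == r.prefix or path.startswith(r.prefix + "/")
    ]
    if not matches:
        return SPRING_MONOLITH
    return max(matches, key=lambda r: len(r.prefix)).backend
```

Accounts, billing, and subscription paths never appear in the table, which matches the plan to keep those domains in Spring indefinitely. Each carve-out is a one-line addition, and rolling one back is a one-line removal.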
Data Layer
PostgreSQL (operational data) + TimescaleDB (time-series telemetry) co-located. Redis for caching and real-time state. S3 for raw IMU dumps, audio clips, firmware artifacts, and ML training data.
Retention policy: 7 days hot, 90 days warm with compression, pre-aggregated rollups indefinitely, raw data to S3 Parquet for ML retraining. This avoids unbounded database growth while preserving training data.
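The retention tiers can be expressed as a pure function of sample age. A minimal sketch, assuming both windows are measured from ingest time (so "warm" covers days 7 to 90) and that raw rows are exported to S3 Parquet before they leave the database; the tier names are illustrative.

```python
from datetime import datetime, timedelta
from enum import Enum

HOT_WINDOW = timedelta(days=7)    # uncompressed, fastest queries
WARM_WINDOW = timedelta(days=90)  # compressed, raw rows still queryable


class Tier(Enum):
    HOT = "hot"
    WARM = "warm"
    # Raw rows are gone from the database; only pre-aggregated rollups remain.
    # Raw data is assumed to have been archived to S3 Parquet for ML retraining.
    ROLLUP_ONLY = "rollup_only"


def storage_tier(sample_time: datetime, now: datetime) -> Tier:
    """Classify a telemetry sample by age under the 7-day / 90-day policy."""
    age = now - sample_time
    if age < HOT_WINDOW:
        return Tier.HOT
    if age < WARM_WINDOW:
        return Tier.WARM
    return Tier.ROLLUP_ONLY
```

Inside TimescaleDB, the same policy maps onto built-in jobs: `add_compression_policy` for the 7-day boundary, `add_retention_policy` for the 90-day boundary, and continuous aggregates for the indefinite rollups. A function like this is mainly useful in tests and in the export job that must copy raw rows to S3 before the retention policy drops them.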
Device Connectivity
Collar speaks BLE to phone and dock only. Dock speaks MQTT over WiFi to backend (HiveMQ Cloud). Phone speaks HTTPS REST to backend with batch telemetry uploads. This hybrid architecture minimizes collar battery cost while providing near-continuous telemetry when the dog is home (via the dock).
Observability
Self-hosted LGTM stack (Loki, Grafana, Tempo, Mimir) on Railway for infrastructure observability, leveraging Railway’s private networking to eliminate egress costs. OpenTelemetry as instrumentation layer across all services. PostHog for product analytics, feature flags, and session replay. Memfault for firmware observability. Datadog credits applied surgically for synthetic monitoring and database monitoring only.
Payments and Billing
Stripe across three entities (Delaware for US, Thai entity for Thailand, Singapore entity for global). Paddle was evaluated but not selected, because the Singapore entity can serve as merchant of record for international subscriptions at a lower fee structure.
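Because each entity would hold its own Stripe account, every customer has to be assigned to exactly one of them, ideally at signup. A minimal sketch of that assignment; the entity identifiers are hypothetical placeholders, and keying on billing country (rather than, say, shipping address) is an assumption.

```python
# Placeholder identifiers; real Stripe account IDs would come from configuration.
US_ENTITY = "pync-delaware"
THAI_ENTITY = "pync-thailand"
SINGAPORE_ENTITY = "pync-singapore"


def billing_entity(country_code: str) -> str:
    """Map an ISO 3166-1 alpha-2 billing country to the entity that bills it."""
    cc = country_code.strip().upper()
    if cc == "US":
        return US_ENTITY
    if cc == "TH":
        return THAI_ENTITY
    # The Singapore entity acts as merchant of record for all other countries.
    return SINGAPORE_ENTITY
```

Keeping this mapping in one place matters because customers, subscriptions, and payment methods are scoped to a single Stripe account, so moving a subscriber between entities later is a migration rather than a config change.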
ML Architecture
Dual-system design: server-side PyTorch training on cloud GPU, on-device TFLM inference on Apollo3. Quantization pipeline with automated accuracy regression testing. Server-side inference available for premium subscription features that exceed on-device capability.
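The quantization step and its regression gate can be illustrated in a few lines. This is a deliberately simplified sketch: symmetric per-tensor INT8 quantization on a flat list of weights, plus a gate that fails the pipeline when INT8 accuracy falls too far below the float model. The real pipeline would rely on a converter toolchain's post-training quantization (typically per-channel for weights), and the 2-point `max_drop` threshold is a hypothetical value, not a product requirement.

```python
from typing import List, Tuple


def quantize_int8(weights: List[float]) -> Tuple[List[int], float]:
    """Symmetric per-tensor INT8: scale = max|w| / 127, q = clamp(round(w / scale))."""
    max_abs = max((abs(w) for w in weights), default=0.0)
    scale = max_abs / 127.0 if max_abs > 0 else 1.0
    q = [max(-127, min(127, round(w / scale))) for w in weights]
    return q, scale


def dequantize(q: List[int], scale: float) -> List[float]:
    """Map INT8 values back to floats, e.g. to measure quantization error."""
    return [v * scale for v in q]


def passes_accuracy_gate(float_acc: float, int8_acc: float, max_drop: float = 0.02) -> bool:
    """Fail the pipeline if INT8 accuracy is more than `max_drop` below float."""
    return (float_acc - int8_acc) <= max_drop
```

In CI, the gate would run on a held-out evaluation set after every conversion, so a model change that quantizes poorly is caught before it reaches firmware.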
Critical dependency: the TCN + Transformer + Cross-Attention model must fit Apollo3’s ~664KB free flash budget after INT8 quantization while maintaining useful accuracy. This has not yet been validated in production. Product claims and roadmap timelines should be contingent on this validation.
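A first-pass feasibility check is simple arithmetic: at INT8, each weight costs one byte, so the flash budget caps the parameter count. A minimal sketch, assuming "~664KB" means 664 KiB and using a hypothetical 10% allowance for the model file's graph and metadata overhead. It checks flash only; activations live in the SRAM tensor arena, which is a separate budget this sketch does not model.

```python
import math

FLASH_BUDGET_BYTES = 664 * 1024  # ~664KB free flash (assumed KiB)


def int8_model_bytes(param_count: int, overhead_fraction: float = 0.10) -> int:
    """Estimate serialized model size: 1 byte per INT8 weight plus overhead, rounded up."""
    return math.ceil(param_count * (1 + overhead_fraction))


def max_params(overhead_fraction: float = 0.10, budget: int = FLASH_BUDGET_BYTES) -> int:
    """Largest parameter count whose estimated size still fits the budget."""
    return int(budget / (1 + overhead_fraction))


def fits_flash(param_count: int, overhead_fraction: float = 0.10,
               budget: int = FLASH_BUDGET_BYTES) -> bool:
    """True if the estimated INT8 model size fits in the flash budget."""
    return int8_model_bytes(param_count, overhead_fraction) <= budget
```

With a 10% allowance this works out to roughly 618k parameters, which gives the TCN + Transformer + Cross-Attention design a concrete ceiling to check against before any on-device testing.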
Key Risk Register
| Risk | Severity | Mitigation |
|---|---|---|
| On-device ML model does not fit Apollo3 memory budget with useful accuracy | High | Validate end-to-end pipeline in first 60 days. If blocked, reduce v1 feature scope to motion-only classification. |
| Contractor handoff incomplete or low-quality code inherited | Medium-High | Documentation deliverables gated to final payment. Independent code audit pre-handoff. Network agency bridge for transition. |
| NYC senior engineering hiring slower than the 6-month plan requires | Medium-High | Parallelize the hiring pipeline from day one. Consider an agency bridge for months 1-3. Accept that some hires may slip to months 9-12. |
| BLE-only product positioning weak at $250 vs cellular competitors | Medium | Lean into health/behavior intelligence positioning, not GPS tracking. Subscription value justifies hardware price. |
| Subscription attach rate below 40% projection | Medium | Hard feature walls at paid tier. Extended trial with strong onboarding. Monitor attach rate weekly from launch. |
| Firmware OTA failures brick production devices | High | Dual-bank partition with tested rollback. Staged rollout with automatic halt on crash rate threshold. Memfault coredump analysis. |
| Regulatory exposure from audio capture and GPS location data | Medium | Process audio on-device, transmit features only. Privacy review before launch. Explicit consent flows. Data retention policy enforced in code. |
| Railway reliability or cost issues at scale | Low-Medium | OpenTelemetry instrumentation ensures portability. Migration path to AWS documented but not executed. |
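The staged-rollout mitigation for OTA failures reduces to a small decision rule evaluated at each stage. A minimal sketch; the stage fractions, the 2x-over-baseline halt threshold, and the minimum sample size are hypothetical values, and crash counts are assumed to be fed in from firmware crash reporting (Memfault in this plan) rather than fetched by this code.

```python
from dataclasses import dataclass

# Hypothetical cumulative fractions of the fleet that receive the new build.
STAGES = (0.01, 0.05, 0.25, 1.0)


@dataclass(frozen=True)
class StageStats:
    devices_updated: int  # devices running the new build in this stage
    crashes: int          # crashes attributed to the new build


def next_action(stage_index: int, stats: StageStats, baseline_crash_rate: float,
                max_ratio: float = 2.0, min_devices: int = 50) -> str:
    """Return 'wait', 'halt', 'advance', or 'complete' for the current stage."""
    if stats.devices_updated < min_devices:
        return "wait"  # too few devices to judge the crash rate yet
    crash_rate = stats.crashes / stats.devices_updated
    if crash_rate > baseline_crash_rate * max_ratio:
        return "halt"  # stop the rollout; updated devices rely on dual-bank rollback
    if stage_index + 1 < len(STAGES):
        return "advance"
    return "complete"
```

The `min_devices` floor matters at pilot scale: 1% of a 350-unit fleet is three or four devices, far too few to distinguish a bad build from noise.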
Hiring Plan
First 12 months of senior engineering hires, NYC-based unless noted:
| Month | Role | Location | Base Comp (USD) |
|---|---|---|---|
| 2-3 | Senior Backend Engineer (Go) | NYC | $240k-280k |
| 3-4 | Senior Firmware Engineer (Embedded C) | NYC or Remote US | $230k-280k |
| 4-6 | Senior On-Device ML Engineer | NYC | $280k-340k |
| 5-7 | Senior Mobile Engineer (React Native) | NYC | $220k-270k |
| 7-9 | Backend Engineer (Mid-Senior) | NYC or APAC | $160k-220k |
| 9-12 | SRE / Platform Engineer | NYC | $220k-280k |
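Fully loaded annual cost (base + equity + benefits + infrastructure) for an engineering team of 6-7 senior hires depends on team composition and local cost structure. Base salaries for the six roles in the table alone sum to $1.35M-$1.67M per year, so the fully loaded figure will exceed that range.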
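The base-salary bands in the hiring table can be totaled directly. A minimal sketch; it sums base compensation only, since equity, benefits, and a fully loaded multiplier are not specified in this plan and would need to be applied separately.

```python
from typing import Iterable, Tuple

# (role, base_low, base_high) in USD, copied from the hiring table above.
HIRES = (
    ("Senior Backend Engineer (Go)", 240_000, 280_000),
    ("Senior Firmware Engineer (Embedded C)", 230_000, 280_000),
    ("Senior On-Device ML Engineer", 280_000, 340_000),
    ("Senior Mobile Engineer (React Native)", 220_000, 270_000),
    ("Backend Engineer (Mid-Senior)", 160_000, 220_000),
    ("SRE / Platform Engineer", 220_000, 280_000),
)


def annual_base_range(hires: Iterable[Tuple[str, int, int]] = HIRES) -> Tuple[int, int]:
    """Sum the low and high ends of each role's base salary band."""
    hires = list(hires)
    low = sum(base_low for _, base_low, _ in hires)
    high = sum(base_high for _, _, base_high in hires)
    return low, high
```

Because hires start between months 2 and 12, first-year cash spend will be lower than this annualized run rate; the run rate is the relevant number once the team is complete.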
Critical hire order: backend engineer first to start strangler pattern execution. Firmware engineer is the hardest to hire and should be started in parallel. On-device ML engineer is both hard to hire and critical — consider contractor engagement (Edge Impulse Professional Services or similar) to unblock v1 while search continues.
Infrastructure Budget
Estimated monthly infrastructure cost through 5,000 users:
| Component | Estimated Monthly Cost | Notes |
|---|---|---|
| Railway compute (app + LGTM) | $600-1,200 | Scales with service count and request volume |
| TimescaleDB Cloud | $200-500 | Singapore region, managed Postgres + Timescale |
| HiveMQ Cloud (MQTT broker) | $300-700 | APAC region, pay per connection |
| S3 / R2 (object storage) | $100-400 | Firmware, backups, raw telemetry archive |
| Memfault (firmware observability) | $500-1,500 | Scales with device count |
| Clerk (authentication) | $100-300 | Scales with MAU |
| PostHog Cloud (analytics + flags) | $0-500 | Free tier covers early stage |
| Sentry or error tracking | $100-300 | Or consolidated into LGTM |
| Stripe fees | 2.9% + $0.30 per transaction | Pass-through, scales with revenue |
| CI/CD (GitHub Actions) | $200-500 | Scales with build volume |
| Misc SaaS (Linear, Notion, etc.) | $300-500 | Team tooling |
| TOTAL (excluding Stripe) | $2,400-6,400 | Approximately $0.48-1.28 per user per month at 5,000 users |
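The table's total and per-user figures can be reproduced from the line items. A short sketch; the dictionary simply restates the table's monthly ranges in USD, with Stripe fees excluded as in the TOTAL row.

```python
from typing import Dict, Tuple

# Monthly (low, high) USD ranges restated from the infrastructure budget table.
BUDGET: Dict[str, Tuple[int, int]] = {
    "Railway compute (app + LGTM)": (600, 1_200),
    "TimescaleDB Cloud": (200, 500),
    "HiveMQ Cloud (MQTT broker)": (300, 700),
    "S3 / R2 (object storage)": (100, 400),
    "Memfault (firmware observability)": (500, 1_500),
    "Clerk (authentication)": (100, 300),
    "PostHog Cloud (analytics + flags)": (0, 500),
    "Sentry or error tracking": (100, 300),
    "CI/CD (GitHub Actions)": (200, 500),
    "Misc SaaS (Linear, Notion, etc.)": (300, 500),
}


def monthly_range(budget: Dict[str, Tuple[int, int]] = BUDGET) -> Tuple[int, int]:
    """Sum the low and high ends of every line item."""
    low = sum(lo for lo, _ in budget.values())
    high = sum(hi for _, hi in budget.values())
    return low, high


def per_user_range(users: int, budget: Dict[str, Tuple[int, int]] = BUDGET) -> Tuple[float, float]:
    """Monthly infrastructure cost per user at a given user count."""
    low, high = monthly_range(budget)
    return low / users, high / users
```

At 5,000 users this gives $0.48 to $1.28 per user per month, so where the real bill lands within each line item's range moves the per-user figure by almost 3x.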
Datadog credits ($100k) applied against synthetic monitoring, database monitoring, and select Datadog-only features over 12-24 months. Observability is primarily self-hosted LGTM for cost reasons.
180-Day Execution Plan
Days 1-30: Stabilize and Observe
- Complete contractor handoff with documentation deliverables, credential rotation, and independent code audit
- Install foundational observability: error tracking, LGTM stack, OpenTelemetry instrumentation on existing backend
- Direct pilot user contact — all 300 users if possible — to validate product assumptions and surface technical issues not in metrics
- Validate Apollo3 memory and ML model fit via firmware team collaboration
- Begin senior backend engineer hiring search
Days 31-60: Foundation
- First senior backend engineer onboarded
- Architecture decision records (ADRs) written for all strategic decisions
- Staging environment stood up if not already present
- CI/CD pipelines established for backend and mobile
- Firmware signing key management verified and secured
- First strangler carve-out underway: MQTT telemetry ingest service in Go
Days 61-90: First Service in Production
- Go telemetry ingest service deployed to production behind feature flag, mirrored with existing Spring implementation
- TimescaleDB migration for telemetry tables complete
- Firmware observability via Memfault integrated and rolled out to pilot fleet
- Senior firmware engineer hiring in active pipeline
- On-device ML engineer search active; contractor engagement signed if search trailing
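Running the Go ingest path mirrored against Spring before cutover can be sketched as follows: serve the response from whichever implementation the feature flag marks as primary, run the other as a shadow, and log any disagreement. A minimal synchronous sketch; the function names, the result-dictionary shape, and the ignored `ingested_at` field are hypothetical, and a production version would run the shadow call asynchronously so it adds no latency.

```python
import logging
from typing import Any, Callable, Dict, FrozenSet, List

log = logging.getLogger("telemetry.mirror")

IngestFn = Callable[[Any], Dict[str, Any]]


def diff_results(primary: Dict[str, Any], shadow: Dict[str, Any],
                 ignore_keys: FrozenSet[str] = frozenset({"ingested_at"})) -> List[str]:
    """Sorted keys whose values differ between the two implementations."""
    keys = (primary.keys() | shadow.keys()) - ignore_keys
    return sorted(k for k in keys if primary.get(k) != shadow.get(k))


def handle_with_mirror(payload: Any, spring_ingest: IngestFn, go_ingest: IngestFn,
                       go_is_primary: bool = False) -> Dict[str, Any]:
    """Serve from the primary path; exercise the shadow path and log mismatches."""
    if go_is_primary:
        primary, shadow = go_ingest, spring_ingest
    else:
        primary, shadow = spring_ingest, go_ingest
    result = primary(payload)
    try:
        shadow_result = shadow(payload)
    except Exception:
        # A failing shadow must never change the response the client receives.
        log.exception("shadow ingest failed")
        return result
    mismatched = diff_results(result, shadow_result)
    if mismatched:
        log.warning("mirror mismatch on keys: %s", mismatched)
    return result
```

The shadow path is assumed to write to separate tables (or not persist at all) so mirrored telemetry is not stored twice; flipping `go_is_primary` is then the cutover, and flipping it back is the rollback.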
Days 91-120: Scale Preparation
- Telemetry ingest cutover complete; Spring telemetry path deprecated
- Load testing infrastructure built; system validated at 2x the projected 5,000-user load
- Device provisioning and OTA orchestration service designed and in development
- Mobile app versioning policy enforced; OpenAPI contract testing in CI
- Hardware-in-the-loop test rig design complete; build underway
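A mobile versioning policy is usually enforced by comparing the app version the client reports against a minimum-supported version and the latest version. A minimal sketch, assuming plain `MAJOR.MINOR.PATCH` version strings; the three status values and the way the client reports its version (for example, a request header) are hypothetical.

```python
from typing import Tuple


def parse_version(version: str) -> Tuple[int, int, int]:
    """Parse 'MAJOR.MINOR.PATCH' into a tuple that compares numerically."""
    major, minor, patch = (int(part) for part in version.strip().split("."))
    return major, minor, patch


def upgrade_status(client_version: str, min_supported: str, latest: str) -> str:
    """'force_upgrade' below the floor, 'suggest_upgrade' below latest, else 'ok'."""
    client = parse_version(client_version)
    if client < parse_version(min_supported):
        return "force_upgrade"
    if client < parse_version(latest):
        return "suggest_upgrade"
    return "ok"
```

Comparing parsed tuples rather than raw strings avoids the classic bug where "1.10.0" sorts before "1.9.0".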
Days 121-150: Pre-Launch Hardening
- External security penetration test engaged
- Privacy and compliance review complete across data flows
- Subscription billing integration production-ready (Stripe)
- Backup and disaster recovery procedures tested
- Incident response playbook documented and rehearsed
Days 151-180: Scale Launch
- 5,000-user capacity validated end-to-end
- Paid subscriptions live with feature gating
- Fleet management dashboards operational for customer support and engineering
- SRE / platform engineer hiring underway
- APAC satellite office established if timeline allows
Decisions Requested From Founders
The following require founder alignment within the first 30 days to avoid blocking execution:
- Subscription tier definition. Specific feature split between the free and paid tiers. Attach rate projections are aspirational until this is defined.
- Target market sequence. Thailand → SEA → US, or Thailand → US directly? This scopes compliance and infrastructure regions.
- Subscription pricing model. Annual vs monthly, trial length, refund policy, churn projections.
- Audio capture scope. In or out for v1? Regulatory, privacy, and technical implications flow from this.
- Capital plan. A 6-month runway at projected burn must be confirmed. If a raise is imminent, technical due-diligence preparation begins now.
- Dock product specifications. Still pending, and blocking several architectural decisions, including whether MQTT is needed and the scope of edge ML.
- Manufacturing and firmware release coordination. Hardware production timeline affects backend readiness requirements.
Closing
The technical path to 5,000 users in 6 months is achievable with the plan outlined above. The principal risks are execution (hiring pace, contractor handoff quality, on-device ML validation) rather than architectural. Railway scales adequately, Apollo3 is a reasonable silicon choice for this product category, and the strangler pattern approach contains inherited code risk.
The gap between the current pilot state and 5,000-user readiness is primarily observability, fleet management, and the on-device ML pipeline. None are fundamentally unsolved problems, but all require dedicated senior engineering attention and should not be deferred.
Founder alignment on the decisions listed above is the critical unblock. Everything else is execution.