Pync — CTO Working Document

Architecture, decisions, hiring, operations — personal reference, updated continuously


Purpose

Personal working document. Detailed architecture notes, decision rationale, open questions, hiring pipeline, risk register, and operational tracking. Founder-facing document is separate and summarizes outward. This document is for tracking the real state and reasoning.

Update weekly during work trial. Version in Git or Notion once team is hired.


Current State Assessment

Inherited Stack Inventory

Frontend (React Native / Expo):

  • React Native 0.81.0, React 19.1.0, TypeScript 5.8.3
  • Context API state, React Navigation 7.x, Axios 1.12.2
  • AsyncStorage for local state — SECURITY ISSUE: auth tokens should not be in AsyncStorage, migrate to Keychain / SecureStore
  • React Native Maps 1.25.3, Vision Camera 4.7.2, Lottie 7.3.2
  • i18next with Chinese, English, Korean, Thai, Japanese support
  • React Native Chart Kit: consider replacing with Victory Native as the product matures (Recharts targets web React and has no React Native port)

Backend (Java / Spring Boot):

  • Java 17, Spring Boot 3.5.6, Undertow (listed version 3.5.6 is suspect: it matches the Spring Boot starter version, while Undertow itself is released on a 2.x line; verify the actual version in the dependency tree)
  • Spring Data JDBC + MyBatis-Plus — two ORMs, code smell, contain and gradually eliminate
  • Redis + Redisson for caching
  • MQTT integration at 3.5.6 (Spring Boot version pattern — actual MQTT client library version unclear)
  • Sa-Token + JWT for authentication — Chinese-origin, sparse English docs, hard to hire for, replace with Clerk
  • OpenAPI 3 documentation — verify accuracy and completeness
  • MinIO + AWS S3 both listed — indecision, pick one, recommend S3
  • Netty 4.2.6, Protobuf 4.32.1, FastJSON2 2.0.59 (historical CVE concerns in 1.x, 2.x is improved but monitor)
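  • Lombok, Hutool: Lombok is a Java ecosystem standard; Hutool is a utility library used mainly in Chinese Java codebases, so keep it on the dependency review list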

AI / ML:

  • Python 3.10, PyTorch 2.8.0, Ubuntu 22.04, Miniconda, Docker
  • TCN + Transformer + CNN + Cross-Attention for motion + audio classification
  • No mention of quantization pipeline, on-device runtime, or deployment mechanism
  • Unclear whether current model has been compiled and tested on Apollo3 target

Hardware:

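  • Ambiq Apollo3 Blue: ARM Cortex-M4F, 96 MHz (TurboSPOT burst, 48 MHz nominal), 2MB flash, 768KB RAM, BLE 5, 15-channel ADC. The 2MB / 768KB figures match the Apollo3 Blue Plus variant (base Apollo3 Blue is 1MB / 384KB); confirm the exact part number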
  • Partition: Bootloader 48KB, APP 648KB, OTA 648KB (dual-bank for rollback), OTAFlag 8KB, Free 664KB (for model), FileSystem 32KB
  • RAM: BLE 96KB, APP 212KB, Drive 148KB, Send cache 312KB (reducible to free RAM for model inference)
  • Current firmware size ~468KB (fits the 648KB APP slot); quantized model must fit in the 664KB free region, which is not yet validated (see ADR 012)
  • Dock spec pending — critical dependency
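
A quick sanity check of the flash and RAM budgets listed above, as a minimal sketch. All sizes are copied from the partition and RAM lines; the function names are hypothetical and the script only does arithmetic, it does not read a real linker map.

```python
# Flash and RAM budgets as listed in the hardware notes (all sizes in KB).
FLASH_TOTAL_KB = 2 * 1024
RAM_TOTAL_KB = 768

FLASH_PARTITIONS_KB = {
    "Bootloader": 48,
    "APP": 648,
    "OTA": 648,
    "OTAFlag": 8,
    "Free (model)": 664,
    "FileSystem": 32,
}

RAM_REGIONS_KB = {
    "BLE": 96,
    "APP": 212,
    "Drive": 148,
    "Send cache": 312,
}

FIRMWARE_KB = 468  # approximate current firmware image size


def unallocated(total_kb, regions):
    """Return KB left after placing all regions (negative means overcommitted)."""
    return total_kb - sum(regions.values())


def app_slot_headroom(firmware_kb, partitions):
    """Return KB of growth room left in the APP slot for the firmware image."""
    return partitions["APP"] - firmware_kb


def model_fits(model_kb, partitions):
    """True if a quantized model of model_kb fits in the free flash region."""
    return model_kb <= partitions["Free (model)"]


if __name__ == "__main__":
    print("flash unallocated KB:", unallocated(FLASH_TOTAL_KB, FLASH_PARTITIONS_KB))
    print("RAM unallocated KB:", unallocated(RAM_TOTAL_KB, RAM_REGIONS_KB))
    print("APP slot headroom KB:", app_slot_headroom(FIRMWARE_KB, FLASH_PARTITIONS_KB))
```

Both budgets are fully allocated (zero unallocated KB), so any RAM for model inference has to come out of an existing region such as the send cache. The firmware image has about 180KB of growth room before it outgrows the APP and OTA slots.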

Known Issues to File

  1. AsyncStorage auth token storage — migrate to react-native-keychain / Expo SecureStore
  2. Two ORMs in Spring backend — incremental cleanup, new code uses single pattern
  3. MinIO + S3 duality — pick one, remove the other
  4. Sa-Token hiring constraint — plan migration to Clerk over 6-12 months
  5. Unverified Undertow version — audit actual dependency tree
  6. No observability installed — highest priority to address in first 30 days
  7. No firmware crash reporting — Memfault integration priority
  8. No HIL test infrastructure — plan for month 3-4
  9. No documented backup or DR procedures — establish before 5k user scale

Target Architecture

Service Topology at 5,000 Users

Spring Boot monolith handles: user accounts, authentication (Clerk JWT validation), subscription management (Stripe webhook handler), household/pet/device CRUD, admin tools.

Go services (new, built during strangler):

  • Telemetry Ingest — MQTT subscriber, HTTPS batch upload receiver, writes to TimescaleDB, publishes to internal pubsub
  • Device Registry + OTA Orchestrator — provisioning, pairing validation, OTA campaign management, Memfault integration
  • Notification Fanout — consumes events, produces push notifications (APNs, FCM) and email (Postmark or Resend)

Python services:

  • ML Training Pipeline — PyTorch training, model versioning, quantization, artifact production
  • ML Inference Service — FastAPI, server-side inference for premium features, drift detection

Shared infrastructure:

  • PostgreSQL + TimescaleDB (Timescale Cloud Singapore)
  • Redis (Railway managed or Upstash)
  • HiveMQ Cloud (APAC region)
  • S3 (ap-southeast-1) with replication to us-east-1 for training data
  • LGTM stack on Railway (Loki, Grafana, Tempo, Mimir)

Data Flow

Collar → BLE → Phone/Dock → (HTTPS batch for phone, MQTT persistent for dock) → Backend → TimescaleDB + S3 archive

Inbound telemetry pipeline:

  1. Collar buffers 100Hz IMU locally, runs on-device ML, emits activity events + raw IMU snapshots
  2. Phone/Dock receives via BLE, buffers until backend reachable
  3. Phone uploads via HTTPS POST /v1/devices/{id}/telemetry/batch; Dock publishes to MQTT topic
  4. Ingest service validates, writes to TimescaleDB hypertable in batched INSERT
  5. Raw IMU snapshots persisted to S3 as Parquet for ML retraining (daily partition)
  6. Internal event published to Redis Streams for downstream (notifications, dashboards)
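
Steps 4 and 5 above could be sketched roughly as follows. This is an illustration only, written in Python for brevity even though the ingest service is planned in Go. The field names (device_id, ts_ms, activity), the 60-second clock-skew allowance, the chunk size, and the date string used for the daily partition are all assumptions, not the actual schema.

```python
from dataclasses import dataclass
from datetime import datetime, timezone
from typing import List, Tuple


@dataclass(frozen=True)
class ActivityEvent:
    device_id: str
    ts_ms: int      # Unix epoch milliseconds (UTC)
    activity: str   # label emitted by the on-device model


def validate(raw: dict, now_ms: int, max_skew_ms: int = 60_000) -> ActivityEvent:
    """Turn one untrusted dict into an ActivityEvent or raise ValueError."""
    device_id = raw.get("device_id")
    ts_ms = raw.get("ts_ms")
    activity = raw.get("activity")
    if not isinstance(device_id, str) or not device_id:
        raise ValueError("device_id missing")
    if not isinstance(ts_ms, int) or isinstance(ts_ms, bool) or ts_ms <= 0:
        raise ValueError("ts_ms must be a positive integer")
    if ts_ms > now_ms + max_skew_ms:
        raise ValueError("ts_ms too far in the future")
    if not isinstance(activity, str) or not activity:
        raise ValueError("activity missing")
    return ActivityEvent(device_id, ts_ms, activity)


def split_batch(
    raw_batch: List[dict], now_ms: int, chunk_size: int = 500
) -> Tuple[List[List[ActivityEvent]], List[Tuple[dict, str]]]:
    """Validate a batch; return (insert_chunks, rejected) without touching storage.

    Each chunk is meant to become one multi-row INSERT into the hypertable.
    """
    good: List[ActivityEvent] = []
    rejected: List[Tuple[dict, str]] = []
    for raw in raw_batch:
        try:
            good.append(validate(raw, now_ms))
        except ValueError as exc:
            rejected.append((raw, str(exc)))
    chunks = [good[i:i + chunk_size] for i in range(0, len(good), chunk_size)]
    return chunks, rejected


def daily_partition(ts_ms: int) -> str:
    """UTC date string used to group raw snapshots into one archive partition per day."""
    return datetime.fromtimestamp(ts_ms / 1000, tz=timezone.utc).strftime("%Y-%m-%d")
```

Rejected rows are returned with a reason instead of failing the whole batch, so one bad sample from a collar does not block the rest of an upload.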

Critical Design Principles

  • Additive-only API changes until major version bump (v1 → v2)
  • Backward-compatible database migrations always; destructive migrations require approval
  • Feature flags gate all cross-component features until rollout complete
  • Idempotency keys required on all write endpoints
  • Rate limiting enforced per-user, per-device, per-IP
  • Observability instrumentation required before merge (OTel traces, structured logs, metrics)
  • Every incident produces at least one regression test
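
The idempotency-key principle above, as a minimal sketch. The store here is an in-memory dict standing in for a shared store such as Redis (an assumption); the class name, the (scope, key) pairing, and the example endpoint are hypothetical.

```python
import threading
from typing import Any, Callable, Dict, Tuple


class IdempotencyStore:
    """In-memory stand-in for a shared idempotency store (assumption: Redis in production)."""

    def __init__(self) -> None:
        self._lock = threading.Lock()
        self._results: Dict[Tuple[str, str], Any] = {}

    def run_once(self, scope: str, key: str, handler: Callable[[], Any]) -> Tuple[Any, bool]:
        """Run handler at most once per (scope, key).

        Returns (result, replayed). If handler raises, nothing is stored,
        so a client retry with the same key runs the handler again.
        """
        # Holding the lock while the handler runs serializes all writes;
        # acceptable for a sketch, too coarse for a real service.
        with self._lock:
            cache_key = (scope, key)
            if cache_key in self._results:
                return self._results[cache_key], True
            result = handler()
            self._results[cache_key] = result
            return result, False


if __name__ == "__main__":
    store = IdempotencyStore()
    created = []

    def create_pet() -> dict:
        created.append("pet")
        return {"id": len(created)}

    # Hypothetical endpoint name; the same key sent twice creates one pet.
    print(store.run_once("POST /v1/pets", "key-123", create_pet))
    print(store.run_once("POST /v1/pets", "key-123", create_pet))
```

Scoping the key by endpoint keeps a key reused across different write endpoints from replaying the wrong response. A real store would also need expiry on stored results.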

Architecture Decision Log

Each row represents a decision made during CTO trial. ADR format: one-page doc in Git repo.

ADR | Decision | Status | Key Rationale
001 | Railway over AWS through 25k users | Approved | Cost, iteration speed, Railway Metal Singapore matches pilot region
002 | Strangler pattern, not rewrite of Spring monolith | Approved | Preserve working code, contain risk, focus on new value
003 | Go for new backend services | Approved | Concurrency fit, hiring pool NYC, CTO experience at scale
004 | TimescaleDB for time-series telemetry | Approved | SQL-compatible, operationally simple vs InfluxDB/ClickHouse
005 | HiveMQ Cloud for MQTT broker | Approved | Managed, MQTT 5, reasonable pricing, not Chinese-origin
006 | Self-hosted LGTM for observability | Approved | Zero egress cost on Railway private network, CTO experience
007 | Datadog credits used surgically (synthetics, DBM) | Approved | Maximize $100k value without lock-in
008 | Clerk for user auth | Approved | Modern, React Native-native, team familiarity with similar products
009 | Device auth: bootstrap secret → JWT rotation | Approved | Pragmatic alternative to full mTLS PKI at startup stage
010 | Stripe across three entities, not Paddle | Approved | Singapore entity enables MoR globally without Paddle premium
011 | C for Apollo3 firmware | Approved | Vendor SDK, toolchain maturity, hiring pool
012 | On-device TFLM via Ambiq neuralSPOT | Provisional | Pending model size validation
013 | GitHub Actions + Railway native deploys | Approved | Sufficient for stage; migrate to Buildkite+Argo at scale
014 | PostHog for product analytics + flags | Approved | Consolidates analytics, flags, session replay
015 | Memfault for firmware observability | Approved | Best-in-class, no viable alternative
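
One possible shape for ADR 009 (bootstrap secret → JWT rotation), as a stdlib-only sketch. Everything here is an assumption for illustration: the HMAC proof format, the HS256 signing, the claim names, and the one-hour TTL are not taken from the actual design. The static proof is replayable as written, so a real exchange would need a server nonce or timestamp.

```python
import base64
import hashlib
import hmac
import json


def _b64url(data: bytes) -> str:
    return base64.urlsafe_b64encode(data).rstrip(b"=").decode("ascii")


def _b64url_decode(text: str) -> bytes:
    return base64.urlsafe_b64decode(text + "=" * (-len(text) % 4))


def sign_jwt(claims: dict, key: bytes) -> str:
    """Build a compact HS256 JWT from claims."""
    header = {"alg": "HS256", "typ": "JWT"}
    signing_input = ".".join(
        _b64url(json.dumps(part, separators=(",", ":"), sort_keys=True).encode())
        for part in (header, claims)
    )
    sig = hmac.new(key, signing_input.encode("ascii"), hashlib.sha256).digest()
    return signing_input + "." + _b64url(sig)


def verify_jwt(token: str, key: bytes, now: float) -> dict:
    """Check signature, algorithm, and expiry; return claims or raise ValueError."""
    parts = token.split(".")
    if len(parts) != 3:
        raise ValueError("malformed token")
    header_b64, payload_b64, sig_b64 = parts
    expected = hmac.new(
        key, f"{header_b64}.{payload_b64}".encode("ascii"), hashlib.sha256
    ).digest()
    if not hmac.compare_digest(expected, _b64url_decode(sig_b64)):
        raise ValueError("bad signature")
    if json.loads(_b64url_decode(header_b64)).get("alg") != "HS256":
        raise ValueError("unexpected algorithm")
    claims = json.loads(_b64url_decode(payload_b64))
    if claims["exp"] <= now:
        raise ValueError("token expired")
    return claims


def exchange_bootstrap(device_id, proof, bootstrap_secrets, server_key, now, ttl_s=3600):
    """Trade proof of the factory bootstrap secret for a short-lived device JWT.

    Hypothetical proof format: hex HMAC-SHA256(bootstrap_secret, device_id).
    """
    secret = bootstrap_secrets.get(device_id)
    if secret is None:
        raise PermissionError("unknown device")
    expected = hmac.new(secret, device_id.encode(), hashlib.sha256).hexdigest()
    if not hmac.compare_digest(expected.encode(), proof.encode()):
        raise PermissionError("bad bootstrap proof")
    return sign_jwt({"sub": device_id, "iat": int(now), "exp": int(now) + ttl_s}, server_key)


def rotate(token, server_key, now, ttl_s=3600):
    """Issue a fresh JWT for a device presenting a still-valid one."""
    claims = verify_jwt(token, server_key, now)
    return sign_jwt({"sub": claims["sub"], "iat": int(now), "exp": int(now) + ttl_s}, server_key)
```

The point of the sketch is the lifecycle: the bootstrap secret is used once to obtain a token, and afterwards the device only rotates short-lived tokens, so the long-lived secret rarely crosses the wire.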

Open Questions Requiring Input

From Founders

  • Subscription tier feature breakdown: what’s free vs the $20 tier?
  • Target market sequence post-Thailand?
  • Audio capture in v1 — yes or no? Affects privacy, regulatory, and technical scope
  • Capital plan and runway — 6-month burn compatibility with hiring plan?
  • Timeline to fundraise — am I part of technical diligence conversations?

From Hardware / Firmware Team

  • Dock specifications — still pending, blocks MQTT architecture confirmation
  • Why Apollo3 Blue vs Apollo4/5 with NN accelerator?
  • Current quantized model size and measured accuracy on real silicon?
  • Where does audio capture fit in the partition if it’s in scope?
  • GNSS module present anywhere, or is geolocation phone-derived only?
  • Expected battery life at 25/50/100Hz IMU sampling rates?
  • Firmware signing infrastructure — who holds the key, how is it managed?
  • OTA rollback implementation status — dual-bank partition exists, but is rollback logic tested?

From Contractor Team

  • Complete dependency tree of Spring backend — any hidden dependencies on Chinese services?
  • Production database current schema and volume?
  • Any hardcoded credentials, API keys, or service endpoints in code?
  • Infrastructure account ownership — who holds AWS / Railway / DNS / app store accounts?
  • IP assignment status in contractor agreements — verified with legal?

From Data Scientist (Thailand)

  • Has current ML model been compiled and tested on Apollo3 target, or only trained in PyTorch?
  • What’s the measured accuracy after INT8 quantization vs FP32 baseline?
  • What’s the training dataset size, source, labeling methodology?
  • Interest in full-time role, location flexibility, ML vs data science focus?

Hiring Pipeline — NYC Focus

Open Roles

Role | Priority | Target Start | Pipeline Status
Senior Backend Engineer (Go) | P0 | Month 2-3 | Not started; start day 1
Senior Firmware Engineer | P0 | Month 3-4 | Not started; start week 2, long lead
Senior On-Device ML Engineer | P0 | Month 4-6 | Not started; parallel with contractor bridge
Senior Mobile Engineer (RN) | P1 | Month 5-7 | Start month 2
Backend Engineer (Mid-Senior) | P1 | Month 7-9 | Start month 4
SRE / Platform Engineer | P2 | Month 9-12 | Start month 7

Hiring Channels

  • Direct outreach to senior engineers from: Oura, Whoop, Fitbit, Peloton, Verkada, Tonal, Tesla, Apple Watch team
  • Referrals from personal network — first priority, highest signal
  • Technical recruiter engagement for backend and mobile — NYC specialists (Paradigm Talent, SignalFire talent network, etc.)
  • LinkedIn Recruiter for passive candidates
  • YC Work at a Startup, Otta / Welcome to the Jungle, Wellfound
  • Hardware-specific: TinyML forums, Embedded Weekly sponsor reach, Hackster.io

Interview Loop

Target 4 stages: (1) recruiter screen, (2) CTO screen, (3) technical deep-dive with CTO + existing team member, (4) values / founder conversation. Avoid 6+ round marathon loops — NYC senior talent has options and long loops lose candidates.

Technical assessment: take-home for mid-level, live pairing for senior. No whiteboard algorithms. Real problems from the codebase.

Compensation Philosophy

Top-quartile base salary, moderate equity, no exotic benefits. Lead with mission and role scope. At a $15-20 subscription price, this is a serious consumer hardware company, not a lifestyle SaaS.


Operational Cadence

Weekly Rhythm

  • Monday: CEO 1:1, weekly priorities
  • Tuesday: Engineering team standup (once a team exists); async written updates until then
  • Wednesday: APAC sync window (evening NYC / morning BKK)
  • Thursday: Product/design review, hardware team sync
  • Friday: Metrics review, hiring pipeline review, personal planning

Monthly Rhythm

  • Technical metrics review (SLOs, incidents, deploy frequency, MTTR)
  • Hiring pipeline review — conversion rates, offer acceptance, time-to-close
  • Infrastructure cost review — budget vs actual, projections
  • Risk register review — status updates, new risks, closed risks
  • ADR additions — major decisions documented

Quarterly Rhythm

  • Architecture review — what’s working, what’s not, what’s next
  • Team retrospective — velocity, morale, process
  • Roadmap alignment with CEO / CPO — next quarter priorities
  • Budget planning for next quarter

Personal Risk Register

Risk | Category | Mitigation
E2 visa constraint during work trial and post-conversion | Legal | C2C through own LLC, confirm with immigration lawyer, green card via spouse 1-2 years
Equity structure compatibility with visa and redomicile plan | Legal/Tax | Engage US startup lawyer + international tax advisor before signing
Covered expatriate tax exposure if redomicile after 8+ years as green card holder | Tax | Plan exit before 8-year mark; model exit tax with tax advisor
Thailand remittance tax rules on foreign income (2024 change) | Tax | Tax advisor planning for multi-jurisdiction income
First-engineering-hire CTO scope creep into everything | Role | Protect time for strategic work; hire first engineer fast
Founder alignment on decisions post-trial | Political | Get critical decisions in writing during trial
Cultural/communication gap with Thailand-based team | Management | Travel quota 30% APAC in year 1; documentation discipline
Burnout from NYC/BKK evening schedule | Personal | Structured overlap windows, not ad-hoc late nights

Conversion Criteria — Trial to Permanent CTO

Before converting from C2C to permanent CTO role, confirmations required:

  1. Equity grant signed and executed (not verbal) with terms negotiated through personal counsel
  2. Clear authority defined in writing: hiring, budget, vendor decisions, architecture, firing
  3. Visa path resolved — either extended C2C or formal employment structure with immigration lawyer sign-off
  4. Runway confirmed — at least 12 months of burn at planned team size, or active raise timeline shared
  5. Cap table reviewed — understand dilution, preferred stack, board composition
  6. Hard decision point: 90 days ideal, 180 days maximum. Beyond that, convert or part ways.

Notes to Self

Things to remember during the trial:

  • Don’t ship new features in first 30 days. Absorb the system.
  • Talk to pilot users directly. All 300 if possible.
  • Recorded KT sessions with contractors are highest leverage artifacts.
  • Founders often underestimate hiring timeline. Push back with realistic numbers early.
  • Dock spec is a gating dependency for multiple decisions. Escalate if not resolved in month 1.
  • Model-on-Apollo3 validation is the single biggest technical risk. Make it a weekly status item.
  • NYC office / hybrid decision affects first hires. Don’t defer.
  • Keep the ADR discipline even when tired. Future-you needs the context.
  • Protect one full day a week for deep work. Meetings expand to fill available space.
  • Weekly personal journal entry on trust signals from founders. Three months of data decides conversion.