Pync — CTO Working Document

Architecture, decisions, hiring, operations — personal reference, updated continuously

Purpose

Personal working document. Detailed architecture notes, decision rationale, open questions, hiring pipeline, risk register, and operational tracking. Founder-facing document is separate and summarizes outward. This document is for tracking the real state and reasoning.

Update weekly during work trial. Version in Git or Notion once team is hired.

Current State Assessment

Inherited Stack Inventory

Frontend (React Native / Expo):

React Native 0.81.0, React 19.1.0, TypeScript 5.8.3
Context API state, React Navigation 7.x, Axios 1.12.2
AsyncStorage for local state — SECURITY ISSUE: auth tokens should not be in AsyncStorage, migrate to Keychain / SecureStore
React Native Maps 1.25.3, Vision Camera 4.7.2, Lottie 7.3.2
i18next with Chinese, English, Korean, Thai, Japanese support
React Native Chart Kit — consider replacement with Victory Native or Recharts Native as product matures

Backend (Java / Spring Boot):

Java 17, Spring Boot 3.5.6, Undertow (version string 3.5.6 is suspect — likely copy-paste error, verify actual version)
Spring Data JDBC + MyBatis-Plus — two ORMs, code smell, contain and gradually eliminate
Redis + Redisson for caching
MQTT integration at 3.5.6 (Spring Boot version pattern — actual MQTT client library version unclear)
Sa-Token + JWT for authentication — Chinese-origin, sparse English docs, hard to hire for, replace with Clerk
OpenAPI 3 documentation — verify accuracy and completeness
MinIO + AWS S3 both listed — indecision, pick one, recommend S3
Netty 4.2.6, Protobuf 4.32.1, FastJSON2 2.0.59 (historical CVE concerns in 1.x, 2.x is improved but monitor)
Lombok, Hutool — Java ecosystem standard

AI / ML:

Python 3.10, PyTorch 2.8.0, Ubuntu 22.04, Miniconda, Docker
TCN + Transformer + CNN + Cross-Attention for motion + audio classification
No mention of quantization pipeline, on-device runtime, or deployment mechanism
Unclear whether current model has been compiled and tested on Apollo3 target

Hardware:

Ambiq Apollo3 Blue — ARM Cortex-M4F, 96 MHz, 2MB flash, 768KB RAM, BLE 5, 15-channel ADC
Partition: Bootloader 48KB, APP 648KB, OTA 648KB (dual-bank for rollback), OTAFlag 8KB, Free 664KB (for model), FileSystem 32KB
RAM: BLE 96KB, APP 212KB, Drive 148KB, Send cache 312KB (reducible to free RAM for model inference)
Current firmware size ~468KB; model fits in 664KB free after quantization
Dock spec pending — critical dependency

Known Issues to File

AsyncStorage auth token storage — migrate to react-native-keychain / Expo SecureStore
Two ORMs in Spring backend — incremental cleanup, new code uses single pattern
MinIO + S3 duality — pick one, remove the other
Sa-Token hiring constraint — plan migration to Clerk over 6-12 months
Unverified Undertow version — audit actual dependency tree
No observability installed — highest priority to address in first 30 days
No firmware crash reporting — Memfault integration priority
No HIL test infrastructure — plan for month 3-4
No documented backup or DR procedures — establish before 5k user scale

Target Architecture

Service Topology at 5,000 Users

Spring Boot monolith handles: user accounts, authentication (Clerk JWT validation), subscription management (Stripe webhook handler), household/pet/device CRUD, admin tools.

Go services (new, built during strangler):

Telemetry Ingest — MQTT subscriber, HTTPS batch upload receiver, writes to TimescaleDB, publishes to internal pubsub
Device Registry + OTA Orchestrator — provisioning, pairing validation, OTA campaign management, Memfault integration
Notification Fanout — consumes events, produces push notifications (APNs, FCM) and email (Postmark or Resend)

Python services:

ML Training Pipeline — PyTorch training, model versioning, quantization, artifact production
ML Inference Service — FastAPI, server-side inference for premium features, drift detection

Shared infrastructure:

PostgreSQL + TimescaleDB (Timescale Cloud Singapore)
Redis (Railway managed or Upstash)
HiveMQ Cloud (APAC region)
S3 (ap-southeast-1) with replication to us-east-1 for training data
LGTM stack on Railway (Loki, Grafana, Tempo, Mimir)

Data Flow

Collar → BLE → Phone/Dock → (HTTPS batch for phone, MQTT persistent for dock) → Backend → TimescaleDB + S3 archive

Inbound telemetry pipeline:

Collar buffers 100Hz IMU locally, runs on-device ML, emits activity events + raw IMU snapshots
Phone/Dock receives via BLE, buffers until backend reachable
Phone uploads via HTTPS POST /v1/devices/{id}/telemetry/batch; Dock publishes to MQTT topic
Ingest service validates, writes to TimescaleDB hypertable in batched INSERT
Raw IMU snapshots persisted to S3 as Parquet for ML retraining (daily partition)
Internal event published to Redis Streams for downstream (notifications, dashboards)

Critical Design Principles

Additive-only API changes until major version bump (v1 → v2)
Backward-compatible database migrations always; destructive migrations require approval
Feature flags gate all cross-component features until rollout complete
Idempotency keys required on all write endpoints
Rate limiting enforced per-user, per-device, per-IP
Observability instrumentation required before merge (OTel traces, structured logs, metrics)
Every incident produces at least one regression test

Architecture Decision Log

Each row represents a decision made during CTO trial. ADR format: one-page doc in Git repo.

ADR	Decision	Status	Key Rationale
001	Railway over AWS through 25k users	Approved	Cost, iteration speed, Railway Metal Singapore matches pilot region
002	Strangler pattern, not rewrite of Spring monolith	Approved	Preserve working code, contain risk, focus on new value
003	Go for new backend services	Approved	Concurrency fit, hiring pool NYC, CTO experience at scale
004	TimescaleDB for time-series telemetry	Approved	SQL-compatible, operationally simple vs InfluxDB/ClickHouse
005	HiveMQ Cloud for MQTT broker	Approved	Managed, MQTT 5, reasonable pricing, not Chinese-origin
006	Self-hosted LGTM for observability	Approved	Zero egress cost on Railway private network, CTO experience
007	Datadog credits used surgically (synthetics, DBM)	Approved	Maximize $100k value without lock-in
008	Clerk for user auth	Approved	Modern, React Native-native, team familiarity with similar products
009	Device auth: bootstrap secret → JWT rotation	Approved	Pragmatic alternative to full mTLS PKI at startup stage
010	Stripe across three entities, not Paddle	Approved	Singapore entity enables MoR globally without Paddle premium
011	C for Apollo3 firmware	Approved	Vendor SDK, toolchain maturity, hiring pool
012	On-device TFLM via Ambiq neuralSPOT	Provisional	Pending model size validation
013	GitHub Actions + Railway native deploys	Approved	Sufficient for stage; migrate to Buildkite+Argo at scale
014	PostHog for product analytics + flags	Approved	Consolidates analytics, flags, session replay
015	Memfault for firmware observability	Approved	Best-in-class, no viable alternative

Open Questions Requiring Input

From Founders

Subscription tier feature breakdown — what’s free vs $15 v s$ 20?
Target market sequence post-Thailand?
Audio capture in v1 — yes or no? Affects privacy, regulatory, and technical scope
Capital plan and runway — 6-month burn compatibility with hiring plan?
Timeline to fundraise — am I part of technical diligence conversations?

From Hardware / Firmware Team

Dock specifications — still pending, blocks MQTT architecture confirmation
Why Apollo3 Blue vs Apollo4/5 with NN accelerator?
Current quantized model size and measured accuracy on real silicon?
Where does audio capture fit in the partition if it’s in scope?
GNSS module present anywhere, or is geolocation phone-derived only?
Expected battery life at 25/50/100Hz IMU sampling rates?
Firmware signing infrastructure — who holds the key, how is it managed?
OTA rollback implementation status — dual-bank partition exists, but is rollback logic tested?

From Contractor Team

Complete dependency tree of Spring backend — any hidden dependencies on Chinese services?
Production database current schema and volume?
Any hardcoded credentials, API keys, or service endpoints in code?
Infrastructure account ownership — who holds AWS / Railway / DNS / app store accounts?
IP assignment status in contractor agreements — verified with legal?

From Data Scientist (Thailand)

Has current ML model been compiled and tested on Apollo3 target, or only trained in PyTorch?
What’s the measured accuracy after INT8 quantization vs FP32 baseline?
What’s the training dataset size, source, labeling methodology?
Interest in full-time role, location flexibility, ML vs data science focus?

Hiring Pipeline — NYC Focus

Open Roles

Role	Priority	Target Start	Pipeline Status
Senior Backend Engineer (Go)	P0	Month 2-3	Not started — start day 1
Senior Firmware Engineer	P0	Month 3-4	Not started — start week 2, long lead
Senior On-Device ML Engineer	P0	Month 4-6	Not started — parallel with contractor bridge
Senior Mobile Engineer (RN)	P1	Month 5-7	Start month 2
Backend Engineer (Mid-Senior)	P1	Month 7-9	Start month 4
SRE / Platform Engineer	P2	Month 9-12	Start month 7

Hiring Channels

Direct outreach to senior engineers from: Oura, Whoop, Fitbit, Peloton, Verkada, Tonal, Tesla, Apple Watch team
Referrals from personal network — first priority, highest signal
Technical recruiter engagement for backend and mobile — NYC specialists (Paradigm Talent, SignalFire talent network, etc.)
LinkedIn Recruiter for passive candidates
YC Work at a Startup, Otta / Welcome to the Jungle, Wellfound
Hardware-specific: TinyML forums, Embedded Weekly sponsor reach, Hackster.io

Interview Loop

Target 4 stages: (1) recruiter screen, (2) CTO screen, (3) technical deep-dive with CTO + existing team member, (4) values / founder conversation. Avoid 6+ round marathon loops — NYC senior talent has options and long loops lose candidates.

Technical assessment: take-home for mid-level, live pairing for senior. No whiteboard algorithms. Real problems from the codebase.

Compensation Philosophy

Top-of-market base (top 25%), moderate equity, no exotic benefits. Lead with mission and role scope. At $250 ha r d w a re w i t h$ 15-20 subscription, this is a serious consumer hardware company, not a lifestyle SaaS.

Operational Cadence

Weekly Rhythm

Monday: CEO 1:1, weekly priorities
Tuesday: Engineering team standup (when team exists), async before
Wednesday: APAC sync window (evening NYC / morning BKK)
Thursday: Product/design review, hardware team sync
Friday: Metrics review, hiring pipeline review, personal planning

Monthly Rhythm

Technical metrics review (SLOs, incidents, deploy frequency, MTTR)
Hiring pipeline review — conversion rates, offer acceptance, time-to-close
Infrastructure cost review — budget vs actual, projections
Risk register review — status updates, new risks, closed risks
ADR additions — major decisions documented

Quarterly Rhythm

Architecture review — what’s working, what’s not, what’s next
Team retrospective — velocity, morale, process
Roadmap alignment with CEO / CPO — next quarter priorities
Budget planning for next quarter

Personal Risk Register

Risk	Category	Mitigation
E2 visa constraint during work trial and post-conversion	Legal	C2C through own LLC, confirm with immigration lawyer, green card via spouse 1-2 years
Equity structure compatibility with visa and redomicile plan	Legal/Tax	Engage US startup lawyer + international tax advisor before signing
Covered expatriate tax exposure if redomicile after 8+ years as green card holder	Tax	Plan exit before 8-year mark; model exit tax with tax advisor
Thailand remittance tax rules on foreign income (2024 change)	Tax	Tax advisor planning for multi-jurisdiction income
First-engineering-hire CTO scope creep into everything	Role	Protect time for strategic work; hire first engineer fast
Founder alignment on decisions post-trial	Political	Get critical decisions in writing during trial
Cultural/communication gap with Thailand-based team	Management	Travel quota 30% APAC in year 1; documentation discipline
Burnout from NYC/BKK evening schedule	Personal	Structured overlap windows, not ad-hoc late nights

Conversion Criteria — Trial to Permanent CTO

Before converting from C2C to permanent CTO role, confirmations required:

Equity grant signed and executed (not verbal) with terms negotiated through personal counsel
Clear authority defined in writing: hiring, budget, vendor decisions, architecture, firing
Visa path resolved — either extended C2C or formal employment structure with immigration lawyer sign-off
Runway confirmed — at least 12 months of burn at planned team size, or active raise timeline shared
Cap table reviewed — understand dilution, preferred stack, board composition
Hard decision point: 90 days ideal, 180 days maximum. Beyond that, convert or part ways.

Notes to Self

Things to remember during the trial:

Don’t ship new features in first 30 days. Absorb the system.
Talk to pilot users directly. All 300 if possible.
Recorded KT sessions with contractors are highest leverage artifacts.
Founders often underestimate hiring timeline. Push back with realistic numbers early.
Dock spec is a gating dependency for multiple decisions. Escalate if not resolved in month 1.
Model-on-Apollo3 validation is the single biggest technical risk. Make it a weekly status item.
NYC office / hybrid decision affects first hires. Don’t defer.
Keep the ADR discipline even when tired. Future-you needs the context.
Protect one full day a week for deep work. Meetings expand to fill available space.
Weekly personal journal entry on trust signals from founders. Three months of data decides conversion.

Quartz 4

Explorer

pync-cto-working-document