Pync — CTO Working Document
Architecture, decisions, hiring, operations — personal reference, updated continuously
Purpose
Personal working document. Detailed architecture notes, decision rationale, open questions, hiring pipeline, risk register, and operational tracking. Founder-facing document is separate and summarizes outward. This document is for tracking the real state and reasoning.
Update weekly during work trial. Version in Git or Notion once team is hired.
Current State Assessment
Inherited Stack Inventory
Frontend (React Native / Expo):
- React Native 0.81.0, React 19.1.0, TypeScript 5.8.3
- Context API state, React Navigation 7.x, Axios 1.12.2
- AsyncStorage for local state — SECURITY ISSUE: auth tokens should not be in AsyncStorage, migrate to Keychain / SecureStore
- React Native Maps 1.25.3, Vision Camera 4.7.2, Lottie 7.3.2
- i18next with Chinese, English, Korean, Thai, Japanese support
- React Native Chart Kit: consider replacing with Victory Native as the product matures ("Recharts Native" needs verification, since Recharts itself targets web React)
Backend (Java / Spring Boot):
- Java 17, Spring Boot 3.5.6, Undertow (listed as 3.5.6, which matches the Spring Boot version and is probably the starter version rather than Undertow's own release; verify against the resolved dependency tree)
- Spring Data JDBC + MyBatis-Plus — two ORMs, code smell, contain and gradually eliminate
- Redis + Redisson for caching
- MQTT integration listed at 3.5.6 (same Spring Boot version pattern; the actual MQTT client library and its version are unclear)
- Sa-Token + JWT for authentication — Chinese-origin, sparse English docs, hard to hire for, replace with Clerk
- OpenAPI 3 documentation — verify accuracy and completeness
- MinIO + AWS S3 both listed — indecision, pick one, recommend S3
- Netty 4.2.6, Protobuf 4.32.1, FastJSON2 2.0.59 (historical CVE concerns in 1.x, 2.x is improved but monitor)
- Lombok, Hutool — Java ecosystem standard
AI / ML:
- Python 3.10, PyTorch 2.8.0, Ubuntu 22.04, Miniconda, Docker
- TCN + Transformer + CNN + Cross-Attention for motion + audio classification
- No mention of quantization pipeline, on-device runtime, or deployment mechanism
- Unclear whether current model has been compiled and tested on Apollo3 target
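A rough size budget helps frame the open quantization question. The sketch below assumes one byte per INT8 weight and four bytes per FP32 weight, counts weights only (no tensor arena, operator code, or model metadata), and uses the 664KB free flash region listed under Hardware. The function names and the example parameter count are hypothetical; the real model's parameter count is not recorded in these notes.

```python
FREE_FLASH_KB = 664  # free flash region reserved for the model (see Hardware partition notes)

# Assumed storage cost per weight; real quantized models also carry scales and metadata.
BYTES_PER_WEIGHT = {"fp32": 4, "int8": 1}


def model_size_kb(param_count: int, dtype: str) -> float:
    """Approximate weight storage in KB, ignoring metadata and runtime overhead."""
    return param_count * BYTES_PER_WEIGHT[dtype] / 1024


def max_params(budget_kb: int, dtype: str) -> int:
    """Largest parameter count whose weights alone fit in the budget."""
    return budget_kb * 1024 // BYTES_PER_WEIGHT[dtype]


def fits(param_count: int, dtype: str, budget_kb: int = FREE_FLASH_KB) -> bool:
    """True if the weights alone fit in the flash budget."""
    return model_size_kb(param_count, dtype) <= budget_kb


if __name__ == "__main__":
    example_params = 500_000  # hypothetical parameter count, for illustration only
    for dtype in ("fp32", "int8"):
        size = round(model_size_kb(example_params, dtype), 1)
        print(dtype, size, "KB", "fits" if fits(example_params, dtype) else "does not fit")
```

This is an upper bound on what could fit, not evidence that the current model does; measured size on the Apollo3 target remains the number that matters.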
Hardware:
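- Ambiq Apollo3 Blue: ARM Cortex-M4F, up to 96 MHz, 2MB flash, 768KB RAM, BLE 5, 15-channel ADC (2MB / 768KB matches the Apollo3 Blue Plus variant; confirm exact part number with the hardware team)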
- Partition: Bootloader 48KB, APP 648KB, OTA 648KB (dual-bank for rollback), OTAFlag 8KB, Free 664KB (for model), FileSystem 32KB
- RAM: BLE 96KB, APP 212KB, Drive 148KB, Send cache 312KB (reducible to free RAM for model inference)
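- Current firmware size ~468KB; model is expected to fit in the 664KB free region after quantization, but this is unverified until it has been compiled and measured on the Apollo3 target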
- Dock spec pending — critical dependency
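The partition and RAM figures above can be sanity-checked with simple arithmetic. The sketch below uses only the numbers from these notes; the dictionary keys and function name are hypothetical labels, and the real linker script remains the source of truth.

```python
FLASH_TOTAL_KB = 2048  # 2MB flash
RAM_TOTAL_KB = 768

# Flash partition sizes in KB, as listed in the notes above.
FLASH_PARTITIONS_KB = {
    "bootloader": 48,
    "app": 648,
    "ota": 648,
    "ota_flag": 8,
    "free_model": 664,
    "filesystem": 32,
}

# RAM region sizes in KB, as listed in the notes above.
RAM_REGIONS_KB = {"ble": 96, "app": 212, "drive": 148, "send_cache": 312}

FIRMWARE_SIZE_KB = 468  # approximate current firmware size


def unallocated_kb(total_kb: int, regions_kb: dict) -> int:
    """KB left after all regions are placed; negative means over-allocated."""
    return total_kb - sum(regions_kb.values())


# Headroom left in the APP slot before the firmware outgrows it.
app_headroom_kb = FLASH_PARTITIONS_KB["app"] - FIRMWARE_SIZE_KB

if __name__ == "__main__":
    print("flash unallocated:", unallocated_kb(FLASH_TOTAL_KB, FLASH_PARTITIONS_KB), "KB")
    print("RAM unallocated:", unallocated_kb(RAM_TOTAL_KB, RAM_REGIONS_KB), "KB")
    print("APP slot headroom:", app_headroom_kb, "KB")
```

Both maps are fully allocated, so any RAM for model inference has to come out of an existing region such as the send cache.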
Known Issues to File
- AsyncStorage auth token storage — migrate to react-native-keychain / Expo SecureStore
- Two ORMs in Spring backend — incremental cleanup, new code uses single pattern
- MinIO + S3 duality — pick one, remove the other
- Sa-Token hiring constraint — plan migration to Clerk over 6-12 months
- Unverified Undertow version — audit actual dependency tree
- No observability installed — highest priority to address in first 30 days
- No firmware crash reporting — Memfault integration priority
- No HIL test infrastructure — plan for month 3-4
- No documented backup or DR procedures — establish before 5k user scale
Target Architecture
Service Topology at 5,000 Users
Spring Boot monolith handles: user accounts, authentication (Clerk JWT validation), subscription management (Stripe webhook handler), household/pet/device CRUD, admin tools.
Go services (new, built during the strangler migration):
- Telemetry Ingest — MQTT subscriber, HTTPS batch upload receiver, writes to TimescaleDB, publishes to internal pubsub
- Device Registry + OTA Orchestrator — provisioning, pairing validation, OTA campaign management, Memfault integration
- Notification Fanout — consumes events, produces push notifications (APNs, FCM) and email (Postmark or Resend)
Python services:
- ML Training Pipeline — PyTorch training, model versioning, quantization, artifact production
- ML Inference Service — FastAPI, server-side inference for premium features, drift detection
Shared infrastructure:
- PostgreSQL + TimescaleDB (Timescale Cloud Singapore)
- Redis (Railway managed or Upstash)
- HiveMQ Cloud (APAC region)
- S3 (ap-southeast-1) with replication to us-east-1 for training data
- LGTM stack on Railway (Loki, Grafana, Tempo, Mimir)
Data Flow
Collar → BLE → Phone/Dock → (HTTPS batch for phone, MQTT persistent for dock) → Backend → TimescaleDB + S3 archive
Inbound telemetry pipeline:
- Collar buffers 100Hz IMU locally, runs on-device ML, emits activity events + raw IMU snapshots
- Phone/Dock receives via BLE, buffers until backend reachable
- Phone uploads via HTTPS POST /v1/devices/{id}/telemetry/batch; Dock publishes to MQTT topic
- Ingest service validates, writes to TimescaleDB hypertable in batched INSERT
- Raw IMU snapshots persisted to S3 as Parquet for ML retraining (daily partition)
- Internal event published to Redis Streams for downstream (notifications, dashboards)
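The validate, batch, and archive steps above can be sketched in a few functions. This is a minimal illustration, not the ingest service itself (which is planned in Go): the `Sample` fields, the validation rules, and the S3 key layout are all assumptions made for the example.

```python
from dataclasses import dataclass
from datetime import datetime, timezone
from itertools import islice
from typing import Iterable, Iterator


@dataclass(frozen=True)
class Sample:
    """Hypothetical shape of one activity event in a telemetry batch."""
    device_id: str
    ts: datetime  # expected to be timezone-aware
    activity: str


def validate(sample: Sample, now: datetime) -> bool:
    """Reject empty device ids, naive timestamps, and timestamps in the future."""
    return bool(sample.device_id) and sample.ts.tzinfo is not None and sample.ts <= now


def chunked(rows: Iterable, size: int) -> Iterator[list]:
    """Yield fixed-size chunks so each one maps to a single batched INSERT."""
    it = iter(rows)
    while chunk := list(islice(it, size)):
        yield chunk


def snapshot_key(device_id: str, ts: datetime) -> str:
    """Assumed S3 key layout with one partition per UTC day."""
    day = ts.astimezone(timezone.utc).strftime("%Y-%m-%d")
    return f"imu-snapshots/date={day}/device={device_id}/{int(ts.timestamp())}.parquet"


if __name__ == "__main__":
    now = datetime.now(timezone.utc)
    batch = [Sample("collar-1", now, "walking"), Sample("", now, "resting")]
    valid = [s for s in batch if validate(s, now)]
    for chunk in chunked(valid, 500):
        print("would INSERT", len(chunk), "rows")
    print(snapshot_key("collar-1", now))
```

Partitioning the archive key by UTC day keeps the daily retraining job to a single prefix listing, at the cost of splitting a device's local day across two partitions.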
Critical Design Principles
- Additive-only API changes until major version bump (v1 → v2)
- Backward-compatible database migrations always; destructive migrations require approval
- Feature flags gate all cross-component features until rollout complete
- Idempotency keys required on all write endpoints
- Rate limiting enforced per-user, per-device, per-IP
- Observability instrumentation required before merge (OTel traces, structured logs, metrics)
- Every incident produces at least one regression test
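The idempotency-key principle is easiest to reason about with a concrete shape. The sketch below assumes an in-memory store and a caller-supplied key; a production version would more likely sit in Redis or Postgres with a TTL, and the class and method names here are hypothetical.

```python
import hashlib
import json
from typing import Any, Callable


class IdempotencyConflict(Exception):
    """Raised when a key is reused with a different request payload."""


class IdempotencyStore:
    """In-memory stand-in for a shared store; not safe across processes."""

    def __init__(self) -> None:
        self._seen: dict[str, tuple[str, Any]] = {}

    def execute(self, key: str, payload: dict, handler: Callable[[dict], Any]) -> Any:
        # Fingerprint the payload so a reused key with a different body is detected.
        fingerprint = hashlib.sha256(
            json.dumps(payload, sort_keys=True).encode()
        ).hexdigest()
        if key in self._seen:
            stored_fingerprint, stored_response = self._seen[key]
            if stored_fingerprint != fingerprint:
                raise IdempotencyConflict(f"key {key!r} reused with a different payload")
            return stored_response  # replay: the handler is not run again
        response = handler(payload)
        self._seen[key] = (fingerprint, response)
        return response


if __name__ == "__main__":
    store = IdempotencyStore()
    create_pet = lambda body: {"status": "created", "name": body["name"]}
    print(store.execute("req-1", {"name": "Mochi"}, create_pet))
    print(store.execute("req-1", {"name": "Mochi"}, create_pet))  # replayed response
```

A retried request returns the stored response instead of repeating the write, which is what makes client retries over flaky BLE-to-phone-to-backend links safe.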
Architecture Decision Log
Each row represents a decision made during CTO trial. ADR format: one-page doc in Git repo.
| ADR | Decision | Status | Key Rationale |
|---|---|---|---|
| 001 | Railway over AWS through 25k users | Approved | Cost, iteration speed, Railway Metal Singapore matches pilot region |
| 002 | Strangler pattern, not rewrite of Spring monolith | Approved | Preserve working code, contain risk, focus on new value |
| 003 | Go for new backend services | Approved | Concurrency fit, NYC hiring pool, CTO experience at scale |
| 004 | TimescaleDB for time-series telemetry | Approved | SQL-compatible, operationally simple vs InfluxDB/ClickHouse |
| 005 | HiveMQ Cloud for MQTT broker | Approved | Managed, MQTT 5, reasonable pricing, not Chinese-origin |
| 006 | Self-hosted LGTM for observability | Approved | Zero egress cost on Railway private network, CTO experience |
| 007 | Datadog credits used surgically (synthetics, DBM) | Approved | Maximize $100k value without lock-in |
| 008 | Clerk for user auth | Approved | Modern, React Native-native, team familiarity with similar products |
| 009 | Device auth: bootstrap secret → JWT rotation | Approved | Pragmatic alternative to full mTLS PKI at startup stage |
| 010 | Stripe across three entities, not Paddle | Approved | Singapore entity enables MoR globally without Paddle premium |
| 011 | C for Apollo3 firmware | Approved | Vendor SDK, toolchain maturity, hiring pool |
| 012 | On-device TFLM via Ambiq neuralSPOT | Provisional | Pending model size validation |
| 013 | GitHub Actions + Railway native deploys | Approved | Sufficient for stage; migrate to Buildkite+Argo at scale |
| 014 | PostHog for product analytics + flags | Approved | Consolidates analytics, flags, session replay |
| 015 | Memfault for firmware observability | Approved | Best-in-class, no viable alternative |
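ADR 009 is terse, so here is one possible reading of "bootstrap secret → JWT rotation" as a sketch: the device presents a factory-provisioned secret once, receives a short-lived HS256 token, and later trades a still-valid token for a fresh one. The flow, claim names, TTL, and function names are assumptions for illustration, and the hand-rolled JWT code stands in for a vetted library.

```python
import base64
import hashlib
import hmac
import json


def _b64url(data: bytes) -> str:
    return base64.urlsafe_b64encode(data).rstrip(b"=").decode()


def _b64url_decode(text: str) -> bytes:
    return base64.urlsafe_b64decode(text + "=" * (-len(text) % 4))


def sign_jwt(claims: dict, key: bytes) -> str:
    """Build a compact HS256 JWT from the given claims."""
    header = _b64url(json.dumps({"alg": "HS256", "typ": "JWT"}, separators=(",", ":")).encode())
    body = _b64url(json.dumps(claims, separators=(",", ":")).encode())
    sig = hmac.new(key, f"{header}.{body}".encode(), hashlib.sha256).digest()
    return f"{header}.{body}.{_b64url(sig)}"


def verify_jwt(token: str, key: bytes, now: float) -> dict:
    """Check signature and expiry; always verifies as HS256 regardless of the header."""
    header, body, sig = token.split(".")
    expected = hmac.new(key, f"{header}.{body}".encode(), hashlib.sha256).digest()
    if not hmac.compare_digest(expected, _b64url_decode(sig)):
        raise ValueError("bad signature")
    claims = json.loads(_b64url_decode(body))
    if claims["exp"] <= now:
        raise ValueError("token expired")
    return claims


def issue_device_token(device_id: str, presented: bytes, provisioned: bytes,
                       server_key: bytes, now: float, ttl_s: int = 3600) -> str:
    """Exchange the bootstrap secret for a short-lived device token."""
    if not hmac.compare_digest(presented, provisioned):
        raise PermissionError("bootstrap secret mismatch")
    return sign_jwt({"sub": device_id, "iat": int(now), "exp": int(now) + ttl_s}, server_key)


def rotate_device_token(token: str, server_key: bytes, now: float, ttl_s: int = 3600) -> str:
    """Trade a still-valid token for a fresh one without resending the bootstrap secret."""
    claims = verify_jwt(token, server_key, now)
    return sign_jwt({"sub": claims["sub"], "iat": int(now), "exp": int(now) + ttl_s}, server_key)
```

Compared with full mTLS, this keeps per-device state down to one provisioned secret, but it also means a leaked bootstrap secret can mint tokens until it is revoked, so revocation belongs in the Device Registry design.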
Open Questions Requiring Input
From Founders
- Subscription tier feature breakdown: what’s free vs the $20 tier?
- Target market sequence post-Thailand?
- Audio capture in v1 — yes or no? Affects privacy, regulatory, and technical scope
- Capital plan and runway — 6-month burn compatibility with hiring plan?
- Timeline to fundraise — am I part of technical diligence conversations?
From Hardware / Firmware Team
- Dock specifications — still pending, blocks MQTT architecture confirmation
- Why Apollo3 Blue vs Apollo4/5 with NN accelerator?
- Current quantized model size and measured accuracy on real silicon?
- Where does audio capture fit in the partition if it’s in scope?
- GNSS module present anywhere, or is geolocation phone-derived only?
- Expected battery life at 25/50/100Hz IMU sampling rates?
- Firmware signing infrastructure — who holds the key, how is it managed?
- OTA rollback implementation status — dual-bank partition exists, but is rollback logic tested?
From Contractor Team
- Complete dependency tree of Spring backend — any hidden dependencies on Chinese services?
- Production database current schema and volume?
- Any hardcoded credentials, API keys, or service endpoints in code?
- Infrastructure account ownership — who holds AWS / Railway / DNS / app store accounts?
- IP assignment status in contractor agreements — verified with legal?
From Data Scientist (Thailand)
- Has current ML model been compiled and tested on Apollo3 target, or only trained in PyTorch?
- What’s the measured accuracy after INT8 quantization vs FP32 baseline?
- What’s the training dataset size, source, labeling methodology?
- Interest in full-time role, location flexibility, ML vs data science focus?
Hiring Pipeline — NYC Focus
Open Roles
| Role | Priority | Target Start | Pipeline Status |
|---|---|---|---|
| Senior Backend Engineer (Go) | P0 | Month 2-3 | Not started — start day 1 |
| Senior Firmware Engineer | P0 | Month 3-4 | Not started — start week 2, long lead |
| Senior On-Device ML Engineer | P0 | Month 4-6 | Not started — parallel with contractor bridge |
| Senior Mobile Engineer (RN) | P1 | Month 5-7 | Start month 2 |
| Backend Engineer (Mid-Senior) | P1 | Month 7-9 | Start month 4 |
| SRE / Platform Engineer | P2 | Month 9-12 | Start month 7 |
Hiring Channels
- Direct outreach to senior engineers from: Oura, Whoop, Fitbit, Peloton, Verkada, Tonal, Tesla, Apple Watch team
- Referrals from personal network — first priority, highest signal
- Technical recruiter engagement for backend and mobile — NYC specialists (Paradigm Talent, SignalFire talent network, etc.)
- LinkedIn Recruiter for passive candidates
- YC Work at a Startup, Otta / Welcome to the Jungle, Wellfound
- Hardware-specific: TinyML forums, Embedded Weekly sponsor reach, Hackster.io
Interview Loop
Target 4 stages: (1) recruiter screen, (2) CTO screen, (3) technical deep-dive with CTO + existing team member, (4) values / founder conversation. Avoid 6+ round marathon loops — NYC senior talent has options and long loops lose candidates.
Technical assessment: take-home for mid-level, live pairing for senior. No whiteboard algorithms. Real problems from the codebase.
Compensation Philosophy
Top-of-market base (top 25%), moderate equity, no exotic benefits. Lead with mission and role scope. At a $15-20 subscription price point, this is a serious consumer hardware company, not a lifestyle SaaS.
Operational Cadence
Weekly Rhythm
- Monday: CEO 1:1, weekly priorities
- Tuesday: Engineering team standup (once the team exists; async written updates until then)
- Wednesday: APAC sync window (evening NYC / morning BKK)
- Thursday: Product/design review, hardware team sync
- Friday: Metrics review, hiring pipeline review, personal planning
Monthly Rhythm
- Technical metrics review (SLOs, incidents, deploy frequency, MTTR)
- Hiring pipeline review — conversion rates, offer acceptance, time-to-close
- Infrastructure cost review — budget vs actual, projections
- Risk register review — status updates, new risks, closed risks
- ADR additions — major decisions documented
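For the monthly technical metrics review, MTTR and deploy frequency reduce to simple arithmetic over incident and deploy records. The sketch below assumes incidents are stored as (detected, resolved) timestamp pairs and deploys as a list of timestamps; both record shapes and the function names are hypothetical.

```python
from datetime import datetime, timedelta
from statistics import mean


def mttr(incidents: list[tuple[datetime, datetime]]) -> timedelta:
    """Mean time to restore: average of (resolved - detected) across incidents."""
    durations = [(resolved - detected).total_seconds() for detected, resolved in incidents]
    return timedelta(seconds=mean(durations))


def deploys_per_week(deploy_times: list[datetime], period_days: int) -> float:
    """Average deploys per 7-day week over the review period."""
    return len(deploy_times) / (period_days / 7)


if __name__ == "__main__":
    start = datetime(2025, 1, 6, 9, 0)
    incidents = [
        (start, start + timedelta(minutes=30)),
        (start + timedelta(days=3), start + timedelta(days=3, minutes=90)),
    ]
    deploys = [start + timedelta(days=d) for d in range(0, 28, 2)]
    print("MTTR:", mttr(incidents))
    print("deploys/week:", deploys_per_week(deploys, period_days=28))
```

`mttr` raises `statistics.StatisticsError` on an empty list, which is a reasonable signal that a month had no incidents to average.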
Quarterly Rhythm
- Architecture review — what’s working, what’s not, what’s next
- Team retrospective — velocity, morale, process
- Roadmap alignment with CEO / CPO — next quarter priorities
- Budget planning for next quarter
Personal Risk Register
| Risk | Category | Mitigation |
|---|---|---|
| E2 visa constraint during work trial and post-conversion | Legal | C2C through own LLC, confirm with immigration lawyer, green card via spouse 1-2 years |
| Equity structure compatibility with visa and redomicile plan | Legal/Tax | Engage US startup lawyer + international tax advisor before signing |
| Covered expatriate tax exposure if redomicile after 8+ years as green card holder | Tax | Plan exit before 8-year mark; model exit tax with tax advisor |
| Thailand remittance tax rules on foreign income (2024 change) | Tax | Tax advisor planning for multi-jurisdiction income |
| First-engineering-hire CTO scope creep into everything | Role | Protect time for strategic work; hire first engineer fast |
| Founder alignment on decisions post-trial | Political | Get critical decisions in writing during trial |
| Cultural/communication gap with Thailand-based team | Management | Travel quota 30% APAC in year 1; documentation discipline |
| Burnout from NYC/BKK evening schedule | Personal | Structured overlap windows, not ad-hoc late nights |
Conversion Criteria — Trial to Permanent CTO
Before converting from C2C to permanent CTO role, confirmations required:
- Equity grant signed and executed (not verbal) with terms negotiated through personal counsel
- Clear authority defined in writing: hiring, budget, vendor decisions, architecture, firing
- Visa path resolved — either extended C2C or formal employment structure with immigration lawyer sign-off
- Runway confirmed — at least 12 months of burn at planned team size, or active raise timeline shared
- Cap table reviewed — understand dilution, preferred stack, board composition
- Hard decision point: 90 days ideal, 180 days maximum. Beyond that, convert or part ways.
Notes to Self
Things to remember during the trial:
- Don’t ship new features in first 30 days. Absorb the system.
- Talk to pilot users directly. All 300 if possible.
- Recorded KT sessions with contractors are the highest-leverage artifacts.
- Founders often underestimate hiring timeline. Push back with realistic numbers early.
- Dock spec is a gating dependency for multiple decisions. Escalate if not resolved in month 1.
- Model-on-Apollo3 validation is the single biggest technical risk. Make it a weekly status item.
- NYC office / hybrid decision affects first hires. Don’t defer.
- Keep the ADR discipline even when tired. Future-you needs the context.
- Protect one full day a week for deep work. Meetings expand to fill available space.
- Weekly personal journal entry on trust signals from founders. Three months of data decides conversion.