## 1. Perfect LifeOS architecture (target-state, decades, 2 people)

**Recommendation:** Use **PostgreSQL as the canonical system of record**, and treat everything else (vector indexes, search indexes, caches) as **rebuildable projections**. Keep time-series and semantic retrieval *logically* integrated through an **Event + Observation spine**, even if some data lands in specialized physical stores later.

### 1.1 Non‑negotiable principles

1. **One canonical truth layer** (durable, queryable, boring): Postgres.
2. **Append-only ingestion + provenance**: always store raw payloads and normalized facts side by side.
3. **Stable IDs**: every “thing that happened” becomes an **event_id** anchored in time; every measurement becomes an **observation** referencing person_id + timestamp (+ optional event_id).
4. **Rebuildable indexes**: embeddings/vector DB and any search layer must be derivable from canonical tables + raw objects.
5. **Memory hierarchy for AI**: raw → chunks → embeddings → summaries → “state snapshots”.

### 1.2 Storage layout (what goes where)

| Layer | Technology | Holds | Why |
| --- | --- | --- | --- |
| Canonical facts & metadata | **Postgres** | people, events, observations, meals, labs, finances, doc metadata, chunk text | Cross-domain joins + durable truth |
| High-frequency signals | **TimescaleDB** (Postgres extension) | CGM, HRV/RHR streams, location pings | Compression/retention + fast time-window queries |
| Raw artifacts | **S3-compatible object store** | PDFs, email MIME blobs, audio, large transcripts, attachments | Keeps Postgres lean; stores large blobs once |
| Semantic index | **Weaviate (or pgvector later)** | embeddings + metadata projection | Fast ANN + hybrid retrieval (kept rebuildable) |

**Direct callout:** I would not build on MinIO today.
Their GitHub repo was archived on Feb 13, 2026, which is the wrong direction for a decades-long foundation. ([GitHub][1]) So the “perfect” move is: **design to the S3 API**, but pick an actively maintained backend (SeaweedFS / Garage / etc.). In the Phase 1 blueprint below, I use **SeaweedFS** in single-container mode with S3 enabled. ([GitHub][2])

### 1.3 The data modeling philosophy (the spine)

**Recommendation:** Unify around **one spine** in Postgres, and let domain schemas “hang off” it.

**Core tables (conceptually):**

* **person**: Mark + wife (stable UUIDs)
* **event**: “Dinner”, “Workout”, “Lab draw”, “Conversation session”, “Supplement intake”, “Trip”, “Argument”, etc.
* **observation**: any structured measurement at a point in time (optionally linked to an event)
* **raw_external_payload**: request/response capture from external systems (Nutritionix, Whoop exports, lab PDFs, email providers, etc.)
* **(later) document + chunk**: unstructured text objects with chunking + embeddings + links to events

**Why this spine is “perfect” for cross-correlation:**

* “What did I eat before high-HRV days?” becomes:
  * filter observations/signals for the HRV threshold
  * pull the prior night’s meal events
  * correlate against meal items + nutrients
* “What were my ideas about X?” becomes:
  * semantic retrieval over chunks
  * rerank + group by event/time/project

### 1.4 Retrieval architecture for AI context injection

**Recommendation:** Do **hybrid retrieval + a summary ladder**, not “dump the DB into context”.

**At conversation start, build context in layers:**

1. **State snapshot** (cheap, structured):
   * last 24h sleep summary, recovery, training load, glucose excursions, today’s calendar, recent meals, recent supplements
2. **Relevant memories** (semantic + keyword):
   * vector search over chunks for the user’s query + conversation topic
   * keyword/FTS for precise entities (supplement names, lab markers, account names)
3. **Time-window joins**:
   * if the query is physiological (“zinc today?”), automatically pull the last N days of: supplements taken, GI symptoms, sleep, HRV, workouts, labs (if recent), glucose response patterns
4. **Rerank + compress**:
   * keep only the highest-signal items, and include citations back to event_ids/doc_ids so the assistant can drill down.

**Is pgvector “enough”?** At personal scale (even millions of chunks), yes. The “perfect” stance is:

* store embeddings in the canonical DB (or at least canonical text + chunk IDs)
* optionally maintain Weaviate as a **projection index** for speed

This keeps you migration-proof. For readiness checks and ops, Weaviate provides a readiness endpoint at `GET /v1/.well-known/ready`. ([Weaviate Documentation][3]) For API-key auth via Docker env vars, Weaviate documents `AUTHENTICATION_APIKEY_ALLOWED_KEYS`, `AUTHENTICATION_APIKEY_USERS`, and disabling anonymous access. ([Weaviate Documentation][4])

### 1.5 Ingestion pattern (the part that usually breaks by year 3)

**Recommendation:** Every connector follows the same contract:

1. **Land raw**
   * store the raw payload (JSON, PDF, MIME, audio) with `source_system`, `source_ref`, `captured_at`, `sha256`
2. **Normalize**
   * write domain facts (meals, labs, transactions) and tie them to an `event_id`
3. **Index** (rebuildable)
   * chunk/embed text; push to the vector index; store chunk metadata in Postgres
4. **Summarize**
   * write daily/weekly summaries into Postgres (state snapshots)

This pattern is what preserves clean migrations: if Weaviate/Qdrant/pgvector changes, you rebuild from canonical.
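The land-raw → normalize steps of the contract can be sketched in a few lines of Python. This is a minimal illustration, not the blueprint’s actual code: the field names on the normalized dict and the `normalize_meal` helper are mine, and the payload is a fake Nutritionix-style response. The point is the shape of the contract: the raw row carries `source_system`, `source_ref`, `captured_at`, and a content `sha256`, and the normalized fact carries a fresh `event_id` plus a provenance link back to the raw row.

```python
import hashlib
import json
import uuid
from dataclasses import dataclass, field
from datetime import datetime, timezone

@dataclass
class RawPayload:
    """Step 1 of the contract: land the raw bytes with provenance fields."""
    source_system: str   # e.g. "nutritionix"
    source_ref: str      # upstream identifier (request URL, message ID, ...)
    captured_at: datetime
    body: bytes
    sha256: str = field(init=False)

    def __post_init__(self) -> None:
        # A content hash makes re-ingestion idempotent and tamper-evident.
        self.sha256 = hashlib.sha256(self.body).hexdigest()

def normalize_meal(raw: RawPayload) -> dict:
    """Step 2: derive a normalized fact tied to a fresh event_id.

    Field names here are illustrative, not the blueprint's actual schema.
    """
    data = json.loads(raw.body)
    return {
        "event_id": str(uuid.uuid4()),
        "occurred_at": raw.captured_at.isoformat(),
        "source_sha256": raw.sha256,  # provenance link back to the raw row
        "items": [f["food_name"] for f in data.get("foods", [])],
    }

# A fake Nutritionix-style response, landed then normalized.
raw = RawPayload(
    source_system="nutritionix",
    source_ref="POST /v2/natural/nutrients",
    captured_at=datetime(2025, 1, 1, 8, 0, tzinfo=timezone.utc),
    body=json.dumps({"foods": [{"food_name": "eggs"}, {"food_name": "coffee"}]}).encode(),
)
meal = normalize_meal(raw)
print(meal["items"])  # ['eggs', 'coffee']
```

Because the hash is computed from the landed bytes, replaying a connector run can detect duplicates before writing, and a rebuild of any downstream index (steps 3–4) can always verify it started from the same raw payload.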
### 1.6 Migration-proofing decisions (do this now)

**Recommendation:** Make these irreversible choices *reversible*:

* **Never depend on a vector DB as the only copy of truth.**
* **Keep raw artifacts in open formats** (JSON, PDF, EML/MBOX, WAV/FLAC).
* **Use stable UUIDs** for person_id/event_id/doc_id/chunk_id.
* **Version embeddings** (`embedding_model`, `embedding_dim`, `embedding_version`).
* **Keep a deterministic chunking strategy** (so rebuilds produce compatible chunk IDs when possible).

---

## 2. Phase 1 implementation blueprint (Meals + Recipes, but spine-ready)

**Recommendation:** Ship Phase 1 with exactly what you described:

* the full Event + Observation spine exists
* only the nutrition tables are populated + tested
* API-only interface
* 5 Docker services healthy

I created a ready-to-run scaffold (compose + SQL init + FastAPI service) with your constraints baked in: [Download the Phase 1 blueprint zip](sandbox:/mnt/data/lifeos_phase1_blueprint.zip)

### 2.1 What’s inside the zip

**Services (5):**

1. `mb_lifeos_core_db` — Postgres core
2. `mb_lifeos_signals_db` — TimescaleDB signals (runs even if unused in v1); Timescale’s official image lists multiple architectures, including ARM variants ([Docker Hub][5])
3. `mb_lifeos_object_store` — SeaweedFS S3-compatible object store (single container, S3 enabled) ([GitHub][2])
4. `mb_lifeos_weaviate` — Weaviate with API-key auth + readiness healthcheck ([Weaviate Documentation][3])
5. `mb_lifeos_api` — FastAPI “LifeOS API” (Phase 1 endpoints)

**Networking:**

* Uses a dedicated Docker network, `mb_lifeos_net` (no collisions with your other stacks).
* By default **no host ports are published** (port inventory stays safe).
* An optional `docker-compose.debug-ports.yml` publishes loopback-only ports for testing.

**Secrets:**

* Designed for Doppler injection (`doppler run … docker compose up`).
* No secrets hardcoded. Doppler’s compose workflow relies on explicitly listing the env vars you want exposed to containers. ([Doppler][6])

### 2.2 Schema included (core + nutrition)

Core DB init scripts include:

* `core.person` (Mark + wife seeded with stable UUIDs from day one)
* `core.event` (the spine anchor)
* `core.observation` (generic structured measurements)
* `raw.external_payload` (captures Nutritionix request/response JSON)
* nutrition tables:
  * `nutrition.meal`
  * `nutrition.meal_item`
  * `nutrition.meal_item_nutrient` (optional normalized nutrient breakdown)
  * `nutrition.recipe`, `nutrition.recipe_ingredient` (schema present; can be empty in v1)

### 2.3 LifeOS API endpoints (Phase 1 “done”)

The API service implements:

* `POST /v1/meals/log`
  * takes `"text": "4 eggs and a coffee"`
  * calls the Nutritionix natural-language endpoint
  * writes the raw payload + normalized meal rows
* `GET /v1/meals/today`
  * returns today’s meals + totals

Nutritionix’s natural-language nutrients endpoint uses headers like `x-app-id`, `x-app-key`, and `x-remote-user-id`. ([developer.nutritionix.com][7])

### 2.4 How this matches your Phase 1 “done” definition

* **All 5 services running healthy in Docker** — each service has a healthcheck
* **Schema applied, both persons seeded** — core DB init SQL seeds Mark + wife IDs
* **“log my breakfast: 4 eggs and a coffee” works** — the API calls Nutritionix and stores results
* **“what did I eat today” works** — the API aggregates today’s meals

### 2.5 Caddy + Tailscale routing

The included `Caddyfile.snippet` shows how to route:

* `lifeos..ts.net` → API
* `weaviate..ts.net` → Weaviate
* `s3..ts.net` → object store

(It uses Docker DNS names on `mb_lifeos_net`, so you don’t need host ports.)

---

## 3. The “perfect” long-term interface layer (no SQL, no dashboards)

**Recommendation:** Don’t rely on “natural language → SQL” alone. Use a **tool-first LifeOS API** with:

* explicit, stable endpoints for each domain (log_meal, get_meals, log_supplement, query_labs, query_hrv_range, etc.)
* one “planner” endpoint for cross-domain queries that returns a query plan + executed results + provenance links
* a retrieval endpoint for semantic memory (search_chunks + filters)

Think of it like:

* **tools for writes**
* **hybrid retrieval + typed queries for reads**
* a **planner orchestrator** when the question spans domains

This is what keeps the assistant reliable over years.

---

## 4. Decide-before-you-build list (these are the expensive mistakes)

**Recommendation:** Lock these 7 decisions now:

1. **Canonical spine contract**
   * event_id + person_id + occurred_at are mandatory for anything meaningful
2. **Raw-first ingestion policy**
   * every connector must land raw + normalized + provenance IDs
3. **Chunking + embedding versioning**
   * store `chunk_id`, `embedding_model`, `embedding_version`, `dim` so rebuilds don’t break memory
4. **Index is not truth**
   * Weaviate/Qdrant/pgvector can change; canonical tables must be enough to rebuild
5. **Object store abstraction**
   * standardize on `s3://bucket/key` addressing even if the backend swaps later
6. **Time semantics**
   * everything is `timestamptz` + an explicit user timezone (America/New_York) for daily boundaries
7. **API-only interaction contract**
   * no ad hoc DB writes; all writes go through the LifeOS API so provenance stays intact

---

If you want, I can also generate the **Phase 2 schema expansion** (documents/chunks/embeddings + email model + lab panels + supplements/medications) in the same spine style, so your builder agent can keep shipping modules without rethinking foundations.
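To make the time-semantics decision above concrete: “what did I eat today” is a query over one America/New_York calendar day, computed against UTC-stored `timestamptz` rows. A minimal sketch using Python’s stdlib `zoneinfo` (the `day_bounds_utc` helper name is mine, not part of the blueprint):

```python
from datetime import date, datetime, time, timedelta, timezone
from zoneinfo import ZoneInfo

USER_TZ = ZoneInfo("America/New_York")  # the explicit user timezone from decision 6

def day_bounds_utc(d: date) -> tuple[datetime, datetime]:
    """Half-open UTC interval [start, end) covering one local calendar day.

    Rows stay stored as timestamptz/UTC; only the boundary math is zone-aware.
    """
    start_local = datetime.combine(d, time.min, tzinfo=USER_TZ)
    end_local = datetime.combine(d + timedelta(days=1), time.min, tzinfo=USER_TZ)
    return start_local.astimezone(timezone.utc), end_local.astimezone(timezone.utc)

start, end = day_bounds_utc(date(2025, 1, 15))
print(start.isoformat())  # 2025-01-15T05:00:00+00:00 (EST is UTC-5)

# DST transition days are not 24 hours long, so a hardcoded offset would be wrong:
s, e = day_bounds_utc(date(2025, 3, 9))  # US spring-forward day
print(e - s)  # 23:00:00
```

The half-open interval feeds directly into a `WHERE occurred_at >= start AND occurred_at < end` query, and the DST case is exactly why the boundary is derived from the zone rather than from a fixed `-05:00` offset.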
[1]: https://github.com/minio/minio/issues/21647
[2]: https://github.com/seaweedfs/seaweedfs
[3]: https://docs.weaviate.io/deploy/configuration/status "Cluster status and metadata | Weaviate Documentation"
[4]: https://docs.weaviate.io/deploy/configuration/authentication "Authentication | Weaviate Documentation"
[5]: https://hub.docker.com/r/timescale/timescaledb/tags
[6]: https://docs.doppler.com/docs/docker-compose
[7]: https://developer.nutritionix.com/docs/v2/authentication