# Trade-offs & System Design
Each section covers a key architectural decision, the alternatives considered, and the reasoning behind the chosen approach — including what was rejected and why.
## 1. Serverless Postgres (Neon) vs. Traditional RDS

### The Decision
MeetAI uses Neon DB (serverless Postgres) instead of a traditional managed RDS instance (e.g., AWS RDS, DigitalOcean Managed Postgres).
### Why Neon?
| Factor | Neon DB (Serverless) | Traditional RDS |
|---|---|---|
| Cold start | ~100ms connection via HTTP driver | Always-on, 0ms (but you pay 24/7) |
| Scale-to-zero | Yes — $0 when idle | No — minimum ~$15/mo even idle |
| Connection model | HTTP-based pooling (@neondatabase/serverless) | Persistent TCP connections (pool exhaustion risk) |
| Branching | Git-like DB branches for preview deploys | Manual snapshot/restore |
| Vercel integration | Native — env vars auto-provisioned | Manual setup |
| Cost at low scale | Free tier: 0.5 GB, generous compute | $15-50/mo minimum |
### The Deeper Trade-off
The fundamental trade-off is latency predictability vs. cost efficiency. Neon’s serverless driver (HTTP-based) adds ~10-30ms per query compared to a persistent TCP connection. For MeetAI, this is acceptable because:
- Transcript writes are batched (10 lines per HTTP call) — the per-line overhead is amortized
- Dashboard reads are server-rendered (RSC) — the DB latency is hidden in the page render
- Real-time captions don’t touch the DB at all — they flow through LiveKit data channels
The only latency-sensitive DB operation is the post-meeting summarization pipeline, which runs asynchronously via Inngest — users never wait for it.
### Connection Architecture
```typescript
// src/db/index.ts — Using the serverless driver with pool support
import { Pool } from '@neondatabase/serverless';
import { drizzle } from 'drizzle-orm/neon-serverless';

const pool = new Pool({ connectionString: process.env.DATABASE_URL });
export const db = drizzle(pool); // Supports transactions
```

Why `Pool` instead of the `neon()` HTTP driver? The `Pool` driver maintains a WebSocket-based connection pool that supports transactions — needed for multi-step operations like creating a meeting + adding the host as a participant atomically. The plain `neon()` HTTP driver is stateless and doesn’t support transactions.
### When Would RDS Be Better?
- Sub-5ms query requirements — e.g., a real-time multiplayer game leaderboard
- Heavy write throughput — e.g., IoT sensor data at 10K writes/sec
- Complex long-running transactions — e.g., financial systems with distributed locks
- Predictable, always-on traffic — no benefit from scale-to-zero
## 2. SFU (LiveKit) vs. Raw WebRTC (Peer-to-Peer)

### The Decision
MeetAI uses LiveKit’s SFU architecture instead of establishing direct peer-to-peer WebRTC connections between participants.
### P2P vs. SFU vs. MCU
| Aspect | P2P (Mesh) | SFU (LiveKit) | MCU |
|---|---|---|---|
| Upload streams | N-1 per user | 1 per user | 1 per user |
| Server CPU | None | Low (forwarding only) | High (transcoding) |
| Scalability | 4-5 users max | 100+ users | 50+ users |
| Latency | Lowest (direct) | Low (1 hop) | Higher (transcode delay) |
| AI agent access | Impossible (no server) | Native (agent joins room) | Possible but complex |
| Cost | $0 server | $$ (SFU infra) | $$$ (CPU-intensive) |
The deciding factor for MeetAI wasn’t just scalability — it was AI agent integration. In a P2P mesh, there’s no server-side entity that can receive audio streams, process them through an LLM, and inject responses back. LiveKit’s SFU provides a first-class Agent Framework where server-side agents are treated as regular room participants. This is architecturally impossible with pure P2P WebRTC.
### Why Not MCU?
An MCU (Multipoint Control Unit) could also host agents, but it transcodes all streams into a single mixed output. This means:
- Higher latency — transcode adds 200-500ms
- Higher cost — CPU-intensive video mixing
- Loss of per-track control — the agent can’t isolate individual speakers for accurate transcription
- No selective subscription — all participants get the same mixed stream
Follow-up Answer: “An SFU forwards individual tracks without transcoding. This means the AI agent receives each participant’s audio as a separate stream, enabling accurate per-speaker transcription. An MCU would mix all audio into one stream, making speaker diarization significantly harder.”
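A toy model of the diarization point in that answer (pure TypeScript, not the LiveKit API): with per-track forwarding, each caption event keeps its source participant, whereas a mixed stream retains only the text:

```typescript
// Toy illustration (not the LiveKit API): an SFU forwards each participant's
// audio as a separate track, so speaker attribution comes for free. An MCU
// hands the agent one mixed stream, discarding the attribution.
type AudioEvent = { participantId: string; text: string };

// SFU-style: events arrive tagged with their source track; grouping is trivial.
function transcriptBySpeaker(events: AudioEvent[]): Map<string, string[]> {
  const bySpeaker = new Map<string, string[]>();
  for (const e of events) {
    const lines = bySpeaker.get(e.participantId) ?? [];
    lines.push(e.text);
    bySpeaker.set(e.participantId, lines);
  }
  return bySpeaker;
}

// MCU-style: after mixing, only the combined content survives; recovering the
// speaker requires a separate diarization model.
function mixedTranscript(events: AudioEvent[]): string[] {
  return events.map((e) => e.text); // participantId is lost in the mix
}
```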
### LiveKit-Specific Advantages
- Agent Framework — First-class support for server-side agents with audio pipeline
- Data Channels — Reliable/unreliable data messaging (used for live captions)
- RPC Mechanism — Targeted request-response between agent and specific participants
- Room Metadata — JSON payload attached to rooms, read by agents on connect
- Webhook System — `room_finished` events trigger post-processing pipelines
## 3. Event-driven Queues (Inngest) vs. Cron Jobs

### The Decision
MeetAI uses Inngest’s event-driven durable functions instead of cron-based polling for post-meeting AI processing.
### The Fundamental Difference
| Aspect | Cron Jobs | Inngest (Event-driven) |
|---|---|---|
| Trigger | Time-based polling | Event-based (instant reaction) |
| Latency | 0-60s delay (depends on cron interval) | ~100ms after event |
| Wasted work | Polls even when nothing changed | Only runs when events fire |
| Retry granularity | Entire job retries from scratch | Individual steps retry |
| Observability | Roll your own logging | Built-in dashboard, step traces |
| Concurrency | Manual locking (DB flags, Redis) | Handled by Inngest runtime |
| Idempotency | Manual implementation | Step-level deduplication |
The killer feature of Inngest isn’t just event-driven execution — it’s step-level durability. The summarization pipeline has 4 steps:
1. Mark meeting as “processing” (DB write)
2. Format transcript (DB read + transform)
3. Call Gemini API (external, slow, flaky)
4. Save summary (DB write)
If Step 3 (Gemini API) fails after 30 seconds, a cron job would re-run all 4 steps — including the potentially expensive DB reads. Inngest resumes from Step 3 only, because Steps 1-2 are checkpointed.
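The checkpoint semantics can be illustrated with a toy step runner. This is a simplified in-memory model, not Inngest's actual API (the real thing is async and persists checkpoints durably), but it shows why a retry re-executes only the step that failed:

```typescript
// Toy model of step-level durability (not Inngest's real API): results of
// completed steps are checkpointed, so a retry skips straight to the failure.
class StepRunner {
  private checkpoints = new Map<string, unknown>();

  run<T>(name: string, fn: () => T): T {
    if (this.checkpoints.has(name)) {
      return this.checkpoints.get(name) as T; // replayed, not re-executed
    }
    const result = fn(); // a throw here leaves this step un-checkpointed
    this.checkpoints.set(name, result);
    return result;
  }
}
```

On the first attempt, Steps 1 and 2 run and checkpoint, then Step 3 throws. On retry, Steps 1 and 2 are replayed from their checkpoints and only Step 3 (and then Step 4) actually executes.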
### Why Not a Simple Queue (BullMQ, SQS)?
A basic message queue solves the “don’t poll” problem but doesn’t provide:
| Feature | BullMQ / SQS | Inngest |
|---|---|---|
| Step-level checkpointing | No — entire job retries | Yes |
| Built-in dashboard | BullMQ Board (separate) | Yes (dev + prod) |
| Serverless-native | No (needs a long-running worker process) | Yes (steps run inside your existing serverless app) |
| No infrastructure | Redis / SQS setup required | Hosted service (or self-host) |
| Type-safe events | Manual schema validation | TypeScript event schemas |
Follow-up Answer: “The distinction is between a queue (BullMQ) and a workflow engine (Inngest). A queue delivers a message and retries the consumer. A workflow engine breaks the consumer into individually retriable steps with checkpoint semantics. For multi-step AI pipelines where one step can take 30+ seconds and fail, step-level retry is essential.”
### Cost Comparison at MeetAI Scale
Assuming ~500 meetings/month (small SaaS):
| Solution | Monthly Cost | Operational Overhead |
|---|---|---|
| Cron + Vercel Serverless | $0 (Vercel free tier) | High (manual retries, logging, locking) |
| BullMQ + Redis (Upstash) | ~$10 (Redis) + worker hosting | Medium (Redis ops, worker management) |
| Inngest (hosted) | $0 (free tier: 5K runs) | Low (zero infra, built-in dashboard) |
## 4. Drizzle ORM vs. Prisma

### The Decision
MeetAI uses Drizzle ORM instead of Prisma for database access.
| Aspect | Drizzle | Prisma |
|---|---|---|
| Query style | SQL-like (relational) | Object-based (ActiveRecord-like) |
| Bundle size | ~50KB | ~2MB (engine binary) |
| Cold start impact | Minimal | +300-500ms (engine init) |
| Serverless fit | Excellent | Poor (engine overhead) |
| Schema definition | TypeScript (code-first) | .prisma DSL (separate language) |
| Type inference | From schema code directly | Generated client (prisma generate) |
| Edge runtime | Full support | Limited (edge-compatible client separate) |
In a Vercel serverless environment, every cold start pays the cost of initializing the ORM. Prisma’s query engine binary adds 300-500ms to cold starts. Drizzle is a thin TypeScript layer that adds negligible overhead. For an app like MeetAI with frequent serverless invocations (API routes, server components, webhooks), this difference compounds significantly.
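As a sketch of what “TypeScript (code-first)” means in practice, here is a minimal drizzle-orm schema. The table and column names are illustrative assumptions, not MeetAI's actual schema:

```typescript
// Illustrative drizzle-orm schema (not MeetAI's actual tables): the schema is
// plain TypeScript, so query result types are inferred directly from this file
// with no codegen step and no separate schema language.
import { jsonb, pgTable, text, timestamp, uuid } from 'drizzle-orm/pg-core';

export const meetings = pgTable('meetings', {
  id: uuid('id').primaryKey().defaultRandom(),
  title: text('title').notNull(),
  // Transcript stored as a JSONB array on the meeting row (see section 5).
  transcript: jsonb('transcript').$type<{ speaker: string; text: string }[]>(),
  createdAt: timestamp('created_at').defaultNow().notNull(),
});
```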
## 5. JSONB Transcript vs. Normalized Table

### The Decision
Transcript data is stored as a JSONB array on the meetings row instead of a separate transcript_lines table.
| Approach | JSONB Column | Normalized Table |
|---|---|---|
| Read pattern | Single row fetch = full transcript | JOIN or separate query with pagination |
| Write pattern | Array concat (jsonb_set) | INSERT per line |
| Query individual lines | Not indexed (full scan) | Indexed columns |
| Data co-location | With meeting (cache-friendly) | Separate table (extra I/O) |
| Schema flexibility | Schemaless within JSONB | Rigid column types |
This is a classic read-optimized vs. write-optimized trade-off. MeetAI’s access pattern is overwhelmingly read-heavy:
- Write: Agent batches 10 lines per HTTP call → `jsonb_set` array concat (amortized)
- Read: Dashboard always loads full transcript → single row fetch (no JOIN)
A normalized transcript_lines table would add a JOIN for every transcript render and create N rows per meeting (potentially thousands). JSONB keeps the data co-located and avoids the N+1 problem entirely.
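The append-style write can be expressed as a single UPDATE per batch using Postgres's `||` jsonb concatenation operator. The TypeScript below is a toy model of those semantics, not MeetAI's actual code:

```typescript
// Toy model of the JSONB write path. Server-side it is one UPDATE per batch:
//   UPDATE meetings
//   SET transcript = COALESCE(transcript, '[]'::jsonb) || $1::jsonb
//   WHERE id = $2;
// (|| concatenates two jsonb arrays.) Key point: one statement per *batch*
// of lines, versus one INSERT per line with a normalized table.
type Line = { speaker: string; text: string };

function appendBatch(transcript: Line[] | null, batch: Line[]): Line[] {
  // COALESCE(transcript, '[]') || batch
  return [...(transcript ?? []), ...batch];
}
```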
### When to Normalize Instead
- Full-text search on individual lines — JSONB `@>` operators are slower than GIN-indexed text columns
- Per-line metadata updates — e.g., marking lines as “action items” independently
- Cross-meeting transcript search — “Find all meetings where I discussed ‘API design’”
## 6. Agent-side Transcript Storage vs. Client-side Buffering

### The Decision
Transcript lines are stored from the agent process (server-side), not from the user’s browser (client-side).
| Factor | Client-side | Agent-side |
|---|---|---|
| Reliability | `beforeunload` is unreliable (tab crash, mobile) | Agent lifecycle is controlled |
| Data completeness | Misses lines if client disconnects | Agent sees all conversation turns |
| Security | Client can tamper with transcript | Agent is trusted server-side process |
| Ordering | Race conditions across multiple clients | Single source of truth (sequential index) |
| Battery/bandwidth | Client resources consumed for storage | Offloaded to agent server |
“I moved transcript storage from client-side to agent-side because the `beforeunload` event is fundamentally unreliable for data persistence. On mobile browsers, tab switches can kill the page without firing the event. The agent process has a deterministic lifecycle — it can flush its buffer in the `shutdown()` handler, which executes before the process exits. This changed our transcript loss rate from ~5% to effectively 0%.”
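The shutdown-flush pattern in that quote can be sketched as follows; the class name and wiring are illustrative assumptions, not MeetAI's actual agent code:

```typescript
// Illustrative sketch (not MeetAI's actual agent code): the agent keeps an
// in-memory buffer and drains it from its shutdown hook, which runs
// deterministically before the process exits, unlike beforeunload in a browser.
class AgentTranscript {
  private pending: string[] = [];

  record(line: string): void {
    this.pending.push(line);
  }

  // Called from the agent's shutdown hook; returns the drained lines so the
  // caller can persist them in one final DB write.
  drain(): string[] {
    const batch = this.pending;
    this.pending = [];
    return batch;
  }
}

// Hypothetical wiring: flush on graceful termination of the agent process.
// const transcript = new AgentTranscript();
// process.on('SIGTERM', () => {
//   persistLines(transcript.drain()); // persistLines is an assumed helper
//   process.exit(0);
// });
```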
## Summary: Decision Matrix
| Decision | Chose | Over | Primary Reason |
|---|---|---|---|
| Database | Neon (Serverless Postgres) | RDS, PlanetScale | Scale-to-zero cost model, Vercel integration |
| Real-time | LiveKit (SFU) | Raw WebRTC P2P | AI agent integration requires server-side audio access |
| Background Jobs | Inngest | Cron, BullMQ | Step-level durability for multi-step AI pipelines |
| ORM | Drizzle | Prisma | Serverless cold start performance, bundle size |
| Transcript Storage | JSONB column | Normalized table | Read-optimized for dashboard pattern |
| Transcript Writer | Agent-side | Client-side | Reliability of server-side lifecycle vs. beforeunload |