Trade-offs & System Design

Each section covers a key architectural decision, the alternatives considered, and the reasoning behind the chosen approach — including what was rejected and why.


1. Serverless Postgres (Neon) vs. Traditional RDS

The Decision

MeetAI uses Neon DB (serverless Postgres) instead of a traditional managed RDS instance (e.g., AWS RDS, DigitalOcean Managed Postgres).

Why Neon?

| Factor | Neon DB (Serverless) | Traditional RDS |
|---|---|---|
| Cold start | ~100ms connection via HTTP driver | Always-on, 0ms (but you pay 24/7) |
| Scale-to-zero | Yes — $0 when idle | No — minimum ~$15/mo even idle |
| Connection model | HTTP-based pooling (`@neondatabase/serverless`) | Persistent TCP connections (pool exhaustion risk) |
| Branching | Git-like DB branches for preview deploys | Manual snapshot/restore |
| Vercel integration | Native — env vars auto-provisioned | Manual setup |
| Cost at low scale | Free tier: 0.5 GB, generous compute | $15-50/mo minimum |

The Deeper Trade-off

🎯 The fundamental trade-off is latency predictability vs. cost efficiency. Neon’s serverless driver (HTTP-based) adds ~10-30ms per query compared to a persistent TCP connection. For MeetAI, this is acceptable because:

  1. Transcript writes are batched (10 lines per HTTP call) — the per-line overhead is amortized
  2. Dashboard reads are server-rendered (RSC) — the DB latency is hidden in the page render
  3. Real-time captions don’t touch the DB at all — they flow through LiveKit data channels

The only latency-sensitive DB operation is the post-meeting summarization pipeline, which runs asynchronously via Inngest — users never wait for it.
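The batching in point 1 can be sketched as a small buffer that trades one HTTP round-trip for every N lines. This is an illustrative sketch, not the project's actual agent code; all names are hypothetical:

```typescript
// Hypothetical sketch of batched transcript writes: flushToDb stands in
// for the HTTP call to Neon.
type TranscriptLine = { speaker: string; text: string; ts: number };

class TranscriptBuffer {
  private lines: TranscriptLine[] = [];
  flushCount = 0; // number of HTTP round-trips actually made

  constructor(
    private batchSize: number,
    private flushToDb: (batch: TranscriptLine[]) => void,
  ) {}

  add(line: TranscriptLine): void {
    this.lines.push(line);
    if (this.lines.length >= this.batchSize) this.flush();
  }

  flush(): void {
    if (this.lines.length === 0) return;
    this.flushToDb(this.lines);
    this.flushCount++;
    this.lines = [];
  }
}
```

With a batch size of 10, storing 25 lines costs 3 round-trips (two full batches plus a final partial flush) instead of 25, which is why the per-query driver overhead amortizes away.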

Connection Architecture

```typescript
// src/db/index.ts — Using the serverless driver with pool support
import { Pool } from '@neondatabase/serverless';
import { drizzle } from 'drizzle-orm/neon-serverless';

const pool = new Pool({ connectionString: process.env.DATABASE_URL });

export const db = drizzle(pool); // Supports transactions
```
💡 Why Pool instead of the neon() HTTP driver? The Pool driver maintains a WebSocket-based connection pool that supports transactions — needed for multi-step operations like creating a meeting + adding the host as a participant atomically. The plain neon() HTTP driver is stateless and doesn’t support transactions.

When Would RDS Be Better?

  • Sub-5ms query requirements — e.g., a real-time multiplayer game leaderboard
  • Heavy write throughput — e.g., IoT sensor data at 10K writes/sec
  • Complex long-running transactions — e.g., financial systems with distributed locks
  • Predictable, always-on traffic — no benefit from scale-to-zero

2. SFU (LiveKit) vs. Raw WebRTC (Peer-to-Peer)

The Decision

MeetAI uses LiveKit’s SFU architecture instead of establishing direct peer-to-peer WebRTC connections between participants.

P2P vs. SFU vs. MCU

| Aspect | P2P (Mesh) | SFU (LiveKit) | MCU |
|---|---|---|---|
| Upload streams | N-1 per user | 1 per user | 1 per user |
| Server CPU | None | Low (forwarding only) | High (transcoding) |
| Scalability | 4-5 users max | 100+ users | 50+ users |
| Latency | Lowest (direct) | Low (1 hop) | Higher (transcode delay) |
| AI agent access | Impossible (no server) | Native (agent joins room) | Possible but complex |
| Cost | $0 server | $$ (SFU infra) | $$$ (CPU-intensive) |
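The upload-stream counts in the first row follow directly from the topology; a small sketch of the arithmetic (function names are illustrative):

```typescript
// Upstream stream counts per topology, matching the table above.
function uploadStreamsPerUser(topology: "p2p" | "sfu" | "mcu", n: number): number {
  switch (topology) {
    case "p2p":
      return n - 1; // mesh: one upstream per remote peer
    case "sfu": // single upstream; the server fans it out
    case "mcu": // single upstream; the server mixes
      return 1;
  }
}

// Total upstream streams the network carries across all participants.
function totalUpstream(topology: "p2p" | "sfu" | "mcu", n: number): number {
  return n * uploadStreamsPerUser(topology, n);
}
```

At 6 participants a mesh needs 5 upstreams per user (30 total) versus 1 per user for an SFU, which is why mesh calls top out around 4-5 users.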
🎯 The deciding factor for MeetAI wasn’t just scalability — it was AI agent integration. In a P2P mesh, there’s no server-side entity that can receive audio streams, process them through an LLM, and inject responses back. LiveKit’s SFU provides a first-class Agent Framework where server-side agents are treated as regular room participants. This is architecturally impossible with pure P2P WebRTC.

Why Not MCU?

An MCU (Multipoint Control Unit) could also host agents, but it transcodes all streams into a single mixed output. This means:

  1. Higher latency — transcode adds 200-500ms
  2. Higher cost — CPU-intensive video mixing
  3. Loss of per-track control — the agent can’t isolate individual speakers for accurate transcription
  4. No selective subscription — all participants get the same mixed stream
💡 Follow-up Answer: “An SFU forwards individual tracks without transcoding. This means the AI agent receives each participant’s audio as a separate stream, enabling accurate per-speaker transcription. An MCU would mix all audio into one stream, making speaker diarization significantly harder.”

LiveKit-Specific Advantages

  • Agent Framework — First-class support for server-side agents with audio pipeline
  • Data Channels — Reliable/unreliable data messaging (used for live captions)
  • RPC Mechanism — Targeted request-response between agent and specific participants
  • Room Metadata — JSON payload attached to rooms, read by agents on connect
  • Webhook System — room_finished events trigger post-processing pipelines
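For the data-channel captions, a minimal payload sketch; the message shape here is an assumption for illustration, since LiveKit itself only transports opaque bytes:

```typescript
// Hypothetical caption message sent over a LiveKit data channel.
type CaptionMessage = {
  type: "caption";
  speakerIdentity: string;
  text: string;
  final: boolean; // interim vs. finalized STT result
};

// Data channels carry raw bytes, so the payload is serialized to UTF-8 JSON.
function encodeCaption(msg: CaptionMessage): Uint8Array {
  return new TextEncoder().encode(JSON.stringify(msg));
}

function decodeCaption(payload: Uint8Array): CaptionMessage {
  return JSON.parse(new TextDecoder().decode(payload)) as CaptionMessage;
}
```

Captions marked `final: false` can be rendered and then overwritten, while `final: true` lines are the ones worth buffering for the transcript.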

3. Event-driven Queues (Inngest) vs. Cron Jobs

The Decision

MeetAI uses Inngest’s event-driven durable functions instead of cron-based polling for post-meeting AI processing.

The Fundamental Difference

| Aspect | Cron Jobs | Inngest (Event-driven) |
|---|---|---|
| Trigger | Time-based polling | Event-based (instant reaction) |
| Latency | 0-60s delay (depends on cron interval) | ~100ms after event |
| Wasted work | Polls even when nothing changed | Only runs when events fire |
| Retry granularity | Entire job retries from scratch | Individual steps retry |
| Observability | Roll your own logging | Built-in dashboard, step traces |
| Concurrency | Manual locking (DB flags, Redis) | Handled by Inngest runtime |
| Idempotency | Manual implementation | Step-level deduplication |
🎯 The killer feature of Inngest isn’t just event-driven execution — it’s step-level durability. The summarization pipeline has 4 steps:

  1. Mark meeting as “processing” (DB write)
  2. Format transcript (DB read + transform)
  3. Call Gemini API (external, slow, flaky)
  4. Save summary (DB write)

If Step 3 (Gemini API) fails after 30 seconds, a cron job would re-run all 4 steps — including the potentially expensive DB reads. Inngest resumes from Step 3 only, because Steps 1-2 are checkpointed.
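That resume-from-checkpoint behavior can be illustrated with a toy runner; this is not the real Inngest API, just an in-memory simulation of `step.run`-style memoization:

```typescript
// Toy step runner: completed steps are checkpointed, so a retry re-runs
// only the steps that never finished.
class StepRunner {
  private checkpoints = new Map<string, unknown>();
  executions: string[] = []; // every step body that actually executed

  run<T>(name: string, fn: () => T): T {
    if (this.checkpoints.has(name)) return this.checkpoints.get(name) as T; // resume
    this.executions.push(name);
    const result = fn(); // a throw here leaves no checkpoint for this step
    this.checkpoints.set(name, result);
    return result;
  }
}

// The 4-step pipeline from above; step 3 fails on the first attempt.
const runner = new StepRunner();
let geminiHealthy = false;

function summarizePipeline(): string {
  runner.run("mark-processing", () => "processing");
  const transcript = runner.run("format-transcript", () => "formatted transcript");
  const summary = runner.run("call-gemini", () => {
    if (!geminiHealthy) throw new Error("Gemini timeout");
    return `summary of: ${transcript}`;
  });
  return runner.run("save-summary", () => summary);
}

let firstAttemptFailed = false;
try { summarizePipeline(); } catch { firstAttemptFailed = true; }

geminiHealthy = true;
const result = summarizePipeline(); // retry resumes at step 3
```

On the retry, Steps 1-2 are served from checkpoints; only `call-gemini` and `save-summary` actually execute.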

Why Not a Simple Queue (BullMQ, SQS)?

A basic message queue solves the “don’t poll” problem but doesn’t provide:

| Feature | BullMQ / SQS | Inngest |
|---|---|---|
| Step-level checkpointing | No — entire job retries | Yes |
| Built-in dashboard | BullMQ Board (separate) | Yes (dev + prod) |
| Serverless-native | Needs a worker process | Runs on your existing server |
| No infrastructure | Redis / SQS setup required | Hosted service (or self-host) |
| Type-safe events | Manual schema validation | TypeScript event schemas |
💡 Follow-up Answer: “The distinction is between a queue (BullMQ) and a workflow engine (Inngest). A queue delivers a message and retries the consumer. A workflow engine breaks the consumer into individually retriable steps with checkpoint semantics. For multi-step AI pipelines where one step can take 30+ seconds and fail, step-level retry is essential.”

Cost Comparison at MeetAI Scale

Assuming ~500 meetings/month (small SaaS):

| Solution | Monthly Cost | Operational Overhead |
|---|---|---|
| Cron + Vercel Serverless | $0 (Vercel free tier) | High (manual retries, logging, locking) |
| BullMQ + Redis (Upstash) | ~$10 (Redis) + worker hosting | Medium (Redis ops, worker management) |
| Inngest (hosted) | $0 (free tier: 5K runs) | Low (zero infra, built-in dashboard) |

4. Drizzle ORM vs. Prisma

The Decision

MeetAI uses Drizzle ORM instead of Prisma for database access.

| Aspect | Drizzle | Prisma |
|---|---|---|
| Query style | SQL-like (relational) | Object-based (ActiveRecord-like) |
| Bundle size | ~50KB | ~2MB (engine binary) |
| Cold start impact | Minimal | +300-500ms (engine init) |
| Serverless fit | Excellent | Poor (engine overhead) |
| Schema definition | TypeScript (code-first) | .prisma DSL (separate language) |
| Type inference | From schema code directly | Generated client (prisma generate) |
| Edge runtime | Full support | Limited (edge-compatible client separate) |
🎯 In a Vercel serverless environment, every cold start pays the cost of initializing the ORM. Prisma’s query engine binary adds 300-500ms to cold starts. Drizzle is a thin TypeScript layer that adds negligible overhead. For an app like MeetAI with frequent serverless invocations (API routes, server components, webhooks), this difference compounds significantly.


5. JSONB Transcript vs. Normalized Table

The Decision

Transcript data is stored as a JSONB array on the meetings row instead of a separate transcript_lines table.

| Approach | JSONB Column | Normalized Table |
|---|---|---|
| Read pattern | Single row fetch = full transcript | JOIN or separate query with pagination |
| Write pattern | Array concat (jsonb_set) | INSERT per line |
| Query individual lines | Not indexed (full scan) | Indexed columns |
| Data co-location | With meeting (cache-friendly) | Separate table (extra I/O) |
| Schema flexibility | Schemaless within JSONB | Rigid column types |
🎯 This is a classic read-optimized vs. write-optimized trade-off. MeetAI’s access pattern is overwhelmingly read-heavy:

  • Write: Agent batches 10 lines per HTTP call → jsonb_set array concat (amortized)
  • Read: Dashboard always loads full transcript → single row fetch (no JOIN)

A normalized transcript_lines table would add a JOIN for every transcript render and create N rows per meeting (potentially thousands). JSONB keeps the data co-located and avoids the N+1 problem entirely.
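As an illustration of the write path, one way to express the batched append in Postgres is the jsonb `||` concatenation operator. This query is an assumption for illustration, not necessarily the project's exact SQL, and the helper is hypothetical:

```typescript
// Builds a parameterized query that appends a batch of lines to the
// transcript jsonb array in a single UPDATE. In Postgres, `||`
// concatenates two jsonb arrays; COALESCE handles a NULL transcript.
type Line = { speaker: string; text: string };

function buildAppendQuery(meetingId: string, lines: Line[]) {
  return {
    text:
      "UPDATE meetings SET transcript = COALESCE(transcript, '[]'::jsonb) || $1::jsonb WHERE id = $2",
    values: [JSON.stringify(lines), meetingId],
  };
}
```

One UPDATE per 10-line batch keeps the write amplification low while the read side stays a single-row fetch.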

When to Normalize Instead

  • Full-text search on individual lines — JSONB @> operators are slower than GIN-indexed text columns
  • Per-line metadata updates — e.g., marking lines as “action items” independently
  • Cross-meeting transcript search — “Find all meetings where I discussed ‘API design’”

6. Agent-side Transcript Storage vs. Client-side Buffering

The Decision

Transcript lines are stored from the agent process (server-side), not from the user’s browser (client-side).

| Factor | Client-side | Agent-side |
|---|---|---|
| Reliability | beforeunload is unreliable (tab crash, mobile) | Agent lifecycle is controlled |
| Data completeness | Misses lines if client disconnects | Agent sees all conversation turns |
| Security | Client can tamper with transcript | Agent is trusted server-side process |
| Ordering | Race conditions across multiple clients | Single source of truth (sequential index) |
| Battery/bandwidth | Client resources consumed for storage | Offloaded to agent server |
🎯 “I moved transcript storage from client-side to agent-side because the beforeunload event is fundamentally unreliable for data persistence. On mobile browsers, tab switches can kill the page without firing the event. The agent process has a deterministic lifecycle — it can flush its buffer in the shutdown() handler, which executes before the process exits. This changed our transcript loss rate from ~5% to effectively 0%.”


Summary: Decision Matrix

| Decision | Chose | Over | Primary Reason |
|---|---|---|---|
| Database | Neon (Serverless Postgres) | RDS, PlanetScale | Scale-to-zero cost model, Vercel integration |
| Real-time | LiveKit (SFU) | Raw WebRTC P2P | AI agent integration requires server-side audio access |
| Background Jobs | Inngest | Cron, BullMQ | Step-level durability for multi-step AI pipelines |
| ORM | Drizzle | Prisma | Serverless cold start performance, bundle size |
| Transcript Storage | JSONB column | Normalized table | Read-optimized for dashboard pattern |
| Transcript Writer | Agent-side | Client-side | Reliability of server-side lifecycle vs. beforeunload |