Self-Hosting LiveKit in Production: 2026 Ops Guide
The demo takes an afternoon. Production takes a plan. LiveKit is the best self-hostable SFU available today, but the gap between “it works on my machine” and “it survives Monday at 10:00” is exactly where most teams get hurt: sizing guesses, a Kubernetes setup that fights WebRTC, no monitoring until the first outage. This guide is the checklist I run when I take a self-hosted LiveKit deployment to production — with ballpark numbers from public benchmarks and production load tests. Treat every figure here as a starting point and verify it against your own traffic shape; room size distribution changes everything.
When self-hosting LiveKit makes sense
Two reasons hold up in practice. First, jurisdiction: if you sell EU data residency — telehealth, notarization, legal — you need media and telemetry pinned to EU infrastructure, which a managed control plane can’t fully promise. Second, economics: past a certain volume of participant-minutes, per-minute pricing grows faster than the revenue it supports, while an SFU node costs the same whether it’s 10% or 70% loaded. If neither applies to you, use LiveKit Cloud and spend your engineering time elsewhere — self-hosting is a business decision, not an ideology.
Sizing a self-hosted LiveKit server
An SFU doesn’t mix media, it forwards it. So capacity scales with subscribed tracks, not rooms or “users”. In a meeting-shaped room of N participants, each publishing camera and mic and subscribing to everyone else, the server forwards roughly N × (N−1) video subscriptions plus N × (N−1) audio subscriptions. Ten 4-person rooms are a very different load from one 40-person room with the same headcount.
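To make the room-shape point concrete, here is a minimal sketch of the subscription count. It assumes the meeting shape described above (everyone publishes one camera and one mic and subscribes to everyone else); the function name is mine, not part of any LiveKit tooling.

```python
def forwarded_tracks(room_sizes):
    """Count the video and audio subscriptions the SFU forwards.

    Assumes every participant publishes one camera and one mic track
    and subscribes to every other participant in the same room.
    """
    video = sum(n * (n - 1) for n in room_sizes)
    audio = video  # audio follows the same N x (N-1) pattern
    return video, audio


ten_small_rooms = forwarded_tracks([4] * 10)  # 40 people in 4-person rooms
one_large_room = forwarded_tracks([40])       # 40 people in a single room

print(ten_small_rooms)  # (120, 120)
print(one_large_room)   # (1560, 1560)
```

Same headcount, thirteen times the forwarded tracks. Real rooms rarely subscribe to everyone at full quality, so treat this as an upper bound for the meeting shape.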
Start with bandwidth, then validate CPU. LiveKit’s public 16-core benchmark reports a 150-publisher / 150-subscriber 720p room at 85% CPU and a 1-to-3,000 livestream at 92% CPU, which is useful as a ceiling, not a promise for your room shape. For small rooms on a typical 1 Gbps port, network is usually the first hard wall:
- A 720p simulcast publisher uploads ≈2–2.5 Mbps; each video subscription costs 1–2.5 Mbps of egress depending on which simulcast layer the subscriber gets.
- 100 two-way participants in 4-person rooms = 25 rooms × 12 video subscriptions = 300 forwarded video tracks, plus roughly 300 audio tracks. Budget ≈0.4–0.8 Gbps of egress before protocol overhead, recording and TURN relay traffic.
- Hetzner’s 20 TB of included traffic helps at low duty cycles, but it is not the main production argument, because at sustained peak it burns fast. The argument is the overage rate: ~€1/TB after the included 20 TB, versus about $90/TB on metered clouds.
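The arithmetic behind those bullets can be sketched as below. The per-subscription video rates and the traffic prices come from the list above; the audio bitrate, the 0.5 Gbps average and the 30% duty cycle are my assumptions for illustration, so swap in your own measurements.

```python
AUDIO_MBPS = 0.03  # assumption: roughly 30 kbps per audio subscription


def egress_gbps(video_subs, audio_subs, video_mbps):
    """Raw egress in Gbps at a given per-subscription video bitrate."""
    return (video_subs * video_mbps + audio_subs * AUDIO_MBPS) / 1000


def monthly_traffic_tb(gbps, duty_cycle):
    """Terabytes sent in a 30-day month at an average duty cycle (0..1)."""
    seconds = 30 * 24 * 3600
    return gbps * duty_cycle * seconds / 8 / 1000  # Gbit -> GB -> TB


# 100 participants in 4-person rooms: 300 video + 300 audio subscriptions
low = egress_gbps(300, 300, video_mbps=1.0)
high = egress_gbps(300, 300, video_mbps=2.5)
print(f"raw egress: {low:.2f} to {high:.2f} Gbps")

# Assumed: 0.5 Gbps at peak, busy 30% of the month
tb = monthly_traffic_tb(0.5, duty_cycle=0.3)
overage_eur = max(0.0, tb - 20) * 1.0  # ~EUR 1/TB after 20 TB included
metered_usd = tb * 90.0                # ~USD 90/TB on metered clouds
print(f"{tb:.1f} TB/month: ~EUR {overage_eur:.0f} overage vs ~USD {metered_usd:.0f} metered")
```

The raw figure lands slightly under the budget quoted above, which is rounded up for safety. Set duty_cycle to 1.0 to see how quickly sustained peak burns through the included 20 TB.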
| Hetzner instance | vCPU | Realistic role |
|---|---|---|
| CX32 (shared) | 4 | Pilots and internal tools, up to ~50 concurrent two-way participants |
| CCX23 (dedicated) | 4 | First production node, ~100–150 two-way participants in small rooms |
| CCX33 (dedicated) | 8 | A few hundred two-way participants; comfortable single-node default |
| CCX43 (dedicated) | 16 | Large rooms or >1 Gbps ports; pair with a separate egress node |
Two rules regardless of instance type. Keep 30–40% CPU headroom: WebRTC degrades before the CPU graph looks scary, because congestion control and keyframe storms bite first. And don’t guess — measure with the official load tester against your actual room shape:
lk load-test \
  --url wss://lk.example.com \
  --api-key "$LK_KEY" --api-secret "$LK_SECRET" \
  --room load-test \
  --video-publishers 50 --subscribers 200 \
  --duration 5m
Single node or LiveKit on Kubernetes?
Start with the boring answer: one big dedicated box runs livekit-server, TURN and Redis under docker compose or systemd, with host networking and zero orchestration tax. Most workloads under a few hundred concurrent participants fit this shape, upgrades are a systemd restart during a quiet hour, and there is nothing between your users’ UDP packets and the SFU. Scale vertically first.
Kubernetes earns its keep when you’re already k8s-native or need rolling upgrades and node pools across regions. LiveKit runs well there — the official Helm chart works — but WebRTC makes the defaults wrong:
- hostNetwork: true is effectively mandatory: media needs a real UDP port range (e.g. 50000–60000) on the node’s public IP, not a ClusterIP.
- One livekit-server pod per node (DaemonSet or anti-affinity): two SFUs can’t share one node’s port range.
- Only signaling (WebSocket, port 7880) goes behind your ingress or load balancer; media flows directly to node IPs. A vanilla LoadBalancer in front of RTP is how you get mysterious one-way video.
- Plan graceful drains: set a generous terminationGracePeriodSeconds and drain nodes so rooms end or migrate instead of dying mid-call on every deploy.
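These rules are easy to lose in a chart upgrade, so it can help to lint the rendered pod spec in CI. The sketch below works on a plain dict using standard Kubernetes field names; the function name and the one-hour threshold are my assumptions, and it does not talk to a cluster or to the Helm chart.

```python
MIN_GRACE_SECONDS = 3600  # assumption: longer than your longest typical call


def lint_pod_spec(kind, pod_spec):
    """Return a list of problems found in a livekit-server pod spec dict."""
    problems = []
    if not pod_spec.get("hostNetwork", False):
        problems.append("hostNetwork must be true so media uses the node's UDP ports")
    # Kubernetes defaults terminationGracePeriodSeconds to 30 when unset
    if pod_spec.get("terminationGracePeriodSeconds", 30) < MIN_GRACE_SECONDS:
        problems.append("terminationGracePeriodSeconds is too short for a graceful drain")
    anti_affinity = pod_spec.get("affinity", {}).get("podAntiAffinity")
    if kind != "DaemonSet" and not anti_affinity:
        problems.append("use a DaemonSet or podAntiAffinity: one SFU pod per node")
    return problems


spec = {"hostNetwork": True, "terminationGracePeriodSeconds": 30}
for problem in lint_pod_spec("Deployment", spec):
    print("-", problem)
```

It only checks that anti-affinity is present, not that the rule is correct, so it catches the regression but not a bad selector.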
A minimal production livekit.yaml looks the same in both worlds:
port: 7880
rtc:
  tcp_port: 7881
  port_range_start: 50000
  port_range_end: 60000
  use_external_ip: true
redis:
  address: redis:6379
turn:
  enabled: true
  domain: turn.example.com
  tls_port: 5349
Scaling beyond one node
Adding Redis turns livekit-server into a cluster: nodes register through Redis, each room is pinned to exactly one node, and signaling is relayed so any node can accept any participant. The operational consequence people miss: your largest single room must fit on one node — the cluster spreads rooms, not participants of one room. If your product has a “webinar for 2,000” feature, size for that room first and the cluster second.
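For capacity planning, the "rooms, not participants" rule turns into a bin-packing estimate. This is a planning sketch only: it is not how LiveKit selects nodes, the per-node capacity is a number you would take from your own load test, and it assumes meeting-shaped rooms where load is N × (N−1) video subscriptions.

```python
def plan_nodes(room_sizes, node_capacity_tracks):
    """First-fit-decreasing estimate of the nodes needed for a set of rooms.

    Raises ValueError if a single room exceeds one node, because a room
    cannot be split across nodes.
    """
    loads = sorted((n * (n - 1) for n in room_sizes), reverse=True)
    if loads and loads[0] > node_capacity_tracks:
        raise ValueError("largest room does not fit on one node")
    free_per_node = []
    for load in loads:
        for i, free in enumerate(free_per_node):
            if load <= free:
                free_per_node[i] = free - load
                break
        else:
            # No existing node has room: add one
            free_per_node.append(node_capacity_tracks - load)
    return len(free_per_node)


# Hypothetical capacity of 400 forwarded video tracks per node
print(plan_nodes([4] * 25 + [20], node_capacity_tracks=400))  # 2
```

A 40-person meeting (1,560 tracks) raises immediately at that capacity, which is the point: the big room sets the node size, and only then does the node count matter.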
For multi-region, run a node pool plus TURN per region and route users by latency (geo-DNS or your own logic at token-issue time). Keep egress workers on separate nodes from the SFU: compositing is CPU-bursty, and SIP↔WebRTC p95 latency can jump from “fine” to “everyone notices” when room-composite egress lands on the SFU node.
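If you route at token-issue time, the selection logic can be very small. Here is a minimal sketch, assuming the client reports a round-trip time per region and you track region health yourself; the region names and URLs are hypothetical.

```python
REGIONS = {  # hypothetical region names and signaling URLs
    "eu-central": "wss://eu-central.lk.example.com",
    "eu-west": "wss://eu-west.lk.example.com",
}


def pick_region(rtt_ms, healthy):
    """Return the healthy, known region with the lowest reported RTT."""
    candidates = {
        region: ms
        for region, ms in rtt_ms.items()
        if region in healthy and region in REGIONS
    }
    if not candidates:
        raise LookupError("no healthy region with a latency sample")
    return min(candidates, key=candidates.get)


region = pick_region({"eu-central": 18.0, "eu-west": 31.5}, healthy={"eu-central", "eu-west"})
print(region, REGIONS[region])  # eu-central wss://eu-central.lk.example.com
```

Return the chosen URL alongside the access token so the client connects to that region's signaling endpoint. Everyone joining the same room has to get the same answer, so in practice you pick the region once per room and cache it.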
TURN, firewalls, and the enterprise-network tax
In consumer traffic, most connections go direct or through a NAT that STUN handles. Sell to enterprises and the picture changes: corporate firewalls that allow only 443/TCP are common, and 10–20% of real-world sessions end up needing TURN. Run LiveKit’s embedded TURN with TLS on 5349 (or 443 if you can dedicate the IP), give it a proper certificate, and test with a client forced to relay-only before your customers do it for you. When a prospect says “it doesn’t work from the office”, this is almost always where you’ll be digging.
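Expired or mismatched certificates are a common way for TURN over TLS to break quietly, so a scheduled check is cheap insurance. The sketch below uses only the Python standard library to open a verified TLS connection and report the days left on the certificate; the function names are mine, and it assumes TLS terminates on the TURN port itself rather than on a load balancer in front of it.

```python
import socket
import ssl
import time


def days_left(not_after, now=None):
    """Days until a certificate's notAfter timestamp (OpenSSL text format)."""
    expires = ssl.cert_time_to_seconds(not_after)
    now = time.time() if now is None else now
    return (expires - now) / 86400


def turn_cert_days_left(host, port=5349, timeout=5.0):
    """Connect with full verification and return the certificate's days left.

    Raises ssl.SSLCertVerificationError if the chain or hostname is invalid.
    """
    context = ssl.create_default_context()
    with socket.create_connection((host, port), timeout=timeout) as sock:
        with context.wrap_socket(sock, server_hostname=host) as tls:
            return days_left(tls.getpeercert()["notAfter"])


# Example (needs network access to your own TURN host):
# print(turn_cert_days_left("turn.example.com"))
```

This only proves the TLS layer is healthy. It does not allocate a relay, so the relay-only client test is still the one that tells you whether calls actually work from a locked-down network.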
Monitoring LiveKit in production
livekit-server exports Prometheus metrics (set prometheus_port: 6789, or the prometheus: block in newer configs). Scrape them, but be deliberate about what pages you:
- Node CPU > 70% for 5 minutes. Not 90 — by then subscribers are already seeing quality drops from congestion control.
- Process RSS slope, not absolute. Alert when RSS grows monotonically for hours (a practical threshold: >20 MB/h sustained over 6 h). That’s the signature of a goroutine or buffer leak, and it gives you days of warning before the 03:40 OOM.
- Join-time p95 from a synthetic probe. A headless bot that joins, publishes and subscribes every few minutes measures the only latency users actually feel. Alert above 500 ms.
- ICE/TURN failure rate > 2%. Rising relay failures usually mean a certificate, port or pool-allocation problem — this is a leading indicator of “customers can’t join” tickets.
- Egress with zero publishers. A room-composite recorder running against an empty room burns CPU and storage for nothing; reap idle rooms and alert on recordings without publishers. This can triple a storage bill in a month.
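Two of these alerts need a little math that is easy to get wrong in a dashboard. Below is a sketch of the RSS-slope and join-time checks using the thresholds from the list above; the function names and the sample format are my assumptions, and in production you would more likely express this in your alerting system's own query language.

```python
def rss_slope_mb_per_hour(samples):
    """Least-squares slope of (unix_seconds, rss_bytes) samples, in MB/h."""
    n = len(samples)
    if n < 2:
        raise ValueError("need at least two samples")
    mean_t = sum(t for t, _ in samples) / n
    mean_r = sum(r for _, r in samples) / n
    num = sum((t - mean_t) * (r - mean_r) for t, r in samples)
    den = sum((t - mean_t) ** 2 for t, _ in samples)
    if den == 0:
        raise ValueError("samples must span more than one timestamp")
    return num / den * 3600 / 1e6  # bytes/s -> MB/h, with 1 MB = 10**6 bytes


def p95(values):
    """Nearest-rank 95th percentile."""
    if not values:
        raise ValueError("no values")
    ordered = sorted(values)
    rank = (95 * len(ordered) + 99) // 100  # integer ceil(0.95 * n)
    return ordered[rank - 1]


def should_page(rss_samples, join_ms):
    """Page on >20 MB/h RSS growth over at least 6 h, or join p95 above 500 ms."""
    window_h = (rss_samples[-1][0] - rss_samples[0][0]) / 3600
    leaking = window_h >= 6 and rss_slope_mb_per_hour(rss_samples) > 20
    return leaking or p95(join_ms) > 500


hourly = [(i * 3600, 500e6 + i * 25e6) for i in range(7)]  # +25 MB every hour
print(should_page(hourly, join_ms=[180, 210, 240, 260]))   # True
```

A fitted slope is more forgiving than a strict "monotonic growth" test: one garbage-collection dip won't reset it, but a sawtooth pattern can still fool it, so look at the graph before you restart anything.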
Also ship logs somewhere queryable. Room-close reasons, ICE candidate-pair failures and SIP re-INVITE bursts don’t show up in metrics, and every debugging session starts with them.
The failures that actually page you
Patterns worth pre-empting before users discover them:
- Goroutine leaks in the signaling path after SIP re-INVITE storms — flat CPU, climbing RSS, OOM at night. Caught by the RSS-slope alert above.
- Voice-agent workers (e.g. turn-detector models) with unbounded audio buffers ballooning to gigabytes on long silent rooms; livekit/agents #4869 is a public example of the same failure class. Bound the buffers, add restart policies, cap worker memory.
- TURN relay pools misallocated across regions, adding half a second of silence for SIP callers on join. Pin relay pools per region and watch join p95.
- Egress running on rooms nobody closed. An idle-room reaper is twenty lines of code against the server API and pays for itself the first month.
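The reaper's decision logic is the part worth getting right; the API calls around it are plumbing. Here is a minimal sketch, assuming you map your server API's room listing onto the hypothetical RoomInfo shape below and track how long each room has been without publishers. The five-minute idle limit is also an assumption.

```python
from dataclasses import dataclass


@dataclass
class RoomInfo:
    """Hypothetical shape: fill it from your server API's room listing."""
    name: str
    num_publishers: int
    idle_seconds: float   # time since the last publisher left
    egress_active: bool   # a recording or composite is running


def rooms_to_reap(rooms, max_idle_seconds=300.0):
    """Names of rooms with no publishers for longer than the idle limit."""
    return [
        room.name
        for room in rooms
        if room.num_publishers == 0 and room.idle_seconds >= max_idle_seconds
    ]


def recordings_without_publishers(rooms):
    """Names of rooms where egress is running but nobody is publishing."""
    return [r.name for r in rooms if r.egress_active and r.num_publishers == 0]


rooms = [
    RoomInfo("standup", num_publishers=3, idle_seconds=0, egress_active=True),
    RoomInfo("forgotten", num_publishers=0, idle_seconds=5400, egress_active=True),
    RoomInfo("just-ended", num_publishers=0, idle_seconds=20, egress_active=False),
]
print(rooms_to_reap(rooms))                  # ['forgotten']
print(recordings_without_publishers(rooms))  # ['forgotten']
```

Run it on a timer, stop egress and delete each returned room through the server API, and feed the second list into the "egress with zero publishers" alert. The idle limit keeps it from killing a room during a brief reconnect.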
FAQ
How many participants can one LiveKit server handle?
Wrong unit: count subscribed tracks. On a typical 1 Gbps port, bandwidth caps you before CPU. At the 0.4–0.8 Gbps per 100 participants budgeted in the sizing section, that is roughly 125–250 two-way participants in 4-person rooms, and closer to 200–400 when most subscribers receive lower simulcast layers; audio, TURN and recording eat into either figure. The CPU ceiling can be higher, especially on one-to-many streams; run lk load-test against your own traffic pattern.
Do I need Kubernetes to run LiveKit?
No. A single dedicated server with docker compose or systemd is the most reliable shape for most workloads and the easiest to debug. Kubernetes is justified when you already operate it well or genuinely need multi-node orchestration — and then only with host networking and per-node pod placement.
Can LiveKit media and telemetry stay inside the EU?
Yes — that’s the core argument for self-hosting. Run the SFU, TURN and egress on EU providers (Hetzner, OVH, Scaleway), keep monitoring EU-hosted, and nothing about a call transits US-controlled infrastructure. Document the data flow once and your DPA conversations get dramatically shorter.
Should I self-host LiveKit or use LiveKit Cloud?
If you have no jurisdiction constraint and your volume is modest, use Cloud — honestly. Self-host when EU residency is a feature you sell, or when your participant-minute volume makes the economics obvious. If you’re unsure, an afternoon of load testing and a cost model will tell you; ideology won’t.
Running LiveKit in production?
I do production audits, EU-jurisdiction deployments and ops retainers for self-hosted LiveKit — fixed scope, everything documented, no lock-in.