Jitsi Meet Self-Hosted

By SumGuy 13 min read
Zoom Calls Every Tuesday and You’re Done Asking Permission

Zoom calls every Tuesday. Three participants, no big deal. Except any recordings and transcripts live in Zoom’s cloud, on Zoom’s terms, and you have no way to verify what happens to your conference room footage once it’s there.

You’ve got infrastructure. A homelab, maybe a bare-metal server collecting dust, or a VPS you’re already paying for. So why rent someone else’s video conferencing when you can run your own?

Jitsi Meet is the answer. It’s an open-source video conferencing platform that doesn’t require you to be a distributed systems engineer to run. You deploy it, throw it behind a reverse proxy, point some DNS at it, and boom: you’ve got an encrypted video call system where the only server that ever touches your media is one you control.

But here’s the real talk: Jitsi is not Zoom. It won’t have that polish, the killer AI-powered features, or the comfort of knowing someone charges money to keep it running. What it will give you is control, privacy, and the smug satisfaction of knowing you’re hosting your own infrastructure. That’s worth the slight edge of “yeah, sometimes the audio drops for a second.”

Let’s build this thing.


What Is Jitsi Meet, Really?

Jitsi Meet is open-source video conferencing software maintained by the Jitsi team at 8x8 (which bought the project from Atlassian in 2018; it’s still genuinely open-source). The architecture is modular, which is both its strength and the reason you need to understand what’s happening under the hood.

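When you hit a Jitsi instance, you’re actually interacting with several components working together:

  - Web (Nginx): serves the React frontend and terminates HTTPS.
  - Prosody: the XMPP server that carries signaling.
  - Jicofo: the conference focus that orchestrates rooms and bridges.
  - JVB (Jitsi Videobridge): the component that forwards audio and video.
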
The magic is that all of this runs on a single server for small deployments. One VM, four services, all containerized. For three people on a Tuesday, your $20/month VPS is enough.

But here’s the catch: the videobridge is stateful and CPU-hungry. Each participant adds overhead. At around 10 simultaneous participants, you start feeling it. At 20, your server is sweating. At 50, you need a different architecture.
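
To see why the load climbs so fast, count streams. In a plain SFU model where everyone sends one stream and receives everyone else’s, the bridge forwards N × (N − 1) streams. The sketch below is back-of-the-envelope only: the function name is made up for illustration, and it ignores simulcast and last-N, both of which Jitsi uses to trim the real numbers.

```python
def sfu_stream_counts(participants: int) -> tuple[int, int]:
    """Return (inbound, outbound) media streams an SFU handles for one call.

    Assumes every participant sends one stream and receives all the others.
    """
    if participants < 0:
        raise ValueError("participants must be non-negative")
    inbound = participants
    outbound = participants * max(participants - 1, 0)
    return inbound, outbound


for n in (3, 10, 20, 50):
    inbound, outbound = sfu_stream_counts(n)
    print(f"{n:>2} participants: {inbound} streams in, {outbound} streams out")
```

Inbound grows linearly, outbound grows with the square of the room size, which is why the jump from 10 to 20 people hurts far more than the jump from 3 to 10.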

We’ll get to scaling later. For now, assume you’re deploying on a single VM.


Architecture: How This Thing Actually Works

Before you docker-compose your way to a working instance, understand the traffic flow. It matters.

When you join a call in Jitsi, here’s what happens:

  1. Browser loads your-jitsi.example.com — Nginx/reverse proxy serves the React frontend (HTTPS).
  2. Browser initiates signaling via XMPP (WebSocket over HTTPS) to Prosody. This is how the browser tells the system “I want to join room X with user Y.”
  3. Jicofo orchestrates. It tells the JVB “allocate a bridge for this conference” and tells each participant’s browser “here’s your peer connection info.”
  4. Each browser establishes a WebRTC connection to the JVB and sends its audio and video there. The JVB forwards those streams to the other participants without decoding or mixing them. This is the Selective Forwarding Unit (SFU) model, as opposed to an MCU that mixes everything into one stream on the server. (Two-person calls are the exception: by default Jitsi connects the two browsers directly, peer-to-peer.)

That browser-to-JVB connection is the critical part: the JVB must be reachable on UDP port 10000 from every participant’s machine. This is why NAT traversal becomes your biggest headache.

For most home and office connections, UDP/10000 outbound works fine. But when participants sit behind carrier-grade NAT (bad ISP), symmetric NAT (restrictive corporate networks), or multiple layers of firewalls, the connection dies. Enter: TURN servers.

A TURN server (Traversal Using Relays around NAT) is a fallback relay. If a browser can’t reach the JVB directly over UDP, it sends its media to the TURN server instead, usually over TCP or TLS on port 443, and the TURN server passes it on. That adds latency and load, but it gets through almost any firewall that allows HTTPS.

For a homelab, you either add a TURN server to your Jitsi setup, or you pray your participants aren’t behind the worst NAT possible. (Spoiler: some will be. Add a TURN server.)


Deployment: The Docker Way

The official Jitsi repository provides a battle-tested Docker Compose setup. You’re not starting from scratch; you’re just customizing it.

The minimum setup is dead simple:

.env
# .env (keep this in your docker-compose directory)
# Variable names change between releases; treat env.example in the repo as the source of truth.
PUBLIC_URL=https://jitsi.example.com
ENABLE_LETSENCRYPT=1
LETSENCRYPT_DOMAIN=jitsi.example.com
LETSENCRYPT_EMAIL=admin@example.com
# JWT auth (set this to enable)
ENABLE_AUTH=1
AUTH_TYPE=jwt
JWT_APP_ID=myapp
JWT_APP_SECRET=your-super-secret-key-min-32-chars-long!
# STUN servers the JVB uses to discover its public address
JVB_STUN_SERVERS=stun.l.google.com:19302,stun1.l.google.com:19302
# TURN server (OPTIONAL but HIGHLY RECOMMENDED)
TURN_HOST=turn.example.com
TURN_PORT=443
TURN_TRANSPORT=tcp
TURN_CREDENTIALS=your-turn-secret
# JVB heap size
VIDEOBRIDGE_MAX_MEMORY=3072m
# Misc
TZ=UTC

This .env file controls nearly everything. PUBLIC_URL and LETSENCRYPT_DOMAIN must match your DNS record, or certificate issuance fails. The JWT_APP_SECRET should be at least 32 characters: an HS256 signature is only as strong as the key behind it (seriously, don’t skimp).
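
A quick sanity check before the first `docker compose up` saves a debugging session later. The sketch below parses a .env file and flags a weak JWT secret; the helper names are hypothetical, and the 32-character floor is simply the rule of thumb from this post, not something Jitsi enforces.

```python
import secrets


def parse_env(text: str) -> dict[str, str]:
    """Parse KEY=VALUE lines, skipping blank lines and # comments."""
    values = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        values[key.strip()] = value.strip()
    return values


def check_jwt_settings(env: dict[str, str], min_length: int = 32) -> list[str]:
    """Return a list of problems with the JWT settings (empty list means OK)."""
    problems = []
    if env.get("AUTH_TYPE") == "jwt":
        if not env.get("JWT_APP_ID"):
            problems.append("JWT_APP_ID is not set")
        if len(env.get("JWT_APP_SECRET", "")) < min_length:
            problems.append(f"JWT_APP_SECRET is shorter than {min_length} characters")
    return problems


def new_secret() -> str:
    """Generate a URL-safe secret from 32 random bytes."""
    return secrets.token_urlsafe(32)


if __name__ == "__main__":
    sample = "AUTH_TYPE=jwt\nJWT_APP_ID=myapp\nJWT_APP_SECRET=hunter2\n"
    print(check_jwt_settings(parse_env(sample)))
    print("Suggested secret:", new_secret())
```

Generating the secret with `secrets` rather than typing one by hand also sidesteps the temptation to reuse a password you already know.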

Now the Docker Compose file. Clone the official repo first:

Terminal window
git clone https://github.com/jitsi/docker-jitsi-meet.git
cd docker-jitsi-meet
cp env.example .env
# Fill in strong random passwords for the internal service accounts
./gen-passwords.sh
# Edit .env with your settings (see above)

The official docker-compose.yml is massive and imports a lot of config from .env. For self-hosting, you probably want a docker-compose.override.yml to adjust resource limits and add TURN support:

docker-compose.override.yml
# No top-level "version" key: current Docker Compose treats it as obsolete.
services:
  # JVB needs CPU and memory, so don't starve it
  jvb:
    mem_limit: 3500m
    memswap_limit: 4000m
    cpus: "2.0"
    environment:
      # The two TCP settings may be ignored by recent images (newer JVB
      # releases dropped the TCP harvester); check your release before relying on them.
      JVB_TCP_HARVESTER_DISABLED: "false"
      JVB_TCP_PORT: 4443
      # Fallback ICE servers (critical for NAT traversal)
      JVB_STUN_SERVERS: "stun.l.google.com:19302,stun1.l.google.com:19302"
      JVB_OPTS: "-Dnet.java.sip.communicator.impl.protocol.jabber.SEND_PRESENCE_SUBSCRIPTION_FIRST=true"
  # Prosody (XMPP server)
  prosody:
    mem_limit: 1000m
    memswap_limit: 1200m
    cpus: "1.0"
  # Jicofo (conference orchestrator)
  jicofo:
    mem_limit: 800m
    memswap_limit: 1000m
    cpus: "1.0"
  # Web frontend
  web:
    mem_limit: 500m
    memswap_limit: 600m
    cpus: "0.5"
  # Optional: Jibri for recording
  # jibri:
  #   mem_limit: 2000m
  #   memswap_limit: 2500m
  #   cpus: "2.0"
  #   cap_add:
  #     - SYS_ADMIN
  #   devices:
  #     - /dev/snd:/dev/snd
These resource limits prevent one container from consuming your entire server. JVB gets the most (it’s doing the heavy lifting), Prosody and Jicofo are modest, and the web frontend is tiny.
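
Keep in mind that these limits are ceilings, not reservations, so it’s worth adding them up against what the host actually has. A small sketch that does the arithmetic for the four services above (the helper names are made up; the numbers are copied from the override file):

```python
def parse_mem(limit: str) -> int:
    """Convert a Compose memory string such as '3500m' or '2g' to MiB."""
    units = {"k": 1 / 1024, "m": 1, "g": 1024}
    value, unit = limit[:-1], limit[-1].lower()
    if unit not in units:
        raise ValueError(f"unsupported unit in {limit!r}")
    return int(float(value) * units[unit])


# (mem_limit, cpus) per service, as set in docker-compose.override.yml
LIMITS = {
    "jvb": ("3500m", 2.0),
    "prosody": ("1000m", 1.0),
    "jicofo": ("800m", 1.0),
    "web": ("500m", 0.5),
}


def totals(limits: dict[str, tuple[str, float]]) -> tuple[int, float]:
    """Sum the memory (MiB) and CPU ceilings across all services."""
    mem = sum(parse_mem(mem_limit) for mem_limit, _ in limits.values())
    cpus = sum(cpu for _, cpu in limits.values())
    return mem, cpus


mem_mib, cpus = totals(LIMITS)
print(f"Ceilings add up to {mem_mib} MiB and {cpus} CPUs")
```

The ceilings sum to 5800 MiB and 4.5 CPUs. On a 2-core, 4 GB box that means the caps alone won’t stop the stack from exhausting the host, so scale them down to fit whatever machine you’re actually running.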


Networking: The Part That Will Haunt You

You’ve got Docker running. Services are up. You hit https://jitsi.example.com in a browser and the UI loads. You click “create room” and it looks good.

Then you try to join from your phone or a friend’s computer behind their home router, and suddenly the audio is terrible or the connection drops.

This is the JVB UDP port issue.

The JVB listens on UDP/10000. For WebRTC to work, that port must be:

  1. Open at your firewall (inbound from the internet).
  2. Port-forwarded if your Jitsi server is behind NAT (which it probably is in a homelab).
  3. Reachable by the participant’s browser (not blocked by their ISP or corporate network).

Here’s the firewall config for a typical homelab:

Terminal window
# On your Jitsi server (UFW example)
sudo ufw allow 80/tcp
sudo ufw allow 443/tcp
sudo ufw allow 10000/udp
sudo ufw allow 22/tcp
# If you're behind a router/firewall, forward:
# WAN:10000/UDP → LAN:10000/UDP (to your Jitsi server's internal IP)
# Check it's working. UDP has no handshake, so nc alone can't confirm delivery;
# watch for the packet on the server while you send it from outside.
# On the server:
sudo tcpdump -ni any udp port 10000
# From a machine on the public internet:
echo probe | nc -u -w 3 your-jitsi.example.com 10000
# If tcpdump prints nothing, the firewall or port forwarding is dropping the packet
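
Because UDP has no handshake, the sender can never tell on its own whether a datagram arrived; you need something listening on the far side. Here’s a minimal sketch of that idea with two hypothetical helpers. To use the listener on the real port you’d have to stop the Jitsi stack first, since the JVB already owns UDP/10000.

```python
import socket


def send_udp_probe(host: str, port: int, payload: bytes = b"jitsi-probe") -> int:
    """Send one UDP datagram and return the number of bytes handed to the OS."""
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
        return sock.sendto(payload, (host, port))


def wait_for_probe(sock: socket.socket, timeout: float = 3.0):
    """Wait for one datagram on an already-bound socket.

    Returns (data, sender_address), or None if nothing arrives in time.
    """
    sock.settimeout(timeout)
    try:
        return sock.recvfrom(2048)
    except socket.timeout:
        return None


if __name__ == "__main__":
    # Loopback demo: bind to a free port, send a probe to it, read it back.
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as listener:
        listener.bind(("127.0.0.1", 0))
        port = listener.getsockname()[1]
        send_udp_probe("127.0.0.1", port)
        print(wait_for_probe(listener, timeout=1.0))
```

For the real test, bind the listener to 0.0.0.0 on port 10000 on the server, run `send_udp_probe` from a machine outside your network, and see whether the datagram shows up with the sender’s public address.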

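But here’s the thing: opening the port on your side only solves half the problem. If your own ISP puts you behind carrier-grade NAT, you can’t forward a port at all, because the public IP isn’t yours. And on the participant’s side, corporate networks routinely block outbound UDP. Forget about it.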

That’s why TURN servers exist. If UDP/10000 direct isn’t available, TURN relays the media over TCP/443 (which almost everything can reach). The overhead is real—more CPU, more latency—but it’s better than a broken call.

Deploying a TURN server is out of scope here (it’s a whole separate thing). Note that Google’s public STUN servers (they’re in the .env already) are not a substitute: STUN only tells an endpoint what its public address looks like, it never relays media. For production in a homelab, consider running coturn alongside your Jitsi instance or on a cheap VPS, or use a managed TURN service.
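
If you do go the coturn route, the shared secret is usually not sent to browsers directly. With coturn’s `use-auth-secret` mode, short-lived credentials are derived from it: the username is an expiry timestamp plus a label, and the password is a base64-encoded HMAC-SHA1 of that username. Prosody normally generates these for you, so treat the sketch below as an illustration of the scheme; the function name and defaults are mine.

```python
import base64
import hashlib
import hmac
import time


def turn_credentials(shared_secret: str, user: str = "jitsi", ttl: int = 3600, now=None):
    """Derive time-limited TURN credentials from a shared secret.

    Returns (username, password), valid until `ttl` seconds after `now`.
    """
    issued_at = time.time() if now is None else now
    expiry = int(issued_at) + ttl
    username = f"{expiry}:{user}"
    digest = hmac.new(shared_secret.encode(), username.encode(), hashlib.sha1).digest()
    password = base64.b64encode(digest).decode()
    return username, password


if __name__ == "__main__":
    username, password = turn_credentials("your-turn-secret")
    print("username:", username)
    print("password:", password)
```

The TURN server recomputes the same HMAC from the username it receives, so a leaked credential stops working once the timestamp passes, and the long-lived secret never leaves your servers.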


Authentication: Lock It Down

By default, Jitsi allows anyone on the internet who knows your domain to create and join rooms. Great for a public meeting space. Terrible for privacy.

The simplest auth method is JWT (JSON Web Tokens). Set ENABLE_AUTH=1 and AUTH_TYPE=jwt in .env, and now room creation and guest access require a valid JWT token signed with your secret.

The token flow:

  1. Your app generates a JWT signed with JWT_APP_SECRET, including claims like iss (your app ID), room (room name), and exp (expiry), plus an optional context block describing the user.
  2. Participant joins: https://jitsi.example.com/room?jwt=TOKEN.
  3. Jitsi verifies the signature and grants access.

Here’s a quick Python example to generate a token:

generate_jwt.py
# Requires PyJWT: pip install pyjwt
import time

import jwt

SECRET = "your-super-secret-key-min-32-chars-long!"  # must match JWT_APP_SECRET
APP_ID = "myapp"  # must match JWT_APP_ID


def generate_token(user_id, room_name, user_name="Guest"):
    """Build a signed token that admits one user to one room for an hour."""
    payload = {
        "aud": "jitsi",
        "iss": APP_ID,
        "sub": "*",  # domain/tenant claim; "*" accepts any, or pin it to your XMPP domain
        "room": room_name,
        "exp": int(time.time()) + 3600,  # 1 hour validity
        "context": {
            "user": {
                "id": user_id,
                "name": user_name,
            }
        },
    }
    return jwt.encode(payload, SECRET, algorithm="HS256")


# Usage
token = generate_token("user123", "standup", "Alice")
print(f"Join at: https://jitsi.example.com/standup?jwt={token}")
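
It helps to know what “verifies the signature” actually means on the server side. The sketch below re-implements HS256 signing and checking with nothing but the standard library. It is not Prosody’s code, just the same idea in miniature, and the function names are hypothetical.

```python
import base64
import hashlib
import hmac
import json
import time


def _b64url_encode(raw: bytes) -> str:
    return base64.urlsafe_b64encode(raw).rstrip(b"=").decode()


def _b64url_decode(part: str) -> bytes:
    # JWTs strip base64 padding, so restore it before decoding.
    return base64.urlsafe_b64decode(part + "=" * (-len(part) % 4))


def sign_hs256(payload: dict, secret: str) -> str:
    """Build header.payload.signature, signed with HMAC-SHA256."""
    header = {"alg": "HS256", "typ": "JWT"}
    signing_input = ".".join(
        _b64url_encode(json.dumps(part, separators=(",", ":")).encode())
        for part in (header, payload)
    )
    signature = hmac.new(secret.encode(), signing_input.encode(), hashlib.sha256).digest()
    return f"{signing_input}.{_b64url_encode(signature)}"


def verify_hs256(token: str, secret: str, now=None) -> dict:
    """Return the payload if the signature and expiry check out, else raise ValueError."""
    try:
        header_b64, payload_b64, signature_b64 = token.split(".")
    except ValueError:
        raise ValueError("token must have three dot-separated parts") from None
    header = json.loads(_b64url_decode(header_b64))
    if header.get("alg") != "HS256":
        raise ValueError("unexpected algorithm")
    expected = hmac.new(
        secret.encode(), f"{header_b64}.{payload_b64}".encode(), hashlib.sha256
    ).digest()
    # compare_digest avoids leaking how many leading bytes matched.
    if not hmac.compare_digest(expected, _b64url_decode(signature_b64)):
        raise ValueError("bad signature")
    payload = json.loads(_b64url_decode(payload_b64))
    current_time = time.time() if now is None else now
    if payload.get("exp", 0) <= current_time:
        raise ValueError("token expired")
    return payload


if __name__ == "__main__":
    demo = sign_hs256({"room": "standup", "exp": int(time.time()) + 60}, "demo-secret")
    print(verify_hs256(demo, "demo-secret"))
```

Anyone holding the secret can mint tokens for any room, which is why it belongs in the backed-up-and-locked-away pile and never in client-side code.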

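For tighter integration, AUTH_TYPE=ldap hooks into your directory, and AUTH_TYPE=internal uses accounts created in Prosody. There’s no built-in SAML option in the Docker images that I’m aware of; the usual workaround is to have your identity provider’s login flow mint JWTs. Guests are only admitted when ENABLE_GUESTS=1, so leave that unset if everything should require authentication.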


Recording: Jibri and the Phantom Browser

Jitsi can record calls using Jibri (Jitsi Broadcasting Infrastructure). It’s a bit weird: Jibri is literally a headless browser instance that joins the call and records the screen. It works, but it’s resource-hungry.

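To enable recording, set ENABLE_RECORDING=1 in .env and start the stack with the repo’s jibri.yml layered on top (docker compose -f docker-compose.yml -f jibri.yml up -d). The service definition looks roughly like this:

# In jibri.yml (settings as given here; compare with the file shipped in your release)
jibri:
  image: jitsi/jibri:${JIBRI_TAG}
  cap_add:
    - SYS_ADMIN
  devices:
    - /dev/snd:/dev/snd
  environment:
    JIBRI_RECORDER_UI_ENABLED: "false"
    JIBRI_RECORDING_USE_HOST_CLOCK: "true"
    JIBRI_XMPP_USER: jibri
    JIBRI_XMPP_PASSWORD: ${JIBRI_XMPP_PASSWORD}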

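Jibri needs a sound device to capture from. On older images that means an ALSA loopback device on the host (the snd-aloop kernel module, which is what the /dev/snd mapping is for), and that’s annoying in containerized environments. It also eats CPU. For a homelab, only enable it if you’re regularly recording and can spare the resources.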

Recordings save to /config/recordings/ inside the container—mount that as a volume and pull them to your NAS or S3 afterward.
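
The “pull them afterward” step is easy to script. Here’s a minimal sketch that moves finished recordings from the mounted volume to an archive directory, skipping anything modified in the last few minutes on the assumption that Jibri may still be writing it. The paths, the .mp4 filter, and the five-minute threshold are all assumptions to adjust for your setup.

```python
import shutil
import time
from pathlib import Path


def archive_recordings(src: Path, dst: Path, min_age_seconds: float = 300, now=None) -> list[Path]:
    """Move .mp4 files older than min_age_seconds from src to dst.

    Subfolders under src are preserved under dst. Returns the new paths.
    """
    now = time.time() if now is None else now
    moved = []
    # sorted() materializes the listing before any file is moved.
    for video in sorted(src.rglob("*.mp4")):
        if now - video.stat().st_mtime < min_age_seconds:
            continue  # recently modified, probably still being written
        target = dst / video.relative_to(src)
        target.parent.mkdir(parents=True, exist_ok=True)
        shutil.move(str(video), str(target))
        moved.append(target)
    return moved


if __name__ == "__main__":
    # Hypothetical host paths: the mounted recordings volume and a NAS share.
    source = Path("/srv/jitsi/recordings")
    archive = Path("/mnt/nas/jitsi-archive")
    if source.is_dir():
        for path in archive_recordings(source, archive):
            print("archived", path)
```

Run it from cron or a systemd timer on the host, and keep the archive directory outside the recordings volume so the script never picks up its own output.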


End-to-End Encryption: The Privacy Flex

By default, Jitsi media is encrypted in transit (TLS for signaling, SRTP for media), but the JVB can see the unencrypted streams (it has to, to forward them).

If you want true end-to-end encryption—where the JVB sees only encrypted blobs it can’t decode—enable E2EE (End-to-End Encryption):

Terminal window
ENABLE_E2EE=1

This uses WebRTC Insertable Streams to encrypt each encoded audio and video frame in the browser before it’s packetized. The JVB forwards the encrypted frames without being able to read them.

Tradeoff: anything that needs the server to read the media stops working. That means no Jibri recording, no live streaming, no dial-in, and no transcription. It also needs a browser with Insertable Streams support, and the per-participant key exchange gets heavy as rooms grow. For a Tuesday standup with three people, this is fine. For a webinar with 50 viewers, this breaks.

Use E2EE for highly sensitive calls. Leave it off for everyday use.


Scaling Beyond One Server

Your Tuesday call works great. Then you’re invited to a departmental all-hands. 35 people. Your 2-core JVB is now at 90% CPU, the audio starts dropping, and someone complains about lag.

Time to scale.

Vertical scaling (bigger VM, more CPU/RAM) is the lazy option. A 4-core, 8 GB machine can handle ~20 participants. 8-core, 16 GB can push 40-50. Eventually you hit diminishing returns—a single JVB can’t exceed ~50 concurrent participants without becoming a bottleneck.

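Horizontal scaling (multiple JVBs) is the real solution. You run multiple JVB instances (on separate servers) and Jicofo spreads conferences across them based on load and region. That alone raises how many calls you can host at once, but each call still lives on a single bridge. To split one large conference across bridges, you also configure Octo (Jitsi’s multi-bridge cascading protocol) so the bridges relay streams to each other.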
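
The bridge-selection idea is simple enough to sketch: skip bridges that are already overloaded, prefer one in the participant’s region, and take the least busy of what’s left. This is a simplified illustration, not Jicofo’s actual algorithm; the Bridge class, the stress scale, and the 0.8 cutoff are all made up for the example.

```python
from dataclasses import dataclass


@dataclass
class Bridge:
    name: str
    region: str
    stress: float  # 0.0 (idle) to 1.0 (saturated)


def pick_bridge(bridges, participant_region, max_stress=0.8):
    """Pick the least-stressed healthy bridge, preferring the participant's region.

    Returns None when every bridge is at or above max_stress.
    """
    healthy = [b for b in bridges if b.stress < max_stress]
    if not healthy:
        return None
    local = [b for b in healthy if b.region == participant_region]
    return min(local or healthy, key=lambda b: b.stress)


if __name__ == "__main__":
    fleet = [
        Bridge("jvb-eu-1", "eu", 0.5),
        Bridge("jvb-eu-2", "eu", 0.2),
        Bridge("jvb-us-1", "us", 0.1),
    ]
    print(pick_bridge(fleet, "eu"))
```

Even in this toy form you can see the operational cost: something has to collect stress figures from every bridge and keep them fresh, which is part of why multi-bridge setups are more work than one box.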

Octo setup is complex (requires Jicofo config, bridge discovery, inter-bridge bandwidth management), and honestly, it’s overkill for a homelab. If you need it, you’re probably running a production service, not a hobby video conference.

For self-hosting, the practical limit is “stay on one box” or “move to a bigger box.” Accept that constraint.


Real Homelab Tradeoffs

Let’s be honest about what you’re getting and what you’re losing:

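Advantages:

  - Your media and recordings stay on a server you control.
  - No per-user licensing; the cost is one VPS.
  - Authentication on your terms (JWT, LDAP) and optional end-to-end encryption.

Disadvantages:

  - NAT traversal and TURN are your problem.
  - A single bridge tops out at a few dozen participants.
  - Less polish than Zoom, and you are the support team.
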
The real tradeoff is effort vs. cost. Zoom costs $15/month per user and “just works.” Jitsi costs you ~$20/month for a decent VPS and 4 hours of setup plus debugging NAT issues at 11 PM.

For a team of 3–20 people who value privacy and don’t mind occasional hiccups, Jitsi is worth it. For anything bigger or more critical, Zoom or Google Meet is probably the right call.


Tips for a Smooth Deployment

  1. Don’t run on a $5 VPS. Seriously. Jitsi is CPU-bound. A tiny shared instance will be slow and frustrating. Minimum: 2 cores, 4 GB RAM, dedicated. If it’s shared hosting, skip it.

  2. Test NAT traversal before going live. Use stunclient or a similar tool to confirm UDP/10000 is reachable. Have a friend in a different network join and report audio/video quality. If it’s bad, deploy a TURN server.

  3. Monitor resources during calls. Watch docker stats while people are on a call. If JVB is constantly > 80% CPU, you’re underprovisioned or have too many participants.

  4. Use Let’s Encrypt for HTTPS, but verify that auto-renewal works. The official Docker Compose setup handles issuance and renewal inside the web container, but if it fails silently, your cert will expire and everything breaks. Monitor renewal logs.

  5. Back up your config. The .env file, any custom XMPP configs, and JWT secrets—keep these backed up. Losing them means rebuilding from scratch.

  6. E2EE for sensitive calls, disable it for everything else. The CPU savings are real, and most people don’t need it.

  7. TURN server is not optional if your participants are on corporate networks. Add coturn or use a managed TURN service. Yes, it adds cost. Yes, it’s worth it.
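
Tip 3 is easy to automate. The sketch below parses the output of `docker stats --no-stream --format "{{.Name}} {{.CPUPerc}}"` and reports containers above a threshold. One caveat: Docker reports CPU relative to a single core, so a container capped at two CPUs tops out near 200%, and you may want to scale the threshold accordingly. The function name and the sample output are illustrative.

```python
def hot_containers(stats_output: str, cpu_threshold: float = 80.0) -> list[tuple[str, float]]:
    """Return (name, cpu_percent) for containers above cpu_threshold.

    Expects one 'name percent%' pair per line, e.g. 'jvb 91.42%'.
    Lines that don't match that shape are ignored.
    """
    hot = []
    for line in stats_output.splitlines():
        parts = line.split()
        if len(parts) != 2:
            continue
        name, cpu = parts
        try:
            percent = float(cpu.rstrip("%"))
        except ValueError:
            continue
        if percent > cpu_threshold:
            hot.append((name, percent))
    return hot


if __name__ == "__main__":
    sample = "jvb 91.42%\nprosody 3.10%\njicofo 1.25%\nweb 0.05%\n"
    for name, percent in hot_containers(sample):
        print(f"{name} is at {percent}% CPU")
```

Feed it real output with `subprocess.run` on a timer during your next call and you’ll know whether the bridge is the bottleneck before anyone complains about lag.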


Wrapping Up: Your Private Conference Room

You’ve now got the blueprint for running your own video conference infrastructure. Docker handles the heavy lifting, Let’s Encrypt keeps it secure, and the JVB relays the calls.

Is it as polished as Zoom? No. Will it require occasional tweaking? Absolutely. Will it give you the warm fuzzy feeling of owning your own infrastructure and knowing your calls aren’t being vacuumed up for training data? 100%.

Deploy it. Test it with a friend. Fix the NAT issues (and there will be NAT issues). Then enjoy your private, self-hosted video calls where the only person with the recording is you.

Your data. Your server. Your rules.

That’s the whole point.

