Zoom Calls Every Tuesday and You’re Done Asking Permission
Zoom calls every Tuesday. Three participants, no big deal. Except you’ve learned that cloud recordings and transcripts live on Zoom’s servers, not yours, and here’s the thing: you have no idea how long your conference room footage sits there or who else gets to look at it.
You’ve got infrastructure. A homelab, maybe a bare-metal server collecting dust, or a VPS you’re already paying for. So why rent someone else’s video conferencing when you can run your own?
Jitsi Meet is the answer. It’s an open-source video conferencing platform that doesn’t require you to be a distributed systems engineer to run. You deploy it, throw it behind a reverse proxy, point some DNS at it, and boom: you’ve got an encrypted video call system where the only server that ever touches your media is one you control.
But here’s the real talk: Jitsi is not Zoom. It won’t have that polish, the killer AI-powered features, or the comfort of knowing someone charges money to keep it running. What it will give you is control, privacy, and the smug satisfaction of knowing you’re hosting your own infrastructure. That’s worth the slight edge of “yeah, sometimes the audio drops for a second.”
Let’s build this thing.
What Is Jitsi Meet, Really?
Jitsi Meet is open-source video conferencing software maintained by the Jitsi team at 8x8 together with the community (8x8 acquired the project from Atlassian in 2018, but it’s still genuinely open-source). The architecture is modular, which is both its strength and the reason you need to understand what’s happening under the hood.
When you hit a Jitsi instance, you’re actually interacting with several components working together:
- Jitsi Meet — the web frontend. React-based, runs in your browser, handles the UI for joining calls and managing settings.
- Prosody — an XMPP server. Yes, really. XMPP still exists and it’s perfect for real-time messaging and presence management. This handles the signaling layer.
- Jicofo — the Jitsi Conference Focus. This is the orchestra conductor. It manages conference state, decides who talks to whom, and ensures everyone’s on the same page about who’s in the call.
- JVB (Jitsi Videobridge) — the actual videobridge. This is the component that relays video and audio between participants. It’s where CPU and bandwidth live.
The magic is that all of this runs on a single server for small deployments. One VM, four services, all containerized. For three people on a Tuesday, your $20/month VPS is enough.
But here’s the catch: the videobridge is stateful and CPU-hungry. Each participant adds overhead. At around 10 simultaneous participants, you start feeling it. At 20, your server is sweating. At 50, you need a different architecture.
We’ll get to scaling later. For now, assume you’re deploying on a single VM.
Architecture: How This Thing Actually Works
Before you docker-compose your way to a working instance, understand the traffic flow. It matters.
When you join a call in Jitsi, here’s what happens:
- Browser loads your-jitsi.example.com. Nginx (or your reverse proxy) serves the React frontend over HTTPS.
- Browser initiates signaling via XMPP (WebSocket over HTTPS) to Prosody. This is how the browser tells the system “I want to join room X with user Y.”
- Jicofo orchestrates. It tells the JVB “allocate a bridge for this conference” and tells each participant’s browser “here’s your peer connection info.”
- Browsers establish WebRTC connections to the JVB. Each participant sends its audio and video to the bridge once, and the bridge forwards those streams to everyone else without mixing or re-encoding them. This is the Selective Forwarding Unit (SFU) model. (With only two participants, Jitsi can skip the bridge and connect the browsers directly.)
That media connection to the bridge is the critical part: the JVB must be reachable on UDP port 10000 from every participant’s machine. This is why NAT traversal becomes your biggest headache.
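To make the forwarding model concrete, here is a back-of-the-envelope sketch of how many media streams each browser handles in a full mesh versus through a bridge. The function names are illustrative and not part of Jitsi, and the count deliberately ignores simulcast layers and the bridge's habit of forwarding only the streams a client actually displays.

```python
def mesh_streams(participants: int) -> dict:
    """Streams per browser when every participant connects to every other one."""
    others = max(participants - 1, 0)
    return {"upload": others, "download": others}


def sfu_streams(participants: int) -> dict:
    """Streams per browser when a forwarding bridge sits in the middle."""
    others = max(participants - 1, 0)
    # Each browser uploads once; the bridge fans the stream out to the others.
    return {"upload": 1 if others else 0, "download": others}


def bridge_streams(participants: int) -> dict:
    """Streams the bridge itself receives and forwards."""
    others = max(participants - 1, 0)
    return {
        "inbound": participants if others else 0,
        "outbound": participants * others,
    }


if __name__ == "__main__":
    for n in (3, 10, 20):
        print(n, mesh_streams(n), sfu_streams(n), bridge_streams(n))
```

The per-browser upload stays flat with a bridge, while the bridge's outbound count grows roughly with the square of the room size. That is why the videobridge is where the CPU and bandwidth go, and why its UDP port has to be reachable by everyone.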
For most home and office connections, UDP/10000 outbound works fine. But when participants sit behind carrier-grade NAT (bad ISP), symmetric NAT (restrictive corporate networks), or multiple layers of firewalls, the connection dies. Enter: TURN servers.
A TURN server (Traversal Using Relays around NAT) is a fallback relay. If WebRTC can’t establish a direct connection, TURN pushes the media through a relay server instead. It’s higher latency and more CPU-intensive, but it rescues most of the connections that would otherwise fail.
For a homelab, you either add a TURN server to your Jitsi setup, or you pray your participants aren’t behind the worst NAT possible. (Spoiler: some will be. Add a TURN server.)
Deployment: The Docker Way
The official Jitsi repository provides a battle-tested Docker Compose setup. You’re not starting from scratch; you’re just customizing it.
The minimum setup is dead simple:
# .env: keep this in your docker-compose directory
DOMAIN=jitsi.example.com

# JWT auth (set this to enable)
ENABLE_AUTH=1
AUTH_TYPE=jwt
JWT_APP_ID=myapp
JWT_APP_SECRET=your-super-secret-key-min-32-chars-long!

# TURN server (OPTIONAL but HIGHLY RECOMMENDED)
JVB_STUN_SERVERS=stun.l.google.com:19302,stun1.l.google.com:19302
TURN_SERVER=turn.example.com
TURN_PORT=443
TURN_TRANSPORT=tcp
TURN_SECRET=your-turn-secret

# Memory and threading for JVB
JVB_INIT_MEMORY=2g
JVB_MAX_MEMORY=3g
JVB_THREAD_COUNT=4

# Misc
TZ=UTC
JICOFO_AUTH_TYPE=jwt
JIBRI_RECORDING_USE_HOST_CLOCK=true

This .env file controls nearly everything. The DOMAIN must match your DNS and SSL cert. The JWT_APP_SECRET should be at least 32 characters (seriously, don’t skimp). Variable names shift between releases, so check each one against the example env file shipped in the repo before relying on it.
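A quick pre-flight check catches the two mistakes this section warns about (a missing domain and a short JWT secret) before any container starts. This is a minimal sketch: `parse_env` and `check_env` are hypothetical helper names, the parser handles only plain KEY=VALUE lines, and the rules simply mirror the text above rather than anything Jitsi itself enforces.

```python
def parse_env(text: str) -> dict:
    """Parse KEY=VALUE lines, skipping blank lines and comments."""
    values = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        values[key.strip()] = value.strip()
    return values


def check_env(values: dict) -> list:
    """Return a list of human-readable problems; an empty list means OK."""
    problems = []
    if not values.get("DOMAIN"):
        problems.append("DOMAIN is not set")
    if values.get("ENABLE_AUTH") == "1" and values.get("AUTH_TYPE") == "jwt":
        if len(values.get("JWT_APP_SECRET", "")) < 32:
            problems.append("JWT_APP_SECRET is shorter than 32 characters")
        if not values.get("JWT_APP_ID"):
            problems.append("JWT_APP_ID is not set")
    return problems


if __name__ == "__main__":
    sample = """
    DOMAIN=jitsi.example.com
    ENABLE_AUTH=1
    AUTH_TYPE=jwt
    JWT_APP_ID=myapp
    JWT_APP_SECRET=too-short
    """
    for problem in check_env(parse_env(sample)):
        print("WARN:", problem)
```

To run it against a real file, pass the file's contents in, for example `check_env(parse_env(open(".env").read()))`.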
Now the Docker Compose file. Clone the official repo first:
git clone https://github.com/jitsi/docker-jitsi-meet.git
cd docker-jitsi-meet
cp env.example .env
# Generate strong internal passwords for the XMPP components
./gen-passwords.sh
# Edit .env with your settings (see above)

The official docker-compose.yml is massive and imports a lot of config from .env. For self-hosting, you probably want a docker-compose.override.yml to adjust resource limits and add TURN support:
version: '3.8'

services:
  # JVB needs CPU and memory, so don't starve it
  jvb:
    mem_limit: 3500m
    memswap_limit: 4000m
    cpus: "2.0"
    environment:
      JVB_TCP_HARVESTER_DISABLED: "false"
      JVB_TCP_PORT: 4443
      # Fallback ICE servers (critical for NAT traversal)
      JVB_STUN_SERVERS: "stun.l.google.com:19302,stun1.l.google.com:19302"
      JVB_OPTS: "-Dnet.java.sip.communicator.impl.protocol.jabber.SEND_PRESENCE_SUBSCRIPTION_FIRST=true"

  # Prosody (XMPP server)
  prosody:
    mem_limit: 1000m
    memswap_limit: 1200m
    cpus: "1.0"

  # Jicofo (conference orchestrator)
  jicofo:
    mem_limit: 800m
    memswap_limit: 1000m
    cpus: "1.0"

  # Web frontend
  web:
    mem_limit: 500m
    memswap_limit: 600m
    cpus: "0.5"

  # Optional: Jibri for recording
  # jibri:
  #   mem_limit: 2000m
  #   memswap_limit: 2500m
  #   cpus: "2.0"
  #   cap_add:
  #     - SYS_ADMIN
  #   devices:
  #     - /dev/snd:/dev/snd

These resource limits prevent one container from consuming your entire server. JVB gets the most (it’s doing the heavy lifting), Prosody and Jicofo are modest, and the web frontend is tiny.
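It is worth adding those limits up before picking a VM size. The sketch below sums the four `mem_limit` values from the override file and compares them with the host's RAM. The helper names and the 512 MB reserve for the OS are assumptions made for illustration.

```python
# mem_limit values copied from the override file above
LIMITS = {"jvb": "3500m", "prosody": "1000m", "jicofo": "800m", "web": "500m"}


def parse_mem(limit: str) -> int:
    """Convert a Compose-style size such as '3500m' or '2g' to whole megabytes."""
    units = {"k": 1 / 1024, "m": 1, "g": 1024}
    text = limit.strip().lower()
    if text[-1] in units:
        return int(float(text[:-1]) * units[text[-1]])
    return int(text) // (1024 * 1024)  # a bare number is a byte count


def total_limit_mb(limits: dict) -> int:
    """Sum of all container memory ceilings, in megabytes."""
    return sum(parse_mem(value) for value in limits.values())


def fits(limits: dict, host_mb: int, reserve_mb: int = 512) -> bool:
    """True if every container could hit its ceiling at once without exhausting RAM."""
    return total_limit_mb(limits) + reserve_mb <= host_mb


if __name__ == "__main__":
    print("total limit:", total_limit_mb(LIMITS), "MB")
    for host_gb in (4, 8):
        print(f"{host_gb} GB host fits:", fits(LIMITS, host_gb * 1024))
```

Limits are ceilings, not reservations, so a total above physical RAM will not stop the stack from starting. It does mean the limits can no longer guarantee that the host stays out of swap when every service is busy at once.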
Networking: The Part That Will Haunt You
You’ve got Docker running. Services are up. You hit https://jitsi.example.com in a browser and the UI loads. You click “create room” and it looks good.
Then you try to join from your phone or a friend’s computer behind their home router, and suddenly the audio is terrible or the connection drops.
This is the JVB UDP port issue.
The JVB listens on UDP/10000. For WebRTC to work, that port must be:
- Open at your firewall (inbound from the internet).
- Port-forwarded if your Jitsi server is behind NAT (which it probably is in a homelab).
- Reachable by the participant’s browser (not blocked by their ISP or corporate network).
Here’s the firewall config for a typical homelab:
# On your Jitsi server (UFW example)
sudo ufw allow 80/tcp
sudo ufw allow 443/tcp
sudo ufw allow 10000/udp
sudo ufw allow 22/tcp

# If you're behind a router/firewall, forward:
# WAN:10000/UDP → LAN:10000/UDP (to your Jitsi server's internal IP)

# Check it's working from outside:
# (from a machine on the public internet)
nc -u -v -w 3 your-jitsi.example.com 10000
# should NOT hang; if it does, port forwarding is probably broken
# (UDP is connectionless, so treat this as a rough check, not proof)

But here’s the thing: outbound UDP/10000 might also be blocked on the participant’s side (carrier-grade NAT is a real nightmare). And corporate networks? Forget about it.
That’s why TURN servers exist. If UDP/10000 direct isn’t available, TURN relays the media over TCP/443 (which almost everything can reach). The overhead is real—more CPU, more latency—but it’s better than a broken call.
Deploying a TURN server is out of scope here (it’s a whole separate thing). Note that Google’s public STUN servers (already in the .env) only help clients discover their public address. They never relay media, so they’re fine for testing but no substitute for TURN. For production in a homelab, consider running coturn alongside your Jitsi instance on a cheap VPS, or use a managed TURN service.
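If you do run coturn, a shared secret (like the TURN_SECRET above) is typically used to mint short-lived credentials instead of fixed passwords. The sketch below follows the widely used TURN REST API convention that coturn supports through its `use-auth-secret` and `static-auth-secret` options: the username is an expiry timestamp plus a label, and the password is the base64-encoded HMAC-SHA1 of that username. It assumes coturn is configured that way; `turn_credentials` is a hypothetical helper name.

```python
import base64
import hashlib
import hmac
import time


def turn_credentials(shared_secret: str, user_id: str,
                     ttl_seconds: int = 3600, now=None):
    """Build time-limited TURN credentials from a shared secret.

    Returns (username, password). The username encodes the expiry time,
    so the TURN server can reject it once the TTL has passed.
    """
    issued = int(time.time()) if now is None else int(now)
    username = f"{issued + ttl_seconds}:{user_id}"
    digest = hmac.new(shared_secret.encode(), username.encode(),
                      hashlib.sha1).digest()
    password = base64.b64encode(digest).decode()
    return username, password


if __name__ == "__main__":
    user, password = turn_credentials("your-turn-secret", "alice")
    print(user, password)
```

Because the password is derived from the username, nothing has to be stored per user. Rotating the shared secret invalidates every outstanding credential at once.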
Authentication: Lock It Down
By default, Jitsi allows anyone on the internet who knows your domain to create and join rooms. Great for a public meeting space. Terrible for privacy.
The simplest auth method is JWT (JSON Web Tokens). Set ENABLE_AUTH=1 and AUTH_TYPE=jwt in .env, and now room creation and guest access require a valid JWT token signed with your secret.
The token flow:
- Your app generates a JWT signed with JWT_APP_SECRET, including claims like sub (the domain or tenant the token is for) and room (room name).
- Participant joins: https://jitsi.example.com/room?jwt=TOKEN.
- Jitsi verifies the signature and grants access.
Here’s a quick Python example to generate a token:
import time

import jwt  # PyJWT: pip install pyjwt

SECRET = "your-super-secret-key-min-32-chars-long!"
APP_ID = "myapp"
DOMAIN = "jitsi.example.com"


def generate_token(user_id, room_name, user_name="Guest"):
    payload = {
        "aud": "jitsi",
        "iss": APP_ID,
        # Domain or tenant the token is issued for; check what your Prosody expects
        "sub": DOMAIN,
        "room": room_name,
        "exp": int(time.time()) + 3600,  # 1 hour validity
        "context": {
            "user": {
                "id": user_id,
                "name": user_name,
            }
        },
    }
    # PyJWT 2.x returns the encoded token as a str
    return jwt.encode(payload, SECRET, algorithm="HS256")


# Usage: the token is passed as a query parameter
token = generate_token("user123", "standup", "Alice")
print(f"Join at: https://{DOMAIN}/standup?jwt={token}")

For tighter integration, you can use AUTH_TYPE=ldap to hook into your directory. To require authentication for everything, leave guest access disabled (ENABLE_GUESTS in the official .env).
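The “verifies the signature” step in the token flow is less magic than it sounds. The following stdlib-only sketch shows what an HS256 check involves: recompute the HMAC over the header and payload, compare in constant time, then check expiry and room. It illustrates the mechanism only and is not Prosody's actual implementation; `sign_hs256` and `verify_hs256` are hypothetical names.

```python
import base64
import hashlib
import hmac
import json
import time


def _b64url_encode(data: bytes) -> str:
    return base64.urlsafe_b64encode(data).rstrip(b"=").decode()


def _b64url_decode(part: str) -> bytes:
    return base64.urlsafe_b64decode(part + "=" * (-len(part) % 4))


def sign_hs256(payload: dict, secret: str) -> str:
    """Build a compact JWT signed with HMAC-SHA256."""
    header = {"alg": "HS256", "typ": "JWT"}
    signing_input = ".".join(
        _b64url_encode(json.dumps(part, separators=(",", ":")).encode())
        for part in (header, payload)
    )
    signature = hmac.new(secret.encode(), signing_input.encode(),
                         hashlib.sha256).digest()
    return f"{signing_input}.{_b64url_encode(signature)}"


def verify_hs256(token: str, secret: str, room: str, now=None) -> dict:
    """Return the payload if the token is valid; raise ValueError otherwise."""
    parts = token.split(".")
    if len(parts) != 3:
        raise ValueError("malformed token")
    header_b64, payload_b64, signature_b64 = parts
    header = json.loads(_b64url_decode(header_b64))
    if header.get("alg") != "HS256":
        raise ValueError("unexpected algorithm")
    expected = hmac.new(secret.encode(),
                        f"{header_b64}.{payload_b64}".encode(),
                        hashlib.sha256).digest()
    # Constant-time comparison avoids leaking how many bytes matched
    if not hmac.compare_digest(expected, _b64url_decode(signature_b64)):
        raise ValueError("bad signature")
    payload = json.loads(_b64url_decode(payload_b64))
    current = time.time() if now is None else now
    if payload.get("exp", 0) < current:
        raise ValueError("token expired")
    if payload.get("room") not in (room, "*"):
        raise ValueError("token not valid for this room")
    return payload
```

Anyone holding the secret can mint tokens for any room, which is why it belongs in your backups and nowhere near the browser.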
Recording: Jibri and the Phantom Browser
Jitsi can record calls using Jibri (Jitsi Broadcasting Infrastructure). It’s a bit weird: Jibri is literally a headless browser instance that joins the call and records the screen. It works, but it’s resource-hungry.
To enable recording, add the jibri service (the official repo ships it as a separate jibri.yml compose file that you layer on top of docker-compose.yml) and ensure:

# jibri service definition
jibri:
  image: jitsi/jibri:${JIBRI_TAG}
  cap_add:
    - SYS_ADMIN
  devices:
    - /dev/snd:/dev/snd
  environment:
    JIBRI_RECORDER_UI_ENABLED: "false"
    JIBRI_RECORDING_USE_HOST_CLOCK: "true"
    JIBRI_XMPP_USER: jibri
    JIBRI_XMPP_PASSWORD: ${JIBRI_XMPP_PASSWORD}

Jibri needs an audio device to capture from. Depending on the image version, that means a loopback sound device on the host (hence /dev/snd) or a virtual one inside the container, which is annoying in containerized environments. It also eats CPU. For a homelab, only enable it if you’re regularly recording and can spare the resources.
Recordings save to /config/recordings/ inside the container—mount that as a volume and pull them to your NAS or S3 afterward.
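Pulling recordings off the box can be a short script run from cron. This sketch moves recordings from the mounted volume to another directory while preserving subfolders. The `.mp4` suffix, the example paths, and the `min_age_seconds` guard against moving a file that is still being written are all assumptions to adapt to your setup.

```python
import shutil
import time
from pathlib import Path


def archive_recordings(source: Path, destination: Path,
                       suffix: str = ".mp4",
                       min_age_seconds: float = 0) -> list:
    """Move recordings from source to destination, keeping subfolders.

    Files modified more recently than min_age_seconds are skipped, on the
    assumption that they may still be in progress.
    """
    moved = []
    # sorted() materializes the listing before anything is moved
    for path in sorted(source.rglob(f"*{suffix}")):
        age = time.time() - path.stat().st_mtime
        if min_age_seconds > 0 and age < min_age_seconds:
            continue
        target = destination / path.relative_to(source)
        target.parent.mkdir(parents=True, exist_ok=True)
        shutil.move(str(path), str(target))
        moved.append(target)
    return moved


if __name__ == "__main__":
    # Hypothetical paths: the host side of the recordings volume and a NAS mount
    src = Path("/srv/jitsi/recordings")
    dst = Path("/mnt/nas/jitsi-recordings")
    if src.is_dir():
        for item in archive_recordings(src, dst, min_age_seconds=600):
            print("archived", item)
```

Keep the destination outside the source directory, otherwise the next run will find the archived files and try to move them again.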
End-to-End Encryption: The Privacy Flex
By default, Jitsi media is encrypted in transit (TLS for signaling, SRTP for media), but the JVB can see the unencrypted streams (it has to, to forward them).
If you want true end-to-end encryption—where the JVB sees only encrypted blobs it can’t decode—enable E2EE (End-to-End Encryption):
ENABLE_E2EE=1

This uses WebRTC insertable streams and FrameCryptor to encrypt video/audio before it leaves the browser. The JVB forwards encrypted frames without decrypting them.
Tradeoff: anything that needs the server to read the media stops working. Recording is the big one, since Jibri can’t decrypt what it captures, and every participant needs a browser that supports insertable streams. Key management also gets heavier as the room grows. For a Tuesday standup with three people, this is fine. For a webinar with 50 viewers, this breaks.
Use E2EE for highly sensitive calls. Leave it off for everyday use.
Scaling Beyond One Server
Your Tuesday call works great. Then you’re invited to a departmental all-hands. 35 people. Your 2-core JVB is now at 90% CPU, the audio starts dropping, and someone complains about lag.
Time to scale.
Vertical scaling (bigger VM, more CPU/RAM) is the lazy option. A 4-core, 8 GB machine can handle ~20 participants. 8-core, 16 GB can push 40-50. Eventually you hit diminishing returns—a single JVB can’t exceed ~50 concurrent participants without becoming a bottleneck.
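Those rules of thumb fit in a tiny planning helper. The sketch assumes roughly five participants per core, which matches the 4-core and 8-core figures above, and caps the result at about 50 for a single bridge. Both numbers are this article's ballpark estimates, not measured limits, and the function names are illustrative.

```python
def estimate_capacity(cores: int, per_core: int = 5,
                      single_bridge_cap: int = 50) -> int:
    """Rough participant ceiling for one videobridge on a box with `cores` cores."""
    return min(max(cores, 0) * per_core, single_bridge_cap)


def recommend(participants: int, cores: int,
              single_bridge_cap: int = 50) -> str:
    """Suggest a next step for the expected call size."""
    if participants <= estimate_capacity(cores, single_bridge_cap=single_bridge_cap):
        return "ok"
    if participants <= single_bridge_cap:
        return "bigger VM"
    return "one bridge is not enough"


if __name__ == "__main__":
    for people, cores in ((3, 2), (35, 2), (35, 8), (80, 8)):
        print(people, "participants on", cores, "cores:", recommend(people, cores))
```

Treat the output as a prompt to load-test, not a guarantee. Video resolution, simulcast settings, and how many people keep their cameras on all move the real number.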
Horizontal scaling (multiple JVBs) is the real solution. You run multiple JVB instances (on separate servers) and configure Octo (Jitsi’s multi-bridge cascading protocol) so they talk to each other. Participants are load-balanced across bridges based on region or capacity.
Octo setup is complex (requires Jicofo config, bridge discovery, inter-bridge bandwidth management), and honestly, it’s overkill for a homelab. If you need it, you’re probably running a production service, not a hobby video conference.
For self-hosting, the practical limit is “stay on one box” or “move to a bigger box.” Accept that constraint.
Real Homelab Tradeoffs
Let’s be honest about what you’re getting and what you’re losing:
Advantages:
- Privacy. No third-party access to call data.
- Control. You own the infrastructure, the recordings, the user list.
- Cost. After the initial setup, it’s just your server electricity (or VPS bill).
- Open-source. No vendor lock-in, no surprise feature removals.
Disadvantages:
- Polish is rough. UI lags behind Zoom. Mobile app is mediocre.
- Reliability is your problem. If the server dies, so do your calls.
- NAT traversal is a nightmare for some participants. You’ll spend hours debugging “why can’t Bob from the corporate office hear anything?”
- No built-in integrations like Zoom’s Google Calendar plugin or Salesforce CRM sync.
- Recording is clunky (Jibri is heavy).
The real tradeoff is effort vs. cost. Zoom costs $15/month per user and “just works.” Jitsi costs you ~$20/month for a decent VPS and 4 hours of setup plus debugging NAT issues at 11 PM.
For a team of 3–20 people who value privacy and don’t mind occasional hiccups, Jitsi is worth it. For anything bigger or more critical, Zoom or Google Meet is probably the right call.
Tips for a Smooth Deployment
- Don’t run on a $5 VPS. Seriously. Jitsi is CPU-bound. A tiny shared instance will be slow and frustrating. Minimum: 2 cores, 4 GB RAM, dedicated. If it’s shared hosting, skip it.
- Test NAT traversal before going live. Use stunclient or a similar tool to confirm UDP/10000 is reachable. Have a friend on a different network join and report audio/video quality. If it’s bad, deploy a TURN server.
- Monitor resources during calls. Watch docker stats while people are on a call. If JVB is constantly above 80% CPU, you’re underprovisioned or have too many participants.
- Use Let’s Encrypt for HTTPS, but set up auto-renewal. The official Docker Compose setup can handle issuance and renewal for you, but if it fails silently, your cert will expire and everything breaks. Monitor renewal logs.
- Back up your config. The .env file, any custom XMPP configs, and JWT secrets: keep these backed up. Losing them means rebuilding from scratch.
- E2EE for sensitive calls, disable it for everything else. The CPU savings are real, and most people don’t need it.
- TURN server is not optional if your participants are on corporate networks. Add coturn or use a managed TURN service. Yes, it adds cost. Yes, it’s worth it.
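The monitoring tip is easy to automate. This sketch takes a one-shot snapshot with `docker stats --no-stream` and flags containers above a CPU threshold. The 80% default comes from the tip above, the helper names are hypothetical, and the tab-separated format string is a choice made here to keep parsing simple.

```python
import subprocess

# Go-template placeholders understood by `docker stats --format`
FORMAT = "{{.Name}}\t{{.CPUPerc}}\t{{.MemPerc}}"


def read_docker_stats() -> str:
    """Take a single stats snapshot; requires the docker CLI on PATH."""
    result = subprocess.run(
        ["docker", "stats", "--no-stream", "--format", FORMAT],
        capture_output=True, text=True, check=True,
    )
    return result.stdout


def parse_stats(output: str) -> dict:
    """Turn 'name<TAB>cpu%<TAB>mem%' lines into {name: {'cpu': x, 'mem': y}}."""
    stats = {}
    for line in output.splitlines():
        parts = line.strip().split("\t")
        if len(parts) != 3:
            continue
        name, cpu, mem = parts
        try:
            stats[name] = {"cpu": float(cpu.rstrip("%")),
                           "mem": float(mem.rstrip("%"))}
        except ValueError:
            continue  # stopped containers can report '--' instead of a number
    return stats


def hot_containers(stats: dict, cpu_threshold: float = 80.0) -> list:
    """Names of containers whose CPU usage exceeds the threshold."""
    return sorted(name for name, s in stats.items() if s["cpu"] > cpu_threshold)
```

During a call, `hot_containers(parse_stats(read_docker_stats()))` returns the names worth looking at. Docker reports CPU relative to a single core, so a JVB allowed two cores can legitimately show up to 200%; scale the threshold to match your `cpus` limit.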
Wrapping Up: Your Private Conference Room
You’ve now got the blueprint for running your own video conference infrastructure. Docker handles the heavy lifting, Let’s Encrypt keeps it secure, and the JVB relays the calls.
Is it as polished as Zoom? No. Will it require occasional tweaking? Absolutely. Will it give you the warm fuzzy feeling of owning your own infrastructure and knowing your calls aren’t being vacuumed up for training data? 100%.
Deploy it. Test it with a friend. Fix the NAT issues (and there will be NAT issues). Then enjoy your private, self-hosted video calls where the only person with the recording is you.
Your data. Your server. Your rules.
That’s the whole point.