You Kicked Out Alexa. Now What?
You did it. You deregistered the Echo, unplugged the Google Home, gave them both a little speech about privacy, and dropped them in the donation bin. Felt great. Very principled.
Then your spouse tried to turn off the bedroom lights at midnight and had to fumble around in the dark for the physical switch like some kind of animal from the before-times.
This is the price of self-hosting voice assistants. It works — really well, actually — but you have to build it yourself and understand what you’re building. The two serious options in the Home Assistant ecosystem are Rhasspy (the OG community solution) and the Wyoming protocol stack (Nabu Casa’s modular successor that became mainstream around 2025). They’re not the same thing, they’re not interchangeable, and picking the wrong one for your situation is going to cost you a weekend.
Let’s sort this out.
The Brief History (Skip If You Don’t Care)
Rhasspy showed up around 2019 as the answer to “how do I do offline voice in Home Assistant without Alexa?” It was glorious for its time: a monolithic app with a web GUI, built-in support for a dozen STT engines, intent handling, wake words, the whole pipeline in one container. The community loved it.
The problem is that “one container with everything” is both its strength and its ceiling. Rhasspy 2.5 is stable but maintenance has slowed significantly. The Rhasspy 3 branch — which was supposed to modularize the whole thing — stalled in development. It exists, but it’s not ready for production use and there’s no clear ETA.
Meanwhile, in 2023, Nabu Casa (the company behind Home Assistant) announced the Wyoming protocol: a lightweight, streaming audio protocol for connecting modular voice services. Instead of one monolithic app, you get separate services for each piece of the pipeline:
- faster-whisper — STT (speech-to-text), runs Whisper models locally
- Piper — TTS (text-to-speech), neural voices, fast, sounds good
- openWakeWord — wake word detection, runs on-device before any STT happens
- Wyoming Satellite — the “ears and mouth” component that runs on the device itself (an ESP32-S3, a Pi Zero 2W, whatever)
Each service runs in its own container, talks Wyoming protocol, and Home Assistant orchestrates them through the Wyoming integration. By 2025, this is the stack Nabu Casa is actively developing and shipping hardware for.
Hardware Reality Check
Before we dig into config, let’s be honest about what you’re running this on, because model choice and expected performance vary wildly.
| Hardware | Recommended STT Model | Expected Latency |
|---|---|---|
| Raspberry Pi 4 (4GB) | faster-whisper tiny | 2-4s (rough) |
| Pi 4 + Coral USB TPU | faster-whisper small | ~1s |
| Intel N100 mini-PC | faster-whisper medium | <1s |
| Older x86 with 8GB RAM | faster-whisper medium | 1-2s |
Tiny on a Pi 4 without acceleration is frustrating. It transcribes correctly most of the time but the pause between “Hey Jarvis” and anything happening is long enough that your family will go back to light switches. If you’re on bare Pi 4 hardware, either add a Coral USB accelerator or accept that this is a personal project, not a household rollout.
N100 mini-PCs (Beelink EQ12, GMKtec M5 Plus, etc.) have become the homelab sweet spot. They’re around $150-180, run 24/7 on ~10W, and handle faster-whisper medium without breaking a sweat. Medium is where it actually feels responsive.
Wyoming Stack: The Self-Hosted Setup
Here’s a practical Docker Compose that puts the core Wyoming services on a Pi or mini-PC. This assumes you’re running Home Assistant separately (HA OS, HA Container, whatever).
services: wyoming-whisper: image: rhasspy/wyoming-faster-whisper:latest container_name: wyoming-whisper restart: unless-stopped volumes: - ./whisper-data:/data ports: - "10300:10300" command: > --uri tcp://0.0.0.0:10300 --model medium-int8 --language en --device cpu environment: - TZ=America/New_York
wyoming-piper: image: rhasspy/wyoming-piper:latest container_name: wyoming-piper restart: unless-stopped volumes: - ./piper-data:/data ports: - "10200:10200" command: > --uri tcp://0.0.0.0:10200 --voice en_US-lessac-medium environment: - TZ=America/New_York
wyoming-openwakeword: image: rhasspy/wyoming-openwakeword:latest container_name: wyoming-openwakeword restart: unless-stopped ports: - "10400:10400" command: > --uri tcp://0.0.0.0:10400 --preload-model ok_nabu environment: - TZ=America/New_YorkA few things worth knowing here:
Model variants for faster-whisper: tiny, tiny-int8, base, base-int8, small, small-int8, medium, medium-int8. The -int8 quantized versions use roughly half the memory with minimal accuracy loss. Start with medium-int8 if your hardware can handle it.
Piper voices: Lessac medium is a solid default. There are dozens of voices at huggingface.co/rhasspy/piper-voices. If you want your HA to sound less like a GPS and more like a person, spend 10 minutes browsing them.
openWakeWord models: ok_nabu, hey_jarvis, alexa, hey_mycroft. You can load multiple with repeated --preload-model flags. They all run simultaneously and it’s cheap on CPU.
Adding Wyoming to Home Assistant
In HA: Settings → Devices & Services → Add Integration → Wyoming Protocol
Add it three times — once pointing at each service (whisper on 10300, piper on 10200, openWakeWord on 10400). Then go to Settings → Voice Assistants, create a pipeline, and wire them together. It takes about five minutes and it just works.
The Voice PE Puck and M5Stack Atom Echo
If you want actual always-on voice endpoints in rooms (not just talking to a tablet), you have two reasonable cheap options.
Home Assistant Voice Preview Edition
This is Nabu Casa’s official hardware — an ESP32-S3-based puck with a multi-mic array and speaker. It runs the Wyoming Satellite firmware, connects to your HA instance over WiFi, and uses your self-hosted Wyoming services for STT/TTS. Everything stays local.
Flash it with ESPHome (the add-on handles this automatically), and in HA it shows up as a satellite device. Pick which voice pipeline it uses, done. It’s the path of least resistance.
M5Stack Atom Echo ($20)
If you don’t want to wait for stock on the Voice PE or you want to scatter several of these around cheaply, the Atom Echo is the classic choice. It’s a tiny ESP32 brick with a built-in microphone and speaker. Not great for noisy rooms, but fine for a bedroom or office.
Flash it with the ESPHome Wyoming Satellite firmware. There’s a community-maintained config:
esphome: name: atom-echo-bedroom friendly_name: Bedroom Echo
esp32: board: m5stack-atom
wifi: ssid: !secret wifi_ssid password: !secret wifi_password
api: encryption: key: !secret api_encryption_key
ota: platform: esphome password: !secret ota_password
i2s_audio: - id: i2s_in i2s_lrclk_pin: GPIO33 i2s_bclk_pin: GPIO19 - id: i2s_out i2s_lrclk_pin: GPIO33 i2s_bclk_pin: GPIO19
microphone: - platform: i2s_audio i2s_audio_id: i2s_in adc_pin: GPIO23 id: mic
speaker: - platform: i2s_audio i2s_audio_id: i2s_out dac_pin: GPIO22 id: spk
voice_assistant: microphone: mic speaker: spk noise_suppression_level: 2 auto_gain: 31dBFS volume_multiplier: 2.0 wake_word: okay nabu on_wake_word_detected: - light.turn_on: id: led blue: 100% on_listening: - light.turn_on: id: led blue: 50% on_tts_start: - light.turn_on: id: led green: 100% on_end: - light.turn_off: id: led
light: - platform: neopixelbus id: led type: GRB pin: GPIO27 num_leds: 1 name: LEDThe wake word detection for the Atom Echo runs on-device with openWakeWord — audio never leaves the device until you’ve said the magic words. Only then does it stream audio to your Wyoming stack for transcription. That’s the privacy model: local wake word, local STT, local TTS, local intent handling.
LLM Integration: Talking to Ollama
Out of the box, Wyoming + HA handles structured commands (“turn off the kitchen lights”, “set thermostat to 70”) through the built-in Conversation agent. For that, it’s excellent.
But if you want free-form conversation or more complex reasoning, you can replace the Conversation agent backend with a local LLM. The relevant integrations:
Home Assistant built-in: Settings → Voice Assistants → your pipeline → Conversation Agent. You can swap the default “Home Assistant” agent for any LLM integration.
Ollama integration: There’s a community integration called hass-ollama-conversation (available through HACS) that connects HA’s Conversation agent to a local Ollama instance. It sends your transcribed query to Ollama, gets a response, and passes it back to Piper for TTS.
Extended OpenAI Conversation: Another HACS integration that works with any OpenAI-compatible API — which includes Ollama when you run it with --api flag.
The practical config in configuration.yaml for the Ollama integration:
# After installing via HACSollama: host: http://192.168.1.50:11434 model: llama3.2:3b prompt: > You are a smart home assistant. Answer concisely in 1-2 sentences. Control devices when asked. The current time is {{ now() }}.Llama 3.2 3B is a good choice here — it’s fast enough on modest hardware, understands home automation context, and doesn’t require a GPU. The 7B and 8B models give better reasoning but add noticeable latency to what should feel like a snappy voice interaction.
Honest take: the LLM path adds latency and complexity. For “turn off lights” it’s overkill. For “is the front door locked and did I leave any windows open?” it’s actually useful because it can synthesize state from multiple entities rather than requiring you to ask three separate questions.
Rhasspy: When It’s Still the Right Call
Let’s be fair to Rhasspy, because “use Wyoming for everything new” doesn’t mean Rhasspy is dead for everyone.
Keep Rhasspy if:
- You’ve got a working Rhasspy 2.5 setup with custom intents and slot programs that you’ve tuned for years. The migration pain is real and Wyoming’s intent handling is simpler by design.
- You need the Rhasspy GUI intent builder. Wyoming assumes you’re comfortable with YAML and HA automations; Rhasspy had a proper web UI for building intents visually.
- You’re running on hardware where Wyoming’s modular approach (multiple containers, multiple ports) adds complexity you don’t want.
- Your STT requirements are unusual — Rhasspy 2.5 supports Kaldi, DeepSpeech, and other backends that Wyoming doesn’t.
Use Wyoming for:
- Anything new. There’s no reason to start a fresh deployment on Rhasspy when Wyoming is what’s being actively developed.
- Hardware integration. The Voice PE puck, the Atom Echo, and any future HA-certified voice hardware speaks Wyoming natively.
- Modular upgrades. Want to swap your wake word engine without touching STT? Wyoming’s architecture makes that a one-line change.
- Long-term viability. Rhasspy 2.5 maintenance has slowed and Rhasspy 3 is stalled. Wyoming is where Nabu Casa is putting resources.
If you’re on Rhasspy 2.5 and it’s working, the migration isn’t urgent. But plan for it eventually — especially if you want to use the newer HA voice hardware.
Troubleshooting the Common Pain Points
Wake word false positives: openWakeWord’s sensitivity can be tuned. In the wyoming-openwakeword command, add --threshold 0.5 (default is lower). Higher = less sensitive = fewer false triggers.
STT accuracy drops for specific words: Faster-whisper is trained on general speech. If your entity names are unusual (“turn on Zephyrus” for a PC), add them as aliases in HA or retrain with custom vocabulary. The simpler fix: rename the entity to something Whisper handles naturally.
Audio echo/feedback on the Atom Echo: The noise_suppression_level: 2 in the ESPHome config helps. Also make sure your Piper TTS volume isn’t blasting — the Atom Echo mic will pick it up and try to transcribe the HA response. Use volume_multiplier to tune it.
Wyoming services not connecting: Check that your firewall isn’t blocking the Wyoming ports (10200, 10300, 10400) between the server running the containers and your HA instance. These aren’t HTTP — HA opens a persistent TCP connection to each service.
Piper voice sounds robotic: You’re using a low-quality voice model. Switch from en_US-lessac-low to en_US-lessac-medium or en_US-lessac-high. The high-quality models are around 60MB and noticeably better.
The Bottom Line
Wyoming is the future of local voice in Home Assistant. It’s modular, actively developed, and the official HA voice hardware speaks it natively. If you’re starting fresh, use Wyoming — full stop.
Rhasspy isn’t dead, but it’s living in maintenance mode. If you have a working Rhasspy 2.5 deployment you like, no one’s forcing you to migrate today. But if you’re spinning up voice for the first time or adding new satellite devices, Wyoming is the clear choice.
On hardware: don’t underestimate the model size question. A Pi 4 with faster-whisper tiny will frustrate you and your family. Get an N100 box, add a Coral USB to your Pi, or accept that voice assistant latency is going to be a recurring topic at dinner. The $150 mini-PC investment genuinely transforms the experience from “cool experiment” to “thing the household actually uses.”
Your spouse wants to turn off the bedroom lights by voice. Wyoming on an N100 with a couple of Atom Echoes and medium-int8 can actually deliver that — no cloud, no Alexa, no data leaving your network. It just takes an afternoon to set up instead of 30 seconds with a commercial product.
That’s the trade you made when you donated those Echos. It was the right call.