Comment by hamdingers
8 hours ago
If you're less concerned about privacy, I use Gemini 2.5 Flash for this and it's exceptionally good and fast as a HA assistant while being much cheaper than the electricity that would be needed to keep a 3090 awake.
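The electricity side of this comparison is simple arithmetic: continuous draw in watts, times 8,760 hours a year, divided by 1,000, gives kWh per year. Here is a minimal sketch. The 100 W whole-system idle figure and the 0.15-per-kWh price are placeholder assumptions, not measurements of a 3090 build or of any real tariff.

```python
# Back-of-the-envelope cost of leaving a machine powered on all year.
# Placeholder inputs only: measure your own idle draw and check your tariff.

HOURS_PER_YEAR = 24 * 365  # 8760 hours, ignoring leap years


def annual_energy_kwh(idle_watts: float) -> float:
    """Energy in kWh used by a device drawing `idle_watts` around the clock."""
    return idle_watts * HOURS_PER_YEAR / 1000.0


def annual_cost(idle_watts: float, price_per_kwh: float) -> float:
    """Yearly cost of that energy at a flat per-kWh price."""
    return annual_energy_kwh(idle_watts) * price_per_kwh


if __name__ == "__main__":
    watts = 100.0  # assumed whole-system idle draw, not a measured 3090 figure
    price = 0.15   # assumed price per kWh in your local currency
    print(f"{annual_energy_kwh(watts):.1f} kWh/year, about {annual_cost(watts, price):.2f} per year")
```

With those placeholder inputs it prints `876.0 kWh/year, about 131.40 per year`. Compare that against your own API bill.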
The thing that kills this for me (and they even mentioned it) is wake word detection. I have both the HA voice preview and FPH Satellite1 devices, plus have experimented with a few other options like a Raspberry Pi with a conference mic.
Somehow nothing is even 50% as good as my Echo devices at picking up the wake word. The assistant itself is far better, but that doesn't matter if it takes 2-3 tries to get it to listen to you. If someone solves this problem with open hardware, I'll immediately buy several.
How about a button?
I'd prefer to physically press a button on an intercom box than having something churning away constantly processing sound.
If I have to go to a thing and push a button, I'd rather the button do the thing I wanted in the first place. Voice assistants are for when my hands are full or I don't want to get up. (I wrote more about my home automation philosophy in another comment[1]).
Also, I have all my voice assistant devices mounted to the ceiling.
1. https://news.ycombinator.com/item?id=47399909
The pebble index seems like the optimal form for this.
https://repebble.com/index
Could be pressed even if your hands were busy.
If you want to relax some constraints, I made something similar for $10: https://www.stavros.io/posts/i-made-a-voice-note-taker/
Time for a real life Star Trek comm badge
I'm in if I can embed it into my forearm
In the mid 2000s I had a setup where some children's walkie talkie "spy watches" could be used to issue commands to a completely DIY, relay based smart home system.
I'm looking forward to whenever my Pebble ships so I can recreate that experience with this: https://github.com/skylord123/pebble-home-assistant-ws
Apple Watch gets you close.
Rules out a bunch of cases where your hands are busy handling ingredients in the kitchen, etc
What's been surprising in my experience regarding the wake word is that it recognizes me (adult male) saying the wake word ~95% of the time. However, it only registers the rest of my family (women and children) ~30% of the time.
I have no firsthand knowledge, but I’d strongly bet that the Home Assistant effort to collect donated training data mostly gets adult males and nearly zero children.
I remember that when those systems first started collecting data, they were worried kids wouldn't be recognized well, but they didn't know how to handle the privacy issues of recording kids, so they discouraged it. Women being missed isn't surprising in hindsight, but it wasn't anticipated.
This was 2021 (so pre-LLM), but I used to work for a company that gathered data for training voice commands (Alexa, Toyota, and Sonos were some of our clients). Basically, we paid people to read digital assistant scripts at scale.
Your assumptions about training data do not match the demographics of the data I collected. Most of our work revolved around getting diversity into the training data. We specifically recruited kids, older folks, women, people with accented or dialectal English, and just about every variety of speech we could get our hands on. The companies we worked with were insanely methodical about ensuring that different people were included.
Oh, I'm sure you're right. I've had people in my personal life (non-technical; "AI enthusiasts") laugh at me over concerns about training bias but this is likely a real world example of it.
I have a feeling beamforming microphone arrays might help here, something like this could improve the audio being processed substantially - https://www.minidsp.com/products/usb-audio-interface/uma-8-m....
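The basic idea behind a beamforming array can be shown with the simplest variant, delay-and-sum. It shifts each mic's signal by the extra time sound from the chosen direction takes to reach it, then averages the channels. Speech from that direction adds up coherently, while uncorrelated noise on each mic partly cancels. This is a minimal pure-Python sketch that assumes integer sample delays which are already known, and it uses synthetic signals. It is not how the linked product or any specific device is implemented.

```python
import math
import random


def delay_and_sum(channels: list[list[float]], delays: list[int]) -> list[float]:
    """Delay-and-sum beamformer with integer sample delays.

    channels[k][t] is mic k's sample at time t; delays[k] is how many samples
    later than the reference mic the look-direction sound reaches mic k.
    Each channel is advanced by its delay so the look-direction sound lines
    up, then the aligned channels are averaged.
    """
    if len(channels) != len(delays):
        raise ValueError("need exactly one delay per channel")
    n = min(len(ch) - d for ch, d in zip(channels, delays))
    return [
        sum(ch[i + d] for ch, d in zip(channels, delays)) / len(channels)
        for i in range(n)
    ]


def simulate(num_mics: int = 4, lag_per_mic: int = 2, length: int = 2000,
             noise_std: float = 0.5, seed: int = 0) -> tuple[float, float]:
    """Return (single-mic noise power, beamformed noise power) for a synthetic tone."""
    rng = random.Random(seed)
    source = [math.sin(2 * math.pi * 0.01 * t) for t in range(length)]
    delays = [k * lag_per_mic for k in range(num_mics)]
    # Mic k hears the source delays[k] samples late, plus its own independent noise.
    channels = [
        [(source[t - d] if t >= d else 0.0) + rng.gauss(0.0, noise_std) for t in range(length)]
        for d in delays
    ]
    beam = delay_and_sum(channels, delays)

    def noise_power(signal: list[float]) -> float:
        # Mean squared difference from the clean source over the beam's span.
        return sum((signal[i] - source[i]) ** 2 for i in range(len(beam))) / len(beam)

    return noise_power(channels[0]), noise_power(beam)


if __name__ == "__main__":
    single, beamformed = simulate()
    print(f"single mic noise power {single:.3f}, beamformed {beamformed:.3f}")
```

With four mics and independent noise on each, averaging should cut the residual noise power to roughly a quarter of a single mic's. Real rooms have correlated noise and reverberation, so gains in practice are smaller.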
That's a good call. I have a PS3(?) mic/camera that I was using when I was running the original Mycroft project on a Pi. I wonder if that would help with the inbuilt HA mic not waking for most of my family, most of the time. I will have to look at my VA Preview device and its specs later because I'm not sure if you can connect an external mic to it out-of-the-box.
Alexa devices have these (or used to, at least), but Google Home devices never did. So it shouldn't be necessary.
Yeah a small (ideally personalized) wakeword model would probably outperform just about any audio wizardry.
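One way to "personalize" a wake word detector is enrollment: each household member records a few examples, and incoming audio is compared against those templates. The sketch below uses dynamic time warping (DTW), a classic template-matching technique that is swapped in here for illustration. It is not the neural approach that modern wake word engines use. It assumes audio has already been turned into per-frame feature vectors (for example MFCCs), and the class name, threshold, and toy features are hypothetical.

```python
import math

Frame = list[float]  # one feature vector per audio frame (e.g. MFCCs), assumed precomputed


def dtw_distance(x: list[Frame], y: list[Frame]) -> float:
    """Dynamic time warping distance between two feature sequences.

    Lets one sequence be locally stretched or compressed relative to the
    other, so a slower or faster utterance of the same word can still match.
    The total path cost is divided by len(x) + len(y) so scores are
    comparable across lengths.
    """
    n, m = len(x), len(y)
    if n == 0 or m == 0:
        raise ValueError("sequences must be non-empty")
    inf = float("inf")
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            step = math.dist(x[i - 1], y[j - 1])  # Euclidean frame distance
            cost[i][j] = step + min(cost[i - 1][j], cost[i][j - 1], cost[i - 1][j - 1])
    return cost[n][m] / (n + m)


class EnrolledWakeWord:
    """Hypothetical per-household detector built from enrollment recordings."""

    def __init__(self, templates: list[list[Frame]], threshold: float) -> None:
        if not templates:
            raise ValueError("enroll at least one template")
        self.templates = templates  # e.g. a few recordings from each family member
        self.threshold = threshold  # must be tuned on real recordings

    def score(self, utterance: list[Frame]) -> float:
        # Best (lowest) distance to any enrolled template.
        return min(dtw_distance(utterance, t) for t in self.templates)

    def matches(self, utterance: list[Frame]) -> bool:
        return self.score(utterance) <= self.threshold


if __name__ == "__main__":
    # Toy one-dimensional "features" standing in for real MFCC frames.
    template = [[0.0], [1.0], [2.0], [3.0]]
    detector = EnrolledWakeWord([template], threshold=0.5)
    slow = [[0.0], [0.0], [1.0], [1.0], [2.0], [2.0], [3.0], [3.0]]  # same pattern, said slower
    other = [[3.0], [2.0], [1.0], [0.0]]  # different pattern
    print(detector.matches(slow), detector.matches(other))
```

Because the templates come from specific speakers, a household could enroll exactly the voices a generic model tends to miss.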
Why not use an easier to detect wake “word”, like two claps in quick succession? Or a couple of notes of a melody?
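A double-clap trigger is easy to prototype: find sharp rising edges in the audio's amplitude envelope, then check whether two consecutive ones fall within a plausible inter-clap gap. Here is a minimal sketch. It assumes the envelope has already been computed at a known frame rate, and the threshold and the 0.1 to 0.6 s gap window are hypothetical values you would need to tune.

```python
def find_onsets(envelope: list[float], threshold: float, refractory: int) -> list[int]:
    """Indices where the envelope crosses `threshold` upward.

    A new onset is ignored if it comes within `refractory` frames of the
    previous one, so the ringing tail of one clap is not counted twice.
    """
    onsets: list[int] = []
    above = False
    for i, value in enumerate(envelope):
        rising = value >= threshold and not above
        if rising and (not onsets or i - onsets[-1] > refractory):
            onsets.append(i)
        above = value >= threshold
    return onsets


def is_double_clap(envelope: list[float], frame_rate_hz: float, threshold: float = 0.5,
                   min_gap_s: float = 0.1, max_gap_s: float = 0.6) -> bool:
    """True if two consecutive onsets are between min_gap_s and max_gap_s apart."""
    onsets = find_onsets(envelope, threshold, refractory=int(min_gap_s * frame_rate_hz))
    gaps = [(b - a) / frame_rate_hz for a, b in zip(onsets, onsets[1:])]
    return any(min_gap_s <= gap <= max_gap_s for gap in gaps)


if __name__ == "__main__":
    rate = 100.0  # envelope frames per second (assumed)
    env = [0.0] * 300
    env[10], env[11], env[40] = 1.0, 0.8, 1.0  # clap at 0.10 s (with a tail), clap at 0.40 s
    print(is_double_clap(env, rate))
```

A fixed amplitude threshold will also fire on door slams and dropped pans, which is part of why a trained model usually beats hand-written rules.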
Can't clap if your hands are full and I would not subject my family to my attempts at delivering a melody.
I haven't tried training my own wake word though, I'm tempted to see if it improves things.
What about whistling?
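A whistle may be easier to detect than a spoken word because it is close to a pure tone, so most of its energy lands in a single frequency bin. The Goertzel algorithm computes the power in one DFT bin cheaply. Scanning the bins in a whistle range and checking what fraction of the frame's energy the strongest bin holds gives a simple tone detector. This is a sketch under assumptions: the 500 to 4000 Hz range and the 0.6 energy-fraction threshold are guesses, not measured properties of whistling.

```python
import math
import random


def goertzel_power(samples: list[float], k: int) -> float:
    """Squared magnitude of DFT bin k of `samples`, via the Goertzel recurrence."""
    n = len(samples)
    coeff = 2.0 * math.cos(2.0 * math.pi * k / n)
    s_prev = s_prev2 = 0.0
    for x in samples:
        s = x + coeff * s_prev - s_prev2
        s_prev2, s_prev = s_prev, s
    return s_prev ** 2 + s_prev2 ** 2 - coeff * s_prev * s_prev2


def looks_like_whistle(samples: list[float], sample_rate: float, lo_hz: float = 500.0,
                       hi_hz: float = 4000.0, min_fraction: float = 0.6) -> bool:
    """True if one bin in [lo_hz, hi_hz] holds at least `min_fraction` of the frame's energy."""
    n = len(samples)
    energy = sum(x * x for x in samples)
    if n == 0 or energy == 0.0:
        return False
    k_lo = math.ceil(lo_hz * n / sample_rate)
    k_hi = min(math.floor(hi_hz * n / sample_rate), (n - 1) // 2)  # stay below Nyquist
    # By Parseval, 2 * |X[k]|^2 / (n * energy) is the share of energy in bins k and n-k.
    best = max((2.0 * goertzel_power(samples, k) / (n * energy)
                for k in range(k_lo, k_hi + 1)), default=0.0)
    return best >= min_fraction


if __name__ == "__main__":
    sr, n = 16000.0, 512
    tone = [0.5 * math.sin(2 * math.pi * 1500 * t / sr) for t in range(n)]  # synthetic "whistle"
    rng = random.Random(1)
    noise = [rng.gauss(0.0, 1.0) for _ in range(n)]
    print(looks_like_whistle(tone, sr), looks_like_whistle(noise, sr))
```

A practical detector would probably also require the tone to persist across several consecutive frames, so that brief tonal sounds don't trigger it.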
Personally I'd pick "Cthulhu"
What about your Wi-Fi APs sensing which room you are in, with your choice of hilarious dance moves as the trigger?
Funky chicken for Gemini
Penguin dance for OpenAI
Claude?