
Comment by protocolture

3 days ago

Keen for this also. Been having issues getting a smooth voice experience from HA to ChatGPT. I don't like the whole wakeword concept for the receiver either. I think there's work to be done on the whole stack.

What's wrong with the wakeword stuff?

Great timing, as I was looking into it yesterday and was thinking about writing my own set of agents to run house stuff. I don't want to spend loads of time on voice interaction, so HA's wakeword stuff would've been useful. If not, I'll bypass HA for voice and really only use HA via MCP.

I can do firmware dev for micros... but omg do I not want to spend the time looking through a datasheet and getting something to run efficiently myself these days.

  • You can use the vendor-supported wakewords, and they are generally pretty good.

    However: these are device-specific. The devices I purchased for this purpose have very few vendor-supported wakewords and, even more prominently, refuse to integrate with HA. Possibly a firmware issue, but I have reloaded the firmware 30 times. I don't necessarily want to purchase something else for this purpose, which is where building a bespoke HA audio box becomes its own can of worms.

    But if you want a custom wake word, or really a wake phrase, you go down a rabbit hole of training/cost/memory etc. that starts to get annoying fast.

    I kind of know I am being unreasonable. I don't want a device that just ships off everything it hears to an LLM, even a local one; that would suck. I just want a third way.

    Then there's other stuff. HA has a hard time providing context to an LLM, because it sends the whole conversation so far off to the LLM as context. It can get really weird really quickly. This caused me a lot of issues with lights, for example. It would remember switching a light on, and if that was in the context, it would refuse to switch it on a second time after the light had turned off due to a rule or manual intervention. But if you don't send the context, you can't have deeper conversations. You can't ask follow-up questions, basically.
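    A rough sketch of the kind of fix I mean: refresh the actual device state from HA's REST API and prepend it to every request, so the model never has to trust its stale memory of the lights. (The entity IDs, token, and helper names here are placeholders, not anything HA ships.)

    ```python
    import requests

    HA_URL = "http://homeassistant.local:8123"   # your HA instance
    TOKEN = "YOUR_LONG_LIVED_ACCESS_TOKEN"       # created under your HA profile

    def get_entity_states(entity_ids):
        """Fetch the current state of each entity via HA's REST API."""
        headers = {"Authorization": f"Bearer {TOKEN}"}
        states = {}
        for entity_id in entity_ids:
            resp = requests.get(f"{HA_URL}/api/states/{entity_id}", headers=headers)
            resp.raise_for_status()
            states[entity_id] = resp.json()["state"]
        return states

    def build_messages(history, user_text, entity_ids):
        """Prepend a fresh state snapshot so stale conversation history can't win."""
        snapshot = get_entity_states(entity_ids)
        state_lines = "\n".join(f"{eid}: {state}" for eid, state in snapshot.items())
        system = ("You control a smart home. The CURRENT device states below override "
                  "anything implied earlier in the conversation:\n" + state_lines)
        return [{"role": "system", "content": system},
                *history,
                {"role": "user", "content": user_text}]
    ```

    That way the history is only there for follow-up questions; the light's real on/off state comes from the snapshot, not from the model's memory of having switched it.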

    • On my new AMD laptop, it took about 90 minutes to run 50k training rounds on OpenWakeWord.

      It's not really a big burden.

      A tiny AI running locally is the third option you want. That's the only reasonable way to do configurable wake word detection.
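      Once the model exists, running it locally is only a few lines. A minimal sketch with openwakeword and pyaudio (the model filename and the 0.5 threshold are placeholders for whatever your training run produced):

      ```python
      import numpy as np
      import pyaudio
      from openwakeword.model import Model

      CHUNK = 1280  # 80 ms of 16 kHz mono audio per inference frame
      model = Model(wakeword_models=["my_custom_phrase.tflite"])  # your trained model

      pa = pyaudio.PyAudio()
      stream = pa.open(format=pyaudio.paInt16, channels=1, rate=16000,
                       input=True, frames_per_buffer=CHUNK)

      while True:
          frame = np.frombuffer(stream.read(CHUNK), dtype=np.int16)
          scores = model.predict(frame)          # {model_name: confidence}
          if scores["my_custom_phrase"] > 0.5:   # tune the threshold on real audio
              print("wake phrase detected -> hand off to the assist pipeline")
      ```

      Nothing leaves the box until that score trips the threshold, which is basically the "third way" the parent comment is asking for.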

It should participate in all conversations, take initiative and experiment.

  • "Hey, hey, are you still asleep? Using spare cycles, I have designed an optimal recipe for mashed potatoes, as you mentioned ten days ago. I need you to go get some potatoes."

    • A local AI system that hears your conversations, identifies problems, and then uses spare cycles to devise solutions for them is actually an incredible idea. I'm never going to give a cloud system the kind of access it would need to do a really good job, but a local one I control? Absolutely.

      "Hey, are you still having trouble with[succinct summary of a problem it identified]?" "Yes" "I have a solution that meets your requirements as I understand them, and fits in your budget."


    • I pondered the concept in the '90s. Initially I thought it should be an assistant, but with age came wisdom, and now I think it should be a virtual drill instructor. "Rise and shine $insult $insult, the sun is up, the store is open, we will be getting some potatoes today, $insult $insult, it was all your idea, now apply yourself!" Bright lights flashing, loud music, the shower starts running. "Shower time, you have 7 minutes! $insult $insult" 4 minutes in, the coffee machine boots up. "You will be wearing the blue pants, top shelf on the left stack, the green shirt, 7th from the left. Faster faster! $insult $insult"