Comment by 0x1ceb00da

1 year ago

This suggests that the AI "brain" receives the user input as a text prompt (the agent relays the user's speech to GPT-4o) and generates audio as output (GPT-4o streams speech packets back to the agent).

But when I asked advanced voice mode, it said the exact opposite: that it receives input as audio and generates text as output.

Both input and output are audio. This post is about bridging WebRTC audio I/O with an API that itself operates on plain TCP socket streams of raw PCM. For reliability and efficiency you want end users to connect with compressed, loss-tolerant, Zoom-style streams, and that traffic goes through a middleman that relays it to the model API.
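
For what it's worth, here's a minimal sketch of the receiving half of such a middleman in Python, assuming aiortc for the WebRTC leg and PyAV for decoding/resampling. The endpoint (`model-api.internal:9000`) and the 24 kHz mono PCM16 format are illustrative assumptions, not details from the post; the SDP signaling and the return leg (PCM from the model wrapped back into a WebRTC track) are omitted.

```python
# Minimal middleman sketch: WebRTC (Opus) in, raw PCM over TCP out.
# Assumes aiortc + PyAV (>= 9); signaling/SDP setup is omitted.
import asyncio

import av
from aiortc import RTCPeerConnection

# Hypothetical model-API endpoint that reads raw 24 kHz mono PCM16.
MODEL_HOST, MODEL_PORT = "model-api.internal", 9000


async def bridge_audio(track, writer: asyncio.StreamWriter) -> None:
    """Decode the browser's Opus frames and relay them as raw PCM bytes."""
    resampler = av.AudioResampler(format="s16", layout="mono", rate=24000)
    while True:
        frame = await track.recv()  # aiortc yields a decoded av.AudioFrame
        for out in resampler.resample(frame):  # PyAV >= 9 returns a list
            writer.write(out.to_ndarray().tobytes())  # interleaved PCM16
        await writer.drain()  # TCP backpressure throttles the relay loop


async def attach_relay(pc: RTCPeerConnection) -> None:
    """Wire an incoming WebRTC audio track to the model API's TCP socket."""
    _reader, writer = await asyncio.open_connection(MODEL_HOST, MODEL_PORT)

    @pc.on("track")
    def on_track(track):
        if track.kind == "audio":
            asyncio.ensure_future(bridge_audio(track, writer))
```

The point of the split is that the lossy, jitter-tolerant codec work happens on the WebRTC leg, while the model API only ever sees a clean, ordered PCM byte stream.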

Who did you ask? ChatGPT? Not sure you understand LLMs, but their knowledge comes from training data; they can't introspect their own architecture, so in this case they can only hallucinate, sometimes correctly, most of the time incorrectly.

  • This is also true for pretty much all humans, and bypassing this limitation is called enlightenment/self-realization.

    LLMs don't even have a self, so it can never be realized. Just the ego alone exists.