Comment by ianbicking

2 months ago

What's been your experience with the Realtime API? I've been doing LLM work with voice, but haven't really given it a try: the price is so high, and it feels like it's much harder to control. Specifically, you just get one system prompt and then the model takes over entirely. (Though looking at the API, I see you can inject text and do some other things to play around with the session.)

2 comments


akadeb 2 months ago

I agree, it's still pricy. The cost works out better with `gpt-4o-mini-realtime-preview-2024-12-17`.

Yep, it's constrained to the system prompt, but I pass in the conversation history with each new session to keep it relevant. It also supports tool calling, which is clutch.
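
For anyone curious what that could look like, here's a minimal sketch, assuming the history is replayed as `conversation.item.create` events right after a `session.update` that sets the instructions and tools. This is one possible approach, not necessarily the exact setup described above. The `lookup_order` tool, the sample history, and the helper names are hypothetical, and the `text` content type for assistant turns is an assumption based on the preview API that may differ in newer versions. It only builds the JSON payloads; sending them over the WebSocket is left out.

```python
import json


def session_update(instructions, tools):
    """Build a session.update event carrying the system prompt and tool list."""
    return {
        "type": "session.update",
        "session": {"instructions": instructions, "tools": tools},
    }


def history_to_events(history):
    """Turn prior (role, text) turns into conversation.item.create events."""
    events = []
    for role, text in history:
        # Assumption: assistant turns use content type "text" and all other
        # roles use "input_text"; check the current API reference.
        content_type = "text" if role == "assistant" else "input_text"
        events.append({
            "type": "conversation.item.create",
            "item": {
                "type": "message",
                "role": role,
                "content": [{"type": content_type, "text": text}],
            },
        })
    return events


# Hypothetical tool definition, for illustration only.
lookup_order_tool = {
    "type": "function",
    "name": "lookup_order",
    "description": "Look up an order by its ID.",
    "parameters": {
        "type": "object",
        "properties": {"order_id": {"type": "string"}},
        "required": ["order_id"],
    },
}

# Hypothetical history saved from an earlier session.
history = [
    ("user", "Where is my order?"),
    ("assistant", "Could you give me the order ID?"),
]

events = [session_update("You are a concise voice assistant.", [lookup_order_tool])]
events += history_to_events(history)

for event in events:
    print(json.dumps(event))
```

Keeping the history as separate items, rather than pasting it into the system prompt, leaves the instructions short and makes it easy to trim old turns when the price starts to hurt.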

Have you tried Hume AI? They've got a neat suite of APIs that give you more control over each session.

ianbicking 2 months ago

Hume has been on my radar for a long time, but I've never actually used their products. They keep coming out with new lines and yet I never see anyone talk about them... I'm not sure why. Though it's so hard to figure out their offerings, and some seem to actually be wrappers around other LLMs...

Do you know what Hume's latency is like? The completely vertically integrated Realtime API is pretty compelling because of its latency, but it's not as clear to me how Hume would make that all work with their hybrid system.