Comment by sasipi247

8 hours ago

I am working on a system built around the OpenAI Responses API in WebSocket mode, since performance is something that interests me.

It's like a microservices architecture with NATS JetStream handling coordination. I want to keep the worker core as clean as possible: just managing open sockets, threads, and continuations.

Document querying is something I am also interested in. The system lets me pin a document to a socket as a subagent, which can then be called on demand.
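The pinning idea above could be sketched roughly like this. This is a hypothetical sketch, not the actual implementation: `DocumentSession` and `SubagentRegistry` are invented names, and the real session would wrap an open WebSocket to the Responses API with the document already loaded into context.

```python
import asyncio

class DocumentSession:
    """Stub for one warm socket with a single document pinned to it."""

    def __init__(self, doc_id: str, text: str):
        self.doc_id = doc_id
        self.text = text  # stand-in for the pinned document context

    async def query(self, question: str) -> str:
        # Real version: send the question over the open socket and await
        # the model's answer; here we just echo for illustration.
        await asyncio.sleep(0)
        return f"[{self.doc_id}] answer to: {question}"

class SubagentRegistry:
    """Maps document IDs to warm subagent sessions."""

    def __init__(self):
        self._sessions: dict[str, DocumentSession] = {}

    def pin(self, doc_id: str, text: str) -> None:
        # One warm session per document, created once and reused,
        # so repeat queries skip connection setup and re-sending the doc.
        self._sessions[doc_id] = DocumentSession(doc_id, text)

    async def ask(self, doc_id: str, question: str) -> str:
        return await self._sessions[doc_id].query(question)
```

The point of the registry shape is that the expensive part (opening the socket and loading the document) happens once at pin time, and every later call is just a cheap round-trip.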

I have hit a lot of slip-ups along the way, such as infinite loops when calling the OpenAI API, etc ...
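One simple guard against that kind of infinite retry loop is to cap attempts and back off between them. A minimal sketch, assuming a synchronous call site; `call_with_retries` is a hypothetical helper, not part of the project:

```python
import random
import time

def call_with_retries(fn, max_attempts: int = 5, base_delay: float = 0.01):
    """Call fn(), retrying on any exception up to max_attempts times.

    Backs off exponentially with a little jitter, and re-raises the last
    error instead of looping forever.
    """
    for attempt in range(max_attempts):
        try:
            return fn()
        except Exception:
            if attempt == max_attempts - 1:
                raise  # give up: never retry unboundedly
            time.sleep(base_delay * (2 ** attempt) + random.random() * base_delay)
```

The key property is the hard cap: whatever the API does, the loop terminates.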

Example usage: 10 documents pinned to warm sockets on GPT 5.4 nano. The main thread can then call out to those sockets to query the documents in parallel. This opens up a lot of possibilities: cheaper models for cheaper tasks, input caching, and lower latency.
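The parallel fan-out from the main thread could look something like this. Again a hedged sketch: `ask_document` is a stand-in for a call over an already-open socket, and the sleep simulates the network round-trip.

```python
import asyncio

async def ask_document(doc_id: str, question: str) -> tuple[str, str]:
    """Stand-in for querying one pinned document over its warm socket."""
    await asyncio.sleep(0.01)  # simulated round-trip
    return doc_id, f"answer from {doc_id}"

async def fan_out(doc_ids: list[str], question: str) -> dict[str, str]:
    # Ask every pinned document the same question concurrently;
    # total latency is roughly one round-trip, not N of them.
    results = await asyncio.gather(
        *(ask_document(d, question) for d in doc_ids)
    )
    return dict(results)

answers = asyncio.run(fan_out([f"doc{i}" for i in range(10)], "summarize"))
```

Since the sockets are already warm, the 10 queries overlap in flight, which is where the latency win comes from.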

There is also a frontend.

A lot of information is in here (just thoughts, designs, etc.): https://github.com/SamSam12121212/ExplorerPRO/tree/main/docs