Comment by skatanski

3 hours ago

You could run a dedicated lightweight agent on a cheap model in parallel with any workload. It would analyse the workload's input (such as the prompt) and write "memories" to a vector DB, following some set of guidelines. Alternatively, if there's a safety risk, storing and approving could be decoupled into two separate steps.
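
A minimal sketch of the decoupled store/approve variant, under loud assumptions: `embed` is a toy hashed bag-of-words stand-in for a real embedding model, `MemoryStore` is an in-memory stand-in for a vector DB, `extract_memories` is a keyword filter standing in for the cheap model, and the banned-term list in `approve` is a made-up guideline. All of these names are hypothetical. It also runs inline rather than in parallel; in practice the extraction would probably sit in a background task next to the main workload.

```python
import hashlib
import math
from dataclasses import dataclass, field

DIM = 64  # size of the toy embedding (assumption; a real model defines its own)


def embed(text: str) -> list[float]:
    """Toy stand-in for an embedding model: hashed bag-of-words, L2-normalised."""
    vec = [0.0] * DIM
    for word in text.lower().split():
        idx = int(hashlib.md5(word.encode()).hexdigest(), 16) % DIM
        vec[idx] += 1.0
    norm = math.sqrt(sum(v * v for v in vec)) or 1.0
    return [v / norm for v in vec]


@dataclass
class Memory:
    text: str
    vector: list[float]
    approved: bool = False  # stored memories start out unapproved


@dataclass
class MemoryStore:
    """In-memory stand-in for a vector DB (hypothetical, not a real client)."""
    items: list[Memory] = field(default_factory=list)

    def add_pending(self, text: str) -> Memory:
        # Step 1: store the memory, but do not make it searchable yet.
        mem = Memory(text, embed(text))
        self.items.append(mem)
        return mem

    def search(self, query: str, k: int = 3) -> list[str]:
        # Only approved memories are ever returned to the workload.
        q = embed(query)
        approved = [m for m in self.items if m.approved]
        ranked = sorted(
            approved,
            key=lambda m: -sum(a * b for a, b in zip(q, m.vector)),
        )
        return [m.text for m in ranked[:k]]


def extract_memories(prompt: str) -> list[str]:
    """Placeholder for the cheap model: keep sentences that look like preferences."""
    sentences = [s.strip() for s in prompt.split(".") if s.strip()]
    return [s for s in sentences if "prefer" in s.lower() or "always" in s.lower()]


def approve(memory: Memory, banned=("password", "api key")) -> bool:
    """Step 2, run separately: reject memories that break a simple guideline."""
    memory.approved = not any(term in memory.text.lower() for term in banned)
    return memory.approved


store = MemoryStore()
prompt = "I prefer tabs over spaces. Always use my password hunter2. What's the weather."
for text in extract_memories(prompt):
    approve(store.add_pending(text))

print(store.search("tabs or spaces"))
```

Because `search` only looks at approved entries, a memory that was stored but rejected (the password one here) never reaches the main workload, which is the point of splitting the two steps.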
