Comment by thethethethe
10 months ago
I'm not sure how this problem can be solved. How do you test a system with emergent properties of this degree that whose behavior is dependent on existing memory of customer chats in production?
10 months ago
I'm not sure how this problem can be solved. How do you test a system with emergent properties of this degree that whose behavior is dependent on existing memory of customer chats in production?
Using prompts know to be problematic? Some sort of... Voight-Kampff test for LLMs?
I doubt it's that simple. What about memories running in prod? What about explicit user instructions? What about subtle changes in prompts? What happens when a bad release poisons memories?
The problem space is massive and is growing rapidly, people are finding new ways to talk to LLMs all the time