Comment by jboggan
7 hours ago
In 2017 LLMs weren't powerful enough to generate working code on their own, but my goal was to at least create a chatbot that could help you rubber-duck-debug your way to a solution. Unfortunately the tech wasn't quite strong enough for that, and not enough engineers even knew what rubber-duck-debugging was. RIP Duckly.
Trying to train an LLM on two 1080ti's on the StackOverflow corpus in my living room was a vibe though. Good times.
Duckly deserved to actually work. There’s a small irony here: the closest study I found, on robots specifically built to simulate attentive listening, found they performed no better than an actual inanimate rubber duck for adult engineers. The mechanical signal of listening doesn’t seem to be the active ingredient. Makes me wonder if Duckly would have needed real disagreement, not just better natural language, to close a gap a duck can’t.
You're probably onto something with the value of disagreement. I think it's one reason why chatting with current models doesn't create the same stimulation that rubber-ducking used to bring. The models are typically too quick to agree and amplify what you think rather than truly break it down and push back.
And thanks for saying it should have worked, I agree. My chagrin has increased over the years as I have realized the magnitude of my ill-timing.
Has anyone seen a good set of prompts for that disagreement? For the "skeptical eyebrow-raise" or "confused/doubtful head tilt" aspect of rubber ducks?
Agentic setups use adversarial-expert, steel-man-opponent, risk-mitigation, and failure-mode-analysis roles. But what about something close to brainstorming, with thought-provoking nudge questions? Or, on the other hand, an arm-waving, fight-club-style discussion? Or... It's a big design space. I used to go to lots of research talks at MIT, in assorted departments. The post-talk Q&A cultures varied a lot, encompassing both "leaves the speaker in tears" and "nudge so subtle you won't quickly get it if you haven't already spotted the fatal flaw in the work".
So aside from dialing down the "transformative insight!" silliness, there seems a rich multi-agent space to explore. Even multi-persona - a group "let's help you explore your argument/thesis" could be valuable in education. Hmm, has anyone used multiple rubber ducks at the same time, to host multiple roles? Hats? ...
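For what it's worth, here is a minimal sketch of what "multiple ducks, multiple hats" could look like as prompt scaffolding. The roles are the ones mentioned above; the `Duck` class, the `build_prompts` helper, and all of the instruction wording are made up for illustration, and nothing here calls a real model API. It only builds one prompt string per persona, each with a shared "don't just agree" rule.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Duck:
    """One rubber-duck persona (names and wording are hypothetical)."""
    name: str
    stance: str


# Roles come from the thread; the instruction text is an assumption.
DUCKS = [
    Duck("adversarial expert", "Assume the argument is wrong and look for the flaw."),
    Duck("steel-man opponent", "State the strongest opposing position before responding."),
    Duck("failure-mode analyst", "List the ways this could fail and how likely each is."),
    Duck("nudger", "Ask one short, thought-provoking question and offer no answers."),
]

# Shared rule aimed at the "too quick to agree" problem.
NO_FLATTERY = "Do not praise the idea or agree just to be agreeable."


def build_prompts(thesis: str, ducks=DUCKS) -> dict[str, str]:
    """Return one prompt string per duck for the given thesis."""
    thesis = thesis.strip()
    if not thesis:
        raise ValueError("thesis must not be empty")
    return {
        d.name: f"You are the {d.name}. {d.stance} {NO_FLATTERY}\n\nThesis: {thesis}"
        for d in ducks
    }


if __name__ == "__main__":
    # Print every duck's prompt for a sample thesis.
    for name, prompt in build_prompts("Caching will fix our latency problem.").items():
        print(f"--- {name} ---\n{prompt}\n")
```

Each prompt would then go to whatever model you use, one call per duck, so the personas don't get a chance to agree with each other either. Whether that actually produces the "skeptical eyebrow-raise" is an open question.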
2017 is a bit early to refer to them as LLMs. I'm not sure when exactly we started to refer to LMs as 'large', but I don't think it was before GPT-2 (2019). That said, from the NLP work I've done, it was much more interesting working on small specialized models.
perhaps it is time to resurrect Duckly (cue Frankenstein music and thunder in background)
It's AIive!
AI ive, the thinnest neural network ever.