Comment by sulam
1 day ago
If you trust everything the LLM tells you, and you learn from code, then yes, the exact same risks apply. But this is not how you use (or should use) LLMs when you’re learning a topic. Instead you should use high quality sources, then ask the LLM to summarize them for you to start with (NotebookLM does this very well for instance, but so can others). Then you ask it to build you a study plan, with quizzes and exercises covering what you’ve learnt. Then you ask it to set up a spaced repetition worksheet that covers the topic thoroughly. At the end of this you will know the topic as well as if you’d taken a semester-long course.
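Concretely, the spaced repetition part just means review intervals that grow after each successful recall. A rough sketch of that scheduling rule (illustrative only, not a faithful copy of any particular algorithm like SM-2):

```typescript
type ReviewCard = {
  prompt: string;
  answer: string;
  intervalDays: number; // days to wait before the next review
};

// Double the interval on a successful recall, reset to one day on a miss.
function scheduleNextReview(card: ReviewCard, recalledCorrectly: boolean): ReviewCard {
  return {
    ...card,
    intervalDays: recalledCorrectly ? card.intervalDays * 2 : 1,
  };
}

// Example: a card starts at a 1-day interval, then 2, then 4 as it keeps
// being answered correctly.
let card: ReviewCard = {
  prompt: "What problem does PKCE solve in OAuth?",
  answer: "It protects the authorization code exchange for public clients.",
  intervalDays: 1,
};
card = scheduleNextReview(card, true); // intervalDays: 2
card = scheduleNextReview(card, true); // intervalDays: 4
```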
One big technique it sounds like the authors of the OAuth library missed is that LLMs are very good at generating tests. A good development process for today’s coding agents is to 1) prompt with or create a PRD, 2) break this down into relatively simple tasks, 3) build a plan for how to tackle each task, with the conditions that should be tested listed out, 4) write the tests, TDD style, so that they start out failing, and finally 5) write the implementation. The LLM can do all of this, but you can’t one-shot it these days; you have to keep a human in the loop at every step, correcting when things go off track. It’s faster, but it’s not the 10x speed-up you might imagine if you think the LLM is just asynchronously taking a PRD some PM wrote and building it all. We still have jobs for a reason.
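To make step 4 concrete: the tests exist before the code they exercise, so they fail until the implementation lands. Something like this, using Node’s built-in test runner (refreshAccessToken and "./oauth" are made-up placeholders for illustration, not the actual library’s API):

```typescript
// TDD-style tests written before the implementation exists.
import { test } from "node:test";
import assert from "node:assert/strict";

import { refreshAccessToken } from "./oauth"; // not implemented yet, so these tests fail

test("an expired access token can be exchanged via the refresh token", async () => {
  const result = await refreshAccessToken({
    refreshToken: "refresh-123",
    clientId: "client-abc",
  });

  assert.ok(result.accessToken.length > 0);    // a new token was issued
  assert.notEqual(result.accessToken, "refresh-123");
  assert.ok(result.expiresIn > 0);             // with a sane expiry
});

test("a revoked refresh token is rejected", async () => {
  await assert.rejects(
    refreshAccessToken({ refreshToken: "revoked-456", clientId: "client-abc" }),
    /invalid_grant/ // the OAuth error we expect the implementation to surface
  );
});
```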
> Instead you should use high quality sources, then ask the LLM to summarize them for you to start with (NotebookLM does this very well for instance, but so can others).
How do you determine whether the LLM accurately reflects what the high-quality source contains, if you haven’t read the source? When learning from humans, we place trust in them to teach us based on a web of trust. How do you determine the level of trust to place in an LLM?
> When learning from humans, we place trust in them to teach us based on a web of trust.
But this is only part of the story. When learning from another human, you’ll also actively try to gauge whether they’re trustworthy based on general linguistic markers, and will try to find and poke holes in what they’re saying so that you can question them intelligently.
This is not much different from what you’d do with an LLM, which is why it’s such a problem that they’re often more convincing than they are correct. But it’s not an insurmountable issue. The other issue is that their trustworthiness will vary in a different way than a human’s, so you need experience to know when they’re possibly just making things up. But just based on feel, I think that experience is definitely possible to gain.
Because summarizing is one of the few things LLMs are generally pretty good at. Plus you should use the summary to determine if you want to read the full source, kind of like reading an abstract for a research paper before deciding if you want to read the whole thing.
Bonus: the high quality source is going to be mostly AI written anyway
Actually, LLMs aren’t that great for summarizing. It would be a boon for RAG workflows if they were.
I’m still on the lookout for a great model for this.
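Where this bites in a RAG workflow is at indexing time: each chunk gets a model-written summary that everything downstream leans on. A rough sketch of that step (summarize and embed are stand-in placeholders here, not any real library’s API):

```typescript
// Rough shape of the RAG indexing step where summary quality matters.
type IndexedChunk = {
  sourceId: string;
  text: string;      // original chunk, kept for grounding and citations
  summary: string;   // model-written condensation -- the weak link
  embedding: number[];
};

// Placeholder implementations so the sketch runs; in practice these are model calls.
async function summarize(text: string): Promise<string> {
  return text.slice(0, 200); // stand-in: truncation instead of real summarization
}
async function embed(text: string): Promise<number[]> {
  return Array.from(text).map((ch) => ch.charCodeAt(0) / 255); // stand-in vector
}

async function indexChunk(sourceId: string, text: string): Promise<IndexedChunk> {
  // If this summary is subtly wrong, retrieval and every answer built on it
  // inherit the error, which is why a reliably faithful summarizer matters.
  const summary = await summarize(text);
  return { sourceId, text, summary, embedding: await embed(summary) };
}
```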
I did actually use the LLM to write tests, and was pleased with the results, which I thought were pretty good and thorough, though clearly the author of this blog post has a different opinion.
But TDD is not the way I think. I've never been able to work that way (LLM-assisted or otherwise). I find it very hard to write tests for software that isn't implemented yet, because a lot of the details of how it should work are discovered as part of the implementation process. That means both that any API I come up with before implementing is likely to change, and that it's not clear exactly which details need testing until I've fully explored how the thing works.
This is just me, other people may approach things totally differently and I can certainly understand how TDD works well for some people.