Comment by jryio
5 days ago
I would like to see randomized control group studies using study mode.
Does it offer meaningful benefits to students over self directed study?
Does it out perform students who are "learning how to learn"?
What affect does allowing students to make mistakes have compared to being guided through what to review?
I would hope Study Mode would produce flash card prompts and quantize information for usage in spaced repetition tools like Mochi [1] or Anki.
See Andy's talk here [2]
It doesn’t do any of that, it just captures the student market more.
They want a student to use it and say “I wouldn’t have learned anything without study mode”.
This also allows them to fill their data coffers more with bleeding edge education. “Please input the data you are studying and we will summarize it for you.”
> It doesn’t do any of that
Not to be contrarian, but do you have any evidence of this assertion? Or are you just confidently confabulating a response for something outside of the data you've been exposed to? Because a commentor below provided a study that directly contradicts this.
A study that directly contradicts what exactly?
Such a smart play.
Bingo. The scale they're operating at, new features don't have to be useful, they only need to look like they are for the first few minutes.
https://www.nature.com/articles/s41598-025-97652-6
This isn't study mode, it's a different AI tutor, but:
"The median learning gains for students, relative to the pre-test baseline (M = 2.75, N = 316), in the AI-tutored group were over double those for students in the in-class active learning group."
I wonder how much this was a factor:
"The occurrence of inaccurate “hallucinations” by the current [LLMs] poses a significant challenge for their use in education. [...] we enriched our prompts with comprehensive, step-by-step answers, guiding the AI tutor to deliver accurate and high-quality explanations (v) to students. As a result, 83% of students reported that the AI tutor’s explanations were as good as, or better than, those from human instructors in the class."
Not at all dismissing the study, but to replicate these results for yourself, this level of gain over a classroom setting may be tricky to achieve without having someone make class materials for the bot to present to you first
Edit: the authors further say
"Krupp et al. (2023) observed limited reflection among students using ChatGPT without guidance, while Forero (2023) reported a decline in student performance when AI interactions lacked structure and did not encourage critical thinking. These previous approaches did not adhere to the same research-based best practices that informed our approach."
Two other studies failed to get positive results at all. YMMV a lot apparently (like, all bets are off and your learning might go in the negative direction if you don't do everything exactly as in this study)
In case you find it interesting: I deployed an early version of a "lesson administering" bot deployed on a college campus that guides students through tutored activities of content curated by a professor in the "study mode" style -- that is, forcing them to think for themselves. We saw an immediate student performance gain on exams of about 1 stdev in the course. So with the right material and right prompting, things are looking promising.
OpenAI should figure out how to onboard teachers. Teacher uploads context for the year, OpenAI distributes a chatbot to the class that's perma fixed into study mode. Basically like GPT store but with an interface and UX tuned for a classroom.
There's studies showing that LLM makes experienced devs slower in their work. I wouldn't be surprised if it was the same for self study.
However consider the extent to which LLMs make the learning process more enjoyable. More students will keep pushing because they have someone to ask. Also, having fun & being motivated is such a massive factor when it comes to learning. And, finally, keeping at it at 50% the speed for 100% the material always beats working at 100% the speed for 50% the material. Who cares if you're slower - we're slower & faster without LLMs too! Those that persevere aren't the fastest; they're the ones with the most grit & discipline, and LLMs make that more accesible.
The study you're referencing doesn't make that conclusion.
It concludes theres a learning curve that generally takes about 50 hours of time to figure out. The data shows that the one engineer who had more than 50 hours of experience with Cursor actually worked faster.
This is largely my experience, now. I was much slower initially, but I've now figured out the correct way to prompt, guide, and fix the LLM to be effective. I produce way more code and am mentally less fatigued at the end of each day.
People keep citing this study (and it was on the top of HN for a day). But this claim falls flat when you find out that the test subjects had effectively no experience with LLM equipped editors and the 1-2 people in the study that actually did have experience with these tools showed a marked increase in productivity.
Like yeah, if you’ve only ever used an axe you probably don’t know the first thing about how to use a chainsaw, but if you know how to use a chainsaw you’re wiping the floor with the axe wielders. Wholeheartedly agree with the rest of your comment; even if you’re slow you lap everyone sitting on the couch.
I presume you're referring to the recent METR study. One aspect of the study population, which seems like an important causal factor in the results, is that they were working in large, mature codebases with specific standards for code style, which libraries to use, etc. LLMs are much better at producing "generic" results than matching a very specific and idiosyncratic set of requirements. The study involved the latter (specific) situation; helping people learn mainstream material seems more like the former (generic) situation.
(Qualifications: I was a reviewer on the METR study.)
*slower with Sonnet 3.7 on large open source code bases where the developer is a senior member of the project core team.
https://metr.org/blog/2025-07-10-early-2025-ai-experienced-o...
I believe we'll see the benefits and drawbacks of AI augmentation to humans performing various tasks will vary wildly based on the task, the way the AI is being asked to interact, and the AI model.
It was a 16 person study on open source devs that found 50 hours of experience with the tool made people more productive
I would be interested to see if there have already been studies about the efficacy of tutors at good colleges. In my experience (in academia), the students who make it into an Ivy or an elite liberal arts school make extensive use of tutor resources, but not in a helpful way. They basically just get the tutor to work problems for them (often their homework!) and feel like they've "learned" things because tough questions always seems so obvious when you've been shown the answer. In reality, what it means it that they have no experience being confused or having to push past difficult things they were stuck on. And those situations are some of the most valuable for learning.
I bring this up because the way I see students "study" with LLMs is similar to this misapplication of tutoring. You try something, feel confused and lost, and immediately turn to the pacifier^H^H^H^H^H^H^H ChatGPT helper to give you direction without ever having to just try things out and experiment. It means students are so much more anxious about exams where they don't have the training wheels. Students have always wanted practice exams with similar problems to the real one with the numbers changed, but it's more than wanting it now. They outright expect it and will write bad evals and/or even complain to your department if you don't do it.
I'm not very optimistic. I am seeing a rapidly rising trend at a very "elite" institution of students being completely incapable of using textbooks to augment learning concepts that were introduced in the classroom. And not just struggling with it, but lashing out at professors who expect them to do reading or self study.
it makes difference to students who are already motivated. that was the case with youtube.
unfortunately that group is tiny and getting tinier due to dwindling attention span.
Come on. Asking an educational product to do a basic sanity test as to whether it helps is far too high a bar. Almost no educational app does that sort of thing.
I would also be interested to see whether it outperforms students doing literally nothing.