Comment by AIPedant
9 months ago
Yes, here's the link: https://arxiv.org/abs/2503.21934v1
Anecdotally, I've been playing around with o3-mini on undergraduate math questions: it is much better at "plug-and-chug" proofs than GPT-4, but those problems aren't independently interesting; they are explicitly pedagogical. For anything requiring insight, it's either:
1) A very good answer that reveals the LLM has seen the problem before (e.g. naming the theorem, presenting a "standard" proof, using a much more powerful result)
2) A bad answer that looks correct and takes an enormous amount of effort to falsify. (This is the secret sauce of LLM hype.)
I dread undergraduate STEM majors using this thing - I asked it a problem about rotations and spherical geometry, but got back a pile of advanced geometric algebra, when I was looking for "draw a spherical triangle." If I didn't know the answer, I would have been badly confused. See also this real-world example of an LLM leading a recreational mathematician astray: https://xcancel.com/colin_fraser/status/1900655006996390172#...
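To make that concrete, here's a hypothetical sketch (my own illustration, not the actual problem I asked) of the elementary fact the question was probing: the composition of two rotations is again a rotation, and the spherical-triangle construction reads off its axis geometrically rather than algebraically.

    import numpy as np

    def rotation(axis, angle):
        # Rodrigues' formula: rotation matrix about a unit axis.
        axis = np.asarray(axis, dtype=float)
        axis /= np.linalg.norm(axis)
        K = np.array([[0.0, -axis[2], axis[1]],
                      [axis[2], 0.0, -axis[0]],
                      [-axis[1], axis[0], 0.0]])
        return np.eye(3) + np.sin(angle) * K + (1.0 - np.cos(angle)) * (K @ K)

    A = rotation([0, 0, 1], np.pi / 3)  # rotate 60 degrees about z
    B = rotation([1, 0, 0], np.pi / 4)  # rotate 45 degrees about x
    C = B @ A                           # composition of the two rotations

    # C is orthogonal with det +1, i.e. itself a rotation.
    assert np.allclose(C @ C.T, np.eye(3))
    assert np.isclose(np.linalg.det(C), 1.0)

    # Its axis is the eigenvector with eigenvalue 1 - the vertex that the
    # spherical-triangle construction locates with a drawing.
    w, v = np.linalg.eig(C)
    print(np.real(v[:, np.argmin(np.abs(w - 1))]))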
I will add that in 10 years the field will be intensely criticized for its reliance on multiple-choice benchmarks; it is not surprising or interesting that next-token prediction can game multiple-choice questions!
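For what it's worth, here is a schematic of how many multiple-choice benchmarks are scored (a sketch of the general pattern, not any particular eval harness; the log_likelihood callable is a hypothetical stand-in for a model API). The model never has to produce a proof - it just has to make one answer letter marginally more likely than the others:

    from typing import Callable

    def score_mc(question: str, choices: list[str],
                 log_likelihood: Callable[[str, str], float]) -> int:
        """Return the index of the choice whose letter the model assigns
        the highest log-likelihood as a continuation of the prompt."""
        letters = "ABCD"[:len(choices)]
        prompt = question + "\n" + "\n".join(
            f"{l}. {c}" for l, c in zip(letters, choices)) + "\nAnswer:"
        scores = [log_likelihood(prompt, f" {l}") for l in letters]
        return max(range(len(scores)), key=scores.__getitem__)

    # Toy stand-in for a model that simply favors the letter "B"
    # (illustrative only) - it still produces a well-formed "answer".
    guess = score_mc("2+2=?", ["3", "4", "5", "22"],
                     lambda prompt, cont: 1.0 if cont == " B" else 0.0)
    print(guess)  # 1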
This is a paper by INSAIT researchers - a very young institute which hired most of its PhD staff only in the last two years, basically onboarding anyone who wanted to be part of it. They were waving their BG-GPT around on national TV in the country as a major breakthrough, while it was basically a Mistral fine-tuned model that was never released to the public - nor was its training set.
Not sure whether their (INSAIT's) agenda is purely scientific, as there's a lot of PR on LinkedIn by these guys, literally celebrating every PhD they get, which is at minimum very weird. I'd take anything they release with a grain of salt, if not outright caution.
Half the researchers are at ETH Zurich (INSAIT is a partnership between EPFL, ETH and Sofia) - hardly an unreliable institution.
In my experience LLMs can't get basic western music theory right, there's no way I would use an LLM for something harder than that.
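To make "basic" concrete, here is the kind of deterministic task I mean (a hypothetical Python sketch of my own, not something from this thread): spelling a major scale so that each letter name appears exactly once - e.g. F# major ends on E#, not F - exactly the sort of mechanical rule I keep seeing LLMs fumble.

    NOTE_TO_PC = {"C": 0, "D": 2, "E": 4, "F": 5, "G": 7, "A": 9, "B": 11}
    MAJOR_STEPS = [2, 2, 1, 2, 2, 2, 1]  # whole/half-step pattern W W H W W W H

    def spell_major_scale(tonic: str) -> list[str]:
        """Spell a major scale using each of the seven letter names once."""
        letters = "CDEFGAB"
        start = letters.index(tonic[0])
        pc = (NOTE_TO_PC[tonic[0]] + tonic.count("#") - tonic.count("b")) % 12
        out = []
        for i in range(7):
            letter = letters[(start + i) % 7]
            offset = (pc - NOTE_TO_PC[letter]) % 12
            # 0 = natural, 1/2 = sharps, 11/10 = flats
            accidental = {0: "", 1: "#", 2: "##", 11: "b", 10: "bb"}[offset]
            out.append(letter + accidental)
            pc = (pc + MAJOR_STEPS[i]) % 12
        return out

    print(spell_major_scale("F#"))  # ['F#', 'G#', 'A#', 'B', 'C#', 'D#', 'E#']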
I may be mistaken, but I don't believe that LLMs are trained on a large corpus of machine-readable music representations, which would arguably be crucial to strong performance in common-practice music theory. I would also surmise that most music-theory datasets largely arrive without musical representations altogether. A similar problem exists for many other fields, particularly mathematics, but for those it is much more profitable to invest the effort to span such representation gaps. I would not gauge LLM generality on music theory performance when its niche representations are likely unavailable in training and it is widely perceived as having minuscule economic value.
Music theory is a really good test because, in my experience, the AI is extremely bad at it.
> In my experience LLMs can't get basic western music theory right, there's no way I would use an LLM for something harder than that.
This take is completely oblivious, and frankly it sounds like a desperate jab. There are myriad activities whose core requirements are a) deriving info from a complex context that happens to be supported by a deep and plentiful corpus, and b) employing glorified template and rule engines.
LLMs excel at what might be described as interpolating context, following input and output in natural language - think of a chatbot extensively trained on domain-specific tasks that can also parse and generate content. There are absolutely zero lines of intellectual work that do not benefit extensively from this sort of tool. Zero.
A desperate jab? But I _want_ LLMs to be able to do basic, deterministic things accurately. Seems like I touched a nerve? Lol.
Discussed here: https://news.ycombinator.com/item?id=43540985 (Proof or Bluff? Evaluating LLMs on 2025 USA Math Olympiad, 4 points, 2 comments).
Anecdotally: schoolkids are at the leading edge of LLM innovation, and nowadays all homework assignments are explicitly made to be LLM-proof. (Well, at least in my son's school. Yours might be different.)
This effectively makes LLMs useless for education. (Also sours the next generation on LLMs in general, these things are extremely lame to the proverbial "kids these days".)
How do you make homework assignments LLM-proof? There may be a huge business opportunity if that actually works, because LLMs are destroying education at a rapid pace.
By giving pen and paper exams and telling your students that the only viable preparation strategy is doing the hw assignments themselves :)
You just (lol) need to give non-standard problems and demand that students provide reasoning and explanations along with the answer. Yeah, LLMs can "reason" too, but here it's obvious when the output comes from an LLM.
(Yes, that's a lot of work for a teacher. Gone are the days when you could just assign reports as homework.)
> This effectively makes LLMs useless for education.
No. You're only arguing that LLMs are useless at regurgitating homework assignments to let students avoid doing them.
The point of education is not mindlessly doing homework.