Comment by bArray
19 days ago
> This approach still works, why do something else?
One issue is that the time allotted to mark each piece of work keeps decreasing. Sometimes you get only 15 minutes for 20 pages, and management believes you can mark back-to-back from 9 to 5 with a half-hour lunch. The only thing keeping people sane is the students who fail to submit, or who submit something obviously sub-par. So where possible, even when designing exams, you try to limit free text altogether: multiple choice, drawing lines, a basic diagram, a calculation, etc.
Some students have terrible handwriting. I wouldn't be against the use of a dumb terminal in an exam room/hall. Maybe in the background it could be syncing the text and backing it up.
> Unless you're specifically testing a student's ability to Google, they don't need access to it.
I've been the person testing students, and I don't always remember everything. Sometimes it is good enough for the students to demonstrate that they understand the topic enough to know where to find the correct information based on a good intuition.
I want to echo this.
Your blue book is being graded by a stressed-out, very underpaid grad student with many better things to do. They're looking for keywords to count up, that's it. The PI gave them the list of keywords, the rubric. Any flourishes, turns of phrase, or novel takes don't matter to your grader at 11 pm, after the 20th blue book that night.
Yeah sure, that's not your school, but that is the reality of ~50% of US undergrads.
Very effective multiple-choice tests can be written that require real work before an answer is selected, so they can be machine graded. Not ideal in every case, but a high-quality test can be made multiple choice for hard-science subjects.
True! Good point!
But again, the test creator matters a lot here too. Making such an exam is quite a labor, especially as most PIs have better things to do. Their incentives are grant money, then papers, then in a distant third their grad students, and finally undergrad teaching. Many departments are explicit about this. Spending that limited time on a good undergrad multiple-choice exam is not in the PI's best interest.
Which is why, in the case of a good Scantron exam, they're likely to just farm it out to Claude. Cheap, easy, fast, good enough. A winner in all dimensions.
Also, as an aside to the above, an AI with OCR for your blue book would likely be the best realistic grader too. It needs less coffee, after all.
This is what my differential equations exams were like almost 20 years ago. Honestly, as a student I considered them brutal (10 questions, no partial credit available at all) even though I'd always been good at math. I scraped by but I think something like 30% of students had to retake the class.
Now that I haven't been a student in a long time and (maybe crucially?) am friends with professors and in a relationship with one, I get it. I don't think it would be appropriate for a higher-level course, but for a weed-out class where there's one professor and maybe two TAs for every 80-100 students, it makes sense.
> Very effective multiple choice tests can be given, that require work to be done before selecting an answer, so it can be machine graded.
As someone who has been part of the production of quite a few high stakes MC tests, I agree with this.
That said, a professor would need to work with a professional test developer to make an MC test that is consistently good, valid, and reliable.
Some universities have test-dev folks as support, but many (most? all?) of them are not particularly good at developing high-quality MC tests, imho.
So, for anyone in a spot to do this: start test development very early, and ideally build an item bank that is constantly growing and being refined. Ideally, also have some problem types that can be varied from year to year, with heuristics for keys and distractors, so items can be iterated on over the years while still maintaining their validity (a rough sketch of what I mean follows). Also consider removing outlier items from the scoring pool, but make sure to tell students to answer every question rather than spinning their wheels on one, so that naturally persistent examinees are less likely to be punished by poor item writing.
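To make the "problem types varied year to year" idea concrete, here is a minimal Python sketch of a parameterized item template. Everything in it (the `Item` class, the Ohm's-law example, the specific distractor heuristics) is my own invention for illustration, not any real test-dev tool; the point is that the numbers change each year while the key and distractor logic stay fixed.

```python
import random
from dataclasses import dataclass

@dataclass
class Item:
    stem: str
    choices: list[str]  # shuffled answer options
    key: str            # text of the correct option

def ohms_law_item(rng: random.Random) -> Item:
    """One reusable problem type: V = I * R with fresh numbers each year.

    The distractor heuristics encode common student errors (dividing
    instead of multiplying, adding the operands), so wrong answers stay
    plausible even as the parameters vary.
    """
    i = rng.randint(2, 9)    # current in amps
    r = rng.randint(10, 99)  # resistance in ohms
    correct = i * r
    distractors = {r // i, r + i, correct + 10}  # common-error answers
    distractors.discard(correct)                 # never duplicate the key
    choices = [str(correct)] + [str(d) for d in sorted(distractors)]
    rng.shuffle(choices)
    return Item(
        stem=f"A {r} ohm resistor carries {i} A. What is the voltage, in volts?",
        choices=choices,
        key=str(correct),
    )

rng = random.Random(2026)  # seed per exam year, so the paper is reproducible
item = ohms_law_item(rng)
print(item.stem)
for label, choice in zip("ABCD", item.choices):
    print(f"  {label}) {choice}")
```

Each template in the bank gets refined over the years (better distractor heuristics, tighter parameter ranges) without the item itself ever being leaked verbatim.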
Pros and cons. Multiple choice can be frustrating for students because it's all or nothing: spend 10+ minutes on a question, make a small calculation error, and end up with a zero. It's not a great format for a lot of questions.
For large classes, or for test questions reused over multiple years, you need to take care that the answers are not shared. That means having large question banks, which students will slowly collect anyway. A good question can take a while to design, and it can be leaked very easily.
Scantron and a #2 pencil.
Stanford started doing 15-minute exams with ~12 questions to combat LLM use. OTOH, I got final project feedback from them that was clearly written by an LLM :shrug:
> I got a final project feedback from them that was clearly done by an LLM
I've heard of this and have been offered "pre-prepared written feedback banks" for questions, but I write all of my feedback from scratch every time. I don't think students should have their work marked by an LLM or feedback given via an LLM.
An LLM could have a place in modern marking, though. A student submits a piece of work and you may have some high-level questions:
1. Is this the work of an LLM?
2. Is this work replicated elsewhere?
3. Is there evidence of poor writing in this work?
4. Are there examples where the project is inconsistent or nonsensical?
And then the LLM could point to areas of interest for the marker to check. This wouldn't be to replace a full read, but would be the equivalent of passing a report to a colleague and saying "is there anything you think I missed here?".
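A minimal sketch of that triage step, assuming an OpenAI-style chat client; the model name, prompt wording, and `triage` helper are placeholders of mine, not a recommendation. The output is a list of passages for the human marker to double-check, never a grade.

```python
# Sketch only: flags areas of interest for a human marker; it never grades.
# Assumes the openai Python package and an OPENAI_API_KEY in the environment.
from openai import OpenAI

TRIAGE_QUESTIONS = [
    "Does any passage read like unedited LLM output?",
    "Is any passage likely replicated from elsewhere?",
    "Where is the writing notably poor?",
    "Where is the project inconsistent or nonsensical?",
]

def triage(submission: str, model: str = "gpt-4o-mini") -> str:
    """Ask the model to point at passages worth a closer human read."""
    client = OpenAI()
    prompt = (
        "You are assisting a human marker. For each question below, quote "
        "the relevant passages of the submission and briefly say why they "
        "deserve a closer look. Do not assign marks.\n\n"
        + "\n".join(f"{n}. {q}" for n, q in enumerate(TRIAGE_QUESTIONS, 1))
        + "\n\nSubmission:\n" + submission
    )
    response = client.chat.completions.create(
        model=model,
        messages=[{"role": "user", "content": prompt}],
    )
    return response.choices[0].message.content

# The marker still reads the whole piece; this is just the "is there
# anything you think I missed here?" second pair of eyes from above.
```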
> Some students have terrible handwriting.
Then they should have points deducted for that. Effective communication of answers is part of any exam.
> Then they should have points deducted for that. Effective communication of answers is part of any exam.
Agreed. Then let me type my answers out like any reasonable person would do.
For reference…
For my last blue book exam (in grad school, in the 90s), the professor insisted on blue books and handwriting.
I asked if I could type my answers, or handwrite them in the blue books and later type them out for her (with the blue book being the original source).
I told her point blank that my “clean” handwriting was produced at about a third of the speed that I can type, and that my legible chicken scratch was at about 80% of my typing rate. I hadn’t handwritten anything longer than a short note in over 5 years. She insisted that she could read any handwriting, and she wasn’t tech savvy enough to monitor any potential cheating in real time (which I think was accurate and fair).
I ended up writing my last sentence as the time ran out. I got an A+ on the exam and a comment about one of my answers being one of the best and most original that she had read. She also said that I would be allowed to type out my handwritten blue book tests if I took her other class.
All of this is to say that I would have been egregiously misgraded if “clean handwriting” had been a requirement. There is absolutely no reason to put this burden on people, especially as handwriting has become even less relevant since that exam I took in the 90s.
I personally don't believe that terrible handwriting should count against a computer science student.
Doctors (medicine) get away with it.
> Then they should have points deducted for that. Effective communication of answers is part of any exam.
...even when it's for a medical reason?