
Comment by Balgair

7 hours ago

I want to echo this.

Your blue book is being graded by a stressed-out and very underpaid grad student with many better things to do. They're looking for keywords to count up, and that's it. The PI gave them the list of keywords, the rubric. Flourishes, turns of phrase, novel takes: none of that matters to your grader at 11 pm after the 20th blue book of the night.

Yeah sure, that's not your school, but that is the reality of ~50% of US undergrads.

Very effective multiple choice tests can be given, that require work to be done before selecting an answer, so it can be machine graded. Not ideal in every case, but a very high-quality test can be made multiple choice for hard science subjects.

  • > Very effective multiple choice tests can be given, that require work to be done before selecting an answer, so it can be machine graded.

    As someone who has been part of the production of quite a few high stakes MC tests, I agree with this.

    That said, a professor would need to work with a professional test developer to make a MC that is consistently good, valid, and reliable.

    Some universities have test dev folks as support, but many/most/all of them are not particularly good at developing high quality MC tests imho.

    So, for anyone in a spot to do this, start test dev very early, ideally create an item bank that is constantly growing and being refined, and ideally have some problem types that can be varied from year-to-year with heuristics for keys and distractors that will allow for items to be iterated on over the years while still maintaining their validity. Also, consider removing outliers from the scoring pool, but also make sure to tell students to focus on answering all questions rather than spinning their wheels on one so that naturally persistent examinees are less likely to be punished by poor item writing.

  • True! Good point!

    But again, the test creator matters a lot here too. Making such an exam takes a lot of labor, especially since many/most PIs have better things to do. Their incentives are grant money, then papers, then, in a distant third, their grad students, and finally undergrad teaching. Many departments are explicit about this. Spending their limited time on a good undergrad multiple choice exam is not in the PI's best interest.

    Which is why, in this case of a good Scantron exam, they're likely to just farm it out to Claude. Cheap, easy, fast, good enough. A winner in all dimensions.

    Also, as an aside to the above, an AI with OCR for your blue book would likely be the best realistic grader too. Needs less coffee, after all.

  • For large classes or test questions used over multiple years, you need to take care that the answers are not shared. It means having large question banks which will be slowly collected. A good question can take a while to design, and it can be leaked very easily.

  • Pros and cons. Multiple choice can be frustrating for students because it's all or nothing. Spend 10+ minutes on a question, make a small calculation error, and end up with a zero. It's not a great format for a lot of questions.

    • They're also susceptible to old-school cheating - sharing answers. When I was in college, multiple choice exams were almost extinct because students would form groups and collect/share answers over the years.

      You can solve that but it's a combinatorial explosion.

  • This is what my differential equations exams were like almost 20 years ago. Honestly, as a student I considered them brutal (10 questions, no partial credit available at all) even though I'd always been good at math. I scraped by but I think something like 30% of students had to retake the class.

    Now that I haven't been a student in a long time and (maybe crucially?) that I am friends with professors and in a relationship with one, I get it. I don't think it would be appropriate for a higher level course, but for a weed-out class where there's one Prof and maybe 2 TAs for every 80-100 students it makes sense.
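The item-bank suggestion above (problem types that can be varied from year to year, with heuristics for keys and distractors) can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not anyone's actual test-development process: the "solve a*x + b = c" problem type, the parameter ranges, the three error-based distractor rules, and the names `make_linear_item` and `grade` are all hypothetical. Each variant is seeded (for example by year), so a form can be regenerated exactly, while the key is always computed rather than stored.

```python
import random
from dataclasses import dataclass
from fractions import Fraction


@dataclass
class Item:
    stem: str
    options: list[Fraction]
    key_index: int


def make_linear_item(seed: int) -> Item:
    """Build one variant of a hypothetical 'solve a*x + b = c' item.

    The key is computed from the parameters; each distractor encodes a
    plausible student error (an assumed 'heuristic for distractors').
    """
    rng = random.Random(seed)  # same seed -> same variant, e.g. seed by year
    while True:
        a = rng.randint(2, 9)
        b = rng.randint(1, 20)
        c = rng.randint(21, 60)
        key = Fraction(c - b, a)
        distractors = [
            Fraction(c + b, a),     # sign error when moving b across
            Fraction((c - b) * a),  # multiplied by a instead of dividing
            Fraction(c, a) - b,     # divided by a before subtracting b
        ]
        options = [key, *distractors]
        # Resample if any two options coincide, so there is one clear key.
        if len(set(options)) == len(options):
            break
    rng.shuffle(options)
    return Item(
        stem=f"Solve for x: {a}x + {b} = {c}",
        options=options,
        key_index=options.index(key),
    )


def grade(item: Item, chosen_index: int) -> bool:
    """Machine grading is just an index comparison: all or nothing."""
    return chosen_index == item.key_index


if __name__ == "__main__":
    item = make_linear_item(seed=2025)
    print(item.stem)
    for letter, option in zip("ABCD", item.options):
        print(letter, option)
```

The design choice worth noting is that distractors come from named error patterns rather than random numbers, so a wrong answer still tells you something about the student's mistake, and the same rules can be reused on fresh parameters each year without rewriting the item.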