Comment by teruakohatu
3 hours ago
The pelican is really getting old as a standalone evaluation metric. By now it is certainly going to be in the training set, if models aren't explicitly tuned to produce it, for the press on HN alone.
Keep the pelican, but isn’t it time to add something more novel that all current and past models struggle with?
Relevant: https://news.ycombinator.com/item?id=47839493