The advantage of a public radio quiz game like the Sunday Puzzle is that it doesn’t test for esoteric knowledge, and the challenges are phrased such that models can’t draw on “rote memory ...
At least one model, DeepSeek’s R1, gives solutions it knows to be wrong for some of the Sunday Puzzle questions. R1 will state verbatim “I give up,” followed by an incorrect answer chosen ...