The advantages of a public radio quiz game like the Sunday Puzzle is that it doesn’t test for esoteric knowledge, and the challenges are phrased such that models can’t draw on “rote memory ...
At least one model, DeepSeek’s R1, gives solutions it knows to be wrong for some of the Sunday Puzzle questions. R1 will state verbatim “I give up,” followed by an incorrect answer chosen ...
当前正在显示可能无法访问的结果。
隐藏无法访问的结果