Veronica @vmcombs.bsky.social

“The majority of mainstream evaluations reward hallucinatory behavior.”

The researchers say a “simple” change in evaluation method will change this.

This reads to me like humans who refuse to say “I don’t know the answer” wrote the algos & picked the evaluation criteria.

Our weaknesses in code…

September 22, 2025 - 11:27 UTC