Yes, we knew that as soon as the "Attention Is All You Need" paper was published. It follows automatically from the fact that the output string of tokens is generated statistically from the input string, and the meaning of the strings is not part of the model at all.
September 21, 2025 - 16:52 UTC