"In a landmark study, OpenAI researchers reveal that large language models will always produce plausible but false outputs, even with perfect data, due to fundamental statistical and computational limits."

www.computerworld.com/article/4059383/openai-admits-ai-hallucinations-are-mathematically-inevitable-not-just-engineering-flaws.html
So they are admitting it’s worthless
No surprise. It's not "AI".
I call it A”I” because it is artificial.
Hallucination doesn't even mean anything when the machine has zero concept of truth. It's made to generate text mimicking human-produced texts. There's a basic comprehension issue about what LLMs are, and the people selling them are not here to clarify things.
They picked "Hallucination" because it makes it sound like there's an internal process, that the computer is making a mistake or something. Just admitting that it's all be GIGO the entire time would be pretty fatal to their valuation.
Whew, the Dunning-Kruger and confirmation bias of anti-AI folks in the replies and quotes are just *chef kiss* incredible.

Just another example of the BS Asymmetry Principle: the amount of energy needed to refute bullshit is an order of magnitude bigger than that needed to produce it.
wow, the lie machines lie, what a shocking revelation
So they're inherently less reliable than humans
Are we entering “the trough of disillusionment” phase of the hype cycle?
So, according to OpenAI, their product is worse than useless, and so is everyone else's.
There’s utility in narrowing bands of uncertainty at the very worst!
it’s ok they’ll just need to take even more of society’s money, energy, and water to make a second LLM that can quality check the first one, that will be its dedicated purpose, there’s no alternative
consider looking at the paper for 1 or 2 seconds
This all feels a lot like the "floating point" scandal associated with first-gen Pentium processors in the 90s.
Sounds dumb and dangerous, like a lot of tech bros. No wonder they love it.
Oh look it's what anyone who understood how this technology functions has been saying the entire time
The errors become part of the soup and compound over time. Positive feedback of shit.
As long as it's wrong less often than I am, it's a win for me.
Hmmm. Maybe AI was pushed out/unleashed prematurely without proper studies? You don't say.

Color me shocked.
What I find frustrating about this is that it was obvious from the beginning that LLMs are (to be overly simplistic) predictive text with extra steps, and everyone knows it would be silly to trust predictive text. At least they're starting to actually be honest about that
Tl;dr: I have always felt that chaos theory could apply to AI.
We just need a couple trillion more dollars to get this thing REALLY poppin' off! We gotta have Jeff Chris come down from Indiana to program this professionally!
I have to think that is endemic based on the very principle of how they operate-- predicting words, not utilizing logic.
Just a few more datacenters bro and we'll fix it. Just need to be in more apps and used by more companies for a few more years and we'll perfect it. C'mon man we just need to boil a couple more lakes I promise
my casio calculator is literally never wrong
And the silliness of a calculator is even more reliable than the silliness of GenAI!
nice to see that what people said is now proven
that is not what the paper says, rather, it’s quite simple to avoid, once you know what you’re looking for:
That is not what the entire article says. The concluding statement is that the problem could be made less severe, not that it can be completely solved. Go back and read the whole thing instead of the last paragraph.

And there's nothing simple about the proposal. Many models are getting worse.
The problem is, if you avoid it that way, what it produces will no longer be what most people want. The training incentives are the way they are for a reason. People love simple confident answers.
No it doesn't. Changing the incentives in training can only reduce but not eliminate the probability of error. There's an inherent lossiness in this type of algorithm, and there's inherent structural limitations. If the current algorithms were good enough, longer training would already have worked
Now we have gone from “don’t know how to fix it” to “can’t fix it.”

The response, of course, will be to turn UP the hype machine.
Let us also please ponder the phrase “plausible outputs”. R’lyeh exists and Cthulhu will one day awaken is ‘plausible’. We have only explored a small fraction of our oceans. They deliberately chose that word over ‘truthful’, ‘accurate’, or other words that do not mean ‘we know it’s all garbage’.
No LLM has any perception of meaning. Underneath it all, the way they work is "Based on this reference data, show me something that looks like a statistically probable answer to this prompt"

"How many R's are in strawberry?" It don't know what an R is.
For answering questions, LLMs could be helpful for finding a needle in a haystack, but for anything important any positive return needs to be verified, and any negative return should be ignored.
"Put your Mind at ease; let AI do all the messy thinking. And planting. And picking. And driving. And dreaming."
- the Eloi*

en.m.wikipedia.org/wiki/Eloi
@mark-carney.bsky.social

This is why your government needs to be much more cautious, than what we are hearing, when it comes to the idea that AI will make the government more "efficient."
And yet this entire article is still framed as though these models were doing anything other than exactly what they’re designed to do. Still framed as though they were designed to produce true output, when they are not at all.
Computers work as a tool for people because they provide guardrails & layers of validation for everything people do, in effect correcting for the mistakes every human makes.

There's enough intelligence in AI to do most of the work humans do with similar (but AI focused) guardrails and validation.
This seems like a financially important problem with the technology. Almost like it will never work, and it should not be valuable. This is like a bicycle that will always fall apart, inevitably, because that's how the thing was built.
Hundreds of billions of dollars baby.
LOL. As if we didn't know this already.....
I guess this was always obvious to me.

Computers are not fact-checkers. They cannot do more than sort through the ideas that they are given.
How was this not obvious from the start? If it's generating content, then the generated content can deviate from the original data. If I want accurate results, then I copy the data without generating new content.
In related news, while you can use a hammer to hammer in a screw, and the result may be indistinguishable from the outside, you’d still be an idiot for trying to.
the compsci mantra used to be: "garbage in, garbage out."
but ai has simplified it to: "garbage out."

...and they say there's no such thing as progress.
Eliminates 50% of the process. Efficiency!
Really cool that Governments are planning on deeply integrating AI into healthcare systems.
While it's interesting they determined this mathematically, it's also common sense that there's only so much that can be deduced from second hand accounts. Learning by doing, exploring and testing in the real world is always going to trump reading about it.
I wish people would stop referring to AI bullshitting as “hallucinations.” Real human brains can hallucinate. Computers just spit out garbage.
Yeppers. 😉
There’s no viable product here, which means there’s no road to profitability for any of these companies that have set billions on fire. The economic ripple effects are going to be horrific.
IMO niche markets in tech and research will keep LLM companies afloat. I’m in awe daily of agentic AI and the massive productivity increase (without sounding like a capitalist cog 😩). Businesses don’t mind paying $$$ but it is probably OK that edu turns to small fine-tuned low-error models (or books!?!)
It’s like the only thing propping up the GDP numbers in the U.S. right now.

We are in trouble.
even when admitting their central claims are false they manage to waste resources by researching something any linguist could have explained in 30 minutes
i think this is actually more propaganda on openai’s part bc they do not list “this is just how LLMs work, on a structural level” as one of the reasons

but that’s the truth

LLMs can never not “hallucinate” bc they’re not intelligences, they don’t even know what words are much less know meaning
I'm just shitposting, but it seems like the non-deterministic-seeming output of an LLM is key to user perceptions of natural-seeming language and conversationality. but the only way to make that happen is to introduce randomness into the response, which contributes to production of cascading errors
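For what it's worth, that intuition matches how sampling is usually described: a temperature parameter scales the model's scores before a weighted random draw. A minimal sketch (illustrative names, not any vendor's API):

```python
import math
import random

def sample_token(logits, temperature=1.0):
    """Pick a token index by temperature-scaled weighted random draw.

    temperature must be > 0 in this sketch; higher values flatten the
    distribution, making output more varied but also more error-prone.
    """
    scaled = [l / temperature for l in logits]
    m = max(scaled)                              # for numerical stability
    weights = [math.exp(s - m) for s in scaled]  # softmax numerators
    return random.choices(range(len(logits)), weights=weights, k=1)[0]

# The same scores give different picks on different runs once temperature > 0.
print([sample_token([2.0, 1.0, 0.5], temperature=1.5) for _ in range(10)])
```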
That was my wake up moment “LLMs” do not understand a thing! 🤔
You still haven’t read up on any of the last 24 months of research into mechanistic interpretability, huh?
Not surprised reading this, at all. Thanks for sharing @mocogam.bsky.social
Yes, we knew that as soon as the Attention Is All You Need paper was published; it follows automatically from the fact that the output string of tokens is generated statistically from the input string & the meaning of the strings is not part of the model at all.
There is a joke about the content-free vapidity of text generated mindlessly by rules, in Tristram Shandy, written more than two hundred years ago.
Real question - have the big LLM purveyors collaborated on cross-validating their outputs? I do this manually all the time. I give one task to Grok or Gemini and then have ChatGPT or Claude go through and validate. This removes 99% of hallucination, and most citation gaffes.
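A skeletal sketch of that manual workflow, for the curious. `ask()` here is a hypothetical stand-in for whatever client each vendor provides, and nothing in the code guarantees the 99% figure claimed above:

```python
def ask(model: str, prompt: str) -> str:
    # Hypothetical stand-in: wire this to your own model clients.
    raise NotImplementedError

def cross_check(prompt: str, drafter: str = "model_a",
                checker: str = "model_b") -> tuple[str, str]:
    """Draft with one model, then have a second model audit the draft."""
    draft = ask(drafter, prompt)
    audit = ask(checker,
                "Check the following answer for factual errors and "
                f"invented citations. Reply OK or list the problems:\n{draft}")
    return draft, audit
```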
Waiting for the recommendation of 20 massive AI data centers running every query so that they can reach 95% confidence consensus...
I’m not a fan of AI, but only two of the comments above note that humans also say stuff that isn’t true, and build up whole narratives that are hallucinatory. So the utility question is not (as with autonomous vehicles) whether they ever fabricate (or crash) but whether they do so less than us.
They do, but it’s possible for a critically thinking person to select the human sources they trust and compare them against each other
Or on purpose. Not to sound like a conspiracy theorist, but grok has proven that LLMs can be manipulated to provide false information purposefully given to them. Just like humans spew false information on the internet.
what is the point of an expensive machine that is wrong?
AKSHUALLY no. LLMs are not like people in this.

People do things because motives.

LLMs do not have motives, only rules. LLMs can never be held accountable. They can never fear to fail, to wish for success, to desire anything at all.
And we're about to make the same mistake with AI as we did with the internet.
AI is just not worth it. At the rate they are building them, the data centers will go through all of the planet's available fresh water in a matter of decades, causing the mass extinction of everything that needs water to live.
AI is plenty worth it; it's LLMs and image generation that aren't worth it, and they're also the ones that consume resources this way
A literal bullshit engine.
The amount of cope coming off the page...
So they are like us.
It’s worse than that. Because of the speed at which AI is progressing, it currently has no guardrails. AI takes in everything and has no frame of reference to guide its output. It jumps to incredibly detailed, but quite false conclusions.

It’ll only get worse from here as entropy takes over quickly.
I think they’re not pessimistic enough in the actual paper, claiming the “cause” was just a lack of training examples encouraging them to admit when they’re uncertain.
wow sounds like we should never trust openai to be a vendor of decision making software
Like having a car that will always crash regardless of how well you drive it
I mean, this is why they're trying to sell them as fast as possible.
Perfect. Let’s boil the oceans and use gobs of energy to produce wrong answers. Superduper.
In this case, it is just like humans.
Humans, at least some, are capable of admitting they don't know the answer. LLMs are not just incapable of that; the incapacity is one of their fundamental properties
ChatGPT lied to me *three times* about... how to change my email address in ChatGPT. It could not even correctly produce a result about itself.
It didn't try to, because trying to produce a correct answer to a question is entirely outside the scope of the model from the beginning
The marketing droids are busy working out how to sell hallucinations as a feature rather than a bug.
You don't say..
This is what I keep trying to tell my boss.
This is another thing I've been pointing out since the start. There's a reason for this, GenAI wasn't designed to be a search engine or a data storage system, it's an aesthetics simulator built around creating plausible but novel output. Hallucination is the point. It's a hallucination engine.
A hallucination engine has a lot of uses, and some produce useful scientific data (protein folding and suchlike), but those are treated like any other hypothesis and get tested before being utilized. Aesthetic uses go to the user, who may have tacky taste but will know what they like/asked for.
"Landmark study" is a joke. There have been researchers saying this for a while. Probably the most famous, but still late, was Apple's paper
Yes but that doesn't stop Apple from marketing their "Apple Intelligence" AI.
Will someone please tell our bosses this so we can stop shoehorning chatbot LLMs into everything as if it was AI?
People have been telling them for years. If they were smart enough to understand they'd have to get jobs.
"our liebot will always lie"
well, they admitted it finally, it's useless 100% and always will be.
It doesn't even lie though, that's the thing, it procedurally generates text based on a statistical model with no reference to meaning at all
"[...] industry evaluation methods actively encouraged the problem. Analysis of popular benchmarks [...] found nine out of 10 major evaluations used binary grading that penalized “I don’t know” responses while rewarding incorrect but confident answers."
“The majority of mainstream evaluations reward hallucinatory behavior.”

The researchers say a “simple” change in evaluation method will change this.

This reads to me like humans who refuse to say “I don’t know the answer” wrote the algos & picked the evaluation criteria.

Our weaknesses in code…
We already have a term for "plausible but possibly false."

It's called "bullshit." No, really, that's the academic definition of "bullshit."
They use "hallucinations" in a weird way. I think they mean "wrong answers" but calling them "hallucinations" seems like something other than a "wrong answer"
They're all hallucinations. Some of them happen to be more or less correct.
A wrong answer is just a wrong answer, a hallucination is a massive structure made up of many wrong answers, including answers to questions that were not asked
Technically, ALL of their responses are fundamentally the same to the machine (and are all "hallucinations" - they do not come from cognition).
It's just that some of its responses are encouraged, leading to a higher likelihood of them (without addressing the issue of the lacking foundation).
The machine is not capable of understanding the concepts of correct or incorrect.
Nope, it's correct. An LLM has no concept of right or wrong: it literally hallucinates an answer that's likely (from its training data); sometimes it matches with reality, sometimes not. Whether it does or not is completely unrelated to the process, and more of a function of the sources collected.
I'm not sure exactly what the study found, but these LLMs often talk about things that don't exist and make up elaborate stories with no truth to them. So it goes beyond saying "yes" when they should have said "no".
calling it an "answer" is still incorrect, because llms are not capable of "answering" a question they are "asked"
There's a small contingent of rebels, er, advocates for other terms, but pop culture is probably stronger. Here's a digestible take by @xolotl.org on "mirage" as likely a better term:

xolotl.org/are-we-tripping/
"Hallucinations" is the PR term used.

When all you have is language prediction, it just turns into a bullshit generator. That is what they are, BS generators. For the untrained it looks plausible and good, but to an expert in the subject, it is worse than useless and just wrong
They want to anthropomorphize their computer program. It's the same reason they call it "A.I." when it absolutely is not even close to being A.I.

It's working, generally; people talk about these things like they have agency and make decisions.
I think it's grounded in classical theories of mind where the rational/logical ideal has access to perfect information about reality and therefore cannot be contradicted by it.

The LLM is intended to provide that, so failures are assumed to be irrational/illogical.
It may sound funny but "bullshit" has a fairly technical meaning, and it's that. No particular relationship to true or false. Bullshit can be incredibly useful but you damn well better realize it's bullshit.
The correct technical term is “bullshit”.
I made a typo error on a Google search recently, using a question mark instead of a letter. The AI invented a meaning for the word I'd accidentally created along with a suggested etymology. That's a very different animal to a simple wrong answer.
honestly i prefer hallucinations to just wrong because of the way it's so *confidently* wrong. You can't call it lying because lying implies an internal desire to deceive (which most LLMs simply cannot have)

then again, "Bullshitting" works even better for this
They are using it in a weird way because it's become jargon; similar to the way "bug" is jargon for incorrect programming logic.

The vernacular equivalent is "bullshit".
I don’t like calling it a “hallucination”. It’s pure speculation. My autocorrect/autocomplete speculates too. It’s correct about 50% of the time, which isn’t often enough for me to consider it reliable.
Well, duh
Anybody who has any idea how these programs work has been shouting this for a goddamn year.

These things are not nearly as useful as the multibillion dollar valuation suggests. It's a cope from the investor class, who are absolutely not as smart as late stage capitalism wants you to believe.
That’s not 100% true. Many of the people who know how they work are obscuring this fact because they’re trying to sell it to us 😭
LLMs are statistical models; statistical models, if they converge, will ALWAYS produce an answer, whether that answer is meaningful or not
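A tiny worked example of that point, using the simplest statistical model there is: a least-squares line fit will return a numeric "answer" for any input you hand it, including inputs far outside anything it saw, because "I don't know" is simply not in its output space.

```python
# Fit y = slope * x + intercept to four points by least squares.
xs = [1.0, 2.0, 3.0, 4.0]
ys = [2.1, 3.9, 6.2, 8.1]

n = len(xs)
mx, my = sum(xs) / n, sum(ys) / n
slope = (sum((x - mx) * (y - my) for x, y in zip(xs, ys))
         / sum((x - mx) ** 2 for x in xs))
intercept = my - slope * mx

def predict(x):
    return slope * x + intercept

print(predict(2.5))        # inside the data: reasonable
print(predict(1_000_000))  # far outside it: an equally confident number
```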
are there any interesting behavior modes of "non converging" models?
Burning lakes and rivers for this bullshit
Flawed datasets are the fuel, corporate requirement for interaction is the accelerant, and hallucinations are the fire in which LLM's burn, set aflame by the user's ever growing need for instant answers on tap.
Still wondering ... what's in the "pro" column for ordinary people? The "cons" are clear enough.
The spectre of Gödel is smiling sadly...
As a working mathematician I am chuffed to see Gödel referenced here.
Admittedly I know nothing about this, but if AI could produce testable assertions and also have the ability to test them, would that not be true?
It's a weighted random number generator. Literally the only thing that it is doing is weighing which letter is likely to come next, and stringing those together. That's *it*. There's no additional complexity. it's just rolling loaded dice and spitting out the resulting garbage.
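That metaphor is easy to demonstrate literally: roll loaded dice over a handful of characters and string the results together (the frequency weights below are made up):

```python
import random

chars = list("etaoins ")               # a few common letters plus a space
weights = [12, 9, 8, 7, 7, 6, 6, 17]   # invented frequency weights

# Forty weighted rolls, joined into a "text" with no meaning anywhere.
print("".join(random.choices(chars, weights=weights, k=40)))
```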
AI is like a toaster. You give some input, it processes, then you get output. Sometimes the output is so bad it is useless. A toaster can’t create its own input, nor does it learn from previous failures or successes. It’s a machine, it does what it is designed to do, and barely does it well.
It can't do that though. It requires comprehension in order to fulfil that, and it doesn't comprehend anything nor is it intended to. It's designed to place words in a probabilistic pattern to resemble other sequences of words in its database that are tagged with whatever topic you're asking about.
And the huge thing about them being wrong is that the companies that operate these AI claim they assume no fault for the harm they do.
I am but a humble former data science practitioner. But if a physics theorist had a model that described reality like this they’d be unemployable.

So you want everyone to use something where false positives are inevitable and invisible for critical use cases. Ok sounds great.
Also I am unemployable because I think LLMs are stupid. The problem is the only things more stupid than an LLM are executives and most DS directors.
You are aware that in quantum physics uncertainty is actually a fundamental property. It is kind of used a lot, and many who work with it are employed.
Which literally disqualifies it from any industry where legal liability rests on accurate responses. Think medical, insurance, bank advice, just on and on.
And yet the use continues. Epic has already pushed some of this into its products and tricked hospitals into using it
Thanks.

Actually the paper is much different from the discussion here.

Many people seem to be hallucinating what they think the paper says without verifying the source.
I was trying to look up some Python packages Friday to do a very specific task. Google's AI summary gave me back a package that I knew existed and did the thing I wanted, but it got the import name wrong. That is something easy to verify as wrong, but I can see AI introducing a lot of new bugs.
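One cheap guard against exactly this failure mode: check that an AI-suggested import name actually resolves before trusting it. `somepkg` below is a placeholder, not a real suggestion:

```python
import importlib.util

suggested = "somepkg"  # placeholder for whatever name the AI summary gave

# find_spec returns None when no importable module by that name exists.
if importlib.util.find_spec(suggested) is None:
    print(f"No importable module named {suggested!r}; the suggestion is wrong.")
```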
AI seems to be specifically marketed towards novice developers who are most likely to not notice these sorts of mistakes and also miss out on the benefits of actually familiarizing themselves with the code.
This is what people with a bare minimum understanding of foundational computer technology figured out 4 years ago.
so openai tell us that the product they themselves build is fundamentally and unfixably fucked? lol lmao
Is this a Theorem?
...and this doesn't kill the technology why?
Because it's useful to those who profit from the misinformation and lack of middlemen. It's very lucrative to Big Tech business owners.
Wait what? I’m having an LLM perform my knee surgery next week!
Wait until they find out what happens after the LLMs model their own hallucinations in a perpetual loop of regression.

All while threatening our climate with massive energy and water waste for their short term scam.

bsky.app/profile/mikeyearworth.bsky.social/post/3lzdwnh4p3s2g
I don't think it says that? The paper argues very specifically that the RL post-training is structured to produce hallucinations and suggests ways to change it to minimize hallucinations
That's cope. It's intrinsic to the way llms work.
This shouldn't be surprising. Extrapolation outside the radius of convergence has been known to be far less reliable than interpolation within. LLMs are forced to extrapolate.
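A quick numeric illustration of that gap, assuming numpy is installed: a low-degree polynomial fit to sin(x) is accurate inside the fitted interval and wildly off outside it.

```python
import numpy as np

xs = np.linspace(0.0, 2.0, 50)
coeffs = np.polyfit(xs, np.sin(xs), deg=3)  # cubic fit on [0, 2]
fit = np.poly1d(coeffs)

print(abs(fit(1.0) - np.sin(1.0)))    # interpolation: tiny error
print(abs(fit(10.0) - np.sin(10.0)))  # extrapolation: error explodes
```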
Murphy's law (in its true original sense) undefeated.
Confirming what those of us with even the vaguest understanding of how it works already knew
Is this really surprising? All human vision is effectively hallucination, our brains fabricate the world we see and they do make mistakes.

All AIs are coded by the same flawed humans who in some cases still think the world is flat so I'm not surprised when the coding is faulty.
You don't even know what the words you're speaking mean.
Full on idiocracy.

Let's go back 30 years.

Would you accept going to the library and opening an encyclopedia and perhaps possibly not getting random incorrect information?

I'm pretty sure physical encyclopedia sales are going to go up.
Was that a double negative double maybe?! 🤔 Can you decipher that for me? It reads like would you go to the library and accept the possibility that information you get from an encyclopedia is correct.
I really miss old school Encarta. Especially that little trivia game they built into it.
So, the computer can't do math?
the computer maths can't human
ayep. It is akin to A + 1 = C, where A is some shit the LLM pulled out of its ass. It later takes that equation it kind of made up, because it can't be wrong, and uses that as solid data. It will then use C in other equations and propagate the error, falsifying more data as needed to give answers
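The compounding is easy to mimic numerically: bake in a small error once, keep reusing the result, and watch the drift grow (illustrative numbers only):

```python
value = 1.0                # the true answer at every step is 1.0
for _ in range(100):
    value *= 1.01          # each step reuses the prior (slightly wrong) value
print(value)               # ~2.70: a 1%-per-step error compounds to ~170% off
```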
The computer maths just fine, but LLMs don’t since they just randomly string words together
It's great that we put billions of dollars and accelerated biome collapse for this!
AGI will prove to be one of those technologies that was quite easy to get to 80% right and absolutely impossible to get more than 90% right
not to be that guy but i had to check the date. i thought this was accepted info by now (until proven otherwise)

insideainews.com/2024/11/13/ai-hallucinations-are-inevitable-heres-how-we-can-reduce-them/
The models are useful when this fact is set as a premise to anything you try to use it for
Well, a study wasn't really necessary - it's literally the way it works.