Mixture-of-Experts (MoE) 101
Mixture-of-Experts (MoE) is an architecture used in transformer models. Unlike traditional dense models, MoEs take a "sparse" approach: only a subset of the model's components (the "experts") is activated for each input. This lets an MoE scale to a much larger total parameter count while keeping pretraining more efficient and inference faster than a dense model with comparable capacity.
In an MoE layer, each expert is a neural network, typically a feed-forward network (FFN), and a gate network (or router) decides which tokens are sent to which experts. The experts can specialize in different aspects of the input data, letting the model handle a wider range of inputs with less compute per token. A minimal sketch of this routing is shown below.
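Below is a minimal PyTorch sketch of a sparse MoE layer with top-k routing, included only to make the idea concrete. The names (`Expert`, `MoELayer`, `d_model`, `top_k`, the choice of 8 experts with 2 active) are illustrative assumptions, not taken from any specific MoE implementation; production systems add load-balancing losses, capacity limits, and batched expert dispatch that are omitted here.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F


class Expert(nn.Module):
    """A single expert: a standard two-layer feed-forward network (FFN)."""
    def __init__(self, d_model: int, d_hidden: int):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(d_model, d_hidden),
            nn.GELU(),
            nn.Linear(d_hidden, d_model),
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.net(x)


class MoELayer(nn.Module):
    """Sparse MoE layer: a router picks the top-k experts for each token."""
    def __init__(self, d_model: int, d_hidden: int, num_experts: int = 8, top_k: int = 2):
        super().__init__()
        self.experts = nn.ModuleList(Expert(d_model, d_hidden) for _ in range(num_experts))
        self.router = nn.Linear(d_model, num_experts)  # the gate network
        self.top_k = top_k

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (num_tokens, d_model) -- tokens flattened across batch and sequence.
        logits = self.router(x)                                  # (num_tokens, num_experts)
        weights, indices = torch.topk(logits, self.top_k, dim=-1)
        weights = F.softmax(weights, dim=-1)                     # normalize over the chosen experts
        out = torch.zeros_like(x)
        # Each token's output is a weighted sum of its top-k experts' outputs.
        for slot in range(self.top_k):
            for e, expert in enumerate(self.experts):
                mask = indices[:, slot] == e                     # tokens routed to expert e in this slot
                if mask.any():
                    out[mask] += weights[mask, slot].unsqueeze(-1) * expert(x[mask])
        return out


# Usage: route 10 tokens of width 64 through 8 experts, 2 active per token.
layer = MoELayer(d_model=64, d_hidden=256)
tokens = torch.randn(10, 64)
print(layer(tokens).shape)  # torch.Size([10, 64])
```

Note the key property: although the layer holds 8 experts' worth of parameters, each token only passes through 2 of them, which is what makes the computation sparse.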