And “shut it all down” is what the OpenAI board seems to have had in mind when it pushed the panic button and kicked Altman out. But the effort collapsed when OpenAI’s workers and financial backers all insisted on Altman’s return. Because they all realized that “shut it all down” has no exit strategy. Even if you tell yourself you’re only temporarily pausing AI research, there will never be any change — no philosophical insight or interpretability breakthrough — that will even slightly mitigate the catastrophic risks that the EA folks worry about. Those risks are ineffable by construction. So an AI “pause” will always turn into a permanent halt, simply because it won’t alleviate the perceived need to pause.
Noah Smith has some good comments on the OpenAI stuff as well:
And a permanent halt to AI development simply isn’t something AI researchers, engineers, entrepreneurs, or policymakers are prepared to do. No one is going to establish a global totalitarian regime like the Turing Police in Neuromancer who go around killing anyone who tries to make a sufficiently advanced AI. And if no one is going to create the Turing Police, then AI-focused EA simply has little to offer anyone.
What we need to do is to build the Most Powerful A.I., ever, to tell us how to stop the development of powerful A.I. !
Yeah, I liked that part as well. I particularly like the following paragraph:
Yeah, what would it take to actually halt AI development? Making it straight-up criminal to dabble in it? And then, of course, there's the matter of how to effectively enforce such a ban. And, superpowers will be worried that the other guys will get there first, so we have to be the ones who have it first.
(5) Another key idea that Christiano, Amodei, and Buck Shlegeris have advocated is some sort of bootstrapping. You might imagine that AI is going to get more and more powerful, and as it gets more powerful we also understand it less, and so you might worry that it also gets more and more dangerous. OK, but you could imagine an onion-like structure, where once we become confident of a certain level of AI, we don’t think it’s going to start lying to us or deceiving us or plotting to kill us or whatever—at that point, we use that AI to help us verify the behavior of the next more powerful kind of AI. So, we use AI itself as a crucial tool for verifying the behavior of AI that we don’t yet understand.
There have already been some demonstrations of this principle: with GPT, for example, you can just feed in a lot of raw data from a neural net and say, “explain to me what this is doing.” One of GPT’s big advantages over humans is its unlimited patience for tedium, so it can just go through all of the data and give you useful hypotheses about what’s going on.
This is on aligning rather than halting AI, but:
https://scottaaronson.blog/?p=6823
(that article is a bit old, I just happened to be reading it today and your post reminded me of that idea)
That's only if you like pushing it one step further on - if the next gen of AI is more powerful than our "pet AI" it simply fools that pet AI rather than us.
You’ve probably heard AI is a “black box”. No one knows how it works. Researchers simulate a weird type of pseudo-neural-tissue, “reward” it a little every time it becomes a little more like the AI they want, and eventually it becomes the AI they want. But God only knows what goes on inside of it.
This is bad for safety. For safety, it would be nice to look inside the AI and see whether it’s executing an algorithm like “do the thing” or more like “trick the humans into thinking I’m doing the thing”. But we can’t. Because we can’t look inside an AI at all.
Until now! Towards Monosemanticity, recently out of big AI company/research lab Anthropic, claims to have gazed inside an AI and seen its soul. It looks like this:
Scott Alexander has a very interesting post on some new work done by Anthropic on AI interpretability, which is an important part of alignment work:
Now, I'm not a mathematitactical computron-sciencelord (*everyone gasps*) but it seems to me (*hitches thumbs through suspenders*) (*American suspenders, not UK suspenders, you perverts!*) that the gist of this magical AI is less like "we've created a thinking thing" than "we've created a thing that stores information in a way that's complicated and obscure to our vision".
That is fascinating - that's my reading for the week sorted.
It also sounds suspiciously like, for the first time, we are making real progress toward understanding how memory may work in humans, with hints about cognition. That is something I had wondered whether the current generative AIs would help us start to understand.
Sean Carroll did a solo episode of his Mindscape podcast on what people get wrong about LLMs and why they are far from any actual A.I.
Yudkowsky is an arrogant, self-serving crank who frequently, not to say incessantly, spouts drivel.
Eliezer Yudkowsky wrote an article in Time magazine suggesting an international treaty to limit the number of GPUs that could be used to train any new models, and even pointed out that to be effective it would have to be backed up by military power, specifically missile strikes on rogue data centers. As I recall, shortly after that article was published he posted on Twitter that, yes, we should be willing to risk nuclear war to prevent the development of AI more advanced than GPT-4, but he quickly deleted that post.
He's taken a lot of flak for that article, but he still maintains his position. A lot of the EA movement, though by no means all, is pretty close to Eliezer's position.