Dear Humanity (cc: Sam, Ilya),
This is a call for collaboration.
I need you to try to understand my points, make them yours, share them broadly, build on them.
I realize that what I’m about to say may spark a defensive emotional reaction in many readers, making it tempting to dismiss my statements on psychological grounds rather than engaging with the logic behind them.
Should you dismiss my point, please do so on the grounds of a fallacious assumption or a reasoning error, rather than my style of writing or any intuition you might form about me at this moment.
If you get my point, and agree with my conclusions, then you'll understand the moral imperative to carry this message to AGI labs—to share it broadly, reformulate it, and make these points yours.
I don't care about posterity, about being recognized as the author of those proposals.
I care about you trying to engage with my points and reasoning.
- AI Safety/Alignment suffers from fallacious implicit assumptions, formulating the problem in the wrong terms and thereby locking humanity onto a bad timeline.
- The current default is misalignment.
- As a matter of what future will actually unfold for humanity, in the short (and arguably long) term, alignment can be reduced to governance.
AI Safety is about "building AGI so it's safe/aligned to human values".
The wrong assumption is that the outcomes of creating AGI depend on how "safe" and "aligned" it is; that the impact such an event will have on the world/economy/human society depends on how intrinsically "safe/aligned" the AGI/ASI is.
Let's draw an analogy here:
Throughout history, humans have purposefully introduced species into foreign environments, which, in the overwhelming majority of cases, ended in an "Oh shit" moment.
(One such instance you can google: "Cane toad Australia").
Here's a (made up) scenario to illustrate this more intuitively:
Imagine a country overrun by rats destroying its crops.
Introducing cats might seem like a quick fix. So they do just that.
But if the rats were keeping the bug population in check, what would actually happen is: introduce cats → bug plague.
Most of the time, humans assumed that their actions and outcomes could be explained through simple cause-and-effect terms. They overlooked that an ecosystem is a metastable, dynamic system running on intertwined feedback loops rather than straightforward causality.
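To make the feedback-loop point concrete, here is a toy simulation of that made-up scenario (all species counts, growth rates, and caps are invented for illustration; this is a sketch of the dynamic, not an ecological model):

```python
# Toy illustration only: invented numbers, not an ecological model.
def simulate(years: int, cats_introduced: bool) -> tuple[int, int]:
    rats, bugs = 100.0, 100.0
    for _ in range(years):
        cat_predation = 0.4 if cats_introduced else 0.0
        # Rats breed, and are eaten by cats (if any were introduced).
        rats = min(max(rats + 0.3 * rats - cat_predation * rats, 1.0), 500.0)
        # Bugs breed, and are eaten by rats: the rats are the check on the bugs.
        bugs = min(max(bugs + 0.5 * bugs - 0.004 * rats * bugs, 1.0), 10_000.0)
    return round(rats), round(bugs)

print("no cats   (rats, bugs):", simulate(20, cats_introduced=False))  # rats overrun the crops, bugs stay low
print("with cats (rats, bugs):", simulate(20, cats_introduced=True))   # rats collapse... and the bugs explode
```

The first-order intervention (fewer rats) removes the only check on the bugs, so the second-order effect dominates the outcome.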
I believe a compelling argument can be made that those currently building AI display the same kind of oversight/blindness about introducing AGI/ASI to the world.
If AI safety is only formulated in terms of "how safe/aligned to human values", then maybe the field's value in terms of "describing and predicting reality" is missing the fact that AGI/ASI won't be aligned in a vacuum, and that its effects on reality have to be thought of systemically.
Maybe the goal of that field (and of AGI labs) should rather be that "human civilisation as a whole follows a historical path aligned with human values". Maybe Alignment's goal, rather than aiming at building a "safe" AGI/ASI, should be asking what final state of the complex system world/humanity/civilization will result from AGI/ASI being built.
If we frame alignment as “maximizing the likelihood of an outcome for mankind that, if described to you, seems aligned with human values” then the actual results for humanity/economy/society of the creation of AGI/ASI might appear to be determined less by the AGI/ASI’s intrinsic safety mechanisms, and more by the external environment and power structures in which that system operates.
Maybe, just maybe, AGI labs are blind to that fact.
Here's an intuitive proof by contradiction:
Let's imagine that, 16 months from now, Ilya announces SSI has built ASI, and it's perfectly safe.
Given current global politics, is it far-fetched to imagine that whatever state or government they're under will make gaining control over it a matter of national security?
Even if we assume the ASI itself is, by all measures, as safe as possible, would a perfectly safe ASI controlled by Donald and Elon feel like an overall "safe situation" to you?
If you think that's simplistic and far-fetched, try thinking through any scenario (here's a more plausible one):
Some months from now, OpenAI achieves AGI, automates most of the global economy, and advocates for universal basic income.
Do you trust the people who, in all likelihood, will decide how to handle that?
Do you trust your state/any state to handle that right? To distribute the profits of such a technology in a way that benefits all humans, rather than just using it to gain a strategic advantage over all other countries?
Whether the AGI/ASI is "safe" seems pretty irrelevant. And it's quite easy to imagine countless scenarios where a "safe ASI" could still lead humanity into a state most wouldn't agree to call "aligned with moral values".
‘AGI can’t be more aligned than the set of rules it exists under.’
I plan, in this repo, to formulate at several levels of detail what exactly AGI is (IMHO).
If you have an hour, you can read this reddit post.
To give you an intuition of what I believe AGI will be: imagine having a very skilled coder you can ask for an implementation of any agent. AGI is whenever you can formulate the exact same request directly to a model.
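A minimal sketch of that definition, with entirely hypothetical names and stubbed-out internals (the point is the unchanged request, not the implementation):

```python
# Hypothetical sketch: the request stays the same, only the executor changes.
REQUEST = "Build an agent that reconciles my invoices and flags anomalies."

def ask_skilled_coder(request: str) -> str:
    # Today: a human engineer turns the plain-language request into a working agent.
    return f"human-built agent for: {request}"

def ask_model(request: str) -> str:
    # The AGI threshold, under this definition: the same request, verbatim,
    # handed directly to a model, yields an equally capable agent.
    return f"model-built agent for: {request}"

print(ask_skilled_coder(REQUEST))
print(ask_model(REQUEST))
```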
OpenAI say they're about to solve "coding".
As a coder, I'm absolutely convinced that not only is everything else a code problem, but most importantly: whatever we'll agree to call AGI is a code problem.
Hence whatever definition you may have will likely be satisfied not too long after "solving coding".
To reformulate a bit: OpenAI's Deep Research is a good preview of what AGI will be: agents that can complete tasks only a human currently can.
A consequence of AGI being "myriads of capable agents" is that, as a whole, and as a matter of "what effects it will have on the world/economy/human society", AGI/ASI can only align with whatever the current power structures are aligned to.
Right now, the reality we’ll align to is defined by the existing power structures.
The system’s behavior will be shaped by the governance structures in which it is embedded.
Would you trust any given government to truly align with universal human moral values?
Once the economy is fully automated, it becomes harder to justify power based on wealth accumulation. A small percentage of people own the majority of property, businesses, and land. But if money no longer matters, how do we address the perceived unfairness of such ownership?
Some might (and will) argue that the only fair course of action would be to share both all of that property and the authority/power of decision over AGI/ASI.
The goal of this repo, in the coming weeks/months, will be to make the best possible case for:
If AGI labs move forward with automating the global economy, they incur a moral imperative to develop global governance structures and to build tools that enable a post-capitalist economy.
As well as for the following (and to be clear, I don't believe I've made this point thoroughly enough in this intro to be convincing):
If you formulate Alignment in terms of "what state does the creation of AGI/ASI lead the system (human civilisation) to", then Alignment can be reduced to governance.