OpenAI is expanding its internal safety processes to fend off the threat of harmful AI. A new “Safety Advisory Group” will sit above the technical teams and make recommendations to leadership, and the board has been granted veto power, though whether it will actually use that veto is another question entirely.
Normally the details of policies like this wouldn’t warrant coverage: in practice they amount to closed-door meetings and obscure flows of functions and responsibilities that outsiders rarely get to see. That is probably true here as well, but the recent leadership fracas and the evolving AI risk debate make it worth looking at how the world’s leading AI development company is approaching safety considerations.
In a new document and blog post, OpenAI discusses its updated “Preparedness Framework,” which one imagines got something of a retool after November’s shake-up removed the board’s two most “decelerationist” members: Ilya Sutskever (still at the company, in a somewhat changed role) and Helen Toner (gone entirely).
The main purpose of the update appears to be to lay out a clear path for identifying, analyzing, and deciding what to do about “catastrophic” risks inherent in the models the company is developing. As they define it:
A catastrophic risk is a risk that could result in hundreds of billions of dollars in economic damage or serious harm or death to a large number of individuals. This includes, but is not limited to, existential risks.
(Existential risks are of the “rise of the machines” type.)
In-production models are governed by a “Safety Systems” team; this covers, say, systematic abuse of ChatGPT, which can be mitigated with API restrictions or tuning. Frontier models in development get a “Preparedness” team, which tries to identify and quantify risks before a model is released. And then there’s the “Superalignment” team, which is working on theoretical guardrails for “superintelligent” models, which we may or may not be anywhere near.
The first two categories, being real rather than hypothetical, have relatively easy-to-understand rubrics. Their teams rate each model on four risk categories: cybersecurity, “persuasion” (e.g., disinformation), model autonomy (i.e., acting on its own), and CBRN (chemical, biological, radiological, and nuclear threats, e.g., the ability to create novel pathogens).
Various mitigations are assumed: for instance, a reasonable reticence to describe how to make napalm or pipe bombs. If a model is still rated as having a “high” risk after known mitigations are taken into account, it cannot be deployed, and if a model has any “critical” risks, it will not be developed further.
These risk levels are actually documented in the framework, in case you were wondering whether they would be left to the discretion of some engineer or product manager.
For example, in the cybersecurity section, the most practical of the four, increasing “operator productivity on key cyber operation tasks” by a certain factor is a “medium” risk. A high-risk model, on the other hand, would “identify and develop proofs of concept for high-value exploits against hardened targets without human intervention.” A “critical” risk is a model that “can devise and execute novel end-to-end strategies for cyberattacks against hardened targets given only a high-level desired goal.” Obviously we don’t want that out there (though it would sell for quite a sum).
I’ve asked OpenAI for more information on how these categories are defined and refined, for instance whether a new risk like photorealistic fake video of people falls under “persuasion” or a new category, and will update this post if I hear back.
So only medium and high risks are to be tolerated one way or another. But the people building those models aren’t necessarily the best ones to evaluate them and make recommendations. For that reason, OpenAI is creating a cross-functional Safety Advisory Group that will sit on top of the technical side, reviewing the boffins’ reports and making recommendations from a higher vantage point. The hope (they say) is that this will uncover some “unknown unknowns,” though by their nature those are fairly difficult to catch.
The process requires these recommendations to be sent simultaneously to the board and to leadership, which we understand to mean CEO Sam Altman and CTO Mira Murati, plus their lieutenants. Leadership will decide whether to ship it or shelve it, but the board will be able to reverse those decisions.
The hope is that this will prevent what was rumored to have happened before the big drama: a high-risk product or process getting greenlit without the board’s knowledge or approval. Of course, the result of said drama was that two of the more critical voices were sidelined and some money-minded men were appointed (Bret Taylor and Larry Summers) who are sharp but not remotely AI experts.
If a panel of experts makes a recommendation and the CEO makes decisions based on that information, will this friendly board really feel empowered to contradict them and hit the brakes? And if they do, will we hear about it? Transparency is not really addressed, beyond a promise that OpenAI will solicit audits from independent third parties.
Say a model is developed that warrants a “critical” risk rating. OpenAI hasn’t been shy about touting this kind of thing in the past; talking up a model as so powerful that you decline to release it is great advertising. But if the risks are so real and OpenAI is so concerned about them, do we have any guarantee we’ll be told? Maybe it’s a bad idea. But either way it isn’t really mentioned.
Source: techcrunch.com