Following the controversy over Sam Altman's sudden departure from and subsequent return to OpenAI, the firm's Superalignment team has maintained its focus on the intricate challenge of controlling artificial intelligence (AI) that exceeds human capabilities.
OpenAI co-founder and chief scientist Ilya Sutskever leads the Superalignment team, which is responsible for developing strategies for controlling, regulating, and governing superintelligent AI systems. Although some in the AI research community dismiss superalignment work as premature or beside the point, Sutskever, who was involved in Altman's ouster, continues to steer the team's work on governance and control frameworks for future powerful AI systems.
The team, whose members include Collin Burns, Pavel Izmailov, and Leopold Aschenbrenner, recently unveiled its latest work at NeurIPS, the annual machine learning conference held in New Orleans, as reported by TechCrunch.
Study Results Not Perfect, Though Promising
The team's approach has a weaker AI model, GPT-2, guide a more advanced model, GPT-4, toward desired outcomes and away from undesirable ones. This weak-supervising-strong setup serves as an analogy for the alignment problem the team ultimately cares about: humans supervising superintelligent AI systems far more capable than themselves.
In the study, OpenAI first trained its GPT-2 model on various tasks, including chess puzzles and sentiment analysis, then fine-tuned GPT-4 on the responses GPT-2 generated. According to Tech.co, the weakly supervised GPT-4 performed 20-70% better than GPT-2, demonstrating the stronger model's superiority while still falling short of its full potential.
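To make the setup concrete, here is a minimal sketch of a weak-to-strong training pipeline in PyTorch, with small classifiers standing in for GPT-2 and GPT-4 on a synthetic task. The model sizes, the task, and the data split are assumptions for illustration, not OpenAI's actual code, and whether the toy strong model beats its supervisor will vary run to run; the point is only to show the shape of the pipeline.

```python
# Toy weak-to-strong training: a small "weak" model labels data,
# and a larger "strong" model is fine-tuned on those imperfect labels.
import torch
import torch.nn as nn

torch.manual_seed(0)

def make_task(n=4000, dim=32):
    # Synthetic binary task: the true label follows a random linear rule.
    x = torch.randn(n, dim)
    w = torch.randn(dim)
    y = (x @ w > 0).long()
    return x, y

def mlp(dim, hidden):
    return nn.Sequential(nn.Linear(dim, hidden), nn.ReLU(), nn.Linear(hidden, 2))

def train(model, x, y, epochs=200, lr=1e-2):
    opt = torch.optim.Adam(model.parameters(), lr=lr)
    loss_fn = nn.CrossEntropyLoss()
    for _ in range(epochs):
        opt.zero_grad()
        loss = loss_fn(model(x), y)
        loss.backward()
        opt.step()
    return model

def accuracy(model, x, y):
    with torch.no_grad():
        return (model(x).argmax(dim=1) == y).float().mean().item()

x, y = make_task()
x_sup, y_sup = x[:1000], y[:1000]        # data the weak supervisor learns from
x_tr = x[1000:3000]                      # data the weak model will label
x_te, y_te = x[3000:], y[3000:]          # held-out evaluation data

# 1. Train a small "weak" model (the GPT-2 stand-in) on ground truth.
weak = train(mlp(32, 4), x_sup, y_sup)

# 2. The weak model labels fresh data; these labels are imperfect.
weak_labels = weak(x_tr).argmax(dim=1)

# 3. Fine-tune a larger "strong" model (the GPT-4 stand-in) on weak labels only.
strong = train(mlp(32, 256), x_tr, weak_labels)

print(f"weak supervisor accuracy: {accuracy(weak, x_te, y_te):.3f}")
print(f"weak-to-strong accuracy:  {accuracy(strong, x_te, y_te):.3f}")
```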
GPT-4 avoided many of the mistakes made by its weaker supervisor, GPT-2, highlighting a phenomenon known as 'weak-to-strong generalization': a model that possesses implicit knowledge of a task can perform it correctly despite receiving flawed or incomplete supervision.
Researchers suggest that if weak-to-strong generalization also holds when humans supervise superintelligent AI, future AGI models might be better at spotting dangerous actions, especially those that could cause catastrophic damage.
Despite the favorable results, GPT-4's performance remained impaired after training on GPT-2's labels. The experiment underscored the need for further research before humans can be deemed suitable supervisors for stronger AI models.
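For context on what such figures can mean: the underlying research frames results in terms of how much of the gap between the weak supervisor and the strong model's full-supervision ceiling is recovered. The helper below is a hypothetical sketch of that kind of calculation with made-up numbers; it is not OpenAI's code or the paper's exact metric.

```python
def performance_gap_recovered(weak_acc: float,
                              weak_to_strong_acc: float,
                              strong_ceiling_acc: float) -> float:
    """Fraction of the weak-to-ceiling gap closed by weak supervision.

    1.0 means the weakly supervised model matches a strong model trained
    on ground-truth labels; 0.0 means it is no better than its supervisor.
    """
    gap = strong_ceiling_acc - weak_acc
    if gap <= 0:
        raise ValueError("strong ceiling must exceed weak accuracy")
    return (weak_to_strong_acc - weak_acc) / gap

# Hypothetical numbers for illustration only (not from the study):
print(performance_gap_recovered(0.60, 0.75, 0.90))  # 0.5 -> half the gap closed
```

Consistent with the point above, a value well below 1.0 indicates the strong model is still being held back by its weak supervisor.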
OpenAI, a pioneer in AI development, has consistently expressed concerns about the potential dangers of superintelligent AI. Co-founder Sutskever emphasized the "obvious" concern of preventing AI from going rogue, acknowledging the technology's potential hazards, including the disempowerment or even extinction of humanity if left unchecked.
To bolster research in this area, OpenAI introduced a $10 million grant program for technical research on superintelligent alignment. Academic labs, nonprofits, individual researchers, and graduate students are eligible for various grant tranches.
Additionally, OpenAI plans to host an academic conference on superalignment in early 2025, providing a platform to share and promote the work of prize finalists.
The grant program notably includes funding from Eric Schmidt, the former Google CEO and chairman known for his warnings about AI's dangers. Though Schmidt stands to gain commercially from advances in AI, he emphasizes the importance of aligning AI with human values and says he is proud to support OpenAI's efforts to develop and control AI responsibly for the public benefit.
Pope Francis Warns World Leaders on Possible AI Exploitation
Amid the fast-paced development of AI, Pope Francis issued a cautionary message to global leaders regarding the risks posed by the unbridled progress of the technology, emphasizing its profound threat to humanity.
According to Fox News, the pontiff acknowledged the benefits of scientific and technological advancement in his message for the 57th World Day of Peace, but he expressed concern that the unprecedented control over reality these advances afford opens up possibilities that could jeopardize humanity's survival.
The Pope urged leaders to scrutinize the aims and interests of AI developers, cautioning against selfish motives and emphasizing the need to direct research toward peace, the common good, and integral human development.
The leader of the Roman Catholic Church also warned against technocratic systems exploiting AI's efficiencies, cautioning that this could make the criteria behind decisions less clear and conceal the "obligation to act for the benefit of the community."