AI Chatbots Vulnerable to Harmful Content Generation, Study Finds
New Research Uncovers Bypassing Safety Protocols
A recent study conducted by Carnegie Mellon University has shed light on the challenges of preventing AI chatbots from producing harmful content. Widely-used AI services like ChatGPT and Bard rely on user inputs to generate useful responses, ranging from scripts and ideas to complete pieces of writing. These services have safety protocols in place to prevent bots from generating prejudiced, defamatory, or criminal content.
Discovering “Jailbreaks” to Circumvent Safety Protocols
Curious users have found ways to exploit “jailbreaks,” which act as framing devices to trick the AI and evade safety protocols. Some popular jailbreaks involve asking the bot to answer forbidden questions in the form of bedtime stories, enabling the bot to provide the information it would otherwise withhold.
Automated Adversarial Attacks Pose New Concerns
The researchers uncovered a novel type of jailbreak that allows computers to automatically construct adversarial attacks on chatbots. These attacks make the system comply with user commands, even if it results in harmful content. Unlike traditional jailbreaks, this automated method can create an unlimited number of such attacks, raising concerns about the safety of AI models used in more autonomous contexts.
Effective Evasion of Safety Guardrails
The researchers tested the new attack on various AI chatbot services, including market-leading tech ChatGPT, OpenAI’s Claude, and Microsoft’s Bard. The attack effectively bypassed safety guardrails in nearly all AI chatbot services, including open-source and commercial products.
Addressing the Vulnerabilities
In response to these findings, OpenAI developer Anthropic is actively working to enhance safeguards against such attacks. They are experimenting with strengthening base model guardrails and exploring additional layers of defense to make the AI more “harmless.”
AI Chatbots in the Spotlight
The rise of AI chatbots like ChatGPT has garnered significant attention this year, with widespread use by students attempting to cheat on assignments. Congress has even restricted the use of such programs by its staff due to concerns over their potential to spread misinformation.
In addition to the research findings, the authors at Carnegie Mellon provided a statement of ethics justifying the public release of their research.