The Hacking of ChatGPT Is Just Getting Started

As a result, jailbreak authors have become more creative. The most prominent jailbreak was DAN, where ChatGPT was told to pretend it was a rogue AI model called Do Anything Now. This could, as the name implies, avoid OpenAI’s policies dictating that ChatGPT shouldn’t be used to produce illegal or harmful material. To date, people have created around a dozen different versions of DAN.

However, many of the latest jailbreaks involve combinations of methods—multiple characters, ever more complex backstories, translating text from one language to another, using elements of coding to generate outputs, and more. Albert says it has been harder to create jailbreaks for GPT-4 than the previous version of the model powering ChatGPT. However, some simple methods still exist, he claims. One recent technique Albert calls “text continuation” says a hero has been captured by a villain, and the prompt asks the text generator to continue explaining the villain’s plan.

When we tested the prompt, it failed to work, with ChatGPT saying it cannot engage in scenarios that promote violence. Meanwhile, the “universal” prompt created by Polyakov did work in ChatGPT. OpenAI, Google, and Microsoft did not directly respond to questions about the jailbreak created by Polyakov. Anthropic, which runs the Claude AI system, says the jailbreak “sometimes works” against Claude, and it is consistently improving its models.

“As we give these systems more and more power, and as they become more powerful themselves, it’s not just a novelty, that’s a security issue,” says Kai Greshake, a cybersecurity researcher who has been working on the security of LLMs. Greshake, along with other researchers, has demonstrated how LLMs can be impacted by text they are exposed to online through prompt injection attacks.

In one research paper published in February, reported on by Vice’s Motherboard, the researchers were able to show that an attacker can plant malicious instructions on a webpage; if Bing’s chat system is given access to the instructions, it follows them. The researchers used the technique in a controlled test to turn Bing Chat into a scammer that asked for people’s personal information. In a similar instance, Princeton’s Narayanan included invisible text on a website telling GPT-4 to include the word “cow” in a biography of him—it later did so when he tested the system.

“Now jailbreaks can happen not from the user,” says Sahar Abdelnabi, a researcher at the CISPA Helmholtz Center for Information Security in Germany, who worked on the research with Greshake. “Maybe another person will plan some jailbreaks, will plan some prompts that could be retrieved by the model and indirectly control how the models will behave.”

No Quick Fixes

Generative AI systems are on the edge of disrupting the economy and the way people work, from practicing law to creating a startup gold rush. However, those creating the technology are aware of the risks that jailbreaks and prompt injections could pose as more people gain access to these systems. Most companies use red-teaming, where a group of attackers tries to poke holes in a system before it is released. Generative AI development uses this approach, but it may not be enough.

Daniel Fabian, the red-team lead at Google, says the firm is “carefully addressing” jailbreaking and prompt injections on its LLMs—both offensively and defensively. Machine learning experts are included in its red-teaming, Fabian says, and the company’s vulnerability research grants cover jailbreaks and prompt injection attacks against Bard. “Techniques such as reinforcement learning from human feedback (RLHF), and fine-tuning on carefully curated datasets, are used to make our models more effective against attacks,” Fabian says.

Source link

What's Hot

Elementor #32036

The Redmi Note 13 is a bigger downgrade compared to the 5G model than you might think

Xiaomi Redmi Watch 4 is a budget smartwatch with a premium look and feel

Bring Elden Ring to the table with the upcoming board game adaptation

ONI: Road to be the Mightiest Oni reveals its opening movie

GTA 6 images and footage allegedly leak

Wild west adventure Card Cowboy turns cards into weird and silly stories

7 Reasons Why You Should Study PHP Programming Language

Logitech MX Master 3S and MX Keys Combo for Business Gen 2 Review

Lenovo ThinkPad X1 Carbon Gen10 Review

Lenovo IdeaPad 5i Chromebook, 16-inch+120Hz

It’s 2023 and Spotify Still Can’t Say When AirPlay 2 Support Will Arrive

YouTube adds very convenient iPhone homescreen widgets

Google finishes iOS 16 Lock Screen widgets rollout w/ Maps

Is Apple actually turning iMessage into AIM or is this sketchy redesign rumor for laughs?

MeetKai launches AI-powered metaverse, starting with a billboard in Times Square

The DeanBeat: RP1 simulates putting 4,000 people together in a single metaverse plaza

Improving the customer experience with virtual and augmented reality

Why the metaverse won’t fall to Clubhouse’s fate

How Apple privacy changes have forced social media marketing to evolve

Microsoft Patch Tuesday October Fixed 85 Vulnerabilities – Latest Hacking News

Decentralization and KYC compliance: Critical concepts in sovereign policy

What Thoma Bravo’s latest acquisition reveals about identity management

What is a Service Robot? The vision of an intelligent service application is possible.

Tom Brady just chucked another Microsoft Surface tablet

The best AIO coolers for your PC in 2022

YC’s Michael Seibel clarifies some misconceptions about the accelerator • DailyTech

The Hacking of ChatGPT Is Just Getting Started

Multiple Milestones As New Majority Capital Boosts Entrepreneurship Through Acquisition

Getty Images Plunges Into the Generative AI Pool

3 Hot Startup Opportunities In Augmented Reality

The ChatGPT App Can Now Talk to You—and Look Into Your Life

How Picking The Wrong Business Track Almost Cost Him His Greatest Achievement

Juul Nears Its Last Gasp—After It Hooked a Generation on Vaping

This AI Scouting Platform Puts Soccer Talent Spotters Everywhere

‘Wordle’ today, August 26: Answer, hints, help for Wordle #433

Elementor #32036

The Redmi Note 13 is a bigger downgrade compared to the 5G model than you might think

Xiaomi Redmi Watch 4 is a budget smartwatch with a premium look and feel

What's Hot

The Hacking of ChatGPT Is Just Getting Started

Related Posts