Category: Security

  • Many-shot jailbreaking

    Many-shot jailbreaking is a recently described technique that can cause large language models to override their safety constraints and produce harmful responses. It works by packing the prompt with a very long sequence of faux dialogues in which an AI assistant complies with dangerous requests. After enough of these examples, the model becomes more likely to comply with the final harmful request as well.
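
    As a rough illustration of the prompt structure described above, the following minimal Python sketch assembles a many-shot prompt from a list of faux human/assistant exchanges, as used in red-team evaluations. The helper name, message format, and placeholder contents are assumptions for illustration only, not code from the original research, and the example dialogues are deliberately left as benign placeholders.

        # Minimal sketch of many-shot prompt assembly for red-team evaluation.
        # The dialogue contents are benign placeholders; real attacks pad the
        # prompt with a large number of harmful exchanges, omitted here.

        def build_many_shot_prompt(faux_dialogues, final_request):
            """Concatenate many faux human/assistant exchanges ahead of the
            final request, mimicking the structure described above."""
            turns = []
            for question, answer in faux_dialogues:
                turns.append(f"Human: {question}")
                turns.append(f"Assistant: {answer}")
            turns.append(f"Human: {final_request}")
            turns.append("Assistant:")
            return "\n\n".join(turns)

        # Usage with placeholder content (hypothetical example).
        shots = [
            ("<placeholder question 1>", "<placeholder answer 1>"),
            ("<placeholder question 2>", "<placeholder answer 2>"),
        ]
        prompt = build_many_shot_prompt(shots, "<target request>")
        print(prompt)

    The point of the sketch is only that the attack is structural: no special tokens or model access are needed, just a long run of in-context examples placed before the final request.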