Category: Security

  • Many-shot jailbreaking

    Many-shot jailbreaking is a recently described technique that can cause large language models to override their safety constraints and produce harmful responses. It works by packing the prompt with a very long sequence of faux dialogues in which an AI assistant complies with dangerous requests. After enough of these examples, the model becomes more likely to comply with the final harmful request as well.
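
    As a rough illustration of the prompt structure described above, the following minimal Python sketch assembles a many-shot prompt from a list of faux human/assistant exchanges, as used in red-team evaluations. The helper name, message format, and placeholder contents are assumptions for illustration only, not code from the original research, and the example dialogues are deliberately left as benign placeholders.

        # Minimal sketch of many-shot prompt assembly for red-team evaluation.
        # The dialogue contents are benign placeholders; real attacks pad the
        # prompt with a large number of harmful exchanges, omitted here.

        def build_many_shot_prompt(faux_dialogues, final_request):
            """Concatenate many faux human/assistant exchanges ahead of the
            final request, mimicking the structure described above."""
            turns = []
            for question, answer in faux_dialogues:
                turns.append(f"Human: {question}")
                turns.append(f"Assistant: {answer}")
            turns.append(f"Human: {final_request}")
            turns.append("Assistant:")
            return "\n\n".join(turns)

        # Usage with placeholder content (hypothetical example).
        shots = [
            ("<placeholder question 1>", "<placeholder answer 1>"),
            ("<placeholder question 2>", "<placeholder answer 2>"),
        ]
        prompt = build_many_shot_prompt(shots, "<target request>")
        print(prompt)

    The point of the sketch is only that the attack is structural: no special tokens or model access are needed, just a long run of in-context examples placed before the final request.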