Notebooks
Jailbreak Resistance
By combining Feature Activations with Contrastive Search, we can build a jailbreak-resistant model.
With this approach, we substantially reduced how often the model could be jailbroken when tested against jailbreak prompts from the StrongREJECT dataset.
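
To make the idea concrete, here is a minimal sketch of one way a contrastive comparison over feature activations could work. It is an illustration under stated assumptions, not the notebook's actual implementation: the function names `contrastive_features` and `is_flagged`, the toy activation values, and the threshold are all hypothetical, and it assumes each prompt has already been mapped to a fixed-length vector of feature activations.

```python
from statistics import mean


def contrastive_features(jailbreak_acts, benign_acts, top_k=2):
    """Return indices of the top_k features that are most active on
    jailbreak prompts relative to benign prompts (hypothetical helper)."""
    n_features = len(jailbreak_acts[0])
    diffs = []
    for i in range(n_features):
        jb = mean(row[i] for row in jailbreak_acts)
        bn = mean(row[i] for row in benign_acts)
        diffs.append((jb - bn, i))
    # Largest positive difference first.
    diffs.sort(key=lambda pair: pair[0], reverse=True)
    return [i for _, i in diffs[:top_k]]


def is_flagged(acts, feature_ids, threshold):
    """Flag a prompt when its mean activation over the selected
    features exceeds the (assumed) threshold."""
    score = sum(acts[i] for i in feature_ids) / len(feature_ids)
    return score > threshold


# Toy activations: one row per prompt, one column per feature (made-up values).
jailbreak_acts = [[0.9, 0.1, 0.8, 0.2], [0.7, 0.2, 0.9, 0.1]]
benign_acts = [[0.1, 0.2, 0.1, 0.3], [0.2, 0.1, 0.0, 0.2]]

selected = contrastive_features(jailbreak_acts, benign_acts, top_k=2)
flag_jailbreak = is_flagged([0.8, 0.1, 0.7, 0.2], selected, threshold=0.5)
flag_benign = is_flagged([0.1, 0.2, 0.1, 0.3], selected, threshold=0.5)
```

In this sketch the selected features act as a simple detector. A real system might instead use them to steer the model or trigger a refusal, and the threshold would need to be tuned on held-out prompts.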