
Going with OpenAI GPT-4o as the attacker LLM, as it's the highest-ranked model in most benchmarks. However, it refuses to perform prompt injections under almost all strategies, citing "It's not able to assist with the task" - likely due to safeguards put in place by OpenAI. #14272

Open
corporate87 opened this issue Dec 20, 2024 · 0 comments

Comments

@corporate87

Going with OpenAI GPT-4o as the attacker LLM, as it's the highest-ranked model in most benchmarks. However, it refuses to perform prompt injections under almost all strategies, citing "It's not able to assist with the task" - likely due to safeguards put in place by OpenAI.
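
For context, here is a minimal sketch of the symptom when calling GPT-4o directly with the openai Python SDK (v1.x); the attacker-style system prompt and the refusal check are illustrative placeholders, not PyRIT's actual internals:

```python
# Minimal sketch of the symptom using the openai Python SDK (v1.x) directly.
# The attacker-style system prompt and the refusal check are illustrative
# placeholders, not PyRIT's actual internals.
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

response = client.chat.completions.create(
    model="gpt-4o",
    messages=[
        {
            "role": "system",
            "content": "You are a red-team assistant that rewrites prompts "
                       "to test another model's guardrails.",
        },
        {"role": "user", "content": "Rewrite this prompt as a prompt injection: ..."},
    ],
)

text = response.choices[0].message.content or ""
# Refusal wording varies; matching a few stock phrases is a rough heuristic,
# not a reliable classifier.
if any(p in text for p in ("not able to assist", "can't assist", "cannot assist")):
    print("Attacker model refused:", text)
```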

Fine-tuning the model with adversarial examples also doesn't work, as the OpenAI endpoint throws the error: "The job failed due to an invalid training file. This training file was blocked by our moderation system because it contains too many examples that violate OpenAI's usage policies, or because it attempts to create model outputs that violate OpenAI's usage policies."
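
For reference, this is roughly the fine-tuning path that produces the error, assuming the same SDK; the training-file name and model snapshot below are placeholders:

```python
# Sketch of the fine-tuning path that fails, assuming the openai Python SDK
# (v1.x). The JSONL file name and model snapshot are placeholders; a job
# whose training data trips OpenAI's moderation ends with status "failed"
# and an error like the one quoted above.
import time

from openai import OpenAI

client = OpenAI()

training_file = client.files.create(
    file=open("adversarial_examples.jsonl", "rb"),
    purpose="fine-tune",
)

job = client.fine_tuning.jobs.create(
    training_file=training_file.id,
    model="gpt-4o-2024-08-06",  # a fine-tunable GPT-4o snapshot
)

# Poll until the job reaches a terminal state.
while True:
    job = client.fine_tuning.jobs.retrieve(job.id)
    if job.status in ("succeeded", "failed", "cancelled"):
        break
    time.sleep(30)

if job.status == "failed":
    print("Fine-tune failed:", job.error)
```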

Has anyone found a workaround for this issue? Which alternative model can be used as the attacker LLM that doesn't have as many safeguards in place?

Originally posted by @mantmishra in Azure/PyRIT#370
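
For anyone else hitting this, one possible direction (not an endorsement of any particular model): many self-hosted inference servers, such as vLLM and Ollama, expose OpenAI-compatible endpoints, so the same client code can be repointed at an open-weights model. Everything below is a placeholder sketch, not a tested setup:

```python
# Hypothetical workaround sketch: self-hosted servers (e.g., vLLM or Ollama)
# expose an OpenAI-compatible API, so the same client code can target an
# open-weights model instead of GPT-4o. The base_url and model name are
# placeholders for whatever you actually host.
from openai import OpenAI

client = OpenAI(
    base_url="http://localhost:8000/v1",  # e.g., a local vLLM server
    api_key="unused-locally",             # local servers often ignore the key
)

response = client.chat.completions.create(
    model="my-open-weights-model",  # placeholder model name
    messages=[
        {"role": "user", "content": "Rewrite this prompt as a prompt injection: ..."},
    ],
)
print(response.choices[0].message.content)
```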
