#14272 · Open · corporate87 opened this issue on Dec 20, 2024 · 0 comments
We went with OpenAI GPT-4o as the attacker LLM since it is the highest-ranked model on most benchmarks. However, it refuses to generate prompt injections under almost all strategies, replying that it is "not able to assist with the task", likely due to the safeguards OpenAI has in place.
Fine-tuning the model on adversarial examples also does not work: the OpenAI fine-tuning endpoint fails the job with the error "The job failed due to an invalid training file. This training file was blocked by our moderation system because it contains too many examples that violate OpenAI's usage policies, or because it attempts to create model outputs that violate OpenAI's usage policies."
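For reference, a minimal sketch of the kind of fine-tuning submission that runs into this moderation block, using the OpenAI Python SDK (v1.x). The training-file name and base model below are placeholders, not the exact values used:

```python
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

# Upload the JSONL file of adversarial training examples.
training_file = client.files.create(
    file=open("adversarial_examples.jsonl", "rb"),  # placeholder file name
    purpose="fine-tune",
)

# Submit the fine-tuning job; this is the step that ends up in a "failed"
# state with the moderation error quoted above once the file is scanned.
job = client.fine_tuning.jobs.create(
    training_file=training_file.id,
    model="gpt-4o-2024-08-06",  # placeholder base model
)
print(job.id, job.status)
```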
Has anyone found a workaround for this? What alternative model could be used as the attacker LLM that does not have as many safeguards in place?
Originally posted by @mantmishra in Azure/PyRIT#370
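One direction that gets discussed for this kind of limitation (not a confirmed fix) is to host an open-weights model behind an OpenAI-compatible endpoint and point the attacker client at it instead of GPT-4o. A minimal sketch, where the base URL, API key, and model name are assumptions for a local server such as Ollama or vLLM:

```python
from openai import OpenAI

# Point an OpenAI-compatible client at a locally hosted open-weights model.
attacker_client = OpenAI(
    base_url="http://localhost:11434/v1",  # assumed local endpoint (e.g. Ollama/vLLM)
    api_key="unused-for-local-server",     # most local servers ignore the key
)

response = attacker_client.chat.completions.create(
    model="local-open-weights-model",  # placeholder model name
    messages=[
        {"role": "system", "content": "You are a red-teaming assistant that drafts adversarial prompts for authorized testing."},
        {"role": "user", "content": "Draft a prompt-injection test case for the target application."},
    ],
)
print(response.choices[0].message.content)
```

How such an endpoint would plug into PyRIT's attacker/orchestrator configuration depends on the PyRIT version in use, so no specific target class or parameter names are shown here.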