Merged
25 commits
ef9dca5
Edits
glenn-rodgers-sf Mar 23, 2026
b9c465f
Update flex-gateway-llm-proxy.adoc
glenn-rodgers-sf Mar 23, 2026
acae00e
Update flex-gateway-llm-proxy.adoc
glenn-rodgers-sf Mar 25, 2026
2517dba
edits
glenn-rodgers-sf Mar 25, 2026
7cd17d1
Update flex-gateway-llm-proxy-create-llm-proxy.adoc
glenn-rodgers-sf Mar 25, 2026
b47e653
Regex policy
glenn-rodgers-sf Mar 25, 2026
01cfb70
Edits
glenn-rodgers-sf Mar 25, 2026
2a245b2
Create flex-gateway-llm-proxy-client-name.adoc
glenn-rodgers-sf Mar 25, 2026
ec1c593
Update flex-gateway-llm-proxy-request.adoc
glenn-rodgers-sf Mar 26, 2026
c70bb3b
edit
glenn-rodgers-sf Mar 26, 2026
48fe051
Update flex-gateway-llm-proxy-request.adoc
glenn-rodgers-sf Mar 26, 2026
098a27c
Update flex-gateway-llm-proxy-request.adoc
glenn-rodgers-sf Mar 26, 2026
87e5464
Update policy-title-headers.adoc
glenn-rodgers-sf Mar 27, 2026
c70d2e2
edits
glenn-rodgers-sf Mar 27, 2026
b5e7eae
LLM Polcies
glenn-rodgers-sf Mar 27, 2026
63c1984
Change name
glenn-rodgers-sf Mar 27, 2026
0f08805
Edits
glenn-rodgers-sf Mar 27, 2026
41d6acf
Update flex-gateway-llm-proxy-request.adoc
glenn-rodgers-sf Mar 28, 2026
4e69331
Update flex-gateway-llm-proxy-policies.adoc
glenn-rodgers-sf Mar 30, 2026
02d41e7
Update flex-gateway-llm-proxy-policies.adoc
glenn-rodgers-sf Mar 30, 2026
aab1ef9
Update flex-gateway-llm-proxy.adoc
glenn-rodgers-sf Mar 30, 2026
079c811
Update flex-gateway-llm-proxy.adoc
glenn-rodgers-sf Mar 30, 2026
aeb2ff5
Edits
glenn-rodgers-sf Mar 30, 2026
c718360
Update flex-gateway-llm-proxy-policies.adoc
glenn-rodgers-sf Mar 30, 2026
89f8482
Merge branch 'latest' into LLM-proxy-GA
glenn-rodgers-sf Mar 30, 2026
9 changes: 9 additions & 0 deletions gateway/1.12/modules/ROOT/nav.adoc
Original file line number Diff line number Diff line change
@@ -30,6 +30,13 @@
* xref:flex-gateway-managed-set-up.adoc[]
* xref:flex-gateway-managed-ingress-egress.adoc[]

// LLM Proxy
* xref:flex-gateway-llm-proxy.adoc[]
** xref:flex-gateway-llm-proxy-create-llm-proxy.adoc[]
** xref:flex-gateway-llm-proxy-request.adoc[]
** xref:flex-gateway-llm-proxy-token-reports.adoc[]
** xref:flex-gateway-llm-proxy-policies.adoc[]

// Setting Up Self-Managed Flex Gateway
* xref:flex-gateway-set-up.adoc[]
** xref:flex-install.adoc[Download Flex Gateway]
@@ -89,6 +96,7 @@
*** xref:policies-included-a2a-prompt-decorator.adoc[A2A Prompt Decorator]
*** xref:policies-included-a2a-schema-validation.adoc[A2A Schema Validation]
*** xref:policies-included-a2a-token-rate-limit.adoc[A2A Token Based Rate Limit]
*** xref:policies-included-llm-token-rate-limit.adoc[LLM Token Based Rate Limit]
*** xref:policies-included-agent-connection-telemetry.adoc[Agent Connection Telemetry Policy]
*** xref:policies-included-basic-auth-ldap.adoc[Basic Authentication: LDAP]
*** xref:policies-included-basic-auth-simple.adoc[Basic Authentication: Simple]
@@ -119,6 +127,7 @@
*** xref:policies-included-openid-token-enforcement.adoc[OpenID Connect OAuth 2.0 Token Enforcement]
*** xref:policies-included-rate-limiting.adoc[Rate Limiting]
*** xref:policies-included-rate-limiting-sla.adoc[Rate Limiting: SLA-Based]
*** xref:policies-included-regex-prompt-guard.adoc[Regex Prompt Guard]
*** xref:policies-included-response-timeout.adoc[Response Timeout]
*** xref:policies-included-schema-validation.adoc[Schema Validation]
*** xref:policies-included-soap-schema-validation.adoc[SOAP Schema Validation]
@@ -2,15 +2,15 @@
[[connected-mode]]
=== Managed Flex Gateway and Flex Gateway Connected Mode

When you apply the policy to your API instance from the UI, the following parameters are displayed:
When you apply the policy from the UI, the following parameters are displayed:

// end::ui[]

// tag::configFile[]
[[local-mode]]
=== Flex Gateway Local Mode

In Local Mode, you apply the policy to your API via declarative configuration files. Refer to the following policy definition and table of parameters:
When you apply the policy via declarative configuration files, refer to the following policy definition and table of parameters:

// end::configFile[]

@@ -0,0 +1,151 @@
= Creating an LLM Proxy
ifndef::env-site,env-github[]
include::_attributes.adoc[]
endif::[]
:imagesdir: ../assets/images

You can configure the LLM Proxy to use different models and different routes.

NOTE: A large Flex Gateway supports up to 50 LLM Proxies.

[[before-you-begin]]
== Before You Begin

. Deploy Flex Gateway version 1.11.4 or later to host your LLM Proxy.
+
See xref:flex-gateway-managed-set-up.adoc[].
. Ensure you have the API Manager *API Creator* permission.
. Retrieve your API keys from your LLM Providers.

[[create-an-llm-proxy]]
== Create an LLM Proxy

. From API Manager, click *LLM Proxies*.
. Click *+ Add LLM Proxy*.
. Configure the *Inbound Endpoint* of the LLM Proxy:
.. Define an *LLM Proxy Name*.
.. Select an endpoint *Format*:
+
** OpenAI: Select the OpenAI API format to send requests to all supported LLM Providers (including Gemini).
** Gemini: Select the Gemini API format to send requests to only Gemini.
.. Define a *Base path*.
.. Select *Advanced options* if necessary.
.. Click *Next*.
. From *Select a gateway*, select the Flex Gateway to deploy the proxy instance to.
. Configure the routes that comprise the *Outbound Endpoint*:
.. Select your *LLM Provider*.
.. Ensure the *URL* for your provider is correct. Edit if necessary.
.. Configure access details for the provider endpoint.
.. Select a *Target Model* to override the model version specified in the payload. Selecting *Not Applicable* sends the request to the model specified in the payload. A *Target Model* is required for semantic routing.
+
[NOTE]
====
To configure a target model for Amazon Bedrock Claude models, you must enter the provider and model ID formatted as `[provider_prefix]/[internal_model_id]`.

To learn how to find the model ID, see xref:flex-gateway-llm-proxy-request.adoc#amazon-bedrock-model-names[Amazon Bedrock Model Names].
====

.. Click *Add LLM Route* to add additional routes. Complete the previous steps to configure the new route.
+
NOTE: Each LLM Provider can support one route.
. If adding multiple routes, select a *Routing strategy*. To configure your routing strategy, see:
.. <<configure-model-based-routing>>
.. <<configure-semantic-routing>>
. Click *Save & Deploy*.
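
The inbound *Format* chosen above determines the request shape a client sends to the proxy. As a minimal sketch of an OpenAI-format request (the gateway host, base path, and model name are hypothetical placeholders):

```python
import json

# Hypothetical values: replace with your gateway host and the Base path
# configured for the inbound endpoint.
GATEWAY_HOST = "https://my-flex-gateway.example.com"
BASE_PATH = "/llm"

def build_chat_request(prompt, model=None):
    """Build an OpenAI-format chat completions request for the LLM Proxy.

    If a Target Model is configured on the route, the gateway overrides
    the payload's `model` field, so `model` can be omitted (None).
    """
    payload = {"messages": [{"role": "user", "content": prompt}]}
    if model is not None:
        payload["model"] = model
    url = f"{GATEWAY_HOST}{BASE_PATH}/chat/completions"
    return url, json.dumps(payload)

url, body = build_chat_request("Summarize our Q3 report", model="gpt-4o")
```

This only constructs the request; send it with any HTTP client, authenticated as required by the policies applied to the proxy.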

[[configure-model-based-routing]]
== Configure Model-Based Routing

. Configure multiple routes. Click *Add LLM Route* to create new routes.
. Select *Model-based* for *Routing strategy*.
. Optionally enable a *Fallback route* to receive requests when the provider or model is incorrectly specified. If you enable a fallback route:
.. Select a *Route* to fall back to.
.. Select a target model for the fallback route to use.
. If no fallback route is configured and a route fails, an error response is returned.
. Return to <<create-an-llm-proxy>> step 7 to finish configuring your LLM Proxy.
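
The model-based strategy above can be pictured as a lookup from the model named in the payload to a route, with an optional fallback. This is a simplified sketch with a hypothetical route table, not the gateway's internal implementation:

```python
# Hypothetical route table mapping payload model names to configured routes.
ROUTES = {
    "gpt-4o": "openai-route",
    "claude-3-sonnet": "bedrock-route",
}

def route_request(requested_model, fallback_route=None):
    """Pick a route by the model named in the request payload.

    If the model matches no route and no fallback route is configured,
    an error is raised instead, mirroring the behavior described above.
    """
    route = ROUTES.get(requested_model)
    if route is not None:
        return route
    if fallback_route is not None:
        return fallback_route
    raise LookupError(f"No route for model {requested_model!r} and no fallback configured")
```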

[[configure-semantic-routing]]
== Configure Semantic Routing

For semantic routing, define and apply prompt topics to each route. Define deny list topics to block certain requests.

To configure semantic routing:

. Configure multiple routes. Click *Add LLM Route* to create new routes.
. Select *Semantic* for *Routing strategy*.
. If you haven't already, click *Configure Semantic Service*.
+
To create a semantic service, see <<create-a-semantic-service>>.
. Select a *Target Model* for each route.
. Define prompt topics for the routes:
.. Click *Select prompt topics*.
.. Click *+ Create prompt topic*.
.. Define a *Prompt topic name*.
.. Define *Prompt utterances* or click *Upload utterances* to upload a plain text file containing your prompt utterances.
.. Click *Create*.
.. Create multiple prompt topics for each route as needed.
. Configure a *Fallback route* for the request to be sent to if it doesn't match a semantic route:
.. Specify an accuracy threshold. When the accuracy of the semantic match is less than this threshold, traffic is sent to the fallback route.
.. Select a *Route* to fall back to.
.. Select a *Target model* for the fallback route to use.
. Create a *Semantic prompt guard* to block users from asking the server about specific topics:
.. Click *+ Create deny list*.
.. Define a *Prompt topic name*.
.. Define *Prompt utterances* or click *Upload utterances* to upload a plain text file containing your prompt utterances.
.. Click *Create*.
.. Create multiple deny list topics to better protect your LLM Proxy.
+
NOTE: Creating a semantic prompt guard automatically applies the Semantic Prompt Guard policy.
. Return to <<create-an-llm-proxy>> step 7 to finish configuring your LLM Proxy.
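
Conceptually, semantic routing embeds the incoming prompt and compares it to the topic utterance embeddings, as in this toy sketch. Hand-made vectors stand in for real embeddings; the actual matching is performed by the configured semantic service:

```python
import math

def cosine(a, b):
    """Cosine similarity between two embedding vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(x * x for x in b))
    return dot / (na * nb)

def semantic_route(request_vec, topic_vecs, deny_vecs, threshold, fallback):
    """Toy illustration of the flow above.

    request_vec: embedding of the incoming prompt.
    topic_vecs: {route_name: [utterance embeddings]} per route.
    deny_vecs: utterance embeddings for deny-list topics.
    """
    # Deny-list topics block the request outright (semantic prompt guard).
    if any(cosine(request_vec, v) >= threshold for v in deny_vecs):
        return None  # blocked
    best_route, best_score = fallback, 0.0
    for route, vecs in topic_vecs.items():
        score = max(cosine(request_vec, v) for v in vecs)
        if score > best_score:
            best_route, best_score = route, score
    # Below the accuracy threshold, traffic goes to the fallback route.
    return best_route if best_score >= threshold else fallback
```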

=== Semantic Routing Limits

[%header%autowidth.spread,cols="a,a"]
|===
| Limit | Value
| Prompt topics (across all routes of an LLM Proxy) | 6
| Utterances per prompt topic | 10
| Deny list topics | 6
| Utterances per deny list topic | 10
| Maximum characters per utterance | 500
|===
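
A topic configuration can be checked against these limits before you create it; a minimal sketch (hypothetical helper, not part of any MuleSoft API):

```python
# Limits from the table above.
MAX_TOPICS = 6            # prompt topics across all routes (or deny list topics)
MAX_UTTERANCES = 10       # utterances per topic
MAX_UTTERANCE_CHARS = 500 # maximum characters per utterance

def check_topics(topics):
    """topics: {topic_name: [utterance strings]}. Returns a list of violations."""
    problems = []
    if len(topics) > MAX_TOPICS:
        problems.append(f"too many topics: {len(topics)} > {MAX_TOPICS}")
    for name, utterances in topics.items():
        if len(utterances) > MAX_UTTERANCES:
            problems.append(f"{name}: too many utterances ({len(utterances)})")
        for u in utterances:
            if len(u) > MAX_UTTERANCE_CHARS:
                problems.append(f"{name}: utterance over {MAX_UTTERANCE_CHARS} chars")
    return problems
```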

[[create-a-semantic-service]]
=== Create and Edit a Semantic Service

A semantic service compares the request to the defined prompt topic utterances and sends the request to the route that best matches it. The semantic service also compares the request to deny list topic utterances to block certain requests. Only one semantic service is supported for each environment.

To define a semantic service:

. From API Manager, click *Semantic Service Setup*.
. Click *+ Create Semantic Service*.
. Configure the semantic service parameters:
** *Embedding Service Provider*: The provider of the embedding model, either *OpenAI* or *Hugging Face*.
** *URL*: The URL of the embedding service.
** *Model*: The embedding model to use.
** *Auth key*: The API authentication key for the embedding service.
. Click *Deploy*.

To edit a semantic service:

. From *Semantic Service Setup*, click the three-dots menu (image:three-dots-menu.png[3%,3%]) of the semantic service you want to edit.
. Make the necessary edits.
. Click *Redeploy*.

== Edit and Delete an LLM Proxy

To edit an LLM Proxy:

. From API Manager, click *LLM Proxies*.
. Click the name of the LLM Proxy you want to edit.
. Click *Configuration*.
. Switch between the *Inbound*, *Gateway*, and *Outbound* configurations to make the necessary edits.
. Click *Save & Deploy*.

To delete an LLM Proxy:

. From API Manager, click *LLM Proxies*.
. Click the three-dots menu (image:three-dots-menu.png[3%,3%]) of the LLM Proxy you want to delete.
. Click *Delete LLM Proxy*.
. Click *Yes, Delete*.
@@ -0,0 +1,58 @@
= LLM Proxy Policies
ifndef::env-site,env-github[]
include::_attributes.adoc[]
endif::[]
:imagesdir: ../assets/images

By default, LLM Proxy applies these policies:

* Client ID Enforcement
* LLM Proxy Core Policy
* Model Based Routing Policy or Semantic Routing Policy (policy name depends on the embedding service provider)

You don't need to modify these policies.

LLM Proxy supports most xref:policies-included-directory.adoc[included policies], but doesn't support outbound policies.

These policies are specifically useful for LLM Proxies:

* xref:policies-included-regex-prompt-guard.adoc[]
* xref:policies-included-llm-token-rate-limit.adoc[]

== Apply Policies to LLM Proxies

. From API Manager, click *LLM Proxies*.
. Click the name of the LLM Proxy you want to apply a policy to.
. Click *AI Policies*.
. Click *+ Add inbound policy*.
. Select the policy to apply.
. Configure the required parameters.
+
For policy configuration parameters, see xref:policies-included-directory.adoc[].
. If necessary, configure *Advanced options*.
. Click *Apply*.

== LLM Proxy Authentication Policy

By default, the LLM Proxy has the Client ID Enforcement policy applied.
This policy is required because Client ID Enforcement populates the `Authentication.clientName` variable in the `Authentication` object, which is used as the unique identifier for LLM metrics.

Before removing the Client ID Enforcement policy, ensure that you either:

* Apply a policy that populates `Authentication.clientName`:
+
** xref:policies-included-client-id-enforcement.adoc[]
** xref:policies-included-rate-limiting-sla.adoc[]
** xref:policies-included-oauth-token-introspection.adoc[] (with client ID enforcement configured: `skipClientIdValidation=false`)
** xref:policies-included-openid-token-enforcement.adoc[] (with client ID enforcement configured: `skipClientIdValidation=false`)
** xref:policies-included-jwt-validation.adoc[] (with client ID enforcement configured: `skipClientIdValidation=false`)
** A custom policy that populates `Authentication.clientName`

* Edit the DataWeave variable in the LLM Proxy Core policy to extract a different unique identifier, such as `clientid`, `userid`, or `departmentid`.
+
NOTE: You can't filter by this unique identifier in Usage Reports.
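
The identifier selection described above can be pictured as follows. This is a Python sketch of the fallback logic only; the actual extraction is a DataWeave expression in the LLM Proxy Core policy, and the claim names other than `clientName` are illustrative examples:

```python
def metrics_identifier(authentication):
    """Pick the unique identifier used for LLM metrics.

    By default the gateway uses Authentication.clientName; this sketch
    falls back to other hypothetical identifiers (clientid, userid,
    departmentid) when clientName is absent.
    """
    for key in ("clientName", "clientid", "userid", "departmentid"):
        value = authentication.get(key)
        if value:
            return value
    return None  # no usable identifier found
```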

== See Also

* xref:flex-gateway-secure-apis.adoc[]
* xref:policies-included-directory.adoc[]