
feat: Automated Deployment Lookup & Model Constants #50

Closed
wants to merge 14 commits from feat/model-constants

Conversation


@MatKuhr (Member) commented Aug 1, 2024

No description provided.

Comment on lines 19 to 25
export class AiDeployment {
  id: string;
  scenarioId?: string;
  constructor(deploymentId: string) {
    this.id = deploymentId;
  }
}
Contributor:

[q] I don't see why this should be a class. Why not just an interface?

Member Author:

TS noob at work, I played around with different concepts. Will change it 👍🏻
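
For reference, a minimal sketch of the interface variant suggested above, assuming the same fields as the current class:

```ts
// Hypothetical interface form of AiDeployment, as suggested in the comment above.
export interface AiDeployment {
  id: string;
  scenarioId?: string;
}
```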


// TODO: figure out what the best search criteria are
// TODO: discuss using model?: FoundationModel instead for the search
export async function resolveDeployment(
  opts: { scenarioId: string; executableId?: string; modelName?: string; modelVersion?: string },
  destination?: HttpDestination
): Promise<AiDeployment> {
Contributor:

[q] Adding to your question of what the best search criteria are: maybe I am misunderstanding something, but it seems to me that in your current design the model information is required when calling chatCompletion(), while here the scenario ID is required. Do these go together? It seems inconsistent to me.

Member Author:

A scenario will always be required. When using OpenAI the scenario is currently always foundation-models; for orchestration it is orchestration. Currently nothing else is possible, but users will be able to create their own scenarios in the future.

A model is not always required, e.g. for orchestration there is no model information on the deployment:

{
  "configurationId": "1234",
  "configurationName": "orchestration",
  "deploymentUrl": "https://foo.com/1234",
  "details": {
    "resources": {
      "backend_details": {}
    },
    "scaling": {
      "backend_details": {}
    }
  },
  "id": "1234",
  "scenarioId": "orchestration",
  "status": "RUNNING"
}
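
To illustrate the difference, here is a rough sketch of how the lookup could be called in both cases, based on the resolveDeployment signature above (the model name is only an illustrative example, not a statement about which models exist):

```ts
// For OpenAI, a model name narrows the search within the foundation-models scenario.
const openAiDeployment = await resolveDeployment({
  scenarioId: 'foundation-models',
  modelName: 'gpt-4' // illustrative model name
});

// For orchestration, the scenario ID alone is enough, since the deployment
// carries no model information (see the JSON above).
const orchestrationDeployment = await resolveDeployment({
  scenarioId: 'orchestration'
});
```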


const apiVersion = '2024-02-01';

/**
 * OpenAI GPT Client.
 */
-export class OpenAiClient implements BaseClient<BaseLlmParameters> {
+export class OpenAiClient {
Contributor:

[q] I don't remember what the BaseClient was doing, but don't we need it?

Member Author:

I don't think it was doing anything. It allowed representing data using a generic <T> in the HTTP client code, which is any now. But since we don't do anything with the data other than pass it to the HTTP client as the body, I figured it's not needed right now.

I mostly removed it to make the refactoring easier, I could probably add it back in if we want to.

Comment on lines 30 to +33
async chatCompletion(
  model: OpenAiChatModel,
  data: OpenAiChatCompletionParameters,
  deploymentResolver: DeploymentResolver = getDeploymentResolver(model),
Contributor:

[req] Previously we discussed the idea of a deployment configuration, where the user could set different parameters to identify the deployment, e.g. { deploymentId: 'xx' } or { modelName: 'xx' }. I think this was quite a good idea, as it is simple and clear but not too verbose.
What I like about your proposal is that the model for the OpenAiClient is restricted to OpenAI models only, which makes sense and I would like to keep that.
However, I don't like:

  1. That users have to specify a constant to identify the model. I would prefer a string literal union type (see the sketch after this list), so that they don't need to import anything additional. I guess the versioning might affect the design here.
  2. The separation between model and deployment resolver. This is not 100% clear to me yet, but semantically I don't think the separation is needed, because the user provides both for the same purpose. I think the original idea of the deployment configuration covers this in a cleaner way. It could even be extended by a deployment resolver; however, I wonder whether this isn't premature. Are we aware of any use cases where users would want to override the resolution behavior?
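
As a sketch of what such a string literal union type could look like (the model names here are purely illustrative, and OpenAiChatModel is the type name already used in this PR):

```ts
// Hypothetical literal union: no additional import needed to name a model.
export type OpenAiChatModel = 'gpt-35-turbo' | 'gpt-4' | 'gpt-4-32k';

// Usage sketch: a plain string is checked against the union at compile time.
const model: OpenAiChatModel = 'gpt-4';
```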

Member Author:

Good points, I also asked myself these two questions when implementing this.

(1) The reason I went with a complex type to represent the models is that it would be useful to have more information about them, e.g. whether they are text/embedding/image models, which versions they have, or whether they are multi-modal. Maybe more information in the future, e.g. whether they allow for other specific features like response format.

However, I'm not sure yet which aspects we want to be constant and where we would need dynamic information. E.g., if we want to allow the user to select a specific version of a model, we either need one constant per (modelName, version) combination, or some constructor/function to create an object dynamically.

Finally, I'd agree that string is simpler. I don't know how common enum or enum-like style is in TS. If we prefer, we can stick to string for simplicity or alternatively allow both options somehow.
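
For discussion, a rough sketch of what such a richer model descriptor could look like, based on the capabilities mentioned above (the type name, fields, and values are all assumptions, not the actual implementation):

```ts
// Hypothetical descriptor carrying model metadata beyond the plain name.
interface FoundationModelDescriptor {
  name: string;
  type: 'text' | 'embedding' | 'image';
  versions: string[];
  multiModal?: boolean;
}

// Example constant; a small factory could derive (modelName, version) variants.
const GPT_4: FoundationModelDescriptor = {
  name: 'gpt-4',
  type: 'text',
  versions: ['0613', '1106'], // illustrative version labels
  multiModal: false
};
```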

(2) I split model and deployment because I see them as conceptually different things: a model is a description of a foundation model and its capabilities (e.g. versions, parameter options etc.), while a deployment is a connectivity detail of AI Core (it has a URL and is linked to a configuration, scenario, and resource group).

So far it just so happens that only one of the two is needed for the basic OpenAI related functionality, since we currently only care about the deployment id. But already for Orchestration the model is part of the payload and not tied to any deployment. Also, a deployment can be changed at runtime to host different models.

Separating the two allows us to evolve the connectivity aspects independently from one another. For example, we could (and probably should, in some way) re-use the model constants in the Orchestration service API, helping users see which models are available.

Still, it should be as convenient as possible for the user. What I liked about the current approach is that it nudges users into using the convenience functionality, and away from hard-coding GUIDs into their application code. Maybe this is also possible with other approaches. I am curious what else we will come up with 😉

Comment on lines 55 to 58
const deployment = deploymentList[0];
const result = new AiDeployment(deployment.id)
result.scenarioId = deployment.scenarioId;
return result;
Contributor:

If we used an interface instead of a class for AiDeployment, this could be simplified:

Suggested change:
-const deployment = deploymentList[0];
-const result = new AiDeployment(deployment.id)
-result.scenarioId = deployment.scenarioId;
-return result;
+const { id, scenarioId } = deploymentList[0];
+return { id, scenarioId };

const mergedRequestConfig = {
  ...mergeWithDefaultRequestConfig(apiVersion, requestConfig),
  data: JSON.stringify(body)
};

const targetUrl =
  aiCoreDestination.url +
  '/v2/inference/deployments/' +
Contributor:

[q] Can we not pass the /v2/inference/deployments/ part of the URL from the openai/orchestration chatCompletion method as well?
For the apiVersion, the check params: apiVersion ? { 'api-version': apiVersion } : {} should ensure that it's passed only when needed? EndpointOptions is internal anyway, so we have control over when this needs to be passed?

Member Author:

> [q] Can we not pass the /v2/inference/deployments/ part of the URL from the openai/orchestration chatCompletion method as well?

We could, but to what benefit? The assumption of this method is that it will always serve inference requests; otherwise the EndpointOptions doesn't make sense anymore.

We could consider having two methods, something like executeRequest(path, payload) and executeInferenceRequest(EndpointOptions, payload).

> For the apiVersion, the check params: apiVersion ? { 'api-version': apiVersion } : {} should ensure that it's passed only when needed? EndpointOptions is internal anyway, so we have control over when this needs to be passed?

Here I don't know what you mean.

Contributor:

This was not originally intended to serve only inference requests; at least at the time we considered only LLM access requests and not AI Core at all, hence the hardcoded inference part. This was meant as a wrapper around the generic http-client from the Cloud SDK for all requests (unless the requests are completely different). In that direction, my thought was to take the /v2/inference/deployments/ part out of this function and pass it in from the respective function instead, e.g. as /v2/inference/deployments/completion. The path is then merged with the baseUrl fetched from the destination to create the final URL.

The request config merge code feels duplicated to me at the moment, with the exception of apiVersion. This is optional in EndpointOptions anyway and is not set by the user. I assumed that with the check params: apiVersion ? { 'api-version': apiVersion } : {} we can ensure it is set only when needed.

Anyway, I didn't review the whole PR, so I might be missing context. If there are reasons why this function cannot stay generic, then it's fine.
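
To make that concrete, here is a rough sketch of how such a generic request helper could look under the assumptions above (the function name, parameters, and the exact Cloud SDK call shape are illustrative, not the actual implementation):

```ts
import { executeHttpRequest } from '@sap-cloud-sdk/http-client';
import type { HttpDestination } from '@sap-cloud-sdk/connectivity';

// Hypothetical generic helper: the caller passes the full relative path,
// e.g. '/v2/inference/deployments/<deploymentId>/chat/completions'.
async function executeRequest(
  destination: HttpDestination,
  path: string,
  body: unknown,
  apiVersion?: string
) {
  return executeHttpRequest(destination, {
    method: 'post',
    url: path, // resolved against the destination's base URL
    data: JSON.stringify(body),
    headers: { 'content-type': 'application/json' },
    // 'api-version' is only sent when the caller provides one.
    params: apiVersion ? { 'api-version': apiVersion } : {}
  });
}
```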

Member Author:

Ah okay, got it. Yes, I think we can make this generic; there is no particular reason for the current design. In this PR I just went with what I felt makes adding the deployment ID lookup and testing easier, but that wasn't the main focus of this PR, of course.

I would suggest we extract this entire change out of this PR and do it as a follow-up to #61 / together with the fix to the execute method.

I'll probably have to reset this PR later on anyway, removing anything that is not explicitly the constants + deployment ID lookup 😄

  );
  return response.data;
}

mergeRequestConfig(requestConfig?: CustomRequestConfig): HttpRequestConfig {
Member Author:

TODO: fix this to be a deep merge instead
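
A minimal sketch of what a deep merge could look like here, assuming only headers and params need to be merged one level deep (a generic deep-merge utility would be an alternative; the two-argument shape is illustrative and differs from the current method signature):

```ts
// Hypothetical deep merge: nested headers/params are combined
// instead of being overwritten wholesale by the custom config.
function mergeRequestConfig(
  defaultConfig: HttpRequestConfig,
  requestConfig?: CustomRequestConfig
): HttpRequestConfig {
  return {
    ...defaultConfig,
    ...requestConfig,
    headers: { ...defaultConfig.headers, ...requestConfig?.headers },
    params: { ...defaultConfig.params, ...requestConfig?.params }
  };
}
```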


@MatKuhr (Member Author) commented Aug 18, 2024

moved to #75

@MatKuhr closed this Aug 18, 2024
@MatKuhr deleted the feat/model-constants branch August 18, 2024 22:49
@MatKuhr mentioned this pull request Aug 19, 2024