
feat: Automated Deployment Lookup & Model Constants #50

Closed
wants to merge 14 commits from feat/model-constants

Conversation


@MatKuhr (Member) commented Aug 1, 2024

No description provided.

Comment on lines 19 to 25
export class AiDeployment {
  id: string;
  scenarioId?: string;
  constructor(deploymentId: string) {
    this.id = deploymentId;
  }
}
Contributor:

[q] I don't see why this should be a class. Why not just an interface?

Member Author:

TS noob at work, I played around with different concepts. Will change it 👍🏻
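
For reference, a minimal sketch of the interface variant suggested above, assuming the same fields as the current class:

```ts
// Hypothetical interface form of AiDeployment, as suggested in the comment above.
export interface AiDeployment {
  id: string;
  scenarioId?: string;
}
```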


// TODO: figure out what the best search criteria are
// TODO: discuss using model?: FoundationModel instead for the search
export async function resolveDeployment(
  opts: { scenarioId: string; executableId?: string; modelName?: string; modelVersion?: string },
  destination?: HttpDestination
): Promise<AiDeployment> {
Contributor:

[q] Adding to your question of what the best search criteria are: maybe I am misunderstanding something, but it seems to me that in your current design the model information is required when calling chatCompletion(), while here the scenario ID is required. Do these go together? It seems inconsistent to me.

Member Author:

A scenario will always be required. When using OpenAI the scenario is currently always foundation-models; for orchestration it is orchestration. Currently nothing else is possible, but users will be able to create their own scenarios in the future.

A model is not always required, e.g. for orchestration there is no model information on the deployment:

{
  "configurationId": "1234",
  "configurationName": "orchestration",
  "deploymentUrl": "https://foo.com/1234",
  "details": {
    "resources": {
      "backend_details": {}
    },
    "scaling": {
      "backend_details": {}
    }
  },
  "id": "1234",
  "scenarioId": "orchestration",
  "status": "RUNNING"
}
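
To illustrate the difference, here is a rough sketch of how the lookup could be called in both cases, based on the resolveDeployment signature above (the model name is only an illustrative example, not a statement about which models exist):

```ts
// For OpenAI, a model name narrows the search within the foundation-models scenario.
const openAiDeployment = await resolveDeployment({
  scenarioId: 'foundation-models',
  modelName: 'gpt-4' // illustrative model name
});

// For orchestration, the scenario ID alone is enough, since the deployment
// carries no model information (see the JSON above).
const orchestrationDeployment = await resolveDeployment({
  scenarioId: 'orchestration'
});
```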


const apiVersion = '2024-02-01';

/**
 * OpenAI GPT Client.
 */
-export class OpenAiClient implements BaseClient<BaseLlmParameters> {
+export class OpenAiClient {
Contributor:

[q] I don't remember what the BaseClient was doing, but don't we need it?

Member Author:

I don't think it was doing anything. It allowed representing data using a generic <T> in the HTTP client code, which is any now. But since we don't do anything with the data other than pass it to the HTTP client as the body, I figured it's not needed right now.

I mostly removed it to make the refactoring easier, I could probably add it back in if we want to.

Comment on lines 30 to +33
async chatCompletion(
  model: OpenAiChatModel,
  data: OpenAiChatCompletionParameters,
  deploymentResolver: DeploymentResolver = getDeploymentResolver(model),
Contributor:

[req] Previously we discussed the idea of a deployment configuration, where the user could set different parameters to identify the deployment, e.g. { deploymentId: 'xx' } or { modelName: 'xx' }. I think this was quite a good idea, as it is simple and clear but not too verbose.
What I like about your proposal is that the model for the OpenAiClient is restricted to OpenAI models only, which makes sense and I would like to keep that.
However, I don't like:

  1. That users have to specify a constant to identify the model. I would prefer a string literal union type (see the sketch after this list), so that they don't need to import anything additional. I guess the versioning might affect the design here.
  2. The separation between model and deployment resolver. This is not 100% clear to me yet, but semantically I don't think the separation is needed, because the user provides both for the same purpose. I think the original idea of the deployment configuration covers this in a cleaner way. It could even be extended by a deployment resolver; however, I wonder whether this isn't premature. Are we aware of any use cases where users would want to override the resolution behavior?
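
As a sketch of what such a string literal union type could look like (the model names here are purely illustrative, and OpenAiChatModel is the type name already used in this PR):

```ts
// Hypothetical literal union: no additional import needed to name a model.
export type OpenAiChatModel = 'gpt-35-turbo' | 'gpt-4' | 'gpt-4-32k';

// Usage sketch: a plain string is checked against the union at compile time.
const model: OpenAiChatModel = 'gpt-4';
```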

Member Author:

Good points, I also asked myself these two questions when implementing this.

(1) The reason I went with a complex type to represent the models is that it would be useful to have more information about them, e.g. whether they are text/embedding/image models, which versions they have, or whether they are multi-modal. Maybe more information in the future, e.g. whether they allow for other specific features like response format.

However, I'm not sure yet which aspects we want to be constant and where we would need dynamic information. E.g., if we want to allow the user to select a specific version of a model, we either need one constant per (modelName, version) combination, or some constructor/function to create an object dynamically.

Finally, I'd agree that string is simpler. I don't know how common enum or enum-like style is in TS. If we prefer, we can stick to string for simplicity or alternatively allow both options somehow.
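
For discussion, a rough sketch of what such a richer model descriptor could look like, based on the capabilities mentioned above (the type name, fields, and values are all assumptions, not the actual implementation):

```ts
// Hypothetical descriptor carrying model metadata beyond the plain name.
interface FoundationModelDescriptor {
  name: string;
  type: 'text' | 'embedding' | 'image';
  versions: string[];
  multiModal?: boolean;
}

// Example constant; a small factory could derive (modelName, version) variants.
const GPT_4: FoundationModelDescriptor = {
  name: 'gpt-4',
  type: 'text',
  versions: ['0613', '1106'], // illustrative version labels
  multiModal: false
};
```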

(2) I split model and deployment because I see them as conceptually different things: a model is a description of a foundation model and its capabilities (e.g. versions, parameter options etc.), while a deployment is a connectivity detail of AI Core (it has a URL and is linked to a configuration, scenario, and resource group).

So far it just so happens that only one of the two is needed for the basic OpenAI related functionality, since we currently only care about the deployment id. But already for Orchestration the model is part of the payload and not tied to any deployment. Also, a deployment can be changed at runtime to host different models.

Separating the two allows us to evolve the connectivity aspects independently from one another. For example, we could (and probably should, in some way) re-use the model constants in the Orchestration service API, helping users see which models are available.

Still, it should be as convenient as possible for the user. What I liked about the current approach is that it nudges users into using the convenience functionality, and away from hard-coding GUIDs into their application code. Maybe this is also possible with other approaches. I am curious what else we will come up with 😉

Comment on lines 55 to 58
const deployment = deploymentList[0];
const result = new AiDeployment(deployment.id)
result.scenarioId = deployment.scenarioId;
return result;
Contributor:

If we used an interface instead of a class for AiDeployment, this could be simplified:

Suggested change:
-const deployment = deploymentList[0];
-const result = new AiDeployment(deployment.id)
-result.scenarioId = deployment.scenarioId;
-return result;
+const { id, scenarioId } = deploymentList[0];
+return { id, scenarioId };

const mergedRequestConfig = {
  ...mergeWithDefaultRequestConfig(apiVersion, requestConfig),
  data: JSON.stringify(body)
};

const targetUrl =
  aiCoreDestination.url +
  '/v2/inference/deployments/' +
Contributor:

[q] Can we not pass the /v2/inference/deployments/ part of the URL from the openai/orchestration chatCompletion method as well?
For the apiVersion, the check params: apiVersion ? { 'api-version': apiVersion } : {} should ensure that it's passed only when needed? EndpointOptions is internal anyway, so we have control over when this needs to be passed?

Member Author:

> [q] Can we not pass the /v2/inference/deployments/ part of the URL from the openai/orchestration chatCompletion method as well?

We could, but to what benefit? The assumption of this method is that it will always serve inference requests; otherwise the EndpointOptions doesn't make sense anymore.

We could consider having two methods, something like executeRequest(path, payload) and executeInferenceRequest(EndpointOptions, payload).

> For the apiVersion, the check params: apiVersion ? { 'api-version': apiVersion } : {} should ensure that it's passed only when needed? EndpointOptions is internal anyway, so we have control over when this needs to be passed?

Here I don't know what you mean.

Contributor:

This was not originally intended to serve only inference requests; at least at the time we considered only LLM access requests and not AI Core at all, hence the hardcoded inference part. This was meant as a wrapper around the generic http-client from the Cloud SDK for all requests (unless the requests are completely different). In that direction, my thought was to take the /v2/inference/deployments/ part out of this function and pass it in from the respective function instead, e.g. as /v2/inference/deployments/completion. The path is then merged with the baseUrl fetched from the destination to create the final URL.

The request config merge code feels duplicated to me at the moment, with the exception of apiVersion. This is optional in EndpointOptions anyway and is not set by the user. I assumed that with the check params: apiVersion ? { 'api-version': apiVersion } : {} we can ensure it is set only when needed.

Anyway, I didn't review the whole PR, so I might be missing context. If there are reasons why this function cannot stay generic, then it's fine.
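
To make that concrete, here is a rough sketch of how such a generic request helper could look under the assumptions above (the function name, parameters, and the exact Cloud SDK call shape are illustrative, not the actual implementation):

```ts
import { executeHttpRequest } from '@sap-cloud-sdk/http-client';
import type { HttpDestination } from '@sap-cloud-sdk/connectivity';

// Hypothetical generic helper: the caller passes the full relative path,
// e.g. '/v2/inference/deployments/<deploymentId>/chat/completions'.
async function executeRequest(
  destination: HttpDestination,
  path: string,
  body: unknown,
  apiVersion?: string
) {
  return executeHttpRequest(destination, {
    method: 'post',
    url: path, // resolved against the destination's base URL
    data: JSON.stringify(body),
    headers: { 'content-type': 'application/json' },
    // 'api-version' is only sent when the caller provides one.
    params: apiVersion ? { 'api-version': apiVersion } : {}
  });
}
```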

Member Author:

Ah okay, got it. Yes, I think we can make this generic; there is no particular reason for the current design. In this PR I just went with what I felt makes adding the deployment ID lookup and testing easier, but that wasn't the main focus of this PR, of course.

I would suggest we extract this entire change out of this PR and do it as a follow-up to #61 / together with the fix to the execute method.

I'll probably have to reset this PR later on anyway, removing anything that is not explicitly the constants + deployment ID lookup 😄

  );
  return response.data;
}

mergeRequestConfig(requestConfig?: CustomRequestConfig): HttpRequestConfig {
Member Author:

TODO: fix this to be a deep merge instead
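
A minimal sketch of what a deep merge could look like here, assuming only headers and params need to be merged one level deep (a generic deep-merge utility would be an alternative; the two-argument shape is illustrative and differs from the current method signature):

```ts
// Hypothetical deep merge: nested headers/params are combined
// instead of being overwritten wholesale by the custom config.
function mergeRequestConfig(
  defaultConfig: HttpRequestConfig,
  requestConfig?: CustomRequestConfig
): HttpRequestConfig {
  return {
    ...defaultConfig,
    ...requestConfig,
    headers: { ...defaultConfig.headers, ...requestConfig?.headers },
    params: { ...defaultConfig.params, ...requestConfig?.params }
  };
}
```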


@MatKuhr (Member Author) commented Aug 18, 2024

moved to #75

@MatKuhr closed this Aug 18, 2024
@MatKuhr deleted the feat/model-constants branch August 18, 2024 22:49
@MatKuhr mentioned this pull request Aug 19, 2024