
Missing Caching Support #395

Open

legraphista opened this issue Aug 6, 2024 · 2 comments
Labels

  api: aiplatform (Issues related to the googleapis/nodejs-vertexai API.)
  priority: p3 (Desirable enhancement or fix. May not be included in next release.)
  type: feature request (‘Nice-to-have’ improvement, new feature or different behavior or design.)

Comments

legraphista commented Aug 6, 2024

Describe the solution you'd like
Missing caching support equivalent to the Python SDK or Gemini TS SDK

Describe alternatives you've considered
I can create & manage caches with raw requests to the API endpoint, but I cannot use them as the cached_content cannot be passed through the library to the request

@legraphista legraphista added priority: p3 Desirable enhancement or fix. May not be included in next release. type: feature request ‘Nice-to-have’ improvement, new feature or different behavior or design. labels Aug 6, 2024
@product-auto-label product-auto-label bot added the api: aiplatform Issues related to the googleapis/nodejs-vertexai API. label Aug 6, 2024

NimJay commented Sep 27, 2024

Hi @legraphista, :)
It looks like CachedContent support was added in v1.8.0. But I recommend using the latest version (v1.8.1 as of Sep 26).


NimJay commented Sep 30, 2024

How to use Gemini's Context Caching (on Google Cloud's Vertex AI, with TypeScript)

Here's how you'd use Gemini's Context Caching via TypeScript, Google Cloud (Vertex AI), and @google-cloud/vertexai.

1. Evaluate pricing

First, make sure the pricing of Gemini's Context Caching and its other benefits (e.g., lower latency) make sense for your use case. For instance, if you're only making about 10 requests a day, it might not be worth the effort or cost. There's also a minimum cache size (32,769 tokens as of Sep 30, 2024), where 1 token ≈ 3.6 characters.
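If it helps, here's a minimal sketch (a hypothetical helper of mine, not part of the SDK) that roughly estimates whether a prompt clears the minimum cache size, using the 1 token ≈ 3.6 characters figure above:

// Rough back-of-the-envelope check, not an official tokenizer.
// MIN_CACHE_TOKENS reflects the documented minimum as of Sep 30, 2024.
const MIN_CACHE_TOKENS = 32769;
const APPROX_CHARS_PER_TOKEN = 3.6;

export function isLikelyCacheable(prompt: string): boolean {
  const approxTokens = prompt.length / APPROX_CHARS_PER_TOKEN;
  console.log(`~${Math.round(approxTokens)} tokens (minimum: ${MIN_CACHE_TOKENS})`);
  return approxTokens >= MIN_CACHE_TOKENS;
}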

2. Create CachedContent

Create a CachedContent. It has a default life span of 1 hour, so update ttl (time-to-live) to your needs. When you create it, Google Cloud will give your CachedContent a unique name. You'll later use that name to reference your CachedContent when you need to generate content using the cache.

import { VertexAI, CachedContent } from '@google-cloud/vertexai';

const LLM_NAME = `gemini-1.5-flash-002`; // Make sure your model choice is up-to-date and fits your use case

// Example googleCloudRegion value: "us-central1". More info: https://cloud.google.com/vertex-ai/generative-ai/docs/learn/locations
export async function createCachedContent(
  googleCloudProjectId: string, googleCloudRegion: string, initialPrompt: string,
): Promise<CachedContent | undefined> {
  const vertexAI = new VertexAI({ project: googleCloudProjectId, location: googleCloudRegion });
  // cachedContent.name is automatically generated server-side and cannot be specified by users
  const cachedContent: CachedContent = {
    displayName: 'My cached content',
    model: `projects/${googleCloudProjectId}/locations/${googleCloudRegion}/publishers/google/models/${LLM_NAME}`,
    systemInstruction: '',
    contents: [{ role: 'user', parts: [{ text: initialPrompt }] }],
    ttl: `${3600 * 3}s`, // 3 hours (1 hour = 3600 seconds)
  };
  const createdCachedContent = await vertexAI.preview.cachedContents.create(cachedContent);
  // createdCachedContent.name will look like projects/123456781249/locations/us-central1/cachedContents/12345678471874431234
  if (!createdCachedContent || !createdCachedContent.name) {
    console.error('Failed to create CachedContent.');
    return;
  }
  console.log({ createdCachedContent });
  return createdCachedContent;
}
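For example, calling it might look like this (the project ID and prompt are hypothetical placeholders):

// Hypothetical usage, inside an async function; replace placeholders with your own values.
const longInitialPrompt = 'Your very long, reusable context goes here'; // Must exceed the minimum cache size from step 1
const cached = await createCachedContent('my-gcp-project', 'us-central1', longInitialPrompt);
if (cached?.name) {
  console.log(`Cache created: ${cached.name}`); // Save this name for steps 3 and 4
}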

3. Get CachedContent name

The name is the unique identifier of your CachedContent. The previous createCachedContent() function returns a CachedContent object containing the name. But if you weren't able to grab the name, you can list your CachedContent objects in your Google Cloud project:

import { VertexAI, CachedContent } from '@google-cloud/vertexai';

export async function listCachedContents(
  googleCloudProjectId: string, googleCloudRegion: string,
): Promise<CachedContent[] | undefined> {
  const vertexAI = new VertexAI({ project: googleCloudProjectId, location: googleCloudRegion });
  const cachedContentsResponse = await vertexAI.preview.cachedContents.list();
  return cachedContentsResponse.cachedContents; // Will be undefined if there are 0 CachedContents
}
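For instance, to print every cache's identifier (again, the project ID is a hypothetical placeholder):

// Hypothetical usage, inside an async function.
const caches = await listCachedContents('my-gcp-project', 'us-central1');
for (const cachedContent of caches ?? []) {
  console.log(cachedContent.name, cachedContent.displayName);
}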

4. Reference the CachedContent when generating content

Generating content works the same as it would without context caching, except that to use your CachedContent you need the getGenerativeModelFromCachedContent() method (instead of getGenerativeModel()) and must pass in your CachedContent (at minimum, its name and model fields).

import {
  VertexAI, GenerateContentRequest, HarmBlockThreshold, HarmCategory, SafetySetting,
} from '@google-cloud/vertexai';

const LLM_NAME = `gemini-1.5-flash-002`; // Make sure your model choice is up-to-date and fits your use case

const generationConfig = {
  temperature: 0.1, // Lower values = less creative, more predictable, more factually accurate
  topP: 0.1, // Lower values = less creative, more predictable, more factually accurate
  maxOutputTokens: 1000,
};

const safetySettings: SafetySetting[] = [
  {
    category: HarmCategory.HARM_CATEGORY_HARASSMENT,
    threshold: HarmBlockThreshold.BLOCK_NONE,
  },
  {
    category: HarmCategory.HARM_CATEGORY_HATE_SPEECH,
    threshold: HarmBlockThreshold.BLOCK_NONE,
  },
  {
    category: HarmCategory.HARM_CATEGORY_SEXUALLY_EXPLICIT,
    threshold: HarmBlockThreshold.BLOCK_NONE,
  },
  {
    category: HarmCategory.HARM_CATEGORY_DANGEROUS_CONTENT,
    threshold: HarmBlockThreshold.BLOCK_NONE,
  },
];

export async function generateContentUsingCachedContent(
  googleCloudProjectId: string, googleCloudRegion: string, cachedContentName: string, prompt: string,
): Promise<void> {
  const vertexAI = new VertexAI({ project: googleCloudProjectId, location: googleCloudRegion });
  const request: GenerateContentRequest = {
    contents: [{ role: 'user', parts: [{ text: prompt }] }],
    generationConfig,
    safetySettings,
  };

  // Create a CachedContent object (at minimum, you need the name and model fields)
  const cachedContent = {
    name: cachedContentName,
    model: `projects/${googleCloudProjectId}/locations/${googleCloudRegion}/publishers/google/models/${LLM_NAME}`
  };
  const generativeModel = vertexAI.preview.getGenerativeModelFromCachedContent(cachedContent, { model: LLM_NAME });

  // Make the request to Google Cloud / Vertex AI
  console.log(`\n🤖 Sending message to Gemini:\n${prompt}`);
  const result = await generativeModel.generateContent(request);

  // Parse the response from Google Cloud; log an error for any unexpected shape,
  // including a candidate that exists but carries no text
  const responseText = result?.response?.candidates?.[0]?.content?.parts?.[0]?.text;
  if (responseText) {
    console.log(`\n🤖 Gemini's response:\n${responseText}`);
  } else {
    console.error('Unexpected response format:', result);
  }
}
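Putting it all together, here's a minimal end-to-end sketch that chains the functions from steps 2 and 4 (all values are hypothetical placeholders):

// Hypothetical end-to-end usage; replace placeholders with your own values.
async function main() {
  const projectId = 'my-gcp-project'; // Hypothetical project ID
  const region = 'us-central1';
  // Must exceed the ~32,769-token minimum from step 1 to be cacheable
  const bigContext = 'Your very long, reusable context goes here';

  // Step 2: create the cache once
  const cached = await createCachedContent(projectId, region, bigContext);
  if (!cached?.name) return;

  // Step 4: reuse the same cache across multiple prompts
  await generateContentUsingCachedContent(projectId, region, cached.name, 'Summarize the document.');
  await generateContentUsingCachedContent(projectId, region, cached.name, 'List the key dates mentioned.');
}

main().catch(console.error);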

Resources

Always refer to the official docs at cloud.google.com for Google Cloud-related guidance:

  1. Context caching overview
  2. Create a context cache
  3. Use a context cache

etc.

You might find more up-to-date, official Google Cloud samples in:

  1. https://github.com/GoogleCloudPlatform/nodejs-docs-samples
  2. https://github.com/GoogleCloudPlatform/generative-ai
