
ChatPlugin response time too slow #1234

Open
nurkmez2 opened this issue Dec 12, 2024 · 1 comment

Comments


nurkmez2 commented Dec 12, 2024

Describe the bug
The following functions in https://github.com/microsoft/chat-copilot/blob/main/webapi/Plugins/Chat/ChatPlugin.cs take too long to complete.
Model: GPT-4o on Azure

- GetAudienceAsync = 22965 ms

https://github.com/microsoft/chat-copilot/blob/main/webapi/Plugins/Chat/ChatPlugin.cs#L363

- ExtractChatHistory = 9623 ms

https://github.com/microsoft/chat-copilot/blob/main/webapi/Plugins/Chat/ChatPlugin.cs#L111

- GetUserIntentAsync = 10615 ms

https://github.com/microsoft/chat-copilot/blob/main/webapi/Plugins/Chat/ChatPlugin.cs#L406

To Reproduce
Steps to reproduce the behavior:
Run the web API app and ask a question with context.

Expected behavior
Faster response time


Platform

  • Windows
  • Visual Studio, VS Code
  • Language: C#, JS
  • Source: latest version

Additional context

  • What can be done to improve the response time of these functions?
  • How can ExtractChatHistory, GetAudienceAsync, and GetUserIntentAsync be made more efficient with Semantic Kernel?
@imsharukh1994
Contributor

  1. Asynchronous Programming:
    Ensure all functions are asynchronous to avoid blocking the main thread.

  2. Cache Results:
    Cache frequently accessed data such as audience information and chat history to avoid redundant calculations.
    Use in-memory caching (e.g., MemoryCache or Redis) for storing repeated data.

  3. Optimize API Calls:
    For functions making API calls to GPT-4, use parallel requests to reduce wait time.
    Use batching for multiple requests, or streaming responses for large results.

  4. Database Optimization:
    For ExtractChatHistory:
    - Index your database on frequently queried fields (e.g., user_id, timestamp).
    - Use pagination to fetch only the necessary data instead of the entire history.

  5. Pre-process Input Locally:
    For GetUserIntentAsync:
    - Clean and pre-process the input before sending it to GPT-4.
    - Cache common intents to avoid recalculating for repeated queries.

```csharp
// Sketch using System.Runtime.Caching; GetIntentFromAPIAsync, FetchAudienceData,
// and AudienceData are placeholder names for illustration.
private static readonly MemoryCache Cache = MemoryCache.Default;

public async Task<string> GetUserIntentAsync(string input)
{
    // Return a cached intent for repeated queries
    if (Cache.Get($"intent_{input}") is string cachedIntent)
        return cachedIntent;

    // Use Semantic Kernel or GPT-4 with asynchronous processing
    var intent = await GetIntentFromAPIAsync(input);

    // Cache the result for subsequent calls
    Cache.Set($"intent_{input}", intent, DateTimeOffset.UtcNow.AddMinutes(10));

    return intent;
}

public async Task<AudienceData> GetAudienceAsync()
{
    // Parallelize API calls if there are several sources
    var tasks = new List<Task<AudienceData>> { Task.Run(() => FetchAudienceData()) };
    var results = await Task.WhenAll(tasks);
    return results[0];
}
```
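To make point 4 concrete, here is a rough sketch of paged chat-history retrieval, assuming Entity Framework Core; `ChatMessage`, `_messages`, and `GetRecentMessagesAsync` are hypothetical names, not the actual chat-copilot types:

```csharp
// Hypothetical sketch: fetch one page of recent messages instead of the full history.
// _messages is assumed to be an EF Core DbSet<ChatMessage>.
public async Task<List<ChatMessage>> GetRecentMessagesAsync(
    string chatId, int skip = 0, int take = 30)
{
    return await _messages
        .Where(m => m.ChatId == chatId)       // benefits from an index on ChatId
        .OrderByDescending(m => m.Timestamp)  // and on Timestamp for fast ordering
        .Skip(skip)                           // fetch only the requested page
        .Take(take)
        .ToListAsync();
}
```

Combined with an index on `(ChatId, Timestamp)`, this keeps the query cost roughly proportional to the page size rather than the full history length.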
