
fix(DiffFlameGraph): Fix the "Explain Flame Graph" (AI) feature #129

Merged
merged 6 commits into from
Aug 26, 2024

Conversation

grafakus
Contributor

@grafakus grafakus commented Aug 23, 2024

✨ Description

Related issue(s): is caused by #119

This PR fixes the "Explain Flame Graph" feature on the comparison view. Before this PR:

  • the isDiff prop was not passed to the AiPanel component, leading to an incorrect AI analysis based solely on a single profile
  • obtaining the queries required to fetch the profiles in DOT format relied on parseLeftRightUrlSearchParams, which worked only for the legacy pages
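The two fixes above boil down to giving the AI panel the right inputs for each view. A minimal sketch of that shape, with illustrative names (not the actual plugin API):

```typescript
// Illustrative types: the AI panel needs one { query, timeRange } pair
// per profile it analyzes.
type TimeRange = { from: number; to: number };
type AiFetchParams = Array<{ query: string; timeRange: TimeRange }>;

// Single flame graph: one pair, using the last time range fetched.
function singleViewFetchParams(query: string, lastTimeRange: TimeRange): AiFetchParams {
  return [{ query, timeRange: lastTimeRange }];
}

// Diff flame graph: two pairs, one for the baseline selection and one for
// the comparison selection, so the analysis covers both profiles.
function diffViewFetchParams(
  baseline: { query: string; timeRange: TimeRange },
  comparison: { query: string; timeRange: TimeRange }
): AiFetchParams {
  return [baseline, comparison];
}
```

With the missing isDiff prop, the panel only ever received the single-view shape, which is why the analysis was based on one profile.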

📖 Summary of the changes

To fix the issues, this PR introduces the new SceneAiPanel (based on the non-Scenes AiPanel), used both in SceneFlameGraph and SceneDiffFlameGraph.

See the diff tab for specific comments.

🧪 How to test?

Manually, after checking out this PR branch locally:

  • The analysis for a single flamegraph should work as before
  • The analysis for a diff flamegraph should work

In both cases, the API requests made by the browser should contain the correct query(ies) and the correct time range(s). For a single flame graph, the time range must be the last one used to fetch the main timeseries. For a diff flame graph, the ranges must match those selected by the user on the baseline & comparison timeseries.

@grafakus grafakus requested review from bryanhuhta and ifrost August 23, 2024 19:33
@github-actions github-actions bot added the fix label Aug 23, 2024
Contributor

github-actions bot commented Aug 23, 2024

Unit test coverage

Lines: 10% · Statements: 10.7% (464/4333) · Branches: 8.26% (134/1622) · Functions: 8.03% (107/1332)

@grafakus
Contributor Author

As a note for the coming weeks: we need to start adding end-to-end tests to catch this kind of regression earlier.

}

return (
<Button
Contributor Author
@grafakus grafakus Aug 23, 2024

I took the opportunity to remove the "new" badge. The feature is already a few months old now.

Contributor

I agree with this move.

@@ -0,0 +1,53 @@
import { css } from '@emotion/css';
Contributor Author

This component, as well as other files, has been copied from the "shared" folder and customized for the Scenes app. Once we remove the code for the legacy Comparison pages (which is still present), we'll clean everything up.

// eslint-disable-next-line @tanstack/query/exhaustive-deps
queryKey: [
'diff-profile',
dataSourceUid,
Contributor Author

I realised I forgot to add it to the key in the previous PR 🤦🏾‍♂️
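For context, with @tanstack/react-query the query key must list every input the fetcher depends on; otherwise changing that input (here, the data source) serves a stale cached result instead of refetching. A minimal sketch with an illustrative helper name:

```typescript
// Illustrative sketch: the query key must include every input the fetch
// depends on. Omitting dataSourceUid would mean switching data sources
// reuses the cached diff profile instead of fetching a new one.
function buildDiffProfileQueryKey(
  dataSourceUid: string,
  baselineQuery: string,
  comparisonQuery: string
): string[] {
  return ['diff-profile', dataSourceUid, baselineQuery, comparisonQuery];
}
```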

const lastTimeRange = newDataState.data.timeRange;

// For the "Function Details" feature only
timelineAndProfileApiClient.setLastTimeRange(lastTimeRange);
Contributor Author

We should also migrate the GitHub integration to Scenes at some point in the future.

@grafakus grafakus changed the title fix(Ai): Fix "Explain Flame Graph" feature fix(DiffFlameGraph): Fix the "Explain Flame Graph" (AI) feature Aug 23, 2024
/>
)}
</Panel>

{sidePanel.isOpen('ai') && <AiPanel className={styles.sidePanel} onClose={sidePanel.close} />}
Contributor Author

The missing isDiff prop caused the feature to be broken.

@@ -88,6 +96,13 @@ export class SceneDiffFlameGraph extends SceneObjectBase<SceneDiffFlameGraphStat
profile: profile as FlamebearerProfile,
settings,
fetchSettingsError,
ai: {
panel: aiPanel,
fetchParams: [
Contributor Author
@grafakus grafakus Aug 23, 2024

We now pass the correct parameters (queries and time ranges).

isLoading: isFetchingProfileData,
isFetchingProfileData,
hasProfileData,
profileData,
settings,
ai: {
panel: aiPanel,
fetchParams: [{ query, timeRange: lastTimeRange }],
Contributor Author
@grafakus grafakus Aug 23, 2024

It's now easier to understand where the last time range is used and what for (vs calling timelineAndProfileApiClient.setLastTimeRange() for the GitHub integration feature).


const query = useBuildPyroscopeQuery(model, key as string);

useEffect(() => {
Contributor Author
@grafakus grafakus Aug 23, 2024

By removing this code, and the code below in useToggleSidePanel, we get rid of the query URL search parameter.

bryanhuhta
bryanhuhta previously approved these changes Aug 23, 2024
Contributor
@bryanhuhta bryanhuhta left a comment

Tested it a variety of ways locally and everything seems to work 👍



import { OpenAiReply } from '../domain/useOpenAiChatCompletions';

// yeah, I know...
Contributor

😁

// hack to force UrlSyncManager to handle a new location
// this will sync the state from the URL by calling updateFromUrl() on all the time ranges (`SceneTimeRange` and our custom `SceneTimeRangeWithAnnotations`) that are defined on `SceneComparePanel`
// if not, landing on this view will result in empty URL search parameters (to/from and diffTo/diffFrom) which will make shareable links useless
locationService.partial({}, true); // replace to avoid creating history items
Contributor Author

Crazy side effect: after removing https://github.com/grafana/explore-profiles/pull/129/files#diff-5db2774cbc84b771946755dd3cc399bf32509129f5bf13d2177950b07f3d6119L58, the URL search parameters for all the time ranges were wiped.

Does anybody know a better solution (idiomatic to Scenes)?

Contributor

Is it needed because the time range is added dynamically after the URL is synced? Maybe we could use UrlSyncManager.handleNewLocation, but we'd need to upgrade Scenes to the latest version (which we should do anyway, but not as part of this PR).

Contributor Author

Is it needed because the time range is added dynamically after the URL is synced?

Yes

@ifrost
Contributor

ifrost commented Aug 26, 2024

Played with it a bit and it looks good; I'll have a look at the code in a bit. A few comments:


suggestion: Align max-nodes used for analysis with user settings

We use a hardcoded max-nodes=100 for the analysis. It may yield a more (or less) detailed analysis than what's visible in the graph, depending on user settings. Would it make sense to use the user's setting and cap it at 100 if the user uses more, plus some info about how many nodes are being processed?


suggestion: Refresh or close analysis panel when query changes

It seems like we don't close or refresh the analysis when the profile type changes (only when the service changes), which may lead to stale results being shown. Would it make sense to hide the analysis or refresh it automatically when the query changes (profile type or filters)?

Contributor
@ifrost ifrost left a comment

Nothing blocking from my end, but I think we should create follow-ups on code duplication, Scenes updates, and TODOs.

Profile in DOT format:
${profiles[0]}
`,
anton: (profileType: string, profiles: string[]) => `
Contributor

question: Where is it used? 🤔

Contributor Author

It's not. I'll clean it up in the near future, when I remove the legacy comparison views code.


import { buildPrompts, model } from './buildLlmPrompts';

// taken from "@grafana/experimental"
Contributor

question: Could we import it directly from @grafana/experimental?

Contributor Author

This type is not exported.


import { DataSourceProxyClient } from '../../../infrastructure/series/http/DataSourceProxyClient';

// dot format returns string (TODO: json format later)
Contributor

question: What's the process for tackling code TODOs? Should we create follow-up tasks?

Contributor Author

It all depends.

For instance, this one has been tackled in this PR.

But generally speaking, because we prioritized speed and flexibility, we have not been strict. So far, I've used private reminders to do code maintenance whenever I could. I now believe we should have some kind of process. Let's talk!

@@ -0,0 +1,153 @@
import { llms } from '@grafana/experimental';
Contributor

question: There's a load of code duplication between SceneAiPanel/domain/useOpenAiChatCompletions.ts and AiPanel/domain/useOpenAiChatCompletions.ts. Do we need both?

Contributor Author

AiPanel/domain/useOpenAiChatCompletions.ts is obsolete, so I'll remove it ASAP.


@grafakus
Contributor Author

Thank you for the suggestions!

suggestion: Align max-nodes used for analysis with user settings

We use a hardcoded max-nodes=100 for the analysis. It may yield a more (or less) detailed analysis than what's visible in the graph, depending on user settings. Would it make sense to use the user's setting and cap it at 100 if the user uses more, plus some info about how many nodes are being processed?

I believe we capped it at 100:

  • to limit the number of OpenAI tokens
  • because it didn't provide better results with higher values

@aleks-p, is that correct?

suggestion: Refresh or close analysis panel when query changes

It seems like we don't close or refresh the analysis when the profile type changes (only when the service changes), which may lead to stale results being shown. Would it make sense to hide the analysis or refresh it automatically when the query changes (profile type or filters)?

We do: https://github.com/grafana/explore-profiles/pull/129/files#diff-950b99b789b4fc4787eb1cf83aa00347e27cd96fbb1dcd6c9ef3bd839a45ba3bR147

@aleks-p
Contributor

aleks-p commented Aug 26, 2024

I believe we capped it at 100:

  • to limit the number of OpenAI tokens
  • because it didn't provide better results with higher values

@aleks-p, is that correct?

Yes, though I think the question is more about starting with the user's max nodes and capping that at 100. This would have an impact if the user sets max nodes to less than 100, which is fairly unlikely but possible. In this case the AI can "talk" about nodes that the user can't see.

The issue is a bit more subtle, though. The backend determines two maxNodes values when making a request for a DOT profile: one is used to retrieve an intermediate flame graph, and a second to convert that flame graph to a text (DOT) report (see this).

TL;DR: doing something like min(maxNodes, 100) could make sense for cases where users configure low values (which I don't think they do). The mapping between flame graph and DOT report nodes is not 1:1, though, so there could always be cases where the AI reports things that are not visible to the user.
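The min(maxNodes, 100) idea discussed here could be sketched like this (a hypothetical helper, assuming the user's max-nodes setting is available where the analysis request is built; the 100 cap limits OpenAI token usage as noted above):

```typescript
// Hypothetical helper, not the actual plugin code: start from the user's
// max-nodes setting and cap it for the AI analysis, so the request never
// exceeds the token budget while still respecting low user settings.
const AI_MAX_NODES_CAP = 100;

function maxNodesForAnalysis(userMaxNodes: number): number {
  return Math.min(userMaxNodes, AI_MAX_NODES_CAP);
}
```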

@grafakus grafakus merged commit a40c02b into main Aug 26, 2024
5 of 6 checks passed
@grafakus grafakus deleted the fix/ai-feature branch August 26, 2024 12:59
@grafakus
Contributor Author

grafakus commented Aug 26, 2024

@aleks-p, is that correct?

Yes, though I think the question is more about starting with the user's max nodes and capping that at 100. This would have an impact if the user sets max nodes to less than 100, which is fairly unlikely but possible. In this case the AI can "talk" about nodes that the user can't see.

The issue is a bit more subtle, though. The backend determines two maxNodes values when making a request for a DOT profile: one is used to retrieve an intermediate flame graph, and a second to convert that flame graph to a text (DOT) report (see this).

TL;DR: doing something like min(maxNodes, 100) could make sense for cases where users configure low values (which I don't think they do). The mapping between flame graph and DOT report nodes is not 1:1, though, so there could always be cases where the AI reports things that are not visible to the user.

Thank you for the clarification!
