
fix(DiffFlameGraph): Fix the "Explain Flame Graph" (AI) feature #129

Merged
merged 6 commits into from
Aug 26, 2024

Conversation

grafakus
Contributor

@grafakus grafakus commented Aug 23, 2024

✨ Description

Related issue(s): is caused by #119

This PR fixes the "Explain Flame Graph" feature on the comparison view. Before this PR:

  • the isDiff prop was not passed to the AiPanel component, leading to an incorrect AI analysis based solely on a single profile
  • obtaining the queries required to fetch the profiles in DOT format relied on parseLeftRightUrlSearchParams, which worked only for the legacy pages
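The two fixes above boil down to giving the AI panel the right inputs for each view. A minimal sketch of that shape, with illustrative names (not the actual plugin API):

```typescript
// Illustrative types: the AI panel needs one { query, timeRange } pair
// per profile it analyzes.
type TimeRange = { from: number; to: number };
type AiFetchParams = Array<{ query: string; timeRange: TimeRange }>;

// Single flame graph: one pair, using the last time range fetched.
function singleViewFetchParams(query: string, lastTimeRange: TimeRange): AiFetchParams {
  return [{ query, timeRange: lastTimeRange }];
}

// Diff flame graph: two pairs, one for the baseline selection and one for
// the comparison selection, so the analysis covers both profiles.
function diffViewFetchParams(
  baseline: { query: string; timeRange: TimeRange },
  comparison: { query: string; timeRange: TimeRange }
): AiFetchParams {
  return [baseline, comparison];
}
```

With the missing isDiff prop, the panel only ever received the single-view shape, which is why the analysis was based on one profile.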

📖 Summary of the changes

To fix the issues, this PR introduces the new SceneAiPanel (based on the non-Scenes AiPanel), used both in SceneFlameGraph and SceneDiffFlameGraph.

See the diff tab for specific comments.

🧪 How to test?

Manually, after checking out this PR branch locally:

  • The analysis for a single flamegraph should work as before
  • The analysis for a diff flamegraph should work

In both cases, the API requests made by the browser should contain the correct query(ies) and the correct time range(s). For a single flame graph, the time range must be the last one used to fetch the main timeseries. For a diff flame graph, the ranges must match those selected by the user on the baseline & comparison timeseries.

@grafakus grafakus requested review from bryanhuhta and ifrost August 23, 2024 19:33
@github-actions github-actions bot added the fix label Aug 23, 2024
Contributor

github-actions bot commented Aug 23, 2024

Unit test coverage

Lines: 10% · Statements: 10.7% (464/4333) · Branches: 8.26% (134/1622) · Functions: 8.03% (107/1332)

@grafakus
Contributor Author

As a note for the coming weeks: we need to start adding end-to-end tests to catch this kind of regression earlier.

}

return (
<Button
Contributor Author
@grafakus grafakus Aug 23, 2024

I took the opportunity to remove the "new" badge. The feature is already a few months old now.

Contributor

I agree with this move.

@@ -0,0 +1,53 @@
import { css } from '@emotion/css';
Contributor Author

This component, as well as other files, has been copied from the "shared" folder and customized for the Scenes app. Once we remove the code for the legacy Comparison pages (which is still present), we'll clean everything up.

// eslint-disable-next-line @tanstack/query/exhaustive-deps
queryKey: [
'diff-profile',
dataSourceUid,
Contributor Author

I realised I forgot to add it to the key in the previous PR 🤦🏾‍♂️
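For context, with @tanstack/react-query the query key must list every input the fetcher depends on; otherwise changing that input (here, the data source) serves a stale cached result instead of refetching. A minimal sketch with an illustrative helper name:

```typescript
// Illustrative sketch: the query key must include every input the fetch
// depends on. Omitting dataSourceUid would mean switching data sources
// reuses the cached diff profile instead of fetching a new one.
function buildDiffProfileQueryKey(
  dataSourceUid: string,
  baselineQuery: string,
  comparisonQuery: string
): string[] {
  return ['diff-profile', dataSourceUid, baselineQuery, comparisonQuery];
}
```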

const lastTimeRange = newDataState.data.timeRange;

// For the "Function Details" feature only
timelineAndProfileApiClient.setLastTimeRange(lastTimeRange);
Contributor Author

We should also migrate the GitHub integration to Scenes at some point in the future.

@grafakus grafakus changed the title fix(Ai): Fix "Explain Flame Graph" feature fix(DiffFlameGraph): Fix the "Explain Flame Graph" (AI) feature Aug 23, 2024
/>
)}
</Panel>

{sidePanel.isOpen('ai') && <AiPanel className={styles.sidePanel} onClose={sidePanel.close} />}
Contributor Author

The missing isDiff prop caused the feature to be broken.

@@ -88,6 +96,13 @@ export class SceneDiffFlameGraph extends SceneObjectBase<SceneDiffFlameGraphStat
profile: profile as FlamebearerProfile,
settings,
fetchSettingsError,
ai: {
panel: aiPanel,
fetchParams: [
Contributor Author
@grafakus grafakus Aug 23, 2024

We now pass the correct parameters (queries and time ranges).

isLoading: isFetchingProfileData,
isFetchingProfileData,
hasProfileData,
profileData,
settings,
ai: {
panel: aiPanel,
fetchParams: [{ query, timeRange: lastTimeRange }],
Contributor Author
@grafakus grafakus Aug 23, 2024

It's now easier to understand where the last time range is used and what for (vs calling timelineAndProfileApiClient.setLastTimeRange() for the GitHub integration feature).


const query = useBuildPyroscopeQuery(model, key as string);

useEffect(() => {
Contributor Author
@grafakus grafakus Aug 23, 2024

By removing this code, and the code below in useToggleSidePanel, we get rid of the query URL search parameter.

bryanhuhta
bryanhuhta previously approved these changes Aug 23, 2024
Contributor
@bryanhuhta bryanhuhta left a comment

Tested it a variety of ways locally and everything seems to work 👍



import { OpenAiReply } from '../domain/useOpenAiChatCompletions';

// yeah, I know...
Contributor

😁

// hack to force UrlSyncManager to handle a new location
// this will sync the state from the URL by calling updateFromUrl() on all the time ranges (`SceneTimeRange` and our custom `SceneTimeRangeWithAnnotations`) that are defined on `SceneComparePanel`
// if not, landing on this view will result in empty URL search parameters (to/from and diffTo/diffFrom) which will make shareable links useless
locationService.partial({}, true); // replace to avoid creating history items
Contributor Author

Crazy side effect: after removing https://github.com/grafana/explore-profiles/pull/129/files#diff-5db2774cbc84b771946755dd3cc399bf32509129f5bf13d2177950b07f3d6119L58, the URL search parameters for all the time ranges were wiped.

Does anybody know a better solution (idiomatic to Scenes)?

Contributor

Is it needed because the time range is added dynamically after the URL is synced? Maybe we could use UrlSyncManager.handleNewLocation, but we'd need to upgrade Scenes to the latest version (which we should do anyway, but not as part of this PR).

Contributor Author

Is it needed because the time range is added dynamically after the URL is synced?

Yes

@ifrost
Contributor

ifrost commented Aug 26, 2024

Played with it a bit and it looks good; I'll have a look at the code in a bit. A few comments:


suggestion: Align max-nodes used for analysis with user settings

We use a hardcoded max-nodes=100 for the analysis. It may yield a more (or less) detailed analysis than what's visible in the graph, depending on user settings. Would it make sense to use the user's setting and cap it at 100 if the user uses more, plus some info about how many nodes are being processed?


suggestion: Refresh or close analysis panel when query changes

It seems like we don't close or refresh the analysis when the profile type changes (only when the service changes), which may lead to stale results being shown. Would it make sense to hide the analysis or refresh it automatically when the query changes (profile type or filters)?

Contributor
@ifrost ifrost left a comment

Nothing blocking from my end, but I think we should create follow-ups on code duplication, Scenes updates, and TODOs.

Profile in DOT format:
${profiles[0]}
`,
anton: (profileType: string, profiles: string[]) => `
Contributor

question: Where is it used? 🤔

Contributor Author

It's not. I'll clean it up in the near future, when I remove the legacy comparison views code.


import { buildPrompts, model } from './buildLlmPrompts';

// taken from "@grafana/experimental"
Contributor

question: Could we import it directly from @grafana/experimental?

Contributor Author

This type is not exported.


import { DataSourceProxyClient } from '../../../infrastructure/series/http/DataSourceProxyClient';

// dot format returns string (TODO: json format later)
Contributor

question: What's the process for tackling code TODOs? Should we create follow-up tasks?

Contributor Author

It all depends.

For instance, this one has been tackled in this PR.

But generally speaking, because we prioritized speed and flexibility, we have not been strict. So far, I've used private reminders to do code maintenance whenever I could. I now believe we should have some kind of process. Let's talk!

@@ -0,0 +1,153 @@
import { llms } from '@grafana/experimental';
Contributor

question: There's a load of code duplication between SceneAiPanel/domain/useOpenAiChatCompletions.ts and AiPanel/domain/useOpenAiChatCompletions.ts. Do we need both?

Contributor Author

AiPanel/domain/useOpenAiChatCompletions.ts is obsolete, so I'll remove it ASAP.


@grafakus
Contributor Author

Thank you for the suggestions!

suggestion: Align max-nodes used for analysis with user settings

We use a hardcoded max-nodes=100 for the analysis. It may yield a more (or less) detailed analysis than what's visible in the graph, depending on user settings. Would it make sense to use the user's setting and cap it at 100 if the user uses more, plus some info about how many nodes are being processed?

I believe we capped it at 100:

  • to limit the number of OpenAI tokens
  • because it didn't provide better results with higher values

@aleks-p, is that correct?

suggestion: Refresh or close analysis panel when query changes

It seems like we don't close or refresh the analysis when the profile type changes (only when the service changes), which may lead to stale results being shown. Would it make sense to hide the analysis or refresh it automatically when the query changes (profile type or filters)?

We do: https://github.com/grafana/explore-profiles/pull/129/files#diff-950b99b789b4fc4787eb1cf83aa00347e27cd96fbb1dcd6c9ef3bd839a45ba3bR147

@aleks-p
Contributor

aleks-p commented Aug 26, 2024

I believe we capped it at 100:

  • to limit the number of OpenAI tokens
  • because it didn't provide better results with higher values

@aleks-p, is that correct?

Yes, though I think the question is more about starting with the user's max nodes and capping that at 100. This would have an impact if the user sets max nodes to less than 100, which is fairly unlikely but possible. In this case the AI can "talk" about nodes that the user can't see.

The issue is a bit more subtle, though. The backend determines two maxNodes values when making a request for a DOT profile: one is used to retrieve an intermediate flame graph, and a second to convert that flame graph to a text (DOT) report (see this).

TL;DR: doing something like min(maxNodes, 100) could make sense for cases where users configure low values (which I don't think they do). The mapping between flame graph and DOT report nodes is not 1:1, though, so there could always be cases where the AI reports things that are not visible to the user.
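The min(maxNodes, 100) idea discussed here could be sketched like this (a hypothetical helper, assuming the user's max-nodes setting is available where the analysis request is built; the 100 cap limits OpenAI token usage as noted above):

```typescript
// Hypothetical helper, not the actual plugin code: start from the user's
// max-nodes setting and cap it for the AI analysis, so the request never
// exceeds the token budget while still respecting low user settings.
const AI_MAX_NODES_CAP = 100;

function maxNodesForAnalysis(userMaxNodes: number): number {
  return Math.min(userMaxNodes, AI_MAX_NODES_CAP);
}
```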

@grafakus grafakus merged commit a40c02b into main Aug 26, 2024
5 of 6 checks passed
@grafakus grafakus deleted the fix/ai-feature branch August 26, 2024 12:59
@grafakus
Contributor Author

grafakus commented Aug 26, 2024

@aleks-p, is that correct?

Yes, though I think the question is more about starting with the user's max nodes and capping that at 100. This would have an impact if the user sets max nodes to less than 100, which is fairly unlikely but possible. In this case the AI can "talk" about nodes that the user can't see.

The issue is a bit more subtle, though. The backend determines two maxNodes values when making a request for a DOT profile: one is used to retrieve an intermediate flame graph, and a second to convert that flame graph to a text (DOT) report (see this).

TL;DR: doing something like min(maxNodes, 100) could make sense for cases where users configure low values (which I don't think they do). The mapping between flame graph and DOT report nodes is not 1:1, though, so there could always be cases where the AI reports things that are not visible to the user.

Thank you for the clarification!
