[frontend] Kubeflow does N queries in the "Runs" page #11346

asaff1 · 2024-10-31T16:16:19Z

Environment

How did you deploy Kubeflow Pipelines (KFP)?
Kubeflow pipelines standalone, AWS setup

Steps to reproduce

Open the network panel in your browser's dev tools and navigate to the Runs page. Increase the page size in the UI and watch the number of requests grow.
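One way to quantify what the network panel shows is to export the page load as a HAR file and count requests per endpoint. The following is a minimal sketch, not part of the original report. It assumes the standard HAR layout (`log.entries[].request.url`). The pipeline lookup path in the sample data is an assumption for illustration, and `endpoint_of` and `count_requests` are hypothetical helper names.

```python
from collections import Counter
from urllib.parse import urlsplit


def endpoint_of(url: str, depth: int = 3) -> str:
    """Collapse a URL to its first `depth` path segments, dropping IDs and query strings."""
    segments = [s for s in urlsplit(url).path.split("/") if s]
    return "/" + "/".join(segments[:depth])


def count_requests(har: dict) -> Counter:
    """Count requests per collapsed endpoint in a parsed HAR document."""
    return Counter(endpoint_of(e["request"]["url"]) for e in har["log"]["entries"])


# Tiny hand-written sample standing in for a real HAR export.
# The /pipelines/<id> path is an assumed shape for the per-run lookups.
sample_har = {"log": {"entries": [
    {"request": {"url": "http://localhost:3000/apis/v2beta1/runs?page_size=10"}},
    {"request": {"url": "http://localhost:3000/apis/v2beta1/pipelines/p1"}},
    {"request": {"url": "http://localhost:3000/apis/v2beta1/pipelines/p2"}},
]}}

for endpoint, n in count_requests(sample_har).most_common():
    print(n, endpoint)
```

With a real export, load the file with `json.load` and pass the result to `count_requests`. A count that scales with the page size points to a per-run request pattern.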

When navigating to the "Runs" page, Kubeflow sends an API call to fetch a list of runs. There are two problems here:

1. The response includes a total_size field, which triggers an unneeded COUNT(*) query on the whole "run_details" table. The count is not even displayed in the UI.
2. More importantly, after the run list is fetched, the UI makes one API call per run (each of which is a DB query) to get its associated pipeline. The runs API could instead do a single SQL JOIN to return the pipeline info.

This is really slow even for a page size of 10, tested with my medium-size RDS instance. With a page size of 100, the page does over 100 SQL queries.
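The difference between the per-run lookups described above and a single JOIN can be sketched with an in-memory SQLite database. This is an illustration only, not the KFP backend code. The "run_details" table name comes from the report, while the `pipelines` table, all column names, and both function names are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE pipelines (id TEXT PRIMARY KEY, name TEXT);
    CREATE TABLE run_details (id TEXT PRIMARY KEY, pipeline_id TEXT, created_at INTEGER);
""")
conn.executemany("INSERT INTO pipelines VALUES (?, ?)", [("p1", "train"), ("p2", "eval")])
conn.executemany(
    "INSERT INTO run_details VALUES (?, ?, ?)",
    [(f"r{i}", "p1" if i % 2 else "p2", i) for i in range(10)],
)


def list_runs_per_run_lookup(page_size: int):
    """One query for the page of runs, then one extra query per run."""
    runs = conn.execute(
        "SELECT id, pipeline_id FROM run_details ORDER BY created_at DESC LIMIT ?",
        (page_size,),
    ).fetchall()
    queries = 1
    rows = []
    for run_id, pipeline_id in runs:
        name = conn.execute(
            "SELECT name FROM pipelines WHERE id = ?", (pipeline_id,)
        ).fetchone()[0]
        queries += 1
        rows.append((run_id, name))
    return rows, queries


def list_runs_joined(page_size: int):
    """Same result in a single query; LEFT JOIN keeps runs without a pipeline."""
    rows = conn.execute(
        """SELECT r.id, p.name
           FROM run_details AS r
           LEFT JOIN pipelines AS p ON p.id = r.pipeline_id
           ORDER BY r.created_at DESC LIMIT ?""",
        (page_size,),
    ).fetchall()
    return rows, 1


slow_rows, slow_queries = list_runs_per_run_lookup(10)
fast_rows, fast_queries = list_runs_joined(10)
print(slow_queries, fast_queries)
```

Both functions return the same rows. The first issues page_size + 1 queries, the second always issues one.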

Expected result

To get the runs data with the pipeline info, one query should be enough. The page should load faster.

droctothorpe · 2024-10-31T19:07:10Z

#10797 should resolve this. Make sure that you're on the latest version of KFP.

asaff1 · 2024-10-31T19:48:58Z

@droctothorpe I see.
BTW, the same issue exists on the experiments page: it loads the last 5 runs for each experiment shown.
I'm using Kubeflow 2.1.0 rather than the latest 2.3.0 because of regressions that were introduced (probably when Argo was upgraded; for example, retry doesn't work).

asaff1 · 2024-10-31T20:43:42Z

@droctothorpe I tried replacing only the image: field in the ml-pipeline-ui deployment, changing it from gcr.io/ml-pipeline/frontend:2.1.0 to gcr.io/ml-pipeline/frontend:2.3.0. The issue still exists. I do see the PR in the changelog of 2.3.0, so this is really strange.

droctothorpe · 2024-11-05T14:20:52Z

That's odd. Maybe look at the source code in the chrome console and make sure it includes the code in question. Alternatively, you can try to build and push a new image from the master branch and deploy that. Another option is to run the UI in development mode from your local machine. You can see how to do that here.

asaff1 · 2024-11-05T14:55:13Z

@droctothorpe I believe the 2.3.0 build didn't include your PR. It would be great if you could test the gcr.io/ml-pipeline/frontend:2.3.0 image yourself.

rimolive · 2024-11-19T11:52:22Z

@asaff1 Can you confirm whether your issue is solved if you build the image manually and use it in your deployment? If the latest image did not fix this issue, we need to reopen it.

asaff1 · 2024-11-21T14:02:40Z

@rimolive I will try to build it and will post an update. What I can say for sure is that the 2.3.0 image still has the bug. This doesn't match the 2.3.0 release notes.

github-actions · 2025-01-21T07:41:33Z

This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions.

droctothorpe · 2025-01-21T14:49:16Z

2.4 was just released, @asaff1. Have you tried against that version of the frontend?

asaff1 · 2025-01-21T15:34:45Z

@droctothorpe As of now, I don't see an available image for frontend version 2.4.0.
docker pull gcr.io/ml-pipeline/frontend:2.4.0 gives an error (not found).
The latest that works is docker pull gcr.io/ml-pipeline/frontend:2.3.0

droctothorpe · 2025-01-21T16:13:54Z

Crud. Guess you'll need to build yourself until that's cut.

HumairAK · 2025-01-21T22:46:33Z

@asaff1 in 2.4 we have moved away from GCR and instead are using GHCR: https://github.com/kubeflow/pipelines/pkgs/container/kfp-frontend

asaff1 · 2025-01-22T08:27:32Z

@HumairAK
I see. I've tried ghcr.io/kubeflow/kfp-frontend:2.4.0. The "Runs" page in the frontend still makes many requests to get the associated pipeline for each run. The issue is not solved.

droctothorpe · 2025-01-22T14:54:09Z

Thanks for confirming, @asaff1. I'll take a look at this.

droctothorpe · 2025-01-23T03:32:35Z

@asaff1, I just tested on my local machine. I have 2.3 deployed on a local (colima-backed) k8s cluster. The image is set to gcr.io/ml-pipeline/frontend:2.3.0. It's only making a single request to the backend even though there are 10 distinct runs. Here's a screenshot as evidence:

The non-duplicated request being made is: http://localhost:3000/apis/v2beta1/runs?page_token=&page_size=10&sort_by=created_at%20desc&filter=%257B%2522predicates%2522%253A%255B%257B%2522key%2522%253A%2522storage_state%2522%252C%2522operation%2522%253A%2522NOT_EQUALS%2522%252C%2522string_value%2522%253A%2522ARCHIVED%2522%257D%255D%257D.
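The filter value in the URL above is percent-encoded twice as pasted, which makes it hard to read. A short sketch, using only the Python standard library, that unpacks the query string and recovers the JSON filter:

```python
import json
from urllib.parse import parse_qs, unquote, urlsplit

# URL copied verbatim from the comment above, split across lines for readability.
url = (
    "http://localhost:3000/apis/v2beta1/runs?page_token=&page_size=10"
    "&sort_by=created_at%20desc"
    "&filter=%257B%2522predicates%2522%253A%255B%257B%2522key%2522%253A"
    "%2522storage_state%2522%252C%2522operation%2522%253A%2522NOT_EQUALS"
    "%2522%252C%2522string_value%2522%253A%2522ARCHIVED%2522%257D%255D%257D"
)

# parse_qs removes the first layer of percent-encoding.
query = parse_qs(urlsplit(url).query, keep_blank_values=True)

# The filter still carries a second layer, so decode once more before parsing JSON.
run_filter = json.loads(unquote(query["filter"][0]))

print(query["page_size"][0], query["sort_by"][0])
print(json.dumps(run_filter, indent=2))
```

The decoded filter is a single predicate that excludes runs whose storage_state is ARCHIVED, so this request is the plain run list for the page and not a per-run pipeline lookup.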

droctothorpe · 2025-01-23T03:34:17Z

> @droctothorpe I see. BTW, same issue for the experiments page - it loads the last 5 runs times the number of experiments shown. I'm using kubeflow 2.1.0 and not latest 2.3.0 because of regressions that were introduced (probably when argo was upgraded - for example retry doesn't work)

And here's the experiments page only making a single request to the runs API:

droctothorpe · 2025-01-23T03:39:23Z

To rule out the possibility that multiple experiments are the problem, I tested that too. Still just one runs request:

asaff1 · 2025-01-23T09:41:15Z

@droctothorpe Looks different on my end. In your screenshot of the "Runs" page I see the [View pipeline] placeholder; on my 2.3.0 UI it shows the actual pipeline name. Could this be due to a configuration difference passed to the UI deployment, or to something different the UI gets from the backend?

I only upgraded the UI image to 2.3.0 (and 2.4.0). I didn't upgrade the API backend (ml-pipeline) or the rest of the Kubeflow installation. My backend (and the Kubeflow release) is still at 2.1.0. I assumed this issue is only a UI bug (this is what I understood from your fix PR).

droctothorpe · 2025-01-23T11:38:35Z

I recommend checking if you can reproduce the error against a full KFP 2.3.0 install.

asaff1 · 2025-02-03T14:35:04Z

@droctothorpe I cannot do a full 2.3.0 install because it has other bugs. (I checked full Kubeflow installations from 2.1.0 to 2.4.0 and settled on 2.1.0.)
I don't understand why I cannot upgrade the UI alone. As far as I can see, your fix PR only modifies the UI.

droctothorpe · 2025-02-03T14:51:41Z

@asaff1 The PR was tested against and contributed as a fix for the current version of the code, not older versions and not environments that mix and match different versions. You can try upgrading just your KFP backend version or diffing the relevant code between 2.1.0 and the tip of master to identify the discrepancy.

asaff1 added area/frontend kind/bug labels Oct 31, 2024

github-actions bot added the lifecycle/stale label Jan 21, 2025

github-actions bot removed the lifecycle/stale label Jan 22, 2025
