Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

feat: More optimal IterateHierarchyV2 and iterateChildrenV2 [#600] #601

Conversation

andrii-korotkov-verkada
Copy link
Contributor

Closes #600

The existing (effectively v1) implementations are suboptimal since they don't construct a graph before the iteration. They search for children by looking at all namespace resources and checking isParentOf, which can give O(tree_size * namespace_resources_count) time complexity. The v2 algorithms construct the graph and have O(namespace_resources_count) time complexity. See more details in the linked issues.

@andrii-korotkov-verkada
Copy link
Contributor Author

Testing with ArgoCD argoproj/argo-cd#18972

…j#600]

Closes argoproj#600

The existing (effectively v1) implementations are suboptimal since they don't construct a graph before the iteration. They search for children by looking at all namespace resources and checking `isParentOf`, which can give `O(tree_size * namespace_resources_count)` time complexity. The v2 algorithms construct the graph and have `O(namespace_resources_count)` time complexity. See more details in the linked issues.

Signed-off-by: Andrii Korotkov <[email protected]>
@andrii-korotkov-verkada andrii-korotkov-verkada force-pushed the 600-more-optimal-iterate-hierarchy-and-iterate-children branch from 6279c76 to d777c9a Compare July 7, 2024 21:37
@andrii-korotkov-verkada
Copy link
Contributor Author

Looks really good on live cluster. ~300ms instead of almost ~4m for the same application!
Screenshot 2024-07-07 at 4 29 56 PM

@andrii-korotkov-verkada
Copy link
Contributor Author

andrii-korotkov-verkada commented Jul 9, 2024

Here are some perf views of the system collected following argoproj/argo-cd#13534 (comment).

The build is from master on 2024/07/07 including argoproj/argo-cd#18972, argoproj/argo-cd#18694, #601, #603.

Screenshot 2024-07-09 at 3 47 18 PM
Screenshot 2024-07-09 at 3 48 02 PM
Screenshot 2024-07-09 at 3 48 39 PM
Screenshot 2024-07-09 at 4 21 08 PM
Screenshot 2024-07-09 at 9 35 32 PM
Screenshot 2024-07-09 at 9 35 20 PM

pkg/cache/cluster.go Outdated Show resolved Hide resolved
crenshaw-dev and others added 3 commits July 16, 2024 17:16
Signed-off-by: Michael Crenshaw <[email protected]>
Signed-off-by: Michael Crenshaw <[email protected]>
…j#600]

Closes argoproj#600

The existing (effectively v1) implementations are suboptimal since they don't construct a graph before the iteration. They search for children by looking at all namespace resources and checking `isParentOf`, which can give `O(tree_size * namespace_resources_count)` time complexity. The v2 algorithms construct the graph and have `O(namespace_resources_count)` time complexity. See more details in the linked issues.

Signed-off-by: Andrii Korotkov <[email protected]>
@andrii-korotkov-verkada andrii-korotkov-verkada force-pushed the 600-more-optimal-iterate-hierarchy-and-iterate-children branch from d777c9a to 335ff88 Compare July 16, 2024 21:24
…hy-and-iterate-children' into iterate-improvements

Signed-off-by: Michael Crenshaw <[email protected]>
Signed-off-by: Michael Crenshaw <[email protected]>
@andrii-korotkov-verkada andrii-korotkov-verkada force-pushed the 600-more-optimal-iterate-hierarchy-and-iterate-children branch from 335ff88 to 905c87e Compare July 16, 2024 23:39
Copy link

codecov bot commented Jul 16, 2024

Codecov Report

Attention: Patch coverage is 82.79570% with 16 lines in your changes missing coverage. Please review.

Project coverage is 58.38%. Comparing base (fa0e8d6) to head (905c87e).
Report is 3 commits behind head on master.

Files Patch % Lines
pkg/cache/cluster.go 86.11% 6 Missing and 4 partials ⚠️
pkg/cache/resource.go 71.42% 3 Missing and 3 partials ⚠️
Additional details and impacted files
@@            Coverage Diff             @@
##           master     #601      +/-   ##
==========================================
+ Coverage   55.91%   58.38%   +2.46%     
==========================================
  Files          42       42              
  Lines        4900     5008     +108     
==========================================
+ Hits         2740     2924     +184     
+ Misses       1937     1840      -97     
- Partials      223      244      +21     

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

pkg/cache/cluster.go Outdated Show resolved Hide resolved
…j#600]

Closes argoproj#600

The existing (effectively v1) implementations are suboptimal since they don't construct a graph before the iteration. They search for children by looking at all namespace resources and checking `isParentOf`, which can give `O(tree_size * namespace_resources_count)` time complexity. The v2 algorithms construct the graph and have `O(namespace_resources_count)` time complexity. See more details in the linked issues.

Signed-off-by: Andrii Korotkov <[email protected]>
@andrii-korotkov-verkada andrii-korotkov-verkada force-pushed the 600-more-optimal-iterate-hierarchy-and-iterate-children branch from 905c87e to af08910 Compare July 16, 2024 23:47
@crenshaw-dev
Copy link
Member

Could you take a look at this PR? andrii-korotkov-verkada#1

I'll rebase, the force-push messed up the diff.

…hy-and-iterate-children' into iterate-improvements

Signed-off-by: Michael Crenshaw <[email protected]>
@crenshaw-dev
Copy link
Member

Pushed.

@andrii-korotkov-verkada
Copy link
Contributor Author

@crenshaw-dev, looks good! Thanks for improving the code. I've merged the changes.

@crenshaw-dev
Copy link
Member

@andrii-korotkov-verkada thanks! I may have a few more as I continue digging through the code. Bear with me. I plan to stick with it this week until we get it merged, as long as you have time to keep working on it!

@crenshaw-dev
Copy link
Member

More fun: andrii-korotkov-verkada#2

crenshaw-dev and others added 2 commits July 16, 2024 20:29
Signed-off-by: Michael Crenshaw <[email protected]>
discard unneeded copies of child resources as we go
@andrii-korotkov-verkada
Copy link
Contributor Author

Sounds good! Thanks for the help. I'd be pretty active on this.

Signed-off-by: Michael Crenshaw <[email protected]>
Signed-off-by: Michael Crenshaw <[email protected]>
Signed-off-by: Michael Crenshaw <[email protected]>
Signed-off-by: Michael Crenshaw <[email protected]>
func buildGraph(nsNodes map[kube.ResourceKey]*Resource) (map[kube.ResourceKey][]kube.ResourceKey, map[kube.ResourceKey]map[types.UID]*Resource) {
// Prepare to construct a graph
nodesByUID := make(map[types.UID][]*Resource, len(nsNodes))
nodeByGraphKey := make(map[graphKey]*Resource, len(nsNodes))
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do we really need nodeByGraphKey, or could we just use nsNodes, since all the resources should be in the same namespace?

Copy link
Member

@crenshaw-dev crenshaw-dev Jul 17, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Using nsNodes passes gitops-engine unit tests and saves a lot of memory, but I'm skeptical. I can put up a PoC tomorrow to analyze.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nodeByGraphKey is for efficient node lookup during uid backfill. I don't know if we avoid it.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's slightly different comparing to the kube resource key, e.g. uses api version instead of group.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't know the original reasoning for this distinction, but left it for backwards compatibility.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah of all the memory optimizations I looked at, this one would worry me the most. But its 25MB -> 17MB, which gets it pretty close to the memory footprint of IterateHierarchy v1.

I'll put up the PoC to look at and run Argo CD unit tests, but let's assume we're sticking with nodeByGraphKey unless we're super satisfied with the idea of switching to nsNodes.

Signed-off-by: Michael Crenshaw <[email protected]>
@crenshaw-dev
Copy link
Member

Probably the last graph building optimization: andrii-korotkov-verkada#3

Signed-off-by: Michael Crenshaw <[email protected]>
Signed-off-by: Michael Crenshaw <[email protected]>
@crenshaw-dev
Copy link
Member

@andrii-korotkov-verkada up for considering this one? andrii-korotkov-verkada#4

I think we do risk missing parent relationships due to Resources lacking the correct namespace field. But so far unit tests, Argo CD e2e tests, and running in an Intuit system looks okay.

I think it would be worth the risk for cutting the memory use in half and saving ~30% execution time.

@andrii-korotkov-verkada
Copy link
Contributor Author

In general it'd make sense to me since all other places use group. I'll merge it.

use nsNodes instead of dupe map
@crenshaw-dev crenshaw-dev changed the title chore: More optimal IterateHierarchyV2 and iterateChildrenV2 [#600] feat: More optimal IterateHierarchyV2 and iterateChildrenV2 [#600] Jul 18, 2024
@crenshaw-dev
Copy link
Member

Test results from Intuit:

Before, IterateHierarchy was taking about 10% of the application controller's time while refreshing apps. Now it takes around 1%. So roughly 6s before, now less than 1s out of a 60s profile.

Heap use has gone up by about 2x, but it wasn't memory-heavy before, and it hasn't really increased in a problematic way.

Steady-state CPU and memory is basically the same as before.

I used log metrics to measure 95th percentile reconciliation, getting the resource tree, and setting managed resources.

Reconciliation time is about the same, maaaaybe 25% faster. Getting the resource tree is about the same amount of time. Setting the managed resources now takes ~one fifth the time it did before, down to 200ms from 1000ms.

I think the vast majority of that improvement doesn't actually come from the graph pre-build optimization, but instead comes from avoiding iterating over all the managed resources (an optimization which doesn't actually depend on pre-building the graph).

In summary: my test showed no performance regressions, no functionality regressions, and maybe a slight performance improvement related to the new algorithm. I suspect the relatively small improvement is because we have relatively few resources per-namespace. Bigger namespaces will have a bigger CPU win and (probably) a higher memory cost.

@crenshaw-dev crenshaw-dev merged commit 6b2984e into argoproj:master Jul 18, 2024
2 checks passed
@andrii-korotkov-verkada
Copy link
Contributor Author

Thanks for all the testing and support! Yeah, I guess we just have quite large default namespace (which isn't best practice, but here we are), hence I saw larger wins. Longest running processor in the largest cluster went down from 30-60min to 1-2min.

mpelekh pushed a commit to mpelekh/gitops-engine that referenced this pull request Jul 26, 2024
…#600] (argoproj#601)

* chore: More optimal IterateHierarchyV2 and iterateChildrenV2 [argoproj#600]

Closes argoproj#600

The existing (effectively v1) implementations are suboptimal since they don't construct a graph before the iteration. They search for children by looking at all namespace resources and checking `isParentOf`, which can give `O(tree_size * namespace_resources_count)` time complexity. The v2 algorithms construct the graph and have `O(namespace_resources_count)` time complexity. See more details in the linked issues.

Signed-off-by: Andrii Korotkov <[email protected]>

* improvements to graph building

Signed-off-by: Michael Crenshaw <[email protected]>

* use old name

Signed-off-by: Michael Crenshaw <[email protected]>

* chore: More optimal IterateHierarchyV2 and iterateChildrenV2 [argoproj#600]

Closes argoproj#600

The existing (effectively v1) implementations are suboptimal since they don't construct a graph before the iteration. They search for children by looking at all namespace resources and checking `isParentOf`, which can give `O(tree_size * namespace_resources_count)` time complexity. The v2 algorithms construct the graph and have `O(namespace_resources_count)` time complexity. See more details in the linked issues.

Signed-off-by: Andrii Korotkov <[email protected]>

* finish merge

Signed-off-by: Michael Crenshaw <[email protected]>

* chore: More optimal IterateHierarchyV2 and iterateChildrenV2 [argoproj#600]

Closes argoproj#600

The existing (effectively v1) implementations are suboptimal since they don't construct a graph before the iteration. They search for children by looking at all namespace resources and checking `isParentOf`, which can give `O(tree_size * namespace_resources_count)` time complexity. The v2 algorithms construct the graph and have `O(namespace_resources_count)` time complexity. See more details in the linked issues.

Signed-off-by: Andrii Korotkov <[email protected]>

* discard unneeded copies of child resources as we go

Signed-off-by: Michael Crenshaw <[email protected]>

* remove unnecessary comment

Signed-off-by: Michael Crenshaw <[email protected]>

* make childrenByUID sparse

Signed-off-by: Michael Crenshaw <[email protected]>

* eliminate duplicate map

Signed-off-by: Michael Crenshaw <[email protected]>

* fix comment

Signed-off-by: Michael Crenshaw <[email protected]>

* add useful comment back

Signed-off-by: Michael Crenshaw <[email protected]>

* use nsNodes instead of dupe map

Signed-off-by: Michael Crenshaw <[email protected]>

* remove unused struct

Signed-off-by: Michael Crenshaw <[email protected]>

* skip invalid APIVersion

Signed-off-by: Michael Crenshaw <[email protected]>

---------

Signed-off-by: Andrii Korotkov <[email protected]>
Signed-off-by: Michael Crenshaw <[email protected]>
Co-authored-by: Michael Crenshaw <[email protected]>
mpelekh pushed a commit to mpelekh/gitops-engine that referenced this pull request Aug 1, 2024
…#600] (argoproj#601)

* chore: More optimal IterateHierarchyV2 and iterateChildrenV2 [argoproj#600]

Closes argoproj#600

The existing (effectively v1) implementations are suboptimal since they don't construct a graph before the iteration. They search for children by looking at all namespace resources and checking `isParentOf`, which can give `O(tree_size * namespace_resources_count)` time complexity. The v2 algorithms construct the graph and have `O(namespace_resources_count)` time complexity. See more details in the linked issues.

Signed-off-by: Andrii Korotkov <[email protected]>

* improvements to graph building

Signed-off-by: Michael Crenshaw <[email protected]>

* use old name

Signed-off-by: Michael Crenshaw <[email protected]>

* chore: More optimal IterateHierarchyV2 and iterateChildrenV2 [argoproj#600]

Closes argoproj#600

The existing (effectively v1) implementations are suboptimal since they don't construct a graph before the iteration. They search for children by looking at all namespace resources and checking `isParentOf`, which can give `O(tree_size * namespace_resources_count)` time complexity. The v2 algorithms construct the graph and have `O(namespace_resources_count)` time complexity. See more details in the linked issues.

Signed-off-by: Andrii Korotkov <[email protected]>

* finish merge

Signed-off-by: Michael Crenshaw <[email protected]>

* chore: More optimal IterateHierarchyV2 and iterateChildrenV2 [argoproj#600]

Closes argoproj#600

The existing (effectively v1) implementations are suboptimal since they don't construct a graph before the iteration. They search for children by looking at all namespace resources and checking `isParentOf`, which can give `O(tree_size * namespace_resources_count)` time complexity. The v2 algorithms construct the graph and have `O(namespace_resources_count)` time complexity. See more details in the linked issues.

Signed-off-by: Andrii Korotkov <[email protected]>

* discard unneeded copies of child resources as we go

Signed-off-by: Michael Crenshaw <[email protected]>

* remove unnecessary comment

Signed-off-by: Michael Crenshaw <[email protected]>

* make childrenByUID sparse

Signed-off-by: Michael Crenshaw <[email protected]>

* eliminate duplicate map

Signed-off-by: Michael Crenshaw <[email protected]>

* fix comment

Signed-off-by: Michael Crenshaw <[email protected]>

* add useful comment back

Signed-off-by: Michael Crenshaw <[email protected]>

* use nsNodes instead of dupe map

Signed-off-by: Michael Crenshaw <[email protected]>

* remove unused struct

Signed-off-by: Michael Crenshaw <[email protected]>

* skip invalid APIVersion

Signed-off-by: Michael Crenshaw <[email protected]>

---------

Signed-off-by: Andrii Korotkov <[email protected]>
Signed-off-by: Michael Crenshaw <[email protected]>
Co-authored-by: Michael Crenshaw <[email protected]>
mpelekh pushed a commit to mpelekh/gitops-engine that referenced this pull request Aug 27, 2024
…#600] (argoproj#601)

* chore: More optimal IterateHierarchyV2 and iterateChildrenV2 [argoproj#600]

Closes argoproj#600

The existing (effectively v1) implementations are suboptimal since they don't construct a graph before the iteration. They search for children by looking at all namespace resources and checking `isParentOf`, which can give `O(tree_size * namespace_resources_count)` time complexity. The v2 algorithms construct the graph and have `O(namespace_resources_count)` time complexity. See more details in the linked issues.

Signed-off-by: Andrii Korotkov <[email protected]>

* improvements to graph building

Signed-off-by: Michael Crenshaw <[email protected]>

* use old name

Signed-off-by: Michael Crenshaw <[email protected]>

* chore: More optimal IterateHierarchyV2 and iterateChildrenV2 [argoproj#600]

Closes argoproj#600

The existing (effectively v1) implementations are suboptimal since they don't construct a graph before the iteration. They search for children by looking at all namespace resources and checking `isParentOf`, which can give `O(tree_size * namespace_resources_count)` time complexity. The v2 algorithms construct the graph and have `O(namespace_resources_count)` time complexity. See more details in the linked issues.

Signed-off-by: Andrii Korotkov <[email protected]>

* finish merge

Signed-off-by: Michael Crenshaw <[email protected]>

* chore: More optimal IterateHierarchyV2 and iterateChildrenV2 [argoproj#600]

Closes argoproj#600

The existing (effectively v1) implementations are suboptimal since they don't construct a graph before the iteration. They search for children by looking at all namespace resources and checking `isParentOf`, which can give `O(tree_size * namespace_resources_count)` time complexity. The v2 algorithms construct the graph and have `O(namespace_resources_count)` time complexity. See more details in the linked issues.

Signed-off-by: Andrii Korotkov <[email protected]>

* discard unneeded copies of child resources as we go

Signed-off-by: Michael Crenshaw <[email protected]>

* remove unnecessary comment

Signed-off-by: Michael Crenshaw <[email protected]>

* make childrenByUID sparse

Signed-off-by: Michael Crenshaw <[email protected]>

* eliminate duplicate map

Signed-off-by: Michael Crenshaw <[email protected]>

* fix comment

Signed-off-by: Michael Crenshaw <[email protected]>

* add useful comment back

Signed-off-by: Michael Crenshaw <[email protected]>

* use nsNodes instead of dupe map

Signed-off-by: Michael Crenshaw <[email protected]>

* remove unused struct

Signed-off-by: Michael Crenshaw <[email protected]>

* skip invalid APIVersion

Signed-off-by: Michael Crenshaw <[email protected]>

---------

Signed-off-by: Andrii Korotkov <[email protected]>
Signed-off-by: Michael Crenshaw <[email protected]>
Co-authored-by: Michael Crenshaw <[email protected]>
mpelekh pushed a commit to mpelekh/gitops-engine that referenced this pull request Sep 10, 2024
…#600] (argoproj#601)

* chore: More optimal IterateHierarchyV2 and iterateChildrenV2 [argoproj#600]

Closes argoproj#600

The existing (effectively v1) implementations are suboptimal since they don't construct a graph before the iteration. They search for children by looking at all namespace resources and checking `isParentOf`, which can give `O(tree_size * namespace_resources_count)` time complexity. The v2 algorithms construct the graph and have `O(namespace_resources_count)` time complexity. See more details in the linked issues.

Signed-off-by: Andrii Korotkov <[email protected]>

* improvements to graph building

Signed-off-by: Michael Crenshaw <[email protected]>

* use old name

Signed-off-by: Michael Crenshaw <[email protected]>

* chore: More optimal IterateHierarchyV2 and iterateChildrenV2 [argoproj#600]

Closes argoproj#600

The existing (effectively v1) implementations are suboptimal since they don't construct a graph before the iteration. They search for children by looking at all namespace resources and checking `isParentOf`, which can give `O(tree_size * namespace_resources_count)` time complexity. The v2 algorithms construct the graph and have `O(namespace_resources_count)` time complexity. See more details in the linked issues.

Signed-off-by: Andrii Korotkov <[email protected]>

* finish merge

Signed-off-by: Michael Crenshaw <[email protected]>

* chore: More optimal IterateHierarchyV2 and iterateChildrenV2 [argoproj#600]

Closes argoproj#600

The existing (effectively v1) implementations are suboptimal since they don't construct a graph before the iteration. They search for children by looking at all namespace resources and checking `isParentOf`, which can give `O(tree_size * namespace_resources_count)` time complexity. The v2 algorithms construct the graph and have `O(namespace_resources_count)` time complexity. See more details in the linked issues.

Signed-off-by: Andrii Korotkov <[email protected]>

* discard unneeded copies of child resources as we go

Signed-off-by: Michael Crenshaw <[email protected]>

* remove unnecessary comment

Signed-off-by: Michael Crenshaw <[email protected]>

* make childrenByUID sparse

Signed-off-by: Michael Crenshaw <[email protected]>

* eliminate duplicate map

Signed-off-by: Michael Crenshaw <[email protected]>

* fix comment

Signed-off-by: Michael Crenshaw <[email protected]>

* add useful comment back

Signed-off-by: Michael Crenshaw <[email protected]>

* use nsNodes instead of dupe map

Signed-off-by: Michael Crenshaw <[email protected]>

* remove unused struct

Signed-off-by: Michael Crenshaw <[email protected]>

* skip invalid APIVersion

Signed-off-by: Michael Crenshaw <[email protected]>

---------

Signed-off-by: Andrii Korotkov <[email protected]>
Signed-off-by: Michael Crenshaw <[email protected]>
Co-authored-by: Michael Crenshaw <[email protected]>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Implement more optimal versions of IterateHierarchy and iterateChildren based on nodes graph
2 participants