Description
I've been looking at ways to reduce RA memory use, because for my use case it's regularly using >80GB! It has to dip into swap and performance tanks.
Right now, rust-analyzer always uses `GreenNodeBuilder::default()`, which creates a new `NodeCache` for every file/text section.
I ran into issues implementing it the "right" way because I couldn't figure out how to get a `NodeCache` into the database methods (such as `base-db:lib.rs:parse`).
So, to run some quick benchmarks just to see if it's worth it, I implemented cache sharing the "wrong" way (using a global static mutex).
This gave a modest performance increase, and a substantial reduction in peak memory.
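For illustration, the "wrong way" can be sketched with std types only. `intern_token` and the tuple token type below are hypothetical stand-ins for rowan's green tokens, not its real API; the actual change would hand a shared `NodeCache` to something like `GreenNodeBuilder::with_cache` instead of `default()`:

```rust
use std::collections::HashMap;
use std::sync::{Arc, Mutex, OnceLock};

// Stand-in for an interned green token: (kind, text) behind a shared Arc.
type Token = Arc<(u16, String)>;

// Process-wide cache behind a static mutex ("wrong way" from the text).
// In rust-analyzer this would be rowan's NodeCache, shared across parses.
static CACHE: OnceLock<Mutex<HashMap<(u16, String), Token>>> = OnceLock::new();

fn intern_token(kind: u16, text: &str) -> Token {
    let cache = CACHE.get_or_init(|| Mutex::new(HashMap::new()));
    let mut map = cache.lock().unwrap();
    map.entry((kind, text.to_string()))
        .or_insert_with(|| Arc::new((kind, text.to_string())))
        .clone()
}

fn main() {
    let a = intern_token(1, "fn");
    let b = intern_token(1, "fn"); // cache hit: same allocation reused
    assert!(Arc::ptr_eq(&a, &b));
    // One Arc in the cache plus two handles here.
    println!("shared: {}", Arc::strong_count(&a));
}
```

The mutex is the cost of the "wrong way": every parse serializes on one lock, which is why threading a cache through the database methods would be the cleaner fix.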
I also tried:
- Interning green nodes when they are "mutated" with methods such as `splice_children`. This seems promising for continuous running: cache hit rate for mutations was 45-60% depending on the repo, and it didn't cause noticeable perf degradation.
- Disabling or tweaking the `>3 children` NodeCache heuristic. Disabling gave another big win in peak memory but cost a lot of performance; tweaking is better.
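To make both experiments concrete, here is a minimal sketch with a hypothetical `Node`/`Cache` (not rowan's real types) of interning rebuilt nodes through a cache that skips nodes above a child-count cutoff:

```rust
use std::collections::HashMap;
use std::sync::Arc;

// Illustrative green node: a kind plus children (leaf text elided).
#[derive(Hash, PartialEq, Eq, Clone)]
struct Node {
    kind: u16,
    children: Vec<Arc<Node>>,
}

// Heuristic cutoff: only small nodes are worth hashing and interning,
// since large nodes rarely repeat. Raising this to 4 is the "Heuristic 4"
// column in the tables below.
const MAX_CACHED_CHILDREN: usize = 3;

struct Cache(HashMap<Node, Arc<Node>>);

impl Cache {
    // Intern a rebuilt node (e.g. after a splice_children-style mutation):
    // reuse an identical existing allocation if present, else allocate.
    fn intern(&mut self, node: Node) -> Arc<Node> {
        if node.children.len() > MAX_CACHED_CHILDREN {
            return Arc::new(node); // too big: skip the cache entirely
        }
        self.0
            .entry(node.clone())
            .or_insert_with(|| Arc::new(node))
            .clone()
    }
}
```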
Running rust-analyzer analysis-stats on itself in release mode:
| Metric | Baseline | Global Cache | Global Cache Without Heuristic | Global Cache With Heuristic 4 |
|---|---|---|---|---|
| Peak Memory | 4675.54MB | 4083.73MB (-12.7%) | 3739.20MB (-20.0%) | 3930.96MB (-16%) |
| Idle Memory (post LRU clear) | 2908MB | 2869MB (-1.3%) | 2863MB (-1.5%) | 2867MB (-1.4%) |
| `NodeCache::node` hit rate | 44.27% | 72.51% | 83.60% | 76.62% |
| `NodeCache::token` hit rate | 87.40% | 98.38% | 98.38% | 98.38% |
| Runtime | 87.37s | 82.05s (-6.1%) | 91.93s (+5.2%) | 81.60s (-6.6%) |
Because the duplication rate grows with codebase size, the memory reduction should grow superlinearly with repo size.
I chose a larger repo, but not one so large that it uses swap, to avoid skewing the performance metrics.
Stats from a larger repo:
| Metric | Baseline | Global Cache | Global Cache Without Heuristic | Global Cache With Heuristic 4 |
|---|---|---|---|---|
| Peak Memory | 23015.48MB | 20466.32MB (-11.1%) | 17741.96MB (-22.9%) | 19970.84MB (-13.2%) |
| Idle Memory (post LRU clear) | 11944MB | 11936MB (-0.1%) | 11927MB (-0.1%) | 11933MB (-0.1%) |
| `NodeCache::node` hit rate | 44.30% | 71.13% | 87.33% | 76.23% |
| `NodeCache::token` hit rate | 89.38% | 99.31% | 99.31% | 99.31% |
| Runtime | 148.49s | 145.36s (-2.1%) | 167.40s (+12.7%) | 144.28s (-2.8%) |
I didn't run any benchmarks for continuous editing, but it should benefit even more: editing triggers many small re-parses, which is where cache sharing has the most to gain.
Activity
ChayimFriedman2 commented on Apr 20, 2025
We are well aware that rowan has performance problems, both in terms of speed and in terms of memory usage. There are ongoing efforts to improve this; currently they focus on preconditions (removing mutable syntax trees, which is a lot of work). Thank you for suggesting a potential improvement. We will evaluate it, but this will have to wait until we are ready to refactor rowan (that is, until all preconditions are fulfilled).
chlobes commented on Apr 22, 2025
I don't think this requires changes to rowan. This would be a downstream change in how rust-analyzer uses rowan's existing api.
Veykril commented on Apr 22, 2025
Note that sharing the cache will by itself slowly leak the nodes/tokens, though: from what I can tell, we would never free them anymore.
chlobes commented on Apr 23, 2025
That's true.
In my benchmarks I added a function that cleaned unused entries from the cache, and found it to have a negligible performance cost.
(`weak_count + strong_count == 1` implies the only place an entry is used is the cache, so we can safely remove it without duplicating nodes.)
(Alternatively, we can regularly set `cache = NodeCache::default()` and cheaply and simply evict all entries.) The cache used <0.1% of total memory either way, but to entirely avoid any gradual memory leak, we could run this cleanup regularly (maybe alongside salsa LRU eviction?).
Alternatively, we could modify the cache itself to either:
- change `NodeCache::node`/`NodeCache::token` so the cache no longer keeps entries alive, or
- store the `ArcInner`s in the `NodeCache` itself, so that the cache is a single source of truth that gets emptied when all copies are dropped,

but that really would require a small or big (respectively) change to rowan.
Veykril commented on Apr 23, 2025
Either way, I think we'd be happy to accept something like this! It is going to be a while before we switch rowan out, and this sounds like a decent win (even if mainly useful for startup memory).