feat: microoptimize _LRUCacheWrapper.__call__
#683
Conversation
I'm not convinced this is going to make a noticeable difference. A good first step would be to add codspeed benchmarks as we've done on the aiohttp repo.
Benchmark or no benchmark, it reduces the number of PyObject_GetAttr calls the interpreter makes: for each cache hit, there will be one fewer PyObject_GetAttr C call. Whether that change is "noticeable" depends on how you define noticeable, but regardless of the definition the interpreter makes fewer C-API calls with this code, and the user's CPU is not processing instructions it has no need to process. Before I spend any more time here, how do you define "noticeable"? We already know it will make an unknown difference greater than zero; probably small, of course, but every little bit helps, particularly for this type of middleware-ish library. Do we really need a benchmark when the reduction in instructions is itself measurable and the PR doesn't particularly harm readability?
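(For illustration, a minimal sketch of the kind of change being discussed, assuming the wrapper keeps its cache as an instance attribute; the class and attribute names here are invented for the example and are not the actual async-lru internals.)

```python
class _WrapperSketch:
    """Illustrative stand-in for a cache wrapper; not async-lru's real code."""

    def __init__(self) -> None:
        self._cache: dict = {}
        self._hits: int = 0

    def lookup_before(self, key):
        # Every dotted access below is a separate attribute lookup,
        # i.e. another PyObject_GetAttr call at the C level.
        if key in self._cache:
            self._hits += 1
            return self._cache[key]
        return None

    def lookup_after(self, key):
        # Hoist the attribute into a local once, then reuse the local:
        # one attribute lookup instead of two on the hit path.
        cache = self._cache
        if key in cache:
            self._hits += 1
            return cache[key]
        return None
```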
It's generally frowned upon to modify Python code and make it less readable for nanosecond optimisation. If there's no real-world benefit to such optimisation, which can generally be seen in realistic benchmarks (e.g. like some of the small optimisations in aiohttp), then there's likely no point in spending time on this. There are much better approaches that could improve performance by several orders of magnitude more. These optimisations may also depend on the platform they run on. For example, if we compiled this code with mypyc, there may be exactly 0 difference in performance.
In aiohttp we've had several micro-optimisations that went through because they made a significant difference (e.g. 5% in a tight benchmark) to performance measured in codspeed. Likewise, we also rejected several micro-optimisations where the difference was negligible.
I'd agree, but I don't think any of that is happening here. More thoughts below.
Respectfully, we're spending more time on this discussion than it took me to refactor out all 5 uses of the dot operator!
Sure, but this is both low-hanging fruit and already implemented. Nothing in this PR or review prevents any of the other approaches from also being implemented.
This is where I'll have to disagree; I've merged 10 PRs into the mypyc codebase in just the past two weeks. If you compile both versions of this code with mypyc, which is exactly what I'm doing (you can check out the C code on my fork!) and the reason I ended up here in the first place, the difference is even more obvious in the C code. There is zero chance that "calling a C function 5 extra times" and "calling a C function 0 extra times" have "0 difference in performance" when compiled with mypyc. An impossibility, really. A difference is essentially guaranteed: the CPU receives fewer instructions and goes through fewer wasted cycles. The only real question, as you've noted above, is the magnitude. But anyhow, I suppose none of this matters if we don't have an actual performance target in mind. Are you saying the improvement must be > 5% in order to merge a microoptimization PR?
This is still dependent on the current state of mypyc though. I suspect the difference you see is because OrderedDict is not native in mypyc today; that could change in a future release. The documentation explicitly says not to do these kinds of micro-optimisations: https://mypyc.readthedocs.io/en/latest/performance_tips_and_tricks.html#using-fast-native-features It's entirely possible that a future release sees this change become a performance degradation. Without meaningful benchmarks to catch any such changes, it doesn't seem worthwhile to me.
Mypyc is not able to safely optimize away any of the PyObject_GetAttr calls, either now or in the future, since that would violate Python semantics: the __get__ method of descriptors would not be called as expected. You see a difference because one version of the code calls PyObject_GetAttr more times than the other, for no benefit, as we aren't accessing descriptors here. In the event of future optimizations, every instance where we currently have a GetAttr call would become equally faster, but again, it would never become the case that plural calls are faster than a singular call. That's simply not possible. Again, what number are you looking for in a benchmark? Is the minimum performance impact really 5%?
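(A small self-contained example of the semantics being referenced: because a descriptor's __get__ can have observable side effects, two attribute accesses cannot in general be merged into one without changing behaviour. The class names are invented for illustration.)

```python
class CountingDescriptor:
    """A data descriptor whose __get__ has an observable side effect."""

    def __init__(self) -> None:
        self.reads = 0

    def __get__(self, obj, objtype=None):
        if obj is None:          # accessed on the class itself
            return self
        self.reads += 1          # side effect on every instance access
        return obj._value

    def __set__(self, obj, value) -> None:
        obj._value = value


class Box:
    value = CountingDescriptor()

    def __init__(self, v) -> None:
        self.value = v


b = Box(10)
_ = b.value
_ = b.value
print(Box.value.reads)  # 2: each dotted access ran __get__, so the two
                        # lookups can't be collapsed into one without
                        # changing observable Python behaviour.
```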
I know, I'm the one who wrote the PR; it's in review already lol. OrderedDict won't necessarily become a fully-fledged native class, but we will be able to use the more specific PyDict_GetItem function from Python's C-API, just like we'd currently do for a true instance of dict. But that isn't hugely relevant here, because when compiled with mypyc these C-API calls become C-struct member lookups (and even then, a single lookup takes less time than several).
I'm familiar with those docs, but that documentation specifically says not to do early binding, which only applies to methods. If you look at that example, the value they are caching is not a plain object; it's a method which is bound to the object in question. That's what they refer to with "binding". We aren't doing any of that here: we are looking up attributes one time, and then using them repeatedly after that without any extra lookups (see the sketch below).
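(To make the distinction concrete, a hedged sketch: the first pattern caches a bound method in a local, which is the kind of manual tweak the mypyc docs warn against because it can bypass early binding of the call; the second just hoists a plain attribute value into a local, as this PR does. The names are invented for the example.)

```python
class Collector:
    def __init__(self) -> None:
        self.items: list[int] = []

    def add(self, x: int) -> None:
        self.items.append(x)


def fill_with_cached_method(c: Collector, n: int) -> None:
    # Caching the *bound method*: the pattern the mypyc docs advise against,
    # since `c.add(i)` inside the loop could otherwise be an early-bound call.
    add = c.add
    for i in range(n):
        add(i)


def fill_with_hoisted_attribute(c: Collector, n: int) -> None:
    # Hoisting a plain attribute *value* into a local: no method binding
    # involved, just avoiding a repeated lookup of the same object.
    items = c.items
    for i in range(n):
        items.append(i)
```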
If you follow the link, it says "References to functions, types, most attributes, and methods in the same compilation unit use early binding". So, my understanding is that if OrderedDict becomes native and the async-lru class is being compiled as native, then what it says still applies and this change would actually decrease performance.
My understanding was that this is exactly what it does for native classes: because it can see the entire class upfront, it can know that no descriptors exist. But, regardless, some functionality is simply not supported with native classes (I've had to opt out of native classes on occasion for some projects).
Yeah this is mostly the case, but it just replaces the C API call with a struct member lookup in each case where a C API call previously existed as noted above
Though this is actually a really good idea for an optimization; I'll try to encode this logic into a mypyc PR. We'd essentially just need to detect whether a particular native class defines its own …

Do we have a numeric performance-improvement target in mind for the benchmark, or should I just consider this PR dead? At this point I've asked three times what number would convince you and you haven't given me anything to work with. I could always just fork and compile …
I'd at least want a regression test, to know if some future change breaks the performance improvement here. So, let's say if the improvement is enough to stand out from the noise of a benchmark, then we'll use it. codspeed does some things to eliminate syscalls etc. from benchmarks, but they'll still have a degree of random noise.
Okay great, thanks. I'll try to put that together this week.
Benchmarks added, let me know if this looks like what you had in mind.
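(For context, a minimal sketch of what such a cache-hit benchmark might look like; it assumes pytest-codspeed's @pytest.mark.benchmark marker and async_lru.alru_cache, and the actual benchmark added here may be structured differently.)

```python
import asyncio

import pytest
from async_lru import alru_cache


@pytest.mark.benchmark  # collected by pytest-codspeed in CI
def test_cache_hit_path() -> None:
    @alru_cache(maxsize=32)
    async def cached(x: int) -> int:
        return x

    async def run() -> None:
        await cached(1)            # first call populates the cache
        for _ in range(1_000):
            await cached(1)        # subsequent calls exercise the hit path

    asyncio.run(run())
```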
Probably also want this in a separate PR in order to see the change this PR produces.
Looks like I was using an invalid … I've replaced it with …
I'd just look at how it's been done on aiohttp, multidict, etc. I've added @bdraco to review, so hopefully he can also provide some guidance.
I created a new PR #688 with just the codspeed stuff. Once we merge that, I can rebase this PR on top to clean up the diff.
Codecov Report

✅ All modified and coverable lines are covered by tests.

Additional details and impacted files:

@@ Coverage Diff @@
## master #683 +/- ##
==========================================
+ Coverage 97.22% 97.23% +0.01%
==========================================
Files 12 12
Lines 793 796 +3
Branches 41 41
==========================================
+ Hits 771 774 +3
Misses 20 20
Partials 2 2
CodSpeed Performance Report

Merging #683 will not alter performance.
This PR just microoptimizes attribute lookups in _LRUCacheWrapper.__call__ to streamline call speed.