@jparise jparise commented Apr 2, 2025

When the set of ignored names doesn't use shell-style wildcards, we can use the faster `frozenset.__contains__` base implementation rather than the more expensive iteration-based `fnmatch` approach. This is true for the default ignore list, and I expect it's by far the more common case, even for those who add their own ignored names.

In a simple local benchmark, this results in a 2x speed improvement for that common (default) path, which I think justifies the small additional complexity.
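
A minimal sketch of the fast path described above, not the code from this PR: the class name `NameSet` and the attribute `_has_wildcards` are hypothetical, chosen for illustration. It assumes membership should fall back to `fnmatch.fnmatch` only when at least one stored name contains `*`, `?`, or `[`.

```python
from fnmatch import fnmatch
from typing import Iterable


class NameSet(frozenset):
    """Hypothetical set of ignored names with an exact-match fast path."""

    _has_wildcards: bool

    def __new__(cls, iterable: Iterable[str]):
        # Materialize once so a one-shot iterator can be scanned after construction.
        names = tuple(iterable)
        obj = super().__new__(cls, names)
        # Decide a single time whether any stored name is a shell-style pattern.
        obj._has_wildcards = any("*" in n or "?" in n or "[" in n for n in names)
        return obj

    def __contains__(self, name: object) -> bool:
        if not self._has_wildcards:
            # Fast path: plain hash lookup, no iteration.
            return super().__contains__(name)
        # Slow path: test the name against every stored pattern.
        return isinstance(name, str) and any(fnmatch(name, p) for p in self)
```

With only literal names, `"__file__" in NameSet(["__file__"])` is a single hash lookup; with a pattern such as `"*.py"` in the set, every membership test iterates over the stored patterns.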

@jparise jparise requested a review from sigmavirus24 April 2, 2025 15:35
@jparise jparise force-pushed the nameset-fast-path branch from a5d04e8 to 8a091f8 Compare April 2, 2025 15:37
@jparise jparise force-pushed the nameset-fast-path branch from 8a091f8 to 53491e1 Compare April 2, 2025 15:37

    def __new__(cls, iterable: Iterable[str]):
        obj = super().__new__(cls, iterable)
        obj._fnmatch = any(c in r"*?[" for name in iterable for c in name)

Is it faster to loop over the characters in the name, or over the three characters in the raw string (which doesn't strictly need to be raw)?

Also, would it be faster still to turn a name into a frozenset and look for its intersection with a frozenset of those three characters?

@jparise jparise Apr 2, 2025

It's actually fastest (by 2-3x) to check one wildcard character at a time, presumably because we hit a memchr or similar fast path internally:

any("*" in s or "?" in s or "[" in s for s in iterable)
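
For anyone who wants to reproduce the comparison, here is a rough `timeit` sketch of the three detection strategies discussed in this thread. The function names and the sample name list are assumptions made for illustration, and the timings depend on the interpreter and the input, so no numbers are claimed here.

```python
import timeit

WILDCARDS = "*?["
WILDCARD_SET = frozenset(WILDCARDS)


def by_char_loop(names):
    # Form from the diff: test every character of every name.
    return any(c in WILDCARDS for name in names for c in name)


def by_set_intersection(names):
    # Reviewer's idea: build a frozenset per name and intersect it.
    return any(WILDCARD_SET & frozenset(name) for name in names)


def by_substring(names):
    # One substring search per wildcard character.
    return any("*" in s or "?" in s or "[" in s for s in names)


if __name__ == "__main__":
    # Sample input with no wildcards, so every strategy scans all names.
    sample = ["__file__", "__builtins__", "__annotations__", "WindowsError"]
    for fn in (by_char_loop, by_set_intersection, by_substring):
        seconds = timeit.timeit(lambda: fn(sample), number=20_000)
        print(f"{fn.__name__}: {seconds:.4f}s")
```

All three functions return the same answer for a given list of names; only the cost of getting there differs.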

@jparise jparise merged commit 95db1bc into PyCQA:main Apr 3, 2025
6 checks passed
@jparise jparise deleted the nameset-fast-path branch April 3, 2025 00:13