gh-123471: Make itertools.chain thread-safe #135689


Open · eendebakpt wants to merge 6 commits into main

Conversation

@eendebakpt (Contributor) commented Jun 18, 2025

itertools.chain has two attributes that are mutated during iteration, making it not thread-safe.

Options to make it thread-safe:

i) Use a lock (simple, but adds a performance penalty for the single-threaded case)
ii) Make the iterator thread-safe using atomics.

For the second option we can stop clearing lz->source on exhaustion (this is a standard mitigation: it avoids setting lz->source to NULL, and doing a decref, while another thread is still using lz->source). To signal that lz->source is exhausted we can either add a new attribute, or replace the while (lz->source != NULL) with while (true). This creates a slower path once the iterator is exhausted, but typically performance is no longer relevant at that point.

The lz->active is more problematic. Once lz->active is exhausted, we need to replace it with a new iterator (e.g. via PyIter_Next(lz->source)). But lz->active only has a single reference count, so updating it in one thread means decrefing the old lz->active while another thread may still be using it. Possible mitigations:

a) atomically load-and-incref lz->active in chain_next
b) add a new attribute that keeps references to all the exhausted values of lz->active, and clear it once the chainobject itself is deallocated.

The second option is complex and has some performance issues of its own (e.g. more increfs/decrefs, and exhausted iterators staying alive in memory longer), so in this PR we pick the first option.
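
For illustration, a pure-Python sketch of option i) (a sketch only: threading.Lock stands in for the critical section the C patch uses, and LockedChain is a made-up name, not code from this PR):

import threading

class LockedChain:
    # Sketch of option i): a single lock guards both mutated attributes
    # (the source of iterables and the active iterator), so concurrent
    # next() calls cannot interleave.
    def __init__(self, *iterables):
        self._source = iter(iterables)
        self._active = None
        self._lock = threading.Lock()

    def __iter__(self):
        return self

    def __next__(self):
        with self._lock:  # the C patch uses a critical section here
            while True:
                if self._active is None:
                    # Raises StopIteration once all iterables are consumed.
                    self._active = iter(next(self._source))
                try:
                    return next(self._active)
                except StopIteration:
                    self._active = None

print(list(LockedChain("ABC", "DEF")))  # ['A', 'B', 'C', 'D', 'E', 'F']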

@rhettinger removed their request for review June 18, 2025 20:58

@rhettinger (Contributor) commented Jun 18, 2025

Options to make it thread-safe:

i) Use a lock (simple, but adds a performance penalty for the single-threaded case)
ii) Make the iterator thread-safe using atomics.

Can you use an #ifdef to have different code paths for the gil/nogil builds? Ideally, the GIL case should be kept in its current form: clean, fast, and highly optimized.

ISTM that most of the "make thread-safe" patches going in everywhere from heapq to itertools add overhead to the builds that don't get any benefit from it. The code is getting harder to read and maintain. AFAICT most of these edits are going in without tests, so it is hard to tell whether they are achieving their goal. Also, there seem to be differing notions of what "thread-safe" means, from just not-segfaulting to strict guarantees that underlying iterators are never called concurrently.

Also, have you been testing for performance impacts on the standard GIL build that everyone uses in production? Almost the entire point of itertools is to add minimal overhead to the underlying data producers. The chain_next calls are effectively a very tight loop where adding almost anything degrades the performance that people were seeking when they chose to link various itertools together. In the case of chain, the alternative is a nested for-loop: for iterator in iterators: for value in iterator: .... If the chain itertool can't beat the pure-Python alternative, it loses part of its raison d'être.
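
Spelled out (consume_all and process are hypothetical names for illustration, not from the PR), that nested-loop alternative is:

def consume_all(iterators, process=print):  # process() stands in for the loop body
    for iterator in iterators:
        for value in iterator:
            process(value)

consume_all([iter("ABC"), iter("DEF")])  # prints A B C D E F, one per line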

An inline review thread on this excerpt from the PR's test:

barrier.wait()
while True:
    try:
        _ = next(it)
    except StopIteration:
        break
A Member commented:
Why the extra assignment to _?

@eendebakpt (author) replied:
Some linters complain if the value is not assigned. I can remove it, though; it is not needed here.

The Member replied:
Linters tend to complain about a lot of things in the source. I'd just remove it.

@ZeroIntensity (Member) commented:

Can you use an #ifdef to have different code paths for the gil/nogil builds? Ideally, the GIL case should be kept in its current form: clean, fast, and highly optimized.

The lock is a critical section. It's completely compiled away on the normal build, so there shouldn't be any overhead. If you're worried about the extra nested function call, I'm pretty sure that the compiler will inline it.

Also, there seem to be differing notions of what "thread-safe" means, from just not-segfaulting to strict guarantees that underlying iterators are never called concurrently.

Yeah, there's a bit of a double-standard here. Basically:

  • Do people use the function/object concurrently?
    • Yes: It's actually usable in a multithreaded environment.
    • No: It's only memory-safe.

It's not totally clear to me where itertools falls. I'm aware of itertools.cycle being used in multithreaded cases, but maybe not chain?
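
For instance, a hypothetical usage sketch (not from this PR) of itertools.cycle shared across threads to round-robin work:

import itertools
import threading

# Each worker pulls the next backend from one shared cycle, so next()
# is called concurrently on the same iterator object.
backends = itertools.cycle(["server-a", "server-b", "server-c"])
chosen = []
chosen_lock = threading.Lock()

def worker():
    target = next(backends)  # concurrent next() on the shared cycle
    with chosen_lock:
        chosen.append(target)

threads = [threading.Thread(target=worker) for _ in range(6)]
for t in threads:
    t.start()
for t in threads:
    t.join()
print(chosen)  # backends handed out round-robin across workers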

@eendebakpt (Contributor, author) commented:

Can you use an #ifdef to have different code paths for the gil/nogil builds? Ideally, the GIL case should be kept in its current form: clean, fast, and highly optimized.

The refactoring with chain_next_lock_held is needed because we cannot return from inside a critical section. Rewriting without the helper would mean using gotos to exit; that variant is here: main...eendebakpt:chain_next_v2. Because the method_lock_held design is used elsewhere in the codebase, I prefer that approach.

ISTM that most of the "make thread-safe" patches going in everywhere from heapq to itertools add overhead to the builds that don't get any benefit from it. The code is getting harder to read and maintain. AFAICT most of these edits are going in without tests, so it is hard to tell whether they are achieving their goal. Also, there seem to be differing notions of what "thread-safe" means, from just not-segfaulting to strict guarantees that underlying iterators are never called concurrently.

The fact that the code is getting more complex is imo unavoidable with PEP 703. The (conditional) acceptance of the PEP is a trade-off between more complexity and the benefit of free-threaded performance. In this PR the goal is to avoid crashes in the free-threaded build (see #124397). The test from the PR segfaults on my system on current main (the probability of the segfault depends on the number_of_iterations parameter). In the PR title and news entry I have used the term "thread-safe" because this is what is now commonly used. I am open to suggestions for a better formulation, as "thread-safe" often suggests more safety than there really is.
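
A minimal reproduction sketch along those lines (illustrative only; it assumes a free-threaded build, and number_of_iterations and the test shape here are stand-ins, not the PR's actual test):

import itertools
import threading

number_of_iterations = 200  # higher values make a pre-patch crash more likely

for _ in range(number_of_iterations):
    it = itertools.chain(range(100), range(100))
    barrier = threading.Barrier(4)

    def work():
        barrier.wait()  # release all threads at once
        while True:
            try:
                next(it)
            except StopIteration:
                break

    threads = [threading.Thread(target=work) for _ in range(4)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()

On a GIL build this simply finishes; on a free-threaded build without the lock it can crash.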

Also, have you been testing for performance impacts on the standard GIL build that everyone uses in production? Almost the entire point of itertools is to add minimal overhead to the underlying data producers. The chain_next calls are effectively a very tight loop where adding almost anything degrades the performance that people were seeking when they chose to link various itertools together. In the case of chain, the alternative is a nested for-loop: for iterator in iterators: for value in iterator: .... If the chain itertool can't beat the pure-Python alternative, it loses part of its raison d'être.

As Peter commented above, there is no performance impact to be expected from this PR. I tested this with the script below, and the performance variation is within the noise. In the script I also compared the current C implementation with the pure-Python implementation from the itertools documentation. The pure-Python implementation is about 18% slower. Based on that difference, I would say the main reason for itertools.chain is to allow for functional programming and cleaner code. Another way of looking at the performance: suppose itertools.chain were coded in Python, would we accept a PR to convert it to C for an 18% performance increase?

Benchmark script
import itertools
import statistics
import time


def generator_chain(*iterables):
    # chain('ABC', 'DEF') → A B C D E F
    for iterable in iterables:
        yield from iterable


class python_chain:
    def __init__(self, *iterables):
        self.iterables = iter(iterables)
        self.active = None

    def __next__(self):
        if self.active is None:
            self.active = iter(next(self.iterables))

        while True:
            try:
                return next(self.active)
            except StopIteration:
                self.active = iter(next(self.iterables))

    def __iter__(self):
        return self


data = [(1, 2, 3)] * 2
N = 10_000

times = [[], [], []]
options = [itertools.chain, generator_chain, python_chain]

for _ in range(60):
    for jj, chain_method in enumerate(options):
        t0 = time.perf_counter()
        for _ in range(N):
            s = 0
            for x in chain_method(*data):
                s += x**2
        times[jj].append(time.perf_counter() - t0)


for jj, chain_method in enumerate(options):
    n = len(times[jj])
    w = sum(times[jj])  # total time over all runs
    m = max(times[jj]) * n  # slowest run, scaled to a total
    v = statistics.median(times[jj]) * n  # median run, scaled to a total

    if jj == 0:
        w0 = w  # itertools.chain is the baseline
    ratio = w / w0
    print(f"{chain_method} total {w:.3f} / max {m:.3f} / median {v:.3f} ratio {ratio:.3f}")
