gh-123471: Make itertools.chain thread-safe #135689


Open · eendebakpt wants to merge 6 commits into main

Conversation

@eendebakpt (Contributor) commented Jun 18, 2025

itertools.chain has two attributes that are mutated during iteration, making it not thread-safe.

Options to make it thread-safe:

i) Use a lock (simple, but adds a performance penalty for the single-threaded case)
ii) Make the iterator thread-safe using atomics.

For the second option we can stop clearing lz->source on exhaustion (this is a standard mitigation: it avoids setting lz->source to NULL, and doing a decref, while another thread is still using lz->source). To signal that lz->source is exhausted we can either add a new attribute, or replace the while (lz->source != NULL) with while (true). This creates a slower path once the iterator is exhausted, but typically performance is no longer relevant at that point.

The lz->active is more problematic. Once lz->active is exhausted, we need to replace it with a new iterator (e.g. via PyIter_Next(lz->source)). But lz->active only has a single reference count, so updating it in one thread means decrefing the old lz->active while another thread may still be using it. Possible mitigations:

a) atomically load-and-incref lz->active in chain_next
b) add a new attribute that keeps references to all the exhausted values of lz->active, and clear it once the chainobject itself is deallocated.

The second option is complex and has some performance issues of its own (e.g. more increfs/decrefs, and exhausted iterators staying alive in memory longer), so in this PR we pick the first option.
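
For illustration, a pure-Python sketch of option i) (a sketch only: threading.Lock stands in for the critical section the C patch uses, and LockedChain is a made-up name, not code from this PR):

import threading

class LockedChain:
    # Sketch of option i): a single lock guards both mutated attributes
    # (the source of iterables and the active iterator), so concurrent
    # next() calls cannot interleave.
    def __init__(self, *iterables):
        self._source = iter(iterables)
        self._active = None
        self._lock = threading.Lock()

    def __iter__(self):
        return self

    def __next__(self):
        with self._lock:  # the C patch uses a critical section here
            while True:
                if self._active is None:
                    # Raises StopIteration once all iterables are consumed.
                    self._active = iter(next(self._source))
                try:
                    return next(self._active)
                except StopIteration:
                    self._active = None

print(list(LockedChain("ABC", "DEF")))  # ['A', 'B', 'C', 'D', 'E', 'F']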

@rhettinger removed their request for review June 18, 2025 20:58

@rhettinger (Contributor) commented Jun 18, 2025

Options to make it thread-safe:

i) Use a lock (simple, but adds a performance penalty for the single-threaded case)
ii) Make the iterator thread-safe using atomics.

Can you use an #ifdef to have different code paths for the gil/nogil builds? Ideally, the GIL case should be kept in its current form: clean, fast, and highly optimized.

ISTM that most of the "make thread-safe" patches going in everywhere from heapq to itertools add overhead to the builds that don't get any benefit from it. The code is getting harder to read and maintain. AFAICT most of these edits are going in without tests, so it is hard to tell whether they are achieving their goal. Also, there seem to be differing notions of what "thread-safe" means, from just not-segfaulting to strict guarantees that underlying iterators are never called concurrently.

Also, have you been testing for performance impacts on the standard GIL build that everyone uses in production? Almost the entire point of itertools is to add minimal overhead to the underlying data producers. The chain_next calls are effectively a very tight loop where adding almost anything degrades the performance that people were seeking when they chose to link various itertools together. In the case of chain, the alternative is a nested for-loop: for iterator in iterators: for value in iterator: .... If the chain itertool can't beat the pure-Python alternative, it loses part of its raison d'être.
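
Spelled out (consume_all and process are hypothetical names for illustration, not from the PR), that nested-loop alternative is:

def consume_all(iterators, process=print):  # process() stands in for the loop body
    for iterator in iterators:
        for value in iterator:
            process(value)

consume_all([iter("ABC"), iter("DEF")])  # prints A B C D E F, one per line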

An inline review thread on this excerpt from the PR's test:

barrier.wait()
while True:
    try:
        _ = next(it)
    except StopIteration:
        break
A Member commented:
Why the extra assignment to _?

@eendebakpt (author) replied:
Some linters complain if the value is not assigned. I can remove it, though; it is not needed here.

The Member replied:
Linters tend to complain about a lot of things in the source. I'd just remove it.

@ZeroIntensity (Member) commented:

Can you use an #ifdef to have different code paths for the gil/nogil builds? Ideally, the GIL case should be kept in its current form: clean, fast, and highly optimized.

The lock is a critical section. It's completely compiled away on the normal build, so there shouldn't be any overhead. If you're worried about the extra nested function call, I'm pretty sure that the compiler will inline it.

Also, there seem to be differing notions of what "thread-safe" means, from just not-segfaulting to strict guarantees that underlying iterators are never called concurrently.

Yeah, there's a bit of a double-standard here. Basically:

  • Do people use the function/object concurrently?
    • Yes: It's actually usable in a multithreaded environment.
    • No: It's only memory-safe.

It's not totally clear to me where itertools falls. I'm aware of itertools.cycle being used in multithreaded cases, but maybe not chain?
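
For instance, a hypothetical usage sketch (not from this PR) of itertools.cycle shared across threads to round-robin work:

import itertools
import threading

# Each worker pulls the next backend from one shared cycle, so next()
# is called concurrently on the same iterator object.
backends = itertools.cycle(["server-a", "server-b", "server-c"])
chosen = []
chosen_lock = threading.Lock()

def worker():
    target = next(backends)  # concurrent next() on the shared cycle
    with chosen_lock:
        chosen.append(target)

threads = [threading.Thread(target=worker) for _ in range(6)]
for t in threads:
    t.start()
for t in threads:
    t.join()
print(chosen)  # backends handed out round-robin across workers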

@eendebakpt (Contributor, author) commented:

Can you use an #ifdef to have different code paths for the gil/nogil builds? Ideally, the GIL case should be kept in its current form: clean, fast, and highly optimized.

The refactoring with chain_next_lock_held is needed because we cannot return from inside a critical section. Rewriting without the helper would mean using gotos to exit; that variant is here: main...eendebakpt:chain_next_v2. Because the method_lock_held design is used elsewhere in the codebase, I prefer that approach.

ISTM that most of the "make thread-safe" patches going in everywhere from heapq to itertools add overhead to the builds that don't get any benefit from it. The code is getting harder to read and maintain. AFAICT most of these edits are going in without tests, so it is hard to tell whether they are achieving their goal. Also, there seem to be differing notions of what "thread-safe" means, from just not-segfaulting to strict guarantees that underlying iterators are never called concurrently.

The fact that the code is getting more complex is imo unavoidable with PEP 703. The (conditional) acceptance of the PEP is a trade-off between more complexity and the benefit of free-threaded performance. In this PR the goal is to avoid crashes in the free-threaded build (see #124397). The test from the PR segfaults on my system on current main (the probability of the segfault depends on the number_of_iterations parameter). In the PR title and news entry I have used the term "thread-safe" because this is what is now commonly used. I am open to suggestions for a better formulation, as "thread-safe" often suggests more safety than there really is.
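
A minimal reproduction sketch along those lines (illustrative only; it assumes a free-threaded build, and number_of_iterations and the test shape here are stand-ins, not the PR's actual test):

import itertools
import threading

number_of_iterations = 200  # higher values make a pre-patch crash more likely

for _ in range(number_of_iterations):
    it = itertools.chain(range(100), range(100))
    barrier = threading.Barrier(4)

    def work():
        barrier.wait()  # release all threads at once
        while True:
            try:
                next(it)
            except StopIteration:
                break

    threads = [threading.Thread(target=work) for _ in range(4)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()

On a GIL build this simply finishes; on a free-threaded build without the lock it can crash.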

Also, have you been testing for performance impacts on the standard GIL build that everyone uses in production? Almost the entire point of itertools is to add minimal overhead to the underlying data producers. The chain_next calls are effectively a very tight loop where adding almost anything degrades the performance that people were seeking when they chose to link various itertools together. In the case of chain, the alternative is a nested for-loop: for iterator in iterators: for value in iterator: .... If the chain itertool can't beat the pure-Python alternative, it loses part of its raison d'être.

As Peter commented above, there is no performance impact to be expected from this PR. I tested this with the script below, and the performance variation is within the noise. In the script I also compared the current C implementation with the pure-Python implementation from the itertools documentation. The pure-Python implementation is about 18% slower. Based on that difference, I would say the main reason for itertools.chain is to allow for functional programming and cleaner code. Another way of looking at the performance: suppose itertools.chain were coded in Python, would we accept a PR to convert it to C for an 18% performance increase?

Benchmark script
import itertools
import statistics
import time


def generator_chain(*iterables):
    # chain('ABC', 'DEF') → A B C D E F
    for iterable in iterables:
        yield from iterable


class python_chain:
    def __init__(self, *iterables):
        self.iterables = iter(iterables)
        self.active = None

    def __next__(self):
        if self.active is None:
            self.active = iter(next(self.iterables))

        while True:
            try:
                return next(self.active)
            except StopIteration:
                self.active = iter(next(self.iterables))

    def __iter__(self):
        return self


data = [(1, 2, 3)] * 2
N = 10_000

times = [[], [], []]
options = [itertools.chain, generator_chain, python_chain]

for _ in range(60):
    for jj, chain_method in enumerate(options):
        t0 = time.perf_counter()
        for _ in range(N):
            s = 0
            for x in chain_method(*data):
                s += x**2
        times[jj].append(time.perf_counter() - t0)


for jj, chain_method in enumerate(options):
    n = len(times[jj])
    w = sum(times[jj])  # total time over all runs
    m = max(times[jj]) * n  # slowest run, scaled to a total
    v = statistics.median(times[jj]) * n  # median run, scaled to a total

    if jj == 0:
        w0 = w  # itertools.chain is the baseline
    ratio = w / w0
    print(f"{chain_method} total {w:.3f} / max {m:.3f} / median {v:.3f} ratio {ratio:.3f}")
