Python 3.14/3.15a build aborting due to OOM during test_functools / test_json #143460

@Thyre

Bug report

Bug description:

As part of EasyBuild, I'm trying to build Python 3.14.2 for a new toolchain based on GCC 15.2.0 (see PR easybuilders/easybuild-easyconfigs#25006). The build worked fine on the few machines I tested, all x86-64 based, with various distributions. However, it consistently fails on two AArch64 systems with similar hardware (listed under "Hardware information").

The failing builds end with the following output:

0:00:20 load avg: 1.00 [18/43] test_functools
make: *** [Makefile:1020: profile-run-stamp] Killed

During the profiling run, test_functools hangs until it is killed, apparently by the OOM killer. Attaching GDB to the process gives the following stack trace (excerpt):

#79 0x000040000161fa1c in save_reduce (st=st@entry=0x40000153db10, self=self@entry=0x40002cb0c720, args=<optimized out>, obj=obj@entry=0x40002c6fbfa0) at ./Modules/_pickle.c:4273
#80 0x0000400001619154 in save (st=0x40000153db10, self=0x40002cb0c720, obj=0x40002c6fbfa0, pers_save=<optimized out>) at ./Modules/_pickle.c:4555
#81 0x0000400001616898 in store_tuple_elements (state=0x40000153db10, self=0x40002cb0c720, t=0x4000732a9240, len=1) at ./Modules/_pickle.c:2792
#82 0x000040000161a914 in save_tuple (state=0x40000153db10, self=0x40002cb0c720, obj=0x4000732a9240) at ./Modules/_pickle.c:2872
#83 save (st=st@entry=0x40000153db10, self=self@entry=0x40002cb0c720, obj=0x4000732a9240, pers_save=pers_save@entry=0) at ./Modules/_pickle.c:4434
#84 0x000040000161fa1c in save_reduce (st=st@entry=0x40000153db10, self=self@entry=0x40002cb0c720, args=<optimized out>, obj=obj@entry=0x40002c6fbfa0) at ./Modules/_pickle.c:4273
#85 0x0000400001619154 in save (st=0x40000153db10, self=0x40002cb0c720, obj=0x40002c6fbfa0, pers_save=<optimized out>) at ./Modules/_pickle.c:4555
#86 0x0000400001616898 in store_tuple_elements (state=0x40000153db10, self=0x40002cb0c720, t=0x4000732a9200, len=1) at ./Modules/_pickle.c:2792
#87 0x000040000161a914 in save_tuple (state=0x40000153db10, self=0x40002cb0c720, obj=0x4000732a9200) at ./Modules/_pickle.c:2872
#88 save (st=st@entry=0x40000153db10, self=self@entry=0x40002cb0c720, obj=0x4000732a9200, pers_save=pers_save@entry=0) at ./Modules/_pickle.c:4434
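
The repeating save_reduce / save / save_tuple frames are what pickling a self-referencing functools.partial would produce. The following is a minimal sketch for probing this outside the test suite, not the test's actual code. It assumes a POSIX system with the `resource` module, and it runs the probe in a child process with a capped address space so that a runaway recursion cannot exhaust the machine. The names `probe_recursive_pickle` and `CHILD`, the 4 GiB cap, and the timeout are illustrative choices.

```python
import subprocess
import sys
import textwrap

# Child script: caps its own address space, then pickles a partial whose
# `func` is the partial itself (an endlessly recursive reduce chain).
CHILD = textwrap.dedent("""
    import functools, pickle, resource, sys

    cap = int(sys.argv[1])
    soft, hard = resource.getrlimit(resource.RLIMIT_AS)
    if hard != resource.RLIM_INFINITY:
        cap = min(cap, hard)          # never try to exceed the hard limit
    resource.setrlimit(resource.RLIMIT_AS, (cap, hard))

    f = functools.partial(print)
    f.__setstate__((f, (), {}, {}))   # f.func now refers to f
    try:
        pickle.dumps(f)
    except (RecursionError, MemoryError) as exc:
        print(type(exc).__name__)
    else:
        print("no error")
    finally:
        f.__setstate__((print, (), {}, {}))  # break the cycle again
""")


def probe_recursive_pickle(cap_bytes=4 * 1024**3, timeout=300):
    """Run the probe in a child interpreter and summarise what happened."""
    try:
        proc = subprocess.run(
            [sys.executable, "-c", CHILD, str(cap_bytes)],
            capture_output=True, text=True, timeout=timeout,
        )
    except subprocess.TimeoutExpired:
        return "timeout"
    if proc.returncode != 0:
        # A negative value means the child died from a signal (e.g. SIGSEGV).
        return f"exit {proc.returncode}"
    return proc.stdout.strip()


if __name__ == "__main__":
    print(probe_recursive_pickle())
```

On an interpreter whose recursion guard works, this should report RecursionError; anything else (a timeout, a signal exit, or MemoryError) would point at the guard not triggering before memory runs out.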

dmesg confirms that the process was killed by the OOM killer. The systems have more than 400 GB of memory, which I would expect to be sufficient to build Python.

[18354.456973] [3169487]  9049 3169487     1340      616   458752        0             0 python3
[18354.465830] [3192788]  9049 3192788      216        0   458752        0             0 make
[18354.474416] [3206584]  9049 3206584  8964670  8952086 72220672        0             0 python
[18354.483181] [3207839]     0 3207839      145        0   393216        0             0 mmccrmonitor
[18354.492479] [3207840]     0 3207840      579      198   393216        0             0 psid
[18354.501065] oom-kill:constraint=CONSTRAINT_NONE,nodemask=(null),cpuset=nvidia-dcgm.service,mems_allowed=0-1,global_oom,task_memcg=/slurm/user-reuter1/job-14341053/step-0/tasks,task=python,pid=3206584,uid=9049
[18354.520291] Out of memory: Killed process 3206584 (python) total-vm:573738880kB, anon-rss:572924288kB, file-rss:0kB, shmem-rss:9216kB, UID:9049 pgtables:70528kB oom_score_adj:0
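
To put the numbers in this log into perspective, here is a small sketch that converts the kB fields of the final line to GiB and compares them with the page counts from the task-table row for pid 3206584. It assumes the usual column order of the kernel's task dump (pid, uid, tgid, total_vm, rss, pgtables_bytes, ...), where total_vm and rss are counted in pages. The column header is not part of the excerpt above, so treat that as an assumption.

```python
import re

OOM_LINE = (
    "Out of memory: Killed process 3206584 (python) "
    "total-vm:573738880kB, anon-rss:572924288kB, file-rss:0kB, "
    "shmem-rss:9216kB, UID:9049 pgtables:70528kB oom_score_adj:0"
)

# Values copied from the task-table row of the same pid (assumed: in pages).
TASK_TOTAL_VM_PAGES = 8964670
TASK_RSS_PAGES = 8952086


def parse_kb_fields(line):
    """Return every 'name:<number>kB' field of an OOM line as {name: kB}."""
    return {name: int(value) for name, value in re.findall(r"([\w-]+):(\d+)kB", line)}


fields = parse_kb_fields(OOM_LINE)

# kB -> GiB
anon_rss_gib = fields["anon-rss"] / 1024**2
total_vm_gib = fields["total-vm"] / 1024**2

# Page size implied by comparing the kB totals with the page counts.
page_kib_from_vm = fields["total-vm"] / TASK_TOTAL_VM_PAGES
rss_kb = fields["anon-rss"] + fields["file-rss"] + fields["shmem-rss"]
page_kib_from_rss = rss_kb / TASK_RSS_PAGES

print(f"anon-rss: {anon_rss_gib:.1f} GiB, total-vm: {total_vm_gib:.1f} GiB")
print(f"implied page size: {page_kib_from_vm:g} KiB / {page_kib_from_rss:g} KiB")
```

Under these assumptions, the process held roughly 546 GiB resident when it was killed, and both ratios come out at 64, which is consistent with a kernel using 64 KiB pages.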

I tried reducing the recursion depth passed to support.infinite_recursion() in test_recursive_pickle, which yields a (probably expected) test failure:

with support.infinite_recursion(100):

The build then fails later on in test_json for the same reason.

#18 0x0000400001dbd5ec in encoder_listencode_obj (s=s@entry=0x40002cbabdc0, writer=writer@entry=0x40002c4422f0, obj=0x40002c4f47d0, indent_level=indent_level@entry=0, indent_cache=indent_cache@entry=0x0) at ./Modules/_json.c:1549
#19 0x0000400001dbda68 in encoder_listencode_list (s=0x40002cbabdc0, writer=0x40002c4422f0, seq=0x40013ab14680, indent_level=0, indent_cache=0x0) at ./Modules/_json.c:1805
#20 encoder_listencode_obj (s=s@entry=0x40002cbabdc0, writer=writer@entry=0x40002c4422f0, obj=obj@entry=0x40013ab14680, indent_level=indent_level@entry=0, indent_cache=indent_cache@entry=0x0) at ./Modules/_json.c:1519
#21 0x0000400001dbd70c in encoder_listencode_obj (s=s@entry=0x40002cbabdc0, writer=writer@entry=0x40002c4422f0, obj=0x40002c4f47d0, indent_level=indent_level@entry=0, indent_cache=indent_cache@entry=0x0) at ./Modules/_json.c:1560
#22 0x0000400001dbda68 in encoder_listencode_list (s=0x40002cbabdc0, writer=0x40002c4422f0, seq=0x40013ab14640, indent_level=0, indent_cache=0x0) at ./Modules/_json.c:1805
#23 encoder_listencode_obj (s=s@entry=0x40002cbabdc0, writer=writer@entry=0x40002c4422f0, obj=obj@entry=0x40013ab14640, indent_level=indent_level@entry=0, indent_cache=indent_cache@entry=0x0) at ./Modules/_json.c:1519
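
In these frames the obj pointer stays the same (0x40002c4f47d0) while seq changes on every cycle, which is the pattern produced by a JSONEncoder.default() hook that keeps wrapping its argument in a fresh list. The sketch below illustrates that pattern with a hypothetical `WrappingEncoder` that stops after a fixed number of wraps, so it terminates. It is not the code of the failing test, and that the failing test works this way is an assumption based on the trace.

```python
import json


class WrappingEncoder(json.JSONEncoder):
    """Hypothetical encoder whose default() wraps its argument in a new list."""

    def __init__(self, max_wraps, **kwargs):
        super().__init__(**kwargs)
        self.max_wraps = max_wraps
        self.wraps = 0

    def default(self, o):
        if self.wraps >= self.max_wraps:
            return str(o)  # stop wrapping so this sketch terminates
        self.wraps += 1
        return [o]  # a new list on every call; the encoder recurses into it


# check_circular=False, otherwise the encoder rejects the repeated object
# with "Circular reference detected" on the second default() call.
encoded = WrappingEncoder(3, check_circular=False).encode(5j)
print(encoded)  # [[["5j"]]]
```

Without the `max_wraps` cutoff, every level allocates another list and descends one level deeper, so the only thing that ends the encoding is the interpreter's recursion guard.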

It's worth noting that builds on other platforms (Arch Linux, Fedora, Ubuntu 24) with x86 all worked out fine.

As a cross-check, I also tried GCC 14.3.0, with the same result.
Python 3.13.5 builds successfully; Python 3.14.1, 3.14.2 and 3.15.0a3 fail as described above.
The dependencies of Python differed between the two GCC toolchains, but the result was the same.

For builds, the following flags were used:

./configure --prefix=/tmp/software/Python/3.14.2-GCCcore-14.3.0  --build=aarch64-unknown-linux-gnu  --host=aarch64-unknown-linux-gnu  --enable-shared  --with-lto  --enable-optimizations  --with-ensurepip=upgrade

I've also tried using an external expat, with no noticeable differences.


Hardware information:

  • Linux Rocky Linux 9.6, AArch64, ARM UNKNOWN (neoverse_v2), 1 x NVIDIA NVIDIA GH200 480GB, NVIDIA driver 580.95.05, Python 3.9.21
  • Linux RHEL 9.6, AArch64, ARM UNKNOWN (neoverse_v2), 1 x NVIDIA NVIDIA GH200 480GB, 570.133.20, Python 3.9.21

Unfortunately, I'm a bit stuck here. Looking through existing issues, I found #113655, though that issue concerns stack overflows on Windows. I also found PRs like #124264, but Python 3.14 removed these values altogether (#133080). I'm not sure whether the issues I'm seeing are related.

Happy to provide more information, if needed.

CPython versions tested on:

3.15, 3.14, 3.13

Operating systems tested on:

Linux

Labels: OS-linux, tests, type-bug
