
Fix std.Progress panic #23662


Closed · wants to merge 4 commits

Conversation

achan1989 (Contributor)

Fixes #21663

The main issue was an ABA problem in the maintenance of the freelist. This has been fixed by making node_freelist_first into a generational index.

Also included in this PR are fixes for some other issues I encountered while investigating this.

Here is an example program that reproduces the issue and confirms the fix.

Run it as "repro 20 3" to reproduce the main issue. Run it as "repro 200 3" to reproduce the issue addressed by the commit "Fix maintenance of node_end_index when out of node storage memory".

const std = @import("std");
const Progress = std.Progress;
const Thread = std.Thread;

pub fn main() !void {
    var args = std.process.ArgIteratorPosix.init();
    if (args.count != 3) {
        std.debug.print("Usage: repro num_threads steps_per_node\n", .{});
        return;
    }
    _ = args.skip();
    const num_threads = try std.fmt.parseInt(usize, args.next().?, 10);
    if (num_threads > 200) {
        std.debug.print("max 200 threads\n", .{});
        return;
    }
    const steps_per_node = try std.fmt.parseInt(usize, args.next().?, 10);

    var root_progress = Progress.start(.{ .root_name = "root" });
    var threads: [200]Thread = undefined;
    for (0..num_threads) |i| {
        threads[i] = try Thread.spawn(.{}, spam, .{ &root_progress, i, steps_per_node });
    }
    for (0..num_threads) |i| {
        std.debug.print("joining {d}\n", .{i});
        threads[i].join();
    }
}

// Each thread hammers the freelist: repeatedly start a child node,
// complete its steps, and end it.
fn spam(root_progress: *Progress.Node, threadnum: usize, steps_per_node: usize) void {
    var name: [5]u8 = undefined;
    const name_len = std.fmt.formatIntBuf(name[0..name.len], threadnum, 10, .lower, .{ .width = 2 });
    while (true) {
        const node = root_progress.start(name[0..name_len], steps_per_node);
        for (0..steps_per_node) |_| {
            node.completeOne();
        }
        node.end();
    }
}

Prevents a crash in serialize() when there are many threads trying to
call Node.start() at the same time.
Node.init() failed an assert, indicating that we were attempting to
use a node slot that was already in use. This was caused by an ABA
problem involving node_freelist_first.
Fixed by making node_freelist_first into a generational index.
@alexrp requested a review from andrewrk April 26, 2025 13:31
@alexrp added this to the 0.14.1 milestone Apr 26, 2025
@@ -307,7 +307,8 @@ pub const Node = struct {
@atomicStore(u8, &storage.name[name_len], 0, .monotonic);

const parent_ptr = parentByIndex(free_index);
assert(parent_ptr.* == .unused);
const current_parent = @atomicLoad(Node.Parent, parent_ptr, .acquire);
achan1989 (Contributor, Author) commented:
In non-safe builds this introduces an unnecessary acquire-load. Not sure what to do about that.

@@ -246,7 +246,7 @@ pub const Node = struct {
}
const index = n.index.unwrap() orelse return;
const parent_ptr = parentByIndex(index);
if (parent_ptr.unwrap()) |parent_index| {
if (@atomicLoad(Parent, parent_ptr, .acquire).unwrap()) |parent_index| {
achan1989 (Contributor, Author) commented:

I added the acquire-load to match the other acquire-load of parentByIndex() in Node.init(). I think it makes sense, but tbh I'm not 100% sure it's necessary.

.opt_index = @intFromEnum(OptionalIndex.none),
};

fn unwrap_index(i: @This()) ?Index {
Collaborator commented:

Function names in Zig are camelCase.

@mlugg (Member) commented Apr 29, 2025:

Superseded by #23725; thanks for the work here @achan1989!

@mlugg closed this Apr 29, 2025
Successfully merging this pull request may close these issues.

std.Progress panic: reached unreachable code