Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Hardcoded assumption about numa domains and threads IDs #16

Closed
giordano opened this issue Jul 24, 2023 · 11 comments
Closed

Hardcoded assumption about numa domains and threads IDs #16

giordano opened this issue Jul 24, 2023 · 11 comments

Comments

@giordano
Copy link
Contributor

giordano commented Jul 24, 2023

In the memory domain scaling function there are some hardcoded assumptions about the fact numa domains and thread ids start from 1. They don't hold on Fugaku, where the cores reserved to users are 12:59, and the corresponding NUMA domains are 3:6. I redefined bwscaling_memory_domains as

@eval BandwidthBenchmark function bwscaling_memory_domains(; min_nnuma=1, max_nnuma=nnuma(),
    max_nthreads=maximum(ncores_per_numa()), kwargs...)
    if Threads.nthreads() < (max_nnuma - min_nnuma + 1) * max_nthreads
        throw(ErrorException("Not enough Julia threads. Please start Julia with at least " *
                             "$(max_nnuma * max_nthreads) threads."))
    elseif !all(>=(max_nthreads), ncores_per_numa()[min_nnuma:max_nnuma])
        throw(ErrorException("Not all memory domains have enough cores (at least " *
                             "max_nthreads=$(max_nthreads)). You may try to set the " *
                             "keyword argument max_nthreads to a lower value."))
    end
    # query system information
    numacpuids = cpuids_per_numa()[min_nnuma:max_nnuma]
    filter!.(!ishyperthread, numacpuids) # drop hyperthreads
    results = DataFrame(;
        Function=String[],
        var"# Threads per domain"=Int64[],
        var"# Memory domains"=Int64[],
        var"Rate (MB/s)"=Float64[]
    )
    for nn in min_nnuma:max_nnuma # how many domains to use
        for nt in 1:max_nthreads # how many threads/cores to use per domain
            println("nnuma=$nn, nthreads_per_numa=$nt")
            total_nthreads = nn * nt
            # select cpuids
            cpuids = @views reduce(vcat, numacpuids[i][1:nt] for i in 1:(nn - min_nnuma + 1))
            # pin threads
            pinthreads(cpuids)
            # run benchmark (all kernels)
            df = bwbench(; nthreads=total_nthreads, kwargs...)
            # store and print result
            for row in eachrow(df)
                push!(results, [row.Function, nt, nn, row.var"Rate (MB/s)"])
            end
            println("Bandwidths (MB/s): ", join([round(bw, digits=2) for bw in df.var"Rate (MB/s)"], ", "))
            flush(stdout)
        end
    end
    return results
end

(and this is still assuming numa domains IDs are consecutive, which in other cases may not be true) but this is still crashing with

ERROR: LoadError: ArgumentError: nthreads 50 must ≥ 1 and ≤ 48. If you want more threads, start Julia with a higher number of threads.
Stacktrace:
 [1] bwbench(; N::Int64, niter::Int64, verbose::Bool, nthreads::Int64, alignment::Int64, write_allocate::Bool)
   @ BandwidthBenchmark /data/ra000019/a04463/julia-depot/packages/BandwidthBenchmark/LGH1D/src/bwbench.jl:35
 [2] bwbench
   @ /data/ra000019/a04463/julia-depot/packages/BandwidthBenchmark/LGH1D/src/bwbench.jl:21 [inlined]
 [3] bwscaling_memory_domains(; min_nnuma::Int64, max_nnuma::Int64, max_nthreads::Int64, kwargs::Base.Pairs{Symbol, Union{}, Tuple{}, NamedTuple{(), Tuple{}}})

Edit:

total_nthreads = (nn - min_nnuma + 1) * nt

may help, running again now.

@giordano
Copy link
Contributor Author

Any specific reason why total_nthreads in

total_nthreads = nn * nt
isn't length(cpuids)?

@carstenbauer
Copy link
Member

Well, not really. But the assumption is that we want to scan over nn memory domains with nt threads each. Just changing to length(cpuids) isn't going to change that.

@giordano
Copy link
Contributor Author

It changes with my definition of cpuids above

cpuids = @views reduce(vcat, numacpuids[i][1:nt] for i in 1:(nn - min_nnuma + 1))

😅

@carstenbauer
Copy link
Member

FWIW, I think it would be better to directly support selecting specific NUMA domains for the benchmark than keeping the assumption that they are contiguous.

@carstenbauer
Copy link
Member

It changes with my definition of cpuids above

But total_nthreads = nn * nt also works fine in this case, no?

@carstenbauer
Copy link
Member

carstenbauer commented Jul 24, 2023

Does the following work for you? If yes, I'll push it to master.

"""
Similar to `bwscaling` but measures the memory bandwidth scaling within and across
memory domains. Returns a `DataFrame` in which each row contains the kernel name,
the number of threads per memory domain, the number of domains considered, and the
measured memory bandwidth (in MB/s).

**Keyword arguments**
- `domains`: memory domains to consider (logical indices, i.e. starting at 1)
- `max_nthreads`: maximal number of threads per memory domain to consider
"""
function bwscaling_memory_domains(; domains=1:nnuma(),
    max_nthreads=maximum(ncores_per_numa()), kwargs...)
    if !all(i->i>0 && i<=nnuma(), domains)
        throw(ArgumentError("Invalid argument `numas` (out of bounds)."))
    end
    # query system information
    numacpuids = cpuids_per_numa()[domains]
    filter!.(!ishyperthread, numacpuids) # drop hyperthreads
    if !all(x->length(x) >= max_nthreads, numacpuids)
        throw(ArgumentError("Some memory domains don't have enough CPU-cores. Please provide a smaller value for `max_nthreads`."))
    end
    if Threads.nthreads() < length(domains) * max_nthreads
        throw(ErrorException("Not enough Julia threads. Please start Julia with at least " *
                             "$(length(domains) * max_nthreads) threads."))
    end
    results = DataFrame(;
        Function=String[],
        var"# Threads per domain"=Int64[],
        var"# Memory domains"=Int64[],
        var"Rate (MB/s)"=Float64[]
    )
    for nn in 1:length(domains) # how many domains to use
        for nt in 1:max_nthreads # how many threads/cores to use per domain
            println("domain=$(domains[nn]), nthreads_per_numa=$nt")
            total_nthreads = nn * nt
            # select cpuids
            cpuids = @views reduce(vcat, numacpuids[i][1:nt] for i in 1:nn)
            # @show cpuids
            # pin threads
            pinthreads(cpuids)
            # run benchmark (all kernels)
            df = bwbench(; nthreads=total_nthreads, kwargs...)
            # store and print result
            for row in eachrow(df)
                push!(results, [row.Function, nt, nn, row.var"Rate (MB/s)"])
            end
            println("Bandwidths (MB/s): ", join([round(bw, digits=2) for bw in df.var"Rate (MB/s)"], ", "))
            flush(stdout)
        end
    end
    return results
end

Use as bwscaling_memory_domains(; domains=3:6).

@giordano
Copy link
Contributor Author

But total_nthreads = nn * nt also works fine in this case, no?

No, that causes the error

ERROR: LoadError: ArgumentError: nthreads 50 must ≥ 1 and ≤ 48. If you want more threads, start Julia with a higher number of threads.

length(cpuids) would solve that.

@carstenbauer
Copy link
Member

Does my code above work for you?

@giordano
Copy link
Contributor Author

giordano commented Jul 24, 2023

No:

ERROR: LoadError: ArgumentError: Some memory domains don't have enough CPU-cores. Please provide a smaller value for `max_nthreads`.
Stacktrace:
 [1] bwscaling_memory_domains(; domains::UnitRange{Int64}, max_nthreads::Int64, kwargs::Base.Pairs{Symbol, Int64, Tuple{Symbol, Symbol}, NamedTuple{(:min_nnuma, :max_nnuma), Tuple{Int64, Int64}}})
   @ BandwidthBenchmark /vol0003/ra000019/data/a04463/repo/julia-on-ookami/benchmarks/bandwidth/bench.jl:17

Edit: ah, wait, I forgot to pass the right arguments and it discarded the wrong ones.

@giordano
Copy link
Contributor Author

Ok, yes, bwscaling_memory_domains(; domains=3:6) runs without errors

@carstenbauer
Copy link
Member

Great (closed by e70ec27)

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants