Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fast 1D interpolation #4

Open
3 tasks
sriharshakandala opened this issue Dec 16, 2024 · 14 comments · May be fixed by #3
Open
3 tasks

Fast 1D interpolation #4

sriharshakandala opened this issue Dec 16, 2024 · 14 comments · May be fixed by #3
Assignees
Labels
SDI Software Design Issue

Comments

@sriharshakandala
Copy link
Member

sriharshakandala commented Dec 16, 2024

The Climate Modeling Alliance

Software Design Issue 📜

Purpose

Fast 1D interpolation

Cost/Benefits/Risks

Benefits

Provides fast 1D interpolation.

People and Personnel

Components

Inputs

Results and Deliverables

Fast 1D interpolation

Scope of Work

Tasks

Preview Give feedback

SDI Revision Log

CC

@tapios @sriharshakandala @charleskawczynski @cmbengue

@sriharshakandala sriharshakandala added the SDI Software Design Issue label Dec 16, 2024
@sriharshakandala
Copy link
Member Author

cc: @Sbozzolo

@sriharshakandala sriharshakandala self-assigned this Dec 16, 2024
@sriharshakandala sriharshakandala linked a pull request Dec 16, 2024 that will close this issue
1 task
@Sbozzolo
Copy link
Member

Could you write up the requirements, the proposed design to address them, and how the interface would look like?

@Sbozzolo
Copy link
Member

Sbozzolo commented Dec 17, 2024

I verified that indeed we are not using any interpolation in time, so we don't need that.

I rephrased what, in my view, the requirements are, from this google doc

The instances where Interpolations.jl is currently used are:

  1. Read forcing and parameters from NetCDF files to ClimaCore Fields. This is currently done with the InterpolationsRegridder backend to the SpaceVaryingInputs. This cannot handle the case with 3D z, meaning that first have to resample to a common vertical grid, before putting on the simulation grid. This is also not aware of land-sea masks, meaning that we are currently contaminating costal regions with invalid values in ClimaLand. [*]

  2. Generation of initial data from data read from file.

  • This includes 1D (text) files and 2/3D (NetCDF) files. This is currently not compatible with GPUs, so that people have to go through workarounds (example of error).
  1. Resampling from vertical coordinates to pressure coordiantes in ClimaAnalysis. This is currently slow (but needs to be done only once per simulation).

  2. Resampling from pressure coordiantes to vertical coordiantes in ClimaArtifacts. This is currently slow (but needs to be done only once).

  3. Resampling 2/3D observations and simulated data on a common grid (to compute RMSEs).

In addition to this, Interpolations.jl comes with several dependencies and features we don't want, is not very well maintained, and doesn't have good documentation.

[It is also possible that 2 could be accomplished in a different way (by completing overhauling the way we do initial condition in ClimaAtmos, which is already something we'd like to do).]

From this, it comes that ClimaInterpolations.jl needs to implement:

  • 1D and 2D linear interpolation for possibly non-equispaced, logically cartesian, structured grids
  • Combining 1D + 2D interpolation to interpolate on 3D spaces
    • directly supporting 3D altitude coordinates
    • supporting a user-provided horizontal mask to ignore certain values
  • Support for flat, periodic, linear extrapolation, and throws error boundary conditions
  • Ability to create an interpolator object itp that, when used pointwise, behaves as a normal julia function (for 2)
    • This includes being able to be moved on a GPU and used in a kernel

Nice to have:

  • No/very few external dependencies
    • In particular, no dependency on ClimaCore
    • Dependency on CUDA, if needed, only in an extension
  • 1,2,3D interpolation should not have different interfaces
  • The interpolator object itp should be updatable in-place with ne data with no extra allocations
  • Easy to do 1D interpolation
    • itp = interpolator(x::AbstractVector, y::AbstractVector); itp(x) should just work
  • Faster search algorithm for equispaced or almost equispaced arrays (see DataInterpolations.jl and its dependency FindFirstFunction.jl)
  • Directly supporting interpolation with pressure coordinates

Nice to have, but hard:

  • A smart interpolation scheme for large MPI processes and/or big datasets: meaning that only the data relevant to the current MPI process is loaded (to GPUs)

cc: @tapios @sriharshakandala

[*] In CliMA/ClimaUtilities.jl#143, I implemented support to this feature in a hacky and inefficient way.

@tapios
Copy link

tapios commented Dec 18, 2024

Sounds good. However, we will soon need the interpolation to pressure coordinates and accumulation of averages on them on the fly, i.e., every output step of the model, so this needs to be fast in the end. Hopefully, ClimaInterpolate will enable this.

@charleskawczynski
Copy link
Member

What's the rough timeline for this work to be completed?

@sriharshakandala
Copy link
Member Author

sriharshakandala commented Jan 15, 2025

The following features are currently added in PR #3

  1. Fast 1D interpolation (https://github.com/CliMA/ClimaInterpolations.jl/pull/3/files#diff-ccd7ef9b7ebb158ec8627f0c451207ec99900d9c571f03922529de24ca01e113R43 ) on single column and multiple columns
  2. Fast, multilevel (and single level) bilinear interpolation https://github.com/CliMA/ClimaInterpolations.jl/pull/3/files#diff-e27da1659722e921a941509beaa5c34a5974f382b7e486290b83d69f789f230eR75

Combined together, this also provided 3D interpolation on extruded 3D grids.
Currently, single-threaded CPU, CUDA and Metal support are added.
Docstrings and tests are added. These functions are tested on the above three platforms. Some initial benchmark data is also provided.

I am working on

  1. Setting up the CI
  2. Adding documentation project
  3. Adding the Buildkite pipeline

These will be done within the next few days. I am currently working from the draft branch.

@sriharshakandala
Copy link
Member Author

sriharshakandala commented Jan 15, 2025

Current interfaces:

  1. Linear interpolation
"""
    interpolate1d!(
        ftarget::AbstractArray{FT,N},
        xsource::AbstractArray{FT,N},
        xtarget::AbstractArray{FT,N},
        fsource::AbstractArray{FT,N},
        order,
        extrapolate = Flat(),
    ) where {FT,N}
Interpolate `fsource`, defined on grid `xsource`, onto the `xtarget` grid.
Here the source grid `xsource` is an N-dimensional array of columns.
The first dimension is assumed to be the column dimension. 
Each column can have a different grid.
"""
  1. Bilinear interpolation
"""
    interpolatebilinear!(
        ftarget::AbstractArray{FT,N},
        bilinear::B,
        fsource::AbstractArray{FT,N},
    ) where {FT,N,B}
Interpolate `fsource`, defined on source grid, onto the target grid.
The horizontal source and targets are define in `bilinear`.
Here `fsource` is an N-dimensional array where the last two dimensions are assumed to be horizontal dimensions.
For example, `fsource` can be of size `[n1, n2..., nx, ny]`, where `nx` and `ny` are the horizontal dimensions.
Single horizontal level is also supported.
The number of horizontal levels should be same for both source and target arrays.
"""

@sriharshakandala
Copy link
Member Author

ClimaArtifacts.jl for DYAMONDSummer simulations are updated to used the ClimaInterpolations draft branch above.
The full resolution initial conditions for DYAMONDSummer simulation are successfully tested in the long run at https://buildkite.com/clima/climaatmos-target-gpu-simulations/builds/396#_

@sriharshakandala
Copy link
Member Author

sriharshakandala commented Jan 15, 2025

If people are currently using different interfaces, we will need to add some wrappers for user convenience. I can also add additional optimizations for special cases, as needed.
Another item on the list is to replace are current calls to Interpolations.jl with corresponding ClimaInterpolations.jl calls.

@charleskawczynski
Copy link
Member

charleskawczynski commented Jan 15, 2025

Hi @sriharshakandala, thank you for the information.

@tapios, what's the rough due date for this work to be completed?

@Sbozzolo
Copy link
Member

Sbozzolo commented Jan 15, 2025

My understanding for the feature that was driving the immediate need for a new package was the ability to immediately read files with 3D altitudes without having to pre-process them resampling to a common vertical z. (There are other capabilities we would like in the short term and that we don't have at all, like supporting a mask-aware interpolation to avoid mixing ocean and land values, requiring a new approach/package.)

In other words, we wanted a simpler and more accurate way to interpolate from files to model grids, where files are often defined on a grid defined as follows: (lon, lat, k), with k level and have a 3D array z[lon, lat, k].

Since we already support reading files with (lon, lat, z), the current workflow is to read input data is to preprocess the file and interpolate the it onto a single grid (lat, lon, z), then do linear interpolation to interpolate on every single point of the model grid (using Interpolations.jl). Note that this pre-process step is very slow because Interpolations.jl is not parallel and files can be huge. However, preprocessing is only done once, because once the file is generated, it is reused for all the simulations. Moreover, this problem is embarassingly parallel, and I showed how this can this parallelized (CliMA/ClimaArtifacts@main...gb/dyamond processes the full dyamond inititial conditions in less than 2 hours on the node of the cluster).

My understanding is that the desired workflow would be immediately read the file without the intermediate pre-processing step. This would be simpler and lead to more accurate vertical interpolation.

We can do 1D, 2D (or nD) linear interpolations with Interpolations.jl, but the capability we don't have is to immediately read a file (lat, lon, k) with z[lat, lon, k] (3D altitudes) and use the full 3D altitudes without intermediate step (one step vs two steps). I don't think this problem cannot be trivially decomposed into 2D+1D because neighboring columns can have different z vectors (it can still be decomposed in 2D + 1D, but it requires more work and more care in correctly searching the neighboring points, as I show in CliMA/ClimaUtilities.jl#143).

I showed how this can be done in CliMA/ClimaUtilities.jl#143. This PR shows a proof of concept of how to remove the need for the pre-processing step and do more accurate interpolation with 3D altitudes. This also solves the issue with pre-processing being slow (because pre-processing is not done).

The interface in that PR looks like:

# Arguments
- `data`: A 3D array of data values.
- `xs`: A vector of x-coordinates corresponding to the first dimension of `data`.
- `ys`: A vector of y-coordinates corresponding to the second dimension of `data`.
- `zs`: A 3D array of z-coordinates. `zs[i, j, :]` provides the z-coordinates for the data point `data[i, j, :]`.
- `target_x`: The x-coordinate of the target point.
- `target_y`: The y-coordinate of the target point.
- `target_z`: The z-coordinate of the target point.
"""
interpolation_3d_z(data, xs, ys, zs, target_x, target_y, target_z)

(this is more or less the interface we discussed in the meeting with Tapio and Zhaoyi in December).

It doesn't look like #3 is addressing this capability and the functions it provides are functions we already have (yes, ours are not vectorized, but that's not as important for generating initial data).

Note also that in addition to this capability, there's a big chunk of work related to supporting new interpolations schemes into our simulations that I don't think has been considered yet.

ClimaUtilities is responsable for reading in data from files to the model grids. It currently implements two backends to deal with gridded data:

  • TempestRegridder, that uses ClimaCoreTempestRemap
  • InterpolationsRegridder, that uses Interpolations.jl

If we want to use ClimaInterpolations to read in input files, either ClimaInterpolations uses with the same interface as Interpolations.jl (which it doesn't in the current PR), or a new backend has to be added.

On the topic of interfaces, Interpolations.jl has a object-based interface, not a functional one. This means that one creates an object itp = Interpolator(data, x, y, z) and then can evaluate it as itp(target_x, target_y, target_z). In this way, itp behaves as a normal Julia function. This allows for writing code that can use interpolators and normal functions in a unified way. For example, density(x, y, z) * velocity(x, y, z) would work with density and velocity being functions, interpolators, or a mix of the two (this is opposed to having to distinguish the case where we have an interpolator and manually call density = interpolate(density_data, density_coordinates, x, y, z). I had added this as one of the requirements also because it would signifincaly increase compatibility with Interpolations.jl, allowing for an easier upgrade path.

@sriharshakandala
Copy link
Member Author

  • The current PR implements the 3D interpolation as a combination of bilinear (horizontal) and linear interpolation (vertical). This does it efficiently and on multiple architectures (single thread CPU, NVIDIA and Apple GPUs). Preliminary benchmarks are already added, as discussed above.
  • The next step is to implement the IC this directly in ClimaCore.jl, avoid intermediate steps, after this is merged. Another objective is to replace all uses of Interpolations.jl with equivalents.
  • Here, 1D interpolation is done on the fly. We use an interpolation object for bilinear interpolation for efficiency. However, I can switch them to use the above interface if that is preferred.
  • We can work on mask aware interpolation next. This is not a part of this PR.

@tapios
Copy link

tapios commented Jan 17, 2025

Let's talk about this again if we need to. What I want to have right now is the simplest interpolation that allows us to quickly initialize the atmosphere model at arbitrary resolutions.

We do not need mask-aware interpolations immediately; we can interpolate, e.g., atmosphere and land separately on their grids, and average where we need to after the fact, as for fluxes.

@sriharshakandala
Copy link
Member Author

Sure.

  • I can rebase and add individual PRs for 1D linear interpolation and bilinear interpolation for review.
  • I can also add the third function which combines these two steps with horizontal_first and vertical_first options for the 3D interpolation in a separate PR!
  • If there are any preferred special use cases (uniform spacing, common target z grid, etc.), I can add them as needed.

We can address interface preferences during the review process, get them to align with end user preferences.

I might have misread users preferences a bit on interest for a functors-based interface for these functions. Newer users expressed some issues understanding code when working with functors inside ClimaAtmos.jl initialization code and expressed preference for a simpler alternative in their earlier days. However, we can switch to functors-based interface if this is indeed preferred by the users!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
SDI Software Design Issue
Projects
None yet
Development

Successfully merging a pull request may close this issue.

4 participants