[Experiment] WebGPU backend #1789

Open
wants to merge 1 commit into
base: main

zcbenz
Contributor

@zcbenz zcbenz commented Jan 24, 2025

This PR adds an experimental WebGPU backend that only supports binary ops. It is not intended to be merged; it is only meant to show what is possible.

The actual shaders and WebGPU API calls are put in a separate project: https://github.com/frost-beta/betann.

To build:

cmake . -Bbuild -DMLX_BUILD_WEBGPU=ON -DMLX_BUILD_EXAMPLES=ON
cmake --build build -j 16

Run the example:

$ ./build/examples/cpp/tutorial
array([5, 7, 9], dtype=float32)

@awni
Member

awni commented Jan 26, 2025

@zcbenz that's extremely cool. I'm supportive of exploring the addition of WebGPU as a back-end for MLX.

One initial comment is that it would be nice to avoid breaking the "unified memory" programming model. So instead of changing the array API, it might make more sense to change the WebGPU-specific allocator (and maybe kernels) to have a Buffer which can (but doesn't have to) have both a CPU buffer and a GPU buffer. Then the buffer can manage if and when it needs to make a copy, based on whether you request the CPU or GPU pointer.

This actually fits pretty well with our notion of Buffer already which has a raw_ptr() method to get the CPU pointer (that could do the copy if needed). Or we could modify that API a little to make it more explicit.
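
To make the idea concrete, here is a minimal sketch of such a buffer. It is not MLX code: `DualBuffer`, `FakeGpuBuffer`, and the `cpu_read`/`cpu_write`/`gpu_read`/`gpu_write` accessors are hypothetical names, and host memory stands in for the GPU side so the example runs without WebGPU. It only illustrates the "copy when the other side is requested" bookkeeping.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>
#include <vector>

// Hypothetical stand-in for device memory. A real WebGPU backend would hold
// a GPU buffer handle here; host memory is used so the sketch runs anywhere.
struct FakeGpuBuffer {
  std::vector<unsigned char> storage;
};

// Hypothetical buffer that lazily keeps a CPU copy, a GPU copy, or both.
class DualBuffer {
 public:
  explicit DualBuffer(std::size_t size) : size_(size) {}

  // Read-only CPU access: copies from the GPU side only if the CPU copy is stale.
  const void* cpu_read() {
    sync_to_cpu();
    return cpu_.data();
  }

  // Writable CPU access: afterwards the GPU copy is considered stale.
  void* cpu_write() {
    sync_to_cpu();
    gpu_valid_ = false;
    return cpu_.data();
  }

  // Read-only GPU access: uploads only if the GPU copy is stale.
  const FakeGpuBuffer& gpu_read() {
    sync_to_gpu();
    return gpu_;
  }

  // Writable GPU access: afterwards the CPU copy is considered stale.
  FakeGpuBuffer& gpu_write() {
    sync_to_gpu();
    cpu_valid_ = false;
    return gpu_;
  }

  // Number of CPU<->GPU copies performed so far.
  int copies() const { return copies_; }

 private:
  void sync_to_cpu() {
    cpu_.resize(size_);
    if (!cpu_valid_ && gpu_valid_ && size_ > 0) {
      std::memcpy(cpu_.data(), gpu_.storage.data(), size_);
      ++copies_;
    }
    cpu_valid_ = true;
  }

  void sync_to_gpu() {
    gpu_.storage.resize(size_);
    if (!gpu_valid_ && cpu_valid_ && size_ > 0) {
      std::memcpy(gpu_.storage.data(), cpu_.data(), size_);
      ++copies_;
    }
    gpu_valid_ = true;
  }

  std::size_t size_;
  std::vector<unsigned char> cpu_;
  FakeGpuBuffer gpu_;
  bool cpu_valid_ = false;
  bool gpu_valid_ = false;
  int copies_ = 0;
};

// Usage: write on the CPU, pretend a kernel wrote on the GPU side, read back.
inline int round_trip_copies() {
  DualBuffer buf(3 * sizeof(float));
  const float in[3] = {1.0f, 2.0f, 3.0f};
  std::memcpy(buf.cpu_write(), in, sizeof(in));
  buf.gpu_read();  // first GPU request: one upload
  buf.gpu_read();  // second GPU request: nothing to copy
  const float out[3] = {5.0f, 7.0f, 9.0f};
  std::memcpy(buf.gpu_write().storage.data(), out, sizeof(out));
  float back[3];
  std::memcpy(back, buf.cpu_read(), sizeof(back));  // one download
  assert(back[0] == 5.0f && back[1] == 7.0f && back[2] == 9.0f);
  return buf.copies();
}
```

Splitting read and write access is one way to make the API "more explicit": a read never invalidates the other side, so repeated requests for the same pointer cost nothing.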

I'm also very curious to know if there are any other major internal API changes needed, or if it mostly just plugs in without much difficulty.

@zcbenz
Contributor Author

zcbenz commented Jan 27, 2025

Thanks for your support!

What do you think if we add a "null" backend in upstream that mimics the general gpu backend by copying data to a separate buffer and then just calls eval_cpu? That way we can explore which internal API changes are actually needed without checking in any WebGPU code, and I can make changes incrementally while getting more familiar with WebGPU.
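
A rough sketch of what such a "null" backend could do is below. None of it is MLX API: `Data`, `CpuKernel`, `add_cpu`, and `eval_null_gpu` are hypothetical names, and plain vectors stand in for device buffers. It only shows the flow of staging inputs in separate buffers, running the CPU kernel on the staged copies, and copying the result back.

```cpp
#include <cassert>
#include <cstddef>
#include <functional>
#include <vector>

// Hypothetical array storage and CPU kernel signature, for illustration only.
using Data = std::vector<float>;
using CpuKernel =
    std::function<void(const std::vector<Data>& inputs, Data& output)>;

// A stand-in CPU kernel: elementwise add of two inputs.
inline void add_cpu(const std::vector<Data>& inputs, Data& output) {
  for (std::size_t i = 0; i < output.size(); ++i) {
    output[i] = inputs[0][i] + inputs[1][i];
  }
}

// The "null" backend path: behave like a GPU backend with separate memory,
// but do the actual computation with the CPU kernel.
inline Data eval_null_gpu(
    const CpuKernel& eval_cpu,
    const std::vector<Data>& inputs,
    std::size_t out_size) {
  // 1. "Upload": copy every input into a separate staging buffer.
  std::vector<Data> staged(inputs.begin(), inputs.end());
  // 2. Allocate the output on the staged ("device") side.
  Data staged_out(out_size);
  // 3. Run the CPU kernel on the staged copies only.
  eval_cpu(staged, staged_out);
  // 4. "Download": copy the result back to the caller's side.
  return Data(staged_out.begin(), staged_out.end());
}
```

Because the kernel never touches the caller's buffers directly, any internal code that assumes unified memory would surface as a bug in this path, which is the point of the experiment.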

@awni
Member

awni commented Jan 27, 2025

> What do you think if we add a "null" backend in upstream that mimics the general gpu backend by copying data to a separate buffer and then just calls eval_cpu?

If you share a PR showing what you mean, it might be easier to say whether this is something we could include upstream. I'm not sure we necessarily need to merge it though. It seems like it could be OK to just have a fork or branch of this for now, until we converge a bit on what's useful there.

@zcbenz zcbenz changed the title from "[Experiment] WebGPU backend with support for binary ops" to "[Experiment] WebGPU backend" Jan 27, 2025
@zcbenz
Contributor Author

zcbenz commented Jan 29, 2025

@awni I have updated the code to use a custom allocator that creates data holding both CPU and GPU buffers. Could you do a quick review?

There are also 2 allocator design decisions that need your help:

  1. Once the data has been copied from GPU to CPU, the array's data_ptr needs to be updated to point to the CPU data. I added a simple API to reset it. Is there a better way to do that?

  2. Most existing code uses malloc_or_wait to allocate memory for kernels. In the webgpu backend we need to explicitly specify the device on which to allocate the memory, which means we cannot reuse existing utilities like set_binary_op_output_data or broadcast there.
    Can we add a device parameter to malloc_or_wait? There is no performance penalty, and otherwise we have to duplicate lots of code simply to replace malloc_or_wait with gpu_malloc, as with the set_binary_op_output_gpu_data function in this PR.
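
As an illustration of the second point, the sketch below shows how a defaulted device parameter lets one helper serve both backends. The names `DeviceKind`, `Allocation`, `malloc_or_wait_sketch`, `free_sketch`, and `make_binary_op_output` are hypothetical and do not match MLX's actual signatures; both devices fall back to `std::malloc` so the example runs anywhere.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdlib>

// Hypothetical device tag; a real backend would use its own device type.
enum class DeviceKind { cpu, gpu };

// Hypothetical allocation record that remembers where the memory lives.
struct Allocation {
  void* ptr;
  std::size_t size;
  DeviceKind device;
};

// Sketch of an allocator entry point with a device parameter. The default
// keeps every existing call site (which passes only a size) unchanged.
inline Allocation malloc_or_wait_sketch(
    std::size_t size,
    DeviceKind device = DeviceKind::cpu) {
  // A real backend would dispatch to a device-specific allocator here.
  // Assumption for this sketch: both devices use host memory.
  return Allocation{std::malloc(size), size, device};
}

inline void free_sketch(Allocation& a) {
  std::free(a.ptr);
  a.ptr = nullptr;
}

// One shared helper instead of a CPU version plus a duplicated GPU version:
// the device is simply forwarded to the allocator.
inline Allocation make_binary_op_output(
    std::size_t num_elements,
    std::size_t item_size,
    DeviceKind device = DeviceKind::cpu) {
  return malloc_or_wait_sketch(num_elements * item_size, device);
}
```

With this shape, the webgpu backend would call the shared helper with the GPU device, while CPU code keeps calling it with no extra argument.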

@zcbenz zcbenz closed this Jan 29, 2025
@zcbenz zcbenz deleted the webgpu branch January 29, 2025 11:14
@zcbenz zcbenz restored the webgpu branch January 29, 2025 11:14
@zcbenz zcbenz reopened this Jan 29, 2025
2 participants