
Commit 650ecc6

Update README.md
1 parent 76ce011 commit 650ecc6

File tree: 1 file changed (+21, -10 lines)

README.md

Lines changed: 21 additions & 10 deletions
@@ -158,15 +158,24 @@ $ for backend in jax cupy pytorch tensorflow; do
 
 #### Equation of state
 
-![Equation of state on CPU](results/magni-plots/bench-equation_of_state-CPU.png?raw=true) ![Equation of state on GPU](results/magni-plots/bench-equation_of_state-GPU.png?raw=true)
-
+<p align="middle">
+<img src="results/magni-plots/bench-equation_of_state-CPU.png?raw=true" width="400">
+<img src="results/magni-plots/bench-equation_of_state-GPU.png?raw=true" width="400">
+</p>
+
 #### Isoneutral mixing
 
-![Isoneutral mixing on CPU](results/magni-plots/bench-isoneutral_mixing-CPU.png?raw=true) ![Isoneutral mixing on GPU](results/magni-plots/bench-isoneutral_mixing-GPU.png?raw=true)
+<p align="middle">
+<img src="results/magni-plots/bench-isoneutral_mixing-CPU.png?raw=true" width="400">
+<img src="results/magni-plots/bench-isoneutral_mixing-GPU.png?raw=true" width="400">
+</p>
 
 #### Turbulent kinetic energy
 
-![Turbulent kinetic energy on CPU](results/magni-plots/bench-turbulent_kinetic_energy-CPU.png?raw=true) ![Turbulent kinetic energy on GPU](results/magni-plots/bench-turbulent_kinetic_energy-GPU.png?raw=true)
+<p align="middle">
+<img src="results/magni-plots/bench-turbulent_kinetic_energy-CPU.png?raw=true" width="400">
+<img src="results/magni-plots/bench-turbulent_kinetic_energy-GPU.png?raw=true" width="400">
+</p>
 
 ### Full reports
 
@@ -177,14 +186,16 @@ $ for backend in jax cupy pytorch tensorflow; do
 
 Lessons I learned by assembling these benchmarks (your mileage may vary):
 
-- The performance of Jax seems very competitive, both on GPU and CPU. It is consistently among the top implementations on CPU, and shows the best performance on GPU.
-- Jax' performance on GPU seems to be quite hardware dependent. Jax performs significantly better (relatively speaking) on a Tesla P100 than on a Tesla K80.
-- Numba is a great choice on CPU if you don't mind writing explicit for loops (which can be more readable than a vectorized implementation), being slightly faster than Jax with little effort.
+- The performance of JAX is very competitive, both on GPU and CPU. It is consistently among the top implementations on both platforms.
+- Pytorch performs very well on GPU for large problems (slightly better than JAX), but its CPU performance is not great for tasks with many slicing operations.
+- Numba is a great choice on CPU if you don't mind writing explicit for loops (which can be more readable than a vectorized implementation), being slightly faster than JAX with little effort.
+- JAX performance on GPU seems to be quite hardware dependent. JAX performs significantly better (relatively speaking) on a Tesla P100 than on a Tesla K80.
 - If you have embarrassingly parallel workloads, speedups of > 1000x are easy to achieve on high-end GPUs.
-- Tensorflow is not great for applications like ours, since it lacks tools to apply partial updates to tensors (in the sense of `tensor[2:-2] = 0.`).
-- Don't bother using Pytorch or vanilla Tensorflow on CPU (you won't get much faster than NumPy). Tensorflow with XLA (`experimental_compile`) is great though!
+- TPUs are catching up to GPUs. We can now get similar performance to a high-end GPU on these workloads.
+- Tensorflow is not great for applications like ours, since it lacks tools to apply partial updates to tensors (such as `tensor[2:-2] = 0.`).
+- If you use Tensorflow on CPU, make sure to use XLA (`experimental_compile`) for tremendous speedups.
 - CuPy is nice! Often you don't need to change anything in your NumPy code to have it run on GPU (with decent, but not outstanding performance).
-- Reaching Fortran performance on CPU with vectorized implementations is hard :)
+- Reaching Fortran performance on CPU for non-trivial tasks is hard :)
 
 ## Contributing
 
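A side note on the Numba bullet above: the "explicit for loops" style looks roughly like the sketch below. This is a hypothetical toy kernel (`dampen_interior` is not part of the benchmark suite); it only illustrates that plain nested loops compile to fast machine code under `numba.njit`.

```python
import numpy as np
import numba


@numba.njit
def dampen_interior(field, factor):
    # Explicit loops over the interior points; Numba compiles this to
    # machine code, so the usual Python loop overhead disappears.
    nx, ny = field.shape
    out = field.copy()
    for i in range(2, nx - 2):
        for j in range(2, ny - 2):
            out[i, j] = factor * field[i, j]
    return out


field = np.random.rand(512, 512)
result = dampen_interior(field, 0.5)
```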
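The "partial updates" remark refers to in-place slice assignment. A rough comparison on a small, made-up array `x` (not code from the benchmarks): NumPy and CuPy support the assignment directly, JAX provides a functional equivalent, and TensorFlow tensors need a workaround such as rebuilding the tensor by concatenation.

```python
import numpy as np
import jax.numpy as jnp
import tensorflow as tf

x = np.random.rand(10)
x[2:-2] = 0.            # NumPy (and CuPy): plain slice assignment

# JAX arrays are immutable, but expose a functional equivalent:
y = jnp.asarray(x)
y = y.at[2:-2].set(0.)  # returns a new array with the slice replaced

# tf.Tensor has no direct analogue; one workaround is to rebuild the
# tensor from pieces via concatenation:
z = tf.constant(x)
z = tf.concat([z[:2], tf.zeros_like(z[2:-2]), z[-2:]], axis=0)
```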
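The XLA remark boils down to wrapping the computation in `tf.function` with compilation enabled. A minimal sketch with an invented toy function (`scaled_norm` is not a benchmark kernel); note that the `experimental_compile` flag mentioned in the README was renamed to `jit_compile` in later TensorFlow releases.

```python
import tensorflow as tf


# `jit_compile=True` is the current spelling; the TensorFlow versions
# benchmarked here used the older name `experimental_compile=True`.
@tf.function(jit_compile=True)
def scaled_norm(a, b):
    # XLA fuses the elementwise ops and the reduction into one compiled kernel.
    return tf.reduce_sum(a * b + 0.5 * b)


a = tf.random.uniform((1024, 1024))
b = tf.random.uniform((1024, 1024))
print(scaled_norm(a, b).numpy())
```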
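The CuPy bullet reflects its NumPy-compatible API: typically only the array creation changes. A toy sketch, assuming a hypothetical kernel `equation_of_state_like` (invented for illustration, not the benchmark code):

```python
import numpy as np
import cupy as cp


def equation_of_state_like(rho, temp):
    # The same expression works for NumPy and CuPy arrays alike;
    # get_array_module returns numpy or cupy depending on the input.
    xp = cp.get_array_module(rho)
    return xp.sqrt(rho) * temp + 0.1 * temp ** 2


rho_cpu = np.random.rand(1_000_000)
temp_cpu = np.random.rand(1_000_000)
result_cpu = equation_of_state_like(rho_cpu, temp_cpu)   # runs on CPU

rho_gpu, temp_gpu = cp.asarray(rho_cpu), cp.asarray(temp_cpu)
result_gpu = equation_of_state_like(rho_gpu, temp_gpu)   # runs on GPU
```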