
[4.1 Introduction]: why add_python is faster than add_numpy for vectorization add #74

Open
bingyao opened this issue Aug 3, 2018 · 14 comments


@bingyao

bingyao commented Aug 3, 2018

I reached the opposite conclusion when running the example code in 4.1 Introduction. The following are my results, tested in IPython 6.4.0 with Python 3.6.5 and NumPy 1.14.3:

In [1]: import numpy as np

In [2]: import random

In [3]: def add_python(Z1,Z2):
   ...:     return [z1+z2 for (z1,z2) in zip(Z1,Z2)]
   ...: 
   ...: def add_numpy(Z1,Z2):
   ...:     return np.add(Z1,Z2)
   ...: 

In [4]: Z1 = random.sample(range(1000), 100)

In [5]: Z2 = random.sample(range(1000), 100)

# For Python lists `Z1`, `Z2`, `add_python` is faster
In [6]: %timeit add_python(Z1, Z2)
8.25 µs ± 205 ns per loop (mean ± std. dev. of 7 runs, 100000 loops each)

In [7]: %timeit add_numpy(Z1, Z2)
16.9 µs ± 235 ns per loop (mean ± std. dev. of 7 runs, 100000 loops each)

In [8]: a = np.random.randint(0, 1000, size=100)

In [9]: b = np.random.randint(0, 1000, size=100)

# For NumPy arrays `a`, `b`, `add_numpy` is faster
In [10]: %timeit add_python(a, b)
22.6 µs ± 816 ns per loop (mean ± std. dev. of 7 runs, 10000 loops each)

In [11]: %timeit add_numpy(a, b)
851 ns ± 6.37 ns per loop (mean ± std. dev. of 7 runs, 1000000 loops each)
@rougier
Owner

rougier commented Aug 7, 2018

Interesting. I re-tested it using Python 3.7 and I got:

In [8]: %timeit add_python(Z1,Z2)
8.88 µs ± 423 ns per loop (mean ± std. dev. of 7 runs, 100000 loops each)

In [9]: %timeit add_numpy(Z1,Z2)
14.4 µs ± 131 ns per loop (mean ± std. dev. of 7 runs, 100000 loops each)

@dwt

dwt commented Dec 13, 2018

Same thing for me, using standard Python lists (Python 3.7, macOS Mojave):

%timeit add_python(Z1, Z2)
6 µs ± 141 ns per loop (mean ± std. dev. of 7 runs, 100000 loops each)

%timeit add_numpy(Z1, Z2)
11.1 µs ± 46.2 ns per loop (mean ± std. dev. of 7 runs, 100000 loops each)

Using numpy arrays instead, the timings change in an interesting way:

%timeit add_python(Z3, Z4)
28.5 µs ± 996 ns per loop (mean ± std. dev. of 7 runs, 10000 loops each)

%timeit add_numpy(Z3, Z4)
540 ns ± 21.7 ns per loop (mean ± std. dev. of 7 runs, 1000000 loops each)

%timeit np.add(Z3, Z4)
488 ns ± 8.4 ns per loop (mean ± std. dev. of 7 runs, 1000000 loops each)

Interestingly, the Python call overhead really starts to show in micro-benchmarks like these.

So to summarize:

  • numpy is about twice as slow for me with native Python lists
  • numpy is as fast as expected with numpy arrays, and Python is about twice as slow with numpy arrays as with native lists

I'd say that is about as expected, so maybe that is what the example should compare, instead of sending native Python lists down both compute paths?
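A minimal sketch of that fairer comparison, assuming the add_python/add_numpy definitions from the top of the thread (each path gets its native container):

import random
import numpy as np

Z1 = random.sample(range(1000), 100)    # native lists for the Python path
Z2 = random.sample(range(1000), 100)
Z3, Z4 = np.array(Z1), np.array(Z2)     # numpy arrays for the numpy path

%timeit add_python(Z1, Z2)   # list comprehension over lists
%timeit add_numpy(Z3, Z4)    # vectorized add over arrays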

@dwt

dwt commented Dec 13, 2018

I'd say the examples are just way too small to make the differences really visible. Scaling the input up a bit, I get this:

length = 100000

import random
Z1, Z2 = random.sample(range(length), length), random.sample(range(length), length)

%timeit add_python(Z1, Z2)
%timeit [z1+z2 for (z1,z2) in zip(Z1,Z2)]
19.1 ms ± 514 µs per loop (mean ± std. dev. of 7 runs, 10 loops each)
15.6 ms ± 395 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)

%timeit add_numpy(Z1, Z2)
%timeit np.add(Z1, Z2)
11 ms ± 154 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)
10.9 ms ± 63.8 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)

Z3, Z4 = np.random.sample(length) * 100, np.random.sample(length) * 100

%timeit add_python(Z3, Z4)
%timeit [z3+z4 for (z3,z4) in zip(Z3,Z4)]
16.8 ms ± 93.9 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)
16.7 ms ± 27.3 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)

%timeit add_numpy(Z3, Z4)
%timeit np.add(Z3, Z4)
43.1 µs ± 263 ns per loop (mean ± std. dev. of 7 runs, 10000 loops each)
42.7 µs ± 278 ns per loop (mean ± std. dev. of 7 runs, 10000 loops each)
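To make the crossover visible, here is a rough sketch (sizes are illustrative) that scans for the input length at which np.add on plain lists overtakes the list comprehension, i.e. where the vectorized computation starts to outweigh the list-to-array conversion overhead:

import random
import timeit
import numpy as np

for n in (100, 1_000, 10_000, 100_000):
    Z1 = random.sample(range(n), n)
    Z2 = random.sample(range(n), n)
    t_py = timeit.timeit(lambda: [z1 + z2 for z1, z2 in zip(Z1, Z2)], number=100) / 100
    t_np = timeit.timeit(lambda: np.add(Z1, Z2), number=100) / 100
    print(f"n={n:>7}: python {t_py * 1e6:9.1f} µs, numpy {t_np * 1e6:9.1f} µs")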

@rougier
Copy link
Owner

rougier commented Dec 17, 2018

Nice. Could you make a PR for the book?

@dwt

dwt commented Dec 17, 2018

Sure, but it will probably take me until Christmas.

@dwt

dwt commented Dec 17, 2018

(Also, my English is poor, so you will probably have to improve it. Sorry.)

@rougier
Owner

rougier commented Dec 17, 2018

Mine is the same; not sure I can correct it :)

@inamoto85

Hi @dwt, I'm getting similar results. Can you explain why this is about as expected (due to recent Python optimizations on arrays)?

@dwt

dwt commented Feb 15, 2019

My thinking is that you have to consider a numpy operation in three parts: switching from the Python to the C layer, doing the actual computation, and then switching back to Python.

Now, the actual computation is pretty much always faster in numpy than doing the same computation in Python. BUT if the context switches take more time than the faster computation saves, then the pure Python solution can still win.

This is why larger lists / arrays / vectors make the switch to C more worthwhile: the savings in the computation come to dominate the cost of crossing into the C layer.
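A small sketch of that amortization argument (sizes are illustrative): the per-element cost of np.add drops as the arrays grow, because the roughly constant cost of crossing the Python/C boundary is spread over more elements.

import timeit
import numpy as np

for n in (10, 100, 10_000, 1_000_000):
    a = np.arange(n, dtype=np.float64)
    b = np.arange(n, dtype=np.float64)
    t = timeit.timeit(lambda: np.add(a, b), number=1_000) / 1_000
    print(f"n={n:>9}: {t * 1e9:12.0f} ns per call, {t / n * 1e9:8.2f} ns per element")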

@inamoto85

Thank you for the explanation!

@dr-neptune
Contributor

I've been playing around with this more today, and with list inputs the Python version seems faster most of the time. My assumption is that addition is already heavily optimized in Python, leaving the time dominated by numpy's overhead.

vec_length = 1_000_000
Z1, Z2 = random.sample(range(vec_length), vec_length), random.sample(range(vec_length), vec_length)

# %timeit add_python(Z1, Z2)
# 253 ms ± 4.55 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)

# %timeit add_numpy(Z1, Z2)
# 501 ms ± 19 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)

I got similar results at other sizes. It might be worth swapping this example out for something more involved, to make the point:

def add_python(Z1, Z2):
    return [((z1**2 + z2**2)**0.5) + ((z1 + z2)**3) for z1, z2 in zip(Z1, Z2)]

def add_numpy(Z1, Z2):
    return np.sqrt(Z1**2 + Z2**2) + (Z1 + Z2)**3

vec_length = 1_000_000
Z1, Z2 = random.sample(range(vec_length), vec_length), random.sample(range(vec_length), vec_length)
Z1_np, Z2_np = np.array(Z1, dtype=np.float64), np.array(Z2, dtype=np.float64)

%timeit add_python(Z1, Z2)
# 665 ms ± 20.3 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)

%timeit add_numpy(Z1_np, Z2_np)
# 54.2 ms ± 2.7 ms per loop (mean ± std. dev. of 7 runs, 10 loops each)
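As a quick check of the "overhead dominates" assumption above, a sketch reusing Z1 and Z1_np/Z2_np from the snippet: on list inputs np.add must first build arrays from the lists, and that conversion alone should account for a large share of the measured time.

%timeit np.array(Z1)           # list -> array conversion alone
%timeit np.add(Z1_np, Z2_np)   # the simple add on pre-built arrays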

@rougier
Owner

rougier commented Jan 22, 2024

I tried again with the simple add version and 1,000,000 elements, and I get:

%timeit add_python(Z1, Z2)
54.6 ms ± 331 µs per loop (mean ± std. dev. of 7 runs, 10 loops each)

%timeit add_numpy(Z1_np, Z2_np)
645 µs ± 3.91 µs per loop (mean ± std. dev. of 7 runs, 1,000 loops each)

@dr-neptune
Contributor

Interesting -- my example ran on Python 3.11, Windows 10, and NumPy 1.24.3. Your results are not only much more pronounced, but much faster overall.

@rougier
Owner

rougier commented Jan 22, 2024

macOS, MacBook M1, Python 3.11, NumPy 1.26.0
