Numo::DFloat is much slower than the old narray library's NVector #198

Open
mame opened this issue Sep 15, 2021 · 3 comments

mame commented Sep 15, 2021

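I found that x.inplace + i on Numo::DFloat is much slower than x.add!(i) on the old narray library's NVector: about 6.5 s versus 1.2 s for 10,000,000 additions of 3-element vectors.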

$ time ruby -rnarray -e 'x = NVector[0.0, 0.0, 0.0]; i = NVector[1.0, 1.0, 1.0]; 10000000.times { x.add!(i) }; p x'
NVector.float(3):
[ 1.0e+07, 1.0e+07, 1.0e+07 ]

real    0m1.190s
user    0m1.169s
sys     0m0.020s
$ time ruby -rnumo/narray -e 'x = Numo::DFloat[0.0, 0.0, 0.0]; i = Numo::DFloat[1.0, 1.0, 1.0]; xi = x.inplace; 10000000.times { xi + i }; p x'
Numo::DFloat#shape=[3]
[1e+07, 1e+07, 1e+07]

real    0m6.467s
user    0m6.454s
sys     0m0.012s

I have no idea if this is a bug, but @mrkn asked me to create a ticket.
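
As a rough reading of the two runs above, the wall-clock times can be turned into a slowdown factor and an average cost per loop iteration. This is only a sketch: the numbers are the "real" times quoted above, they include interpreter startup and block-call overhead, and the helper name ns_per_call is made up for illustration.

```ruby
# Wall-clock ("real") times copied from the two runs above.
ITERATIONS = 10_000_000
nvector_seconds = 1.190 # NVector#add!
dfloat_seconds  = 6.467 # Numo::DFloat with inplace +

# Average cost of one loop iteration, in nanoseconds.
# (Hypothetical helper, not part of either library.)
def ns_per_call(seconds, iterations)
  seconds * 1e9 / iterations
end

slowdown = dfloat_seconds / nvector_seconds
puts format("slowdown: %.1fx", slowdown)
puts format("NVector:  %.0f ns per iteration", ns_per_call(nvector_seconds, ITERATIONS))
puts format("DFloat:   %.0f ns per iteration", ns_per_call(dfloat_seconds, ITERATIONS))
```

With only three elements per vector, almost none of that per-iteration time can be the arithmetic itself, so the difference is presumably fixed overhead per call.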

Contributor

mrkn commented Sep 15, 2021

I used perf to look into the cause of this performance issue.
According to the flat profile, the main bottleneck is rb_check_typeddata (7.02% of cycles).

# To display the perf.data header info, please use --header/--header-only options.
#
#
# Total Lost Samples: 0
#
# Samples: 4K of event 'cycles'
# Event count (approx.): 4044660818
#
# Overhead  Command          Shared Object       Symbol
# ........  ...............  ..................  ....................................................
#
     7.02%  ruby             libruby.so.3.0.2    [.] rb_check_typeddata
     4.35%  swapper          [kernel.kallsyms]   [k] 0xffffffff8f3faac9
     3.88%  ruby             narray.so           [.] ndloop_run
     3.65%  ruby             narray.so           [.] ndloop_set_stepidx.isra.0
     3.53%  ruby             narray.so           [.] ndloop_init_args.isra.0
     3.45%  ruby             libruby.so.3.0.2    [.] rb_obj_is_kind_of
     2.80%  ruby             libruby.so.3.0.2    [.] rb_typeddata_inherited_p
     2.69%  ruby             narray.so           [.] ndloop_alloc
     2.60%  ruby             libruby.so.3.0.2    [.] gc_sweep_step
     2.60%  ruby             narray.so           [.] ndloop_set_output_narray
     2.57%  ruby             libruby.so.3.0.2    [.] vm_exec_core
     2.38%  ruby             libruby.so.3.0.2    [.] vm_call0_body
     2.13%  ruby             libruby.so.3.0.2    [.] rb_funcallv
     1.52%  ruby             libruby.so.3.0.2    [.] ary_memcpy0
     1.49%  ruby             narray.so           [.] ndloop_release
     1.44%  ruby             libruby.so.3.0.2    [.] rb_gc_writebarrier
     1.36%  ruby             narray.so           [.] na_ndloop_main
     1.33%  ruby             libruby.so.3.0.2    [.] rb_yield_1
     1.30%  ruby             libruby.so.3.0.2    [.] rb_typeddata_inherited_p@plt
     1.19%  ruby             libruby.so.3.0.2    [.] ruby_yyparse
     1.17%  ruby             libruby.so.3.0.2    [.] rb_obj_class
     1.14%  ruby             narray.so           [.] na_release_lock
     1.10%  swapper          [kernel.kallsyms]   [k] 0xffffffff8f3fa754
     1.00%  ruby             narray.so           [.] loop_narray
     0.99%  ruby             libruby.so.3.0.2    [.] vm_call_cfunc_with_frame
     0.95%  ruby             libruby.so.3.0.2    [.] rb_ensure
     0.92%  ruby             narray.so           [.] nary_get_pointer_for_read_write
     0.87%  ruby             narray.so           [.] iter_dfloat_add
     0.84%  ruby             libruby.so.3.0.2    [.] rb_class_real
     0.81%  ruby             libruby.so.3.0.2    [.] ary_ensure_room_for_push
     0.81%  ruby             narray.so           [.] nary_get_pointer_for_read
     0.80%  ruby             libruby.so.3.0.2    [.] rb_wb_protected_newobj_of
     0.74%  ruby             narray.so           [.] na_ndloop
     0.73%  ruby             narray.so           [.] dfloat_add
     0.72%  ruby             libruby.so.3.0.2    [.] vm_yield_setup_args
     0.71%  ruby             libruby.so.3.0.2    [.] rb_ary_push
     0.71%  ruby             libruby.so.3.0.2    [.] rb_ary_tmp_new_from_values
     0.70%  ruby             libruby.so.3.0.2    [.] ruby_sized_xfree
     0.68%  ruby             narray.so           [.] nary_test_reduce
     0.65%  ruby             libruby.so.3.0.2    [.] rb_vm_exec
     0.65%  ruby             libruby.so.3.0.2    [.] rb_obj_alloc
     0.63%  ruby             libc-2.31.so        [.] malloc
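
A flat profile like the one above is easier to read when the rows are summed per shared object. The following is a minimal sketch, assuming the perf report text has been pasted into a string; only six rows are copied here, and the names REPORT, ROW and overhead_by_object are made up for illustration. Note that a symbol is counted under the library it lives in, not under its caller.

```ruby
# Sample rows copied verbatim from the flat profile above (not the full report).
REPORT = <<~'PERF'
  7.02%  ruby             libruby.so.3.0.2    [.] rb_check_typeddata
  3.88%  ruby             narray.so           [.] ndloop_run
  3.65%  ruby             narray.so           [.] ndloop_set_stepidx.isra.0
  3.53%  ruby             narray.so           [.] ndloop_init_args.isra.0
  3.45%  ruby             libruby.so.3.0.2    [.] rb_obj_is_kind_of
  2.80%  ruby             libruby.so.3.0.2    [.] rb_typeddata_inherited_p
PERF

# Columns: overhead%, command, shared object, [privilege marker], symbol.
ROW = /\A\s*(?<overhead>\d+(?:\.\d+)?)%\s+(?<command>\S+)\s+(?<object>\S+)\s+\[.\]\s+(?<symbol>\S+)/

# Sum the overhead column per shared object (hypothetical helper).
def overhead_by_object(report)
  totals = Hash.new(0.0)
  report.each_line do |line|
    m = ROW.match(line)
    next unless m # skip '#' header lines and blank lines
    totals[m[:object]] += m[:overhead].to_f
  end
  totals.transform_values { |pct| pct.round(2) }
end

overhead_by_object(REPORT).sort_by { |_, pct| -pct }.each do |object, pct|
  puts format("%-20s %6.2f%%", object, pct)
end
```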

mrkn commented Sep 15, 2021

I profiled again with a non-optimized Ruby build to show the full call stack. The result is:

# To display the perf.data header info, please use --header/--header-only options.
#
#
# Total Lost Samples: 0
#
# Samples: 6K of event 'cycles'
# Event count (approx.): 5610555644
#
# Children      Self  Command          Shared Object             Symbol
# ........  ........  ...............  ........................  ....................................................................
#
    66.33%     0.00%  ruby             [unknown]                 [.] 0x000055756b40b6f0
            |
            ---0x55756b40b6f0
               |
               |--60.11%--vm_call_cfunc_with_frame
               |          |
               |           --58.78%--dfloat_add
               |                     |
               |                      --58.54%--dfloat_add_self
               |                                |
               |                                 --57.81%--na_ndloop
               |                                           |
               |                                           |--41.13%--rb_ensure
               |                                           |          |
               |                                           |          |--33.67%--ndloop_run
               |                                           |          |          |
               |                                           |          |          |--12.31%--ndloop_set_output
               |                                           |          |          |          |
               |                                           |          |          |           --10.42%--ndloop_set_output_narray
               |                                           |          |          |                     |
               |                                           |          |          |                     |--5.20%--ndloop_set_stepidx
               |                                           |          |          |                     |          |
               |                                           |          |          |                     |           --4.58%--nary_get_pointer_for_read_write
               |                                           |          |          |                     |                     |
               |                                           |          |          |                     |                      --3.45%--na_get_pointer_for_rw
               |                                           |          |          |                     |                                |
               |                                           |          |          |                     |                                 --2.22%--RB_OBJ_FROZEN
               |                                           |          |          |                     |                                           |
               |                                           |          |          |                     |                                            --0.92%--RB_TYPE_P
               |                                           |          |          |                     |                                                      |
               |                                           |          |          |                     |                                                       --0.82%--rb_type
               |                                           |          |          |                     |
               |                                           |          |          |                     |--1.78%--ndloop_find_inplace
               |                                           |          |          |                     |
               |                                           |          |          |                      --1.14%--rbimpl_size_mul_or_raise
               |                                           |          |          |
               |                                           |          |          |--7.89%--ndloop_init_args
               |                                           |          |          |          |
               |                                           |          |          |          |--1.91%--ndloop_set_stepidx
               |                                           |          |          |          |          |
               |                                           |          |          |          |           --1.13%--nary_get_pointer_for_read
               |                                           |          |          |          |
               |                                           |          |          |          |--0.78%--ndfunc_set_bufcp
               |                                           |          |          |          |
               |                                           |          |          |          |--0.66%--rb_type
               |                                           |          |          |          |
               |                                           |          |          |          |--0.65%--rbimpl_size_mul_or_raise
               |                                           |          |          |          |
               |                                           |          |          |          |--0.64%--iter_dfloat_add
               |                                           |          |          |          |
               |                                           |          |          |           --0.52%--ndfunc_set_user_loop
               |                                           |          |          |
               |                                           |          |          |--2.55%--loop_narray
               |                                           |          |          |          |
               |                                           |          |          |           --0.61%--iter_dfloat_add
               |                                           |          |          |
               |                                           |          |          |--0.93%--ndloop_cast_args
               |                                           |          |          |
               |                                           |          |           --0.59%--ndloop_alloc
               |                                           |          |
               |                                           |          |--2.45%--ndloop_release
               |                                           |          |          |
               |                                           |          |           --1.67%--na_release_lock
               |                                           |          |                     |
               |                                           |          |                     |--0.76%--na_release_lock
               |                                           |          |                     |          |
               |                                           |          |                     |           --0.64%--ndloop_alloc
               |                                           |          |                     |
               |                                           |          |                      --0.62%--ndloop_alloc
               |                                           |          |
               |                                           |           --1.82%--ndloop_alloc
               |                                           |
               |                                           |--13.89%--na_ndloop_main
               |                                           |          |
               |                                           |          |--4.67%--ndloop_alloc
               |                                           |          |          |
               |                                           |          |           --2.46%--ndloop_find_max_dimension
               |                                           |          |                     |
               |                                           |          |                      --0.72%--rb_array_len
               |                                           |          |
               |                                           |          |--2.11%--ndloop_cast_args
               |                                           |          |          |
               |                                           |          |           --0.85%--rb_type
               |                                           |          |
               |                                           |          |--0.93%--ndloop_set_output_narray
               |                                           |          |
               |                                           |          |--0.60%--ndloop_set_stepidx
               |                                           |          |
               |                                           |           --0.55%--ndloop_find_inplace
               |                                           |
               |                                            --0.93%--rbimpl_size_mul_or_raise
               |
                --0.95%--ndloop_alloc

mrkn commented Sep 15, 2021

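Judging from the call graph taken with non-optimized Ruby, rb_check_typeddata doesn't seem to be the main bottleneck: it does not appear in the call graph above, and most of the time is spent under na_ndloop (57.81%).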
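
To put a rough number on where the time under na_ndloop goes, the percentages can be pulled out of the call graph text. This is a sketch under assumptions: the rows below are copied from the graph above with the tree padding trimmed, loop_narray is assumed to be where the element-wise kernel is driven (the tree shows it calling iter_dfloat_add), and the names CALL_GRAPH and children_percent are made up for illustration.

```ruby
# A few rows from the call graph above, tree padding trimmed.
CALL_GRAPH = <<~'PERF'
  --57.81%--na_ndloop
  |--41.13%--rb_ensure
  |--33.67%--ndloop_run
  |--13.89%--na_ndloop_main
  |--2.55%--loop_narray
  --0.61%--iter_dfloat_add
PERF

# Map each symbol to its "Children" percentage (hypothetical helper).
# If a symbol appears at several call sites, the last value wins.
def children_percent(graph)
  graph.scan(/--(\d+(?:\.\d+)?)%--(\S+)/).to_h { |pct, sym| [sym, pct.to_f] }
end

pct = children_percent(CALL_GRAPH)

# Share of the time under na_ndloop that is spent outside loop_narray.
setup_share = 1.0 - pct["loop_narray"] / pct["na_ndloop"]
puts format("under na_ndloop but outside loop_narray: %.1f%%", setup_share * 100)
```

If that assumption holds, nearly all of the cost of each xi + i call is loop setup and teardown rather than the addition of three floats.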
