[BugFix] Fix buffer index with FCD #692

Open
wants to merge 2 commits into base: main


@vmoens (Contributor) commented Feb 25, 2024

No description provided.

@facebook-github-bot added the "CLA Signed" label Feb 25, 2024. (This label is managed by the Facebook bot. Authors need to sign the CLA before a PR can be reviewed.)

$\color{#D29922}\textsf{\Large⚠\kern{0.2cm}\normalsize Warning}$ Result of CPU Benchmark Tests

Total Benchmarks: 126. Improved: $\large\color{#35bf28}32$. Worsened: $\large\color{#d91a1a}3$.

Detailed results:
Name Max Mean Ops Ops on Repo HEAD Change
test_plain_set_nested 38.5820μs 16.9390μs 59.0355 KOps/s 56.7014 KOps/s $\color{#35bf28}+4.12\%$
test_plain_set_stack_nested 62.0160μs 16.8238μs 59.4398 KOps/s 55.7812 KOps/s $\textbf{\color{#35bf28}+6.56\%}$
test_plain_set_nested_inplace 53.4090μs 19.2936μs 51.8306 KOps/s 48.7430 KOps/s $\textbf{\color{#35bf28}+6.33\%}$
test_plain_set_stack_nested_inplace 45.8860μs 19.1518μs 52.2144 KOps/s 49.1715 KOps/s $\textbf{\color{#35bf28}+6.19\%}$
test_items 43.7610μs 2.3919μs 418.0775 KOps/s 408.6106 KOps/s $\color{#35bf28}+2.32\%$
test_items_nested 0.8285ms 0.2653ms 3.7694 KOps/s 3.6932 KOps/s $\color{#35bf28}+2.06\%$
test_items_nested_locked 0.3440ms 0.2681ms 3.7296 KOps/s 3.6849 KOps/s $\color{#35bf28}+1.21\%$
test_items_nested_leaf 0.4873ms 0.1674ms 5.9727 KOps/s 5.9085 KOps/s $\color{#35bf28}+1.09\%$
test_items_stack_nested 3.7413ms 0.2690ms 3.7174 KOps/s 3.6359 KOps/s $\color{#35bf28}+2.24\%$
test_items_stack_nested_leaf 0.3218ms 0.1660ms 6.0240 KOps/s 5.9232 KOps/s $\color{#35bf28}+1.70\%$
test_items_stack_nested_locked 0.3188ms 0.2676ms 3.7364 KOps/s 3.5969 KOps/s $\color{#35bf28}+3.88\%$
test_keys 41.3770μs 3.9522μs 253.0236 KOps/s 258.7463 KOps/s $\color{#d91a1a}-2.21\%$
test_keys_nested 2.1211ms 0.1493ms 6.6973 KOps/s 6.3285 KOps/s $\textbf{\color{#35bf28}+5.83\%}$
test_keys_nested_locked 0.2863ms 0.1550ms 6.4511 KOps/s 6.1961 KOps/s $\color{#35bf28}+4.12\%$
test_keys_nested_leaf 38.4394ms 0.1358ms 7.3659 KOps/s 7.4292 KOps/s $\color{#d91a1a}-0.85\%$
test_keys_stack_nested 0.2493ms 0.1502ms 6.6590 KOps/s 6.2858 KOps/s $\textbf{\color{#35bf28}+5.94\%}$
test_keys_stack_nested_leaf 0.2410ms 0.1306ms 7.6595 KOps/s 7.3240 KOps/s $\color{#35bf28}+4.58\%$
test_keys_stack_nested_locked 0.2734ms 0.1551ms 6.4490 KOps/s 6.0887 KOps/s $\textbf{\color{#35bf28}+5.92\%}$
test_values 10.6100μs 1.1724μs 852.9867 KOps/s 861.0411 KOps/s $\color{#d91a1a}-0.94\%$
test_values_nested 0.1002ms 52.1801μs 19.1644 KOps/s 19.0574 KOps/s $\color{#35bf28}+0.56\%$
test_values_nested_locked 0.1014ms 52.6940μs 18.9775 KOps/s 19.3843 KOps/s $\color{#d91a1a}-2.10\%$
test_values_nested_leaf 89.9970μs 46.5908μs 21.4635 KOps/s 21.0204 KOps/s $\color{#35bf28}+2.11\%$
test_values_stack_nested 96.8810μs 52.3898μs 19.0877 KOps/s 18.7187 KOps/s $\color{#35bf28}+1.97\%$
test_values_stack_nested_leaf 89.9280μs 46.3479μs 21.5760 KOps/s 21.2044 KOps/s $\color{#35bf28}+1.75\%$
test_values_stack_nested_locked 0.1008ms 53.0596μs 18.8467 KOps/s 19.1789 KOps/s $\color{#d91a1a}-1.73\%$
test_membership 22.8430μs 1.3387μs 747.0207 KOps/s 729.5480 KOps/s $\color{#35bf28}+2.40\%$
test_membership_nested 40.0840μs 3.4413μs 290.5859 KOps/s 291.7394 KOps/s $\color{#d91a1a}-0.40\%$
test_membership_nested_leaf 25.1570μs 3.4304μs 291.5084 KOps/s 285.5407 KOps/s $\color{#35bf28}+2.09\%$
test_membership_stacked_nested 38.7720μs 3.4105μs 293.2152 KOps/s 289.2432 KOps/s $\color{#35bf28}+1.37\%$
test_membership_stacked_nested_leaf 23.1830μs 3.4710μs 288.0976 KOps/s 273.1714 KOps/s $\textbf{\color{#35bf28}+5.46\%}$
test_membership_nested_last 51.2150μs 6.6205μs 151.0449 KOps/s 150.3525 KOps/s $\color{#35bf28}+0.46\%$
test_membership_nested_leaf_last 32.2600μs 6.6339μs 150.7416 KOps/s 150.5895 KOps/s $\color{#35bf28}+0.10\%$
test_membership_stacked_nested_last 43.7120μs 6.5524μs 152.6152 KOps/s 140.2519 KOps/s $\textbf{\color{#35bf28}+8.82\%}$
test_membership_stacked_nested_leaf_last 25.9490μs 6.5855μs 151.8491 KOps/s 138.9692 KOps/s $\textbf{\color{#35bf28}+9.27\%}$
test_nested_getleaf 51.9370μs 10.3857μs 96.2859 KOps/s 86.4352 KOps/s $\textbf{\color{#35bf28}+11.40\%}$
test_nested_get 49.1210μs 9.8393μs 101.6328 KOps/s 91.9636 KOps/s $\textbf{\color{#35bf28}+10.51\%}$
test_stacked_getleaf 36.6980μs 10.3005μs 97.0831 KOps/s 87.8506 KOps/s $\textbf{\color{#35bf28}+10.51\%}$
test_stacked_get 54.9330μs 9.8505μs 101.5173 KOps/s 92.9152 KOps/s $\textbf{\color{#35bf28}+9.26\%}$
test_nested_getitemleaf 57.4270μs 11.9793μs 83.4771 KOps/s 76.6646 KOps/s $\textbf{\color{#35bf28}+8.89\%}$
test_nested_getitem 31.8400μs 11.3594μs 88.0324 KOps/s 79.5887 KOps/s $\textbf{\color{#35bf28}+10.61\%}$
test_stacked_getitemleaf 56.9160μs 11.8376μs 84.4767 KOps/s 77.6431 KOps/s $\textbf{\color{#35bf28}+8.80\%}$
test_stacked_getitem 48.6110μs 11.4198μs 87.5676 KOps/s 81.5645 KOps/s $\textbf{\color{#35bf28}+7.36\%}$
test_lock_nested 0.7174ms 0.3323ms 3.0093 KOps/s 2.9207 KOps/s $\color{#35bf28}+3.03\%$
test_lock_stack_nested 0.4460ms 0.2973ms 3.3633 KOps/s 3.2817 KOps/s $\color{#35bf28}+2.49\%$
test_unlock_nested 93.7485ms 0.4301ms 2.3251 KOps/s 2.3435 KOps/s $\color{#d91a1a}-0.78\%$
test_unlock_stack_nested 0.4710ms 0.3065ms 3.2625 KOps/s 3.2001 KOps/s $\color{#35bf28}+1.95\%$
test_flatten_speed 0.6861ms 0.3648ms 2.7411 KOps/s 2.6766 KOps/s $\color{#35bf28}+2.41\%$
test_unflatten_speed 1.2365ms 0.4494ms 2.2253 KOps/s 2.1345 KOps/s $\color{#35bf28}+4.26\%$
test_common_ops 4.8766ms 0.6862ms 1.4572 KOps/s 1.3842 KOps/s $\textbf{\color{#35bf28}+5.28\%}$
test_creation 19.1160μs 1.8305μs 546.2850 KOps/s 541.8700 KOps/s $\color{#35bf28}+0.81\%$
test_creation_empty 47.7390μs 9.9182μs 100.8249 KOps/s 93.8052 KOps/s $\textbf{\color{#35bf28}+7.48\%}$
test_creation_nested_1 57.1160μs 12.5586μs 79.6268 KOps/s 74.1373 KOps/s $\textbf{\color{#35bf28}+7.40\%}$
test_creation_nested_2 36.7690μs 15.6117μs 64.0545 KOps/s 59.5939 KOps/s $\textbf{\color{#35bf28}+7.49\%}$
test_clone 56.9160μs 13.3155μs 75.1005 KOps/s 72.3593 KOps/s $\color{#35bf28}+3.79\%$
test_getitem[int] 30.6270μs 11.1799μs 89.4460 KOps/s 89.1572 KOps/s $\color{#35bf28}+0.32\%$
test_getitem[slice_int] 83.0460μs 22.6603μs 44.1301 KOps/s 44.0286 KOps/s $\color{#35bf28}+0.23\%$
test_getitem[range] 0.1419ms 41.2908μs 24.2185 KOps/s 23.3773 KOps/s $\color{#35bf28}+3.60\%$
test_getitem[tuple] 0.1213ms 18.2875μs 54.6821 KOps/s 54.1100 KOps/s $\color{#35bf28}+1.06\%$
test_getitem[list] 0.1034ms 36.2460μs 27.5893 KOps/s 26.3893 KOps/s $\color{#35bf28}+4.55\%$
test_setitem_dim[int] 75.3910μs 29.0719μs 34.3975 KOps/s 29.5527 KOps/s $\textbf{\color{#35bf28}+16.39\%}$
test_setitem_dim[slice_int] 88.9560μs 54.5614μs 18.3280 KOps/s 17.0162 KOps/s $\textbf{\color{#35bf28}+7.71\%}$
test_setitem_dim[range] 0.1289ms 74.5283μs 13.4177 KOps/s 12.8426 KOps/s $\color{#35bf28}+4.48\%$
test_setitem_dim[tuple] 83.9270μs 44.4402μs 22.5021 KOps/s 21.0493 KOps/s $\textbf{\color{#35bf28}+6.90\%}$
test_setitem 0.1120ms 19.6384μs 50.9206 KOps/s 49.2293 KOps/s $\color{#35bf28}+3.44\%$
test_set 77.9860μs 18.9222μs 52.8480 KOps/s 50.3788 KOps/s $\color{#35bf28}+4.90\%$
test_set_shared 1.9048ms 0.1416ms 7.0604 KOps/s 7.0690 KOps/s $\color{#d91a1a}-0.12\%$
test_update 0.1288ms 21.5576μs 46.3874 KOps/s 42.8087 KOps/s $\textbf{\color{#35bf28}+8.36\%}$
test_update_nested 99.3150μs 28.7489μs 34.7840 KOps/s 32.5313 KOps/s $\textbf{\color{#35bf28}+6.92\%}$
test_set_nested 0.1251ms 20.5945μs 48.5566 KOps/s 45.8319 KOps/s $\textbf{\color{#35bf28}+5.95\%}$
test_set_nested_new 81.6230μs 24.4984μs 40.8190 KOps/s 38.1270 KOps/s $\textbf{\color{#35bf28}+7.06\%}$
test_select 0.1106ms 37.4101μs 26.7307 KOps/s 25.0654 KOps/s $\textbf{\color{#35bf28}+6.64\%}$
test_select_nested 0.1094ms 57.8043μs 17.2998 KOps/s 16.9614 KOps/s $\color{#35bf28}+2.00\%$
test_exclude_nested 0.2490ms 0.1181ms 8.4662 KOps/s 8.3922 KOps/s $\color{#35bf28}+0.88\%$
test_empty[True] 0.5789ms 0.4114ms 2.4306 KOps/s 2.4376 KOps/s $\color{#d91a1a}-0.29\%$
test_empty[False] 8.8926μs 1.0355μs 965.6818 KOps/s 935.1989 KOps/s $\color{#35bf28}+3.26\%$
test_unbind_speed 0.4402ms 0.2399ms 4.1679 KOps/s 3.8727 KOps/s $\textbf{\color{#35bf28}+7.62\%}$
test_unbind_speed_stack0 0.4538ms 0.2398ms 4.1700 KOps/s 4.0854 KOps/s $\color{#35bf28}+2.07\%$
test_unbind_speed_stack1 1.1910ms 0.5919ms 1.6895 KOps/s 1.4626 KOps/s $\textbf{\color{#35bf28}+15.51\%}$
test_split 0.1320s 1.6652ms 600.5431 Ops/s 605.9591 Ops/s $\color{#d91a1a}-0.89\%$
test_chunk 2.2364ms 1.4723ms 679.1877 Ops/s 678.3138 Ops/s $\color{#35bf28}+0.13\%$
test_creation[device0] 0.1741ms 0.1040ms 9.6181 KOps/s 9.4911 KOps/s $\color{#35bf28}+1.34\%$
test_creation_from_tensor 4.2664ms 84.0610μs 11.8961 KOps/s 11.9581 KOps/s $\color{#d91a1a}-0.52\%$
test_add_one[memmap_tensor0] 0.1100ms 5.6615μs 176.6326 KOps/s 186.3369 KOps/s $\textbf{\color{#d91a1a}-5.21\%}$
test_contiguous[memmap_tensor0] 20.2180μs 0.6326μs 1.5809 MOps/s 1.5460 MOps/s $\color{#35bf28}+2.25\%$
test_stack[memmap_tensor0] 33.5130μs 3.7718μs 265.1289 KOps/s 278.4280 KOps/s $\color{#d91a1a}-4.78\%$
test_memmaptd_index 1.1098ms 0.2491ms 4.0144 KOps/s 4.1878 KOps/s $\color{#d91a1a}-4.14\%$
test_memmaptd_index_astensor 0.5276ms 0.3058ms 3.2703 KOps/s 3.3193 KOps/s $\color{#d91a1a}-1.48\%$
test_memmaptd_index_op 1.1092ms 0.5990ms 1.6694 KOps/s 1.6662 KOps/s $\color{#35bf28}+0.19\%$
test_serialize_model 0.2183s 0.1143s 8.7464 Ops/s 8.4547 Ops/s $\color{#35bf28}+3.45\%$
test_serialize_model_pickle 0.4475s 0.3734s 2.6780 Ops/s 2.6112 Ops/s $\color{#35bf28}+2.56\%$
test_serialize_weights 0.1076s 97.9361ms 10.2107 Ops/s 9.9464 Ops/s $\color{#35bf28}+2.66\%$
test_serialize_weights_returnearly 0.2339s 0.1360s 7.3506 Ops/s 8.0434 Ops/s $\textbf{\color{#d91a1a}-8.61\%}$
test_serialize_weights_pickle 1.0414s 0.6068s 1.6480 Ops/s 2.3852 Ops/s $\textbf{\color{#d91a1a}-30.91\%}$
test_serialize_weights_filesystem 96.8268ms 92.1685ms 10.8497 Ops/s 9.4387 Ops/s $\textbf{\color{#35bf28}+14.95\%}$
test_serialize_model_filesystem 97.8732ms 92.0863ms 10.8594 Ops/s 10.6407 Ops/s $\color{#35bf28}+2.06\%$
test_reshape_pytree 48.2600μs 21.0175μs 47.5794 KOps/s 47.1792 KOps/s $\color{#35bf28}+0.85\%$
test_reshape_td 79.4480μs 30.9916μs 32.2668 KOps/s 31.8004 KOps/s $\color{#35bf28}+1.47\%$
test_view_pytree 60.9940μs 21.0176μs 47.5792 KOps/s 47.9969 KOps/s $\color{#d91a1a}-0.87\%$
test_view_td 0.1289s 60.0912μs 16.6414 KOps/s 16.2458 KOps/s $\color{#35bf28}+2.43\%$
test_unbind_pytree 52.2780μs 24.3807μs 41.0161 KOps/s 40.6037 KOps/s $\color{#35bf28}+1.02\%$
test_unbind_td 0.1235ms 36.0927μs 27.7064 KOps/s 27.0882 KOps/s $\color{#35bf28}+2.28\%$
test_split_pytree 57.5070μs 24.0896μs 41.5117 KOps/s 41.4507 KOps/s $\color{#35bf28}+0.15\%$
test_split_td 0.1213ms 40.0858μs 24.9465 KOps/s 24.8726 KOps/s $\color{#35bf28}+0.30\%$
test_add_pytree 73.0360μs 30.3409μs 32.9588 KOps/s 33.0410 KOps/s $\color{#d91a1a}-0.25\%$
test_add_td 0.1112ms 51.7533μs 19.3224 KOps/s 18.4594 KOps/s $\color{#35bf28}+4.68\%$
test_distributed 0.2011ms 0.1017ms 9.8287 KOps/s 9.7776 KOps/s $\color{#35bf28}+0.52\%$
test_tdmodule 0.1741ms 22.4550μs 44.5336 KOps/s 43.4778 KOps/s $\color{#35bf28}+2.43\%$
test_tdmodule_dispatch 0.1830ms 44.0319μs 22.7108 KOps/s 22.7511 KOps/s $\color{#d91a1a}-0.18\%$
test_tdseq 0.3868ms 27.4864μs 36.3816 KOps/s 38.2560 KOps/s $\color{#d91a1a}-4.90\%$
test_tdseq_dispatch 0.1360ms 46.4145μs 21.5450 KOps/s 20.5767 KOps/s $\color{#35bf28}+4.71\%$
test_instantiation_functorch 2.1921ms 1.2990ms 769.8069 Ops/s 764.1743 Ops/s $\color{#35bf28}+0.74\%$
test_instantiation_td 1.4659ms 0.9933ms 1.0067 KOps/s 1.0016 KOps/s $\color{#35bf28}+0.51\%$
test_exec_functorch 0.2938ms 0.1617ms 6.1849 KOps/s 6.2573 KOps/s $\color{#d91a1a}-1.16\%$
test_exec_functional_call 0.3050ms 0.1511ms 6.6161 KOps/s 6.6287 KOps/s $\color{#d91a1a}-0.19\%$
test_exec_td 0.2631ms 0.1492ms 6.7044 KOps/s 6.8271 KOps/s $\color{#d91a1a}-1.80\%$
test_exec_td_decorator 0.7006ms 0.1970ms 5.0765 KOps/s 5.0923 KOps/s $\color{#d91a1a}-0.31\%$
test_vmap_mlp_speed[True-True] 0.6893ms 0.4763ms 2.0995 KOps/s 2.0870 KOps/s $\color{#35bf28}+0.60\%$
test_vmap_mlp_speed[True-False] 0.6399ms 0.4740ms 2.1097 KOps/s 2.1049 KOps/s $\color{#35bf28}+0.23\%$
test_vmap_mlp_speed[False-True] 0.6626ms 0.3888ms 2.5717 KOps/s 2.5713 KOps/s $\color{#35bf28}+0.01\%$
test_vmap_mlp_speed[False-False] 0.7228ms 0.3898ms 2.5654 KOps/s 2.5670 KOps/s $\color{#d91a1a}-0.06\%$
test_vmap_mlp_speed_decorator[True-True] 1.1342ms 0.5243ms 1.9074 KOps/s 1.8935 KOps/s $\color{#35bf28}+0.73\%$
test_vmap_mlp_speed_decorator[True-False] 0.9175ms 0.5257ms 1.9024 KOps/s 1.8969 KOps/s $\color{#35bf28}+0.29\%$
test_vmap_mlp_speed_decorator[False-True] 0.6705ms 0.4040ms 2.4754 KOps/s 2.4904 KOps/s $\color{#d91a1a}-0.60\%$
test_vmap_mlp_speed_decorator[False-False] 0.7039ms 0.4034ms 2.4791 KOps/s 2.4879 KOps/s $\color{#d91a1a}-0.36\%$
test_to_module_speed[True] 2.1880ms 1.3790ms 725.1665 Ops/s 726.2517 Ops/s $\color{#d91a1a}-0.15\%$
test_to_module_speed[False] 2.1109ms 1.3402ms 746.1451 Ops/s 738.0619 Ops/s $\color{#35bf28}+1.10\%$
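
The report does not say how the Change column or the Improved/Worsened counts are derived. The numbers are consistent with Change being the relative difference between Ops and Ops on Repo HEAD, and with a row counting as improved or worsened only when that change exceeds 5% in magnitude (the bold entries). The sketch below illustrates that reading; the function names and the 5% threshold are assumptions inferred from the table, not taken from the benchmark tooling.

```python
def relative_change(ops: float, ops_head: float) -> float:
    """Relative throughput change of the PR versus repo HEAD, as a fraction."""
    return (ops - ops_head) / ops_head


def classify(ops: float, ops_head: float, threshold: float = 0.05) -> str:
    """Label a benchmark row.

    The 5% threshold is an assumption inferred from which rows are bold.
    """
    change = relative_change(ops, ops_head)
    if change > threshold:
        return "improved"
    if change < -threshold:
        return "worsened"
    return "unchanged"


# (Ops, Ops on Repo HEAD) pairs copied from the CPU table, both in KOps/s.
rows = {
    "test_plain_set_nested": (59.0355, 56.7014),
    "test_setitem_dim[int]": (34.3975, 29.5527),
    "test_add_one[memmap_tensor0]": (176.6326, 186.3369),
}

for name, (ops, ops_head) in rows.items():
    change = relative_change(ops, ops_head)
    print(f"{name}: {change:+.2%} ({classify(ops, ops_head)})")
```

Under this reading, the three bold red rows in the CPU table (test_add_one[memmap_tensor0], test_serialize_weights_returnearly, test_serialize_weights_pickle) account for the reported "Worsened: 3".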


$\color{#D29922}\textsf{\Large⚠\kern{0.2cm}\normalsize Warning}$ Result of GPU Benchmark Tests

Total Benchmarks: 134. Improved: $\large\color{#35bf28}5$. Worsened: $\large\color{#d91a1a}9$.

Detailed results:
Name Max Mean Ops Ops on Repo HEAD Change
test_plain_set_nested 0.8770ms 13.4224μs 74.5021 KOps/s 78.1930 KOps/s $\color{#d91a1a}-4.72\%$
test_plain_set_stack_nested 36.6510μs 13.4359μs 74.4274 KOps/s 77.7855 KOps/s $\color{#d91a1a}-4.32\%$
test_plain_set_nested_inplace 44.2000μs 14.8122μs 67.5121 KOps/s 70.5694 KOps/s $\color{#d91a1a}-4.33\%$
test_plain_set_stack_nested_inplace 32.2410μs 14.7580μs 67.7599 KOps/s 70.2817 KOps/s $\color{#d91a1a}-3.59\%$
test_items 29.3700μs 4.7502μs 210.5197 KOps/s 211.5474 KOps/s $\color{#d91a1a}-0.49\%$
test_items_nested 0.4245ms 0.3460ms 2.8903 KOps/s 2.9379 KOps/s $\color{#d91a1a}-1.62\%$
test_items_nested_locked 0.4261ms 0.3496ms 2.8608 KOps/s 2.9258 KOps/s $\color{#d91a1a}-2.22\%$
test_items_nested_leaf 0.2578ms 0.2070ms 4.8315 KOps/s 4.9824 KOps/s $\color{#d91a1a}-3.03\%$
test_items_stack_nested 0.5875ms 0.3488ms 2.8669 KOps/s 2.9524 KOps/s $\color{#d91a1a}-2.90\%$
test_items_stack_nested_leaf 0.2573ms 0.2036ms 4.9115 KOps/s 4.9572 KOps/s $\color{#d91a1a}-0.92\%$
test_items_stack_nested_locked 0.4380ms 0.3500ms 2.8572 KOps/s 2.9161 KOps/s $\color{#d91a1a}-2.02\%$
test_keys 19.4600μs 4.6871μs 213.3517 KOps/s 219.2522 KOps/s $\color{#d91a1a}-2.69\%$
test_keys_nested 48.7730ms 0.1046ms 9.5633 KOps/s 10.5742 KOps/s $\textbf{\color{#d91a1a}-9.56\%}$
test_keys_nested_locked 0.1526ms 99.5582μs 10.0444 KOps/s 10.2353 KOps/s $\color{#d91a1a}-1.87\%$
test_keys_nested_leaf 0.1178ms 78.5439μs 12.7317 KOps/s 12.7836 KOps/s $\color{#d91a1a}-0.41\%$
test_keys_stack_nested 0.2889ms 94.8714μs 10.5406 KOps/s 10.6930 KOps/s $\color{#d91a1a}-1.42\%$
test_keys_stack_nested_leaf 0.1135ms 78.1040μs 12.8034 KOps/s 12.8228 KOps/s $\color{#d91a1a}-0.15\%$
test_keys_stack_nested_locked 0.1371ms 0.1002ms 9.9765 KOps/s 10.0744 KOps/s $\color{#d91a1a}-0.97\%$
test_values 9.8570μs 1.8935μs 528.1194 KOps/s 530.2980 KOps/s $\color{#d91a1a}-0.41\%$
test_values_nested 73.7310μs 46.0462μs 21.7173 KOps/s 22.1717 KOps/s $\color{#d91a1a}-2.05\%$
test_values_nested_locked 79.0420μs 48.1759μs 20.7572 KOps/s 21.0618 KOps/s $\color{#d91a1a}-1.45\%$
test_values_nested_leaf 65.8100μs 40.2523μs 24.8433 KOps/s 25.2473 KOps/s $\color{#d91a1a}-1.60\%$
test_values_stack_nested 0.1075ms 46.0517μs 21.7147 KOps/s 21.9916 KOps/s $\color{#d91a1a}-1.26\%$
test_values_stack_nested_leaf 72.4020μs 40.1778μs 24.8893 KOps/s 25.2043 KOps/s $\color{#d91a1a}-1.25\%$
test_values_stack_nested_locked 82.7810μs 48.3635μs 20.6768 KOps/s 21.0372 KOps/s $\color{#d91a1a}-1.71\%$
test_membership 59.3188μs 0.9673μs 1.0339 MOps/s 1.0360 MOps/s $\color{#d91a1a}-0.21\%$
test_membership_nested 16.8610μs 2.8865μs 346.4385 KOps/s 351.2538 KOps/s $\color{#d91a1a}-1.37\%$
test_membership_nested_leaf 35.6700μs 2.8954μs 345.3720 KOps/s 349.7677 KOps/s $\color{#d91a1a}-1.26\%$
test_membership_stacked_nested 25.4800μs 2.9400μs 340.1369 KOps/s 347.5046 KOps/s $\color{#d91a1a}-2.12\%$
test_membership_stacked_nested_leaf 27.5410μs 2.8977μs 345.1043 KOps/s 344.5890 KOps/s $\color{#35bf28}+0.15\%$
test_membership_nested_last 39.9110μs 5.2493μs 190.5003 KOps/s 191.8349 KOps/s $\color{#d91a1a}-0.70\%$
test_membership_nested_leaf_last 34.3300μs 5.3640μs 186.4279 KOps/s 190.9364 KOps/s $\color{#d91a1a}-2.36\%$
test_membership_stacked_nested_last 25.9400μs 5.3283μs 187.6777 KOps/s 191.0843 KOps/s $\color{#d91a1a}-1.78\%$
test_membership_stacked_nested_leaf_last 20.6710μs 5.2841μs 189.2454 KOps/s 190.3242 KOps/s $\color{#d91a1a}-0.57\%$
test_nested_getleaf 42.0210μs 8.5826μs 116.5142 KOps/s 119.5986 KOps/s $\color{#d91a1a}-2.58\%$
test_nested_get 41.7810μs 8.0253μs 124.6063 KOps/s 126.7116 KOps/s $\color{#d91a1a}-1.66\%$
test_stacked_getleaf 0.2426ms 8.4692μs 118.0751 KOps/s 119.5230 KOps/s $\color{#d91a1a}-1.21\%$
test_stacked_get 36.3010μs 8.0178μs 124.7231 KOps/s 126.0802 KOps/s $\color{#d91a1a}-1.08\%$
test_nested_getitemleaf 27.3710μs 9.9211μs 100.7953 KOps/s 102.4575 KOps/s $\color{#d91a1a}-1.62\%$
test_nested_getitem 37.5910μs 9.4828μs 105.4541 KOps/s 107.7622 KOps/s $\color{#d91a1a}-2.14\%$
test_stacked_getitemleaf 41.6300μs 9.9340μs 100.6641 KOps/s 102.1984 KOps/s $\color{#d91a1a}-1.50\%$
test_stacked_getitem 24.0200μs 9.4562μs 105.7506 KOps/s 107.5171 KOps/s $\color{#d91a1a}-1.64\%$
test_lock_nested 2.2207ms 0.3546ms 2.8198 KOps/s 2.8109 KOps/s $\color{#35bf28}+0.32\%$
test_lock_stack_nested 0.3962ms 0.3101ms 3.2253 KOps/s 3.2639 KOps/s $\color{#d91a1a}-1.18\%$
test_unlock_nested 0.7835ms 0.3530ms 2.8325 KOps/s 2.8749 KOps/s $\color{#d91a1a}-1.47\%$
test_unlock_stack_nested 0.4052ms 0.3204ms 3.1208 KOps/s 3.1662 KOps/s $\color{#d91a1a}-1.44\%$
test_flatten_speed 0.5072ms 0.2643ms 3.7833 KOps/s 3.9226 KOps/s $\color{#d91a1a}-3.55\%$
test_unflatten_speed 0.4355ms 0.3640ms 2.7471 KOps/s 2.8317 KOps/s $\color{#d91a1a}-2.99\%$
test_common_ops 1.0337ms 0.5804ms 1.7230 KOps/s 1.7989 KOps/s $\color{#d91a1a}-4.22\%$
test_creation 16.8300μs 1.5704μs 636.7906 KOps/s 656.3078 KOps/s $\color{#d91a1a}-2.97\%$
test_creation_empty 25.4000μs 7.3656μs 135.7659 KOps/s 154.4985 KOps/s $\textbf{\color{#d91a1a}-12.12\%}$
test_creation_nested_1 41.2710μs 9.1935μs 108.7728 KOps/s 120.8075 KOps/s $\textbf{\color{#d91a1a}-9.96\%}$
test_creation_nested_2 37.2110μs 11.7078μs 85.4129 KOps/s 93.5162 KOps/s $\textbf{\color{#d91a1a}-8.67\%}$
test_clone 63.0200μs 13.3769μs 74.7558 KOps/s 74.8502 KOps/s $\color{#d91a1a}-0.13\%$
test_getitem[int] 26.4700μs 10.9667μs 91.1848 KOps/s 93.5720 KOps/s $\color{#d91a1a}-2.55\%$
test_getitem[slice_int] 52.1420μs 21.2035μs 47.1619 KOps/s 47.9507 KOps/s $\color{#d91a1a}-1.64\%$
test_getitem[range] 69.4310μs 50.8170μs 19.6785 KOps/s 19.4656 KOps/s $\color{#35bf28}+1.09\%$
test_getitem[tuple] 42.6700μs 19.0513μs 52.4898 KOps/s 53.5732 KOps/s $\color{#d91a1a}-2.02\%$
test_getitem[list] 0.3418ms 37.6328μs 26.5726 KOps/s 27.5434 KOps/s $\color{#d91a1a}-3.52\%$
test_setitem_dim[int] 44.5410μs 24.6166μs 40.6230 KOps/s 40.6435 KOps/s $\color{#d91a1a}-0.05\%$
test_setitem_dim[slice_int] 64.8900μs 45.6861μs 21.8885 KOps/s 22.3169 KOps/s $\color{#d91a1a}-1.92\%$
test_setitem_dim[range] 98.8010μs 66.2030μs 15.1051 KOps/s 15.6350 KOps/s $\color{#d91a1a}-3.39\%$
test_setitem_dim[tuple] 66.8500μs 39.4714μs 25.3348 KOps/s 25.7817 KOps/s $\color{#d91a1a}-1.73\%$
test_setitem 54.2620μs 17.6453μs 56.6724 KOps/s 57.3688 KOps/s $\color{#d91a1a}-1.21\%$
test_set 52.7920μs 17.2991μs 57.8065 KOps/s 59.5022 KOps/s $\color{#d91a1a}-2.85\%$
test_set_shared 0.1418s 0.1330ms 7.5193 KOps/s 9.8297 KOps/s $\textbf{\color{#d91a1a}-23.50\%}$
test_update 72.2010μs 19.0033μs 52.6224 KOps/s 55.3112 KOps/s $\color{#d91a1a}-4.86\%$
test_update_nested 86.4310μs 26.0348μs 38.4101 KOps/s 40.8205 KOps/s $\textbf{\color{#d91a1a}-5.90\%}$
test_set_nested 65.6320μs 18.5147μs 54.0113 KOps/s 55.4836 KOps/s $\color{#d91a1a}-2.65\%$
test_set_nested_new 65.9720μs 21.3285μs 46.8857 KOps/s 46.3385 KOps/s $\color{#35bf28}+1.18\%$
test_select 87.1420μs 34.1070μs 29.3195 KOps/s 29.2893 KOps/s $\color{#35bf28}+0.10\%$
test_select_nested 89.3410μs 53.9127μs 18.5485 KOps/s 19.0946 KOps/s $\color{#d91a1a}-2.86\%$
test_exclude_nested 0.3653ms 0.1163ms 8.5977 KOps/s 8.8456 KOps/s $\color{#d91a1a}-2.80\%$
test_empty[True] 1.1409ms 0.3948ms 2.5332 KOps/s 2.5812 KOps/s $\color{#d91a1a}-1.86\%$
test_empty[False] 3.4461μs 0.8991μs 1.1122 MOps/s 1.1641 MOps/s $\color{#d91a1a}-4.46\%$
test_to 94.7010μs 59.6330μs 16.7692 KOps/s 18.1472 KOps/s $\textbf{\color{#d91a1a}-7.59\%}$
test_to_nonblocking 70.7310μs 33.4956μs 29.8546 KOps/s 29.4713 KOps/s $\color{#35bf28}+1.30\%$
test_unbind_speed 0.2990ms 0.2610ms 3.8319 KOps/s 3.7837 KOps/s $\color{#35bf28}+1.27\%$
test_unbind_speed_stack0 0.3232ms 0.2626ms 3.8084 KOps/s 3.8031 KOps/s $\color{#35bf28}+0.14\%$
test_unbind_speed_stack1 0.1471s 0.7834ms 1.2765 KOps/s 1.2996 KOps/s $\color{#d91a1a}-1.78\%$
test_split 1.6395ms 1.5480ms 646.0063 Ops/s 643.0103 Ops/s $\color{#35bf28}+0.47\%$
test_chunk 1.6199ms 1.5451ms 647.2237 Ops/s 645.2759 Ops/s $\color{#35bf28}+0.30\%$
test_creation[device0] 0.1732ms 73.5901μs 13.5888 KOps/s 13.7923 KOps/s $\color{#d91a1a}-1.48\%$
test_creation_from_tensor 0.1385ms 55.3728μs 18.0594 KOps/s 18.7800 KOps/s $\color{#d91a1a}-3.84\%$
test_add_one[memmap_tensor0] 87.5320μs 6.3674μs 157.0497 KOps/s 149.5407 KOps/s $\textbf{\color{#35bf28}+5.02\%}$
test_contiguous[memmap_tensor0] 17.6000μs 0.6588μs 1.5179 MOps/s 1.5489 MOps/s $\color{#d91a1a}-2.00\%$
test_stack[memmap_tensor0] 27.8700μs 4.2640μs 234.5232 KOps/s 222.6067 KOps/s $\textbf{\color{#35bf28}+5.35\%}$
test_memmaptd_index 1.2060ms 0.2525ms 3.9602 KOps/s 3.8146 KOps/s $\color{#35bf28}+3.82\%$
test_memmaptd_index_astensor 0.6599ms 0.3101ms 3.2246 KOps/s 3.1118 KOps/s $\color{#35bf28}+3.62\%$
test_memmaptd_index_op 0.9349ms 0.5750ms 1.7391 KOps/s 1.7139 KOps/s $\color{#35bf28}+1.47\%$
test_serialize_model 0.2565s 0.1108s 9.0293 Ops/s 8.9457 Ops/s $\color{#35bf28}+0.93\%$
test_serialize_model_pickle 1.3478s 1.2363s 0.8089 Ops/s 0.7166 Ops/s $\textbf{\color{#35bf28}+12.88\%}$
test_serialize_weights 93.1770ms 88.3897ms 11.3135 Ops/s 10.2786 Ops/s $\textbf{\color{#35bf28}+10.07\%}$
test_serialize_weights_returnearly 64.8046ms 56.3630ms 17.7421 Ops/s 11.3863 Ops/s $\textbf{\color{#35bf28}+55.82\%}$
test_serialize_weights_pickle 1.3476s 1.2420s 0.8052 Ops/s 0.8009 Ops/s $\color{#35bf28}+0.54\%$
test_reshape_pytree 0.2344ms 25.3594μs 39.4331 KOps/s 39.7301 KOps/s $\color{#d91a1a}-0.75\%$
test_reshape_td 0.1512ms 30.6945μs 32.5791 KOps/s 32.4981 KOps/s $\color{#35bf28}+0.25\%$
test_view_pytree 55.2610μs 24.3517μs 41.0648 KOps/s 40.3263 KOps/s $\color{#35bf28}+1.83\%$
test_view_td 0.1456s 58.3734μs 17.1311 KOps/s 21.2131 KOps/s $\textbf{\color{#d91a1a}-19.24\%}$
test_unbind_pytree 55.4600μs 29.2872μs 34.1446 KOps/s 33.1121 KOps/s $\color{#35bf28}+3.12\%$
test_unbind_td 0.2280ms 40.1115μs 24.9305 KOps/s 24.7606 KOps/s $\color{#35bf28}+0.69\%$
test_split_pytree 0.1819ms 29.1398μs 34.3173 KOps/s 34.2562 KOps/s $\color{#35bf28}+0.18\%$
test_split_td 0.5871ms 39.5546μs 25.2815 KOps/s 24.9823 KOps/s $\color{#35bf28}+1.20\%$
test_add_pytree 63.8500μs 34.6596μs 28.8521 KOps/s 28.0072 KOps/s $\color{#35bf28}+3.02\%$
test_add_td 0.2589ms 46.3307μs 21.5839 KOps/s 20.6237 KOps/s $\color{#35bf28}+4.66\%$
test_distributed 0.3328ms 70.3456μs 14.2155 KOps/s 13.9167 KOps/s $\color{#35bf28}+2.15\%$
test_tdmodule 66.1610μs 17.3442μs 57.6563 KOps/s 58.9162 KOps/s $\color{#d91a1a}-2.14\%$
test_tdmodule_dispatch 0.1347ms 34.8268μs 28.7136 KOps/s 29.5435 KOps/s $\color{#d91a1a}-2.81\%$
test_tdseq 43.3400μs 20.0142μs 49.9645 KOps/s 50.0438 KOps/s $\color{#d91a1a}-0.16\%$
test_tdseq_dispatch 53.9710μs 37.8781μs 26.4005 KOps/s 27.4272 KOps/s $\color{#d91a1a}-3.74\%$
test_instantiation_functorch 1.7911ms 1.6834ms 594.0313 Ops/s 593.5196 Ops/s $\color{#35bf28}+0.09\%$
test_instantiation_td 0.1778s 1.3617ms 734.3893 Ops/s 867.8236 Ops/s $\textbf{\color{#d91a1a}-15.38\%}$
test_exec_functorch 0.2141ms 0.1575ms 6.3481 KOps/s 6.4875 KOps/s $\color{#d91a1a}-2.15\%$
test_exec_functional_call 0.2417ms 0.1500ms 6.6674 KOps/s 6.6105 KOps/s $\color{#35bf28}+0.86\%$
test_exec_td 0.2162ms 0.1392ms 7.1861 KOps/s 7.0086 KOps/s $\color{#35bf28}+2.53\%$
test_exec_td_decorator 0.5496ms 0.1868ms 5.3536 KOps/s 5.3374 KOps/s $\color{#35bf28}+0.30\%$
test_vmap_mlp_speed[True-True] 0.6986ms 0.5773ms 1.7323 KOps/s 1.7342 KOps/s $\color{#d91a1a}-0.11\%$
test_vmap_mlp_speed[True-False] 0.6279ms 0.5756ms 1.7374 KOps/s 1.7382 KOps/s $\color{#d91a1a}-0.05\%$
test_vmap_mlp_speed[False-True] 0.5602ms 0.5083ms 1.9672 KOps/s 1.9652 KOps/s $\color{#35bf28}+0.10\%$
test_vmap_mlp_speed[False-False] 0.6544ms 0.5097ms 1.9620 KOps/s 1.8991 KOps/s $\color{#35bf28}+3.31\%$
test_vmap_mlp_speed_decorator[True-True] 1.0513ms 0.6251ms 1.5999 KOps/s 1.6243 KOps/s $\color{#d91a1a}-1.50\%$
test_vmap_mlp_speed_decorator[True-False] 0.7004ms 0.6183ms 1.6173 KOps/s 1.6275 KOps/s $\color{#d91a1a}-0.63\%$
test_vmap_mlp_speed_decorator[False-True] 0.7338ms 0.5276ms 1.8955 KOps/s 1.9157 KOps/s $\color{#d91a1a}-1.06\%$
test_vmap_mlp_speed_decorator[False-False] 0.6276ms 0.5271ms 1.8971 KOps/s 1.9128 KOps/s $\color{#d91a1a}-0.82\%$
test_vmap_transformer_speed[True-True] 7.9409ms 7.7729ms 128.6525 Ops/s 130.6912 Ops/s $\color{#d91a1a}-1.56\%$
test_vmap_transformer_speed[True-False] 8.4306ms 7.9668ms 125.5213 Ops/s 130.5878 Ops/s $\color{#d91a1a}-3.88\%$
test_vmap_transformer_speed[False-True] 8.4264ms 8.0150ms 124.7658 Ops/s 131.3270 Ops/s $\color{#d91a1a}-5.00\%$
test_vmap_transformer_speed[False-False] 8.1859ms 7.8844ms 126.8326 Ops/s 130.5240 Ops/s $\color{#d91a1a}-2.83\%$
test_vmap_transformer_speed_decorator[True-True] 19.0948ms 18.8140ms 53.1520 Ops/s 54.5245 Ops/s $\color{#d91a1a}-2.52\%$
test_vmap_transformer_speed_decorator[True-False] 19.8402ms 18.9269ms 52.8349 Ops/s 54.5497 Ops/s $\color{#d91a1a}-3.14\%$
test_vmap_transformer_speed_decorator[False-True] 19.6077ms 18.6084ms 53.7391 Ops/s 54.5879 Ops/s $\color{#d91a1a}-1.55\%$
test_vmap_transformer_speed_decorator[False-False] 19.2250ms 18.4340ms 54.2476 Ops/s 55.4758 Ops/s $\color{#d91a1a}-2.21\%$
test_to_module_speed[True] 1.4972ms 1.2637ms 791.2965 Ops/s 799.5254 Ops/s $\color{#d91a1a}-1.03\%$
test_to_module_speed[False] 1.5222ms 1.2366ms 808.6751 Ops/s 819.7241 Ops/s $\color{#d91a1a}-1.35\%$
