Check/add support for Apple M1 processor #428

Closed
dkozel opened this issue Dec 21, 2020 · 32 comments · Fixed by #550
Labels
ARM / Neon Neon ARCH specific Enhancement new kernel entirely or for some specific ARCH

Comments

@dkozel

dkozel commented Dec 21, 2020

With Apple's switch to ARM there may be some adjustments needed to detect and use the available NEON instructions.

@sbehnke is interested in working on/testing this feature.

@dkozel dkozel added ARM / Neon Neon ARCH specific Enhancement new kernel entirely or for some specific ARCH labels Dec 21, 2020
@jdemel
Contributor

jdemel commented Dec 21, 2020

I'd hope we can just use an updated cpu_features version if this turns out to be necessary. Besides, publicly available docs for that chip would be helpful as well.

@sbehnke
Contributor

sbehnke commented Dec 21, 2020

If I change cpu_features to recognize that the M1 chip is actually ARM, with CPU_FEATURES_ARCH_ARM defined, volk compiles and runs without issue. I'm not sure if this is the appropriate solution or if there is more to do. While clang 12 defines __aarch64__, it does not appear to define __arm__, so the cpu_features submodule fails to build.

@jdemel
Contributor

jdemel commented Dec 21, 2020

So VOLK builds and uses NEON on M1? That's good.

And we might face 2 issues in the future:

  1. cpu_features seems to fail at automatically detecting M1 as an ARM CPU with NEON. In that case I suggest opening an issue against cpu_features, and we can update our submodule pointer as soon as this is fixed.
  2. clang 12 has some issue. Are you referring to a __aarch64__ define as opposed to a __arm__ define?

@sbehnke
Contributor

sbehnke commented Dec 21, 2020

Regarding point 2 first: clang 12 defines __aarch64__ but does not define __arm__, so cpu_features emits an error about trying to compile cpuinfo_arm.c on a non-ARM CPU.

Regarding the email from @dkozel: I'm not sure yet whether it is using NEON or not. I have attached the make test results here, as I only downloaded and built it for the first time yesterday.

Running tests...
Test project /Users/sbehnke/Documents/GitHub/volk/build
        Start   1: qa_volk_16i_32fc_dot_prod_32fc
  1/134 Test   #1: qa_volk_16i_32fc_dot_prod_32fc .......................   Passed    0.19 sec
        Start   2: qa_volk_16i_branch_4_state_8
  2/134 Test   #2: qa_volk_16i_branch_4_state_8 .........................   Passed    0.01 sec
        Start   3: qa_volk_16i_convert_8i
  3/134 Test   #3: qa_volk_16i_convert_8i ...............................   Passed    0.01 sec
        Start   4: qa_volk_16i_max_star_16i
  4/134 Test   #4: qa_volk_16i_max_star_16i .............................   Passed    0.01 sec
        Start   5: qa_volk_16i_max_star_horizontal_16i
  5/134 Test   #5: qa_volk_16i_max_star_horizontal_16i ..................   Passed    0.01 sec
        Start   6: qa_volk_16i_permute_and_scalar_add
  6/134 Test   #6: qa_volk_16i_permute_and_scalar_add ...................   Passed    0.01 sec
        Start   7: qa_volk_16i_s32f_convert_32f
  7/134 Test   #7: qa_volk_16i_s32f_convert_32f .........................   Passed    0.01 sec
        Start   8: qa_volk_16i_x4_quad_max_star_16i
  8/134 Test   #8: qa_volk_16i_x4_quad_max_star_16i .....................   Passed    0.01 sec
        Start   9: qa_volk_16i_x5_add_quad_16i_x4
  9/134 Test   #9: qa_volk_16i_x5_add_quad_16i_x4 .......................   Passed    0.01 sec
        Start  10: qa_volk_16ic_convert_32fc
 10/134 Test  #10: qa_volk_16ic_convert_32fc ............................   Passed    0.01 sec
        Start  11: qa_volk_16ic_deinterleave_16i_x2
 11/134 Test  #11: qa_volk_16ic_deinterleave_16i_x2 .....................   Passed    0.01 sec
        Start  12: qa_volk_16ic_deinterleave_real_16i
 12/134 Test  #12: qa_volk_16ic_deinterleave_real_16i ...................   Passed    0.01 sec
        Start  13: qa_volk_16ic_deinterleave_real_8i
 13/134 Test  #13: qa_volk_16ic_deinterleave_real_8i ....................   Passed    0.01 sec
        Start  14: qa_volk_16ic_magnitude_16i
 14/134 Test  #14: qa_volk_16ic_magnitude_16i ...........................   Passed    0.01 sec
        Start  15: qa_volk_16ic_s32f_deinterleave_32f_x2
 15/134 Test  #15: qa_volk_16ic_s32f_deinterleave_32f_x2 ................   Passed    0.01 sec
        Start  16: qa_volk_16ic_s32f_deinterleave_real_32f
 16/134 Test  #16: qa_volk_16ic_s32f_deinterleave_real_32f ..............   Passed    0.01 sec
        Start  17: qa_volk_16ic_s32f_magnitude_32f
 17/134 Test  #17: qa_volk_16ic_s32f_magnitude_32f ......................   Passed    0.01 sec
        Start  18: qa_volk_16ic_x2_dot_prod_16ic
 18/134 Test  #18: qa_volk_16ic_x2_dot_prod_16ic ........................   Passed    0.01 sec
        Start  19: qa_volk_16ic_x2_multiply_16ic
 19/134 Test  #19: qa_volk_16ic_x2_multiply_16ic ........................   Passed    0.01 sec
        Start  20: qa_volk_16u_byteswap
 20/134 Test  #20: qa_volk_16u_byteswap .................................   Passed    0.00 sec
        Start  21: qa_volk_16u_byteswappuppet_16u
 21/134 Test  #21: qa_volk_16u_byteswappuppet_16u .......................   Passed    0.01 sec
        Start  22: qa_volk_32f_64f_add_64f
 22/134 Test  #22: qa_volk_32f_64f_add_64f ..............................   Passed    0.01 sec
        Start  23: qa_volk_32f_64f_multiply_64f
 23/134 Test  #23: qa_volk_32f_64f_multiply_64f .........................   Passed    0.01 sec
        Start  24: qa_volk_32f_8u_polarbutterfly_32f
 24/134 Test  #24: qa_volk_32f_8u_polarbutterfly_32f ....................   Passed    0.01 sec
        Start  25: qa_volk_32f_8u_polarbutterflypuppet_32f
 25/134 Test  #25: qa_volk_32f_8u_polarbutterflypuppet_32f ..............   Passed    0.02 sec
        Start  26: qa_volk_32f_accumulator_s32f
 26/134 Test  #26: qa_volk_32f_accumulator_s32f .........................   Passed    0.01 sec
        Start  27: qa_volk_32f_acos_32f
 27/134 Test  #27: qa_volk_32f_acos_32f .................................   Passed    0.01 sec
        Start  28: qa_volk_32f_asin_32f
 28/134 Test  #28: qa_volk_32f_asin_32f .................................   Passed    0.01 sec
        Start  29: qa_volk_32f_atan_32f
 29/134 Test  #29: qa_volk_32f_atan_32f .................................   Passed    0.01 sec
        Start  30: qa_volk_32f_binary_slicer_32i
 30/134 Test  #30: qa_volk_32f_binary_slicer_32i ........................   Passed    0.01 sec
        Start  31: qa_volk_32f_binary_slicer_8i
 31/134 Test  #31: qa_volk_32f_binary_slicer_8i .........................   Passed    0.01 sec
        Start  32: qa_volk_32f_convert_64f
 32/134 Test  #32: qa_volk_32f_convert_64f ..............................   Passed    0.01 sec
        Start  33: qa_volk_32f_cos_32f
 33/134 Test  #33: qa_volk_32f_cos_32f ..................................   Passed    0.01 sec
        Start  34: qa_volk_32f_exp_32f
 34/134 Test  #34: qa_volk_32f_exp_32f ..................................   Passed    0.00 sec
        Start  35: qa_volk_32f_expfast_32f
 35/134 Test  #35: qa_volk_32f_expfast_32f ..............................   Passed    0.01 sec
        Start  36: qa_volk_32f_index_max_16u
 36/134 Test  #36: qa_volk_32f_index_max_16u ............................   Passed    0.01 sec
        Start  37: qa_volk_32f_index_max_32u
 37/134 Test  #37: qa_volk_32f_index_max_32u ............................   Passed    0.01 sec
        Start  38: qa_volk_32f_invsqrt_32f
 38/134 Test  #38: qa_volk_32f_invsqrt_32f ..............................   Passed    0.00 sec
        Start  39: qa_volk_32f_log2_32f
 39/134 Test  #39: qa_volk_32f_log2_32f .................................   Passed    0.01 sec
        Start  40: qa_volk_32f_null_32f
 40/134 Test  #40: qa_volk_32f_null_32f .................................   Passed    0.00 sec
        Start  41: qa_volk_32f_s32f_32f_fm_detect_32f
 41/134 Test  #41: qa_volk_32f_s32f_32f_fm_detect_32f ...................   Passed    0.00 sec
        Start  42: qa_volk_32f_s32f_add_32f
 42/134 Test  #42: qa_volk_32f_s32f_add_32f .............................   Passed    0.01 sec
        Start  43: qa_volk_32f_s32f_calc_spectral_noise_floor_32f
 43/134 Test  #43: qa_volk_32f_s32f_calc_spectral_noise_floor_32f .......   Passed    0.01 sec
        Start  44: qa_volk_32f_s32f_convert_16i
 44/134 Test  #44: qa_volk_32f_s32f_convert_16i .........................   Passed    0.01 sec
        Start  45: qa_volk_32f_s32f_convert_32i
 45/134 Test  #45: qa_volk_32f_s32f_convert_32i .........................   Passed    0.01 sec
        Start  46: qa_volk_32f_s32f_convert_8i
 46/134 Test  #46: qa_volk_32f_s32f_convert_8i ..........................   Passed    0.01 sec
        Start  47: qa_volk_32f_s32f_mod_rangepuppet_32f
 47/134 Test  #47: qa_volk_32f_s32f_mod_rangepuppet_32f .................   Passed    0.01 sec
        Start  48: qa_volk_32f_s32f_multiply_32f
 48/134 Test  #48: qa_volk_32f_s32f_multiply_32f ........................   Passed    0.01 sec
        Start  49: qa_volk_32f_s32f_normalize
 49/134 Test  #49: qa_volk_32f_s32f_normalize ...........................   Passed    0.01 sec
        Start  50: qa_volk_32f_s32f_power_32f
 50/134 Test  #50: qa_volk_32f_s32f_power_32f ...........................   Passed    0.01 sec
        Start  51: qa_volk_32f_s32f_s32f_mod_range_32f
 51/134 Test  #51: qa_volk_32f_s32f_s32f_mod_range_32f ..................   Passed    0.00 sec
        Start  52: qa_volk_32f_s32f_stddev_32f
 52/134 Test  #52: qa_volk_32f_s32f_stddev_32f ..........................   Passed    0.01 sec
        Start  53: qa_volk_32f_sin_32f
 53/134 Test  #53: qa_volk_32f_sin_32f ..................................   Passed    0.01 sec
        Start  54: qa_volk_32f_sqrt_32f
 54/134 Test  #54: qa_volk_32f_sqrt_32f .................................   Passed    0.01 sec
        Start  55: qa_volk_32f_stddev_and_mean_32f_x2
 55/134 Test  #55: qa_volk_32f_stddev_and_mean_32f_x2 ...................   Passed    0.01 sec
        Start  56: qa_volk_32f_tan_32f
 56/134 Test  #56: qa_volk_32f_tan_32f ..................................   Passed    0.01 sec
        Start  57: qa_volk_32f_tanh_32f
 57/134 Test  #57: qa_volk_32f_tanh_32f .................................   Passed    0.01 sec
        Start  58: qa_volk_32f_x2_add_32f
 58/134 Test  #58: qa_volk_32f_x2_add_32f ...............................   Passed    0.01 sec
        Start  59: qa_volk_32f_x2_divide_32f
 59/134 Test  #59: qa_volk_32f_x2_divide_32f ............................   Passed    0.01 sec
        Start  60: qa_volk_32f_x2_dot_prod_16i
 60/134 Test  #60: qa_volk_32f_x2_dot_prod_16i ..........................   Passed    0.01 sec
        Start  61: qa_volk_32f_x2_dot_prod_32f
 61/134 Test  #61: qa_volk_32f_x2_dot_prod_32f ..........................   Passed    0.01 sec
        Start  62: qa_volk_32f_x2_fm_detectpuppet_32f
 62/134 Test  #62: qa_volk_32f_x2_fm_detectpuppet_32f ...................   Passed    0.01 sec
        Start  63: qa_volk_32f_x2_interleave_32fc
 63/134 Test  #63: qa_volk_32f_x2_interleave_32fc .......................   Passed    0.01 sec
        Start  64: qa_volk_32f_x2_max_32f
 64/134 Test  #64: qa_volk_32f_x2_max_32f ...............................   Passed    0.01 sec
        Start  65: qa_volk_32f_x2_min_32f
 65/134 Test  #65: qa_volk_32f_x2_min_32f ...............................   Passed    0.01 sec
        Start  66: qa_volk_32f_x2_multiply_32f
 66/134 Test  #66: qa_volk_32f_x2_multiply_32f ..........................   Passed    0.01 sec
        Start  67: qa_volk_32f_x2_pow_32f
 67/134 Test  #67: qa_volk_32f_x2_pow_32f ...............................   Passed    0.01 sec
        Start  68: qa_volk_32f_x2_s32f_interleave_16ic
 68/134 Test  #68: qa_volk_32f_x2_s32f_interleave_16ic ..................   Passed    0.01 sec
        Start  69: qa_volk_32f_x2_subtract_32f
 69/134 Test  #69: qa_volk_32f_x2_subtract_32f ..........................   Passed    0.01 sec
        Start  70: qa_volk_32f_x3_sum_of_poly_32f
 70/134 Test  #70: qa_volk_32f_x3_sum_of_poly_32f .......................   Passed    0.01 sec
        Start  71: qa_volk_32fc_32f_add_32fc
 71/134 Test  #71: qa_volk_32fc_32f_add_32fc ............................   Passed    0.01 sec
        Start  72: qa_volk_32fc_32f_dot_prod_32fc
 72/134 Test  #72: qa_volk_32fc_32f_dot_prod_32fc .......................   Passed    0.01 sec
        Start  73: qa_volk_32fc_32f_multiply_32fc
 73/134 Test  #73: qa_volk_32fc_32f_multiply_32fc .......................   Passed    0.01 sec
        Start  74: qa_volk_32fc_conjugate_32fc
 74/134 Test  #74: qa_volk_32fc_conjugate_32fc ..........................   Passed    0.01 sec
        Start  75: qa_volk_32fc_convert_16ic
 75/134 Test  #75: qa_volk_32fc_convert_16ic ............................   Passed    0.01 sec
        Start  76: qa_volk_32fc_deinterleave_32f_x2
 76/134 Test  #76: qa_volk_32fc_deinterleave_32f_x2 .....................   Passed    0.01 sec
        Start  77: qa_volk_32fc_deinterleave_64f_x2
 77/134 Test  #77: qa_volk_32fc_deinterleave_64f_x2 .....................   Passed    0.01 sec
        Start  78: qa_volk_32fc_deinterleave_imag_32f
 78/134 Test  #78: qa_volk_32fc_deinterleave_imag_32f ...................   Passed    0.01 sec
        Start  79: qa_volk_32fc_deinterleave_real_32f
 79/134 Test  #79: qa_volk_32fc_deinterleave_real_32f ...................   Passed    0.01 sec
        Start  80: qa_volk_32fc_deinterleave_real_64f
 80/134 Test  #80: qa_volk_32fc_deinterleave_real_64f ...................   Passed    0.01 sec
        Start  81: qa_volk_32fc_index_max_16u
 81/134 Test  #81: qa_volk_32fc_index_max_16u ...........................   Passed    0.01 sec
        Start  82: qa_volk_32fc_index_max_32u
 82/134 Test  #82: qa_volk_32fc_index_max_32u ...........................   Passed    0.01 sec
        Start  83: qa_volk_32fc_magnitude_32f
 83/134 Test  #83: qa_volk_32fc_magnitude_32f ...........................   Passed    0.01 sec
        Start  84: qa_volk_32fc_magnitude_squared_32f
 84/134 Test  #84: qa_volk_32fc_magnitude_squared_32f ...................   Passed    0.01 sec
        Start  85: qa_volk_32fc_s32f_atan2_32f
 85/134 Test  #85: qa_volk_32fc_s32f_atan2_32f ..........................   Passed    0.01 sec
        Start  86: qa_volk_32fc_s32f_deinterleave_real_16i
 86/134 Test  #86: qa_volk_32fc_s32f_deinterleave_real_16i ..............   Passed    0.01 sec
        Start  87: qa_volk_32fc_s32f_magnitude_16i
 87/134 Test  #87: qa_volk_32fc_s32f_magnitude_16i ......................   Passed    0.01 sec
        Start  88: qa_volk_32fc_s32f_power_32fc
 88/134 Test  #88: qa_volk_32fc_s32f_power_32fc .........................   Passed    0.01 sec
        Start  89: qa_volk_32fc_s32f_power_spectral_densitypuppet_32f
 89/134 Test  #89: qa_volk_32fc_s32f_power_spectral_densitypuppet_32f ...   Passed    0.01 sec
        Start  90: qa_volk_32fc_s32f_power_spectrum_32f
 90/134 Test  #90: qa_volk_32fc_s32f_power_spectrum_32f .................   Passed    0.01 sec
        Start  91: qa_volk_32fc_s32f_x2_power_spectral_density_32f
 91/134 Test  #91: qa_volk_32fc_s32f_x2_power_spectral_density_32f ......   Passed    0.00 sec
        Start  92: qa_volk_32fc_s32fc_multiply_32fc
 92/134 Test  #92: qa_volk_32fc_s32fc_multiply_32fc .....................   Passed    0.01 sec
        Start  93: qa_volk_32fc_s32fc_rotatorpuppet_32fc
 93/134 Test  #93: qa_volk_32fc_s32fc_rotatorpuppet_32fc ................   Passed    0.01 sec
        Start  94: qa_volk_32fc_s32fc_x2_rotator_32fc
 94/134 Test  #94: qa_volk_32fc_s32fc_x2_rotator_32fc ...................   Passed    0.00 sec
        Start  95: qa_volk_32fc_x2_add_32fc
 95/134 Test  #95: qa_volk_32fc_x2_add_32fc .............................   Passed    0.01 sec
        Start  96: qa_volk_32fc_x2_conjugate_dot_prod_32fc
 96/134 Test  #96: qa_volk_32fc_x2_conjugate_dot_prod_32fc ..............   Passed    0.01 sec
        Start  97: qa_volk_32fc_x2_divide_32fc
 97/134 Test  #97: qa_volk_32fc_x2_divide_32fc ..........................   Passed    0.01 sec
        Start  98: qa_volk_32fc_x2_dot_prod_32fc
 98/134 Test  #98: qa_volk_32fc_x2_dot_prod_32fc ........................   Passed    0.01 sec
        Start  99: qa_volk_32fc_x2_multiply_32fc
 99/134 Test  #99: qa_volk_32fc_x2_multiply_32fc ........................   Passed    0.01 sec
        Start 100: qa_volk_32fc_x2_multiply_conjugate_32fc
100/134 Test #100: qa_volk_32fc_x2_multiply_conjugate_32fc ..............   Passed    0.01 sec
        Start 101: qa_volk_32fc_x2_s32f_square_dist_scalar_mult_32f
101/134 Test #101: qa_volk_32fc_x2_s32f_square_dist_scalar_mult_32f .....   Passed    0.01 sec
        Start 102: qa_volk_32fc_x2_s32fc_multiply_conjugate_add_32fc
102/134 Test #102: qa_volk_32fc_x2_s32fc_multiply_conjugate_add_32fc ....   Passed    0.01 sec
        Start 103: qa_volk_32fc_x2_square_dist_32f
103/134 Test #103: qa_volk_32fc_x2_square_dist_32f ......................   Passed    0.01 sec
        Start 104: qa_volk_32i_s32f_convert_32f
104/134 Test #104: qa_volk_32i_s32f_convert_32f .........................   Passed    0.01 sec
        Start 105: qa_volk_32i_x2_and_32i
105/134 Test #105: qa_volk_32i_x2_and_32i ...............................   Passed    0.01 sec
        Start 106: qa_volk_32i_x2_or_32i
106/134 Test #106: qa_volk_32i_x2_or_32i ................................   Passed    0.01 sec
        Start 107: qa_volk_32u_byteswap
107/134 Test #107: qa_volk_32u_byteswap .................................   Passed    0.00 sec
        Start 108: qa_volk_32u_byteswappuppet_32u
108/134 Test #108: qa_volk_32u_byteswappuppet_32u .......................   Passed    0.01 sec
        Start 109: qa_volk_32u_popcnt
109/134 Test #109: qa_volk_32u_popcnt ...................................   Passed    0.00 sec
        Start 110: qa_volk_32u_popcntpuppet_32u
110/134 Test #110: qa_volk_32u_popcntpuppet_32u .........................   Passed    0.01 sec
        Start 111: qa_volk_32u_reverse_32u
111/134 Test #111: qa_volk_32u_reverse_32u ..............................   Passed    0.01 sec
        Start 112: qa_volk_64f_convert_32f
112/134 Test #112: qa_volk_64f_convert_32f ..............................   Passed    0.01 sec
        Start 113: qa_volk_64f_x2_add_64f
113/134 Test #113: qa_volk_64f_x2_add_64f ...............................   Passed    0.01 sec
        Start 114: qa_volk_64f_x2_max_64f
114/134 Test #114: qa_volk_64f_x2_max_64f ...............................   Passed    0.01 sec
        Start 115: qa_volk_64f_x2_min_64f
115/134 Test #115: qa_volk_64f_x2_min_64f ...............................   Passed    0.01 sec
        Start 116: qa_volk_64f_x2_multiply_64f
116/134 Test #116: qa_volk_64f_x2_multiply_64f ..........................   Passed    0.01 sec
        Start 117: qa_volk_64u_byteswap
117/134 Test #117: qa_volk_64u_byteswap .................................   Passed    0.01 sec
        Start 118: qa_volk_64u_byteswappuppet_64u
118/134 Test #118: qa_volk_64u_byteswappuppet_64u .......................   Passed    0.01 sec
        Start 119: qa_volk_64u_popcnt
119/134 Test #119: qa_volk_64u_popcnt ...................................   Passed    0.01 sec
        Start 120: qa_volk_64u_popcntpuppet_64u
120/134 Test #120: qa_volk_64u_popcntpuppet_64u .........................   Passed    0.01 sec
        Start 121: qa_volk_8i_convert_16i
121/134 Test #121: qa_volk_8i_convert_16i ...............................   Passed    0.01 sec
        Start 122: qa_volk_8i_s32f_convert_32f
122/134 Test #122: qa_volk_8i_s32f_convert_32f ..........................   Passed    0.01 sec
        Start 123: qa_volk_8ic_deinterleave_16i_x2
123/134 Test #123: qa_volk_8ic_deinterleave_16i_x2 ......................   Passed    0.01 sec
        Start 124: qa_volk_8ic_deinterleave_real_16i
124/134 Test #124: qa_volk_8ic_deinterleave_real_16i ....................   Passed    0.01 sec
        Start 125: qa_volk_8ic_deinterleave_real_8i
125/134 Test #125: qa_volk_8ic_deinterleave_real_8i .....................   Passed    0.01 sec
        Start 126: qa_volk_8ic_s32f_deinterleave_32f_x2
126/134 Test #126: qa_volk_8ic_s32f_deinterleave_32f_x2 .................   Passed    0.01 sec
        Start 127: qa_volk_8ic_s32f_deinterleave_real_32f
127/134 Test #127: qa_volk_8ic_s32f_deinterleave_real_32f ...............   Passed    0.01 sec
        Start 128: qa_volk_8ic_x2_multiply_conjugate_16ic
128/134 Test #128: qa_volk_8ic_x2_multiply_conjugate_16ic ...............   Passed    0.01 sec
        Start 129: qa_volk_8ic_x2_s32f_multiply_conjugate_32fc
129/134 Test #129: qa_volk_8ic_x2_s32f_multiply_conjugate_32fc ..........   Passed    0.01 sec
        Start 130: qa_volk_8u_conv_k7_r2puppet_8u
130/134 Test #130: qa_volk_8u_conv_k7_r2puppet_8u .......................   Passed    0.01 sec
        Start 131: qa_volk_8u_x2_encodeframepolar_8u
131/134 Test #131: qa_volk_8u_x2_encodeframepolar_8u ....................   Passed    0.00 sec
        Start 132: qa_volk_8u_x3_encodepolar_8u_x2
132/134 Test #132: qa_volk_8u_x3_encodepolar_8u_x2 ......................   Passed    0.00 sec
        Start 133: qa_volk_8u_x3_encodepolarpuppet_8u
133/134 Test #133: qa_volk_8u_x3_encodepolarpuppet_8u ...................   Passed    0.01 sec
        Start 134: qa_volk_8u_x4_conv_k7_r2_8u
134/134 Test #134: qa_volk_8u_x4_conv_k7_r2_8u ..........................   Passed    0.00 sec

100% tests passed, 0 tests failed out of 134

@jdemel
Contributor

jdemel commented Dec 21, 2020

You have multiple options to check: ./build/apps/volk-config-info --avail-machines, plus the --machine and --all-machines flags. The last one should tell you which machines are actually compiled in. The others are more host-specific. If you run ctest -V, you should see a lot of kernels with output like neon .... In that case, the NEON kernel was executed.
I saw tweets that confirmed M1 supports SVE as well. But I couldn't find any documentation that supports this. Do you have any more knowledge about that?

About the __aarch64__ vs __arm__ thing: I just read up on it and, as far as I understand it, whenever there's an ARM target, __arm__ should be present, but __aarch64__ only if a 64-bit ARM is available. I assume this is something that should be fixed in cpu_features.
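
Which of the two macros a given compiler predefines can be checked directly rather than inferred. The following is a minimal sketch, assuming a GCC- or clang-compatible driver that accepts `-dM -E` on an empty input; the function names are hypothetical and not part of VOLK or cpu_features. It parses the macro dump and classifies the target, treating `__aarch64__` and `__arm__` as independent macros, which matches the clang 12 behaviour reported earlier in this thread.

```python
import re
import subprocess


def parse_predefined_macros(dump: str) -> dict:
    """Map macro name -> value from object-like `#define NAME VALUE` lines.

    Function-like macros (`#define F(x) ...`) are skipped.
    """
    macros = {}
    for line in dump.splitlines():
        match = re.match(r"#define\s+(\w+)(?:\s+(.*))?$", line)
        if match:
            macros[match.group(1)] = (match.group(2) or "").strip()
    return macros


def dump_predefined_macros(compiler: str = "cc") -> str:
    """Ask the compiler driver to print its predefined macros.

    Assumes a GCC/clang-style driver; `cc` is only a default guess.
    """
    result = subprocess.run(
        [compiler, "-dM", "-E", "-x", "c", "/dev/null"],
        capture_output=True,
        text=True,
        check=True,
    )
    return result.stdout


def arm_flavour(macros: dict) -> str:
    """Classify the target from the two macros discussed above."""
    if "__aarch64__" in macros:
        return "aarch64"
    if "__arm__" in macros:
        return "arm32"
    return "not-arm"
```

On a machine with a compiler installed, `arm_flavour(parse_predefined_macros(dump_predefined_macros("clang")))` would report what that compiler targets by default. A guard that tests only `__arm__` falls through to the non-ARM branch whenever `__aarch64__` is the only one of the two defined.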

@sbehnke
Contributor

sbehnke commented Dec 21, 2020

I don't think it does support SVE. Looking at the sysctl -a output:

hw.optional.floatingpoint: 1
hw.optional.watchpoint: 4
hw.optional.breakpoint: 6
hw.optional.neon: 1
hw.optional.neon_hpfp: 1
hw.optional.neon_fp16: 1
hw.optional.armv8_1_atomics: 1
hw.optional.armv8_crc32: 1
hw.optional.armv8_2_fhm: 1
hw.optional.armv8_2_sha512: 1
hw.optional.armv8_2_sha3: 1
hw.optional.amx_version: 2
hw.optional.ucnormal_mem: 1
hw.optional.arm64: 1
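
For anyone repeating this check, the listing above can be filtered mechanically. This is a small sketch, not anything VOLK or cpu_features ships: the helper names are hypothetical, and the sample text is a subset of the lines pasted above. It assumes that a non-zero `hw.optional.*` value means the capability is present, and that a missing key is only a hint that a feature is absent, not proof.

```python
def parse_sysctl(text: str) -> dict:
    """Parse `key: value` lines as printed by `sysctl -a`."""
    values = {}
    for line in text.splitlines():
        key, sep, value = line.partition(":")
        if sep:
            values[key.strip()] = value.strip()
    return values


def optional_features(values: dict) -> list:
    """Sorted names under hw.optional whose value is a non-zero integer.

    Count-valued keys such as watchpoint/breakpoint are reported too.
    """
    prefix = "hw.optional."
    enabled = []
    for key, value in values.items():
        if key.startswith(prefix) and value.isdigit() and int(value) != 0:
            enabled.append(key[len(prefix):])
    return sorted(enabled)


# Subset of the output quoted in this comment.
SAMPLE = """\
hw.optional.floatingpoint: 1
hw.optional.neon: 1
hw.optional.neon_fp16: 1
hw.optional.armv8_crc32: 1
hw.optional.arm64: 1
"""

features = optional_features(parse_sysctl(SAMPLE))
print(features)
print("any sve key:", any("sve" in name for name in features))
```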

Even though cpu_features is compiling, I don't think it is detecting anything properly.

❯ list_cpu_features
arch            : ARM
implementer     :   0 (0x00)
architecture    :   0 (0x00)
variant         :   0 (0x00)
part            :   0 (0x00)
revision        :   0 (0x00)
flags           :

❯ volk-config-info --avail-machines
generic;
❯ volk-config-info --all-machines
generic;neon;neonv8

Looks like there's already an open issue about adding M1 support to cpu_features, so that much is good.

google/cpu_features#121
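
The gap between the two `volk-config-info` lists above is what shows the detection problem: `neon` and `neonv8` are compiled in but not reported as available on this host. A minimal sketch of that comparison follows, with hypothetical helper names; the input strings are copied from the output above.

```python
def parse_machines(output: str) -> list:
    """Split a semicolon-separated machine list, dropping empty entries."""
    return [name for name in output.strip().split(";") if name]


def unavailable_machines(all_machines: str, avail_machines: str) -> list:
    """Machines that were compiled in but are not reported as usable."""
    avail = set(parse_machines(avail_machines))
    return [name for name in parse_machines(all_machines) if name not in avail]


compiled_in = "generic;neon;neonv8"  # from --all-machines
usable_here = "generic;"             # from --avail-machines
print(unavailable_machines(compiled_in, usable_here))  # ['neon', 'neonv8']
```

A non-empty result on hardware that is known to support those instruction sets suggests that runtime detection, rather than the build, is at fault.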

@sbehnke
Contributor

sbehnke commented Dec 22, 2020

Getting closer to having something functional:

❯ ./list_cpu_features
arch            : aarch64
implementer     : 16777228 (0x100000C)
variant         :   2 (0x02)
part            : 458787763 (0x1B588BB3)
revision        :   2 (0x02)
flags           : asimdfhm,atomics,crc32,fp,fphp,sha3,sha512

@jdemel
Contributor

jdemel commented Dec 22, 2020

@sbehnke thanks for working on it!

@sbehnke
Contributor

sbehnke commented Dec 23, 2020

Submitted google/cpu_features#150 Now we wait for Google to tell me what I've done wrong.

@michaelld
Contributor

Very cool! I have an ARM Mac I can do testing on ... not M1 but still ARM. I left a comment on your PR ... LGTM either way, but I think that change would help. Thx for your work!

@sabrinastronomy

Hi there! I just found this thread after my Intel-based MacBook died this week and I chose to get the M1-based MacBook (which has turned reinstalling GNU Radio into a painful process). I currently can't successfully install VOLK either under Rosetta emulation or with the ARM CMake toolchain files. Are there any updates on this? I can also help with testing if needed.

@jdemel
Contributor

jdemel commented Apr 24, 2021

We're waiting for the PR against cpu_features to get merged. Afterwards, we can probably add M1 support pretty quickly. It looks like M1 is ARM based but there are some subtle differences.
For now, I assume it would be really helpful to support this PR against cpu_features. Maybe test it, etc.

@sabrinastronomy

This (among other near impossible installations) prompted me to actually return the computer for an Intel chip. All will be much easier now! Thank you for your help and great development.

@dkozel
Author

dkozel commented Jun 14, 2021

@sbehnke Did you successfully get GNU Radio running on an M1 mac?

@sbehnke
Contributor

sbehnke commented Jun 14, 2021

> @sbehnke Did you successfully get GNU Radio running on an M1 mac?

No, I haven't. I got pulled in a different direction and haven't had a chance to circle back yet.

@dkozel
Author

dkozel commented Jun 14, 2021

Thanks for the fast response! I think no one has succeeded then.

@sbehnke
Contributor

sbehnke commented Jun 14, 2021

> Thanks for the fast response! I think no one has succeeded then.

I made a pull request upstream of Volk (against cpu_features, linked below), but it didn't get approved and pulled in, which I don't blame them for. I don't think we quite capture all of the capabilities of the processor, and without that building in the tree I'm not sure what to do.

google/cpu_features#150

@jdemel
Contributor

jdemel commented Jun 15, 2021

Thanks for doing this PR! I still hope it'll get merged at some point. At this point 2 out of 3 PRs against cpu_features are directly relevant to us.

@fllay

fllay commented Aug 21, 2021

Any news? It seems that MacPorts can compile gnuradio and volk for M1; see https://ports.macports.org/port/volk/details/ and https://ports.macports.org/port/volk/builds/?page=1 for the port and its builds. I am using Homebrew, but there is still no news for Homebrew.

@michaelld
Contributor

@fllay Yes I pushed fixes for Volk's cpu_features into MacPorts. No movement upstream there; no idea what the timeframe is. I don't mix and match package managers, and I do MacPorts -- so I have no idea what homebrew is doing.

@michaelld
Contributor

Having Volk patch cpu_features is a possibility, I suppose ... that would help homebrew and other package managers and building from source on M1. Worth considering.

@fllay

fllay commented Aug 22, 2021

Thank you for the information. I guess I will move to MacPorts. Actually, when I used an Intel Mac, I was using MacPorts for package management and gnuradio.

@jdemel
Contributor

jdemel commented Aug 22, 2021

On the plus side: It is a very simple patch to cpu_features to compile and use VOLK on M1.
On the minus side: This would report incomplete data for M1 with cpu_features.
I assume they don't want to report incomplete features and thus, they (cpu_features maintainers) didn't merge the available PRs yet.

We could patch cpu_features ourselves. However, that'll give quite a few package maintainers quite a headache. We'd rely on a custom cpu_features version that is not available through package managers. I'd say that is not a solution we should do here but leave to the package maintainers to decide for themselves. I just hope that there will be progress soon on the cpu_features side.

@michaelld
Contributor

@jdemel discussion from GRCon21 summary: (1) don't use cpu_features since it is no longer in active development and clearly the maintainers are not engaged with supporting the M1 even though it's been out for "a long time"; (2) hence, revert much of the work that added support for cpu_features (e.g., 5e2193c); (3) fixup the new code to work with M1 & other CPU / ARCH. This will require some work, but it would at least give us control over the CPU / ARCH determination, which is a serious problem right now.

@michaelld
Contributor

The related discussion was to minimally patch cpu_features to return UNKNOWN (or whatever the correct term is) for the M1, and then handle that inside Volk CMake. This would be -more- consistent with current behavio[u]r, but still might be a pain for quite a few package managers (as noted a few comments above).

@jdemel
Contributor

jdemel commented Sep 26, 2021

I fear that cpu_features is abandonware.
If we revert to the situation we had before the introduction of cpu_features, we add quite a few more regressions. Basically everything Windows related falls back to generic. Also, we would again fail to properly detect AVX/AVX2 features.

I'd really like to find a maintained project that supports CPU feature detection. We might extract the relevant parts from cpu_features for our purposes. Then, we'd extend them to support M1.

In any case, we might run into issues, e.g. in VMs. If there were an option to run e.g. AVX512 instructions and catch the illegal-instruction error, we might just use that to detect the system we're on.
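
The "run it and catch the illegal instruction" idea can be sketched by isolating the probe in a child process, so that a failed probe cannot crash the caller. The following is a minimal POSIX-only sketch under stated assumptions: `probe` is a hypothetical helper, and because no real AVX512 or NEON probe binary exists in this thread, the stand-in child simply sends itself SIGILL to imitate one.

```python
import signal
import subprocess
import sys


def probe(command: list) -> str:
    """Run a probe in a child process and classify how it ended.

    On POSIX, subprocess reports death by signal N as returncode -N.
    """
    result = subprocess.run(
        command, stdout=subprocess.DEVNULL, stderr=subprocess.DEVNULL
    )
    if result.returncode == 0:
        return "ok"
    if result.returncode == -signal.SIGILL:
        return "illegal-instruction"
    return "other-failure"


# Stand-ins for real probe binaries (assumption: a real probe would be a
# tiny executable containing the instruction under test).
FAKE_UNSUPPORTED = [
    sys.executable,
    "-c",
    "import os, signal; os.kill(os.getpid(), signal.SIGILL)",
]
FAKE_SUPPORTED = [sys.executable, "-c", "pass"]

print(probe(FAKE_UNSUPPORTED))
print(probe(FAKE_SUPPORTED))
```

Spawning a process per feature is slow compared with reading CPU identification registers, so a scheme like this would more plausibly run once and cache its result. It also does not help on platforms without POSIX signal semantics.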

@michaelld
Contributor

I just started looking at this project: https://github.com/simd-everywhere/simde.git ... it's pretty cool!

@michaelld
Contributor

Maybe as a compromise at least right now, we tweak the CMake scripts to check for Apple M1 & if that succeeds then we don't do cpu_features. If not M1 then we do cpu_features & hope for the best. That would at least take care of the immediate need: proper Volk code to get it building on M1.
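
The decision described here is small enough to write down. This sketch expresses it in Python rather than CMake, purely as an illustration; in CMake the analogous inputs would presumably be `CMAKE_SYSTEM_NAME` and `CMAKE_SYSTEM_PROCESSOR`. The function names and the returned labels are hypothetical, and treating "Darwin plus arm64" as "Apple M1" is an assumption.

```python
import platform


def is_apple_silicon(system: str, machine: str) -> bool:
    """Assumption: macOS on a 64-bit ARM machine means an M1-class CPU."""
    return system == "Darwin" and machine in ("arm64", "aarch64")


def detection_strategy(system=None, machine=None) -> str:
    """Choose whether to rely on cpu_features for this host."""
    system = platform.system() if system is None else system
    machine = platform.machine() if machine is None else machine
    if is_apple_silicon(system, machine):
        # Skip cpu_features and enable the NEON machines directly.
        return "skip-cpu_features"
    return "use-cpu_features"


print(detection_strategy("Darwin", "arm64"))
print(detection_strategy("Linux", "aarch64"))
```

Calling `detection_strategy()` with no arguments uses the values reported for the running interpreter, which may differ from the hardware when the interpreter itself runs under emulation.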

@michaelld
Contributor

Thinking of SIMDe for the CPU detection more-so than their actual code ... though that's pretty cool too!

@juanibuqt

It seems that Volk's lack of M1 support is giving a lot of us a headache! I can't finish installing SatDump here, since Volk is a must.

@jdemel
Contributor

jdemel commented Sep 30, 2021

If you compile from source, you can fix it by patching cpu_features. VOLK itself is ready for M1. But the library we use to detect CPU features is not. The version you obtain via conda receives this patch as well. At least as far as I know. You might want to double check.

@jdemel
Contributor

jdemel commented Dec 19, 2021

It seems like the relevant patch got merged
google/cpu_features@69d3993
I can't test it on an M1 at the moment. However, this commit claims to fix our issue as well.

jdemel added a commit to jdemel/volk that referenced this issue Dec 22, 2021
We have 3 issues that should be fixed with this commit.

Fix gnuradio#428
Should be fixed because cpu_features detects `arm64` now. Thus, it
builds on MacOS and reports M1 capabilities.

Fix gnuradio#478
Fix gnuradio#484
cpu_features received quite a bit of contributions for FreeBSD. All the
issues we had should be fixed now. However, this might require further
evaluation.

Signed-off-by: Johannes Demel <[email protected]>
Alesha72003 pushed a commit to Alesha72003/volk that referenced this issue May 15, 2024
7 participants