
Question about ARF #23

Open
grimoire opened this issue Jun 7, 2023 · 0 comments

grimoire commented Jun 7, 2023

Hi,
I am a little confused by the ARF CUDA kernel.

```cuda
) {
  // index
  const int n = blockIdx.x * blockDim.x + threadIdx.x;
  if (n < count) {
    const uint16_t l = n % nEntry;
    const uint16_t j = (n / nEntry) % nInputPlane;
    const uint16_t i = n / nEntry / nInputPlane;
    const scalar_t val = weight_flatten[n];
    for (uint16_t k = 0; k < nRotation; k++) {
      const uint16_t index = (uint16_t)indices_flatten[l][k] - 1;
      output[i][k][j][index] = val;
    }
  }
}
```

Let's say thread 0 and thread 1 have:

i0 == i1
j0 == j1
k0 == k1
index0 == index1

Then both threads write to the same element `output[i][k][j][index]`. I don't think that is expected: different threads writing to the same memory address leads to unpredictable results.

Did I misunderstand the implementation?

PS: Is yzhou.work still available?
