Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Can I use this lib to output the original index of these sorted data #8

Open
xiangyunzhou opened this issue Feb 24, 2023 · 6 comments

Comments

@xiangyunzhou
Copy link

This lib looks awesome!
But I have a requirement to output the original index of these sorted data, for example the input array is {300 200 400 99 150 50 230 250}, then the sorted result should be {50 99 150 200 230 250 300 400}, and I want to get the original index as output {5, 3, 4, 1, 6, 7, 0, 2}.
Currently, I use the std::sort with a lambda to implement this require, but the performance is not good.

@r-devulap
Copy link
Contributor

r-devulap commented Feb 24, 2023

Hi, If you pass in the array and array of indices initialized to [0, 1, 2, 3, .. n] as arguments for key-value pair, PR #2 can be adapted to what you want. We have to add benchmarks to compare this to your implementation of std::sort to know how much we actually benefit from this.

@xiangyunzhou
Copy link
Author

Hi, If you pass in the array and array of indices initialized to [0, 1, 2, 3, .. n] as arguments for key-value pair, PR #2 can be adapted to what you want. We have to add benchmarks to compare this to your implementation of std::sort to know how much we actually benefit from this.

Thank you very much for this useful information and your contribute to this lib, PR #2 works well in my project, there's only one issue that avx512_qsort_kv(float*keys, uint16_t *indexes, int64_t arrsize) is not support yet, performance of current avx512_qsort_kv(double *keys, uint64_t *indexes, int64_t arrsize) is a little worse than my simple bubble sorting.

Hope that avx512_qsort_kv(float*keys, uint16_t *indexes, int64_t arrsize) will be much better than my current implementation.

Thanks again!

@r-devulap
Copy link
Contributor

avx512_qsort_kv(double *keys, uint64_t *indexes, int64_t arrsize) is a little worse than my simple bubble sorting.

@xiangyunzhou do you mind sharing your bubble sort so we can compare and add it to benchmarks? If a simple bubble sort performs as much, then what would be the point of all the complicated code.

@xiangyunzhou
Copy link
Author

@r-devulap below is my implementation and UT in my project. I want to get the top 32 index of 400 sorting by a float pfWeight. icpx with flags: -march=icelake-server and -O3.
Because avx512_qsort_kv doesn't support avx512_qsort_kv(float*keys, uint16_t *indexes, int64_t arrsize), so I need to store the key in u64 and value in double, which is not fair to compare with bubble, but it is about 10 times better than std::sort

#define MAX_CS1_NUM                       (400)
#define MAX_CS2_NUM                       (32)

uint32_t find_index_of_min(uint32_t num, const uint16_t *indexList, const float *valueList)
{
    uint32_t idx   = 0;
    float minValue = valueList[indexList[0]];
    for (uint32_t i = 1; i < num; i++)
    {
        const float value = valueList[indexList[i]];
        if(value < minValue)
        {
            minValue = value;
            idx      = i;
        }
    }
    return idx;
}

uint32_t mac_cell_select_cs2_by_pfweight(uint32_t numCs1, uint16_t *ueindexCs1, uint16_t *ueindexCs2, const float *pfWeight)
{
    uint32_t numCs2 = 0;
    while (numCs1 > 0 && numCs2 < MAX_CS2_NUM)
    {
        const uint32_t idx   = find_index_of_min(numCs1, ueindexCs1, pfWeight);
        assert(idx < MAX_CS1_NUM);
        ueindexCs2[numCs2++] = ueindexCs1[idx];
        ueindexCs1[idx]      = ueindexCs1[numCs1 - 1];
        numCs1--;
    }
    return numCs2;
}

TEST_F(MacCell, case_mac_cell_select_cs2_by_pfweight)
{
    float ueAdr[MAX_NUM_UE];
    float ueSe[MAX_NUM_UE];
    float uePfWeight[MAX_NUM_UE]={0.0};
    double uePfWeight1[MAX_NUM_UE]={0.0};

    for (uint32_t i = 0; i < MAX_NUM_UE; i++)
    {
        ueAdr[i] = ((float)rand())/13;
        ueSe[i] = ((float)rand())/13;
    }
    mac_cell_cal_ue_pfweight(MAX_NUM_UE, ueAdr, ueSe, uePfWeight);

    const uint32_t numCs1 = MAX_CS1_NUM;
    uint16_t ueindexCs1Input[MAX_CS1_NUM], ueindexCs2[MAX_CS2_NUM];
    uint64_t ueindexCs1[MAX_CS1_NUM];
    for (uint32_t i = 0; i < numCs1; i++)
    {
        ueindexCs1[i] = i;
        ueindexCs1Input[i] = i;
        uePfWeight1[i] = uePfWeight[i];
    }
    //memcpy(ueindexCs1Input, ueindexCs1, sizeof(ueindexCs1Input));

    const uint64_t functionStart = timer_get_ticks();
    const uint32_t numCs2 = mac_cell_select_cs2_by_pfweight(numCs1, ueindexCs1Input, ueindexCs2, uePfWeight);
    const uint64_t functionEnd = timer_get_ticks();
    std::sort(ueindexCs1, ueindexCs1 + numCs1, [uePfWeight1](uint16_t x, uint16_t y){return uePfWeight1[x] < uePfWeight1[y];});
    //avx512_qsort_kv(uePfWeight1, ueindexCs1, numCs1);
    const uint64_t sortEnd = timer_get_ticks();

    printf("mac_cell_select_cs2_by_pfweight time:%lu, std::sort time:%lu\n", functionEnd - functionStart, sortEnd - functionEnd);
    ASSERT_LT(functionEnd - functionStart, sortEnd - functionEnd);
    ASSERT_EQ(numCs2, MAX_CS2_NUM);
    for (uint32_t i = 0; i < numCs2; i++)
    {
        ASSERT_EQ(ueindexCs2[i], ueindexCs1[i]);
    }
}

@r-devulap
Copy link
Contributor

I want to get the top 32 index of 400 sorting by a float pfWeight.

Have you tried benchmarking for a larger array? The code size of AVX512 qsort is fairly large when compared to std::sort or your bubble sort. If you are sorting an array of just 400 elements, there might be a big front end penalty for AVX512 qsort which might offset all benefits you get from it (not sure if this is true but just wondering if that is why you are not seeing any perf gains).

@xiangyunzhou
Copy link
Author

I want to get the top 32 index of 400 sorting by a float pfWeight.

Have you tried benchmarking for a larger array? The code size of AVX512 qsort is fairly large when compared to std::sort or your bubble sort. If you are sorting an array of just 400 elements, there might be a big front end penalty for AVX512 qsort which might offset all benefits you get from it (not sure if this is true but just wondering if that is why you are not seeing any perf gains).

I have tried to increase the array from 400 to 20480, seems still no benefit from AVX512 qsort. I start to get benefit from the AVX512 qsort when I increase the top items from 32 to 128.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants