performance of md5_mb_over_4GB_test #67

Open
KelvonLi opened this issue May 25, 2021 · 4 comments
@KelvonLi

Hi,

I'm testing md5_mb performance to find out whether it also performs much better than OpenSSL when running with multiple buffers.

I changed the test code as shown below, built it, and ran it.
The result was worse than OpenSSL on both of my test CPU platforms.
Have you run a similar test, and is this result expected?
If not, how can I improve the performance?
Thanks a lot!

# Test result
/workspace/isa-l_crypto/tests/extended # ./md5_mb_over_4GB_test
md5_large_test
md5_openssl: runtime = 22236247 usecs, bandwidth 8 MB in 22.2362 sec = 0.38 MB/s
Starting updates
md5_ctx_mgr: runtime = 52901056 usecs, bandwidth 8 MB in 52.9011 sec = 0.16 MB/s

# Test code change
/workspace/isa-l_crypto/tests/extended # git diff md5_mb_over_4GB_test.c
 #include "md5_mb.h"
 #include "endian_helper.h"
 #include <openssl/md5.h>
+#include "test.h"
+
 #define TEST_LEN (1024*1024ull) //1M
 #define TEST_BUFS MD5_MIN_LANES
+//#define TEST_BUFS MD5_MAX_LANES
 #define ROTATION_TIMES 10000 //total length processing = TEST_LEN * ROTATION_TIMES
 #define UPDATE_SIZE (13*MD5_BLOCK_SIZE)
 #define LEN_TOTAL (TEST_LEN * ROTATION_TIMES)
@@ -54,6 +57,7 @@ int main(void)
     uint32_t i, j, k, fail = 0;
     unsigned char *bufs[TEST_BUFS];
     struct user_data udata[TEST_BUFS];
+    struct perf start, stop;

     posix_memalign((void *)&mgr, 16, sizeof(MD5_HASH_CTX_MGR));
     md5_ctx_mgr_init(mgr);
@@ -72,11 +76,17 @@ int main(void)
     }

     //Openssl MD5 update test
+    perf_start(&start);
     MD5_Init(&o_ctx);
     for (k = 0; k < ROTATION_TIMES; k++) {
             MD5_Update(&o_ctx, bufs[k % TEST_BUFS], TEST_LEN);
     }
     MD5_Final(digest_ref_upd, &o_ctx);
+    perf_stop(&stop);
+    printf("md5_openssl" ": ");
+    perf_print(stop, start, (long long)TEST_LEN * TEST_BUFS * 1);

     // Initialize pool
     for (i = 0; i < TEST_BUFS; i++) {
@@ -86,6 +96,7 @@ int main(void)
     }

     printf("Starting updates\n");
+    perf_start(&start);
     int highest_pool_idx = 0;
     ctx = &ctxpool[highest_pool_idx++];
     while (ctx) {
@@ -117,6 +128,11 @@ int main(void)
     ctx = md5_ctx_mgr_flush(mgr);
     }
     }
+    perf_stop(&stop);
+    printf("md5_ctx_mgr" ": ");
+    perf_print(stop, start, (long long)TEST_LEN * TEST_BUFS * 1);

     printf("multibuffer md5 digest: \n");
     for (i = 0; i < TEST_BUFS; i++) {

@gbtucker
Contributor

Hi @KelvonLi,

The example md5_mb_over_4GB_test.c is not meant as a performance test; in fact, the multi-buffer part does a lot more work than the single-buffer check. By running multiple jobs, it processes TEST_BUFS times as much data as the single-buffer part. You may notice that, at the end, it compares each of the final digests produced by the multi-buffer part against the one single-buffer result as a check. I suggest you start with one of the included performance tests instead.

@KelvonLi
Author

KelvonLi commented May 27, 2021

Hi @gbtucker ,

Thanks a lot for your reply.
I'm testing with a single buffer and also with multiple buffers.
I have a few simple questions:

  1. Are the md5_ctx_mgr_flush/md5_ctx_mgr_submit APIs thread safe?

  2. To get the final MD5 value of multiple buffers treated as one logical buffer, do I have to use a single ctx and submit the buffers one by one?
    I didn't find a way to use multiple ctxs (lanes) to compute in parallel and produce one final MD5 value.

My current understanding is that each ctx (lane) can only be used for one MD5 calculation at a time and must NOT be reused until that calculation completes. Multiple ctxs (lanes) can run in parallel, each on a different MD5 calculation.
Please correct me if I'm wrong.
Thanks a lot.

@gbtucker
Contributor

Hi @KelvonLi,

For 1. all the functions are thread safe and reentrant. I would suggest one ctx per thread and take a look at the examples in examples/saturation_test for how to do this.

For 2. the lanes must have independent hash jobs to run in parallel. Because these are cryptographic hashes, there is no way to break up one hash job and run pieces concurrently beyond the fundamental block size.

@KelvonLi
Author

Hi @gbtucker,
Thanks a ton for your replies and for sharing!
I'll do some further study and testing.
