Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

feat(v2): Use AWS CRT as the default SDK client #1541

Closed

Conversation

jreijn
Copy link
Contributor

@jreijn jreijn commented Dec 22, 2023

#1092

Description of changes:

This PR changes the default SDK HTTP client from UrlConnectionClient to the AWS CRT client. The CRT client is a lightweight client which should reduce the cold start and latency of lambda functions.

Checklist

Breaking change checklist

RFC issue #:

  • Migration process documented
  • Implement warnings (if it can live side by side)

By submitting this pull request, I confirm that you can use, modify, copy, and redistribute this contribution, under the terms of your choice.

@scottgerring scottgerring changed the title refactor(v2) Use AWS CRT as the default SDK client feat: (v2) Use AWS CRT as the default SDK client Dec 22, 2023
@scottgerring scottgerring added this to the v2 milestone Dec 22, 2023
@scottgerring scottgerring added the v2 Version 2 label Dec 22, 2023
@pull-request-size pull-request-size bot added size/L and removed size/M labels Dec 22, 2023
@jeromevdl
Copy link
Contributor

Thanks @jreijn, looks great! Thanks to the new sync API, it's not so much change, that's a good news.

Question/ask: could you do a load test on the parameter sample (maybe the easiest to do) before and after CRT? Feel free to say no and I'll do it later.

@jeromevdl
Copy link
Contributor

Build is failing, problem in DynamoDBWriter in batch module.

@jeromevdl jeromevdl changed the title feat: (v2) Use AWS CRT as the default SDK client feat(v2): Use AWS CRT as the default SDK client Dec 22, 2023
@jreijn
Copy link
Contributor Author

jreijn commented Dec 22, 2023

@jeromevdl thanks! Yes the change is quite simple now. I'll look into the issues. I've just setup a dev linux box to see if I can run the tests locally as it seems to fail on my MacBook book due to some docker issue.

@scottgerring
Copy link
Contributor

@jeromevdl thanks! Yes the change is quite simple now. I'll look into the issues. I've just setup a dev linux box to see if I can run the tests locally as it seems to fail on my MacBook book due to some docker issue.

It's the test in the CDK example, which is autogenerated and doesn't really test anything. I'd like to remove it in #1542

@jreijn
Copy link
Contributor Author

jreijn commented Dec 22, 2023

Thanks @jreijn, looks great! Thanks to the new sync API, it's not so much change, that's a good news.

Question/ask: could you do a load test on the parameter sample (maybe the easiest to do) before and after CRT? Feel free to say no and I'll do it later.

@jeromevdl Do we already have something in place for doing the load test? Or should I just run something on my own?

Copy link

sonarcloud bot commented Dec 22, 2023

Quality Gate Passed Quality Gate passed

Kudos, no new issues were introduced!

0 New issues
0 Security Hotspots
No data about Coverage
0.0% Duplication on New Code

See analysis details on SonarCloud

@jreijn
Copy link
Contributor Author

jreijn commented Dec 22, 2023

@jeromevdl I've ran a load test with artillery on the parameters module and the results are mentioned below.

I've used the parameters example with 1 slight improvement (tiered compilation) against the current v1 (main) branch with the connection client and this PR branch with the aws crt client. It's using 512MB. I've created two reports:

  1. Duration
  2. Memory

I used CW Logs insights with the following 2 queries:

filter @type="REPORT"
| fields greatest(@initDuration, 0) + @duration as duration, ispresent(@initDuration) as coldStart
| stats count(*) as count, pct(duration, 50) as p50, pct(duration, 90) as p90, pct(duration, 99) as p99, max(duration) as max by coldStart
filter @type = "REPORT"
| stats max(@memorySize / 1000 / 1000) as provisionedMemoryMB,
  min(@maxMemoryUsed / 1000 / 1000) as smallestMemoryRequestMB,
  avg(@maxMemoryUsed / 1000 / 1000) as avgMemoryUsedMB,
  max(@maxMemoryUsed / 1000 / 1000) as maxMemoryUsedMB,
  provisionedMemoryMB - maxMemoryUsedMB as overProvisionedMB

Duration

For the main branch (v1) while using the url-connection-client I get the following results:

coldStart count p50 p90 p99 max
0 17909 3.1752 4.9216 31.5072 12609.12
1 91 3670.57 6515.75 7931.03 7931.03

For the v2 branch while using the aws-crt-client I get the following results:

coldStart count p50 p90 p99 max
0 17902 3.125 4.7294 24.0299 12774.03
1 98 4634.31 8409.65 9343.73 9343.73

Memory

v1 with url-connection-client

provisionedMemoryMB smallestMemoryRequestMB avgMemoryUsedMB maxMemoryUsedMB overProvisionedMB
512 107 117.3101 183 329

v2 with aws-crt-client

provisionedMemoryMB smallestMemoryRequestMB avgMemoryUsedMB maxMemoryUsedMB overProvisionedMB
512 117 128.9981 188 324

I expected different results and I noticed that the example was not functioning due to some changes in the v2 branch which I fixed in a commit. There are still some messages during a coldstart for the LambdaJsonLayout of the log4j configuration and a mention about a missing version, which I will try to look at somewhere over the next couple of days.

I also see that the parameters example is using an amazon API call to https://checkip.amazonaws.com. Not sure if that impacts the results, but will try to analyse further.

As the crt sync client should also work with the v1 branch (by providing a client) I'll also try and test that to see if changes the v2 impact the results.

@jeromevdl
Copy link
Contributor

Thanks for the tests. I didn't expect such results... It's disappointing. @msailes @scottgerring any thoughts? Do you think the asynchronous version would be better?

@msailes
Copy link
Contributor

msailes commented Dec 23, 2023

I don't think this is expected, one to ask Zoe Wang, when she's next working.

@jreijn
Copy link
Contributor Author

jreijn commented Dec 28, 2023

I've also tested with the sync crt client in the v1 branch and I see similar behavior. It makes me wonder about the impact compared to the url connection client. In the announcement post on the aws blog for the crt client we see the comparison to the netty client which is a more heavy weight client. Looking forward to some insights from the team that worked on the aws crt client.

@scottgerring
Copy link
Contributor

@jreijn thanks for the extensive testing!
I think we must be missing something. We will loop Zoe in once she's back on deck after the break. Failing that we probably need to hook up Codeguru to it and get some profiling data out.

@jreijn
Copy link
Contributor Author

jreijn commented Dec 28, 2023

@scottgerring great suggestion for code guru. I'll see if I can find some over the next week to hook it up and find out what I can see about it.

@scottgerring
Copy link
Contributor

It seems like this PR will also resolve #1544 !

@scottgerring
Copy link
Contributor

scottgerring commented Dec 29, 2023

I've used this as an opportunity to have a play with CodeGuru.

V2 current branch - No CRT Client - Java 11 Runtime
Screenshot 2023-12-29 at 15 21 06
Screenshot 2023-12-29 at 15 22 04

CRT version @jreijn branch - Java 11 Runtime
Screenshot 2023-12-29 at 15 59 15

I don't think the above ⏫ is that useful - 1/ you can see in the CRT chart, the frames are all to do with fetching the checkmyip website, and 2/ the CRT usage is performed once, in the class constructor, not per-request. I am putting a branch together to make the runtime depend more on AWS SDK usage.

@jreijn
Copy link
Contributor Author

jreijn commented Dec 29, 2023

It seems like this PR will also resolve #1544 !

Yes. I wasn't sure if the issue was known so solved it in this PR for now.

@jreijn
Copy link
Contributor Author

jreijn commented Jan 5, 2024

Thanks @scottgerring . I'll keep an eye on the issue. I'm still enjoying some time off so if something changes I'll chime in again.

@jreijn
Copy link
Contributor Author

jreijn commented Jan 5, 2024

@jreijn / @jeromevdl in the meantime, I suggest we do something like this:

1/ agree on a more indicative test. How do you both feel about this commit I pushed on the branch I referenced? It removes the call to the external website, and initializes the CRT client in the handler itself (so it happens per-request).

2/ Re-use Jeroen's existing cloudwatch cloudstart/warm-start based analysis process above with this test for v2 and the crt branches. I think this actually captures exactly what we want to see, but calling out to the external website muddies the waters.

At this point we'll have an idea on what the performance impact is for both cold/warm scenarios.

What do you both think?

@scottgerring i just noticed I missed this comment.

With regards to:

1 I fully agree. The remote call makes the test less precise due to the latency from the remote call.

2 sounds like a plan. If you want me to incorporate the changes and try again let me know!

@zoewangg
Copy link

zoewangg commented Jan 9, 2024

Hi @jreijn, I'm on the AWS SDK for Java team 🙂. We performed tests to compare HTTP clients coldstart latency on Lambda and from our tests, switching to AWS CRT sync HTTP client from UrlConnectionHttpClient reduced 35% startup latency. AWS CRT sync has slight higher startup latency (6%) compared to AWS CRT async HTTP client and we'll look into that. The Lambda memory setting was 512MB and we were just sending one single ListBuckets request with S3 SDK. LMK if you have any questions and I'm interested to see the new test result!

@jreijn
Copy link
Contributor Author

jreijn commented Jan 9, 2024

Thanks @zoewangg ! I'll try to make some time today or tomorrow to take it for another test run and see if we can see similar results. Was there a specific java version you used in your tests?

@zoewangg
Copy link

I was using Java 8.

@scottgerring
Copy link
Contributor

Hey @jreijn sorry for the slow reply - took some time off over the break with the family!

If you have the time and the inclination that would of course be great! I'm aware this one is getting a bit more fiddly than initially expected - @jeromevdl and I can also jump in put some time into it if you'd like

@jreijn
Copy link
Contributor Author

jreijn commented Jan 16, 2024

@scottgerring i hope you had some wonderful time with the family. I don't mind spending some more time, but it's good to share experiences. I did a test 2 days ago where I incorporated most of your changes (from the branch) and ran a new test. The results did not differ that much and I'm planning to do another run tomorrow before I post the results here just to check for any mistakes I might have made. I'll send an update tomorrow!

@scottgerring
Copy link
Contributor

Hey @jreijn sounds good (although a bit surprised about the results). Looking forward to seeing the numbers.

@jreijn
Copy link
Contributor Author

jreijn commented Jan 24, 2024

I did a retest today to see the difference. I've incorporated some your changes @scottgerring into this PR for clarity. I'm a bit surprised by the memory results.

Duration v2-with-crt-client (AWS SDK 2.22.0)

coldStart count p50 p90 p99 max
0 2950 1.7087 2.3787 7.0289 122.65
1 50 4555.15 4711.33 4802.67 4802.67

Memory usage (AWS SDK 2.22.0)

provisionedMemoryMB smallestMemoryRequestMB avgMemoryUsedMB maxMemoryUsedMB overProvisionedMB
512 173 177.8377 179 333

Duration (AWS SDK 2.23.9)

coldStart count p50 p90 p99 max
0 2745 1.5186 2.1981 6.2845 42.02
1 31 3069.25 3247.63 3306.67 3306.67

Memory (AWS SDK 2.23.9)

provisionedMemoryMB smallestMemoryRequestMB avgMemoryUsedMB maxMemoryUsedMB overProvisionedMB
512 169 173.5416 175 337

I'll try to create a heap dump of the JVM to analyse the what's actually in the heap and post an update later.

Update

I've analyzed a heap dump of about (28 MB). The amount of live objects is very small. Not sure how this relates to the 169 MB reported by Lambda itself. @scottgerring @jeromevdl any ideas?

@scottgerring
Copy link
Contributor

Hey @jreijn great - thanks for the dive deep!

I did a retest today to see the difference.

So .... we can see the remote call latency is down, but the overall picture hasn't really changed - the CRT is still slower both with and without cold starts. Memory usage seems similar.

We'll reach out to the team again on our side, it still feels like we are missing something!

Would you be able to give us the exact lambda configuration you are using - region, runtime, size?

I've analyzed a heap dump of about (28 MB). The amount of live objects is very small. Not sure how this relates to the 169 MB reported by Lambda itself. @scottgerring @jeromevdl any ideas?

I believe the lambda-reported usage will be the "OS view" and include the java runtime as well, not just the JVM-allocated heap. I would guess this accounts for the discrepancy.

Copy link

sonarcloud bot commented Jan 25, 2024

Quality Gate Passed Quality Gate passed

Kudos, no new issues were introduced!

0 New issues
0 Security Hotspots
No data about Coverage
0.0% Duplication on New Code

See analysis details on SonarCloud

@jreijn
Copy link
Contributor Author

jreijn commented Jan 25, 2024

I was using the sam template for deployment.

region: eu-west-1
runtime: java11
memory size: 512MB

@scottgerring
Copy link
Contributor

Hey @jreijn ,

Once again apologies for the high turnaround here. @jeromevdl , @zoewangg and I had an impromptu chat about this to see if there was anything obvious to point at, and came up empty-handed. I'd like to take some time to sit with this myself and dive into it - i've set time aside tomorrow for this. I will loop back with findings and hopefully some reproducible unit of testing we can use to measure these and other perf-impacting decisions for PT.

@scottgerring
Copy link
Contributor

@jreijn , @jeromevdl : I think i've gotten to the bottom of it! It's the size of the output JAR; using the CRT client by default includes a pile of sos for every platform it supports, which results in a function JAR that is 2.3x the size of the non-CRT jar.

Detailed numbers here, testing bits here (i'll clean this up a bit so others can use it)

Stripping back (removing all the shared objects for unneeded platforms) the JAR to the minimum reduces cold start time below the URLHttpClient:

  • CRT-Stripped Avg: 4.45. (lambda size = 15.9 MB)
  • CRT Normal Avg: 6.19, (lambda size = 32.73 MB)
  • URL Http Client: 6.49 (lambda size = 13.915 MB)

CRT-normal and URL client are very close to the same. This is close to what what you measured, @jreijn - yours are a little higher, mine are a little lower.

I note that in the maven repo, there are jar variants for all the different combos of arch/runtime. Turns out - you can use the "classifier" to get the right one - details here.

I haven't played with this yet, but it seems like there will be an onus on us to document this clearly in Powertools as part of the "getting started" guide - otherwise simply including the CRT as the default means that v2 may be slower than v1 because of all the unneeded libraries.

@jeromevdl
Copy link
Contributor

jeromevdl commented Feb 16, 2024

@scottgerring thanks for your great tests.

Looking at the jar sizes in maven, it's clear that the generic crt is quite big compared to the others 17 Mb vs ~2 Mb for the specific versions. Did you test one of these specific versions ?

If I'm not mistaken, we need either:

<dependency>
    <groupId>software.amazon.awssdk.crt</groupId>
    <artifactId>aws-crt</artifactId>
    <version>0.29.10</version>
    <classifier>linux-aarch_64</classifier> <!-- arm64 -->
</dependency>

Or:

<dependency>
    <groupId>software.amazon.awssdk.crt</groupId>
    <artifactId>aws-crt</artifactId>
    <version>0.29.10</version>
    <classifier>linux-x86_64</classifier>  <!-- x86_64 -->
</dependency>

Problem is that we can't set this in Powertools, so we need to document it... In PT, we can use

<dependency>
    <groupId>software.amazon.awssdk.crt</groupId>
    <artifactId>aws-crt</artifactId>
    <version>0.29.10</version>
    <scope>provided</scope>
</dependency>

I don't really like the developer experience, but if we can get better performance, that's probably worth it.

@scottgerring
Copy link
Contributor

scottgerring commented Feb 19, 2024

If I'm not mistaken, we need either:

not directly; i recreated a specific variant by hand by stripping the "full" JAR, before I realised you can use the classifier. We need to test that and make sure it works. I think the only trick is going to be ensuring that maven does what we think it will!

@jeromevdl
Copy link
Contributor

I think the only trick is going to be ensuring that maven does what we think it will!

Maven will do what it's supposed to do if you specify the classifier... Test it but it should work.

Wondering if we could simplify this somehow, rather than just documenting... We'll build a powertools-maven-plugin that reads in your IaC to evaluate the architecture, the target java runtime and choose the best dependencies ;)

@jreijn
Copy link
Contributor Author

jreijn commented Feb 19, 2024

@scottgerring wow nice find! I was curious if the package size had any impact after reading this post, but I did not get to it yet.

In my local branch I made some changes and marked the aws-crt-client in the PT parameters module to scope provided ike @jeromevdl suggested and included the aws-crt-client and the aws-crt library the with classifier in the example project, but I was not very pleased by the developer experience.

This is mainly because the following observations:

  • you need to explicitly set the aws http client in your project (parameters example) by adding the dependencies . Of course we can document this, but it's not a seamless experience with just adding a single dependency (pt-module) to a project and setting the aspectj plugin.
  • you need to make sure you keep the version of the aws-crt library in combination with classifier in sync with the version used by the aws-crt-client. In your project you will need to exclude the aws-crt transitive dependency from the aws-crt-client like and include the correct aws-crt dependency (runtime specific):
        <dependency>
            <groupId>software.amazon.awssdk.crt</groupId>
            <artifactId>aws-crt</artifactId>
            <version>0.29.9</version>
            <classifier>linux-x86_64</classifier>
        </dependency>

        <dependency>
            <groupId>software.amazon.awssdk</groupId>
            <artifactId>aws-crt-client</artifactId>
            <version>2.23.21</version>
            <exclusions>
                <exclusion>
                    <groupId>software.amazon.awssdk.crt</groupId>
                    <artifactId>aws-crt</artifactId>
                </exclusion>
            </exclusions>
        </dependency>

otherwise you can/will end up with both versions on the class path (I noticed this after the sam build).

  • If your local environment is different from the target runtime environment this will become a bit of an hassle and you will probably need multiple maven profiles to be able to locally test your function. (I have a Macbook) and the lambda is targeted to run on linux.

It makes me wonder if we should use the CRT client as the default client, as overall the developer experience will be less. Perhaps it's better to leave the url connection client as the default as it offers quite a good performance over dev experience . We could add documentation on how to use the CRT client for the local/target runtime for the users that want to try and squeeze the most out of their function, but it will be an explicit decision made by the developer to take on the extra burden of the local development vs remote and the dependency synch.

Now for some good news.

I did another run with the classifier and I see some very promising results compared to the previous benchmark(s).

The following table is for the v2 crt client with classifier and latest version of the sdk (2.23.21) and crt library (0.29.9) :

coldStart count p50 p90 p99 avg min max
0 2967 1.5493 2.1589 5.5797 1.7932 1.07 40.57
1 33 3170.56 3310.47 3433.73 3097.1491 2507.71 3433.73

@jeromevdl
Copy link
Contributor

Thanks @jreijn. I agree the devX is quite bad and we probably should not do this by default but document it so that people who want better performance can do it...
@scottgerring @msailes wdyt?

@scottgerring
Copy link
Contributor

scottgerring commented Feb 20, 2024

Yeah, the developer experience isn't great. I didn't even consider this:

probably need multiple maven profiles to be able to locally test your function. (I have a Macbook)

... plus you often want flexibility to mix ARM64/AMD64 on the AWS side.

Considering our tenets 1/ Keep it lean and 2/ Work backwards from the community ... I think it is clear we should document how to use the CRT client (including what we've learnt here in terms of JAR-size minimization) and 2/ not use the CRT by default. The extra complexity it pushes onto the developer at the start of the journey outweighs the benefit in our case IMHO.

Open question: Is UrlConnectionHttpClient the best choice for us? Looking at this flowchart it seems like it is - minimize latency over throughput, minimize bundle size - but I think now is a good opportunity to make sure we're all on the same page @jreijn @jeromevdl @msailes !

@msailes
Copy link
Contributor

msailes commented Feb 20, 2024

I agree that we should leave CRT for now. At best we document how to use it as an optimization.

@scottgerring
Copy link
Contributor

@jreijn - FYI - we have passed over our collective findings to the CRT team, who mentioned they have been thinking about this. We will loop back when we have any further interesting information to share!

In the meantime, would you like to work on the documentation improvements? I appreciate this isn't as glamorous as code, and we can certainly pick it up on our side if not :)

@jreijn
Copy link
Contributor Author

jreijn commented Feb 21, 2024

@scottgerring great! Yes, I was already looking into adding some documentation. I will probably do that in a separate PR if that's ok.

@scottgerring
Copy link
Contributor

@jreijn sounds good! There's a bit of accumulated history on this one. I'll close this and link the findings on the issue.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
size/L v2 Version 2
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants