-
Notifications
You must be signed in to change notification settings - Fork 942
Netty 4.2 upgrade + IO_URING support #6021
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: main
Are you sure you want to change the base?
Conversation
f39c3a2 to
647ae7a
Compare
| } | ||
| } | ||
|
|
||
| public static final boolean isIoUringAvailable() { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I've added via netty/netty#15785 a way to verify the status of the different optimizations made by io_uring on the OS - you can use it for reporting/debugging 👍
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nice!
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I was about to use it, but I see now that it's so recent that it's not in a release yet. 😆
|
|
||
| if (useEpoll && CheckDependencies.isEpollAvailable()) { | ||
| if (useIoUring && CheckDependencies.isIoUringAvailable()) { | ||
| //IO_URING should default to 1 remotingThread unless specified in config |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It should use less than we use too with epoll - but 1 could be quite low.
And check netty/netty#15524 if can help to scale up the number of event loops if required
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
IIRC, setting it to 1 was a recommendation I found back from when netty io_uring was still an incubator project. I did some testing on this when I submitted the previous PR and saw that using 1 thread performed best up to something on the order of a few thousand connections and messages per second. After that it had to be increased but as I recall it, I saw no measurable improvement using more than just a few dedicated threads. Again, this was some time ago but I would guess up to ~5 threads or so on a decently sized server, say 16 cores.
Perhaps it should be set to something like Runtime.getRuntime().availableProcessors() or half that number or similar? Otherwise I guess the scaling option would make sense...
647ae7a to
878baf6
Compare
|
Very nice @jbertram ! Coincidentally, I was just about to send a PR for just the Netty upgrade parts of this and was planning to add IO_URING support on top of that later (as in, I have part of the commit message typed out). I stopped shy of submitting it when I noticed the exact same failure I'm seeing here now, on: |
|
@AntonRoskvist, yeah, I just noticed that as well. The full test-suite passes on this 100% but it was using JDK 17. Will need to do some investigation. Let me know if you make any determinations. |
|
@franz1981, on Java 25 with Netty 4.2.7 we're seeing this: The |
|
2 things:
The former worry me a bit because it is basically saying "I can zero a buffer which is limiting how much i can write into ti" and I don't remember why I've done it TBH .-. The latter can be addressed by https://netty.io/wiki/java-24-and-sun.misc.unsafe.html#netty-42-and-unsafe i.e. |
4b608c8 to
874b754
Compare
|
@franz1981, thanks! That got me moving in the right direction. 👍 |
92e643e to
42571ec
Compare
|
This is a fyi : netty/netty#14432 In theory is possible to extend the existing Uring support to includes files too, which enable faster performance then libaio since it allows using async fdatasync.
This can lead to few simplification including replacing the timed buffer with a batching mechanism running directly on the event loop - which accumulate them while waiting the last in flight fdatasync to complete - or the expected time credits to expire |
I don't fully understand the implication of this... say a user starts up a version of the broker using Netty 4.2, JDK 25 and does not allow I'm guessing it's specific to the test otherwise there should be some explicit check for these conditions when starting up the broker... |
|
IDK to be fair, that's why running a proper unsafe-forcibly disabled broker with NIO journal would reveal it |
|
This affects more things it seems... I just built this locally, then ran:
Java version:
|
42571ec to
bca155c
Compare
|
@franz1981, reading the doc you cited it seemed at first to me that all the broker would need is |
|
@AntonRoskvist, my last push contains some scripting to detect the JDK and add |
| mv $ARTEMIS_OOME_DUMP $ARTEMIS_OOME_DUMP.bkp | ||
| fi | ||
|
|
||
| JAVA_VERSION=`$JAVACMD -version 2>&1 | head -n 1 | awk -F '"' '{print $2}' | cut -d'.' -f1` |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
So probably this won't be final but this can be simplified a bit, doing something like:
$JAVACMD --sun-misc-unsafe-memory-access=allow --version >/dev/null && ALLOW_UNSAFE="--sun-misc-unsafe-memory-access=allow"
in essence, $JAVACMD --sun-misc-unsafe-memory-access=allow --version returns 0 if the version is > 24 and 1 otherwise, meaning if the option can be set, it will get set.
Could be set as part of an if statement for clarity as well...
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I like this a lot better than what I was using before. The main benefit is that it doesn't rely on head, awk, or cut which might not be available (especially in a container).
|
@jbertram yes, looking good. I think it's a strange behavior though, needing both flags. My understanding was that Netty sorts this out when starting up with the worst side effect being suboptimal performance in case it doesn't have access to |
|
I am working hard the last months to fix the performance of Netty without Unsafe. And is not finished yet 🙏 |
|
@franz1981, your efforts are much appreciated! Could you confirm whether applications using Netty 4.2 with JDK 25 should be using both It's worth noting that Artemis will already need |
|
Iouring and epoll are both using JNI - I suppose that is the reason why you need such. |
bca155c to
5b2bbff
Compare
5b2bbff to
38f4829
Compare
| mv $ARTEMIS_OOME_DUMP $ARTEMIS_OOME_DUMP.bkp | ||
| fi | ||
|
|
||
| # Netty needs access to unsafe, but this is turned off in Java 25 by default |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This comment could be confusing/misleading later, since Unsafe itself is not turned off in Java 25 by default. It warns once on first use on 24+ but whats left of Unsafe (some has been removed already) is still usable. Setting --sun-misc-unsafe-memory-access=allow facilitates stopping printing the warning on Java 24+. On future JDKs, e.g perhaps 26+, that will not be permitted and the Unsafe methods will be disabled by default and the allowed fallback will become --sun-misc-unsafe-memory-access=warn to again permit them but always with the warning. On yet further future JDKs the Unsafe memory methods will be removed and --sun-misc-unsafe-memory-access will be ignored. At a later point --sun-misc-unsafe-memory-access will be removed (which may mean its presence prevents startup).
Netty 4.2 doesn't use Unsafe by default in Java 24+ so as to avoid the warning (whereas 4.1 still does use Unsafe by default [excepting 4.1.120-4.1.121], hence the warning mentioned on ARTEMIS-5711 when it uses Unsafe currently). Instead it uses the 'fallback MemorySegment' based cleaner on 25+ (it's broken on 24, so that seems to default to heap-buffers rather than direct buffers) rather than use Unsafe. Its possible to have it use a more performant alternative on 24+ by explicitly enabling native access (which is actually still enabled by default on 24+, for now, but again produces a warning, so Netty 4.2 doesn't do this by default to avoid those warnings, requiring the --enable-native-access flag). Yet alternatively again, you can make it just use Unsafe on Java 24+ [until they prevent 'allow', perhaps JDK26+] by explicitly specifying it can via --sun-misc-unsafe-memory-access=allow, or using the Netty-specific io.netty.noUnsafe prop.
https://github.com/netty/netty/blob/netty-4.2.7.Final/common/src/main/java/io/netty/util/internal/PlatformDependent0.java#L547-L588 plus the monster starting at https://github.com/netty/netty/blob/4.2/common/src/main/java/io/netty/util/internal/PlatformDependent0.java#L85
| fi | ||
|
|
||
| # Netty needs access to unsafe, but this is turned off in Java 25 by default | ||
| $JAVACMD --sun-misc-unsafe-memory-access=allow --version > /dev/null 2>&1 && ALLOW_UNSAFE="--sun-misc-unsafe-memory-access=allow" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If netty is the only thing we need/decide to enable Unsafe for (dont know) on 25+, then I wonder if going with io.netty.noUnsafe sys prop might be preferable, as it at least currently seems like it wont require a second JVM startup and there would be no harm always specifying it on either older JDKs or future JDKs after its irrelevant.
Though, that would leave the warning on 25..
I still wonder about just leaving it not using Unsafe on 25. Seems like that should be possible, and that is the future beyond 25. If thats still not viable, perhaps we stick with 4.1 using Unsafe and its warnings for a bit longer until it is, and just try to ensure compatibility if an end user wants to use 4.2 (the milestone/RC builds of spring using Artemis with 4.2 suggest it mostly already is...thats the only 4.2 usage I'm personally aware of). Anyone not satisifed with no-Unsafe can always add the config themselves.
|
|
||
| if [ -z "$JAVA_ARGS" ]; then | ||
| JAVA_ARGS="-Dlog4j2.disableJmx=true --add-opens java.base/jdk.internal.misc=ALL-UNNAMED ${java-utility-opts}" | ||
| JAVA_ARGS="-Dlog4j2.disableJmx=true --add-opens java.base/jdk.internal.misc=ALL-UNNAMED --enable-native-access=ALL-UNNAMED ${java-utility-opts}" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
At least its the case that --enable-native-access was already in JDK17 (as the initial FFM in 17 was an incubator addition there) so hopefully that means it works on all versions of JDK17+ as opposed to just post-GA backport releases, save us doing any checking.
| <plugin> | ||
| <groupId>org.apache.maven.plugins</groupId> | ||
| <artifactId>maven-surefire-plugin</artifactId> | ||
| <configuration> | ||
| <argLine>${activemq-surefire-argline}</argLine> | ||
| </configuration> | ||
| </plugin> |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is this not already the case from the parent?
| String javaVersion = System.getProperty("java.version"); | ||
| if (javaVersion.startsWith("24") || javaVersion.startsWith("25")) { | ||
| commandList.add("--enable-native-access=ALL-UNNAMED"); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Perhaps Runtime.version().feature(); would work here.
Restricting to just 24 and 25 for --enable-native-access probably doesnt make sense, its going to be useful or needed for all 26+
|
|
||
| @Test | ||
| public void shouldZeroesLimitedDirectByteBuffer() { | ||
| assumeTrue(PlatformDependent.hasUnsafe()); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think this is possibly covering up a bug in either the test or more likely in the ByteUtil that was exposed by Unsafe not being available, and so this doesnt seem like the solution.
I dont think this test should need Unsafe, the behaviour just seems questionable. The very next test calls the same utility method, without Unsafe. It only behaves differently as the buffer passed isnt limited. Real code could call the ByeUtil method later without Unsafe and get unexpected behaviour, it wont have the benefit of covering itself with an assume.
EDIT: after typing that, I reminded myself of the time that this essentially already happened and I fixed it in the calling code:
https://issues.apache.org/jira/browse/ARTEMIS-4340
https://lists.apache.org/thread/7scdf4859scywos4knw44dbdw39wphcb
If Franz doesnt know why he did it, and it doesnt work without Unsafe anyway, I think it might be time we stopped it doing what its doing, or figure out why it did and how to adapt it to not having Unsafe..
|
|
||
| @LogMessage(id = 214037, value = "Unable to check IoUring availability ", level = LogMessage.Level.WARN) | ||
| void unableToCheckIoUringAvailability(Throwable e); | ||
|
|
||
| @LogMessage(id = 214038, value = "IoUring is not available, please add to the classpath or configure useIoUring=false to remove this warning", level = LogMessage.Level.WARN) | ||
| void unableToCheckIoUringAvailabilitynoClass(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Could these go in commons? Its annoying when methods like these log and suggest they are client issues, but are in fact being triggered by the broker while its acting as a server to clients. E.g the method right above these ones is especially confusing when logged by the broker.
| try { | ||
| return Env.isLinuxOs() && IoUring.isAvailable(); | ||
| } catch (NoClassDefFoundError noClassDefFoundError) { | ||
| ActiveMQClientLogger.LOGGER.unableToCheckIoUringAvailabilitynoClass(); | ||
| return false; | ||
| } catch (Throwable e) { | ||
| ActiveMQClientLogger.LOGGER.unableToCheckIoUringAvailability(e); | ||
| return false; | ||
| } | ||
| } |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This also feels like it would make more sense in commons (I know, the previous checks are in here; same comment to them)
| <dependency> | ||
| <groupId>io.netty</groupId> | ||
| <artifactId>netty-transport-native-io_uring</artifactId> | ||
| <classifier>${netty-transport-native-io_uring-classifier}</classifier> | ||
| </dependency> | ||
| <dependency> | ||
| <groupId>io.netty</groupId> | ||
| <artifactId>netty-transport-classes-io_uring</artifactId> | ||
| </dependency> |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
As I've commented on all the other io_uring PRs, I'd personally prefer folks had to add the deps if they want them, especially given its off by default, and its entirely untested.
Or at least for the client, if not both. The way its availability+use is done would need to change slightly to facilitate it to work without the classes dep though (elsewhere, I use a class per native type, and thats the only one to directly reference the types).
|
Beyond those comments, I was surprised I didnt see any test changes around the change in Netty default for hostname verification in 4.2. Either the Artemis code already handles [enabling and] disabling hostname in the required manner in advance to still work properly with the updated behaviour, which I dont recall seeing it do so, or perhaps it might suggest there is a missing test around explicitly disabling it. |
| <profile> | ||
| <id>jdk-17-23</id> | ||
| <activation> | ||
| <jdk>[17, 23]</jdk> | ||
| </activation> | ||
| <properties> | ||
| <activemq-surefire-argline> | ||
| ${initial-activemq-surefire-argline} | ||
| </activemq-surefire-argline> | ||
| </properties> | ||
| </profile> | ||
| <profile> | ||
| <id>jdk-25</id> | ||
| <activation> | ||
| <jdk>[24, 25]</jdk> | ||
| </activation> | ||
| <properties> | ||
| <activemq-surefire-argline> | ||
| ${initial-activemq-surefire-argline} | ||
| --enable-native-access=ALL-UNNAMED | ||
| --sun-misc-unsafe-memory-access=allow | ||
| </activemq-surefire-argline> | ||
| </properties> | ||
| </profile> |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The jdk25 profile is also set for 24 so should probably mention that in the name.
It may be better to say [24, 26) rather than [24, 25], at least that was always the historic recommendation, but its possible it no longer matters thee days: https://maven.apache.org/guides/introduction/introduction-to-profiles.html#JDK
Nitpick, but the several existing jdk related profiles skip the "-" immediately after "jdk", so e.g "jdk24-25" would align better with them.
https://github.com/apache/activemq-artemis/blob/2.34.0/tests/integration-tests/pom.xml#L476-L484
That also shows the format used elsewhere by having a specific prop for the additions, that is set empty to begin with then referenced in the original base property, and then set to the alternative value in a profile when needed meaning you only need the higher-version profile(s) for the additions.
This is the latest in a series of related PRs. This replaces #5774.