-
-
Notifications
You must be signed in to change notification settings - Fork 51
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[Java] Improve cucumber expression creation performance #202
Conversation
Added entry for performance improvement
Replace issue link by PR link.
Hi @mpkorstanje, is anything missing in the PR ? Should I provide more information about the algorithm ? |
No I'm just backlogged for a bit. |
java/src/main/java/io/cucumber/cucumberexpressions/CucumberExpression.java
Outdated
Show resolved
Hide resolved
int length = text.length(); | ||
StringBuilder sb = new StringBuilder(length * 2); // prevent resizes | ||
int maxChar = CHAR_TO_ESCAPE.length; | ||
for (int i = 0; i < length; i++) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think that in practice expressions with characters that need to be escaped are rare. Would it make sense to check the entire string for characters that need to be replaced first? It would save an array copy.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If we rarely escape in practice, the situation is a bit different. We need to benchmark the variants based on the escaping frequency, not only on pure escaping performance.
I've implemented two other variants, so to summarize:
Variant | Description |
---|---|
escapeRegex0 | Pattern (cucumber 7.11.0) |
escapeRegex1 | append(char) with 1 array allocation and 1 array copy |
escapeRegex2 | append(String) with 1 array allocation and 1 array copy |
escapeRegex3 | append(char) with 1 array allocation and 0..1 array copy |
escapeRegex4 | append(String) with 0..1 array allocation and 0..1 array copy |
I benchmarked these variants with JMH for varying escaping frequencies:
The variant escapeRegex4
outperforms the other variants for all escaping frequencies. Depending on escaping frequency, this variant is 1.2 to 3 times faster the cucumber 7.11.0 implementation. The code is a bit more complex, but still understandable, so I think we can use this variant:
public static String escapeRegex4(String text) {
int length = text.length();
StringBuilder sb = null;
int blocStart=0;
int maxChar = CHAR_TO_ESCAPE.length;
for (int i = 0; i < length; i++) {
char currentChar = text.charAt(i);
if (currentChar < maxChar && CHAR_TO_ESCAPE[currentChar]) {
if (sb == null) {
sb = new StringBuilder(length * 2);
}
if (i > blocStart) {
// flush previous block
sb.append(text, blocStart, i);
}
sb.append('\\');
sb.append(currentChar);
blocStart=i+1;
}
}
if (sb != null) {
if (length > blocStart) {
// flush remaining characters
sb.append(text, blocStart, length);
}
return sb.toString();
}
return text;
}
In real conditions, on my project with about ~400 test scenarios runs in about 8.5 seconds. The InteliJ profiler says escapeRegex0
takes 3.37% of the total CPU time (=280 ms), while escapeRegex4
takes 0.97% of the total (=80 ms). That's a 200 ms improvement (2.3%): not a lot on one build (I was hoping for more), but worth the effort when considering the number of builds on all cucumber users.
For completeness, the escapeRegex3
variant is:
public static String escapeRegex3(String text) {
int length = text.length();
StringBuilder sb = new StringBuilder(length * 2); // prevent resizes
int maxChar = CHAR_TO_ESCAPE.length;
boolean escaped = false;
for (int i = 0; i < length; i++) {
char currentChar = text.charAt(i);
if (currentChar < maxChar && CHAR_TO_ESCAPE[currentChar]) {
sb.append('\\');
escaped = true;
}
sb.append(currentChar);
}
return escaped ? sb.toString() : text;
}
Moved `escapeRegex` method to a private package class. Improved code performance.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks good, just some nitpicks!
java/src/main/java/io/cucumber/cucumberexpressions/RegexpUtils.java
Outdated
Show resolved
Hide resolved
java/src/main/java/io/cucumber/cucumberexpressions/CucumberExpression.java
Outdated
Show resolved
Hide resolved
@jkronegg would you be interested in helping out with cucumber/cucumber-jvm#2542. It will eventually enable us to discover all step definitions once, and because of that build all cucumber expressions once (per language) which should significantly improve performance. Though I can imagine that a larger contribution with a longer commitment isn't quite what you are looking for, so if you'd decline that is perfectly understandable. |
Reformatting and variable renaming.
java/src/main/java/io/cucumber/cucumberexpressions/RegexpUtils.java
Outdated
Show resolved
Hide resolved
java/src/main/java/io/cucumber/cucumberexpressions/RegexpUtils.java
Outdated
Show resolved
Hide resolved
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'll merge and release this soonish.
CucumberExpression.escapeRegex
performance improvement
@mpkorstanje sorry for the long response delay. My employer (https://www.mobiliere.ch/) allows me to work up to 10% of my labor time for the Cucumber framework. I still need to manage the details, but I will be happy to help the Cucumber community. |
Cheers. I'll see if I can write something up to help you get started. I'm unfortunately a bit short on time at the moment. |
🤔 What's changed?
Improved
CucumberExpression.escapeRegex(String)
performance as proposed in #200. Calls toescapeRegex
are about 7 times faster and the overall performance improvement onCucumberExpression
new instance is about 7%.⚡️ What's your motivation?
Fixes #200
🏷️ What kind of change is this?
♻️ Anything particular you want feedback on?
Unit tests have been executed on the previous implementation and proposed implementation to prevent regressions.
📋 Checklist: