Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Single step expression is compiled into regex more than once (thus it makes test execution slower) #2035

Open
maio opened this issue Jun 26, 2020 · 9 comments
Labels
🐛 bug Defect / Bug ✅ accepted The core team has agreed that it is a good idea to fix this

Comments

@maio
Copy link

maio commented Jun 26, 2020

It seems that all step expressions are re-compiled into regexes before each scenario.

Steps to reproduce the behavior:

  1. Given I have feature file like:
Feature: Whatever

Scenario: S1
  Given foo

Scenario: S2
  Given foo
  1. and I implement foo step e.g.
@Given("foo")
public void foo() {}
  1. When I execute the feature file
  2. Then io.cucumber.core.stepexpression.StepExpressionFactory.createExpression method is executed twice with "foo" expression (because there are two scenarios)
  3. therefore "foo" is compiled into regex twice

Expected behavior

"foo" is compiled only once.

Context & Motivation

I noticed that if I remove all bodies of my steps from sources it still takes considerable amount of time to run my tests (e.g., 2.2 seconds for 186 scenarios). Using Java Flight Recorder I collected some profiling data and it seems that a lot of time is being spent in io.cucumber.core.stepexpression.StepExpressionFactory.createExpression (47% in my case according to flame graph - see below). When I examined that method in debugger I noticed that it's being called more than once with same expressionString parameter.

Screenshots

image

Your Environment

  • Versions used 6.1.2
  • Linux, Gradle, Kotlin
@maio maio added the 🐛 bug Defect / Bug label Jun 26, 2020
@maio
Copy link
Author

maio commented Jun 26, 2020

My Java/cucumber-jvm knowledge is pretty low, but I managed to implement a quickfix/hack for this by introducing expression cache in crateExpression method, and it reduced my tests execution time from 2.2s to 1.5s.

@mpkorstanje
Copy link
Contributor

mpkorstanje commented Jun 26, 2020

Heya thanks for reporting this. A solution may be a bit more complicated then adding a cache. The type registry from which cucumber expressions are compiled is stateful.

It is possible that same expression string results in a different regex. For example the expression {double} will result in different expressions when used in English and French features.Another issue is that different scenario executions may register different parameter types and as a result may generate different regexps from an expression. While not recommend it can't be ruled out. This means that the cache key would be fairly complex and depend on the state of the parameter type registry which is non-trivial and not exposed.

I do have some long term plans to add some immutability to the runner and glue but it needs major change to the way backends discover and register glue. So I don't think this will be fixed any time soon.

@mpkorstanje
Copy link
Contributor

Some improvements may also come from implementing a proper parser for cucumber expressions rather the rewriting using regexes (see cucumber/common#771) but that too is a long term project.

@maio
Copy link
Author

maio commented Jun 27, 2020

Thanks for quick and insightful reply @mpkorstanje. Feel free to close this issue if it's not going to be fixed anytime soon.

PS: Thanks for Cucumber guys! It's pleasure to work with.

@mpkorstanje
Copy link
Contributor

Your welcome!

I'll keep the issue around as a reminder to benchmark this in the future.

@hakanai
Copy link

hakanai commented Aug 27, 2021

If it were me,

  1. Find all calls to Pattern.compile(String).
  2. Replace with Patterns.compile(String).
  3. Implement like so:
public static void Patterns {
    private static final Map<String, Pattern> store = new ConcurrentHashMap<>();

    private Patterns() {}

    public static Pattern compile(String pattern) {
        return store.computeIfAbsent​(pattern, Pattern::compile);
    }
    
    public static Pattern compile(String pattern, int flags) {
        // ... look at flags, prepend to string pattern as (?x), delegate
        // alternatively, have a HashKey class contain both the pattern and the flags
    }
}

@mpkorstanje
Copy link
Contributor

Once cucumber-java8 is removed Cucumber will only have to create regular expressions once for each language used. That way we can avoid a static map and all the problems that come with that.

@jkronegg
Copy link
Contributor

jkronegg commented Dec 5, 2022

I can confirm that all step expressions are recreated at each test scenario (for one of my project with about 400 test scenarios, expressions are recreated about 400 times). Also I understand the issue of caching with respect to the parameter type registry.

Thus, I tested a cached version of my proposed ExpressionFactory improvement cucumber/cucumber-expressions#185 / cucumber/cucumber-expressions#187 . This cached version takes the parameter type registry into account (i.e. Map<ParameterTypeRegistry, Map<String, Expression>>). I measured the duration of my ~400 test scenarios over 10 runs and got on average:

  • without caching: 12s 851ms
  • with caching: 12s 923ms

I can conclude that adding caching made no significant improvement. Thus, I'm not sure having such caching is the way to go to improve the performance.

For completeness, there is the code with caching:

public final class ExpressionFactory {
    private static final Pattern PARAMETER_PATTERN = Pattern.compile("((?:\\\\){0,2})\\{([^}]*)\\}");
    private static final Map<ParameterTypeRegistry, Map<String, Expression>> expressionsByRegistry = new HashMap<>();

    private final ParameterTypeRegistry parameterTypeRegistry;
    private final Map<String, Expression> expressions;

    public ExpressionFactory(ParameterTypeRegistry parameterTypeRegistry) {
        this.parameterTypeRegistry = parameterTypeRegistry;
        expressions = expressionsByRegistry.computeIfAbsent(this.parameterTypeRegistry, key -> new HashMap<>());
    }

    public Expression createExpression(String expressionString) {
        Expression expression = expressions.get(expressionString);
        if (expression!=null) {
            return expression;
        }

        int length = expressionString.length();
        int lastCharIndex = 0;
        char firstChar = 0;
        char lastChar = 0;
        if (length > 0) {
            lastCharIndex = length - 1;
            firstChar = expressionString.charAt(0);
            lastChar = expressionString.charAt(lastCharIndex);
        }
        if (firstChar == '^' || lastChar == '$') {
            // java regexp => create from regexp
            expression = this.createRegularExpressionWithAnchors(expressionString);
        } else if (firstChar == '/' && lastChar == '/') {
            // script style regexp => create from regexp
            expression = new RegularExpression(Pattern.compile(expressionString.substring(1, lastCharIndex)), this.parameterTypeRegistry);
        } else {
            // otherwise create cucumber style expression
            expression = new CucumberExpression(expressionString, this.parameterTypeRegistry);
        }
        expressions.put(expressionString, expression);
        return expression;
    }

    private RegularExpression createRegularExpressionWithAnchors(String expressionString) {
        try {
            return new RegularExpression(Pattern.compile(expressionString), this.parameterTypeRegistry);
        } catch (PatternSyntaxException var3) {
            if (PARAMETER_PATTERN.matcher(expressionString).find()) {
                throw new CucumberExpressionException("You cannot use anchors (^ or $) in Cucumber Expressions. Please remove them from " + expressionString, var3);
            } else {
                throw var3;
            }
        }
    }
}

@mpkorstanje
Copy link
Contributor

I can conclude that adding caching made no significant improvement. Thus, I'm not sure having such caching is the way to go to improve the performance.

Currently we have to have to rebuild all Cucumber expressions for each scenario because cucumber-java8 defines step definitions at test execution time (actually, after the test has already started). This means that each scenario execution can contain a different set of cucumber expressions. If/once cucumber-java8 is removed all we can cache expressions by language.

But #2174 and #2279 will take a long time.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
🐛 bug Defect / Bug ✅ accepted The core team has agreed that it is a good idea to fix this
Projects
None yet
Development

No branches or pull requests

5 participants