Camunda 7 scripting - race conditions across process-instances while setting a variable #4449

Open · 1 task
DumboJetEngine opened this issue Jun 24, 2024 · 7 comments
Labels: type:bug (Issues that describe a user-facing bug in the project.)

@DumboJetEngine

DumboJetEngine commented Jun 24, 2024

Environment (Required on creation)

  • Win11 Pro
  • camunda-bpm-run-7.21.0
  • jdk-17.0.10

(I have customized pretty much nothing on Camunda. It uses the default H2 database, as far as I understand. Isn't that database a valid production candidate?)

Description (Required on creation; please attach any relevant screenshots, stacktraces, log files, etc. to the ticket)

I have a Camunda 7 workflow with a sub-process that accesses parent variables.
Specifically, the parent does this in a script task (Groovy code):

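// Holder for the outcome that the caller reads back as the "result" variable.
class Result {
    Boolean canProceed
    String errorMessage
}

Init();

// Creates the "result" process variable if it does not exist yet.
def Init()
{
    def hasResult = execution.hasVariable("result");
    if(!hasResult)
    {
        def result = new Result();
//        execution.removeVariable("result");
        execution.setVariable("result", result);
    }
}

// Marks the result as failed and records the error message.
def setError(errorMessage)
{
    Init();
    def result = execution.getVariable("result");
    result.canProceed = false;
    result.errorMessage = errorMessage;
}

// Marks the result as successful and clears any previous error message.
def setSuccess()
{
    Init();
    def result = execution.getVariable("result");
    result.canProceed = true;
    result.errorMessage = null;
}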

And the sub-process calls the setSuccess() and setError("sth") functions.

The workflow contains no user tasks, so once I start a process instance it runs to completion and is destroyed, and I get the result variables back.

Everything works fine when I call the workflow one instance at a time. But when I bombard it with parallel calls (each creating a new process instance), I get weird errors revolving around variables.

This is the C# code that calls the workflow in parallel (using the latest Camunda.Api.Client nuget package):

var camunda = CamundaClient.Create("http://localhost:8080/engine-rest");
var pd = camunda.ProcessDefinitions.ByKey("name");

var actionNames = new string[] {
	"delete",
	"delete",
	...
	...
};

var repetition = 0;
var parallelOptions = new ParallelOptions { MaxDegreeOfParallelism = 10 };
await Parallel.ForEachAsync(actionNames, parallelOptions, async (actionName, ct) => {
	var result = await PerformAction(actionName: actionName);
});

async Task<Result> PerformAction(string actionName)
{
	var businessKey = $"temp-id:{Guid.NewGuid()}";

	var camundaResult = await pd.StartProcessInstance(new Camunda.Api.Client.ProcessDefinition.StartProcessInstance
	{
		BusinessKey = businessKey,
		Variables =
		{
			{ "action", VariableValue.FromObject(new { name = actionName }) },
		},
		WithVariablesInReturn = true,
	});

	var processResultJson = camundaResult.Variables["result"]?.Value as string;
	var result = JsonSerializer.Deserialize<Result>(processResultJson, new JsonSerializerOptions { PropertyNameCaseInsensitive = true });
	return result;
}

When MaxDegreeOfParallelism is bigger than 1, I get all kinds of unexpected errors, like:

  • java.lang.NullPointerException
  • org.camunda.bpm.engine.ProcessEngineException: ENGINE-17004 Cannot add variable instance with name result. Variable already exists
  • java.lang.NullPointerException: Cannot invoke "org.camunda.bpm.engine.impl.persistence.entity.ExecutionEntity.getParentActivityInstanceId()" because "scopeExecution" is null

Here are some stack traces (they were too long to post here): https://mega.nz/file/cR0VSLDA#W1f_a4Xxs6OI0hREfAc1SBwx0nzvDrQb3Jn1qPDoGpE

And here is the workflow file: https://mega.nz/file/NM1kDDxK#zsiqoi-7meHYLV4cYl7Qa9CAmQdtyFAW3aMf4cNAKA0

Steps to reproduce (Required on creation)

  • Download the workflow file.
  • Run Camunda 7 and deploy the workflow.
  • Paste the C# code in a new Console App project (I use Visual Studio 2022) and add the Camunda.Api.Client nuget package to the project.
  • In the variable actionNames, make sure there are many (> 100) strings.
  • Run the app with MaxDegreeOfParallelism = 1 and then with MaxDegreeOfParallelism = 10, to see the difference.

Observed Behavior (Required on creation)

With MaxDegreeOfParallelism = 10, the execution engine throws various errors related to setting or getting variables.

Expected behavior (Required on creation)

No errors, regardless of the degree of parallelism, since a process instance is supposed to be isolated from other process instances.

@yanavasileva
Member

Hi @DumboJetEngine,

Thank you for your interest in our product.

> add the Camunda.Api.Client nuget package to the project

  1. Is this Camunda client based on our OpenAPI specification? I found https://www.nuget.org/packages/Camunda.Api.Client, which doesn't seem to have been updated since 2020, and I don't think it is compatible with Camunda 7.21. Further, we don't provide support for clients created by third-party tools.

  2. Would it be possible to simplify the project so that it does not use the NuGet client, and share an end-to-end minimal example that reproduces the issue? For that, you can consider using the https://github.com/camunda/camunda-engine-unittest template.

  3. Could you try running your scenario with asynchronous continuation enabled on the Activity_Initialize task and check whether the errors persist? [Screenshot omitted]

If you need to upload more data relevant to the investigation, please create a simple repository or upload the files as a GitHub gist. Thank you in advance for that.

Best,
Yana

@DumboJetEngine
Author

Hello.

  1. The client supposedly supports Camunda 7 (see here). I am not sure if any breaking changes were added in version 7.2* of Camunda. I have tried looking into your API to see if any fields the API accepts were missing from the calls (mainly here), but I saw no field relevant enough to cause a race condition. And everything works fine when there is no concurrency.
  2. It's been a while since I last touched Java, so I don't think it will be very feasible for me to use this unit test template. :( Downloading all the tools to build this template project and replicating the logic might be doable after some struggle, but I have no idea how to execute things in parallel in Java.
  3. When using asynchronous continuations on the "Initialize" block, nothing changes. It still works with one thread, and fails the same way with 10 threads. However, I get no result variable back with this enabled. I am not sure why this is. I am new to Camunda, and I had the impression that this option only persists the current state and that it should not affect the workflow result-variables in any way if no errors occur.
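
On point 2, for readers in the same position: running calls in parallel in plain Java can be sketched with an ExecutorService, where the pool size plays roughly the role of MaxDegreeOfParallelism. This is only a minimal sketch under assumptions. The names ParallelRunnerSketch, runAll and performAction are hypothetical, and performAction is a placeholder where a real test would start a process instance.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

class ParallelRunnerSketch {

    // Runs one task per action name on a fixed-size thread pool and returns
    // the outcomes in input order. A failed task is recorded as "ERROR: ..."
    // so that one failure does not hide the others.
    static List<String> runAll(List<String> actionNames, int parallelism) {
        ExecutorService pool = Executors.newFixedThreadPool(parallelism);
        try {
            List<Callable<String>> tasks = new ArrayList<>();
            for (String name : actionNames) {
                tasks.add(() -> performAction(name));
            }
            List<String> outcomes = new ArrayList<>();
            // invokeAll blocks until every task has finished.
            for (Future<String> future : pool.invokeAll(tasks)) {
                try {
                    outcomes.add(future.get());
                } catch (ExecutionException e) {
                    outcomes.add("ERROR: " + e.getCause());
                }
            }
            return outcomes;
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("interrupted while waiting for tasks", e);
        } finally {
            pool.shutdown();
        }
    }

    // Hypothetical placeholder: a real reproduction would start a process
    // instance here and return its "result" variable.
    static String performAction(String actionName) {
        return "done:" + actionName;
    }

    public static void main(String[] args) {
        System.out.println(runAll(List.of("delete", "delete", "delete"), 10));
    }
}
```

Calling runAll with a parallelism of 1 and then 10 would mirror the two runs described in the steps to reproduce.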

I might try to hit your API without this client, to see if that changes anything, but I honestly don't think it will.

@yanavasileva
Member

yanavasileva commented Jun 28, 2024

> I might try to hit your API without this client, to see if that changes anything, but I honestly don't think it will.

That might be the case; I just wanted to lay out all of the options.
A standalone reproducible example, if you manage to create one, would speed up reproducing and analyzing the bug.

github-actions bot added the group:stale DRI: Yana label Jun 29, 2024

@DumboJetEngine
Author

DumboJetEngine commented Jul 1, 2024

Hello again.
I've used a simple HTTP client to reproduce the problem this time:
https://gist.github.com/DumboJetEngine/7bcdeccc222d4339fe70bc008f56f652
Test with MaxDegreeOfParallelism = 1 and MaxDegreeOfParallelism = 10 to see the difference.
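
The gist itself is not reproduced here. As a rough sketch of what such a client-free call could look like, the following Java code builds (but does not send) a request against the REST endpoint POST /process-definition/key/{key}/start. Several things are assumptions rather than facts from the gist: the class and method names (StartRequestSketch, startBody, startRequest) are hypothetical, the "action" variable is passed with type "Json", the base URL is the default one used earlier in this issue, and the action name is assumed to need no JSON escaping.

```java
import java.net.URI;
import java.net.http.HttpRequest;
import java.util.UUID;

class StartRequestSketch {

    // Assumption: default Camunda Run REST base URL, as used in the issue.
    static final String BASE_URL = "http://localhost:8080/engine-rest";

    // Builds the JSON body for starting one process instance.
    // Assumption: the "action" object is sent as a "Json" typed variable whose
    // value is a JSON string; actionName must not contain quotes or backslashes.
    static String startBody(String actionName, String businessKey) {
        String escapedAction = "{\\\"name\\\":\\\"" + actionName + "\\\"}";
        return String.format(
            "{\"businessKey\":\"%s\",\"withVariablesInReturn\":true,"
                + "\"variables\":{\"action\":{\"type\":\"Json\",\"value\":\"%s\"}}}",
            businessKey, escapedAction);
    }

    // Builds the HTTP request for one start call, with a fresh business key
    // per call as in the C# code above.
    static HttpRequest startRequest(String processKey, String actionName) {
        String businessKey = "temp-id:" + UUID.randomUUID();
        return HttpRequest.newBuilder()
            .uri(URI.create(BASE_URL + "/process-definition/key/" + processKey + "/start"))
            .header("Content-Type", "application/json")
            .POST(HttpRequest.BodyPublishers.ofString(startBody(actionName, businessKey)))
            .build();
    }

    public static void main(String[] args) {
        // Prints the target URI and an example body without contacting a server.
        System.out.println(startRequest("name", "delete").uri());
        System.out.println(startBody("delete", "temp-id:example"));
    }
}
```

To actually send the request, it could be passed to HttpClient.newHttpClient().send(request, HttpResponse.BodyHandlers.ofString()), for example from inside a thread pool when testing with several parallel callers.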

Here is the bpmn file I've used:
https://gist.github.com/DumboJetEngine/4fd2efb3462a879f210afc6636916069

I don't see asynchronous continuations affecting anything (when it comes to errors at least).

@yanavasileva
Member

yanavasileva commented Jul 5, 2024

Hi @DumboJetEngine,

Unfortunately, I am still not able to reproduce the issue. I see that some sequence flows are missing conditions in the latest BPMN process. Could you please simplify the example further?

> It uses the default H2 database, as far as I understand. Isn't that database a valid production candidate?

No, it is not: the H2 database should not be used in production. You can pick one of the other supported databases: https://docs.camunda.org/manual/7.21/introduction/supported-environments/#databases

@DumboJetEngine
Author

@yanavasileva
Any chance this is caused by me using H2, or should I continue using it for this test?
I am a bit busy right now, but if I find time I will give it another go.

@yanavasileva
Member

> Any chance this is caused by me using H2, or should I continue using it for this test?

I am not sure; probably not. If you plan to use H2 in a production or pre-production test environment, though, please switch to another database.
