Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Performances and Multithreading #174

Open
emabiz opened this issue Apr 11, 2023 · 2 comments
Open

Performances and Multithreading #174

emabiz opened this issue Apr 11, 2023 · 2 comments

Comments

@emabiz
Copy link

emabiz commented Apr 11, 2023

Hi,
first of all thank you for your job.

I tried the library parsing nlohmannjson objects.
Performances are good parsing a lot of objects using one pre-created validator.
What is the expected behavior using the same validator by multiple threads that work concurrently?
Must be the access to validator locked?

Thank you

@tristanpenman
Copy link
Owner

Hi @emabiz, glad to hear you're finding the library useful.

I haven't done any testing with regard to concurrency. The Validator is more-or-less stateless. While it is validating a particular document, pretty much everything relevant to that document lives on the stack, or in short-lived heap allocations. However, there are two risks that come to mind:

  1. The Validator does maintain a regex cache, which is not protected by any locking. While this may not cause issues in practice, this cannot be guaranteed.
  2. Individual Parsers may behave unexpectedly if you were to validate the same document concurrently. i.e. Two threads using the same Validator, and same Parser Adapter.

In the case of 1), I believe that would be ease enough to address using std::mutex.
In the case of 2), that's already a bit of a degenerate case, so is probably not worth worrying about.

Hope that helps!

@emabiz
Copy link
Author

emabiz commented Apr 12, 2023

Hi,
for 1. yes my idea is to protect the validator with a mutex. Can you confirm that there is no cache sharing among validators, then it is possible to create different validators and using them concurrently without protecting them with a global mutex?

Not clear the case 2.
My idea is to use the validator to check json coming from the network. Then each document is a different object.
I'd like to create valijson::Schema ,valijson::SchemaParser and valijson::Validator once and then use them for all received objects. Each object can be validated by a different thread.
Something like this:

MMValidator::MMValidator():
	_validator(valijson::Validator::kStrongTypes)
{
	json schemaDocument = json::object({
		{"$schema","https://json-schema.org/draft/2020-12/schema"},
		{"type" , "object" },
		...
		});

	// Parse the json schema into an internal schema format
	
	valijson::adapters::NlohmannJsonAdapter schemaDocumentAdapter(schemaDocument);
	try {
		_parser.populateSchema(schemaDocumentAdapter, _schema);
	}
	catch (std::exception& e) {
		wm::LogManager::error(_log_target, fmt::format("Failed to parse schema: {}", e.what()));		
	}
	
}

bool MMValidator::validate(json targetDocument)
{
	bool valid = false;
	valijson::ValidationResults results;
	valijson::adapters::NlohmannJsonAdapter targetDocumentAdapter(targetDocument);
	{
		std::lock_guard<std::mutex> lock(_mutex);		
		valid = _validator.validate(_schema, targetDocumentAdapter, &results);
	}
	
	if (!valid) {
		valijson::ValidationResults::Error error;
		while (results.popError(error)) {
			std::string context;
			std::vector<std::string>::iterator itr = error.context.begin();
			for (; itr != error.context.end(); itr++) {
				context += *itr;
			}
			wm::LogManager::error(_log_target, fmt::format("invalid mm {}, context={}, error={}", targetDocument.dump(), context, error.description));
		}
	}
	return valid;
}

Also not sure if the mutex can be released immediately after the validate, or if it must be locked also during results error context manipulation.

Thank you

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants