Optimization: Cache nonmatching filter results #166
Comments
@plundeen I think this is worth further investigation, as it might significantly reduce the burden on the server when there is a high-frequency stream of messages being published. I see it having large potential for filtering high-frequency messages, and even more so for QoS 0 messages... Of course, what you suggest could be used in a DoS attack resulting in a huge memory footprint, if attackers were to systematically bombard the server with ever-changing published topics... but at that point, I think one would have bigger problems and should use TLS to not allow just anyone to connect in the first place.

I'll definitely consider this, and will probably come up with a way for the subscription loops to share with the server (the owner of the subscriptions) a means to stop the distribution upstream, instead of having every subscription manage the same "not a match" list. Most subscriptions would end up with a tiny "matched" list and a huge "not-a-match" list that is nearly identical to every other subscription's.
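The memory-footprint concern above could be mitigated by capping the table. Here is a hedged sketch in Python (the class and method names are illustrative, not from the library) of an LRU-bounded cache, so a publisher cycling through ever-changing topic names cannot grow the table without bound:

```python
from collections import OrderedDict

class BoundedMatchCache:
    """Caps the 'not-a-match' table so a publisher bombarding the
    server with ever-changing topic names cannot grow it forever."""

    def __init__(self, max_entries=10_000):
        self.max_entries = max_entries
        self._table = OrderedDict()  # topic name -> match result

    def get(self, topic):
        if topic in self._table:
            self._table.move_to_end(topic)  # mark as recently used
            return self._table[topic]
        return None  # miss: caller must evaluate and call put()

    def put(self, topic, matched):
        self._table[topic] = matched
        self._table.move_to_end(topic)
        if len(self._table) > self.max_entries:
            self._table.popitem(last=False)  # evict least recently used
```

Hot topics stay cached while one-shot attack topics age out, trading a bounded amount of re-evaluation for a bounded memory footprint.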
I completely agree.
The simplest implementation would probably be to have a single variant lookup table wrapped in a DVR that is shared by all loops. That does, however, introduce resource contention as the loops fight for the mutex, and could result in an increase in context switching. The actual time the mutex is held should be pretty brief, at least, thanks to the red-black-tree performance of variant attributes. Thanks for taking the time to consider this, @francois-normandin .
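A minimal Python analogue of that shared DVR-wrapped table, assuming the names are illustrative: the lock stands in for the DVR's in-place element structure and, as suggested above, is held only for the brief dictionary access, never during the wildcard evaluation itself:

```python
import threading

class SharedMatchTable:
    """One lookup table shared by all subscription loops."""

    def __init__(self):
        self._lock = threading.Lock()
        self._table = {}  # (filter, topic) -> bool

    def match(self, flt, topic, evaluate):
        key = (flt, topic)
        with self._lock:               # brief hold: one lookup
            cached = self._table.get(key)
        if cached is not None:
            return cached
        result = evaluate(flt, topic)  # evaluated outside the lock
        with self._lock:               # brief hold: one insert
            self._table[key] = result
        return result
```

Evaluating outside the lock means two loops racing on the same new topic may duplicate one evaluation, but it keeps contention (and the context switching mentioned above) to a minimum.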
Thanks. Most appreciated feedback. If this were an application-specific project implementation, I would definitely consider the variant-wrapped DVR as "the" top-of-the-list contender. But the subscription mechanism needs to stay independent in all possible use cases, and this means there is the possibility of multiple concurrent servers running in the same app, all needing to stay absolutely independent of one another (including their respective subscription processes).

That does not mean the DVR approach is out of the question... after all, the DVR reference can be passed down by value from a server to all of its subscriptions. But such an arrangement could result in tight coupling of the server class with the subscription class. In the context of this project, I want to maintain as much decoupling as possible (at least in the base implementation) and leave it to specific overrides that care to introduce coupling when it serves the application. To that effect, I'm already considering extracting the subscription mechanism from the MQTT-specific context and making it a separate open-source project. That way, the MQTT server could use appropriate hooks to inject a DVR-based variant attribute table as you suggest.

All in all, what I'm saying is that I believe the specific implementation strategy should be a DVR-based attribute table, but that it should be implemented through dependency inversion to make it compliant with SOLID principles. :-)
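The dependency-inversion arrangement described above can be sketched in Python (class names are hypothetical, not from the library): the subscription depends only on an abstract cache interface, and a server override may inject a shared implementation without the base classes ever coupling to each other:

```python
from abc import ABC, abstractmethod

class MatchCache(ABC):
    """Abstraction the subscription depends on.  Concrete caches (a
    private per-subscription table, or a server-injected shared one)
    are supplied from outside, keeping the two classes decoupled."""

    @abstractmethod
    def get(self, topic):
        """Return True/False, or None on a cache miss."""

    @abstractmethod
    def put(self, topic, matched):
        """Record the evaluation result for a topic name."""

class LocalMatchCache(MatchCache):
    """Default: each subscription owns its own table."""
    def __init__(self):
        self._table = {}
    def get(self, topic):
        return self._table.get(topic)
    def put(self, topic, matched):
        self._table[topic] = matched

class Subscription:
    def __init__(self, pattern, cache=None):
        self.pattern = pattern
        self.cache = cache or LocalMatchCache()  # injected dependency
```

The base behavior is unchanged when nothing is injected, while a server override can hand every subscription the same `MatchCache` instance.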
If I am reading it correctly, it looks like you could further optimize the topic pattern-match evaluation in MQTT Server.lvlib:_Subscription.lvclass:Evaluate Match.vi. As it stands, if a given topic name matches the subscription pattern, the result is cached so the pattern-matching/wildcard evaluation does not need to be performed on each subsequent sample. But if the name does not match, the result is not cached, so subsequent samples of that topic have to be re-evaluated every time.

Perhaps there are some nuances that I am overlooking that preclude caching non-matching topics. Feel free to close if I have misinterpreted. But I think it would save some processing overhead if the evaluation were done exactly once for each unique topic name, regardless of whether it matched or not. The cache variant attribute table would then end up having an entry for every topic, with the corresponding value being the result of the evaluation.
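The proposal can be sketched outside LabVIEW. Here is a minimal Python analogue (function and class names are illustrative, not the library's) of an Evaluate Match that caches both outcomes, so each unique topic name is evaluated exactly once:

```python
def topic_matches(pattern, topic):
    """MQTT wildcard evaluation: '+' matches exactly one topic level,
    '#' matches all remaining levels."""
    p_levels = pattern.split("/")
    t_levels = topic.split("/")
    for i, p in enumerate(p_levels):
        if p == "#":
            return True
        if i >= len(t_levels):
            return False
        if p != "+" and p != t_levels[i]:
            return False
    return len(p_levels) == len(t_levels)

class Subscription:
    def __init__(self, pattern):
        self.pattern = pattern
        self._cache = {}  # topic name -> match result

    def evaluate_match(self, topic):
        cached = self._cache.get(topic)
        if cached is None:
            cached = topic_matches(self.pattern, topic)
            self._cache[topic] = cached  # cache misses as well as hits
        return cached
```

After the first sample of any topic, matching or not, every later sample is a single dictionary lookup instead of a wildcard evaluation.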