This repository has been archived by the owner on May 29, 2024. It is now read-only.

First pass at [Ruby][Mongo] validate simple filter rules #3174 #430

Closed
wants to merge 11 commits

Conversation

mchernyavskaya
Contributor

Closes https://github.com/elastic/enterprise-search-team/issues/3174

Basic high-level validation of simple filtering rules. There are a lot of edge cases and we can't cover them all, so this is a best-effort attempt to provide at least some rough coverage.

The PR also includes pre-processing of start_with and end_with rules in Mongo (the first PR for simple rules didn't cover those, as they were specified as post-processing in the implementation document, but since technically they're just a kind of regex, I didn't see why we couldn't do them in pre-processing for Mongo).
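
As an aside, a minimal sketch of the kind of translation this implies (illustrative only, not necessarily the PR's exact code): a start_with or end_with value becomes an anchored, escaped regex that the Mongo Ruby driver accepts directly as a query value.

# Illustrative sketch only: map start_with / end_with rule values to anchored
# regex filters for a MongoDB query. Regexp.escape guards against special chars.
def regex_filter(field, rule_type, value)
  escaped = Regexp.escape(value.to_s)
  case rule_type
  when 'start_with'
    { field => /^#{escaped}/ }
  when 'end_with'
    { field => /#{escaped}$/ }
  end
end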

Checklists

Pre-Review Checklist

  • Covered the changes with automated tests
  • Tested the changes locally

Related Pull Requests

For Elastic Internal Use Only

  • Considered corresponding documentation changes to contribute separately
  • New configuration settings added in this PR follow the official guidelines
  • Built gems (both connectors_utility and connectors_service), included them into Enterprise Search, and tested that Enterprise Search works well with the new gem versions. Instructions can be found here

raise "#{e.key} is required: #{e.message}"
end

def self.flex_fetch(hash, key, default_value = nil)
Member

Contributor Author

Yeah, if we do it in a monkey patch. =) But here, we don't.
Frankly, I don't see a good way to do it with the key types unknown. Moreover, even
hash[key] || hash.fetch(key.to_sym, nil) || hash.fetch(key.to_s, nil) won't work properly in the case where the value is boolean false (learned that the hard way, hence this ugly code).
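
For illustration, a minimal sketch of what such a helper could look like (an assumption, not necessarily the PR's exact implementation): it checks key presence with key? instead of relying on truthiness, so explicit false values survive.

# Hypothetical sketch: fetch by string or symbol key without relying on
# truthiness, so an explicit `false` value is returned rather than the default.
def self.flex_fetch(hash, key, default_value = nil)
  return hash[key] if hash.key?(key)
  return hash[key.to_sym] if key.respond_to?(:to_sym) && hash.key?(key.to_sym)
  return hash[key.to_s] if hash.key?(key.to_s)
  default_value
end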

Member

When resolving merge conflicts, I convinced myself we didn't need this. We can just rely on string keys.

@rule = SimpleRule.flex_fetch(rule_hash, RULE)
@value = SimpleRule.flex_fetch(rule_hash, VALUE)
@id = SimpleRule.flex_fetch(rule_hash, ID)
@order = SimpleRule.flex_fetch(rule_hash, ORDER, 0)
Member

If we're going to add order here, I suggest we require it, and not provide a default value. Otherwise we risk non-deterministic ordering in specs if we don't have an explicit order field defined and they all get :order => 0 from this default.

Contributor Author

My thinking was that if we don't have order, then the ordering probably doesn't matter (like it doesn't now for pre-processing) and we can safely assume some default value. But you're probably right.

@@ -11,23 +11,144 @@

module Connectors
module Base
class FilteringRulesValidationError < StandardError; end
Member

should this go in errors.rb?

def initialize(rules)
@rules = (rules || []).map(&:with_indifferent_access).filter { |r| r[:id] != 'DEFAULT' }.sort_by { |r| r[:order] }
begin
sorted = (rules || []).map { |r| SimpleRule.new(r) }.filter { |r| r.id != 'DEFAULT' }.sort_by(&:order)
Member

I like this change. I'd avoided it because of the order requirement, but you've solved that now.
Should we unify this sorting with the one done by the PostProcessingEngine to DRY up our logic?
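
One hypothetical way to DRY that up (the module and method names below are illustrative, not from the PR): extract the shared normalization into a helper that both the pre-processing parser and the PostProcessingEngine call.

# Illustrative helper (not part of the PR): build SimpleRule objects,
# drop the DEFAULT rule, and sort by order in one shared place.
module RulesOrdering
  def self.sorted_rules(rule_hashes)
    (rule_hashes || [])
      .map { |hash| SimpleRule.new(hash) }
      .reject { |rule| rule.id == 'DEFAULT' }
      .sort_by(&:order)
  end
end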

Comment on lines 72 to 79
# # check for overlapping ranges
# ranges = field_rules.filter { |r| r[:rule] == '>' || r[:rule] == '<' }
# ranges.each_with_index do |r, i|
# next if i == ranges.size - 1
# next_r = ranges[i + 1]
# if r[:value] == next_r[:value]
# raise FilteringRulesValidationError.new("Contradicting rules for field: #{field}. Can't have overlapping ranges.")
# end
Member

should this block be removed?

Contributor Author

Yeah, it should; it's WIP to document the thought process. Frankly, the ranges feel so cumbersome to validate... there are just too many cases.

lib/connectors/base/simple_rules_parser.rb (resolved)
lib/connectors/mongodb/mongo_rules_parser.rb (resolved)
spec/core/filtering/simple_rule_spec.rb (resolved)
mchernyavskaya and others added 3 commits November 15, 2022 18:40
https://github.com/elastic/enterprise-search-team/issues/3174
added some range pre-validation
some of the processed cases aren't covered by tests (drop_invalid_range_rules)
…ring-rules

# Conflicts:
#	lib/core/filtering/post_process_engine.rb
#	lib/core/filtering/simple_rule.rb
#	spec/connectors/mongodb/mongo_rules_parser_spec.rb
@seanstory
Member

@mchernyavskaya I failed to get this ready to merge. It took me too long to figure out how to resolve the merge conflicts (I couldn't rebase your branch, and I still don't really understand why), and I'm just out of time. I think we'll just need to save this issue for 8.7 :(

@mchernyavskaya
Contributor Author

@seanstory no worries. Taking this out of the DM: I feel like I'm going down a rabbit hole with these filtering rules (and specifically, with validation). I'm second-guessing this pre- and post-processing model more and more, so maybe that's something we should revisit.

@artem-shelkovnikov
Member

Checked the PR and spoke to @mchernyavskaya about it. It indeed feels like a rabbit hole: the number of possible combinations of invalid rules is potentially very high, and the complexity of validating them is enormous.

I think the same problem can probably be approached from a different perspective: let users enter any rules they want, but provide good logging and tracing to show what query was generated (and maybe even why), and why the expected data was not ingested.

It's probably still good to validate each individual rule for sanity purposes, though - for example, that a RANGE filter is actually [X; Y] and not [-INF, X], [Y; +INF].
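
A minimal sketch of that kind of per-rule check (the 'from'/'to' field names here are assumed for illustration, not taken from the PR):

# Hypothetical per-rule sanity check: a range [from; to] is only valid when
# the lower bound does not exceed the upper bound.
def validate_range!(rule)
  from = rule['from']
  to = rule['to']
  if from && to && from > to
    raise FilteringRulesValidationError.new("Invalid range [#{from}; #{to}] for rule #{rule['id']}")
  end
end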

@mchernyavskaya
Contributor Author

@artem-shelkovnikov yep - I also proposed something like this in the Ingestion Sync agenda; I want it at least to be up for discussion.

@seanstory
Member

I believe we can close this now, as it was implemented in a separate PR.
