PatternValidator does not correctly validate input with newlines #495

kmalski · 2022-01-12T19:33:19Z

Hi,

I have found that neither the Java regex based (PatternValidatorJava) nor the Joni based (PatternValidatorEcma262) pattern validation works correctly with newlines.

Neither implementation interprets the ^ and $ anchors correctly. I would expect that, when I use them at the start and end of a pattern (e.g. ^[a-z]{1,10}$), they would not let a trailing newline character through (e.g. abc\n should not match). The two implementations have separate problems, so I will describe them individually.

Joni
The problem is the default configuration for the ECMAScript syntax in the Joni library, which has multiline matching enabled by default. From the json-schema-validator code:

  private boolean matches(String value) {
      if (compiledRegex == null) {
          return true;
      }

      byte[] bytes = value.getBytes();
      return compiledRegex.matcher(bytes).search(0, bytes.length, Option.NONE) >= 0;
  }

As a quick fix, the last line can be changed to:

      return compiledRegex.matcher(bytes).search(0, bytes.length, -Option.MULTILINE) >= 0;

but I have also raised an issue in the Joni library, as I believe this default is not correct (ECMAScript has multiline matching disabled by default).

More interestingly, because multiline matching is enabled, the input \r\nab\nab\n currently matches the pattern ^[a-z]{1,10}$ and passes validation. The pattern is meant to allow a single word of lowercase letters only, yet the entire multi-line input passes.
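The effect of multiline anchors described above can be reproduced without Joni. The following is a minimal sketch that uses the JDK engine with Pattern.MULTILINE as an analogy only; it does not call Joni, and the class and method names are made up for illustration.

```java
import java.util.regex.Pattern;

// JDK-only analogy for multiline anchors; this does not use Joni.
class MultilineAnchorDemo {
    static final String REGEX = "^[a-z]{1,10}$";

    // Default flags: ^ matches only at the very start of the input.
    static boolean findSingleLine(String input) {
        return Pattern.compile(REGEX).matcher(input).find();
    }

    // Pattern.MULTILINE: ^ also matches after, and $ before, a line terminator.
    static boolean findMultiLine(String input) {
        return Pattern.compile(REGEX, Pattern.MULTILINE).matcher(input).find();
    }

    public static void main(String[] args) {
        String input = "\r\nab\nab\n";
        System.out.println(findSingleLine(input)); // false
        System.out.println(findMultiLine(input));  // true
    }
}
```

With multiline anchors, any single line that satisfies the pattern is enough for the whole input to pass, which is the behavior reported for the Joni-based validator.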

Java built-in regex find vs match
The current implementation of matching with the Java regex looks like this:

  private boolean matches(String value) {
      return compiledPattern == null || compiledPattern.matcher(value).find();
  }

The problem is how the find method works. From the documentation: "Attempts to find the next subsequence of the input sequence that matches the pattern." So this method matches a subsequence of the input and, on the two JDKs I tried, returns true for input abab\n and pattern ^[a-z]{1,10}$, even though I used the ^ and $ anchors. (By default, $ in java.util.regex also matches just before a final line terminator.) A trailing newline character will therefore always be allowed for such patterns.

A possible solution is to use the matches method, which attempts to match the entire region against the pattern, but this would implicitly add ^ and $ anchors to every pattern.
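The difference between find and matches can be checked directly against java.util.regex. This is a small sketch, not the validator's code; the class name is invented for illustration, and the \z variant is an extra option that was not proposed in this thread.

```java
import java.util.regex.Pattern;

// Illustrative comparison of Matcher.find() and Matcher.matches().
class FindVsMatchesDemo {
    // True if any subsequence of the input matches the regex.
    static boolean find(String regex, String input) {
        return Pattern.compile(regex).matcher(input).find();
    }

    // True only if the entire input matches the regex.
    static boolean matches(String regex, String input) {
        return Pattern.compile(regex).matcher(input).matches();
    }

    public static void main(String[] args) {
        String anchored = "^[a-z]{1,10}$";
        System.out.println(find(anchored, "abab\n"));    // true: $ matches before the final \n
        System.out.println(matches(anchored, "abab\n")); // false: the \n is never consumed
        System.out.println(matches(anchored, "abab"));   // true

        // Downside of matches(): unanchored patterns become implicitly anchored.
        String unanchored = "[a-z]{1,10}";
        System.out.println(find(unanchored, "123abc456"));    // true
        System.out.println(matches(unanchored, "123abc456")); // false

        // \z matches only at the very end of the input, so find() rejects the newline.
        System.out.println(find("^[a-z]{1,10}\\z", "abab\n")); // false
    }
}
```

The unanchored case shows why switching every pattern to matches changes behavior for schemas that rely on substring matching.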

stevehu · 2022-01-12T23:01:35Z

@kmalski Thanks a lot for raising this issue and providing detailed explanations and solutions. I would say that multiline support for pattern matching is very rare and that is why nobody found this issue before. What we can do is make this configurable and default to single-line. This will maintain backward compatibility and, at the same time, allow users to enable multiline pattern matching when necessary. What do you think?

Here is the class for the configuration and we just need to add a new entry with default as single-line.

https://github.com/networknt/json-schema-validator/blob/master/src/main/java/com/networknt/schema/SchemaValidatorsConfig.java

kmalski · 2022-01-13T15:50:17Z

@stevehu Looks good to me. I found one more case in the Joni library where subtracting Option.MULTILINE is not enough, so I think it's reasonable to wait for an answer (at least for some time).

stevehu · 2022-01-14T17:45:39Z

Thanks for the update. Let's keep this issue open until we have the proper solution.

anro87 · 2023-04-21T07:20:23Z

We are facing the same issue within our project. Any plans to adjust the SchemaValidatorsConfig.java as mentioned by @stevehu above?

stevehu · 2023-04-21T11:36:58Z

@anro87 I think @kmalski has documented the solution already. Would you like to submit a PR to get it implemented?

anro87 · 2023-04-24T08:01:06Z

@stevehu You mean we should implement it based on @kmalski's solution and open a PR for it? Guess we can do that. ;)

stevehu · 2023-04-24T18:04:19Z

Yes. Let us know if you need any help. Thanks.

mriehl · 2023-05-04T12:56:09Z

Hey @stevehu , I'm one of @anro87 's colleagues. I've just started writing a test for this feature but it quickly dawned on me that the problem was already fixed a few days ago in #737
The pattern validator is now using matches as outlined above and I can't reproduce the issue.

It's not configurable though, not sure what that means for a release since it would technically break existing behavior

Note: the Joni/ECMA262 side still exhibits this issue, it's the JDK-based matcher that has been fixed.

mriehl · 2023-05-09T10:56:01Z

Waiting on feedback on how I should proceed. These are the options we have:

1. Adjust the Joni validator to behave similarly, or wait for something to happen in "Multiline Option with ^ and $ anchors" (jruby/joni#57). This is my personal preference: there are probably lots of users out there who don't expect input with newlines to pass validation, and there are probably also security implications.
2. Introduce a config flag just for the Joni validator and leave master as-is (the non-Joni validator uses matches and not find).
3. Introduce a config flag for both validators (what was initially discussed here). Not sure how that will affect the unit tests added in "Enable unit-tests for ECMA 262 regular expressions" (#737), but it is still an option.

fdutton · 2023-05-22T11:53:52Z

@mriehl I would like to see your solution for both. #782 raises another issue with my solution when the regex is not anchored.

mriehl · 2023-05-22T13:02:34Z

Hm I didn't really think about the use case in #782 .
A configuration flag for implicit anchoring would solve this though it's not elegant.

What do you think about keeping it the way it is now (no implicit anchoring) and doing a very simple detection of whether the regex wants to be anchored? In that case we would use matches instead of find. This seems like it would do the right thing in 99.99% of cases.
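A rough sketch of what such a detection could look like with the JDK engine follows. The class and method names are hypothetical and not part of json-schema-validator, and the check is deliberately naive: it only looks at the first and last characters, so a pattern such as ^a|b$ would be treated as fully anchored even though it is not.

```java
import java.util.regex.Pattern;

// Hypothetical helper, not part of json-schema-validator.
class AnchorAwareMatcher {
    // True when the regex starts with ^ and ends with an unescaped $.
    static boolean looksAnchored(String regex) {
        if (!regex.startsWith("^") || !regex.endsWith("$")) {
            return false;
        }
        // Count the backslashes directly before the final $; an odd count escapes it.
        int backslashes = 0;
        for (int i = regex.length() - 2; i >= 0 && regex.charAt(i) == '\\'; i--) {
            backslashes++;
        }
        return backslashes % 2 == 0;
    }

    // Uses matches() for anchored-looking patterns and find() otherwise.
    static boolean validate(String regex, String value) {
        Pattern pattern = Pattern.compile(regex);
        return looksAnchored(regex)
                ? pattern.matcher(value).matches()
                : pattern.matcher(value).find();
    }
}
```

Under these assumptions, validate("^[a-z]{1,10}$", "abc\n") is rejected, while an unanchored pattern such as [a-z]+ still matches a substring of 123abc456.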

justin-tay · 2024-06-13T12:06:28Z

Re-opening as the fix for this was reverted due to causing other issues in "Fix JDK regex support" (#888).

I'm not familiar enough with the issue and it doesn't look like this is easy to fix so using GraalJS might be an option.

kmalski · 2024-06-14T20:30:13Z

Hi, the library has made great progress since I originally opened this issue. @justin-tay to make it easier for you, I have checked whether this issue still exists, and I can currently reproduce it for the Joni and JDK implementations (the GraalJS implementation seems to work fine).

@Test
void multilineString() {
    RegularExpression regex = new JoniRegularExpression("^[a-z]{1,10}$");
    assertTrue(regex.matches("abc\n"));
}

@Test
void multilineString() {
    RegularExpression regex = new JDKRegularExpression("^[a-z]{1,10}$");
    assertTrue(regex.matches("\nabc\n"));
}

Both test cases pass, but because we are using anchors here, I would expect them to fail.

fdutton mentioned this issue May 22, 2023

Resolves improper anchoring of patternProperties #783

Merged

fdutton pushed a commit that referenced this issue May 22, 2023

Resolves improper anchoring of regular expressions in both Joni and J…

0b6f318

…DK engines. Resolves #495 and #782

stevehu closed this as completed in f09740a May 23, 2023

justin-tay reopened this Jun 13, 2024
