The behavior of the crate when trying to use the ASCII character class syntax [[:foo:]] with invalid character classes is somewhat confusing. A friend was trying to use [[:XID_Start:]] to check whether _ (underscore/low line) was included in the XID_Start character class (it's not), and was confused when it returned true.
let expr = regex::Regex::new(r"[[:XID_Start:]]").unwrap();
dbg!(expr.is_match("_")); // true
The correct syntax, \p{XID_Start}, does work correctly:
let correct = regex::Regex::new(r"\p{XID_Start}").unwrap();
dbg!(correct.is_match("a")); // true
dbg!(correct.is_match("1")); // false
dbg!(correct.is_match("_")); // false
It seems that when the class is invalid for an ASCII character class (regex § ASCII character classes), it falls back to marking any character present within the brackets as true:
I'm not entirely sure what regex is actually interpreting this sequence as, but, assuming this is intentional behavior, I think that it might be something that is worth documenting in the aforementioned section on ASCII character classes in the docs, as the behavior is not immediately intuitive.
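One plausible reading, consistent with the `_` result above, is that an unrecognized `[[:name:]]` is treated as an ordinary nested bracket class whose members are the literal characters between the inner brackets, colons included. The sketch below models that reading using only the standard library. It is an assumption about the parser, not the crate's actual implementation, and `fallback_members` and `model_is_match` are hypothetical helpers that do not exist in `regex`. It also ignores ranges, escapes, and negation, which a real bracket class would honor.

```rust
use std::collections::BTreeSet;

/// Hypothetical helper (not part of the `regex` crate): models the assumed
/// fallback for an unrecognized `[[:name:]]`, where the inner `[:name:]` is
/// read as a plain nested bracket class made of the literal characters
/// between the brackets, colons included.
fn fallback_members(pattern: &str) -> Option<BTreeSet<char>> {
    // Strip the outer and inner brackets, leaving e.g. ":XID_Start:".
    let inner = pattern.strip_prefix("[[")?.strip_suffix("]]")?;
    Some(inner.chars().collect())
}

/// Hypothetical helper: does a single character match under this model?
fn model_is_match(pattern: &str, input: char) -> bool {
    fallback_members(pattern).map_or(false, |set| set.contains(&input))
}

fn main() {
    let pattern = "[[:XID_Start:]]";

    // Print the set of characters the model says the class contains.
    let members: String = fallback_members(pattern).unwrap().into_iter().collect();
    println!("members under the model: {members}");

    // Probe a few characters, some inside the name and some not.
    for c in ['_', 'a', 'X', 'b', '1'] {
        println!("{c:?} -> {}", model_is_match(pattern, c));
    }
}
```

Under this model, `_`, `a`, and `X` match because they appear in the text `:XID_Start:`, while `b` and `1` do not. That agrees with the surprising `true` for `_`, but the other results should be confirmed against the crate itself before relying on them.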
Yes, the behavior is unfortunate but intentional, for compatibility with how other regex engines work. In retrospect, I would rather have been a bit more strict here and produced errors for unrecognized classes.
I agree that adding a note to the docs about this would be a good idea.
Crate version: 1.11.0
Example code: https://play.rust-lang.org/?version=stable&mode=debug&edition=2021&gist=c4b4cfe18c2e6413444e53315de33b27 (used for snippets below and extra checks)