
Character classes that reference other character classes #34

Open
SteelPhase opened this issue Nov 26, 2019 · 4 comments


@SteelPhase

Would it be possible to have the regex parser support character classes like \w within other character classes? I had a regex pattern that used the character class [0-9a-zA-Z_\.-], and I tried to simplify it to [\w\.\-]. I hadn't realized that this library doesn't support that, and I was wondering how difficult it would be to implement. For the time being, I'm just expanding \w to 0-9a-zA-Z_ within the character class.
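A minimal Go sketch of that manual workaround, written as a preprocessing step. `expandClassShorthands` is a hypothetical helper (not part of lexmachine); it assumes ASCII patterns, rewrites only the positive shorthands \w and \d when they appear inside `[...]`, and leaves negated forms such as \W untouched because they cannot be inlined into a positive class.

```go
package main

import (
	"fmt"
	"strings"
)

// classExpansions maps re2's ASCII shorthand classes to explicit ranges.
var classExpansions = map[byte]string{
	'w': `0-9a-zA-Z_`,
	'd': `0-9`,
}

// expandClassShorthands rewrites \w and \d that appear inside a bracket
// expression into explicit ranges. Shorthands outside brackets, negated
// forms (\W, \D) and other escapes are copied unchanged. A literal ']'
// placed first in a class (as in "[]a]") is not handled by this sketch.
func expandClassShorthands(pattern string) string {
	var b strings.Builder
	inClass := false
	for i := 0; i < len(pattern); i++ {
		c := pattern[i]
		switch {
		case c == '\\' && i+1 < len(pattern):
			next := pattern[i+1]
			if exp, ok := classExpansions[next]; ok && inClass {
				b.WriteString(exp)
			} else {
				b.WriteByte(c)
				b.WriteByte(next)
			}
			i++ // skip the escaped character
		case c == '[' && !inClass:
			inClass = true
			b.WriteByte(c)
		case c == ']' && inClass:
			inClass = false
			b.WriteByte(c)
		default:
			b.WriteByte(c)
		}
	}
	return b.String()
}

func main() {
	fmt.Println(expandClassShorthands(`[\w\.\-]+`)) // [0-9a-zA-Z_\.\-]+
}
```

A \w outside brackets is left alone, since the parser already accepts it there.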

@timtadh
Owner

timtadh commented Nov 27, 2019

It looks like re2 supports this (I mostly follow re2 when adding new features, for compatibility with Go's regexp package): https://github.com/google/re2/wiki/Syntax

It has been a while since I worked on the regexp parser. However, adding this support looks doable by extending the charClassItem function to support the built-in classes. The signature would need to change to support returning a list of ranges instead of just one.

Do you have other feature requests for the regexp language? I have mostly followed the principle of implementing the portions people ask for.

@SteelPhase
Author

This is purely a nice-to-have, as it's easy enough to work around myself. The only other one I've run into is the need to strip non-capturing group syntax from existing regex expressions. That's also simple to work around by stripping the ?: at the start of a group.
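A minimal Go sketch of that workaround. `stripNonCapturing` is a hypothetical helper; it assumes the target parser treats plain parentheses as grouping only, skips escaped characters and bracket expressions so a literal `\(?:` or `[(?:]` is preserved, and does not handle flag groups such as `(?i:...)`.

```go
package main

import (
	"fmt"
	"strings"
)

// stripNonCapturing rewrites each "(?:" group opener as a plain "(".
func stripNonCapturing(pattern string) string {
	var b strings.Builder
	inClass := false
	for i := 0; i < len(pattern); i++ {
		c := pattern[i]
		switch {
		case c == '\\' && i+1 < len(pattern):
			// Copy escapes verbatim so "\(" is never treated as a group.
			b.WriteByte(c)
			b.WriteByte(pattern[i+1])
			i++
		case inClass:
			// Inside [...] parentheses are literals; just look for the end.
			if c == ']' {
				inClass = false
			}
			b.WriteByte(c)
		case c == '[':
			inClass = true
			b.WriteByte(c)
		case c == '(' && strings.HasPrefix(pattern[i:], "(?:"):
			b.WriteByte('(')
			i += 2 // skip "?:"
		default:
			b.WriteByte(c)
		}
	}
	return b.String()
}

func main() {
	fmt.Println(stripNonCapturing(`(?:foo|bar)+\(?:x\)`)) // (foo|bar)+\(?:x\)
}
```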

@timtadh
Owner

timtadh commented Nov 27, 2019

OK. I'm less likely to handle ignoring ?:, since capture groups are not something lexmachine is likely to support (that sort of logic is probably better implemented another way, perhaps by having multiple tokens).

Adding support for using the built-in character classes inside a [] character class seems like a good idea.

@SteelPhase
Author

Thanks for taking a look into this.
