Add Regex DSL #1302

Open
rlouf opened this issue Nov 30, 2024 · 0 comments

rlouf commented Nov 30, 2024

Why?

Regular expressions are a very compact DSL for describing DFAs, and their syntax can be intimidating at first. We cannot expect every user to be fluent in it.

How?

We can make this easier by designing a simple Python DSL that compiles into regular expressions. It can be as simple as defining functions that return a regex string:

def one_or_more(pattern: str):
    return f"({pattern})+"
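
As a rough sketch of how a few such helpers might compose, the snippet below adds `optional` and `either` next to `one_or_more` and checks the result with Python's `re` module. The names `optional` and `either` are hypothetical, not an existing API, and every argument is assumed to already be a valid regex fragment.

```python
import re


def one_or_more(pattern: str) -> str:
    # Group the pattern so "+" applies to all of it.
    return f"({pattern})+"


def optional(pattern: str) -> str:
    # Hypothetical helper: zero or one occurrence of the pattern.
    return f"({pattern})?"


def either(*patterns: str) -> str:
    # Hypothetical helper: alternation between regex fragments.
    return "(" + "|".join(patterns) + ")"


# An optional sign followed by one or more digits.
integer = optional(either(r"\+", "-")) + one_or_more("[0-9]")

assert re.fullmatch(integer, "-42") is not None
assert re.fullmatch(integer, "7") is not None
assert re.fullmatch(integer, "4-2") is None
```

Returning plain strings keeps the helpers trivially composable, but it cannot distinguish a regex fragment from literal text.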

In practice this is slightly more complex: some characters, such as ( and ), must be escaped when they appear in literal expressions, and we cannot expect users who are unfamiliar with regular expressions to know that. One solution would be to create a RegexStr object that abstracts this away, for instance:

import re


def escape(string):
    # Escape every regex metacharacter, not only a lone "(" or ")",
    # so that the string is matched literally.
    return re.escape(string)


class RegexStr(str):
    def __add__(self, other):
        if isinstance(other, RegexStr):
            return RegexStr(f"{self}{other}")
        else:
            return RegexStr(f"{self}{escape(other)}")

    def __radd__(self, other):
        if isinstance(other, RegexStr):
            return RegexStr(f"{other}{self}")
        else:
            return RegexStr(f"{escape(other)}{self}")


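def exactly(quantity, *regex_strs):
    # Plain strings are treated as literals and escaped; RegexStr values
    # are already valid regex and are used as-is.
    parts = [s if isinstance(s, RegexStr) else escape(s) for s in regex_strs]
    regex_str = ''.join(parts)
    if regex_str == '':
        raise ValueError('regex_strs argument must have at least one nonblank value')
    # Group the pattern so the quantifier applies to all of it,
    # not just to its last character.
    return RegexStr('(' + regex_str + '){' + str(quantity) + '}')

# The surrounding parentheses are plain strings, so they are escaped.
print("(" + exactly(3, "abc") + ")")  # \((abc){3}\)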

This is however a very rough design, and I am open to alternatives.
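
One detail that any quantifier helper such as `exactly` has to handle: a quantifier binds only to the atom immediately before it, so a multi-character pattern needs to be grouped first. The sketch below checks this with Python's `re` module. It assumes the target regex engine follows the same precedence rules, and `repeat_grouped` is a hypothetical name used only for illustration.

```python
import re


def repeat_grouped(pattern: str, quantity: int) -> str:
    # Hypothetical helper: wrap the pattern in a group so that the
    # {n} quantifier applies to the whole pattern.
    return f"({pattern}){{{quantity}}}"


# Without a group, {3} applies only to the final "c".
ungrouped = "abc" + "{3}"
assert re.fullmatch(ungrouped, "abccc") is not None
assert re.fullmatch(ungrouped, "abcabcabc") is None

# With a group, the whole "abc" sequence is repeated three times.
grouped = repeat_grouped("abc", 3)
assert re.fullmatch(grouped, "abcabcabc") is not None
assert re.fullmatch(grouped, "abccc") is None
```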

rlouf added this to the 0.1 milestone Nov 30, 2024