Add named capturing groups to gleam/regex.Match #663

Open
yoonthegoon opened this issue Jul 21, 2024 · 6 comments

@yoonthegoon

As it is now, you can write named capturing groups in a pattern, but the group names don't appear in the list of matches returned from regex.scan.

import gleam/io
import gleam/regex

pub fn main() {
  let assert Ok(regex) = regex.from_string("(?<delim>,)")
  let content = "Hello, world!"

  regex.scan(regex, content)
  |> io.debug
}

This prints:

[Match(content: ",", submatches: [Some(",")])]

I propose Match have the following signature:

pub type Match {
  Match(
    /// The full string of the match.
    content: String,
    /// A `Regex` can have subpatterns, sub-parts that are in parentheses.
    submatches: List(#(String, Option(String))),
  )
}

So now running something like this

import gleam/io
import gleam/regex

pub fn main() {
  let assert Ok(regex) = regex.from_string("(?<type_of>new|old)\\s+(\\w+)")
  let content = "new match_type"

  regex.scan(regex, content)
  |> io.debug
}

would give you this

[Match(content: "new match_type", submatches: [#("type_of", Some("new")), #("2", Some("match_type"))])]

If the concern is that this may break existing code that uses regex, the names could instead be enabled through a new option (which would still break existing compiled regexes), or a new groups function could be added that returns just a list of tuples of group name and submatch.
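
To make the `groups` alternative concrete, here is a rough sketch in JavaScript (one of Gleam's compile targets). The function name `groups` and its return shape are assumptions for illustration only, not an existing gleam/regex API; `null` stands in for Gleam's `None`.

```javascript
// Hypothetical `groups` helper: for every match, list [name, value] pairs
// for the named groups only. Unnamed groups are ignored here.
function groups(regex, text) {
  // String.prototype.matchAll requires the global flag, so add it if missing.
  const flags = regex.flags.includes("g") ? regex.flags : regex.flags + "g";
  const globalRegex = new RegExp(regex.source, flags);
  return Array.from(text.matchAll(globalRegex), (match) =>
    // `match.groups` is undefined when the pattern has no named groups.
    Object.entries(match.groups ?? {}).map(([name, value]) => [
      name,
      value ?? null, // a group that did not participate becomes null
    ])
  );
}

console.log(groups(/(?<type_of>new|old)\s+(\w+)/, "new match_type"));
// [ [ [ 'type_of', 'new' ] ] ]
```

Because this would be a separate function, the existing `Match` type and code that pattern matches on `submatches` would stay untouched.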

@lpil
Member

lpil commented Jul 22, 2024

How would we handle groups with no names?

@lpil transferred this issue from gleam-lang/gleam on Jul 22, 2024
@yoonthegoon
Author

the group name can be an Option(String) then: if the group is named it's Some(name), otherwise None. iirc erlang uses PCRE for its regex implementation, so named groups are technically supported, though i'm unsure whether the erlang match gives you group names the way javascript does.

@yoonthegoon
Author

here's how javascript handles it

>> /(?<type_of>new|old)\s+(\w+)/.exec("new match_type")
Array(3) [ "new match_type", "new", "match_type" ]
  0: "new match_type"
  1: "new"
  2: "match_type"
  groups: Object { type_of: "new" }
    type_of: "new"
  index: 0
  input: "new match_type"
  length: 3
  <prototype>: Array []

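The numbered captures and the `groups` object above do not say which index carries which name, so producing a list of (optional name, optional value) pairs on the JavaScript target would need an extra step. A minimal sketch follows, assuming the names are recovered by scanning the pattern source. `groupNames` and `submatchesWithNames` are hypothetical helpers, the scan assumes standard syntax without the `v` flag, and `null` stands in for `None`.

```javascript
// Hypothetical helper: for each capturing group in pattern order,
// return its name, or null if the group is unnamed.
function groupNames(source) {
  const names = [];
  let inClass = false;
  for (let i = 0; i < source.length; i++) {
    const ch = source[i];
    if (ch === "\\") { i++; continue; } // skip the escaped character
    if (inClass) { if (ch === "]") inClass = false; continue; }
    if (ch === "[") { inClass = true; continue; }
    if (ch !== "(") continue;
    if (source[i + 1] !== "?") { names.push(null); continue; } // plain (...)
    // (?<name>...) captures; (?:...), (?=...), (?!...), (?<=...), (?<!...) do not.
    const named = /^\(\?<([^=!>][^>]*)>/.exec(source.slice(i));
    if (named) names.push(named[1]);
  }
  return names;
}

// Pair every submatch of the first match with its optional name.
function submatchesWithNames(regex, text) {
  const match = regex.exec(text);
  if (match === null) return null;
  return groupNames(regex.source).map((name, i) => [name, match[i + 1] ?? null]);
}

console.log(submatchesWithNames(/(?<type_of>new|old)\s+(\w+)/, "new match_type"));
// [ [ 'type_of', 'new' ], [ null, 'match_type' ] ]
```

Scanning the pattern is only one option; it is used here because the `exec` result alone does not map names to indices.
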
@yoonthegoon
Author

due to how Erlang handles matches though, it may be necessary to instead include desired group names in Options.

1> re:run("new match_type", "(?<type_of>new|old)\\s+(\\w+)").
{match,[{0,14},{0,3},{4,10}]}
2> re:run("new match_type", "(?<type_of>new|old)\\s+(\\w+)", [{capture, ["type_of"], list}]).
{match,["new"]}

@yoonthegoon
Author

I see now gleam/regex is meant to be a port of Elm's Regex package and they unfortunately don't have a way to extract capturing group names either. Straying from this would have to be a decision. 🤷

@lpil
Member

lpil commented Jul 25, 2024

I see now gleam/regex is meant to be a port of Elm's Regex package

It's not a port; the similarity is incidental.
