Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

feat(stdlib): allow parsing multiple durations at once in parse_duration function #1197

Merged
merged 13 commits into from
Jan 3, 2025

Conversation

titaneric
Copy link
Contributor

@titaneric titaneric commented Dec 27, 2024

Summary

Hi, I hope that parse_duration could parse multiple duration at once, so I have opened #1198 as a referenced issue.

Change Type

  • Bug fix
  • New feature
  • Non-functional (chore, refactoring, docs)
  • Performance

Is this a breaking change?

  • Yes
  • No

How did you test this PR?

cargo test
./scripts/check.sh

Does this PR include user facing changes?

  • Yes. Please add a changelog fragment based on
    our guidelines.
  • No. A maintainer will apply the "no-changelog" label to this PR.

Checklist

  • Our CONTRIBUTING.md is a good starting place.
  • If this PR introduces changes to LICENSE-3rdparty.csv, please
    run dd-rust-license-tool write and commit the changes. More details here.
  • For new VRL functions, please also create a sibling PR in Vector to document the new function.

References

Closes #1196

@titaneric titaneric changed the title feat(stdlib): allow multi-durations in parse_duration function feat(stdlib): allow parsing multiple durations at once in parse_duration function Dec 28, 2024
@pront pront self-assigned this Jan 2, 2025
Copy link
Member

@pront pront left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks @titaneric! I left a few comments. The most interesting one is around the RE change.

@@ -0,0 +1,3 @@
Added `parse_bytes` function to parse bytes either from SI or binary unit depending on the given `base` argument.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please git rm this file.

@@ -0,0 +1 @@
Support multiple-units duration string for `parse_duration` function such as `1h2s`.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
Support multiple-units duration string for `parse_duration` function such as `1h2s`.
Added support for multi-unit duration strings (e.g., `1h2s`, `2m3s`) in the `parse_duration` function.

let captures = RE
.captures(value)
.ok_or(format!("unable to parse duration: '{value}'"))?;
let capture_match = captures.get(0).unwrap();
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please do not use unwrap(), we can return an error here.

.to_f64()
.ok_or(format!("unable to format duration: '{number}'"))?;
Ok(Value::from_f64_or_zero(number))
let mut num = 0.0;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nit:

Suggested change
let mut num = 0.0;
let mut sum = 0.0;

}

static RE: Lazy<Regex> = Lazy::new(|| {
Regex::new(
r"(?ix) # i: case-insensitive, x: ignore whitespace + comments
\A
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I noticed you remove the anchors here, I suppose this is needed for iterative parsing. What are the implications of this? Would this function now accept strings like 1h2s extrablah?

I wonder if we can leverage https://docs.rs/humantime/latest/humantime/ here without breaking existing behavior.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, I remove it since I want to iterate the regex matching.
The duration 1h2s extrablah would not be accepted, and its error is Err("unable to parse duration: ' extrablah'")

humantime seems promising for me! I would try it in the next commit.

Copy link
Member

@pront pront left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you @titaneric, this function is now more powerful than before and easier to maintain.

.to_f64()
.ok_or(format!("unable to format duration: '{number}'"))?;
let duration = ht_parse_duration(&trimmed_value)
.map_err(|_| format!("unable to parse duration: '{value}'"))?;
Copy link
Member

@pront pront Jan 3, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
.map_err(|_| format!("unable to parse duration: '{value}'"))?;
.map_err(|e| format!("unable to parse duration: '{e}'"))?;

This will require updates to the test cases.

@pront
Copy link
Member

pront commented Jan 3, 2025

Note: there are multiple failing CI checks, you reproduce them locally with ./scripts/checks.sh.

@titaneric
Copy link
Contributor Author

titaneric commented Jan 3, 2025

A little awkward... We need to update MSRV to 1.80 to support Duration::div_duration_f64 method which is a BREAKING change.

There is another Duration::div_f64 method which MSRV is 1.38, but I need to change the implementation a little bit and it's hard to maintain.

@pront
Copy link
Member

pront commented Jan 3, 2025

A little awkward... We need to update MSRV to 1.80 to support Duration::div_duration_f64 method which is a BREAKING change.

There is another Duration::div_f64 method which MSRV is 1.38, but I need to change the implementation a little bit and it's hard to maintain.

No worries, we can bump the MSRV. Do you mind doing the bump in a separate PR please?

@titaneric
Copy link
Contributor Author

ok, I could raise another PR to bump the MSRV

@titaneric titaneric mentioned this pull request Jan 3, 2025
11 tasks
@titaneric titaneric force-pushed the feat/multi-durations branch from ea72949 to 35dc8e2 Compare January 3, 2025 18:29
@titaneric
Copy link
Contributor Author

I've rebased the main branch and update the LICENSE, please check!

@pront pront added this pull request to the merge queue Jan 3, 2025
@pront
Copy link
Member

pront commented Jan 3, 2025

Thank you @titaneric, this looks great.

Merged via the queue into vectordotdev:main with commit 3c82c74 Jan 3, 2025
14 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Allow parse_duration function to parse multiple duration similar to Golang's time.ParseDuration
2 participants