Presenting times in non-UTC timezones #5221

Open
philrz opened this issue Aug 14, 2024 · 0 comments

tl;dr

Issues like brimdata/zui#1057 and brimdata/brimcap#352 reflect users' desire to sometimes work in non-UTC timezones, such as their local time. The strftime function added in #5197 provides the %z and %Z formatting directives that could allow output of such timezones, but right now they only reflect the UTC timezone.

To improve on this, we've discussed approaches for a user to provide a hint for an alternate timezone such that the rendered time value and the output of the %z / %Z directives would reflect the delta. In preliminary discussions on this topic, we've reached some consensus on a proposal to add a third, optional parameter to our strftime that would provide that offset/timezone hint.

Details

Comparison With Other Tools

Users are likely to compare our timezone support to that of other tools and/or copy string-based time values between Zed tools and other tools, so it may be helpful to know how others approach this problem. Since they both support the same formatting options as our strftime, I happened to start with jq and GNU Date.

Since all these tools offer their own default/alternate formats, as a neutral starting point, I'll start from a UTC seconds-since-epoch value 1723662771 that roughly matches the time when this issue was written. I'm ignoring the fractional seconds for now because Zed has orthogonal catch-up to do there (#5220).

jq

jq has its strftime function that supports formatting options like Zed's function of the same name. Per their docs, it is only intended to format a time in UTC (well, they call it GMT).

$ echo '1723662771' | jq 'strftime("%Y-%m-%dT%H:%M:%S")' -
"2024-08-14T19:12:51"

For output in non-UTC timezones, jq offers a separate strflocaltime function. By default it outputs time in the local timezone of the system on which jq is running, though it overrides that behavior if the TZ environment variable is set to something else. My MacBook happens to be in Pacific Daylight Time at the moment, so:

$ echo '1723662771' | jq 'strflocaltime("%Y-%m-%dT%H:%M:%S")' -
"2024-08-14T12:12:51"

$ echo '1723662771' | TZ=US/Eastern jq 'strflocaltime("%Y-%m-%dT%H:%M:%S")' -
"2024-08-14T15:12:51"

Adding the %z or %Z directives can show the offset or timezone abbreviation, respectively.

$ echo '1723662771' | jq 'strflocaltime("%Y-%m-%dT%H:%M:%S%z")' -
"2024-08-14T12:12:51-0800"

$ echo '1723662771' | jq 'strflocaltime("%Y-%m-%dT%H:%M:%S %Z")' -
"2024-08-14T12:12:51 PST"

Though a close inspection of that output reveals a bug, which I see has already been filed as jqlang/jq#1912: They should have said -0700 and PDT, not -0800 and PST! Let's not have that bug in Zed. 😉
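For reference, here is a minimal Python sketch (not Zed or jq code; the helper name render_in_zone is made up for illustration) of what DST-aware output for this epoch value looks like when the zone rules come from the IANA database. It assumes tzdata is available on the system and uses the name America/Los_Angeles for Pacific Time.

```python
from datetime import datetime
from zoneinfo import ZoneInfo


def render_in_zone(epoch_seconds, fmt, zone_name):
    """Format a UTC epoch value as wall-clock time in the named IANA zone."""
    zone = ZoneInfo(zone_name)  # carries the DST rules for that zone
    return datetime.fromtimestamp(epoch_seconds, tz=zone).strftime(fmt)


# Mid-August falls inside daylight saving time for Pacific Time.
print(render_in_zone(1723662771, "%Y-%m-%dT%H:%M:%S%z", "America/Los_Angeles"))
print(render_in_zone(1723662771, "%Y-%m-%dT%H:%M:%S %Z", "America/Los_Angeles"))
```

The same helper applied to a January timestamp yields -0800 and PST, so the offset and abbreviation follow the instant being formatted rather than a fixed per-zone value.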

Going back to the regular strftime and its mission to be for UTC only, indeed it doesn't change the time value itself based on local system time or TZ variable (i.e., it's still in UTC), but it does still reflect a non-UTC timezone in the %z or %Z directives if invoked, which seems weird and like something else we'd not want to mimic. 😛

$ echo '1723662771' | jq 'strftime("%Y-%m-%dT%H:%M:%S%z")' -
"2024-08-14T19:12:51-0800"

$ echo '1723662771' | jq 'strftime("%Y-%m-%dT%H:%M:%S %Z")' -
"2024-08-14T19:12:51 PST"

GNU Date

GNU Date effectively behaves like jq's strflocaltime by default (i.e., it reflects the local system timezone, or what's in TZ if present... though it does the right thing with daylight saving time!) with the formatting directives invoked via +.

$ gdate --date=@1723662771 +"%Y-%m-%dT%H:%M:%S%z"
2024-08-14T12:12:51-0700

$ TZ=US/Eastern gdate --date=@1723662771 +"%Y-%m-%dT%H:%M:%S%z"
2024-08-14T15:12:51-0400

$ gdate --date=@1723662771 +"%Y-%m-%dT%H:%M:%S %Z"
2024-08-14T12:12:51 PDT

$ TZ=US/Eastern gdate --date=@1723662771 +"%Y-%m-%dT%H:%M:%S %Z"
2024-08-14T15:12:51 EDT

If -u is added, GNU Date behaves more like jq's strftime and renders in UTC, though unlike the weird jq behavior cited above, it appears strict in always reflecting UTC, even in the timezone offset/abbreviation produced by %z or %Z.

$ gdate -u --date=@1723662771 +"%Y-%m-%dT%H:%M:%S%z"
2024-08-14T19:12:51+0000

$ TZ=US/Eastern gdate -u --date=@1723662771 +"%Y-%m-%dT%H:%M:%S%z"
2024-08-14T19:12:51+0000

$ gdate -u --date=@1723662771 +"%Y-%m-%dT%H:%M:%S %Z"
2024-08-14T19:12:51 UTC

$ TZ=US/Eastern gdate -u --date=@1723662771 +"%Y-%m-%dT%H:%M:%S %Z"
2024-08-14T19:12:51 UTC
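The strict UTC behavior can be sketched in a few lines of Python. This is only an illustration of the semantics (the helper name format_utc is hypothetical): because the zone is passed explicitly, neither the system's local time nor the TZ variable has any influence on the time value or on the %z / %Z output.

```python
from datetime import datetime, timezone


def format_utc(epoch_seconds, fmt):
    """Format an epoch value strictly in UTC, ignoring TZ and local time."""
    return datetime.fromtimestamp(epoch_seconds, tz=timezone.utc).strftime(fmt)


print(format_utc(1723662771, "%Y-%m-%dT%H:%M:%S%z"))
print(format_utc(1723662771, "%Y-%m-%dT%H:%M:%S %Z"))
```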

Zed

Repro is with Zed commit 71e35c5.

$ zq -version
Version: v1.17.0-20-g71e35c5d

While Zed's strftime supports %z and %Z for completeness, at the moment they only reflect UTC timezone. No attempt is made to reflect the local system time or check the TZ environment variable for a hint, so this all seems to match with GNU Date's -u behavior.

$ echo '1723662771' | zq -z 'yield time(this * 1000000000) | strftime("%Y-%m-%dT%H:%M:%S%z", this)' -
"2024-08-14T19:12:51+0000"

$ echo '1723662771' | TZ=US/Eastern zq -z 'yield time(this * 1000000000) | strftime("%Y-%m-%dT%H:%M:%S%z", this)' -
"2024-08-14T19:12:51+0000"

$ echo '1723662771' | zq -z 'yield time(this * 1000000000) | strftime("%Y-%m-%dT%H:%M:%S %Z", this)' -
"2024-08-14T19:12:51 UTC"

$ echo '1723662771' | TZ=US/Eastern zq -z 'yield time(this * 1000000000) | strftime("%Y-%m-%dT%H:%M:%S %Z", this)' -
"2024-08-14T19:12:51 UTC"

However, if provided a string representation of a time value that's got a non-UTC offset or timezone abbreviation, Zed's cast to its time type converts it to the appropriate UTC value. So, starting from the Pacific Time outputs from GNU Date shown previously, we get the same UTC outputs for both of these.

$ echo '"2024-08-14T12:12:51-0700"' | zq -z 'yield time(this)' -
2024-08-14T19:12:51Z

$ echo '"2024-08-14T12:12:51 PDT"' | zq -z 'yield time(this)' -
2024-08-14T19:12:51Z

The same is true if there's a colon in the timezone offset.

$ echo '"2024-08-14T12:12:51-07:00"' | zq -z 'yield time(this)' -
2024-08-14T19:12:51Z

I point this out because I found that even as a non-string time literal, the language is already prepared to interpret timezone offsets correctly, but only if they include a colon.

$ echo '2024-08-14T12:12:51-07:00' | zq -z 'yield this' -
2024-08-14T19:12:51Z

$ echo '2024-08-14T12:12:51-0700' | zq -z 'yield this' -
stdio:stdin: format detection error
	arrows: schema message length exceeds 1 MiB
	csv: line 1: delimiter ',' not found
	json: strconv.ParseFloat: parsing "2024-08-14": invalid syntax
	line: auto-detection not supported
	parquet: auto-detection requires seekable input
	tsv: line 1: delimiter '\t' not found
	vng: auto-detection requires seekable input
	zeek: line 1: bad types/fields definition in zeek header
	zjson: line 1: malformed ZJSON: bad type object: "2024-08-14T12:12:51-0700": unpacker error parsing JSON: invalid character '-' after top-level value
	zng: unknown ZNG message frame type: 3
	zson: ZSON syntax error

I peeked at the code and I see how this all fits together. The Zed casting code for time (i.e., what would be used to convert string-based timestamps) ultimately depends on https://github.com/araddon/dateparse which is very flexible in which formats it accepts, hence offsets with and without colons are both fine. Meanwhile the ZSON parser (i.e., what would parse ZSON time literals) depends on the RFC3339Nano mode of Go's time.Parse, and RFC 3339 only supports offsets with colons.
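The difference between a lenient cast and a strict RFC 3339 literal parse can be illustrated with a small Python analogy. This is not how dateparse or Go's time.Parse are implemented; the regular expression and both helper names are assumptions made for the sketch. Python's strptime %z happens to accept offsets with or without a colon, which makes it a convenient stand-in for the lenient path.

```python
import re
from datetime import datetime, timezone

# RFC 3339 requires the numeric offset to be written as +HH:MM or -HH:MM (or Z).
RFC3339 = re.compile(
    r"^\d{4}-\d{2}-\d{2}T\d{2}:\d{2}:\d{2}(\.\d+)?(Z|[+-]\d{2}:\d{2})$"
)


def is_rfc3339(text):
    """Strict check, analogous to what a time-literal parser would accept."""
    return RFC3339.match(text) is not None


def lenient_to_utc(text):
    """Lenient parse, analogous to a cast: colon in the offset is optional."""
    parsed = datetime.strptime(text, "%Y-%m-%dT%H:%M:%S%z")
    return parsed.astimezone(timezone.utc).strftime("%Y-%m-%dT%H:%M:%SZ")


for candidate in ("2024-08-14T12:12:51-07:00", "2024-08-14T12:12:51-0700"):
    print(candidate, is_rfc3339(candidate), lenient_to_utc(candidate))
```

Both inputs normalize to the same UTC instant on the lenient path, while only the colon form passes the strict check.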

The ZSON spec describes time as "an RFC 3339 UTC date/time string". Since the parser is ready to accept them with an offset as long as there's a colon, perhaps we could add a flag at some point to enable printing the ZSON time values with a specified offset. If we did that, it would probably also make sense to add a formatting directive to Zed's strftime function to include the colons, since https://github.com/lestrrat-go/strftime doesn't currently offer one. (#5253) FWIW, other tools like GNU Date or the JavaScript library https://github.com/samsonjs/strftime provide offset-with-colon via the directive %:z.
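As a rough illustration of what an offset-with-colon directive would produce, here is a Python sketch that derives the colon form from the plain %z output. The helper name offset_with_colon is hypothetical and this says nothing about how such a directive would be implemented in Zed or in lestrrat-go/strftime.

```python
from datetime import datetime, timedelta, timezone


def offset_with_colon(moment):
    """Return the UTC offset of an aware datetime as +HH:MM / -HH:MM."""
    raw = moment.strftime("%z")  # e.g. "-0700"; empty for naive datetimes
    if not raw:
        raise ValueError("datetime has no timezone information")
    return raw[:3] + ":" + raw[3:5]


pacific_daylight = timezone(timedelta(hours=-7))
moment = datetime.fromtimestamp(1723662771, tz=pacific_daylight)
print(moment.strftime("%Y-%m-%dT%H:%M:%S") + offset_with_colon(moment))
```

The result is in the colon form that the ZSON time-literal examples above parse successfully.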

Zed Proposals

In a preliminary discussion on this topic, @mattnibs made the following proposal:

I could imagine strftime taking a third duration argument that specifies the offset of timezone and then you could use the %z directive to display the timezone. We could have a timezone function that would return the offset.

Eg strftime("%z", now(), tz("PST"))

@nwt had a preference for a string argument instead of a duration+function, such that the user could directly input an offset or supported timezone name/abbreviation.

Implementation details aside, these ideas do seem like they'd provide the base functionality that's currently missing.
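To make the string-argument variant concrete, here is a hedged Python sketch of the semantics only. The names strftime_tz and resolve_tz are hypothetical, and the accepted hint forms (a numeric offset with or without a colon, or an IANA zone name) are assumptions rather than anything that's been decided. Bare abbreviations such as PST are deliberately not handled here since they can be ambiguous.

```python
import re
from datetime import datetime, timedelta, timezone
from zoneinfo import ZoneInfo

_OFFSET = re.compile(r"^([+-])(\d{2}):?(\d{2})$")


def resolve_tz(hint):
    """Map an optional hint string to a tzinfo; no hint means UTC."""
    if hint is None:
        return timezone.utc
    match = _OFFSET.match(hint)
    if match:
        sign = -1 if match.group(1) == "-" else 1
        delta = timedelta(hours=int(match.group(2)), minutes=int(match.group(3)))
        return timezone(sign * delta)
    return ZoneInfo(hint)  # raises if the name is not a known IANA zone


def strftime_tz(fmt, epoch_seconds, hint=None):
    """Three-argument strftime: the third argument is optional."""
    return datetime.fromtimestamp(epoch_seconds, tz=resolve_tz(hint)).strftime(fmt)


fmt = "%Y-%m-%dT%H:%M:%S%z"
print(strftime_tz(fmt, 1723662771))                      # today's UTC behavior
print(strftime_tz(fmt, 1723662771, "-0700"))             # fixed offset
print(strftime_tz(fmt, 1723662771, "America/New_York"))  # named zone, DST-aware
```

One design point this surfaces: a fixed offset carries no abbreviation, so what %Z should print for a hint like "-0700" would need to be decided separately.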

That said, this requires the user to lock their timezone within the Zed program, which seems less convenient than how jq and GNU Date provide ways to automatically reflect the system's local time or an alternate timezone specified in TZ. Environments may want this if they have users spread across multiple timezones who run the same Zed programs and each want to see times presented in their own local time.

If we started from one of the proposals above, perhaps a way of calling the proposed tz function or a particular value for the proposed string argument could invoke the behavior seen with jq and GNU Date where it obeys the local system time and overrides that if TZ is set.

A possible alternative to the "third strftime parameter" approach might be to offer some kind of CLI option that allows the specification of an alternate timezone/offset (e.g., zq -tz="US/Eastern") and have that setting affect strftime and any other time-centric functionality we may add in the future (e.g., if we wanted to start allowing for printing time literals with offsets rather than just strings via strftime). One side effect of this approach is that it could provide a way for users to get the benefit of their TZ environment variable without the Zed tooling having to explicitly know about it, e.g., if they invoke with zq -tz=$TZ.
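One possible precedence for these sources can be sketched in Python as follows: an explicit setting wins, then TZ, then the system's local zone. The ordering, the helper names choose_zone and render, and the restriction to IANA zone names are all assumptions for illustration; POSIX-style TZ strings are not handled.

```python
import os
from datetime import datetime, timezone
from zoneinfo import ZoneInfo


def choose_zone(flag_value=None, environ=None):
    """Return a tzinfo, or None meaning 'use the system's local zone'."""
    environ = os.environ if environ is None else environ
    name = flag_value or environ.get("TZ")
    return ZoneInfo(name) if name else None


def render(epoch_seconds, fmt, flag_value=None, environ=None):
    """Format an epoch value using the flag, then TZ, then local time."""
    moment = datetime.fromtimestamp(epoch_seconds, tz=timezone.utc)
    zone = choose_zone(flag_value, environ)
    # astimezone() with no argument converts to the system's local zone.
    moment = moment.astimezone(zone) if zone is not None else moment.astimezone()
    return moment.strftime(fmt)


fmt = "%Y-%m-%dT%H:%M:%S%z"
fake_env = {"TZ": "America/Los_Angeles"}
print(render(1723662771, fmt, environ=fake_env))
print(render(1723662771, fmt, flag_value="America/New_York", environ=fake_env))
```

Passing the environment in as a parameter keeps the sketch testable without mutating the real process environment.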

Zui Proposals

Addressing this data presentation topic in Zui is covered separately in brimdata/zui#1057.
