[BUG] Error handling timezones #305

Open
Jmbols opened this issue May 3, 2024 · 7 comments

@Jmbols

Jmbols commented May 3, 2024

Description
Constructing an update patch raises an error when the x-axis values are dates with a specified timezone.
The error occurs when the timezones are compared. By default, pandas' pd.to_datetime() converts a timezone to a fixed offset, whereas the timezone of the x-axis data has a different format. The offsets are the same because the data is created based on the same timezone.

/.pyenv/versions/3.11.1/envs/clearview-dash-311/lib/python3.11/site-packages/plotly_resampler/aggregation/plotly_aggregator_parser.py", line 41, in to_same_tz
assert ts.tz.__str__() == reference_tz.__str__()
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
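
The mismatch described above can be illustrated with a minimal standard-library sketch. It uses `datetime.fromisoformat` and `zoneinfo.ZoneInfo` as stand-ins, which is an assumption: the issue itself involves `pd.to_datetime()` and the timezone carried by the trace data. The variable names only mirror those in the traceback.

```python
from datetime import datetime
from zoneinfo import ZoneInfo

# Range value as an ISO string with a fixed +08:00 offset.
ts = datetime.fromisoformat("2024-04-27T08:00:00+08:00")
# Named zone, standing in for the timezone of the x-axis data.
reference_tz = ZoneInfo("Asia/Taipei")

# Comparing string representations fails: fixed offset vs. zone name.
same_name = str(ts.tzinfo) == str(reference_tz)
# Comparing UTC offsets at that instant succeeds.
same_offset = ts.utcoffset() == ts.astimezone(reference_tz).utcoffset()

print(str(ts.tzinfo), str(reference_tz), same_name, same_offset)
```

Both objects describe the same wall-clock offset at that instant, yet a string comparison treats them as different timezones.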

Reproducing the bug 🔍
This code snippet reproduces the bug

import pandas as pd
import numpy as np
import plotly.graph_objects as go

from plotly_resampler import FigureResampler


fig = FigureResampler()

x = pd.date_range("2024-04-01T00:00:00", "2025-01-01T00:00:00", freq="H")
x = x.tz_localize("Asia/Taipei")
y = np.random.randn(len(x))

fig.add_trace(
    go.Scattergl(x=x, y=y, name="demo", mode="lines+markers"),
    max_n_samples=int(len(x) * 0.2),
)

relayout_data = {
    "xaxis.range[0]": "2024-04-27T08:00:00+08:00",
    "xaxis.range[1]": "2024-05-04T17:15:39.491031+08:00",
}

fig.construct_update_data_patch(relayout_data)

Environment information

  • OS: Ubuntu 22.04
  • Python version: 3.11
  • plotly-resampler environment: python and dash
  • plotly-resampler version: 0.9.2
@Jmbols Jmbols added the bug Something isn't working label May 3, 2024
@Jmbols
Author

Jmbols commented May 3, 2024

This can be worked around by calling tz_convert on the range values before passing relayout_data to fig.construct_update_data_patch(relayout_data), but the default behaviour when interacting with Dash is this error.

@Jmbols
Author

Jmbols commented May 7, 2024

However, that fix only works when there is no switch to DST. The Canada/Pacific timezone, for example, changes its UTC offset when switching to and from DST, so if the above code is run like

import pandas as pd
import numpy as np
import plotly.graph_objects as go

from plotly_resampler import FigureResampler


fig = FigureResampler()

x = pd.date_range("2024-04-01T00:00:00", "2025-01-01T00:00:00", freq="H")
x = x.tz_localize("UTC")
x = x.tz_convert("Canada/Pacific")
y = np.random.randn(len(x))

fig.add_trace(
    go.Scattergl(x=x, y=y, name="demo", mode="lines+markers"),
    max_n_samples=int(len(x) * 0.2),
)

relayout_data = {
    "xaxis.range[0]": pd.Timestamp("2024-03-01T00:00:00").tz_localize("Canada/Pacific"),
    "xaxis.range[1]": pd.Timestamp("2024-03-31T00:00:00").tz_localize("Canada/Pacific"),
}

fig.construct_update_data_patch(relayout_data)

you get the error:
site-packages/plotly_resampler/aggregation/plotly_aggregator_parser.py", line 81, in get_start_end_indices
assert start.tz == end.tz
^^^^^^^^^^^^^^^^^^

Is there a reason not to use assert start.tz.__str__() == end.tz.__str__()? That would solve the assertion error at least with DST if the name of the timezone is the same.
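
A minimal sketch of why the zone name stays stable across a DST switch while the offset does not. It uses the standard-library `zoneinfo` module, which is an assumption; depending on the pandas version, timestamps may be backed by pytz instead, in which case the tz objects attached to the two endpoints can themselves differ, and that part is not reproduced here.

```python
from datetime import datetime, timedelta
from zoneinfo import ZoneInfo

pacific = ZoneInfo("Canada/Pacific")

# Same range as in the snippet above: it spans the 2024 spring DST switch.
start = datetime(2024, 3, 1, tzinfo=pacific)
end = datetime(2024, 3, 31, tzinfo=pacific)

# The UTC offset differs on either side of the switch (PST vs. PDT) ...
offsets_equal = start.utcoffset() == end.utcoffset()
# ... while the zone name is identical for both endpoints.
names_equal = str(start.tzinfo) == str(end.tzinfo)

print(start.utcoffset(), end.utcoffset(), offsets_equal, names_equal)
```

A name-based comparison therefore accepts this range, whereas any check that depends on the endpoints sharing one offset would reject it.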

@DHRUVCHARNE

You can also try using the pytz library to handle timezone conversions and DST transitions. Here's an example:

import pytz

...

x = pd.date_range("2024-04-01T00:00:00", "2025-01-01T00:00:00", freq="H")
x = x.tz_localize("UTC")
x = x.tz_convert(pytz.timezone("Canada/Pacific"))

...

relayout_data = {
    "xaxis.range[0]": pd.Timestamp("2024-03-01T00:00:00").tz_localize(pytz.timezone("Canada/Pacific")),
    "xaxis.range[1]": pd.Timestamp("2024-03-31T00:00:00").tz_localize(pytz.timezone("Canada/Pacific")),
}

@jonasvdd jonasvdd self-assigned this Sep 5, 2024
@jonasvdd
Member

jonasvdd commented Sep 9, 2024

@Jmbols, @DHRUVCHARNE,

I tried to fix this behavior in #318 by catching the legacy tz-string assert and then comparing the offsets.

However, this introduces the possibly unwanted behavior that different timezones with the same offset are considered valid. For example, "Europe/Brussels" and "Europe/Amsterdam" are two different timezone objects / strings, but they have the same offset, so they are considered equal.
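
The trade-off can be sketched with the standard library. The helper `offsets_match` is hypothetical and is not the code from #318; it only illustrates what comparing offsets at a given instant accepts and rejects.

```python
from datetime import datetime, timezone
from zoneinfo import ZoneInfo


def offsets_match(tz_a, tz_b, instant):
    """Return True if both zones have the same UTC offset at `instant`.

    `instant` must be a tz-aware datetime. Hypothetical helper, for
    illustration only.
    """
    return instant.astimezone(tz_a).utcoffset() == instant.astimezone(tz_b).utcoffset()


instant = datetime(2022, 2, 14, tzinfo=timezone.utc)
brussels = ZoneInfo("Europe/Brussels")
amsterdam = ZoneInfo("Europe/Amsterdam")
lisbon = ZoneInfo("Europe/Lisbon")

# Different zone names, same offset: accepted by an offset comparison.
brussels_vs_amsterdam = offsets_match(brussels, amsterdam, instant)
# Different zone names, different offset: rejected.
brussels_vs_lisbon = offsets_match(brussels, lisbon, instant)

print(brussels_vs_amsterdam, brussels_vs_lisbon)
```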

This is also expressed in the following tests:

def test_time_tz_slicing_different_timestamp():
    # construct a time indexed series with UTC timezone
    n = 60 * 60 * 24 * 3
    dr = pd.Series(
        index=pd.date_range("2022-02-14", freq="s", periods=n, tz="UTC"),
        data=np.random.randn(n),
    )
    # create multiple other time zones
    cs = [
        dr,
        dr.tz_localize(None).tz_localize("Europe/Amsterdam"),
        dr.tz_convert("Europe/Lisbon"),
        dr.tz_convert("Australia/Perth"),
        dr.tz_convert("Australia/Canberra"),
    ]
    for i, s in enumerate(cs):
        t_start, t_stop = sorted(s.iloc[np.random.randint(0, n, 2)].index)
        t_start = t_start.tz_convert(cs[(i + 1) % len(cs)].index.tz)
        t_stop = t_stop.tz_convert(cs[(i + 1) % len(cs)].index.tz)
        # As each timezone in CS tz aware, using other timezones in `t_start` & `t_stop`
        # will raise an AssertionError
        with pytest.raises(AssertionError):
            hf_data_dict = construct_hf_data_dict(s.index, s.values)
            start_idx, end_idx = PlotlyAggregatorParser.get_start_end_indices(
                hf_data_dict, hf_data_dict["axis_type"], t_start, t_stop
            )
    # THESE have the same timezone offset -> no AssertionError should be raised
    cs = [
        dr.tz_localize(None).tz_localize("Europe/Amsterdam"),
        dr.tz_convert("Europe/Brussels"),
        dr.tz_convert("Europe/Oslo"),
        dr.tz_convert("Europe/Paris"),
        dr.tz_convert("Europe/Rome"),
    ]
    for i, s in enumerate(cs):
        t_start, t_stop = sorted(s.iloc[np.random.randint(0, n, 2)].index)
        t_start = t_start.tz_convert(cs[(i + 1) % len(cs)].index.tz)
        t_stop = t_stop.tz_convert(cs[(i + 1) % len(cs)].index.tz)
        hf_data_dict = construct_hf_data_dict(s.index, s.values)
        start_idx, end_idx = PlotlyAggregatorParser.get_start_end_indices(
            hf_data_dict, hf_data_dict["axis_type"], t_start, t_stop
        )

I would like to hear your opinion on this matter before continuing on this PR.

@jonasvdd
Member

@Jmbols @DHRUVCHARNE, any thoughts/remarks on my above comment?

@Jmbols
Author

Jmbols commented Oct 24, 2024

This is the behaviour I would expect. "Europe/Brussels" and "Europe/Amsterdam" are equivalent for all intents and purposes. The main issue I can see with that is if they have different dates when switching to and from DST, but presumably this would be caught by the offset check?
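
One way to probe this question is a small standard-library sketch. It assumes the offset check is evaluated at the specific timestamps of the range, which is an assumption about #318 and is not confirmed in this thread. Because the European zones listed above share their DST rules, the sketch uses a different pair, "America/Phoenix" and "America/Denver", whose offsets only coincide for part of the year. The helper `offset_at` is hypothetical.

```python
from datetime import datetime, timezone
from zoneinfo import ZoneInfo


def offset_at(tz, instant):
    """UTC offset of zone `tz` at the tz-aware `instant` (hypothetical helper)."""
    return instant.astimezone(tz).utcoffset()


phoenix = ZoneInfo("America/Phoenix")  # does not observe DST
denver = ZoneInfo("America/Denver")    # observes DST

winter = datetime(2024, 1, 15, tzinfo=timezone.utc)
summer = datetime(2024, 7, 15, tzinfo=timezone.utc)

# Outside DST both zones share an offset, so an offset check passes.
match_winter = offset_at(phoenix, winter) == offset_at(denver, winter)
# During DST the offsets diverge, so the same check fails.
match_summer = offset_at(phoenix, summer) == offset_at(denver, summer)

print(match_winter, match_summer)
```

Under that assumption, zones with different DST behaviour are only told apart when the compared timestamps fall in a period where their offsets actually differ.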

@mhangaard

Would it be easier if everything was converted to UTC first, then calculate, then at the last moment before returning a Patch, convert back to the original timezone?
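
A rough sketch of that idea, using only the standard library. The function `slice_via_utc` is hypothetical and not part of plotly-resampler; it normalizes everything to UTC, selects the range there, and converts back to the original zone only for the returned values.

```python
from bisect import bisect_left, bisect_right
from datetime import datetime, timedelta, timezone
from zoneinfo import ZoneInfo


def slice_via_utc(x, start, end):
    """Select the values of sorted, tz-aware `x` that lie in [start, end].

    All comparisons happen in UTC; the result is converted back to the
    zone of `x`. Hypothetical helper, for illustration only.
    """
    original_tz = x[0].tzinfo
    x_utc = [t.astimezone(timezone.utc) for t in x]
    lo = bisect_left(x_utc, start.astimezone(timezone.utc))
    hi = bisect_right(x_utc, end.astimezone(timezone.utc))
    return [t.astimezone(original_tz) for t in x_utc[lo:hi]]


pacific = ZoneInfo("Canada/Pacific")
base = datetime(2024, 3, 9, tzinfo=timezone.utc)
# Hourly data in Canada/Pacific around the 2024 spring DST switch.
x = [(base + timedelta(hours=i)).astimezone(pacific) for i in range(96)]

# Range endpoints given with fixed offsets, on either side of the switch.
start = datetime.fromisoformat("2024-03-09T16:00:00-08:00")
end = datetime.fromisoformat("2024-03-10T17:00:00-07:00")

selected = slice_via_utc(x, start, end)
print(len(selected), selected[0], selected[-1])
```

With this approach no timezone equality assertion is needed, since instants are unambiguous in UTC regardless of how the endpoints were expressed.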

jonasvdd added a commit that referenced this issue Mar 6, 2025
* fix: check if update_data contains update before batch_update

* add test + avoid same error when verbose=True

* 🧹 create _hf_data_container if correct trace type

* 🙏 python 3.7 not supported on Apple Silicon

* remove WIP

* 🖊️ more verbose asserts

* 🖊️ more verbose asserts

* 🙏 more sleep time

* 🙏

* 🙌

* 🤔 fix for [BUG] Error handling timezones #305

* 🙈 linting

* 💨 Refactor timezone handling in PlotlyAggregatorParser

* Update minmax operator image

Signed-off-by: Emmanuel Ferdman <[email protected]>

* Drop duplicate sentence

* Feat/plotly6 (#338)

* Parametrize test_utils.py on is_figure

* 🔍 remove dtype parsing as orjon>3.10 supports float16 #118

* 💪 refactor: streamline JupyterDash integration and remove unused persistent inline logic

* 💨 move construct_update_data_patch method into the FigureResampler class

* 🐐 refactor: enhance test utilities and add support for Plotly>=6 data handling

* 🙏 enhance serialization tests for plotly>6

* 📝 remove debug print statement and enhance type handling for hf_x

* 🔒 update dependency versions in pyproject.toml to Support plotly 6 #334

* 🔍 drop python3.7 CI workflow and upgrade upload-artifact action

* 🙏 fix pickling of figurewidget resampler

* 🙏 fix tests

* 💨 migration of code towards new upload artifact

* 💪 enhance CI workflow to improve test result uploads and add retention settings

* 🕳️ fix: ensure correct dtype handling for aggregated x indices in PlotlyAggregatorParser

* ⬆️ chore: update dependency constraints for pandas and pyarrow in pyproject.toml

* 🙈 fix linting

* 🔍 fix: correct spelling in streamlit_app.py comments and update dash-extensions and pyarrow versions in requirements.txt

* ⬆️ chore: update ipywidgets version constraint to allow for newer versions

* 🚧 test: set random seed for reproducibility in test_wrap_aggregate

* 🙈 chore: update ipywidgets version constraint for serialization support

* 🙈

* 🔍 ci: conditionally skip tests on Python 3.12 for Ubuntu (as it keeps hanging in github actions)

* 🔍 ci: exclude Python 3.12 on Ubuntu from test matrix to prevent hangs

* 🖊️ review code

* 🧹 cleanup comments

---------

Co-authored-by: Maxim Ivanov <[email protected]>
Co-authored-by: jeroen <[email protected]>

* 🤔 fix for [BUG] Error handling timezones #305

* 🙈 linting

* 💨 Refactor timezone handling in PlotlyAggregatorParser

* 📌 bug: Fix timezone handling for DST in PlotlyAggregatorParser and update tests

---------

Signed-off-by: Emmanuel Ferdman <[email protected]>
Co-authored-by: jvdd <[email protected]>
Co-authored-by: Jeroen Van Der Donckt <[email protected]>
Co-authored-by: Emmanuel Ferdman <[email protected]>
Co-authored-by: Maxim Ivanov <[email protected]>