Skip to content

PeriodicTasks with a TZ are not running #875

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Closed
radoshi opened this issue Apr 18, 2025 · 50 comments
Closed

PeriodicTasks with a TZ are not running #875

radoshi opened this issue Apr 18, 2025 · 50 comments

Comments

@radoshi
Copy link

radoshi commented Apr 18, 2025

Hi,

We noticed a failure on all periodic tasks with a timezone set, after moving to 2.8.0. We believe this a regression caused by #844

Any tasks that keyed to a CrontabSchedule set in a specific timezone are not being scheduled.

For example:

def _get_crontab_schedule(
    minute: str = "*",
    hour: str = "*",
    day_of_month: str = "*",
    month_of_year: str = "*",
    day_of_week: str = "*",
    timezone: zoneinfo.ZoneInfo = zoneinfo.ZoneInfo("UTC"),
) -> CrontabSchedule:
    return CrontabSchedule.objects.get_or_create(
        minute=minute,
        hour=hour,
        day_of_month=day_of_month,
        month_of_year=month_of_year,
        day_of_week=day_of_week,
        timezone=timezone,
    )
...
    schedule_every_day_7_am_pst, _ = _get_crontab_schedule(
        minute="0",
        hour="7",
        timezone=zoneinfo.ZoneInfo("America/Los_Angeles"),
    )
    PeriodicTask.objects.update_or_create(...)

This task did not get scheduled.

IntervalSchedules seem fine. CrontabSchedule with a UTC also seems fine.

We're happy to try to do a minimal repro, though its a bit complicated, if that would be helpful.

@radoshi
Copy link
Author

radoshi commented Apr 18, 2025

cc JinRiYao2001

@auvipy
Copy link
Member

auvipy commented Apr 21, 2025

Thanks for the report

@ZipBrandon
Copy link

@radoshi Thank you for identifying the crux of the issue. We are experience failures on our end, too. I could not find out the root-cause as I thought it was with my arguments being passed to tasks.

@gsvr
Copy link

gsvr commented Apr 22, 2025

@radoshi @auvipy We've experienced tasks not running as well, but it's a bit of a mystery what the underlying cause is, as I've had ones with timezones set that work, and UTC ones that don't. Below is a screenshot of our periodic tasks and when they last ran. We deployed 2.8.0 on 20 April.

Image

Edit: The second and third items on the screenshot should be ignored, they have correctly not run yet.

@ZipBrandon
Copy link

Here is my list with green checkmarks of what is running and red X's for what isn't running on time. I'm having to manually run the jobs. Some jobs update last runtime when I do it, some do not even though I see the output of the success of their runs in my data.

Image

@Azurency
Copy link

I'm experiencing the same issue with the new 2.8.0 release. It's also not quite clear why, it seems to happen to tasks with timezone and UTC, some of them are just not scheduled while others run fine.

@Azurency
Copy link

Azurency commented Apr 22, 2025

I think there is also an issue with #835 it is excluding tasks based on crontab hour without taking timezones into account. Reverting commit fc64741 seems to allow tasks with timezone to run on schedule again for me (but it's still not explaining why some UTC tasks are not scheduled).

@mrzorn
Copy link

mrzorn commented Apr 22, 2025

Just adding that we are experiencing the same issue, some of our PeriodicTasks did not run this morning but got picked up as soon as we downgraded to 2.7.0.

@zoetw88
Copy link

zoetw88 commented Apr 22, 2025

I wanted to bring up a potential concern with the exclusion logic in get_excluded_hours_for_crontab_tasks().

The current implementation excludes crontab tasks based on the server's local time, which might lead to unintended issues for UTC or timezone-aware tasks. Specifically:

  • UTC tasks should be compared using UTC time rather than the server's local time, to avoid the risk of accidentally excluding them.

  • Timezone-aware tasks are more likely to be affected since their scheduled hour is converted to the server’s local time before being compared against the exclusion window.

This approach may unintentionally skip tasks that are scheduled correctly based on their respective timezones.

Suggested Fix:

For UTC tasks, avoid converting them to the server's local time and compare directly against UTC time.

For timezone-aware tasks, convert them to the server's local time and then apply the exclusion check.

Performance Consideration:
I understand that this might introduce additional computation, especially when dealing with a large number of tasks. A possible way to mitigate this impact could be:

Cache the time zone conversion during task creation or update, so that we don't repeatedly convert the time on every exclusion check. This would keep the logic correct without incurring significant performance overhead.

I'd be happy to help with a patch to implement this change and create tests for it. Let me know if you'd like me to proceed with this.

@JinRiYao2001
Copy link
Contributor

JinRiYao2001 commented Apr 23, 2025

@radoshi
Thank you for reporting this. The PR I submitted only affects crontab periodic tasks where start_time is set and it's in the future. Could you please provide more code or a minimal reproduction to help investigate further?

@auvipy
Copy link
Member

auvipy commented Apr 23, 2025

we have this draft fix #877, can anyone try it ?

@zoetw88
Copy link

zoetw88 commented Apr 23, 2025

@auvipy I'm able to take care of this and run some quick tests based on my observations. However, I’ll be available on Friday or Saturday. Please let me know if that works for the timeline.

@auvipy
Copy link
Member

auvipy commented Apr 23, 2025

I am available on saturday as well

@michalpokusa
Copy link

michalpokusa commented Apr 23, 2025

We are also having problems, and I noticed that setting the cron for e.g. 59 14 * * * schedules the task while using e.g. 0 15 * * * does not work. Might be ralated to #875 (comment)

@Azurency
Copy link

Azurency commented Apr 23, 2025

I don't think #877 will be enough.
It is just making sure that the excluded hours are in UTC, which is needed for UTC tasks but as @zoetw88 said in #875 (comment) for timezone-aware tasks, there needs to be some kind of conversion, the simplest way is to convert in UTC.

This problem is that this line compares excluded hours (preferably UTC with the change in #877) to the hour field of crontab which is in the timezone the user specified, not in UTC

crontab__isnull=False, crontab__hour__in=exclude_hours

The UTC hour could be computed every query with something like that Azurency@ce00300 (this is just something I tried, to see if that worked on some tasks I was scheduling).
But I think the best solution would be to store the UTC hour and UTC minute inside the Crontab model like @zoetw88 said "during task creation or update".

@auvipy
Copy link
Member

auvipy commented Apr 24, 2025

it seems that, reverting that PR would be best option right now?

@auvipy
Copy link
Member

auvipy commented Apr 24, 2025

I don't think #877 will be enough. It is just making sure that the excluded hours are in UTC, which is needed for UTC tasks but as @zoetw88 said in #875 (comment) for timezone-aware tasks, there needs to be some kind of conversion, the simplest way is to convert in UTC.

This problem is that this line compares excluded hours (preferably UTC with the change in #877) to the hour field of crontab which is in the timezone the user specified, not in UTC

django-celery-beat/django_celery_beat/schedulers.py

Line 265 in cf04a7b
crontab__isnull=False, crontab__hour__in=exclude_hours

The UTC hour could be computed every query with something like that Azurency@ce00300 (this is just something I tried, to see if that worked on some tasks I was scheduling). But I think the best solution would be to store the UTC hour and UTC minute inside the Crontab model like @zoetw88 said "during task creation or update".

why don't you send a draft pr?

@mrzorn
Copy link

mrzorn commented Apr 24, 2025

With the current release version being quite broken, are there plans to revert the offending commit and release a hotfix and/or revert back to 2.7.0? It seems like a generally bad idea to have the latest/default version of the lib contain such an error that breaks the fundamental reason for using said lib.

@yidaoxiangan
Copy link

Encountering the same problem. All tasks in Periodic Tasks with tz aware crontab scheduler are not fetched in all_as_schedule(self) method for DatabaseScheduler after upgrading from 2.6.0 to 2.8.0. Downgrading to 2.7.0 works for our project. Bug may related to #835.

@databyte
Copy link

databyte commented Apr 25, 2025

I'm in a pickle as well because 2.7.0 doesn't support Django 5.2 so I'd have to back out of multiple things to make this work. A 2.8.1 hotfix removing the nonfunctioning optimization (fc64741) would take the pressure off so you can optimize the optimization with ample time and test coverage.

Activate with: source .venv/bin/activate
  × No solution found when resolving dependencies:
  ╰─▶ Because django-celery-beat==2.7.0 depends on django>=2.2,<5.2 and you require django==5.2, we can
      conclude that your requirements and django-celery-beat==2.7.0 are incompatible.
      And because you require django-celery-beat==2.7.0, we can conclude that your requirements are
      unsatisfiable.

@radoshi
Copy link
Author

radoshi commented Apr 25, 2025

Either option would be really helpful so we can get to Django 5.2:

  1. Revert to 2.7.0
  2. Hot fix to 2.8.1

@auvipy
Copy link
Member

auvipy commented Apr 26, 2025

I think there is also an issue with #835 it is excluding tasks based on crontab hour without taking timezones into account. Reverting commit fc64741 seems to allow tasks with timezone to run on schedule again for me (but it's still not explaining why some UTC tasks are not scheduled).

we got a new PR to fix this, #879 please feel free to review it. once this is ready we will make a hot fix early next week

@alirafiei75
Copy link
Contributor

@radoshi
@gsvr
@Azurency
@zoetw88
I would appreciate if anyone can review and test the new #879 PR. It is best if you can test the code in your development environments to make sure it works as expected.

@radoshi
Copy link
Author

radoshi commented Apr 27, 2025

@alirafiei75 with our current setup, it's non-trivial to test stuff outside production unfortunately.

@auvipy
Copy link
Member

auvipy commented Apr 28, 2025

@alirafiei75 with our current setup, it's non-trivial to test stuff outside production unfortunately.

you can just review it as well

@radoshi
Copy link
Author

radoshi commented Apr 28, 2025

@auvipy done. I generally don't review OSS stuff, this is the first, but I hope that's helpful. Please feel free to ignore any and all comments.

@databyte
Copy link

Running that branch in production has fixed our issue with the TZs.

Here's the entry in our .in file if anyone else wants to try:

django-celery-beat @ https://github.com/alirafiei75/django-celery-beat/archive/refs/heads/refactor/crontab-query-optimization.tar.gz

@Azurency
Copy link

I'm also having issues with 2.8.0 and ClockedSchedule (see #879 (comment)). Maybe it's better to release a quick 2.8.1 hotfix reverting #835 for now ?

@auvipy
Copy link
Member

auvipy commented Apr 29, 2025

@auvipy done. I generally don't review OSS stuff, this is the first, but I hope that's helpful. Please feel free to ignore any and all comments.

Thanks that was a very good review

@auvipy
Copy link
Member

auvipy commented Apr 29, 2025

I'm also having issues with 2.8.0 and ClockedSchedule (see #879 (comment)). Maybe it's better to release a quick 2.8.1 hotfix reverting #835 for now ?

Yes, will release a hotfix after the new PR is merged

@iwalucas
Copy link
Contributor

When should we see 2.8.1 published?

@jennifer-richards
Copy link

jennifer-richards commented Apr 30, 2025

We're seeing something that smells a lot like this issue - hopefully it's fixed, but in case not and for the record, we have several crontab schedules in the America/Los_Angeles time zone that have not been running. Several are like 9 0 * * * and one like 0 0 * * 0.

Another, */15 1-23 * * *, also in America/Los_Angeles has been running despite the time zone.

Edit to add: from skimming the PR and the discussion, it looks like this is all consistent with the issue.

@auvipy
Copy link
Member

auvipy commented May 1, 2025

should be fixed by #879 but please let us know if otherwise.

@radoshi
Copy link
Author

radoshi commented May 1, 2025

@auvipy thank you! Will you be releasing 2.8.1 with this fix soon?

@auvipy
Copy link
Member

auvipy commented May 1, 2025

Yes

@databyte
Copy link

databyte commented May 1, 2025

Also note, since this includes optimization code our heartbeat checker failed.

This no longer triggers every few secs:

import tempfile
from pathlib import Path

from django_celery_beat.schedulers import DatabaseScheduler

HEARTBEAT_FILE = Path(tempfile.gettempdir()) / "scheduler_heartbeat"


class MyDatabaseScheduler(DatabaseScheduler):
    """Custom scheduler that touches a heartbeat file on each tick (every few secs)."""

    def tick(self, *args, **kwargs):
        interval = super().tick(*args, **kwargs)
        HEARTBEAT_FILE.touch()
        return interval

That statement about every few secs is now:

"""Custom scheduler that touches a heartbeat file every 5 minutes (SCHEDULE_SYNC_MAX_INTERVAL)."""

To anyone wanting to check that with a k8s probe, we run this script:

#!/usr/bin/env bash

HEARTBEAT_FILE="/tmp/scheduler_heartbeat"
STALE_THRESHOLD=330  # seconds: how long before we consider the heartbeat stale; 330 seconds = 5.5 minutes

# 1. Check if the heartbeat file exists
if [ ! -f "$HEARTBEAT_FILE" ]; then
  echo "Beat heartbeat file does not exist!"
  exit 1
fi

# 2. Check how recently the heartbeat file was updated
#    (We're assuming a Linux environment; 'stat -c %Y' gives the file's mtime in seconds)
NOW=$(date +%s)
MTIME=$(stat -c %Y "$HEARTBEAT_FILE") 2>/dev/null

if [ -z "$MTIME" ]; then
  echo "Cannot determine heartbeat file mtime!"
  exit 1
fi

AGE=$((NOW - MTIME))
if [ "$AGE" -gt "$STALE_THRESHOLD" ]; then
  echo "Celery Beat heartbeat is stale ($AGE seconds old)!"
  exit 1
fi

echo "Celery Beat is healthy (heartbeat updated $AGE seconds ago)."
exit 0

@iwalucas
Copy link
Contributor

iwalucas commented May 1, 2025

@auvipy when should we expect 2.8.1? Anyway we can use the fix until the official relase is out?

@pawel-steto
Copy link

I am having a similar issue in 2.7.0, after some time the scheduler stops working at random, I have detailed the issue here: celery/celery#9622

Do you think it is related ?

@auvipy
Copy link
Member

auvipy commented May 6, 2025

This issue should be fixed in main branch, you check

@radoshi
Copy link
Author

radoshi commented May 6, 2025

@auvipy any eta for dot release?

@auvipy
Copy link
Member

auvipy commented May 6, 2025

@auvipy any eta for dot release?

Got another related pr, so Hopefully after merging that

@auvipy
Copy link
Member

auvipy commented May 7, 2025

Hi guys! we got another PR which is related #886! would love to have your thoughts and reviews on that

@pawel-steto
Copy link

@auvipy I can hardly test this on main as my issue only appears on prod at random

@DamianB-BitFlipper
Copy link

Hi @auvipy . Thanks to everyone for addressing this issue in a timely manner. This surely is a nasty bug.

When reading through the thread, the issues we are experiencing with some tasks running and others not describes exactly what we've been noticing in our system. I am eager for version 2.8.1. When should we expect this release? Moreover, do you have a suggested stopgap solution for the meanwhile since downgrading to 2.7.0 doesn't work for us because it does not support Django 5.2?

@jennifer-richards
Copy link

jennifer-richards commented May 8, 2025

Moreover, do you have a suggested stopgap solution for the meanwhile since downgrading to 2.7.0 doesn't work for us because it does not support Django 5.2?

We switched our schedules to the equivalent in UTC and this avoids the issue. (If DST shifts are important to your tasks this might not be ideal, I suppose, but most time zones won't shift relative to UTC for a few months...)

I'm not certain that this works for all possible server timezones though.

@smiling-watermelon
Copy link

It's my understanding that all PRs related to this issue were merged, so, is there some planned release date for 2.8.1?
Or perhaps there's some more work that needs to be done before that?

@harmjan
Copy link

harmjan commented May 12, 2025

While working towards 2.8.1, can 2.8.0 be yanked on pypi to indicate it is broken?

@mrzorn
Copy link

mrzorn commented May 12, 2025

While working towards 2.8.1, can 2.8.0 be yanked on pypi to indicate it is broken?

Agreed - frankly cannot believe 2.8.0 is still available in such a broken state. Seems like a bad move.

@auvipy
Copy link
Member

auvipy commented May 13, 2025

we got a new release https://pypi.org/project/django-celery-beat/2.8.1/ and going to yank the 2.8.0. thanks to all who reported, reviewed, contributed code/test, and also shared your valuable opinion.

@auvipy auvipy closed this as completed May 13, 2025
@auvipy
Copy link
Member

auvipy commented May 20, 2025

hi guy! would you mind checking this PR and report that this do not create any regression #896 ?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests