Could you try this on 2.0.0rc1?
-
Oh, I just saw the nice repro script you gave! Trying this myself.
-
This seems better on 2.0.0rc1. Testing with 2s (a bit noisy):
Testing with 1s:
So it's possible we can close this, but you tell me.
-
It seems much better on Airflow 2.0.0rc1, though, interestingly, with a 2s value it is less accurate than with 1s. One would expect 1s to be the less accurate one, because the scheduler loop is more likely to take longer than the heartbeat interval. I will try different values and update the issue with my findings, but if you feel this is not an issue, feel free to resolve it.
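The intuition above can be illustrated with a toy model comparing two common ways a loop can pace a periodic action when each iteration also does some work. This is a hypothetical sketch, not Airflow's scheduler code: `relative_sleep_times` sleeps the full interval after the work, while `deadline_sleep_times` sleeps only until the next deadline. All function names are illustrative.

```python
"""Hypothetical model (not Airflow's actual code) comparing two ways a loop can
pace a periodic heartbeat when each iteration also does some work."""

from typing import List


def relative_sleep_times(interval: float, work: List[float]) -> List[float]:
    """Sleep a fixed `interval` after each iteration's work.

    Every cycle lasts work + interval, so the period drifts by the work time.
    """
    t = 0.0
    beats = []
    for w in work:
        beats.append(t)        # heartbeat emitted at the start of the iteration
        t += w + interval      # do the work, then sleep the full interval
    return beats


def deadline_sleep_times(interval: float, work: List[float]) -> List[float]:
    """Sleep only until the next deadline (previous beat + interval).

    The period stays at `interval` unless the work itself overruns it.
    """
    t = 0.0
    beats = []
    for w in work:
        beats.append(t)
        done = t + w                  # time at which this iteration's work finishes
        t = max(done, t + interval)   # wait for the deadline, or not at all if late
    return beats


def periods(beats: List[float]) -> List[float]:
    """Gaps between consecutive heartbeats, rounded to hide float noise."""
    return [round(b - a, 3) for a, b in zip(beats, beats[1:])]
```

With iterations taking 0.3, 0.3, 1.5 and 0.3 s of work, relative sleeping with a 1 s interval gives periods of 1.3, 1.3 and 2.5 s, while deadline sleeping gives 1.0, 1.0 and 1.5 s. With a 2 s interval, deadline sleeping absorbs the slow 1.5 s iteration entirely (2.0, 2.0, 2.0). In this model, shorter intervals are indeed more sensitive to slow loop iterations.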
-
What is the easiest way to install Airflow 2.0.0rc1 and the StatsD dependencies?
-
I repeated the experiment on Airflow 2.0.0rc2 and produced more statistical data, which you can find in the Excel sheet below. Here is a summary:
I tried this on the same machine I mentioned above (an Amazon r5.4xlarge instance), so it is pretty powerful, and I confirmed there isn't much load on the CPU (below 10%, mostly from Airflow itself). I can retry this on a personal laptop if you don't have strong confidence in results from a single machine (admittedly, neither do I). I cannot tell for sure whether this is just a metrics issue, but having looked at the code, I believe it is an actual scheduling issue and not only a metrics one (though I admit my understanding of the Airflow code base is still limited). In my opinion, this justifies some investigation into what is going on. In particular, I would like to suggest:
I can help with this investigation if you agree that it is important (though I probably won't be able to before the new year). Otherwise, feel free to resolve it, though I still think at least point 4 above is important if we decide that scheduler interval accuracy does not matter.
Statistical Data for Different Runs
Below is a snapshot of the Excel sheet I mentioned above. I can upload the Excel file itself if you like; in that case, please advise where I should upload it.
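Summary statistics like the ones in the sheet can be derived from raw heartbeat arrival timestamps. The helper below is a hypothetical sketch (`summarize` and `RunSummary` are illustrative names, not part of Airflow): it turns one run's arrival times into interval statistics and the mean lag relative to the configured `scheduler_heartbeat_sec`.

```python
"""Hypothetical helper for summarising heartbeat arrival times from one run."""

import statistics
from dataclasses import dataclass
from typing import List


@dataclass
class RunSummary:
    configured: float  # configured scheduler_heartbeat_sec (seconds)
    mean: float        # mean observed interval (seconds)
    stdev: float       # sample standard deviation of the intervals
    minimum: float
    maximum: float
    mean_lag: float    # mean observed interval minus the configured interval


def summarize(arrivals: List[float], configured: float) -> RunSummary:
    """Compute interval statistics from arrival timestamps (in seconds)."""
    if len(arrivals) < 3:
        # statistics.stdev needs at least two intervals, i.e. three arrivals
        raise ValueError("need at least three arrivals")
    intervals = [b - a for a, b in zip(arrivals, arrivals[1:])]
    mean = statistics.mean(intervals)
    return RunSummary(
        configured=configured,
        mean=mean,
        stdev=statistics.stdev(intervals),
        minimum=min(intervals),
        maximum=max(intervals),
        mean_lag=mean - configured,
    )
```

For example, arrivals at 0, 6, 12 and 18 s with a configured value of 5 give a mean interval of 6.0 s, a standard deviation of 0.0 and a mean lag of 1.0 s, which is the pattern described in the original issue.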
-
I repeated the experiment with Airflow 2.0 on my personal laptop, a relatively powerful machine with an Intel Core i7-10750H (up to 5.0 GHz, 6 cores/12 threads) and 16 GB of RAM. Below are some samples:
For some reason, the 2 and 3 cases are noticeably bad. The rest are better, but there is still a lag of 0.1 to 0.3 sec.
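A lag of a fraction of a second, or of a whole second, is what one would expect if the heartbeat is only checked at discrete points in a loop rather than on its own timer. The toy model below is hypothetical and does not claim to reproduce Airflow's scheduler: it assumes the heartbeat is checked once per loop iteration of fixed length `loop_ms` and fires on the first check at or after the configured interval has elapsed. Integer milliseconds keep the arithmetic exact; the function names are illustrative.

```python
"""Hypothetical model: a heartbeat that is only checked once per loop iteration
fires on the first check at or after the configured interval has elapsed.
This is not Airflow's actual implementation."""

from typing import Dict, List


def observed_interval_ms(interval_ms: int, loop_ms: int) -> int:
    """Observed heartbeat period when checks happen every `loop_ms` milliseconds."""
    checks_needed = -(-interval_ms // loop_ms)  # ceiling division on integers
    return checks_needed * loop_ms


def observed_table(intervals_s: List[int], loop_ms: int) -> Dict[int, float]:
    """Map each configured interval (whole seconds) to its observed period in seconds."""
    return {s: observed_interval_ms(s * 1000, loop_ms) / 1000 for s in intervals_s}
```

With a hypothetical 1050 ms loop, a 5 s setting is observed as 5.25 s, a lag in the range reported above. With a 1200 ms loop, settings of 5 s and 6 s would both be observed as 6 s, and 10 s as 10.8 s. This is only one possible mechanism; confirming it would require tracing the actual scheduler loop.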
-
Is this issue reproducible on the latest Airflow version?
-
I am not sure; I only checked the versions mentioned above.
-
Then I'm converting this into a discussion until the error is confirmed in the latest version. There have been a few hundred fixes since 2.0, so it is easiest, @rafidka, if you verify it there. We released 2.4.0b1 today, so this is a great opportunity to recheck whether the problem is still there and, at the same time, help the community test the 2.4.0 release.
-
Apache Airflow version: 1.10.12
Kubernetes version (if you are using Kubernetes) (use kubectl version): N/A
Environment:
r5.4xlarge EC2 instance.
Kernel (e.g. uname -a): 5.4.58-37.125.amzn2int.x86_64 (via uname -r)
What happened:
I am experiencing a strange issue with the scheduler heartbeat. If I configure it to 5 seconds, the heartbeat metric is somehow received every 6 seconds. I tried different values and noticed that with values of 5 or less, the heartbeat is received one second later than expected, but with 6 or more, it is received at the expected time (though for 2 and 6 I saw an even stranger, fluctuating behaviour). Below is a table of the values I tried and the frequencies I received:
What you expected to happen:
I expect the airflow.scheduler_heartbeat metric to be received at the same frequency specified by the scheduler_heartbeat_sec configuration.
How to reproduce it:
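One way to observe the heartbeat frequency independently of any StatsD backend is to listen for the UDP packets directly. The sketch below is a minimal, hypothetical tool; it assumes Airflow sends StatsD metrics to 127.0.0.1:8125 with the default `airflow` prefix (in 1.10.x these are the `statsd_on`, `statsd_host`, `statsd_port` and `statsd_prefix` options in the `[scheduler]` section; 2.0 moved them to a `[metrics]` section), and that no real StatsD server is already bound to that port. The function names are illustrative.

```python
"""Minimal StatsD UDP listener sketch for timing airflow.scheduler_heartbeat.

Assumption: Airflow's StatsD client sends to 127.0.0.1:8125 with the default
"airflow" prefix, so the counter arrives as lines like
"airflow.scheduler_heartbeat:1|c".
"""

import socket
import time
from typing import List, Optional

METRIC = "airflow.scheduler_heartbeat"


def metric_names(packet: bytes) -> List[str]:
    """Return the metric names in one StatsD packet (lines of name:value|type)."""
    names = []
    for line in packet.decode("utf-8", errors="replace").splitlines():
        name, sep, _ = line.partition(":")
        if sep:  # skip lines without the name:value separator
            names.append(name)
    return names


def listen(host: str = "127.0.0.1", port: int = 8125, count: int = 10) -> None:
    """Print the gap between consecutive heartbeat packets, `count` times."""
    sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    sock.bind((host, port))
    last: Optional[float] = None
    seen = 0
    try:
        while seen < count:
            packet, _ = sock.recvfrom(65535)
            now = time.monotonic()  # arrival time at the listener
            if METRIC in metric_names(packet):
                if last is not None:
                    print(f"{now - last:.3f}s since previous heartbeat")
                    seen += 1
                last = now
    finally:
        sock.close()
```

Call `listen()` while the scheduler is running; with `scheduler_heartbeat_sec` set to 5, each printed gap should be close to 5 s if the metric matches the configuration.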
This is an example output when I set scheduler_heartbeat_sec to 5:
And this is another example when I set scheduler_heartbeat_sec to 6:
Notice that this time it fluctuates between 6 and 8 for some reason.
Now, setting scheduler_heartbeat_sec to 10, here is a much better, stable output with the expected frequency:
Anything else we need to know:
Yes, you are awesome, but you might already know this.