This repository has been archived by the owner on Jul 25, 2024. It is now read-only.

ci/tasks.py: offload testjob post processing to its own task #1115

Merged (1 commit) on Dec 12, 2023

Commits on Dec 7, 2023

  1. ci/tasks.py: offload testjob post processing to its own task

    This change targets deployments of SQUAD on auto-scalable systems such
    as Kubernetes. When the load in SQUAD is high, Kubernetes creates new
    replicas of workers to consume from the queue.
    
    When the load drops again, Kubernetes starts trimming the workers that are
    no longer in use. There is a very specific corner case with this approach, though.
    
    When Kubernetes trims a worker, it sends SIGTERM to it and waits 30s by default
    for the worker to terminate itself. In Linaro's deployment of SQUAD, there is a
    particular kind of test job that comes from Android CTS/VTS. These jobs are huge
    and take a lot more than 30s to finish. If the worker has not finished by the 30s
    mark, Kubernetes sends SIGKILL to it and it dies abruptly, causing inconsistencies.
    
    Yes, we can increase the 30s timeout, but if SQUAD is under heavy load, a longer
    timeout can still cause inconsistencies whenever the worker doesn't terminate
    itself within that timeout.
    
    The solution to this problem is the creation of a new queue called 'ci_fetch_postprocess'.
    Deployments under heavy load should then create a different kind of worker that
    never dies and does not auto-scale, thus eliminating the problem completely.
    
    Tasks in 'ci_fetch_postprocess' are the plugin ones, which are the cause of the issue.
    chaws committed Dec 7, 2023
    6805375
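The queue split described in the commit message can be sketched as a Celery-style routing table. This is only an illustration, not the code from this commit: the task names `ci.tasks.fetch` and `ci.tasks.fetch_postprocess`, the queue name `ci_fetch`, and the helper `queue_for` are all hypothetical; only the queue name `ci_fetch_postprocess` comes from the commit message. The dictionary follows the shape accepted by Celery's `task_routes` setting, and the fallback `"celery"` is Celery's default queue name.

```python
# Minimal sketch of routing post-processing tasks to their own queue.
# Assumption: task names and the 'ci_fetch' queue are hypothetical;
# only 'ci_fetch_postprocess' is named in the commit message.

POSTPROCESS_QUEUE = "ci_fetch_postprocess"

# Same shape as Celery's `task_routes` setting: task name -> routing options.
task_routes = {
    # Short fetch work can stay on a queue served by auto-scaled workers.
    "ci.tasks.fetch": {"queue": "ci_fetch"},
    # Long-running plugin post-processing goes to the dedicated queue,
    # which a fixed, non-auto-scaled worker is expected to consume.
    "ci.tasks.fetch_postprocess": {"queue": POSTPROCESS_QUEUE},
}


def queue_for(task_name, routes=task_routes, default="celery"):
    """Return the queue a task would be routed to under `routes`."""
    return routes.get(task_name, {}).get("queue", default)


if __name__ == "__main__":
    for name in ("ci.tasks.fetch", "ci.tasks.fetch_postprocess", "other.task"):
        print(name, "->", queue_for(name))
```

Under this kind of setup, the long-lived worker would be started so that it consumes only the dedicated queue, for example with Celery's `-Q ci_fetch_postprocess` worker option, while the auto-scaled workers leave that queue alone.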