Integration tests: Green assessment due to false green DAG #654

@Maleware

On current main, the logging and smoke tests turn green even though the DAG they trigger has thrown an error.

Given:
current main before #653,
with `make run-dev` running for the airflow operator,

running `./scripts/run-tests --test logging_airflow-3.0.1_openshift-false_executor-celery --skip-release --skip-delete`

produces a cluster state in which Airflow cannot connect to the worker and throws an error in the activated DAG:

Log message source details: sources=["Could not read served logs: HTTPConnectionPool(host='airflow-worker-custom-log-config-0', port=8793): Max retries exceeded with url: /log/dag_id=example_trigger_target_dag/run_id=manual__2025-07-09T17:32:56.030514+00:00/task_id=run_this/attempt=1.log (Caused by NameResolutionError(\"<urllib3.connection.HTTPConnection object at 0xffff603eaf30>: Failed to resolve 'airflow-worker-custom-log-config-0' ([Errno -2] Name or service not known)\"))"]
::group::Log message source details: sources=["Could not read served logs: HTTPConnectionPool(host='airflow-worker-custom-log-config-0', port=8793): Max retries exceeded with url: /log/dag_id=example_trigger_target_dag/run_id=manual__2025-07-09T17:32:56.030514+00:00/task_id=run_this/attempt=1.log (Caused by NameResolutionError(\"<urllib3.connection.HTTPConnection object at 0xffff603eaf30>: Failed to resolve 'airflow-worker-custom-log-config-0' ([Errno -2] Name or service not known)\"))"]

However, the DAG run is still reported as successful:


Field | Value
-- | --
State | success  <----- Success despite error
Run ID | manual__2025-07-09T17:32:56.030514+00:00
Run Type | manual
Run Duration | 2.87s
Last Scheduling Decision | 2025-07-09, 19:32:59
Queued at | 2025-07-09, 19:32:56
Start Date | 2025-07-09, 19:32:56
End Date | 2025-07-09, 19:32:59
Data Interval Start | 2025-07-09, 19:32:44
Data Interval End | 2025-07-09, 19:32:44
Trigger Source | rest_api

This, in turn, leads to

    logger.go:42: 19:34:13 | logging_airflow-3.0.1_openshift-false_executor-celery | skipping kubernetes event logging
=== NAME  kuttl
    harness.go:403: run tests finished
    harness.go:510: cleaning up
    harness.go:567: removing temp folder: ""
--- PASS: kuttl (294.10s)
    --- PASS: kuttl/harness (0.00s)
        --- PASS: kuttl/harness/logging_airflow-3.0.1_openshift-false_executor-celery (294.07s)
PASS

a positive result in the integration test. The same is true for at least the smoke test.

I consider this to be quite frightening and we should investigate further.
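
As a starting point for that investigation, here is a minimal sketch of a stricter check. It assumes the test can get hold of both the reported DAG run state and the task log text; `assess_dag_run` and `ERROR_MARKERS` are hypothetical names, not part of the existing test scripts, and the markers are simply the strings from the log output above.

```python
# Hypothetical helper: a sketch only, not the current test implementation.
# The markers are taken from the error shown in this issue.
ERROR_MARKERS = (
    "Could not read served logs",
    "NameResolutionError",
)


def assess_dag_run(run_state: str, log_text: str) -> tuple[bool, list[str]]:
    """Return (ok, reasons).

    ok is True only if the run state is 'success' AND the log text
    contains none of the known error markers.
    """
    reasons: list[str] = []
    if run_state != "success":
        reasons.append(f"DAG run state is {run_state!r}, expected 'success'")
    for marker in ERROR_MARKERS:
        if marker in log_text:
            reasons.append(f"log contains error marker {marker!r}")
    return (not reasons, reasons)
```

With the state and log from this report, such a check would fail even though the run state is `success`, because the log text contains both markers. Whether string matching on logs is the right criterion is an open question; it only illustrates that the run state alone is not sufficient.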

Status: Done