Skip to content

Add prek hook to auto-update copyright year in NOTICE files#67146

Open
xchwan wants to merge 9 commits into
apache:mainfrom
xchwan:prek_range_year_current
Open

Add prek hook to auto-update copyright year in NOTICE files#67146
xchwan wants to merge 9 commits into
apache:mainfrom
xchwan:prek_range_year_current

Conversation

@xchwan

@xchwan xchwan commented May 19, 2026

Copy link
Copy Markdown
Contributor

issue: #60540

Airflow has 170+ NOTICE files that required manual year updates annually (e.g. 2016-2025 → 2016-2026) with no automation in place.

Adds update-notice-year hook


Was generative AI tooling used to co-author this PR?
  • Yes (please specify the tool below)
  • Claude Code Sonnet 4.6

  • Read the Pull Request Guidelines for more information. Note: commit author/co-author name and email in commits become permanently public when merged.
  • For fundamental code changes, an Airflow Improvement Proposal (AIP) is needed.
  • When adding dependency, check compliance with the ASF 3rd Party License Policy.
  • For significant user-facing changes create newsfragment: {pr_number}.significant.rst, in airflow-core/newsfragments. You can add this file in a follow-up commit after the PR is created so you know the PR number.

Signed-off-by: Xch1 <qchwan@gmail.com>
@boring-cyborg boring-cyborg Bot added area:dev-tools backport-to-v3-2-test Mark PR with this label to backport to v3-2-test branch labels May 19, 2026
@xchwan xchwan changed the title feat:update_notice_year Add prek hook to auto-update copyright year in NOTICE files May 19, 2026
@potiuk

potiuk commented May 19, 2026

Copy link
Copy Markdown
Member

This will have the undesirble side effect that all PRs will be failing 01.01.2027 when everyone who can help might be partying hard (and not wake up early next day).

I am not sure we want it. If anything - this could be manual prek hook, and we should have workflow scheduled maybe every week or so to check if the year changed and post notification on slack in the #internal-* channel like other notifications. The issue is that it's next to impossible to test before Jan 2027

@xchwan

xchwan commented May 20, 2026

Copy link
Copy Markdown
Contributor Author

I think the simplest fix is to strip the year check out of check_notice_files.py and keep only the ASF declaration check. The scheduled notification approach still requires someone to manually run update_notice_year.py at the end anyway. And if the update is done in late December, Copyright 2016-2027 would be incorrect while still in 2026, which check_notice_files.py would then reject — update early and the year is wrong, wait until 01/01 and everyone is at a New Year's party. So:

  • pre-commit only blocks genuine errors where the ASF declaration is missing entirely
  • PRs never fail due to year drift
  • Someone can run update_notice_year.py when they remember, and nothing explodes if they don't

How do you think about this?

@potiuk potiuk added the ready for maintainer review Set after triaging when all criteria pass. label May 24, 2026
Signed-off-by: Xch1 <qchwan@gmail.com>
@xchwan

xchwan commented Jun 5, 2026

Copy link
Copy Markdown
Contributor Author

I went ahead and removed the year check from check_notice_files.py in this PR. My concern is that enforcing the current year in CI guarantees a failure on January 1st every year unless someone manually updates and merges the year bump before then — a chicken-and-egg problem where the fix requires a PR to pass CI, but CI is broken until the fix lands.

The ASF declaration check (The Apache Software Foundation) is retained. For the year itself, I think it's better left to the release manager to run prek run update-notice-year --all-files before the first release of each new year. Happy to hear if there's a better approach.

@jason810496 jason810496 left a comment

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM overall, thanks.

Comment thread scripts/ci/prek/update_notice_year.py Outdated
Comment on lines +52 to +58
print(f"⚠️ {path}: no standard ASF copyright line found, skipping")
return False
if match.group(2) == CURRENT_YEAR:
return False
new_content = COPYRIGHT_RE.sub(rf"\g<1>{CURRENT_YEAR}\3", content)
path.write_text(new_content)
print(f"✅ {path}: updated {match.group(2)} → {CURRENT_YEAR}")

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nits: Though there're already some prek check included the emoji. It would be better to avoid adding more of them if possible.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

emoji removed

Comment on lines +67 to +69
notice_files = sorted(
f for f in repo_root.rglob("NOTICE") if not any(part in EXCLUDE_DIRS for part in f.parts)
)

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't think we need to sort those files before the validation.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

sorted() ensures the output order is deterministic across different operating systems and filesystems, since rglob() does not guarantee a consistent traversal order. Without it, the list of updated files would appear in a different order on each run, making the output harder to read and diff.

Comment thread scripts/ci/prek/check_notice_files.py Outdated
Comment on lines 33 to 49
from __future__ import annotations

import sys
from datetime import datetime
from pathlib import Path

CURRENT_YEAR = str(datetime.now().year)
ASF_DECLARATION = "The Apache Software Foundation"

errors = 0

for notice_file in sys.argv[1:]:
content = Path(notice_file).read_text()

expected = f"Copyright 2016-{CURRENT_YEAR} The Apache Software Foundation"
if "Copyright" in content and expected not in content:
print(f"❌ {notice_file}: Missing expected string: {expected!r}")
if ASF_DECLARATION not in content:
print(f"❌ {notice_file}: Missing expected string: {ASF_DECLARATION!r}")
errors += 1

sys.exit(1 if errors else 0)

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

How about keeping this file as-is but only add the logging to show that we need to run prek run update-notice-year --all-files before exit 1.

Showing the next step to fix the check would be better IMO.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm still a bit concerned — even with the helpful message, all PRs will still be blocked on January 1st until someone runs the command and gets a fix merged. The error message makes it clearer how to fix it, but doesn't prevent the CI from failing in the first place. That's what Jarek motions.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

area:dev-tools backport-to-v3-2-test Mark PR with this label to backport to v3-2-test branch ready for maintainer review Set after triaging when all criteria pass.

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants