Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add Pre-commit Copyright Check #181

Merged
merged 16 commits into from
Feb 1, 2024
Merged

Conversation

YanxuanLiu
Copy link
Collaborator

@YanxuanLiu YanxuanLiu commented Jan 24, 2024

fix #169

Add auto-copyright scripts to check and auto-correct copyright in pre-commit process.

fix #179

Add Check License Headers Action to check if files have a license header.
You can define the license check rules in .github/workflows/license-check/license-check.py

Signed-off-by: YanxuanLiu <[email protected]>
@gerashegalov
Copy link
Collaborator

gerashegalov commented Jan 24, 2024

We also need to run a required check for missing copyrights in a self-hosted CI or in a GH runner.

There may be better solutions like Maven RAT and pre-commit hooks but I used the following command to find files with missing headers for #180 git ls-files | xargs -I% grep -Hc 'Licensed under the Apache License' "%" | grep ':0'

yielding

.github/ISSUE_TEMPLATE/bug_report.md:0
.github/ISSUE_TEMPLATE/documentation-request.md:0
.github/ISSUE_TEMPLATE/feature_request.md:0
.github/ISSUE_TEMPLATE/submit-question.md:0
CODE_OF_CONDUCT.md:0
CONTRIBUTING.md:0
NOTICE:0
README.md:0
SECURITY.md:0
TPC EULA.txt:0
nds/README.md:0
nds/tpcds-gen/README.md:0
nds/tpcds-gen/patches/code.patch:0
nds/tpcds-gen/patches/templates.patch:0

If it is confirmed as a legitimate set of files without license headers, in this PR we should store it as the list of excluded files / wildcards. The PR check will then look if an additional violation is introduced that is not covered by the exclusion list.

@gerashegalov
Copy link
Collaborator

This PR looks more like a fix to #169 rather than #179 about license comments missing completely.

Signed-off-by: YanxuanLiu <[email protected]>
@YanxuanLiu YanxuanLiu marked this pull request as draft January 25, 2024 04:07
Signed-off-by: YanxuanLiu <[email protected]>
Signed-off-by: YanxuanLiu <[email protected]>
Signed-off-by: YanxuanLiu <[email protected]>
Signed-off-by: YanxuanLiu <[email protected]>
Signed-off-by: YanxuanLiu <[email protected]>
Signed-off-by: YanxuanLiu <[email protected]>
Signed-off-by: YanxuanLiu <[email protected]>
Signed-off-by: YanxuanLiu <[email protected]>
Signed-off-by: YanxuanLiu <[email protected]>
Signed-off-by: YanxuanLiu <[email protected]>
Signed-off-by: YanxuanLiu <[email protected]>
@YanxuanLiu
Copy link
Collaborator Author

YanxuanLiu commented Jan 25, 2024

Added Check License Headers
The action result lists all files without license header, and files excluded.

Run viperproject/check-license-header@v2
Warning: Config does not cover the file 'CODE_OF_CONDUCT.md'
Warning: Config does not cover the file 'CONTRIBUTING.md'
Warning: Config does not cover the file 'nds/README.md'
Warning: Config does not cover the file 'nds/tpcds-gen/README.md'
Warning: Config does not cover the file 'README.md'
Warning: Config does not cover the file 'SECURITY.md'
Error: 'nds/tpcds-gen/patches/code.patch' does not contain license from '.github/workflows/license-check/license-Apache.txt'
Error: 'nds/tpcds-gen/patches/templates.patch' does not contain license from '.github/workflows/license-check/license-Apache.txt'
Error: 'NOTICE' does not contain license from '.github/workflows/license-check/license-Apache.txt'
Error: 'TPC EULA.txt' does not contain license from '.github/workflows/license-check/license-Apache.txt'
Error: 4 error(s) and [6](https://github.com/YanxuanLiu/spark-rapids-benchmarks/actions/runs/7650544819/job/20846798019#step:3:7) warning(s) found. Warnings are treated as errors.

@YanxuanLiu YanxuanLiu marked this pull request as ready for review January 25, 2024 06:13
NvTimLiu
NvTimLiu previously approved these changes Jan 25, 2024
Copy link
Collaborator

@NvTimLiu NvTimLiu left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good to me

@NvTimLiu
Copy link
Collaborator

I think we can also application these license check actions into other GitHub spark projects like spark-rapids-jni/spark-rapids/spark-rapids-ml/spark-rapids-examples, etc.

@gerashegalov
Copy link
Collaborator

+1, cc @GaryShen2008 regarding common build infra offline conversation on this topic

I think we can also application these license check actions into other GitHub spark projects like spark-rapids-jni/spark-rapids/spark-rapids-ml/spark-rapids-examples, etc.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We may first start with our own shell-script based check to the tune of #181 (comment)

because this check is not yet approved to run in our account

Error: .github#L1
viperproject/check-license-header@v2 is not allowed to be used in NVIDIA/spark-rapids-benchmarks. Actions in this workflow must be: within a repository that belongs to your Enterprise account

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That's too bad, let me build another script for this check.

Signed-off-by: YanxuanLiu <[email protected]>
@YanxuanLiu
Copy link
Collaborator Author

YanxuanLiu commented Jan 30, 2024

Replaced third party function with .github/workflows/license-check/license-check.py

Now the action can run as expected.

@gerashegalov
Copy link
Collaborator

We can see if the check is working in the status of the PR introducing it. This iteration is not a complete fix https://github.com/NVIDIA/spark-rapids-benchmarks/actions/runs/7707505941/job/21004813999?pr=181

We need more excludes

ERROR: ./nds/tpcds-gen/patches/templates.patch does not contain license header.
ERROR: ./NOTICE does not contain license header.
ERROR: ./nds/tpcds-gen/patches/code.patch does not contain license header.
ERROR: ./TPC EULA.txt does not contain license header.

Signed-off-by: YanxuanLiu <[email protected]>
@YanxuanLiu
Copy link
Collaborator Author

We can see if the check is working in the status of the PR introducing it. This iteration is not a complete fix https://github.com/NVIDIA/spark-rapids-benchmarks/actions/runs/7707505941/job/21004813999?pr=181

We need more excludes

ERROR: ./nds/tpcds-gen/patches/templates.patch does not contain license header.
ERROR: ./NOTICE does not contain license header.
ERROR: ./nds/tpcds-gen/patches/code.patch does not contain license header.
ERROR: ./TPC EULA.txt does not contain license header.

Added more excludes: NOTICE, TPC EULA.txt, **/*.patch. For TPC EULA.txt, I'm not sure if we should exclude this file or **/*.txt

Copy link
Collaborator

@gerashegalov gerashegalov left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

@YanxuanLiu YanxuanLiu merged commit df7fa26 into NVIDIA:dev Feb 1, 2024
2 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
3 participants