Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Proposed automerge guideline: Require at least X lines of Julia code in /src, test, docs/readme/docstrings #351

Open
ericphanson opened this issue Mar 8, 2021 · 9 comments · May be fixed by #492

Comments

@ericphanson
Copy link
Member

ericphanson commented Mar 8, 2021

X=5 was suggested on Slack by @oxinabox.

If anyone thinks we should not have such a check or would like to bikeshed the value of X, please comment here! (Would rather get that out of the way before implementing it...)

Lyndon also suggested a nice error message:

Error message can be something that's says:

  • you seem to have registered a near empty package. Did you forget to push your commits?
  • Name squatting (E.g. for a package you plan to write later) is no permitted.
    (Also need to add this to the readme)
    please reregregister once the package contains usable functionality.
    If we have detected this incorrectly (e.g. you package is short but useful) please comment for manual merge.
@ericphanson
Copy link
Member Author

ericphanson commented Oct 28, 2022

I think we should do all 3 of:

  • src must have at least X lines of Julia code
  • test has a runtests.jl file and at least Y lines of code
  • readme or docs or docstrings have at least Z lines of text or code

Currently, I haven't found a source-code counter that works on Julia docstrings correctly, so I think we could omit the docs one to start.

In terms of the numbers, on Slack @rmsrosa suggested X > 5, I suggested 10 for all of them, @StefanKarpinski suggested 20, and @GunnarFarneback suggested calibrating against the existing packages to see the cutoffs.

In https://julialang.org/blog/2021/08/general-survey/ we did some of that but we don't have all the info we'd want there, and also the number of packages has grown a lot since then. But if we do go off of those numbers, only 84% of packages have at least 20 lines of tests, and 89% have at least 10 lines. We would have to look at some examples to see if those are "OK" or should be considered "non-compliant" with the intention of the new rules.

I think also we should require this only for new packages (not new versions). Why?

  • New packages are the most likely violators
  • New versions generally add lines overall rather than remove them
  • If we want to manually override the check (e.g. for some light wrapper package, say), then we only have to do it once, for the new package registration, rather than for every version

IMO we could add new versions later, if we figure out an "overrides" system to persist an allowlist that a package is OK that it doesn't have enough lines of src/test/whatever. But to start with I think new packages is already something useful.

@DilumAluthge
Copy link
Member

Yeah, new packages are the most common place where we notice people registering essentially empty packages.

@ericphanson
Copy link
Member Author

In terms of implementation, I already started preparing for this when implementing the license check. In particular, during the AutoMerge process, we place a copy of the source code here:

pkg_code_path::String
. Then we use it for the license check. We can also use it for further checks, e.g. these src/docs/tests checks.

We could implement it as follows:

  1. Add a dependency on PackageAnalyzer.jl, or vendor whatever code we need from there
  2. Add a "guideline" for each of the checks we want to do in https://github.com/JuliaRegistries/RegistryCI.jl/blob/master/src/AutoMerge/guidelines.jl. This would just be a function that calls analyze(path_to_package_code) and passes/fails based on the results.
  3. Add the guidelines to the list here, after guideline_code_can_be_downloaded (which actually downlaods the code)
  4. Add unit tests for the guideline functions, maybe similar to the license check ones: https://github.com/JuliaRegistries/RegistryCI.jl/blob/master/test/automerge-unit.jl#L512

@ericphanson ericphanson changed the title Proposed automerge guideline: Require at least X lines of Julia code in /src Proposed automerge guideline: Require at least X lines of Julia code in /src, test, docs/readme/docstrings Oct 28, 2022
@MarkNahabedian
Copy link
Contributor

I have a fork with a first cut at this here:
https://github.com/MarkNahabedian/RegistryCI.jl

The various thresholds are specified by environment variables to make them easy to control from the registration workflow. Does that seem reasonable?

I wrote a unit test that works if I run it by hand. From Pkg.test though it gets a file system error.

Even before I made any changes, Pkg.test fails in futime on a directory. I don't know if this is because I'm running on MSWindows.

I'd like to try running the unit tests on GitHub, but I don't know which workflow to enable, or even if GitHub will let me enable them individually.

@ericphanson
Copy link
Member Author

The various thresholds are specified by environment variables to make them easy to control from the registration workflow. Does that seem reasonable?

So far we've mostly handled this via argumetns passed into the "public" functions:

# Keyword Arguments
- `merge_new_packages`: should AutoMerge merge registration PRs for new packages
- `merge_new_versions`: should AutoMerge merge registration PRs for new versions of packages
- `new_package_waiting_period`: new package waiting period, e.g `Day(3)`.
- `new_jll_package_waiting_period`: new JLL package waiting period, e.g `Minute(20)`.
- `new_version_waiting_period`: new package version waiting period, e.g `Minute(10)`.
- `new_jll_version_waiting_period`: new JLL package version waiting period, e.g `Minute(10)`.
- `registry`: the registry name you want to run AutoMerge on.
- `tagbot_enabled`: if tagbot is enabled.
- `authorized_authors`: list of who can submit registration, e.g `String["JuliaRegistrator"]`.
- `authorized_authors_special_jll_exceptions`: a list of users who can submit JLL packages (which have strict rules about allowed dependencies and are subject to `new_jll_*_waiting_period`s instead of `new_*_waiting_period`s).
- `additional_statuses`: list of additional commit statuses that must pass before AutoMerge will merge a PR
- `additional_check_runs`: list of additional check runs that must pass before AutoMerge will merge a PR
- `error_exit_if_automerge_not_applicable`: if `false`, AutoMerge will not error on PRs made by non-AutoMerge-authorized users
- `master_branch`: name of `master_branch`, e.g you may want to specify this to `"main"` for new GitHub repositories.
- `master_branch_is_default_branch`: if `master_branch` specified above is the default branch.
- `suggest_onepointzero`: should the AutoMerge comment include a suggestion to tag a 1.0 release for v0.x.y packages.
- `registry_deps`: list of registry dependencies, e.g your packages may depend on `General`.
- `api_url`: the registry host API URL, default is `"https://api.github.com"`.
- `check_license`: check package has a valid license, default is `false`.
- `public_registries`: If a new package registration has a UUID that matches
that of a package already registered in one of these registries supplied here
(and has either a different name or different URL) then an error will be thrown.
This to prevent AutoMerge from being used for "dependency confusion"
attacks on those registries.
- `read_only`: run in read only mode, default is `false`.

I think ENV variables are mostly reasonable since this is likely to be used from CI environments, but that we shouldn't have 2 ways of configuration, and since we already have configuration via arguments, we should continue to do so. That is, I think we shouldn't use ENV variables here.

I wrote a unit test that works if I run it by hand. From Pkg.test though it gets a file system error.
Even before I made any changes, Pkg.test fails in futime on a directory. I don't know if this is because I'm running on MSWindows.
I'd like to try running the unit tests on GitHub, but I don't know which workflow to enable, or even if GitHub will let me enable them individually.

The easiest way to have the tests run automatically (and the easiest way to get review feedback) is to open a pull request. That will cause this repositories CI to automatically run (except for integration tests, which a maintainer will need to trigger via Bors). Please open a pull request from your repository to this one.

@MarkNahabedian
Copy link
Contributor

MarkNahabedian commented Jan 17, 2023 via email

@ericphanson
Copy link
Member Author

Those names look reasonable to me

@MarkNahabedian
Copy link
Contributor

Pull request: #492

@MarkNahabedian
Copy link
Contributor

I'm stuck. The pre-Julia 1.6 builds are failing because PackageAnalyzer is incopatible. Please suggest how I should work around that. Without thinking, I put a version test in the "applicability"" test for the new guideline, but the failure happens long before then, during cmpatibility checking.

@ericphanson ericphanson linked a pull request Jan 26, 2023 that will close this issue
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging a pull request may close this issue.

3 participants