Proposed automerge guideline: Require at least `X` lines of Julia code in `/src`, `test`, `docs`/readme/docstrings #351

ericphanson · 2021-03-08T20:52:59Z

X=5 was suggested on Slack by @oxinabox.

If anyone thinks we should not have such a check or would like to bikeshed the value of X, please comment here! (Would rather get that out of the way before implementing it...)

Lyndon also suggested a nice error message:

Error message can be something that's says:

you seem to have registered a near empty package. Did you forget to push your commits?

Name squatting (E.g. for a package you plan to write later) is no permitted.
(Also need to add this to the readme)
please reregregister once the package contains usable functionality.
If we have detected this incorrectly (e.g. you package is short but useful) please comment for manual merge.

The text was updated successfully, but these errors were encountered:

ericphanson · 2022-10-28T11:28:23Z

I think we should do all 3 of:

src must have at least X lines of Julia code
test has a runtests.jl file and at least Y lines of code
readme or docs or docstrings have at least Z lines of text or code

Currently, I haven't found a source-code counter that works on Julia docstrings correctly, so I think we could omit the docs one to start.

In terms of the numbers, on Slack @rmsrosa suggested X > 5, I suggested 10 for all of them, @StefanKarpinski suggested 20, and @GunnarFarneback suggested calibrating against the existing packages to see the cutoffs.

In https://julialang.org/blog/2021/08/general-survey/ we did some of that but we don't have all the info we'd want there, and also the number of packages has grown a lot since then. But if we do go off of those numbers, only 84% of packages have at least 20 lines of tests, and 89% have at least 10 lines. We would have to look at some examples to see if those are "OK" or should be considered "non-compliant" with the intention of the new rules.

I think also we should require this only for new packages (not new versions). Why?

New packages are the most likely violators
New versions generally add lines overall rather than remove them
If we want to manually override the check (e.g. for some light wrapper package, say), then we only have to do it once, for the new package registration, rather than for every version

IMO we could add new versions later, if we figure out an "overrides" system to persist an allowlist that a package is OK that it doesn't have enough lines of src/test/whatever. But to start with I think new packages is already something useful.

DilumAluthge · 2022-10-28T11:30:32Z

Yeah, new packages are the most common place where we notice people registering essentially empty packages.

ericphanson · 2022-10-28T11:35:23Z

In terms of implementation, I already started preparing for this when implementing the license check. In particular, during the AutoMerge process, we place a copy of the source code here:

RegistryCI.jl/src/AutoMerge/types.jl

Line 90 in 0c50828

pkg_code_path::String

. Then we use it for the license check. We can also use it for further checks, e.g. these src/docs/tests checks.

We could implement it as follows:

Add a dependency on PackageAnalyzer.jl, or vendor whatever code we need from there
Add a "guideline" for each of the checks we want to do in https://github.com/JuliaRegistries/RegistryCI.jl/blob/master/src/AutoMerge/guidelines.jl. This would just be a function that calls analyze(path_to_package_code) and passes/fails based on the results.
Add the guidelines to the list here, after guideline_code_can_be_downloaded (which actually downlaods the code)
Add unit tests for the guideline functions, maybe similar to the license check ones: https://github.com/JuliaRegistries/RegistryCI.jl/blob/master/test/automerge-unit.jl#L512

MarkNahabedian · 2023-01-17T15:15:12Z

I have a fork with a first cut at this here:
https://github.com/MarkNahabedian/RegistryCI.jl

The various thresholds are specified by environment variables to make them easy to control from the registration workflow. Does that seem reasonable?

I wrote a unit test that works if I run it by hand. From Pkg.test though it gets a file system error.

Even before I made any changes, Pkg.test fails in futime on a directory. I don't know if this is because I'm running on MSWindows.

I'd like to try running the unit tests on GitHub, but I don't know which workflow to enable, or even if GitHub will let me enable them individually.

ericphanson · 2023-01-17T15:33:51Z

The various thresholds are specified by environment variables to make them easy to control from the registration workflow. Does that seem reasonable?

So far we've mostly handled this via argumetns passed into the "public" functions:

RegistryCI.jl/src/AutoMerge/public.jl

Lines 12 to 38 in 2a2fe14

    
           # Keyword Arguments 
        
           - `merge_new_packages`: should AutoMerge merge registration PRs for new packages 
        
           - `merge_new_versions`: should AutoMerge merge registration PRs for new versions of packages 
        
           - `new_package_waiting_period`: new package waiting period, e.g `Day(3)`. 
        
           - `new_jll_package_waiting_period`: new JLL package waiting period, e.g `Minute(20)`. 
        
           - `new_version_waiting_period`: new package version waiting period, e.g `Minute(10)`. 
        
           - `new_jll_version_waiting_period`: new JLL package version waiting period, e.g `Minute(10)`. 
        
           - `registry`: the registry name you want to run AutoMerge on. 
        
           - `tagbot_enabled`: if tagbot is enabled. 
        
           - `authorized_authors`: list of who can submit registration, e.g `String["JuliaRegistrator"]`. 
        
           - `authorized_authors_special_jll_exceptions`: a list of users who can submit JLL packages (which have strict rules about allowed dependencies and are subject to `new_jll_*_waiting_period`s instead of `new_*_waiting_period`s). 
        
           - `additional_statuses`: list of additional commit statuses that must pass before AutoMerge will merge a PR 
        
           - `additional_check_runs`: list of additional check runs that must pass before AutoMerge will merge a PR 
        
           - `error_exit_if_automerge_not_applicable`: if `false`, AutoMerge will not error on PRs made by non-AutoMerge-authorized users 
        
           - `master_branch`: name of `master_branch`, e.g you may want to specify this to `"main"` for new GitHub repositories. 
        
           - `master_branch_is_default_branch`: if `master_branch` specified above is the default branch. 
        
           - `suggest_onepointzero`: should the AutoMerge comment include a suggestion to tag a 1.0 release for v0.x.y packages. 
        
           - `registry_deps`: list of registry dependencies, e.g your packages may depend on `General`. 
        
           - `api_url`: the registry host API URL, default is `"https://api.github.com"`. 
        
           - `check_license`: check package has a valid license, default is `false`. 
        
           - `public_registries`: If a new package registration has a UUID that matches 
        
              that of a package already registered in one of these registries supplied here 
        
              (and has either a different name or different URL) then an error will be thrown. 
        
              This to prevent AutoMerge from being used for "dependency confusion" 
        
              attacks on those registries. 
        
           - `read_only`: run in read only mode, default is `false`.

I think ENV variables are mostly reasonable since this is likely to be used from CI environments, but that we shouldn't have 2 ways of configuration, and since we already have configuration via arguments, we should continue to do so. That is, I think we shouldn't use ENV variables here.

I wrote a unit test that works if I run it by hand. From Pkg.test though it gets a file system error.
Even before I made any changes, Pkg.test fails in futime on a directory. I don't know if this is because I'm running on MSWindows.
I'd like to try running the unit tests on GitHub, but I don't know which workflow to enable, or even if GitHub will let me enable them individually.

The easiest way to have the tests run automatically (and the easiest way to get review feedback) is to open a pull request. That will cause this repositories CI to automatically run (except for integration tests, which a maintainer will need to trigger via Bors). Please open a pull request from your repository to this one.

MarkNahabedian · 2023-01-17T18:36:50Z

Before I make those changes, are the parameter names I picked ok? I'll lower case them if they're to be command line arguments. From the doc string: Make sure that various line counts meet or exceed specified thresholds. Thresholds are controlled by these environment variables: * SRC_MIN_LINES: Minimum number of lines of source code * README_MIN_LINES: % Minimum number of lines in the README file * TEST_MIN_LINES: % Minimum number of lines of code in the test directory * DOC_MIN_LINES: # Minimum number of lines of documentation Those marked with a % can be expressed as a count, or as a percentage of the number of lines of source code. For TEST_MIN_LINES and DOC_MIN_LINES, the denominatttor of the percentage also includes the number of lines of test code or doc code respectively. On Jan 17, 2023 10:34 AM, Eric Hanson ***@***.***> wrote: The various thresholds are specified by environment variables to make them easy to control from the registration workflow. Does that seem reasonable? So far we've mostly handled this via argumetns passed into the "public" functions: https://github.com/JuliaRegistries/RegistryCI.jl/blob/2a2fe14f60751694546ef1b9148b7be27b93ebec/src/AutoMerge/public.jl#L12-L38 I think ENV variables are mostly reasonable since this is likely to be used from CI environments, but that we shouldn't have 2 ways of configuration, and since we already have configuration via arguments, we should continue to do so. That is, I think we shouldn't use ENV variables here. I wrote a unit test that works if I run it by hand. From Pkg.test though it gets a file system error. Even before I made any changes, Pkg.test fails in futime on a directory. I don't know if this is because I'm running on MSWindows. I'd like to try running the unit tests on GitHub, but I don't know which workflow to enable, or even if GitHub will let me enable them individually. The easiest way to have the tests run automatically (and the easiest way to get review feedback) is to open a pull request<https://docs.github.com/en/pull-requests/collaborating-with-pull-requests/proposing-changes-to-your-work-with-pull-requests/about-pull-requests>. That will cause this repositories CI to automatically run (except for integration tests, which a maintainer will need to trigger via Bors). Please open a pull request from your repository to this one. — Reply to this email directly, view it on GitHub<#351 (comment)>, or unsubscribe<https://github.com/notifications/unsubscribe-auth/AIBY3EZWNZSEX3S4L4TS2UTWS23WVANCNFSM4Y2DVBGQ>. You are receiving this because you commented.Message ID: ***@***.***>

ericphanson · 2023-01-24T14:09:17Z

Those names look reasonable to me

MarkNahabedian · 2023-01-24T20:15:01Z

Pull request: #492

MarkNahabedian · 2023-01-25T20:49:18Z

I'm stuck. The pre-Julia 1.6 builds are failing because PackageAnalyzer is incopatible. Please suggest how I should work around that. Without thinking, I put a version test in the "applicability"" test for the new guideline, but the failure happens long before then, during cmpatibility checking.

ericphanson mentioned this issue Mar 8, 2021

CI should require presence of test/runtest.jl for new packages #313

Open

ericphanson changed the title ~~Proposed automerge guideline: Require at least X lines of Julia code in /src~~ Proposed automerge guideline: Require at least X lines of Julia code in /src, test, docs/readme/docstrings Oct 28, 2022

ericphanson linked a pull request Jan 26, 2023 that will close this issue

Check that there are "enough" tests and documentation #492

Open

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Proposed automerge guideline: Require at least `X` lines of Julia code in `/src`, `test`, `docs`/readme/docstrings #351

Proposed automerge guideline: Require at least `X` lines of Julia code in `/src`, `test`, `docs`/readme/docstrings #351

ericphanson commented Mar 8, 2021 •

edited

Loading

ericphanson commented Oct 28, 2022 •

edited

Loading

DilumAluthge commented Oct 28, 2022

ericphanson commented Oct 28, 2022

MarkNahabedian commented Jan 17, 2023

ericphanson commented Jan 17, 2023

MarkNahabedian commented Jan 17, 2023 via email

ericphanson commented Jan 24, 2023

MarkNahabedian commented Jan 24, 2023

MarkNahabedian commented Jan 25, 2023

Proposed automerge guideline: Require at least X lines of Julia code in /src, test, docs/readme/docstrings #351

Proposed automerge guideline: Require at least X lines of Julia code in /src, test, docs/readme/docstrings #351

Comments

ericphanson commented Mar 8, 2021 • edited Loading

ericphanson commented Oct 28, 2022 • edited Loading

DilumAluthge commented Oct 28, 2022

ericphanson commented Oct 28, 2022

MarkNahabedian commented Jan 17, 2023

ericphanson commented Jan 17, 2023

MarkNahabedian commented Jan 17, 2023 via email

ericphanson commented Jan 24, 2023

MarkNahabedian commented Jan 24, 2023

MarkNahabedian commented Jan 25, 2023

Proposed automerge guideline: Require at least `X` lines of Julia code in `/src`, `test`, `docs`/readme/docstrings #351

Proposed automerge guideline: Require at least `X` lines of Julia code in `/src`, `test`, `docs`/readme/docstrings #351

ericphanson commented Mar 8, 2021 •

edited

Loading

ericphanson commented Oct 28, 2022 •

edited

Loading