Add big codebases that are not formatted with Black #5

Open
JelleZijlstra opened this issue Jan 14, 2022 · 4 comments
Labels
C: projects type: enhancement New feature or request

Comments

@JelleZijlstra
Collaborator

Currently we run diff-shades on repos that are already formatted with Black, but I think it would be useful to add some non-Blackened projects, so we get a sense of how we format code that isn't already mostly Black-like. This is especially relevant for functionality that interacts with the magic trailing comma, like psf/black#2368.

Some ideas:

It looks like diff-shades might get slow enough that we should be selective in what we include.

@ichard26
Owner

We could steal even more ideas from mypy's integration of mypy-primer, namely spreading the projects out over multiple parallel jobs, but that would introduce a fair amount of complexity (and would worsen the job-count issue psf/black already has).
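
To make the parallel-jobs idea concrete, here is a minimal sketch of one way to split projects across jobs so each job gets a similar amount of code. It is only an illustration: `shard_projects`, the project names, and the line counts are all hypothetical, and nothing here reflects how diff-shades or mypy-primer actually shard work.

```python
import heapq


def shard_projects(line_counts, num_jobs):
    """Greedily assign projects to jobs, largest first, to balance total lines.

    `line_counts` maps a project name to its (approximate) line count.
    Both the function and its inputs are hypothetical, for illustration only.
    """
    # Min-heap of (total_lines_so_far, job_index): the least-loaded job pops first.
    heap = [(0, i) for i in range(num_jobs)]
    heapq.heapify(heap)
    shards = [[] for _ in range(num_jobs)]
    # Sort by descending size; break ties by name so the result is deterministic.
    ordered = sorted(line_counts.items(), key=lambda kv: (-kv[1], kv[0]))
    for name, lines in ordered:
        total, idx = heapq.heappop(heap)
        shards[idx].append(name)
        heapq.heappush(heap, (total + lines, idx))
    return shards


# Made-up numbers, purely to show the call shape.
example = {"proj-a": 100_000, "proj-b": 90_000, "proj-c": 20_000, "proj-d": 10_000}
print(shard_projects(example, 2))
```

Each shard would then map to one CI job, which is where the extra complexity and the higher job count mentioned above come from.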

@ichard26
Owner

Another way to deal with the slowness would be to compile the baseline/target revision with mypyc before running it against the projects. Assuming perfect efficiency, this could save 6-10 minutes for an uncached run, or 3-5 minutes for a run where the baseline was cached. This effectively means we are trusting mypyc 100%, but I guess we were heading in that direction already :)

ichard26 added the "type: enhancement" (New feature or request) and "C: projects" labels on Jan 14, 2022
@ichard26
Owner

One more optimization I thought about last night was to reuse the code stored in the analyses instead of recloning the projects. This wouldn't help runs where the baseline wasn't cached, but it would help quick iterations on PRs, saving 20 to 60 seconds depending on network performance. I'd have to store the pyproject.toml contents too, but that's relatively simple compared to more parallelism.
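
A rough sketch of the "reuse instead of reclone" idea, assuming the stored analysis can be reduced to a mapping of relative file paths to source text plus the raw pyproject.toml contents. That shape is an assumption made for illustration, and `materialize_project` is a hypothetical helper, not part of diff-shades.

```python
import tempfile
from pathlib import Path


def materialize_project(files, pyproject_toml, root):
    """Recreate a project tree on disk from stored sources (hypothetical helper).

    `files` maps relative paths to source text; `pyproject_toml` is the stored
    file contents, or None if the project had no pyproject.toml.
    """
    root = Path(root)
    for rel_path, source in files.items():
        dest = root / rel_path
        dest.parent.mkdir(parents=True, exist_ok=True)
        dest.write_text(source, encoding="utf-8")
    if pyproject_toml is not None:
        # The formatter reads its configuration from here, so it has to be restored too.
        (root / "pyproject.toml").write_text(pyproject_toml, encoding="utf-8")
    return root


# Usage: rebuild a tiny made-up project in a scratch directory.
with tempfile.TemporaryDirectory() as tmp:
    stored = {"pkg/__init__.py": "", "pkg/mod.py": "x = 1\n"}
    project = materialize_project(stored, "[tool.black]\nline-length = 100\n", tmp)
    print(sorted(p.relative_to(project).as_posix() for p in project.rglob("*") if p.is_file()))
```

This skips the network entirely, which is where the 20 to 60 seconds would come from; it does not validate that the stored paths stay inside `root`.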

@ichard26
Owner

For what it's worth, we probably want to avoid any projects beyond half a million lines of code, since they are slow to format and start to worsen the signal-to-noise / time ratio. IMO, maintaining / improving variety is more important. If we really want to add bigger projects, we might want to cut down on the line count using --exclude.
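
To check a candidate project against that budget, something like the sketch below could estimate its line count with and without an exclusion regex. `count_python_lines` and `MAX_LINES` are hypothetical names. The matching (a regex searched against the root-relative path with a leading slash) is meant to approximate how Black applies --exclude, but treat that as an assumption rather than a faithful reimplementation; it ignores Black's default excludes and .gitignore handling.

```python
import re
from pathlib import Path

MAX_LINES = 500_000  # the "half a million lines" budget suggested above


def count_python_lines(root, exclude=None):
    """Count lines in *.py files under `root`, skipping paths that match `exclude`."""
    pattern = re.compile(exclude) if exclude else None
    total = 0
    for path in sorted(Path(root).rglob("*.py")):
        if not path.is_file():
            continue
        # Root-relative, forward-slash path with a leading "/", e.g. "/tests/test_x.py".
        rel = "/" + path.relative_to(root).as_posix()
        if pattern and pattern.search(rel):
            continue
        with path.open("rb") as f:
            total += sum(1 for _ in f)
    return total


def fits_budget(root, exclude=None):
    """Return True if the project stays within the line budget after exclusions."""
    return count_python_lines(root, exclude) <= MAX_LINES
```

For example, `fits_budget("path/to/checkout", r"/(tests|vendor)/")` would report whether a checkout stays under the budget once those (made-up) directories are skipped.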
