Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Reorganize Python deps and use pip-compile #7874

Open
wants to merge 6 commits into
base: main
Choose a base branch
from
Open

Conversation

willbarton
Copy link
Member

@willbarton willbarton commented Jul 19, 2023

This change is to simplify our Python dependency specifications and to use pip-tools's pip-compile to generate hashed requirements files with the full dependency tree from input files. This also performs dependency resolution and will highlight any conflicting versions. The model for this is that of Mozilla's Bedrock project, the Django project that serves mozilla.org.

The new file structure is as follows:

  • deployment.in: requirements to run consumerfinance.gov in any environment
  • test.in: requirements for executing Python tests locally or in CI
  • dev.in: requirements for development work, running, and testing
  • docs.txt: requirements to build the consumerfinance.gov docs.
  • scripts.txt: Requirements for running certain jobs on Jenkins, so scripts can run in Jenkins without having to install all the other requirements.

The contents of base.txt, django.txt, wagtail.txt, and libraries.txt move into deployment.in, which contains the minimum necessary to run consumerfinance.gov. test.in includes tools like coverage, diff_cover, and tox that are required to run our Python tests. docs.in and scripts.in are relatively unchanged, and dev.in combines all of the above to construct a local environment.

These relationships between files now look like this:

flowchart TD
    deployment.in
    test.in
    dev.in
    scripts.in
    docs.in

    deployment.in --> dev.in
    test.in --> dev.in
    scripts.in --> dev.in
Loading

Where previously they looked like this:

flowchart TD
    ci.txt
    docs.txt

    base.txt
    deployment.txt
    django.txt
    libraries.txt
    local.txt
    wagtail.txt

    scripts.txt
    test.txt

    django.txt --> base.txt
    wagtail.txt --> base.txt
    libraries.txt --> base.txt

    base.txt --> deployment.txt

    base.txt --> local.txt
Loading

Notes and todos

If this approach is okay @chosak and anyone else who wants to review:

  • We'll need to update the requirements files in Snyk.
  • We'll need to document the process of updating/adding a requirement and running ./compile-requirements.sh.

Checklist

  • PR has an informative and human-readable title
  • Changes are limited to a single goal (no scope creep)

@willbarton willbarton force-pushed the pip-tools branch 2 times, most recently from 8c410d4 to ea63f51 Compare July 21, 2023 14:24
docs/index.md Outdated Show resolved Hide resolved
Copy link
Member

@chosak chosak left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

At a high level, what does this reorganization get us? I get that it gives us SHAs for the packages we install, but what benefit does that provide over the already-pinned version numbers?

Did you look at using pyproject.toml instead, as documented here? That would avoid the need to have so many files floating around.

I do like the collapsing of django.txt and wagtail.txt into the regular requirements file.

requirements/docs.txt Show resolved Hide resolved
@willbarton
Copy link
Member Author

willbarton commented Aug 16, 2023

At a high level, what does this reorganization get us? I get that it gives us SHAs for the packages we install, but what benefit does that provide over the already-pinned version numbers?

The big thing it gets us is dependency resolution with failures when a conflict happens. We get to specify our direct requirements in the .in files, and pip-compile will do concrete resolution of all dependencies these need and pin them in the .txt files (yes, with SHAs, etc). Any conflicts will fail the compilation.

@willbarton willbarton marked this pull request as ready for review August 22, 2023 16:06
@willbarton
Copy link
Member Author

Snyk will fail on this PR until we update the requirements files in Snyk.

When I have an approval I'll do that, before merging.

Copy link
Member

@chosak chosak left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

At a high level, renaming deployment.txt to prod.txt feels potentially confusing to me since we use that set of deployment requirements for non-production environments. Can we leave it as deployment.txt?

It'd also be nice to have some kind of automation around keeping the .in/.txt files synchronized. I didn't see any obvious mechanism for this in the bedrock repo, so perhaps this hasn't been an issue for them.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The create_research_report.groovy file in our internal Jenkins job repo references this file. I don't know if/how that job is being run but it will need to be updated if this file is going away.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, there are a few separate updates that need to be made. I was waiting to see if this even makes sense.

requirements/prod.in Outdated Show resolved Hide resolved
Dockerfile Outdated
@@ -48,7 +48,7 @@ RUN apk add --no-cache --virtual .backend-deps bash curl postgresql

# Install python requirements
COPY requirements requirements
RUN cp -Rfp /build/* /usr/local && rm -Rf /build && pip install -r requirements/local.txt
RUN cp -Rfp /build/* /usr/local && rm -Rf /build && pip install -r requirements/prod.txt
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It seems like prod.txt is being installed twice now?

#!/bin/bash

# ==========================================================================
# Import data from a gzipped dump. Provide the filename as the first arg.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Description needs to be updated.

docs/index.md Outdated
Comment on lines 38 to 42
- `prod.in`: requirements for deployment to production and other servers
- `test.in`: requirements for executing Python tests locally or in CI
- `dev.in`: requirements for development work, running, and testing
- `docs.in`: requirements to build the consumerfinance.gov docs.
- `scripts.in`: Requirements for running our smoke test and alert polling scripts without having to install all the other requirements.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Super nitpicky: inconsistent use of periods for these bullets.

- `wagtail.txt`: specifies Wagtail version. In its own file to make it easier to test multiple versions, same as with `django.txt`.
- `prod.in`: requirements for deployment to production and other servers
- `test.in`: requirements for executing Python tests locally or in CI
- `dev.in`: requirements for development work, running, and testing
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This says "testing" but isn't that what test.in is for?

requirements/dev.in Outdated Show resolved Hide resolved
@willbarton
Copy link
Member Author

willbarton commented Sep 5, 2023

At a high level, renaming deployment.txt to prod.txt feels potentially confusing to me since we use that set of deployment requirements for non-production environments. Can we leave it as deployment.txt?

I renamed it because I've switched the order of inheritance around a bit. prod now contains the minimum necessary dependencies for running the site, anywhere we need to run it. Previously, we cobbled that min set together from several different files, the composition of which depended on where we might be running.

I can rename it, but I don't feel like deployment accurately reflects its use?

@willbarton
Copy link
Member Author

willbarton commented Dec 1, 2023

As next steps here, I'm going to work on:

  • Rebasing and getting it up to date
  • Renaming prod.txt back to deployment.txt
  • Using pyproject.toml
  • A GHA to ensure that the .txt files are in sync with the .in files

@willbarton willbarton force-pushed the pip-tools branch 8 times, most recently from 74128b0 to 9db5398 Compare December 4, 2023 13:48
@willbarton
Copy link
Member Author

Using pyproject.toml

I'm not going to do this for now. I like the idea of organizing the dependencies in a single location, but there doesn't seem to be a way to use pyproject.toml without accidentally making this technically python packageable, which could... introduce a lot of unexpected weirdness without a lot more work.

@willbarton
Copy link
Member Author

It's not entirely clear to me why both backend and filter-backend are running. Digging into the filter-backend runs, it's using this workflow file: https://github.com/cfpb/consumerfinance.gov/actions/runs/7087530960/workflow?pr=7874

Whereas the backend run is using this workflow file: https://github.com/cfpb/consumerfinance.gov/actions/runs/7087530958/workflow?pr=7874

So, this looks like filter-backend is being invoked using the old workflow file that does not include requirements/**.in, so the jobs are being skipped. backend is using the workflow file from the PR that adds requirements/**.in, so the jobs in it are not being skipped.

What I don't understand is why filter-backend is using the old workflow file? But... I don't know that this lack of knowledge on my part is a blocker.

@@ -16,7 +16,7 @@ We have six satellite apps that are maintained outside of the consumerfinance.go
- [teachers-digital-platform](https://github.com/cfpb/teachers-digital-platform)

These satellite apps are imported into consumerfinance.gov as part of the project
[requirements files](https://github.com/cfpb/consumerfinance.gov/blob/main/requirements/libraries.txt).
[requirements files](https://github.com/cfpb/consumerfinance.gov/blob/main/requirements/deployment.txt).
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should this point to deployment.in instead?

@@ -16,7 +16,7 @@ We have six satellite apps that are maintained outside of the consumerfinance.go
- [teachers-digital-platform](https://github.com/cfpb/teachers-digital-platform)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't think I can let you get away with editing this file without correcting this part above 😄 TDP is no longer a satellite app, neither is ccdb5-ui, so we only have two satellite apps now, not 2.

@@ -50,7 +50,7 @@ want or need to test their work as part of the larger consumerfinance.gov projec

The standard [installation](../installation/) process for consumerfinance.gov
includes whatever versions of these packages are specified in project
[requirements files](https://github.com/cfpb/consumerfinance.gov/blob/main/requirements/libraries.txt).
[requirements files](https://github.com/cfpb/consumerfinance.gov/blob/main/requirements/deployment.txt).
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ditto re: deployment.in.

Comment on lines 1 to 3
django==3.2.23
wagtail==4.2.4

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should we consider merging these into the main list for simplicity? I get that these are the two biggies but we also have things like elasticsearch which are also pretty big...

@@ -22,6 +24,7 @@ elasticsearch<7.11 # Keep pinned to the deployed ES version
govdelivery==1.4.0
Jinja2==3.1.2
lxml==4.9.1
newrelic==8.5.0
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Merging this into the main list of requirements necessitates rewriting this part of the docs.

@chosak
Copy link
Member

chosak commented Dec 4, 2023

The deployable-zipfile flow needs to be updated since it references deployment.txt here.

(While you're in there it would be nice to also remove these references:

webfonts_path="$cfgov_refresh_volume/static.in/cfgov-fonts"

to the separate fonts directory here because this is no longer needed as of #7025.)

This change is to simplify our Python dependency specifications and to use [pip-tools](https://pypi.org/project/pip-tools/#description)'s `pip-compile` to generate hashed requirements files with the full dependency tree from input files. This also performs dependency resolution and will highlight any conflicting versions. The model for this is that of [Mozilla's Bedrock project, the Django project that serves mozilla.org](https://github.com/mozilla/bedrock).

The new file structure is as follows:

- `deployment.in`: requirements for deployment to production and other servers
- `test.in`: requirements for executing Python tests locally or in CI
- `dev.in`: requirements for development work, running, and testing
- `docs.txt`: requirements to build the consumerfinance.gov docs.
- `scripts.txt`: Requirements for running certain jobs on Jenkins, so scripts can run in Jenkins without having to install all the other requirements.

The contents of `base.txt`, `django.txt`, `wagtail.txt`, and `libraries.txt` move into `deployment.in`, which contains the minimum necessary to run consumerfinance.gov. `test.in` includes tools like `coverage`, `diff_cover`, and `tox` that are required to run our Python tests. `docs.in` and `scripts.in` are relatively unchanged, and `dev.in` combines all of the above to construct a local environment.

These relationships between files now look like this:

```mermaid
flowchart TD
    deployment.in
    test.in
    dev.in
    docs.in
    scripts.in

    deployment.in --> dev.in
    test.in --> dev.in
    scripts.in --> dev.in
```

Where previously they looked like this:

```mermaid
flowchart TD
    ci.txt
    docs.txt

    base.txt
    deployment.txt
    django.txt
    libraries.txt
    local.txt
    wagtail.txt

    scripts.txt
    test.txt

    django.txt --> base.txt
    wagtail.txt --> base.txt
    libraries.txt --> base.txt

    base.txt --> deployment.txt

    base.txt --> local.txt
```
Update tox more
This will check that our requirements are appropriately compiled.
@willbarton
Copy link
Member Author

The deployable-zipfile flow needs to be updated since it references deployment.txt here.

What needs to change here? deployment.txt still exists and should still be the correct file?

This change updates our New Relic docs to remove information that is no longer valid.

It's not clear to me that this entire section is particularly useful to document here though.
@chosak
Copy link
Member

chosak commented Dec 7, 2023

What needs to change here?

My bad - I must have been confused about which files were staying and which were going. Belay that comment.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants