Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

ADR for Python Analytics Processing #1418

Open
wants to merge 2 commits into
base: main
Choose a base branch
from

Conversation

kcirtapfromspace
Copy link
Contributor

Description of change

ADR for Python Analytics Processing

How to test

Read, review

Issue(s)

Checklists

Every PR

  • Meets issue criteria
  • JIRA ticket status updated
  • Code is meaningfully tested
  • Meets accessibility standards (WCAG 2.1 Levels A, AA)
  • API Documentation updated
  • Boundary diagram updated
  • Logical Data Model updated
  • Architectural Decision Records written for major infrastructure decisions
  • UI review complete

Production Deploy

  • Staging smoke test completed

After merge/deploy

  • Update JIRA ticket status


## Alternatives Considered

We have considered several other alternatives, such as using a serverless architecture, deploying the code as a standalone service, and using Kubernetes. However, we have rejected these alternatives for reasons such as complexity, cost, and mainly lack of compatibility with cloud.gov and its technical limitations. There is also the option of using a different cloud provider which would reduce friction, but we have decided to stick with cloud.gov as the boundary would be too large to change at this point. We have also considered using a different programming language, such as R, but we have decided to stick with Python as it is the language that the model was developed in and offers the most flexibility in terms of deployment options.
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Curious about whether or not we think using GovCloud just for this would involve ATO. I presume that it wouldn't. If that's the case, then spinning up a separate server or leveraging a lambda over there entirely is something we eliminated because of the other reasons mentioned? I.E. complexity or boundary changing?

It feels to me like a much more straightforward way to go about this (although it might be more work up front), at least in part because we want to end up there at the end of the day.

--var SESSION_SECRET=${<< parameters.session_secret >>}
...
```
### Node Child Processes
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

My only thought on the two options is that child processes make me kind of uneasy, but that could be unfounded because I can't really elaborate why. As long as it is doable resource wise, it may be the easiest way to proceed.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

My biggest concern here (if it is valid) would be the shared memory footprint.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not a fan of managing an additional set of package dependencies that may or may not persist.

@kryswisnaskas
Copy link
Collaborator

Nicely written!

* Added responsibility for all security updates and bug fixes.
* More compliance responsibility means more work.
* Increases boundary between the application and the infrastructure.
* May need additional ATO for Docker container.
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We can point out that by establishing only private routes between the application and the Docker container, this risk can be mitigated.

* Can use existing Node.js infrastructure.

**Cons**

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Would this approach share the same memory as our App.

IE if the app is limited to 2gb, that would be shared between the app and the child spawn?

* Reliable and scalable option for our production environment.
* Can build container images and run containers on local workstation.
Fine-grained control over compilation and root filesystem.
* Docker containers are portable and can be easily moved between environments.
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Would have its own memory allocation.


## Context

Our team has developed a natural language processing model for the identifying duplicate goals using python. We need to productionize this model and deploy it to our cloud infrastructure on cloud.gov, which has specific technical limitations. We have two design options to consider - containerizing the Python code and deploying it using cloud.govs underlining technology through Cloud Foundry's Docker support, or using Node child processes to wrap execution Python code on one of the workers.
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not sure if you want to break this up a bit:

From:
We have two design options to consider - containerizing the Python code and deploying it using cloud.govs underlining technology through Cloud Foundry's Docker support, or using Node child processes to wrap execution Python code on one of the workers.

To:
We have two design options to consider - The first is containerizing the Python code and deploying it using cloud.gov's underlining technology in Cloud Foundry. The second option would be to spawn a node child processes to execute the python code.

or something with this breakdown

@kryswisnaskas
Copy link
Collaborator

I am fine with either solution. The next step would be to pick one and update the boundry diagram to start conversations with the IPT board.

@thewatermethod
Copy link
Collaborator

Should this be merged into our repo?

@nvms
Copy link
Collaborator

nvms commented Jul 12, 2024

Let's update the conclusion with the path we've chosen and get this merged in 🙏

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants