ADR for Python Analytics Processing #1418
## Alternatives Considered
We considered several alternatives, including a serverless architecture, deploying the code as a standalone service, and Kubernetes. We rejected them because of complexity, cost, and, above all, incompatibility with cloud.gov and its technical limitations. Using a different cloud provider would reduce friction, but we decided to stay with cloud.gov because the boundary change would be too large to make at this point. We also considered a different programming language, such as R, but decided to stay with Python because it is the language the model was developed in and offers the most flexibility in deployment options.
I'm curious whether using GovCloud just for this would involve an ATO. I presume it wouldn't. If that's the case, did we eliminate spinning up a separate server or leveraging a lambda over there entirely because of the other reasons mentioned, i.e. complexity or the boundary change?
It feels to me like a much more straightforward way to go about this (although it might be more work up front), at least in part because that is where we want to end up eventually.
```
--var SESSION_SECRET=${<< parameters.session_secret >>}
...
```
### Node Child Processes
My only thought on the two options is that child processes make me kind of uneasy, but that could be unfounded because I can't really articulate why. As long as it is doable resource-wise, it may be the easiest way to proceed.
My biggest concern here (if it is valid) would be the shared memory footprint.
I'm not a fan of managing an additional set of package dependencies that may or may not persist.
Nicely written!
* Added responsibility for all security updates and bug fixes.
* More compliance responsibility means more work.
* Increases boundary between the application and the infrastructure.
* May need additional ATO for Docker container.
We can point out that by establishing only private routes between the application and the Docker container, this risk can be mitigated.
* Can use existing Node.js infrastructure.

**Cons**
Would this approach share the same memory as our app?
I.e., if the app is limited to 2 GB, would that limit be shared between the app and the spawned child process?
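One way to gather data for this question is to have the Python script report its own peak memory, so the footprint can be compared against the app's limit. Below is a minimal sketch using Python's standard `resource` module (Unix only). The function names `peak_memory_mb` and `within_budget` are hypothetical, and whether the child's usage counts against the app's quota would still need to be confirmed on cloud.gov.

```python
import resource
import sys


def peak_memory_mb() -> float:
    """Return this process's peak resident set size in megabytes."""
    peak = resource.getrusage(resource.RUSAGE_SELF).ru_maxrss
    # ru_maxrss is reported in kilobytes on Linux and in bytes on macOS.
    divisor = 1024 * 1024 if sys.platform == "darwin" else 1024
    return peak / divisor


def within_budget(budget_mb: float) -> bool:
    """Check the peak footprint against an assumed memory budget."""
    return peak_memory_mb() <= budget_mb
```

For example, the script could log `peak_memory_mb()` after loading the model, and a call such as `within_budget(2048)` would compare it against the 2 GB figure mentioned above.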
* Reliable and scalable option for our production environment.
* Can build container images and run containers on local workstation.
* Fine-grained control over compilation and root filesystem.
* Docker containers are portable and can be easily moved between environments.
Would have its own memory allocation.
## Context
Our team has developed a natural language processing model in Python for identifying duplicate goals. We need to productionize this model and deploy it to our cloud infrastructure on cloud.gov, which has specific technical limitations. We have two design options to consider: containerizing the Python code and deploying it on cloud.gov's underlying technology through Cloud Foundry's Docker support, or using Node child processes to wrap execution of the Python code on one of the workers.
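To make the child-process option more concrete, here is a minimal sketch of the Python side that a Node child process could launch. It assumes a JSON-over-stdin/stdout contract, which the ADR does not specify. The names `find_duplicates` and `run` are hypothetical, and the exact-match grouping is only a placeholder standing in for the real NLP model.

```python
import json
import sys
from typing import TextIO


def find_duplicates(goals: list[str]) -> list[list[int]]:
    """Placeholder for the model: group indices of goals whose
    whitespace- and case-normalized text is identical."""
    groups: dict[str, list[int]] = {}
    for index, goal in enumerate(goals):
        key = " ".join(goal.lower().split())
        groups.setdefault(key, []).append(index)
    return [indices for indices in groups.values() if len(indices) > 1]


def run(stdin: TextIO, stdout: TextIO) -> int:
    """Read {"goals": [...]} from stdin and write {"duplicates": [...]}
    to stdout. Returns a process exit code (0 on success, 1 on bad input)."""
    try:
        request = json.load(stdin)
        result = {"duplicates": find_duplicates(request["goals"])}
    except (json.JSONDecodeError, KeyError, TypeError, AttributeError) as error:
        json.dump({"error": str(error)}, stdout)
        return 1
    json.dump(result, stdout)
    return 0


# A script entry point (assumed, not from the ADR) would end with:
#     sys.exit(run(sys.stdin, sys.stdout))
```

With this shape, the Node worker would only need to write a JSON request to the child's stdin, read one JSON document from its stdout, and check the exit code. The same `run` function could also sit behind an HTTP handler if the container option is chosen.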
Not sure if you want to break this up a bit:
From:
We have two design options to consider: containerizing the Python code and deploying it on cloud.gov's underlying technology through Cloud Foundry's Docker support, or using Node child processes to wrap execution of the Python code on one of the workers.
To:
We have two design options to consider. The first is containerizing the Python code and deploying it using cloud.gov's underlying technology in Cloud Foundry. The second option would be to spawn a Node child process to execute the Python code.
or something with this breakdown
I am fine with either solution. The next step would be to pick one and update the boundary diagram to start conversations with the IPT board.
Should this be merged into our repo?
Let's update the conclusion with the path we've chosen and get this merged in 🙏
Description of change
ADR for Python Analytics Processing
How to test
Read, review