ADR for Python Analytics Processing #1418

kcirtapfromspace · 2023-03-14T17:52:55Z

Description of change

ADR for Python Analytics Processing

How to test

Read, review

Issue(s)

https://ocio-jira.acf.hhs.gov/browse/TTAHUB-0

Checklists

Every PR

Meets issue criteria
JIRA ticket status updated
Code is meaningfully tested
Meets accessibility standards (WCAG 2.1 Levels A, AA)
API Documentation updated
Boundary diagram updated
Logical Data Model updated
Architectural Decision Records written for major infrastructure decisions
UI review complete

Production Deploy

Staging smoke test completed

After merge/deploy

Update JIRA ticket status

thewatermethod · 2023-03-16T18:53:37Z

docs/adr/0017-python-anlaytics-processing.md

+
+## Alternatives Considered
+
+We have considered several other alternatives, such as using a serverless architecture, deploying the code as a standalone service, and using Kubernetes. However, we have rejected these alternatives for reasons such as complexity, cost, and mainly lack of compatibility with cloud.gov and its technical limitations. There is also the option of using a different cloud provider which would reduce friction, but we have decided to stick with cloud.gov as the boundary would be too large to change at this point. We have also considered using a different programming language, such as R, but we have decided to stick with Python as it is the language that the model was developed in and offers the most flexibility in terms of deployment options.


Curious whether we think using GovCloud just for this would involve an ATO. I presume that it wouldn't. If that's the case, was spinning up a separate server or leveraging a Lambda over there eliminated for the other reasons mentioned, i.e., complexity or boundary changes?

It feels to me like a much more straightforward way to go about this (although it might be more work up front), at least in part because we want to end up there at the end of the day.

thewatermethod · 2023-03-16T18:57:45Z

docs/adr/0017-python-anlaytics-processing.md

+              --var SESSION_SECRET=${<< parameters.session_secret >>}
+              ...
+```
+### Node Child Processes


My only thought on the two options is that child processes make me kind of uneasy, though that could be unfounded because I can't really articulate why. As long as it is doable resource-wise, it may be the easiest way to proceed.

My biggest concern here (if it is valid) would be the shared memory footprint.

I'm not a fan of managing an additional set of package dependencies that may or may not persist.
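
To make the child-process option concrete: a Node worker could spawn a Python script and exchange JSON with it over stdin/stdout. The sketch below shows only the Python side, under assumptions that are not stated in this thread: the function names (`find_duplicates`, `run`), the input shape (a list of `{"id", "text"}` objects), and the 0.8 threshold are hypothetical, and `difflib.SequenceMatcher` is a stand-in for the actual NLP model.

```python
import json
import sys
from difflib import SequenceMatcher
from itertools import combinations


def find_duplicates(goals, threshold=0.8):
    """Return pairs of goal ids whose text similarity meets the threshold.

    SequenceMatcher is a placeholder for the real NLP model, which is
    not shown in this thread.
    """
    pairs = []
    for a, b in combinations(goals, 2):
        score = SequenceMatcher(None, a["text"].lower(), b["text"].lower()).ratio()
        if score >= threshold:
            pairs.append({"ids": [a["id"], b["id"]], "score": round(score, 3)})
    return pairs


def run(stdin, stdout):
    """Read a JSON list of goals from stdin and write duplicate pairs as JSON."""
    goals = json.load(stdin)
    json.dump(find_duplicates(goals), stdout)
    stdout.write("\n")
```

A script entry point would call `run(sys.stdin, sys.stdout)`. Because the child would run inside the same app instance as the Node process, its memory would likely count against the same quota, which is the footprint concern raised above; the extra Python dependencies would also need to be installed alongside the Node ones.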

kryswisnaskas · 2023-03-16T19:05:45Z

Nicely written!

kryswisnaskas · 2023-03-16T19:04:14Z

docs/adr/0017-python-anlaytics-processing.md

+* Added responsibility for all security updates and bug fixes.
+* More compliance responsibility means more work.
+* Increases boundary between the application and the infrastructure.
+* May need additional ATO for Docker container.


We can point out that by establishing only private routes between the application and the Docker container, this risk can be mitigated.
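
For the containerized option, the Python code would presumably sit behind a small HTTP endpoint that the application reaches over the private route. A minimal sketch using only the standard library follows; the `/duplicates` path, the `{"a", "b"}` payload, the `make_server` helper, and the use of a `PORT` environment variable with a default of 8080 are assumptions for illustration, and `SequenceMatcher` again stands in for the real model.

```python
import json
import os
from difflib import SequenceMatcher
from http.server import BaseHTTPRequestHandler, ThreadingHTTPServer


def similarity(a, b):
    """Placeholder score in [0, 1]; the real NLP model would go here."""
    return SequenceMatcher(None, str(a).lower(), str(b).lower()).ratio()


class DuplicateHandler(BaseHTTPRequestHandler):
    def do_POST(self):
        # Hypothetical endpoint; anything else is rejected.
        if self.path != "/duplicates":
            self.send_error(404)
            return
        length = int(self.headers.get("Content-Length", 0))
        try:
            payload = json.loads(self.rfile.read(length))
            score = similarity(payload["a"], payload["b"])
        except (ValueError, KeyError, TypeError):
            self.send_error(400, "expected JSON with 'a' and 'b'")
            return
        body = json.dumps({"score": score}).encode()
        self.send_response(200)
        self.send_header("Content-Type", "application/json")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, fmt, *args):
        pass  # keep the sketch quiet; real code would log requests


def make_server(port=None, host="0.0.0.0"):
    """Build the server; PORT as the platform-provided port is an assumption."""
    if port is None:
        port = int(os.environ.get("PORT", "8080"))
    return ThreadingHTTPServer((host, port), DuplicateHandler)
```

Calling `make_server().serve_forever()` would start the service inside the container. This sketch has no authentication of its own, so it relies entirely on the route being private, as the comment above suggests.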

AdamAdHocTeam · 2023-03-16T21:37:24Z

docs/adr/0017-python-anlaytics-processing.md

+* Can use existing Node.js infrastructure.
+
+**Cons**
+


Would this approach share the same memory as our app?

I.e., if the app is limited to 2 GB, would that be shared between the app and the spawned child process?

AdamAdHocTeam · 2023-03-16T21:38:01Z

docs/adr/0017-python-anlaytics-processing.md

+* Reliable and scalable option for our production environment.
+* Can build container images and run containers on local workstation.
+Fine-grained control over compilation and root filesystem.
+* Docker containers are portable and can be easily moved between environments.


Would have its own memory allocation.

AdamAdHocTeam · 2023-03-16T21:43:29Z

docs/adr/0017-python-anlaytics-processing.md

+
+## Context
+
+Our team has developed a natural language processing model for the identifying duplicate goals using python. We need to productionize this model and deploy it to our cloud infrastructure on cloud.gov, which has specific technical limitations. We have two design options to consider - containerizing the Python code and deploying it using cloud.govs underlining technology through Cloud Foundry's Docker support, or using Node child processes to wrap execution Python code on one of the workers.


Not sure if you want to break this up a bit:

From:
We have two design options to consider - containerizing the Python code and deploying it using cloud.govs underlining technology through Cloud Foundry's Docker support, or using Node child processes to wrap execution Python code on one of the workers.

To:
We have two design options to consider. The first is containerizing the Python code and deploying it using cloud.gov's underlying technology, Cloud Foundry. The second option would be to spawn a Node child process to execute the Python code.

or something with this breakdown

kryswisnaskas · 2023-03-22T21:58:46Z

I am fine with either solution. The next step would be to pick one and update the boundary diagram to start conversations with the IPT board.

thewatermethod · 2024-02-09T20:27:22Z

Should this be merged into our repo?

nvms · 2024-07-12T15:23:41Z

Let's update the conclusion with the path we've chosen and get this merged in 🙏

add ADR for python analytics processing

29c451f

thewatermethod reviewed Mar 16, 2023

thewatermethod reviewed Mar 16, 2023

kryswisnaskas reviewed Mar 16, 2023

kryswisnaskas requested review from AdamAdHocTeam, GarrettEHill, hardwarehuman and nvms March 16, 2023 21:24

AdamAdHocTeam reviewed Mar 16, 2023

kryswisnaskas approved these changes Mar 22, 2023

Down select to 1 option for NLP processing

ba8fb66
