Commit 2c9d075

authored
[Issue #2277] Document storage ADR (#2533)
## Summary
Fixes #2277

### Time to review: __5 mins__

## Changes proposed
Document our decision process for file storage for documents associated with Opportunities, NOFOs and otherwise.
1 parent 8233cc7 commit 2c9d075

2 files changed

+94
-0
lines changed

documentation/wiki/SUMMARY.md

Lines changed: 1 addition & 0 deletions
@@ -116,6 +116,7 @@
116116
* [Dashboard Data Storage](decisions/adr/2024-03-19-dashboard-storage.md)
117117
* [Dashboard Data Tool](decisions/adr/2024-04-10-dashboard-tool.md)
118118
* [Search Engine](decisions/adr/2024-10-02-search-engine.md)
119+
* [Document Storage](decisions/adr/2024-10-18-document-storage.md)
119120
* [Infra](decisions/infra/README.md)
120121
* [Use markdown architectural decision records](decisions/infra/0000-use-markdown-architectural-decision-records.md)
121122
* [CI/CD interface](decisions/infra/0001-ci-cd-interface.md)
Lines changed: 93 additions & 0 deletions
@@ -0,0 +1,93 @@
1+
# Simpler needs to store and manage the lifecycle of documents associated with Opportunities, which we will do via AWS S3 buckets
2+
3+
- **Status:** Active<!-- REQUIRED -->
4+
- **Last Modified:** 2024-10-18 <!-- REQUIRED -->
5+
- **Related Issue:** [#2277](https://github.com/HHS/simpler-grants-gov/issues/2277) <!-- RECOMMENDED -->
6+
- **Deciders:** Matt Dragon, Aaron Couch, Kai Siren, Michael Chouinard, Lucas Brown<!-- REQUIRED -->
7+
- **Tags:** nofo, document, attachment, s3, storage <!-- OPTIONAL -->
8+
9+
## Context and Problem Statement
10+
11+
Opportunities include supporting documents that help define the opportunity, provide more instructions about applying, or otherwise supplement the Opportunity Listing. These documents are individual files, sometimes within a folder/directory hierarchy, that are currently provided to Grant Seekers as a single Zip download. Among these files is one very special file, the Notice of Funding Opportunity (NOFO), which every Opportunity must publish.
12+
13+
## Decision Drivers <!-- RECOMMENDED -->
14+
15+
- Use the AWS and Nava platforms whenever feasible
16+
- Minimize cost per file (there will be a large number of files, but most will be rarely, if ever, accessed once the Opportunity closes)
17+
- Ease of processing: use the best tools, with already-supported libraries, as they're intended to be used
18+
- Follow best practices
19+
20+
## Options Considered
21+
22+
- [AWS Simple Storage Service (S3)](#aws-simple-storage-service-s3-buckets) bucket(s)
23+
- [Store files in PostgreSQL](#store-files-in-postgresql)
24+
- [Other off-the-shelf or homegrown storage solution](#other-off-the-shelf-or-homegrown-storage-solution)
25+
26+
## Decision Outcome <!-- REQUIRED -->
27+
28+
Chosen option: "AWS S3", because it represents the lowest Total Cost of Ownership (TCO) and follows industry best practices, including built-in support for file access control, backups, etc. We will use 2 buckets: one for Published Opportunities and one for DRAFT Opportunities. When an Opportunity is Published, we'll ensure that the associated documents are copied to the Published bucket, making them accessible to the general public. Prior to publishing, the documents will already be stored in S3, so that file storage is consistent throughout the lifecycle, but only the Publishing Service will have access to the files. This ensures that they are not released to the public before the Opportunity is, and that they can be revoked from public view if the Opportunity is accidentally Published.
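
The publish and revoke flow described above could be sketched roughly as follows. This is a minimal illustration, not the Publishing Service's actual implementation: the function names, bucket names, and the idea of grouping an Opportunity's documents under a shared key prefix are all assumptions. The `s3` argument is assumed to be a boto3 S3 client (for example, `boto3.client("s3")`).

```python
from typing import Any, List


def publish_documents(s3: Any, draft_bucket: str, published_bucket: str, prefix: str) -> List[str]:
    """Copy every object under `prefix` from the draft bucket to the published bucket.

    The draft copies are left in place, so storage stays consistent across the lifecycle.
    """
    copied: List[str] = []
    paginator = s3.get_paginator("list_objects_v2")
    for page in paginator.paginate(Bucket=draft_bucket, Prefix=prefix):
        # Pages with no matching objects have no "Contents" entry.
        for obj in page.get("Contents", []):
            key = obj["Key"]
            s3.copy_object(
                Bucket=published_bucket,
                Key=key,
                CopySource={"Bucket": draft_bucket, "Key": key},
            )
            copied.append(key)
    return copied


def revoke_documents(s3: Any, published_bucket: str, prefix: str) -> List[str]:
    """Remove every object under `prefix` from the published bucket only."""
    removed: List[str] = []
    paginator = s3.get_paginator("list_objects_v2")
    for page in paginator.paginate(Bucket=published_bucket, Prefix=prefix):
        for obj in page.get("Contents", []):
            s3.delete_object(Bucket=published_bucket, Key=obj["Key"])
            removed.append(obj["Key"])
    return removed
```

Because revoking only touches the published bucket, an accidentally Published Opportunity can be pulled from public view and later re-published from the untouched draft copies.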
29+
30+
### Positive Consequences <!-- OPTIONAL -->
31+
32+
- Cost can be managed by trading it off against the performance profile of requests for files
33+
- Integrates directly with CloudFront, the AWS Content Delivery Network (CDN)
34+
- Existing tooling/API allows files to be manipulated from the Publishing System, and manually if needed.
35+
- The S3 API is a de facto standard, mimicked/supported by other cloud storage providers, if we ever want to move these files elsewhere.
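
One way the cost versus performance trade-off could be expressed is an S3 lifecycle rule that moves older objects to a cheaper, less frequently accessed storage class. The sketch below is only an assumption about how that might look: the rule ID, the 365-day threshold, and the choice of `STANDARD_IA` are illustrative, not decisions recorded in this ADR. Note that S3 lifecycle transitions are based on object age, not on when an Opportunity closes, so a real policy would need to account for that. The `s3` argument is assumed to be a boto3 S3 client.

```python
from typing import Any, Dict


def build_lifecycle_configuration(prefix: str = "", days_until_infrequent_access: int = 365) -> Dict[str, Any]:
    """Build a lifecycle configuration that transitions aged objects to STANDARD_IA."""
    return {
        "Rules": [
            {
                # Hypothetical rule name, chosen for illustration only.
                "ID": "move-older-documents-to-infrequent-access",
                "Filter": {"Prefix": prefix},
                "Status": "Enabled",
                "Transitions": [
                    {"Days": days_until_infrequent_access, "StorageClass": "STANDARD_IA"}
                ],
            }
        ]
    }


def apply_lifecycle_configuration(s3: Any, bucket: str, configuration: Dict[str, Any]) -> None:
    """Apply the configuration to a bucket, replacing any lifecycle rules already on it."""
    s3.put_bucket_lifecycle_configuration(Bucket=bucket, LifecycleConfiguration=configuration)
```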
36+
37+
### Negative Consequences <!-- OPTIONAL -->
38+
39+
- Requires its own management if we want to sync the files to another environment
40+
- Disconnects the file lifecycle from the data in the DB, so archiving/deleting of files doesn't happen automatically when DB records change
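
Because the file lifecycle is decoupled from the DB, some periodic reconciliation would likely be needed. A minimal sketch of the comparison step is shown below, assuming the DB can supply the set of object keys it still references and the bucket can be listed; the function name and both inputs are hypothetical.

```python
from typing import Iterable, Set, Tuple


def find_lifecycle_drift(db_keys: Iterable[str], bucket_keys: Iterable[str]) -> Tuple[Set[str], Set[str]]:
    """Compare the keys the DB references with the keys present in the bucket.

    Returns (orphaned_files, missing_files):
      - orphaned_files: in the bucket but no longer referenced by any DB record
      - missing_files: referenced by the DB but absent from the bucket
    """
    db = set(db_keys)
    bucket = set(bucket_keys)
    return bucket - db, db - bucket
```

Orphaned files are candidates for archiving or deletion, while missing files point to a failed upload or an unintended delete.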
41+
42+
## Pros and Cons of the Options <!-- OPTIONAL -->
43+
44+
### AWS Simple Storage Service (S3) bucket(s)
45+
46+
Utilize the AWS S3 Service to store/host files. This problem is precisely what S3 was built to solve. It provides strong tooling, monitoring, and logging, all built and ready to use. We can architect the system so that files get scanned before being placed in the final bucket, and we get fine-grained support for file versioning, backups, lifecycle, etc. <!-- OPTIONAL -->
47+
48+
- **Pros**
49+
- Cost can be managed by trading it off against performance
50+
- Integrates with CloudFront, the AWS Content Delivery Network (CDN)
51+
- Existing tooling/API
52+
- The S3 API is a de facto standard, mimicked/supported by other cloud storage providers, if we ever want to move these files elsewhere.
53+
- Built-in support for auto-expiring links (which we want at least in the near term, until we come up with more of a final structure/naming strategy)
54+
- **Cons**
55+
- Another separate resource to manage if we're trying to sync/simulate Prod with other environments
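
The auto-expiring links mentioned in the pros correspond to S3 presigned URLs. A minimal sketch, assuming `s3` is a boto3 S3 client; the function name and the 15-minute default are illustrative assumptions rather than project decisions.

```python
from typing import Any


def expiring_download_url(s3: Any, bucket: str, key: str, expires_in_seconds: int = 900) -> str:
    """Return a time-limited download URL for a single object."""
    return s3.generate_presigned_url(
        ClientMethod="get_object",
        Params={"Bucket": bucket, "Key": key},
        ExpiresIn=expires_in_seconds,
    )
```

Since the link expires, the underlying bucket layout and key names can change later without leaving long-lived public URLs behind.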
56+
57+
### Store files in PostgreSQL
58+
59+
The existing system stores the contents of the files in the Oracle DB. This is also possible in PostgreSQL. <!-- OPTIONAL -->
60+
61+
- **Pros**
62+
- Single data source to back up, move between environments, etc.
63+
- Simplified architecture as all communication is just with the DB server
64+
- Files and DB records share the same lifecycle so full end-to-end delete/clean up is easier
65+
- **Cons**
66+
- Bloats the DB with file storage, which will likely outpace ordinary DB row storage rapidly
67+
- Makes the DB a bigger performance bottleneck as it's now handling both app data and file storage/serving responsibilities
68+
- Difficult if not impossible to virus/malware scan files stored in this way
69+
- Makes backups more costly, and more difficult to move around, due to their increased size
70+
71+
### Other off-the-shelf or homegrown storage solution
72+
73+
Implement an existing off-the-shelf file storage server or build our own<!-- OPTIONAL -->
74+
75+
- **Pros**
76+
- If we built our own, it would be a custom fit, doing exactly what we need and nothing more
77+
- Off-the-Shelf might be cheaper
78+
- **Cons**
79+
- Off-the-Shelf
80+
- We own everything: storage redundancy, security, patching/upgrades, Ops
81+
- Additional vendor contract, security assessment, relationship to manage
82+
- Data leaves our AWS VPC Secure Environment
83+
- Roll our own
84+
- File storage isn't the core value of the system, so it doesn't justify building our own
85+
86+
## Links <!-- OPTIONAL -->
87+
88+
- [AWS Simple Storage Service (S3)](https://aws.amazon.com/s3/)
89+
- Alternatives
90+
- [Ceph](https://ceph.io/en/)
91+
- [Backblaze B2](https://www.backblaze.com/cloud-storage)
92+
- [Wasabi Hot Cloud Storage](https://wasabi.com/cloud-object-storage)
93+
- [List of Alternatives](https://medium.com/@paulgoll/aws-s3-alternatives-in-2024-3918651f77d9)
