Scope out how we might organize and vend extract files #2454

Closed
3 tasks done
chouinar opened this issue Oct 11, 2024 · 5 comments
@chouinar
Collaborator

chouinar commented Oct 11, 2024

Summary

The legacy website lets users download XML extracts of opportunity data (https://www.grants.gov/xml-extract), and we should support this as well. Extracts, especially if they contain the same data as our search endpoints, reduce the traffic our search endpoint needs to handle, because we can point anyone who wants to scrape our data to an extract file instead.

We already built a quick initial script for generating the files, but we don't meaningfully store them anywhere or have a way to look them up. This ticket should focus on figuring that out.

A few high-level thoughts:

  • The files will be stored on S3
  • Whatever approach we use for vending other files (S3 presigned URLs or a CDN link), we should use here as well
  • We'll likely want a table that stores what files we've generated, as well as an API that can query that table to get the latest records
  • We may have multiple file formats for a given extract batch (CSV and JSON for now)
  • It's plausible that we will have more extracts in the future, so try to come up with an approach that can handle multiple sets of files (i.e., the endpoint might require you to tell it which files you want to look up)
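
To make the lookup idea above concrete, here is a minimal in-memory sketch of how an endpoint might pick the newest file per format for a requested set of extracts. Everything in it is an assumption for illustration: the `ExtractFile` record, its field names, the `"opportunities"` extract type, and the example bucket paths are hypothetical and not part of any existing code.

```python
from dataclasses import dataclass
from datetime import datetime, timezone


@dataclass(frozen=True)
class ExtractFile:
    """One generated extract file (hypothetical record shape)."""

    extract_type: str     # which set of files is this, e.g. "opportunities" (assumed name)
    file_format: str      # "csv" or "json" for now
    s3_path: str          # where the file lives on S3 (assumed layout)
    created_at: datetime  # when the file was generated


def latest_files(records: list[ExtractFile], extract_type: str) -> dict[str, ExtractFile]:
    """Return the newest file per format for the requested extract type."""
    latest: dict[str, ExtractFile] = {}
    for record in records:
        if record.extract_type != extract_type:
            continue
        current = latest.get(record.file_format)
        if current is None or record.created_at > current.created_at:
            latest[record.file_format] = record
    return latest


# Example data: two batches of one extract type, plus one unrelated extract.
records = [
    ExtractFile("opportunities", "csv", "s3://example-bucket/extracts/batch-1/opportunities.csv",
                datetime(2024, 10, 10, tzinfo=timezone.utc)),
    ExtractFile("opportunities", "json", "s3://example-bucket/extracts/batch-1/opportunities.json",
                datetime(2024, 10, 10, tzinfo=timezone.utc)),
    ExtractFile("opportunities", "csv", "s3://example-bucket/extracts/batch-2/opportunities.csv",
                datetime(2024, 10, 11, tzinfo=timezone.utc)),
    ExtractFile("other-extract", "csv", "s3://example-bucket/extracts/batch-2/other.csv",
                datetime(2024, 10, 12, tzinfo=timezone.utc)),
]

result = latest_files(records, "opportunities")
```

Requiring the caller to pass `extract_type` is one way to leave room for more sets of files later. The returned paths would still need to be turned into presigned URLs or CDN links before being handed to a user.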

Acceptance criteria

  • Scope out a plan
  • Run the idea past design. Even if they don't yet have time for designs, make sure their idea of a page for extracts matches what we might do for an API
  • Write up the follow-up implementation tickets
@mikehgrantsgov
Collaborator

Here are some tickets that can be spawned from this:

DB:
Create a table to track generated files, including metadata like file type, creation date, batch ID?
API:
Build an API to query the database for the latest file records.
Note: support multiple file formats (CSV, JSON) and explore future formats.
Design:
Placeholder to collaborate with design to ensure the user interface aligns.
Future scalability spike?
Plan for future expansion to handle multiple sets of files and additional formats?
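
As a rough sketch of what the DB and API tickets could look like together, the snippet below creates a tracking table and queries the files from the most recent batch. The table name, column names, and sample rows are all hypothetical, and `sqlite3` is used only so the example is self-contained; the real database and schema conventions may differ.

```python
import sqlite3

# Hypothetical schema: one row per generated file, grouped by batch.
SCHEMA = """
CREATE TABLE extract_file (
    extract_file_id INTEGER PRIMARY KEY,
    batch_id TEXT NOT NULL,
    extract_type TEXT NOT NULL,
    file_type TEXT NOT NULL,
    file_path TEXT NOT NULL,
    created_at TEXT NOT NULL  -- ISO 8601 timestamps sort correctly as text
)
"""

# Find the newest batch for an extract type, then return every file in it.
LATEST_QUERY = """
SELECT file_type, file_path, created_at
FROM extract_file
WHERE extract_type = ?
  AND batch_id = (
      SELECT batch_id
      FROM extract_file
      WHERE extract_type = ?
      ORDER BY created_at DESC
      LIMIT 1
  )
ORDER BY file_type
"""


def latest_batch(conn: sqlite3.Connection, extract_type: str) -> list[tuple]:
    """Return (file_type, file_path, created_at) rows for the latest batch."""
    return conn.execute(LATEST_QUERY, (extract_type, extract_type)).fetchall()


conn = sqlite3.connect(":memory:")
conn.execute(SCHEMA)
conn.executemany(
    "INSERT INTO extract_file (batch_id, extract_type, file_type, file_path, created_at) "
    "VALUES (?, ?, ?, ?, ?)",
    [
        ("batch-1", "opportunities", "csv", "extracts/batch-1/opportunities.csv", "2024-11-04T06:00:00"),
        ("batch-1", "opportunities", "json", "extracts/batch-1/opportunities.json", "2024-11-04T06:00:00"),
        ("batch-2", "opportunities", "csv", "extracts/batch-2/opportunities.csv", "2024-11-05T06:00:00"),
        ("batch-2", "opportunities", "json", "extracts/batch-2/opportunities.json", "2024-11-05T06:00:00"),
    ],
)

rows = latest_batch(conn, "opportunities")
```

Keeping `extract_type` as a column means additional sets of files could share the same table and query, which may be enough to cover the scalability question without a separate design.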

@mikehgrantsgov mikehgrantsgov self-assigned this Nov 5, 2024
@mikehgrantsgov
Collaborator

@chouinar should the tickets above be created? I can create them if you think we are ready. Should anyone or any team be tagged for input beforehand?

@chouinar
Collaborator Author

chouinar commented Nov 7, 2024

@mikehgrantsgov - go ahead and create the tickets. Some of the design work will be a TODO, but it can't hurt to set them up.

@widal001
Collaborator

widal001 commented Nov 8, 2024

Beep boop: Automatically setting the point and sprint values for this issue in project HHS/13 because they were unset when the issue was closed.

@mikehgrantsgov
Collaborator

Created the following:
#2791
#2792
#2793
#2794
#2795
