Load attachments into the search index #3321

chouinar · 2024-12-19T18:58:43Z

Summary

This work should sit behind a feature flag, controlled by an environment variable, that defaults to off (we can enable it locally).
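A minimal sketch of such a flag check, assuming a hypothetical variable name (`ENABLE_OPPORTUNITY_ATTACHMENT_PIPELINE` is not specified anywhere in this issue) and assuming that anything other than an explicit truthy value means "off":

```python
import os

# Hypothetical variable name; the issue does not specify one.
FLAG_NAME = "ENABLE_OPPORTUNITY_ATTACHMENT_PIPELINE"


def is_attachment_indexing_enabled(environ=None) -> bool:
    """Return True only when the flag is explicitly set to a truthy value."""
    # Allow a mapping to be injected so the check is easy to unit test.
    env = os.environ if environ is None else environ
    # An unset variable falls back to "false", so the default is to do nothing.
    return env.get(FLAG_NAME, "false").strip().lower() in {"1", "true", "yes"}
```

The load job could call this once at startup and skip all attachment handling when it returns False.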

After we've set up the attachment pipeline in the prior ticket (#3320), we want to load attachments into the index.

We'll need to do the following:

For every attachment, load it from S3.
Base64-encode each attachment and set the results as an attachments list on the opportunity JSON, like so:

{
    "opportunity_id": 1,
    "opportunity_title": "my title",
    "summary": {...},
    .. a bunch of other fields not included here for brevity,
    "attachments": [
        {
            "filename": "ipsum.txt",
            "data": "dGhpcyBpcwpqdXN0IHNvbWUgdGV4dAo="
        },
        {
            "filename": "test.txt",
            "data": "VGhpcyBpcyBhIHRlc3QK"
        }
    ]
}
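The two steps above could be sketched roughly as follows. The function names, the `read_bytes` callable, and the idea of deriving the filename from the last path segment of the storage key are all assumptions made for illustration; the issue only specifies the shape of the resulting `attachments` list.

```python
import base64
from typing import Callable, Iterable


def build_attachments(
    keys: Iterable[str],
    read_bytes: Callable[[str], bytes],
) -> list[dict[str, str]]:
    """Build the "attachments" list for one opportunity.

    keys: storage keys of the opportunity's attachments (hypothetical layout).
    read_bytes: callable that returns the raw file contents for a key.
    """
    attachments = []
    for key in keys:
        raw = read_bytes(key)
        attachments.append(
            {
                # Assumption: the filename is the last path segment of the key.
                "filename": key.rsplit("/", 1)[-1],
                # b64encode returns bytes; decode so the value is JSON-serializable.
                "data": base64.b64encode(raw).decode("ascii"),
            }
        )
    return attachments


def with_attachments(opportunity: dict, attachments: list[dict[str, str]]) -> dict:
    """Return a copy of the opportunity JSON with the attachments list set."""
    return {**opportunity, "attachments": attachments}
```

With boto3, `read_bytes` might wrap `s3_client.get_object(Bucket=bucket, Key=key)["Body"].read()`. Injecting it as a callable keeps the encoding logic testable without S3.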

To make a pipeline run when uploading a record, specify pipeline="whatever_we_called_the_pipeline" when calling the self._client.bulk method inside our search client (have pipeline be an optional parameter passed into the bulk method).
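One way the optional parameter could be threaded through is sketched below. Only `self._client.bulk` and the `pipeline=` keyword come from this issue; the wrapper class shape, the `bulk_upsert` method name, and its other parameters are hypothetical, and the sketch assumes the underlying client accepts a list of alternating action and document dicts as the bulk body.

```python
from typing import Any, Iterable, Optional


class SearchClient:
    """Hypothetical wrapper around an OpenSearch/Elasticsearch-style client."""

    def __init__(self, client: Any) -> None:
        self._client = client

    def bulk_upsert(
        self,
        index_name: str,
        records: Iterable[dict],
        primary_key_field: str,
        pipeline: Optional[str] = None,
    ) -> Any:
        # Bulk bodies alternate an action line with the document it applies to.
        body: list[dict] = []
        for record in records:
            body.append({"index": {"_id": record[primary_key_field]}})
            body.append(record)

        kwargs: dict[str, Any] = {"index": index_name, "body": body}
        # Only pass pipeline when requested, so existing callers are unaffected.
        if pipeline is not None:
            kwargs["pipeline"] = pipeline
        return self._client.bulk(**kwargs)
```

Leaving `pipeline` as None by default means the current load path keeps working unchanged, and the feature flag can decide whether a pipeline name is passed at all.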

NOTE: We will likely need infra to enlarge the search cluster for this. We will need a larger search index (more disk, not CPU, and possibly more memory), since the data size will grow from ~1 GB to 55 GB+ once the attachments, which total about 55 GB, are loaded.
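For rough sizing of the upload itself, note that Base64 turns every 3 input bytes into 4 output characters, so the encoded payloads sent to the cluster are about a third larger than the raw files. The sketch below only estimates request payload size; how much disk the index ends up using depends on how the pipeline and mappings treat the data field, which this issue does not specify.

```python
def base64_encoded_size(raw_bytes: int) -> int:
    """Length in characters of padded Base64 output for raw_bytes bytes of input."""
    # Every group of up to 3 input bytes becomes exactly 4 output characters.
    return 4 * ((raw_bytes + 2) // 3)


# Roughly 55 GB of raw attachments, using the figure from the note above.
raw_total = 55 * 10**9
encoded_total = base64_encoded_size(raw_total)
print(f"{encoded_total / 10**9:.1f} GB of Base64 text")
```

For 55 GB of raw attachments this works out to roughly 73 GB of encoded data in transit.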

Acceptance criteria

Attachments loaded into search for each opportunity
Thorough testing


github-project-automation bot added this to Simpler.Grants.gov Product Backlog Dec 19, 2024

github-project-automation bot moved this to Icebox in Simpler.Grants.gov Product Backlog Dec 19, 2024

chouinar moved this from Icebox to Todo in Simpler.Grants.gov Product Backlog Dec 19, 2024

babebe self-assigned this Jan 7, 2025

babebe moved this from Todo to In Progress in Simpler.Grants.gov Product Backlog Jan 8, 2025

babebe linked a pull request Jan 8, 2025 that will close this issue

[WIP]3321/load attachment #3467

Draft
