Load attachments into the search index #3321

Open
2 tasks
chouinar opened this issue Dec 19, 2024 · 0 comments · May be fixed by #3467

chouinar commented Dec 19, 2024

Summary

This work should sit behind an environment-variable feature flag that defaults to off, so nothing changes unless the flag is set (we can enable it locally).
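A minimal sketch of what such a flag check could look like. The variable name ENABLE_OPPORTUNITY_ATTACHMENT_PIPELINE and the helper name are hypothetical; the issue does not specify either, and the project may already have its own config mechanism for this.

```python
import os
from typing import Mapping, Optional

# Hypothetical flag name; the issue does not say what the variable is called.
FLAG_ENV_VAR = "ENABLE_OPPORTUNITY_ATTACHMENT_PIPELINE"

# Values treated as "on". Anything else, including an unset variable, is "off".
_TRUTHY = {"1", "true", "yes"}


def is_attachment_loading_enabled(environ: Optional[Mapping[str, str]] = None) -> bool:
    """Return True only when the flag is explicitly set to a truthy value."""
    env = os.environ if environ is None else environ
    return env.get(FLAG_ENV_VAR, "").strip().lower() in _TRUTHY
```

Passing the environment in as an argument is only there to make the default-off behavior easy to test without touching the real process environment.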

Once the attachment pipeline from the prior ticket (#3320) is set up, we want to load attachments into the index.

We'll need to do the following:

  • Load every attachment from S3
  • Base64-encode each attachment and add it to an attachments list on the opportunity JSON, like so:
{
    "opportunity_id": 1,
    "opportunity_title": "my title",
    "summary": {...},
    .. a bunch of other fields not included here for brevity,
    "attachments": [
        {
            "filename": "ipsum.txt",
            "data": "dGhpcyBpcwpqdXN0IHNvbWUgdGV4dAo="
        },
        {
            "filename": "test.txt",
            "data": "VGhpcyBpcyBhIHRlc3QK"
        }
    ]
}
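The two steps above could be sketched roughly as follows. The function name, the (filename, location) pair shape, and the read_bytes callable are all assumptions made for illustration; in practice read_bytes would wrap whatever S3 helper the project already uses (with boto3 that might be something like s3_client.get_object(Bucket=..., Key=...)["Body"].read()).

```python
import base64
from typing import Callable, Iterable, Tuple


def add_attachments(
    opportunity: dict,
    attachments: Iterable[Tuple[str, str]],
    read_bytes: Callable[[str], bytes],
) -> dict:
    """Return a copy of the opportunity JSON with an "attachments" list added.

    attachments: hypothetical (filename, location) pairs for one opportunity.
    read_bytes:  hypothetical callable that returns the raw file bytes for a
                 location, e.g. a thin wrapper around an S3 read.
    """
    encoded = []
    for filename, location in attachments:
        raw = read_bytes(location)
        encoded.append(
            {
                "filename": filename,
                # b64encode returns bytes; decode so the value is JSON-serializable.
                "data": base64.b64encode(raw).decode("ascii"),
            }
        )
    # Build a new dict rather than mutating the caller's record.
    return {**opportunity, "attachments": encoded}
```

Reading the bytes through an injected callable keeps the encoding logic testable without S3, which may help with the "thorough testing" criterion.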

To have a pipeline applied when a record is uploaded, pass pipeline="whatever_we_called_the_pipeline" when calling self._client.bulk inside our search client (make pipeline an optional parameter of our bulk method).
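A rough sketch of that change, assuming the wrapped client's bulk method accepts a pipeline keyword as described above. The class and method names (SearchClient, bulk_upsert) and the parameter names are illustrative and may not match our actual search client.

```python
from typing import Any, Iterable, Optional


class SearchClient:
    """Illustrative wrapper; the real search client in the repo may differ."""

    def __init__(self, client: Any) -> None:
        # Assumed to be an OpenSearch-style client exposing .bulk(...)
        self._client = client

    def bulk_upsert(
        self,
        index_name: str,
        records: Iterable[dict],
        primary_key_field: str,
        pipeline: Optional[str] = None,
    ) -> None:
        # The bulk API takes alternating action and document entries.
        body = []
        for record in records:
            body.append({"index": {"_id": record[primary_key_field]}})
            body.append(record)

        # Only pass pipeline when the caller asked for one, so existing
        # callers keep sending exactly the same request as before.
        kwargs = {}
        if pipeline is not None:
            kwargs["pipeline"] = pipeline

        self._client.bulk(index=index_name, body=body, **kwargs)
```

Because pipeline defaults to None and is omitted from the call in that case, this pairs naturally with the feature flag: the caller passes the pipeline name only when the flag is on.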

NOTE: We will likely need infra to enlarge the search cluster for this. The need is for disk rather than CPU (and possibly memory): the attachments total about 55 GB, so the data size will grow from ~1 GB to 55 GB+ once they are loaded.
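One thing that may be worth factoring into the sizing request: Base64 turns every 3 raw bytes into 4 characters, so the encoded payload we send is about a third larger than the raw files. The small helper below (a hypothetical name, purely for estimation) shows the arithmetic. How much of that ends up on disk depends on what the pipeline keeps in the index (the encoded data, the extracted text, or both), which this ticket does not specify.

```python
def base64_encoded_size(raw_bytes: int) -> int:
    """Length in characters of the padded Base64 encoding of raw_bytes bytes."""
    # Every group of up to 3 input bytes becomes exactly 4 output characters.
    return 4 * ((raw_bytes + 2) // 3)


# Rough estimate for the ~55 GB of attachments mentioned above
# (treating 1 GB as 10**9 bytes).
raw_total = 55 * 10**9
encoded_total = base64_encoded_size(raw_total)
print(f"encoded payload: ~{encoded_total / 10**9:.1f} GB")
```

For 55 GB of raw attachments this works out to roughly 73 GB of encoded data across the bulk requests.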

Acceptance criteria

  • Attachments loaded into search for each opportunity
  • Thorough testing
Projects
Status: In Progress