This work should be behind an environment-variable feature flag that defaults to NOT doing anything (we can enable it locally).
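Here is a minimal sketch of the feature-flag check. It is in Python because the search client mentioned below (self._client.bulk) appears to be Python. The variable name ENABLE_OPPORTUNITY_ATTACHMENT_PIPELINE is hypothetical, since the issue does not name one. Anything other than an explicit truthy value counts as off, which matches the requirement that the default does nothing.

```python
from __future__ import annotations

import os
from typing import Mapping

# Hypothetical flag name: the issue does not specify the variable.
ATTACHMENT_FLAG = "ENABLE_OPPORTUNITY_ATTACHMENT_PIPELINE"

_TRUTHY = {"1", "true", "yes", "on"}


def attachment_loading_enabled(environ: Mapping[str, str] | None = None) -> bool:
    """Return True only if the flag is explicitly set to a truthy value.

    Unset, empty, or unrecognised values all mean "off", so environments
    that never set the variable keep today's behaviour.
    """
    env = os.environ if environ is None else environ
    return env.get(ATTACHMENT_FLAG, "").strip().lower() in _TRUTHY
```

Locally, exporting ENABLE_OPPORTUNITY_ATTACHMENT_PIPELINE=true would turn it on. The loader would check attachment_loading_enabled() before fetching anything from S3.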
After we've set up the attachment pipeline in the prior ticket (#3320), we want to load attachments into the index.
We'll need to do the following:
For every attachment, load it from S3.
Base64-encode the attachment and add it to an attachments list on the opportunity JSON, like so:
{
  "opportunity_id": 1,
  "opportunity_title": "my title",
  "summary": {...},
  ... a bunch of other fields not included here for brevity,
  "attachments": [
    {
      "filename": "ipsum.txt",
      "data": "dGhpcyBpcwpqdXN0IHNvbWUgdGV4dAo="
    },
    {
      "filename": "test.txt",
      "data": "VGhpcyBpcyBhIHRlc3QK"
    }
  ]
}
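The two steps above (load from S3, then base64-encode into an attachments list) could look roughly like this. This is a sketch only. build_attachments and with_attachments are hypothetical helper names. S3 access is hidden behind a read_bytes callable so the logic can be tested without a bucket. The issue does not say how an opportunity's attachments are looked up, so that part is left out.

```python
from __future__ import annotations

import base64
from typing import Any, Callable, Iterable


def build_attachments(
    files: Iterable[tuple[str, str]],
    read_bytes: Callable[[str], bytes],
) -> list[dict[str, str]]:
    """Load each (filename, s3_key) pair and base64-encode its contents."""
    attachments = []
    for filename, s3_key in files:
        raw = read_bytes(s3_key)  # full file contents as bytes
        attachments.append(
            {
                "filename": filename,
                # b64encode returns bytes; decode so the value is JSON-serialisable.
                "data": base64.b64encode(raw).decode("ascii"),
            }
        )
    return attachments


def with_attachments(
    opportunity: dict[str, Any], attachments: list[dict[str, str]]
) -> dict[str, Any]:
    """Return a copy of the opportunity JSON with an "attachments" list added."""
    return {**opportunity, "attachments": attachments}


# One possible read_bytes backed by boto3 (an assumption, not from the issue):
#   import boto3
#   s3 = boto3.client("s3")
#   read_bytes = lambda key: s3.get_object(Bucket=bucket, Key=key)["Body"].read()
```

As a check, encoding the bytes "this is\njust some text\n" and "This is a test\n" reproduces the two data values in the example JSON above.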
To have the pipeline applied when uploading records, pass pipeline="whatever_we_called_the_pipeline" when calling the self._client.bulk method inside our search client (make pipeline an optional parameter of our bulk method).
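Below is a sketch of passing an optional pipeline through our bulk helper. SearchClient and bulk_upsert are hypothetical stand-ins for the real search client. The only assumption about the underlying client is the one the issue states: its bulk method accepts pipeline as a keyword argument. When pipeline is None the keyword is left out entirely, so existing calls behave exactly as before.

```python
from __future__ import annotations

import json
from typing import Any, Iterable


class SearchClient:
    """Hypothetical stand-in for our search client wrapper."""

    def __init__(self, client: Any) -> None:
        # Underlying client; assumed to expose bulk(body=..., **kwargs).
        self._client = client

    def bulk_upsert(
        self,
        index_name: str,
        records: Iterable[dict[str, Any]],
        primary_key_field: str,
        pipeline: str | None = None,
    ) -> Any:
        # Bulk API body: an action line followed by the document line,
        # newline-delimited, ending with a trailing newline.
        lines: list[str] = []
        for record in records:
            action = {"index": {"_index": index_name, "_id": record[primary_key_field]}}
            lines.append(json.dumps(action))
            lines.append(json.dumps(record))
        body = "\n".join(lines) + "\n"

        kwargs: dict[str, Any] = {}
        if pipeline is not None:
            # Only forward pipeline when requested so existing callers are unaffected.
            kwargs["pipeline"] = pipeline
        return self._client.bulk(body=body, **kwargs)
```

The pipeline is chosen per request rather than configured on the index. That way the feature flag alone decides whether it runs.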
NOTE: We will likely need infra to enlarge the search cluster for this. The index will need much more capacity (primarily disk, possibly memory, not CPU) because the attachments total about 55 GB, so the data size will grow from ~1 GB to 55 GB+.
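Base64 arithmetic gives a quick sanity check on the growth estimate. Standard padded base64 emits 4 characters for every 3 input bytes, so the bulk request payload for ~55 GB of attachments is roughly a third larger than the raw files. Whether the encoded field also stays in the stored documents depends on how the #3320 pipeline is configured, for example whether it removes the raw field after text extraction. The 55 GB figure comes from the issue, and decimal gigabytes are assumed.

```python
def base64_encoded_size(num_bytes: int) -> int:
    """Length of standard padded base64 output: 4 chars per (started) 3-byte group."""
    return 4 * ((num_bytes + 2) // 3)


if __name__ == "__main__":
    raw_gb = 55  # attachment total quoted in the issue (decimal GB assumed)
    encoded = base64_encoded_size(raw_gb * 10**9)
    print(f"{raw_gb} GB raw -> {encoded / 10**9:.1f} GB base64")  # 55 GB raw -> 73.3 GB base64
```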
Acceptance criteria
Attachments loaded into search for each opportunity
Thorough testing