Pinecone upsert optimization #10

ivanazeljkovic · 2023-11-08T11:46:30Z

This PR makes a few changes to optimize Pinecone upserts:

Write a separate function for upserting documents to the Pinecone vector store so that both synchronous and asynchronous upserts are supported
Introduce a document_chunk_size parameter which, together with batch_size, optimizes the preprocessing and upserting of documents
Change the default batch_size to 64 and the default document_chunk_size to 1000, based on experiments and benchmarks
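
For illustration, here is a minimal sketch of how the two parameters can interact, under stated assumptions: documents are processed in chunks of document_chunk_size, and each chunk is sent to the index in batches of batch_size, either one after another (synchronous) or concurrently (asynchronous). This is not the code from this PR. The names upsert_documents, chunked, FakeIndex, use_async and max_workers are hypothetical. The concurrent path uses a standard-library thread pool rather than any Pinecone client feature, and FakeIndex is an in-memory stand-in, not the Pinecone API.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Iterator, List, Sequence, Tuple

# Hypothetical vector record: (id, embedding). Not a Haystack or Pinecone type.
Vector = Tuple[str, List[float]]


def chunked(items: Sequence, size: int) -> Iterator[Sequence]:
    """Yield consecutive slices of at most `size` items."""
    if size <= 0:
        raise ValueError("size must be positive")
    for start in range(0, len(items), size):
        yield items[start:start + size]


class FakeIndex:
    """Hypothetical in-memory stand-in for a vector index; records batch sizes."""

    def __init__(self) -> None:
        self.calls: List[int] = []

    def upsert(self, vectors: Sequence[Vector]) -> int:
        self.calls.append(len(vectors))
        return len(vectors)


def upsert_documents(index, vectors: Sequence[Vector], batch_size: int = 64,
                     document_chunk_size: int = 1000, use_async: bool = False,
                     max_workers: int = 8) -> int:
    """Upsert chunk by chunk; each chunk is split into batches of batch_size."""
    total = 0
    for doc_chunk in chunked(vectors, document_chunk_size):
        # A real document store would preprocess this chunk here
        # (e.g. convert documents and metadata) before sending it.
        batches = list(chunked(doc_chunk, batch_size))
        if use_async:
            # Send all batches of this chunk concurrently and wait for them.
            with ThreadPoolExecutor(max_workers=max_workers) as pool:
                total += sum(pool.map(index.upsert, batches))
        else:
            # Send batches one after another.
            for batch in batches:
                total += index.upsert(batch)
    return total
```

With the defaults (batch_size=64, document_chunk_size=1000), 2500 vectors are processed as chunks of 1000, 1000 and 500, giving 16 + 16 + 8 = 40 upsert calls. Bounding the chunk size caps how many documents are preprocessed and held in flight at once, while the batch size controls the size of each request.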

NOTE: Please do not merge this pull request

* added pods and pod_type arg to PineconeDocumentStore * added a test * added releasenotes * added pod, pod_type to mock pinecone index and related methods * removing test file from others * added integration test * fixes * set default values for new parameters * rm test that needs a pro plan --------- Co-authored-by: Stefano Fiorucci <[email protected]> Co-authored-by: anakin87 <[email protected]>

* Add MongoDB Atlas document store * Add tests * Fix linting * Add release notes + remove extra roman import * Add pymongo driver info * mypy and pylint fixes * Use future import annotations * fix errors in tests * remove from __future__ import * Fix * remove tests * add docstring --------- Co-authored-by: Julian Risch <[email protected]>

* feat: add bedrock embeddings to embedding encoder * reno & black * feat: refactoring for bedrock embedding encoder * feat: bedrock embedding encoder * feat: bedrock refactoring * feat: bedrock refactoring * feat: bedrock refactoring * feat: bedrock refactoring * feat: bedrock refactoring * feat: bedrock refactoring * feat: bedrock refactoring, add cohere * feat: bedrock refactoring, add cohere * pylint: disable too-many-return-statements in method * feat: bedrock refactoring * feat: bedrock refactoring * feat: bedrock refactoring * feat: bedrock refactoring * feat: bedrock refactoring * feat: bedrock refactoring * fix mypy and pylint errors * manually run precommit * refactor init * fix cohere truncate and refactor embed * fix mypy * proper exception handling --------- Co-authored-by: Stefano Fiorucci <[email protected]> Co-authored-by: anakin87 <[email protected]> Co-authored-by: tstadel <[email protected]>

…ai#6574) * Changed standard final_answer_pattern in base.py to avoid stopping match with newline * added testing on multiline * added testing on final_answer_pattern for multiline matching * reformatting * added release note

…eciprocal rank fusion (deepset-ai#5704) * Change type of PromptModel invocation_layer_class init param (deepset-ai#6497) * fix: mypy `"str" not callable` for `PromptModelInvocationLayer` (deepset-ai#6529) * cast to PromptModelInvocationLayer * fix pylint pointless-exception-statement * use two variables to avoid re-assignment * black * use mocked tokenizer in unit test * Add normalization and weighting for `JoinDocuments` reciprocal rank fusion * Add weights and score normalization for reciprocal rank fusion in JoinDocuments node. * Fix black-jupyter * Fix JoinDocuments test for rrf + score normalization --------- Co-authored-by: Silvano Cerza <[email protected]> Co-authored-by: Julian Risch <[email protected]>

ivanazeljkovic added 2 commits November 8, 2023 12:32

Optimize documents upsert using asynchronous requests

adb13cd

Remove print statement

24031b3

ivanazeljkovic changed the title ~~Pinecone upset optimisation~~ Pinecone upsert optimisation Nov 8, 2023

github-actions bot added topic:tests topic:document_store topic:pinecone labels Nov 8, 2023

ivanazeljkovic changed the title ~~Pinecone upsert optimisation~~ Pinecone upsert optimization Nov 8, 2023

github-actions bot added the type:documentation label Nov 8, 2023

ivanazeljkovic and others added 19 commits November 9, 2023 09:34

Fix mypy and pylint issues

5918ded

chore: remove preview code from the 1.x release line (deepset-ai#6447)

f8507cb

* remove the preview package * more cleanup * fix linter

fix: load_from_deepset_cloud without name in yaml (deepset-ai#6478)

41e9c29

* fix: load_from_deepset_cloud without name in yaml * fix tests * add reno

Change use_auth_token to token in TransformersDocumentClassifier (dee…

73c3302

…pset-ai#6493)

Change type of PromptModel invocation_layer_class init param (deepset…

6e20adf

…-ai#6497)

fix: mypy "str" not callable for PromptModelInvocationLayer (deep…

850cf34

…set-ai#6529) * cast to PromptModelInvocationLayer * fix pylint pointless-exception-statement * use two variables to avoid re-assignment * black * use mocked tokenizer in unit test

fix: HFInvocationLayer - convert use_auth_token to token (deeps…

5304193

…et-ai#6533) * convert use_auth_token into token * add explanatory comment

chore: remove OpenAIAnswerGenerator (deepset-ai#6531)

295af24

* remove OpenAIAnswerGenerator * reno * Remove BaseGenerator and GenerativeQAPipeline * remove more references * docs * leftover * reno

fix: make minor_version_release.yml to run on the v1.x branch (deep…

5693dd0

…set-ai#6547) * Fix minor_version_release.yml to run on the v1.x branch * review

Update unstable version to 1.24.0-rc0

9afa66a

update version

46f95a0

fix release notes

8fe8e48

update version

03a05e2

feat: Add model kwargs to SentenceTransformersRanker (deepset-ai#6627)

5ff81c2

* Add model_kwargs to SentenceTransformersRanker * Add unit test * Add unit tests for the torch dtype extraction * Add release notes * Fix formatting * Fix patch * Make function more explicit

ivanazeljkovic force-pushed the feature/pinecone-upsert-optimization branch from 5918ded to c25c4e9 December 25, 2023 15:46

github-actions bot added topic:CI topic:build/distribution labels Dec 25, 2023

github-actions bot added the 2.x label Dec 25, 2023

ivanazeljkovic force-pushed the feature/pinecone-upsert-optimization branch from 853e88d to 5918ded December 25, 2023 15:55

ivanazeljkovic added 2 commits December 25, 2023 16:57

Merge branch 'v1.23.x' into feature/pinecone-upsert-optimization

011c582

Fixed conflict

241fa7e
