Pinecone upsert optimization #10

Open: wants to merge 23 commits into base: main

ivanazeljkovic

The Pinecone upsert optimizations include the following changes:

  • Add a separate function for upserting documents to the Pinecone vector store, so that both synchronous and asynchronous upserts are supported
  • Introduce a document_chunk_size parameter which, together with batch_size, is used to optimize preprocessing and upserting of documents
  • Change the default batch_size to 64 and the default document_chunk_size to 1000, based on experiments and benchmarks
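
For reviewers, here is a rough sketch of how the two parameters might interact. It is not the code in this PR: the names `chunked`, `upsert_documents`, `upsert_batch`, `use_async` and `max_workers` are hypothetical, and it assumes that document_chunk_size bounds how many documents are preprocessed at once while batch_size bounds each upsert request. The Pinecone call is replaced by a caller-supplied function, so the snippet makes no claim about the client API.

```python
from concurrent.futures import ThreadPoolExecutor
from itertools import islice
from typing import Callable, Iterable, Iterator, List, TypeVar

T = TypeVar("T")


def chunked(items: Iterable[T], size: int) -> Iterator[List[T]]:
    """Yield consecutive lists of at most `size` items."""
    if size <= 0:
        raise ValueError("size must be positive")
    iterator = iter(items)
    while True:
        chunk = list(islice(iterator, size))
        if not chunk:
            return
        yield chunk


def upsert_documents(
    documents: Iterable[T],
    upsert_batch: Callable[[List[T]], object],  # hypothetical stand-in for the real upsert call
    batch_size: int = 64,
    document_chunk_size: int = 1000,
    use_async: bool = False,
    max_workers: int = 4,
) -> int:
    """Upsert documents chunk by chunk; return the number of documents sent."""
    total = 0
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        for doc_chunk in chunked(documents, document_chunk_size):
            # Assumption: per-chunk preprocessing (embeddings, metadata) would go here.
            batches = list(chunked(doc_chunk, batch_size))
            if use_async:
                # Submit every batch of the chunk, then wait so errors surface here.
                futures = [pool.submit(upsert_batch, batch) for batch in batches]
                for future in futures:
                    future.result()
            else:
                for batch in batches:
                    upsert_batch(batch)
            total += len(doc_chunk)
    return total
```

With the proposed defaults, 2500 documents would be handled as three chunks (1000, 1000, 500) and sent as 40 upsert requests of at most 64 documents each. In this sketch the asynchronous path waits for a chunk to finish before starting the next one, which keeps memory bounded by document_chunk_size.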

NOTE: Please do not merge this pull request

@ivanazeljkovic ivanazeljkovic changed the title Pinecone upset optimisation Pinecone upsert optimisation Nov 8, 2023
@ivanazeljkovic ivanazeljkovic changed the title Pinecone upsert optimisation Pinecone upsert optimization Nov 8, 2023
ivanazeljkovic and others added 19 commits November 9, 2023 09:34
* remove the preview package

* more cleanup

* fix linter
* fix: load_from_deepset_cloud without name in yaml

* fix tests

* add reno
* added pods and pod_type arg to PineconeDocumentStore

* added a test

* added releasenotes

* added pod, pod_type to mock pinecone index and related methods

* removing test file from others

* added integration test

* fixes

* set default values for new parameters

* rm test that needs a pro plan

---------

Co-authored-by: Stefano Fiorucci <[email protected]>
Co-authored-by: anakin87 <[email protected]>
…set-ai#6529)

* cast to PromptModelInvocationLayer

* fix pylint pointless-exception-statement

* use two variables to avoid re-assignment

* black

* use mocked tokenizer in unit test
…et-ai#6533)

* convert use_auth_token into token

* add explanatory comment
* remove OpenAIAnswerGenerator

* reno

* Remove BaseGenerator and GenerativeQAPipeline

* remove more references

* docs

* leftover

* reno
* Add MongoDB Atlas document store

* Add tests

* Fix linting

* Add release notes + remove extra roman import

* Add pymongo driver info

* mypy and pylint fixes

* Use future import annotations

* fix errors in tests

* remove from __future__ import

* Fix

* remove tests

* add docstring

---------

Co-authored-by: Julian Risch <[email protected]>
…set-ai#6547)

* Fix minor_version_release.yml to run on the v1.x branch

* review
* feat: add bedrock embeddings to embedding encoder

* reno & black

* feat: refactoring for bedrock embedding encoder

* feat: bedrock embedding encoder

* feat: bedrock refactoring

* feat: bedrock refactoring

* feat: bedrock refactoring

* feat: bedrock refactoring

* feat: bedrock refactoring

* feat: bedrock refactoring

* feat: bedrock refactoring, add cohere

* feat: bedrock refactoring, add cohere

* pylint: disable too-many-return-statements in method

* feat: bedrock refactoring

* feat: bedrock refactoring

* feat: bedrock refactoring

* feat: bedrock refactoring

* feat: bedrock refactoring

* feat: bedrock refactoring

* fix mypy and pylint errors

* manually run precommit

* refactor init

* fix cohere truncate and refactor embed

* fix mypy

* proper exception handling

---------

Co-authored-by: Stefano Fiorucci <[email protected]>
Co-authored-by: anakin87 <[email protected]>
Co-authored-by: tstadel <[email protected]>
…ai#6574)

* Changed standard final_answer_pattern in base.py to avoid stopping match with newline

* added testing on multiline

* added testing on final_answer_pattern for multiline matching

* reformatting

* added release note
…eciprocal rank fusion (deepset-ai#5704)

* Change type of PromptModel invocation_layer_class init param (deepset-ai#6497)

* fix: mypy `"str" not callable` for `PromptModelInvocationLayer` (deepset-ai#6529)

* cast to PromptModelInvocationLayer

* fix pylint pointless-exception-statement

* use two variables to avoid re-assignment

* black

* use mocked tokenizer in unit test

* Add normalization and weighting for `JoinDocuments` reciprocal rank fusion

* Add weights and score normalization for reciprocal rank fusion in JoinDocuments node.

* Fix black-jupyter

* Fix JoinDocuments test for rrf + score normalization

---------

Co-authored-by: Silvano Cerza <[email protected]>
Co-authored-by: Julian Risch <[email protected]>
* Add model_kwargs to SentenceTransformersRanker

* Add unit test

* Add unit tests for the torch dtype extraction

* Add release notes

* Fix formatting

* Fix patch

* Make function more explicit
@github-actions github-actions bot added the 2.x label Dec 25, 2023
@ivanazeljkovic ivanazeljkovic force-pushed the feature/pinecone-upsert-optimization branch from 853e88d to 5918ded Compare December 25, 2023 15:55