
Bug for _index_documents_actions when the batch is too large #38987

Open
HuskyDanny opened this issue Dec 25, 2024 · 3 comments
Labels: Client, customer-reported, needs-author-feedback, question, Search


HuskyDanny commented Dec 25, 2024

  • Package Name: azure-search-documents
  • Package Version: 11.5.2
  • Operating System: Linux
  • Python Version: 3.10

Describe the bug
The exception handling in `_index_documents_actions` raises a `KeyError` for `error_map`.
Function: `def _index_documents_actions(self, actions: List[IndexAction], **kwargs: Any) -> List[IndexingResult]:`

The root cause is that the `index()` function expects to pop `error_map` from `kwargs`, so `error_map` has to be included in `kwargs`.
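The reasoning above hinges on how `error_map` is read from `kwargs`. As a library-independent illustration only (`index_stub` and `index_stub_tolerant` are hypothetical stand-ins, not the SDK's generated code), a bare `kwargs.pop("error_map")` raises `KeyError` when the caller omits the key, while `pop` with a default tolerates a missing key:

```python
from typing import Any, Dict


def index_stub(batch: list, **kwargs: Any) -> Dict[str, Any]:
    """Hypothetical operation that requires error_map to be present in kwargs."""
    # A bare pop() raises KeyError if the caller did not supply error_map.
    error_map = kwargs.pop("error_map")
    return {"batch_size": len(batch), "error_map": error_map}


def index_stub_tolerant(batch: list, **kwargs: Any) -> Dict[str, Any]:
    """Same hypothetical operation, but pop() with a default tolerates a missing key."""
    error_map = {413: "RequestEntityTooLarge"}
    # Merge any caller-supplied entries on top of the defaults.
    error_map.update(kwargs.pop("error_map", None) or {})
    return {"batch_size": len(batch), "error_map": error_map}


try:
    index_stub([1, 2, 3])
except KeyError as exc:
    print("KeyError:", exc)

print(index_stub([1, 2, 3], error_map={413: "X"}))
print(index_stub_tolerant([1, 2, 3]))
```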
To Reproduce
Steps to reproduce the behavior:

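  1. Read a JSON file containing more than 2,000 documents and upload them in a single call:

```python
import json

from azure.search.documents import SearchClient

# Read the JSON file
# json_file_path, service_endpoint, index_name, and credential are defined elsewhere
with open(json_file_path, "r") as file:
    documents = json.load(file)

search_client = SearchClient(
    endpoint=service_endpoint, index_name=index_name, credential=credential
)

result = search_client.upload_documents(documents=documents)
```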

Expected behavior
The items should be split in half and each half inserted into the index.
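A minimal sketch of the expected split-in-half behavior, assuming a hypothetical `send` callable that raises a hypothetical `PayloadTooLargeError` (standing in for the service's HTTP 413 response) when a batch is too big. It mirrors the recursion in `_index_documents_actions` but does not call the SDK:

```python
from typing import Callable, List, TypeVar

T = TypeVar("T")


class PayloadTooLargeError(Exception):
    """Hypothetical stand-in for the service's HTTP 413 error."""


def index_with_split(actions: List[T], send: Callable[[List[T]], List[T]]) -> List[T]:
    """Send a batch; on a 413-style error, split it in half and retry each half."""
    try:
        return send(actions)
    except PayloadTooLargeError:
        if len(actions) == 1:
            raise  # a single action that is still too large cannot be split further
        pos = round(len(actions) / 2)
        results = index_with_split(actions[:pos], send)
        results.extend(index_with_split(actions[pos:], send))
        return results


def make_sender(max_batch: int) -> Callable[[List[int]], List[int]]:
    """Hypothetical sender that rejects batches larger than max_batch."""
    def send(batch: List[int]) -> List[int]:
        if len(batch) > max_batch:
            raise PayloadTooLargeError(len(batch))
        return list(batch)  # one "result" per action, as a fresh list
    return send
```

For example, with a sender that rejects batches larger than 500 actions, `index_with_split(list(range(2000)), make_sender(500))` returns all 2,000 results in their original order, while a single action that is still too large re-raises the error.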


Additional context
I fixed the handling by adding `error_map` to `kwargs`:

```python
def _index_documents_actions(self, actions: List[IndexAction], **kwargs: Any) -> List[IndexingResult]:
    error_map = {413: RequestEntityTooLargeError}
    kwargs["headers"] = self._merge_client_headers(kwargs.get("headers"))
    # Put error_map into kwargs instead of passing it as a separate keyword argument
    kwargs["error_map"] = error_map
    batch = IndexBatch(actions=actions)
    try:
        batch_response = self._client.documents.index(batch=batch, **kwargs)
        return cast(List[IndexingResult], batch_response.results)
    except RequestEntityTooLargeError:
        # A single action that is still too large cannot be split further
        if len(actions) == 1:
            raise
        # Split the batch in half and index each half recursively
        pos = round(len(actions) / 2)
        batch_response_first_half = self._index_documents_actions(
            actions=actions[:pos], **kwargs
        )
        if batch_response_first_half:
            result_first_half = batch_response_first_half
        else:
            result_first_half = []
        batch_response_second_half = self._index_documents_actions(
            actions=actions[pos:], **kwargs
        )
        if batch_response_second_half:
            result_second_half = batch_response_second_half
        else:
            result_second_half = []
        result_first_half.extend(result_second_half)
        return result_first_half
```

github-actions bot added the customer-reported, needs-triage, and question labels on Dec 25, 2024
xiangyan99 added the Search and Client labels and removed the needs-triage label on Dec 30, 2024
xiangyan99 self-assigned this on Dec 30, 2024
xiangyan99 (Member)

Thanks for the feedback, we’ll investigate asap.

github-actions bot added the needs-team-attention label on Dec 30, 2024
xiangyan99 (Member)

I'm not quite clear on this.

It seems you propose changing

```python
batch_response = self._client.documents.index(batch=batch, error_map=error_map, **kwargs)
```

to

```python
kwargs["error_map"] = error_map
batch_response = self._client.documents.index(batch=batch, **kwargs)
```

I don't see how that makes a difference.
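In Python the two call forms are equivalent only when `kwargs` does not already contain an `error_map` key. If it does (for example, because a caller forwarded one), passing `error_map=error_map` explicitly alongside `**kwargs` raises `TypeError` for a duplicate keyword argument, whereas assigning `kwargs["error_map"]` first simply overwrites the entry. Whether this is what happens inside the SDK is not confirmed here; `index`, `call_explicit`, and `call_via_kwargs` below are hypothetical stand-ins:

```python
from typing import Any, Dict


def index(batch: list, **kwargs: Any) -> Dict[str, Any]:
    """Hypothetical stand-in for the generated index() operation."""
    return {"batch": batch, **kwargs}


def call_explicit(kwargs: Dict[str, Any]) -> Dict[str, Any]:
    error_map = {413: "RequestEntityTooLarge"}
    # Raises TypeError if kwargs already holds "error_map" (duplicate keyword).
    return index(batch=[1], error_map=error_map, **kwargs)


def call_via_kwargs(kwargs: Dict[str, Any]) -> Dict[str, Any]:
    error_map = {413: "RequestEntityTooLarge"}
    kwargs = dict(kwargs)  # copy so the caller's dict is not mutated
    kwargs["error_map"] = error_map  # overwrites any existing entry
    return index(batch=[1], **kwargs)


print(call_explicit({}))                      # fine: no existing error_map
print(call_via_kwargs({"error_map": {}}))     # fine: entry is overwritten
try:
    call_explicit({"error_map": {}})
except TypeError as exc:
    print("TypeError:", exc)
```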

xiangyan99 added the needs-author-feedback label on Jan 2, 2025
github-actions bot removed the needs-team-attention label on Jan 2, 2025

github-actions bot commented Jan 2, 2025

Hi @HuskyDanny. Thank you for opening this issue and giving us the opportunity to assist. To help our team better understand your issue and the details of your scenario please provide a response to the question asked above or the information requested above. This will help us more accurately address your issue.
