@simonelbaz simonelbaz commented Sep 18, 2025

…d if ListBucket is not allowed for the user

Thanks for opening a pull request!

Rationale for this change

This PR lets the user choose not to create the directory in the bucket before writing a dataset.
When the create_directory option is set to FALSE, R arrow performs no verification of its own.
The S3 storage itself verifies that the directory exists and that the user has the right to modify it.
This way, neither ListBucket nor HeadBucket permission is needed to complete the write operation.

df |> arrow::write_dataset(
  minio$path("smartsla-bucket/rarrow/"),
  partitioning = "qualitative",
  create_directory = FALSE,
  format = "parquet"
)
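
The call above relies on a `minio` filesystem object and a `df` data frame that are not shown. A minimal sketch of how they might be set up follows, assuming an arrow build that includes this PR (so that `create_directory` is accepted by `write_dataset`). The helper name `write_existing_prefix`, the endpoint, the environment variable names, and the sample data are hypothetical and only for illustration.

```r
# Hypothetical helper: write a partitioned Parquet dataset to a prefix that
# already exists, skipping the directory creation step on the client side.
# Assumes an arrow build that includes this PR (create_directory argument).
write_existing_prefix <- function(df, fs, prefix, partition_col) {
  arrow::write_dataset(
    df,
    fs$path(prefix),               # SubTreeFileSystem rooted at the prefix
    partitioning = partition_col,
    create_directory = FALSE,      # argument introduced by this PR
    format = "parquet"
  )
}

# Usage sketch (not run here: needs a reachable S3-compatible server).
# Endpoint and credential variable names are assumptions.
# minio <- arrow::S3FileSystem$create(
#   access_key = Sys.getenv("MINIO_ACCESS_KEY"),
#   secret_key = Sys.getenv("MINIO_SECRET_KEY"),
#   scheme = "http",
#   endpoint_override = "localhost:9000"
# )
# df <- data.frame(qualitative = c("a", "b", "a"), value = c(1, 2, 3))
# write_existing_prefix(df, minio, "smartsla-bucket/rarrow/", "qualitative")
```

With a user whose policy grants object writes but not ListBucket, the write is expected to succeed only if the storage side accepts the target prefix, as described in the rationale.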

What changes are included in this PR?

create_directory is now available to the user as an argument of the write_dataset function.
Before this PR, this option could not be changed by the user and was always TRUE.

Are these changes tested?

Yes

Are there any user-facing changes?

No change in behavior for existing code: the default value for create_directory is still TRUE.

This PR includes breaking changes to public APIs. (If there are any breaking changes to public APIs, please explain which changes are breaking. If not, you can remove this.)

N/A

This PR contains a "Critical Fix". (If the changes fix either (a) a security vulnerability, (b) a bug that caused incorrect or invalid data to be produced, or (c) a bug that causes a crash (even when the API contract is upheld), please provide explanation. If not, you can remove this.)

N/A

Thanks for opening a pull request!

If this is not a minor PR, could you open an issue for this pull request on GitHub? https://github.com/apache/arrow/issues/new/choose

Opening GitHub issues ahead of time contributes to the Openness of the Apache Arrow project.

Then could you also rename the pull request title in the following format?

GH-${GITHUB_ISSUE_ID}: [${COMPONENT}] ${SUMMARY}

or

MINOR: [${COMPONENT}] ${SUMMARY}

See also:

@simonelbaz simonelbaz changed the title [ISSUE 42173][R] Writing partitionned dataset with Rarrow on S3 failed if ListBucket is not allowed for the user [GH-42173][R] Writing partitionned dataset with Rarrow on S3 failed if ListBucket is not allowed for the user Sep 18, 2025
@simonelbaz simonelbaz changed the title [GH-42173][R] Writing partitionned dataset with Rarrow on S3 failed if ListBucket is not allowed for the user GH-42173: [R] Writing partitionned dataset with Rarrow on S3 failed if ListBucket is not allowed for the user Sep 18, 2025

⚠️ GitHub issue #42173 has been automatically assigned in GitHub to PR creator.

@simonelbaz simonelbaz changed the title GH-42173: [R] Writing partitionned dataset with Rarrow on S3 failed if ListBucket is not allowed for the user GH-42173: [R][S3] Writing partitionned dataset with Rarrow on S3 failed if ListBucket is not allowed for the user Sep 18, 2025
@simonelbaz simonelbaz marked this pull request as draft September 18, 2025 21:08
…3 failed if ListBucket is not allowed for the user
…3 failed if ListBucket is not allowed for the user
@simonelbaz simonelbaz marked this pull request as ready for review September 25, 2025 08:07