[BUG] S3 PUT does not work for Office Files (Excel, Word, PowerPoint) #3702

Open
skjones91199 opened this issue Nov 2, 2024 · 14 comments
Labels
bug Something isn't working

Comments

@skjones91199

Type of Connector

Independent Publisher Connector

Name of Connector

Amazon S3 Bucket

Describe the bug

Trying to use the connector to put office files into an Amazon S3 bucket.

The PUT appears to succeed, but the file in the Amazon S3 bucket is corrupted. The error is shown below:

Image
Image

File types tried:
.docx
.pptx
.xlsx
.png

.txt, .csv, and .pdf files work.

I retrieved the file contents via a SharePoint site, OneDrive, and another S3 bucket.

Is this a security bug?

No, this is not a security bug

What is the severity of this bug?

Severity 2 - One or more important connector features are down

To Reproduce

  1. Create an instant cloud flow.
  2. Add an action to get a file (from SharePoint, OneDrive, or S3). For this test I used 'Get S3 Object Content'.
  3. Optionally add a Compose step to capture the output. I used one here, but it makes no difference to the result.
  4. Use the 'Put Object' action to put the content into another bucket.

Expected behavior

The file is uploaded to the S3 bucket used in step #4. The file opens in its corresponding app without error. I've attached more information: s3 put object information.txt

Environment summary

Power Automate
Amazon S3
OS - Windows 10

Additional context

I've attached a file with the raw inputs and outputs for the s3 get and put actions.

skjones91199 added the bug label Nov 2, 2024
@skjones91199
Author

To be clear - no error is received. The error appears when trying to use the file that was put into the s3 bucket.

@megel
Contributor

megel commented Nov 14, 2024

@skjones91199 can you please add a screenshot of your PUT action? Please also include any transformations you apply to the file content before you use the action.

I have noticed that it sometimes makes sense to use a Compose action that is initialized with the file content and passes that content as its output to the PUT action.

Please try this:

File from SharePoint --> Compose --> PUT S3

Especially for a file stored in SharePoint, you must decode the content:

Image
Content gets the Output from Data action

I use this formula for my test flow:

decodeBase64(body('Get_file_content')?['$content'])

Hope this helps
BR / Michael

@skjones91199
Author

Thanks for the reply, Michael! I do appreciate it. I set up a test flow to implement your suggestions and got the same results. I've included screenshots of the Compose and PUT actions, as well as the input and output from the PUT action when testing the flow.

I have also tried getting the file contents from an s3 bucket with the same result.

I appreciate any suggestions!

Image
Image
input - PUT action.txt
output - PUT action.txt

@nonamef

nonamef commented Nov 19, 2024

We also have the same issue, and the steps above (#3702 (comment)) didn't help. I noticed that the 'Put Object' action in the screenshot has a green icon, whereas ours has a red icon. Is it the same connector?

@ckane

ckane commented Nov 20, 2024

I am also seeing the same problem. I inspected the XLSX after the transfer: the file is larger, and a number of high-value (non-ASCII) bytes appear to have been inserted in odd locations. I'll try to do some canned tests tomorrow and see if I can add some artifacts to this issue.

@megel
Contributor

megel commented Nov 21, 2024

Ahh, I see. @ckane I can confirm that the stream is not correctly encoded when it arrives at AWS S3. I need to investigate this issue.

@ckane

ckane commented Nov 22, 2024

Ok, attaching the files

Original: Another Test 2024-11-22.xlsx

Uploaded (corrupted): Another_Test_2024-11-22.xlsx

Using hexdump I see a lot of the ef bf bd byte sequences in the corrupted one, which suggests the UTF-8 "replacement character" is being inserted somewhere along the way: https://www.cogsci.ed.ac.uk/~richard/utf-8.cgi?input=%EF%BF%BD&mode=char

Looking at the original file, it appears these are inserted where the byte value is 0x80 (128) or higher. I suspect some sort of Microsoft ANSI->UTF-8 encoding is occurring. I did decode the Base64 that is present in the "INPUTS" Parameters to your PUT OBJECT action, and the base64 displayed there (in the body.content field) properly decodes to the appropriate XLSX data.
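
The pattern described above can be reproduced outside the connector. This is a minimal Python sketch, not the connector's code. It assumes the corruption comes from treating raw bytes as UTF-8 text and re-encoding them, which is only a hypothesis at this point in the thread. The sample bytes and the helper name `roundtrip_as_text` are made up for illustration.

```python
def roundtrip_as_text(data: bytes) -> bytes:
    """Decode bytes as UTF-8 text, replacing invalid bytes, then re-encode."""
    text = data.decode("utf-8", errors="replace")
    return text.encode("utf-8")


# ZIP/XLSX magic number followed by two bytes that are not valid UTF-8 alone.
original = b"PK\x03\x04\x80\xff"
corrupted = roundtrip_as_text(original)

print(original.hex(" "))   # 50 4b 03 04 80 ff
print(corrupted.hex(" "))  # 50 4b 03 04 ef bf bd ef bf bd

# Pure ASCII content survives the same round trip unchanged.
ascii_only = b"id,name\n1,test\n"
print(roundtrip_as_text(ascii_only) == ascii_only)  # True
```

Each invalid byte becomes the three-byte sequence ef bf bd, which would also explain why the uploaded file is larger than the original.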

@skjones91199
Author

@ckane - thank you for the information! After you get the file, do you have base64(body('Get_file_content')) in the Content field of the Put Object action? That didn't work for me, but I think I'm interpreting your steps incorrectly. Could you give a bit more detail on how you got this working?

@ckane

ckane commented Nov 26, 2024

Yes, I have a SharePoint "Get file content" block that is feeding into the "Put Object"

Using the "Code View" in the PowerAutomate UI, the inputs.parameters.body field is populated with @body('Get_file_content'), and yes, when I inspect a successful or unsuccessful flow run, and look at the "Parameters" tab of the "Put Object", I can see that the INPUT parameter body.$content contains the base64-encoded file contents. If I copy->paste that value from the PowerAutomate UI for the flow run, into a file on my system, then base64-decode the created file, the resulting raw file is not corrupted. It seems like it either gets corrupted inside the Put Object action due to some sort of string->binary or binary->string encoding conversion, or it gets corrupted during PUT into S3 due to an attempted encoding translation in S3.
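
The manual check described above (copy the body.$content value from the run, decode it, and confirm the file is intact) could be scripted roughly as follows. This is a sketch under assumptions: the function name `xlsx_payload_is_intact` is hypothetical, and it relies only on the fact that an XLSX file is a ZIP archive, so it verifies the archive's CRCs rather than opening the workbook. The sample payload is built in memory as a stand-in for the value copied from the Power Automate UI.

```python
import base64
import io
import zipfile
import zlib


def xlsx_payload_is_intact(content_b64: str) -> bool:
    """Decode a Base64 string and check that it is a ZIP archive with valid CRCs."""
    raw = base64.b64decode(content_b64)
    try:
        with zipfile.ZipFile(io.BytesIO(raw)) as archive:
            # testzip() returns the first bad member name, or None if all pass.
            return archive.testzip() is None
    except (zipfile.BadZipFile, zlib.error):
        return False


# Stand-in for the value copied from the run's body.$content field.
buffer = io.BytesIO()
with zipfile.ZipFile(buffer, "w") as archive:
    archive.writestr("sheet.xml", "<worksheet/>")
sample_b64 = base64.b64encode(buffer.getvalue()).decode("ascii")
not_a_zip_b64 = base64.b64encode(b"not a zip").decode("ascii")

print(xlsx_payload_is_intact(sample_b64))
print(xlsx_payload_is_intact(not_a_zip_b64))
```

If the check passes for the value shown in the action's inputs but the object downloaded from S3 fails it, the corruption happens after the input is handed to the Put Object action.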

@ckane

ckane commented Nov 26, 2024

When the file triggering the run is all plain ASCII text (such as a CSV instead of XLSX), the file comes out fine in S3. The challenge with that is that I want my users to be able to edit in-place in SharePoint, and we cannot do that with a CSV format file.

@skjones91199
Author

In our case, the file put up in S3 needs to be available to our data scientists who have processes that depend on the file being in xlsx format.

@ckane

ckane commented Nov 27, 2024

Yeah, our use case is similar

@skjones91199
Author

So it seems there is no viable workaround for using the Put action. Are you considering modifying the code yourself? I haven't done that before, but wonder if that is the best alternative.

@megel
Contributor

megel commented Dec 8, 2024

Hi @ckane & @skjones91199,

Sorry for the wait, but I have good news for you! I found the issue. The root cause was that the Base64-encoded content from Power Automate was not converted into binary content for S3. This worked for plain text files such as CSV or Base64-encoded formats (PDF), but not for XLSX files.

Therefore, I have modified script.csx and added a method that tries to convert the given content from Base64 directly into binary content before sending the request. If the content can't be converted from Base64, it is passed to S3 as is, without any changes.
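
The decode-or-pass-through behavior described above can be illustrated with a short sketch. This is not the code from script.csx (which is C#) and makes no claim about how the pull request implements it. It is a Python illustration of the idea, and the function name `to_request_body` is hypothetical.

```python
import base64


def to_request_body(content: str) -> bytes:
    """Return decoded bytes if content is valid Base64, else the text as is."""
    try:
        # validate=True rejects any character outside the Base64 alphabet.
        return base64.b64decode(content, validate=True)
    except ValueError:
        # binascii.Error is a subclass of ValueError, so this also covers
        # bad padding. Fall back to sending the text unchanged.
        return content.encode("utf-8")


binary_payload = base64.b64encode(b"PK\x03\x04\x80\xff").decode("ascii")
csv_payload = "id,name\n1,test\n"

print(to_request_body(binary_payload))  # b'PK\x03\x04\x80\xff'
print(to_request_body(csv_payload))     # b'id,name\n1,test\n'
```

One limitation of this sketch: short plain text that happens to be valid Base64 (for example "abcd") would be decoded rather than passed through. Whether the connector needs to guard against that case is not covered in this thread.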

The fix is included in pull request #3731.

Note: you can test the upcoming bug fix by importing the connector files from this repository by using Power Platform CLI pac connector create or pac connector update.

Have a nice weekend & BR / Michael
