Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add persistence for localfs datasets #557

Merged

Conversation

VladOS95-cyber
Copy link
Contributor

@VladOS95-cyber VladOS95-cyber commented Dec 2, 2024

What does this PR do?

Add persistency logic for localfs datasetio provider

  • Addresses issue (#issue)

Test Plan

Please describe:

  • tests you ran to verify your changes with result summaries.
  • provide instructions so it can be reproduced.

Sources

Please link relevant resources if necessary.
#539

Before submitting

  • This PR fixes a typo or improves the docs (you can dismiss the other checks if that's the case).
  • Ran pre-commit to handle lint / formatting issues.
  • Read the contributor guideline,
    Pull Request section?
  • Updated relevant documentation.
  • Wrote necessary unit or integration tests.

@facebook-github-bot
Copy link

Hi @VladOS95-cyber!

Thank you for your pull request and welcome to our community.

Action Required

In order to merge any pull request (code, docs, etc.), we require contributors to sign our Contributor License Agreement, and we don't seem to have one on file for you.

Process

In order for us to review and merge your suggested changes, please sign at https://code.facebook.com/cla. If you are contributing on behalf of someone else (eg your employer), the individual CLA may not be sufficient and your employer may need to sign the corporate CLA.

Once the CLA is signed, our tooling will perform checks and validations. Afterwards, the pull request will be tagged with CLA signed. The tagging process may take up to 1 hour after signing. Please give it that time before contacting us about it.

If you have received this in error or have any questions, please contact us at [email protected]. Thanks!

@facebook-github-bot
Copy link

Thank you for signing our Contributor License Agreement. We can now accept your code for this (and any) Meta Open Source project. Thanks!

@facebook-github-bot facebook-github-bot added the CLA Signed This label is managed by the Meta Open Source bot. label Dec 3, 2024
@VladOS95-cyber VladOS95-cyber force-pushed the add-persistence-for-localfs-ds branch from 111ca21 to 07fca6e Compare December 3, 2024 18:44
@VladOS95-cyber VladOS95-cyber force-pushed the add-persistence-for-localfs-ds branch from 07fca6e to 0252898 Compare December 5, 2024 16:13
@VladOS95-cyber VladOS95-cyber marked this pull request as ready for review December 5, 2024 16:14
@VladOS95-cyber
Copy link
Contributor Author

Hi @ashwinb @yanxi0830 @hardikjshah @dltn @raghotham! This PR is ready for a review. Please, take a look.

@dineshyv dineshyv self-requested a review December 11, 2024 19:46
@yanxi0830
Copy link
Contributor

yanxi0830 commented Dec 11, 2024

Hi @VladOS95-cyber thanks for the PR, could you attach the tests you ran in the test plan?

@VladOS95-cyber
Copy link
Contributor Author

VladOS95-cyber commented Dec 13, 2024

Hi @yanxi0830! My test results:
Снимок

I don't think that my changes is a root cause of failed one. I found that in def get_dataframe_from_url(url: URL): in url_utils.py, df = pandas.read_excel(data_bytes) line 41 tries to read data_bytes that was converted from text type 'mimetype': 'application/vnd.ms-excel'. pandas.read_excel throws an expection trying to read it and it does not matter if we specify engines manually or not. But it could successfully read by pandas.read_csv(data_bytes) and does not throw any exceptions and all tests are green. It seems when it deals with in memory bytes or old xml formats it cannot properly read the data. By the way, all necessary dependencies like xlrd and openpyxl i installed, so it should not be a problem.
I could try to provide my fix for it if it is needed, or I miss something here?

@VladOS95-cyber
Copy link
Contributor Author

Hi @yanxi0830! Just a kind reminder, please take a look when you have time, in order to not lose this PR.

@yanxi0830
Copy link
Contributor

Thanks @VladOS95-cyber for the PR, sorry for the delay. Merged!

@yanxi0830 yanxi0830 merged commit 96735e9 into meta-llama:main Jan 10, 2025
1 check passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
CLA Signed This label is managed by the Meta Open Source bot.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants