Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add support to read plain encoded INT96 timestamp from Parquet file #456

Closed
wants to merge 26 commits into from

Conversation

mskapilks
Copy link

Current only INT96 dictionary encoded timestamp is supported this PR add support for plain encoding

@mskapilks
Copy link
Author

cc: @zhouyuan

@zhouyuan
Copy link
Collaborator

zhouyuan commented Dec 8, 2023

@mskapilks Thank you for helping on this feature! Could you please help to make a pull request to Meta/Velox? We are now doing daily rebase, so if the pull request is merged by Meta/Velox then it will become available in Gluten very soon.

CC: @rui-mo is also working on timestamp related enabling.

thanks, -yuan

@mskapilks
Copy link
Author

@zhouyuan This PR builds upon the already work done in oap-project/velox to read dictionary encoded timestamp. PR add additional support to read plain encoded as well. In Meta velox timestamp reading from parquet is not supported yet, so could exactly make a PR on Meta/Velox.

@zhli1142015
Copy link

zhli1142015 commented Dec 11, 2023

Hello @zhouyuan and @rui-mo ,
Looks timestamp parquet reader is only available in this repo, could you guide @mskapilks , how to process this change? Will you help review this change and merge to Rui's time-stamp change? Or should we wait you check in time stamp support to meta velox, and any timeline for it?
Thanks.
facebookincubator@d18bae8#diff-58ef2bd79ff4173b5a384a3016d6f7bd7d4b237876251cc9d9910e162993eaaa

@rui-mo
Copy link
Collaborator

rui-mo commented Dec 12, 2023

@mskapilks Thank you for the great work. Seems this PR reuses some existing change on timestamp reader. Let me explain the context here.
We opened a PR in Velox, but as Velox community suggested, we'd better start from INT64 support first, and refer to the actual logical type to decode the Parquet. For details, please check facebookincubator#4680 (comment) and facebookincubator#4680 (comment).

Will you help review this change and merge to Rui's time-stamp change? Or should we wait you check in time stamp support to meta velox, and any timeline for it?

As we are lack of bandwidth right now, it will take longer time till we make time to continue this work. If @mskapilks is interested to promote this work in Velox, please feel free to open new PRs. If it is preferred to merge this PR in this repo, it also works for us.

cc @zhli1142015

Copy link
Collaborator

@rui-mo rui-mo left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Better to add a unit test in ParquetTableScanTest to guard the functionality. And please fix the code format. Thanks.

@mskapilks
Copy link
Author

We are also looking to add INT64 support once this PR get merged. For those I think we can get traction in Meta/Velox.

@mskapilks
Copy link
Author

Added test.
I don't think the formatting error is related to my change. It gives error for file DecimalRound.cpp

@zhouyuan
Copy link
Collaborator

Added test. I don't think the formatting error is related to my change. It gives error for file DecimalRound.cpp

Indeed, the format issue is introduced in recent rebase. @zhztheplayer will fix it in the next rebase.

thanks, -yuan

@mskapilks
Copy link
Author

Any idea why this is failing in build?

image

@zhouyuan
Copy link
Collaborator

@mskapilks it's not related - the URL for hadoop binary changed. will be fixed in the next rebase.

#459 (comment)

@mskapilks
Copy link
Author

mskapilks commented Dec 20, 2023

Rebased again. @zhouyuan @rui-mo Can you please take a look? Thanks

@zhouyuan
Copy link
Collaborator

@mskapilks thanks, looks good, I'll pick this in the next rebase

@mskapilks
Copy link
Author

The failure test DecimalUtilTest.divideWithRoundUp seems unrelated. Will rebase again

@rui-mo
Copy link
Collaborator

rui-mo commented Dec 21, 2023

@mskapilks Thanks for your continuous efforts on maintaining this PR.I also met CI failure due to DecimalUtilTest.divideWithRoundUp, and it can be fixed with below change.
https://github.com/oap-project/velox/pull/465/files#diff-93fb5c8d85f82393426a2af433500e34465a9937220171dc93d0a002ee8120d9R33
Maybe you could include this change in your PR to get the CI passes, and I am going make a fix for it in Velox upstream.

@mskapilks
Copy link
Author

@rui-mo Thanks for suggestion, updated the PR with fix

@zhouyuan
Copy link
Collaborator

@mskapilks already included this commit in the update branch, thanks!

@zhouyuan zhouyuan closed this Dec 22, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

9 participants