Add support for Decimal32 and Decimal64 #7061

Draft: wants to merge 58 commits into base: main

CurtHagenlocher

Which issue does this PR close?

Closes #6661.

Rationale for this change

Decimal32 and Decimal64 were added to Arrow recently; this implements support in arrow-rs.

What changes are included in this PR?

Code and tests for the new types are included.

Are there any user-facing changes?

New types Decimal32Array, Decimal64Array, Decimal32Type, and Decimal64Type are added. New values Decimal32 and Decimal64 have been added to the DataType enum. Consumers may need to update their matches accordingly.
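To illustrate why existing matches may need updating, here is a minimal sketch. The `DataType` enum below is a hypothetical, trimmed-down stand-in defined locally, not the real `arrow_schema::DataType`, and the `(precision, scale)` payload on the new variants is an assumption modeled on the existing `Decimal128` variant. The helper `decimal_byte_width` is likewise invented for illustration.

```rust
// Hypothetical stand-in for the real DataType enum, reduced to the decimal
// variants plus one non-decimal variant. The (precision, scale) payload on
// Decimal32 and Decimal64 is an assumption that mirrors Decimal128.
#[allow(dead_code)]
#[derive(Debug, Clone, PartialEq)]
enum DataType {
    Decimal32(u8, i8),
    Decimal64(u8, i8),
    Decimal128(u8, i8),
    Decimal256(u8, i8),
    Utf8,
}

/// Illustrative helper: storage width in bytes for decimal types,
/// `None` for everything else.
fn decimal_byte_width(dt: &DataType) -> Option<usize> {
    match dt {
        // Arms a consumer would add once the new variants exist.
        DataType::Decimal32(_, _) => Some(4),
        DataType::Decimal64(_, _) => Some(8),
        // Arms that existed before.
        DataType::Decimal128(_, _) => Some(16),
        DataType::Decimal256(_, _) => Some(32),
        // Without the two new arms, the new variants would fall through
        // here and be treated as non-decimal.
        _ => None,
    }
}
```

A match without a wildcard arm would stop compiling when the variants are added, whereas a match with a wildcard arm keeps compiling but may silently route the new decimals down the wrong path. The second case is the one worth auditing.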

32-bit and 64-bit decimal values from Parquet files are still being returned as Decimal128 by default unless the consumer specifically asks for the narrower type.
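The default-versus-opt-in behavior described above can be sketched roughly as follows. This is not the code in this PR: `DecimalWidth`, `narrowest_width`, `resolve_width`, and the `prefer_narrow` flag are all hypothetical names, and the real reader presumably expresses the opt-in differently. The precision limits used (9, 18, and 38 digits for 32-, 64-, and 128-bit decimals) are the maximums defined by the Arrow format.

```rust
// Hypothetical sketch of "widen to 128 bits unless the consumer opts in".
#[derive(Debug, PartialEq)]
enum DecimalWidth {
    Bits32,
    Bits64,
    Bits128,
}

/// Narrowest decimal width that can hold `precision` decimal digits,
/// or `None` if the precision is outside 1..=38.
fn narrowest_width(precision: u8) -> Option<DecimalWidth> {
    match precision {
        1..=9 => Some(DecimalWidth::Bits32),
        10..=18 => Some(DecimalWidth::Bits64),
        19..=38 => Some(DecimalWidth::Bits128),
        _ => None,
    }
}

/// Width handed back to the consumer. `prefer_narrow` is a hypothetical
/// stand-in for "the consumer specifically asks for the narrower type".
fn resolve_width(precision: u8, prefer_narrow: bool) -> Option<DecimalWidth> {
    // Reject invalid precisions regardless of the flag.
    let narrowest = narrowest_width(precision)?;
    if prefer_narrow {
        Some(narrowest)
    } else {
        // Default path: keep the existing 128-bit behavior.
        Some(DecimalWidth::Bits128)
    }
}
```

Keeping the wide type as the default means existing consumers see no change in the arrays they receive. Only callers that opt in get the narrower representation.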

CurtHagenlocher and others added 30 commits October 19, 2024 10:15
…rquet fails (apache#6886)

* Minor: make it easier to find instructions when fmt fails

* purposely introduce a fmt issue

* Revert "purposely introduce a fmt issue"

This reverts commit 440e520.

* Update .github/workflows/rust.yml

Co-authored-by: Ed Seidl <[email protected]>

---------

Co-authored-by: Ed Seidl <[email protected]>
* Minor: add comments explaining bad MSRV

* purposely introduce msrv brek

* output in JSON format

* Revert "purposely introduce msrv brek"

This reverts commit 61872b6.
* Add 54.4.0 to release schedule

* prettoer
* Add deprecation / API removal policy

* Increase proposal to 2 releases

* change from policy to guidelines, add flexibility

* prettier

* Make instructions more actionable
* add function to create ProjectionMask from column names

* add some more tests
* doc: add comment for timezone string

Signed-off-by: xxchan <[email protected]>

* Update arrow-schema/src/datatype.rs

Co-authored-by: Raphael Taylor-Davies <[email protected]>

---------

Signed-off-by: xxchan <[email protected]>
Co-authored-by: Raphael Taylor-Davies <[email protected]>
* Update version to 54.0.0

* Update changelog

* update notes

* updtes

* update
apache#6875)

* add `extend_dictionary` in dictionary builder for improved performance

* fix extends all nulls

* support null in mapped value

* adding comment

* run `clippy` and `fmt`

* fix ci

* Apply suggestions from code review

Co-authored-by: Andrew Lamb <[email protected]>

---------

Co-authored-by: Andrew Lamb <[email protected]>
* [object_store]: Version and Changelog for 0.11.2

* increment version

* update script

* changelog

* tweaks

* Update object_store/CHANGELOG.md

Co-authored-by: Raphael Taylor-Davies <[email protected]>

---------

Co-authored-by: Raphael Taylor-Davies <[email protected]>
…pache#6907)

* feat(parquet): Add next_row_group API for ParquetRecordBatchStream

Signed-off-by: Xuanwo <[email protected]>

* chore: Returning error instead of using unreachable

Signed-off-by: Xuanwo <[email protected]>

---------

Signed-off-by: Xuanwo <[email protected]>
…he#6849)

* [arrow-string] Implement string view suport for regexp match

Signed-off-by: Tai Le Manh <[email protected]>

* update unit tests

* fix clippy warnings

* Add test cases

Signed-off-by: Tai Le Manh <[email protected]>

---------

Signed-off-by: Tai Le Manh <[email protected]>
* Add doctest example for

* Remove typo

* Update arrow-buffer/src/buffer/immutable.rs

---------

Co-authored-by: Andrew Lamb <[email protected]>
* object_store: Add `thiserror` dependency

* object_store/memory: Migrate from `snafu` to `thiserror`

* object_store/parse: Migrate from `snafu` to `thiserror`

* object_store/util: Migrate from `snafu` to `thiserror`

* object_store/local: Migrate from `snafu` to `thiserror`

* object_store/delimited: Migrate from `snafu` to `thiserror`

* object_store/path/parts: Migrate from `snafu` to `thiserror`

* object_store/path: Migrate from `snafu` to `thiserror`

* object_store/http: Migrate from `snafu` to `thiserror`

* object_store/client: Migrate from `snafu` to `thiserror`

* object_store/aws: Migrate from `snafu` to `thiserror`

* object_store/azure: Migrate from `snafu` to `thiserror`

* object_store/gcp: Migrate from `snafu` to `thiserror`

* object_store/lib: Migrate from `snafu` to `thiserror`

* Remove `snafu` dependency
* feat: add GenericListViewBuilder

* remove uszie

* fix tests

* remove static

* lint

* chore: add comment for should fail test

* Update arrow-array/src/builder/generic_list_view_builder.rs

Co-authored-by: Marco Neumann <[email protected]>

* Update arrow-array/src/builder/generic_list_view_builder.rs

Co-authored-by: Marco Neumann <[email protected]>

* fix name & lint

---------

Co-authored-by: Marco Neumann <[email protected]>
…pache#6925)

Updates the requirements on [itertools](https://github.com/rust-itertools/itertools) to permit the latest version.
- [Changelog](https://github.com/rust-itertools/itertools/blob/master/CHANGELOG.md)
- [Commits](rust-itertools/itertools@v0.13.0...v0.14.0)

---
updated-dependencies:
- dependency-name: itertools
  dependency-type: direct:production
...

Signed-off-by: dependabot[bot] <[email protected]>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
…he#6932)

* chore: add docs for how to use Extend for generic methods on ArrayBuilders

* chore: move to mod docs and add more examples
psvri and others added 21 commits January 12, 2025 19:33
…rom<Bytes>` and `From<bytes::Bytes>` impls (apache#6939)

* Improve Bytes documentation

* Improve Buffer documentation, add From<Bytes> and From<bytes::Bytes> impls

* avoid linking to private docs

* Deprecate `Buffer::from_bytes`

* Apply suggestions from code review

Co-authored-by: Jeffrey Vo <[email protected]>

---------

Co-authored-by: Jeffrey Vo <[email protected]>
…ults (apache#6738)

* Reduce  panics

* t pushmove integer logical type from format.rs to schema type.rs

* remove some changes as per reviews

* use wrapping_shl

* fix typo in error message

* return error for invalid decimal length

---------

Co-authored-by: jp0317 <[email protected]>
Co-authored-by: Andrew Lamb <[email protected]>
* Update most MSRVs

* Make cargo-msrv verify every package in repo instead of just a select few and purposefully break arrow-flight msrv

* Add test to ensure workspace rust version is being used at least somewhere

* Fix exit1 => exit 1

* Make arrow-flight work, at the very least, with 'cargo metadata'

* Fix arrow-flight/gen rust-version to make CI pass now

* Get rid of pretty msrv logging as it can't all be displayed

* Do '-mindepth 2' with find to prevent running cargo msrv on the workspace as a whole

* Use correct MSRV for object_store

* remove workspace msrv check

* revert msrv

* push object_store MSRV back down to 1.62.1

* Revert unrelated formatting changes

* Fix object_store msrv

---------

Co-authored-by: Andrew Lamb <[email protected]>
Co-authored-by: Jeffrey Vo <[email protected]>
* Document the ParquetRecordBatchStream buffering

* Update parquet/src/arrow/async_reader/mod.rs

Co-authored-by: Raphael Taylor-Davies <[email protected]>

---------

Co-authored-by: Raphael Taylor-Davies <[email protected]>
* reuse buffer in view array

* Update parquet/src/arrow/array_reader/byte_view_array.rs

Co-authored-by: Raphael Taylor-Davies <[email protected]>

* use From<Bytes> instead

---------

Co-authored-by: Raphael Taylor-Davies <[email protected]>
* regenerate arrow-ipc/src/gen with patched flatbuffers

* use git repo instead of local path

* add backticks

* expand allowed overage to accommodate more alignment padding

* re-enable nanoarrow integration test

* add assertions that struct alignment is correct

* remove struct alignment assertions

* apply a patch to generated code rather than requiring patched flatc

* point to google/flatbuffers with pub PushAlignment

* add license header to gen.patch

* use flatbuffers 24.12.23

* remove unnecessary gen.patch
…6955)

* Add test and benchmarks for writing floats with NaNs

* Remove extra benchmark with no NaNs
* add peek_next_page_offset

* Update parquet/src/file/serialized_reader.rs

Co-authored-by: Andrew Lamb <[email protected]>

---------

Co-authored-by: Andrew Lamb <[email protected]>
* Improve `ParquetRecordBatchStreamBuilder` docs

* Apply suggestions from code review

Thank you @etseidl  ❤️

Co-authored-by: Ed Seidl <[email protected]>

* Update parquet/src/arrow/async_reader/mod.rs

Co-authored-by: Ed Seidl <[email protected]>

---------

Co-authored-by: Ed Seidl <[email protected]>
…he#6953)

* Treat NaNs equal to NaN when interning for dictionary encoding

* Compare all values by bytes rather than adding Intern trait
@github-actions github-actions bot added the parquet (Changes to the parquet crate) and arrow (Changes to the arrow crate) labels on Feb 1, 2025
@CurtHagenlocher
Author

This change is rather large. It would in principle be possible to first submit a separate PR with a small number of refactoring changes before the PR that adds the new types. I think the context for the refactoring changes is useful, but would be willing to do the split if there's demand for it.

@tustvold
Contributor

tustvold commented Feb 1, 2025

> would be willing to do the split if there's demand for it

I think it will be necessary to break this into smaller incremental pieces to get this in. Not just the refactoring, but also the functionality itself - the addition to DataType for example could be its own PR.

I appreciate this is more effort on your end, but we're very review-constrained, and a 3000-line diff is simply not tractable.

Labels
arrow (Changes to the arrow crate), parquet (Changes to the parquet crate)
Development

Successfully merging this pull request may close these issues.

Support new Arrow types decimal32 and decimal64