Arrow: Vectorized reads of decimal columns with default values fail with IllegalArgumentException #16502

### Apache Iceberg version

main (development)

### Query engine

Spark

### Please describe the bug 🐞


## Issue Summary

When the vectorized Arrow reader is used to read a v3 Iceberg table that has a `decimal` column carrying an `initialDefault` or `writeDefault`, vector allocation fails with:

```
java.lang.IllegalArgumentException: Cannot cast default value to fixed[9]: 12345.6789
  at org.apache.iceberg.types.Types$NestedField.castDefault(Types.java:892)
  at org.apache.iceberg.types.Types$NestedField.<init>(Types.java:881)
  at org.apache.iceberg.types.Types$NestedField$Builder.build(Types.java:850)
  at org.apache.iceberg.arrow.vectorized.VectorizedArrowReader.getPhysicalType(VectorizedArrowReader.java:255)
  at org.apache.iceberg.arrow.vectorized.VectorizedArrowReader.allocateFieldVector(VectorizedArrowReader.java:228)
  at org.apache.iceberg.arrow.vectorized.VectorizedArrowReader.read(VectorizedArrowReader.java:151)
```

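The type named in the message depends on the Parquet physical type that backs the decimal:
- `INT32`-backed decimal → `Cannot cast default value to int: <default>`
- `INT64`-backed decimal → `Cannot cast default value to long: <default>`
- `FIXED_LEN_BYTE_ARRAY`-backed decimal → `Cannot cast default value to fixed[N]: <default>`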
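
Which physical type backs a given decimal follows from its precision. The following is a minimal sketch, assuming the precision thresholds Iceberg is understood to use when writing Parquet (up to 9 digits as `int`, up to 18 as `long`, otherwise the smallest `fixed[N]` that can hold the precision). The class and method names are illustrative only and are not Iceberg APIs:

```java
import java.math.BigInteger;

public class DecimalPhysicalType {
  // Smallest byte count whose signed two's-complement range holds every
  // unscaled value of the given decimal precision.
  static int requiredBytes(int precision) {
    BigInteger maxUnscaled = BigInteger.TEN.pow(precision).subtract(BigInteger.ONE);
    // bitLength() excludes the sign bit, so add one bit before rounding up to bytes.
    return (maxUnscaled.bitLength() + 1 + 7) / 8;
  }

  // Assumed mapping from decimal precision to the physical type name
  // that would appear in the exception message.
  static String physicalTypeName(int precision) {
    if (precision <= 9) {
      return "int";
    }
    if (precision <= 18) {
      return "long";
    }
    return "fixed[" + requiredBytes(precision) + "]";
  }

  public static void main(String[] args) {
    for (int precision : new int[] {5, 18, 19, 38}) {
      System.out.println("decimal(" + precision + ", s) -> " + physicalTypeName(precision));
    }
  }
}
```

Under this mapping, a `DECIMAL(5, 2)` column is `int`-backed, and the 9-byte fixed type in the stack trace above corresponds to a precision of 19 to 21.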

The same read path succeeds when vectorization is disabled:

```
spark.sql.iceberg.vectorization.enabled=false
```

## Repro

1. Create a v3 Iceberg table with a decimal column that has a default value:

```sql
CREATE TABLE local.db.t (
  id INT,
  amount DECIMAL(5, 2) DEFAULT 0.00
) USING iceberg TBLPROPERTIES ('format-version' = '3');

INSERT INTO local.db.t VALUES (1, 1.23), (2, 4.56), (3, 7.89);
```

2. Read with vectorization enabled (the default):

```sql
SET spark.sql.iceberg.vectorization.enabled=true;
SELECT * FROM local.db.t;
```

The query fails with the stack trace above. The failure is deterministic only when the column is not dictionary-encoded. With dictionary encoding, allocation goes through `allocateDictEncodedVector` and bypasses the buggy path, so small or highly repetitive data sets may appear to read successfully.

## Root cause

`VectorizedArrowReader#getPhysicalType` rewrites a decimal Iceberg field to its underlying physical type (`int` / `long` / `fixed[N]`) so the right Arrow vector class can be allocated:

```java
physicalType = Types.NestedField.from(logicalType).ofType(type).build();
```

`Types.NestedField.from(field)` returns a builder that carries over the field's `initialDefault` and `writeDefault`. When `build()` runs, `NestedField`'s constructor calls `castDefault(literal, type)` against the new physical type. For a decimal default this delegates to `DecimalLiteral.to(LongType | IntegerType | FixedType)`, a conversion that is not supported and returns `null`, which trips the `Preconditions.checkArgument` in `castDefault`.

Conceptually, the defaults belong to the logical (decimal) view of the column and should not flow to the physical representation — the physical type is an internal detail used only to size the Arrow vector. The non-vectorized readers (`BaseParquetReaders`, `SparkParquetReaders`, `FlinkParquetReaders`) all apply defaults at the logical-type layer and are unaffected.
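
The failure mode and the direction of a fix can be illustrated with a self-contained model. This is a sketch under stated assumptions: `Field`, `castDefault`, `physicalCopyingDefault`, and `physicalWithoutDefault` are hypothetical stand-ins written for this example, not Iceberg's classes, and the change in the linked PR may be shaped differently.

```java
import java.math.BigDecimal;

public class DefaultCastSketch {
  // Hypothetical stand-in for a schema field: a type name plus an optional default.
  static final class Field {
    final String name;
    final String type;
    final Object initialDefault;

    Field(String name, String type, Object initialDefault) {
      // Models the constructor check: a non-null default must cast to the field's type.
      if (initialDefault != null && castDefault(initialDefault, type) == null) {
        throw new IllegalArgumentException(
            "Cannot cast default value to " + type + ": " + initialDefault);
      }
      this.name = name;
      this.type = type;
      this.initialDefault = initialDefault;
    }
  }

  // Models the unsupported conversion: a decimal default only casts to a decimal type.
  static Object castDefault(Object value, String type) {
    if (value instanceof BigDecimal) {
      return type.startsWith("decimal") ? value : null;
    }
    return value;
  }

  // Problematic shape: the logical default is copied onto the physical field.
  static Field physicalCopyingDefault(Field logical, String physicalType) {
    return new Field(logical.name, physicalType, logical.initialDefault);
  }

  // One possible shape of a fix: the physical field carries no default at all.
  static Field physicalWithoutDefault(Field logical, String physicalType) {
    return new Field(logical.name, physicalType, null);
  }

  public static void main(String[] args) {
    Field logical = new Field("amount", "decimal(5, 2)", new BigDecimal("0.00"));
    try {
      physicalCopyingDefault(logical, "int");
    } catch (IllegalArgumentException e) {
      System.out.println("copying default failed: " + e.getMessage());
    }
    Field physical = physicalWithoutDefault(logical, "int");
    System.out.println("without default: " + physical.type + ", default=" + physical.initialDefault);
  }
}
```

In this model the logical field keeps its default, so anything that applies defaults at the logical-type layer is unaffected; only the throwaway physical field used for vector sizing drops it.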

Proposed PR for the fix: https://github.com/apache/iceberg/pull/16501

### Willingness to contribute

- [x] I can contribute a fix for this bug independently
- [ ] I would be willing to contribute a fix for this bug with guidance from the Iceberg community
- [ ] I cannot contribute a fix for this bug at this time
