From dfcbdb9217dc69a8e8b83c47837d5125d472c947 Mon Sep 17 00:00:00 2001 From: Andy Grove Date: Mon, 18 May 2026 17:23:41 -0600 Subject: [PATCH 1/4] docs: remove references to native_datafusion and native_iceberg_compat scans --- .../adding_a_new_spark_version.md | 12 +-- docs/source/contributor-guide/bug_triage.md | 8 +- .../user-guide/latest/compatibility/index.md | 2 +- .../user-guide/latest/compatibility/scans.md | 77 ++++++------------- .../latest/compatibility/spark-versions.md | 16 ++-- docs/source/user-guide/latest/datasources.md | 21 ++--- .../latest/understanding-comet-plans.md | 2 +- 7 files changed, 50 insertions(+), 88 deletions(-) diff --git a/docs/source/contributor-guide/adding_a_new_spark_version.md b/docs/source/contributor-guide/adding_a_new_spark_version.md index 1f16e10153..65109e4d0c 100644 --- a/docs/source/contributor-guide/adding_a_new_spark_version.md +++ b/docs/source/contributor-guide/adding_a_new_spark_version.md @@ -138,9 +138,7 @@ own test suites under the new profile. Promote the new Spark version from the compile-only job to the main test jobs in `.github/workflows/pr_build_linux.yml` (and `pr_build_macos.yml` if -capacity allows). Use `scan_impl: "auto"` so both `native_datafusion` and -`native_iceberg_compat` get exercised, matching how earlier versions are -configured. +capacity allows). Match how earlier versions are configured. ### Run the Suite Locally First @@ -256,14 +254,12 @@ new-version bring-up are: ### CI for the Spark SQL Tests Spark SQL tests do not run from the main PR build workflows. They have -their own dedicated workflow files: +their own dedicated workflow file: - `.github/workflows/spark_sql_test.yml` -- `.github/workflows/spark_sql_test_native_iceberg_compat.yml` -Add the new version to the matrix in each of these files (`spark-short`, -`spark-full`, `java`, `scan-impl`). Use the closest existing entry as a -template. +Add the new version to the matrix (`spark-short`, `spark-full`, `java`). +Use the closest existing entry as a template. Before merging, run `make format`, run clippy (`cd native && cargo clippy --all-targets --workspace -- -D warnings`), and diff --git a/docs/source/contributor-guide/bug_triage.md b/docs/source/contributor-guide/bug_triage.md index 9e51f44637..2d541e6527 100644 --- a/docs/source/contributor-guide/bug_triage.md +++ b/docs/source/contributor-guide/bug_triage.md @@ -73,8 +73,7 @@ help contributors find bugs in their area of expertise. | `area:ffi` | Arrow FFI / JNI boundary | | `area:ci` | CI/CD, GitHub Actions, build tooling | -The following pre-existing labels also serve as area indicators: `native_datafusion`, -`native_iceberg_compat`, `spark 4`, `spark sql tests`. +The following pre-existing labels also serve as area indicators: `spark 4`, `spark sql tests`. ## Triage Process @@ -109,9 +108,8 @@ Periodically review open bugs to ensure priorities are still accurate: crashes, because crashes are at least visible. 2. **User-reported over test-only.** A bug hit by a real user on a real workload takes priority over one found only in test suites. -3. **Core path over experimental.** Bugs in the default scan mode (`native_comet`) or widely-used - expressions take priority over bugs in experimental features like `native_datafusion` or - `native_iceberg_compat`. +3. **Core path over experimental.** Bugs in the Parquet scan or widely-used expressions take + priority over bugs in experimental features. 4. **Production safety over feature completeness.** Fixing a data corruption bug is more important than adding support for a new expression. diff --git a/docs/source/user-guide/latest/compatibility/index.md b/docs/source/user-guide/latest/compatibility/index.md index 1ba9d9e181..59c6a906f1 100644 --- a/docs/source/user-guide/latest/compatibility/index.md +++ b/docs/source/user-guide/latest/compatibility/index.md @@ -23,7 +23,7 @@ Comet aims to provide consistent results with the version of Apache Spark that i This guide documents areas where Comet's behavior is known to differ from Spark. Topics are grouped by subsystem: -- **Parquet**: limitations when reading Parquet files (both scan implementations, shared and per-implementation). +- **Parquet**: limitations when reading Parquet files. - **Floating-point comparison**: NaN and signed-zero handling in comparisons. - **Regular expressions**: differences between the Rust regexp crate and Java's regex engine. - **Operators**: operator-level compatibility notes, including window functions and round-robin partitioning. diff --git a/docs/source/user-guide/latest/compatibility/scans.md b/docs/source/user-guide/latest/compatibility/scans.md index f784a4f467..0b50227b62 100644 --- a/docs/source/user-guide/latest/compatibility/scans.md +++ b/docs/source/user-guide/latest/compatibility/scans.md @@ -19,22 +19,13 @@ under the License. # Parquet Compatibility -Comet currently has two distinct implementations of the Parquet scan operator. +Comet's Parquet scan offloads decoding to native code and produces Arrow batches for the rest of +the plan. Comet falls back to Spark when the scan cannot be converted (for example, due to one of +the unsupported features listed below). -| Scan Implementation | Notes | -| ----------------------- | ---------------------- | -| `native_datafusion` | Fully native scan | -| `native_iceberg_compat` | Hybrid JVM/native scan | +## Parquet Scan Limitations -The configuration property `spark.comet.scan.impl` is used to select an implementation. The default setting is -`spark.comet.scan.impl=auto`, which attempts to use `native_datafusion` first, and falls back to Spark if the scan -cannot be converted (e.g., due to unsupported features). Most users should not need to change this setting. However, -it is possible to force Comet to use a particular implementation for all scan operations by setting this -configuration property to one of the following implementations. For example: `--conf spark.comet.scan.impl=native_datafusion`. - -## Shared Limitations - -The following features are not supported by either scan implementation, and Comet will fall back to Spark in these scenarios: +The following features are not supported and cause Comet to fall back to Spark: - Decimals encoded in binary format. - `ShortType` columns, by default. When reading Parquet files written by systems other than Spark that contain @@ -46,17 +37,31 @@ The following features are not supported by either scan implementation, and Come columns are always safe because they can only come from signed `INT8`, where truncation preserves the signed value. - Default values that are nested types (e.g., maps, arrays, structs). Literal default values are supported. - Spark's Datasource V2 API. When `spark.sql.sources.useV1SourceList` does not include `parquet`, Spark uses the - V2 API for Parquet scans. The DataFusion-based implementations only support the V1 API. + V2 API for Parquet scans. Comet's Parquet scan only supports the V1 API. - Spark metadata columns (e.g., `_metadata.file_path`) +- No support for row indexes +- No support for reading Parquet field IDs +- No support for `input_file_name()`, `input_file_block_start()`, or `input_file_block_length()` SQL functions. + Comet's Parquet scan does not use Spark's `FileScanRDD`, so these functions cannot populate their values. +- No support for `ignoreMissingFiles` or `ignoreCorruptFiles` being set to `true` +- Duplicate field names in case-insensitive mode (e.g., a Parquet file with both `B` and `b` columns) + are detected at read time and raise a `SparkRuntimeException` with error class `_LEGACY_ERROR_TEMP_2093`, + matching Spark's behavior. +- `spark.sql.parquet.enableVectorizedReader=false`. Disabling the vectorized reader opts into + Spark's parquet-mr semantics (silent overflow, null-on-narrowing), which Comet's native reader + does not replicate. By default Comet falls back to Spark in this case. Set + `spark.comet.scan.allowDisabledParquetVectorizedReader=true` to opt in to running the + Comet Parquet scan regardless. See + [#4352](https://github.com/apache/datafusion-comet/issues/4352). -The following shared limitation may produce incorrect results without falling back to Spark: +The following limitation may produce incorrect results without falling back to Spark: - No support for datetime rebasing. When reading Parquet files containing dates or timestamps written before Spark 3.0 (which used a hybrid Julian/Gregorian calendar), dates/timestamps will be read as if they were written using the Proleptic Gregorian calendar. This may produce incorrect results for dates before October 15, 1582. -The following shared limitation raises an error at scan time rather than falling back to Spark: +The following limitation raises an error at scan time rather than falling back to Spark: - Invalid UTF-8 bytes in `STRING` columns. Spark permits arbitrary byte sequences in a `STRING` column (for example from `CAST(X'C1' AS STRING)`), but Comet's native execution path is built on @@ -65,28 +70,7 @@ The following shared limitation raises an error at scan time rather than falling query, or cast the column to `BINARY` before persisting, if you need to preserve non-UTF-8 bytes. See [#4121](https://github.com/apache/datafusion-comet/issues/4121). -## `native_datafusion` Limitations - -The `native_datafusion` scan has some additional limitations, mostly related to Parquet metadata. All of these -cause Comet to fall back to Spark (including when using `auto` mode). Note that the `native_datafusion` scan -requires `spark.comet.exec.enabled=true` because the scan node must be wrapped by `CometExecRule`. - -- No support for row indexes -- No support for reading Parquet field IDs -- No support for `input_file_name()`, `input_file_block_start()`, or `input_file_block_length()` SQL functions. - The `native_datafusion` scan does not use Spark's `FileScanRDD`, so these functions cannot populate their values. -- No support for `ignoreMissingFiles` or `ignoreCorruptFiles` being set to `true` -- Duplicate field names in case-insensitive mode (e.g., a Parquet file with both `B` and `b` columns) - are detected at read time and raise a `SparkRuntimeException` with error class `_LEGACY_ERROR_TEMP_2093`, - matching Spark's behavior. -- `spark.sql.parquet.enableVectorizedReader=false`. Disabling the vectorized reader opts into - Spark's parquet-mr semantics (silent overflow, null-on-narrowing), which Comet's native reader - does not replicate. By default Comet falls back to Spark in this case. Set - `spark.comet.scan.allowDisabledParquetVectorizedReader=true` to opt in to running the - `native_datafusion` scan regardless. See - [#4352](https://github.com/apache/datafusion-comet/issues/4352). - -The following `native_datafusion` limitations may produce incorrect results on Spark versions prior to 4.0 +The following limitation may produce incorrect results on Spark versions prior to 4.0 without falling back to Spark: - Reading `TimestampLTZ` as `TimestampNTZ`. On Spark 3.x, Spark raises an error per @@ -112,8 +96,8 @@ Schema mismatch happens in two real-world scenarios: table types at read time. Spark's vectorized Parquet reader fully validates these conversions in `ParquetVectorUpdaterFactory.getUpdater` -and throws `SchemaColumnConvertNotSupportedException` for unsupported pairs. `native_datafusion` mirrors -that validation in its schema adapter; the entries below are the remaining gaps. +and throws `SchemaColumnConvertNotSupportedException` for unsupported pairs. Comet's Parquet scan +mirrors that validation in its schema adapter; the entries below are the remaining gaps. Note that the exact set of accepted conversions has changed between Spark versions (for example, Spark 3.x's `schemaEvolution.enabled` flag gates `INT32 → INT64`, `FLOAT → DOUBLE`, @@ -136,14 +120,3 @@ SchemaColumnConvertNotSupportedException`) instead of the one-level chain Spark' `SparkException` instead. Walk the cause chain to recover the `SchemaColumnConvertNotSupportedException`. Spark 4.0+ produces a single-level chain, matching vanilla Spark. See [#4354](https://github.com/apache/datafusion-comet/issues/4354). - -## `native_iceberg_compat` Limitations - -The `native_iceberg_compat` scan has the following additional limitation that may produce incorrect results -without falling back to Spark: - -- Some Spark configuration values are hard-coded to their defaults rather than respecting user-specified values. - This may produce incorrect results when non-default values are set. The affected configurations are - `spark.sql.parquet.binaryAsString`, `spark.sql.parquet.int96AsTimestamp`, `spark.sql.caseSensitive`, - `spark.sql.parquet.inferTimestampNTZ.enabled`, and `spark.sql.legacy.parquet.nanosAsLong`. See - [issue #1816](https://github.com/apache/datafusion-comet/issues/1816) for more details. diff --git a/docs/source/user-guide/latest/compatibility/spark-versions.md b/docs/source/user-guide/latest/compatibility/spark-versions.md index 8c17fd6b13..4856cf5a1b 100644 --- a/docs/source/user-guide/latest/compatibility/spark-versions.md +++ b/docs/source/user-guide/latest/compatibility/spark-versions.md @@ -31,13 +31,13 @@ Spark 3.4.3 is supported with Java 11/17 and Scala 2.12/2.13. ### Known Limitations - **Reading `TimestampLTZ` as `TimestampNTZ`**: Spark 3.4 raises an error for this operation - (SPARK-36182), but Comet's `native_datafusion` scan silently returns the raw UTC value instead. - See [Parquet Compatibility](scans.md#native_datafusion-limitations) for details. + (SPARK-36182), but Comet's Parquet scan silently returns the raw UTC value instead. + See [Parquet Compatibility](scans.md#parquet-scan-limitations) for details. - **Unsupported Parquet type conversions**: Spark 3.4 raises schema incompatibility errors for certain type mismatches (e.g., reading INT32 as BIGINT, decimal precision changes), but Comet's - `native_datafusion` scan may not detect these and could return unexpected values. - See [Parquet Compatibility](scans.md#native_datafusion-limitations) for details. + Comet's Parquet scan may not detect these and could return unexpected values. + See [Parquet Compatibility](scans.md#parquet-scan-limitations) for details. ## Spark 3.5 @@ -46,13 +46,13 @@ Spark 3.5.8 is supported with Java 11/17 and Scala 2.12/2.13. ### Known Limitations - **Reading `TimestampLTZ` as `TimestampNTZ`**: Spark 3.5 raises an error for this operation - (SPARK-36182), but Comet's `native_datafusion` scan silently returns the raw UTC value instead. - See [Parquet Compatibility](scans.md#native_datafusion-limitations) for details. + (SPARK-36182), but Comet's Parquet scan silently returns the raw UTC value instead. + See [Parquet Compatibility](scans.md#parquet-scan-limitations) for details. - **Unsupported Parquet type conversions**: Spark 3.5 raises schema incompatibility errors for certain type mismatches (e.g., reading INT32 as BIGINT, decimal precision changes), but Comet's - `native_datafusion` scan may not detect these and could return unexpected values. - See [Parquet Compatibility](scans.md#native_datafusion-limitations) for details. + Comet's Parquet scan may not detect these and could return unexpected values. + See [Parquet Compatibility](scans.md#parquet-scan-limitations) for details. ## Spark 4.0 diff --git a/docs/source/user-guide/latest/datasources.md b/docs/source/user-guide/latest/datasources.md index 065b719ba5..5fa05fedff 100644 --- a/docs/source/user-guide/latest/datasources.md +++ b/docs/source/user-guide/latest/datasources.md @@ -80,7 +80,6 @@ Start Comet with experimental reader and HDFS support as [described](installatio and add additional parameters ```shell ---conf spark.comet.scan.impl=native_datafusion \ --conf spark.hadoop.fs.defaultFS="hdfs://namenode:9000" \ --conf spark.hadoop.dfs.client.use.datanode.hostname = true \ --conf dfs.client.use.datanode.hostname = true @@ -158,7 +157,6 @@ JAVA_HOME="/opt/homebrew/opt/openjdk@17" make release PROFILES="-Pspark-4.1" COM withSQLConf( CometConf.COMET_ENABLED.key -> "true", CometConf.COMET_EXEC_ENABLED.key -> "true", - CometConf.COMET_NATIVE_SCAN_IMPL.key -> CometConf.SCAN_NATIVE_DATAFUSION, SQLConf.USE_V1_SOURCE_LIST.key -> "parquet", "fs.defaultFS" -> "hdfs://namenode:9000", "dfs.client.use.datanode.hostname" -> "true") { @@ -173,10 +171,11 @@ Or use `spark-shell` with HDFS support as described [above](#using-experimental- ## S3 -The `native_datafusion` and `native_iceberg_compat` Parquet scan implementations completely offload data loading -to native code. They use the [`object_store` crate](https://crates.io/crates/object_store) to read data from S3 and -support configuring S3 access using standard [Hadoop S3A configurations](https://hadoop.apache.org/docs/stable/hadoop-aws/tools/hadoop-aws/index.html#General_S3A_Client_configuration) by translating them to -the `object_store` crate's format. +Comet's Parquet scan completely offloads data loading to native code. It uses the +[`object_store` crate](https://crates.io/crates/object_store) to read data from S3 and supports +configuring S3 access using standard +[Hadoop S3A configurations](https://hadoop.apache.org/docs/stable/hadoop-aws/tools/hadoop-aws/index.html#General_S3A_Client_configuration) +by translating them to the `object_store` crate's format. This implementation maintains compatibility with existing Hadoop S3A configurations, so existing code will continue to work as long as the configurations are supported and can be translated without loss of functionality. @@ -210,8 +209,7 @@ Multiple credential providers can be specified in a comma-separated list using t ### Additional S3 Configuration Options -Beyond credential providers, the `native_datafusion` and `native_iceberg_compat` implementations support additional -S3 configuration options: +Beyond credential providers, Comet's Parquet scan supports additional S3 configuration options: | Option | Description | | ------------------------------- | -------------------------------------------------------------------------------------------------- | @@ -224,8 +222,7 @@ All configuration options support bucket-specific overrides using the pattern `f ### Examples -The following examples demonstrate how to configure S3 access with the `native_datafusion` and `native_iceberg_compat` -Parquet scan implementations using different authentication methods. +The following examples demonstrate how to configure S3 access using different authentication methods. **Example 1: Simple Credentials** @@ -234,7 +231,6 @@ This example shows how to access a private S3 bucket using an access key and sec ```shell $SPARK_HOME/bin/spark-shell \ ... ---conf spark.comet.scan.impl=native_datafusion \ --conf spark.hadoop.fs.s3a.access.key=my-access-key \ --conf spark.hadoop.fs.s3a.secret.key=my-secret-key ... @@ -247,7 +243,6 @@ This example demonstrates using an assumed role credential to access a private S ```shell $SPARK_HOME/bin/spark-shell \ ... ---conf spark.comet.scan.impl=native_datafusion \ --conf spark.hadoop.fs.s3a.aws.credentials.provider=org.apache.hadoop.fs.s3a.auth.AssumedRoleCredentialProvider \ --conf spark.hadoop.fs.s3a.assumed.role.arn=arn:aws:iam::123456789012:role/my-role \ --conf spark.hadoop.fs.s3a.assumed.role.session.name=my-session \ @@ -257,7 +252,7 @@ $SPARK_HOME/bin/spark-shell \ ### Limitations -The S3 support of `native_datafusion` and `native_iceberg_compat` has the following limitations: +Comet's S3 support has the following limitations: 1. **Partial Hadoop S3A configuration support**: Not all Hadoop S3A configurations are currently supported. Only the configurations listed in the tables above are translated and applied to the underlying `object_store` crate. diff --git a/docs/source/user-guide/latest/understanding-comet-plans.md b/docs/source/user-guide/latest/understanding-comet-plans.md index 791f4bf766..93d1fe2c3e 100644 --- a/docs/source/user-guide/latest/understanding-comet-plans.md +++ b/docs/source/user-guide/latest/understanding-comet-plans.md @@ -147,7 +147,7 @@ by role. Names match what is shown in the plan output. | Node | Description | | ------------------------ | --------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- | -| `CometScan` | V1 Parquet scan driven by Spark's file-source path through Comet's Parquet reader. Decoding runs in native code; the resulting Arrow batches cross JNI into the native plan. The active scan implementation is shown in brackets, e.g. `CometScan [native_iceberg_compat]`. | +| `CometScan` | V1 Parquet scan driven by Spark's file-source path through Comet's Parquet reader. Decoding runs in native code; the resulting Arrow batches cross JNI into the native plan. | | `CometBatchScan` | DataSource V2 scan, including Iceberg Parquet, that produces Arrow batches consumed by Comet. | | `CometNativeScan` | Fully native Parquet scan that runs entirely in DataFusion (no JVM Parquet reader involvement). | | `CometIcebergNativeScan` | Fully native Iceberg Parquet scan. | From aff7f59a16c67b4515114b9eab3b6a26fcf2da65 Mon Sep 17 00:00:00 2001 From: Andy Grove Date: Mon, 18 May 2026 19:03:08 -0600 Subject: [PATCH 2/4] style: prettier --- .../user-guide/latest/understanding-comet-plans.md | 14 +++++++------- 1 file changed, 7 insertions(+), 7 deletions(-) diff --git a/docs/source/user-guide/latest/understanding-comet-plans.md b/docs/source/user-guide/latest/understanding-comet-plans.md index 93d1fe2c3e..c3c7f69808 100644 --- a/docs/source/user-guide/latest/understanding-comet-plans.md +++ b/docs/source/user-guide/latest/understanding-comet-plans.md @@ -145,13 +145,13 @@ by role. Names match what is shown in the plan output. ### Scans -| Node | Description | -| ------------------------ | --------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- | -| `CometScan` | V1 Parquet scan driven by Spark's file-source path through Comet's Parquet reader. Decoding runs in native code; the resulting Arrow batches cross JNI into the native plan. | -| `CometBatchScan` | DataSource V2 scan, including Iceberg Parquet, that produces Arrow batches consumed by Comet. | -| `CometNativeScan` | Fully native Parquet scan that runs entirely in DataFusion (no JVM Parquet reader involvement). | -| `CometIcebergNativeScan` | Fully native Iceberg Parquet scan. | -| `CometCsvNativeScan` | Fully native CSV scan (experimental). | +| Node | Description | +| ------------------------ | ---------------------------------------------------------------------------------------------------------------------------------------------------------------------------- | +| `CometScan` | V1 Parquet scan driven by Spark's file-source path through Comet's Parquet reader. Decoding runs in native code; the resulting Arrow batches cross JNI into the native plan. | +| `CometBatchScan` | DataSource V2 scan, including Iceberg Parquet, that produces Arrow batches consumed by Comet. | +| `CometNativeScan` | Fully native Parquet scan that runs entirely in DataFusion (no JVM Parquet reader involvement). | +| `CometIcebergNativeScan` | Fully native Iceberg Parquet scan. | +| `CometCsvNativeScan` | Fully native CSV scan (experimental). | ### Native Execution Operators From d70e650bb054a3426de4f213cfb9667039d9471b Mon Sep 17 00:00:00 2001 From: Andy Grove Date: Mon, 18 May 2026 22:21:08 -0600 Subject: [PATCH 3/4] docs: drop CometScan row and trim CometNativeScan description --- .../user-guide/latest/understanding-comet-plans.md | 13 ++++++------- 1 file changed, 6 insertions(+), 7 deletions(-) diff --git a/docs/source/user-guide/latest/understanding-comet-plans.md b/docs/source/user-guide/latest/understanding-comet-plans.md index c3c7f69808..7fb93c8f53 100644 --- a/docs/source/user-guide/latest/understanding-comet-plans.md +++ b/docs/source/user-guide/latest/understanding-comet-plans.md @@ -145,13 +145,12 @@ by role. Names match what is shown in the plan output. ### Scans -| Node | Description | -| ------------------------ | ---------------------------------------------------------------------------------------------------------------------------------------------------------------------------- | -| `CometScan` | V1 Parquet scan driven by Spark's file-source path through Comet's Parquet reader. Decoding runs in native code; the resulting Arrow batches cross JNI into the native plan. | -| `CometBatchScan` | DataSource V2 scan, including Iceberg Parquet, that produces Arrow batches consumed by Comet. | -| `CometNativeScan` | Fully native Parquet scan that runs entirely in DataFusion (no JVM Parquet reader involvement). | -| `CometIcebergNativeScan` | Fully native Iceberg Parquet scan. | -| `CometCsvNativeScan` | Fully native CSV scan (experimental). | +| Node | Description | +| ------------------------ | --------------------------------------------------------------------------------------------- | +| `CometBatchScan` | DataSource V2 scan, including Iceberg Parquet, that produces Arrow batches consumed by Comet. | +| `CometNativeScan` | Fully native Parquet scan that runs entirely in DataFusion. | +| `CometIcebergNativeScan` | Fully native Iceberg Parquet scan. | +| `CometCsvNativeScan` | Fully native CSV scan (experimental). | ### Native Execution Operators From d697c8c1e25b5fc850392556a2a2fb1b51411ed8 Mon Sep 17 00:00:00 2001 From: Matt Butrovich Date: Tue, 19 May 2026 07:52:57 -0400 Subject: [PATCH 4/4] Apply suggestion from @mbutrovich --- docs/source/user-guide/latest/compatibility/scans.md | 1 - 1 file changed, 1 deletion(-) diff --git a/docs/source/user-guide/latest/compatibility/scans.md b/docs/source/user-guide/latest/compatibility/scans.md index 0b50227b62..37524a829e 100644 --- a/docs/source/user-guide/latest/compatibility/scans.md +++ b/docs/source/user-guide/latest/compatibility/scans.md @@ -40,7 +40,6 @@ The following features are not supported and cause Comet to fall back to Spark: V2 API for Parquet scans. Comet's Parquet scan only supports the V1 API. - Spark metadata columns (e.g., `_metadata.file_path`) - No support for row indexes -- No support for reading Parquet field IDs - No support for `input_file_name()`, `input_file_block_start()`, or `input_file_block_length()` SQL functions. Comet's Parquet scan does not use Spark's `FileScanRDD`, so these functions cannot populate their values. - No support for `ignoreMissingFiles` or `ignoreCorruptFiles` being set to `true`