Motivation
Apache TsFile already has ecosystem integration with Spark, and the tree-model Spark TsFile connector can be used as a reference:
https://github.com/apache/iotdb-extras/tree/master/connectors/spark-tsfile
TsFile now also supports the table model, including TableSchema, ColumnCategory.TAG/FIELD, table-model read APIs, and table-model write APIs such as TsFileWriter#registerTableSchema and TsFileWriter#writeTable.
We need a Spark connector for table-model TsFile so users can read and write table-model TsFiles directly through Spark SQL/DataFrame APIs.
Goal
Develop a Spark SQL/DataFrame connector for the TsFile table model. The connector should reuse the existing TsFile Java read/write APIs as much as possible rather than duplicating TsFile parsing or writing logic.
Expected Scope
The initial implementation should support:
- Reading table-model TsFiles, either single files or directories, into Spark DataFrames.
- Inferring or loading table schemas from TsFile metadata, including:
  - table name
  - time column
  - TAG columns
  - FIELD columns
  - TsFile data types and corresponding Spark SQL types
- Preserving table-model semantics:
  - TAG columns identify devices
  - FIELD columns represent measurements
  - null values and sparse field values are handled correctly
- Reading multiple TsFiles with compatible schemas.
- Column pruning where possible.
- Predicate pushdown where possible, especially:
  - time-range filters
  - tag filters
- Writing Spark DataFrames into table-model TsFiles, with options such as:
  - table name
  - tag columns
  - field columns
  - encoding/compression defaults if needed
- Providing user-facing examples for Spark SQL/DataFrame read and write workflows.
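To make the schema items above concrete, here is a minimal sketch of how a table-model schema might be turned into a Spark-style field list. It is plain Scala with no Spark or TsFile dependency. Every name in it (`TsColumn`, `SparkField`, `SchemaMapping`) is hypothetical, and both the type mapping and the choice of a non-nullable `LongType` time column are assumptions, not decisions made by this proposal.

```scala
// Sketch only: plain Scala stand-ins, no Spark or TsFile dependency.
// All names below are hypothetical and only mirror the scope list.

sealed trait ColumnCategory
case object Tag extends ColumnCategory
case object Field extends ColumnCategory

// A table-model column: name, TsFile data type name, and category.
final case class TsColumn(name: String, tsType: String, category: ColumnCategory)

// A Spark-like field description: name, Spark SQL type name, nullability.
final case class SparkField(name: String, sparkType: String, nullable: Boolean)

object SchemaMapping {
  // Assumed TsFile -> Spark SQL type mapping; unlisted types are rejected
  // rather than guessed.
  def sparkTypeName(tsType: String): String = tsType match {
    case "BOOLEAN"         => "BooleanType"
    case "INT32"           => "IntegerType"
    case "INT64"           => "LongType"
    case "FLOAT"           => "FloatType"
    case "DOUBLE"          => "DoubleType"
    case "TEXT" | "STRING" => "StringType"
    case other =>
      throw new IllegalArgumentException(s"Unsupported TsFile type: $other")
  }

  // Column order: time first, then TAG columns, then FIELD columns.
  // TAG and FIELD columns are nullable so sparse rows can be represented.
  def toSparkFields(columns: Seq[TsColumn]): Seq[SparkField] = {
    val time = SparkField("time", "LongType", nullable = false) // assumed
    val (tags, fields) = columns.partition(_.category == Tag)
    val mapped = (tags ++ fields).map { c =>
      SparkField(c.name, sparkTypeName(c.tsType), nullable = true)
    }
    time +: mapped
  }
}
```

For the `weather` table used in the read example, passing `temperature` (DOUBLE, FIELD), `city` (STRING, TAG), and `device` (STRING, TAG) would yield the fields in the order `time`, `city`, `device`, `temperature`.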
Proposed User Experience
Example read API:
val df = spark.read
  .format("tsfile")
  .option("model", "table")
  .option("table", "weather")
  .load("/path/to/tsfile-dir")

df.select("time", "city", "device", "temperature")
  .where("city = 'beijing'")
  .show()
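The scope list also covers writing, so a matching write-side sketch may help the discussion. The option keys `tagColumns` and `fieldColumns` are hypothetical, chosen only to mirror the read example; no connector currently accepts them. The helper below is plain Scala and only assembles and validates the option map.

```scala
// Hypothetical write options for the proposed connector. None of these
// option keys exist yet. Plain Scala, no Spark dependency.
final case class TableWriteOptions(
    table: String,
    tagColumns: Seq[String],
    fieldColumns: Seq[String]) {

  // Reject configurations that cannot describe a consistent table schema.
  require(table.nonEmpty, "table name must not be empty")
  require(
    tagColumns.intersect(fieldColumns).isEmpty,
    "a column cannot be both TAG and FIELD")

  // Flatten to the string map shape that DataFrameWriter.options accepts.
  def toMap: Map[String, String] = Map(
    "model" -> "table",
    "table" -> table,
    "tagColumns" -> tagColumns.mkString(","),
    "fieldColumns" -> fieldColumns.mkString(","))
}

// Intended use once a connector exists (assumed, not runnable today):
//   val opts = TableWriteOptions("weather", Seq("city", "device"), Seq("temperature"))
//   df.write.format("tsfile").options(opts.toMap).save("/path/to/output-dir")
```

Validating the tag/field split before any data is written keeps configuration errors out of the TsFile writing path. Whether the connector should take explicit column lists or infer categories some other way is left open here.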