A very simple Java implementation of the Apache Spark DataSourceV2 API.
This example is compatible with Spark 2.4.3.
Build the jar file containing the DataSource with the following command:
$ mvn package
The DataSource can be demonstrated from the pyspark shell.
Launch pyspark from the project directory, so that the relative path to the jar resolves, with the following command:
$ pyspark --jars ./target/example-datasource-1.0.jar
You should see a startup banner similar to the following:
Welcome to
      ____              __
     / __/__  ___ _____/ /__
    _\ \/ _ \/ _ `/ __/  '_/
   /__ / .__/\_,_/_/ /_/\_\   version 2.4.3
      /_/
Using Python version 3.7.3 (default, Jun 24 2019 04:54:02)
SparkSession available as 'spark'.
Then from within the pyspark shell, type the commands below:
>>> df = spark.read.format('example.ExampleDataSource').load()
>>> df.show()
This displays the data provided by the DataSource:
+-------+---+
|   name|age|
+-------+---+
|  Alfie| 24|
| Bertie| 36|
|Charlie| 48|
| Debbie| 60|
|  Ernie| 72|
|Frankie| 84|
| Gettie| 96|
+-------+---+
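
In the Spark 2.4 DataSourceV2 read path, rows such as those above are handed to Spark one at a time through a reader with a next() / get() / close() contract. The sketch below imitates that contract without any Spark dependency, so it can be compiled and run on its own. It is not the code in this repository: RowReader, Person, ListRowReader and ReaderDemo are hypothetical names invented for illustration, and only the first three rows of the table are reused as sample data.

```java
import java.io.Closeable;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical stand-in for a DataSourceV2-style partition reader.
// It mirrors the next()/get()/close() shape but is NOT a Spark interface.
interface RowReader<T> extends Closeable {
    boolean next();   // advance to the next row; returns false when exhausted
    T get();          // current row; only valid after next() returned true
    @Override
    void close();     // release resources (nothing to release in this sketch)
}

// Illustrative row type with the two columns shown in the output: name and age.
class Person {
    final String name;
    final int age;

    Person(String name, int age) {
        this.name = name;
        this.age = age;
    }
}

// Serves rows from an in-memory list, one row per successful call to next().
class ListRowReader implements RowReader<Person> {
    private final List<Person> rows;
    private int index = -1;   // positioned before the first row

    ListRowReader(List<Person> rows) {
        this.rows = rows;
    }

    @Override
    public boolean next() {
        if (index < rows.size()) {
            index++;          // stop advancing once past the last row
        }
        return index < rows.size();
    }

    @Override
    public Person get() {
        if (index < 0 || index >= rows.size()) {
            throw new IllegalStateException("get() called without a current row");
        }
        return rows.get(index);
    }

    @Override
    public void close() {
        // No external resources are held by this in-memory reader.
    }
}

class ReaderDemo {
    // Sample rows taken from the first three lines of the table above.
    static List<Person> sampleRows() {
        return Arrays.asList(
            new Person("Alfie", 24),
            new Person("Bertie", 36),
            new Person("Charlie", 48));
    }

    // Drains a reader the way a caller is expected to: next(), get(), ..., close().
    static List<String> drain(RowReader<Person> reader) {
        List<String> out = new ArrayList<>();
        try {
            while (reader.next()) {
                Person p = reader.get();
                out.add(p.name + ":" + p.age);
            }
        } finally {
            reader.close();
        }
        return out;
    }
}
```

Calling ReaderDemo.drain(new ListRowReader(ReaderDemo.sampleRows())) walks the rows in order. In Spark 2.4 the real counterpart of this stand-in is InputPartitionReader<InternalRow> in the org.apache.spark.sql.sources.v2.reader package, whose next() may additionally throw IOException.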