gcdev373/example-spark-datasourcev2

example-spark-datasourcev2

A very simple Java implementation of the Apache Spark DataSourceV2 API.

This example is compatible with Spark 2.4.3.

Building

Build the jar file containing the DataSource with the following command:

$ mvn package
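After the build, it can be worth confirming that the jar really contains the data source class before starting Spark. The following is a minimal sketch, not part of this repository: `jar_contains_class` is a hypothetical helper, and the class name `example.ExampleDataSource` is assumed from the format name used in the pyspark commands in this README.

```python
import zipfile


def jar_contains_class(jar_path, class_name):
    """Return True if the jar has a .class entry for the fully qualified class_name."""
    # A jar is a zip archive; a class a.b.C is stored as the entry a/b/C.class.
    entry = class_name.replace(".", "/") + ".class"
    with zipfile.ZipFile(jar_path) as jar:
        return entry in jar.namelist()
```

A typical call would be `jar_contains_class("./target/example-datasource-1.0.jar", "example.ExampleDataSource")`, using the jar path that the pyspark command in this README expects.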

Testing

The DataSource can be demonstrated from the pyspark shell.

Launch pyspark with the following command, which adds the built jar to the session:

$ pyspark --jars ./target/example-datasource-1.0.jar

You should see something like:

Welcome to
      ____              __
     / __/__  ___ _____/ /__
    _\ \/ _ \/ _ `/ __/  '_/
   /__ / .__/\_,_/_/ /_/\_\   version 2.4.3
      /_/

Using Python version 3.7.3 (default, Jun 24 2019 04:54:02)
SparkSession available as 'spark'.

Then, from within the pyspark shell, type the commands below:

>>> df = spark.read.format('example.ExampleDataSource').load()
>>> df.show()

This displays the data provided by the DataSource:

+-------+---+
|   name|age|
+-------+---+
|  Alfie| 24|
| Bertie| 36|
|Charlie| 48|
| Debbie| 60|
|  Ernie| 72|
|Frankie| 84|
| Gettie| 96|
+-------+---+
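The same demonstration can be run outside the interactive shell. The sketch below is an assumption-laden example rather than part of this repository: it assumes pyspark is installed for plain `python` use, sets `spark.jars` as the script equivalent of `--jars`, and uses a hypothetical application name and helper names (`load_example`, `main`). The jar path and format name are taken from the commands above.

```python
import os

JAR_PATH = "./target/example-datasource-1.0.jar"  # path from the pyspark command above
FORMAT_NAME = "example.ExampleDataSource"         # format name from the shell session


def load_example(spark):
    """Load the example DataSource through an existing SparkSession."""
    return spark.read.format(FORMAT_NAME).load()


def main():
    # Imported here so that importing this module does not require pyspark.
    from pyspark.sql import SparkSession

    spark = (
        SparkSession.builder
        .appName("example-datasource-demo")  # hypothetical application name
        .config("spark.jars", JAR_PATH)      # script equivalent of --jars
        .getOrCreate()
    )
    try:
        df = load_example(spark)
        df.printSchema()
        # SQL-string filter on the age column shown in the table above.
        df.filter("age >= 60").show()
    finally:
        spark.stop()


if __name__ == "__main__":
    if os.path.exists(JAR_PATH):
        main()
    else:
        print("Jar not found at", JAR_PATH, "(run 'mvn package' first)")
```

Given the rows in the table above, the filtered output should contain only the entries from Debbie onward. Keeping `load_example` separate from the session setup means it can also be called from the pyspark shell with the existing `spark` object.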
