example-spark-datasourcev2

A very simple Java implementation of the Apache Spark DataSourceV2 API.

This example is compatible with Spark 2.4.3.

Building

Build the jar file containing the DataSource with the following command:

$ mvn package

Testing

The DataSource can be demonstrated from the pyspark shell.

Launch pyspark with the built jar on its classpath:

$ pyspark --jars ./target/example-datasource-1.0.jar

You should see something like this:

Welcome to
      ____              __
     / __/__  ___ _____/ /__
    _\ \/ _ \/ _ `/ __/  '_/
   /__ / .__/\_,_/_/ /_/\_\   version 2.4.3
      /_/

Using Python version 3.7.3 (default, Jun 24 2019 04:54:02)
SparkSession available as 'spark'.

Then from within the pyspark shell, type the commands below:

>>> df = spark.read.format('example.ExampleDataSource').load()
>>> df.show()

This displays the data provided by the DataSource:

+-------+---+
|   name|age|
+-------+---+
|  Alfie| 24|
| Bertie| 36|
|Charlie| 48|
| Debbie| 60|
|  Ernie| 72|
|Frankie| 84|
| Gettie| 96|
+-------+---+
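
To check the result programmatically rather than by eye, a small pair of helpers can be pasted into the same pyspark shell. This is only a sketch: the function names `load_example` and `rows_match` are hypothetical and not part of this repository, the expected rows are copied from the table above, and the `int()` conversion assumes the `age` column holds values that convert cleanly to integers.

```python
# Rows copied from the table printed by df.show() above.
EXPECTED_ROWS = [
    ("Alfie", 24),
    ("Bertie", 36),
    ("Charlie", 48),
    ("Debbie", 60),
    ("Ernie", 72),
    ("Frankie", 84),
    ("Gettie", 96),
]


def load_example(spark, fmt="example.ExampleDataSource"):
    """Load the DataSource through an existing SparkSession (hypothetical helper)."""
    return spark.read.format(fmt).load()


def rows_match(df, expected=EXPECTED_ROWS):
    """Return True if the collected (name, age) pairs equal `expected`.

    Row order is ignored, since the order in which Spark returns rows
    is not something this check should depend on.
    """
    # Assumption: "name" is text-like and "age" converts to int.
    actual = [(str(row["name"]), int(row["age"])) for row in df.collect()]
    return sorted(actual) == sorted(expected)
```

In the shell, where `spark` is already defined, `rows_match(load_example(spark))` should return `True` if the DataSource produced the rows shown above.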
