This project showcases approaches you can take to work around Spark's missing Enum support.
The limitation is caused by the inability to encode an ADT in a Dataset column. Solving this properly would require providing a custom Encoder, which is not possible at the moment.
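To make the limitation concrete, here is a minimal sketch (all names are illustrative, not from this project) of the kind of ADT that Spark cannot derive an implicit Encoder for:

```scala
// A minimal, hypothetical ADT: a sealed trait with case object members.
sealed trait Color
case object Red extends Color
case object Green extends Color

// A case class holding an ADT field.
final case class Paint(name: String, color: Color)

object EncoderLimitationSketch {
  def main(args: Array[String]): Unit = {
    val paints = Seq(Paint("barn", Red), Paint("grass", Green))
    // A plain Scala collection works fine:
    println(paints.map(_.color))
    // With Spark, however, the following would not compile, because no
    // implicit Encoder[Paint] can be derived for a class with an ADT field:
    //   spark.createDataset(paints)  // "Unable to find encoder for type ..."
  }
}
```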
Kryo will not help you either; check "Fake case class parent" to understand why.
Articles and SO posts you may find useful:
Each approach is showcased with a test suite that compares two situations:
- A regular Scala collection of created objects
- A Spark-ingested Dataset based on the above collection
The test is linked in each title.
Keep in mind that in some cases Spark loses certain data during the encoding/decoding process, which is always reflected in the assertions!
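One common workaround (sketched below with hypothetical names; plain Scala, no Spark dependency) is to store the ADT as a String column and map it back after reading. A non-total reverse mapping is exactly where such data loss can creep in:

```scala
// Hypothetical ADT used only for this sketch.
sealed trait Status
case object Active extends Status
case object Retired extends Status

object StatusCodec {
  // Forward mapping: used when writing the value into a String column.
  def encode(s: Status): String = s match {
    case Active  => "ACTIVE"
    case Retired => "RETIRED"
  }

  // Reverse mapping: used after reading the Dataset back. Returning
  // None for unknown strings is where data would be silently dropped.
  def decode(raw: String): Option[Status] = raw match {
    case "ACTIVE"  => Some(Active)
    case "RETIRED" => Some(Retired)
    case _         => None
  }
}

object RoundTripSketch {
  def main(args: Array[String]): Unit = {
    val original     = Seq(Active, Retired)
    val roundTripped = original.map(StatusCodec.encode).flatMap(StatusCodec.decode)
    // Lossless for known values; an unknown string would vanish here.
    assert(roundTripped == original)
    println(roundTripped)
  }
}
```

A test suite comparing the in-memory collection against the round-tripped one, as described above, makes any such loss visible in the assertions.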