atais/spark-enum

Approaches on Enums in Spark 2.x

This project showcases approaches you can take to work around Spark's missing Enum support.

The limitation stems from the inability to encode an ADT in a Dataset column. Doing so would require providing a custom Encoder, which is not possible at the moment.

Kryo will not help you either; see "Fake case class parent" to understand why.
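To illustrate the problem, here is a minimal sketch (hypothetical names, plain Scala without Spark) of the kind of sealed-trait ADT meant here. The collection itself is perfectly fine in Scala; the issue only appears when Spark tries to derive an Encoder for the `status` field:

```scala
// A hypothetical sealed-trait enum. Spark has no built-in Encoder for this
// ADT, so a Dataset[Event] cannot be derived for a case class containing it.
sealed trait Status
case object Active extends Status
case object Inactive extends Status

case class Event(id: Long, status: Status)

// Plain Scala works without any issue:
val events = Seq(Event(1L, Active), Event(2L, Inactive))

// spark.createDataset(events) would fail here, because no implicit
// Encoder[Event] can be resolved for the Status member.
```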

Articles and SO posts you may find useful:

What and how?

Each approach is showcased with a test suite that compares two situations:

  • A regular Scala collection of created objects
  • A Spark-ingested Dataset based on the above collection

The test is linked in each title.

Keep in mind that in some cases Spark loses data during the encoding/decoding process, which is always reflected in the assertions!
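The shape of such a comparison can be sketched without Spark: serialize the enum to a primitive, decode it back, and assert against the original collection. All names below are hypothetical, and the string round-trip stands in for Spark's encoding/decoding step:

```scala
// Sketch of the round-trip assertion pattern used by the test suites.
// The to/from-primitive conversion mimics what Spark encoding does.
sealed abstract class Color(val name: String)
case object Red extends Color("red")
case object Blue extends Color("blue")

def toPrimitive(c: Color): String = c.name
def fromPrimitive(s: String): Color = s match {
  case "red"  => Red
  case "blue" => Blue
}

val original = Seq(Red, Blue)
val roundTripped = original.map(toPrimitive).map(fromPrimitive)

// Lossless in this sketch; with some approaches (e.g. a type alias to a
// primitive) the enum type itself is what gets lost.
assert(roundTripped == original)
```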

Approaches

  1. Case class wrapper
  2. Extra field with primitive column
  3. Fake case class parent
  4. Type alias
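As a flavor of what these approaches look like, here is a sketch of the first one, the case class wrapper: the Dataset stores a wrapper case class whose single field is a primitive Spark can encode, with conversions to and from the enum. All names are hypothetical, not taken from the repository:

```scala
// Approach 1 (case class wrapper), sketched without Spark.
sealed trait Currency { def code: String }
case object EUR extends Currency { val code = "EUR" }
case object USD extends Currency { val code = "USD" }

// The wrapper holds only a primitive, so Spark can derive an Encoder
// for any case class that uses it as a field.
case class CurrencyColumn(code: String) {
  def toEnum: Currency = code match {
    case "EUR" => EUR
    case "USD" => USD
  }
}
object CurrencyColumn {
  // Convenience constructor from the enum side.
  def apply(c: Currency): CurrencyColumn = CurrencyColumn(c.code)
}

case class Price(amount: BigDecimal, currency: CurrencyColumn)

val p = Price(BigDecimal(10), CurrencyColumn(EUR))
```

The trade-off is that every read site must call `toEnum` to get back to the type-safe value, while the Dataset schema only ever sees the primitive `code` column.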
