This project showcases approaches you can take to work around Spark's missing Enum support.
The limitation is caused by the inability to encode an ADT in a Dataset column. Solving this properly would require providing a custom Encoder, which is not possible at the moment.
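To make the limitation concrete, here is a minimal sketch (all names are illustrative, not from this project) of the kind of ADT that Spark cannot derive an implicit Encoder for:

```scala
// A minimal, hypothetical ADT: a sealed trait with case object members.
sealed trait Color
case object Red extends Color
case object Green extends Color

// A case class holding an ADT field.
final case class Paint(name: String, color: Color)

object EncoderLimitationSketch {
  def main(args: Array[String]): Unit = {
    val paints = Seq(Paint("barn", Red), Paint("grass", Green))
    // A plain Scala collection works fine:
    println(paints.map(_.color))
    // With Spark, however, the following would not compile, because no
    // implicit Encoder[Paint] can be derived for a class with an ADT field:
    //   spark.createDataset(paints)  // "Unable to find encoder for type ..."
  }
}
```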
Kryo will not help you either; check "Fake case class parent" to understand why.
Articles and SO posts you may find useful:
Each approach is showcased with a test suite that compares two situations:
- A regular Scala collection of created objects
- A Spark-ingested Dataset based on the above collection
The test is linked in each title.
Keep in mind that in some cases Spark loses certain data during the encoding/decoding process, which is always reflected in the assertions!
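One common workaround (sketched below with hypothetical names; plain Scala, no Spark dependency) is to store the ADT as a String column and map it back after reading. A non-total reverse mapping is exactly where such data loss can creep in:

```scala
// Hypothetical ADT used only for this sketch.
sealed trait Status
case object Active extends Status
case object Retired extends Status

object StatusCodec {
  // Forward mapping: used when writing the value into a String column.
  def encode(s: Status): String = s match {
    case Active  => "ACTIVE"
    case Retired => "RETIRED"
  }

  // Reverse mapping: used after reading the Dataset back. Returning
  // None for unknown strings is where data would be silently dropped.
  def decode(raw: String): Option[Status] = raw match {
    case "ACTIVE"  => Some(Active)
    case "RETIRED" => Some(Retired)
    case _         => None
  }
}

object RoundTripSketch {
  def main(args: Array[String]): Unit = {
    val original     = Seq(Active, Retired)
    val roundTripped = original.map(StatusCodec.encode).flatMap(StatusCodec.decode)
    // Lossless for known values; an unknown string would vanish here.
    assert(roundTripped == original)
    println(roundTripped)
  }
}
```

A test suite comparing the in-memory collection against the round-tripped one, as described above, makes any such loss visible in the assertions.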