Skip to content

Latest commit

 

History

History
32 lines (27 loc) · 1.24 KB

README.md

File metadata and controls

32 lines (27 loc) · 1.24 KB

RecArraySerDe is a Hive custom SerDe that can be used to write data as a sequence of NumPy record arrays which, in turn, are the foundation of the FITS format.

Using this SerDe and generating the FITS headers from the metadata, a valid FITS file can be constructed by appending the serialized data to the respective headers and padding the result to 2880-byte boundaries.

The following column types are supported: BOOLEAN, TINYINT, SMALLINT, INT, BIGINT, FLOAT, DOUBLE, DATE, TIMESTAMP, CHAR, VARCHAR and STRING.

The rest of column types are not supported: DECIMAL, BINARY, ARRAY, MAP, STRUCT and UNION.

The mapping between Hive and FITS types is as follows:

  • BOOLEAN: L
  • TINYINT: B (You need a TZERO card to properly store the values)
  • SMALLINT: I
  • INT: J
  • BIGINT: K
  • CHAR, VARCHAR, STRING: 255A (Per-column values after HIVE-13064 fix)
  • DATE: 10A (ISO-8601: "yyyy-MM-dd")
  • TIMESTAMP: 23A (ISO-8601: "yyyy-MM-dd'T'HH:mm:ss.SSS")

NULL values are serialized following FITS standard:

  • BOOLEAN: \0
  • TINYINT, SMALLINT, INT, BIGINT: Type.MIN_VALUE
  • FLOAT, DOUBLE: NaN
  • CHAR, VARCHAR, STRING: All bytes are \0 (FITS only requires the first)
  • DATE, TIMESTAMP: All bytes are \0