Description
Overview
The msgpack specification defines the str types as:
Str format family stores a byte array
But in Java:
The String class represents character strings.
The difference is that a msgpack str value can carry an arbitrary array of bytes, while Java enforces a character set encoding on its String objects. As a result, a msgpack str may contain byte sequences that cannot be decoded into a Java String without loss.
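A minimal sketch of that loss, using only the JDK and assuming UTF-8 as the charset (the class and method names here are made up for illustration): bytes that are valid UTF-8 survive a byte[] to String to byte[] round trip, while bytes that are not valid UTF-8 come back changed.

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

public class StrRoundTrip {
    // Decode the bytes as UTF-8 into a String, then encode the String back to bytes.
    // Malformed input is silently replaced with U+FFFD by this String constructor.
    static byte[] roundTrip(byte[] raw) {
        String decoded = new String(raw, StandardCharsets.UTF_8);
        return decoded.getBytes(StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        byte[] valid = {0x68, 0x69};                        // "hi", valid UTF-8
        byte[] invalid = {(byte) 0xFF, (byte) 0xFE, 0x41};  // 0xFF and 0xFE never occur in UTF-8

        System.out.println(Arrays.equals(valid, roundTrip(valid)));     // true
        System.out.println(Arrays.equals(invalid, roundTrip(invalid))); // false
    }
}
```

In the second case each invalid byte is replaced by the three-byte encoding of U+FFFD, so the original bytes cannot be recovered from the String.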
Why does this matter?
As the popularity of Redis grows beyond mere caching, Redis functions become an important feature for performing multiple operations in a single atomic unit of work. The FCALL/FCALL_RO commands take a list of keys and arguments that are order-based, which strongly couples the function code to the calling code. This coupling can be greatly reduced by using msgpack to send complex objects to a Redis function as its argument(s).
Redis functions bring together the Lua programming language for server-side functions with the cmsgpack library for encoding and decoding data. Java enters the picture with multiple libraries including Spring Data Redis, Jedis, and Lettuce.
To Redis, every key and value is a byte array, so a client may store a msgpack data structure as the value for a key or hash entry.
When this value is retrieved via a Lua script, Lua treats it as a string. Lua has no byte[] data type because a Lua string, just like a msgpack str, can contain an arbitrary array of bytes.
But when Jackson deserializes this str value, it decodes it into a String. At this point we lose fidelity: the byte array behind the Lua string and the msgpack str no longer matches the byte array behind the Java String, because Java has converted the byte[] into a String using a character encoding whose substitution logic replaces invalid byte sequences in order to fit the data into the String.
So how can this be fixed?
It seems easy: the MessagePackParser should read every string object the same way it reads a binary object, into a byte[]. Then, when Jackson matches the value to a data type, String targets can go through the same conversion logic that is applied by default today, while targets with a byte[] data type can be given the raw bytes directly.
This would allow what is currently not possible with Jackson deserialization of msgpack objects: reading data from Redis values that do not correspond to valid Java Strings.
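The idea can be illustrated with a small JDK-only sketch. It is not the MessagePackParser API or its internals: `RawStrSketch`, `readStrPayload`, and `convert` are hypothetical names, and only the fixstr and str 8 formats from the msgpack specification are handled. The point is simply that the payload is kept as raw bytes and decoded only when the target type is String.

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

public class RawStrSketch {
    // Hypothetical helper: extract the payload of a msgpack fixstr (0xa0-0xbf)
    // or str 8 (0xd9) value WITHOUT decoding it into a String.
    static byte[] readStrPayload(byte[] packed) {
        if (packed.length == 0) {
            throw new IllegalArgumentException("empty input");
        }
        int head = packed[0] & 0xFF;
        int offset;
        int length;
        if ((head & 0xE0) == 0xA0) {                       // fixstr: 101xxxxx, length in low 5 bits
            length = head & 0x1F;
            offset = 1;
        } else if (head == 0xD9 && packed.length >= 2) {   // str 8: one length byte follows
            length = packed[1] & 0xFF;
            offset = 2;
        } else {
            throw new IllegalArgumentException("not a fixstr or str 8 value");
        }
        if (offset + length > packed.length) {
            throw new IllegalArgumentException("truncated str value");
        }
        return Arrays.copyOfRange(packed, offset, offset + length);
    }

    // Hypothetical binding step: decode only when the target is a String,
    // otherwise hand the raw bytes through untouched.
    static Object convert(byte[] payload, Class<?> target) {
        if (target == byte[].class) {
            return payload;
        }
        if (target == String.class) {
            return new String(payload, StandardCharsets.UTF_8);
        }
        throw new IllegalArgumentException("unsupported target: " + target);
    }

    public static void main(String[] args) {
        // fixstr of length 3 whose payload is not valid UTF-8
        byte[] packed = {(byte) 0xA3, (byte) 0xFF, (byte) 0xFE, 0x41};
        byte[] raw = (byte[]) convert(readStrPayload(packed), byte[].class);
        byte[] expected = {(byte) 0xFF, (byte) 0xFE, 0x41};
        System.out.println(Arrays.equals(expected, raw)); // true
    }
}
```

With this ordering, a String target behaves exactly as before, while a byte[] target never passes through a lossy charset conversion.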
Our current use case looks like the following:
- Java code produces a value to store in Redis and does so using Spring Data Redis. The Redis Template uses MessagePackRedisSerializer to store any arbitrary object as a msgpack string of bytes.
- We need to read many of these objects to process a single request, so we are using Lua to read multiple values from multiple hashes. As the values are read, they become Lua strings. Once all objects are collected, we use cmsgpack to pack the result and reply to the Java layer.
- The Java layer, using jackson-dataformat-msgpack, deserializes the values but receives an incorrect byte[] after unpacking the response, because each value is converted to a String first and only then to a byte[].