Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

writeRaw binary content #914

Open
rashtao opened this issue Jul 14, 2022 · 11 comments
Open

writeRaw binary content #914

rashtao opened this issue Jul 14, 2022 · 11 comments

Comments

@rashtao
Copy link

rashtao commented Jul 14, 2022

What is the recommended way to write raw binary content? As far as I can see com.fasterxml.jackson.core.JsonGenerator#writeRaw() variants only support String/char[]/SerializableString arguments, which are not suitable for binary formats.

Is there any plan to extend it and accept binary arguments, i.e. byte[]/InputStream?

@cowtowncoder
Copy link
Member

cowtowncoder commented Jul 15, 2022

JsonGenerator.writeBinary()? It only uses Base64 encoding if the underlying format does not have native binary values and binary formats typically do.

I do not recommend writeRaw() for any use as that requires caller to know a lot about underlying format (ditto for writeBinaryValue() that is slightly less dangerous); especially so with binary formats.

Also note that the answer may depend on specific format backend, at least when using last-effort methods like writeRaw().

@rashtao
Copy link
Author

rashtao commented Jul 15, 2022

I mean specifically writeRaw(), in order to append an already properly encoded byte sequence. I would like to achieve the same behavior of writeRaw(String|char[]|SerializableString) for binary dataformats. Of course not all format backends can support this, but at least Smile could support this.

Currently, if I have an already encoded valid byte array (e.g. in Smile format), in order to append it, I need first to parse it (i.e. with JsonParser#readValueAsTree() and then write it (i.e. with JsonGenerator#writeTree()), which consumes unneeded cpu and memory resources.

@cowtowncoder
Copy link
Member

Ok. That makes sense then. At this points, PRs would be welcome and/or specific per-format tickets to denote where exactly functionality is missing.

As a work-around one can usually hold on to OutputStream used (and use generator.flush()) but support for writeRaw() would definitely be cleaner as it can auto-flush.

@rashtao
Copy link
Author

rashtao commented Jul 18, 2022

Thanks!

I think it would be even better adding new methods to com.fasterxml.jackson.core.JsonGenerator, e.g. writeRaw((byte[]|InputStream) data).

This would allow further enhancements at databind level by adding support for byte[] value to com.fasterxml.jackson.databind.util.RawValue and will ultimately enable the capability to map binary raw values in databind API, i.e.:

  • in all usages of raw values in com.fasterxml.jackson.databind.node (eg. com.fasterxml.jackson.databind.node.ObjectNode#putRawValue())
  • supporting byte[] property value for @JsonRawValue annotation
  • serializing RawValue fields in binary data formats backends

What is your opinion about it?

@cowtowncoder
Copy link
Member

From API perspective it is sort of tricky, given that now we have set of methods that work/don't-work on different backends (text vs binary). I specifically would be against supporting binary-content - to -text-format (which some users would no doubt want :) ) because it could or could not work based on backend (not for Writer, yes for OutputStream).

But then again without this, writeRaw() variants are useless for binary content and one has to "merge" content with lower level constructs.

I think my initial thinking is that I would want to defer variant(s) that take streams, and start with just byte[] (with possible addition of ByteBuffer). API could be added in 2.14, with default implementation in JsonGenerator that throws an exception indicating backend does not support operation; but with implementation for at least one binary backend to verify usage.
We would want both writeRaw(byte[]) and writeRawValue(byte[]).

We might also need to either use an existing StreamWriteCapability (CAN_WRITE_BINARY_NATIVELY) or add new one(s) in case code needs to decide on text-or-binary handling for some raw values.

It might make sense to divide into first supporting low-level (JsonGenerator) functionality, test that, and then once merged, figure out how to make things work via POJOs and JsonNode. Looks like RawValue would need improvements; and existing RawSerializer wouldn't quite work as-is either.

I probably won't have tons of time to work on this myself but if you or anyone else has time, I would find time to code review and help with work.

@rashtao
Copy link
Author

rashtao commented Jul 19, 2022

Sounds good to me, thanks for sharing! I will draft a PR as I find time.
Should we move the discussion to jackson-core project in order to offer better visibility and gather further feedback?

@cowtowncoder cowtowncoder removed the 2.14 label Feb 4, 2023
@cowtowncoder cowtowncoder transferred this issue from FasterXML/jackson-dataformats-binary Feb 4, 2023
@rashtao
Copy link
Author

rashtao commented Dec 4, 2024

Hi @cowtowncoder ,

AFAICS, this can be already achieved with JsonGenerator#writeRawValue(SerializableString). Despite the name SerializableString, this interface can be used to access raw bytes, without quoting and ignoring the char or string representation of the data being appended.

For example, the implementation of UTF8JsonGenerator#writeRawValue(SerializableString) works this way:

@Override
public void writeRawValue(SerializableString text) throws IOException {
_verifyValueWrite(WRITE_RAW);
int len = text.appendUnquotedUTF8(_outputBuffer, _outputTail);
if (len < 0) {
_writeBytes(text.asUnquotedUTF8());
} else {
_outputTail += len;
}
}

The problem is that SerializedString (the only implementation of SerializableString in Jackson Core) does not offer a constructor accepting raw bytes. To work around this, I can provide my implementation of SerializableString:

class RawUserDataValue implements SerializableString {
    private final byte[] data;

    RawUserDataValue(byte[] data) {
        this.data = data;
    }

    @Override
    public String getValue() {
        throw new UnsupportedOperationException();
    }

    @Override
    public int charLength() {
        throw new UnsupportedOperationException();
    }

    @Override
    public char[] asQuotedChars() {
        throw new UnsupportedOperationException();
    }

    @Override
    public byte[] asUnquotedUTF8() {
        return data;
    }

    @Override
    public byte[] asQuotedUTF8() {
        throw new UnsupportedOperationException();
    }

    @Override
    public int appendQuotedUTF8(byte[] buffer, int offset) {
        throw new UnsupportedOperationException();
    }

    @Override
    public int appendQuoted(char[] buffer, int offset) {
        throw new UnsupportedOperationException();
    }

    @Override
    public int appendUnquotedUTF8(byte[] buffer, int offset) {
        final int length = data.length;
        if ((offset + length) > buffer.length) {
            return -1;
        }
        System.arraycopy(data, 0, buffer, offset, length);
        return length;
    }

    @Override
    public int appendUnquoted(char[] buffer, int offset) {
        throw new UnsupportedOperationException();
    }

    @Override
    public int writeQuotedUTF8(OutputStream out) {
        throw new UnsupportedOperationException();
    }

    @Override
    public int writeUnquotedUTF8(OutputStream out) throws IOException {
        final int length = data.length;
        out.write(data, 0, length);
        return length;
    }

    @Override
    public int putQuotedUTF8(ByteBuffer buffer) {
        throw new UnsupportedOperationException();
    }

    @Override
    public int putUnquotedUTF8(ByteBuffer buffer) {
        final int length = data.length;
        if (length > buffer.remaining()) {
            return -1;
        }
        buffer.put(data, 0, length);
        return length;
    }
}

and append raw value as bytes to the generator this way:

  JsonGenerator gen;
  byte[] rawValueToAppend;
  // ...
  gen.writeRawValue(new RawUserDataValue(rawValueToAppend));

This works for example with UTF8JsonGenerator, where rawValueToAppend is the UTF8 bytes representation of the JSON value that I want to append.

IMHO, this solution is enough from a functional standpoint. Nonetheless, we can consider about making the API a bit more clear w.r.t. this, handling differently serialized string content (e.g. SerializableString to represent chars/string content only) and serialized byte content (e.g. SerializableRawBytes to represent byte array/ByteBuffer content only).

@cowtowncoder
Copy link
Member

@rashtao True. I am bit worried about the fact that SerializableRawBytes (or maybe SerializableUTF8Bytes?) only implements part of API -- it cannot be used to add content to Writer, specifically -- so it'd be quite specialized. But I could consider a PR to add such a thing if someone was to submit it :)

@pjfanning
Copy link
Member

I think it is very untidy to create a new class that implements so little. Could we just add an extra method to UTF8JsonGenerator that takes a byte array as input. We already have UTF8 related methods that take byte arrays as inputs.

@cowtowncoder
Copy link
Member

I guess there's more than one way to skin a cat -- and if this should then be a method in JsonGenerator, but with failing implementation for Writer-backed generators.

I am not sure I agree with addition of a class necessarily being worse than adding a method at JsonGenerator level as interaction between generator and SerializableString is well defined.
Although then again, from naming perspective the new implementation would be misleading indeed (it's not really String in same sense as SerializedString is).

@rashtao
Copy link
Author

rashtao commented Dec 5, 2024

Right, I agree, adding new methods to support this is definitively more clear, i.e. as discussed above: #914 (comment)

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

3 participants