Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add new method ByteString.Output.write(ByteString) to eliminate double copy of data #19746

Open
archiecobbs opened this issue Dec 19, 2024 · 0 comments
Assignees
Labels

Comments

@archiecobbs
Copy link

What language does this apply to?

Java

Describe the problem you are trying to solve.

I am trying to use ByteString as a general purpose, thread-safe, immutable holder of byte[] data. It's almost perfect for that but not quite.

As part of this work, I have a frequent need to construct ByteStrings by cobbling together random other ByteStrings and raw byte[] fragments.

Obviously, the proper (and only) way to build new ByteStrings like this is by creating a ByteString.Output and writing to it.

When writing raw byte[] data, there's no problem doing this (except for issue #19741).

But when writing a ByteString, there's a problem: Because the ByteString.Output class only accepts bytes and byte[]s, so you have to convert the ByteString to a byte[] and then write it, but this results in your data being copied twice.

On the other hand, if there were a method ByteString.Output.write(ByteString bs), then this could be a zero copy operation, because underneath the covers ByteString.Output is simply building an internal list ByteStrings and so writing a ByteString is (almost) as simple as just appending it to that list.

Describe the solution you'd like

Something like this:

diff --git a/java/core/src/main/java/com/google/protobuf/ByteString.java b/java/core/src/main/java/com/google/protobuf/ByteString.java
index 1903a4deb..b3471b836 100644
--- a/java/core/src/main/java/com/google/protobuf/ByteString.java
+++ b/java/core/src/main/java/com/google/protobuf/ByteString.java
@@ -1108,6 +1108,17 @@ public abstract class ByteString implements Iterable<Byte>, Serializable {
       }
     }
 
+    /**
+     * Writes the complete contents of the given ByteString.
+     *
+     * @param byteString byte data to write
+     */
+    public synchronized void write(ByteString byteString) {
+        flushLastBuffer(false);
+        flushedBuffers.add(byteString);
+        flushedBuffersTotalBytes += byteString.size();
+    }
+
     /**
      * Creates a byte string with the size and contents of this output stream. This does not create
      * a new copy of the underlying bytes. If the stream size grows dynamically, the runtime is
@@ -1117,7 +1128,7 @@ public abstract class ByteString implements Iterable<Byte>, Serializable {
      * @return the current contents of this output stream, as a byte string.
      */
     public synchronized ByteString toByteString() {
-      flushLastBuffer();
+      flushLastBuffer(true);
       return ByteString.copyFrom(flushedBuffers);
     }
 
@@ -1188,10 +1199,10 @@ public abstract class ByteString implements Iterable<Byte>, Serializable {
     }
 
     /**
-     * Internal function used by {@link #toByteString()}. The current buffer may or may not be full,
-     * but it needs to be flushed.
+     * Internal function used by {@link #toByteString()} and {@link #write(ByteString)}. The current
+     * buffer may or may not be full, but it needs to be flushed.
      */
-    private void flushLastBuffer() {
+    private void flushLastBuffer(boolean resetBuffer) {
       if (bufferPos < buffer.length) {
         if (bufferPos > 0) {
           byte[] bufferCopy = Arrays.copyOf(buffer, bufferPos);
@@ -1206,7 +1217,9 @@ public abstract class ByteString implements Iterable<Byte>, Serializable {
         // case without wasting space.  In the rare case that more writes
         // *do* occur, this empty buffer will be flushed and an appropriately
         // sized new buffer will be created.
-        buffer = EMPTY_BYTE_ARRAY;
+        if (resetBuffer) {
+          buffer = EMPTY_BYTE_ARRAY;
+        }
       }
       flushedBuffersTotalBytes += bufferPos;
       bufferPos = 0;

Describe alternatives you've considered

Slow & ugly conversion from ByteStringbyte[] and then using ByteString.Output.write(byte[]).

@archiecobbs archiecobbs added the untriaged auto added to all issues by default when created. label Dec 19, 2024
@archiecobbs archiecobbs changed the title Add new method ByteString.write(ByteString) to eliminate double copy of data Add new method ByteString.Output.write(ByteString) to eliminate double copy of data Dec 19, 2024
@zhangskz zhangskz added java and removed untriaged auto added to all issues by default when created. labels Dec 20, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

No branches or pull requests

3 participants