Collection deserialization fails with cyclic object graph #13092

Open
lrytz opened this issue Mar 6, 2025 · 2 comments

@lrytz
Member

lrytz commented Mar 6, 2025

Scala collections use a serialization proxy, which can leak during deserialization of a cyclic object graph.

Utility:

object SD {
  import java.io._, scala.util.chaining._
  def serialize(obj: AnyRef) = new ByteArrayOutputStream().tap(b => new ObjectOutputStream(b).writeObject(obj)).toByteArray
  def deserialize(a: Array[Byte]) = new ObjectInputStream(new ByteArrayInputStream(a)).readObject()
  def serializeDeserialize[T <: AnyRef](obj: T) = deserialize(serialize(obj)).asInstanceOf[T]
}

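Test code (Bar is not shown; per the error message below it is a Serializable class in scala.collection.mutable with a field c of type Iterable):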

  @Test def coll(): Unit = {
    val b = ListBuffer[AnyRef]()
    val bar = new Bar(b)
    b += bar
    SD.serializeDeserialize(b)
  }

This fails with

java.lang.ClassCastException:
cannot assign instance of scala.collection.generic.DefaultSerializationProxy
to field scala.collection.mutable.Bar.c of type scala.collection.mutable.Iterable
in instance of scala.collection.mutable.Bar

A stand-alone reproducer:

class A(var b: B) extends Serializable {
  def writeReplace: AnyRef = new AProxy(this.b)
}

class AProxy(val b: B) extends Serializable {
  def readResolve: AnyRef = new A(b)
}

class B(val a: A) extends Serializable

Test code:

  @Test def repr(): Unit = {
    val a = new A(null)
    val b = new B(a)
    a.b = b
    SD.serializeDeserialize(a)
  }

The readResolve method is only invoked once the AProxy instance is fully deserialized. Until then, any reference to the object a that is read back from the stream resolves to the proxy instead.

@retronym points out that this is documented in the last paragraph of https://docs.oracle.com/javase/8/docs/platform/serialization/spec/input.html#a5903:

The readResolve method is not invoked on the object until the object is fully constructed, so any references to this object in its object graph will not be updated to the new object nominated by readResolve. [...] if the reference types [...] are not compatible, the construction of the object graph will raise a ClassCastException.
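
To illustrate the quoted paragraph, here is a minimal sketch of a variant of the reproducer in which the back-reference field is typed AnyRef instead of the replaced class. The names Node, NodeProxy, Holder and LeakDemo are hypothetical and not part of the issue. Under that assumption the types are compatible, so deserialization should complete without a ClassCastException, but the field is still left pointing at the proxy.

```scala
import java.io._

// Hypothetical class using the serialization proxy pattern (same shape as A above).
class Node(var holder: Holder) extends Serializable {
  def writeReplace: AnyRef = new NodeProxy(holder)
}

// Hypothetical proxy; readResolve only runs after all of its fields have been read.
class NodeProxy(val holder: Holder) extends Serializable {
  def readResolve: AnyRef = new Node(holder)
}

// Unlike B above, the back reference is typed AnyRef, so a proxy can be assigned to it.
class Holder(val ref: AnyRef) extends Serializable

object LeakDemo {
  // Serialize to a byte array and read the object back.
  def roundTrip[T <: AnyRef](obj: T): T = {
    val bytes = new ByteArrayOutputStream()
    val out = new ObjectOutputStream(bytes)
    out.writeObject(obj)
    out.close()
    val in = new ObjectInputStream(new ByteArrayInputStream(bytes.toByteArray))
    in.readObject().asInstanceOf[T]
  }

  // Builds the cycle node -> holder -> node and reports whether the
  // deserialized holder still refers to the proxy rather than to a Node.
  def proxyLeaked(): Boolean = {
    val node = new Node(null)
    node.holder = new Holder(node)
    val copy = roundTrip(node)
    copy.holder.ref.isInstanceOf[NodeProxy]
  }
}
```

Calling LeakDemo.proxyLeaked() is expected to return true: the root object comes back as a Node, while the reference inside the cycle was assigned before readResolve ran and is never updated.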

The same behavior can be triggered with Java collections (again @retronym's example); serialization proxies are just less widespread there.

  @Test def jcoll(): Unit = {
    import java.util.{ArrayList => JAL}
    import java.util.{List => JL}
    val c1 = new JAL[JL[_]]()
    val c2 = JL.of(c1)
    c1.add(c2)
    val c2c = SD.serializeDeserialize(c2)
    c2c.get(0).get(0).size() // ClassCastException: class java.util.CollSer cannot be cast to class java.util.List
  }
@retronym
Member

retronym commented Mar 6, 2025

Noting that this was already present in 2.12 when the circularly referred-to collection was one of those that used serialization proxies (immutable.List notably). Scala 2.13 uses them pervasively, so it is more exposed to the issue.

A workaround may be to indirect the reference through a wrapper that does not use the serialization proxy pattern.
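
A minimal sketch of what such an indirection could look like, assuming the wrapper is the object passed to writeObject so that it is reached before the collection in the stream. Wrapper, Item and WrapperDemo are hypothetical names, not part of the Scala library or of this issue.

```scala
import java.io._
import scala.collection.mutable.ListBuffer

// Hypothetical wrapper: plain default serialization, no writeReplace / readResolve.
class Wrapper(val items: ListBuffer[AnyRef]) extends Serializable

// The element points back at the wrapper instead of at the collection itself.
class Item(val owner: Wrapper) extends Serializable

object WrapperDemo {
  // Serialize to a byte array and read the object back.
  def roundTrip[T <: AnyRef](obj: T): T = {
    val bytes = new ByteArrayOutputStream()
    val out = new ObjectOutputStream(bytes)
    out.writeObject(obj)
    out.close()
    val in = new ObjectInputStream(new ByteArrayInputStream(bytes.toByteArray))
    in.readObject().asInstanceOf[T]
  }

  // Builds the cycle wrapper -> items -> item -> wrapper and checks that the
  // deserialized item refers back to the deserialized wrapper.
  def cycleSurvives(): Boolean = {
    val wrapper = new Wrapper(ListBuffer[AnyRef]())
    wrapper.items += new Item(wrapper)
    val copy = roundTrip(wrapper)
    copy.items.head.asInstanceOf[Item].owner eq copy
  }
}
```

The back reference now targets an object that is not replaced by a proxy, so the handle read inside the cycle is the real Wrapper. This relies on the ordering assumption above: if the ListBuffer itself were the root, the wrapper's items field would be read while the collection's proxy is still unresolved, and the original failure would be expected again.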

@lrytz
Member Author

lrytz commented Mar 7, 2025

The JDK could probably make it work for classes that use the default serialization. JDK deserialization uses Unsafe.putObject(obj, fieldOffset, value). That call can be delayed if value has a readResolve method. Once the readResolve is actually called, the resulting object can be stored in the field. I did an experiment with ByteBuddy: https://github.com/lrytz/scala/tree/t13092.

But the issue is with classes that implement their own writeObject / readObject, like our DefaultSerializationProxy. The readObject method does

      while(count < k) {
        builder += in.readObject().asInstanceOf[A]
        count += 1
      }

where ObjectInputStream.readObject can return a proxy in case of cycles. Example with only collections:

    val b1 = ListBuffer[ListBuffer[AnyRef]]()
    val b2 = ListBuffer[AnyRef](b1)
    b1 += b2
    val b1c = SD.serializeDeserialize(b1)
    println(b1c.head.head.getClass) // DefaultSerializationProxy

I also saw that there are writeUnshared / readUnshared methods in ObjectOutputStream / ObjectInputStream. But I don't see how that would help: duplicating the proxies would lead to separate collection instances on deserialization.
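
The identity concern can be sketched without a cycle. The example below, with the hypothetical name UnsharedDemo, writes the same ListBuffer twice into one stream, once with writeObject / readObject and once with writeUnshared / readUnshared, and compares the two objects read back.

```scala
import java.io._
import scala.collection.mutable.ListBuffer

object UnsharedDemo {
  // Writes one buffer twice with `write`, reads two objects back with `read`,
  // and reports whether both reads return the same instance.
  private def sameInstanceAfterRoundTrip(
      write: (ObjectOutputStream, AnyRef) => Unit,
      read: ObjectInputStream => AnyRef): Boolean = {
    val buf = ListBuffer[AnyRef]("x")
    val bytes = new ByteArrayOutputStream()
    val out = new ObjectOutputStream(bytes)
    write(out, buf)
    write(out, buf)
    out.close()
    val in = new ObjectInputStream(new ByteArrayInputStream(bytes.toByteArray))
    val first = read(in)
    val second = read(in)
    first eq second
  }

  // Shared mode: the second write is a back reference to the first.
  def sharedKeepsIdentity(): Boolean =
    sameInstanceAfterRoundTrip((o, x) => o.writeObject(x), i => i.readObject())

  // Unshared mode: each write is serialized as a fresh object.
  def unsharedKeepsIdentity(): Boolean =
    sameInstanceAfterRoundTrip((o, x) => o.writeUnshared(x), i => i.readUnshared())
}
```

In shared mode both reads should yield one ListBuffer, while in unshared mode they should yield two distinct ones, which is why unshared writes do not look like a way to preserve a cyclic collection graph.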
