-
Notifications
You must be signed in to change notification settings - Fork 321
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Replace BinaryFormatter with MessagePack serialization #1166
base: main
Are you sure you want to change the base?
Changes from all commits
File filter
Filter by extension
Conversations
Jump to
Diff view
Diff view
There are no files selected for viewing
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. I was looking at the dependencies, and the MessagePack library only seems to be compatible with .net standard 2.0. Should we start removing the targeting for .net standard 2.1? There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. oops. meant to put this comment on the Microsoft.Spark.csproj. There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. not sure why the netstandard2.1 tf was there before, looks redundant to me |
Original file line number | Diff line number | Diff line change |
---|---|---|
|
@@ -4,12 +4,12 @@ | |
using System.IO; | ||
using System.Net; | ||
using System.Runtime.Serialization; | ||
using System.Runtime.Serialization.Formatters.Binary; | ||
using System.Threading; | ||
using Microsoft.Spark.Interop; | ||
using Microsoft.Spark.Interop.Ipc; | ||
using Microsoft.Spark.Network; | ||
using Microsoft.Spark.Services; | ||
using MessagePack; | ||
|
||
namespace Microsoft.Spark | ||
{ | ||
|
@@ -20,7 +20,6 @@ namespace Microsoft.Spark | |
/// also attempts to distribute broadcast variables using efficient broadcast algorithms to | ||
/// reduce communication cost. | ||
/// </summary> | ||
[Serializable] | ||
public sealed class Broadcast<T> : IJvmObjectReferenceProvider | ||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. In order to register broadcast var in the registry on it being serialized, we can implement IMessagePackSerializationCallbackReceiver, and rename existing |
||
{ | ||
[NonSerialized] | ||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Let's replace |
||
|
@@ -223,8 +222,7 @@ private void WriteToFile(object value) | |
/// <param name="stream">Stream to which the object is serialized</param> | ||
private void Dump(object value, Stream stream) | ||
{ | ||
var formatter = new BinaryFormatter(); | ||
formatter.Serialize(stream, value); | ||
MessagePackSerializer.Typeless.Serialize(stream, value); | ||
} | ||
} | ||
|
||
|
Original file line number | Diff line number | Diff line change |
---|---|---|
|
@@ -6,11 +6,11 @@ | |
using System.Collections.Generic; | ||
using System.IO; | ||
using System.Linq; | ||
using System.Runtime.Serialization.Formatters.Binary; | ||
using Microsoft.Spark.Interop.Ipc; | ||
using Microsoft.Spark.Sql; | ||
using Microsoft.Spark.Utils; | ||
using static Microsoft.Spark.Utils.CommandSerDe; | ||
using MessagePack; | ||
|
||
namespace Microsoft.Spark.RDD | ||
{ | ||
|
@@ -66,15 +66,13 @@ internal interface IDeserializer | |
} | ||
|
||
/// <summary> | ||
/// Deserializer using the BinaryFormatter. | ||
/// Deserializer using MessagePack. | ||
/// </summary> | ||
private sealed class BinaryDeserializer : IDeserializer | ||
{ | ||
private readonly BinaryFormatter _formater = new BinaryFormatter(); | ||
|
||
public object Deserialize(Stream stream, int length) | ||
{ | ||
return _formater.Deserialize(stream); | ||
return MessagePackSerializer.Typeless.Deserialize(stream); | ||
} | ||
Comment on lines
73
to
76
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. There's a bug in MessagePack I found while tried to reuse these changes, when called from Collector::Collect: Deserializer erases entire stream after the first read. Workaround for it is to either read stream in small chunks and pass to MessagePack, or load entire stream in MemoryStream and then use it. |
||
} | ||
|
||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There are several more classes that require similar empty ctors, such as ExternalClass or Broadcast
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Now that the standalone nuget package for BF is now available, I'm leaning towards scraping the whole PR and moving to that instead, because this is IMO a major limitation that could break any code that uses user/3rd party classes that reference a class without an empty ctor. What do you think?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Since microsoft doesn't add [Serializable] attributes to new types since .NET 6, sticking to NuGet will still not work long-term - As soon as there's a new commonly used type - integration will fall apart.
Imho the best option long-term would be either to investigate MsgPack further (Custom resolver for classes without default ctor and workaround for the bug with empty stream), or, if there's no way to work-around ctor behavior - make serializer configurable and include both options
I created a PR with version bump up for .NET and dependencies, for .NET8 it's still possible to use BinaryFormatter from library, for .NET9 we'll have to migrate to NuGet