Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Very slow performance for Deserialization of a large Bundle #2800

Closed
KLazarov opened this issue Jun 19, 2024 · 2 comments
Closed

Very slow performance for Deserialization of a large Bundle #2800

KLazarov opened this issue Jun 19, 2024 · 2 comments

Comments

@KLazarov
Copy link

KLazarov commented Jun 19, 2024

Describe the bug
When I deserialize a large (1-4MB) Bundle resource the performance goes down 10-20x times. A 4MB Bundle can take up to 400ms to deserialize.

If I instead used dynamic object instead of Bundle, the performance is closer to 11ms for the same 4MB Bundle.

To Reproduce
Steps to reproduce the behavior:

  1. Load a 1+ MB json file, for example the json definitions as they are pretty big Fhir R4 definitions. I used the valuesets.json which is around 9.3 million characters.
  2. Deserialize it multiple times using Bundle -> get 400ms or more per object
  3. Deserialize it multiple times using object -> get around 11ms per object

Expected behavior
The deserializer should not lose 95%+ of it's performance just because I have set the type.

Code

The code also prewarms the Bundle deserialization as otherwise it can go up to 1.5 seconds. All tests were done on a AMD 5900x CPU, 64GB 3600MHz RAM, and Windows 11. The file is loaded only once in memory.

using System.Text.Json;
using Hl7.Fhir.Model;
using Hl7.Fhir.Serialization;

class Program
{
    public static void FhirR4SerializerTest(int runs, string filename)
    {
        Console.WriteLine("Running C# FHIR SDK benchmark!");

        string source = File.ReadAllText(filename);
        var parser = new FhirJsonParser();
        for (int i = 0; i < 20; i++)
        {
            Bundle bundle = parser.Parse<Bundle>(source);
        }

        DateTime dateTimeStart = DateTime.Now;

        for (int i = 0; i < runs; i++)
        {
            Bundle bundle = parser.Parse<Bundle>(source);
        }
        DateTime dateTimeFinish = DateTime.Now;
        var totalTimeTake = dateTimeFinish - dateTimeStart;
        Console.WriteLine($"Fhir SDK miliseconds taken: {totalTimeTake.TotalMilliseconds}");
    }

    public static void FhirR4SerializerTestv2(int runs, string filename)
    {
        Console.WriteLine("Running C# FHIR SDK system.text benchmark!");

        string source = File.ReadAllText(filename);
        var parser = new FhirJsonParser();
        var options = new JsonSerializerOptions().ForFhir(ModelInfo.ModelInspector);
        for (int i = 0; i < 20; i++)
        {
            Bundle bundle = JsonSerializer.Deserialize<Bundle>(source, options);
        }

        DateTime dateTimeStart = DateTime.Now;

        for (int i = 0; i < runs; i++)
        {
            Bundle bundle = JsonSerializer.Deserialize<Bundle>(source, options);
        }
        DateTime dateTimeFinish = DateTime.Now;
        var totalTimeTake = dateTimeFinish - dateTimeStart;
        Console.WriteLine($"Fhir SDK system.text miliseconds taken: {totalTimeTake.TotalMilliseconds}");
    }

    public static void FhirObjectSerializerTest(int runs, string filename)
    {
        Console.WriteLine("Running C# FHIR object benchmark!");

        string source = File.ReadAllText(filename);
        for (int i = 0; i < 20; i++)
        {
            object bundle = JsonSerializer.Deserialize<object>(source);
        }

        DateTime dateTimeStart = DateTime.Now;

        for (int i = 0; i < runs; i++)
        {
            object bundle = JsonSerializer.Deserialize<object>(source);
        }
        DateTime dateTimeFinish = DateTime.Now;
        var totalTimeTake = dateTimeFinish - dateTimeStart;
        Console.WriteLine($"Fhir object miliseconds taken: {totalTimeTake.TotalMilliseconds}");
    }
    static void Main(string[] args)
    {
        Console.WriteLine("Running C# benchmark!");

        int runs = 100;
        string filename = "bundle_file.json"; // Replace with the name of your file

        FhirR4SerializerTest(runs, filename);
        FhirObjectSerializerTest(runs, filename);
        FhirR4SerializerTestv2(runs, filename);
    }
}

Results

With custom Bundle that we use at work:

PS C:\Users\user\net-test> dotnet run -c Release
Running C# benchmark!
Running C# FHIR SDK benchmark!
Fhir SDK miliseconds taken: 45401,7309
Running C# FHIR object benchmark!
Fhir object miliseconds taken: 1280,7618
Running C# FHIR SDK system.text benchmark!
Fhir SDK system.text miliseconds taken: 28314,5444

With valuesets.json:

PS C:\Users\user\net-test> dotnet run -c Release
Running C# benchmark!
Running C# FHIR SDK benchmark!
Fhir SDK miliseconds taken: 38795,9362
Running C# FHIR object benchmark!
Fhir object miliseconds taken: 2475,0446
Running C# FHIR SDK system.text benchmark!
Fhir SDK system.text miliseconds taken: 29868,7799

Version used:

  • FHIR Version: only R4 tested
  • Version: Hl7.Fhir.R4 5.8.1
@mmsmits
Copy link
Member

mmsmits commented Jun 25, 2024

@KLazarov

Thanks for reaching out.
The tests you've attached show indeed a big difference in performance, however FhirObjectSerializerTest doesn't only differ in the fact that you've set a type compared to FhirR4SerializerTestv2.

In the FhirObjectSerializerTest you do not use options (new JsonSerializerOptions().ForFhir(ModelInfo.ModelInspector)), which contains all the FHIR logic of the Deserializer (all the parsing from string to DateTime, Int, Bool, etc. But also all the FHIR extension logic, validation, and other checks. This takes time. So it's not a fair comparison.

The way you test now, you just get a dictionary returned that corresponds to the Json file, it doesn't parse to our FHIR Poco models.

You can still improve the time of deserialization by turning off the validator in the settings (set it to null) if you want, but it will never come close to parsing json to a dictionary without any FHIR logic.

If you have concrete ideas on how to speed up serializer further, please let us know.

@mmsmits
Copy link
Member

mmsmits commented Jul 4, 2024

Maybe we can utilize System.Text.Json more in serializers to improve the speed.
However, this currently doesn't have our top priority.

If you would like to discuss this, re-open the issue.
We can do some design discussion and you can create a PR afterwards.

@mmsmits mmsmits closed this as not planned Won't fix, can't repro, duplicate, stale Jul 4, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants