ParquetWriter&Compression #298

Open
S345T opened this issue Sep 20, 2023 · 1 comment

S345T commented Sep 20, 2023

Hi,

We're trying to convert JSON to Parquet with compression for one of our requirements, and we found ChoETL to be very useful. We have a question regarding CompressionMethod. We took the Sample52.json message from the repo as an example to see whether it meets our requirement. The compression method we're looking at is Gzip.

What we found: when we removed CompressionMethod from the ParquetWriter entirely, the output was around 5.7 MB. With CompressionMethod set, it was around 6.9 MB.

We tried adding a compression level too.

With a compression level of 8, the output was around 6.6 MB. We understand that's 0.3 MB less, but we were expecting something far smaller once the message was compressed.

Just wondering whether we're using the component the way it should be used, or whether this is the best it can offer as it stands.

Another thing we didn't understand: without CompressionMethod, the output was smaller.

// Read the JSON payload; dates are kept as raw strings and nulls are included.
using (var r = ChoJSONReader.LoadText(requestBody)
    .UseJsonSerialization()
    .JsonSerializationSettings(s => s.DateParseHandling = DateParseHandling.None)
    .JsonSerializationSettings(s => s.NullValueHandling = NullValueHandling.Include)
)
{
    using var parquetStream = new MemoryStream();
    // Write Parquet into the in-memory stream with Gzip compression.
    using (var w = new ChoParquetWriter(parquetStream)
        .Configure(c => c.CompressionMethod = Parquet.CompressionMethod.Gzip)
        .ThrowAndStopOnMissingField(false)
    )
    {
        w.Write(r);
    }
}

Thanks in Advance.

Owner

Cinchoo commented Sep 21, 2023

The latest release, https://www.nuget.org/packages/ChoETL.Parquet/1.0.1.30, offers more compression algorithms.

Try it and let me know.
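
One way to try this is to write the same input once per compression method and compare the resulting sizes. The following is only a rough sketch, not an official ChoETL sample: it reuses the calls from the snippet in the question, the helper name ParquetSize and the file path Sample52.json are assumptions made for illustration, and which enum values exist (or are actually supported) depends on the installed ChoETL.Parquet and Parquet.Net versions.

```csharp
using System;
using System.IO;
using ChoETL;

// Assumed input path; point this at the JSON payload you want to measure.
string json = File.ReadAllText("Sample52.json");

// Try every compression method exposed by the installed Parquet.Net version.
foreach (Parquet.CompressionMethod method in Enum.GetValues(typeof(Parquet.CompressionMethod)))
{
    try
    {
        Console.WriteLine($"{method}: {ParquetSize(json, method)} bytes");
    }
    catch (Exception ex)
    {
        // Some enum values may be declared but not supported by the installed version.
        Console.WriteLine($"{method}: failed ({ex.GetType().Name})");
    }
}

// Hypothetical helper: writes the JSON as Parquet into memory and returns the byte count.
static long ParquetSize(string json, Parquet.CompressionMethod method)
{
    using var parquetStream = new MemoryStream();

    using (var r = ChoJSONReader.LoadText(json).UseJsonSerialization())
    using (var w = new ChoParquetWriter(parquetStream)
        .Configure(c => c.CompressionMethod = method)
        .ThrowAndStopOnMissingField(false))
    {
        w.Write(r);
    }

    // ToArray() still works if the writer closed the stream on dispose.
    return parquetStream.ToArray().Length;
}
```

Measuring after the writer is disposed matters because the Parquet footer is only written when the file is finalized, so sizes read earlier would be incomplete.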
