Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Pushstream support #8

Open
misterspeedy opened this issue Mar 7, 2022 · 8 comments
Open

Pushstream support #8

misterspeedy opened this issue Mar 7, 2022 · 8 comments
Labels
enhancement New feature or request

Comments

@misterspeedy
Copy link
Owner

Query from Natalie Perret:

is there a support for PushStream or Seq?
(i.e. low memory footprint)

https://twitter.com/natalie_perret/status/1500264966024224768

@natalie-o-perret
Copy link

About the pushtream thinking about that: https://gist.github.com/mrange/fbefd946dba6725a0b727b7d3fd81d6f

Actually could be about any kind of "streamable" sequences:

  • IAsyncEnumeable<T>
  • AsyncSeq<'T>
  • IEnumerable<T> / 'T seq
  • PushStream<'T>

@natalie-o-perret
Copy link

Considering what the function AsWorkBook does: https://github.com/misterspeedy/FsExcel/blob/main/src/FsExcel/FsExcel.fs#L143
i.e. the big bulk is about iterating for ... in ... do over a list of items and match ... with.
So not sure that if there is a huge benefit to iterate over a F# list, specifically.

That being said, ain't too sure 'bout the memory footprint of XLWorkbook when there are thousands of lines or sizable number of tabs.

I'm trying to see if we could use your lib at work, and if it can fit our (business) needs.

@misterspeedy
Copy link
Owner Author

Thanks for the feedback @natalie-o-perret! The DSL obviously uses an F# list for fluency at the worksheet definition stage (Fable.React style) so I guess iterating over it seemed natural. Maybe this could be done in a more sequency way, but the fundamental question is, can ClosedXML (or some alternative) accommodate this so the workbook is produced incrementally. My guess is 'no' as certain metadata, eg. the timestamp, need to be enclosed in the "stream" before it is completed. But I haven't dug into it.

Anyway, please could you let me know what volumes and execution environment you are targeting, and maybe I can do some benchmarking? An actual example of the dataset in question would be ideal though I realize that might be proprietary.

While I await that I'll do some basic volume testing and see what micro-optimizations I can make.

Thanks again for the input!

@misterspeedy
Copy link
Owner Author

A very crude volume test suggest the current code is fine for rendering around a million cells, but struggles with 10 million. In the latter case it takes a few minutes but still completes fine. Next step will be to add some proper BenchmarkDotNet tests.

@natalie-o-perret
Copy link

A very crude volume test suggest the current code is fine for rendering around a million cells, but struggles with 10 million. In the latter case it takes a few minutes but still completes fine. Next step will be to add some proper BenchmarkDotNet tests.

YMMV, depends on the hardware (CPU + Memory).

The OOM issues I've bumped into (with a different .NET Excel library (infamous NPOI)) happend with a few hundreds of thousands of lines in a dockerized app (so no crazy amount of resources).

Btw, I 💯 acknowledge that the way the business (accountants) is handling Excel stuff on our end makes very little sense to me (i.e. why bother with huge Excel files, but there are times there is just so much you can do to convince people who already have their habits don't really wanna die anytime soon, i.e. other priorities).

We will have to run some (integration) tests on next quarter with your lib. I will have a resource rewriting the service we've taken over beginning of this year (also try to re-re-re-convince again the bizfolks to adjust the way they are handling their stuff).

Anyhoo-way, will keep you posted, probably mid-next month, we have others things in the pipeline atm.

@natalie-o-perret
Copy link

es fine. Next step will be to add some proper BenchmarkDotNet tests.

Oh and btw, thanks for your lib and thank you for being careful about the issues have been filed so far.
Don't wanna pass for someone who is ungrateful for what you have been doing 🙇‍♀️

@misterspeedy
Copy link
Owner Author

misterspeedy commented Mar 9, 2022

@natalie-o-perret Thanks for the update and context. Yours sounds like a really great use case, provided at least once constraint is that you are building non-trivial structures which are fiddly to code in mutable style. If the structure is really simple it might be faster (and dare-I-say-it simpler) to use the underlying library directly. Hope FsExcel is a good fit - if it is I'll do my best to make it a great experience for you and your colleagues.

@misterspeedy misterspeedy added the enhancement New feature or request label Apr 25, 2022
@natalie-o-perret
Copy link

Relates to my comment here ~~

#34 (comment)

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
enhancement New feature or request
Projects
None yet
Development

No branches or pull requests

2 participants