Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add support for streaming API. #47

Open
wants to merge 3 commits into
base: master
Choose a base branch
from
Open

Add support for streaming API. #47

wants to merge 3 commits into from

Conversation

bagl
Copy link

@bagl bagl commented Apr 18, 2016

Issue #46.

One function for dumping data into XLSX file using streaming and one macro for doing the same, but explicitly by row using append-row! function (define in the scope of the macro body).

I do not know if append-row! scoped to macro body is a good idea...

@bagl
Copy link
Author

bagl commented Apr 19, 2016

Alternative withou using macro (just pass a function in that takes function to append row as its argument)

(defn with-streaming-workbook-fn!
  [file-name sheet-name body-fn]
  (let [wb (SXSSFWorkbook. 1)
        ^Sheet sh (add-sheet! wb sheet-name)
        row-num (atom 0)
        append-row!
        (fn [row-data]
          (let [r (.createRow sh @row-num)]
            (swap! row-num inc)
            (doseq [[j value] (map-indexed #(list %1 %2) row-data)]
              (set-cell! (.createCell r j) value))))]
    (try
      (body-fn append-row!)
      (save-workbook-into-file! file-name wb)
      (finally
        (.dispose wb)))))

There are no name collisions then...

@mjul
Copy link
Owner

mjul commented Jun 8, 2016

Thanks for contributing. It solves the immediate problem for writing data.

However, it does not play well with the sheet styling code. I think the styling code is at fault as it is imperative in nature and not so elegant.

I wonder if you have thought about this and we could find a design that would enable the library to work (to the extent possible) with both in-memory and streaming spreadsheets.

There is an old idea for a V2 design with a data structure that can be piped (thrushed) between the Docjure functions instead of passing the underlying POI objects. This would enable us to use it as a builder (lazily, even) which would be good for streaming and also allow to make the styling and formatting code more elegant as it could have a notion of context ("make the last row bold", "add another sheet at the end").

It would probably involve multi-methods so that the functions can take advantage of the context. For example we can track the selected sheet so that rows added will be added to the last one selected.

With multiple sheets in play it might even be able to jump between them in data processing code as we would build up the structures lazily and still be able to write them sequentially if needed the streaming API.

Please share your thoughts.

@kennethkalmer
Copy link

Thanks for this PR, I just tested it by placing the new functions in an isolated namespace and got my fairly small workbooks generated using vanilla lein (-Xmx1g). Workbooks in question are 18MB (~53000 rows, many columns) and 5.9MB (47000 rows, still many columns).

@ghost
Copy link

ghost commented Aug 30, 2017

@mjul Is there something preventing this merge?
Metabase is depending on docjure to export their tables to excel. And this functionality is currently experiencing serieus memory issues (metabase/metabase#5187).

@mjul
Copy link
Owner

mjul commented Aug 30, 2017 via email

@mjul
Copy link
Owner

mjul commented Sep 14, 2017

OK, I looked at it again.
My main concern is that it is not really integrated with the other library functions (it allows you to write a single-sheet workbook). However, If you add some test cases to show that it supports the regular cell data types I will add it as is to the version 1 branch, then we can work out a better API for V2 that allows it to play better with the rest of the library.

@thucnc
Copy link

thucnc commented Sep 25, 2017

@bagl and @milanvdmria, is there any way to move forward, as this issue still blocks Metabase download feature ? Please help, thank you.

@ghost
Copy link

ghost commented Sep 25, 2017

@thucnc
To be honest, we do not have any in-company knowledge on Clojure and do not have the bandwidth to work on this currently.

@paoliniluis
Copy link

Hi @mjul, Luis here from the Metabase team. We use your library to generate XLSX docs and this specific feature can make a big difference on the performance of our application. I believe this is incomplete and that's why it hasn't been merged. Is that correct?

@mjul
Copy link
Owner

mjul commented Jul 24, 2024

@paoliniluis yes, as mentioned above it lacks test cases

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants