heavy transitive dependencies #47

Closed
jhpoelen opened this issue Sep 28, 2018 · 5 comments

Comments

@jhpoelen

hey y'all -
Thanks for making dwca-io. It has worked pretty well for me so far!
I did notice that the Elton standalone jar grew significantly after adding dwca-io. After some digging, I found some heavy dependencies like org.apache.poi etc. and found myself excluding dependencies to reduce the jar by tens of MB. I am sure that there's a good reason for including the dependencies, so I won't be offended if they stay where they are. Just wanted to share my findings.

Here are the exclusions I am using now:

<dependency>
    <groupId>org.gbif</groupId>
    <artifactId>dwca-io</artifactId>
    <version>2.2</version>
    <exclusions>
        <exclusion>
            <groupId>org.apache.poi</groupId>
            <artifactId>poi</artifactId>
        </exclusion>
        <exclusion>
            <groupId>org.apache.poi</groupId>
            <artifactId>poi-ooxml</artifactId>
        </exclusion>
        <exclusion>
            <groupId>org.apache.odftoolkit</groupId>
            <artifactId>simple-odf</artifactId>
        </exclusion>
        <exclusion>
            <groupId>com.googlecode.owasp-java-html-sanitizer</groupId>
            <artifactId>owasp-java-html-sanitizer</artifactId>
        </exclusion>
        <exclusion>
            <groupId>org.freemarker</groupId>
            <artifactId>freemarker</artifactId>
        </exclusion>
    </exclusions>
</dependency>

Hope this helps and curious to hear thoughts.

@cgendreau
Contributor

It was also discussed in Issue #39

@timrobertson100
Member

Thanks @jhpoelen - firstly, at a glance those exclusions look sensible (assuming you want read-only access, and no open data format / Excel support).

dwca-io could definitely benefit from a fairly thorough refactor to ensure it has a tiny footprint, a rigorous selection of dependencies, vendoring of all dependencies to avoid user pain, and a separate module for format transformations (e.g. Excel, open data formats etc.).

With that said though, it's used everywhere and, as you have discovered, developers have generally found reasonable workarounds without too much effort. I suspect this is why it has not been acted upon quickly.

If you do find a blocking issue that you can't easily get around, please do let us know. If you have time and motivation to work on this, we also welcome collaboration (on this, I suggest a proposal for change needs to be documented and discussed first, as it'll affect several products).

@jhpoelen
Author

Thanks for your replies @timrobertson100 and @cgendreau.

Here's my proposal:

  1. keep the dwca-io module, to be kind to existing users who either need, or are not too concerned about adding, several tens of MB of dependencies.
  2. make the dwca-io module depend on new modules like dwca-io-core, dwca-io-simple, dwca-excel, etc.
  3. dwca-io-core would include only the stuff needed to read from vanilla dwca archives.
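
To make the idea concrete, here is a rough sketch of how the dependencies section of the umbrella dwca-io pom.xml might look under this proposal. The module names are the ones suggested in the list above and do not exist yet; using `${project.version}` assumes the new modules would be released together with dwca-io, which is only one possible arrangement.

```xml
<!-- Hypothetical fragment of the umbrella dwca-io pom.xml. -->
<!-- dwca-io-core and dwca-excel are proposed module names, not released artifacts. -->
<dependencies>
    <!-- Minimal reader for vanilla dwca archives; would carry no spreadsheet libraries. -->
    <dependency>
        <groupId>org.gbif</groupId>
        <artifactId>dwca-io-core</artifactId>
        <version>${project.version}</version>
    </dependency>
    <!-- Optional format support, pulled in by the umbrella module for existing users. -->
    <dependency>
        <groupId>org.gbif</groupId>
        <artifactId>dwca-excel</artifactId>
        <version>${project.version}</version>
    </dependency>
</dependencies>
```

Existing users would keep depending on dwca-io and see no change, while projects that only read archives could depend on dwca-io-core directly and skip the exclusions.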

As far as doing the actual work, I'd be open to hearing your ideas.

@MattBlissett
Member

I've moved the spreadsheet handling stuff out of gbif-common and into gbif-common-spreadsheet (released). I've removed BeanHtmlSanitizer and its three library dependencies, since I can't see anywhere it's used (committed, not released).

We have both Guava and Apache Commons utils, but I think we use non-basic bits of both (e.g. HTML entity decoder). Freemarker is used to write meta.xml files. We could rewrite things, but there are no longer obvious gains from splitting.

jhpoelen pushed a commit to globalbioticinteractions/globalbioticinteractions that referenced this issue Oct 23, 2018
@jhpoelen
Author

@MattBlissett thanks for making this happen!

I've upgraded to v2.3 and noticed the spreadsheet deps have disappeared. I am planning to remove the exclusions of com.google.code.findbugs:jsr305, commons-beanutils:commons-beanutils and com.googlecode.owasp-java-html-sanitizer:owasp-java-html-sanitizer in a future version, given your commits gbif/gbif-common@ea2cf9f and b703caa after the v2.3 / v0.42 releases.
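
For anyone following along, this is roughly what the dependency declaration could look like after the upgrade. It is a sketch that assumes the POI and simple-odf exclusions are simply dropped and that only the three interim exclusions named above remain; whether a freemarker exclusion is still wanted depends on whether you write meta.xml files, so it is left out here.

```xml
<dependency>
    <groupId>org.gbif</groupId>
    <artifactId>dwca-io</artifactId>
    <version>2.3</version>
    <exclusions>
        <!-- Interim exclusions; intended to be removed once the gbif-common cleanup is released. -->
        <exclusion>
            <groupId>com.google.code.findbugs</groupId>
            <artifactId>jsr305</artifactId>
        </exclusion>
        <exclusion>
            <groupId>commons-beanutils</groupId>
            <artifactId>commons-beanutils</artifactId>
        </exclusion>
        <exclusion>
            <groupId>com.googlecode.owasp-java-html-sanitizer</groupId>
            <artifactId>owasp-java-html-sanitizer</artifactId>
        </exclusion>
    </exclusions>
</dependency>
```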
