Conversation
|
Thanks for initiating this. I kind of fixed the tests in the paper/ directory. Looking at the repo size, it is not that extreme, IMHO. But, OK. The DBtext corpus is 38MB, and the video is 148MB. The video could be moved out to somewhere (where?) and the corpus might be compressed with FSST, reducing it to 15MB. It would be good to keep the paper/dbtext/ corpus as I would want to run tests on it in CI. Possibly using the targets in paper/ (e.g. make experiments) and including some verification that the compression/decompression yields identical files. Also, it would be good to do the same testing with FSST12. |
|
@peterboncz The problem is that, because no releases are set up, we need to add FSST as a submodule if we want to use it as a dependency. The repository size by itself may not be too bad, but we already have a lot of dependencies, and adding another 200MB dependency is of course not ideal. |
|
OK - in the CWI DA coding doctrine (see e.g. DuckDB) we heavily prefer inlining components, to avoid dependency hell. |
This PR adds a GitHub CI workflow that builds the project.
I can contribute additional CI coverage, such as building the project on other platforms and running the tests, as well as a release workflow.
You can see here that it passes: https://github.com/louwers/fsst/actions/runs/19102448516
Once merged it will run for all pull requests and pushes to the master branch.
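
The round-trip verification suggested in the thread (checking that compression followed by decompression yields identical files for the paper/dbtext/ corpus) could be sketched roughly as below. This is only an illustration under assumptions: `failed_round_trips` is a hypothetical helper, not something in the repository, and `zlib` stands in for the codec because the FSST command line and bindings are not described here. A real CI check would plug in whatever invokes FSST (and FSST12) instead.

```python
import sys
import zlib
from pathlib import Path
from typing import Callable, List


def failed_round_trips(
    corpus_dir: str,
    compress: Callable[[bytes], bytes],
    decompress: Callable[[bytes], bytes],
) -> List[str]:
    """Return the names of files whose decompressed bytes differ from the original.

    Hypothetical helper: `compress` and `decompress` are supplied by the caller,
    so the same check can be reused for different codecs.
    """
    failures = []
    for path in sorted(Path(corpus_dir).iterdir()):
        if not path.is_file():
            continue
        original = path.read_bytes()
        restored = decompress(compress(original))
        if restored != original:
            failures.append(path.name)
    return failures


if __name__ == "__main__":
    # Assumption: the corpus directory is passed as the first argument,
    # e.g. paper/dbtext. zlib is a stand-in codec, NOT FSST.
    corpus = sys.argv[1] if len(sys.argv) > 1 else "paper/dbtext"
    bad = failed_round_trips(corpus, zlib.compress, zlib.decompress)
    for name in bad:
        print(f"round-trip mismatch: {name}")
    # A non-zero exit code makes the CI job fail on any mismatch.
    sys.exit(1 if bad else 0)
```

Passing the codec as a pair of callables keeps the comparison logic independent of how FSST is actually invoked, so the same loop could cover both FSST and FSST12.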