ZIM Fuse Filesystem #400
setting up the project
Implemented a trie-like tree in which each path is split into its directory components.
For example,
my/new/path1.jpg
my/new/path2.jpg
are represented as
/ (root)
  my
    new
      path1.jpg
      path2.jpg
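The structure described above can be sketched roughly as follows. This is only an illustration under assumptions: `PathNode` and `insertPath` are hypothetical names, not the classes used in this PR, and the sketch ignores everything except splitting a path on `/` and sharing common prefixes.

```cpp
#include <cassert>
#include <map>
#include <memory>
#include <sstream>
#include <string>

// Hypothetical node type for illustration; not the PR's actual Node class.
struct PathNode {
    std::string name;
    // Children keyed by name, so two paths with a common prefix share nodes.
    std::map<std::string, std::unique_ptr<PathNode>> children;
};

// Split `path` on '/' and create one node per component below `root`.
// Returns the node that corresponds to the last component.
PathNode* insertPath(PathNode& root, const std::string& path) {
    assert(!path.empty());
    PathNode* current = &root;
    std::istringstream parts(path);
    std::string part;
    while (std::getline(parts, part, '/')) {
        if (part.empty()) continue;  // skip empty components such as in "a//b"
        std::unique_ptr<PathNode>& child = current->children[part];
        if (!child) {
            child = std::make_unique<PathNode>();
            child->name = part;
        }
        current = child.get();
    }
    return current;
}
```

Inserting `my/new/path1.jpg` and then `my/new/path2.jpg` creates `my` and `new` once and places both files under `new`.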
Implemented some common FUSE functions. We can now browse the ZIM file using ls, cd, etc., and open files in text editors or with commands such as "cat".
Updated the README to add information about zimfuse
I guess this would be an implementation of kiwix/overview#79?
mappedNodes owns the nodes
Some entries had long filenames. This change truncates them to 100 characters, which is well within the allowed limit. This introduces possible name collisions, which will be fixed in the following commit.
Now, if two files have the same name, the 2nd file is renamed to "originalFileName(1)".
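A minimal sketch of the truncate-then-deduplicate idea, assuming one counter per base name inside a directory. `NameDeduplicator` is a hypothetical helper, not the code from these commits, and it makes two simplifications: truncation is byte-based, and an original file literally named `foo(1)` is not checked against a generated `foo(1)`.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <unordered_map>

// Hypothetical helper (not the PR's code): hands out unique names
// within a single directory.
class NameDeduplicator {
public:
    explicit NameDeduplicator(std::size_t maxLength = 100) : maxLength_(maxLength) {
        assert(maxLength_ > 0);
    }

    std::string uniqueName(const std::string& original) {
        // Byte-based truncation; this may cut a multi-byte UTF-8 character.
        const std::string base = original.substr(0, maxLength_);
        // One counter per base name, so "foo" and "bar" are numbered independently.
        std::size_t& seen = timesSeen_[base];
        std::string result = base;
        if (seen > 0) {
            result += "(" + std::to_string(seen) + ")";
        }
        ++seen;
        return result;
    }

private:
    std::size_t maxLength_;
    std::unordered_map<std::string, std::size_t> timesSeen_;
};
```

Note that the suffix is appended after truncation, so a deduplicated name can be a few characters longer than the limit.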
When reading ZIMs with a lot of files (for example: mediawiki_en_all), getting their file sizes took a lot of time. There were 2 choices:
1. Read the file size while creating the tree. This gave fast subsequent responses (on commands such as ls), but the FUSE initialization took a good amount of time.
2. Find the file size when a command requests it, and save it for future requests. This provides fast initialization, but the first read takes time (though not as much as getting all file sizes, only the requested ones).
I went with option 2.
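Option 2 amounts to computing a value on first request and remembering it. A rough sketch, assuming the size lookup can be wrapped in a callable; `LazySize` is a hypothetical name, and the sketch is not thread-safe, which would matter if FUSE serves requests from several threads.

```cpp
#include <cassert>
#include <cstdint>
#include <functional>
#include <utility>

// Hypothetical wrapper (not the PR's code): runs `loader` on the first
// call to get() and returns the remembered value afterwards.
class LazySize {
public:
    explicit LazySize(std::function<std::uint64_t()> loader)
        : loader_(std::move(loader)) {
        assert(loader_);  // an empty std::function would throw when called
    }

    std::uint64_t get() {
        if (!cached_) {
            size_ = loader_();  // the expensive lookup happens only once
            cached_ = true;
        }
        return size_;
    }

    bool isCached() const { return cached_; }

private:
    std::function<std::uint64_t()> loader_;
    std::uint64_t size_ = 0;
    bool cached_ = false;
};
```

With this shape, building the tree costs nothing per file, and only the entries that a command such as ls actually touches pay for the lookup.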
Yes.
@juuz0 Thanks but most of the CI is broken? Can you fix it?
CI fails because 'fuse3' is not available on the system...maybe something to add in kiwix-build later? |
mgautierfr
left a comment
Thanks for the PR @juuz0.
But I have a few general comments that are better addressed on the full PR than on each commit:
- This kind of tree is a perfect use case for OOP and inheritance. A basic `Node` with `name`, `parent`. Then a `Dir` node (inheriting `Node`) with `children` and a `Leaf` node.
- You define a `Node::Ptr` as `unique_ptr`, but the `Node` tree is composed of `Node*`. `Node::Ptr` is only used in `Tree::mappedNodes`. This makes `Tree` the direct owner of all the nodes, with `Node`s only making "references" to other nodes. It would make more sense to have `Dir` nodes be the owners of their children.
- The `Tree::mappedNodes` is a kind of global cache, but I'm not sure we need it. Either "directories" contain few entries and a loop over the `vector` should be enough, or not, and in this case it would be better to transform `Node::children` into a `map`. We would have O(1) access at node level, on top of ensuring the uniqueness of children names.
- The `Tree::statCache` could be removed by simply moving the cached `struct stat` into the `Node` itself.
- Then the `Tree` could probably be removed, as it is simply a `Dir` node without a parent.
- The `collisionCount` is local to each node but "global" to all children. This means that for twins `foo` (x2) and `bar` (x2), you will find the sanitized names `foo`, `foo(1)`, `bar` and `bar(2)`. It should be `bar(1)`.
- Redirects are not properly handled. You resolve the redirection and return the content of the target, but it may break relative links in HTML. You should treat a redirect as a symlink (and so implement `readlink`).
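To make the first points concrete, here is one possible shape for such a hierarchy. It is only a sketch under assumptions: the member names are invented, `std::unordered_map` is chosen for average O(1) lookup, and the cached `struct stat`, redirects and `readlink` are left out.

```cpp
#include <cassert>
#include <memory>
#include <string>
#include <unordered_map>
#include <utility>

class Dir;

// Base class: every node has a name and a non-owning pointer to its parent.
class Node {
public:
    Node(std::string name, Dir* parent) : name_(std::move(name)), parent_(parent) {}
    virtual ~Node() = default;
    virtual bool isDir() const = 0;
    const std::string& name() const { return name_; }
    Dir* parent() const { return parent_; }

private:
    std::string name_;
    Dir* parent_;  // nullptr for the root
};

class Leaf : public Node {
public:
    using Node::Node;
    bool isDir() const override { return false; }
};

// A directory owns its children; keying them by name also keeps names unique.
class Dir : public Node {
public:
    using Node::Node;
    bool isDir() const override { return true; }

    // Takes ownership of `child`. Returns nullptr (and drops the child)
    // if a child with the same name already exists.
    Node* addChild(std::unique_ptr<Node> child) {
        assert(child && child->parent() == this);
        Node* raw = child.get();
        auto inserted = children_.emplace(raw->name(), std::move(child));
        return inserted.second ? raw : nullptr;
    }

    Node* find(const std::string& name) const {
        auto it = children_.find(name);
        return it == children_.end() ? nullptr : it->second.get();
    }

private:
    std::unordered_map<std::string, std::unique_ptr<Node>> children_;
};
```

In this layout the root is just a `Dir` whose parent is `nullptr`, so no separate tree class or global node cache is needed.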
On top of that, I wonder about the memory usage of the tree.
wikipedia_fr_all_maxi contains 7029908 entries (leaves), and sizeof(Node) is 144, without counting the actual data stored in it (name/fullPath/originalPath bytes, children pointers). So we need at least 965 MiB for the nodes only, and probably more than 2 GiB if we add the paths and children pointers.
We can reduce that by carefully defining the Node structure and what we store, but in the end we will always use a lot of memory.
(We may simply not care and tell users that mounting ZIM files needs a lot of memory, at least for now.)
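The lower bound above is just entry count times node size. A small sketch of that arithmetic, where `nodeMemoryMiB` is a hypothetical helper and the result is rounded down to whole MiB:

```cpp
#include <cstdint>

// Lower bound on the memory taken by the node structs alone, in whole MiB.
// Strings and child containers allocated on the heap are not included.
constexpr std::uint64_t nodeMemoryMiB(std::uint64_t entryCount, std::uint64_t nodeSize) {
    return entryCount * nodeSize / (1024 * 1024);
}

// 7029908 * 144 = 1012306752 bytes, which is 965 MiB when rounded down.
static_assert(nodeMemoryMiB(7029908, 144) == 965, "matches the estimate in the review");
```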
  docopt_dep = dependency('docopt', static:static_linkage)

- with_writer = host_machine.system() != 'windows'
+ with_writer_and_mount = host_machine.system() != 'windows'
It would be better to have an on_windows variable, or to split this into two variables, with_writer and with_mount.
We should not link writer and mount compilation together (at least not explicitly in one variable)
meson.build
Outdated
- with_writer = host_machine.system() != 'windows'
+ with_writer_and_mount = host_machine.system() != 'windows'

  if with_writer
This should be renamed also.
Fixes kiwix/overview#79
Mount any ZIM file to your filesystem
Usage:
zimfuse zimFile.zim mountDir
mountDir should exist before using zimfuse.