Tracking: layered store 2022 Q1 #1753

Closed
11 of 15 tasks
tomjridge opened this issue Feb 4, 2022 · 1 comment
Labels
area/backend (About the backend), feature/layered-store (Related to the Layered Store)

Comments

@tomjridge
Contributor

tomjridge commented Feb 4, 2022

This is an issue to track the new layered store implementation.

The current branch: https://github.com/tomjridge/irmin/tree/2022-04-22_layers_rebased_on_3.2.0

Older branches:

A recent tezos branch, with additional code to trigger gc every so often, is here: https://github.com/tomjridge/tezos/tree/2022-03-14_layers

Victor's branch, to integrate layers into Tezos properly, is here: https://gitlab.com/nomadic-labs/tezos/-/tree/vicall@tomjridge@layered_store

Todo (additional entries to be added when discovered):

  • Add clear documentation for the IO.Unix interface used by pack_store.ml, so that it is possible to work out what the semantics are
  • Implement external sorting and other external routines via mmaps
    • sorting
    • extent calculation
  • Port/rework prototype code from https://github.com/tomjridge/sparse-file/tree/master/src into a subdirectory under irmin-pack
  • Change the store pack file to use a control+objstore+suffix ("layers") rather than a plain file
    • X Identify the exact interface used by the pack_store
    • X Determine how to implement this interface on top of the layers
    • Implement a replacement IO, suitable for layers
  • Implement the missing part of the worker: the calculation of reachable objects from a commit
  • Implement a simple mechanism to trigger GC from a given commit
  • Proper integration with irmin APIs
    • X how to trigger GC
    • how to properly compute reachability from a commit (still needs looking at - want to avoid use of create_reach.exe)
  • Test, for example, by replaying some existing trace and periodically performing GC on a recent commit
    • X Get trace replay with GC every n commits working; this is working
    • X Get tezos node bootstrapping with GC working
    • X Get baking node working, with RO irmin instances
    • X Test restart behaviour, for instance when killing a process in the middle of bootstrapping; TJR: I tested this quite a bit and things seemed OK; errors are still likely if we kill a process at an inopportune time, so this could do with more testing
  • Bug fixing (at 2022-04-21)
    • X RO implementation needs finishing
    • Unbounded memory usage when using layers, compared to main; TJR: after finishing RO impl, cannot reproduce this error
    • After stopping a node, restart attempts to read from gap; likely this is caused by some startup behaviour of a tezos-node e.g. it attempts to access an "old" commit, or the parent of the current GC commit; TJR: after finishing RO impl, cannot reproduce this error
  • Benchmarking; perhaps refinement of the code (e.g. calculation of reachable objects)
  • Proper testing and performance measurement for the Tezos use case: they want to GC every cycle but keep only the last 6 cycles. How does this affect timings for Repo.iter? What is the impact on IO? What is the space overhead? (Presumably we need an extra 6 cycles' worth of storage if we are GC'ing from 6 cycles ago, since this will be copied to the next suffix file; on top of this we have the sparse file overhead for live objects from the commit, currently 3GB.)
  • "Hardening" pass, where all the FIXMEs are addressed, corner cases fixed, etc.
  • Merging into main irmin repo
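
Two of the items above, the calculation of reachable objects from a commit and the extent calculation, can be illustrated with a small in-memory sketch. This is not the code on the branch and it does not use any irmin or irmin-pack API: the `obj` and `store` types and the functions `reachable`, `extents` and `live_extents` are hypothetical names invented for the example, and the object store is assumed to be a simple table from offsets to objects. The real implementation is expected to work on the pack file via mmap rather than on in-memory lists.

```ocaml
(* Hypothetical sketch: none of these names come from irmin or irmin-pack. *)

module IntSet = Set.Make (Int)

(* A toy object: its length in bytes and the offsets of the objects it
   references. Objects are identified by their offset in the pack file. *)
type obj = { len : int; refs : int list }

type store = (int, obj) Hashtbl.t

(* Offsets reachable from [root] (e.g. a commit). An explicit work list is
   used so that deep histories do not overflow the call stack. *)
let reachable (store : store) (root : int) : IntSet.t =
  let rec loop visited = function
    | [] -> visited
    | off :: rest ->
        if IntSet.mem off visited then loop visited rest
        else
          let children =
            match Hashtbl.find_opt store off with
            | Some o -> o.refs
            | None -> [] (* dangling reference: treated as a leaf here *)
          in
          loop (IntSet.add off visited) (List.rev_append children rest)
  in
  loop IntSet.empty [ root ]

(* Merge (offset, length) regions into maximal non-overlapping extents.
   Regions that touch or overlap are coalesced into one extent. *)
let extents (regions : (int * int) list) : (int * int) list =
  let sorted = List.sort compare regions in
  let rec merge acc = function
    | [] -> List.rev acc
    | (off, len) :: rest -> (
        match acc with
        | (o, l) :: acc' when off <= o + l ->
            let end_ = max (o + l) (off + len) in
            merge ((o, end_ - o) :: acc') rest
        | _ -> merge ((off, len) :: acc) rest)
  in
  merge [] sorted

(* Extents of live data for a GC from [root]. *)
let live_extents (store : store) (root : int) : (int * int) list =
  IntSet.fold
    (fun off acc ->
      match Hashtbl.find_opt store off with
      | Some o -> (off, o.len) :: acc
      | None -> acc)
    (reachable store root) []
  |> extents

let () =
  let store : store = Hashtbl.create 16 in
  (* commit at 100 -> node at 60 -> blobs at 10 and 20; 30 is garbage *)
  Hashtbl.replace store 100 { len = 10; refs = [ 60 ] };
  Hashtbl.replace store 60 { len = 20; refs = [ 10; 20 ] };
  Hashtbl.replace store 10 { len = 10; refs = [] };
  Hashtbl.replace store 20 { len = 5; refs = [] };
  Hashtbl.replace store 30 { len = 30; refs = [] };
  List.iter
    (fun (off, len) -> Printf.printf "extent: off=%d len=%d\n" off len)
    (live_extents store 100)
```

In this toy layout the blobs at offsets 10 and 20 are adjacent and collapse into a single extent, while the unreachable object at offset 30 falls into a gap between extents. In the design sketched in this issue, reading from such a gap after GC would be an error.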
@maiste added the feature/layered-store and area/backend labels on Mar 18, 2022
@tomjridge changed the title from "Tracking: layered store 2022" to "Tracking: layered store 2022 Q1" on Apr 28, 2022
@tomjridge
Contributor Author

Closing in favour of #1824
