Description
This is a generic tracking issue for improvements to TensorBoard's Python filesystem abstraction. (This doesn't affect Rustboard, which today should be the read path for regular users, but some users still have to use the Python read path because they can't use Rustboard, and the write path for TensorBoard's summary APIs, which can write without TensorFlow, is still entirely in Python.)
Currently, TensorBoard's Python code relies on the TensorFlow filesystem API via `tf.io.gfile`, which has built-in support for a number of protocols besides local disk. When TensorFlow is not available, we fall back to a stub implementation of the `tf.io.gfile` API in `tensorboard/compat/tensorflow_stub/io/gfile.py`, which today only supports local disk and S3. The stub implementation achieves some reuse between filesystem implementations by having an intermediate set of methods that both implementations expose, but this abstraction A) isn't even defined other than by virtue of common methods (i.e. there's no abstract base class or spec), and B) has a number of issues.
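For concreteness, here's a minimal sketch (not taken from TensorBoard's actual sources) of the usage pattern in question: callers go through the `tensorboard.compat` shim, which resolves to real TensorFlow when it's installed and to the stub otherwise. The log directory and glob pattern below are made up.

```python
# Minimal illustration of the pattern at issue; the logdir and pattern below
# are hypothetical, but the tf.io.gfile calls are the shared surface that both
# real TF and the stub expose.
from tensorboard.compat import tf  # resolves to TensorFlow or the stub

logdir = "s3://my-bucket/runs"  # scheme support depends on which backend wins

if tf.io.gfile.exists(logdir):
    for event_file in tf.io.gfile.glob(logdir + "/*/events.out.tfevents.*"):
        # Both backends also expose a GFile class, but there is no abstract
        # base class or spec pinning down the exact semantics of any of this.
        with tf.io.gfile.GFile(event_file, "rb") as f:
            header = f.read(8)
```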
Here are a few of the issues with that abstraction:
- underspecified behavior in a number of places (e.g. what characters glob supports), which makes it harder to guarantee much in the way of consistency across implementations
- little clarity/documentation about how filenames are treated as bytes vs strings
- no "file" object at all, which is mostly ok for remote filesystems where everything is an RPC anyway, but is significantly worse for filesystems where there actually might be local state between, say, multiple reads or writes to a single file (e.g. local file descriptors, buffers, caches, etc.). Right now we ignore all of that, so it's conceivable that in cases we end up with quadratic behavior where each read at offset N is independent and must first seek past all the previous data.
- no `walk()` method for filesystems, just a function, as pointed out in #5248 (comment), so filesystems aren't able to take advantage of fast paths they might support
We already have a goal (bottom of #3666) to ultimately migrate away from the stub approach and instead properly inject a filesystem abstraction into the places that do I/O. As part of that, we should consider revisiting this abstraction and formalizing it properly. One possibility is to just adopt the existing `fsspec` abstraction, since it's a whole project built around this idea: https://filesystem-spec.readthedocs.io/en/latest/
If we did that, we'd likely still want to be able to use the TF filesystem APIs when available (so that existing TensorBoard users who also have TensorFlow, and thus benefit from its built-in support for cloud filesystems, don't experience a regression). One open question, then, is whether it's possible to shoehorn the TF filesystem support into this setup.
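One very rough sketch of that "shoehorning" (hypothetical, not a vetted design): wrap `tf.io.gfile` in an `fsspec.AbstractFileSystem` subclass so that TF-provided schemes remain reachable even if TensorBoard standardizes on the fsspec interface. The class name and the minimal method set here are assumptions for illustration only.

```python
# Hypothetical adapter sketch: expose TF's filesystem support through fsspec.
# Not an actual TensorBoard or TensorFlow API.
import fsspec
import tensorflow as tf


class TFGFileFileSystem(fsspec.AbstractFileSystem):
    """Delegates a minimal subset of fsspec calls to tf.io.gfile."""

    def ls(self, path, detail=True, **kwargs):
        names = [path.rstrip("/") + "/" + n for n in tf.io.gfile.listdir(path)]
        if not detail:
            return names
        return [
            {"name": n, "type": "directory" if tf.io.gfile.isdir(n) else "file"}
            for n in names
        ]

    def exists(self, path, **kwargs):
        return tf.io.gfile.exists(path)

    def isdir(self, path):
        return tf.io.gfile.isdir(path)

    def _open(self, path, mode="rb", **kwargs):
        # Return TF's file object directly; fsspec callers treat it as
        # a file-like handle.
        return tf.io.gfile.GFile(path, mode)


# Hypothetical registration for a scheme TF supports but the local fsspec
# installation does not:
# fsspec.register_implementation("gs", TFGFileFileSystem)
```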
Title changed from "Streamline and improve TensorBoard's Python filesystem abstraction" to "Improve and formalize TensorBoard's Python filesystem abstraction".

nfelt commented on Jan 7, 2022
Some updates: we've now added fsspec filesystem support via #5248, but it's still optional (we don't depend on fsspec, so users have to install it themselves), and it's also only accessible when `tensorflow` is not installed, since the current `tensorboard.compat.tf` logic ensures we always use TF's filesystem API when it's available.

Also, to clarify the nature of the abstraction discussed in this issue: it isn't really a filesystem per se, or at least not in the usual sense. We actually want it to represent a one-level-higher abstraction, a "filesystem proxy" that can resolve scheme-prefixed paths to multiple concrete filesystems. (That said, the ambiguity in terminology arises because it is sort of tempting to shoehorn this proxy object itself into a filesystem interface - like TB does with `DataProvider` and `DispatchingDataProvider`, more or less - but just like `DispatchingDataProvider`, this abstraction can get kind of leaky/unwieldy. And in practice with fsspec, it has already led to confusion around how things like globbing interact with schemes and prefixes.)
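To make the "filesystem proxy" idea concrete, here's a toy sketch (all names hypothetical) of an object that resolves scheme-prefixed paths to concrete filesystem implementations:

```python
# Purely illustrative: a proxy one level above a filesystem that maps
# scheme-prefixed paths to concrete filesystem objects.
import urllib.parse


class FilesystemProxy:
    def __init__(self, filesystems):
        # e.g. {"": local_fs, "s3": s3_fs}; keys are URL schemes.
        self._filesystems = filesystems

    def _resolve(self, path):
        scheme = urllib.parse.urlparse(path).scheme
        try:
            return self._filesystems[scheme]
        except KeyError:
            raise ValueError("No filesystem registered for scheme %r" % scheme)

    def exists(self, path):
        return self._resolve(path).exists(path)

    def glob(self, pattern):
        # Globbing is where this gets leaky: the proxy has to decide whether
        # the pattern and the results carry the scheme prefix, which is
        # exactly the kind of confusion mentioned above.
        return self._resolve(pattern).glob(pattern)
```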
Furthermore, we actually have multiple "filesystem proxy" implementations in TensorBoard today:

1. TF's `tf.io.gfile` (the `tf.io.gfile.get_registered_schemes()` API was added so it's possible to dynamically find out if a scheme is supported)
2. fsspec (we can call `fsspec.get_filesystem_class()` to find out if a specific scheme is supported; some schemes require additional imports/dependencies, including S3, which requires `s3fs`)
3. our built-in stub in `tensorboard/compat/tensorflow_stub/io/gfile.py` (local disk and S3 only)

Right now we have fallback from 2 to 3, but the choice of 1 vs 2&3 is mutually exclusive. We should fix this so TF users aren't artificially limited to only the TF filesystems and can also benefit from the fsspec support. The end state we want is for individual I/O calls to be dispatched across all possible proxies. This could be thought of as a 4th filesystem proxy in its own right, the "automatic" one, that supports the union of all the concrete filesystems/schemes supported by TF, fsspec, and our built-in support. If users don't actually want automatic dispatch for some reason, we can provide a flag to specify that only a single proxy should be used (e.g. just TF).
For the automatic dispatch we'd basically have a flow roughly like this pseudocode:
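(Illustrative reconstruction only: the helper name and exact ordering below are assumptions, built on the `tf.io.gfile.get_registered_schemes()` and `fsspec.get_filesystem_class()` checks mentioned above, not TensorBoard's actual code.)

```python
import importlib.util
import urllib.parse


def _scheme(path):
    # "" for plain local paths like "/tmp/logs", "s3" for "s3://bucket/x", etc.
    return urllib.parse.urlparse(path).scheme


def pick_filesystem_proxy(path):
    """Hypothetical dispatcher: decide which proxy should handle `path`."""
    scheme = _scheme(path)

    # 1. TF's filesystem API, if TensorFlow is installed and knows the scheme
    #    (local paths are assumed to always be supported).
    if importlib.util.find_spec("tensorflow") is not None:
        import tensorflow as tf

        if not scheme or scheme in tf.io.gfile.get_registered_schemes():
            return "tf"

    # 2. fsspec, if installed and it can produce a filesystem class for the
    #    scheme (this may need extra packages, e.g. s3fs for "s3").
    if importlib.util.find_spec("fsspec") is not None:
        import fsspec

        try:
            fsspec.get_filesystem_class(scheme or "file")
            return "fsspec"
        except (ImportError, ValueError):
            pass

    # 3. The built-in stub, which only handles local disk and S3.
    if scheme in ("", "s3"):
        return "stub"

    raise ValueError("No available filesystem supports scheme %r" % scheme)
```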
vadimkantorov commented on May 11, 2025
Am I right that now that #5248 is merged, we can use `tensorboard --logdir ssh://...`, making use of this fsspec plugin?

I tried it out. Seems to be working!