1.1.0 -- Iterate!
💫 Enhancements and new features
-
A new paradigm for subprocess execution is introduced. The main
workhorse is datalad_next.runners.iter_subproc. This is a
context manager that feeds input to subprocesses via iterables,
and also exposes their output as an iterable. The implementation
is based on https://github.com/uktrade/iterable-subprocess, and
a copy of it is now included in the sources. It has been modified
to work homogeneously on the Windows platform too.
This new implementation is leaner and more performant. Benchmarks
suggest that the execution of multi-step pipe connections of Git
and git-annex commands is within 5% of the runtime of their direct
shell-execution equivalent (outside Python).
See #538 (by @mih),
#547 (by @mih). With this change, a number of additional features have been added,
and internal improvements have been made. For example, any
use of ThreadedRunner has been discontinued. See
#539 (by @christian-monch),
#545 (by @christian-monch),
#550 (by @christian-monch),
#573 (by @christian-monch) -
A new itertools module was added. It provides implementations
of iterators that can be used in conjunction with iter_subproc
for standard tasks. This includes the itemization of output
(e.g., line-by-line) across chunks of bytes read from a process
(itemize), output decoding (decode_bytes), JSON-loading
(json_load), and helpers to construct more complex data flows
(route_out, route_in). -
The more_itertools package has been added as a new dependency.
It is used for datalad-next iterator implementations, but is also
ideal for client code that employs this new functionality. -
A new iter_annexworktree() provides the analog of iter_gitworktree()
for git-annex repositories. -
iter_gitworktree() has been reimplemented around iter_subproc. The
performance is substantially improved. -
iter_gitworktree() now also provides file pointers to
symlinked content. Fixes #553
via #555 (by @mih) -
iter_gitworktree() and iter_annexworktree() now support
single-directory (i.e., non-recursive) reporting too.
See #552 -
A new iter_gittree() wraps git ls-tree for iterating over
the content of a Git tree-ish.
#580 (by @mih). -
A new iter_gitdiff() wraps git diff-tree|files and provides a flexible
basis for iteration over changesets.
-
PathBasedItem, a dataclass that is the basis for many item types yielded
by iterators, now more strictly separates the name property from path semantics.
The name is a plain string, and an additional, explicit path property
provides it in the form of a Path. This simplifies code (the
_ZipFileDirPath utility class became obsolete and was removed), and
improves performance.
Fixes #554 and #581 via #583 (by @mih) -
A collection of helpers for running Git commands has been added at
datalad_next.runners.git. Direct uses of datalad-core runners,
or subprocess.run() for this purpose have been replaced with calls
to these utilities.
#585 (by @mih) -
The performance of iter_gitworktree() has been improved by about
10%. Fixes #540
via #544 (by @mih). -
New EnsureHashAlgorithm constraint to automatically expose
and verify algorithm labels from hashlib.algorithms_guaranteed.
Fixes #346 via
#492 (by @mslw @adswa) -
The archivist remote now supports archive type detection
from *E-type annex keys for .tgz archives too.
Fixes #517 via
#518 (by @mih) -
iter_zip() uses a dedicated, internal PurePath variant to report on
directories (_ZipFileDirPath). This enables more straightforward
item.name in zip_archive tests, which require a trailing / for
directory-type archive members.
#430 (by @christian-monch) -
A new ZipArchiveOperations class adds support for ZIP files, and enables
their use together with the archivist git-annex special remote.
#578 (by @christian-monch) -
datalad ls-file-collection has learned additional collection types: -
The new zipfile collection type that enables uniform reporting on
the additional archive type. -
The new annexworktree collection that enhances the gitworktree
collection by also reporting on annexed content, using the new
iter_annexworktree() implementation. It is about 15% faster than a
datalad --annex basic --untracked no -e no -t eval. -
The new gittree collection for listing any Git tree-ish. -
A new iter_gitstatus() can replace the functionality of
GitRepo.diffstatus() with a substantially faster implementation.
It also provides a novel mono recursion mode that completely
hides the notion of submodules and presents deeply nested
hierarchies of datasets as a single "monorepo".
#592 (by @mih)
-
A new next-status command provides a substantially faster
alternative to the datalad-core status command. It is closely
aligned to git status semantics, only reports changes (not repository
listings), and supports type change detection. Moreover, it exposes
the "monorepo" recursion mode, and single-directory reporting options
of iter_gitstatus(). It is the first command to use dataclass
instances as result types, rather than the traditional dictionaries. -
SshUrlOperations
now supports non-standard SSH ports, non-default
user names, and custom identity file specifications.
Fixed #571 via
#570 (by @mih) -
A new EnsureRemoteName constraint improves the parameter validation
of create-sibling-webdav. Moreover, the command has been uplifted
to support uniform parameter validation also for the Python API.
Missing required remotes, or naming conflicts are now detected and
reported immediately before the actual command implementation runs.
Fixes #193 via
#577 (by @mih) -
datalad_next.repo_utils provides a collection of implementations
for common operations on Git repositories. Unlike the datalad-core
Repo classes, these implementations do not require a specific
data structure or object type beyond a Path.
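The itemization, decoding, and JSON-loading steps listed for the new itertools module can be pictured with a small, standard-library-only sketch. The function names below (itemize_lines, decode_items, load_json_items) are hypothetical stand-ins chosen for illustration; they are not the datalad-next implementations, and their signatures are assumptions.

```python
import json
from typing import Iterable, Iterator


def itemize_lines(chunks: Iterable[bytes], sep: bytes = b"\n") -> Iterator[bytes]:
    """Yield separator-delimited items from arbitrarily sized byte chunks."""
    remainder = b""
    for chunk in chunks:
        parts = (remainder + chunk).split(sep)
        # The last part may be incomplete; hold it back until more data arrives.
        remainder = parts.pop()
        yield from parts
    if remainder:
        # Trailing data that was not terminated by a separator.
        yield remainder


def decode_items(items: Iterable[bytes], encoding: str = "utf-8") -> Iterator[str]:
    """Decode each complete item; decoding after itemizing avoids split characters."""
    for item in items:
        yield item.decode(encoding)


def load_json_items(items: Iterable[str]) -> Iterator[dict]:
    """Parse each item as one JSON document."""
    for item in items:
        yield json.loads(item)


# Simulated process output: two JSON lines, cut at arbitrary chunk boundaries.
chunks = [b'{"path": "a.t', b'xt"}\n{"path"', b': "b.txt"}\n']
records = list(load_json_items(decode_items(itemize_lines(chunks))))
print(records)
```

Each stage is a plain generator, so the stages compose lazily and process one item at a time regardless of how the underlying bytes were chunked.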
🐛 Bug Fixes
-
Add patch to fix update's target detection for adjusted mode datasets
that can crash under some circumstances.
See datalad/datalad#7507, fixed via
#509 (by @mih) -
Comparison with is and a literal was replaced with a proper construct.
While having no functional impact, it removes an ugly SyntaxWarning.
Fixed #526 via
#527 (by @mih)
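For readers unfamiliar with the SyntaxWarning mentioned above, the following sketch shows how recent Python versions flag an identity comparison against a literal at compile time. The two snippets are hypothetical and only illustrate the class of problem; the actual code changed in #527 is not reproduced here.

```python
import warnings

# Hypothetical before/after snippets, compiled from strings so that the
# compiler's diagnostics can be captured rather than printed.
before = "def is_empty(value):\n    return value is ''\n"
after = "def is_empty(value):\n    return value == ''\n"


def syntax_warnings(source: str) -> list:
    """Compile source and return the messages of any SyntaxWarning raised."""
    with warnings.catch_warnings(record=True) as caught:
        warnings.simplefilter("always")
        compile(source, "<example>", "exec")
    return [str(w.message) for w in caught if issubclass(w.category, SyntaxWarning)]


print(syntax_warnings(before))  # at least one warning about "is" with a literal
print(syntax_warnings(after))   # no warnings for the equality comparison
```

An equality comparison expresses the intended check and does not depend on whether the interpreter happens to reuse the same object for equal literals.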
📝 Documentation
- The API documentation has been substantially extended. More already
documented API components are now actually rendered, and more documentation
has been written.
🏠 Internal
-
Type annotations have been extended. The development workflows now inform
about type annotation issues for each proposed change. -
Constants have been migrated to datalad_next.consts.
#575 (by @mih)
🛡 Tests
-
A new test verifies compatibility with HTTP servers that do not report
download progress.
#369 (by @christian-monch) -
The overall noise-level in the test battery output has been reduced
substantially. INFO log messages are no longer shown, and command result
rendering is largely suppressed. New test fixtures make it easier
to maintain tidier output: reduce_logging, no_result_rendering.
The contribution guide has been adjusted to encourage their use. -
Tests that require an unprivileged system account to run are now skipped
when executed as root. This fixes an issue of the Debian package.
#593 (by @adswa)
Full Changelog: 1.0.2...1.1.0