Description
I've been playing around with the CompressedList subclasses for representing some complex data types and I've really come to like them. I've been thinking of ways to make them more generally usable by both end-users and other developers, and I've got a few wish-list items:
Access to unlistData and partitioning. End-users would then be able to execute arbitrary unary operations on the underlying data while preserving the partitioning, like:
A <- DataFrame(X=LETTERS, Y=runif(26))
comp.list <- split(A,A$X)
# Attempt fails, for obvious reasons.
comp.list$Y <- log(comp.list$Y)
# Assuming we had an unlistData() getter and a matching setter:
unlistData(comp.list)$Y <- log(unlistData(comp.list)$Y)
unlistData<- could even be unlist<-, if one were willing to introduce that concept. I don't mind if partitioning is getter-only; this would still be very useful for downstream functions that need to be list-aware yet, for efficiency, don't want to create an intermediate list.
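To make the request concrete, here is a minimal sketch of what such accessors could look like. The names unlistData(), partitioning() and unlistData<- are hypothetical and defined here as plain functions, not as part of IRanges; the sketch assumes only that a CompressedList keeps its data in the unlistData and partitioning slots.

```r
library(IRanges)  # also attaches S4Vectors, which provides DataFrame

# Hypothetical getters (not part of IRanges): thin wrappers around the slots.
unlistData <- function(x) x@unlistData
partitioning <- function(x) x@partitioning

# Hypothetical setter. The partitioning is left untouched, so the new value
# must have as many elements (or rows) as the data it replaces.
`unlistData<-` <- function(x, value) {
    stopifnot(is(x, "CompressedList"),
              NROW(value) == NROW(x@unlistData))
    x@unlistData <- value
    x
}

A <- DataFrame(X=LETTERS, Y=runif(26))
comp.list <- split(A, A$X)

# Transform one column of the underlying data; the grouping is preserved.
unlistData(comp.list)$Y <- log(unlistData(comp.list)$Y)
```

A proper implementation would presumably use S4 generics and validate the result, which this sketch skips for brevity.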
Non-virtual CompressedList class. I don't understand the motivation for making CompressedList virtual. From a representation perspective, a general concrete class would be useful if we could store any vector-like entity in unlistData. In fact, I ran into the case where I wanted to store a CompressedCharacterList as unlistData, effectively making a CompressedCompressedCharacterListList! I don't expect to be able to call many methods on this thing (other than the proposed unlistData and partitioning, and maybe unlist); I just want to use it for storage without needing to write an explicit subclass. A general CompressedList class would serve this purpose, and is better than the alternative of falling back to a SimpleList (which takes a noticeable time to generate).
A more careful unlist. If we do allow a general CompressedList class, the unlist method should probably take heed of recursive=TRUE and apply unlist on the unlistData slot.
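As a rough illustration of the recursive behaviour, the following sketch keeps unlisting for as long as the result is itself a CompressedList. The helper name flattenCompressed() is hypothetical, and since a general nested CompressedList does not exist yet, the example can only exercise a single level of nesting.

```r
library(IRanges)

# Hypothetical helper: strip every CompressedList layer, innermost data last.
flattenCompressed <- function(x) {
    while (is(x, "CompressedList")) {
        # use.names=FALSE avoids building names for the flattened result.
        x <- unlist(x, use.names=FALSE)
    }
    x
}

# One level of nesting: the result is an ordinary character vector.
chars <- CharacterList(a=c("x", "y"), b="z")
flat <- flattenCompressed(chars)
```

With recursive=FALSE, the same loop body would run exactly once and return the unlistData contents as they are.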
I'm happy to chip in with a PR if these sound like good ideas.