Making the CompressedList more widely usable #11

Open
@LTLA

I've been playing around with the CompressedList subclasses for representing some complex data types and I've really come to like them. I've been thinking of ways to make them more generally usable by both end-users and other developers, and I've got a few wish-list items:

Access to unlistData and partitioning. End-users would then be able to execute arbitrary unary operations on the underlying data while preserving the partitioning, like:

library(IRanges) # also attaches S4Vectors, which provides DataFrame()

A <- DataFrame(X=LETTERS, Y=runif(26))
comp.list <- split(A, A$X)

# Attempt fails, for obvious reasons.
comp.list$Y <- log(comp.list$Y)

# Assuming we had an unlistData() method:
unlistData(comp.list)$Y <- log(unlistData(comp.list)$Y)

unlistData<- could even be unlist<-, if one were willing to introduce that concept. I don't mind if partitioning is getter-only; this would still be very useful for downstream functions that need to be list-aware yet don't want to create an intermediate list for efficiency purposes.
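To make the request concrete, here is a minimal sketch of how the proposed accessors could behave, written as plain functions on top of calls that already exist (unlist(), relist() and PartitioningByEnd()). The names unlistData, unlistData<- and partitioning are the hypothetical ones from this proposal, not existing IRanges functions, and the sketch assumes that relist() can rebuild the list from a DataFrame and the original partitioning.

```r
library(IRanges)  # also attaches S4Vectors, which provides DataFrame()

# Hypothetical getter: the flattened data, without element names.
unlistData <- function(x) unlist(x, use.names=FALSE)

# Hypothetical setter: re-wrap the modified data with the original partitioning.
# Assumption: 'value' has the same length (or number of rows) as unlistData(x).
`unlistData<-` <- function(x, value) relist(value, x)

# Hypothetical getter-only accessor for the partitioning.
partitioning <- function(x) PartitioningByEnd(x)

set.seed(1)
A <- DataFrame(X=LETTERS, Y=runif(26))
orig.y <- A$Y
comp.list <- split(A, A$X)

# The unary operation from the example above, now preserving the partitioning.
unlistData(comp.list)$Y <- log(unlistData(comp.list)$Y)

# Downstream code can inspect the partitioning without building a regular list.
width(partitioning(comp.list))
```

Going through relist() may drop list-level annotations such as mcols, so a real method would probably want to replace the slot directly instead.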

Non-virtual CompressedList class. I don't understand the motivation for making CompressedList virtual. From a representation perspective, a general concrete class would be useful if we could store any vector-like entity in unlistData. In fact, I ran into the case where I wanted to store a CompressedCharacterList as unlistData, effectively making a CompressedCompressedCharacterListList! I don't expect to be able to call many methods on this thing (other than the proposed unlistData and partitioning, and maybe unlist); I just want to use it for storage without needing to write an explicit subclass. A general CompressedList class would serve this purpose, and is better than the alternative of falling back to a SimpleList (which takes noticeably long to construct).

A more careful unlist. If we do allow a general CompressedList class, the unlist method should probably take heed of recursive=TRUE and apply unlist on the unlistData slot.
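As a rough illustration of the nested case and of what recursive=TRUE could mean, the sketch below keeps the two levels side by side: a CompressedCharacterList as the stored data, and a separate outer partitioning. unlistNested is a hypothetical helper invented for this example, not an IRanges function; a real implementation would live in the unlist method of the general class.

```r
library(IRanges)

# Inner layer: an ordinary CompressedCharacterList with four elements.
inner <- CharacterList(a=c("x", "y"), b="z", c=character(0), d=c("u", "v"))

# Outer layer: a partitioning that groups the four inner elements into two.
outer <- PartitioningByEnd(c(2L, 4L), names=c("g1", "g2"))

# The outer partitioning must cover every element of the stored data.
stopifnot(sum(width(outer)) == length(inner))

# Hypothetical helper: return the stored data of the outer list, descending
# one more level only when recursive=TRUE and the data is itself a List.
unlistNested <- function(flesh, recursive=TRUE) {
    if (recursive && is(flesh, "List")) {
        unlist(flesh, use.names=FALSE)
    } else {
        flesh
    }
}

shallow <- unlistNested(inner, recursive=FALSE)  # still a list of 4 elements
deep <- unlistNested(inner, recursive=TRUE)      # plain character vector
```

With recursive=FALSE the result is the stored CompressedCharacterList itself; with recursive=TRUE it is flattened one level further.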

I'm happy to chip in with a PR if these sound like good ideas.
