Allowing extract_array calls to use pre-indexed grid information? #91

Open
@LTLA

The recent conversation in theislab/zellkonverter#34 reminded me of some work I did in tatami-inc/beachmat#20. Briefly, the idea was to speed up row-based block processing of a dgCMatrix by performing a single pass over the non-zero elements beforehand to identify the start and end of each row block in each column. This avoids the costly per-column binary searches that are otherwise needed every time a row block is extracted in the usual way, and gives a ~10-fold speed-up in row-based processing of dgCMatrix objects.
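
To make the idea concrete, here is a minimal Python sketch of the pre-indexing step on a toy compressed sparse column layout. It is an illustration only, not the beachmat code: the function names `index_row_blocks` and `extract_row_block` are hypothetical, and the lists `p`, `i` and `x` simply mirror the column-pointer, row-index and value slots of a dgCMatrix, with row indices assumed to be sorted within each column.

```python
def index_row_blocks(p, i, block_starts):
    """Single pass over the non-zeros of a CSC matrix.

    p            : column pointers (length ncol + 1)
    i            : row indices, sorted within each column
    block_starts : ascending row-block boundaries, ending with nrow

    Returns bounds, where bounds[b][j] is the offset into i (and x) of the
    first non-zero in column j whose row is >= block_starts[b].
    """
    ncol = len(p) - 1
    bounds = [[0] * ncol for _ in block_starts]
    for j in range(ncol):
        k, end = p[j], p[j + 1]
        for b, start in enumerate(block_starts):
            # Advance the cursor; it never moves backwards, so each
            # non-zero element is visited at most once per column.
            while k < end and i[k] < start:
                k += 1
            bounds[b][j] = k
    return bounds


def extract_row_block(p, i, x, bounds, b):
    """Collect the non-zeros of row block b as (row, col, value) triplets,
    using the precomputed offsets instead of per-column binary searches."""
    out = []
    for j in range(len(p) - 1):
        for k in range(bounds[b][j], bounds[b + 1][j]):
            out.append((i[k], j, x[k]))
    return out


# Toy 4 x 3 matrix with non-zeros at (0,0), (3,0), (1,1), (0,2), (2,2), (3,2).
p = [0, 2, 3, 6]
i = [0, 3, 1, 0, 2, 3]
x = [1, 2, 3, 4, 5, 6]

# Two row blocks: rows [0, 2) and rows [2, 4).
bounds = index_row_blocks(p, i, [0, 2, 4])
first_block = extract_row_block(p, i, x, bounds, 0)
second_block = extract_row_block(p, i, x, bounds, 1)
```

The index costs one integer per column per block boundary, so the trade-off is extra memory in exchange for skipping the searches on every block extraction.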

Now I'm wondering whether this approach can be generalized somehow so that other DelayedArray backends can benefit. Perhaps functions like rowAutoGrid() can decorate the grid object with extra information that allows extract_array to efficiently obtain the necessary bits and pieces, if a suitable object like a dgCMatrix is passed?
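
One way to picture the generalization is sketched below in Python, purely as a thought experiment. Everything here is hypothetical: `RowGrid`, `CscMatrix`, `decorate` and `extract_block` are made-up stand-ins for the grid object, a dgCMatrix, a decorating function along the lines of rowAutoGrid(), and extract_array respectively. None of them correspond to existing DelayedArray functions. The point is only that the extractor can use the extra information when the grid carries it and fall back to per-column binary searches when it does not.

```python
from bisect import bisect_left
from dataclasses import dataclass, field


@dataclass
class RowGrid:
    """Hypothetical grid: row-block boundaries plus optional extras."""
    breaks: list  # e.g. [0, 2, 4] means blocks [0, 2) and [2, 4)
    extras: dict = field(default_factory=dict)


@dataclass
class CscMatrix:
    """Toy stand-in for a dgCMatrix: column pointers, row indices, values."""
    p: list
    i: list
    x: list


def decorate(grid, mat):
    """Attach per-column offsets for every boundary, found in one pass."""
    offsets = []
    for j in range(len(mat.p) - 1):
        k, end, col = mat.p[j], mat.p[j + 1], []
        for brk in grid.breaks:
            while k < end and mat.i[k] < brk:
                k += 1
            col.append(k)
        offsets.append(col)
    grid.extras["csc_offsets"] = offsets
    return grid


def extract_block(mat, grid, b):
    """Return (row, col, value) triplets for row block b."""
    offsets = grid.extras.get("csc_offsets")
    out = []
    for j in range(len(mat.p) - 1):
        if offsets is not None:
            # Fast path: the grid already knows where the block sits.
            lo, hi = offsets[j][b], offsets[j][b + 1]
        else:
            # Fallback: two binary searches within this column.
            lo = bisect_left(mat.i, grid.breaks[b], mat.p[j], mat.p[j + 1])
            hi = bisect_left(mat.i, grid.breaks[b + 1], mat.p[j], mat.p[j + 1])
        out.extend((mat.i[k], j, mat.x[k]) for k in range(lo, hi))
    return out


mat = CscMatrix(p=[0, 2, 3, 6], i=[0, 3, 1, 0, 2, 3], x=[1, 2, 3, 4, 5, 6])
plain_grid = RowGrid(breaks=[0, 2, 4])
indexed_grid = decorate(RowGrid(breaks=[0, 2, 4]), mat)
```

Because an undecorated grid still works, backends that cannot supply an index would be unaffected under this design.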

Happy to give this - or other ideas - a crack with a PR if there is some interest.