How to implement a realization backend #37

Open
hpages opened this issue Oct 2, 2018 · 0 comments

hpages commented Oct 2, 2018

The "Implementing A DelayedArray Backend" vignette only covers how to implement a read-only backend. Backends that support saving DelayedArray objects (a.k.a. realization backends) are not covered yet, for various reasons: no demand so far, a procedure that is still a work in progress and subject to change, lack of time, etc.

In the meantime, I'm putting some material here (and will move it to the Implementing A DelayedArray Backend vignette as time allows).

Say we want to implement a realization backend for the ADS format (the imaginary format made up for the Implementing A DelayedArray Backend vignette). The 2 core things we need to implement are:

  1. A RealizationSink subclass for the ADS backend. RealizationSink is a virtual class defined in the DelayedArray package (in R/RealizationSink-class.R). By analogy with the HDF5RealizationSink class defined in the HDF5Array package (in R/writeHDF5Array.R), let's assume that the RealizationSink subclass for the ADS backend will be called ADSRealizationSink.

  2. Coercion methods from ADSRealizationSink to ADSArraySeed, ADSArray, and DelayedArray.

RealizationSink subclass

The purpose of an ADSRealizationSink object is to point to a new ADS dataset and allow writing blocks of data to it. The class definition for ADSRealizationSink would typically look something like:

setClass("ADSRealizationSink",
    contains="RealizationSink",
    representation(
        dim="integer",     # Naming this slot "dim" makes dim() work out of the box.
        dimnames="list",
        type="character",  # Single string.
        ## Additional slots would typically include the path or connection to a file....
        ...
    )
)

Then we need a constructor function for these objects. The constructor should have the same name as the class, and its first 3 arguments should be dim, dimnames, and type. It can have more arguments, but those must be optional: calling ADSRealizationSink() with the first 3 arguments only (i.e. ADSRealizationSink(dim, dimnames, type)) should work. Furthermore, every call to ADSRealizationSink() should create a new dataset that is ready to be written to.
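As a rough illustration, a constructor following these rules might look like the sketch below. The filepath slot, its tempfile() default, and the use of saveRDS() as a stand-in for "create a new ADS dataset" are all assumptions made for the example, since ADS is an imaginary format.

```r
library(DelayedArray)

setClass("ADSRealizationSink",
    contains="RealizationSink",
    representation(
        dim="integer",
        dimnames="list",
        type="character",
        filepath="character"  # hypothetical slot: where the dataset lives
    )
)

## dim, dimnames, and type come first; filepath is optional so that
## ADSRealizationSink(dim, dimnames, type) works on its own.
ADSRealizationSink <- function(dim, dimnames=NULL, type="double",
                               filepath=tempfile(fileext=".ads"))
{
    dim <- as.integer(dim)
    if (is.null(dimnames))
        dimnames <- vector("list", length(dim))
    stopifnot(is.list(dimnames), length(dimnames) == length(dim),
              is.character(type), length(type) == 1L)
    ## Stand-in for "create a new, empty ADS dataset": an array of the
    ## requested type and dimensions, saved to a fresh file.
    empty <- array(vector(type, prod(dim)), dim=dim)
    saveRDS(empty, filepath)
    new("ADSRealizationSink", dim=dim, dimnames=dimnames, type=type,
                              filepath=filepath)
}

sink <- ADSRealizationSink(c(3L, 4L), NULL, "integer")
dim(sink)                   # works thanks to the "dim" slot
file.exists(sink@filepath)  # the new dataset exists and is ready for writing
```

Because filepath defaults to a fresh temporary file, two successive calls never point to the same dataset, which is one simple way to satisfy the "every call creates a new dataset" requirement.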

ADSRealizationSink objects must support the following operations (via defining appropriate methods):

  1. dim(), dimnames(), and type(). These should return the values that were passed to the call to ADSRealizationSink() that was used to create the object.
  2. write_block(). This is a generic defined in the DelayedArray package in R/read_block.R.
  3. close(). This base R S3 generic is promoted to S4 generic in the DelayedArray package in R/RealizationSink-class.R. A default method for RealizationSink objects is provided and does nothing (no-op). Implement a method for ADSRealizationSink objects only if some connection needs to be closed and/or other resources need to be released after writing the data is complete and before the ADSRealizationSink object can be turned into an ADSArraySeed object for reading.
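For the accessors, a minimal sketch could look like the following. The filepath slot is hypothetical. write_block() is deliberately left out: its body depends entirely on how the ADS format stores data, and the formal arguments of the generic are best checked in the installed DelayedArray version (for example with args(write_block)) before writing the method.

```r
library(DelayedArray)

setClass("ADSRealizationSink",
    contains="RealizationSink",
    representation(dim="integer", dimnames="list", type="character",
                   filepath="character")  # filepath is a hypothetical slot
)

## dim() needs no method: it picks up the "dim" slot directly.

## type() just returns what the sink was created with.
setMethod("type", "ADSRealizationSink", function(x) x@type)

## dimnames() returns NULL when no dimension has names, like an ordinary array.
setMethod("dimnames", "ADSRealizationSink",
    function(x)
    {
        dn <- x@dimnames
        if (all(vapply(dn, is.null, logical(1)))) NULL else dn
    }
)

sink <- new("ADSRealizationSink", dim=c(3L, 4L), dimnames=list(NULL, NULL),
            type="integer", filepath=tempfile(fileext=".ads"))
type(sink)
dimnames(sink)
```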

Coercion methods from ADSRealizationSink to ADSArraySeed, ADSArray, and DelayedArray

From ADSRealizationSink to ADSArraySeed

Think of an ADSRealizationSink object as a "write" connection to an ADS dataset, and of an ADSArraySeed object as a "read only" connection to an ADS dataset. The purpose of the coercion from ADSRealizationSink to ADSArraySeed is to change the nature of this connection from "write" to "read only" and to produce an object that can be wrapped into a DelayedArray object.
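No code is given here for this coercion, so the following is only a sketch of what it could look like, assuming that both objects locate the dataset through a hypothetical filepath slot. The ADSArraySeed class below is a bare-bones placeholder for the class described in the vignette, which would normally come with its own constructor.

```r
library(DelayedArray)

## Both classes are reduced to the bare minimum for this sketch.
setClass("ADSRealizationSink",
    contains="RealizationSink",
    representation(dim="integer", dimnames="list", type="character",
                   filepath="character")  # hypothetical slot
)
setClass("ADSArraySeed",
    representation(filepath="character")  # hypothetical slot
)

## The seed points to the same dataset as the sink, but for reading.
setAs("ADSRealizationSink", "ADSArraySeed",
    function(from) new("ADSArraySeed", filepath=from@filepath)
)

sink <- new("ADSRealizationSink", dim=c(3L, 4L), dimnames=list(NULL, NULL),
            type="integer", filepath=tempfile(fileext=".ads"))
seed <- as(sink, "ADSArraySeed")
identical(seed@filepath, sink@filepath)
```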

From ADSRealizationSink to ADSArray and DelayedArray

setAs("ADSRealizationSink", "ADSArray",
    function(from) DelayedArray(as(from, "ADSArraySeed"))
)

setAs("ADSRealizationSink", "DelayedArray", function(from) as(from, "ADSArray"))

A basic example

Once we have the above (i.e. ADSRealizationSink objects, ADSRealizationSink() constructor, and the 3 coercion methods), we can realize an arbitrary DelayedArray object x as a new pristine ADSArray object x2 by using the simple code below:

realize_as_ADSArray <- function(x)
{
    sink <- ADSRealizationSink(dim(x), dimnames(x), type(x))
    DelayedArray:::BLOCK_write_to_sink(x, sink) 
    close(sink)
    as(sink, "DelayedArray")  # a pristine ADSArray object semantically equivalent to `x`
}

DelayedArray:::BLOCK_write_to_sink() reads blocks from x, realizes them in memory, and writes them to sink (with write_block()).
Note that realize_as_ADSArray() also works on an ordinary array or any array-like object that supports extract_array(), not just on a DelayedArray object.
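To make the block-by-block idea concrete, here is a toy version of that loop in base R. It is not how DelayedArray:::BLOCK_write_to_sink() is implemented: the function name, the environment used as a sink, and the fixed-width column blocks are all made up for the illustration.

```r
## Conceptual sketch only. Copies matrix `x` into `sink_env$data`
## one block of columns at a time.
toy_write_to_sink <- function(x, sink_env, block_ncol=2L)
{
    if (ncol(x) == 0L)
        return(invisible(sink_env))
    starts <- seq(1L, ncol(x), by=block_ncol)
    for (start in starts) {
        cols <- start:min(start + block_ncol - 1L, ncol(x))
        block <- x[ , cols, drop=FALSE]   # "read" one block into memory
        sink_env$data[ , cols] <- block   # "write" it at the same position
    }
    invisible(sink_env)
}

x <- matrix(1:12, nrow=3)
sink_env <- new.env()
sink_env$data <- matrix(NA_integer_, nrow=nrow(x), ncol=ncol(x))
toy_write_to_sink(x, sink_env)
stopifnot(identical(sink_env$data, x))
```

Only one block is held in memory at any time, which is what makes this approach usable on datasets that are too big to be loaded in full.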

Add some convenience

Now we can build some convenience on top of this.

One basic convenience is a coercion method from ANY to ADSArray that just does what realize_as_ADSArray() does:

setAs("ANY", "ADSArray", function(from) realize_as_ADSArray(from))

Unfortunately, this method is not enough on its own: without the following additional coercion methods, coercing a DelayedArray or DelayedMatrix object to ADSArray would not go through the ANY method and would produce a broken object:

setAs("DelayedArray", "ADSArray", function(from) realize_as_ADSArray(from))
setAs("DelayedMatrix", "ADSArray", function(from) realize_as_ADSArray(from))

So in the same way that an array-like object x (ordinary array or DelayedArray object) can be realized as an HDF5Array or RleArray object with as(x, "HDF5Array") or as(x, "RleArray"), now it can also be realized as an ADSArray object with as(x, "ADSArray").
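For instance, with the RleArray backend that ships with DelayedArray (the ADSArray line is commented out because that backend is imaginary):

```r
library(DelayedArray)

m <- matrix(rep(1:3, each=4), nrow=4)
x2 <- as(m, "RleArray")   # realize an ordinary matrix as an RleArray object
is(x2, "DelayedArray")
dim(x2)

## With the ADS backend described in this issue, the same pattern would be:
## x3 <- as(m, "ADSArray")
```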

Real examples of realization backends

Refer to R/writeHDF5Array.R and R/writeTENxMatrix.R in the HDF5Array package for the implementation of the HDF5Array and TENxMatrix realization backends.

Note that you can use supportedRealizationBackends() to see the list of realization backends currently supported.
