adds `read_npz` and `write_npz`, convenience wrappers #46

jonathanstrong · 2021-03-04T18:32:17Z

these work like read_npy and write_npy but write compressed .npz files instead. I wanted this functionality for writing ephemeral array files to be able to check something later if needed without taking up too much disk space.

compared to read_npy/write_npy, there is one major difference: since an npz file can contain multiple named arrays/files, write_npz picks a default name for the single array it writes, while read_npz lets the user specify the name to extract. this may not be the best choice, but it seemed less than ideal not to permit specifying the name in read_npz, and I wanted write_npz to remain as simple as possible.

I picked the default name for write_npz based on what numpy does in savez_compressed ("arr_0.npy"). however, I think there is a divergence there. using np.load, you will get a dict-like object that allows you to access the arrays without the .npy extension (i.e. at key arr_0). however, using NpzReader, you need to use the full arr_0.npy name to retrieve the same array. just wanted to flag as this tripped me up a bit.
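The naming divergence described above can be reproduced from the NumPy side alone. The following is a minimal sketch, assuming NumPy is installed; the file name `example.npz` is arbitrary. It saves one unnamed array with `savez_compressed`, then compares the keys that `np.load` exposes with the member names actually stored in the zip archive.

```python
import os
import tempfile
import zipfile

import numpy as np

arr = np.array([[1, 2, 3], [4, 5, 6]])

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "example.npz")  # arbitrary file name for illustration
    np.savez_compressed(path, arr)  # an unnamed array gets the default name

    # Keys as NumPy presents them (without the ".npy" suffix).
    with np.load(path) as npz:
        numpy_keys = list(npz.files)
        loaded = npz["arr_0"]

    # Member names as stored in the underlying zip archive.
    with zipfile.ZipFile(path) as zf:
        zip_members = zf.namelist()

print(numpy_keys)   # ['arr_0']
print(zip_members)  # ['arr_0.npy']
```

The archive member is named `arr_0.npy`, which is the name a reader working directly on zip entries would need, while NumPy strips the extension when presenting keys.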

thanks for your consideration of this pull request.

… and `NpzReader` similar to `read_npy` and `write_npy`

jturner314 · 2021-03-05T03:55:14Z

> using np.load, you will get a dict-like object that allows you to access the arrays without the .npy extension (i.e. at key arr_0). however, using NpzReader, you need to use the full arr_0.npy name to retrieve the same array. just wanted to flag as this tripped me up a bit.

Thanks for pointing this out. I've created #48 to track this issue.

Thanks also for the PR. There are a few things about the proposed API which are unsatisfying to me:

- The inconsistency regarding the name parameter in read_npz/write_npz. I'd prefer for either both to accept a name parameter or neither to accept a name. I think it would be better for both to accept a name.
- The names read_npz/write_npz would IMO be more appropriate for functions which read/write general .npz files, rather than functions which read/write .npz files containing only a single array. For these functions, I'd prefer names more like read_npz_array/write_npz_array_compressed.
- It would be simpler for the name parameter to have type &str rather than N: Into<String>.
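To make the symmetry in the first two points concrete, here is a rough sketch of what a pair of single-array helpers looks like when both take a name. It is written against NumPy purely for illustration, not against ndarray-npy; the function names mirror the ones suggested above, and the file name and array name are hypothetical.

```python
import os
import tempfile

import numpy as np


def write_npz_array_compressed(path, name, array):
    """Write one array into a compressed .npz archive under `name`."""
    # Passing the array as a keyword argument stores it under that name.
    np.savez_compressed(path, **{name: array})


def read_npz_array(path, name):
    """Read the array stored under `name` from a .npz archive."""
    with np.load(path) as npz:
        return npz[name]


original = np.arange(6, dtype=np.int32).reshape(2, 3)
with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "single.npz")  # hypothetical file name
    write_npz_array_compressed(path, "data", original)
    restored = read_npz_array(path, "data")

assert np.array_equal(original, restored)
```

Because the caller supplies the same name on both sides, neither function has to guess a default.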

Creating a .npz archive for a single array seems somewhat awkward. I wonder if you'd be happier using a single-file compression format (such as .gz, .xz, .bz2, or .zst) applied to a .npy file instead of using a .zip/.npz archive. This would avoid the problem of choosing a name for the array in the archive and would avoid the complexity of the .zip format. For example, to write/read a .npy.gz file using ndarray-npy, you could do this:

use flate2::{bufread::GzDecoder, write::GzEncoder, Compression};
use ndarray::{array, Array2};
use ndarray_npy::{ReadNpyError, ReadNpyExt, WriteNpyError, WriteNpyExt};
use std::fs::File;
use std::io::{BufReader, BufWriter, Write};
use std::path::Path;

fn write_npy_gz<P, T>(path: P, array: &T) -> Result<(), WriteNpyError>
where
    P: AsRef<Path>,
    T: WriteNpyExt,
{
    // Note: I'm not sure if the `BufWriter` actually helps or not.
    let mut writer = GzEncoder::new(BufWriter::new(File::create(path)?), Compression::default());
    array.write_npy(&mut writer)?;
    writer.finish()?.flush()?;
    Ok(())
}

fn read_npy_gz<P, T>(path: P) -> Result<T, ReadNpyError>
where
    P: AsRef<Path>,
    T: ReadNpyExt,
{
    // Note: I'm not sure if the `BufReader` actually helps or not.
    T::read_npy(GzDecoder::new(BufReader::new(File::open(path)?)))
}

fn main() -> Result<(), Box<dyn std::error::Error>> {
    let arr1 = array![[1, 2, 3], [4, 5, 6]];

    // Write the array.
    write_npy_gz("foo.npy.gz", &arr1)?;

    // Read it back.
    let arr2: Array2<i32> = read_npy_gz("foo.npy.gz")?;

    println!("arr1:\n{}", arr1);
    println!("arr2:\n{}", arr2);
    assert_eq!(arr1, arr2);

    Ok(())
}

To read it with NumPy, you could do this:

import numpy as np
import gzip

def load_npy_gz(path):
    with gzip.open(path) as f:
        return np.load(f)

arr = load_npy_gz('foo.npy.gz')
print(arr)

(You could also decompress .npy.gz files at the command line using gunzip.)
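Going the other direction, a .npy.gz file can also be produced from NumPy. This is a minimal sketch, assuming NumPy is installed; `save_npy_gz` is a hypothetical helper name, and the file name is arbitrary. The dtype is set to `int32` explicitly on the assumption that the Rust side reads the file as `Array2<i32>`, since NumPy's default integer type is often 64-bit.

```python
import gzip
import os
import tempfile

import numpy as np


def save_npy_gz(path, arr):
    """Write `arr` as a gzip-compressed .npy file."""
    with gzip.open(path, "wb") as f:
        np.save(f, arr)


def load_npy_gz(path):
    """Read an array back from a gzip-compressed .npy file."""
    with gzip.open(path, "rb") as f:
        return np.load(f)


written = np.array([[1, 2, 3], [4, 5, 6]], dtype=np.int32)
with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "foo.npy.gz")
    save_npy_gz(path, written)
    read_back = load_npy_gz(path)

assert np.array_equal(written, read_back)
```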

adds read_npz and write_npz, convenience wrappers for NpzWriter…

4da2a9c

… and `NpzReader` similar to `read_npy` and `write_npy`

jturner314 mentioned this pull request Mar 5, 2021

Match NumPy behavior regarding ".npy" extension for names in .npz files #48

Open
