Remove mention that 1-bit images use 1 byte per pixel #8777

Open
hchargois wants to merge 1 commit into main


@hchargois hchargois commented Feb 27, 2025

(1-bit pixels, black and white, stored with one pixel per byte)

That line in the documentation implies that a 1-bit (W, H)-sized image would occupy W*H bytes in raw form, at least that's how I understand it.

But it doesn't seem to be the case in Pillow 11.1.0:

>>> from PIL import Image
>>> Image.new("1", (3, 2), 1).tobytes()
b'\xe0\xe0'

It seems that it now stores strips of (up to) 8 horizontal pixels. Since that line in the documentation was last edited 12 years ago, I guess it was true at the time but the format has been changed for efficiency since. Which is great, but the documentation is misleading.

I suggest simply removing the "one pixel per byte" part.
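
For reference, the layout seen in the tobytes() output above can be reproduced in a few lines of plain Python. This is only a sketch of the apparent packing (most significant bit first, each row padded to a whole byte), inferred from the b'\xe0\xe0' result. pack_rows_1bit is a hypothetical helper, not a Pillow function.

```python
def pack_rows_1bit(rows):
    """Pack rows of 0/1 pixels into bytes, MSB first, padding each row to a whole byte.

    Hypothetical helper for illustration only; not part of Pillow.
    """
    out = bytearray()
    for row in rows:
        # Walk the row in strips of up to 8 horizontal pixels.
        for start in range(0, len(row), 8):
            chunk = row[start:start + 8]
            byte = 0
            for bit_index, pixel in enumerate(chunk):
                if pixel:
                    byte |= 0x80 >> bit_index
            out.append(byte)
    return bytes(out)


# A 3x2 image with every pixel set, as in the example above.
packed = pack_rows_1bit([[1, 1, 1], [1, 1, 1]])
print(packed)  # b'\xe0\xe0'
```

Under that assumption a (W, H) image takes ((W + 7) // 8) * H bytes rather than W * H.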

@wiredfool
Member

wiredfool commented Feb 27, 2025

Internally, storage for 1 bit images is 1 byte per pixel:

if (strcmp(mode, "1") == 0) {
    /* 1-bit images */
    im->bands = im->pixelsize = 1;
    im->linesize = xsize;

1 bit images are shuffled at import/export to put them in this format.
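
A rough sketch of what that import-side shuffle amounts to, in plain Python: expand packed bits into one byte per pixel, so each row is xsize bytes long. unpack_rows_1bit is a hypothetical helper, not Pillow's unpacker, and the choice of 255 for a set pixel is an assumption made here.

```python
def unpack_rows_1bit(data, width, height, on_value=255):
    """Expand MSB-first packed 1-bit rows into one byte per pixel.

    Hypothetical helper for illustration only; on_value=255 is an assumption.
    """
    stride = (width + 7) // 8  # packed bytes per row, including padding bits
    rows = []
    for y in range(height):
        line = data[y * stride:(y + 1) * stride]
        rows.append(bytes(
            on_value if line[x // 8] & (0x80 >> (x % 8)) else 0
            for x in range(width)
        ))
    return rows


rows = unpack_rows_1bit(b"\xe0\xe0", 3, 2)
print(rows)  # [b'\xff\xff\xff', b'\xff\xff\xff']
```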

@hchargois
Author

Interesting! I didn't expect tobytes to return something other than how the data is stored internally. The documentation of tobytes does say so:

This method returns the raw image data from the internal storage

That documentation also says to look at _imaging.c so I followed the white rabbit up to Pack.c and found out that even the raw encoder can actually use multiple packing functions, that's super cool! I've seen cases where custom serialization of individual channels or postprocessing of tobytes was done to encode e.g. pixels in BGR format, while tobytes("raw", "BGR") would have been enough and much more efficient. I guess documenting all of these special packers would be daunting; but pointing at Pack.c and giving a couple examples could be very useful.
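
As one such example, here is a small sketch comparing a manual channel swap with the raw packer, assuming an RGB image and the "BGR" raw mode mentioned above. The two-pixel image is made up for illustration.

```python
from PIL import Image

# Tiny made-up RGB image: two pixels with distinct channel values.
im = Image.new("RGB", (2, 1))
im.putpixel((0, 0), (1, 2, 3))
im.putpixel((1, 0), (4, 5, 6))

# Manual approach: split the channels, reorder them, then serialize.
r, g, b = im.split()
manual = Image.merge("RGB", (b, g, r)).tobytes()

# Let the raw encoder pack the pixels as BGR directly.
direct = im.tobytes("raw", "BGR")

print(direct)  # b'\x03\x02\x01\x06\x05\x04'
```

Both approaches give the same bytes here; the second avoids building an intermediate image.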

In any case, I still think the two sentences ("one pixel per byte" and "tobytes returns raw data") are very misleading when taken together. I would still suggest removing the first one. I think it's better to set no expectations than to set false ones.

@wiredfool
Member

Yeah, tobytes has a lot behind it because of the packer infrastructure. "Returns raw image data" is mostly true, at least for most modes in the default raw configuration. It's more true than running image.save into a BytesIO object.

I suspect that the 1 bit thing is just one of those historical choices -- it's a lot easier to do math in 1 pixel per byte than one per bit, especially when you already have that math implemented for other modes. Historically it's tripped up people with the array/numpy integration, and this is effectively the one way you can get 1 bit/pixel out of a mode 1 image.
