Skip to content

Conversation

@thetarnav
Copy link
Contributor

Add extra byte to the allocated read_entire_file buffer so it could be safely cast to a cstring.
Just so you can do cstring(raw_data(file_data)) without copying.

@Kelimion
Copy link
Member

Kelimion commented Sep 2, 2025

It would be very surprising for read_entire_file to return more than the file, and can lead to problems with binary file formats, or cause sanity checks to now fail on the file length or content hash.

It would be better to have a complementary read_entire_file_as_cstring that zero pads and returns as cstring instead of []u8. And if you add two zero bytes, you can then also cheaply cast it to utf-16.

@JesseRMeyer
Copy link

Alternatively, a proc group entry could be added that accepts a user provided buffer to write the file data into, which would be pre-sized to accommodate the null terminated character.

@gingerBill
Copy link
Member

I think this is probably a bad idea, especially to do by default. If anything, we should have a separate procedure as @Kelimion states like read_entire_file_as_cstring or something just have a buffer to read into with that NUL terminator.

@Kelimion
Copy link
Member

Kelimion commented Sep 2, 2025

I think this is probably a bad idea, especially to do by default. If anything, we should have a separate procedure as @Kelimion states like read_entire_file_as_cstring or something just have a buffer to read into with that NUL terminator.

I'm leaning toward the buffer for its flexibility. Something like read_file that takes a file descriptor and buffer and returns the number of bytes read and an OS error. Allows you to read into a buffer with 1 extra byte, two extra bytes, or even read several files into a larger buffer at an increasing offset to concatenate them.

@thetarnav
Copy link
Contributor Author

It would be very surprising for read_entire_file to return more than the file, and can lead to problems with binary file formats, or cause sanity checks to now fail on the file length or content hash.

The returned data is unchanged, a null byte is only added to the backing memory, after the file data. Which in a lot of cases is already there due to aligning. This just guarantees that it's there so you can cast safely.

It would be better to have a complementary read_entire_file_as_cstring that zero pads and returns as cstring instead of []u8. And if you add two zero bytes, you can then also cheaply cast it to utf-16.

That's fine as well. I only modified read_entire_file as it was a small change.

I think this is probably a bad idea, especially to do by default. If anything, we should have a separate procedure as @Kelimion states like read_entire_file_as_cstring or something just have a buffer to read into with that NUL terminator.

I'm leaning toward the buffer for its flexibility. Something like read_file that takes a file descriptor and buffer and returns the number of bytes read and an OS error. Allows you to read into a buffer with 1 extra byte, two extra bytes, or even read several files into a larger buffer at an increasing offset to concatenate them.

Isn't that just read?

@Kelimion
Copy link
Member

Kelimion commented Sep 2, 2025

I'm leaning toward the buffer for its flexibility. Something like read_file that takes a file descriptor and buffer and returns the number of bytes read and an OS error. Allows you to read into a buffer with 1 extra byte, two extra bytes, or even read several files into a larger buffer at an increasing offset to concatenate them.

Isn't that just read?

read_entire_file loops on read because it may not read the entire file at once, and "read_file" would do the same and thus differ from plain read.

It's not a bad idea to make read_entire_file a procedure group with the original procedure, and a "read_file" or read_entire_file_into_buffer that also loops until read fails (with EOF or otherwise) or the buffer is full.

The returned data is unchanged, a null byte is only added to the backing memory, after the file data. Which in a lot of cases is already there due to aligning. This just guarantees that it's there so you can cast safely.

I'm curious to know what @gingerBill thinks about this distinction. I still feel like it should be its own procedure.

@thetarnav
Copy link
Contributor Author

read_entire_file loops on read because it may not read the entire file at once, and "read_file" would do the same and thus differ from plain read.

It's not a bad idea to make read_entire_file a procedure group with the original procedure, and a "read_file" or read_entire_file_into_buffer that also loops until read fails (with EOF or otherwise) or the buffer is full.

So the input buffer param would be ^[dynamic]u8 or some large enough []u8?
read_entire_file uses both methods depending on if the file size can be obtained or not.
If it's ^[dynamic]u8 then I don't mind. Although at this point it might as well accept a string builder :D

Should read_entire_file_as_cstring return the size as well? I guess if I need a cstring to pass it to some c funciton, I don't need a size, but it could be useful if I also want to use the string somewhere else and the size calculated anyway.
The patched read_entire_file is nice enough to provide both.

@gingerBill
Copy link
Member

Another issues: adding an extra NUL terminator does cause issues with things like the mem.Tracking_Allocator. So it'll result in many false positives in those cases.

@thetarnav thetarnav force-pushed the read_entire_file-null-terminated branch from 235825e to 816696e Compare September 14, 2025 10:11
@thetarnav thetarnav changed the title Make read_entire_file buffer null terminated Add read_entire_file_as_cstring to os2 Sep 14, 2025
@thetarnav
Copy link
Contributor Author

I've changed it to add an explicit read_entire_file_as_cstring procedure, which seems to work fine with a tracking allocator.

@Tetralux
Copy link
Contributor

One concern I have here is that if there are any null bytes in the middle of the file then this is stuffed.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

5 participants