Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

EOF is not working as intended #96

Open
umutoztunc opened this issue Feb 11, 2021 · 16 comments
Open

EOF is not working as intended #96

umutoztunc opened this issue Feb 11, 2021 · 16 comments

Comments

@umutoztunc
Copy link
Contributor

While I was trying to investigate some issues with fgets implementation, I've noticed that the source of the issue is EOF definition itself.

int c = getchar();
if (c < 0) {
  printf("%x\n", c);
  printf("%x\n", EOF);
}

After running the code snippet above, I have got the following result:

-1
ffffff

As you can see, EOF is 24-bit instead of 32-bit. Therefore, if (c == EOF) is always false. This can be seen in the generated EIR code as well:
image
16777215 is simply 0xffffff. This causes some bugs. Even though interpreting the EIR output with eli works fine for some cases, it still fails if you try to convert EIR to Whitespace for instance.

Since I haven't been able to pinpoint issue yet, I have made temporary fix by changing the condition from c == EOF to c < 0. I will make a pull request soon.

By the way, printf might have issues with hexadecimal representation as well since it literally printed back -1 instead of ffffffff.

@shinh
Copy link
Owner

shinh commented Feb 12, 2021

ELVM has a stupid ABI. All integer types are unsigned and have the same size (i.e., sizeof(char) == sizeof(int) == 1). 1 "byte" is 24 bits in most backends (everything is originally designed to run brainfuck compiled from C fast). So in ELVM, EOF is 0xffffff, I agree it sounds very strange, though. There is even a test: https://github.com/shinh/elvm/blob/master/test/eof.c

Since there is no negative number, if (c < 0) is not a right condition and it should be if (c == EOF). If it does not work in some backends, it's a bug of the backend translator.

#include <stdio.h>

int main() {
    int c = getchar();
    printf("%x\n", c);
    printf("%x\n", EOF);
}

showed

ffffff
ffffff

by eli, elc -c, and elc -rb. If -1 is somehow shown, the whitespace backend has an issue? For some reason, whitespace is not working in my environment right now, so haven't confirmed it yet.

@umutoztunc
Copy link
Contributor Author

I see. I have confirmed that the issue is related to whitespace backend. If I run the code with eli, it prints ffffff, but if I run it with the whitespace interpreter, it prints -1 instead.

By the way, is there a reason fgets return type is int instead of char*? Also fgets doesn't use the given fp variable, it acts like gets with a character limit. I am aware that most backends don't support file i/o operations. Still, can't we call fgetc(fp) instead of getchar() without breaking anything?

shinh added a commit that referenced this issue Feb 12, 2021
It should return char*, not int. Pointed out in #96.
@shinh
Copy link
Owner

shinh commented Feb 12, 2021

Wow, I haven't realized fgets is returning int! It's definitely a bug. It looks like there is no user of this function and that's why this issue wasn't found. I also agree fgets should call fgetc for the EOF hack for whitespace in fgetc.

I think #97 fixes the both of the issues? Thanks for pointing this out!

@umutoztunc
Copy link
Contributor Author

You are so fast! Yes, this fixes both. Actually, it fixes another bug that I haven't mentioned yet :D. The previous version was copying EOF to buffer, just like \n. Also, it seems we both fixed it almost the same way. My local patch was like this:
image

The only difference in our approaches, your version does nothing and return null pointer if size is 1. Mine puts a null byte at the beginning of the buffer and returns the buffer. I have referred to this implementation to decide that.

@shinh
Copy link
Owner

shinh commented Feb 12, 2021

Ah, yours is better! Could you send a PR to re-correct the implementation? If you don't have time for it, I'll do it this night.

@umutoztunc
Copy link
Contributor Author

Sure! By the way, there is one more bug. All i/o functions stop when they encounter a null byte in ELVM's implementation. I don't know the reason, but it seems getc, getchar, fgetc are all returning EOF instead of 0 for null byte. Since we use them to implement fgets, it also cannot read an input such as "foo\x00bar". Maybe, the bug is related to 8cc.

On a side note, using fgetc in fgets does not fix the issue with whitespace backend. So, there is still something wrong about it.

@shinh
Copy link
Owner

shinh commented Feb 12, 2021

Yeah, it's unfortunate we cannot distinguish an EOF and a null byte and it's impossible to handle binary files without preprocessing.

If it's a bug rather than a limitation, it's due to the definition of ELVM's GETC instruction. When it reaches to EOF, ELVM implementation should fill 0 instead of -1:

elvm/ir/eli.c

Line 169 in c30d33f

regs[inst->dst.reg] = c == EOF ? 0 : c;

I think I chose this definition because some backend esolangs do not distinguish EOF from 0 anyway. For example, many brainfuck implementation fills -1 or 0 when , reaches to EOF: https://esolangs.org/wiki/Brainfuck#EOF I think current BF backend supports both -1 and 0 (

bf_ifzero_begin(1); {
), but this means programs compiled to BF cannot handle 0 and 255 in binary files.

Some backends have more limitations in their inputs. Please check https://github.com/shinh/elvm/blob/master/tools/runtex.sh for example. It converts /dev/stdin to another format.

@umutoztunc
Copy link
Contributor Author

Oh I see. Maybe we can choose some defaults and provide other stuff as options while converting to backends. For example, we can set EOF as -1. However, the user should be able to provide a flag to change it if he wants such as:
out/elc -bf -eof=0 infile.eir > outfile.bf

Still, it might make things too complicated. I am not sure if that's worth it, just an idea.

@shinh
Copy link
Owner

shinh commented Feb 12, 2021

I think fgets was still wrong. I didn't expect fgets has such a corner case :) We should have returned nullptr if nothing is read while size > 1: #99

As for -eof=0, it makes sense, but I'd rather revisit the definition of GETC instruction if I have time.

@umutoztunc
Copy link
Contributor Author

umutoztunc commented Feb 13, 2021

This looks good to me.

Should I create another issue for the Whitespace backend? So, you said that all numbers should be 24-bit and unsigned. Doesn't this contradict with Whitespace language itself? From my understanding, the Whitespace language should be able to use negative numbers and there is no bit limit for the numbers. So, I should be able to work with a 64-bit number.

@shinh
Copy link
Owner

shinh commented Feb 14, 2021

The EIR to WS translator should have no issue as it emits code which limits integers to 24bit range:

ws_emit(WS_PUSH); ws_emit_uint_mod_ws();
I agree it does not fully utilize capability of whitespace language, but the translation should be valid.

It's fine to define a 64bit or 32bit dialect of ELVM IR and support a mode like elc -ws -64bit. As for whitespace, I guess removing (x+(1<<24))%(1<<24) I mentioned and setting 1<<24 to SP to keep 24 bit address space would work? I think the initial value of SP is set here:

ws_emit_store(i, 0);

@umutoztunc
Copy link
Contributor Author

I have just tested it again with our latest patch, it already works as intended. Both whitespace interpreter and eli shows that the buffer is filled with zeros instead of -1s. I guess switching from getchar to fgetc does solved the issue since there is a whitespace special trick in fgetc implementation.

Also, you are right that the translation is still valid even though numbers are 24-bit. I guess everything is working now? :D

@umutoztunc
Copy link
Contributor Author

Oh sorry, the bug is still there. I forgot that it works fine if you provide \n in the input. I will try to find a solution. I just don't understand how we get -1 if all numbers are unsigned.

@umutoztunc
Copy link
Contributor Author

By the way, tests for Whitespace backend crashes. This might be related to the issue.

➜ make test-ws
Skip building kx due to lack of kinx
Skip building wasi due to lack of wasmtime
Skip building wasm due to lack of wat2wasm
Skip building el due to lack of emacs
Skip building cl due to lack of sbcl
Skip building scala due to lack of scalac
Skip building swift due to lack of swiftc
Skip building cr due to lack of crystal
Skip building cs due to lack of
Skip building i due to lack of ick
Skip building forth due to lack of gforth
Skip building fs due to lack of
Skip building lua due to lack of lua
Skip building ll due to lack of lli
Skip building scm_sr due to lack of gosh
Skip building hs due to lack of ghc
Skip building oct due to lack of octave
Skip building scratch3 due to lack of
Skip building j due to lack of jconsole
./runtest.sh out/elc.c.eir.ws.out tools/runws.sh out/elc.c.eir.ws
Buffer overflow!
make: *** [build.mk:5: out/elc.c.eir.ws.out] Error 1

@shinh
Copy link
Owner

shinh commented Feb 21, 2021

As for the "Buffer overflow!", I believe I found the cause and fixed it by: 8d75083 You probably need to re-build whitespace interpreter by touch Whitespace/whitespace.c && make test-ws

As for -1, perhaps #100 may fix the issue?

@umutoztunc
Copy link
Contributor Author

Cool! Both are fixed now. There is no overflow and -1s are clearly gone. Great job mate, thanks!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants