EOF is not working as intended #96

umutoztunc · 2021-02-11T23:10:08Z

While I was trying to investigate some issues with fgets implementation, I've noticed that the source of the issue is EOF definition itself.

int c = getchar();
if (c < 0) {
  printf("%x\n", c);
  printf("%x\n", EOF);
}

After running the code snippet above, I have got the following result:

-1
ffffff

As you can see, EOF is 24-bit instead of 32-bit. Therefore, if (c == EOF) is always false. This can be seen in the generated EIR code as well:

16777215 is simply 0xffffff. This causes some bugs. Even though interpreting the EIR output with eli works fine for some cases, it still fails if you try to convert EIR to Whitespace for instance.

Since I haven't been able to pinpoint issue yet, I have made temporary fix by changing the condition from c == EOF to c < 0. I will make a pull request soon.

By the way, printf might have issues with hexadecimal representation as well since it literally printed back -1 instead of ffffffff.

The text was updated successfully, but these errors were encountered:

shinh · 2021-02-12T00:54:37Z

ELVM has a stupid ABI. All integer types are unsigned and have the same size (i.e., sizeof(char) == sizeof(int) == 1). 1 "byte" is 24 bits in most backends (everything is originally designed to run brainfuck compiled from C fast). So in ELVM, EOF is 0xffffff, I agree it sounds very strange, though. There is even a test: https://github.com/shinh/elvm/blob/master/test/eof.c

Since there is no negative number, if (c < 0) is not a right condition and it should be if (c == EOF). If it does not work in some backends, it's a bug of the backend translator.

#include <stdio.h>

int main() {
    int c = getchar();
    printf("%x\n", c);
    printf("%x\n", EOF);
}

showed

ffffff
ffffff

by eli, elc -c, and elc -rb. If -1 is somehow shown, the whitespace backend has an issue? For some reason, whitespace is not working in my environment right now, so haven't confirmed it yet.

umutoztunc · 2021-02-12T01:13:18Z

I see. I have confirmed that the issue is related to whitespace backend. If I run the code with eli, it prints ffffff, but if I run it with the whitespace interpreter, it prints -1 instead.

By the way, is there a reason fgets return type is int instead of char*? Also fgets doesn't use the given fp variable, it acts like gets with a character limit. I am aware that most backends don't support file i/o operations. Still, can't we call fgetc(fp) instead of getchar() without breaking anything?

It should return char*, not int. Pointed out in #96.

shinh · 2021-02-12T01:56:19Z

Wow, I haven't realized fgets is returning int! It's definitely a bug. It looks like there is no user of this function and that's why this issue wasn't found. I also agree fgets should call fgetc for the EOF hack for whitespace in fgetc.

I think #97 fixes the both of the issues? Thanks for pointing this out!

umutoztunc · 2021-02-12T02:20:36Z

You are so fast! Yes, this fixes both. Actually, it fixes another bug that I haven't mentioned yet :D. The previous version was copying EOF to buffer, just like \n. Also, it seems we both fixed it almost the same way. My local patch was like this:

The only difference in our approaches, your version does nothing and return null pointer if size is 1. Mine puts a null byte at the beginning of the buffer and returns the buffer. I have referred to this implementation to decide that.

shinh · 2021-02-12T02:43:56Z

Ah, yours is better! Could you send a PR to re-correct the implementation? If you don't have time for it, I'll do it this night.

umutoztunc · 2021-02-12T02:58:03Z

Sure! By the way, there is one more bug. All i/o functions stop when they encounter a null byte in ELVM's implementation. I don't know the reason, but it seems getc, getchar, fgetc are all returning EOF instead of 0 for null byte. Since we use them to implement fgets, it also cannot read an input such as "foo\x00bar". Maybe, the bug is related to 8cc.

On a side note, using fgetc in fgets does not fix the issue with whitespace backend. So, there is still something wrong about it.

shinh · 2021-02-12T03:49:57Z

Yeah, it's unfortunate we cannot distinguish an EOF and a null byte and it's impossible to handle binary files without preprocessing.

If it's a bug rather than a limitation, it's due to the definition of ELVM's GETC instruction. When it reaches to EOF, ELVM implementation should fill 0 instead of -1:

elvm/ir/eli.c

Line 169 in c30d33f

regs[inst->dst.reg] = c == EOF ? 0 : c;

I think I chose this definition because some backend esolangs do not distinguish EOF from 0 anyway. For example, many brainfuck implementation fills -1 or 0 when , reaches to EOF: https://esolangs.org/wiki/Brainfuck#EOF I think current BF backend supports both -1 and 0 (

elvm/target/bf.c

Line 529 in c30d33f

bf_ifzero_begin(1); {

), but this means programs compiled to BF cannot handle 0 and 255 in binary files.

Some backends have more limitations in their inputs. Please check https://github.com/shinh/elvm/blob/master/tools/runtex.sh for example. It converts /dev/stdin to another format.

umutoztunc · 2021-02-12T04:05:20Z

Oh I see. Maybe we can choose some defaults and provide other stuff as options while converting to backends. For example, we can set EOF as -1. However, the user should be able to provide a flag to change it if he wants such as:
out/elc -bf -eof=0 infile.eir > outfile.bf

Still, it might make things too complicated. I am not sure if that's worth it, just an idea.

shinh · 2021-02-12T04:52:05Z

I think fgets was still wrong. I didn't expect fgets has such a corner case :) We should have returned nullptr if nothing is read while size > 1: #99

As for -eof=0, it makes sense, but I'd rather revisit the definition of GETC instruction if I have time.

umutoztunc · 2021-02-13T10:31:55Z

This looks good to me.

Should I create another issue for the Whitespace backend? So, you said that all numbers should be 24-bit and unsigned. Doesn't this contradict with Whitespace language itself? From my understanding, the Whitespace language should be able to use negative numbers and there is no bit limit for the numbers. So, I should be able to work with a 64-bit number.

shinh · 2021-02-14T06:06:42Z

The EIR to WS translator should have no issue as it emits code which limits integers to 24bit range:

elvm/target/ws.c

Line 145 in 18b45bc

ws_emit(WS_PUSH); ws_emit_uint_mod_ws();

I agree it does not fully utilize capability of whitespace language, but the translation should be valid.

It's fine to define a 64bit or 32bit dialect of ELVM IR and support a mode like elc -ws -64bit. As for whitespace, I guess removing (x+(1<<24))%(1<<24) I mentioned and setting 1<<24 to SP to keep 24 bit address space would work? I think the initial value of SP is set here:

elvm/target/ws.c

Line 232 in 18b45bc

ws_emit_store(i, 0);

umutoztunc · 2021-02-20T01:44:40Z

I have just tested it again with our latest patch, it already works as intended. Both whitespace interpreter and eli shows that the buffer is filled with zeros instead of -1s. I guess switching from getchar to fgetc does solved the issue since there is a whitespace special trick in fgetc implementation.

Also, you are right that the translation is still valid even though numbers are 24-bit. I guess everything is working now? :D

umutoztunc · 2021-02-20T01:48:20Z

Oh sorry, the bug is still there. I forgot that it works fine if you provide \n in the input. I will try to find a solution. I just don't understand how we get -1 if all numbers are unsigned.

umutoztunc · 2021-02-20T02:01:28Z

By the way, tests for Whitespace backend crashes. This might be related to the issue.

➜ make test-ws
Skip building kx due to lack of kinx
Skip building wasi due to lack of wasmtime
Skip building wasm due to lack of wat2wasm
Skip building el due to lack of emacs
Skip building cl due to lack of sbcl
Skip building scala due to lack of scalac
Skip building swift due to lack of swiftc
Skip building cr due to lack of crystal
Skip building cs due to lack of
Skip building i due to lack of ick
Skip building forth due to lack of gforth
Skip building fs due to lack of
Skip building lua due to lack of lua
Skip building ll due to lack of lli
Skip building scm_sr due to lack of gosh
Skip building hs due to lack of ghc
Skip building oct due to lack of octave
Skip building scratch3 due to lack of
Skip building j due to lack of jconsole
./runtest.sh out/elc.c.eir.ws.out tools/runws.sh out/elc.c.eir.ws
Buffer overflow!
make: *** [build.mk:5: out/elc.c.eir.ws.out] Error 1

shinh · 2021-02-21T11:41:51Z

As for the "Buffer overflow!", I believe I found the cause and fixed it by: 8d75083 You probably need to re-build whitespace interpreter by touch Whitespace/whitespace.c && make test-ws

As for -1, perhaps #100 may fix the issue?

umutoztunc · 2021-02-24T00:02:54Z

Cool! Both are fixed now. There is no overflow and -1s are clearly gone. Great job mate, thanks!

shinh added a commit that referenced this issue Feb 12, 2021

Fix implementation of fgets

e36a410

It should return char*, not int. Pointed out in #96.

shinh mentioned this issue Feb 12, 2021

Fix implementation of fgets #97

Merged

umutoztunc mentioned this issue Feb 12, 2021

Fix fgets implementation #98

Merged

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

EOF is not working as intended #96

EOF is not working as intended #96

umutoztunc commented Feb 11, 2021

shinh commented Feb 12, 2021

umutoztunc commented Feb 12, 2021

shinh commented Feb 12, 2021

umutoztunc commented Feb 12, 2021

shinh commented Feb 12, 2021

umutoztunc commented Feb 12, 2021

shinh commented Feb 12, 2021

umutoztunc commented Feb 12, 2021

shinh commented Feb 12, 2021

umutoztunc commented Feb 13, 2021 •

edited

Loading

shinh commented Feb 14, 2021 •

edited

Loading

umutoztunc commented Feb 20, 2021

umutoztunc commented Feb 20, 2021

umutoztunc commented Feb 20, 2021

shinh commented Feb 21, 2021

umutoztunc commented Feb 24, 2021

EOF is not working as intended #96

EOF is not working as intended #96

Comments

umutoztunc commented Feb 11, 2021

shinh commented Feb 12, 2021

umutoztunc commented Feb 12, 2021

shinh commented Feb 12, 2021

umutoztunc commented Feb 12, 2021

shinh commented Feb 12, 2021

umutoztunc commented Feb 12, 2021

shinh commented Feb 12, 2021

umutoztunc commented Feb 12, 2021

shinh commented Feb 12, 2021

umutoztunc commented Feb 13, 2021 • edited Loading

shinh commented Feb 14, 2021 • edited Loading

umutoztunc commented Feb 20, 2021

umutoztunc commented Feb 20, 2021

umutoztunc commented Feb 20, 2021

shinh commented Feb 21, 2021

umutoztunc commented Feb 24, 2021

umutoztunc commented Feb 13, 2021 •

edited

Loading

shinh commented Feb 14, 2021 •

edited

Loading