Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Minor fixes for dump printing #2342

Open
wants to merge 2 commits into
base: main
Choose a base branch
from

Conversation

rvagg
Copy link
Contributor

@rvagg rvagg commented Dec 1, 2023

  • uint64's should use an unsigned print so as not to cast at the edges
  • FloatParser<T>::ParseHex should be able to handle nan, nan:* and ±inf too
  • remove extraneous . in the float hex format, 0x1.p+0 is not correct
  • added some tests of known-good actual values to augment the round-tripping tests; these match C++ output except for the signalling NaN extra data, they also match Go except for NaN and Inf casing and an additional zero pad Go adds on the significand for some odd reason.

@rvagg rvagg force-pushed the rvagg/dump-fixes branch 2 times, most recently from eeaec87 to 2e1b9b5 Compare December 1, 2023 13:41
00004b: 80 7f |
00004d: 1a | drop
00004e: 42 7f | i64.const -1
00004e: 42 7f | i64.const 18446744073709551615
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Are you sure it makes more sense to show these as unsigned?

The numbers are read signed values in the binary reader so maybe it makes sense to show them as signed in the disassembly too?

CHECK_RESULT(ReadS64Leb128(&value, "i64.const value"));

Copy link
Member

@sbc100 sbc100 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is there some kind of bug in the literal parser that you are fixing here? If so, I wonder why it was not triggered by any of the spec tests? Are we missing a spec test maybe?

@rvagg
Copy link
Contributor Author

rvagg commented Dec 2, 2023

This is coming from my unorthodox usage of wabt—I'm doing some binary parsing and using the dump output as part of my test fixturing to make sure I can produce the same output for the same binary and in reimplementing some of the printing I'm seeing some of the rough edges here.

  1. The 0x1.p... hex printing, which I believe is a simple flaw, I don't see this in anyone else's interpretation of hex float representations. The current tests for hex printing simply involve round-tripping floats through the printer and back through the parser and the extraneous . doesn't trip anything up (reasonable for a non-strict parser). It's just when you're testing for specific output it becomes a problem. Hence my inclusion of some tests that go from bytes to specific text output for the hex printing, just to lock it in for the main cases and edges.

  2. The nan and infinity inclusion in ParseFloat isn't being used here because there's no reason to parse hex floats in here, I included it for completeness of the implementation, and to get my spec tests working cleanly.

  3. The i64.const printing: I get the argument that they could be signed, it's probably not a strong argument either way since they are technically uninterpreted; maybe the strongest argument is that they can be signed coming in from the text format?

Either way, it's the const printing inconsistency that's the problem because i32.const are printed as unsigned already, while i64.const are signed:

LogOpcode("%u", value);

i.e. current test fixtures for the dump even show this discrepancy - see input at the top vs output at the bottom:

$ git grep -E 'i(32|64)\.const ' -- test/dump/const.txt
test/dump/const.txt:    i32.const 0
test/dump/const.txt:    i32.const -2147483648
test/dump/const.txt:    i32.const 4294967295
test/dump/const.txt:    i32.const -0x80000000
test/dump/const.txt:    i32.const 0xffffffff
test/dump/const.txt:    i64.const 0
test/dump/const.txt:    i64.const -9223372036854775808
test/dump/const.txt:    i64.const 18446744073709551615
test/dump/const.txt:    i64.const -0x8000000000000000
test/dump/const.txt:    i64.const 0xffffffffffffffff
test/dump/const.txt: 000019: 41 00                      | i32.const 0
test/dump/const.txt: 00001c: 41 80 80 80 80 78          | i32.const 2147483648
test/dump/const.txt: 000023: 41 7f                      | i32.const 4294967295
test/dump/const.txt: 000026: 41 80 80 80 80 78          | i32.const 2147483648
test/dump/const.txt: 00002d: 41 7f                      | i32.const 4294967295
test/dump/const.txt: 000030: 42 00                      | i64.const 0
test/dump/const.txt: 000033: 42 80 80 80 80 80 80 80 80 | i64.const -9223372036854775808
test/dump/const.txt: 00003f: 42 7f                      | i64.const -1
test/dump/const.txt: 000042: 42 80 80 80 80 80 80 80 80 | i64.const -9223372036854775808
test/dump/const.txt: 00004e: 42 7f                      | i64.const -1

The one minor wrinkle in printing them signed is that they are represented unsigned internally and the path to printing is a straightforward one through BinaryReaderObjdumpDisassemble::OnOpcodeUint32 and BinaryReaderObjdumpDisassemble::OnOpcodeUint64. If we do them signed then they'd both need special cases for the consts, which is doable, just messes with the model a bit, as per spec:

In the abstract syntax, they are represented as unsigned values.

Although, for the text spec we have:

For the other integer instructions, the use of two’s complement for the signed interpretation means that they behave the same regardless of signedness.

So 🤷 are we text or abstract/binary at the point of dumping?

@sbc100
Copy link
Member

sbc100 commented Dec 2, 2023

Could you upload a separate PR for the i64.const printing? I agree it should be consistent with i32.const.

I'm not sure I understand the other two items yet..

rvagg added a commit to rvagg/wabt that referenced this pull request Dec 2, 2023
@rvagg
Copy link
Contributor Author

rvagg commented Dec 2, 2023

Done, leaving this with just the ParseHex change and the extraneous . in the hex format; if it makes it easier to stomach I can back out the inf and nan stuff, that just seemed like a logical thing to do because the method is called "ParseHex" but it can't parse all of the hex that WriteHex produces, hence asymmetry. It'll just mean that my test cases will need to special case "nan*" and "*inf" cases to get the job done.

Going deeper into the . thing in the hexfloat format, the case being addressed here is specifically about subnormals (sorry, I wasn't clear on that to begin with).

The . is technically allowed by the spec I believe, so the parser being lax is fine, but the printer is inconsistent, both within itself and across other implementations (i.e. I've not found another implementation yet that does this). I believe that the formalised grammar defined for IEEE 745 looks like this: https://observablehq.com/@jrus/hexfloat#cell-631 - {hexDigit}* "." {hexDigit}+ | {hexDigit}+ "." | {hexDigit}+ and the WebAssembly spec appears to suggest roughly the same thing wrt ., with the ? indicator:

Screenshot 2023-12-02 at 1 02 32 pm

What's happening here, and why it's not being picked up by existing tests, is that this branch is only encountered for subnormals, which aren't covered in the dump tests. They are covered in the const.wast spec tests, but not as hexfloat output. I've added a new commit to this branch that passes with the current main version of literal.cc but will fail on this fixed version. You can see in there the 0x1.p entries in there now, looking out of place, because further up the non-subnormal cases without exponent have just 0x1p. So in the branch I'm addressing in my fix it comes straight after the subnormal adjustment of sig.

Clear as mud I'm sure.

sbc100 pushed a commit that referenced this pull request Dec 2, 2023
This matches the behaviour of i32 printing.

Ref: #2342
@rvagg
Copy link
Contributor Author

rvagg commented Dec 2, 2023

Digging further into the parse hex thing I've done here, after tracing the normal path of incoming tokens, I can see nan and inf being handled entirely separately in the text lexer and coming in to literal.cc with an indicator that separates the parsing path. So, I've backed out that change that adds extra logic inside ParseHex, leaving just the . fix and the test changes in here. I hope that simplifies things a little bit. The only material change now is omitting . for subnormals that have no significand.

And I can flip those failing tests around when you're happy with the change, I just want to demonstrate what it's doing now with those.

rvagg added a commit to rvagg/kasm that referenced this pull request Dec 9, 2023
with the help of some wabt upstream fixes: WebAssembly/wabt#2342
@rvagg
Copy link
Contributor Author

rvagg commented Dec 9, 2023

@sbc100 I've fixed the tests with the new subnormal additions so that it passes now. Below is a capture of what it looks like with subnormals added but printing using the format that it comes out if you run the version on main. The change here is now very minimal aside from test additions. PTAL.

Capture of test failure with subnormals included
- test/dump/hexfloat_f32.txt
  STDOUT MISMATCH:
  --- expected
  +++ actual
  @@ -35,17 +35,17 @@
    000078: 1a                         | drop
    000079: 43 80 80 7f c0             | f32.const -0x1.ff01p+1
    00007e: 1a                         | drop
  - 00007f: 43 01 00 00 00             | f32.const 0x1.p-149
  + 00007f: 43 01 00 00 00             | f32.const 0x1p-149
    000084: 1a                         | drop
  - 000085: 43 01 00 00 80             | f32.const -0x1.p-149
  + 000085: 43 01 00 00 80             | f32.const -0x1p-149
    00008a: 1a                         | drop
  - 00008b: 43 01 00 00 00             | f32.const 0x1.p-149
  + 00008b: 43 01 00 00 00             | f32.const 0x1p-149
    000090: 1a                         | drop
  - 000091: 43 01 00 00 80             | f32.const -0x1.p-149
  + 000091: 43 01 00 00 80             | f32.const -0x1p-149
    000096: 1a                         | drop
  - 000097: 43 02 00 00 00             | f32.const 0x1.p-148
  + 000097: 43 02 00 00 00             | f32.const 0x1p-148
    00009c: 1a                         | drop
  - 00009d: 43 02 00 00 80             | f32.const -0x1.p-148
  + 00009d: 43 02 00 00 80             | f32.const -0x1p-148
    0000a2: 1a                         | drop
    0000a3: 43 03 00 00 00             | f32.const 0x1.8p-148
    0000a8: 1a                         | drop

- test/dump/hexfloat_f64.txt
  STDOUT MISMATCH:
  --- expected
  +++ actual
  @@ -39,41 +39,41 @@
    0000cc: 1a                         | drop
    0000cd: 44 00 00 00 00 00 00 00 80 | f64.const -0x0p+0
    0000d6: 1a                         | drop
  - 0000d7: 44 01 00 00 00 00 00 00 00 | f64.const 0x1.p-1074
  + 0000d7: 44 01 00 00 00 00 00 00 00 | f64.const 0x1p-1074
    0000e0: 1a                         | drop
  - 0000e1: 44 01 00 00 00 00 00 00 80 | f64.const -0x1.p-1074
  + 0000e1: 44 01 00 00 00 00 00 00 80 | f64.const -0x1p-1074
    0000ea: 1a                         | drop
  - 0000eb: 44 01 00 00 00 00 00 00 00 | f64.const 0x1.p-1074
  + 0000eb: 44 01 00 00 00 00 00 00 00 | f64.const 0x1p-1074
    0000f4: 1a                         | drop
  - 0000f5: 44 01 00 00 00 00 00 00 80 | f64.const -0x1.p-1074
  + 0000f5: 44 01 00 00 00 00 00 00 80 | f64.const -0x1p-1074
    0000fe: 1a                         | drop
  - 0000ff: 44 01 00 00 00 00 00 00 00 | f64.const 0x1.p-1074
  + 0000ff: 44 01 00 00 00 00 00 00 00 | f64.const 0x1p-1074
    000108: 1a                         | drop
  - 000109: 44 01 00 00 00 00 00 00 80 | f64.const -0x1.p-1074
  + 000109: 44 01 00 00 00 00 00 00 80 | f64.const -0x1p-1074
    000112: 1a                         | drop
  - 000113: 44 01 00 00 00 00 00 00 00 | f64.const 0x1.p-1074
  + 000113: 44 01 00 00 00 00 00 00 00 | f64.const 0x1p-1074
    00011c: 1a                         | drop
  - 00011d: 44 01 00 00 00 00 00 00 80 | f64.const -0x1.p-1074
  + 00011d: 44 01 00 00 00 00 00 00 80 | f64.const -0x1p-1074
    000126: 1a                         | drop
  - 000127: 44 01 00 00 00 00 00 00 00 | f64.const 0x1.p-1074
  + 000127: 44 01 00 00 00 00 00 00 00 | f64.const 0x1p-1074
    000130: 1a                         | drop
  - 000131: 44 01 00 00 00 00 00 00 80 | f64.const -0x1.p-1074
  + 000131: 44 01 00 00 00 00 00 00 80 | f64.const -0x1p-1074
    00013a: 1a                         | drop
  - 00013b: 44 02 00 00 00 00 00 00 00 | f64.const 0x1.p-1073
  + 00013b: 44 02 00 00 00 00 00 00 00 | f64.const 0x1p-1073
    000144: 1a                         | drop
  - 000145: 44 02 00 00 00 00 00 00 80 | f64.const -0x1.p-1073
  + 000145: 44 02 00 00 00 00 00 00 80 | f64.const -0x1p-1073
    00014e: 1a                         | drop
  - 00014f: 44 02 00 00 00 00 00 00 00 | f64.const 0x1.p-1073
  + 00014f: 44 02 00 00 00 00 00 00 00 | f64.const 0x1p-1073
    000158: 1a                         | drop
  - 000159: 44 02 00 00 00 00 00 00 80 | f64.const -0x1.p-1073
  + 000159: 44 02 00 00 00 00 00 00 80 | f64.const -0x1p-1073
    000162: 1a                         | drop
  - 000163: 44 02 00 00 00 00 00 00 00 | f64.const 0x1.p-1073
  + 000163: 44 02 00 00 00 00 00 00 00 | f64.const 0x1p-1073
    00016c: 1a                         | drop
  - 00016d: 44 02 00 00 00 00 00 00 80 | f64.const -0x1.p-1073
  + 00016d: 44 02 00 00 00 00 00 00 80 | f64.const -0x1p-1073
    000176: 1a                         | drop
  - 000177: 44 02 00 00 00 00 00 00 00 | f64.const 0x1.p-1073
  + 000177: 44 02 00 00 00 00 00 00 00 | f64.const 0x1p-1073
    000180: 1a                         | drop
  - 000181: 44 02 00 00 00 00 00 00 80 | f64.const -0x1.p-1073
  + 000181: 44 02 00 00 00 00 00 00 80 | f64.const -0x1p-1073
    00018a: 1a                         | drop
    00018b: 44 03 00 00 00 00 00 10 00 | f64.const 0x1.0000000000003p-1022
    000[194](https://github.com/WebAssembly/wabt/actions/runs/7068323834/job/19242756027?pr=2342#step:14:195): 1a                         | drop

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants