Why should RLP(0) be 0x80?

swaldman · Member Posts: 2
So, I'm sure I'm just being dumb. But that's never stopped me before.

In the tests for RLP (defined here), the RLP encoding for a single byte 0 is expected to be hex 0x80.

The yellow paper indicates (eqs 151 & 152) that for a byte sequence of length 1 where the value of the byte (treated, I think, as an unsigned value) is less than 128, the RLP encoding of the byte is just the byte itself. The tests for 1, 16, 79, and 127 are consistent with that, but not 0. Why shouldn't the RLP encoding of a single byte 0 be 0?
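For concreteness, here is a minimal sketch of that byte-string rule in Python, covering only strings shorter than 56 bytes (longer strings and lists are omitted; rlp_encode_bytes is just an illustrative helper name, not code from any particular client):

def rlp_encode_bytes(b: bytes) -> bytes:
    """Minimal RLP encoding for byte strings shorter than 56 bytes."""
    if len(b) == 1 and b[0] < 0x80:
        return b                       # a single byte below 0x80 is its own encoding
    return bytes([0x80 + len(b)]) + b  # short string: length prefix 0x80+len, then the bytes

# By this rule alone, the one-byte string 0x00 encodes as 0x00:
assert rlp_encode_bytes(b"\x00") == b"\x00"
# and the empty string encodes as 0x80:
assert rlp_encode_bytes(b"") == b"\x80"

So if the test input is read as a one-byte string, the expected result would seem to be 0x00, not 0x80.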

The only explanation I can think of would be taken from this:

"If RLP is used to encode a scalar, defined only as a positive integer, it must be specified as the shortest byte array such that the big-endian interpretation of it is equal...When interpreting RLP data, if an expected fragment is decoded as a scalar and leading zeroes are found in the byte sequence, clients are required to consider it non-canonical and treat it in the same manner as otherwise invalid RLP data, dismissing it completely."

If the zero byte itself is treated as a superfluous (and therefore invalid) leading zero, then it could be stripped to an empty byte string, whose encoding would be the 0x80 expected by the test. But this seems wrong to me: it's peculiar to treat a single zero byte as a superfluous leading zero, doing so makes the encoding of integer 0 indistinguishable from the encoding of an empty byte string, and it makes the encoding of integer zero differ from the encoding of a length-one byte string containing the byte zero, which by the rule above should still be the byte 0x00 itself. A sketch of this reading follows.
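Roughly, that reading would amount to something like the following (again a hedged sketch, reusing the rlp_encode_bytes helper above; rlp_encode_scalar is a hypothetical name for the scalar rule quoted from the yellow paper):

def rlp_encode_scalar(n: int) -> bytes:
    """Encode a non-negative integer as the shortest big-endian byte
    array (empty for zero), then RLP-encode that byte array."""
    if n < 0:
        raise ValueError("scalars are non-negative")
    big_endian = n.to_bytes((n.bit_length() + 7) // 8, "big")  # b"" when n == 0
    return rlp_encode_bytes(big_endian)

# The scalar 0 becomes the empty byte string, so its encoding is 0x80 ...
assert rlp_encode_scalar(0) == b"\x80"
# ... which matches the encoding of the empty string, not the one-byte string 0x00:
assert rlp_encode_bytes(b"\x00") == b"\x00"
# Other small scalars encode as themselves, consistent with the 1/16/79/127 tests:
assert rlp_encode_scalar(127) == b"\x7f"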

Am I misunderstanding something? Should one just force an implementation to conform to the tests?