Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Offsets should not be allowed in expressions #640

Closed
mikir opened this issue Jun 20, 2024 · 11 comments
Closed

Offsets should not be allowed in expressions #640

mikir opened this issue Jun 20, 2024 · 11 comments
Assignees
Labels
bug Something isn't working core Zserio core module
Milestone

Comments

@mikir
Copy link
Contributor

mikir commented Jun 20, 2024

Consider the following schema:

struct WrongSchema
{
    uint32 myOffset;
    uint8 optionalValue if myOffset == 4;
myOffset:
   string stringValue;
};

Such schema is not possible to fill by data correctly. Consider how the myOffset will be calculated and what will happen during write and read operation.

  1. Let's say that myOffset will be initialized to 0.
  2. Let's say that optionalValue will be set to 0.
  3. Then myOffset will be set to 4 (4 bytes because of uint32 type + 0 because of optional uint8 which is not there 0 != 4).
  4. Writer will write myOffset == 4, optionalValue = 0, strigValue = "something".
  5. Reader will fail because myOffset == 4 but it should 5 because optional is suddenly there.

We should reconsider usage of offsets:

One easy solution could be just to disable offsets in any Zserio expression.

Another solution could be that expressions with offsets can be used only after offset label. This is quite complicated and it is a question if this is usefull at all. For example, the following schema is ok:

struct CorrectSchema
{
    uint32 myOffset;
myOffset:
   string stringValue;
    uint8 optionalValue if myOffset == 4;
};
@mikir mikir added bug Something isn't working core Zserio core module labels Jun 20, 2024
@mikir mikir added this to the 2.15 milestone Jun 20, 2024
@fklebert
Copy link
Contributor

I fully support the idea to not use offsets in expressions at all. I can not think of any meaningful use case that would require this.

In addition I want to drop an idea:
Actually I think that an offset should be a dedicated read-only type with an own keyword.
So something like:

struct Schema
{
    offset myOffset;
    uint8 someValue;
    varsize someVariableSizedValue;
myOffset:
   string stringValue;
};

Any offset would be read-only in the API and only would have a getter. During writing the offset would be initialized correctly (without me having to call initializeOffsets() or anything like that).

@Mi-La
Copy link
Contributor

Mi-La commented Jun 20, 2024

What about offset arrays?

struct Schema
{
    offset myOffsetArray[];
    uint8 someValue;
myOffsetArray[@index]:
    string stringArray[];
};

What should zserio return before writing? Is an empty array ok?

@mikir
Copy link
Contributor Author

mikir commented Jun 21, 2024

Actually, the requirement that during writing the offsets should be initialized correctly and automatically together with no setters is very strict.

Consider this:

struct Schema
{
    uint32 myOffsetArray[];
    uint8 someValue;
myOffsetArray[@index % someValue]:
    string stringArray[];
};

To calculate automatically the size of myOffsetArray is not trivial considering that user can invent any expression in offset label. And such calculation must be done because then we are not able to calculate the offset of stringArray.

If we need such strict requirement which it probably makes sense from the user point view, we must be very strict in the usage of the offsets:

  1. Only myOffsetArray[@index] will be allowed. No other expressions in brackets [] will be allowed.
  2. Each offset can be used only once as offset label. This is something which is not checked at all currently in Zserio. And implementation of such checking without these restrictions is very difficult. For example, consider this legal usage:
struct Schema
{
    uint32 myOffsetArray[2];
    uint8 someValue;
myOffsetArray[0]:
    string stringValue1;
myOffsetArray[1]:
    string stringValue2;
};

Putting all together, we must implement a huge restrictions for usage of offsets at the beginning if we want to improve usability. This could potentially break some schemas but we probably might to afford it together with change of the documentation that such usage of offsets is illegal.

Regarding new offset keyword, I support this idea but I can see one drawback. Currently, user can optimize the memory for offsets knowing the maximum size of the blob. For example by using of the type uint16. This won't be possible by new keyword. We will use probably uint32 for offset allowing 4GB blobs as the maximum. Of course, we can invent something like offset16, offset32, offset64 but not sure if this is not too complicated. Besides of that, we cannot disable usage of normal types for offsets like uint16 myOffset because of the backward compatibility.

@fklebert
Copy link
Contributor

I was thinking about using varsize as the underlying datatype for offset. Still never below 1 byte, but better than hardcoded 4 bytes.
I also support the idea of not allowing expressions in offset labels.

What about offset arrays?

struct Schema
{
    offset myOffsetArray[];
    uint8 someValue;
myOffsetArray[@index]:
    string stringArray[];
};

What should zserio return before writing? Is an empty array ok?

I guess empty array would be OK.

@mikir
Copy link
Contributor Author

mikir commented Jun 21, 2024

Actually, varsize is evil and it has been already disabled in Zserio. The problem was that you are not possible to calculate bit size of varsize in advance without knowing the value of the offset. So, you are not able to calculate the offset at all. In another words, you must have 'stable' types for the offsets.

@fklebert
Copy link
Contributor

Actually, varsize is evil and it has been already disabled in Zserio. The problem was that you are not possible to calculate bit size of varsize in advance without knowing the value of the offset. So, you are not able to calculate the offset at all. In another words, you must have 'stable' types for the offsets.

Ah... I see... makes sense.

@Mi-La
Copy link
Contributor

Mi-La commented Jun 21, 2024

Only myOffsetArray[@Index] will be allowed. No other expressions in brackets [] will be allowed.

Then we could even skip the @index keyword and write just the brackets - or we could just use the array without brackets at all :-):

    uint32 myOffsetArray[];
myOffsetArray[]:
    string stringArray[];

or yet better:

    uint32 myOffsetArray[];
myOffsetArray:
    string stringArray[];

@MisterGC
Copy link
Contributor

MisterGC commented Jun 21, 2024

Dedicated Offset Types: I support the idea of explicit types and don't see a problem with having different variants. @mikir, why do you think this could be too complicated? Are there implementation concerns? Users need to choose the offset field size anyway. We also need to consider backward compatibility. Using plain numeric types could generate a warning like, "Using offset fields with plain numeric fields like uint32 is not recommended; use offset32 instead."

Improved Syntax: I like the improved syntax suggestion. For compatibility, @index should remain optional. I prefer the following format:

    uint32 myOffsetArray[];
myOffsetArray[]:
    string stringArray[];

This way, it's clear at first glance whether it's an offset to the stringArray or an offset array to each element, even if the field definition is not close to the offset label.

@mikir
Copy link
Contributor Author

mikir commented Jun 21, 2024

Dedicated Offset Types: I support the idea of explicit types and don't see a problem with having different variants. @mikir, why do you think this could be too complicated? Are there implementation concerns? Users need to choose the offset field size anyway. We also need to consider backward compatibility. Using plain numeric types could generate a warning like, "Using offset fields with plain numeric fields like uint32 is not recommended; use offset32 instead."

Actually, I had two minor things in my mind:

  • Schema evolution
    If user decided to use for example offset16 and after schema evolution the blob will be significantly increased, then incompatible change to offset32 will be needed.
  • Offset overflow checking
    We will probably need to check the offset overflow properly in case of lower size offset types.

As you pointed out, we have currently the same problems with offsets anyway, so probably this is not a big issue.

Improved Syntax: I like the improved syntax suggestion. For compatibility, @index should remain optional. I prefer the following format:

    uint32 myOffsetArray[];
myOffsetArray[]:
    string stringArray[];

We should just check the abnormal usage:

    varsize myOffsetArraySize;
    varsize myStringArraySize;
    uint32 myOffsetArray[myOffsetArraySize];
myOffsetArray[]:
     string stringArray[myStringArraySize];

@mikir
Copy link
Contributor Author

mikir commented Jun 21, 2024

Because we have discussed 4 different ideas in this issue, I have created three new issues where we can continue discussions:

#641 - Offset expressions should be disabled
#642 - Consider to simplify indexed offsets syntax
#643 - Consider to introduce new zserio type for offsets

@mikir
Copy link
Contributor Author

mikir commented Jul 2, 2024

Don't forget to update documentation with some more detailed description of the offsets.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug Something isn't working core Zserio core module
Projects
None yet
Development

No branches or pull requests

4 participants