[BUG]: Missing TP segments corrupt all following TP messages #737

siggie0815 · 2024-07-10T13:52:10Z

vSomeip Version

v3.4.10

Boost Version

any

Environment

All

Describe the bug

While testing communication with our sensors, we occasionally saw failing E2E CRC checks. Once the problem occurs, it persists until communication is reset. The messages are notifications, segmented using TP and protected with E2E.

My understanding of the problem is as follows:
If a single TP segment gets lost, the vsomeip TP reassembler cannot complete that message. So far so good.

However, the next message is then received, segment by segment, while the old message is still waiting to be completed. For the first few segments we may therefore get a duplicate segment error. As soon as the segment that was missing from the old message arrives, the message is regarded as complete and returned. Then the E2E check is run. Because the message has been reassembled from segments of two consecutive messages, the CRC check fails and the data is garbage.

From then on, every message is reassembled from mixed segments, without any duplicate segment error in the log. Hence the CRC check fails for all messages and the data is garbage.

e.g.:

message consists of 6 segments (0...5)
receive segments 0,2,3,4,5 and lose segment 1 of the first message
receive segment 0 of second message -> duplicate segment error
receive segment 1 of second message -> message is complete and returned --> CRC error
segments 2,3,4,5 of second message are added to a new TP message
receive segments 0 and 1 of third message -> added to previous message, complete and returned --> CRC error
...
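The sequence above can be replayed with a small simulation. This is only a simplified model of the behaviour described in this report, not vsomeip code: `NaiveReassembler` and `simulate` are hypothetical names, segments are identified by index instead of byte offset, and the segment count is assumed to be known in advance.

```python
from typing import Dict, List, Optional, Tuple

SEGMENTS_PER_MESSAGE = 6  # matches the example above


class NaiveReassembler:
    """Hypothetical model: one pending buffer, segments keyed only by index."""

    def __init__(self, segment_count: int) -> None:
        self.segment_count = segment_count
        # segment index -> id of the message the stored segment came from
        self.pending: Dict[int, int] = {}
        self.duplicates = 0

    def receive(self, message_id: int, index: int) -> Optional[List[int]]:
        """Store one segment; return the origins of all segments once complete."""
        if index in self.pending:
            # Slot already filled by an older message: "duplicate segment".
            self.duplicates += 1
            return None
        self.pending[index] = message_id
        if len(self.pending) < self.segment_count:
            return None
        assembled = [self.pending[i] for i in range(self.segment_count)]
        self.pending.clear()
        return assembled


def simulate(message_count: int, lost: Tuple[int, int]) -> Tuple[List[List[int]], int]:
    """Send messages 1..message_count in order, dropping one (message, index)."""
    reassembler = NaiveReassembler(SEGMENTS_PER_MESSAGE)
    delivered: List[List[int]] = []
    for message_id in range(1, message_count + 1):
        for index in range(SEGMENTS_PER_MESSAGE):
            if (message_id, index) == lost:
                continue  # this segment never arrives
            result = reassembler.receive(message_id, index)
            if result is not None:
                delivered.append(result)
    return delivered, reassembler.duplicates


delivered, duplicates = simulate(message_count=4, lost=(1, 1))
for origins in delivered:
    # A delivered message is intact only if all segments share one origin.
    print(origins, "mixed" if len(set(origins)) > 1 else "intact")
print("duplicate segment errors:", duplicates)
```

In this model a single lost segment produces exactly one duplicate segment error, and every message delivered afterwards is built from segments of two different messages, which is why an E2E CRC over the payload would keep failing.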

Reproduction Steps

It is hard to reproduce. One TP segment has to be dropped from the communication somehow.

Expected behaviour

In my opinion a missing segment should not invalidate all upcoming traffic.

The problem could be resolved in either of two ways:

1. Lower the message reassembly timeout to less than the message period, so that incomplete messages are deleted before the next message arrives. The timeout is currently hardcoded to 5 seconds and hence not very helpful.
2. Force the start of a new TP message whenever a segment with offset zero arrives. Any remaining incomplete message is discarded. Segments with other offsets cannot start a new TP message.
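To illustrate why the timeout only helps when it is shorter than the message period, here is a minimal sketch of the timeout option. It is a hypothetical model, not the vsomeip implementation: `TimeoutReassembler`, `run`, the 100 ms message period and the 1 ms segment spacing are all assumptions, and the age of a pending message is checked lazily when the next segment arrives.

```python
from typing import Dict, List, Optional

SEGMENTS_PER_MESSAGE = 6   # as in the example in this report
MESSAGE_PERIOD_MS = 100    # assumed notification period
SEGMENT_SPACING_MS = 1     # assumed gap between segments of one message


class TimeoutReassembler:
    """Hypothetical model that evicts a partial message once it is too old."""

    def __init__(self, segment_count: int, timeout_ms: int) -> None:
        self.segment_count = segment_count
        self.timeout_ms = timeout_ms
        self.pending: Dict[int, int] = {}  # segment index -> originating message
        self.started_at: Optional[int] = None  # arrival time of first pending segment

    def receive(self, message_id: int, index: int, now_ms: int) -> Optional[List[int]]:
        # Drop an incomplete message that has been pending longer than the timeout.
        if self.started_at is not None and now_ms - self.started_at > self.timeout_ms:
            self.pending.clear()
            self.started_at = None
        if index in self.pending:
            return None  # duplicate segment, ignored
        if not self.pending:
            self.started_at = now_ms
        self.pending[index] = message_id
        if len(self.pending) < self.segment_count:
            return None
        assembled = [self.pending[i] for i in range(self.segment_count)]
        self.pending.clear()
        self.started_at = None
        return assembled


def run(timeout_ms: int, message_count: int = 3) -> List[List[int]]:
    """Send messages 1..message_count, losing segment 1 of message 1."""
    reassembler = TimeoutReassembler(SEGMENTS_PER_MESSAGE, timeout_ms)
    delivered: List[List[int]] = []
    for message_id in range(1, message_count + 1):
        start_ms = (message_id - 1) * MESSAGE_PERIOD_MS
        for index in range(SEGMENTS_PER_MESSAGE):
            if message_id == 1 and index == 1:
                continue  # the lost segment
            now_ms = start_ms + index * SEGMENT_SPACING_MS
            result = reassembler.receive(message_id, index, now_ms)
            if result is not None:
                delivered.append(result)
    return delivered


print("timeout 5000 ms:", run(timeout_ms=5000))  # longer than the period
print("timeout   50 ms:", run(timeout_ms=50))    # shorter than the period
```

With the 5000 ms timeout the stale segments are still pending when the next message starts, so mixed messages are delivered. With the 50 ms timeout the damaged message is dropped and the following messages are reassembled from their own segments only.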

I think the first solution is the better one, as it places fewer constraints on the order in which the segments arrive.

Logs and Screenshots

No response

siggie0815 · 2024-07-10T14:37:57Z

After reviewing the SOME/IP TP spec, I changed my mind: the spec is quite particular about when message reassembly should be interrupted and a new message should start. In other words, vsomeip does not follow the spec in this regard.

https://www.autosar.org/fileadmin/standards/R20-11/CP/AUTOSAR_SWS_SOMEIPTransportProtocol.pdf

Section 7.3.1 says that a segment with offset 0 shall start a new disassembly session.
Section 7.3.3 makes clear that the segments have to be received in order.
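The two rules as summarized here can be sketched as follows. This is a hypothetical model based only on the reading above, not on the spec text or on vsomeip code: `InOrderReassembler` and `simulate` are made-up names, segments are identified by index instead of byte offset, and aborting the session on an out-of-order segment is an assumption about how the in-order rule would be enforced.

```python
from typing import List, Optional, Tuple

SEGMENTS_PER_MESSAGE = 6  # as in the example in this report


class InOrderReassembler:
    """Hypothetical model: index 0 restarts, anything out of order aborts."""

    def __init__(self, segment_count: int) -> None:
        self.segment_count = segment_count
        self.pending: List[int] = []  # origin of each segment received so far
        self.active = False

    def receive(self, message_id: int, index: int) -> Optional[List[int]]:
        if index == 0:
            # A first segment always starts a new session and
            # discards whatever incomplete message was pending.
            self.pending = [message_id]
            self.active = True
        elif self.active and index == len(self.pending):
            # Exactly the next expected segment.
            self.pending.append(message_id)
        else:
            # Gap, duplicate, or no session: abort and wait for index 0.
            self.pending = []
            self.active = False
            return None
        if len(self.pending) == self.segment_count:
            assembled = self.pending
            self.pending = []
            self.active = False
            return assembled
        return None


def simulate(message_count: int, lost: Tuple[int, int]) -> List[List[int]]:
    """Send messages 1..message_count in order, dropping one (message, index)."""
    reassembler = InOrderReassembler(SEGMENTS_PER_MESSAGE)
    delivered: List[List[int]] = []
    for message_id in range(1, message_count + 1):
        for index in range(SEGMENTS_PER_MESSAGE):
            if (message_id, index) == lost:
                continue  # this segment never arrives
            result = reassembler.receive(message_id, index)
            if result is not None:
                delivered.append(result)
    return delivered


print(simulate(message_count=3, lost=(1, 1)))
```

In this model only the message that actually lost a segment is dropped. The next segment with index 0 starts a clean session, so later messages are never assembled from mixed segments.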

I will try to fix the problem and file a pull request.

duartenfonseca · 2024-10-15T14:41:52Z

Hi @siggie0815, could you test with PR #783 and see whether it fixes the issue? Thanks!

siggie0815 added the bug label Jul 10, 2024
