
Serial RX Latency and Frame Timing per LIN Specification #27

Open
kbader94 opened this issue Aug 9, 2024 · 3 comments

@kbader94 (Contributor) commented Aug 9, 2024

Although it's been discussed elsewhere, I'm opening an issue here dedicated to discussing our timing requirements for publishing slave node responses so we have a hard target to work towards.

The Specification

Taken from the LIN v2.2 spec - Section 2.3.2:

The maximum space between the bytes is an additional 40% duration compared to the nominal transmission time. The additional duration is split between the header (the master task) and the frame response (a slave task). This yields:

THeader_Maximum = 1.4 * THeader_Nominal              (6)
TResponse_Maximum = 1.4 * TResponse_Nominal          (7)
TFrame_Maximum = THeader_Maximum + TResponse_Maximum (8)

The maximum length of the header, response and frame is based on the nominal time for a frame (based on the FNom as defined in section 6.3). Therefore the bit tolerances are included in the maximum length.

Example: A master node that is 0.5% slower than FNom will have to be within 1.4*THeader_Nominal.

All subscribing nodes shall be able to receive a frame that has a zero overhead, i.e. that is TFrame_Nominal long.

Tools and tests shall check the TFrame_Maximum. Nodes shall not check this time. The receiving node of the frame shall accept the frame up to the next frame slot (i.e. next break field), even if it is longer than TFrame_Maximum.

The relevant part being:

TResponse_Maximum = 1.4 * TResponse_Nominal

My Calculations

The nominal transmission time for one byte (10 bits, including start and stop bits) at 19200 bps is:
10 / 19200 ≈ 521 microseconds

Therefore, the maximum LIN frame response time for a one-byte response is:
521 * 1.4 ≈ 729 microseconds
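
For reference, here is the same arithmetic spelled out in a small stand-alone program (my illustration only; the 19200 bps rate and one-byte response are just the example above):

  #include <stdio.h>

  /* LIN spec section 2.3.2: T_Maximum = 1.4 * T_Nominal.
   * One UART byte on the wire is 10 bits (start + 8 data + stop). */
  #define BITS_PER_BYTE  10
  #define LIN_MAX_FACTOR 1.4

  int main(void)
  {
      double baud = 19200.0;
      int response_bytes = 1;                            /* one-byte response */

      double byte_us = BITS_PER_BYTE / baud * 1e6;       /* ~521 us */
      double resp_nom_us = response_bytes * byte_us;
      double resp_max_us = LIN_MAX_FACTOR * resp_nom_us; /* ~729 us */

      printf("byte time:         %.0f us\n", byte_us);
      printf("TResponse_Nominal: %.0f us\n", resp_nom_us);
      printf("TResponse_Maximum: %.0f us\n", resp_max_us);
      return 0;
  }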

RFC

Any feedback regarding my interpretation of the specification or accompanying calculations would be much appreciated!

@ppisa (Member) commented Aug 9, 2024

Thanks for the elaboration, but I am not sure if you are proposing some change. Take into account that on the master side, slLIN can be and should be much more tolerant, because different kinds of UARTs have quite significant timeout latency before pushing received Rx bytes when the FIFO is not filled above its threshold.

On the slave side, we have a problem in slLIN on many UARTs. There is sometimes an option to disable the FIFO or to set the threshold to 1 to fulfill the specification. But in general, interoperation with a master that takes the standard strictly is problematic. A generic API allowing a shorter timeout or a threshold of 1 to be requested from low-level serial drivers in a generic way would be a big step forward on those UARTs where the hardware allows tuning or switching the FIFO off. See the proposals documented in #13. But I have only little time for this project and much higher priorities in another pending project. If this were part of something funded, it could help to look even for some student etc.

It is holiday time now, so I will not be available online next week, and even until the end of September I may be missing for a week or more.

@kbader94 (Contributor, Author) commented Aug 9, 2024

I am not proposing any changes yet, only seeking clarification. The reason is that I am working on implementing a generic kernel-level ioctl for setting RX trigger levels. The problem I have is that my testing reveals that even setting the RX trigger level to one byte does not get our latency low enough. I get a 5000 microsecond RTT without setting the RX trigger. Setting the FIFO RX trigger to one byte gets us down to about 1400 microseconds, but this is still double what I calculate it should be (if my understanding and calculations are correct).
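
A minimal stand-alone harness of the kind that produces such RTT numbers (illustrative only, not the project's test code; /dev/ttyS0 and the external loopback are assumptions): write one byte, block until a byte comes back, and take the CLOCK_MONOTONIC delta.

  #include <fcntl.h>
  #include <stdio.h>
  #include <termios.h>
  #include <time.h>
  #include <unistd.h>

  /* RX-latency probe: send one byte, wait for one byte back
   * (e.g. via a loopback plug), report the round-trip time. */
  int main(void)
  {
      int fd = open("/dev/ttyS0", O_RDWR | O_NOCTTY);  /* placeholder port */
      if (fd < 0) { perror("open"); return 1; }

      struct termios tio;
      tcgetattr(fd, &tio);
      cfmakeraw(&tio);
      cfsetspeed(&tio, B19200);
      tio.c_cc[VMIN] = 1;            /* block until at least one byte */
      tio.c_cc[VTIME] = 0;
      tcsetattr(fd, TCSANOW, &tio);

      unsigned char out = 0x55, in;
      struct timespec t0, t1;

      clock_gettime(CLOCK_MONOTONIC, &t0);
      if (write(fd, &out, 1) != 1 || read(fd, &in, 1) != 1) {
          perror("io");
          return 1;
      }
      clock_gettime(CLOCK_MONOTONIC, &t1);

      long us = (t1.tv_sec - t0.tv_sec) * 1000000L +
                (t1.tv_nsec - t0.tv_nsec) / 1000L;
      printf("RTT: %ld us\n", us);

      close(fd);
      return 0;
  }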

We may need further changes in the kernel to reduce latency to within spec. After briefly reviewing the kernel, I believe I have some ideas to further reduce serial RX latency. In newer kernels (since 3.12) the low_latency flag doesn't do anything in tty drivers anymore (it used to cause received data to be pushed out of the flip buffer immediately). Instead, processing of the flip buffer is placed into a workqueue, which my testing reveals adds about 200 microseconds on my system, per interrupt. My testing reveals this can be reduced to 8 microseconds by using a dedicated high-priority thread to process and flush the work queue. I believe this issue may also be further compounded by the 8250 driver's IRQ handling, which for some reason loops through and calls the handler for every port, thereby scheduling workqueue items for every other port ahead of the one which actually received the data.
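
To make the dedicated high-priority thread idea concrete, here is a minimal sketch (my illustration only, not an actual patch; the worker name and the empty work function are placeholders): route the flip-buffer work to a SCHED_FIFO kthread_worker instead of the shared system workqueue.

  #include <linux/err.h>
  #include <linux/kthread.h>
  #include <linux/module.h>
  #include <linux/sched.h>

  static struct kthread_worker *flip_worker;   /* placeholder names */
  static struct kthread_work flip_work;

  static void flip_work_fn(struct kthread_work *work)
  {
          /* A real driver would drain the tty flip buffer here, i.e. the
           * job flush_to_ldisc() normally does from the system workqueue. */
  }

  static int __init rt_flip_init(void)
  {
          flip_worker = kthread_create_worker(0, "tty-rt-flip");
          if (IS_ERR(flip_worker))
                  return PTR_ERR(flip_worker);

          sched_set_fifo(flip_worker->task);   /* make it a SCHED_FIFO task */
          kthread_init_work(&flip_work, flip_work_fn);

          /* An RX interrupt handler would then do
           *   kthread_queue_work(flip_worker, &flip_work);
           * instead of queueing work to the shared system workqueue. */
          return 0;
  }

  static void __exit rt_flip_exit(void)
  {
          kthread_destroy_worker(flip_worker);
  }

  module_init(rt_flip_init);
  module_exit(rt_flip_exit);
  MODULE_LICENSE("GPL");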

Again, I'm not proposing any changes yet, just looking to discuss these issues at the moment. I do understand you are busy with other work. I don't intend to take you away from your other priorities, but I certainly do appreciate your feedback! I do plan on digging into these issues in more depth and will post back if I find anything else. I plan on publishing some of these tests and results in another repo in the near future, but in the meantime I hope you enjoy your holidays!

@ppisa (Member) commented Aug 9, 2024

Thanks for the analysis, and big kudos for trying to get FIFO and timeout control proposed into the serial drivers. As for the timing, I would not strive to be so strict. The specification states:

Tools and tests shall check the TFrame_Maximum. Nodes shall not check this time. The receiving node of the frame shall accept the frame up to the next frame slot (i.e. next break field), even if it is longer than TFrame_Maximum.

which I consider too loose; from my former experience, the actual PEAK and other converters had some limit. But it is not so critical. The master side should be OK as it is implemented: Tx is attempted in one go, Rx is collected whole including the header, some delay on the master to process it is no problem, and there has to be a reserve. Problematic is the slave side. If we consider that the header is typically sent without gaps, that TFrame_Maximum is 4 * 1.4 * 10 = 56 bit times for a one-byte response, and that the actual data are sent in 3 * 10 bit times for the header and 1 * 10 bit times for the response, then there are 16 bit times left for the gap between header and response. It is strange that the specification does not specify a maximal time between header and response, which would be very helpful to detect a missing slave early, but it is not there.
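
Spelled out at 19200 bps (just the arithmetic of the simplified 4-byte frame model above, nothing more):

  #include <stdio.h>

  /* Gap budget between header and response in the simplified model:
   * TFrame_Maximum = 4 * 1.4 * 10 bit times, header uses 3*10 bits,
   * response uses 1*10 bits. */
  int main(void)
  {
      double bit_us = 1e6 / 19200.0;             /* one bit time, ~52 us */
      double frame_max = 4 * 1.4 * 10 * bit_us;  /* 56 bit times */
      double used = (3 * 10 + 1 * 10) * bit_us;  /* 40 bit times */

      printf("gap budget: %.0f bit times = %.0f us\n",
             (frame_max - used) / bit_us, frame_max - used); /* 16 ~ 833 us */
      printf("10 bit times (with reserve): %.0f us\n",
             10 * bit_us);                                   /* ~521 us */
      return 0;
  }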

So my wish for the API down to the drivers is simplified to the one from https://marc.info/?l=linux-serial&m=164259988122122&w=2:

  int (*rx_trigger)(struct uart_port *port, int mode, int *rx_trigger_bytes,
                    int *rx_trigger_idle_time);

Here rx_trigger_bytes equal to 0 would mean switching the FIFO off. The trigger idle time can be given in nanoseconds and would be recomputed to a number of characters for some, or even most, of the hardware.

The mode is one of:

  • UART_RX_TRIGGER_MODE_SET
  • UART_RX_TRIGGER_MODE_CHECK_ROUND_DOWN
  • UART_RX_TRIGGER_MODE_CHECK_ROUND_UP
  • UART_RX_TRIGGER_MODE_GET

When you attempt something unsupported with UART_RX_TRIGGER_MODE_SET, you receive an error, but you can ask the hardware to round your request in such a way that it will pass on the next attempt. If the argument is time (ns) based, then the rule is that the trigger resets on each baudrate change and has to be set again by the call.
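
A hypothetical caller could then look like the following sketch (nothing here exists in mainline; the op and the mode constants are only this proposal, and the numbers are the 19200 bps example from above):

  #include <linux/serial_core.h>

  /* Mode constants exist only in this proposal. */
  enum {
          UART_RX_TRIGGER_MODE_SET,
          UART_RX_TRIGGER_MODE_CHECK_ROUND_DOWN,
          UART_RX_TRIGGER_MODE_CHECK_ROUND_UP,
          UART_RX_TRIGGER_MODE_GET,
  };

  static int lin_slave_set_rx_trigger(struct uart_port *port)
  {
          int bytes = 1;          /* push each received byte immediately */
          int idle_ns = 521000;   /* ~one byte time at 19200 bps */
          int ret;

          /* First ask the hardware to round the request up to what it
           * actually supports. */
          ret = port->ops->rx_trigger(port,
                                      UART_RX_TRIGGER_MODE_CHECK_ROUND_UP,
                                      &bytes, &idle_ns);
          if (ret)
                  return ret;

          /* Then apply the possibly rounded values. */
          return port->ops->rx_trigger(port, UART_RX_TRIGGER_MODE_SET,
                                       &bytes, &idle_ns);
  }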

If we add some reserve and go down from the 16 bit times gap to a 10 bit times gap, then we are at that 500 usec for 19200. This is reasonable; I would not try hard to achieve something better.

Yes, it is right that fully preemptive kernels on x86 achieve great results in timing, around or even below 20 usec for the highest-priority task under load if you do not have broken SMI in the BIOS, so 30 kHz sampling from our Simulink target has been achievable on an x86 box. But that is the timing of pure tasks without any HW interaction. If you start to consider that a single I/O memory read or input instruction round trip is up to 2 usec on PCIe systems etc., and that there are DMA bus loads and contention, then it is really problematic to depend on faster timing. On moderate ARMs the kernel latency is even much worse, even though the actual I/O operations are faster. Yes, you can play with FIQs or raw IRQs directly, even on an RT fully-preemptive kernel, but that would be a nightmare to make portable.

We have no problems with 2 kHz with reserve even on Zynq-7000 and different Raspberry Pi systems. For kernel tasks it is better because there is no memory context switch to the high-priority task. We had success processing a 20 kHz software incremental encoder on an RT kernel on Raspberry Pi 2 years ago. But you cannot trust such a solution at this border case. So I would consider a response in 500 usec from the slave as a reasonable goal.

So I would suggest this goal. A byte-time-equivalent timeout is achievable on those UARTs which have the timeout configurable in byte times; there are some exceptions where shorter times can be specified, and an ns unit is future proof... Anyway, on most UARTs even the trigger level is configurable only in FIFO-size quarters, so the only chance for LIN is to switch the FIFO off completely. Some have an option for a one-character trigger level with the FIFO on.
