Queue Disciplines and Linux SocketCAN

The purpose of this topic is to discuss what to do about frame queuing on Linux for UAVCAN.

The current libuavcan implementation provides a single priority queue in user-space that sits in front of SocketCAN. This is an interesting choice, since most CAN devices appear to default to the pfifo_fast qdisc. In this configuration a frame passes first through the user-space queue, then through the unprioritized pfifo_fast queues*, and finally through any queues in the peripheral itself (which may also be unprioritized FIFO queues), and each stage can add significant latency. For broadcast frames the added latency is fairly linear but still significant, and it comes with priority inversions that are not supposed to happen with CAN (for service call latency, see the research on bufferbloat for a more nuanced evaluation).

Because of this problem, as I work on libuavcan v1 I am struggling to provide sane defaults for code that is supposed to be consistent between bare-metal and OS integrations. There are a few approaches I can see. Please provide feedback on these options or suggest other options I may not be considering:

1. Do everything in user-space. Require systems to configure themselves properly.

In this option we would provide priority queues as part of the common media layer implementation. Linux systems would need to set the CAN device’s qdisc to noqueue. All platforms would need to ensure that their CAN driver does not employ overly large queues, that these internal queues are prioritized, and that the queues are adequately sized to prevent buffer under-run.

2. Require the platform to be optimal.

In this option libuavcan would act naively, assuming that the system’s APIs provide optimal queueing behaviour. System integrators would need to understand how to implement or select proper queues, how to avoid the bufferbloat problem for CAN, and how to prevent buffer under-run at the peripheral level.

3. Provide software to do queuing in user-space but allow/require a system to configure this in as needed.

This is a hybrid of the two previous options: we implement a naive media layer but provide components and documentation for how to optimally assemble a system integration.
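To make the "priority queues in the media layer" idea a bit more concrete, here is a minimal Python sketch of a bounded transmit queue ordered by CAN ID. The class name `TxPriorityQueue`, its methods, and the drop-when-full policy are my own assumptions for illustration; this is not the libuavcan API.

```python
import heapq
import itertools


class TxPriorityQueue:
    """Bounded transmit queue ordered by CAN ID (lower ID = higher priority).

    Hypothetical helper for illustration only; not part of libuavcan.
    """

    def __init__(self, capacity):
        self._capacity = capacity
        self._heap = []
        # Monotonic counter used as a tie-breaker so that frames with the
        # same CAN ID leave the queue in the order they were pushed.
        self._counter = itertools.count()

    def push(self, can_id, payload):
        """Queue a frame; return False (frame rejected) if the queue is full."""
        if len(self._heap) >= self._capacity:
            return False
        heapq.heappush(self._heap, (can_id, next(self._counter), payload))
        return True

    def pop(self):
        """Return the most urgent (can_id, payload), or None if empty."""
        if not self._heap:
            return None
        can_id, _, payload = heapq.heappop(self._heap)
        return can_id, payload

    def __len__(self):
        return len(self._heap)
```

Rejecting new frames when the queue is full is only the simplest policy to show. A real integration might prefer to evict the lowest-priority frame instead, and would size `capacity` so the peripheral is never starved.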

* My assumption is that, because pfifo_fast chooses buckets based on TOS bits, its prioritization is either undefined or simply not functional when given CAN frames?
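The priority inversion described at the top of this post can be reproduced with a toy model: a user-space priority queue draining into an unprioritized FIFO of a given depth. This is a simplified simulation under my own assumptions (one frame reaches the bus per time step, the FIFO ignores priority entirely); the function name `transmit_order` is made up for the example.

```python
import heapq
from collections import deque


def transmit_order(arrivals, fifo_depth):
    """Return the order in which CAN IDs reach the bus.

    arrivals:   list of (time_step, can_id); lower can_id = higher priority.
    fifo_depth: number of frames the unprioritized downstream FIFO can hold.
    One frame leaves the FIFO for the bus per time step.
    """
    pending = sorted(arrivals)  # ordered by arrival time
    user_queue = []             # heap ordered by (can_id, arrival order)
    fifo = deque()              # stands in for an unprioritized kernel/driver queue
    sent = []
    seq = 0
    step = 0
    i = 0
    while i < len(pending) or user_queue or fifo:
        # 1. The application enqueues frames that arrive at this step.
        while i < len(pending) and pending[i][0] <= step:
            heapq.heappush(user_queue, (pending[i][1], seq))
            seq += 1
            i += 1
        # 2. The user-space queue drains into the FIFO while there is room.
        while user_queue and len(fifo) < fifo_depth:
            fifo.append(heapq.heappop(user_queue)[0])
        # 3. The bus takes one frame from the head of the FIFO.
        if fifo:
            sent.append(fifo.popleft())
        step += 1
    return sent


# Three frames are queued at step 0; an urgent frame (0x100) arrives at step 1.
frames = [(0, 0x300), (0, 0x200), (0, 0x400), (1, 0x100)]
shallow = transmit_order(frames, fifo_depth=1)
deep = transmit_order(frames, fifo_depth=8)
```

With `fifo_depth=1` the urgent frame is sent second, right after the frame already on its way. With `fifo_depth=8` it is sent last, because the lower-priority frames were already committed to the FIFO before it arrived.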

CoDel

An interesting aside: the bufferbloat community seems to be converging on CoDel (specifically fq_codel) as the state of the art for Linux queueing disciplines, and many Linux distros are changing to this as a default. This means a SocketCAN device can get CoDel as its default, which is a problem because CoDel drops some frames as a normal part of the algorithm’s operation. Most of the kernel experts suggest noqueue or pfifo_fast as the best default for CAN devices, but there are bound to be bugs where this is not applied and a system ends up dropping CAN frames in the kernel, severely degrading the expected performance of UAVCAN on that system. Because of this we should be proactive about discussing and documenting SocketCAN queueing for users of libuavcan.
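One way an integration could guard against this is to check which qdisc is actually attached to the CAN interface at startup. The sketch below shells out to `tc qdisc show dev <ifname>` and parses the root qdisc name. The function names, the `DROPPING_QDISCS` set, and the assumed line shape (`qdisc <name> <handle> root ...`) are my assumptions, so verify the output format against your iproute2 version.

```python
import subprocess

# Assumption: qdiscs we want to flag because they drop frames by design.
DROPPING_QDISCS = {"codel", "fq_codel"}


def parse_root_qdisc(tc_output):
    """Return the root qdisc name from `tc qdisc show dev <if>` output, or None."""
    for line in tc_output.splitlines():
        fields = line.split()
        # Assumed line shape: "qdisc <name> <handle> root ..."
        if len(fields) >= 4 and fields[0] == "qdisc" and fields[3] == "root":
            return fields[1]
    return None


def read_root_qdisc(ifname):
    """Run tc for the given interface and return its root qdisc name, or None."""
    result = subprocess.run(
        ["tc", "qdisc", "show", "dev", ifname],
        capture_output=True,
        text=True,
        check=True,
    )
    return parse_root_qdisc(result.stdout)


def drops_by_design(qdisc_name):
    """True if the qdisc is one we expect to drop frames during normal operation."""
    return qdisc_name in DROPPING_QDISCS
```

For example, `drops_by_design(read_root_qdisc("can0"))` could be used to log a warning before any frames are sent. Keeping the parsing separate from the `tc` call makes it possible to test the parser without a CAN device.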

References

Patch Series to make pfifo_fast the default for CAN
SocketCAN
pfifo_fast on tldp.org
bufferbloat.net
CoDel on bufferbloat.net
Kathleen Nichols and Van Jacobson, “Controlling Queue Delay”, 2012
M. Sojka, R. Lisový, P. Píša, “SocketCAN and queueing disciplines”, 2012
M. Sojka, P. Píša, “Timing Analysis of Linux CAN Drivers”

In the interest of portability we should probably minimize the number of non-trivial assumptions made about the underlying platform, so I would suggest option #2, “Require the platform to be optimal”.

Because we just ran into this, I thought I’d copy and paste the Linux commands for changing the qdisc:

 sudo tc qdisc replace dev can0 root noqueue
 sudo tc qdisc replace dev can0 root pfifo_fast

PyUAVCAN v0 crashes with noqueue, so you’ll want to use pfifo_fast. Libuavcan v1 should work with noqueue if you want to minimize latency at the risk of blocking tx calls.
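If blocking tx calls are a concern, one option is to use a non-blocking raw SocketCAN socket and treat "kernel cannot accept the frame right now" as a retry condition. This is a rough Python sketch, not how libuavcan or PyUAVCAN do it. The helper names are mine, and exactly which error you see (EAGAIN vs. ENOBUFS) is an assumption that likely depends on the driver and qdisc in use.

```python
import errno
import socket
import struct

# Layout of struct can_frame for classic CAN: can_id, can_dlc, 3 pad bytes, data[8].
CAN_FRAME_FMT = "=IB3x8s"


def pack_frame(can_id, data):
    """Pack a classic CAN frame (up to 8 data bytes) in SocketCAN layout.

    For 29-bit identifiers, OR the ID with socket.CAN_EFF_FLAG before calling.
    """
    if len(data) > 8:
        raise ValueError("classic CAN carries at most 8 data bytes")
    return struct.pack(CAN_FRAME_FMT, can_id, len(data), data.ljust(8, b"\x00"))


def open_can(ifname):
    """Open a non-blocking raw CAN socket bound to the given interface (Linux only)."""
    sock = socket.socket(socket.AF_CAN, socket.SOCK_RAW, socket.CAN_RAW)
    sock.bind((ifname,))
    sock.setblocking(False)
    return sock


def try_send(sock, frame):
    """Return True if the frame was handed to the kernel, False if it should be retried."""
    try:
        sock.send(frame)
        return True
    except BlockingIOError:
        # The socket would have blocked (EAGAIN/EWOULDBLOCK).
        return False
    except OSError as exc:
        if exc.errno == errno.ENOBUFS:
            # Assumption: the kernel reports a full tx path this way.
            return False
        raise
```

A caller would keep the frame in its own prioritized queue whenever `try_send` returns False, and retry once the socket becomes writable.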

I think I am occasionally observing frame reordering with vcan with qdisc=noqueue (using PyUAVCAN). My understanding was that noqueue cannot cause reordering since by its definition it does not queue frames, hence the problem is likely elsewhere. Are you aware of any other possible culprit?
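For anyone trying to reproduce this, a simple way to pin down reordering is to put an incrementing sequence number in each frame's payload on the sender and check the received sequence afterwards. The helper below is a hypothetical diagnostic, assuming a single sender and a single stream of frames; it does not say where the reordering comes from.

```python
def find_reordering(sequence_numbers):
    """Return (index, value, highest_seen) for every frame received out of order.

    sequence_numbers: sequence numbers in the order the frames were received.
    A frame counts as reordered if its number is lower than one already seen.
    """
    reordered = []
    highest = None
    for index, value in enumerate(sequence_numbers):
        if highest is not None and value < highest:
            reordered.append((index, value, highest))
        else:
            highest = value
    return reordered
```

An empty result means the frames arrived in the order they were sent. Lost frames show up as gaps in the sequence, not as reordering, so they are not reported here.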