Version: 21.05 (which is not available in the Bugzilla version field)

When testing with ConnectX-6 we recently became aware that a lot of missed packets did not show up in the statistics. After some further testing it looks like this is somehow related to using a large maximum packet size (which I believe makes the driver use a different scatter-gather receive path).

The following testpmd invocations can be used to demonstrate this behavior. First, an example of a "normal" run without the large maximum packet size:

./app/dpdk-testpmd -a c1:00.0 -a c1:00.1 -n 4 --legacy-mem -- --total-num-mbufs=2000000 --rx-offloads=0x2800 --mbuf-size=2331 --rxd=4096

The xstats for one port look like this:

rx_good_packets: 540138902
tx_good_packets: 537739805
rx_good_bytes: 298771912199
tx_good_bytes: 171950534158
rx_missed_errors: 572790
rx_errors: 0
tx_errors: 0
rx_mbuf_allocation_errors: 0
rx_q0_packets: 540138902
rx_q0_bytes: 298771912199
rx_q0_errors: 0
tx_q0_packets: 537739805
tx_q0_bytes: 171950534158
rx_wqe_errors: 0
rx_unicast_packets: 540711692
rx_unicast_bytes: 301254119575
tx_unicast_packets: 537739805
tx_unicast_bytes: 171950534158
rx_multicast_packets: 0
rx_multicast_bytes: 0
tx_multicast_packets: 0
tx_multicast_bytes: 0
rx_broadcast_packets: 0
rx_broadcast_bytes: 0
tx_broadcast_packets: 0
tx_broadcast_bytes: 0
tx_phy_packets: 537739805
rx_phy_packets: 540719221
rx_phy_crc_errors: 0
tx_phy_bytes: 174101493378
rx_phy_bytes: 301258662491
rx_phy_in_range_len_errors: 0
rx_phy_symbol_errors: 0
rx_phy_discard_packets: 7529
tx_phy_discard_packets: 0
tx_phy_errors: 0
rx_out_of_buffer: 0
tx_pp_missed_interrupt_errors: 0
tx_pp_rearm_queue_errors: 0
tx_pp_clock_queue_errors: 0
tx_pp_timestamp_past_errors: 0
tx_pp_timestamp_future_errors: 0
tx_pp_jitter: 0
tx_pp_wander: 0
tx_pp_sync_lost: 0

For this particular testcase the sum of rx_good_packets, rx_missed_errors and rx_phy_discard_packets always equals the expected total packet count of 540719221.
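The accounting check we applied to this run can be sketched in a few lines of Python; the values are copied from the xstats dump above:

```python
# Counter values from the first ("normal") run above.
rx_good_packets = 540138902
rx_missed_errors = 572790
rx_phy_discard_packets = 7529
rx_phy_packets = 540719221  # packets seen on the physical port

# Every packet seen on the wire is accounted for in this run:
accounted = rx_good_packets + rx_missed_errors + rx_phy_discard_packets
print("unaccounted packets:", rx_phy_packets - accounted)  # 0
```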
If however testpmd is invoked like this:

./app/dpdk-testpmd -a c1:00.0 -a c1:00.1 -n 4 --legacy-mem -- --total-num-mbufs=2000000 --max-pkt-len=15360 --rx-offloads=0x2800 --mbuf-size=2331 --rxd=4096

the xstats after the testcase run look like this:

rx_good_packets: 521670616
tx_good_packets: 522641593
rx_good_bytes: 288980135079
tx_good_bytes: 167591285708
rx_missed_errors: 879662
rx_errors: 0
tx_errors: 0
rx_mbuf_allocation_errors: 0
rx_q0_packets: 521670616
rx_q0_bytes: 288980135079
rx_q0_errors: 0
tx_q0_packets: 522641593
tx_q0_bytes: 167591285708
rx_wqe_errors: 0
rx_unicast_packets: 522550278
rx_unicast_bytes: 291559156800
tx_unicast_packets: 522641593
tx_unicast_bytes: 167591285708
rx_multicast_packets: 0
rx_multicast_bytes: 0
tx_multicast_packets: 0
tx_multicast_bytes: 0
rx_broadcast_packets: 0
rx_broadcast_bytes: 0
tx_broadcast_packets: 0
tx_broadcast_bytes: 0
tx_phy_packets: 522641593
rx_phy_packets: 540719221
rx_phy_crc_errors: 0
tx_phy_bytes: 169681852080
rx_phy_bytes: 301258662491
rx_phy_in_range_len_errors: 0
rx_phy_symbol_errors: 0
rx_phy_discard_packets: 30665
tx_phy_discard_packets: 0
tx_phy_errors: 0
rx_out_of_buffer: 0
tx_pp_missed_interrupt_errors: 0
tx_pp_rearm_queue_errors: 0
tx_pp_clock_queue_errors: 0
tx_pp_timestamp_past_errors: 0
tx_pp_timestamp_future_errors: 0
tx_pp_jitter: 0
tx_pp_wander: 0
tx_pp_sync_lost: 0

The rx_good_packets, rx_missed_errors and rx_phy_discard_packets counters never sum up to the expected packet count:

521670616 + 879662 + 30665 = 522580943
540719221 - 522580943 = 18138278 (packets not accounted for)
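The same check applied to this second run shows the shortfall; again the values are copied from the dump above:

```python
# Counter values from the run with --max-pkt-len=15360 above.
rx_good_packets = 521670616
rx_missed_errors = 879662
rx_phy_discard_packets = 30665
rx_phy_packets = 540719221  # packets seen on the physical port

accounted = rx_good_packets + rx_missed_errors + rx_phy_discard_packets
missing = rx_phy_packets - accounted
print("accounted:", accounted)  # 522580943
print("missing:", missing)      # 18138278
```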
I had to modify this ticket: after some further testing it looks like the difference between the two tests in my previous comment came only from the different performance characteristics of the two testpmd configurations. If we increase the amount of traffic, the NIC always drops packets that are not accounted for in the statistics, even with a testpmd invocation as plain as:

./app/dpdk-testpmd -a c1:00.0 -a c1:00.1 -n 4 --legacy-mem -- --total-num-mbufs=2000000

Under high load a large number of packets is missing from the statistics. In one instance where 2705797321 packets were generated, 808668458 packets were missing from the statistics. The ConnectX-6 is running the latest firmware, version 22.30.1004.
We made another interesting discovery: if you list the interface statistics using ethtool (e.g. `ethtool -S eth2`), there is a counter called rx_prio0_buf_discard, and this counter sums up exactly the number of packets we are missing. However, this counter does not seem to reflect the individual interface but rather the whole card: it accumulates the missing packets of both interfaces, and its value is identical for (in our case) eth2 and eth3.

What does the counter rx_prio0_buf_discard mean, and how can those misses be avoided?

Why is this not reflected in the DPDK statistics?
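For reference, this is roughly how we cross-check the ethtool counter against the DPDK numbers; a minimal Python sketch (the sample text below is illustrative, only the counter name is taken from the real `ethtool -S` output):

```python
import re

def parse_ethtool_stats(text):
    """Parse `ethtool -S <iface>`-style output into a {counter: value} dict."""
    stats = {}
    for line in text.splitlines():
        m = re.match(r"\s*([A-Za-z0-9_]+):\s*(\d+)\s*$", line)
        if m:
            stats[m.group(1)] = int(m.group(2))
    return stats

# Illustrative sample; on a real system the text would come from
# running `ethtool -S eth2` (e.g. via subprocess).
sample = """NIC statistics:
     rx_packets: 540719221
     rx_prio0_buf_discard: 18138278
"""
stats = parse_ethtool_stats(sample)
print(stats["rx_prio0_buf_discard"])  # 18138278
```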
Hi Martin,

I was trying to replicate this issue on my side, but I was unable to do it. Do you still experience this issue? If so, could you please provide more details regarding:
- DPDK version (if you moved to a version newer than 21.05 and the issue still occurs),
- OFED version,
- kernel command line settings,
- specific NIC model (the output of the mlxfwmanager command should be enough),
- traffic specifics (throughput, packet structure, etc.).

> What does the counter rx_prio0_buf_discard mean and how can those misses be
> avoided?
>
> Why is this not reflected in the DPDK statistics?

I need to verify the details about this counter, but let me get back to you later with more information.
I've observed this issue on a new ConnectX-6 DX recently installed on our server:

Image type:            FS4
FW Version:            22.32.2004
FW Release Date:       13.1.2022
Product Version:       22.32.2004
Rom Info:              type=UEFI version=14.25.18 cpu=AMD64,AARCH64
                       type=PXE version=3.6.502 cpu=AMD64
Description:           UID                GuidsNumber
Base GUID:             08c0eb0300a5614a   4
Base MAC:              08c0eba5614a       4
Image VSD:             N/A
Device VSD:            N/A
PSID:                  MT_0000000359
Security Attributes:   N/A

DEVICE_TYPE         MST                         PCI      RDMA    NET             NUMA
ConnectX6DX(rev:0)  /dev/mst/mt4125_pciconf0.1  81:00.1  mlx5_1  net-enp129s0f1  -1
ConnectX6DX(rev:0)  /dev/mst/mt4125_pciconf0    81:00.0  mlx5_0  net-enp129s0f0  -1

I'm observing packets being dropped, and the only counter increasing is rx_prio0_buf_discard, which is visible in ethtool but not in any DPDK counter. DPDK reports no errors, no missed packets, no mbuf allocation failures, nor any other error, but packets are lost. The only way to tell in DPDK that something was lost is to compare rx_good_packets and rx_phy_packets (which are not equal); the error/miss counters are all 0. It would be good to have error/miss counters that reflect these drops.
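The comparison described above can be expressed as a small helper; a sketch over a dict of xstats values, using the counter names reported by the mlx5 PMD (the helper name itself is made up for illustration):

```python
def unreported_rx_drops(xstats):
    """Packets seen on the physical port that appear in no good/miss/discard
    counter, i.e. drops that DPDK does not report anywhere."""
    accounted = (xstats.get("rx_good_packets", 0)
                 + xstats.get("rx_missed_errors", 0)
                 + xstats.get("rx_phy_discard_packets", 0))
    return xstats.get("rx_phy_packets", 0) - accounted

# With all error/miss counters at 0, only the phy/good gap reveals the loss:
print(unreported_rx_drops({"rx_good_packets": 1000,
                           "rx_phy_packets": 1200}))  # 200
```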
Hello Filip and Martin,

In DPDK 22.07 we implemented the following:
http://patches.dpdk.org/project/dpdk/patch/20220526024941.1296966-1-rongweil@nvidia.com/

This adds two kinds of Rx drop counters to the DPDK xstats, both with physical port scope:

1. rx_prio[0-7]_buf_discard: the number of unicast packets dropped due to lack of shared buffer resources.
2. rx_prio[0-7]_cong_discard: the number of packets dropped by the Weighted Random Early Detection (WRED) function.

Prio[0-7] is determined by the VLAN PCP value, which is 0 by default. Both counters are retrieved from the kernel ethtool API, which finally calls a PRM command.
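With those counters in place, the per-priority discards can be picked out of an application's xstats by name; a small sketch, with the counter name pattern taken from the patch description above (the sample values are illustrative):

```python
import re

# Matches the counters added in DPDK 22.07, e.g. rx_prio0_buf_discard.
PRIO_DISCARD = re.compile(r"^rx_prio[0-7]_(buf|cong)_discard$")

def prio_discards(xstats):
    """Return only the physical-port priority discard counters."""
    return {k: v for k, v in xstats.items() if PRIO_DISCARD.match(k)}

sample = {
    "rx_good_packets": 521670616,
    "rx_prio0_buf_discard": 18138278,  # lack of shared buffer resources
    "rx_prio0_cong_discard": 0,        # WRED drops
}
print(prio_discards(sample))
```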