Hi Team,
One of the customers using our solution runs a custom image based on Linux kernel 4.14.51 and CentOS 7.8. The customer is facing random traffic loss in production on netvsc interfaces (non-accelerated). P.S.: the deployment size is ~600 VM instances.
(1) On the problematic instances, 'rx_comp_busy' is always non-zero (=1) and 'tx_send_full' is high, as shown below:
-bash-4.2# ethtool -S eth2
NIC statistics:
tx_scattered: 0
tx_no_memory: 0
tx_no_space: 0
tx_too_big: 0
tx_busy: 0
tx_send_full: 75776 <<<<<<<<<<<<<
rx_comp_busy: 1 <<<<<<<<<<<<<<<<
vf_rx_packets: 0
vf_rx_bytes: 0
vf_tx_packets: 0
vf_tx_bytes: 0
vf_tx_dropped: 0
tx_queue_0_packets: 48323650
tx_queue_0_bytes: 9856533412
rx_queue_0_packets: 70704892
rx_queue_0_bytes: 6523868834
tx_queue_1_packets: 44242587
tx_queue_1_bytes: 9561505139
rx_queue_1_packets: 67683390
rx_queue_1_bytes: 6248204528
tx_queue_2_packets: 45780035
tx_queue_2_bytes: 10119440310
rx_queue_2_packets: 69738233
rx_queue_2_bytes: 6443619208
tx_queue_3_packets: 44413637
tx_queue_3_bytes: 9640385380
rx_queue_3_packets: 69258427
rx_queue_3_bytes: 6396199857
tx_queue_4_packets: 96161043
tx_queue_4_bytes: 43152567515
rx_queue_4_packets: 68506662
rx_queue_4_bytes: 6329763902
tx_queue_5_packets: 42685859
tx_queue_5_bytes: 9232930840
rx_queue_5_packets: 68869195
rx_queue_5_bytes: 6360734718
tx_queue_6_packets: 44105935
tx_queue_6_bytes: 9641517238
rx_queue_6_packets: 71297219
rx_queue_6_bytes: 6568436535
tx_queue_7_packets: 44680296
tx_queue_7_bytes: 9764630663
rx_queue_7_packets: 70747471
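For reference, here is a minimal sketch of how interfaces matching this signature can be flagged from their counters; the eth* glob and the tx_send_full threshold are illustrative assumptions, not values from our tooling:

```bash
#!/bin/bash
# Sketch: flag netvsc interfaces whose counters match the reported symptom.
# The eth* glob and the 1000 threshold are illustrative assumptions.
for dev in /sys/class/net/eth*; do
    iface=$(basename "$dev")
    busy=$(ethtool -S "$iface" 2>/dev/null | awk '/rx_comp_busy:/ {print $2}')
    full=$(ethtool -S "$iface" 2>/dev/null | awk '/tx_send_full:/ {print $2}')
    # Non-zero rx_comp_busy together with a large tx_send_full matches the symptom.
    if [ "${busy:-0}" -gt 0 ] && [ "${full:-0}" -gt 1000 ]; then
        echo "$iface: rx_comp_busy=$busy tx_send_full=$full"
    fi
done
```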
(2) We have rebuilt the kernel with the two patches below, as this symptom (NAPI gets disabled while the ring is temporarily busy) is similar to the issue mentioned in #36; a quick check for their presence follows the list:
(1) hv_netvsc: Fix napi reschedule while receive completion is busy
(2) hv_netvsc: fix race that may miss tx queue wakeup
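As a sanity check that the rebuilt kernel actually carries both fixes, something like the following can be used (a sketch, assuming the 4.14.51 source tree is managed with git and lives at the hypothetical path below; the grep patterns are the patch subjects quoted above):

```bash
# Sketch: confirm the two hv_netvsc fixes landed in the rebuilt source tree.
# KSRC is a hypothetical path; adjust to wherever the 4.14.51 tree lives.
KSRC=/usr/src/linux-4.14.51
git -C "$KSRC" log --oneline -- drivers/net/hyperv/ |
    grep -iE 'napi reschedule while receive completion is busy|miss tx queue wakeup'
```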
(3) With these patches there is some improvement, in the sense that fewer instances get into this problem, but the issue still persists (~5 out of ~200). On these bad instances the ethtool stats show very high 'rx_comp_busy' and 'tx_send_full' values, as shown below. I think the very high 'rx_comp_busy' is expected after these patches:
-bash-4.2# ethtool -S eth2
NIC statistics:
tx_scattered: 0
tx_no_memory: 0
tx_no_space: 0
tx_too_big: 0
tx_busy: 0
tx_send_full: 417979 <<<<<<<<<<<<<<<<<<<
rx_comp_busy: 36978379935 <<<<<<<<<<<<<< incrementing rapidly
vf_rx_packets: 0
vf_rx_bytes: 0
vf_tx_packets: 0
vf_tx_bytes: 0
vf_tx_dropped: 0
tx_queue_0_packets: 22487545
tx_queue_0_bytes: 4594218563
rx_queue_0_packets: 33816104
rx_queue_0_bytes: 3148800004
tx_queue_1_packets: 23095847
tx_queue_1_bytes: 4629433827
rx_queue_1_packets: 34169457
rx_queue_1_bytes: 3198473995
tx_queue_2_packets: 22235899
tx_queue_2_bytes: 4554101089
rx_queue_2_packets: 35447873
rx_queue_2_bytes: 3306351633
tx_queue_3_packets: 22655564
tx_queue_3_bytes: 4658776077
rx_queue_3_packets: 34320559
rx_queue_3_bytes: 3200636386
tx_queue_4_packets: 43152346
tx_queue_4_bytes: 17461777045
rx_queue_4_packets: 34941411
rx_queue_4_bytes: 3240195702
tx_queue_5_packets: 22992696
tx_queue_5_bytes: 4613837166
rx_queue_5_packets: 32975505
rx_queue_5_bytes: 3079512739
tx_queue_6_packets: 22535083
tx_queue_6_bytes: 4672503110
rx_queue_6_packets: 33796904
rx_queue_6_bytes: 3159691807
tx_queue_7_packets: 22452840
tx_queue_7_bytes: 4584966389
rx_queue_7_packets: 33860772
rx_queue_7_bytes: 3155304090
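To quantify how fast 'rx_comp_busy' is climbing on a bad instance, a small sampling loop like this works (a sketch; eth2 is the interface from the output above, and the 5-second interval is an arbitrary choice):

```bash
# Sketch: print the per-second growth rate of rx_comp_busy.
IFACE=eth2    # interface from the report; adjust as needed
INTERVAL=5    # sampling interval in seconds (arbitrary)
prev=$(ethtool -S "$IFACE" | awk '/rx_comp_busy:/ {print $2}')
while sleep "$INTERVAL"; do
    cur=$(ethtool -S "$IFACE" | awk '/rx_comp_busy:/ {print $2}')
    echo "rx_comp_busy rate: $(( (cur - prev) / INTERVAL ))/s"
    prev=$cur
done
```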
I would request the Azure team to provide a list of patches that we can try with the 4.14.51 kernel, as the LIS option is not applicable to us.
Please let me know if I can provide any additional details.