TSO GRO RSS Features
This guide documents the features available for Service Engine groups, such as TSO, GRO, RSS, and multiple dispatchers and queues.
TCP Segmentation Offload (TSO)
TCP segmentation offload is used to reduce the CPU overhead of TCP/IP on fast networks. A host with TSO-enabled hardware sends TCP data to the NIC (Network Interface Card) without segmenting the data in software. This type of offload relies on the NIC to segment the data and then add the TCP, IP, and data link layer headers to each segment.
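As an illustrative sketch (not Avi or NIC driver code), the segmentation work that TSO offloads to the NIC amounts to the following, assuming a hypothetical 1460-byte MSS:

```python
# Illustrative sketch of the work TSO delegates to the NIC: the host hands
# one large TCP payload to the hardware, and the NIC splits it into
# MSS-sized segments (adding TCP/IP/link headers to each, omitted here).

MSS = 1460  # illustrative MSS for a 1500-byte MTU


def tso_segment(payload: bytes, mss: int = MSS) -> list[bytes]:
    """Split a large payload into MSS-sized segments, as TSO hardware would."""
    return [payload[i:i + mss] for i in range(0, len(payload), mss)]


large_send = bytes(64 * 1024)       # one 64 KB buffer handed down in a single call
segments = tso_segment(large_send)
print(len(segments))                # 45 hardware segments instead of 45 software sends
```

Without TSO, the host CPU would perform this loop (plus header construction and checksumming) for every segment in software.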
TSO Support in Routing
With routing support enabled on the SE, the GRO (Generic Receive Offload) feature ordinarily cannot be utilised: routing is stateless, and the SE is not able to segment a large GRO-coalesced packet when the packets are not allowed to be IP fragmented. With this feature, GRO can be utilised for routed traffic as well, because the SE can segment the larger packets into smaller TCP segments, either through TSO if the interface supports it, or in the routing layer of the SE.
During the three-way handshake, the client and server each advertise their MSS, so that neither peer sends TCP segments larger than the MSS advertised by the other side. This feature is enabled by default.
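The resulting segment-size limit can be sketched as follows; the MTU values and fixed 20-byte header sizes are illustrative assumptions (no TCP/IP options):

```python
# Sketch of the effective segment limit after the three-way handshake
# (illustrative, not a stack implementation). Each peer derives its MSS
# from its interface MTU minus the IP and TCP header sizes.

def advertised_mss(mtu: int, ip_hdr: int = 20, tcp_hdr: int = 20) -> int:
    """MSS a peer would advertise, assuming option-less 20-byte headers."""
    return mtu - ip_hdr - tcp_hdr


client_mss = advertised_mss(1500)   # standard Ethernet MTU -> 1460
server_mss = advertised_mss(9000)   # jumbo-frame interface  -> 8960

# A sender never exceeds the MSS its peer advertised, so the smaller
# value bounds segment sizes on the constrained path.
print(min(client_mss, server_mss))  # 1460
```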
Generic Receive Offload (GRO)
Generic Receive Offload (GRO) is a software technique for increasing inbound throughput of high-bandwidth network connections by reducing CPU overhead. It works by aggregating multiple incoming packets from a single flow into a larger packet chain before they are passed higher up the networking stack, thus reducing the number of packets that have to be processed.
The benefits of GRO are only seen if multiple packets for the same flow are received in a short span of time. If the incoming packets belong to different flows, then the benefits of having GRO enabled might not be seen.
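A minimal sketch of the coalescing idea (not the actual kernel GRO implementation), assuming a synthetic receive burst of 1460-byte packets tagged with a flow ID:

```python
# Illustrative sketch of GRO-style coalescing: back-to-back packets of the
# same flow are merged into larger units before being handed up the stack,
# reducing the number of packets the upper layers must process.
from itertools import groupby

# Each packet: (flow_id, payload). A real flow key is the 5-tuple.
rx_burst = [("A", b"x" * 1460)] * 4 + [("B", b"y" * 1460)] + [("A", b"x" * 1460)] * 3

coalesced = [(flow, b"".join(payload for _, payload in pkts))
             for flow, pkts in groupby(rx_burst, key=lambda p: p[0])]

# 8 received packets become 3 units passed up the stack; the lone flow-B
# packet interrupting the flow-A run shows why interleaved flows coalesce poorly.
print([(flow, len(data)) for flow, data in coalesced])
```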
The dispatcher on Avi Vantage is responsible for fetching incoming packets from a NIC, sending them to the appropriate core for proxy work, and sending outgoing packets back to the NIC. A 40G NIC, or even a 10G NIC receiving traffic at a high packets-per-second (PPS) rate (for instance, small UDP packets), might not be processed efficiently by a single-core dispatcher. This problem can be solved by distributing traffic from a single physical NIC across multiple queues, where each queue is processed by a dispatcher on a different core. Receive Side Scaling (RSS) enables the use of multiple queues on a single physical NIC.
Receive Side Scaling (RSS)
When RSS is enabled on Avi Vantage, NICs make use of multiple queues in the receive path. The NIC pins flows to queues, so that all packets belonging to the same flow land in the same queue. This helps the driver spread packet processing across multiple CPUs, thereby improving efficiency. On an Avi SE, the multi-queue feature is also enabled on the transmit side, i.e., different flows are pinned to different queues (with packets of the same flow in the same queue) to distribute the packet processing among CPUs.
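A simplified sketch of flow-to-queue pinning; real NICs typically compute a Toeplitz hash over the flow's 5-tuple and consult an indirection table, while this example substitutes a CRC32 and an illustrative queue count:

```python
# Simplified sketch of RSS queue selection (not real NIC behavior, which
# typically uses a Toeplitz hash plus an indirection table). It illustrates
# the key property: a flow's 5-tuple deterministically maps to one queue.
import zlib

NUM_QUEUES = 4  # illustrative queue count


def rss_queue(src_ip: str, dst_ip: str, src_port: int, dst_port: int, proto: str) -> int:
    """Map a 5-tuple to a queue index via a stand-in hash."""
    key = f"{src_ip}:{src_port}-{dst_ip}:{dst_port}/{proto}".encode()
    return zlib.crc32(key) % NUM_QUEUES


# Every packet of the same flow lands in the same queue (and thus on the
# same dispatcher core); different flows spread across the queues.
q1 = rss_queue("10.0.0.1", "10.0.0.2", 40000, 443, "tcp")
q2 = rss_queue("10.0.0.1", "10.0.0.2", 40000, 443, "tcp")
assert q1 == q2
```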
Note: The multi-queue feature (RSS) is not supported along with IPv6 addresses. If RSS is enabled, an IPv6 address cannot be configured on Avi Service Engine interfaces. Similarly, if an IPv6 address is already configured on Avi Service Engine interfaces, the multi-queue feature (RSS) cannot be enabled on those interfaces.
Multiple Dispatcher and Queues per NIC
Depending on the traffic processed by the Avi Service Engine, the number of dispatcher cores can be configured as one or more. Systems with a high PPS load are configured with a larger number of dispatchers, whereas proxy-heavy loads, such as SSL workloads, may not need as many dispatchers.
In addition, the number of queues per NIC can be set for each dispatcher core for better performance. The Avi Service Engine tries to detect the best settings for each environment automatically.
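As a rough sketch of the idea (not Avi's actual auto-detection logic; the queue and core counts are hypothetical), spreading NIC queues across dispatcher cores could look like a simple round-robin assignment:

```python
# Illustrative sketch: distributing a NIC's receive queues across the
# configured dispatcher cores, round-robin. Not Avi's real placement logic.

def assign_queues(num_queues: int, dispatcher_cores: list[int]) -> dict[int, list[int]]:
    """Return a mapping of dispatcher core -> list of NIC queue indices."""
    mapping: dict[int, list[int]] = {core: [] for core in dispatcher_cores}
    for q in range(num_queues):
        mapping[dispatcher_cores[q % len(dispatcher_cores)]].append(q)
    return mapping


# 8 queues spread over 2 dispatcher cores: each core services 4 queues.
print(assign_queues(8, [0, 1]))  # {0: [0, 2, 4, 6], 1: [1, 3, 5, 7]}
```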
Service Engine Datapath Isolation Mode
Avi Service Engines can dedicate one or more cores to non-se-dp tasks. This configuration particularly helps when Service Engines host latency-sensitive applications. However, it carries a penalty on overall Service Engine performance, since one or more cores are taken away from se-dp work.
Hybrid RSS Mode
The SE hybrid RSS mode works only in DPDK mode with RSS configured. It allows each SE vCPU to function as an independent unit: every core handles both the dispatch and the proxy job, and cross-core punting of packets is disallowed. For example, on a 2-core Service Engine with the cores tagged as (dispatcher-0, proxy-0) and (dispatcher-1, proxy-1) on vCPU0 and vCPU1 respectively, any ingress flow on dispatcher-0 is egressed via proxy-0 and is not punted to proxy-1, and vice versa.
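The per-core pairing can be sketched as follows (illustrative only, not Avi code): the core selected by the dispatch hash also performs the proxy work, so no flow crosses cores:

```python
# Sketch of hybrid RSS placement on a 2-core SE (illustrative, not Avi
# code): each vCPU hosts both a dispatcher and a proxy, and a flow
# dispatched on core N is also proxied on core N, with no cross-core punting.

CORES = [0, 1]  # a hypothetical 2-core Service Engine


def hybrid_placement(flow_hash: int) -> tuple[str, str]:
    """Return the (dispatcher, proxy) pair handling a flow; always the same core."""
    core = flow_hash % len(CORES)
    return (f"dispatcher-{core}", f"proxy-{core}")


print(hybrid_placement(7))   # ('dispatcher-1', 'proxy-1')
print(hybrid_placement(10))  # ('dispatcher-0', 'proxy-0')
```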
Hybrid mode is exposed as a configurable property and aims at achieving higher performance on low-core SEs, especially 1-core and 2-core SEs on vCenter / NSX-T clouds.