Terminology
===========
 - Send/Receive Queues
    QP (Queue Pair): Combines RQ and SQ. Generally irrelevant for what follows.
    RQ (Receive Queue): Receive work requests (buffers for incoming data)
    SQ (Send Queue): Send work requests (data to be transmitted)
    CQ (Completion Queue): Completed operations are reported here
    EQ (Event Queue): Completions generate events (at a specified rate), which in turn generate IRQs
    WR/WQ (Work Request / Work Queue): Basically buffers (SG lists) that are either sent or used for data reception
    *QE (* Queue Element/Entry): A single entry in the corresponding queue, e.g. WQE, CQE, EQE

    Flow: WQE --submit work--> WQ --execute--> SQ/RQ --on completion--> CQ --signal--> EQ --> IRQ
    * Completion Event Moderation: Reduces the number of reported events (EQ)

 - Offloads
    RSS (Receive Side Scaling): Distributes load across CPU cores
    LRO (Large Receive Offload): Groups packets and delivers them up the stack as a single large packet
        [ 'ethtool -k <iface>' shows whether LRO is on or off; 'ethtool -K <iface> lro on|off' toggles it ]

 - Various
    AEV (Asynchronous Event): Errors, etc.
    SRQ (Shared Receive Queue):
    ICM (Interconnect Context Memory): Address Translation Tables, Control Objects, User Access Region (registers)
    MPT (Memory Protection Table):
    RMP (Receive Memory Pool):
    TIR (Transport Interface Receive):
    RQT (RQ Table):
    MCG (Multicast Group):

Driver
======
 - Network packets are streamed into ring buffers together with all their Ethernet, IP and UDP/TCP headers.
   The number of ring buffers depends on the VMA_RING_ALLOCATION parameter:
         0 - per network interface
         1 - per IP
      => 10 - per socket
        20 - per thread (the thread that created the socket)
        30 - per core
        31 - per core (with threads kept affine to cores)

 - Ring buffer memory is allocated according to VMA_MEM_ALLOC_TYPE:
      0 - malloc (very slow if large buffers are requested)
      1 - contiguous
   => 2 - HugePages

 - The number of receive buffers is controlled by VMA_RX_BUFS (this is the total across all rings, not a per-ring count)
   * Each buffer is VMA_MTU bytes
   * Recommended: VMA_RX_BUFS ~ #rings * VMA_RX_WRE (number of WREs allocated on all interfaces)

LibVMA
======
 There are three interfaces:
 - MP-RQ (Multi-Packet Receive Queue): vma_cyclic_buffer_read
   Useful for processing data streams where the packet size stays constant and the packet flow does not change
   drastically over time. Requires ConnectX-5 or newer.

   * Use 'vma_add_ring_profile' to configure the ring buffer (the profile specifies the buffer size and the packet size)
   * Set SO_VMA_RING_ALLOC_LOGIC per socket using setsockopt
   * Call 'vma_cyclic_buffer_read' to access the raw ring buffer; the call specifies the minimum and maximum number
     of packets to return

   * The returned 'completion' structure references the position in the ring buffer. Packets in the ring buffer
     include all headers (Ethernet: 14 bytes, IP: 20 bytes, UDP: 8 bytes).
   * Meanwhile, new packets are written to the remaining part of the ring buffer, up to the linear end of the
     buffer, so the returned data is not overwritten.
   * The buffer is rewound only on a call to 'vma_cyclic_buffer_read'. Fewer packets than the specified minimum
     can be returned if the write position is near the end of the buffer and there is not enough space left to
     fulfil the minimum requirement.

   * To leave enough space for follow-up packets, the buffer size must be coordinated with the min/max packet counts.
     It should never happen that space for only a few packets is left when the end of the buffer is close.

 - SocketXtreme: socketxtreme_poll
   A more complex interface that gives more control over processing, in particular of packets with varying size.
   Requires ConnectX-5 or newer.

   * Get the ring buffers associated with a socket with 'get_socket_rings_num' and 'get_socket_rings_fds'
   * Get ready completions on a given ring buffer with 'socketxtreme_poll' (pass an 'fd' returned by 'get_socket_rings_fds')
   * There are two types of completions: 'VMA_SOCKETXTREME_NEW_CONNECTION_ACCEPTED' and 'VMA_SOCKETXTREME_PACKET'.
   * For the second type, process the associated list of buffers and manage their reference counts with
     'socketxtreme_ref_vma_buf' and 'socketxtreme_free_vma_buf'.
   * Release received packets with 'socketxtreme_free_vma_packets'

 - Zero Copy: recvfrom_zcopy
   The simplest interface; it works with ConnectX-3 cards. Packets are still written to the ring buffers, but the
   data is not copied out of them: the interface returns pointers to the packet locations in the ring buffer.
   Compared to the MP-RQ approach, there is a slight overhead for preparing the list of packet pointers.
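The zero-copy idea behind 'recvfrom_zcopy' can be shown without VMA itself. In the sketch below, a Python 'bytearray' stands in for a ring buffer and 'memoryview' slices stand in for the packet pointers that the interface returns. This is a conceptual model only: 'write_packet' and 'packet_view' are hypothetical helpers and not part of the VMA API. The model also shows the cost of not copying. Once the ring reuses a slot, every outstanding pointer into that slot sees the new data, so the application has to be finished with a packet before its slot is recycled.

```python
def write_packet(ring: bytearray, offset: int, data: bytes) -> tuple:
    """Hypothetical 'NIC' side: place a packet in the ring, return (offset, length)."""
    end = offset + len(data)
    if end > len(ring):
        raise ValueError("packet does not fit before the end of the ring")
    ring[offset:end] = data
    return offset, len(data)


def packet_view(ring: bytearray, desc: tuple) -> memoryview:
    """Hypothetical zero-copy receive: a view into the ring; no bytes are copied."""
    offset, length = desc
    return memoryview(ring)[offset:offset + length]


ring = bytearray(2048)
desc = write_packet(ring, 0, b"payload-1")
view = packet_view(ring, desc)
print(bytes(view))                   # b'payload-1'
write_packet(ring, 0, b"PAYLOAD-2")  # the ring reuses the same slot...
print(bytes(view))                   # b'PAYLOAD-2' (the old "pointer" now sees new data)
```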
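Completion Event Moderation from the Terminology section can be modeled in a few lines. The sketch assumes two hypothetical knobs, a completion count and a time limit. These are similar in spirit to the 'rx-frames' / 'rx-usecs' coalescing parameters of 'ethtool -C'. The sketch models the batching idea only, not the actual ConnectX firmware logic. Fewer events means fewer IRQs, at the price of some added latency for the completions that wait in a batch.

```python
from typing import List, Optional


def moderated_events(arrivals_us: List[int], max_frames: int, max_usecs: int) -> List[int]:
    """Return the times (in microseconds) at which EQ events, and thus IRQs, fire.

    Hypothetical model: completions (CQEs) are batched, and one event fires when
    either `max_frames` completions are pending or `max_usecs` have elapsed since
    the oldest pending completion. With max_frames=1 every CQE raises an event.
    """
    events: List[int] = []
    pending = 0
    oldest: Optional[int] = None
    for t in sorted(arrivals_us):
        # A batch whose timer expired before this completion arrived fires first.
        if pending and t - oldest >= max_usecs:
            events.append(oldest + max_usecs)
            pending, oldest = 0, None
        if pending == 0:
            oldest = t
        pending += 1
        if pending >= max_frames:
            events.append(t)
            pending, oldest = 0, None
    if pending:  # the last partial batch fires when its timer expires
        events.append(oldest + max_usecs)
    return events


burst = list(range(64))                      # 64 completions, one per microsecond
print(len(moderated_events(burst, 1, 50)))   # 64: no moderation
print(len(moderated_events(burst, 16, 50)))  # 4: one event per 16 completions
```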
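The buffer sizing rule in the Driver section is plain arithmetic. The sketch below rests on several assumptions:
- The ring count is the number of interfaces, IPs, sockets, threads or cores implied by VMA_RING_ALLOCATION.
- Each buffer takes exactly VMA_MTU bytes (real buffers carry some extra overhead).
- With VMA_MEM_ALLOC_TYPE=2, huge pages are 2 MiB.

The function and the 'counts' keys are hypothetical. The example values are illustrative and are not VMA defaults.

```python
# Which count sets the number of rings for each VMA_RING_ALLOCATION value,
# following the list in the Driver section (hypothetical key names).
RINGS_PER = {0: "interfaces", 1: "ips", 10: "sockets", 20: "threads", 30: "cores", 31: "cores"}

HUGE_PAGE_BYTES = 2 * 1024 * 1024  # assumption: 2 MiB huge pages


def estimate_rx_memory(ring_allocation, counts, rx_wre, mtu):
    """Return (rings, rx_bufs, total_bytes, huge_pages) for the recommended sizing.

    rx_bufs follows VMA_RX_BUFS ~ #rings * VMA_RX_WRE, and every buffer is
    assumed to take exactly VMA_MTU bytes.
    """
    rings = counts[RINGS_PER[ring_allocation]]
    rx_bufs = rings * rx_wre
    total_bytes = rx_bufs * mtu
    huge_pages = -(-total_bytes // HUGE_PAGE_BYTES)  # integer ceiling division
    return rings, rx_bufs, total_bytes, huge_pages


# Illustrative values: per-socket rings (10) with 8 sockets, VMA_RX_WRE=4096, VMA_MTU=9000.
print(estimate_rx_memory(10, {"sockets": 8}, rx_wre=4096, mtu=9000))
# (8, 32768, 294912000, 141)
```

Per-socket or per-thread ring allocation multiplies the ring count quickly. That is why VMA_RX_BUFS, and with it the HugePages reservation, has to grow with the number of sockets or threads.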
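MP-RQ returns packets with all headers still in place, so the application has to skip them itself. Below is a sketch of that step, assuming the fixed layout listed in the MP-RQ notes: an untagged Ethernet II frame, IPv4 without options, and UDP, giving 14 + 20 + 8 = 42 header bytes. 'udp_payload' and 'build_frame' are hypothetical helpers. A real consumer would read frames from the ring buffer position referenced by the 'vma_cyclic_buffer_read' completion rather than build them.

```python
import struct

ETH_HLEN, IPV4_HLEN, UDP_HLEN = 14, 20, 8
PAYLOAD_OFFSET = ETH_HLEN + IPV4_HLEN + UDP_HLEN  # 42 header bytes per packet


def udp_payload(frame):
    """Return (src_port, dst_port, payload) of one raw Ethernet/IPv4/UDP frame.

    Assumes an untagged Ethernet II frame and a 20-byte IPv4 header (no options).
    """
    (ethertype,) = struct.unpack_from("!H", frame, 12)
    if ethertype != 0x0800:
        raise ValueError("not an IPv4 frame")
    if frame[ETH_HLEN] != 0x45:  # version 4, header length 5 * 4 = 20 bytes
        raise ValueError("IPv4 header with options is not handled")
    if frame[ETH_HLEN + 9] != 17:  # IPv4 protocol field, 17 = UDP
        raise ValueError("not a UDP packet")
    udp_off = ETH_HLEN + IPV4_HLEN
    src, dst, udp_len, _checksum = struct.unpack_from("!HHHH", frame, udp_off)
    # The UDP length covers header + payload, so Ethernet padding is excluded.
    return src, dst, bytes(frame[PAYLOAD_OFFSET:udp_off + udp_len])


def build_frame(src, dst, payload):
    """Build a minimal test frame (zeroed MACs/IPs/checksums, padded to 60 bytes)."""
    eth = bytes(12) + struct.pack("!H", 0x0800)
    ip = struct.pack("!BBHHHBBH4s4s", 0x45, 0, IPV4_HLEN + UDP_HLEN + len(payload),
                     0, 0, 64, 17, 0, bytes(4), bytes(4))
    udp = struct.pack("!HHHH", src, dst, UDP_HLEN + len(payload), 0) + payload
    frame = eth + ip + udp
    return frame + bytes(max(0, 60 - len(frame)))


print(udp_payload(build_frame(1234, 5678, b"hello")))  # (1234, 5678, b'hello')
```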
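The reference counting in the SocketXtreme notes can be pictured with a small model. A buffer handed to the application stays out of the ring until every reference to it is dropped. In this model, buffers that are never freed eventually starve the ring. The class and method names are hypothetical. 'ref' and 'free_buf' only mirror the roles of 'socketxtreme_ref_vma_buf' and 'socketxtreme_free_vma_buf' and say nothing about their actual signatures.

```python
class RingBufferPool:
    """Hypothetical model of the receive buffers owned by one ring.

    A buffer delivered to the application starts with one reference and returns
    to the ring's free list only when its last reference is dropped.
    """

    def __init__(self, num_buffers: int):
        self.free = list(range(num_buffers))  # buffer ids the NIC may fill
        self.refcount = {}                    # buffer id -> references held by the app

    def deliver(self) -> int:
        """NIC fills a free buffer and reports it in a packet completion."""
        if not self.free:
            raise RuntimeError("ring starved: every buffer is still referenced")
        buf = self.free.pop()
        self.refcount[buf] = 1
        return buf

    def ref(self, buf: int) -> None:
        """Role of socketxtreme_ref_vma_buf: keep the buffer alive longer."""
        self.refcount[buf] += 1

    def free_buf(self, buf: int) -> None:
        """Role of socketxtreme_free_vma_buf: drop one reference."""
        self.refcount[buf] -= 1
        if self.refcount[buf] == 0:
            del self.refcount[buf]
            self.free.append(buf)


pool = RingBufferPool(2)
a = pool.deliver()
pool.ref(a)            # e.g. a second consumer keeps the packet
pool.free_buf(a)       # the first consumer is done
print(a in pool.free)  # False: one reference is still held
pool.free_buf(a)
print(a in pool.free)  # True: the buffer is back in the ring
```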