SlideShare uma empresa Scribd logo
1 de 34
Baixar para ler offline
Receive side scaling (RSS) with
eBPF in QEMU and virtio-net
Yan Vugenfirer - CEO, Daynix
Agenda
• What is RSS?


• History: RSS and virtio-net


• What is eBPF?


• Using eBPF for packet steering (RSS) in
virtio-net
What is RSS?
• Receive side scaling - distribution of packets’ processing
among CPUs


• A NIC uses a hashing function to compute a hash value
over a defined area


• Hash value is used to index an indirection table


• The values in the indirection table are used to assign the
received data to a CPU


• With MSI support, a NIC can also interrupt the associated
CPU
CPU1 CPU2 CPU3
CPU0
What is RSS?
NIC
RX	handling RX	handling RX	handling RX	handling
NIC	Driver
HW	
queue
HW	
queue
HW	
queue
HW	
queue
Packet	classification
What is RSS?
Packet	
header
Hash	
function	
(Toeplitz)
QueueCPU	
number
Redirection	table	(up	to	128	entries)
History: RSS and virtio-net
• Let’s use RSS with virtio-net!


• Utilisation CPUs for packet processing


• Cash locality for network applications


• Microsoft WHQL requirement for high speed
devices
History: RSS and virtio-net
• No multi-queue in virtio


• SW implementation in Windows guest driver (similar to RFS in Linux)
CPU1 CPU2 CPU3
CPU0
virtio-net
RX	
handling
RX	
handling
RX	
handling
NetKVM	
RX	virt-queue
Packet	classification
RX	handling
History: RSS and virtio-net
• virtio-net became multi queue device


• Due to Windows requirements - hybrid
model. Interrupt received on specific CPU
core, but could be rescheduled to another


• Works good for TCP


• Might not work for UDP


• Support legacy interrupts for old OSes
History: RSS and virtio-net
CPU1
CPU0
virtio-net
NetKVM	
RX	virt-queue
Packet	classification
RX	handling
RX	virt-queue
Packet	classification
RX	handling
History: RSS and virtio-net
• virtio spec changes


• Set steering mode


• Pass the device redirection tables


• Set hash value in virtio-net-hdr


• No inter-processor interrupts due to re-scheduling


• Vision: HW will do all the heavy work


• Implementations


• SW only POC in QEMU


• eBPF
virtio spec changes - capabilities
• VIRTIO_NET_F_RSS


• VIRTIO_NET_F_MQ must be set
virtio spec changes - device configuration
• virtio_net_config
struct virtio_net_con
fi
g
{

u8 mac[6]
;

le16 status
;

le16 max_virtqueue_pairs
;

le16 mtu
;

le32 speed
;

u8 duplex
;

u8 rss_max_key_size
;

le16 rss_max_indirection_table_length
;

le32 supported_hash_types
;

};
virtio spec changes - setting RSS parameters
• VIRTIO_NET_CTRL_MQ_RSS_CONFIG
struct virtio_net_rss_con
fi
g
{

le32 hash_types
;

le16 indirection_table_mask
;

le16 unclassi
fi
ed_queue
;

le16 indirection_table[indirection_table_length]
;

le16 max_tx_vq
;

u8 hash_key_length
;

u8 hash_key_data[hash_key_length]
;

};
virtio spec changes - virtio-net-hdr
struct virtio_net_hdr
{

u8
fl
ags
;

u8 gso_type
;

le16 hdr_len
;

le16 gso_size
;

le16 csum_start
;

le16 csum_offset
;

le16 num_buffers
;

le32 hash_value; (Only if
VIRTIO_NET_F_HASH_REPORT negotiated
)

le16 hash_report; (Only if
VIRTIO_NET_F_HASH_REPORT negotiated
)

le16 padding_reserved; (Only if
VIRTIO_NET_F_HASH_REPORT negotiated
)

};
What is eBPF?
• Enable running sandboxed code in Linux
kernel


• The code can be loaded at run time


• Used for security, tracing, networking,
observability
How can eBPF help us?
• Calculate the RSS hash and return the
queue index for incoming packets


• Populate the hash value in virtio_net_hdr
(work in progress)
The “magic”
• Loading eBPF program using IOCTL
TUNSETSTEERINGEBPF


• tun_struct has steering_prog field


• If eBPF program for steering is loaded,
tun_select_queue will call it with
bpf_prog_run_clear_cb
Hash population (work in progress)
• Population from eBPF program


• virtio_net_hdr with additional fields


• Work in progress in kernel


• Enlarge virtio_net_hdr in all kernel
modules


• Keep calculated hash in SKB and copy
it to virtio_net_hdr
eBPF program source in QEMU
• tun_rss_steering_prog


• tools/ebpf/rss.bpf.c


• Use clang to compile


• tools/ebpf/Makefile.ebpf
eBPF program skeleton
• During QEMU compilation include file is populated
with the compiled binary


• bpftool gen skeleton rss.bpf.o > rss.bpf.skeleton.h


• Helpers to initialise maps (mechanism to share data
between eBPF program and kerneluserspace)


• Some changes to support libvirt - mmapping the
shared data structure to user space (3 maps in
current main branch without mmaping, 1 map in
pending patches)
Configuration map
• The configuration map is BPF array map that
contains everything required for RSS:


• Supported hash flows: IPv4, TCPv4, UDPv4,
IPv6, IPv6ex, TCPv6, UDPv6


• Indirections table size (max 128)


• Default queue


• Toeplitz hash key - 40 bytes


• Indirections table - 128 entries
Loading eBPF program
• Two mechanisms


• QEMU using function in skeleton file.
Calling bpf syscall


• eBPF helper program (with libvirt) -
QEMU gets file descriptors from libvirt
with already loaded ebpf program and
mapping of the ebpf map (patches under
review)
Loading eBPF program
• Possible load failures


• Kernel support. Current solution requires 5.8+


• Without helper


• QEMU process capabilities: CAP_BPF,
CAP_NET_ADMIN


• sysctl kernel.unprivileged_bpf_disabled=1


• libbpf not present


• In case of helper usage - mismatch between helper and
QEMU


• Stamp is a hash of skeleton include file
Fallback
• Built it QEMU RSS steering


• Can be triggered also by live migration


• Hash population is enabled in QEMU
command line, because there is still not
hash population from eBPF program
Live migration
• Known issue: migrating to old kernel


• eBPF load failure


• Fallback to in-QEMU RSS steering
QEMU command line
• Multi-queue should be enabled


• -smp with vCPU for each queue-pair


• -device virtio-net-pci,
rss=on,hash=on,ebpf_rss_fds=<fd0,fd1>
QEMU command line
• rss=on


• Try to load eBPF from skeleton or by using
provided file descriptors


• Fallback to “built-in” RSS steering in QEMU if
cannot load eBPF program


• hash=on


• Populate hash in virtio_net_hdr


• ebpf_rss_fds - optional, provide file descriptors for
eBPF program and map
libvirt integration
• QEMU should run with least possible privileges


• eBPF helper


• Stamping the helper during compilation time


• Redirection table mapping


• Additional command line options to provide file
descriptors to QEMU


• Patches under review
Current status
• Initial support was merged to QEMU


• libvirt integration patches in QEMU and
libvirt are under discussion on mailing lists


• Hash population by eBPF program -
pending additional work for next set of
patches
Pending patches
• QEMU libvirt integration: https://lists.nongnu.org/
archive/html/qemu-devel/2021-07/msg03535.html


• libvirt patches: https://listman.redhat.com/archives/
libvir-list/2021-July/msg00836.html


• RSS support in Linux virtio-net driver: https://
lists.linuxfoundation.org/pipermail/virtualization/
2021-August/055940.html


• In kernel hash calculation reporting to guest driver:
https://lkml.org/lkml/2021/1/12/1329
virtio-net and eBPF future
• Packet filtering with vhost


• Security?
yan@daynix.com
Links
• https://www.kernel.org/doc/Documentation/networking/scaling.txt


• https://docs.microsoft.com/en-us/windows-hardware/drivers/network/
introduction-to-receive-side-scaling


• https://ebpf.io


• https://docs.microsoft.com/en-us/windows-hardware/drivers/network/rss-
with-a-single-hardware-receive-queue


• https://docs.microsoft.com/en-us/windows-hardware/drivers/network/rss-
with-hardware-queuing


• https://docs.microsoft.com/en-us/windows-hardware/drivers/network/rss-
with-message-signaled-interrupts


• https://docs.microsoft.com/en-us/windows-hardware/drivers/network/rss-
hashing-functions

Mais conteúdo relacionado

Mais procurados

Open vSwitch Offload: Conntrack and the Upstream Kernel
Open vSwitch Offload: Conntrack and the Upstream KernelOpen vSwitch Offload: Conntrack and the Upstream Kernel
Open vSwitch Offload: Conntrack and the Upstream KernelNetronome
 
LinuxCon 2015 Linux Kernel Networking Walkthrough
LinuxCon 2015 Linux Kernel Networking WalkthroughLinuxCon 2015 Linux Kernel Networking Walkthrough
LinuxCon 2015 Linux Kernel Networking WalkthroughThomas Graf
 
DPDK & Layer 4 Packet Processing
DPDK & Layer 4 Packet ProcessingDPDK & Layer 4 Packet Processing
DPDK & Layer 4 Packet ProcessingMichelle Holley
 
Linux Profiling at Netflix
Linux Profiling at NetflixLinux Profiling at Netflix
Linux Profiling at NetflixBrendan Gregg
 
Intel DPDK Step by Step instructions
Intel DPDK Step by Step instructionsIntel DPDK Step by Step instructions
Intel DPDK Step by Step instructionsHisaki Ohara
 
Kdump and the kernel crash dump analysis
Kdump and the kernel crash dump analysisKdump and the kernel crash dump analysis
Kdump and the kernel crash dump analysisBuland Singh
 
20150511 jun lee_openstack neutron 분석 (최종)
20150511 jun lee_openstack neutron 분석 (최종)20150511 jun lee_openstack neutron 분석 (최종)
20150511 jun lee_openstack neutron 분석 (최종)rootfs32
 
Reliability, Availability, and Serviceability (RAS) on ARM64 status - SAN19-118
Reliability, Availability, and Serviceability (RAS) on ARM64 status - SAN19-118Reliability, Availability, and Serviceability (RAS) on ARM64 status - SAN19-118
Reliability, Availability, and Serviceability (RAS) on ARM64 status - SAN19-118Wei Fu
 
The Basic Introduction of Open vSwitch
The Basic Introduction of Open vSwitchThe Basic Introduction of Open vSwitch
The Basic Introduction of Open vSwitchTe-Yen Liu
 
Linux Networking Explained
Linux Networking ExplainedLinux Networking Explained
Linux Networking ExplainedThomas Graf
 
Introduction to eBPF and XDP
Introduction to eBPF and XDPIntroduction to eBPF and XDP
Introduction to eBPF and XDPlcplcp1
 
Dataplane programming with eBPF: architecture and tools
Dataplane programming with eBPF: architecture and toolsDataplane programming with eBPF: architecture and tools
Dataplane programming with eBPF: architecture and toolsStefano Salsano
 
Accelerating Envoy and Istio with Cilium and the Linux Kernel
Accelerating Envoy and Istio with Cilium and the Linux KernelAccelerating Envoy and Istio with Cilium and the Linux Kernel
Accelerating Envoy and Istio with Cilium and the Linux KernelThomas Graf
 
Using eBPF for High-Performance Networking in Cilium
Using eBPF for High-Performance Networking in CiliumUsing eBPF for High-Performance Networking in Cilium
Using eBPF for High-Performance Networking in CiliumScyllaDB
 
The e820 trap of Linux kernel hibernation
The e820 trap of Linux kernel hibernationThe e820 trap of Linux kernel hibernation
The e820 trap of Linux kernel hibernationjoeylikernel
 
Achieving the ultimate performance with KVM
Achieving the ultimate performance with KVM Achieving the ultimate performance with KVM
Achieving the ultimate performance with KVM ShapeBlue
 
BPF / XDP 8월 세미나 KossLab
BPF / XDP 8월 세미나 KossLabBPF / XDP 8월 세미나 KossLab
BPF / XDP 8월 세미나 KossLabTaeung Song
 

Mais procurados (20)

Linux Network Stack
Linux Network StackLinux Network Stack
Linux Network Stack
 
Open vSwitch Offload: Conntrack and the Upstream Kernel
Open vSwitch Offload: Conntrack and the Upstream KernelOpen vSwitch Offload: Conntrack and the Upstream Kernel
Open vSwitch Offload: Conntrack and the Upstream Kernel
 
LinuxCon 2015 Linux Kernel Networking Walkthrough
LinuxCon 2015 Linux Kernel Networking WalkthroughLinuxCon 2015 Linux Kernel Networking Walkthrough
LinuxCon 2015 Linux Kernel Networking Walkthrough
 
DPDK & Layer 4 Packet Processing
DPDK & Layer 4 Packet ProcessingDPDK & Layer 4 Packet Processing
DPDK & Layer 4 Packet Processing
 
Linux Profiling at Netflix
Linux Profiling at NetflixLinux Profiling at Netflix
Linux Profiling at Netflix
 
Intel DPDK Step by Step instructions
Intel DPDK Step by Step instructionsIntel DPDK Step by Step instructions
Intel DPDK Step by Step instructions
 
Kdump and the kernel crash dump analysis
Kdump and the kernel crash dump analysisKdump and the kernel crash dump analysis
Kdump and the kernel crash dump analysis
 
20150511 jun lee_openstack neutron 분석 (최종)
20150511 jun lee_openstack neutron 분석 (최종)20150511 jun lee_openstack neutron 분석 (최종)
20150511 jun lee_openstack neutron 분석 (최종)
 
Reliability, Availability, and Serviceability (RAS) on ARM64 status - SAN19-118
Reliability, Availability, and Serviceability (RAS) on ARM64 status - SAN19-118Reliability, Availability, and Serviceability (RAS) on ARM64 status - SAN19-118
Reliability, Availability, and Serviceability (RAS) on ARM64 status - SAN19-118
 
Introduction to DPDK RIB library
Introduction to DPDK RIB libraryIntroduction to DPDK RIB library
Introduction to DPDK RIB library
 
The Basic Introduction of Open vSwitch
The Basic Introduction of Open vSwitchThe Basic Introduction of Open vSwitch
The Basic Introduction of Open vSwitch
 
Linux Networking Explained
Linux Networking ExplainedLinux Networking Explained
Linux Networking Explained
 
Introduction to eBPF and XDP
Introduction to eBPF and XDPIntroduction to eBPF and XDP
Introduction to eBPF and XDP
 
Dataplane programming with eBPF: architecture and tools
Dataplane programming with eBPF: architecture and toolsDataplane programming with eBPF: architecture and tools
Dataplane programming with eBPF: architecture and tools
 
Understanding DPDK
Understanding DPDKUnderstanding DPDK
Understanding DPDK
 
Accelerating Envoy and Istio with Cilium and the Linux Kernel
Accelerating Envoy and Istio with Cilium and the Linux KernelAccelerating Envoy and Istio with Cilium and the Linux Kernel
Accelerating Envoy and Istio with Cilium and the Linux Kernel
 
Using eBPF for High-Performance Networking in Cilium
Using eBPF for High-Performance Networking in CiliumUsing eBPF for High-Performance Networking in Cilium
Using eBPF for High-Performance Networking in Cilium
 
The e820 trap of Linux kernel hibernation
The e820 trap of Linux kernel hibernationThe e820 trap of Linux kernel hibernation
The e820 trap of Linux kernel hibernation
 
Achieving the ultimate performance with KVM
Achieving the ultimate performance with KVM Achieving the ultimate performance with KVM
Achieving the ultimate performance with KVM
 
BPF / XDP 8월 세미나 KossLab
BPF / XDP 8월 세미나 KossLabBPF / XDP 8월 세미나 KossLab
BPF / XDP 8월 세미나 KossLab
 

Semelhante a Receive side scaling (RSS) with eBPF in QEMU and virtio-net

Fastsocket Linxiaofeng
Fastsocket LinxiaofengFastsocket Linxiaofeng
Fastsocket LinxiaofengMichael Zhang
 
Install FD.IO VPP On Intel(r) Architecture & Test with Trex*
Install FD.IO VPP On Intel(r) Architecture & Test with Trex*Install FD.IO VPP On Intel(r) Architecture & Test with Trex*
Install FD.IO VPP On Intel(r) Architecture & Test with Trex*Michelle Holley
 
BPF - in-kernel virtual machine
BPF - in-kernel virtual machineBPF - in-kernel virtual machine
BPF - in-kernel virtual machineAlexei Starovoitov
 
Project Slides for Website 2020-22.pptx
Project Slides for Website 2020-22.pptxProject Slides for Website 2020-22.pptx
Project Slides for Website 2020-22.pptxAkshitAgiwal1
 
CETH for XDP [Linux Meetup Santa Clara | July 2016]
CETH for XDP [Linux Meetup Santa Clara | July 2016] CETH for XDP [Linux Meetup Santa Clara | July 2016]
CETH for XDP [Linux Meetup Santa Clara | July 2016] IO Visor Project
 
VMworld 2016: vSphere 6.x Host Resource Deep Dive
VMworld 2016: vSphere 6.x Host Resource Deep DiveVMworld 2016: vSphere 6.x Host Resource Deep Dive
VMworld 2016: vSphere 6.x Host Resource Deep DiveVMworld
 
Assisting User’s Transition to Titan’s Accelerated Architecture
Assisting User’s Transition to Titan’s Accelerated ArchitectureAssisting User’s Transition to Titan’s Accelerated Architecture
Assisting User’s Transition to Titan’s Accelerated Architectureinside-BigData.com
 
BKK16-103 OpenCSD - Open for Business!
BKK16-103 OpenCSD - Open for Business!BKK16-103 OpenCSD - Open for Business!
BKK16-103 OpenCSD - Open for Business!Linaro
 
Load Balancing
Load BalancingLoad Balancing
Load Balancingoptalink
 
Spy hard, challenges of 100G deep packet inspection on x86 platform
Spy hard, challenges of 100G deep packet inspection on x86 platformSpy hard, challenges of 100G deep packet inspection on x86 platform
Spy hard, challenges of 100G deep packet inspection on x86 platformRedge Technologies
 
Scaling ingest pipelines with high performance computing principles - Rajiv K...
Scaling ingest pipelines with high performance computing principles - Rajiv K...Scaling ingest pipelines with high performance computing principles - Rajiv K...
Scaling ingest pipelines with high performance computing principles - Rajiv K...SignalFx
 
Introduction to ARM big.LITTLE technology
Introduction to ARM big.LITTLE technologyIntroduction to ARM big.LITTLE technology
Introduction to ARM big.LITTLE technology義洋 顏
 
Revisão: Forwarding Metamorphosis: Fast Programmable Match-Action Processing ...
Revisão: Forwarding Metamorphosis: Fast Programmable Match-Action Processing ...Revisão: Forwarding Metamorphosis: Fast Programmable Match-Action Processing ...
Revisão: Forwarding Metamorphosis: Fast Programmable Match-Action Processing ...Bruno Castelucci
 
DPDK Summit 2015 - Aspera - Charles Shiflett
DPDK Summit 2015 - Aspera - Charles ShiflettDPDK Summit 2015 - Aspera - Charles Shiflett
DPDK Summit 2015 - Aspera - Charles ShiflettJim St. Leger
 

Semelhante a Receive side scaling (RSS) with eBPF in QEMU and virtio-net (20)

Fastsocket Linxiaofeng
Fastsocket LinxiaofengFastsocket Linxiaofeng
Fastsocket Linxiaofeng
 
eBPF Basics
eBPF BasicseBPF Basics
eBPF Basics
 
Install FD.IO VPP On Intel(r) Architecture & Test with Trex*
Install FD.IO VPP On Intel(r) Architecture & Test with Trex*Install FD.IO VPP On Intel(r) Architecture & Test with Trex*
Install FD.IO VPP On Intel(r) Architecture & Test with Trex*
 
BPF - in-kernel virtual machine
BPF - in-kernel virtual machineBPF - in-kernel virtual machine
BPF - in-kernel virtual machine
 
Project Slides for Website 2020-22.pptx
Project Slides for Website 2020-22.pptxProject Slides for Website 2020-22.pptx
Project Slides for Website 2020-22.pptx
 
Linux Kernel Live Patching
Linux Kernel Live PatchingLinux Kernel Live Patching
Linux Kernel Live Patching
 
CETH for XDP [Linux Meetup Santa Clara | July 2016]
CETH for XDP [Linux Meetup Santa Clara | July 2016] CETH for XDP [Linux Meetup Santa Clara | July 2016]
CETH for XDP [Linux Meetup Santa Clara | July 2016]
 
VMworld 2016: vSphere 6.x Host Resource Deep Dive
VMworld 2016: vSphere 6.x Host Resource Deep DiveVMworld 2016: vSphere 6.x Host Resource Deep Dive
VMworld 2016: vSphere 6.x Host Resource Deep Dive
 
Assisting User’s Transition to Titan’s Accelerated Architecture
Assisting User’s Transition to Titan’s Accelerated ArchitectureAssisting User’s Transition to Titan’s Accelerated Architecture
Assisting User’s Transition to Titan’s Accelerated Architecture
 
15 ia64
15 ia6415 ia64
15 ia64
 
ch2.pptx
ch2.pptxch2.pptx
ch2.pptx
 
Ebpf ovsconf-2016
Ebpf ovsconf-2016Ebpf ovsconf-2016
Ebpf ovsconf-2016
 
BKK16-103 OpenCSD - Open for Business!
BKK16-103 OpenCSD - Open for Business!BKK16-103 OpenCSD - Open for Business!
BKK16-103 OpenCSD - Open for Business!
 
Load Balancing
Load BalancingLoad Balancing
Load Balancing
 
Building a Router
Building a RouterBuilding a Router
Building a Router
 
Spy hard, challenges of 100G deep packet inspection on x86 platform
Spy hard, challenges of 100G deep packet inspection on x86 platformSpy hard, challenges of 100G deep packet inspection on x86 platform
Spy hard, challenges of 100G deep packet inspection on x86 platform
 
Scaling ingest pipelines with high performance computing principles - Rajiv K...
Scaling ingest pipelines with high performance computing principles - Rajiv K...Scaling ingest pipelines with high performance computing principles - Rajiv K...
Scaling ingest pipelines with high performance computing principles - Rajiv K...
 
Introduction to ARM big.LITTLE technology
Introduction to ARM big.LITTLE technologyIntroduction to ARM big.LITTLE technology
Introduction to ARM big.LITTLE technology
 
Revisão: Forwarding Metamorphosis: Fast Programmable Match-Action Processing ...
Revisão: Forwarding Metamorphosis: Fast Programmable Match-Action Processing ...Revisão: Forwarding Metamorphosis: Fast Programmable Match-Action Processing ...
Revisão: Forwarding Metamorphosis: Fast Programmable Match-Action Processing ...
 
DPDK Summit 2015 - Aspera - Charles Shiflett
DPDK Summit 2015 - Aspera - Charles ShiflettDPDK Summit 2015 - Aspera - Charles Shiflett
DPDK Summit 2015 - Aspera - Charles Shiflett
 

Mais de Yan Vugenfirer

HCK-CI: Enabling CI for Windows Guest Paravirtualized Drivers - Kostiantyn Ko...
HCK-CI: Enabling CI for Windows Guest Paravirtualized Drivers - Kostiantyn Ko...HCK-CI: Enabling CI for Windows Guest Paravirtualized Drivers - Kostiantyn Ko...
HCK-CI: Enabling CI for Windows Guest Paravirtualized Drivers - Kostiantyn Ko...Yan Vugenfirer
 
Implementing SR-IOv failover for Windows guests during live migration
Implementing SR-IOv failover for Windows guests during live migrationImplementing SR-IOv failover for Windows guests during live migration
Implementing SR-IOv failover for Windows guests during live migrationYan Vugenfirer
 
Qemu device prototyping
Qemu device prototypingQemu device prototyping
Qemu device prototypingYan Vugenfirer
 
Windows network teaming
Windows network teamingWindows network teaming
Windows network teamingYan Vugenfirer
 
Rebuild presentation - IoT Israel MeetUp
Rebuild presentation - IoT Israel MeetUpRebuild presentation - IoT Israel MeetUp
Rebuild presentation - IoT Israel MeetUpYan Vugenfirer
 
Rebuild presentation during Docker's Birthday party
Rebuild presentation during Docker's Birthday partyRebuild presentation during Docker's Birthday party
Rebuild presentation during Docker's Birthday partyYan Vugenfirer
 
Contributing to open source using Git
Contributing to open source using GitContributing to open source using Git
Contributing to open source using GitYan Vugenfirer
 
Microsoft Hardware Certification Kit (HCK) setup
Microsoft Hardware Certification Kit (HCK) setupMicrosoft Hardware Certification Kit (HCK) setup
Microsoft Hardware Certification Kit (HCK) setupYan Vugenfirer
 
Building “old” Windows drivers (XP, Vista, 2003 and 2008) with Visual Studio ...
Building “old” Windows drivers (XP, Vista, 2003 and 2008) with Visual Studio ...Building “old” Windows drivers (XP, Vista, 2003 and 2008) with Visual Studio ...
Building “old” Windows drivers (XP, Vista, 2003 and 2008) with Visual Studio ...Yan Vugenfirer
 
Advanced NDISTest options
Advanced NDISTest optionsAdvanced NDISTest options
Advanced NDISTest optionsYan Vugenfirer
 
QEMU Development and Testing Automation Using MS HCK - Anton Nayshtut and Yan...
QEMU Development and Testing Automation Using MS HCK - Anton Nayshtut and Yan...QEMU Development and Testing Automation Using MS HCK - Anton Nayshtut and Yan...
QEMU Development and Testing Automation Using MS HCK - Anton Nayshtut and Yan...Yan Vugenfirer
 
Windows guest debugging presentation from KVM Forum 2012
Windows guest debugging presentation from KVM Forum 2012Windows guest debugging presentation from KVM Forum 2012
Windows guest debugging presentation from KVM Forum 2012Yan Vugenfirer
 

Mais de Yan Vugenfirer (14)

HCK-CI: Enabling CI for Windows Guest Paravirtualized Drivers - Kostiantyn Ko...
HCK-CI: Enabling CI for Windows Guest Paravirtualized Drivers - Kostiantyn Ko...HCK-CI: Enabling CI for Windows Guest Paravirtualized Drivers - Kostiantyn Ko...
HCK-CI: Enabling CI for Windows Guest Paravirtualized Drivers - Kostiantyn Ko...
 
Implementing SR-IOv failover for Windows guests during live migration
Implementing SR-IOv failover for Windows guests during live migrationImplementing SR-IOv failover for Windows guests during live migration
Implementing SR-IOv failover for Windows guests during live migration
 
Qemu device prototyping
Qemu device prototypingQemu device prototyping
Qemu device prototyping
 
Windows network teaming
Windows network teamingWindows network teaming
Windows network teaming
 
Rebuild presentation - IoT Israel MeetUp
Rebuild presentation - IoT Israel MeetUpRebuild presentation - IoT Israel MeetUp
Rebuild presentation - IoT Israel MeetUp
 
Rebuild presentation during Docker's Birthday party
Rebuild presentation during Docker's Birthday partyRebuild presentation during Docker's Birthday party
Rebuild presentation during Docker's Birthday party
 
Contributing to open source using Git
Contributing to open source using GitContributing to open source using Git
Contributing to open source using Git
 
Introduction to Git
Introduction to GitIntroduction to Git
Introduction to Git
 
Microsoft Hardware Certification Kit (HCK) setup
Microsoft Hardware Certification Kit (HCK) setupMicrosoft Hardware Certification Kit (HCK) setup
Microsoft Hardware Certification Kit (HCK) setup
 
UsbDk at a Glance 
UsbDk at a Glance UsbDk at a Glance 
UsbDk at a Glance 
 
Building “old” Windows drivers (XP, Vista, 2003 and 2008) with Visual Studio ...
Building “old” Windows drivers (XP, Vista, 2003 and 2008) with Visual Studio ...Building “old” Windows drivers (XP, Vista, 2003 and 2008) with Visual Studio ...
Building “old” Windows drivers (XP, Vista, 2003 and 2008) with Visual Studio ...
 
Advanced NDISTest options
Advanced NDISTest optionsAdvanced NDISTest options
Advanced NDISTest options
 
QEMU Development and Testing Automation Using MS HCK - Anton Nayshtut and Yan...
QEMU Development and Testing Automation Using MS HCK - Anton Nayshtut and Yan...QEMU Development and Testing Automation Using MS HCK - Anton Nayshtut and Yan...
QEMU Development and Testing Automation Using MS HCK - Anton Nayshtut and Yan...
 
Windows guest debugging presentation from KVM Forum 2012
Windows guest debugging presentation from KVM Forum 2012Windows guest debugging presentation from KVM Forum 2012
Windows guest debugging presentation from KVM Forum 2012
 

Último

Adobe Marketo Engage Deep Dives: Using Webhooks to Transfer Data
Adobe Marketo Engage Deep Dives: Using Webhooks to Transfer DataAdobe Marketo Engage Deep Dives: Using Webhooks to Transfer Data
Adobe Marketo Engage Deep Dives: Using Webhooks to Transfer DataBradBedford3
 
Russian Call Girls in Karol Bagh Aasnvi ➡️ 8264348440 💋📞 Independent Escort S...
Russian Call Girls in Karol Bagh Aasnvi ➡️ 8264348440 💋📞 Independent Escort S...Russian Call Girls in Karol Bagh Aasnvi ➡️ 8264348440 💋📞 Independent Escort S...
Russian Call Girls in Karol Bagh Aasnvi ➡️ 8264348440 💋📞 Independent Escort S...soniya singh
 
Automate your Kamailio Test Calls - Kamailio World 2024
Automate your Kamailio Test Calls - Kamailio World 2024Automate your Kamailio Test Calls - Kamailio World 2024
Automate your Kamailio Test Calls - Kamailio World 2024Andreas Granig
 
Unveiling Design Patterns: A Visual Guide with UML Diagrams
Unveiling Design Patterns: A Visual Guide with UML DiagramsUnveiling Design Patterns: A Visual Guide with UML Diagrams
Unveiling Design Patterns: A Visual Guide with UML DiagramsAhmed Mohamed
 
Implementing Zero Trust strategy with Azure
Implementing Zero Trust strategy with AzureImplementing Zero Trust strategy with Azure
Implementing Zero Trust strategy with AzureDinusha Kumarasiri
 
What is Fashion PLM and Why Do You Need It
What is Fashion PLM and Why Do You Need ItWhat is Fashion PLM and Why Do You Need It
What is Fashion PLM and Why Do You Need ItWave PLM
 
Software Project Health Check: Best Practices and Techniques for Your Product...
Software Project Health Check: Best Practices and Techniques for Your Product...Software Project Health Check: Best Practices and Techniques for Your Product...
Software Project Health Check: Best Practices and Techniques for Your Product...Velvetech LLC
 
Der Spagat zwischen BIAS und FAIRNESS (2024)
Der Spagat zwischen BIAS und FAIRNESS (2024)Der Spagat zwischen BIAS und FAIRNESS (2024)
Der Spagat zwischen BIAS und FAIRNESS (2024)OPEN KNOWLEDGE GmbH
 
Building a General PDE Solving Framework with Symbolic-Numeric Scientific Mac...
Building a General PDE Solving Framework with Symbolic-Numeric Scientific Mac...Building a General PDE Solving Framework with Symbolic-Numeric Scientific Mac...
Building a General PDE Solving Framework with Symbolic-Numeric Scientific Mac...stazi3110
 
Advancing Engineering with AI through the Next Generation of Strategic Projec...
Advancing Engineering with AI through the Next Generation of Strategic Projec...Advancing Engineering with AI through the Next Generation of Strategic Projec...
Advancing Engineering with AI through the Next Generation of Strategic Projec...OnePlan Solutions
 
Alluxio Monthly Webinar | Cloud-Native Model Training on Distributed Data
Alluxio Monthly Webinar | Cloud-Native Model Training on Distributed DataAlluxio Monthly Webinar | Cloud-Native Model Training on Distributed Data
Alluxio Monthly Webinar | Cloud-Native Model Training on Distributed DataAlluxio, Inc.
 
MYjobs Presentation Django-based project
MYjobs Presentation Django-based projectMYjobs Presentation Django-based project
MYjobs Presentation Django-based projectAnoyGreter
 
React Server Component in Next.js by Hanief Utama
React Server Component in Next.js by Hanief UtamaReact Server Component in Next.js by Hanief Utama
React Server Component in Next.js by Hanief UtamaHanief Utama
 
Maximizing Efficiency and Profitability with OnePlan’s Professional Service A...
Maximizing Efficiency and Profitability with OnePlan’s Professional Service A...Maximizing Efficiency and Profitability with OnePlan’s Professional Service A...
Maximizing Efficiency and Profitability with OnePlan’s Professional Service A...OnePlan Solutions
 
EY_Graph Database Powered Sustainability
EY_Graph Database Powered SustainabilityEY_Graph Database Powered Sustainability
EY_Graph Database Powered SustainabilityNeo4j
 
Dealing with Cultural Dispersion — Stefano Lambiase — ICSE-SEIS 2024
Dealing with Cultural Dispersion — Stefano Lambiase — ICSE-SEIS 2024Dealing with Cultural Dispersion — Stefano Lambiase — ICSE-SEIS 2024
Dealing with Cultural Dispersion — Stefano Lambiase — ICSE-SEIS 2024StefanoLambiase
 
Unveiling the Future: Sylius 2.0 New Features
Unveiling the Future: Sylius 2.0 New FeaturesUnveiling the Future: Sylius 2.0 New Features
Unveiling the Future: Sylius 2.0 New FeaturesŁukasz Chruściel
 
BATTLEFIELD ORM: TIPS, TACTICS AND STRATEGIES FOR CONQUERING YOUR DATABASE
BATTLEFIELD ORM: TIPS, TACTICS AND STRATEGIES FOR CONQUERING YOUR DATABASEBATTLEFIELD ORM: TIPS, TACTICS AND STRATEGIES FOR CONQUERING YOUR DATABASE
BATTLEFIELD ORM: TIPS, TACTICS AND STRATEGIES FOR CONQUERING YOUR DATABASEOrtus Solutions, Corp
 
ODSC - Batch to Stream workshop - integration of Apache Spark, Cassandra, Pos...
ODSC - Batch to Stream workshop - integration of Apache Spark, Cassandra, Pos...ODSC - Batch to Stream workshop - integration of Apache Spark, Cassandra, Pos...
ODSC - Batch to Stream workshop - integration of Apache Spark, Cassandra, Pos...Christina Lin
 
SuccessFactors 1H 2024 Release - Sneak-Peek by Deloitte Germany
SuccessFactors 1H 2024 Release - Sneak-Peek by Deloitte GermanySuccessFactors 1H 2024 Release - Sneak-Peek by Deloitte Germany
SuccessFactors 1H 2024 Release - Sneak-Peek by Deloitte GermanyChristoph Pohl
 

Último (20)

Adobe Marketo Engage Deep Dives: Using Webhooks to Transfer Data
Adobe Marketo Engage Deep Dives: Using Webhooks to Transfer DataAdobe Marketo Engage Deep Dives: Using Webhooks to Transfer Data
Adobe Marketo Engage Deep Dives: Using Webhooks to Transfer Data
 
Russian Call Girls in Karol Bagh Aasnvi ➡️ 8264348440 💋📞 Independent Escort S...
Russian Call Girls in Karol Bagh Aasnvi ➡️ 8264348440 💋📞 Independent Escort S...Russian Call Girls in Karol Bagh Aasnvi ➡️ 8264348440 💋📞 Independent Escort S...
Russian Call Girls in Karol Bagh Aasnvi ➡️ 8264348440 💋📞 Independent Escort S...
 
Automate your Kamailio Test Calls - Kamailio World 2024
Automate your Kamailio Test Calls - Kamailio World 2024Automate your Kamailio Test Calls - Kamailio World 2024
Automate your Kamailio Test Calls - Kamailio World 2024
 
Unveiling Design Patterns: A Visual Guide with UML Diagrams
Unveiling Design Patterns: A Visual Guide with UML DiagramsUnveiling Design Patterns: A Visual Guide with UML Diagrams
Unveiling Design Patterns: A Visual Guide with UML Diagrams
 
Implementing Zero Trust strategy with Azure
Implementing Zero Trust strategy with AzureImplementing Zero Trust strategy with Azure
Implementing Zero Trust strategy with Azure
 
What is Fashion PLM and Why Do You Need It
What is Fashion PLM and Why Do You Need ItWhat is Fashion PLM and Why Do You Need It
What is Fashion PLM and Why Do You Need It
 
Software Project Health Check: Best Practices and Techniques for Your Product...
Software Project Health Check: Best Practices and Techniques for Your Product...Software Project Health Check: Best Practices and Techniques for Your Product...
Software Project Health Check: Best Practices and Techniques for Your Product...
 
Der Spagat zwischen BIAS und FAIRNESS (2024)
Der Spagat zwischen BIAS und FAIRNESS (2024)Der Spagat zwischen BIAS und FAIRNESS (2024)
Der Spagat zwischen BIAS und FAIRNESS (2024)
 
Building a General PDE Solving Framework with Symbolic-Numeric Scientific Mac...
Building a General PDE Solving Framework with Symbolic-Numeric Scientific Mac...Building a General PDE Solving Framework with Symbolic-Numeric Scientific Mac...
Building a General PDE Solving Framework with Symbolic-Numeric Scientific Mac...
 
Advancing Engineering with AI through the Next Generation of Strategic Projec...
Advancing Engineering with AI through the Next Generation of Strategic Projec...Advancing Engineering with AI through the Next Generation of Strategic Projec...
Advancing Engineering with AI through the Next Generation of Strategic Projec...
 
Alluxio Monthly Webinar | Cloud-Native Model Training on Distributed Data
Alluxio Monthly Webinar | Cloud-Native Model Training on Distributed DataAlluxio Monthly Webinar | Cloud-Native Model Training on Distributed Data
Alluxio Monthly Webinar | Cloud-Native Model Training on Distributed Data
 
MYjobs Presentation Django-based project
MYjobs Presentation Django-based projectMYjobs Presentation Django-based project
MYjobs Presentation Django-based project
 
React Server Component in Next.js by Hanief Utama
React Server Component in Next.js by Hanief UtamaReact Server Component in Next.js by Hanief Utama
React Server Component in Next.js by Hanief Utama
 
Maximizing Efficiency and Profitability with OnePlan’s Professional Service A...
Maximizing Efficiency and Profitability with OnePlan’s Professional Service A...Maximizing Efficiency and Profitability with OnePlan’s Professional Service A...
Maximizing Efficiency and Profitability with OnePlan’s Professional Service A...
 
EY_Graph Database Powered Sustainability
EY_Graph Database Powered SustainabilityEY_Graph Database Powered Sustainability
EY_Graph Database Powered Sustainability
 
Dealing with Cultural Dispersion — Stefano Lambiase — ICSE-SEIS 2024
Dealing with Cultural Dispersion — Stefano Lambiase — ICSE-SEIS 2024Dealing with Cultural Dispersion — Stefano Lambiase — ICSE-SEIS 2024
Dealing with Cultural Dispersion — Stefano Lambiase — ICSE-SEIS 2024
 
Unveiling the Future: Sylius 2.0 New Features
Unveiling the Future: Sylius 2.0 New FeaturesUnveiling the Future: Sylius 2.0 New Features
Unveiling the Future: Sylius 2.0 New Features
 
BATTLEFIELD ORM: TIPS, TACTICS AND STRATEGIES FOR CONQUERING YOUR DATABASE
BATTLEFIELD ORM: TIPS, TACTICS AND STRATEGIES FOR CONQUERING YOUR DATABASEBATTLEFIELD ORM: TIPS, TACTICS AND STRATEGIES FOR CONQUERING YOUR DATABASE
BATTLEFIELD ORM: TIPS, TACTICS AND STRATEGIES FOR CONQUERING YOUR DATABASE
 
ODSC - Batch to Stream workshop - integration of Apache Spark, Cassandra, Pos...
ODSC - Batch to Stream workshop - integration of Apache Spark, Cassandra, Pos...ODSC - Batch to Stream workshop - integration of Apache Spark, Cassandra, Pos...
ODSC - Batch to Stream workshop - integration of Apache Spark, Cassandra, Pos...
 
SuccessFactors 1H 2024 Release - Sneak-Peek by Deloitte Germany
SuccessFactors 1H 2024 Release - Sneak-Peek by Deloitte GermanySuccessFactors 1H 2024 Release - Sneak-Peek by Deloitte Germany
SuccessFactors 1H 2024 Release - Sneak-Peek by Deloitte Germany
 

Receive side scaling (RSS) with eBPF in QEMU and virtio-net

  • 1. Receive side scaling (RSS) with eBPF in QEMU and virtio-net Yan Vugenfirer - CEO, Daynix
  • 2. Agenda • What is RSS? • History: RSS and virtio-net • What is eBPF? • Using eBPF for packet steering (RSS) in virtio-net
  • 3. What is RSS? • Receive side scaling - distribution of packets’ processing among CPUs • A NIC uses a hashing function to compute a hash value over a defined area • Hash value is used to index an indirection table • The values in the indirection table are used to assign the received data to a CPU • With MSI support, a NIC can also interrupt the associated CPU
  • 4. CPU1 CPU2 CPU3 CPU0 What is RSS? NIC RX handling RX handling RX handling RX handling NIC Driver HW queue HW queue HW queue HW queue Packet classification
  • 6. History: RSS and virtio-net • Let’s use RSS with virtio-net! • Utilisation CPUs for packet processing • Cash locality for network applications • Microsoft WHQL requirement for high speed devices
  • 7. History: RSS and virtio-net • No multi-queue in virtio • SW implementation in Windows guest driver (similar to RFS in Linux) CPU1 CPU2 CPU3 CPU0 virtio-net RX handling RX handling RX handling NetKVM RX virt-queue Packet classification RX handling
  • 8. History: RSS and virtio-net • virtio-net became multi queue device • Due to Windows requirements - hybrid model. Interrupt received on specific CPU core, but could be rescheduled to another • Works good for TCP • Might not work for UDP • Support legacy interrupts for old OSes
  • 9. History: RSS and virtio-net CPU1 CPU0 virtio-net NetKVM RX virt-queue Packet classification RX handling RX virt-queue Packet classification RX handling
  • 10. History: RSS and virtio-net • virtio spec changes • Set steering mode • Pass the device redirection tables • Set hash value in virtio-net-hdr • No inter-processor interrupts due to re-scheduling • Vision: HW will do all the heavy work • Implementations • SW only POC in QEMU • eBPF
  • 11. virtio spec changes - capabilities • VIRTIO_NET_F_RSS • VIRTIO_NET_F_MQ must be set
  • 12. virtio spec changes - device configuration • virtio_net_config struct virtio_net_con fi g { u8 mac[6] ; le16 status ; le16 max_virtqueue_pairs ; le16 mtu ; le32 speed ; u8 duplex ; u8 rss_max_key_size ; le16 rss_max_indirection_table_length ; le32 supported_hash_types ; };
  • 13. virtio spec changes - setting RSS parameters • VIRTIO_NET_CTRL_MQ_RSS_CONFIG struct virtio_net_rss_con fi g { le32 hash_types ; le16 indirection_table_mask ; le16 unclassi fi ed_queue ; le16 indirection_table[indirection_table_length] ; le16 max_tx_vq ; u8 hash_key_length ; u8 hash_key_data[hash_key_length] ; };
  • 14. virtio spec changes - virtio-net-hdr struct virtio_net_hdr { u8 fl ags ; u8 gso_type ; le16 hdr_len ; le16 gso_size ; le16 csum_start ; le16 csum_offset ; le16 num_buffers ; le32 hash_value; (Only if VIRTIO_NET_F_HASH_REPORT negotiated ) le16 hash_report; (Only if VIRTIO_NET_F_HASH_REPORT negotiated ) le16 padding_reserved; (Only if VIRTIO_NET_F_HASH_REPORT negotiated ) };
  • 15. What is eBPF? • Enable running sandboxed code in Linux kernel • The code can be loaded at run time • Used for security, tracing, networking, observability
  • 16. How can eBPF help us? • Calculate the RSS hash and return the queue index for incoming packets • Populate the hash value in virtio_net_hdr (work in progress)
  • 17. The “magic” • Loading eBPF program using IOCTL TUNSETSTEERINGEBPF • tun_struct has steering_prog field • If eBPF program for steering is loaded, tun_select_queue will call it with bpf_prog_run_clear_cb
  • 18. Hash population (work in progress) • Population from eBPF program • virtio_net_hdr with additional fields • Work in progress in kernel • Enlarge virtio_net_hdr in all kernel modules • Keep calculated hash in SKB and copy it to virtio_net_hdr
  • 19. eBPF program source in QEMU • tun_rss_steering_prog • tools/ebpf/rss.bpf.c • Use clang to compile • tools/ebpf/Makefile.ebpf
  • 20. eBPF program skeleton • During QEMU compilation include file is populated with the compiled binary • bpftool gen skeleton rss.bpf.o > rss.bpf.skeleton.h • Helpers to initialise maps (mechanism to share data between eBPF program and kerneluserspace) • Some changes to support libvirt - mmapping the shared data structure to user space (3 maps in current main branch without mmaping, 1 map in pending patches)
  • 21. Configuration map • The configuration map is BPF array map that contains everything required for RSS: • Supported hash flows: IPv4, TCPv4, UDPv4, IPv6, IPv6ex, TCPv6, UDPv6 • Indirections table size (max 128) • Default queue • Toeplitz hash key - 40 bytes • Indirections table - 128 entries
  • 22. Loading eBPF program • Two mechanisms • QEMU using function in skeleton file. Calling bpf syscall • eBPF helper program (with libvirt) - QEMU gets file descriptors from libvirt with already loaded ebpf program and mapping of the ebpf map (patches under review)
  • 23. Loading eBPF program • Possible load failures • Kernel support. Current solution requires 5.8+ • Without helper • QEMU process capabilities: CAP_BPF, CAP_NET_ADMIN • sysctl kernel.unprivileged_bpf_disabled=1 • libbpf not present • In case of helper usage - mismatch between helper and QEMU • Stamp is a hash of skeleton include file
  • 24. Fallback • Built it QEMU RSS steering • Can be triggered also by live migration • Hash population is enabled in QEMU command line, because there is still not hash population from eBPF program
  • 25. Live migration • Known issue: migrating to old kernel • eBPF load failure • Fallback to in-QEMU RSS steering
  • 26. QEMU command line • Multi-queue should be enabled • -smp with vCPU for each queue-pair • -device virtio-net-pci, rss=on,hash=on,ebpf_rss_fds=<fd0,fd1>
  • 27. QEMU command line • rss=on • Try to load eBPF from skeleton or by using provided file descriptors • Fallback to “built-in” RSS steering in QEMU if cannot load eBPF program • hash=on • Populate hash in virtio_net_hdr • ebpf_rss_fds - optional, provide file descriptors for eBPF program and map
  • 28. libvirt integration • QEMU should run with least possible privileges • eBPF helper • Stamping the helper during compilation time • Redirection table mapping • Additional command line options to provide file descriptors to QEMU • Patches under review
  • 29. Current status • Initial support was merged to QEMU • libvirt integration patches in QEMU and libvirt are under discussion on mailing lists • Hash population by eBPF program - pending additional work for next set of patches
  • 30. Pending patches • QEMU libvirt integration: https://lists.nongnu.org/ archive/html/qemu-devel/2021-07/msg03535.html • libvirt patches: https://listman.redhat.com/archives/ libvir-list/2021-July/msg00836.html • RSS support in Linux virtio-net driver: https:// lists.linuxfoundation.org/pipermail/virtualization/ 2021-August/055940.html • In kernel hash calculation reporting to guest driver: https://lkml.org/lkml/2021/1/12/1329
  • 31. virtio-net and eBPF future • Packet filtering with vhost • Security?
  • 33.
  • 34. Links • https://www.kernel.org/doc/Documentation/networking/scaling.txt • https://docs.microsoft.com/en-us/windows-hardware/drivers/network/ introduction-to-receive-side-scaling • https://ebpf.io • https://docs.microsoft.com/en-us/windows-hardware/drivers/network/rss- with-a-single-hardware-receive-queue • https://docs.microsoft.com/en-us/windows-hardware/drivers/network/rss- with-hardware-queuing • https://docs.microsoft.com/en-us/windows-hardware/drivers/network/rss- with-message-signaled-interrupts • https://docs.microsoft.com/en-us/windows-hardware/drivers/network/rss- hashing-functions