I want to collapse all my little boxes into one powerful box. RAM is super pricey, so I built a rig based on the Ryzen 5800XT (DDR4 is cheaper and should be fine here) and, by mistake, bought a motherboard that can't do PCI passthrough of the NIC.
Before ordering another motherboard that can pass through the NIC, I booted it up on bare metal and compared the performance to how it ran virtualized under Fedora Server. It was better, but still not hitting line rate.
direct macbook to cable modem: 916/40
opnsense virtualized (with vlans and rules): 699/41
opnsense bare metal (with vlans and rules): 816/39
opnsense bare metal (with vlans and rules and hardware offload fully enabled): 824/40
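To put the gaps in proportion, here is a quick sketch that expresses each result as a share of the direct-to-modem figure. It assumes the first number in each pair is download and the second is upload; the dictionary keys and the `percent_of_baseline` helper are just illustrative names.

```python
# Figures copied from the speed tests above.
# Assumption: each pair is (download, upload) in the units the test reported.
results = {
    "direct macbook to cable modem": (916, 40),
    "opnsense virtualized": (699, 41),
    "opnsense bare metal": (816, 39),
    "opnsense bare metal + hardware offload": (824, 40),
}


def percent_of_baseline(results, baseline_key):
    """Return each download figure as a percentage of the baseline download."""
    base_down, _ = results[baseline_key]
    return {
        name: round(100 * down / base_down, 1)
        for name, (down, _) in results.items()
    }


shares = percent_of_baseline(results, "direct macbook to cable modem")
for name, pct in shares.items():
    print(f"{name}: {pct}% of direct download")
```

By this measure the virtualized run reaches roughly three quarters of the direct download figure, and bare metal roughly nine tenths, with or without hardware offload.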
The only rules in place were the defaults, the rule to block VLANs from talking to each other, and the rule to pass traffic to WAN. When virtualized, I can't get PCI passthrough, so I was using macvtap interfaces and virtio drivers with 4 threads and 4 pinned CPU threads.
CPU is a Ryzen 5800XT. NIC is a dual-port Intel I226-V. When virtualized, it was running under Fedora Server with QEMU/KVM (q35) and given 8 GB of RAM backed by hugepages. I tested both 2-thread and 4-thread allocations (with the threads confirmed to sit on the same 1 or 2 physical cores), and eventually even gave 4 threads to the virtio driver (it was only claiming 1 thread before).
Bare metal IS definitely helping, so it looks like I need to swap in a motherboard that can do proper PCI passthrough of the NIC (now that I understand the limitations of IOMMU groups that the specs of the board don't tell you about, I hate them all the more), but it still can't hit line rate. There's no IDS or Suricata or any of the fanciness turned on yet though, so I just don't understand why it's this slow even on bare metal.
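For anyone wanting to double-check which pinned threads share a physical core, a minimal sketch follows. It assumes a Linux host, where each CPU exposes its SMT siblings in `/sys/devices/system/cpu/cpuN/topology/thread_siblings_list`. The CPU numbers and the `fake` sibling map in the demo are hypothetical, chosen only so the example runs on any machine.

```python
from pathlib import Path


def parse_cpu_list(text):
    """Parse a kernel CPU list such as '0,8' or '2-3' into a sorted list of ints."""
    cpus = set()
    for part in text.strip().split(","):
        if not part:
            continue
        if "-" in part:
            lo, hi = part.split("-")
            cpus.update(range(int(lo), int(hi) + 1))
        else:
            cpus.add(int(part))
    return sorted(cpus)


def thread_siblings(cpu, sysfs_root="/sys/devices/system/cpu"):
    """Read the SMT siblings of one logical CPU from sysfs (Linux only)."""
    path = Path(sysfs_root) / f"cpu{cpu}" / "topology" / "thread_siblings_list"
    return parse_cpu_list(path.read_text())


def group_by_core(pinned, siblings_of=thread_siblings):
    """Group pinned logical CPUs by the physical core (sibling set) they belong to."""
    cores = {}
    for cpu in pinned:
        cores.setdefault(tuple(siblings_of(cpu)), []).append(cpu)
    return cores


# Demo with a made-up sibling map instead of real sysfs reads (hypothetical numbering).
fake = {2: "2,10", 10: "2,10", 3: "3,11", 11: "3,11"}
demo = group_by_core([2, 10, 3, 11], siblings_of=lambda c: parse_cpu_list(fake[c]))
for core, cpus in demo.items():
    print(f"core with siblings {core}: pinned threads {cpus}")
```

On the real host, calling `group_by_core(pinned)` with the actual pinned CPU numbers and the default reader should show whether four pinned threads land on two physical cores or are spread across four.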


The performance drop from virtualizing NICs shouldn't be nearly as big. How are you passing the VLANs to the VM? Are you passing them all over one virtio NIC, or using one virtio NIC for each?
The setup I ran for multiple years was basically a bridge interface on the host for each VLAN and a separate virtio NIC to the opnsense VM for each. I got almost 10 Gbit/s like that with 8 GB of RAM for opnsense and 4 or 8 cores (I can't remember which) with hyperthreading on a 2nd-gen EPYC. I didn't do any optimisations for virtio.
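A rough sketch of what the one-bridge-per-VLAN layout could look like on the VM side, assuming the guest is managed through libvirt (the thread only says QEMU/KVM, so that part is an assumption). It generates one `<interface type='bridge'>` element per VLAN with a virtio model. The VLAN IDs and the `br<ID>` bridge names are hypothetical, and the host bridges themselves would still need to be created separately.

```python
import xml.etree.ElementTree as ET


def interface_xml(bridge):
    """Build a libvirt <interface> element attaching a virtio NIC to a host bridge."""
    iface = ET.Element("interface", {"type": "bridge"})
    ET.SubElement(iface, "source", {"bridge": bridge})
    ET.SubElement(iface, "model", {"type": "virtio"})
    return ET.tostring(iface, encoding="unicode")


# Hypothetical VLAN IDs; each maps to a hypothetical host bridge named br<ID>.
vlan_ids = [10, 20, 30]
snippets = {vid: interface_xml(f"br{vid}") for vid in vlan_ids}

for vid, xml in snippets.items():
    print(f"VLAN {vid}: {xml}")
```

Each printed element would go inside the `<devices>` section of the domain XML, giving the firewall VM one plain virtio NIC per VLAN instead of a single trunk interface.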