Software network functions (NFs) offer flexibility and ease of deployment, but at the cost of making high performance harder to achieve. The traditional way to increase NF performance is to distribute traffic across multiple CPU cores, but this poses a significant challenge: how do we parallelize an NF without breaking its semantics? We propose Maestro, a tool that analyzes a sequential implementation of an NF and automatically generates an enhanced parallel version that carefully configures the NIC’s Receive Side Scaling mechanism to distribute traffic across cores while preserving semantics. When possible, Maestro orchestrates a shared-nothing architecture, with each core operating independently and without shared-memory coordination, maximizing performance. Otherwise, Maestro choreographs a fine-grained read-write locking mechanism that optimizes operation for typical Internet traffic. We parallelized 8 software NFs and show that they generally scale up linearly until bottlenecked by PCIe (with small packets) or by the 100 Gbps line rate (with typical Internet traffic). Maestro further outperforms modern hardware-based transactional memory mechanisms, even for challenging, parallel-unfriendly workloads.
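To make the two execution models concrete, here is a minimal C sketch (not Maestro's generated code) contrasting a per-core shared-nothing flow table with a shared table guarded by a read-write lock. The `flow_entry` layout, the table sizes, and the assumption that RSS already steers each flow to a single core are illustrative only.

```c
/*
 * Illustrative sketch, not Maestro's generated code. It contrasts:
 *   1. shared-nothing: each core owns a private flow table, no locks;
 *   2. fallback: one shared table behind a read-write lock, keeping the
 *      common read-mostly path cheap for typical Internet traffic.
 * Assumes the NIC's RSS already steers every flow to exactly one core.
 */
#include <pthread.h>
#include <stdint.h>
#include <stdio.h>

#define NUM_CORES   4
#define TABLE_SIZE  1024

struct flow_entry {
    uint32_t flow_hash;   /* would come from the NIC's RSS hash in a real NF */
    uint64_t packets;
};

/* --- Model 1: shared-nothing, one private table per core, no locking --- */
static struct flow_entry per_core_table[NUM_CORES][TABLE_SIZE];

static void shared_nothing_update(int core, uint32_t flow_hash)
{
    struct flow_entry *e = &per_core_table[core][flow_hash % TABLE_SIZE];
    e->flow_hash = flow_hash;
    e->packets++;            /* safe: only this core ever touches this table */
}

/* --- Model 2: shared state behind a read-write lock -------------------- */
static struct flow_entry shared_table[TABLE_SIZE];
static pthread_rwlock_t shared_lock = PTHREAD_RWLOCK_INITIALIZER;

static void locked_update(uint32_t flow_hash, int is_new_flow)
{
    if (is_new_flow) {
        /* Rare path: inserting a new flow takes the exclusive lock. */
        pthread_rwlock_wrlock(&shared_lock);
        shared_table[flow_hash % TABLE_SIZE].flow_hash = flow_hash;
        shared_table[flow_hash % TABLE_SIZE].packets = 1;
        pthread_rwlock_unlock(&shared_lock);
    } else {
        /* Common path: lookups from many cores proceed concurrently. */
        pthread_rwlock_rdlock(&shared_lock);
        uint64_t seen = shared_table[flow_hash % TABLE_SIZE].packets;
        pthread_rwlock_unlock(&shared_lock);
        (void)seen;
    }
}

int main(void)
{
    shared_nothing_update(/*core=*/0, /*flow_hash=*/42);
    locked_update(/*flow_hash=*/42, /*is_new_flow=*/1);
    locked_update(/*flow_hash=*/42, /*is_new_flow=*/0);
    printf("per-core packets: %lu\n",
           (unsigned long)per_core_table[0][42 % TABLE_SIZE].packets);
    return 0;
}
```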
2023
poster
Poster: In-Network ML Feature Computation for Malicious Traffic Detection
We present Peregrine, a malicious traffic detector that offloads part of its computation to a programmable switch. The idea is to partition detection by moving the ML feature computation module from a middlebox server to the switch data plane. The key capability this unlocks, computing the ML input features over all traffic, results in a significant improvement in detection performance: in our evaluation, up to 5.7x over the state of the art.
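For a rough sense of what per-flow feature computation looks like as constant-time, per-packet state updates (the style of computation a switch pipeline can sustain), here is an illustrative C sketch. The chosen features (packet count, running mean and variance of packet length, via Welford's update) are assumptions for this example rather than Peregrine's exact feature set, and a real data-plane version would use integer arithmetic rather than floats.

```c
/*
 * Illustrative sketch only: per-flow feature state updated with one
 * constant-time step per packet and no per-flow packet history. The feature
 * set and the flow_stats layout are assumptions for this example, not
 * Peregrine's exact features; a switch pipeline would use integer math.
 */
#include <stdint.h>
#include <stdio.h>

struct flow_stats {
    uint64_t count;      /* packets seen on this flow             */
    double   mean;       /* running mean of packet length         */
    double   m2;         /* sum of squared deviations (Welford)   */
};

/* One O(1) update per packet. */
static void update_features(struct flow_stats *s, uint16_t pkt_len)
{
    s->count++;
    double delta = (double)pkt_len - s->mean;
    s->mean += delta / (double)s->count;
    s->m2   += delta * ((double)pkt_len - s->mean);
}

static double variance(const struct flow_stats *s)
{
    return s->count > 1 ? s->m2 / (double)s->count : 0.0;
}

int main(void)
{
    struct flow_stats s = {0};
    uint16_t lengths[] = {64, 1500, 590, 64, 1500};

    for (unsigned i = 0; i < sizeof lengths / sizeof lengths[0]; i++)
        update_features(&s, lengths[i]);

    /* These per-flow features would be handed to the ML detector. */
    printf("count=%llu mean=%.1f var=%.1f\n",
           (unsigned long long)s.count, s.mean, variance(&s));
    return 0;
}
```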
2022
article
Automatic Generation of Network Function Accelerators Using Component-Based Synthesis
Designing networked systems that take best advantage of heterogeneous dataplanes (e.g., dividing packet processing across both a PISA switch and an x86 CPU) can improve performance, efficiency, and resource consumption. However, programming for multiple hardware targets remains challenging because developers must learn platform-specific languages and skills. While some "write-once, run-anywhere" compilers exist, they are unable to consider a range of implementation options to tune the NF to meet performance objectives. In this short paper, we present preliminary ideas towards a compiler that explores a large search space of different mappings of functionality to hardware. This exploration can be tuned for a programmer-specified objective, such as minimizing memory consumption or maximizing network throughput. Our initial prototype, SyNAPSE, is based on a methodology called component-based synthesis and supports deployments across x86 and Tofino platforms. Relative to a baseline compiler which only generates one deployment decision, SyNAPSE uncovers thousands of deployment options, including a deployment which reduces the amount of controller traffic by an order of magnitude, and another which halves memory usage.
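As a toy illustration of the kind of search such a compiler performs, the C sketch below exhaustively enumerates assignments of NF components to two targets and keeps the best one under a programmer-chosen objective. The component names, cost numbers, and objective are invented for this example; SyNAPSE's actual exploration is driven by component-based synthesis, not this brute-force loop.

```c
/*
 * Toy sketch: enumerate every mapping of NF components to hardware targets
 * and keep the best feasible one under a user-chosen objective. Component
 * names and cost numbers are made up for illustration.
 */
#include <stdio.h>

#define NUM_COMPONENTS 3
enum target { TARGET_X86 = 0, TARGET_TOFINO = 1, NUM_TARGETS };

static const char *component_names[NUM_COMPONENTS] = {
    "parse", "flow_table", "forward"
};

/* Hypothetical per-component costs on each target. */
static const double throughput_gbps[NUM_COMPONENTS][NUM_TARGETS] = {
    {10, 100}, {8, 100}, {12, 100}
};
static const double memory_mb[NUM_COMPONENTS][NUM_TARGETS] = {
    {1, 0.1}, {64, 8}, {1, 0.1}
};

/* Objective: maximize the throughput of the slowest component while keeping
 * total memory under a budget; infeasible mappings are rejected. */
static double score(const int mapping[NUM_COMPONENTS], double mem_budget_mb)
{
    double min_tput = 1e9, mem = 0;
    for (int c = 0; c < NUM_COMPONENTS; c++) {
        int t = mapping[c];
        if (throughput_gbps[c][t] < min_tput) min_tput = throughput_gbps[c][t];
        mem += memory_mb[c][t];
    }
    return mem <= mem_budget_mb ? min_tput : -1.0;
}

int main(void)
{
    int best[NUM_COMPONENTS] = {0}, mapping[NUM_COMPONENTS];
    double best_score = -1.0;

    /* With two targets, every mapping is a bitmask over the components. */
    for (int m = 0; m < 1 << NUM_COMPONENTS; m++) {
        for (int c = 0; c < NUM_COMPONENTS; c++)
            mapping[c] = (m >> c) & 1;
        double s = score(mapping, /*mem_budget_mb=*/16.0);
        if (s > best_score) {
            best_score = s;
            for (int c = 0; c < NUM_COMPONENTS; c++) best[c] = mapping[c];
        }
    }

    for (int c = 0; c < NUM_COMPONENTS; c++)
        printf("%s -> %s\n", component_names[c],
               best[c] == TARGET_X86 ? "x86" : "tofino");
    return 0;
}
```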