## **Automatic Parallelization of Software Network Functions**

Francisco Pereira, Fernando Ramos, Luis Pedrosa





# Middleboxes are pervasive in today's networks



#### **Trading performance for flexibility**



Fixed-function closed-source appliances

Software middleboxes




#### Line-rates just keep increasing



[Figure: line rates over time]



#### **Parallelization in a nutshell**



#### There is no time for synchronization



Avoiding inter-core coordination is paramount to achieving high performance in parallel implementations

#### **Shared-nothing architecture**
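A minimal sketch of the shared-nothing idea, under assumptions that are not in the slides: each core owns a private flow table, and a hypothetical `dispatch` function stands in for the NIC steering packets by a hash of the flow. Since every packet of a flow reaches the same core, no table is touched by two cores and no locks are needed.

```python
from collections import namedtuple

# Hypothetical packet representation, for illustration only.
Packet = namedtuple("Packet", "src_ip dst_ip src_port dst_port")

N_CORES = 4

# One private table per core: nothing is shared between cores.
per_core_flows = [dict() for _ in range(N_CORES)]


def dispatch(pkt):
    """Stand-in for NIC steering (assumption): choose a core from the flow."""
    return hash(tuple(pkt)) % N_CORES


def process(pkt):
    """Count packets per flow using only the owning core's table."""
    core = dispatch(pkt)
    table = per_core_flows[core]  # only this core's state is touched
    key = tuple(pkt)
    table[key] = table.get(key, 0) + 1
    return core
```

Calling `process` twice on the same packet updates a single per-core table. The other tables never see that flow, which is what removes the need for inter-core coordination.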



## Let's use a firewall as an example



#### **Firewall NF**

















Finding the right sharding solution

How should we shard our state?






Finding the right NIC configuration





Which packet fields & key enforce the required sharding solution?










### Typical constraints found in NFs make automatic parallelization possible



### We propose Maestro, a solution for automatic parallelization

#### **Automatic parallelization**

#### **Push-button parallelization**



Favors shared-nothing architectures

Provides a highly-optimized lock-based alternative

Can also generate parallel implementations using hardware transactional memory (HTM)

#### The 3 ideas supporting Maestro



#### **Maestro's pipeline**












- **R1** Key equality
- **R2** Subsumption
- **R3** Disjoint Dependencies
- **R4** Incompatible Dependencies
- **R5** Interchangeable Constraints









**R1** Key equality

$p_0$ and $p_1$ are sent to the same core if:

$$p_{0}[flow] = p_{1}[flow]$$



**R2** Subsumption

`map_put({src_ip, dst_ip}, v)`

`map_put(dst_ip, v)`

$p_0$ and $p_1$ are sent to the same core if:

$$p_{0}[dst\_ip] = p_{1}[dst\_ip]$$
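One simplified reading of the subsumption rule can be sketched as follows. The function name `apply_subsumption` and the use of field-name sets to represent map keys are assumptions made for illustration, not Maestro's implementation. The idea is that if packets with equal `dst_ip` always share a core, then packets with equal `{src_ip, dst_ip}` do too, so the larger key adds no new requirement.

```python
def apply_subsumption(key_sets):
    """Drop every key set that is a strict superset of another key set.

    Sharding on the smaller set already co-locates all packets that agree
    on the larger one, so the larger set is redundant.
    """
    sets = [frozenset(k) for k in key_sets]
    return {s for s in sets if not any(other < s for other in sets)}


# The two accesses shown on the slide: keys {src_ip, dst_ip} and {dst_ip}.
remaining = apply_subsumption([{"src_ip", "dst_ip"}, {"dst_ip"}])
print(remaining)
```

In this sketch, key sets with no subset relation (for example `{src_ip}` and `{dst_ip}`) are both kept. They are not covered by subsumption and would fall under the other rules.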





















$$
\begin{aligned}
        & \big(p_{0}[flow] = p_{1}[flow] \rightarrow hash(p_{0}) = hash(p_{1})\big) \\
\wedge\ & \big(p_{0}[inv\_flow] = p_{1}[inv\_flow] \rightarrow hash(p_{0}) = hash(p_{1})\big) \\
\wedge\ & \big(p_{0}[flow] = p_{1}[inv\_flow] \rightarrow hash(p_{0}) = hash(p_{1})\big)
\end{aligned}
$$
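To make the constraints above concrete, here is a hedged sketch that assumes the NIC hashes packets with a Toeplitz-based RSS function over the fields `src_ip, dst_ip, src_port, dst_port` (an assumption; the slides do not name the hash). The key below, the byte pair `0x6d5a` repeated, is a well-known symmetric RSS key and is used purely for illustration. It is not claimed to be the key Maestro produces. Because the key repeats every 16 bits, swapping the two 32-bit addresses and the two 16-bit ports leaves the hash unchanged, so a flow and its inverse hash identically.

```python
import socket
import struct


def toeplitz_hash(key: bytes, data: bytes) -> int:
    """32-bit Toeplitz hash: XOR the 32-bit key window at each set input bit."""
    key_int = int.from_bytes(key, "big")
    key_bits = len(key) * 8
    result = 0
    for i in range(len(data) * 8):
        byte, bit = divmod(i, 8)
        if data[byte] & (0x80 >> bit):
            # Key bits [i, i + 32), counted from the most significant bit.
            result ^= (key_int >> (key_bits - 32 - i)) & 0xFFFFFFFF
    return result


def flow_bytes(src_ip, dst_ip, src_port, dst_port) -> bytes:
    """Hash input layout (assumed): src_ip, dst_ip, src_port, dst_port."""
    return (socket.inet_aton(src_ip) + socket.inet_aton(dst_ip)
            + struct.pack("!HH", src_port, dst_port))


# 40-byte key with a 16-bit period (illustrative, not Maestro's output).
SYMMETRIC_KEY = bytes([0x6D, 0x5A]) * 20

flow = flow_bytes("10.0.0.1", "192.168.1.7", 1234, 80)
inv_flow = flow_bytes("192.168.1.7", "10.0.0.1", 80, 1234)

# Both directions of the connection produce the same hash value.
same = toeplitz_hash(SYMMETRIC_KEY, flow) == toeplitz_hash(SYMMETRIC_KEY, inv_flow)
print(same)
```

How the hash value maps to a core (for example through an indirection table) is left out of this sketch. Equal hashes are enough to guarantee that both packets land on the same core.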







# **Code generator**






# **Evaluation**

- How does performance scale with the number of cores?
  - Shared-nothing vs. lock-based vs. HTM
  - Varying traffic patterns
  - Packet size
  - Churn
- How does it fare against other parallel frameworks?
  - Vector Packet Processing (VPP)


















[Figure: evaluation results per NF; legible labels: NOP, SBridge, DBridge, Policer, NAT, PSD]





# **Scalability**
















## Conclusion



Maestro is a push-button system that automatically parallelizes software NFs.

Generates **shared-nothing** parallel solutions whenever possible, and **lock-based** solutions otherwise.

Maestro's shared-nothing NFs scale linearly with cores.



**Contact:** francisco.chamica.pereira@tecnico.ulisboa.pt

**Web:** maestro.inesc-id.pt