Current assignment

I’m now working as a post-doc on the ULTRA project, an ERC Consolidator Grant awarded to Dejan Kostic.

The goal of the ULTRA project is to build Internet services with ultra-low latency. We aim to make Internet services run at the true speed of the underlying hardware, an effort that began with Metron. The services built by ULTRA will enable emerging applications such as intelligent transportation systems, the Internet of Things, and e-health. To reduce the tail latency of servers, and to consolidate load to cut both costs and energy consumption, we built RSS++, an intra-server load balancer that is both load- and state-aware. With a very different approach, we ensured uniform inter-server load balancing with Cheetah, without breaking connections even when adding and removing servers. When avoiding per-connection state is not possible, we revisited high-speed software connection tracking on modern servers and studied how SmartNICs can help with rule offloading. Finally, to avoid wasting precious resources, we developed PacketMill, a series of optimizations that accelerate software packet processing beyond what is possible today, handling more than 100 Gbps of network traffic with a single CPU core.


I started my PhD in 2013 in the RUN team, on the EpI project, supervised by Laurent Mathy.

Our research project was all about building fast software middleboxes and, more generally, fast virtual network functions (VNFs). To handle middlebox features such as IDS, firewalling, or DPI in a datacenter or near a core router, one has to use either many general-purpose processors, or fast boxes mostly based on NPUs or FPGAs, which are not easily upgradable. Our goal was to come up with a software architecture able to sustain very high speeds (~100 Gbit/s) for any kind of VNF on commodity hardware.

The first part of my work was to find a strong high-speed I/O basis to build upon. We decided to use the Click Modular Router and extend it to do flow processing, turning it into a “Click Modular Middlebox”. However, after some months we found that much could be improved in the usage of underlying frameworks such as DPDK and Netmap, in batching (both I/O and compute batching), and in multi-queue support, leading to a first paper (FastClick) at ANCS 2015. A year later, I did an internship at Cisco Meraki, where I applied FastClick techniques to their product, uncovering new problems and leading to new discoveries.
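The batching idea can be illustrated as follows: instead of pushing each packet individually through the element graph, packets are pulled from the NIC in bursts (I/O batching) and each element processes the whole batch in one call (compute batching), amortizing per-packet overheads. This is a minimal sketch of the principle only; the element names and interfaces are illustrative, not FastClick’s actual API:

```python
# Minimal sketch of I/O + compute batching in a Click-like pipeline.
# Element names and interfaces are illustrative, not FastClick's real API.

BATCH_SIZE = 32  # typical burst size when receiving from a DPDK NIC

class CheckIPHeader:
    """Drops packets whose (mock) header checksum is invalid."""
    def process_batch(self, batch):
        # One call validates the whole batch: per-packet call overhead
        # and instruction-cache misses are amortized across 32 packets.
        return [p for p in batch if p["checksum_ok"]]

class Counter:
    def __init__(self):
        self.count = 0
    def process_batch(self, batch):
        self.count += len(batch)
        return batch

def run_pipeline(packets, elements):
    # I/O batching: fetch packets in bursts, then keep the batch
    # together through the whole element graph (compute batching)
    # instead of traversing the graph once per packet.
    out = []
    for i in range(0, len(packets), BATCH_SIZE):
        batch = packets[i:i + BATCH_SIZE]
        for el in elements:
            batch = el.process_batch(batch)
        out.extend(batch)
    return out

packets = [{"checksum_ok": n % 10 != 0} for n in range(100)]
counter = Counter()
forwarded = run_pipeline(packets, [CheckIPHeader(), counter])
```

In a real fast path, keeping the batch together also lets each element prefetch the next packet’s data while working on the current one.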

Since then, we extended FastClick to unify classification, session mapping, and stack services on behalf of the VNFs. This not only provides convenient services to VNF developers, it also minimizes and factorizes classification, avoiding redundant operations across VMs. The stack allows on-the-fly modification of any flow (such as HTTP or TCP flows), managing sequence and acknowledgement numbers on behalf of the user. A poster was accepted at EuroSys 2018, and a subsequent invited paper was presented at HPSR 2018. The codename of the implementation is MiddleClick.
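The factorization idea can be sketched as follows: the session is classified once at the entry of the chain, and every downstream VNF reuses that result and a shared per-session scratchpad instead of maintaining its own hash table. All names below are illustrative, not MiddleClick’s actual interfaces:

```python
# Sketch of factorized classification across a chain of VNFs.
# One lookup per packet serves the whole chain; each VNF stores its
# state in a slot of the shared per-session scratchpad.
# Names are illustrative only, not MiddleClick's real API.

sessions = {}  # 5-tuple -> scratchpad shared by all VNFs in the chain

def classify(pkt):
    key = (pkt["src"], pkt["dst"], pkt["sport"], pkt["dport"], pkt["proto"])
    # Single session lookup for the whole chain.
    return sessions.setdefault(key, {"counter": 0, "fw_verdict": None})

def firewall(pkt, state):
    if state["fw_verdict"] is None:            # decided once per session
        state["fw_verdict"] = "drop" if pkt["dport"] == 23 else "accept"
    return state["fw_verdict"]

def monitor(pkt, state):
    state["counter"] += 1                      # reuses the same session state

def chain(pkt):
    state = classify(pkt)                      # one classification for all VNFs
    if firewall(pkt, state) == "drop":
        return "drop"
    monitor(pkt, state)
    return "accept"
```

Without factorization, the firewall and the monitor would each hash the 5-tuple and probe their own table; here the redundant work is done once.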

To enable efficient usage of the infrastructure around the dataplane itself, I collaborated with people at the KTH Royal Institute of Technology to come up with Metron. Metron is a controller that offloads classification into SDN switches and uses the NIC’s capabilities to deliver packets directly to the right FastClick process, avoiding any inter-core transfer. “Metron: NFV Service Chains at the True Speed of the Underlying Hardware” was presented at NSDI 2018.

After my PhD graduation, I joined the NSLab team at KTH in July 2018, to work on Metron’s next phase, towards a global, low-latency Internet.

In December 2019, we published RSS++ at CoNEXT 2019. We observed that the exponential growth of both Ethernet speeds and the number of CPU cores calls for a new processing model for high-speed networking. Our approach, RSS++, answers the key question in this domain: which CPU core should get an incoming packet? RSS++ achieves very good load balancing over multiple CPU cores by exploiting opportunistic and controlled flow migration, using a new design that enables lockless and zero-copy migration of state between CPU cores.
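The core mechanism can be pictured through the NIC’s RSS indirection table: the NIC hashes each flow into one of N buckets, and the table maps buckets to cores. Tracking per-bucket load and reassigning buckets from overloaded to underloaded cores migrates only the flows of the moved buckets. The following is a simplified, hypothetical sketch (the real system also migrates the associated flow state lockless and zero-copy, which is omitted here, and the rebalancing algorithm differs):

```python
# Simplified sketch of RSS-style dispatching with indirection-table
# rebalancing, in the spirit of RSS++. Hash and greedy heuristic are
# illustrative stand-ins for the real mechanisms.
import zlib

N_BUCKETS = 128
N_CORES = 4

# Initial round-robin indirection table: bucket -> core.
table = [b % N_CORES for b in range(N_BUCKETS)]

def bucket_of(flow_id):
    # Stand-in for the NIC's Toeplitz hash over the 5-tuple.
    return zlib.crc32(flow_id.encode()) % N_BUCKETS

def rebalance(bucket_load, table):
    """Greedily move buckets from overloaded toward underloaded cores."""
    core_load = [0.0] * N_CORES
    for b, c in enumerate(table):
        core_load[c] += bucket_load[b]
    target = sum(core_load) / N_CORES
    # Consider the heaviest buckets first: moving them helps the most.
    for b in sorted(range(N_BUCKETS), key=lambda b: -bucket_load[b]):
        src = table[b]
        dst = min(range(N_CORES), key=lambda c: core_load[c])
        # Move only if the donor is overloaded and the receiver
        # does not overshoot the per-core target.
        if core_load[src] > target and core_load[dst] + bucket_load[b] <= target:
            table[b] = dst
            core_load[src] -= bucket_load[b]
            core_load[dst] += bucket_load[b]
    return core_load
```

Because flows of an untouched bucket never change core, flow-to-core affinity (and hence cache locality) is preserved for all but the migrated buckets.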

After addressing the problem of intra-server load balancing, it was natural to address inter-server load balancing, which we did at NSDI 2020. We built Cheetah, a new load balancer that solves the challenge of remembering which connection was sent to which server, without the traditional trade-off between uniform load balancing and efficiency. Cheetah is up to 5 times faster than stateful load balancers and can support advanced balancing mechanisms that reduce flow completion time by a factor of 2 to 3, without breaking connections even while adding and removing servers.
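The key idea can be sketched like this: instead of keeping a per-connection table at the load balancer, the chosen server’s identity is encoded into a small cookie carried by every packet of the connection, obfuscated so it does not leak information to observers. The construction below (keyed hash, XOR mask) is an illustrative stand-in, not Cheetah’s exact scheme:

```python
# Sketch of Cheetah-style stateless load balancing: the server id is
# recoverable from a per-connection cookie, so the load balancer keeps
# no per-connection state. The obfuscation scheme here is illustrative,
# not the paper's exact construction.
import hashlib

SECRET = b"lb-secret"  # known only to the load balancer

def _mask(conn_id: str) -> int:
    # Per-connection keyed mask hiding the server id from observers.
    h = hashlib.blake2b(conn_id.encode(), key=SECRET, digest_size=2)
    return int.from_bytes(h.digest(), "big")

def new_connection(conn_id: str, server_id: int) -> int:
    """First packet: pick any server (e.g. the least loaded) and mint
    the cookie that the connection will carry from now on."""
    return server_id ^ _mask(conn_id)

def route(conn_id: str, cookie: int) -> int:
    """Subsequent packets: recover the server from the cookie alone."""
    return cookie ^ _mask(conn_id)
```

Because routing depends only on the cookie, the server pool can grow or shrink without redirecting existing connections, and the first packet can be sent to whichever server is currently least loaded.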

In our recent PacketMill paper, presented at ASPLOS’21, we showed the limits of current kernel-bypass solutions such as DPDK and proposed a new buffering model with improved memory locality. Combined with a pipeline of source-to-source compilation and LLVM passes, throughput increases by up to 70% for memory-intensive network functions. While these improvements are generic, applied to FastClick they make it faster than all publicly available open-source packet-processing frameworks. The extended abstract is already available.

A lot of stateful high-speed applications rely on connection tracking. We therefore revisited high-speed software connection tracking on modern servers, using various hash-table implementations. Beyond being a general survey, our paper also studies the impact of maintenance, that is, deleting connections after some time: an often forgotten but very important aspect of tracking. We will present this work at HPSR’21.
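To make the maintenance aspect concrete, here is a minimal sketch of a tracking table where each entry carries a last-seen timestamp and a periodic sweep purges idle connections. The data structure and timeout policy are illustrative only; the paper compares many hash-table and expiration designs:

```python
# Minimal sketch of a connection-tracking table with maintenance:
# entries record a last-seen timestamp, and a periodic sweep deletes
# connections idle longer than the timeout. Illustrative only.

TIMEOUT = 30.0  # seconds of inactivity before a connection is purged

class ConnTable:
    def __init__(self):
        self.table = {}  # 5-tuple -> (state, last_seen)

    def lookup(self, key, now):
        entry = self.table.get(key)
        if entry is None:
            entry = ({"packets": 0}, now)    # new connection
        state, _ = entry
        state["packets"] += 1
        self.table[key] = (state, now)       # refresh last-seen on each packet
        return state

    def maintain(self, now):
        """Periodic sweep: the often-forgotten part of tracking.
        Without it, short-lived flows accumulate forever."""
        dead = [k for k, (_, seen) in self.table.items()
                if now - seen > TIMEOUT]
        for k in dead:
            del self.table[k]
        return len(dead)
```

The cost of this sweep, and of alternative schemes such as lazy per-lookup expiration, is precisely what makes maintenance worth measuring: it competes for the same CPU cycles as packet processing.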

We then studied how SmartNICs could help with rule offloading, for connection tracking but also for other scenarios, as used in Metron and RSS++, which led to a paper presented at PAM’21.

