JVM network servers backed by io_uring
Do you know how your software processes network calls?
Unless you use a low-level programming language, you have most probably never needed to care about such details. If you are a web developer, you most probably just start an Apache Tomcat or Netty server and let it do the low-level work, while your code deals with the application's business logic (the high-level work). To squeeze as much performance out of your OS and hardware as possible, Netty implements the plumbing with the OS network stack itself, in its native transport implementations. Apache Tomcat, like almost any other JVM application that needs to implement a network server, uses the Java APIs in the java.nio.channels package instead, and the JDK provides the respective implementation for your OS. For example, for NIO (New Input/Output, introduced in JDK 1.4) the provider classes are:
- EPollSelectorProvider for Linux (epoll)
- KQueueSelectorProvider for BSDs, including macOS (kqueue)
- PollSelectorProvider for other Unix-like systems (poll)
- WindowsSelectorProvider for Windows (Winsock select)
For NIO2 (introduced in JDK 1.7) the provider classes are:
- LinuxAsynchronousChannelProvider for Linux (epoll)
- BsdAsynchronousChannelProvider for BSD/MacOS (kqueue)
- WindowsAsynchronousChannelProvider for Windows (IOCP)
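To check which of these providers your own JDK has picked, you can ask the two provider factories directly. This is a minimal sketch; the class name ShowProviders is made up for illustration, and the exact class names printed depend on your OS and JDK version (on a typical Linux JDK they should be the epoll-based classes from the lists above, in the sun.nio.ch package).

```java
import java.nio.channels.spi.AsynchronousChannelProvider;
import java.nio.channels.spi.SelectorProvider;

// Hypothetical helper class (not part of the JDK): prints the active NIO and NIO2 providers.
public class ShowProviders {

    // Name of the system-wide SelectorProvider used by java.nio.channels (NIO).
    public static String selectorProviderName() {
        return SelectorProvider.provider().getClass().getName();
    }

    // Name of the system-wide AsynchronousChannelProvider (NIO2).
    public static String asyncProviderName() {
        return AsynchronousChannelProvider.provider().getClass().getName();
    }

    public static void main(String[] args) {
        System.out.println("os.name = " + System.getProperty("os.name"));
        System.out.println("NIO     = " + selectorProviderName());
        System.out.println("NIO2    = " + asyncProviderName());
    }
}
```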
In the last several months there has been work on a new way to communicate with the Linux kernel: io_uring.
io_uring tries to improve performance by reducing the number of syscalls. To do that, it sets up two ring buffers:
- submission queue (SQ): the user program appends new tasks for the kernel here
- completion queue (CQ): the user program receives the kernel's responses to the submitted tasks here
Another improvement comes from the way data is shared between userland and the kernel, which requires less copying.
Last but not least, io_uring tries to avoid the known issues (1, 2) in epoll.
Apart from improving the state of Linux networking (i.e. sockets), io_uring also tries to provide a good asynchronous file I/O API!
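To make the two-queue idea more tangible, here is a toy in-memory model of it. It is only a sketch under heavy assumptions: it performs no real I/O and no real syscalls, the class RingModel and all of its methods are invented for illustration, and the "kernel" is simulated by a plain method call. What it does show is the batching effect: several requests are queued in the SQ, a single simulated kernel transition processes all of them, and the results are then read from the CQ without another call.

```java
import java.util.ArrayList;
import java.util.List;

// Toy in-memory model of the two io_uring queues; no real I/O or syscalls happen here.
public class RingModel {

    // Fixed-capacity ring buffer of request ids with free-running head/tail counters.
    static final class Ring {
        private final long[] slots;
        private int head; // counter of entries consumed so far
        private int tail; // counter of entries produced so far

        Ring(int capacity) { slots = new long[capacity]; }

        boolean isEmpty() { return head == tail; }
        boolean isFull() { return tail - head == slots.length; }

        boolean offer(long id) {
            if (isFull()) return false;
            slots[Math.floorMod(tail++, slots.length)] = id;
            return true;
        }

        long poll() { // caller must check isEmpty() first
            return slots[Math.floorMod(head++, slots.length)];
        }
    }

    private final Ring sq = new Ring(8); // submission queue
    private final Ring cq = new Ring(8); // completion queue
    private int kernelCalls;             // number of simulated kernel transitions

    // User side: append a request to the SQ. Only memory is touched.
    public boolean prepare(long requestId) { return sq.offer(requestId); }

    // Stands in for one io_uring_enter call: the simulated kernel drains the SQ
    // and posts one completion per request, as long as the CQ has room.
    public int submit() {
        kernelCalls++;
        int processed = 0;
        while (!sq.isEmpty() && !cq.isFull()) {
            cq.offer(sq.poll());
            processed++;
        }
        return processed;
    }

    // User side: read the finished request ids from the CQ. Only memory is touched.
    public List<Long> reap() {
        List<Long> done = new ArrayList<>();
        while (!cq.isEmpty()) done.add(cq.poll());
        return done;
    }

    public int kernelCalls() { return kernelCalls; }

    public static void main(String[] args) {
        RingModel ring = new RingModel();
        for (long id = 1; id <= 3; id++) ring.prepare(id);
        int processed = ring.submit();
        System.out.println(processed + " requests, " + ring.kernelCalls()
                + " kernel call(s), completed ids " + ring.reap());
    }
}
```

With a readiness-based API like epoll, each of those three requests would typically need at least one read or write syscall of its own once the socket is reported ready; in this model they share a single transition.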
Almost all mainstream languages have open tickets in their issue trackers to provide integration with io_uring (network and/or file operations):
- .NET Core
- Node.js, Libuv
- Golang
- Rust: the language developers do not plan to add it, but suggest using third-party libraries (crates) such as Tokio or Mio
- Java: the only mention of io_uring in the standard library that I was able to find is from Project Loom (lightweight user-mode threads!). They are going to use io_uring for asynchronous file I/O, but there is nothing about networking.
The Netty project recently announced an experimental module for io_uring with quite impressive benchmark results! Version 0.0.1.Final-SNAPSHOT was available only for x86_64 systems, but since 0.0.3.Final it is also available for aarch64!
I’ve tested it with my HTTP2 load tests. The results were not as spectacular as in their benchmark, but io_uring was still faster than epoll (by about 8% on x86_64 and about 12% on aarch64):
- Netty epoll: 24688 reqs/sec on x86_64, 25344 reqs/sec on aarch64
- Netty io_uring: 26731 reqs/sec on x86_64, 28443 reqs/sec on aarch64
I wondered how one could use io_uring in Apache Tomcat?!
My first idea was that the Netty implementation could be reused in a new Tomcat Protocol, similar to Http11AprProtocol, which also uses JNI to make use of Apache APR. The drawback is that such an implementation would be usable only by Apache Tomcat itself.
My second idea was to implement it as a custom java.nio.channels.spi.SelectorProvider! This way any Java application could use it! Tomcat could use it with its Http11NioProtocol implementation. All one has to do is register the custom SelectorProvider before Tomcat tries to create its ServerSocketChannel, e.g. by using the special system property -Djava.nio.channels.spi.SelectorProvider=… or by using the ServiceLoader API, i.e. by having /META-INF/services/java.nio.channels.spi.SelectorProvider on the classpath.
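To illustrate the registration part, here is a minimal skeleton of such a provider. It is a sketch only: the class name IoUringSelectorProvider is an assumption made up for this example (it is not taken from my branch), and every factory method simply throws, because the io_uring-backed selector and channel classes are exactly the part that still has to be written.

```java
import java.io.IOException;
import java.net.ProtocolFamily;
import java.nio.channels.DatagramChannel;
import java.nio.channels.Pipe;
import java.nio.channels.ServerSocketChannel;
import java.nio.channels.SocketChannel;
import java.nio.channels.spi.AbstractSelector;
import java.nio.channels.spi.SelectorProvider;

// Hypothetical skeleton of a custom provider; the name is an assumption for illustration.
public class IoUringSelectorProvider extends SelectorProvider {

    // The JDK instantiates the provider reflectively, so it needs a
    // public class with a public no-argument constructor.
    public IoUringSelectorProvider() {
        super();
    }

    // Placeholder for the parts that would be backed by io_uring via JNI.
    private static UnsupportedOperationException todo(String what) {
        return new UnsupportedOperationException(what + " is not implemented in this sketch");
    }

    @Override
    public AbstractSelector openSelector() throws IOException {
        throw todo("openSelector");
    }

    @Override
    public ServerSocketChannel openServerSocketChannel() throws IOException {
        throw todo("openServerSocketChannel");
    }

    @Override
    public SocketChannel openSocketChannel() throws IOException {
        throw todo("openSocketChannel");
    }

    @Override
    public DatagramChannel openDatagramChannel() throws IOException {
        throw todo("openDatagramChannel");
    }

    @Override
    public DatagramChannel openDatagramChannel(ProtocolFamily family) throws IOException {
        throw todo("openDatagramChannel(ProtocolFamily)");
    }

    @Override
    public Pipe openPipe() throws IOException {
        throw todo("openPipe");
    }

    // Prints whichever provider the JVM actually selected at startup.
    public static void main(String[] args) {
        System.out.println(SelectorProvider.provider().getClass().getName());
    }
}
```

Assuming the compiled class is on the classpath in the default package, running it with -Djava.nio.channels.spi.SelectorProvider=IoUringSelectorProvider should make the main method print the custom class name instead of the JDK default. For the ServiceLoader variant, the services file would contain a single line with the provider's fully qualified class name. Note that the JDK loads the class named in the system property through the system class loader, so the provider has to be visible there and not only in a web application's class loader.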
An initial attempt can be found in a feature branch of my HTTP2 benchmark application. It has the plumbing to set up the JNI build step and the SelectorProvider, but there is a lot more work to do for the actual implementation of the (Server)SocketChannel classes based on io_uring. Help is welcome!
I’ve shared my idea with the Apache Tomcat team since they have a lot of experience with these Java APIs, but it seems there is not much interest in the community at the moment. Actually, the Tomcat team wants to get rid of the native code in Tomcat (the APR connector) and doesn’t want to add new native code. The best outcome would be for the OpenJDK team to do the integration, but I guess this won’t happen soon.
io_uring is a relatively new technology. It was introduced in Linux 5.1, and new features and bug fixes are being added in newer versions. Many Linux distros still use old LTS kernels, so using it in production is only for the more adventurous! But now is the time to test it and to provide feedback!