Comparing the performance of several HTTP2 web servers

Martin Grigorov
4 min read · Oct 6, 2020

Recently there were a few questions and complaints about the performance of the HTTP2 support in the Apache Tomcat web server, so I decided to check for myself and see what could be improved!

To enable HTTP2 support in Apache Tomcat one should just add

<UpgradeProtocol className="org.apache.coyote.http2.Http2Protocol" />

to a Connector. Here is a more complete Connector config:
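Since the original embedded snippet is not shown here, the following is a hypothetical but typical configuration, modeled on the SSL/TLS examples in the Tomcat documentation (the port, thread count and certificate paths are placeholders):

```xml
<Connector port="8443" protocol="org.apache.coyote.http11.Http11NioProtocol"
           maxThreads="150" SSLEnabled="true">
    <UpgradeProtocol className="org.apache.coyote.http2.Http2Protocol" />
    <SSLHostConfig>
        <Certificate certificateKeyFile="conf/localhost-rsa-key.pem"
                     certificateFile="conf/localhost-rsa-cert.pem"
                     type="RSA" />
    </SSLHostConfig>
</Connector>
```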

Web browsers support HTTP2 only over a TLS-encrypted connection! That is the reason for the SSLHostConfig.

If you don’t specify <SSLHostConfig>, then Tomcat will enable HTTP2 over clear text, the so-called h2c. But h2c can be consumed only by non-browser HTTP clients.

To make it easier for me to run different configurations of Tomcat (HTTP2 vs h2c, NIO vs NIO2 vs APR), I have created a small application that uses embedded Tomcat. Depending on a few system properties it runs with a different connector protocol (NIO, NIO2, APR), with or without TLS and HTTP2.
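The exact property names used by my test application are not shown here, so the ones below are hypothetical placeholders; the sketch only illustrates the idea of selecting the connector protocol and TLS mode from the environment:

```shell
# Hypothetical property names - adjust them to the real test application.
PROTOCOL=${PROTOCOL:-NIO}   # one of: NIO, NIO2, APR
TLS=${TLS:-true}            # true = HTTP2 over TLS, false = h2c

CMD="java -Dtomcat.protocol=$PROTOCOL -Dtomcat.tls=$TLS -jar embedded-tomcat-tests.jar"
echo "$CMD"
```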

As a load test client I use Vegeta, because wrk does not support HTTP2 and, in addition, has a bug in the way it calculates its statistics.

The Vegeta command looks like:

echo "{\"method\": \"GET\", \"url\": \"$SCHEME://$TARGET:$PORT/$PATH\"}" | eval vegeta attack -http2 $H2C -format=json -rate 0 -max-workers 128 -insecure -duration 30s | vegeta encode | vegeta report --type json > http2-result.json

The $H2C variable value can be either -h2c or "" (empty). When -h2c is used, $SCHEME is http; otherwise it is https.
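The two variables are tied together, so they can be derived from a single switch. A minimal sketch (USE_H2C is my own placeholder name, not a Vegeta option):

```shell
# USE_H2C is a hypothetical switch: set it to "true" to benchmark clear-text HTTP2.
if [ "${USE_H2C:-false}" = "true" ]; then
    H2C="-h2c"      # Vegeta flag: HTTP2 without TLS
    SCHEME="http"
else
    H2C=""          # empty: HTTP2 over TLS, negotiated via ALPN
    SCHEME="https"
fi
echo "$SCHEME"
```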

I have executed the above Vegeta command against the following HTTP2-enabled web servers:

Each server provides a GET endpoint that just writes “Hello world!” back to the client!

Instances of these servers run on two Ubuntu 20.04.1 VMs — one x86_64 and one aarch64, same as in my previous articles. And the Vegeta client runs on a third VM in the same network.

The tests were run several times and the results with the highest throughput (reqs/sec) for each web server were:

  • Tomcat NIO HTTP/2: (x86_64: 17485, aarch64: 14173)
  • Tomcat NIO h2c: (x86_64: 21675, aarch64: 17495)
  • Tomcat NIO2 HTTP/2: (x86_64: 16507, aarch64: 13966)
  • Tomcat NIO2 h2c: (x86_64: 19967, aarch64: 16605)
  • Tomcat APR HTTP/2: (x86_64: 18856, aarch64: 15866)
  • Tomcat APR h2c: (x86_64: 21514, aarch64: 16654)
  • Netty HTTP/2 : (x86_64: 24688, aarch64: 25344)
  • Golang HTTP/2: (x86_64: 23862, aarch64: 20489)
  • Node.js HTTP/2: (x86_64: 23808, aarch64: 14221)
  • Rust HTTP/2: (x86_64: 22897, aarch64: 23122)
  • .NET Core 3.1 HTTP/2: (x86_64: 11842, aarch64: 17158)
  • .NET Core 5.0 HTTP/2: (x86_64: 21512, aarch64: 21709)

The full Vegeta results could be seen here.

Above we see that Netty gives the best throughput, followed by Golang, then by Node.js, Rust, Apache Tomcat (NIO, then APR and NIO2) and finally .NET Core.

Netty, Rust and .NET also gave better results on the aarch64 CPU than on x86_64!

In the beginning of my tests Apache Tomcat had some problems. The HTTP2 specification states that a web server should keep closed streams around for some time, without specifying how long. Due to a bug (?!) in Vegeta/Golang's net standard library, Vegeta was sending RST_STREAM with error CANCEL after receiving each response. Tomcat was too aggressive in pruning the closed streams, and this led to many exceptions on the server side and to closing the HTTP2 connection completely. Initially the throughput was around 2000 reqs/sec! Mark Thomas reduced the memory footprint of the closed streams, which made it possible to keep them around for a longer period of time. This solved the issue with RST_STREAM+CANCEL and improved the throughput several times over!

The improvements will be available with Apache Tomcat 10.0.0-M9 and 9.0.39.

Why do the other web servers give better throughput?

With the help of Wireshark I’ve checked what kind of TCP packets are sent by each web server.

In the screenshot below we can see that Netty sends the HTTP2 SETTINGS, HEADERS and DATA frames in one TCP packet.

Tomcat, on the other hand, sends a separate TCP packet for each kind of HTTP2 frame (SETTINGS, PING, HEADERS, DATA, PRIORITY, WINDOW_UPDATE):

Golang, Node.js and Rust also employ this optimization. In their traffic I have seen packets with at most two frames inside, but that is still better than sending the frames one by one.
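For reference, the same frame-level inspection can be done on the command line with tshark instead of the Wireshark GUI. The sketch below only prints the command; the interface and port are placeholders, and an h2c port is assumed so that no TLS decryption is needed:

```shell
# -f is the capture filter, -Y http2 keeps only HTTP2 traffic and
# -O http2 prints the full decode of every frame (SETTINGS, HEADERS, DATA, ...).
# "lo" and 8080 are placeholders for a local h2c setup.
CMD='tshark -i lo -f "tcp port 8080" -Y http2 -O http2'
echo "$CMD"
```

Frames that share one TCP segment show up under the same packet number in the output.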

Refactoring Tomcat to do the same would require some work, but my guess is that this is the main reason for the difference.

HTTP 1.1 vs HTTP2

Using HTTP 1.1 would give you even better throughput for this particular benchmark setup!

HTTP2 gives an advantage to your web page/application if you make use of its multiplexing: over a single connection the client can fetch several resources concurrently (e.g. an HTML page with all its static resources: CSS, JS, images, etc.).

HTTP2 requires the server to keep state for each connection (the open and closed streams of that connection), while HTTP 1.1 is stateless.
