Compare Varnish Cache performance on x86_64 and aarch64 CPU architectures

Martin Grigorov
Jul 28, 2020 · 6 min read

What is Varnish Cache?

From the project’s Wiki page:

Varnish is very flexible and as such can be used as a caching engine, a load balancer, a web application firewall, an edge authentication and authorization mechanism. Other use cases are HTTP routing, hotlinking protection, DDoS attack defender and a lot more.

Varnish is a reverse-proxy HTTP accelerator designed for heavily consumed API endpoints and also for dynamic, heavy-content, high-traffic websites.

From Wikipedia:

Varnish is an HTTP accelerator designed for content-heavy dynamic web sites as well as APIs

To compare its performance I am going to use it as an HTTP accelerator in front of a simple REST API: a GET endpoint written in Golang that just returns “Hello World” without reading from or writing to disk or the network:
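The exact http-server.go used in the tests is not reproduced here, but a minimal sketch of such an endpoint could look like the following (the PORT environment variable handling matches the command shown later; treat this as an illustration, not the exact code from the test):

package main

import (
	"fmt"
	"net/http"
	"os"
)

func main() {
	// Read the listen port from the PORT environment variable, defaulting to 8080.
	port := os.Getenv("PORT")
	if port == "" {
		port = "8080"
	}

	// A single endpoint returning a static 11-byte body, with no disk or
	// network I/O on the hot path.
	http.HandleFunc("/", func(w http.ResponseWriter, r *http.Request) {
		fmt.Fprint(w, "Hello World")
	})

	http.ListenAndServe(":"+port, nil)
}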

The VMs I am going to use are the same from my previous similar posts:

  • x86_64
  • aarch64

Note: the VMs are as close as possible in their hardware capabilities: same type and amount of RAM, same disks, network cards, and bandwidth. The CPUs are also as similar as possible, but there are some differences:

  • the CPU frequency: 3000 MHz (x86_64) vs 2400 MHz (aarch64)
  • BogoMIPS: 6000 (x86_64) vs 200 (aarch64)
  • Level 1 caches: 128 KiB (x86_64) vs 512 KiB (aarch64)

Both VMs run Ubuntu 20.04 with latest software updates.

Varnish Cache is built from source, from the master branch!
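For reference, a typical way to build it looks like this; the install prefix is my assumption, chosen so that the varnish-cache/sbin/varnishd path used below exists, and the usual build dependencies (autotools, a C compiler, etc.) are assumed to be installed:

$ git clone https://github.com/varnishcache/varnish-cache.git
$ cd varnish-cache
$ ./autogen.sh
$ ./configure --prefix=$PWD
$ make && make install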

As a load-testing client I will use Vegeta. The client application runs on a third VM in the same network as the two above!

The command I use to run Vegeta is:

$ echo "GET http://192.168.0.232:8080" | vegeta attack -rate infinity -max-workers 128 -insecure -duration 300s | vegeta encode > report.json
$ vegeta report report.json

Usually Vegeta is used to measure latency at a constant rate/throughput, by using -rate N/s, where N is some positive number. As explained here, I am going to use -rate infinity -max-workers M instead, where M is an empirically found positive number that loads the load-client VM’s CPU at 80–85%. By using -rate infinity I want to find the highest throughput the backend can serve.
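For comparison, a constant-rate run would look something like this (the 10000/s target is just an illustrative value, not one used in the tests):

$ echo "GET http://192.168.0.232:8080" | vegeta attack -rate 10000/s -duration 300s | vegeta encode > report.json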

Note: in the first version of this test I used wrk as the HTTP load-testing client, but as noticed by the Varnish community there seems to be a bug in the calculation of its latency statistics: the standard deviation is bigger than the average, e.g.:

Running 30s test @ http://192.168.0.232:8080
  8 threads and 96 connections
  Thread Stats   Avg      Stdev     Max   +/- Stdev
    Latency   655.40us  798.70us  28.43ms   90.52%
    Req/Sec    20.95k     1.92k   28.68k    68.25%
  5010594 requests in 30.07s, 611.64MB read
Requests/sec: 166625.40
Transfer/sec:     20.34MB

Since latencies cannot be negative, a standard deviation bigger than the average would imply an extremely heavy-tailed distribution. Highly unlikely for such a uniform workload!

To set a baseline I will run the load client directly against the Golang-based service on both VMs:

env PORT=8080 go run http-server.go (Fish shell syntax)

  • aarch64
Host: http://192.168.0.232:8080
Requests [total, rate, throughput] 11104754, 37015.67, 37015.18
Duration [total, attack, wait] 5m0s, 5m0s, 4.016ms
Latencies [min, mean, 50, 90, 95, 99, max] 144.242µs, 1.695ms, 1.109ms, 3.971ms, 5.105ms, 7.641ms, 42.098ms
Bytes In [total, mean] 122152294, 11.00
Bytes Out [total, mean] 0, 0.00
Success [ratio] 100.00%
Status Codes [code:count] 200:11104754
Error Set:
  • x86_64
Host: http://192.168.0.206:8080
Requests [total, rate, throughput] 11123691, 37078.70, 37078.60
Duration [total, attack, wait] 5m0s, 5m0s, 835.171µs
Latencies [min, mean, 50, 90, 95, 99, max] 553.758µs, 2.019ms, 1.421ms, 4.212ms, 5.234ms, 7.511ms, 33.186ms
Bytes In [total, mean] 122360601, 11.00
Bytes Out [total, mean] 0, 0.00
Success [ratio] 100.00%
Status Codes [code:count] 200:11123691
Error Set:

So far the aarch64 VM gives slightly lower throughput (37015.18 vs 37078.60 requests/sec) but also slightly better mean latency (1.695ms vs 2.019ms)!

Let’s bring Varnish Cache into the game!

I will stop the Golang-based HTTP server running on port 8080, start a new one on port 8081, and start Varnish:

$ varnish-cache/sbin/varnishd -V
varnishd (varnish-trunk revision 6c3ce87fc182c6bde3a6ca30c7fcf58b0cf58504)
Copyright (c) 2006 Verdens Gang AS
Copyright (c) 2006-2020 Varnish Software AS
$ varnish-cache/sbin/varnishd \
-a http=:8080,HTTP \
-f /home/ubuntu/varnish/etc/varnish.vcl \
-s malloc,256m \
-t 100000 \
-n /home/ubuntu/varnish/work \
-P /home/ubuntu/varnish/varnish.pid \
-F
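In short: -a sets the listen address (port 8080, plain HTTP), -f points to the VCL configuration file, -s malloc,256m gives Varnish a 256 MB in-memory cache, -t 100000 sets the default TTL in seconds for cached objects, -n sets the working directory, -P writes a PID file, and -F keeps varnishd in the foreground.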

And varnish.vcl looks like:

vcl 4.1;

backend default {
    .host = "192.168.0.XYZ";    # IP of the VM running the Golang backend
    .port = "8081";
    .connect_timeout = 60s;
    .first_byte_timeout = 300s;
}

where XYZ is the IP of the other VM, i.e. Varnish Cache running on aarch64 points to the Golang HTTP server running on x86_64, and vice versa. This is not really important, because Varnish hits the backend server just once and from then on serves the response from its cache, so even a backend running on the same host would not add extra load in this specific test.
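A quick way to confirm that responses are indeed served from the cache is to look at the response headers: on a cache hit Varnish increases the Age header, and the X-Varnish header carries two transaction IDs (the current request and the one that originally fetched the object). For example:

$ curl -sI http://192.168.0.232:8080 | grep -iE '^(age|x-varnish)'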

The results from running Vegeta 5 times against Varnish are:

  • aarch64
Host: http://192.168.0.232:8080
Requests [total, rate, throughput] 9474728, 31582.41, 31582.36
Duration [total, attack, wait] 5m0s, 5m0s, 477.776µs
Latencies [min, mean, 50, 90, 95, 99, max] 164.242µs, 1.711ms, 1.119ms, 3.99ms, 5.176ms, 7.928ms, 29.044ms
Bytes In [total, mean] 104222008, 11.00
Bytes Out [total, mean] 0, 0.00
Success [ratio] 100.00%
Status Codes [code:count] 200:9474728
Error Set:
Host: http://192.168.0.232:8080
Requests [total, rate, throughput] 9533023, 31776.42, 31776.39
Duration [total, attack, wait] 5m0s, 5m0s, 240.763µs
Latencies [min, mean, 50, 90, 95, 99, max] 164.992µs, 1.676ms, 1.086ms, 3.938ms, 5.115ms, 7.797ms, 37.838ms
Bytes In [total, mean] 104863253, 11.00
Bytes Out [total, mean] 0, 0.00
Success [ratio] 100.00%
Status Codes [code:count] 200:9533023
Error Set:
Host: http://192.168.0.232:8080
Requests [total, rate, throughput] 9283363, 30916.00, 30915.94
Duration [total, attack, wait] 5m0s, 5m0s, 544.057µs
Latencies [min, mean, 50, 90, 95, 99, max] 168.873µs, 1.704ms, 1.108ms, 3.99ms, 5.176ms, 7.936ms, 36.926ms
Bytes In [total, mean] 102116993, 11.00
Bytes Out [total, mean] 0, 0.00
Success [ratio] 100.00%
Status Codes [code:count] 200:9283363
Error Set:
Host: http://192.168.0.232:8080
Requests [total, rate, throughput] 9357522, 31191.71, 31191.68
Duration [total, attack, wait] 5m0s, 5m0s, 277.304µs
Latencies [min, mean, 50, 90, 95, 99, max] 164.382µs, 1.704ms, 1.108ms, 3.987ms, 5.157ms, 7.953ms, 34.94ms
Bytes In [total, mean] 102932742, 11.00
Bytes Out [total, mean] 0, 0.00
Success [ratio] 100.00%
Status Codes [code:count] 200:9357522
Error Set:
Host: http://192.168.0.232:8080
Requests [total, rate, throughput] 9442222, 31473.86, 31473.83
Duration [total, attack, wait] 5m0s, 5m0s, 312.585µs
Latencies [min, mean, 50, 90, 95, 99, max] 166.902µs, 1.718ms, 1.119ms, 4.019ms, 5.193ms, 7.891ms, 34.967ms
Bytes In [total, mean] 103864442, 11.00
Bytes Out [total, mean] 0, 0.00
Success [ratio] 100.00%
Status Codes [code:count] 200:9442222
Error Set:
  • x86_64
Host: http://192.168.0.206:8080
Requests [total, rate, throughput] 9382456, 31274.63, 31274.53
Duration [total, attack, wait] 5m0s, 5m0s, 904.792µs
Latencies [min, mean, 50, 90, 95, 99, max] 567.128µs, 2.07ms, 1.451ms, 4.325ms, 5.42ms, 7.966ms, 33.78ms
Bytes In [total, mean] 103207016, 11.00
Bytes Out [total, mean] 0, 0.00
Success [ratio] 100.00%
Status Codes [code:count] 200:9382456
Error Set:
Host: http://192.168.0.206:8080
Requests [total, rate, throughput] 9523167, 31743.64, 31743.53
Duration [total, attack, wait] 5m0s, 5m0s, 1.013ms
Latencies [min, mean, 50, 90, 95, 99, max] 570.478µs, 2.024ms, 1.412ms, 4.246ms, 5.338ms, 7.811ms, 28.339ms
Bytes In [total, mean] 104754837, 11.00
Bytes Out [total, mean] 0, 0.00
Success [ratio] 100.00%
Status Codes [code:count] 200:9523167
Error Set:
Host: http://192.168.0.206:8080
Requests [total, rate, throughput] 9402293, 31340.94, 31340.46
Duration [total, attack, wait] 5m0s, 5m0s, 4.605ms
Latencies [min, mean, 50, 90, 95, 99, max] 570.668µs, 2.053ms, 1.44ms, 4.297ms, 5.397ms, 7.929ms, 43.179ms
Bytes In [total, mean] 103425223, 11.00
Bytes Out [total, mean] 0, 0.00
Success [ratio] 100.00%
Status Codes [code:count] 200:9402293
Error Set:
Host: http://192.168.0.206:8080
Requests [total, rate, throughput] 9497124, 31656.89, 31656.74
Duration [total, attack, wait] 5m0s, 5m0s, 1.445ms
Latencies [min, mean, 50, 90, 95, 99, max] 569.538µs, 2.057ms, 1.442ms, 4.318ms, 5.397ms, 7.908ms, 27.122ms
Bytes In [total, mean] 104468364, 11.00
Bytes Out [total, mean] 0, 0.00
Success [ratio] 100.00%
Status Codes [code:count] 200:9497124
Error Set:
Host: http://192.168.0.206:8080
Requests [total, rate, throughput] 9547313, 31824.36, 31824.21
Duration [total, attack, wait] 5m0s, 5m0s, 1.405ms
Latencies [min, mean, 50, 90, 95, 99, max] 569.538µs, 2.05ms, 1.446ms, 4.278ms, 5.358ms, 7.822ms, 30.454ms
Bytes In [total, mean] 105020443, 11.00
Bytes Out [total, mean] 0, 0.00
Success [ratio] 100.00%
Status Codes [code:count] 200:9547313
Error Set:

The throughput is almost the same for both architectures and the aarch64 VM gives slightly better latency!

Here is the output of varnishstat -n /home/ubuntu/varnish/work -1 for both instances:

  • aarch64
  • x86_64
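The most telling counters in that output are MAIN.cache_hit and MAIN.cache_miss: with a single cached object and a TTL of 100000 seconds, one expects a single miss per instance and cache hits for practically all other requests.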

If the backend service were a more realistic one, e.g. doing some calculations or I/O operations, then Varnish Cache would definitely help! But it is interesting to find out whether something could be improved even for this specific case of serving static content!

Happy hacking and stay safe!
