Compare Memcached performance on x86_64 and arm64 CPU architectures

Martin Grigorov
May 18, 2020


Last week I shared with you the results of load testing Apache Tomcat on the x86_64 and ARM64 CPU architectures. In this article I will test Memcached.

What is Memcached?

From Wikipedia: Memcached is a general-purpose distributed memory-caching system. It is often used to speed up dynamic database-driven websites by caching data and objects in RAM to reduce the number of times an external data source (such as a database or API) must be read.

In contrast to Apache Tomcat, which is written in Java and is therefore multi-platform, Memcached is written in C and has to be built specifically for your CPU architecture. As stated on its Hardware Wiki page, ARM64 is one of the officially supported architectures, and there is a BuildBot builder testing all code changes! If you face any issue, just report it at the project’s issue tracker! Dormando, the project maintainer, is very friendly and responsive!

In my first attempt to find a good load testing tool for Memcached I stumbled upon the Memtier Benchmark tool by RedisLabs. I ran it on the same VMs as in the article for Apache Tomcat, and the results were:

ASCII protocol on ARM64
ASCII protocol on x86_64

The command used to run the benchmark is:

./memtier_benchmark --server a.b.c.d --port 11211 -P memcache_text

Above we see that the Memcached server running on the ARM64 VM was slightly faster!

Note: the Memcached server was running with its default settings (a maximum of 1024 connections, 4 threads and 64 MB of memory), i.e. without specifying custom values.

For binary protocol the numbers are almost the same:

Binary protocol on ARM64
Binary protocol on x86_64

The command used to run the benchmark is:

./memtier_benchmark --server a.b.c.d --port 11211 -P memcache_binary
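For readers who have not looked at the two protocols compared above, here is a minimal Python sketch of how the same GET request is framed in each of them. The key `foo` and the helper names `ascii_get` and `binary_get` are made up for illustration. This only shows the wire format; it is not what memtier_benchmark does internally.

```python
import struct


def ascii_get(key: bytes) -> bytes:
    # Text protocol: a human-readable command line terminated by CRLF.
    return b"get " + key + b"\r\n"


def binary_get(key: bytes, opaque: int = 0) -> bytes:
    # Binary protocol: a fixed 24-byte request header followed by the key.
    header = struct.pack(
        ">BBHBBHIIQ",
        0x80,      # magic byte: request
        0x00,      # opcode: Get
        len(key),  # key length
        0,         # extras length (a Get request has no extras)
        0,         # data type (raw bytes)
        0,         # reserved / vbucket id
        len(key),  # total body length = extras + key + value
        opaque,    # opaque value, echoed back by the server
        0,         # CAS (unused for Get)
    )
    return header + key


if __name__ == "__main__":
    print(ascii_get(b"foo"))
    print(binary_get(b"foo").hex())
```

The text request is shorter for small keys, while the binary request has a fixed-size header, so the server does not have to scan for line endings.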

After I shared these results with the Memcached community, I was advised to use the MC Crusher tool instead. And indeed the numbers are much better with it:

ASCII protocol GET operations per second
ASCII protocol SET operations per second

In the first chart you can see that on both x86_64 and ARM64 the server handles around 1.5 million GET operations per second!

In the second chart the server handles a little more than 900 thousand SET operations per second on ARM64 and around 840 thousand on x86_64.

Note: Since the mc-crusher tool does not report any statistics about its execution, I used Memcached’s stats command to get the number of executed operations.
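As a rough illustration of that approach, the sketch below samples the ASCII `stats` command twice and turns the difference of a counter into operations per second. It is not the script used for the charts. Reading the `cmd_get` and `cmd_set` counters is an assumption about which values to look at, and the function names and the 10 second interval are made up for the example.

```python
import socket
import time


def parse_stats(raw: bytes) -> dict:
    """Parse a `stats` reply ("STAT <name> <value>" lines, then "END")."""
    stats = {}
    for line in raw.decode("ascii", errors="replace").split("\r\n"):
        parts = line.split(" ", 2)
        if len(parts) == 3 and parts[0] == "STAT":
            stats[parts[1]] = parts[2]
    return stats


def read_stats(host: str, port: int = 11211, timeout: float = 5.0) -> dict:
    """Send `stats` over the text protocol and read until the END marker."""
    with socket.create_connection((host, port), timeout=timeout) as sock:
        sock.sendall(b"stats\r\n")
        buf = b""
        while not buf.endswith(b"END\r\n"):
            chunk = sock.recv(4096)
            if not chunk:  # connection closed before END arrived
                break
            buf += chunk
    return parse_stats(buf)


def ops_per_second(before: dict, after: dict, counter: str, seconds: float) -> float:
    """Rate of a monotonically increasing counter between two samples."""
    return (int(after[counter]) - int(before[counter])) / seconds


def sample_rates(host: str, interval: float = 10.0) -> dict:
    """Take two samples `interval` seconds apart (hypothetical helper)."""
    start = time.monotonic()
    first = read_stats(host)
    time.sleep(interval)
    second = read_stats(host)
    elapsed = time.monotonic() - start
    return {
        name: ops_per_second(first, second, name, elapsed)
        for name in ("cmd_get", "cmd_set")  # assumed counters of interest
    }
```

Calling `sample_rates("a.b.c.d")` while the load test is running would return the approximate GET and SET rates for that window. The measured interval includes the time taken by the `stats` round trips themselves, so short intervals are less accurate.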

Here are the settings used for the load test:

1. The servers are started with:
$ memcached -t 16 -c 256 -m 2048

i.e. with 16 threads, a maximum of 256 simultaneous connections and 2 GB of memory.

2. MC Crusher

2.1. GET config

2.2. SET config

Note: In the chart for the GET operations you can see that the number rises on May 13th, 2020 from around 950K operations per second to around 1.6 million ops/s. On that day I upgraded the VM that I use as a client, i.e. the one where I run the load testing tool (mc-crusher), because I had noticed that during the test runs there were spikes when the client itself was overloaded.

2.3. Perl script that automates the testing

Once again we saw that ARM64 on the server could be as fast as x86_64!

If you have ideas on how to improve this test or how to measure some other aspect of Memcached, feel free to share them with me in the comments!

Happy hacking and stay safe!
