Benchmarking Storage
You should be familiar with [[Storage]] in Proxmox VE and know that there are two types of storage:
* block based storage
* file based storage
each with their own pros and cons.
Benchmarking itself is a very complex subject. This page gives you some simple commands and guidelines to help you judge for yourself whether your system performs reasonably or not. It does not go into details such as storage tiering, concurrent access, read/write amplification, alignment, thin provisioning, etc.
= Why benchmark =
In short: benchmarking is a good tool for determining the speed of a storage system and comparing it to other systems, hardware, setups and configuration settings. Without comparison, a benchmark is useless, so you need identical test environments; this page exists to lay down some ground rules.
It is important to understand that a benchmark will not solve your performance problem by itself, nor does a small value in one of your tests necessarily mean that your system will be slow. A database server has totally different performance requirements than a video streaming server, so a benchmark suite should cover all common use cases without going into too much detail. All the presented commands try to emulate a specific workload that may or may not match yours. In a virtualisation ecosystem, you will often have all of them at once.
= What to benchmark =
You can characterise a benchmark by at least these categories:
* operation: read, write, or a mixture of both at a fixed ratio
* access pattern: sequential or random
* block size
With these categories, you can try to emulate a real workload, e.g.
* a video streaming server will mostly do sequential reads with a big block size to serve video content
* an (oversimplified) database server will mostly do random reads of a fixed block size, e.g. 8 KB
* a fileserver serves small and big files, so a variable block size and mostly sequential reading
so the benchmark should emulate these examples; a rough mapping to <code>fio</code> parameters is sketched below.
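As a rough orientation, these example workloads could translate into <code>fio</code> parameters (the tool is introduced further down) along the following lines; the exact block sizes, in particular the 64K for the fileserver case, are assumptions for illustration:
<pre>
# hypothetical mapping of the example workloads to fio parameters
# video streaming server : --rw=read     --bs=1M    (sequential reads, big blocks)
# database server        : --rw=randread --bs=8K    (random reads, fixed small blocks)
# fileserver             : --rw=read     --bs=64K   (mostly sequential, block size varies in practice)
</pre>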
The benchmark itself measures two dependent performance metrics:
* number of input/output operations per second (IOPS)
* throughput in MB/s
Normally, <code>throughput = IOPS * block size</code>.
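As a quick plausibility check, the 4K sequential read example further down reports 20947 IOPS at a 4K block size, which multiplies out to roughly the bandwidth fio prints:
<pre>
# IOPS * block size should roughly match the reported bandwidth
# (values taken from the 4K sequential read example below)
echo $(( 20947 * 4 ))   # 83788 KiB/s, close to the reported 83792 KB/s
</pre>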
You should always benchmark what you actually use or want to use. It does not make sense to benchmark only one disk of a RAID setup, or the bare block device on which you run your favourite filesystem. In this layered view of storage, always benchmark the final layer on which you access your data. For general understanding, it can still be useful to know which performance each layer delivers and where you lose some, e.g. to fragmentation or simply to management overhead.
= How to determine the block size =
Depending on your storage type, this varies:
* Ceph uses 4 MB objects
* a ZFS zvol on PVE uses 8K (the default volblocksize)
* ext4 uses 4K
* ...
You can query these values on a running system, as sketched below.
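If you want to verify these values on your own setup, the following commands are one way to do it; the pool, image and device names are placeholders that you need to replace:
<pre>
# ZFS: volblocksize of a zvol (replace rpool/data/vm-100-disk-0 with your zvol)
zfs get volblocksize rpool/data/vm-100-disk-0

# Ceph RBD: object size ("order") of an image (replace mypool/vm-100-disk-0)
rbd info mypool/vm-100-disk-0 | grep order

# ext4: block size of a formatted device (replace /dev/sda1)
tune2fs -l /dev/sda1 | grep 'Block size'
</pre>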
= How to benchmark =
There are a lot of tools for this, but most of them are limited to one major operating system. We will focus here on [https://github.com/axboe/fio <code>fio</code> by Jens Axboe], which runs on almost every platform, including Linux, Windows and macOS, so results are comparable across setups.
On Proxmox VE, as on Debian, you can simply install it via <code>apt install fio</code>.
The tool has two modes of operation:
* all parameters specified on the command line
* parameters read from a configuration (job) file; a minimal example is shown below
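The command-line examples on this page can equally be expressed as a job file. A minimal sketch, roughly equivalent to the 4K sequential read command shown further down (the file name and <code>/dev/sda</code> are placeholders, pick a device or file you can safely test):
<pre>
# seq_read_4k.fio
[global]
ioengine=libaio
direct=1
sync=1
numjobs=1
iodepth=1
runtime=60
time_based

[seq_read]
rw=read
bs=4K
filename=/dev/sda
</pre>
Run it with <code>fio seq_read_4k.fio</code>.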
Depending on your storage type (file or block based), you need a test file or a block device to test against. Make sure that you will not destroy any data if you perform a write test; it is best to double or triple check the device and file names.
All tests presented here are time-based: they measure how the system behaves over 60 seconds.
== Sequential Tests ==
The simplest way to benchmark sequential read and write operations is:
<pre>
root@testnode ~ > fio --ioengine=libaio --direct=1 --sync=1 --rw=read --bs=4K --numjobs=1 --iodepth=1 --runtime=60 --time_based --name seq_read --filename=/dev/sda
seq_read: (g=0): rw=read, bs=4K-4K/4K-4K/4K-4K, ioengine=libaio, iodepth=1
fio-2.16
Starting 1 process
Jobs: 1 (f=1): [R(1)] [100.0% done] [90924KB/0KB/0KB /s] [22.8K/0/0 iops] [eta 00m:00s]
seq_read: (groupid=0, jobs=1): err= 0: pid=26178: Sun Jan 6 22:59:55 2019
  read : io=4909.8MB, bw=83792KB/s, iops=20947, runt= 60001msec
    slat (usec): min=2, max=387, avg= 4.15, stdev= 1.55
    clat (usec): min=1, max=131280, avg=42.67, stdev=291.45
     lat (usec): min=35, max=131291, avg=46.81, stdev=291.46
    clat percentiles (usec):
     | 1.00th=[ 33], 5.00th=[ 34], 10.00th=[ 35], 20.00th=[ 36],
     | 30.00th=[ 38], 40.00th=[ 39], 50.00th=[ 40], 60.00th=[ 41],
     | 70.00th=[ 41], 80.00th=[ 42], 90.00th=[ 43], 95.00th=[ 45],
     | 99.00th=[ 69], 99.50th=[ 114], 99.90th=[ 213], 99.95th=[ 227],
     | 99.99th=[ 4016]
    lat (usec) : 2=0.01%, 4=0.01%, 20=0.01%, 50=97.56%, 100=1.83%
    lat (usec) : 250=0.57%, 500=0.02%, 750=0.01%, 1000=0.01%
    lat (msec) : 2=0.01%, 4=0.01%, 10=0.01%, 20=0.01%, 50=0.01%
    lat (msec) : 100=0.01%, 250=0.01%
  cpu : usr=4.62%, sys=14.10%, ctx=1256980, majf=11, minf=12
  IO depths : 1=100.0%, 2=0.0%, 4=0.0%, 8=0.0%, 16=0.0%, 32=0.0%, >=64=0.0%
     submit : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0%
     complete : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0%
     issued : total=r=1256894/w=0/d=0, short=r=0/w=0/d=0, drop=r=0/w=0/d=0
     latency : target=0, window=0, percentile=100.00%, depth=1

Run status group 0 (all jobs):
   READ: io=4909.8MB, aggrb=83791KB/s, minb=83791KB/s, maxb=83791KB/s, mint=60001msec, maxt=60001msec

Disk stats (read/write):
  sda: ios=1255028/1886, merge=0/40, ticks=53632/2148, in_queue=55604, util=87.83%
</pre>
This yields the 4K performance. Another test is the maximum throughput with a much larger block size, simulating video downloads:
<pre>
root@pvelocalhost ~ > fio --ioengine=libaio --direct=1 --sync=1 --rw=read --bs=1M --numjobs=1 --iodepth=1 --runtime=60 --time_based --name seq_read --filename=/dev/sda
seq_read: (g=0): rw=read, bs=1M-1M/1M-1M/1M-1M, ioengine=libaio, iodepth=1
fio-2.16
Starting 1 process
Jobs: 1 (f=1): [R(1)] [100.0% done] [157.0MB/0KB/0KB /s] [157/0/0 iops] [eta 00m:00s]
seq_read: (groupid=0, jobs=1): err= 0: pid=427: Sun Jan 6 23:01:22 2019
  read : io=9159.0MB, bw=156306KB/s, iops=152, runt= 60003msec
    slat (usec): min=20, max=232, avg=42.50, stdev= 8.78
    clat (msec): min=1, max=173, avg= 6.50, stdev= 8.70
     lat (msec): min=1, max=173, avg= 6.55, stdev= 8.70
    clat percentiles (usec):
     | 1.00th=[ 1912], 5.00th=[ 1960], 10.00th=[ 5216], 20.00th=[ 5280],
     | 30.00th=[ 5536], 40.00th=[ 6240], 50.00th=[ 6240], 60.00th=[ 6304],
     | 70.00th=[ 6432], 80.00th=[ 6624], 90.00th=[ 6688], 95.00th=[ 6752],
     | 99.00th=[15040], 99.50th=[35584], 99.90th=[142336], 99.95th=[156672],
     | 99.99th=[173056]
    lat (msec) : 2=5.35%, 4=1.06%, 10=92.37%, 20=0.44%, 50=0.33%
    lat (msec) : 250=0.46%
  cpu : usr=0.12%, sys=0.83%, ctx=9168, majf=0, minf=266
  IO depths : 1=100.0%, 2=0.0%, 4=0.0%, 8=0.0%, 16=0.0%, 32=0.0%, >=64=0.0%
     submit : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0%
     complete : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0%
     issued : total=r=9159/w=0/d=0, short=r=0/w=0/d=0, drop=r=0/w=0/d=0
     latency : target=0, window=0, percentile=100.00%, depth=1

Run status group 0 (all jobs):
   READ: io=9159.0MB, aggrb=156305KB/s, minb=156305KB/s, maxb=156305KB/s, mint=60003msec, maxt=60003msec

Disk stats (read/write):
  sda: ios=18323/1609, merge=0/34, ticks=102976/13736, in_queue=116948, util=99.35%
</pre>
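The same command pattern covers the other workload categories from above by changing <code>--rw</code> and <code>--bs</code>. As a sketch, and only against a device or file whose data you can afford to lose:
<pre>
# 8K random read, emulating the (oversimplified) database workload described above
fio --ioengine=libaio --direct=1 --sync=1 --rw=randread --bs=8K --numjobs=1 --iodepth=1 --runtime=60 --time_based --name rand_read --filename=/dev/sda

# 4K sequential write - THIS OVERWRITES DATA on the target, triple check the filename!
fio --ioengine=libaio --direct=1 --sync=1 --rw=write --bs=4K --numjobs=1 --iodepth=1 --runtime=60 --time_based --name seq_write --filename=/dev/sda
</pre>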
.... other stuff to be filled ....
= Links =
* [http://www.sebastien-han.fr/blog/2014/10/10/ceph-how-to-test-if-your-ssd-is-suitable-as-a-journal-device/ Sébastien Han - Ceph: how to test if your SSD is suitable as a journal device?]

[[Category: HOWTO]]