Benchmarking Storage

You should be familiar with [[Storage]] in Proxmox VE and with the fact that there are two types of storage:
* block based storage
* file based storage
each with its own pros and cons.


Benchmarking itself is a very complex subject; this page provides some simple commands and explanatory guidelines to enable you
to judge for yourself whether a system performs reasonably or not. It does not go into details such as storage tiering, concurrent access, read/write amplification, alignment, thin provisioning, etc.
 
= Why benchmark =
 
In short: benchmarking is a good tool for determining the speed of a storage system and comparing it to other systems, hardware, setups and configuration settings.
Without a comparison, a benchmark is of little use; you therefore need comparable test environments, and this page exists to lay down some ground rules.
 
It is important to understand that benchmarking alone does not solve your performance problem, and that a low value in a single test does not necessarily mean your system will be slow.
A database server has totally different performance requirements than a video streaming server, so a benchmark suite should cover all common use cases without going
into too much detail. All the presented commands try to emulate a specific workload that may or may not match yours. Often, in a
virtualisation ecosystem, you will have all of them at once.
 
= What to benchmark =
 
You can characterise your benchmark in at least these categories:
* operation: read, write or mixture of both with a fixed ratio
* access pattern: sequential or random
* block size
With these categories, you can try to emulate a real workload, e.g.
* a video streaming server will mostly do sequential reads with a big block size to serve video content
* an (oversimplified) database server will mostly do random reads of a fixed block size, e.g. 8 KB
* a file server serves small and big files, so a variable block size and mostly sequential reads.
The benchmark itself should emulate these access patterns.
 
The benchmark itself yields two dependent performance metrics:
* number of input/output operations per second (IOPS)
* throughput in MB/s
Normally you get <code>throughput = IOPS * block size</code>.
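For example, the 4K sequential read shown further below reports about 20947 IOPS at a bandwidth of roughly 83792 KB/s, which matches 20947 * 4 KB ≈ 83788 KB/s.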
 
You should always benchmark what you actually use or want to use. It does not make any sense to benchmark only one disk in a RAID setup
or the block device on which you run your favourite filesystem. In this storage layer view, you should always benchmark the final layer on which
you access your data. It can be useful for general understanding to know which performance to expect in which layer and where you lose some,
e.g. to fragmentation or simply to management overhead.
 
= How to determine the blocksize =
 
Depending on your storage type, this varies (see the example commands after this list for how to query it):
* Ceph has 4 MB (the default RBD object size)
* ZFS zvols have 8K on PVE (the default <code>volblocksize</code>)
* ext4 has 4K
* ...
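
If you are unsure, you can usually query the effective block size directly. Some example commands (the dataset, image and device names are only placeholders, adjust them to your setup):

<pre>
# ZFS: volblocksize of a zvol
zfs get volblocksize rpool/data/vm-100-disk-0

# Ceph: object size of an RBD image
rbd info mypool/vm-100-disk-0

# ext4: filesystem block size
tune2fs -l /dev/sda1 | grep 'Block size'
</pre>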
 
= How to benchmark =
 
You can use a lot of tools for this, but most of them are limited to one major operating system. We will focus here on [https://github.com/axboe/fio <code>fio</code> by Jens Axboe],
which runs on almost every platform, including Linux, Windows and macOS, so its results are comparable across very different setups.
On Proxmox VE, as on Debian, you can simply install it via <code>apt install fio</code>.
 
The tool has two modes of operation:
* all options specified on the command line
* options collected in a job (configuration) file; a minimal sketch of such a file follows this list
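
A minimal job file, equivalent to the first sequential read command shown below (the <code>filename</code> is only an example, adjust it to your device or test file), could look like this:

<pre>
; seq_read.fio -- run it with: fio seq_read.fio
[global]
ioengine=libaio
direct=1
sync=1
runtime=60
time_based

[seq_read]
rw=read
bs=4K
numjobs=1
iodepth=1
filename=/dev/sda
</pre>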
 
Depending on the storage type you use (file or block based), you need a test file or a block device to test. Make sure that you will not destroy any data if you perform a write test;
it is best to double or triple check the devices and files.
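
For file based storage, you can point <code>fio</code> at a test file instead of a block device. A sketch (the path is only an example; <code>--size</code> controls how large a test file fio creates):

<pre>
fio --ioengine=libaio --direct=1 --sync=1 --rw=read --bs=4K --numjobs=1 --iodepth=1 \
    --runtime=60 --time_based --size=4G --name seq_read --filename=/mnt/pve/teststorage/fio-testfile
</pre>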
 
All tests presented here are time-based, i.e. they measure how the system behaves over a period of 60 seconds.
 
== Sequential Tests ==
 
The simplest way to benchmark sequential read and write operations is:
 
<pre>
root@testnode ~ >  fio --ioengine=libaio --direct=1 --sync=1 --rw=read --bs=4K --numjobs=1 --iodepth=1 --runtime=60 --time_based --name seq_read --filename=/dev/sda
seq_read: (g=0): rw=read, bs=4K-4K/4K-4K/4K-4K, ioengine=libaio, iodepth=1
fio-2.16
Starting 1 process
Jobs: 1 (f=1): [R(1)] [100.0% done] [90924KB/0KB/0KB /s] [22.8K/0/0 iops] [eta 00m:00s]
seq_read: (groupid=0, jobs=1): err= 0: pid=26178: Sun Jan  6 22:59:55 2019
  read : io=4909.8MB, bw=83792KB/s, iops=20947, runt= 60001msec
    slat (usec): min=2, max=387, avg= 4.15, stdev= 1.55
    clat (usec): min=1, max=131280, avg=42.67, stdev=291.45
    lat (usec): min=35, max=131291, avg=46.81, stdev=291.46
    clat percentiles (usec):
    |  1.00th=[  33],  5.00th=[  34], 10.00th=[  35], 20.00th=[  36],
    | 30.00th=[  38], 40.00th=[  39], 50.00th=[  40], 60.00th=[  41],
    | 70.00th=[  41], 80.00th=[  42], 90.00th=[  43], 95.00th=[  45],
    | 99.00th=[  69], 99.50th=[  114], 99.90th=[  213], 99.95th=[  227],
    | 99.99th=[ 4016]
    lat (usec) : 2=0.01%, 4=0.01%, 20=0.01%, 50=97.56%, 100=1.83%
    lat (usec) : 250=0.57%, 500=0.02%, 750=0.01%, 1000=0.01%
    lat (msec) : 2=0.01%, 4=0.01%, 10=0.01%, 20=0.01%, 50=0.01%
    lat (msec) : 100=0.01%, 250=0.01%
  cpu          : usr=4.62%, sys=14.10%, ctx=1256980, majf=11, minf=12
  IO depths    : 1=100.0%, 2=0.0%, 4=0.0%, 8=0.0%, 16=0.0%, 32=0.0%, >=64=0.0%
    submit    : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0%
    complete  : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0%
    issued    : total=r=1256894/w=0/d=0, short=r=0/w=0/d=0, drop=r=0/w=0/d=0
    latency  : target=0, window=0, percentile=100.00%, depth=1
 
Run status group 0 (all jobs):
  READ: io=4909.8MB, aggrb=83791KB/s, minb=83791KB/s, maxb=83791KB/s, mint=60001msec, maxt=60001msec
 
Disk stats (read/write):
  sda: ios=1255028/1886, merge=0/40, ticks=53632/2148, in_queue=55604, util=87.83%
</pre>
 
This yields the 4K performance. Another test is the maximum throughput with a much larger block size, simulating video downloads:
 
<pre>
root@pvelocalhost ~ > fio --ioengine=libaio --direct=1 --sync=1 --rw=read --bs=1M --numjobs=1 --iodepth=1 --runtime=60 --time_based --name seq_read --filename=/dev/sda
seq_read: (g=0): rw=read, bs=1M-1M/1M-1M/1M-1M, ioengine=libaio, iodepth=1
fio-2.16
Starting 1 process
Jobs: 1 (f=1): [R(1)] [100.0% done] [157.0MB/0KB/0KB /s] [157/0/0 iops] [eta 00m:00s]
seq_read: (groupid=0, jobs=1): err= 0: pid=427: Sun Jan  6 23:01:22 2019
  read : io=9159.0MB, bw=156306KB/s, iops=152, runt= 60003msec
    slat (usec): min=20, max=232, avg=42.50, stdev= 8.78
    clat (msec): min=1, max=173, avg= 6.50, stdev= 8.70
    lat (msec): min=1, max=173, avg= 6.55, stdev= 8.70
    clat percentiles (usec):
    |  1.00th=[ 1912],  5.00th=[ 1960], 10.00th=[ 5216], 20.00th=[ 5280],
    | 30.00th=[ 5536], 40.00th=[ 6240], 50.00th=[ 6240], 60.00th=[ 6304],
    | 70.00th=[ 6432], 80.00th=[ 6624], 90.00th=[ 6688], 95.00th=[ 6752],
    | 99.00th=[15040], 99.50th=[35584], 99.90th=[142336], 99.95th=[156672],
    | 99.99th=[173056]
    lat (msec) : 2=5.35%, 4=1.06%, 10=92.37%, 20=0.44%, 50=0.33%
    lat (msec) : 250=0.46%
  cpu          : usr=0.12%, sys=0.83%, ctx=9168, majf=0, minf=266
  IO depths    : 1=100.0%, 2=0.0%, 4=0.0%, 8=0.0%, 16=0.0%, 32=0.0%, >=64=0.0%
    submit    : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0%
    complete  : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0%
    issued    : total=r=9159/w=0/d=0, short=r=0/w=0/d=0, drop=r=0/w=0/d=0
    latency  : target=0, window=0, percentile=100.00%, depth=1
 
Run status group 0 (all jobs):
  READ: io=9159.0MB, aggrb=156305KB/s, minb=156305KB/s, maxb=156305KB/s, mint=60003msec, maxt=60003msec
 
Disk stats (read/write):
  sda: ios=18323/1609, merge=0/34, ticks=102976/13736, in_queue=116948, util=99.35%
</pre>
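
The corresponding sequential write tests only differ in the <code>--rw</code> parameter. Keep in mind that writing directly to <code>/dev/sda</code> destroys the data on it, so only run such a test against a disposable disk or a test file. A sketch:

<pre>
fio --ioengine=libaio --direct=1 --sync=1 --rw=write --bs=4K --numjobs=1 --iodepth=1 --runtime=60 --time_based --name seq_write --filename=/dev/sda
</pre>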
 
 
.... other stuff to be filled ....
 
 
= Links =
 
* [http://www.sebastien-han.fr/blog/2014/10/10/ceph-how-to-test-if-your-ssd-is-suitable-as-a-journal-device/ Sébastien Han - Ceph: how to test if your SSD is suitable as a journal device?]


[[Category: HOWTO]]
