Proxmox VE is based on the famous Debian Linux distribution. That means that you have access to the whole world of Debian packages, and the base system is well documented. The Debian Administrator's Handbook is available online, and provides a comprehensive introduction to the Debian operating system (see [Hertzog13]).
A standard Proxmox VE installation uses the default repositories from Debian, so you get bug fixes and security updates through that channel. In addition, we provide our own package repository to roll out all Proxmox VE related packages. This includes updates to some Debian packages when necessary.
We also deliver a specially optimized Linux kernel, where we enable all required virtualization and container features. That kernel includes drivers for ZFS, and several hardware drivers. For example, we ship Intel network card drivers to support their newest hardware.
The following sections will concentrate on virtualization related topics. They either explain things which are different on Proxmox VE, or tasks which are commonly performed on Proxmox VE. For other topics, please refer to the standard Debian documentation.
All Debian-based systems use APT as their package management tool. The list of repositories is defined in /etc/apt/sources.list and in .list files found inside /etc/apt/sources.list.d/. Updates can be installed directly using apt-get, or via the GUI.
Apt sources.list files list one package repository per line, with the most preferred source listed first. Empty lines are ignored, and a # character anywhere on a line marks the remainder of that line as a comment. The information available from the configured sources is acquired by apt-get update.
deb http://ftp.debian.org/debian buster main contrib
deb http://ftp.debian.org/debian buster-updates main contrib

# security updates
deb http://security.debian.org buster/updates main contrib
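The comment and blank-line rules described above can be checked with a small Bash function that prints only the lines APT would actually consider. This is a minimal sketch, not part of APT: the function name list_active_sources is made up for this example, and it does not interpret the entries themselves (options in square brackets, deb-src lines, and so on).

```bash
#!/usr/bin/env bash
# Print the active entries of a sources.list-style file.
# Rules applied: empty lines are skipped, and a '#' anywhere on a line
# turns the remainder of that line into a comment.
list_active_sources() {
    local file=$1 line
    # The '|| [[ -n $line ]]' part also handles a last line without newline.
    while IFS= read -r line || [[ -n $line ]]; do
        line=${line%%#*}                       # drop everything from the first '#'
        line=${line#"${line%%[![:space:]]*}"}  # trim leading whitespace
        line=${line%"${line##*[![:space:]]}"}  # trim trailing whitespace
        if [[ -n $line ]]; then
            printf '%s\n' "$line"
        fi
    done < "$file"
}

# Usage: list_active_sources /etc/apt/sources.list
```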
In addition, Proxmox VE provides several different package repositories.
Proxmox VE Enterprise Repository
This is the default, stable and recommended repository, available for all Proxmox VE subscription users. It contains the most stable packages, and is suitable for production use. The pve-enterprise repository is enabled by default:
deb https://enterprise.proxmox.com/debian/pve buster pve-enterprise
As soon as updates are available, the root@pam user is notified via email about the available new packages. On the GUI, the change-log of each package can be viewed (if available), showing all details of the update. So you will never miss important security fixes.
Please note that you need a valid subscription key to access this repository. We offer different support levels, and you can find further details at https://www.proxmox.com/en/proxmox-ve/pricing.
Note: You can disable this repository by commenting out the above line using a # (at the start of the line). This prevents error messages if you do not have a subscription key. Please configure the pve-no-subscription repository in that case.
Proxmox VE No-Subscription Repository
As the name suggests, you do not need a subscription key to access this repository. It can be used for testing and non-production use. It is not recommended for production servers, as these packages are not always heavily tested and validated.
We recommend configuring this repository in /etc/apt/sources.list.
deb http://ftp.debian.org/debian buster main contrib
deb http://ftp.debian.org/debian buster-updates main contrib

# PVE pve-no-subscription repository provided by proxmox.com,
# NOT recommended for production use
deb http://download.proxmox.com/debian/pve buster pve-no-subscription

# security updates
deb http://security.debian.org buster/updates main contrib
Proxmox VE Test Repository
Finally, there is a repository called pvetest. This one contains the latest packages and is heavily used by developers to test new features. As usual, you can configure this using /etc/apt/sources.list by adding the following line:
deb http://download.proxmox.com/debian/pve buster pvetest
Note: The pvetest repository should (as the name implies) only be used for testing new features or bug fixes.
Proxmox VE Ceph Repository
This is Proxmox VE’s main Ceph repository and holds the Ceph packages for production use. You can also use this repository to update only the Ceph client.
deb http://download.proxmox.com/debian/ceph-nautilus buster main
Proxmox VE Ceph Testing Repository
This Ceph repository contains the Ceph packages before they are moved into the main repository, and is used to test new Ceph releases on Proxmox VE.
deb http://download.proxmox.com/debian/ceph-nautilus buster test
Proxmox VE Ceph Luminous Repository For Upgrade
This is a build of the Ceph Luminous release for Proxmox VE 6.0. It can be used to first upgrade a Proxmox VE cluster with Ceph Luminous deployed to our 6.0 release, based on Debian Buster, and only afterwards upgrade Ceph on its own.
deb http://download.proxmox.com/debian/ceph-luminous buster main
We use GnuPG to sign the Release files inside those repositories, and APT uses those signatures to verify that all packages are from a trusted source.
The key used for verification is already installed if you install from our installation CD. If you install by other means, you can manually download the key with:
# wget http://download.proxmox.com/debian/proxmox-ve-release-6.x.gpg -O /etc/apt/trusted.gpg.d/proxmox-ve-release-6.x.gpg
Please verify the checksum afterwards:
# sha512sum /etc/apt/trusted.gpg.d/proxmox-ve-release-6.x.gpg
acca6f416917e8e11490a08a1e2842d500b3a5d9f322c6319db0927b2901c3eae23cfb5cd5df6facf2b57399d3cfa52ad7769ebdd75d9b204549ca147da52626  /etc/apt/trusted.gpg.d/proxmox-ve-release-6.x.gpg

# md5sum /etc/apt/trusted.gpg.d/proxmox-ve-release-6.x.gpg
f3f6c5a3a67baf38ad178e5ff1ee270c  /etc/apt/trusted.gpg.d/proxmox-ve-release-6.x.gpg
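Instead of comparing the long SHA-512 value by eye, the check can be scripted. The following is a minimal sketch that assumes GNU coreutils sha512sum, whose --check mode reads "<digest>  <file>" lines (here from stdin); the function name verify_sha512 is made up for this example.

```bash
#!/usr/bin/env bash
# Return 0 if FILE has the expected SHA-512 digest, non-zero otherwise.
# sha512sum --check reads "<digest>  <file>" lines; '-' means stdin,
# and --status suppresses all output so only the exit code remains.
verify_sha512() {
    local file=$1 expected=$2
    printf '%s  %s\n' "$expected" "$file" | sha512sum --check --status -
}

# Usage (pass the SHA-512 value listed above as the second argument):
#   verify_sha512 /etc/apt/trusted.gpg.d/proxmox-ve-release-6.x.gpg "<sha512 value>" \
#       && echo "key OK" || echo "checksum MISMATCH"
```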
System Software Updates
We provide regular package updates on all repositories. You can install those updates using the GUI, or you can directly run the CLI command apt-get:
apt-get update
apt-get dist-upgrade
Note: The APT package management system is very flexible and provides countless features; see man apt-get or [Hertzog13] for additional information.
You should do such updates at regular intervals, or when we release versions with security-related fixes. Major system upgrades are announced on the Proxmox VE Community Forum. These announcements also contain detailed upgrade instructions.
Note: We recommend running regular upgrades, because it is important to get the latest security updates.
Network configuration can be done either via the GUI, or by manually editing the file /etc/network/interfaces, which contains the whole network configuration. The interfaces(5) manual page contains the complete format description. All Proxmox VE tools try hard to preserve direct user modifications, but using the GUI is still preferable, because it protects you from errors.
Once the network is configured, you can use the traditional Debian tools ifup and ifdown to bring interfaces up and down.
Note: Proxmox VE does not write changes directly to /etc/network/interfaces. Instead, we write into a temporary file called /etc/network/interfaces.new, and commit those changes when you reboot the node.
We currently use the following naming conventions for device names:
Ethernet devices: en*, systemd network interface names. This naming scheme is used for new Proxmox VE installations since version 5.0.
Ethernet devices: eth[N], where 0 ≤ N (eth0, eth1, …) This naming scheme is used for Proxmox VE hosts which were installed before the 5.0 release. When upgrading to 5.0, the names are kept as-is.
Bridge names: vmbr[N], where 0 ≤ N ≤ 4094 (vmbr0 - vmbr4094)
Bonds: bond[N], where 0 ≤ N (bond0, bond1, …)
VLANs: Simply add the VLAN number to the device name, separated by a period (eno1.50, bond1.30)
This makes it easier to debug network problems, because the device name implies the device type.
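Because the name implies the type, a script can infer the device type from the name alone. This is a minimal sketch based only on the conventions listed above; the function name device_type is made up for this example, and it does not check the numeric ranges (such as 0 to 4094 for bridges).

```bash
#!/usr/bin/env bash
# Infer the device type from a name that follows the conventions above.
# VLAN devices are checked first, because "eno1.50" or "bond1.30" carry a
# VLAN suffix on top of another device name.
device_type() {
    local name=$1
    local vlan_re='^[^.]+\.[0-9]+$'
    local bridge_re='^vmbr[0-9]+(v[0-9]+)?$'   # also matches vmbr0v5-style bridges
    local bond_re='^bond[0-9]+$'
    local eth_re='^eth[0-9]+$'
    if [[ $name =~ $vlan_re ]]; then
        echo vlan
    elif [[ $name =~ $bridge_re ]]; then
        echo bridge
    elif [[ $name =~ $bond_re ]]; then
        echo bond
    elif [[ $name == en* || $name =~ $eth_re ]]; then
        echo ethernet            # systemd en* names or legacy eth[N] names
    else
        echo unknown
    fi
}
```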
Systemd Network Interface Names
Systemd uses the two-character prefix en for Ethernet network devices. The next characters depend on the device driver and on which naming scheme matches first.
o<index>[n<phys_port_name>|d<dev_port>] — devices on board
s<slot>[f<function>][n<phys_port_name>|d<dev_port>] — device by hotplug id
[P<domain>]p<bus>s<slot>[f<function>][n<phys_port_name>|d<dev_port>] — devices by bus id
x<MAC> — device by MAC address
The most common patterns are:
eno1: the first onboard NIC
enp3s0f1: the NIC on PCI bus 3, slot 0, using NIC function 1
For more information see Predictable Network Interface Names.
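The "by bus id" pattern can be decoded mechanically. This is a minimal sketch: the function name decode_pci_ifname is made up for this example, it assumes the numbers in the name are decimal, it does not handle the optional P<domain> prefix or the n/d port suffixes, and it reports a missing f suffix as function 0.

```bash
#!/usr/bin/env bash
# Decode enp<bus>s<slot>[f<function>] into its PCI components.
# Returns 1 for names that do not follow this pattern (e.g. eno1).
decode_pci_ifname() {
    local re='^enp([0-9]+)s([0-9]+)(f([0-9]+))?$'
    if [[ $1 =~ $re ]]; then
        # BASH_REMATCH[4] is empty when there is no f<function> suffix.
        printf 'bus=%s slot=%s function=%s\n' \
            "${BASH_REMATCH[1]}" "${BASH_REMATCH[2]}" "${BASH_REMATCH[4]:-0}"
    else
        return 1
    fi
}
```

For example, decode_pci_ifname enp3s0f1 prints bus=3 slot=0 function=1, matching the description above.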
Choosing a network configuration
Depending on your current network organization and your resources you can choose either a bridged, routed, or masquerading networking setup.
Proxmox VE server in a private LAN, using an external gateway to reach the internet
The Bridged model makes the most sense in this case, and this is also the default mode on new Proxmox VE installations. Each of your guest systems will have a virtual interface attached to the Proxmox VE bridge. This is similar in effect to having the guest network card directly connected to a new switch on your LAN, with the Proxmox VE host playing the role of the switch.
Proxmox VE server at hosting provider, with public IP ranges for Guests
For this setup, you can use either a Bridged or Routed model, depending on what your provider allows.
Proxmox VE server at hosting provider, with a single public IP address
In that case, the only way to get outgoing network access for your guest systems is to use Masquerading. For incoming network access to your guests, you will need to configure Port Forwarding.
For further flexibility, you can configure VLANs (IEEE 802.1q) and network bonding, also known as "link aggregation". That way it is possible to build complex and flexible virtual networks.
Default Configuration using a Bridge
Bridges are like physical network switches implemented in software. All VMs can share a single bridge, or you can create multiple bridges to separate network domains. Each host can have up to 4094 bridges.
The installation program creates a single bridge named vmbr0, which is connected to the first Ethernet card. The corresponding configuration in /etc/network/interfaces might look like this:
auto lo
iface lo inet loopback

iface eno1 inet manual

auto vmbr0
iface vmbr0 inet static
        address 192.168.10.2
        netmask 255.255.255.0
        gateway 192.168.10.1
        bridge_ports eno1
        bridge_stp off
        bridge_fd 0
Virtual machines behave as if they were directly connected to the physical network. The network, in turn, sees each virtual machine as having its own MAC, even though there is only one network cable connecting all of these VMs to the network.
Most hosting providers do not support the above setup. For security reasons, they disable networking as soon as they detect multiple MAC addresses on a single interface.
Note: Some providers allow you to register additional MACs through their management interface. This avoids the problem, but can be clumsy to configure because you need to register a MAC for each of your VMs.
You can avoid the problem by “routing” all traffic via a single interface. This makes sure that all network packets use the same MAC address.
A common scenario is that you have a public IP (assume 198.51.100.5 for this example), and an additional IP block for your VMs (203.0.113.16/29). We recommend the following setup for such situations:
auto lo
iface lo inet loopback

auto eno1
iface eno1 inet static
        address 198.51.100.5
        netmask 255.255.255.0
        gateway 198.51.100.1
        post-up echo 1 > /proc/sys/net/ipv4/ip_forward
        post-up echo 1 > /proc/sys/net/ipv4/conf/eno1/proxy_arp

auto vmbr0
iface vmbr0 inet static
        address 203.0.113.17
        netmask 255.255.255.248
        bridge_ports none
        bridge_stp off
        bridge_fd 0
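To double-check the addresses used in the routed example, the following minimal Bash sketch computes the netmask, network, usable host range and broadcast address of an IPv4 prefix. The helper names ip2int, int2ip and subnet_info are made up for this example, and the code assumes a well-formed dotted-quad address without leading zeros (Bash would read "08" as invalid octal) and a prefix length of at most 30.

```bash
#!/usr/bin/env bash
# Convert a dotted-quad IPv4 address to a 32-bit integer.
ip2int() {
    local a b c d
    IFS=. read -r a b c d <<< "$1"
    echo $(( (a << 24) | (b << 16) | (c << 8) | d ))
}

# Convert a 32-bit integer back to dotted-quad notation.
int2ip() {
    local n=$1
    echo "$(( (n >> 24) & 255 )).$(( (n >> 16) & 255 )).$(( (n >> 8) & 255 )).$(( n & 255 ))"
}

# Print netmask, network, usable host range and broadcast for ADDR/PREFIX.
subnet_info() {
    local ip=${1%/*} prefix=${1#*/}
    # Bash integers are 64-bit, so mask the shifted value back to 32 bits.
    local mask=$(( (0xFFFFFFFF << (32 - prefix)) & 0xFFFFFFFF ))
    local net=$(( $(ip2int "$ip") & mask ))
    local bcast=$(( net | (~mask & 0xFFFFFFFF) ))
    echo "netmask $(int2ip "$mask")"
    echo "network $(int2ip "$net")"
    echo "hosts $(int2ip $(( net + 1 )))-$(int2ip $(( bcast - 1 )))"
    echo "broadcast $(int2ip "$bcast")"
}
```

For 203.0.113.16/29 this reports the netmask 255.255.255.248 and the usable range 203.0.113.17 to 203.0.113.22, which is consistent with the vmbr0 address above; the remaining addresses of that range are what would be left for the guests.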
Masquerading (NAT) with iptables
Masquerading allows guests having only a private IP address to access the network by using the host IP address for outgoing traffic. Each outgoing packet is rewritten by iptables to appear as originating from the host, and responses are rewritten accordingly to be routed to the original sender.
auto lo
iface lo inet loopback

auto eno1
#real IP address
iface eno1 inet static
        address 198.51.100.5
        netmask 255.255.255.0
        gateway 198.51.100.1

auto vmbr0
#private sub network
iface vmbr0 inet static
        address 10.10.10.1
        netmask 255.255.255.0
        bridge_ports none
        bridge_stp off
        bridge_fd 0
        post-up echo 1 > /proc/sys/net/ipv4/ip_forward
        post-up iptables -t nat -A POSTROUTING -s '10.10.10.0/24' -o eno1 -j MASQUERADE
        post-down iptables -t nat -D POSTROUTING -s '10.10.10.0/24' -o eno1 -j MASQUERADE
Bonding (also called NIC teaming or Link Aggregation) is a technique for binding multiple NICs to a single network device. It can serve different goals, such as making the network fault-tolerant, increasing performance, or both.
High-speed hardware like Fibre Channel and the associated switching hardware can be quite expensive. By doing link aggregation, two NICs can appear as one logical interface, resulting in up to twice the aggregate bandwidth. This is a native Linux kernel feature that is supported by most switches. If your nodes have multiple Ethernet ports, you can distribute your points of failure by running network cables to different switches, and the bonded connection will fail over to one cable or the other in case of network trouble.
Aggregated links can improve live-migration delays and improve the speed of replication of data between Proxmox VE Cluster nodes.
There are 7 modes for bonding:
Round-robin (balance-rr): Transmit network packets in sequential order from the first available network interface (NIC) slave through the last. This mode provides load balancing and fault tolerance.
Active-backup (active-backup): Only one NIC slave in the bond is active. A different slave becomes active if, and only if, the active slave fails. The single logical bonded interface’s MAC address is externally visible on only one NIC (port) to avoid distortion in the network switch. This mode provides fault tolerance.
XOR (balance-xor): Transmit network packets based on [(source MAC address XOR’d with destination MAC address) modulo NIC slave count]. This selects the same NIC slave for each destination MAC address. This mode provides load balancing and fault tolerance.
Broadcast (broadcast): Transmit network packets on all slave network interfaces. This mode provides fault tolerance.
IEEE 802.3ad Dynamic link aggregation (802.3ad)(LACP): Creates aggregation groups that share the same speed and duplex settings. Utilizes all slave network interfaces in the active aggregator group according to the 802.3ad specification.
Adaptive transmit load balancing (balance-tlb): Linux bonding driver mode that does not require any special network-switch support. The outgoing network packet traffic is distributed according to the current load (computed relative to the speed) on each network interface slave. Incoming traffic is received by one currently designated slave network interface. If this receiving slave fails, another slave takes over the MAC address of the failed receiving slave.
Adaptive load balancing (balance-alb): Includes balance-tlb plus receive load balancing (rlb) for IPv4 traffic, and does not require any special network switch support. The receive load balancing is achieved by ARP negotiation. The bonding driver intercepts the ARP Replies sent by the local system on their way out and overwrites the source hardware address with the unique hardware address of one of the NIC slaves in the single logical bonded interface such that different network-peers use different MAC addresses for their network packet traffic.
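The selection rule quoted for balance-xor can be illustrated with Bash arithmetic. This is a minimal sketch of the simplified formula exactly as written above, not of the kernel code: the Linux bonding driver's real layer2 transmit hash differs in detail (see the kernel bonding documentation), and the function names mac2int and xor_slave_index are made up for this example.

```bash
#!/usr/bin/env bash
# Convert a colon-separated MAC address into an integer (48 bits fit
# easily into Bash's 64-bit arithmetic).
mac2int() {
    local hex=${1//:/}
    echo $(( 16#$hex ))
}

# Simplified balance-xor rule:
#   (source MAC XOR destination MAC) modulo number of slaves
# The same source/destination pair always yields the same slave index.
xor_slave_index() {
    local src=$1 dst=$2 slaves=$3
    echo $(( ( $(mac2int "$src") ^ $(mac2int "$dst") ) % slaves ))
}
```

With two slaves, 00:00:00:00:00:01 talking to 00:00:00:00:00:02 gives 1 XOR 2 = 3, so slave index 1; talking to 00:00:00:00:00:03 gives 1 XOR 3 = 2, so slave index 0.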
If your switch supports the LACP (IEEE 802.3ad) protocol, then we recommend using the corresponding bonding mode (802.3ad). Otherwise you should generally use the
If you intend to run your cluster network on the bonding interfaces, then you have to use active-passive (active-backup) mode on the bonding interfaces; other modes are unsupported.
The following bond configuration can be used as a distributed/shared storage network. The benefit is that you get more speed and the network will be fault-tolerant.
auto lo
iface lo inet loopback

iface eno1 inet manual

iface eno2 inet manual

# eno3: a separate NIC for the bridge, since eno1 and eno2 are already bond slaves
iface eno3 inet manual

auto bond0
iface bond0 inet static
        slaves eno1 eno2
        address 192.168.1.2
        netmask 255.255.255.0
        bond_miimon 100
        bond_mode 802.3ad
        bond_xmit_hash_policy layer2+3

auto vmbr0
iface vmbr0 inet static
        address 10.10.10.2
        netmask 255.255.255.0
        gateway 10.10.10.1
        bridge_ports eno3
        bridge_stp off
        bridge_fd 0
Another possibility is to use the bond directly as a bridge port. This can be used to make the guest network fault-tolerant.
auto lo
iface lo inet loopback

iface eno1 inet manual

iface eno2 inet manual

auto bond0
iface bond0 inet manual
        slaves eno1 eno2
        bond_miimon 100
        bond_mode 802.3ad
        bond_xmit_hash_policy layer2+3

auto vmbr0
iface vmbr0 inet static
        address 10.10.10.2
        netmask 255.255.255.0
        gateway 10.10.10.1
        bridge_ports bond0
        bridge_stp off
        bridge_fd 0
A virtual LAN (VLAN) is a broadcast domain that is partitioned and isolated in the network at layer two. So it is possible to have multiple networks (up to 4096) in one physical network, each independent of the others.
Each VLAN network is identified by a number, often called a tag. Network packets are then tagged to identify which virtual network they belong to.
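The limit of 4096 networks comes from the size of the tag: in an IEEE 802.1Q header, the 16-bit Tag Control Information field holds 3 priority bits, 1 drop-eligible bit and a 12-bit VLAN ID. IDs 0 and 4095 are reserved, which leaves 1 to 4094 for actual VLANs. A minimal sketch (the function names are made up for this example):

```bash
#!/usr/bin/env bash
# Extract the 12-bit VLAN ID from a 16-bit 802.1Q Tag Control Information
# value; the upper 4 bits are the priority (PCP) and drop-eligible (DEI) bits.
vlan_id_from_tci() {
    echo $(( $1 & 0x0FFF ))
}

# Succeed only for VLAN IDs that can be used for actual VLANs (1-4094).
is_usable_vlan_id() {
    (( $1 >= 1 && $1 <= 4094 ))
}
```

For example, the TCI value 0x2005 (priority 1) carries VLAN ID 5.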
VLAN for Guest Networks
Proxmox VE supports this setup out of the box. You can specify the VLAN tag when you create a VM. The VLAN tag is part of the guest network configuration. The networking layer supports different modes to implement VLANs, depending on the bridge configuration:
VLAN awareness on the Linux bridge: In this case, each guest’s virtual network card is assigned to a VLAN tag, which is transparently supported by the Linux bridge. Trunk mode is also possible, but that makes configuration in the guest necessary.
"traditional" VLAN on the Linux bridge: In contrast to the VLAN awareness method, this method is not transparent and creates a VLAN device with an associated bridge for each VLAN. For example, creating a guest on VLAN 5 would create two interfaces, eno1.5 and vmbr0v5, which remain until a reboot occurs.
Open vSwitch VLAN: This mode uses the OVS VLAN feature.
Guest configured VLAN: VLANs are assigned inside the guest. In this case, the setup is completely done inside the guest and can not be influenced from the outside. The benefit is that you can use more than one VLAN on a single virtual NIC.
VLAN on the Host
To allow host communication with an isolated network, it is possible to apply VLAN tags to any network device (NIC, bond, bridge). In general, you should configure the VLAN on the interface with the fewest abstraction layers between itself and the physical NIC.
For example, suppose you want to place the host management address on a separate VLAN (VLAN 5) in a default configuration. The first of the following configurations uses a traditional Linux bridge, the second a VLAN-aware Linux bridge.
auto lo
iface lo inet loopback

iface eno1 inet manual

iface eno1.5 inet manual

auto vmbr0v5
iface vmbr0v5 inet static
        address 10.10.10.2
        netmask 255.255.255.0
        gateway 10.10.10.1
        bridge_ports eno1.5
        bridge_stp off
        bridge_fd 0

auto vmbr0
iface vmbr0 inet manual
        bridge_ports eno1
        bridge_stp off
        bridge_fd 0
auto lo
iface lo inet loopback

iface eno1 inet manual

auto vmbr0.5
iface vmbr0.5 inet static
        address 10.10.10.2
        netmask 255.255.255.0
        gateway 10.10.10.1

auto vmbr0
iface vmbr0 inet manual
        bridge_ports eno1
        bridge_stp off
        bridge_fd 0
        bridge_vlan_aware yes
The next example is the same setup but a bond is used to make this network fail-safe.
auto lo
iface lo inet loopback

iface eno1 inet manual

iface eno2 inet manual

auto bond0
iface bond0 inet manual
        slaves eno1 eno2
        bond_miimon 100
        bond_mode 802.3ad
        bond_xmit_hash_policy layer2+3

iface bond0.5 inet manual

auto vmbr0v5
iface vmbr0v5 inet static
        address 10.10.10.2
        netmask 255.255.255.0
        gateway 10.10.10.1
        bridge_ports bond0.5
        bridge_stp off
        bridge_fd 0

auto vmbr0
iface vmbr0 inet manual
        bridge_ports bond0
        bridge_stp off
        bridge_fd 0
The Proxmox VE cluster stack itself relies heavily on the fact that all the nodes have precisely synchronized time. Some other components, like Ceph, also refuse to work properly if the local time on nodes is not in sync.
Time synchronization between nodes can be achieved with the “Network Time Protocol” (NTP). Proxmox VE uses systemd-timesyncd as NTP client by default, preconfigured to use a set of public servers. This setup works out of the box in most cases.
Using Custom NTP Servers
In some cases, it might be desirable not to use the default NTP servers. For example, if your Proxmox VE nodes do not have access to the public internet (e.g., because of restrictive firewall rules), you need to set up local NTP servers and tell systemd-timesyncd to use them in /etc/systemd/timesyncd.conf:
[Time]
NTP=ntp1.example.com ntp2.example.com ntp3.example.com ntp4.example.com
After restarting the synchronization service (systemctl restart systemd-timesyncd) you should verify that your newly configured NTP servers are used by checking the journal (journalctl --since -1h -u systemd-timesyncd):
...
Oct 07 14:58:36 node1 systemd: Stopping Network Time Synchronization...
Oct 07 14:58:36 node1 systemd: Starting Network Time Synchronization...
Oct 07 14:58:36 node1 systemd: Started Network Time Synchronization.
Oct 07 14:58:36 node1 systemd-timesyncd: Using NTP server 10.0.0.1:123 (ntp1.example.com).
Oct 07 14:58:36 nora systemd-timesyncd: interval/delta/delay/jitter/drift 64s/-0.002s/0.020s/0.000s/-31ppm
...
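On several nodes, this check can be scripted by filtering for the "Using NTP server" lines shown above. A minimal sketch that assumes the message format of the sample output; the function name ntp_servers_in_use is made up for this example.

```bash
#!/usr/bin/env bash
# Read journal text on stdin and print "<name> <address:port>" for every
# "Using NTP server <address:port> (<name>)" line; other lines are ignored.
ntp_servers_in_use() {
    sed -n 's/.*Using NTP server \([^ ]*\) (\([^)]*\)).*/\2 \1/p'
}

# Usage:
#   journalctl --since -1h -u systemd-timesyncd | ntp_servers_in_use
```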
External Metric Server
Starting with Proxmox VE 4.0, you can define external metric servers, to which various statistics about your hosts, virtual machines and storages will be sent.
Currently supported are:
Graphite (see http://graphiteapp.org )
InfluxDB (see https://www.influxdata.com/time-series-platform/influxdb/ )
The server definitions are saved in /etc/pve/status.cfg
Graphite server configuration
The definition of a server is:
graphite: your-id
        server your-server
        port your-port
        path your-path
where your-port defaults to 2003 and your-path defaults to proxmox
Proxmox VE sends the data over UDP, so the graphite server has to be configured for this.
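To check that the Graphite server really accepts UDP input, you can send it a single test value by hand. This is a minimal sketch that uses Graphite's plaintext line format ("<path> <value> <timestamp>") and Bash's built-in /dev/udp redirection; the function names and the metric name proxmox.test.ping are made up for this example and are not metrics Proxmox VE itself sends.

```bash
#!/usr/bin/env bash
# Format one line of Graphite's plaintext protocol; the timestamp defaults
# to the current Unix time.
graphite_line() {
    printf '%s %s %s\n' "$1" "$2" "${3:-$(date +%s)}"
}

# Send a single test value over UDP (port defaults to 2003, as above).
# UDP gives no delivery feedback, so check the server side afterwards.
send_graphite_udp() {
    local host=$1 port=${2:-2003}
    graphite_line proxmox.test.ping 1 > "/dev/udp/$host/$port"
}

# Usage: send_graphite_udp your-server 2003
```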
Influxdb plugin configuration
The definition is:
influxdb: your-id
        server your-server
        port your-port
Proxmox VE sends the data over UDP, so the influxdb server has to be configured for this.
Here is an example configuration for influxdb (on your influxdb server):
[[udp]]
   enabled = true
   bind-address = "0.0.0.0:8089"
   database = "proxmox"
   batch-size = 1000
   batch-timeout = "1s"
With this configuration, your server listens on all IP addresses on port 8089, and writes the data into the proxmox database.
Multiple Definitions and Example
The id is optional, but if you want to have multiple definitions of a single type, then the ids must be defined and different from each other.
Here is an example of a finished status.cfg
graphite:
        server 10.0.0.5

influxdb: influx1
        server 10.0.0.6
        port 8089

influxdb: influx2
        server 10.0.0.7
        port 8090
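The id rule can be checked before saving a new definition. This is a minimal sketch that assumes the layout shown above (a "type: id" header at the start of a line, with indented option lines below it); the function name check_status_ids is made up for this example.

```bash
#!/usr/bin/env bash
# Check a status.cfg-style file: if a type appears more than once, every
# definition of that type must have its own, distinct id.
# Prints each offending type and returns 1 if the rule is violated.
check_status_ids() {
    awk '
        /^[a-z]+:/ {
            type = $1; sub(/:$/, "", type)
            id = (NF >= 2) ? $2 : ""
            count[type]++
            # A missing id or a repeated id makes this type suspicious;
            # it only counts as an error if the type occurs more than once.
            if (id == "" || seen[type, id]++) bad[type] = 1
        }
        END {
            rc = 0
            for (t in count)
                if (count[t] > 1 && (t in bad)) { print t; rc = 1 }
            exit rc
        }' "$1"
}

# Usage: check_status_ids /etc/pve/status.cfg || echo "fix the ids above"
```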
Disk Health Monitoring
Although robust and redundant storage is recommended, it can be very helpful to monitor the health of your local disks.
Starting with Proxmox VE 4.3, the package smartmontools (see the smartmontools homepage, https://www.smartmontools.org) is installed and required. This is a set of tools to monitor and control the S.M.A.R.T. system for local hard disks.
You can get the status of a disk by issuing the following command:
# smartctl -a /dev/sdX
where /dev/sdX is the path to one of your local disks.
If the output says:
SMART support is: Disabled
you can enable it with the command:
# smartctl -s on /dev/sdX
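When checking many disks, a short summary of the smartctl output is handy. This is a minimal sketch that reads smartctl -a output on stdin and looks for the "SMART support is:" line quoted above and for the overall-health self-assessment line that smartctl prints for ATA disks; the function name smart_summary is made up for this example. It only considers the Enabled/Disabled variant, because many disks also print a "SMART support is: Available ..." line.

```bash
#!/usr/bin/env bash
# Summarize `smartctl -a` output read from stdin as
#   support=<Enabled|Disabled|unknown> health=<result|unknown>
smart_summary() {
    awk -F': *' '
        /^SMART support is:/ && $2 ~ /^(Enabled|Disabled)/ { support = $2 }
        /overall-health self-assessment test result/        { health = $2 }
        END {
            printf "support=%s health=%s\n",
                (support != "" ? support : "unknown"),
                (health != "" ? health : "unknown")
        }'
}

# Usage: smartctl -a /dev/sdX | smart_summary
```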
For more information on how to use smartctl, please see man smartctl.
By default, the smartmontools daemon smartd is active and enabled. It scans the disks under /dev/sdX and /dev/hdX every 30 minutes for errors and warnings, and sends an e-mail to root if it detects a problem.
For more information about how to configure smartd, please see man smartd and man smartd.conf.
If you use your hard disks with a hardware RAID controller, there are most likely tools to monitor the disks in the RAID array and the array itself. For more information about this, please refer to the vendor of your RAID controller.
Logical Volume Manager (LVM)
Most people install Proxmox VE directly on a local disk. The Proxmox VE installation CD offers several options for local disk management, and the current default setup uses LVM. The installer lets you select a single disk for such a setup, and uses that disk as the physical volume for the Volume Group (VG) pve. The following output is from a test installation using a small 8GB disk:
# pvs
  PV         VG   Fmt  Attr PSize PFree
  /dev/sda3  pve  lvm2 a--  7.87g 876.00m

# vgs
  VG   #PV #LV #SN Attr   VSize VFree
  pve    1   3   0 wz--n- 7.87g 876.00m
The installer allocates three Logical Volumes (LV) inside this VG:
# lvs
  LV   VG   Attr       LSize   Pool Origin Data%  Meta%
  data pve  twi-a-tz--   4.38g             0.00   0.63
  root pve  -wi-ao----   1.75g
  swap pve  -wi-ao---- 896.00m
root: Formatted as ext4, and contains the operating system.
data: This volume uses LVM-thin, and is used to store VM images. LVM-thin is preferable for this task, because it offers efficient support for snapshots and clones.
For Proxmox VE versions up to 4.1, the installer creates a standard logical volume called “data”, which is mounted at /var/lib/vz.
Starting from version 4.2, the logical volume “data” is an LVM-thin pool, used to store block based guest images, and /var/lib/vz is simply a directory on the root file system.
We highly recommend using a hardware RAID controller (with BBU) for such setups. This increases performance, provides redundancy, and makes disk replacements easier (hot-pluggable).
LVM itself does not need any special hardware, and memory requirements are very low.
We install two boot loaders by default. The first partition contains the standard GRUB boot loader. The second partition is an EFI System Partition (ESP), which makes it possible to boot on EFI systems.
Creating a Volume Group
Let’s assume we have an empty disk /dev/sdb, onto which we want to create a volume group named “vmdata”.
Caution: The following commands will destroy all existing data on /dev/sdb.
First create a partition.
# sgdisk -N 1 /dev/sdb
Create a Physical Volume (PV) without confirmation and with a 250K metadata size.
# pvcreate --metadatasize 250k -y -ff /dev/sdb1
Create a volume group named “vmdata” on /dev/sdb1
# vgcreate vmdata /dev/sdb1
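The three steps above can be combined into one function. The following is a minimal sketch, not a Proxmox VE tool: the name create_vg_on_disk and the RUN variable are made up for this example, it assumes /dev/sdX-style naming where the first partition is <disk>1 (NVMe disks use a p1 suffix instead), and by default it only prints the commands so they can be reviewed first.

```bash
#!/usr/bin/env bash
# Partition DISK, create a PV on its first partition, and create volume
# group VG on it. Dry run by default: the commands are only printed.
# With RUN=1 they are executed, which DESTROYS all data on DISK.
create_vg_on_disk() {
    local disk=$1 vg=$2
    local part="${disk}1"   # assumes /dev/sdX naming; NVMe would need "p1"
    local cmds=(
        "sgdisk -N 1 $disk"
        "pvcreate --metadatasize 250k -y -ff $part"
        "vgcreate $vg $part"
    )
    local c
    for c in "${cmds[@]}"; do
        if [[ ${RUN:-0} == 1 ]]; then
            $c || return 1   # unquoted on purpose: split into command + args
        else
            echo "$c"
        fi
    done
}

# Review first:  create_vg_on_disk /dev/sdb vmdata
# Then execute:  RUN=1 create_vg_on_disk /dev/sdb vmdata
```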
Creating an extra LV for /var/lib/vz
This can be easily done by creating a new thin LV.
# lvcreate -n <Name> -V <Size[M,G,T]> <VG>/<LVThin_pool>
A real world example:
# lvcreate -n vz -V 10G pve/data
Now a filesystem must be created on the LV.
# mkfs.ext4 /dev/pve/vz
Finally, the new LV has to be mounted.
Note: Be sure that /var/lib/vz is empty before mounting. On a default installation, it is not.
# mount /dev/pve/vz /var/lib/vz
To make it always accessible, add the following line to /etc/fstab.
# echo '/dev/pve/vz /var/lib/vz ext4 defaults 0 2' >> /etc/fstab
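Running the echo command above twice appends a duplicate entry. A minimal sketch of an idempotent variant, which only appends the line if the identical line is not yet present; the function name add_fstab_line is made up for this example, and the fstab path is a parameter so it can be tried on a copy first.

```bash
#!/usr/bin/env bash
# Append LINE to FSTAB (default /etc/fstab) unless that exact line exists.
# grep -x matches whole lines, -F treats the line as a fixed string.
add_fstab_line() {
    local line=$1 fstab=${2:-/etc/fstab}
    grep -qxF -- "$line" "$fstab" 2>/dev/null || printf '%s\n' "$line" >> "$fstab"
}

# Usage: add_fstab_line '/dev/pve/vz /var/lib/vz ext4 defaults 0 2'
```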
Resizing the thin pool
Resizing the LV and the metadata pool can be achieved with the following command:
# lvresize --size +<size[M,G,T]> --poolmetadatasize +<size[M,G]> <VG>/<LVThin_pool>
Note: When extending the data pool, the metadata pool must also be extended.
Create an LVM-thin pool
A thin pool has to be created on top of a volume group. For how to create a volume group, see Section LVM.
# lvcreate -L 80G -T -n vmstore vmdata
ZFS on Linux
ZFS is a combined file system and logical volume manager designed by Sun Microsystems. Starting with Proxmox VE 3.4, the native Linux kernel port of the ZFS file system is available as an optional file system and also as an additional selection for the root file system. There is no need to manually compile ZFS modules, as all packages are included.
By using ZFS, it is possible to achieve maximum enterprise features with low-budget hardware, but also high-performance systems by leveraging SSD caching or even SSD-only setups. ZFS can replace expensive hardware RAID cards at the cost of moderate CPU and memory load, combined with easy management.
Easy configuration and management with Proxmox VE GUI and CLI.
Protection against data corruption
Data compression on file system level
Various raid levels: RAID0, RAID1, RAID10, RAIDZ-1, RAIDZ-2 and RAIDZ-3
Can use SSD for cache
Continuous integrity checking
Designed for high storage capacities
Asynchronous replication over network
ZFS depends heavily on memory, so you need at least 8GB to start. In practice, use as much as you can get for your hardware/budget. To prevent data corruption, we recommend the use of high-quality ECC RAM.
If you use a dedicated cache and/or log disk, you should use an enterprise class SSD (e.g. Intel SSD DC S3700 Series). This can increase the overall performance significantly.
Important: Do not use ZFS on top of a hardware controller which has its own cache management. ZFS needs to communicate directly with the disks. An HBA adapter is the way to go, or something like an LSI controller flashed in “IT” mode.
If you are experimenting with an installation of Proxmox VE inside a VM (Nested Virtualization), don’t use virtio for disks of that VM, since they are not supported by ZFS. Use IDE or SCSI instead (this also works with the virtio SCSI controller type).
Installation as Root File System
When you install using the Proxmox VE installer, you can choose ZFS for the root file system. You need to select the RAID type at installation time:
RAID0: Also called “striping”. The capacity of such a volume is the sum of the capacities of all disks. But RAID0 does not add any redundancy, so the failure of a single drive makes the volume unusable.
RAID1: Also called “mirroring”. Data is written identically to all disks. This mode requires at least 2 disks of the same size. The resulting capacity is that of a single disk.
RAID10: A combination of RAID0 and RAID1. Requires at least 4 disks.
RAIDZ-1: A variation on RAID-5, single parity. Requires at least 3 disks.
RAIDZ-2: A variation on RAID-5, double parity. Requires at least 4 disks.
RAIDZ-3: A variation on RAID-5, triple parity. Requires at least 5 disks.
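The capacity rules and minimum disk counts above can be turned into a quick estimate. This is a minimal sketch: the function name zfs_raid_capacity is made up for this example, it assumes all disks have the same size, treats RAID10 as striped two-way mirrors, and uses the common rule of thumb of (disks - parity) times disk size for RAIDZ, which ignores ZFS padding and metadata overhead, so real pools end up somewhat smaller.

```bash
#!/usr/bin/env bash
# Estimate usable capacity for LEVEL with DISKS disks of SIZE each
# (any unit, integers only). Fails if fewer than the minimum disks are given.
zfs_raid_capacity() {
    local level=$1 disks=$2 size=$3 min
    case $level in
        raid0)  min=1 ;;
        raid1)  min=2 ;;
        raid10) min=4 ;;
        raidz1) min=3 ;;
        raidz2) min=4 ;;
        raidz3) min=5 ;;
        *) echo "unknown level: $level" >&2; return 1 ;;
    esac
    if (( disks < min )); then
        echo "$level needs at least $min disks" >&2
        return 1
    fi
    case $level in
        raid0)  echo $(( disks * size )) ;;                    # sum of all disks
        raid1)  echo "$size" ;;                                # one disk's worth
        raid10) echo $(( disks / 2 * size )) ;;                # half, as mirrors
        raidz*) echo $(( (disks - ${level#raidz}) * size )) ;; # minus parity disks
    esac
}

# Usage: zfs_raid_capacity raidz2 6 4   # six 4 TB disks -> about 16 (TB)
```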
The installer automatically partitions the disks, creates a ZFS pool called rpool, and installs the root file system on the ZFS subvolume rpool/ROOT/pve-1.
Another subvolume called rpool/data is created to store VM images. In order to use that with the Proxmox VE tools, the installer creates the following configuration entry in /etc/pve/storage.cfg:
zfspool: local-zfs
        pool rpool/data
        sparse
        content images,rootdir
After installation, you can view your ZFS pool status using the zpool command:
# zpool status
  pool: rpool
 state: ONLINE
  scan: none requested
config:

        NAME        STATE     READ WRITE CKSUM
        rpool       ONLINE       0     0     0
          mirror-0  ONLINE       0     0     0
            sda2    ONLINE       0     0     0
            sdb2    ONLINE       0     0     0
          mirror-1  ONLINE       0     0     0
            sdc     ONLINE       0     0     0
            sdd     ONLINE       0     0     0

errors: No known data errors
The zfs command is used to configure and manage your ZFS file systems. The following command lists all file systems after installation:
# zfs list
NAME               USED  AVAIL  REFER  MOUNTPOINT
rpool             4.94G  7.68T    96K  /rpool
rpool/ROOT         702M  7.68T    96K  /rpool/ROOT
rpool/ROOT/pve-1   702M  7.68T   702M  /
rpool/data          96K  7.68T    96K  /rpool/data
rpool/swap        4.25G  7.69T    64K  -
Depending on whether the system is booted in legacy BIOS or EFI mode, the Proxmox VE installer sets up either grub or systemd-boot, respectively, as the main bootloader. See the chapter on Proxmox VE host bootloaders for details.
This section gives you some usage examples for common tasks. ZFS itself is really powerful and provides many options. The main commands to manage ZFS are zfs and zpool. Both commands come with great manual pages, which can be read with:
# man zpool
# man zfs
To create a new pool, at least one disk is needed. The ashift should be chosen so that the resulting sector size (2 to the power of ashift) is equal to or larger than the sector size of the underlying disk.
zpool create -f -o ashift=12 <pool> <device>
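The rule above (2 to the power of ashift must be at least the disk's sector size) can be computed directly. A minimal sketch: the function name min_ashift is made up for this example, and the sector size is passed in as a number; on a real system it could be looked up, for example, with lsblk -o NAME,PHY-SEC.

```bash
#!/usr/bin/env bash
# Print the smallest ashift with 2^ashift >= SECTOR_SIZE (in bytes).
min_ashift() {
    local sector=$1 shift_val=0
    while (( (1 << shift_val) < sector )); do
        shift_val=$(( shift_val + 1 ))
    done
    echo "$shift_val"
}
```

A 512-byte disk yields 9 and a 4096-byte (4K) disk yields 12. Since a larger ashift is allowed, ashift=12 as used in the examples works for both kinds of disks.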
To activate compression:
zfs set compression=lz4 <pool>
Create a new pool with RAID-0 (minimum 1 disk):
zpool create -f -o ashift=12 <pool> <device1> <device2>
Create a new pool with RAID-1 (minimum 2 disks):
zpool create -f -o ashift=12 <pool> mirror <device1> <device2>
Create a new pool with RAID-10 (minimum 4 disks):
zpool create -f -o ashift=12 <pool> mirror <device1> <device2> mirror <device3> <device4>
Create a new pool with RAIDZ-1
Minimum 3 Disks
zpool create -f -o ashift=12 <pool> raidz1 <device1> <device2> <device3>
Create a new pool with RAIDZ-2
Minimum 4 Disks
zpool create -f -o ashift=12 <pool> raidz2 <device1> <device2> <device3> <device4>
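For a RAID-10 pool with more than four disks, the mirror pairs follow the same pattern as in the RAID-10 example. Below is a minimal sketch that builds the vdev part of the command line from an even number of devices. It assumes a hypothetical helper named raid10_vdev_args, and the device names are placeholders.

```shell
# Hypothetical helper: turn an even-length device list into
# "mirror a b mirror c d ..." for a striped-mirror (RAID-10) pool.
raid10_vdev_args() {
    if [ $(($# % 2)) -ne 0 ] || [ "$#" -lt 4 ]; then
        echo "raid10_vdev_args: need an even number of at least 4 devices" >&2
        return 1
    fi
    args=""
    while [ "$#" -gt 0 ]; do
        args="$args mirror $1 $2"
        shift 2
    done
    # Strip the leading space added by the first iteration.
    echo "${args# }"
}
```

For example, zpool create -n -f -o ashift=12 <pool> $(raid10_vdev_args sda sdb sdc sdd sde sdf) only prints the resulting layout. The -n option makes zpool create perform a dry run without creating the pool.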
Create a new pool with cache (L2ARC)
It is possible to use a dedicated cache drive partition to increase performance (use an SSD).
As <device>, it is possible to use multiple devices, as shown in "Create a new pool with RAID*".
zpool create -f -o ashift=12 <pool> <device> cache <cache_device>
Create a new pool with log (ZIL)
It is possible to use a dedicated log drive partition to increase performance (use an SSD).
As <device>, it is possible to use multiple devices, as shown in "Create a new pool with RAID*".
zpool create -f -o ashift=12 <pool> <device> log <log_device>
Add cache and log to an existing pool
If you have a pool without cache and log, first partition the SSD into 2 partitions with parted or gdisk.
|Always use GPT partition tables.|
The maximum size of a log device should be about half the size of physical memory, so this is usually quite small. The rest of the SSD can be used as cache.
zpool add -f <pool> log <device-part1> cache <device-part2>
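The "half of physical memory" guideline can be turned into a concrete partition size. The sketch below uses a hypothetical helper, max_log_size_mib. It reads MemTotal (reported in KiB) from /proc/meminfo, or from a file passed as the first argument, and prints half of it in MiB.

```shell
# Hypothetical helper: upper bound for a log (ZIL) partition, following the
# "about half the size of physical memory" rule of thumb.
max_log_size_mib() {
    meminfo="${1:-/proc/meminfo}"
    # MemTotal is reported in KiB, e.g. "MemTotal:  16777216 kB".
    mem_kib=$(awk '/^MemTotal:/ { print $2 }' "$meminfo")
    echo $((mem_kib / 2 / 1024))
}
```

If MemTotal is 16777216 kB (16 GiB), the helper prints 8192. Assuming the SSD is /dev/sdX (a placeholder), the two partitions could then be created with two commands:
- sgdisk -n 1:0:+$(max_log_size_mib)M /dev/sdX creates the log partition.
- sgdisk -n 2:0:0 /dev/sdX creates the cache partition from the remaining space.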
Changing a failed device
zpool replace -f <pool> <old device> <new device>
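Changing a failed bootable device when using systemd-boot
The first command copies the partition table from a healthy bootable disk to the new disk. The second randomizes the disk and partition GUIDs of the copy.
sgdisk <healthy bootable device> -R <new device>
sgdisk -G <new device>
zpool replace -f <pool> <old zfs partition> <new zfs partition>
pve-efiboot-tool format <new disk's ESP>
pve-efiboot-tool init <new disk's ESP>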
|ESP stands for EFI System Partition, which the Proxmox VE installer sets up as partition #2 on bootable disks since version 5.4. For details, see Setting up a new partition for use as synced ESP.|
Activate E-Mail Notification
ZFS comes with an event daemon, which monitors events generated by the ZFS kernel module. The daemon can also send emails on ZFS events like pool errors. Newer ZFS packages ship the daemon in a separate package, and you can install it using apt-get:
# apt-get install zfs-zed
To activate the daemon, it is necessary to edit /etc/zfs/zed.d/zed.rc with your favourite editor and uncomment the ZED_EMAIL_ADDR setting:
ZED_EMAIL_ADDR="root"
Please note that Proxmox VE forwards mail addressed to root to the email address configured for the root user.
|The only setting that is required is ZED_EMAIL_ADDR. All other settings are optional.|
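If you prefer to script this change, for example across several nodes, the edit can be done with sed. The sketch below uses a hypothetical function name. It assumes the stock zed.rc, where the setting appears as a commented #ZED_EMAIL_ADDR= line, and appends the setting if no such line exists.

```shell
# Hypothetical helper: uncomment (or overwrite) ZED_EMAIL_ADDR in a zed.rc file,
# or append it if the file has no such line. Other lines are left untouched.
enable_zed_email() {
    rc_file="$1"
    address="${2:-root}"
    if grep -q '^#\{0,1\}ZED_EMAIL_ADDR=' "$rc_file"; then
        sed -i "s|^#\{0,1\}ZED_EMAIL_ADDR=.*|ZED_EMAIL_ADDR=\"$address\"|" "$rc_file"
    else
        printf 'ZED_EMAIL_ADDR="%s"\n' "$address" >> "$rc_file"
    fi
}
```

For example, run enable_zed_email /etc/zfs/zed.d/zed.rc root and then restart the daemon. On Debian-based systems the unit is usually called zfs-zed, so the restart command would be systemctl restart zfs-zed.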
Limit ZFS Memory Usage
It is good practice to use at most 50 percent (which is the default) of the system memory for the ZFS ARC, to prevent performance degradation of the host. Use your preferred editor to change the configuration in /etc/modprobe.d/zfs.conf and insert:
options zfs zfs_arc_max=8589934592
This example setting limits the usage to 8 GiB.
If your root file system is ZFS, you must update your initramfs every time this value changes:
# update-initramfs -u
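Since zfs_arc_max is given in bytes, the value for a limit in GiB is the number of GiB multiplied by 1024^3 = 1073741824. The value used above is 8 * 1073741824 = 8589934592. The following small sketch, with a hypothetical helper name, prints the corresponding line for /etc/modprobe.d/zfs.conf.

```shell
# Hypothetical helper: print the modprobe option line for an ARC limit in GiB.
# zfs_arc_max is specified in bytes; 1 GiB = 1024^3 = 1073741824 bytes.
arc_max_option() {
    gib="$1"
    echo "options zfs zfs_arc_max=$((gib * 1024 * 1024 * 1024))"
}
```

For example, arc_max_option 16 prints options zfs zfs_arc_max=17179869184. The limit can usually also be changed at runtime by writing the byte value to /sys/module/zfs/parameters/zfs_arc_max. However, only the modprobe.d entry (plus update-initramfs -u for a ZFS root) makes it persistent.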
SWAP on ZFS
Swap space created on a zvol may cause problems, such as blocking the server or generating a high I/O load. This is often seen when starting a backup to external storage.
We strongly recommend using enough memory, so that you normally do not run into low memory situations. Should you need or want to add swap, it is preferable to create a partition on a physical disk and use it as a swap device. You can leave some space free for this purpose in the advanced options of the installer. Additionally, you can lower the "swappiness" value. A good value for servers is 10:
sysctl -w vm.swappiness=10
To make the swappiness persistent, open /etc/sysctl.conf with an editor of your choice and add the following line:
vm.swappiness = 10
Linux kernel swappiness parameter values:
vm.swappiness = 0      The kernel will swap only to avoid an out of memory condition.
vm.swappiness = 1      Minimum amount of swapping without disabling it entirely.
vm.swappiness = 10     This value is sometimes recommended to improve performance when sufficient memory exists in a system.
vm.swappiness = 60     The default value.
vm.swappiness = 100    The kernel will swap aggressively.
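The step of making the swappiness setting persistent can be automated so that repeated runs do not add duplicate lines. The minimal sketch below assumes a hypothetical helper, persist_swappiness. It replaces an existing vm.swappiness line in the given file (default /etc/sysctl.conf) or appends one.

```shell
# Hypothetical helper: set vm.swappiness persistently in a sysctl config file,
# replacing an existing entry or appending a new one.
persist_swappiness() {
    value="$1"
    conf="${2:-/etc/sysctl.conf}"
    case "$value" in
        ''|*[!0-9]*) echo "persist_swappiness: value must be a number" >&2; return 1 ;;
    esac
    if grep -q '^[[:space:]]*vm\.swappiness[[:space:]]*=' "$conf"; then
        sed -i "s/^[[:space:]]*vm\.swappiness[[:space:]]*=.*/vm.swappiness = $value/" "$conf"
    else
        echo "vm.swappiness = $value" >> "$conf"
    fi
}
```

For example, persist_swappiness 10 followed by sysctl -p applies the setting from /etc/sysctl.conf immediately. sysctl -n vm.swappiness shows the active value.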
Encrypted ZFS Datasets
ZFS on Linux version 0.8.0 introduced support for native encryption of datasets. After an upgrade from previous ZFS on Linux versions, the encryption feature can be enabled per pool:
# zpool get feature@encryption tank
NAME  PROPERTY            VALUE     SOURCE
tank  feature@encryption  disabled  local
# zpool set feature@encryption=enabled tank
# zpool get feature@encryption tank
NAME  PROPERTY            VALUE     SOURCE
tank  feature@encryption  enabled   local
|There is currently no support for booting from pools with encrypted datasets using Grub, and only limited support for automatically unlocking encrypted datasets on boot. Older versions of ZFS without encryption support will not be able to decrypt stored data.|
|It is recommended to either unlock storage datasets manually after booting, or to write a custom unit to pass the key material needed for unlocking on boot to zfs load-key.|
|Establish and test a backup procedure before enabling encryption of production data. If the associated key material/passphrase/keyfile has been lost, accessing the encrypted data is no longer possible.|
Encryption needs to be set up when creating datasets/zvols, and is inherited by child datasets by default. For example, to create an encrypted dataset tank/encrypted_data and configure it as storage in Proxmox VE, run the following commands:
# zfs create -o encryption=on -o keyformat=passphrase tank/encrypted_data
Enter passphrase:
Re-enter passphrase:
# pvesm add zfspool encrypted_zfs -pool tank/encrypted_data
All guest volumes/disks created on this storage will be encrypted with the shared key material of the parent dataset.
To actually use the storage, the associated key material needs to be loaded with zfs load-key:
# zfs load-key tank/encrypted_data
Enter passphrase for 'tank/encrypted_data':
It is also possible to use a (random) keyfile instead of prompting for a passphrase by setting the keylocation and keyformat properties, either at creation time or with zfs change-key on existing datasets:
# dd if=/dev/urandom of=/path/to/keyfile bs=32 count=1
# zfs change-key -o keyformat=raw -o keylocation=file:///path/to/keyfile tank/encrypted_data
|When using a keyfile, special care needs to be taken to secure the keyfile against unauthorized access or accidental loss. Without the keyfile, it is not possible to access the plaintext data!|
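One way to reduce the risk of leaking or accidentally overwriting a keyfile is to create it with owner-only permissions and to refuse to replace an existing file. The following sketch uses a hypothetical function name. Like the dd command above, it produces the 32 bytes of random data that keyformat=raw expects.

```shell
# Hypothetical helper: create a 32-byte random key for keyformat=raw with
# owner-only permissions, refusing to overwrite an existing key.
create_raw_keyfile() {
    keyfile="$1"
    if [ -e "$keyfile" ]; then
        echo "create_raw_keyfile: $keyfile already exists, not overwriting" >&2
        return 1
    fi
    # umask 077 in a subshell, so the new file is created with mode 0600.
    (umask 077 && dd if=/dev/urandom of="$keyfile" bs=32 count=1 2>/dev/null) || return 1
    # Raw keys must be exactly 32 bytes long.
    [ "$(wc -c < "$keyfile")" -eq 32 ]
}
```

For example, run create_raw_keyfile /path/to/keyfile before the zfs change-key command shown above. Keep a copy of the keyfile in a secure location outside the pool it protects.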
A guest volume created underneath an encrypted dataset will have its encryptionroot property set accordingly. The key material only needs to be loaded once per encryptionroot to be available to all encrypted datasets underneath it.
See the encryptionroot, encryption, keylocation, keyformat and keystatus properties, the zfs load-key, zfs unload-key and zfs change-key commands and the Encryption section from man zfs for more details and advanced usage.
Certificates for communication within the cluster
Each Proxmox VE cluster creates its own (self-signed) Certificate Authority (CA) and generates a certificate for each node which gets signed by the aforementioned CA. These certificates are used for encrypted communication with the cluster’s pveproxy service and the Shell/Console feature if SPICE is used.
The CA certificate and key are stored in the Proxmox Cluster File System (pmxcfs).
Certificates for API and web GUI
The REST API and web GUI are provided by the pveproxy service, which runs on each node.
You have the following options for the certificate used by pveproxy:
1. By default, the node-specific certificate in /etc/pve/nodes/NODENAME/pve-ssl.pem is used. This certificate is signed by the cluster CA and is therefore not trusted by browsers and operating systems by default.
2. Use an externally provided certificate (e.g. signed by a commercial CA).
3. Use ACME (e.g., Let’s Encrypt) to get a trusted certificate with automatic renewal.
For options 2 and 3, the file /etc/pve/local/pveproxy-ssl.pem is used, together with /etc/pve/local/pveproxy-ssl.key, which must not be password protected.
Certificates are managed with the Proxmox VE Node management command (see the pvenode(1) manpage).
|Do not replace or manually modify the automatically generated node certificate files in /etc/pve/local/pve-ssl.pem and /etc/pve/local/pve-ssl.key or the cluster CA files in /etc/pve/pve-root-ca.pem and /etc/pve/priv/pve-root-ca.key.|
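Before using an externally provided certificate, it is worth checking that the certificate and key belong together. The key must also not be password protected, as required above. The sketch below uses the openssl command-line tool; the function name check_cert_key_pair is hypothetical.

```shell
# Hypothetical helper: succeed only if CERT and KEY contain the same public key
# and KEY can be read without a passphrase.
check_cert_key_pair() {
    cert="$1"
    key="$2"
    # An empty passphrase makes openssl fail on encrypted keys instead of prompting.
    key_pub=$(openssl pkey -in "$key" -passin pass: -pubout 2>/dev/null) || {
        echo "check_cert_key_pair: key is unreadable or passphrase-protected" >&2
        return 1
    }
    cert_pub=$(openssl x509 -in "$cert" -noout -pubkey 2>/dev/null) || return 1
    # Both commands print the public key in the same PEM format.
    [ "$key_pub" = "$cert_pub" ]
}
```

check_cert_key_pair cert.pem key.pem returns success only if both files contain the same public key. The files can then be installed, for example with pvenode cert set; see the pvenode(1) manpage for the exact options.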
Getting trusted certificates via ACME
Proxmox VE includes an implementation of the Automatic Certificate Management Environment (ACME) protocol. This allows Proxmox VE admins to use Let’s Encrypt for easy setup of trusted TLS certificates, which are accepted out of the box on most modern operating systems and browsers.
Currently the two ACME endpoints implemented are Let’s Encrypt (LE) and its staging environment (see https://letsencrypt.org), both using the standalone HTTP challenge.
Because of rate limits, you should use the LE staging environment for experiments.
There are a few prerequisites to use Let’s Encrypt:
Port 80 of the node needs to be reachable from the internet.
There must be no other listener on port 80.
The requested (sub)domain needs to resolve to a public IP of the node.
You have to accept the ToS of Let’s Encrypt.
At the moment the GUI uses only the default ACME account.
root@proxmox:~# pvenode acme account register default firstname.lastname@example.org
Directory endpoints:
0) Let's Encrypt V2 (https://acme-v02.api.letsencrypt.org/directory)
1) Let's Encrypt V2 Staging (https://acme-staging-v02.api.letsencrypt.org/directory)
2) Custom
Enter selection: 1
Attempting to fetch Terms of Service from 'https://acme-staging-v02.api.letsencrypt.org/directory'..
Terms of Service: https://letsencrypt.org/documents/LE-SA-v1.2-November-15-2017.pdf
Do you agree to the above terms? [y|N]y
Attempting to register account with 'https://acme-staging-v02.api.letsencrypt.org/directory'..
Generating ACME account key..
Registering ACME account..
Registration successful, account URL: 'https://acme-staging-v02.api.letsencrypt.org/acme/acct/xxxxxxx'
Task OK
root@proxmox:~# pvenode acme account list
default
root@proxmox:~# pvenode config set --acme domains=example.invalid
root@proxmox:~# pvenode acme cert order
Loading ACME account details
Placing ACME order
Order URL: https://acme-staging-v02.api.letsencrypt.org/acme/order/xxxxxxxxxxxxxx
Getting authorization details from 'https://acme-staging-v02.api.letsencrypt.org/acme/authz/xxxxxxxxxxxxxxxxxxxxx-xxxxxxxxxxxxx-xxxxxxx'
... pending!
Setting up webserver
Triggering validation
Sleeping for 5 seconds
Status is 'valid'!
All domains validated!
Creating CSR
Finalizing order
Checking order status
valid!
Downloading certificate
Setting pveproxy certificate and key
Restarting pveproxy
Task OK
Switching from the staging to the regular ACME directory
Changing the ACME directory for an account is unsupported. If you want to switch an account from the staging ACME directory to the regular, trusted one, you need to deactivate it and recreate it.
This procedure is also needed to change the default ACME account used in the GUI.
root@proxmox:~# pvenode acme account info default
Directory URL: https://acme-staging-v02.api.letsencrypt.org/directory
Account URL: https://acme-staging-v02.api.letsencrypt.org/acme/acct/6332194
Terms Of Service: https://letsencrypt.org/documents/LE-SA-v1.2-November-15-2017.pdf
Account information:
  ID: xxxxxxx
  Contact:
    - mailto:email@example.com
  Creation date: 2018-07-31T08:41:44.54196435Z
  Initial IP: 192.0.2.1
  Status: valid
root@proxmox:~# pvenode acme account deactivate default
Renaming account file from '/etc/pve/priv/acme/default' to '/etc/pve/priv/acme/_deactivated_default_4'
Task OK
root@proxmox:~# pvenode acme account register default firstname.lastname@example.org
Directory endpoints:
0) Let's Encrypt V2 (https://acme-v02.api.letsencrypt.org/directory)
1) Let's Encrypt V2 Staging (https://acme-staging-v02.api.letsencrypt.org/directory)
2) Custom
Enter selection: 0
Attempting to fetch Terms of Service from 'https://acme-v02.api.letsencrypt.org/directory'..
Terms of Service: https://letsencrypt.org/documents/LE-SA-v1.2-November-15-2017.pdf
Do you agree to the above terms? [y|N]y
Attempting to register account with 'https://acme-v02.api.letsencrypt.org/directory'..
Generating ACME account key..
Registering ACME account..
Registration successful, account URL: 'https://acme-v02.api.letsencrypt.org/acme/acct/39335247'
Task OK
Automatic renewal of ACME certificates
If a node has been successfully configured with an ACME-provided certificate (either via pvenode or via the GUI), the certificate will be automatically renewed by the pve-daily-update.service. Currently, renewal will be attempted if the certificate has expired or will expire in the next 30 days.
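The 30-day window corresponds to 30 * 24 * 60 * 60 = 2592000 seconds. To see whether a certificate currently falls into that window, you can use openssl's -checkend option. The sketch below mirrors the rule stated above and is not the code of pve-daily-update.service. The function name is hypothetical, and the default path assumes the certificate in /etc/pve/local/pveproxy-ssl.pem.

```shell
# Sketch of the 30-day renewal window described above.
RENEWAL_WINDOW_SECONDS=$((30 * 24 * 60 * 60))

# Succeed if CERT has expired or expires within the renewal window.
# An unreadable certificate is also reported as due.
cert_due_for_renewal() {
    cert="${1:-/etc/pve/local/pveproxy-ssl.pem}"
    # -checkend N exits 0 if the certificate is still valid N seconds from now.
    if openssl x509 -in "$cert" -noout -checkend "$RENEWAL_WINDOW_SECONDS" >/dev/null 2>&1; then
        return 1
    fi
    return 0
}
```

For example, a certificate valid for another 10 days is reported as due, while one valid for another 90 days is not.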
Proxmox VE currently uses one of two bootloaders depending on the disk setup selected in the installer.
For EFI systems installed with ZFS as the root filesystem, systemd-boot is used. All other deployments use the standard grub bootloader (this usually also applies to systems which are installed on top of Debian).
Partitioning scheme used by the installer
The Proxmox VE installer creates 3 partitions on the bootable disks selected for installation. The bootable disks are:
For installations with ext4 or xfs, the selected disk
For ZFS installations, all disks belonging to the first vdev:
The first disk for RAID0
All disks for RAID1, RAIDZ1, RAIDZ2, RAIDZ3
The first two disks for RAID10
The created partitions are:
a 1 MB BIOS Boot Partition (gdisk type EF02)
a 512 MB EFI System Partition (ESP, gdisk type EF00)
a third partition spanning the set hdsize parameter or the remaining space used for the chosen storage type
grub in BIOS mode (--target i386-pc) is installed onto the BIOS Boot Partition of all bootable disks for supporting older systems.
grub has been the de-facto standard for booting Linux systems for many years and is quite well documented [Grub Manual https://www.gnu.org/software/grub/manual/grub/grub.html].
The kernel and initrd images are taken from /boot and its configuration file /boot/grub/grub.cfg gets updated by the kernel installation process.
Changes to the grub configuration are done via the defaults file /etc/default/grub or config snippets in /etc/default/grub.d. To regenerate /boot/grub/grub.cfg after a change to the configuration, run:
# update-grub
systemd-boot is a lightweight EFI bootloader. It reads the kernel and initrd images directly from the EFI System Partition (ESP) where it is installed.
The main advantage of loading the kernel directly from the ESP is that systemd-boot does not need to reimplement the drivers for accessing the storage. With ZFS as the root filesystem, this means you can use all optional features on your root pool. Otherwise you would be limited to the subset that the ZFS implementation in grub also supports, or you would have to create a separate small boot pool [Booting ZFS on root with grub https://github.com/zfsonlinux/zfs/wiki/Debian-Stretch-Root-on-ZFS].
In setups with redundancy (RAID1, RAID10, RAIDZ*) all bootable disks (those being part of the first vdev) are partitioned with an ESP. This ensures the system boots even if the first boot device fails. The ESPs are kept in sync by a kernel postinstall hook script /etc/kernel/postinst.d/zz-pve-efiboot. The script copies certain kernel versions and the initrd images to EFI/proxmox/ on the root of each ESP and creates the appropriate config files in loader/entries/proxmox-*.conf. The pve-efiboot-tool script assists in managing both the synced ESPs themselves and their contents.
The following kernel versions are configured by default:
the currently running kernel
the version being newly installed on package updates
the two latest already installed kernels
the latest version of the second-to-last kernel series (e.g. 4.15, 5.0), if applicable
any manually selected kernels (see below)
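The rules for the two latest kernels and for the second-to-last series can be illustrated with sort -V. This is a sketch with hypothetical helper names, not the implementation of zz-pve-efiboot. It covers only those two rules.

```shell
# Illustration only (hypothetical helpers, not the code of zz-pve-efiboot).

# Print the two latest kernels from the given ABI versions, newest first.
two_latest_kernels() {
    printf '%s\n' "$@" | sort -V -r | head -n 2
}

# Print the latest kernel of the second-to-last series (major.minor), if any.
latest_of_previous_series() {
    newest_series=$(printf '%s\n' "$@" | sort -V -r | head -n 1 | cut -d. -f1,2)
    # Drop every version of the newest series, then take the highest remaining one.
    printf '%s\n' "$@" \
        | awk -v s="$newest_series." 'index($0, s) != 1' \
        | sort -V -r | head -n 1
}
```

With the versions from the pve-efiboot-tool kernel list example, latest_of_previous_series 5.0.12-1-pve 5.0.15-1-pve 4.15.18-18-pve prints 4.15.18-18-pve. Installed ABI versions can be listed, for example, with ls /lib/modules.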
Unlike grub, which keeps an ESP mounted on /boot/efi, the ESPs are not kept mounted during regular operation. This helps to prevent filesystem corruption of the vfat-formatted ESPs in case of a system crash. It also removes the need to manually adapt /etc/fstab if the primary boot device fails.
systemd-boot is configured via the file loader/loader.conf in the root directory of an EFI System Partition (ESP). See the loader.conf(5) manpage for details.
Each bootloader entry is placed in a file of its own in the directory loader/entries/.
An example entry.conf looks like this (/ refers to the root of the ESP):
title    Proxmox
version  5.0.15-1-pve
options  root=ZFS=rpool/ROOT/pve-1 boot=zfs
linux    /EFI/proxmox/5.0.15-1-pve/vmlinuz-5.0.15-1-pve
initrd   /EFI/proxmox/5.0.15-1-pve/initrd.img-5.0.15-1-pve
Should you wish to add a certain kernel and initrd image to the list of bootable kernels, use pve-efiboot-tool kernel add.
For example run the following to add the kernel with ABI version 5.0.15-1-pve to the list of kernels to keep installed and synced to all ESPs:
pve-efiboot-tool kernel add 5.0.15-1-pve
pve-efiboot-tool kernel list will list all kernel versions currently selected for booting:
# pve-efiboot-tool kernel list
Manually selected kernels:
5.0.15-1-pve
Automatically selected kernels:
5.0.12-1-pve
4.15.18-18-pve
Run pve-efiboot-tool kernel remove to remove a kernel from the list of manually selected kernels, for example:
pve-efiboot-tool kernel remove 5.0.15-1-pve
|It’s required to run pve-efiboot-tool refresh to update all EFI System Partitions (ESPs) after a manual kernel addition or removal from above.|
To format and initialize a partition as synced ESP, e.g., after replacing a failed vdev in an rpool, or when converting an existing system that pre-dates the sync mechanism, pve-efiboot-tool from pve-kernel-helpers can be used.
|The format command will format the <partition>, so make sure to pass in the right device/partition!|
For example, to format an empty partition /dev/sda2 as ESP, run the following:
pve-efiboot-tool format /dev/sda2
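Because format overwrites the given partition, a quick check of the partition type can help avoid picking the wrong device. The sketch below uses hypothetical helper names. It compares the GPT partition type GUID reported by lsblk against the standard EFI System Partition type GUID c12a7328-f81f-11d2-ba4b-00a0c93ec93b (gdisk type EF00).

```shell
# Hypothetical pre-check before running "pve-efiboot-tool format".
ESP_TYPE_GUID="c12a7328-f81f-11d2-ba4b-00a0c93ec93b"

# Succeed if the given partition type GUID is the ESP type (case-insensitive).
is_esp_type_guid() {
    [ "$(printf '%s' "$1" | tr 'A-F' 'a-f')" = "$ESP_TYPE_GUID" ]
}

# Succeed if the block device PARTITION is typed as an ESP (lsblk PARTTYPE column).
is_esp_partition() {
    is_esp_type_guid "$(lsblk -no PARTTYPE "$1")"
}
```

For example, is_esp_partition /dev/sda2 && pve-efiboot-tool format /dev/sda2. A matching type only shows that the partition is marked as an ESP. It does not prove this is the intended disk, so you still need to double-check the device name.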
To setup an existing, unmounted ESP located on /dev/sda2 for inclusion in Proxmox VE’s kernel update synchronization mechanism, use the following:
pve-efiboot-tool init /dev/sda2
Afterwards /etc/kernel/pve-efiboot-uuids should contain a new line with the UUID of the newly added partition. The init command will also automatically trigger a refresh of all configured ESPs.
To copy and configure all bootable kernels and keep all ESPs listed in /etc/kernel/pve-efiboot-uuids in sync, you just need to run:
# pve-efiboot-tool refresh
(This is the equivalent of running update-grub on systems booted with grub.)
This is necessary should you make changes to the kernel commandline, or want to sync all kernels and initrds after regenerating the latter.
Editing the kernel commandline
You can modify the kernel commandline in the following places, depending on the bootloader used:
For grub, the kernel commandline needs to be placed in the variable GRUB_CMDLINE_LINUX_DEFAULT in the file /etc/default/grub. Running update-grub appends its content to all linux entries in /boot/grub/grub.cfg.
For systemd-boot, the kernel commandline needs to be placed as a single line in /etc/kernel/cmdline. Running /etc/kernel/postinst.d/zz-pve-efiboot sets it as the options line for all config files in loader/entries/proxmox-*.conf.
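With systemd-boot, adding a kernel parameter means editing that single line and then refreshing the ESPs. The minimal sketch below uses a hypothetical helper name. It appends a parameter to /etc/kernel/cmdline (or to a file given as the second argument) only if the parameter is not already present.

```shell
# Hypothetical helper: append PARAM to a single-line kernel commandline file,
# unless it is already present as a whole word. Only the first line is kept.
add_kernel_param() {
    param="$1"
    file="${2:-/etc/kernel/cmdline}"
    current=$(head -n 1 "$file" 2>/dev/null)
    case " $current " in
        *" $param "*) return 0 ;;   # already present, nothing to do
    esac
    if [ -n "$current" ]; then
        printf '%s %s\n' "$current" "$param" > "$file"
    else
        printf '%s\n' "$param" > "$file"
    fi
}
```

For example, run add_kernel_param intel_iommu=on, followed by pve-efiboot-tool refresh. intel_iommu=on is only an example parameter. With grub, you would instead add the parameter to GRUB_CMDLINE_LINUX_DEFAULT and run update-grub.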