Proxmox VE is based on the famous Debian Linux distribution. That means that you have access to the whole world of Debian packages, and the base system is well documented. The Debian Administrator's Handbook is available online, and provides a comprehensive introduction to the Debian operating system (see [Hertzog13]).

A standard Proxmox VE installation uses the default repositories from Debian, so you get bug fixes and security updates through that channel. In addition, we provide our own package repository to roll out all Proxmox VE related packages. This includes updates to some Debian packages when necessary.

We also deliver a specially optimized Linux kernel, where we enable all required virtualization and container features. That kernel includes drivers for ZFS, and several hardware drivers. For example, we ship Intel network card drivers to support their newest hardware.

The following sections will concentrate on virtualization related topics. They either explain things which are different on Proxmox VE, or tasks which are commonly used on Proxmox VE. For other topics, please refer to the standard Debian documentation.

Package Repositories

All Debian based systems use APT as package management tool. The list of repositories is defined in /etc/apt/sources.list and .list files found inside /etc/apt/sources.d/. Updates can be installed directly using apt-get, or via the GUI.

Apt sources.list files list one package repository per line, with the most preferred source listed first. Empty lines are ignored, and a # character anywhere on a line marks the remainder of that line as a comment. The information available from the configured sources is acquired by apt-get update.

File /etc/apt/sources.list
deb http://ftp.debian.org/debian buster main contrib
deb http://ftp.debian.org/debian buster-updates main contrib

# security updates
deb http://security.debian.org buster/updates main contrib

In addition, Proxmox VE provides three different package repositories.

Proxmox VE Enterprise Repository

This is the default, stable and recommended repository, available for all Proxmox VE subscription users. It contains the most stable packages, and is suitable for production use. The pve-enterprise repository is enabled by default:

File /etc/apt/sources.list.d/pve-enterprise.list
deb https://enterprise.proxmox.com/debian/pve buster pve-enterprise

As soon as updates are available, the root@pam user is notified via email about the available new packages. On the GUI, the change-log of each package can be viewed (if available), showing all details of the update. So you will never miss important security fixes.

Please note that you need a valid subscription key to access this repository. We offer different support levels, and you can find further details at https://www.proxmox.com/en/proxmox-ve/pricing.

Note You can disable this repository by commenting out the above line using a # (at the start of the line). This prevents error messages if you do not have a subscription key. Please configure the pve-no-subscription repository in that case.

Proxmox VE No-Subscription Repository

As the name suggests, you do not need a subscription key to access this repository. It can be used for testing and non-production use. Its not recommended to run on production servers, as these packages are not always heavily tested and validated.

We recommend to configure this repository in /etc/apt/sources.list.

File /etc/apt/sources.list
deb http://ftp.debian.org/debian buster main contrib
deb http://ftp.debian.org/debian buster-updates main contrib

# PVE pve-no-subscription repository provided by proxmox.com,
# NOT recommended for production use
deb http://download.proxmox.com/debian/pve buster pve-no-subscription

# security updates
deb http://security.debian.org buster/updates main contrib

Proxmox VE Test Repository

Finally, there is a repository called pvetest. This one contains the latest packages and is heavily used by developers to test new features. As usual, you can configure this using /etc/apt/sources.list by adding the following line:

sources.list entry for pvetest
deb http://download.proxmox.com/debian/pve buster pvetest
Warning the pvetest repository should (as the name implies) only be used for testing new features or bug fixes.

Proxmox VE Ceph Repository

This is Proxmox VE’s main Ceph repository and holds the Ceph packages for production use. You can also use this repository to update only the Ceph client.

File /etc/apt/sources.list.d/ceph.list
deb http://download.proxmox.com/debian/ceph-nautilus buster main

Proxmox VE Ceph Testing Repository

This Ceph repository contains the Ceph packages before they are moved into the main repository and is used to test new Ceph release on Proxmox VE.

File /etc/apt/sources.list.d/ceph.list
deb http://download.proxmox.com/debian/ceph-nautilus buster test

Proxmox VE Ceph Luminous Repository For Upgrade

This is a build of tje Ceph Luminous release for Proxmox VE 6.0, this can be used to upgrade a Proxmox VE cluster with Ceph Luminous deployed first to our 6.0 release, based on Debian Buster, and only afterwards upgrade the Ceph on it’s own.

File /etc/apt/sources.list.d/ceph.list
deb http://download.proxmox.com/debian/ceph-luminous buster main

SecureApt

We use GnuPG to sign the Release files inside those repositories, and APT uses that signatures to verify that all packages are from a trusted source.

The key used for verification is already installed if you install from our installation CD. If you install by other means, you can manually download the key with:

# wget http://download.proxmox.com/debian/proxmox-ve-release-6.x.gpg -O /etc/apt/trusted.gpg.d/proxmox-ve-release-6.x.gpg

Please verify the checksum afterwards:

# sha512sum /etc/apt/trusted.gpg.d/proxmox-ve-release-6.x.gpg
acca6f416917e8e11490a08a1e2842d500b3a5d9f322c6319db0927b2901c3eae23cfb5cd5df6facf2b57399d3cfa52ad7769ebdd75d9b204549ca147da52626 /etc/apt/trusted.gpg.d/proxmox-ve-release-6.x.gpg

or

# md5sum /etc/apt/trusted.gpg.d/proxmox-ve-release-6.x.gpg
f3f6c5a3a67baf38ad178e5ff1ee270c /etc/apt/trusted.gpg.d/proxmox-ve-release-6.x.gpg

System Software Updates

We provide regular package updates on all repositories. You can install those update using the GUI, or you can directly run the CLI command apt-get:

apt-get update
apt-get dist-upgrade
Note The apt package management system is extremely flexible and provides countless of feature - see man apt-get or [Hertzog13] for additional information.

You should do such updates at regular intervals, or when we release versions with security related fixes. Major system upgrades are announced at the Proxmox VE Community Forum. Those announcement also contain detailed upgrade instructions.

Tip We recommend to run regular upgrades, because it is important to get the latest security updates.

Network Configuration

Network configuration can be done either via the GUI, or by manually editing the file /etc/network/interfaces, which contains the whole network configuration. The interfaces(5) manual page contains the complete format description. All Proxmox VE tools try hard to keep direct user modifications, but using the GUI is still preferable, because it protects you from errors.

Once the network is configured, you can use the Debian traditional tools ifup and ifdown commands to bring interfaces up and down.

Note Proxmox VE does not write changes directly to /etc/network/interfaces. Instead, we write into a temporary file called /etc/network/interfaces.new, and commit those changes when you reboot the node.

Naming Conventions

We currently use the following naming conventions for device names:

  • Ethernet devices: en*, systemd network interface names. This naming scheme is used for new Proxmox VE installations since version 5.0.

  • Ethernet devices: eth[N], where 0 ≤ N (eth0, eth1, …) This naming scheme is used for Proxmox VE hosts which were installed before the 5.0 release. When upgrading to 5.0, the names are kept as-is.

  • Bridge names: vmbr[N], where 0 ≤ N ≤ 4094 (vmbr0 - vmbr4094)

  • Bonds: bond[N], where 0 ≤ N (bond0, bond1, …)

  • VLANs: Simply add the VLAN number to the device name, separated by a period (eno1.50, bond1.30)

This makes it easier to debug networks problems, because the device name implies the device type.

Systemd Network Interface Names

Systemd uses the two character prefix en for Ethernet network devices. The next characters depends on the device driver and the fact which schema matches first.

  • o<index>[n<phys_port_name>|d<dev_port>] — devices on board

  • s<slot>[f<function>][n<phys_port_name>|d<dev_port>] — device by hotplug id

  • [P<domain>]p<bus>s<slot>[f<function>][n<phys_port_name>|d<dev_port>] — devices by bus id

  • x<MAC> — device by MAC address

The most common patterns are:

  • eno1 — is the first on board NIC

  • enp3s0f1 — is the NIC on pcibus 3 slot 0 and use the NIC function 1.

For more information see Predictable Network Interface Names.

Choosing a network configuration

Depending on your current network organization and your resources you can choose either a bridged, routed, or masquerading networking setup.

Proxmox VE server in a private LAN, using an external gateway to reach the internet

The Bridged model makes the most sense in this case, and this is also the default mode on new Proxmox VE installations. Each of your Guest system will have a virtual interface attached to the Proxmox VE bridge. This is similar in effect to having the Guest network card directly connected to a new switch on your LAN, the Proxmox VE host playing the role of the switch.

Proxmox VE server at hosting provider, with public IP ranges for Guests

For this setup, you can use either a Bridged or Routed model, depending on what your provider allows.

Proxmox VE server at hosting provider, with a single public IP address

In that case the only way to get outgoing network accesses for your guest systems is to use Masquerading. For incoming network access to your guests, you will need to configure Port Forwarding.

For further flexibility, you can configure VLANs (IEEE 802.1q) and network bonding, also known as "link aggregation". That way it is possible to build complex and flexible virtual networks.

Default Configuration using a Bridge

default-network-setup-bridge.svg

Bridges are like physical network switches implemented in software. All VMs can share a single bridge, or you can create multiple bridges to separate network domains. Each host can have up to 4094 bridges.

The installation program creates a single bridge named vmbr0, which is connected to the first Ethernet card. The corresponding configuration in /etc/network/interfaces might look like this:

auto lo
iface lo inet loopback

iface eno1 inet manual

auto vmbr0
iface vmbr0 inet static
        address 192.168.10.2
        netmask 255.255.255.0
        gateway 192.168.10.1
        bridge_ports eno1
        bridge_stp off
        bridge_fd 0

Virtual machines behave as if they were directly connected to the physical network. The network, in turn, sees each virtual machine as having its own MAC, even though there is only one network cable connecting all of these VMs to the network.

Routed Configuration

Most hosting providers do not support the above setup. For security reasons, they disable networking as soon as they detect multiple MAC addresses on a single interface.

Tip Some providers allows you to register additional MACs on their management interface. This avoids the problem, but is clumsy to configure because you need to register a MAC for each of your VMs.

You can avoid the problem by “routing” all traffic via a single interface. This makes sure that all network packets use the same MAC address.

default-network-setup-routed.svg

A common scenario is that you have a public IP (assume 198.51.100.5 for this example), and an additional IP block for your VMs (203.0.113.16/29). We recommend the following setup for such situations:

auto lo
iface lo inet loopback

auto eno1
iface eno1 inet static
        address  198.51.100.5
        netmask  255.255.255.0
        gateway  198.51.100.1
        post-up echo 1 > /proc/sys/net/ipv4/ip_forward
        post-up echo 1 > /proc/sys/net/ipv4/conf/eno1/proxy_arp


auto vmbr0
iface vmbr0 inet static
        address  203.0.113.17
        netmask  255.255.255.248
        bridge_ports none
        bridge_stp off
        bridge_fd 0

Masquerading (NAT) with iptables

Masquerading allows guests having only a private IP address to access the network by using the host IP address for outgoing traffic. Each outgoing packet is rewritten by iptables to appear as originating from the host, and responses are rewritten accordingly to be routed to the original sender.

auto lo
iface lo inet loopback

auto eno1
#real IP address
iface eno1 inet static
        address  198.51.100.5
        netmask  255.255.255.0
        gateway  198.51.100.1

auto vmbr0
#private sub network
iface vmbr0 inet static
        address  10.10.10.1
        netmask  255.255.255.0
        bridge_ports none
        bridge_stp off
        bridge_fd 0

        post-up echo 1 > /proc/sys/net/ipv4/ip_forward
        post-up   iptables -t nat -A POSTROUTING -s '10.10.10.0/24' -o eno1 -j MASQUERADE
        post-down iptables -t nat -D POSTROUTING -s '10.10.10.0/24' -o eno1 -j MASQUERADE

Linux Bond

Bonding (also called NIC teaming or Link Aggregation) is a technique for binding multiple NIC’s to a single network device. It is possible to achieve different goals, like make the network fault-tolerant, increase the performance or both together.

High-speed hardware like Fibre Channel and the associated switching hardware can be quite expensive. By doing link aggregation, two NICs can appear as one logical interface, resulting in double speed. This is a native Linux kernel feature that is supported by most switches. If your nodes have multiple Ethernet ports, you can distribute your points of failure by running network cables to different switches and the bonded connection will failover to one cable or the other in case of network trouble.

Aggregated links can improve live-migration delays and improve the speed of replication of data between Proxmox VE Cluster nodes.

There are 7 modes for bonding:

  • Round-robin (balance-rr): Transmit network packets in sequential order from the first available network interface (NIC) slave through the last. This mode provides load balancing and fault tolerance.

  • Active-backup (active-backup): Only one NIC slave in the bond is active. A different slave becomes active if, and only if, the active slave fails. The single logical bonded interface’s MAC address is externally visible on only one NIC (port) to avoid distortion in the network switch. This mode provides fault tolerance.

  • XOR (balance-xor): Transmit network packets based on [(source MAC address XOR’d with destination MAC address) modulo NIC slave count]. This selects the same NIC slave for each destination MAC address. This mode provides load balancing and fault tolerance.

  • Broadcast (broadcast): Transmit network packets on all slave network interfaces. This mode provides fault tolerance.

  • IEEE 802.3ad Dynamic link aggregation (802.3ad)(LACP): Creates aggregation groups that share the same speed and duplex settings. Utilizes all slave network interfaces in the active aggregator group according to the 802.3ad specification.

  • Adaptive transmit load balancing (balance-tlb): Linux bonding driver mode that does not require any special network-switch support. The outgoing network packet traffic is distributed according to the current load (computed relative to the speed) on each network interface slave. Incoming traffic is received by one currently designated slave network interface. If this receiving slave fails, another slave takes over the MAC address of the failed receiving slave.

  • Adaptive load balancing (balance-alb): Includes balance-tlb plus receive load balancing (rlb) for IPV4 traffic, and does not require any special network switch support. The receive load balancing is achieved by ARP negotiation. The bonding driver intercepts the ARP Replies sent by the local system on their way out and overwrites the source hardware address with the unique hardware address of one of the NIC slaves in the single logical bonded interface such that different network-peers use different MAC addresses for their network packet traffic.

If your switch support the LACP (IEEE 802.3ad) protocol then we recommend using the corresponding bonding mode (802.3ad). Otherwise you should generally use the active-backup mode.
If you intend to run your cluster network on the bonding interfaces, then you have to use active-passive mode on the bonding interfaces, other modes are unsupported.

The following bond configuration can be used as distributed/shared storage network. The benefit would be that you get more speed and the network will be fault-tolerant.

Example: Use bond with fixed IP address
auto lo
iface lo inet loopback

iface eno1 inet manual

iface eno2 inet manual

auto bond0
iface bond0 inet static
      slaves eno1 eno2
      address  192.168.1.2
      netmask  255.255.255.0
      bond_miimon 100
      bond_mode 802.3ad
      bond_xmit_hash_policy layer2+3

auto vmbr0
iface vmbr0 inet static
        address  10.10.10.2
        netmask  255.255.255.0
        gateway  10.10.10.1
        bridge_ports eno1
        bridge_stp off
        bridge_fd 0
default-network-setup-bond.svg

Another possibility it to use the bond directly as bridge port. This can be used to make the guest network fault-tolerant.

Example: Use a bond as bridge port
auto lo
iface lo inet loopback

iface eno1 inet manual

iface eno2 inet manual

auto bond0
iface bond0 inet manual
      slaves eno1 eno2
      bond_miimon 100
      bond_mode 802.3ad
      bond_xmit_hash_policy layer2+3

auto vmbr0
iface vmbr0 inet static
        address  10.10.10.2
        netmask  255.255.255.0
        gateway  10.10.10.1
        bridge_ports bond0
        bridge_stp off
        bridge_fd 0

VLAN 802.1Q

A virtual LAN (VLAN) is a broadcast domain that is partitioned and isolated in the network at layer two. So it is possible to have multiple networks (4096) in a physical network, each independent of the other ones.

Each VLAN network is identified by a number often called tag. Network packages are then tagged to identify which virtual network they belong to.

VLAN for Guest Networks

Proxmox VE supports this setup out of the box. You can specify the VLAN tag when you create a VM. The VLAN tag is part of the guest network configuration. The networking layer supports different modes to implement VLANs, depending on the bridge configuration:

  • VLAN awareness on the Linux bridge: In this case, each guest’s virtual network card is assigned to a VLAN tag, which is transparently supported by the Linux bridge. Trunk mode is also possible, but that makes configuration in the guest necessary.

  • "traditional" VLAN on the Linux bridge: In contrast to the VLAN awareness method, this method is not transparent and creates a VLAN device with associated bridge for each VLAN. That is, creating a guest on VLAN 5 for example, would create two interfaces eno1.5 and vmbr0v5, which would remain until a reboot occurs.

  • Open vSwitch VLAN: This mode uses the OVS VLAN feature.

  • Guest configured VLAN: VLANs are assigned inside the guest. In this case, the setup is completely done inside the guest and can not be influenced from the outside. The benefit is that you can use more than one VLAN on a single virtual NIC.

VLAN on the Host

To allow host communication with an isolated network. It is possible to apply VLAN tags to any network device (NIC, Bond, Bridge). In general, you should configure the VLAN on the interface with the least abstraction layers between itself and the physical NIC.

For example, in a default configuration where you want to place the host management address on a separate VLAN.

Example: Use VLAN 5 for the Proxmox VE management IP with traditional Linux bridge
auto lo
iface lo inet loopback

iface eno1 inet manual

iface eno1.5 inet manual

auto vmbr0v5
iface vmbr0v5 inet static
        address  10.10.10.2
        netmask  255.255.255.0
        gateway  10.10.10.1
        bridge_ports eno1.5
        bridge_stp off
        bridge_fd 0

auto vmbr0
iface vmbr0 inet manual
        bridge_ports eno1
        bridge_stp off
        bridge_fd 0
Example: Use VLAN 5 for the Proxmox VE management IP with VLAN aware Linux bridge
auto lo
iface lo inet loopback

iface eno1 inet manual


auto vmbr0.5
iface vmbr0.5 inet static
        address  10.10.10.2
        netmask  255.255.255.0
        gateway  10.10.10.1

auto vmbr0
iface vmbr0 inet manual
        bridge_ports eno1
        bridge_stp off
        bridge_fd 0
        bridge_vlan_aware yes

The next example is the same setup but a bond is used to make this network fail-safe.

Example: Use VLAN 5 with bond0 for the Proxmox VE management IP with traditional Linux bridge
auto lo
iface lo inet loopback

iface eno1 inet manual

iface eno2 inet manual

auto bond0
iface bond0 inet manual
      slaves eno1 eno2
      bond_miimon 100
      bond_mode 802.3ad
      bond_xmit_hash_policy layer2+3

iface bond0.5 inet manual

auto vmbr0v5
iface vmbr0v5 inet static
        address  10.10.10.2
        netmask  255.255.255.0
        gateway  10.10.10.1
        bridge_ports bond0.5
        bridge_stp off
        bridge_fd 0

auto vmbr0
iface vmbr0 inet manual
        bridge_ports bond0
        bridge_stp off
        bridge_fd 0

Time Synchronization

The Proxmox VE cluster stack itself relies heavily on the fact that all the nodes have precisely synchronized time. Some other components, like Ceph, also refuse to work properly if the local time on nodes is not in sync.

Time synchronization between nodes can be achieved with the “Network Time Protocol” (NTP). Proxmox VE uses systemd-timesyncd as NTP client by default, preconfigured to use a set of public servers. This setup works out of the box in most cases.

Using Custom NTP Servers

In some cases, it might be desired to not use the default NTP servers. For example, if your Proxmox VE nodes do not have access to the public internet (e.g., because of restrictive firewall rules), you need to setup local NTP servers and tell systemd-timesyncd to use them:

File /etc/systemd/timesyncd.conf
[Time]
NTP=ntp1.example.com ntp2.example.com ntp3.example.com ntp4.example.com

After restarting the synchronization service (systemctl restart systemd-timesyncd) you should verify that your newly configured NTP servers are used by checking the journal (journalctl --since -1h -u systemd-timesyncd):

...
Oct 07 14:58:36 node1 systemd[1]: Stopping Network Time Synchronization...
Oct 07 14:58:36 node1 systemd[1]: Starting Network Time Synchronization...
Oct 07 14:58:36 node1 systemd[1]: Started Network Time Synchronization.
Oct 07 14:58:36 node1 systemd-timesyncd[13514]: Using NTP server 10.0.0.1:123 (ntp1.example.com).
Oct 07 14:58:36 nora systemd-timesyncd[13514]: interval/delta/delay/jitter/drift 64s/-0.002s/0.020s/0.000s/-31ppm
...

External Metric Server

Starting with Proxmox VE 4.0, you can define external metric servers, which will be sent various stats about your hosts, virtual machines and storages.

Currently supported are:

The server definitions are saved in /etc/pve/status.cfg

Graphite server configuration

The definition of a server is:

 graphite: your-id
    server your-server
    port your-port
    path your-path

where your-port defaults to 2003 and your-path defaults to proxmox

Proxmox VE sends the data over UDP, so the graphite server has to be configured for this.

Influxdb plugin configuration

The definition is:

 influxdb: your-id
    server your-server
    port your-port

Proxmox VE sends the data over UDP, so the influxdb server has to be configured for this.

Here is an example configuration for influxdb (on your influxdb server):

 [[udp]]
   enabled = true
   bind-address = "0.0.0.0:8089"
   database = "proxmox"
   batch-size = 1000
   batch-timeout = "1s"

With this configuration, your server listens on all IP addresses on port 8089, and writes the data in the proxmox database

Multiple Definitions and Example

The id is optional, but if you want to have multiple definitions of a single type, then the ids must be defined and different from each other.

Here is an example of a finished status.cfg

 graphite:
    server 10.0.0.5

 influxdb: influx1
    server 10.0.0.6
    port 8089

 influxdb: influx2
    server 10.0.0.7
    port 8090

Disk Health Monitoring

Although a robust and redundant storage is recommended, it can be very helpful to monitor the health of your local disks.

Starting with Proxmox VE 4.3, the package smartmontools
[smartmontools homepage https://www.smartmontools.org]
is installed and required. This is a set of tools to monitor and control the S.M.A.R.T. system for local hard disks.

You can get the status of a disk by issuing the following command:

# smartctl -a /dev/sdX

where /dev/sdX is the path to one of your local disks.

If the output says:

SMART support is: Disabled

you can enable it with the command:

# smartctl -s on /dev/sdX

For more information on how to use smartctl, please see man smartctl.

By default, smartmontools daemon smartd is active and enabled, and scans the disks under /dev/sdX and /dev/hdX every 30 minutes for errors and warnings, and sends an e-mail to root if it detects a problem.

For more information about how to configure smartd, please see man smartd and man smartd.conf.

If you use your hard disks with a hardware raid controller, there are most likely tools to monitor the disks in the raid array and the array itself. For more information about this, please refer to the vendor of your raid controller.

Logical Volume Manager (LVM)

Most people install Proxmox VE directly on a local disk. The Proxmox VE installation CD offers several options for local disk management, and the current default setup uses LVM. The installer let you select a single disk for such setup, and uses that disk as physical volume for the Volume Group (VG) pve. The following output is from a test installation using a small 8GB disk:

# pvs
  PV         VG   Fmt  Attr PSize PFree
  /dev/sda3  pve  lvm2 a--  7.87g 876.00m

# vgs
  VG   #PV #LV #SN Attr   VSize VFree
  pve    1   3   0 wz--n- 7.87g 876.00m

The installer allocates three Logical Volumes (LV) inside this VG:

# lvs
  LV   VG   Attr       LSize   Pool Origin Data%  Meta%
  data pve  twi-a-tz--   4.38g             0.00   0.63
  root pve  -wi-ao----   1.75g
  swap pve  -wi-ao---- 896.00m
root

Formatted as ext4, and contains the operation system.

swap

Swap partition

data

This volume uses LVM-thin, and is used to store VM images. LVM-thin is preferable for this task, because it offers efficient support for snapshots and clones.

For Proxmox VE versions up to 4.1, the installer creates a standard logical volume called “data”, which is mounted at /var/lib/vz.

Starting from version 4.2, the logical volume “data” is a LVM-thin pool, used to store block based guest images, and /var/lib/vz is simply a directory on the root file system.

Hardware

We highly recommend to use a hardware RAID controller (with BBU) for such setups. This increases performance, provides redundancy, and make disk replacements easier (hot-pluggable).

LVM itself does not need any special hardware, and memory requirements are very low.

Bootloader

We install two boot loaders by default. The first partition contains the standard GRUB boot loader. The second partition is an EFI System Partition (ESP), which makes it possible to boot on EFI systems.

Creating a Volume Group

Let’s assume we have an empty disk /dev/sdb, onto which we want to create a volume group named “vmdata”.

Caution Please note that the following commands will destroy all existing data on /dev/sdb.

First create a partition.

# sgdisk -N 1 /dev/sdb

Create a Physical Volume (PV) without confirmation and 250K metadatasize.

# pvcreate --metadatasize 250k -y -ff /dev/sdb1

Create a volume group named “vmdata” on /dev/sdb1

# vgcreate vmdata /dev/sdb1

Creating an extra LV for /var/lib/vz

This can be easily done by creating a new thin LV.

# lvcreate -n <Name> -V <Size[M,G,T]> <VG>/<LVThin_pool>

A real world example:

# lvcreate -n vz -V 10G pve/data

Now a filesystem must be created on the LV.

# mkfs.ext4 /dev/pve/vz

At last this has to be mounted.

Warning be sure that /var/lib/vz is empty. On a default installation it’s not.

To make it always accessible add the following line in /etc/fstab.

# echo '/dev/pve/vz /var/lib/vz ext4 defaults 0 2' >> /etc/fstab

Resizing the thin pool

Resize the LV and the metadata pool can be achieved with the following command.

# lvresize --size +<size[\M,G,T]> --poolmetadatasize +<size[\M,G]> <VG>/<LVThin_pool>
Note When extending the data pool, the metadata pool must also be extended.

Create a LVM-thin pool

A thin pool has to be created on top of a volume group. How to create a volume group see Section LVM.

# lvcreate -L 80G -T -n vmstore vmdata

ZFS on Linux

ZFS is a combined file system and logical volume manager designed by Sun Microsystems. Starting with Proxmox VE 3.4, the native Linux kernel port of the ZFS file system is introduced as optional file system and also as an additional selection for the root file system. There is no need for manually compile ZFS modules - all packages are included.

By using ZFS, its possible to achieve maximum enterprise features with low budget hardware, but also high performance systems by leveraging SSD caching or even SSD only setups. ZFS can replace cost intense hardware raid cards by moderate CPU and memory load combined with easy management.

General ZFS advantages
  • Easy configuration and management with Proxmox VE GUI and CLI.

  • Reliable

  • Protection against data corruption

  • Data compression on file system level

  • Snapshots

  • Copy-on-write clone

  • Various raid levels: RAID0, RAID1, RAID10, RAIDZ-1, RAIDZ-2 and RAIDZ-3

  • Can use SSD for cache

  • Self healing

  • Continuous integrity checking

  • Designed for high storage capacities

  • Protection against data corruption

  • Asynchronous replication over network

  • Open Source

  • Encryption

Hardware

ZFS depends heavily on memory, so you need at least 8GB to start. In practice, use as much you can get for your hardware/budget. To prevent data corruption, we recommend the use of high quality ECC RAM.

If you use a dedicated cache and/or log disk, you should use an enterprise class SSD (e.g. Intel SSD DC S3700 Series). This can increase the overall performance significantly.

Important Do not use ZFS on top of hardware controller which has its own cache management. ZFS needs to directly communicate with disks. An HBA adapter is the way to go, or something like LSI controller flashed in “IT” mode.

If you are experimenting with an installation of Proxmox VE inside a VM (Nested Virtualization), don’t use virtio for disks of that VM, since they are not supported by ZFS. Use IDE or SCSI instead (works also with virtio SCSI controller type).

Installation as Root File System

When you install using the Proxmox VE installer, you can choose ZFS for the root file system. You need to select the RAID type at installation time:

RAID0

Also called “striping”. The capacity of such volume is the sum of the capacities of all disks. But RAID0 does not add any redundancy, so the failure of a single drive makes the volume unusable.

RAID1

Also called “mirroring”. Data is written identically to all disks. This mode requires at least 2 disks with the same size. The resulting capacity is that of a single disk.

RAID10

A combination of RAID0 and RAID1. Requires at least 4 disks.

RAIDZ-1

A variation on RAID-5, single parity. Requires at least 3 disks.

RAIDZ-2

A variation on RAID-5, double parity. Requires at least 4 disks.

RAIDZ-3

A variation on RAID-5, triple parity. Requires at least 5 disks.

The installer automatically partitions the disks, creates a ZFS pool called rpool, and installs the root file system on the ZFS subvolume rpool/ROOT/pve-1.

Another subvolume called rpool/data is created to store VM images. In order to use that with the Proxmox VE tools, the installer creates the following configuration entry in /etc/pve/storage.cfg:

zfspool: local-zfs
        pool rpool/data
        sparse
        content images,rootdir

After installation, you can view your ZFS pool status using the zpool command:

# zpool status
  pool: rpool
 state: ONLINE
  scan: none requested
config:

        NAME        STATE     READ WRITE CKSUM
        rpool       ONLINE       0     0     0
          mirror-0  ONLINE       0     0     0
            sda2    ONLINE       0     0     0
            sdb2    ONLINE       0     0     0
          mirror-1  ONLINE       0     0     0
            sdc     ONLINE       0     0     0
            sdd     ONLINE       0     0     0

errors: No known data errors

The zfs command is used configure and manage your ZFS file systems. The following command lists all file systems after installation:

# zfs list
NAME               USED  AVAIL  REFER  MOUNTPOINT
rpool             4.94G  7.68T    96K  /rpool
rpool/ROOT         702M  7.68T    96K  /rpool/ROOT
rpool/ROOT/pve-1   702M  7.68T   702M  /
rpool/data          96K  7.68T    96K  /rpool/data
rpool/swap        4.25G  7.69T    64K  -

Bootloader

Depending on whether the system is booted in EFI or legacy BIOS mode the Proxmox VE installer sets up either grub or systemd-boot as main bootloader. See the chapter on Proxmox VE host bootladers for details.

ZFS Administration

This section gives you some usage examples for common tasks. ZFS itself is really powerful and provides many options. The main commands to manage ZFS are zfs and zpool. Both commands come with great manual pages, which can be read with:

# man zpool
# man zfs
Create a new zpool

To create a new pool, at least one disk is needed. The ashift should have the same sector-size (2 power of ashift) or larger as the underlying disk.

zpool create -f -o ashift=12 <pool> <device>

To activate compression

zfs set compression=lz4 <pool>
Create a new pool with RAID-0

Minimum 1 Disk

zpool create -f -o ashift=12 <pool> <device1> <device2>
Create a new pool with RAID-1

Minimum 2 Disks

zpool create -f -o ashift=12 <pool> mirror <device1> <device2>
Create a new pool with RAID-10

Minimum 4 Disks

zpool create -f -o ashift=12 <pool> mirror <device1> <device2> mirror <device3> <device4>
Create a new pool with RAIDZ-1

Minimum 3 Disks

zpool create -f -o ashift=12 <pool> raidz1 <device1> <device2> <device3>
Create a new pool with RAIDZ-2

Minimum 4 Disks

zpool create -f -o ashift=12 <pool> raidz2 <device1> <device2> <device3> <device4>
Create a new pool with cache (L2ARC)

It is possible to use a dedicated cache drive partition to increase the performance (use SSD).

As <device> it is possible to use more devices, like it’s shown in "Create a new pool with RAID*".

zpool create -f -o ashift=12 <pool> <device> cache <cache_device>
Create a new pool with log (ZIL)

It is possible to use a dedicated cache drive partition to increase the performance(SSD).

As <device> it is possible to use more devices, like it’s shown in "Create a new pool with RAID*".

zpool create -f -o ashift=12 <pool> <device> log <log_device>
Add cache and log to an existing pool

If you have an pool without cache and log. First partition the SSD in 2 partition with parted or gdisk

Important Always use GPT partition tables.

The maximum size of a log device should be about half the size of physical memory, so this is usually quite small. The rest of the SSD can be used as cache.

zpool add -f <pool> log <device-part1> cache <device-part2>
Changing a failed device
zpool replace -f <pool> <old device> <new device>
Changing a failed bootable device when using systemd-boot
sgdisk <healthy bootable device> -R <new device>
sgdisk -G <new device>
zpool replace -f <pool> <old zfs partition> <new zfs partition>
pve-efiboot-tool format <new disk's ESP>
pve-efiboot-tool init <new disk's ESP>
Note ESP stands for EFI System Partition, which is setup as partition #2 on bootable disks setup by the Proxmox VE installer since version 5.4. For details, see Setting up a new partition for use as synced ESP.

Activate E-Mail Notification

ZFS comes with an event daemon, which monitors events generated by the ZFS kernel module. The daemon can also send emails on ZFS events like pool errors. Newer ZFS packages ships the daemon in a separate package, and you can install it using apt-get:

# apt-get install zfs-zed

To activate the daemon it is necessary to edit /etc/zfs/zed.d/zed.rc with your favourite editor, and uncomment the ZED_EMAIL_ADDR setting:

ZED_EMAIL_ADDR="root"

Please note Proxmox VE forwards mails to root to the email address configured for the root user.

Important The only setting that is required is ZED_EMAIL_ADDR. All other settings are optional.

Limit ZFS Memory Usage

It is good to use at most 50 percent (which is the default) of the system memory for ZFS ARC to prevent performance shortage of the host. Use your preferred editor to change the configuration in /etc/modprobe.d/zfs.conf and insert:

options zfs zfs_arc_max=8589934592

This example setting limits the usage to 8GB.

Important

If your root file system is ZFS you must update your initramfs every time this value changes:

update-initramfs -u

SWAP on ZFS

Swap-space created on a zvol may generate some troubles, like blocking the server or generating a high IO load, often seen when starting a Backup to an external Storage.

We strongly recommend to use enough memory, so that you normally do not run into low memory situations. Should you need or want to add swap, it is preferred to create a partition on a physical disk and use it as swapdevice. You can leave some space free for this purpose in the advanced options of the installer. Additionally, you can lower the “swappiness” value. A good value for servers is 10:

sysctl -w vm.swappiness=10

To make the swappiness persistent, open /etc/sysctl.conf with an editor of your choice and add the following line:

vm.swappiness = 10
Table 1. Linux kernel swappiness parameter values
Value Strategy

vm.swappiness = 0

The kernel will swap only to avoid an out of memory condition

vm.swappiness = 1

Minimum amount of swapping without disabling it entirely.

vm.swappiness = 10

This value is sometimes recommended to improve performance when sufficient memory exists in a system.

vm.swappiness = 60

The default value.

vm.swappiness = 100

The kernel will swap aggressively.

Encrypted ZFS Datasets

ZFS on Linux version 0.8.0 introduced support for native encryption of datasets. After an upgrade from previous ZFS on Linux versions, the encryption feature can be enabled per pool:

# zpool get feature@encryption tank
NAME  PROPERTY            VALUE            SOURCE
tank  feature@encryption  disabled         local

# zpool set feature@encryption=enabled

# zpool get feature@encryption tank
NAME  PROPERTY            VALUE            SOURCE
tank  feature@encryption  enabled         local
Warning There is currently no support for booting from pools with encrypted datasets using Grub, and only limited support for automatically unlocking encrypted datasets on boot. Older versions of ZFS without encryption support will not be able to decrypt stored data.
Note It is recommended to either unlock storage datasets manually after booting, or to write a custom unit to pass the key material needed for unlocking on boot to zfs load-key.
Warning Establish and test a backup procedure before enabling encryption of production data.If the associated key material/passphrase/keyfile has been lost, accessing the encrypted data is no longer possible.

Encryption needs to be setup when creating datasets/zvols, and is inherited by default to child datasets. For example, to create an encrypted dataset tank/encrypted_data and configure it as storage in Proxmox VE, run the following commands:

# zfs create -o encryption=on -o keyformat=passphrase tank/encrypted_data
Enter passphrase:
Re-enter passphrase:

# pvesm add zfspool encrypted_zfs -pool tank/encrypted_data

All guest volumes/disks create on this storage will be encrypted with the shared key material of the parent dataset.

To actually use the storage, the associated key material needs to be loaded with zfs load-key:

# zfs load-key tank/encrypted_data
Enter passphrase for 'tank/encrypted_data':

It is also possible to use a (random) keyfile instead of prompting for a passphrase by setting the keylocation and keyformat properties, either at creation time or with zfs change-key on existing datasets:

# dd if=/dev/urandom of=/path/to/keyfile bs=32 count=1

# zfs change-key -o keyformat=raw -o keylocation=file:///path/to/keyfile tank/encrypted_data
Warning When using a keyfile, special care needs to be taken to secure the keyfile against unauthorized access or accidental loss. Without the keyfile, it is not possible to access the plaintext data!

A guest volume created underneath an encrypted dataset will have its encryptionroot property set accordingly. The key material only needs to be loaded once per encryptionroot to be available to all encrypted datasets underneath it.

See the encryptionroot, encryption, keylocation, keyformat and keystatus properties, the zfs load-key, zfs unload-key and zfs change-key commands and the Encryption section from man zfs for more details and advanced usage.

Certificate Management

Certificates for communication within the cluster

Each Proxmox VE cluster creates its own (self-signed) Certificate Authority (CA) and generates a certificate for each node which gets signed by the aforementioned CA. These certificates are used for encrypted communication with the cluster’s pveproxy service and the Shell/Console feature if SPICE is used.

The CA certificate and key are stored in the Proxmox Cluster File System (pmxcfs).

Certificates for API and web GUI

The REST API and web GUI are provided by the pveproxy service, which runs on each node.

You have the following options for the certificate used by pveproxy:

  1. By default the node-specific certificate in /etc/pve/nodes/NODENAME/pve-ssl.pem is used. This certificate is signed by the cluster CA and therefore not trusted by browsers and operating systems by default.

  2. use an externally provided certificate (e.g. signed by a commercial CA).

  3. use ACME (e.g., Let’s Encrypt) to get a trusted certificate with automatic renewal.

For options 2 and 3 the file /etc/pve/local/pveproxy-ssl.pem (and /etc/pve/local/pveproxy-ssl.key, which needs to be without password) is used.

Certificates are managed with the Proxmox VE Node management command (see the pvenode(1) manpage).

Warning Do not replace or manually modify the automatically generated node certificate files in /etc/pve/local/pve-ssl.pem and /etc/pve/local/pve-ssl.key or the cluster CA files in /etc/pve/pve-root-ca.pem and /etc/pve/priv/pve-root-ca.key.

Getting trusted certificates via ACME

Proxmox VE includes an implementation of the Automatic Certificate Management Environment ACME protocol, allowing Proxmox VE admins to interface with Let’s Encrypt for easy setup of trusted TLS certificates which are accepted out of the box on most modern operating systems and browsers.

Currently the two ACME endpoints implemented are Let’s Encrypt (LE) and its staging environment (see https://letsencrypt.org), both using the standalone HTTP challenge.

Because of rate-limits you should use LE staging for experiments.

There are a few prerequisites to use Let’s Encrypt:

  1. Port 80 of the node needs to be reachable from the internet.

  2. There must be no other listener on port 80.

  3. The requested (sub)domain needs to resolve to a public IP of the Node.

  4. You have to accept the ToS of Let’s Encrypt.

At the moment the GUI uses only the default ACME account.

Example: Sample pvenode invocation for using Let’s Encrypt certificates
root@proxmox:~# pvenode acme account register default mail@example.invalid
Directory endpoints:
0) Let's Encrypt V2 (https://acme-v02.api.letsencrypt.org/directory)
1) Let's Encrypt V2 Staging (https://acme-staging-v02.api.letsencrypt.org/directory)
2) Custom
Enter selection:
1

Attempting to fetch Terms of Service from 'https://acme-staging-v02.api.letsencrypt.org/directory'..
Terms of Service: https://letsencrypt.org/documents/LE-SA-v1.2-November-15-2017.pdf
Do you agree to the above terms? [y|N]y

Attempting to register account with 'https://acme-staging-v02.api.letsencrypt.org/directory'..
Generating ACME account key..
Registering ACME account..
Registration successful, account URL: 'https://acme-staging-v02.api.letsencrypt.org/acme/acct/xxxxxxx'
Task OK
root@proxmox:~# pvenode acme account list
default
root@proxmox:~# pvenode config set --acme domains=example.invalid
root@proxmox:~# pvenode acme cert order
Loading ACME account details
Placing ACME order
Order URL: https://acme-staging-v02.api.letsencrypt.org/acme/order/xxxxxxxxxxxxxx

Getting authorization details from
'https://acme-staging-v02.api.letsencrypt.org/acme/authz/xxxxxxxxxxxxxxxxxxxxx-xxxxxxxxxxxxx-xxxxxxx'
... pending!
Setting up webserver
Triggering validation
Sleeping for 5 seconds
Status is 'valid'!

All domains validated!

Creating CSR
Finalizing order
Checking order status
valid!

Downloading certificate
Setting pveproxy certificate and key
Restarting pveproxy
Task OK

Switching from the staging to the regular ACME directory

Changing the ACME directory for an account is unsupported. If you want to switch an account from the staging ACME directory to the regular, trusted, one you need to deactivate it and recreate it.

This procedure is also needed to change the default ACME account used in the GUI.

Example: Changing the default ACME account from the staging to the regular directory
root@proxmox:~# pvenode acme account info default
Directory URL: https://acme-staging-v02.api.letsencrypt.org/directory
Account URL: https://acme-staging-v02.api.letsencrypt.org/acme/acct/6332194
Terms Of Service: https://letsencrypt.org/documents/LE-SA-v1.2-November-15-2017.pdf

Account information:
ID: xxxxxxx
Contact:
        - mailto:example@proxmox.com
Creation date: 2018-07-31T08:41:44.54196435Z
Initial IP: 192.0.2.1
Status: valid

root@proxmox:~# pvenode acme account deactivate default
Renaming account file from '/etc/pve/priv/acme/default' to '/etc/pve/priv/acme/_deactivated_default_4'
Task OK

root@proxmox:~# pvenode acme account register default example@proxmox.com
Directory endpoints:
0) Let's Encrypt V2 (https://acme-v02.api.letsencrypt.org/directory)
1) Let's Encrypt V2 Staging (https://acme-staging-v02.api.letsencrypt.org/directory)
2) Custom
Enter selection:
0

Attempting to fetch Terms of Service from 'https://acme-v02.api.letsencrypt.org/directory'..
Terms of Service: https://letsencrypt.org/documents/LE-SA-v1.2-November-15-2017.pdf
Do you agree to the above terms? [y|N]y

Attempting to register account with 'https://acme-v02.api.letsencrypt.org/directory'..
Generating ACME account key..
Registering ACME account..
Registration successful, account URL: 'https://acme-v02.api.letsencrypt.org/acme/acct/39335247'
Task OK

Automatic renewal of ACME certificates

If a node has been successfully configured with an ACME-provided certificate (either via pvenode or via the GUI), the certificate will be automatically renewed by the pve-daily-update.service. Currently, renewal will be attempted if the certificate has expired or will expire in the next 30 days.

Host Bootloader

Proxmox VE currently uses one of two bootloaders depending on the disk setup selected in the installer.

For EFI Systems installed with ZFS as the root filesystem systemd-boot is used. All other deployments use the standard grub bootloader (this usually also applies to systems which are installed on top of Debian).

Partitioning scheme used by the installer

The Proxmox VE installer creates 3 partitions on the bootable disks selected for installation. The bootable disks are:

  • For Installations with ext4 or xfs the selected disk

  • For ZFS installations all disks belonging to the first vdev:

    • The first disk for RAID0

    • All disks for RAID1, RAIDZ1, RAIDZ2, RAIDZ3

    • The first two disks for RAID10

The created partitions are:

  • a 1 MB BIOS Boot Partition (gdisk type EF02)

  • a 512 MB EFI System Partition (ESP, gdisk type EF00)

  • a third partition spanning the set hdsize parameter or the remaining space used for the chosen storage type

grub in BIOS mode (--target i386-pc) is installed onto the BIOS Boot Partition of all bootable disks for supporting older systems.

Grub

grub has been the de-facto standard for booting Linux systems for many years and is quite well documented
[Grub Manual https://www.gnu.org/software/grub/manual/grub/grub.html]
.

The kernel and initrd images are taken from /boot and its configuration file /boot/grub/grub.cfg gets updated by the kernel installation process.

Configuration

Changes to the grub configuration are done via the defaults file /etc/default/grub or config snippets in /etc/default/grub.d. To regenerate the /boot/grub/grub.cfg after a change to the configuration run:

`update-grub`.

Systemd-boot

systemd-boot is a lightweight EFI bootloader. It reads the kernel and initrd images directly from the EFI Service Partition (ESP) where it is installed. The main advantage of directly loading the kernel from the ESP is that it does not need to reimplement the drivers for accessing the storage. In the context of ZFS as root filesystem this means that you can use all optional features on your root pool instead of the subset which is also present in the ZFS implementation in grub or having to create a separate small boot-pool
[Booting ZFS on root with grub https://github.com/zfsonlinux/zfs/wiki/Debian-Stretch-Root-on-ZFS]
.

In setups with redundancy (RAID1, RAID10, RAIDZ*) all bootable disks (those being part of the first vdev) are partitioned with an ESP. This ensures the system boots even if the first boot device fails. The ESPs are kept in sync by a kernel postinstall hook script /etc/kernel/postinst.d/zz-pve-efiboot. The script copies certain kernel versions and the initrd images to EFI/proxmox/ on the root of each ESP and creates the appropriate config files in loader/entries/proxmox-*.conf. The pve-efiboot-tool script assists in managing both the synced ESPs themselves and their contents.

The following kernel versions are configured by default:

  • the currently running kernel

  • the version being newly installed on package updates

  • the two latest already installed kernels

  • the latest version of the second-to-last kernel series (e.g. 4.15, 5.0), if applicable

  • any manually selected kernels (see below)

The ESPs are not kept mounted during regular operation, in contrast to grub, which keeps an ESP mounted on /boot/efi. This helps to prevent filesystem corruption to the vfat formatted ESPs in case of a system crash, and removes the need to manually adapt /etc/fstab in case the primary boot device fails.

Configuration

systemd-boot is configured via the file loader/loader.conf in the root directory of an EFI System Partition (ESP). See the loader.conf(5) manpage for details.

Each bootloader entry is placed in a file of its own in the directory loader/entries/

An example entry.conf looks like this (/ refers to the root of the ESP):

title    Proxmox
version  5.0.15-1-pve
options   root=ZFS=rpool/ROOT/pve-1 boot=zfs
linux    /EFI/proxmox/5.0.15-1-pve/vmlinuz-5.0.15-1-pve
initrd   /EFI/proxmox/5.0.15-1-pve/initrd.img-5.0.15-1-pve
Manually keeping a kernel bootable

Should you wish to add a certain kernel and initrd image to the list of bootable kernel use pve-efiboot-tool kernel add.

For example run the following to add the kernel with ABI version 5.0.15-1-pve to the list of kernels to keep installed and synced to all ESPs:

pve-efiboot-tool kernel add 5.0.15-1-pve

pve-efiboot-tool kernel list will list all kernel versions currently selected for booting:

# pve-efiboot-tool kernel list
Manually selected kernels:
5.0.15-1-pve

Automatically selected kernels:
5.0.12-1-pve
4.15.18-18-pve

Run pve-efiboot-tool remove to remove a kernel from the list of manually selected kernels, for example:

pve-efiboot-tool kernel remove 5.0.15-1-pve
Note It’s required to run pve-efiboot-tool refresh to update all EFI System Partitions (ESPs) after a manual kernel addition or removal from above.
Setting up a new partition for use as synced ESP

To format and initialize a partition as synced ESP, e.g., after replacing a failed vdev in an rpool, or when converting an existing system that pre-dates the sync mechanism, pve-efiboot-tool from pve-kernel-helpers can be used.

Warning the format command will format the <partition>, make sure to pass in the right device/partition!

For example, to format an empty partition /dev/sda2 as ESP, run the following:

pve-efiboot-tool format /dev/sda2

To setup an existing, unmounted ESP located on /dev/sda2 for inclusion in Proxmox VE’s kernel update synchronization mechanism, use the following:

pve-efiboot-tool init /dev/sda2

Afterwards /etc/kernel/pve-efiboot-uuids should contain a new line with the UUID of the newly added partition. The init command will also automatically trigger a refresh of all configured ESPs.

Updating the configuration on all ESPs

To copy and configure all bootable kernels and keep all ESPs listed in /etc/kernel/pve-efiboot-uuids in sync you just need to run:

 pve-efiboot-tool refresh

(The equivalent to running update-grub on Systems being booted with grub).

This is necessary should you make changes to the kernel commandline, or want to sync all kernels and initrds after regenerating the latter.

Editing the kernel commandline

You can modify the kernel commandline in the following places, depending on the bootloarder used:

Grub

The kernel commandline needs to be placed in the variable GRUB_CMDLINE_LINUX_DEFAULT in the file /etc/default/grub. Running update-grub appends its content to all linux entries in /boot/grub/grub.cfg.

Systemd-boot

The kernel commandline needs to be placed as line in /etc/kernel/cmdline Running /etc/kernel/postinst.d/zz-pve-efiboot sets it as option line for all config files in loader/entries/proxmox-*.conf.