[pve-devel] ZFS recv in Proxmox 5.x VM extremly hanging problems

Fabian Grünbichler f.gruenbichler at proxmox.com
Fri Nov 10 08:43:24 CET 2017


On Thu, Nov 09, 2017 at 06:58:35PM +0100, Detlef Bracker wrote:
> Dear,
> 
> for more than a year we have made backups into VM containers with ZFS without
> problems, but for some weeks now (I don't know exactly since when)
> this no longer works. Incremental ZFS streams still work fine every time, but
> creating new ZFS subpools does not! The behavior is now the same
> in the last Proxmox 4.x version and in Proxmox 5.x, in a VM (KVM) and with
> different disk types, e.g. qcow2 and raw. I have tested
> all of them!
> 
> The phenomenon: the stream receives e.g. 500 MB and then stops, with no more
> transfer between sender and receiver. I have tested this
> on different machines! One small backup of an LXC container
> uses 556 MB effectively, 359 MB compressed with
> gzip, and 2 GB uncompressed.
> 
> While this happens, the VM hangs at 100% CPU (SYS), and Proxmox VE shows e.g.
> 20% (USER) (htop on the host and in the VM).
> In the VM, disk traffic is 0%: nothing shows up in iotop! atop shows
> the same 0% disk I/O, only 100% CPU SYS!
> But the CPU has nothing else to do, and the VM has 8 GB RAM, of which
> only 2 GB are in use!
> I have tested on Debian 8 and Debian 9 VMs! Again, before this everything
> always ran perfectly in a Debian 8 VM!
> We also have a dedicated machine free, with nothing on it, just for testing
> this scenario!
> 
> OK, then I redirected the send stream into a gzip file, unpacked it on the VM,
> and fed the file into zfs recv, and
> the result is the same! Whether I wait 2, 10, or 24 hours, nothing
> changes! A ZFS check with zpool scrub reports no
> errors, so the file system is correct. Without a reboot, the job hangs for
> days!
> 
> OK, then I took the same backup package that I had received on the
> VM and imported it on the Proxmox VE host instead,
> via zfs recv [Subpool@today], and this finished in as little as 2 seconds
> (normal and fine)!
> 
> So something in Proxmox 4.x (latest versions) and Proxmox 5.x blocks ZFS
> receive traffic inside the VM and produces
> only CPU load! This happens with cache off or cache writethrough, with qcow2
> or raw, and with the disk stored on ZFS or as a disk file in a normal area!
> 
> So I have tested this problem on at least 3 completely different machines!
> 
> The hosts run Debian 8 (Jessie) and Debian 9 (Stretch) alike,
> and on the host it works, but in the VM it does not! So this
> cannot be a problem of Debian itself. It must be a problem of the Proxmox
> scheduler or something that blocks the VM!

I am not quite sure how you are sending and from which version to which
version (OS and ZFS), but I just noticed that there was an issue with
our kernel build script/patches which led to the ZFS 0.7.2 modules
being built without a bug fix for 'zfs send', which would lead to a
hanging 'zfs recv' on systems with ZFS < 0.7.x. If those version
numbers match your situation, you should upgrade your PVE 5 systems to
the current kernel and ZFS packages (containing 0.7.3, with the fix
correctly applied), and reboot both the upgraded systems and those running
older ZFS versions where you have hanging receive processes.
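
To see which ZFS module version a system is actually running (the loaded
module stays at the old version until the reboot), something like the
following rough sketch may help. It assumes ZFS on Linux, which exposes the
loaded module version in /sys/module/zfs/version, and GNU sort with -V; the
helper names version_lt and loaded_zfs_version are made up for this example.

```shell
#!/bin/sh
# version_lt A B: succeeds when version A sorts strictly before version B.
# Relies on GNU sort's -V (version sort); assumed to be available.
version_lt() {
    [ "$1" != "$2" ] &&
        [ "$(printf '%s\n%s\n' "$1" "$2" | sort -V | head -n 1)" = "$1" ]
}

# Print the version of the currently loaded ZFS module, with any "-N"
# package release suffix stripped. Fails if no ZFS module is loaded.
loaded_zfs_version() {
    [ -r /sys/module/zfs/version ] || return 1
    sed 's/-.*//' /sys/module/zfs/version
}

if ver=$(loaded_zfs_version); then
    if version_lt "$ver" 0.7.3; then
        echo "loaded ZFS module is $ver (older than 0.7.3)"
    else
        echo "loaded ZFS module is $ver (0.7.3 or newer)"
    fi
else
    echo "no loaded ZFS module found"
fi
```

Run it on both the sending and the receiving side. It only reports the
version of the loaded module; whether a given pair of versions is affected
still has to be judged against the description above.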

If you have hanging send/recv with a different combination of ZFS
versions, please provide a clear description (version numbers, send/recv
command, properties of involved datasets and pools, ...) and ideally a
stream dump (this only contains metadata) obtained by replacing the 'zfs
recv' part with 'zstreamdump -v > someoutputfile'.
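
As a minimal sketch of that replacement for a purely local send: the function
name dump_stream_metadata and the snapshot and file names in the example call
are placeholders, not taken from your setup. If you send over ssh or with
incremental options, keep your own 'zfs send' part and only swap the
receiving end of the pipe.

```shell
#!/bin/sh
# dump_stream_metadata SNAPSHOT OUTFILE
# Sends SNAPSHOT, but pipes the stream into zstreamdump instead of
# 'zfs recv', so nothing is written to any pool; the verbose dump of the
# stream goes to OUTFILE.
dump_stream_metadata() {
    zfs send "$1" | zstreamdump -v > "$2"
}

# Example call (hypothetical snapshot and output file names):
#   dump_stream_metadata rpool/backup/ct100@today /tmp/ct100-today.dump
```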



