[pve-devel] [Patch V2 guest-common] fix #1694: Replication risks permanently losing sync in high loads due to timeout bug

Thu Apr 12 11:06:32 CEST 2018

> diff --git a/PVE/Replication.pm b/PVE/Replication.pm
> index 9bc4e61..d8ccfaf 100644
> --- a/PVE/Replication.pm
> +++ b/PVE/Replication.pm
> @@ -136,8 +136,18 @@ sub prepare {
>  		$last_snapshots->{$volid}->{$snap} = 1;
>  	    } elsif ($snap =~ m/^\Q$prefix\E/) {
>  		$logfunc->("delete stale replication snapshot '$snap' on $volid");
> -		PVE::Storage::volume_snapshot_delete($storecfg, $volid, $snap);
> -		$cleaned_replicated_volumes->{$volid} = 1;
> +
> +		eval {
> +		    PVE::Storage::volume_snapshot_delete($storecfg, $volid, $snap);
> +		    $cleaned_replicated_volumes->{$volid} = 1;
> +		};
> +
> +		# If deleting the snapshot fails, we can not be sure if it was due to an
> error or a timeout.
> +		# The likelihood that the delete has worked out is high at a timeout.
> +		# If it really fails, it will try to remove on the next run.
> +		warn $@ if $@;
> +
> +		$logfunc->("delete stale replication snapshot error: $@") if $@;

why do we need this in prepare?