[pve-devel] [PATCH common] allow longer timeout for cancelling 'vzdump' jobs

Stefan Reiter s.reiter at proxmox.com
Thu Jan 14 16:39:21 CET 2021


Aborting a backup job on slow network storage may have to wait for
buffers to flush, which can take longer than 5 seconds. In that case the
task is killed with SIGKILL before it can remove the backup lock. This
patch attempts to avoid that by allowing a longer timeout for 'vzdump'.

Make the implementation future-proof by using a map from task type to a
timeout value in seconds. The default stays at 5 seconds, so tasks other
than 'vzdump' are not affected.

Signed-off-by: Stefan Reiter <s.reiter at proxmox.com>
---
 src/PVE/RESTEnvironment.pm | 16 ++++++++++++----
 1 file changed, 12 insertions(+), 4 deletions(-)
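
For illustration, the type-to-timeout lookup with its default fallback can
be sketched standalone as follows. This is a minimal sketch, not the code in
RESTEnvironment.pm: kill_timeout_for() and the 'some-other-task' type are
hypothetical names used only here; the 'vzdump' => 60 entry and the default
of 5 seconds are taken from the patch.

```perl
use strict;
use warnings;

# Map from task type to the grace period (in seconds) between SIGTERM
# and SIGKILL. Only the 'vzdump' entry comes from the patch.
my $timeout_map = {
    vzdump => 60,
};

# Hypothetical helper: resolve the grace period for a task type.
# Unknown or undefined types fall back to the default of 5 seconds.
sub kill_timeout_for {
    my ($type) = @_;

    my $timeout = defined($type) ? $timeout_map->{$type} : undef;
    return $timeout // 5;
}

# 'some-other-task' is a made-up type name that is not in the map.
for my $type (qw(vzdump some-other-task)) {
    printf "%s: %d\n", $type, kill_timeout_for($type);
}
```

The patch itself passes $timeout_map->{$type} directly and lets
$kill_process_group apply the default via '//='; the helper above only
folds both steps into one place to make the fallback easier to see.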

diff --git a/src/PVE/RESTEnvironment.pm b/src/PVE/RESTEnvironment.pm
index d5b84d0..8a0cb9a 100644
--- a/src/PVE/RESTEnvironment.pm
+++ b/src/PVE/RESTEnvironment.pm
@@ -365,8 +365,16 @@ sub active_workers  {
     return $res;
 }
 
+my $timeout_map = {
+    # backup cancellation on slow target storages might take a while, avoid
+    # leaving the VM in locked state
+    "vzdump" => 60,
+};
+
 my $kill_process_group = sub {
-    my ($pid, $pstart) = @_;
+    my ($pid, $pstart, $timeout) = @_;
+
+    $timeout //= 5;
 
     # send kill to process group (negative pid)
     my $kpid = -$pid;
@@ -374,8 +382,7 @@ my $kill_process_group = sub {
     # always send signal to all pgrp members
     kill(15, $kpid); # send TERM signal
 
-    # give max 5 seconds to shut down
-    for (my $i = 0; $i < 5; $i++) {
+    for (my $i = 0; $i < $timeout; $i++) {
 	return if !PVE::ProcFSTools::check_process_running($pid, $pstart);
 	sleep (1);
     }
@@ -394,7 +401,8 @@ sub check_worker {
     return 0 if !$running;
 
     if ($killit) {
-	&$kill_process_group($task->{pid});
+	my $type = $task->{type};
+	&$kill_process_group($task->{pid}, undef, $timeout_map->{$type});
 	return 0;
     }
 
-- 
2.20.1





