[pve-devel] [PATCH v3 1/3] migrate: collect migration tunnel child process

Thomas Lamprecht t.lamprecht at proxmox.com
Thu Jun 2 12:03:04 CEST 2016



On 02.06.2016 10:34, Dietmar Maurer wrote:
> I do not really understand this loop.
>
> * Why do you call kill -9 multiple times?
>

"Just to be sure". Normally the -9 would kill it instantly and the next loop iteration would then pick it up, so the probability that another SIGKILL gets sent is quite low.
(But yeah, the code as it stands is bad style/confusing, I guess.)

> * Why do you iterate 20 times (instead of 30)?
>

The migration is at an end here, whether it succeeded or not,
but if the tunnel is still there at this point we want to quit it.
Waiting 30 seconds seems long for that, as the tunnel has no use now, because either:

* all data was carried over to the destination, or
* the migration failed, the VM stays on the source, and no more data gets sent over the tunnel.

I'd maybe actually go for 5 seconds, then a SIGTERM, and after ten seconds a SIGKILL if it's still there (which is very unlikely, and it has no effect on our migration anyway).

But as it also does not really hinder us, we can use the old timeouts and send a SIGTERM at 15 seconds and a SIGKILL after 30, if preferred.
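
A minimal sketch of how such a loop could look with each signal sent exactly once and the two timeouts as parameters. This is not the resent patch: the helper name collect_child, the $log callback, the poll interval and the five extra polls after SIGKILL are all assumptions made for illustration.

```perl
use strict;
use warnings;
use POSIX qw(WNOHANG);
use Time::HiRes qw(sleep);    # allows fractional poll intervals

# Hypothetical helper (not part of the patch): poll for the exit of $cpid.
# SIGTERM is sent once after $term_after polls, SIGKILL once after
# $kill_after polls. Returns 1 if the child was collected, 0 otherwise.
sub collect_child {
    my ($cpid, $term_after, $kill_after, $log, $interval) = @_;
    $interval //= 1;    # seconds between polls

    # a few extra polls (assumed: 5) so a killed child can still be reaped
    for my $i (0 .. $kill_after + 5) {
        my $res = waitpid($cpid, WNOHANG);
        return 1 if $res == $cpid;    # child exited and was collected
        return 0 if $res == -1;       # no such child process

        if ($i == $term_after) {
            $log->("ssh tunnel still running - terminating now with SIGTERM");
            kill('TERM', $cpid);
        } elsif ($i == $kill_after) {
            $log->("ssh tunnel still running - terminating now with SIGKILL");
            kill('KILL', $cpid);
        }
        sleep($interval);
    }
    return 0;
}
```

With the shorter timeouts this would be called roughly as collect_child($cpid, 5, 10, sub { $self->log('info', $_[0]) }), and with the old ones as collect_child($cpid, 15, 30, ...).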

I'll resend the whole thing (mainly this patch and patch 2), addressing the issue mentioned here and also the fact that old versions of qemu-server may live migrate (should not be too much code overhead).

>> +    # collect child process
>> +    for (my $i = 1; $i < 20; $i++) {
>> +	my $waitpid = waitpid($cpid, WNOHANG);
>> +	last if (defined($waitpid) && ($waitpid == $cpid));
>> +
>> +	if ($i == 10) {
>> +	    $self->log('info', "ssh tunnel still running - terminating now with
>> SIGTERM");
>> +	    kill(15, $cpid);
>> +	} elsif ($i >= 15) {
>> +	    $self->log('info', "ssh tunnel still running - terminating now with
>> SIGKILL");
>> +	    kill(9, $cpid);
>>  	}
>> +	sleep (1);
>>      }


