[pve-devel] Bug 1458 - PVE 5 live migration downtime degraded to several seconds (compared to PVE 4)

Fri Jul 28 11:21:29 CEST 2017

>>I wonder whether reusing (/extending) the existing SSH tunnel for the 
>>commands we run on the target node might reduce the overhead as well? 
>>for cleanup in error cases opening a new connection is probably still 
>>advisable. 

Yes, maybe. I don't know whether the time goes into forking the qm process, establishing the SSH tunnel, or waiting for the response. I'll try to add timers around this.
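
One rough way to separate those costs is to time a command that does nothing on the remote side and compare it with the real one. The sketch below is a generic wall-clock timer around a subprocess, not code from the migration path. The helper name `time_command` is made up for illustration.

```python
import subprocess
import sys
import time


def time_command(cmd):
    """Run cmd and return (elapsed_seconds, exit_code)."""
    start = time.monotonic()
    proc = subprocess.run(
        cmd, stdout=subprocess.DEVNULL, stderr=subprocess.DEVNULL
    )
    return time.monotonic() - start, proc.returncode


if __name__ == "__main__":
    # Local no-op as a baseline: measures only the cost of spawning a process.
    elapsed, rc = time_command([sys.executable, "-c", "pass"])
    print(f"local no-op: {elapsed:.3f}s (exit {rc})")
```

Timing `["ssh", "root@target", "true"]` against `["ssh", "root@target", "qm", "status", "100"]` (host and VMID are placeholders) would give a rough split between connection setup and the time spent in qm itself.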

Another idea: why not make an HTTPS API call through pveproxy directly?

I have verified this by polling the VM status over QMP.

Without the pvesr call, around 20ms:

2017-07-28 10:24:45,184 -- VM status: paused (inmigrate)
2017-07-28 10:24:45,208 -- VM status: running

With the pvesr call, around 4s:

2017-07-28 10:38:28,711 -- VM status: paused (inmigrate)
2017-07-28 10:38:28,745 -- VM status: paused
2017-07-28 10:38:28,799 -- VM status: paused
2017-07-28 10:38:28,818 -- VM status: paused
2017-07-28 10:38:28,837 -- VM status: paused
....
2017-07-28 10:38:33,912 -- VM status: running
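
To read the paused window straight from such a trace, a small parser is enough. This is a minimal sketch that assumes the exact line format shown above (`<timestamp> -- VM status: <state>`). It measures from the first "paused" sample to the first "running" sample, so the result is limited by the polling interval. The function name is made up.

```python
from datetime import datetime


def downtime_seconds(lines):
    """Seconds from the first 'paused' sample to the first 'running' sample."""
    fmt = "%Y-%m-%d %H:%M:%S,%f"
    paused_at = None
    for line in lines:
        stamp, sep, status = line.partition(" -- VM status: ")
        if not sep:
            continue  # skip elided lines such as "...."
        when = datetime.strptime(stamp.strip(), fmt)
        if status.startswith("paused") and paused_at is None:
            paused_at = when
        elif status.startswith("running") and paused_at is not None:
            return (when - paused_at).total_seconds()
    return None  # trace never reached 'running' after a pause
```

Applied to the two traces above, it returns 0.024 for the first and 5.201 for the second.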

----- Original Message -----
From: "Fabian Grünbichler" <f.gruenbichler at proxmox.com>
To: "pve-devel" <pve-devel at pve.proxmox.com>
Sent: Friday, 28 July 2017 10:46:55
Subject: Re: [pve-devel] Bug 1458 - PVE 5 live migration downtime degraded to several seconds (compared to PVE 4)

On Fri, Jul 28, 2017 at 10:09:55AM +0200, Alexandre DERUMIER wrote: 
> 
> I have added some timers and done a migration without storage replication: 
> 
> -> main migration loop: 150ms increase (it's lower if I add a usleep of 1ms) 
> 
> 2017-07-28 10:00:10 transfer_replication_state: 1.436832 
> 2017-07-28 10:00:10 move config: 0.001174 
> 2017-07-28 10:00:10 switch_replication_job_target: 0.003125 
> 2017-07-28 10:00:12 qm resume: 1.634583 -> (this is the time measured on the source until the response arrives; I'm not sure how long it takes exactly on the remote side) 

I guess only marginally less on the target until the VM is actually 
resumed. 

> 
> The slow part seems to be transfer_replication_state, which calls 
> my $cmd = [ @{$self->{rem_ssh}}, 'pvesr', 'set-state', $self->{vmid}, $state]; 
> 
> 
> I think calling a remote qm command takes some time to get a response. 
> Note that I don't use pvesr, so I think we should bypass these commands when they are not needed. 
> 

yes, checking whether a state / job exists earlier on, and only 
transferring state and switching the direction conditionally if needed 
would be an improvement for sure. 
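
The conditional flow could look roughly like the sketch below. It is written in Python for brevity, while the real code is Perl. Every name is a hypothetical stand-in rather than an existing Proxmox function, and the step order simply mirrors the timer output quoted above.

```python
def finish_migration(has_replication_job, transfer_state, move_config,
                     switch_target, resume_vm):
    """Run the final migration steps, skipping replication work when unused.

    All parameters are hypothetical stand-ins, not Proxmox APIs.
    Returns the names of the steps that were executed, in order.
    """
    executed = []
    if has_replication_job:
        transfer_state()  # remote 'pvesr set-state' round trip
        executed.append("transfer_replication_state")
    move_config()
    executed.append("move config")
    if has_replication_job:
        switch_target()
        executed.append("switch_replication_job_target")
    resume_vm()  # remote 'qm resume' round trip
    executed.append("qm resume")
    return executed
```

With no replication job configured, only the config move and the resume remain, which removes one of the two remote round trips from the paused window.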

I wonder whether reusing (/extending) the existing SSH tunnel for the 
commands we run on the target node might reduce the overhead as well? 
for cleanup in error cases opening a new connection is probably still 
advisable. 

those two improvements might get us into the <1s range again, without 
sacrificing consistency on the way. 
