[pve-devel] pvedaemon hanging because of qga retry

Alexandre DERUMIER aderumier at odiso.com
Mon May 28 16:31:04 CEST 2018


Hi,

I have noticed that we already send a guest-ping in

PVE::QemuServer::qga_check_running($vmid);

sub qga_check_running {
    my ($vmid) = @_;

    eval { vm_mon_cmd($vmid, "guest-ping", timeout => 3); };
    if ($@) {
        warn "Qemu Guest Agent is not running - $@";
        return 0;
    }
    return 1;
}


(This is already used in vzdump and other places.)

For example:
        if ($self->{vmlist}->{$vmid}->{agent} && $vm_is_running) {
            $agent_running = PVE::QemuServer::qga_check_running($vmid);
        }

        if ($agent_running){
            eval { PVE::QemuServer::vm_mon_cmd($vmid, "guest-fsfreeze-freeze"); };
            if (my $err = $@) {
                $self->logerr($err);
            }
        }




My problem is that I'm using "qm agent", and that code path doesn't do this ping.


In /PVE/API2/Qemu/Agent.pm:

            die "No Qemu Guest Agent\n" if !defined($conf->{agent});
            die "VM $vmid is not running\n" if !PVE::QemuServer::check_running($vmid);

            my $cmd = $param->{command} // $command;
            my $res = PVE::QemuServer::vm_mon_cmd($vmid, "guest-$cmd");


I'll send a patch.
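
A minimal sketch of the idea for the "qm agent" path (not the actual patch): ping the agent with a short timeout first, and only send the real command if the ping succeeds. To keep it runnable on its own, the QMP call is replaced by a hypothetical mock_mon_cmd that simulates a responsive agent (VM 100) and a dead one (VM 101). agent_is_running mirrors qga_check_running above; agent_cmd is a hypothetical helper, not an existing PVE function.

```perl
use strict;
use warnings;

# Hypothetical stand-in for PVE::QemuServer::vm_mon_cmd, so the sketch runs
# standalone: VM 100 has a responsive agent, VM 101 has a dead one.
my %agent_up = (100 => 1, 101 => 0);

sub mock_mon_cmd {
    my ($vmid, $cmd, %opts) = @_;
    die "VM $vmid: no answer to '$cmd'\n" if !$agent_up{$vmid};
    return {} if $cmd eq 'guest-ping';
    return { result => "output of $cmd" };
}

# Same shape as qga_check_running: a cheap guest-ping with a short timeout.
sub agent_is_running {
    my ($vmid) = @_;
    eval { mock_mon_cmd($vmid, 'guest-ping', timeout => 3) };
    if (my $err = $@) {
        warn "Qemu Guest Agent is not running - $err";
        return 0;
    }
    return 1;
}

# Hypothetical "qm agent" helper: refuse early if the ping fails, otherwise
# send the real command, which may legitimately take longer to answer.
sub agent_cmd {
    my ($vmid, $cmd) = @_;
    die "Qemu Guest Agent is not running\n" if !agent_is_running($vmid);
    return mock_mon_cmd($vmid, "guest-$cmd");
}
```

The point of this ordering is that a dead agent is detected by the cheap ping, whose timeout can stay short because no guest IO is involved, so a long wait is only ever spent on an agent that has already answered.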


----- Original Message -----
From: "aderumier" <aderumier at odiso.com>
To: "Thomas Lamprecht" <t.lamprecht at proxmox.com>
Cc: "pve-devel" <pve-devel at pve.proxmox.com>
Sent: Tuesday, May 22, 2018 09:59:37
Subject: Re: [pve-devel] pvedaemon hanging because of qga retry

>>But, AFAICT, this isn't your real concern 
Yes, indeed. It's normal to have a high timeout for fsfreeze (libvirt also does it).


>>you propose to make a "simple"
>>qmp call, be it through the VSERPORT_CHANGE, or a backward-compatible ping,
>>where we know that the time needed to answer cannot be that high, as no IO
>>is involved.
Exactly!


>>That could be done with a relatively small timeout, and if that
>>fails we know that it doesn't make sense to make the fsfreeze call with its
>>(reasonable) high timeout. Did I understand correctly?

Yes!


----- Original Message -----
From: "Thomas Lamprecht" <t.lamprecht at proxmox.com>
To: "pve-devel" <pve-devel at pve.proxmox.com>, "aderumier" <aderumier at odiso.com>, "dietmar" <dietmar at proxmox.com>
Sent: Tuesday, May 22, 2018 09:56:13
Subject: Re: [pve-devel] pvedaemon hanging because of qga retry

On 5/21/18 3:02 PM, Alexandre DERUMIER wrote: 
>>> Seems this patch does not solve the 'high load' problem at all? 
> 
> I can't reproduce this high load, so I can't say. 

For the high fsfreeze timeout, my commit message should provide some
context:

> commit cfb7a70165199eca25f92272490c863551efcd89 
> Author: Thomas Lamprecht <t.lamprecht at proxmox.com> 
> Date: Wed Nov 23 11:40:41 2016 +0100 
> 
> increase timeout from guest-fsfreeze-freeze 
> 
> The qmp command 'guest-fsfreeze-freeze' issues, in Linux, a FIFREEZE
> ioctl call on all mounted guest FS.
> This ioctl call locks the filesystem and brings it into a consistent
> state. For this, all caches must be synced after blocking new writes
> to the FS, which may need a relatively long time, especially under high
> IO load on the backing storage.
> 
> In Windows, a VSS (Volume Shadow Copy Service) request_freeze will be
> issued. Due to the closed nature of Windows, the exact mechanisms cannot
> be checked, but some Microsoft blog posts and other forum posts suggest
> that it should return fast, although certain workloads can still trigger
> a long delay, resulting in similar problems.
> 
> Thus, try to minimize the error probability and increase the timeout
> significantly.
> We use 60 minutes as the timeout, as this seems to be a limit which
> should not be exceeded in a somewhat healthy system.
> 
> See: 
> https://forum.proxmox.com/threads/22192/ 
> 
> See the 'freeze_super' and 'thaw_super' functions in fs/super.c from
> the linux kernel tree for more details on the freeze behavior in 
> Linux guests. 
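
To make the timeout trade-off concrete, here is a generic Perl sketch of the classic alarm/die pattern for bounding a blocking call. run_with_deadline is a hypothetical helper, and this is not how PVE's QMP client implements its timeouts; it only illustrates giving a short deadline to a call that should answer quickly and a long one (for example 60 * 60 seconds, as in the commit) to fsfreeze.

```perl
use strict;
use warnings;

# Hypothetical helper: run $code and die with "got timeout\n" if it has not
# finished after $timeout seconds (classic alarm/die pattern).
sub run_with_deadline {
    my ($timeout, $code) = @_;
    my $res;
    my $ok = eval {
        local $SIG{ALRM} = sub { die "got timeout\n" };
        alarm($timeout);
        $res = $code->();
        alarm(0);    # finished in time: cancel the pending alarm
        1;
    };
    alarm(0);        # also cancel it if $code died with its own error
    die $@ if !$ok;
    return $res;
}
```

The caller decides the deadline per call, so a hanging agent costs at most the short deadline of the cheap call rather than the full fsfreeze limit.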


> My main concern is not to wait for a daemon that is down (it will never respond).
> 
> If we can be sure that the daemon is running but under high load, we can simply wait for a response with a longer timeout.
> 
> 

But, AFAICT, this isn't your real concern. You propose to make a "simple"
qmp call, be it through the VSERPORT_CHANGE, or a backward-compatible ping,
where we know that the time needed to answer cannot be that high, as no IO
is involved. That could be done with a relatively small timeout, and if that
fails we know that it doesn't make sense to make the fsfreeze call with its
(reasonable) high timeout. Did I understand correctly?

> 
> 
> ----- Original Message -----
> From: "dietmar" <dietmar at proxmox.com>
> To: "aderumier" <aderumier at odiso.com>
> Cc: "pve-devel" <pve-devel at pve.proxmox.com>
> Sent: Monday, May 21, 2018 09:56:03
> Subject: Re: [pve-devel] pvedaemon hanging because of qga retry
> 
>> I have looked at libvirt/oVirt.
>> 
>> It seems that it's possible to detect whether the agent is connected
>> through the QMP event VSERPORT_CHANGE.
>> 
>> https://git.qemu.org/?p=qemu.git;a=commit;h=e2ae6159 
>> https://git.qemu.org/?p=qemu.git;a=blobdiff;f=docs/qmp/qmp-events.txt;h=d759d197486a3edf3b629fb11e9922ad92fb041a;hp=9d7439e3073ac63b639ce282c7466933ccb411b4;hb=032baddea36330384b3654fcbfafa74cc815471c;hpb=db52658b38fea4e54c23c9cfbced9478d368aa84 
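
A minimal sketch of how VSERPORT_CHANGE events could be tracked, assuming raw QMP messages are fed in as one JSON object per line. The event's data carries the port "id" and an "open" flag; the handler names and the port id "qga0" are hypothetical. Two caveats: events are only seen while a QMP connection is listening, so this would need a long-lived monitor connection, and an open port only means the guest side opened it, not that the agent will answer quickly under load (dietmar's objection).

```perl
use strict;
use warnings;
use JSON::PP qw(decode_json);

# Last "open" state reported per virtio-serial port id (hypothetical tracker).
my %port_open;

# Feed one raw QMP message (a JSON object on one line); everything except
# VSERPORT_CHANGE events (command replies, other events) is ignored.
sub handle_qmp_line {
    my ($line) = @_;
    my $msg = decode_json($line);
    return if ($msg->{event} // '') ne 'VSERPORT_CHANGE';
    my $data = $msg->{data};
    $port_open{ $data->{id} } = $data->{open} ? 1 : 0;
    return;
}

# True only if the guest side has opened the port; says nothing about how
# fast the agent will answer.
sub agent_connected {
    my ($port_id) = @_;
    return $port_open{$port_id} // 0;
}
```
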
> 
> Seems this patch does not solve the 'high load' problem at all? 
> 
> _______________________________________________ 
> pve-devel mailing list 
> pve-devel at pve.proxmox.com 
> https://pve.proxmox.com/cgi-bin/mailman/listinfo/pve-devel 
> 
