[pve-devel] [PATCH 2/6] qemu_drive_mirror : handle multiple jobs

Alexandre DERUMIER aderumier at odiso.com
Fri Dec 23 10:20:03 CET 2016


Hi Wolfgang,

I have done more tests with drive-mirror and an NBD target.

It seems that the hang occurs only if the target IP is unreachable (no network response):

# drive_mirror -n drive-scsi0 nbd://66.66.66.66/target 
ERROR: VM 183 qmp command 'human-monitor-command' failed - got timeout

If the IP address exists and is up, the command fails immediately:
# drive_mirror -n drive-scsi0 nbd://10.3.94.89:666/target 
Failed to connect socket: Connection refused



I'm not sure, but it may also hang if pve-firewall does a DROP instead of a REJECT on the target port.



I think this comes from net/socket.c in qemu, where we have an infinite loop.

I'm not sure how to add a timeout here, help is welcome :)




static int net_socket_connect_init(NetClientState *peer,
                                   const char *model,
                                   const char *name,
                                   const char *host_str)
{
    NetSocketState *s;
    int fd, connected, ret;
    struct sockaddr_in saddr;

    if (parse_host_port(&saddr, host_str) < 0)
        return -1;

    fd = qemu_socket(PF_INET, SOCK_STREAM, 0);
    if (fd < 0) {
        perror("socket");
        return -1;
    }
    qemu_set_nonblock(fd);

    connected = 0;
    for(;;) {
        ret = connect(fd, (struct sockaddr *)&saddr, sizeof(saddr));
        if (ret < 0) {
            if (errno == EINTR || errno == EWOULDBLOCK) {
                /* continue */
            } else if (errno == EINPROGRESS ||
                       errno == EALREADY ||
                       errno == EINVAL) {
                break;
            } else {
                perror("connect");
                closesocket(fd);
                return -1;
            }
        } else {
            connected = 1;
            break;
        }
    }
    s = net_socket_fd_init(peer, model, name, fd, connected);
    if (!s)
        return -1;
    snprintf(s->nc.info_str, sizeof(s->nc.info_str),
             "socket: connect to %s:%d",
             inet_ntoa(saddr.sin_addr), ntohs(saddr.sin_port));
    return 0;
}
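
For reference, a common POSIX pattern for bounding a connect() is to start it in non-blocking mode, wait for the socket to become writable with poll(), and then read SO_ERROR to see how the attempt ended. The sketch below is generic code, not qemu code: connect_with_timeout() and try_connect_ipv4() are hypothetical names, and whether something like this would fit into qemu's socket handling is an open question.

```c
#include <arpa/inet.h>
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <netinet/in.h>
#include <poll.h>
#include <string.h>
#include <sys/socket.h>
#include <unistd.h>

/*
 * Hypothetical helper (not a qemu function): connect fd to addr, giving up
 * after timeout_ms milliseconds. Returns 0 on success, -1 with errno set on
 * failure (ETIMEDOUT if the peer never answered). fd stays non-blocking.
 */
int connect_with_timeout(int fd, const struct sockaddr *addr,
                         socklen_t addrlen, int timeout_ms)
{
    struct pollfd pfd;
    socklen_t optlen;
    int flags, ret, err;

    assert(addr != NULL);

    /* In non-blocking mode connect() returns at once with EINPROGRESS. */
    flags = fcntl(fd, F_GETFL, 0);
    if (flags < 0 || fcntl(fd, F_SETFL, flags | O_NONBLOCK) < 0) {
        return -1;
    }

    ret = connect(fd, addr, addrlen);
    if (ret == 0) {
        return 0;               /* connected immediately */
    }
    if (errno != EINPROGRESS && errno != EINTR) {
        return -1;              /* immediate failure, e.g. ECONNREFUSED */
    }

    /* Wait until the attempt completes (socket writable) or time runs out.
     * Note: a signal restarts the wait with the full timeout. */
    pfd.fd = fd;
    pfd.events = POLLOUT;
    pfd.revents = 0;
    do {
        ret = poll(&pfd, 1, timeout_ms);
    } while (ret < 0 && errno == EINTR);
    if (ret < 0) {
        return -1;
    }
    if (ret == 0) {
        errno = ETIMEDOUT;      /* no answer at all, e.g. packets dropped */
        return -1;
    }

    /* The attempt finished; SO_ERROR tells whether it succeeded. */
    err = 0;
    optlen = sizeof(err);
    if (getsockopt(fd, SOL_SOCKET, SO_ERROR, &err, &optlen) < 0) {
        return -1;
    }
    if (err != 0) {
        errno = err;
        return -1;
    }
    return 0;
}

/* Hypothetical usage wrapper: returns a connected fd, or -1 with errno set. */
int try_connect_ipv4(const char *ip, unsigned short port, int timeout_ms)
{
    struct sockaddr_in sa;
    int fd, saved;

    memset(&sa, 0, sizeof(sa));
    sa.sin_family = AF_INET;
    sa.sin_port = htons(port);
    if (inet_pton(AF_INET, ip, &sa.sin_addr) != 1) {
        errno = EINVAL;         /* not a dotted-quad IPv4 address */
        return -1;
    }
    fd = socket(AF_INET, SOCK_STREAM, 0);
    if (fd < 0) {
        return -1;
    }
    if (connect_with_timeout(fd, (struct sockaddr *)&sa, sizeof(sa),
                             timeout_ms) < 0) {
        saved = errno;          /* close() may overwrite errno */
        close(fd);
        errno = saved;
        return -1;
    }
    return fd;
}
```

With this pattern, a call such as try_connect_ipv4("10.3.94.89", 666, 5000) should fail with ECONNREFUSED right away when the port is closed, and with ETIMEDOUT after about five seconds when nothing answers at all.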




----- Original Message -----
From: "aderumier" <aderumier at odiso.com>
To: "Wolfgang Bumiller" <w.bumiller at proxmox.com>
Cc: "pve-devel" <pve-devel at pve.proxmox.com>
Sent: Wednesday, 21 December 2016 13:57:10
Subject: Re: [pve-devel] [PATCH 2/6] qemu_drive_mirror : handle multiple jobs

>>Then it can still hang if the destination disappears between tcp_ping() 
>>and the `drive-mirror` command, so I'd rather get better behavior on qemu's 
>>side. It needs a time-out or a way to cancel it or something. 
Yes sure! 

I'm currently looking at qemu code to see how nbd client works. 

----- Original Message -----
From: "Wolfgang Bumiller" <w.bumiller at proxmox.com>
To: "aderumier" <aderumier at odiso.com>
Cc: "pve-devel" <pve-devel at pve.proxmox.com>, "dietmar" <dietmar at proxmox.com>
Sent: Wednesday, 21 December 2016 12:20:28
Subject: Re: [pve-devel] [PATCH 2/6] qemu_drive_mirror : handle multiple jobs

> On December 21, 2016 at 10:51 AM Alexandre DERUMIER <aderumier at odiso.com> wrote: 
> 
> 
> >>IIRC that was the only blocker. 
> >> 
> >>Basically the patchset has to work *without* tcp_ping() since it is an 
> >>unreliable check, and then we still have to catch failing connections 
> >>_correctly_. (There's no point in knowing that "some time in the past 
> >>you were able to connect to something which may or may not have been a 
> >>qemu nbd server", we need to know whether the drive-mirror job itself 
> >>was able to connect.) 
> 
> For me, the mirror job aborts automatically if the connection fails during the migration. Do you see different behaviour?

That covers one problem. IIRC the disk-deletion problem was that due 
to wrong [] usage around an ipv6 address it could not connect in the 
first place and didn't error as I would have hoped. 
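
To illustrate the bracket issue: in a URI of the form nbd://HOST:PORT/EXPORT, as used in the commands earlier in this thread, an IPv6 literal has to be wrapped in [] so that its colons are not mistaken for the port separator, while hostnames and IPv4 addresses must not be wrapped. A minimal sketch, assuming that URI form; format_nbd_uri() is a hypothetical helper, not an existing qemu or PVE function.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/*
 * Hypothetical helper: write "nbd://HOST:PORT/EXPORT" into buf, adding []
 * around HOST only when it is an IPv6 literal. Hostnames and IPv4 addresses
 * never contain ':', so a colon in host is used as the IPv6 marker.
 * Returns 0 on success, -1 if the result does not fit into buf.
 */
int format_nbd_uri(char *buf, size_t buflen, const char *host,
                   unsigned port, const char *export_name)
{
    int is_ipv6, n;

    assert(buf != NULL && host != NULL && export_name != NULL);

    is_ipv6 = strchr(host, ':') != NULL;

    n = snprintf(buf, buflen, "nbd://%s%s%s:%u/%s",
                 is_ipv6 ? "[" : "", host, is_ipv6 ? "]" : "",
                 port, export_name);

    /* snprintf() reports the length it wanted; reject truncated output. */
    return (n < 0 || (size_t)n >= buflen) ? -1 : 0;
}
```

For example, the host "fd00::1" with port 666 and export "target" would yield nbd://[fd00::1]:666/target, whereas "10.3.94.89" is left unbracketed.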

> 
> The tcp_ping was run just before launching the drive-mirror command, because the command was hanging in this case.

Then it can still hang if the destination disappears between tcp_ping() 
and the `drive-mirror` command, so I'd rather get better behavior on qemu's 
side. It needs a time-out or a way to cancel it or something. 

_______________________________________________ 
pve-devel mailing list 
pve-devel at pve.proxmox.com 
http://pve.proxmox.com/cgi-bin/mailman/listinfo/pve-devel 