From IMMO.WETZEL at adtran.com Fri Dec 2 11:49:40 2016 From: IMMO.WETZEL at adtran.com (IMMO WETZEL) Date: Fri, 2 Dec 2016 10:49:40 +0000 Subject: [PVE-User] how to check own acces rights ? Message-ID: Hi, how can I check my own access rights on a specific node/qemu instance ? Is there any api function existing I coulnd found ? Background. I want to prevent errors if functions called I don't have the right to do. With kind regards Immo From dietmar at proxmox.com Fri Dec 2 11:55:23 2016 From: dietmar at proxmox.com (Dietmar Maurer) Date: Fri, 2 Dec 2016 11:55:23 +0100 (CET) Subject: [PVE-User] how to check own acces rights ? In-Reply-To: References: Message-ID: <1183914664.77.1480676124395@webmail.proxmox.com> # pvesh get access/acl > On December 2, 2016 at 11:49 AM IMMO WETZEL wrote: > > > Hi, > > how can I check my own access rights on a specific node/qemu instance ? > Is there any api function existing I coulnd found ? > > Background. I want to prevent errors if functions called I don't have the > right to do. > > With kind regards > > Immo > _______________________________________________ > pve-user mailing list > pve-user at pve.proxmox.com > http://pve.proxmox.com/cgi-bin/mailman/listinfo/pve-user From dietmar at proxmox.com Fri Dec 2 11:57:27 2016 From: dietmar at proxmox.com (Dietmar Maurer) Date: Fri, 2 Dec 2016 11:57:27 +0100 (CET) Subject: [PVE-User] how to check own acces rights ? In-Reply-To: References: Message-ID: <1406257972.79.1480676247514@webmail.proxmox.com> > Background. I want to prevent errors if functions called I don't have the > right to do. Oh, the call just return the access control list, but it is not trivial to do the actual check. We have no real API for that currently. From mark at tuxis.nl Fri Dec 2 12:00:28 2016 From: mark at tuxis.nl (Mark Schouten) Date: Fri, 2 Dec 2016 12:00:28 +0100 Subject: [PVE-User] HA Cluster migration issues In-Reply-To: <2600018850-5504@kerio.tuxis.nl> Message-ID: <3207678815-8076@kerio.tuxis.nl> Nobody? Met vriendelijke groeten, --? Kerio Operator in de Cloud? https://www.kerioindecloud.nl/ Mark Schouten | Tuxis Internet Engineering KvK:?61527076?| http://www.tuxis.nl/ T: 0318 200208 | info at tuxis.nl Van: Mark Schouten Aan: Verzonden: 25-11-2016 11:18 Onderwerp: [PVE-User] HA Cluster migration issues Hi, I have a HA cluster running, with Ceph and all, and I have rebooted one of the nodes this week. We now want te migrate the HA-VM's back to the original server, but that fails without a clear error. I can say: root at proxmox01:~# qm migrate 600 proxmox03 -online Executing HA migrate for VM 600 to node proxmox03 I then see kvm starting on node proxmox03, but then something goes wrong after that and migration fails: task started by HA resource agent Nov 25 10:58:05 starting migration of VM 600 to node 'proxmox03' (10.1.1.3) Nov 25 10:58:05 copying disk images Nov 25 10:58:05 starting VM 600 on remote node 'proxmox03' Nov 25 10:58:06 starting ssh migration tunnel Nov 25 10:58:07 starting online/live migration on localhost:60000 Nov 25 10:58:07 migrate_set_speed: 8589934592 Nov 25 10:58:07 migrate_set_downtime: 0.1 Nov 25 10:58:09 ERROR: online migrate failure - aborting Nov 25 10:58:09 aborting phase 2 - cleanup resources Nov 25 10:58:09 migrate_cancel Nov 25 10:58:10 ERROR: migration finished with problems (duration 00:00:05) TASK ERROR: migration problems I can't see any errormessage that is more useful. Can anybody tell me how I can further debug this or maybe somebody knows what's going on? 
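A few generic places to look when an HA-managed live migration aborts without a useful error. This is only a sketch: the VM id 600 and node proxmox03 are taken from the log above, and the log paths and systemd unit names are the stock ones on a PVE 4.x node.

    # full task log of the failed migration on the source node
    grep -rl 'migration problems' /var/log/pve/tasks/ | xargs tail -n 50
    # HA stack state and its journals on source and target
    ha-manager status
    journalctl -u pve-ha-lrm -u pve-ha-crm --since "1 hour ago"
    # qemu/kvm errors on the *target* node around the time of the failure
    journalctl --since "1 hour ago" | grep -iE 'qemu|kvm|600'
    # to see the raw error instead of the resource agent wrapper,
    # temporarily take the VM out of HA and migrate it directly
    ha-manager remove vm:600
    qm migrate 600 proxmox03 -online
    ha-manager add vm:600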
pveversion -v (This is identical on the two machines) proxmox-ve: 4.2-48 (running kernel: 4.4.6-1-pve) pve-manager: 4.2-2 (running version: 4.2-2/725d76f0) pve-kernel-4.4.6-1-pve: 4.4.6-48 lvm2: 2.02.116-pve2 corosync-pve: 2.3.5-2 libqb0: 1.0-1 pve-cluster: 4.0-39 qemu-server: 4.0-72 pve-firmware: 1.1-8 libpve-common-perl: 4.0-59 libpve-access-control: 4.0-16 libpve-storage-perl: 4.0-50 pve-libspice-server1: 0.12.5-2 vncterm: 1.2-1 pve-qemu-kvm: 2.5-14 pve-container: 1.0-62 pve-firewall: 2.0-25 pve-ha-manager: 1.0-28 ksm-control-daemon: 1.2-1 glusterfs-client: 3.5.2-2+deb8u2 lxc-pve: 1.1.5-7 lxcfs: 2.0.0-pve2 cgmanager: 0.39-pve1 criu: 1.6.0-1 Met vriendelijke groeten, --? Kerio Operator in de Cloud? https://www.kerioindecloud.nl/ Mark Schouten ?| Tuxis Internet Engineering KvK:?61527076?| http://www.tuxis.nl/ T: 0318 200208 | info at tuxis.nl _______________________________________________ pve-user mailing list pve-user at pve.proxmox.com http://pve.proxmox.com/cgi-bin/mailman/listinfo/pve-user From lemonnierk at ulrar.net Fri Dec 2 12:01:50 2016 From: lemonnierk at ulrar.net (Kevin Lemonnier) Date: Fri, 2 Dec 2016 12:01:50 +0100 Subject: [PVE-User] Hide the 200 OK from pvesh Message-ID: <20161202110150.GA19035@luwin.ulrar.net> Hi, That's also on the forum, if I get an answer here I'll update there. Is there a simple way to prevent pvesh from outputting 200 OK to the tty ? Redirecting 1 and 2 doesn't seem to do anything so I assume it writes directly to the tty, which is very annoying in a script. Currently the script just shows a bunch of 200 OK in between my actual output lines, it's very ugly. Thanks -- Kevin Lemonnier PGP Fingerprint : 89A5 2283 04A0 E6E9 0111 -------------- next part -------------- A non-text attachment was scrubbed... Name: signature.asc Type: application/pgp-signature Size: 801 bytes Desc: Digital signature URL: From mavleeuwen at icloud.com Fri Dec 2 12:02:45 2016 From: mavleeuwen at icloud.com (Marcel van Leeuwen) Date: Fri, 02 Dec 2016 12:02:45 +0100 Subject: [PVE-User] Share local storage with 2 or more LXC containers Message-ID: <31A56859-CC60-44EB-BF60-8967D2AD8321@icloud.com> Hi, I have a problem at the moment and i?ve not yet figured out how to solve this. Can I share local storage and make it accessible to 2 or more LXC containers? Of course this can be done with remote network storage but that will be slow till I have a 10gbit switch? Cheers, Marcel From mark at tuxis.nl Fri Dec 2 12:07:25 2016 From: mark at tuxis.nl (Mark Schouten) Date: Fri, 2 Dec 2016 12:07:25 +0100 Subject: [PVE-User] Hide the 200 OK from pvesh In-Reply-To: <20161202110150.GA19035@luwin.ulrar.net> Message-ID: <3208095861-8074@kerio.tuxis.nl> As far as I can see, pvesh outputs that to STDERR: root at proxmox2-4:~# pvesh get /version ?> stdout 200 OK root at proxmox2-4:~# cat stdout? { ? ?"keyboard" : "en-us", ? ?"release" : "12", ? ?"repoid" : "6894c9d9", ? ?"version" : "4.3" } Met vriendelijke groeten, --? Kerio Operator in de Cloud? https://www.kerioindecloud.nl/ Mark Schouten | Tuxis Internet Engineering KvK:?61527076?| http://www.tuxis.nl/ T: 0318 200208 | info at tuxis.nl Van: Kevin Lemonnier Aan: Verzonden: 2-12-2016 12:01 Onderwerp: [PVE-User] Hide the 200 OK from pvesh Hi, That's also on the forum, if I get an answer here I'll update there. Is there a simple way to prevent pvesh from outputting 200 OK to the tty ? Redirecting 1 and 2 doesn't seem to do anything so I assume it writes directly to the tty, which is very annoying in a script. 
Currently the script just shows a bunch of 200 OK in between my actual output lines, it's very ugly. Thanks -- Kevin Lemonnier PGP Fingerprint : 89A5 2283 04A0 E6E9 0111 _______________________________________________ pve-user mailing list pve-user at pve.proxmox.com http://pve.proxmox.com/cgi-bin/mailman/listinfo/pve-user From dietmar at proxmox.com Fri Dec 2 12:16:29 2016 From: dietmar at proxmox.com (Dietmar Maurer) Date: Fri, 2 Dec 2016 12:16:29 +0100 (CET) Subject: [PVE-User] Hide the 200 OK from pvesh In-Reply-To: <20161202110150.GA19035@luwin.ulrar.net> References: <20161202110150.GA19035@luwin.ulrar.net> Message-ID: <646505314.81.1480677389931@webmail.proxmox.com> pvesh prints that to stderr, so you just need to redirect stderr. > That's also on the forum, if I get an answer here I'll update there. > > Is there a simple way to prevent pvesh from outputting 200 OK to the tty ? > Redirecting 1 and 2 doesn't seem to do anything so I assume it writes directly > to the tty, which is very annoying in a script. > Currently the script just shows a bunch of 200 OK in between my actual output > lines, it's very ugly. > > Thanks > -- > Kevin Lemonnier > PGP Fingerprint : 89A5 2283 04A0 E6E9 0111 > _______________________________________________ > pve-user mailing list > pve-user at pve.proxmox.com > http://pve.proxmox.com/cgi-bin/mailman/listinfo/pve-user From lemonnierk at ulrar.net Fri Dec 2 12:28:06 2016 From: lemonnierk at ulrar.net (Kevin Lemonnier) Date: Fri, 2 Dec 2016 12:28:06 +0100 Subject: [PVE-User] Hide the 200 OK from pvesh In-Reply-To: <3208095861-8074@kerio.tuxis.nl> References: <20161202110150.GA19035@luwin.ulrar.net> <3208095861-8074@kerio.tuxis.nl> Message-ID: <20161202112806.GB19035@luwin.ulrar.net> On Fri, Dec 02, 2016 at 12:07:25PM +0100, Mark Schouten wrote: > As far as I can see, pvesh outputs that to STDERR: > Looks true when I run it by hand, but when in a script I do this : /usr/bin/pvesh create "/nodes/${NODE}/qemu/${TPLID}/clone" -full -newid "${VMID}" -name "${_NAME}" >/dev/null 2>/dev/null I still get 200 OK on my terminal. -- Kevin Lemonnier PGP Fingerprint : 89A5 2283 04A0 E6E9 0111 -------------- next part -------------- A non-text attachment was scrubbed... Name: signature.asc Type: application/pgp-signature Size: 801 bytes Desc: Digital signature URL: From sysadmin-pve at cognitec.com Fri Dec 2 15:42:02 2016 From: sysadmin-pve at cognitec.com (Alwin Antreich) Date: Fri, 2 Dec 2016 15:42:02 +0100 Subject: [PVE-User] Share local storage with 2 or more LXC containers In-Reply-To: <31A56859-CC60-44EB-BF60-8967D2AD8321@icloud.com> References: <31A56859-CC60-44EB-BF60-8967D2AD8321@icloud.com> Message-ID: <76aa915d-a811-6126-1776-ff76062ade87@cognitec.com> Hi Marcel, On 12/02/2016 12:02 PM, Marcel van Leeuwen wrote: > Hi, > > I have a problem at the moment and i?ve not yet figured out how to solve this. > > Can I share local storage and make it accessible to 2 or more LXC containers? Of course this can be done with remote network storage but that will be slow till I have a 10gbit switch? > You could try bind mounts for LXC. 
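A minimal bind-mount sketch for that (the host path /srv/shared and the container ids 101/102 are placeholders; the ro/backup flags and unprivileged-container caveats are covered on the wiki page linked below and in pct(1)):

    # expose one host directory inside two containers at /shared
    pct set 101 -mp0 /srv/shared,mp=/shared
    pct set 102 -mp0 /srv/shared,mp=/shared

Both containers then see the same files without any network hop; with unprivileged containers the host directory has to be writable for the mapped uids.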
https://pve.proxmox.com/wiki/Linux_Container > Cheers, > > Marcel > _______________________________________________ > pve-user mailing list > pve-user at pve.proxmox.com > http://pve.proxmox.com/cgi-bin/mailman/listinfo/pve-user > -- Cheers, Alwin From pconant at ok7-eleven.com Fri Dec 2 15:43:02 2016 From: pconant at ok7-eleven.com (Patrick Conant) Date: Fri, 2 Dec 2016 08:43:02 -0600 Subject: [PVE-User] Share local storage with 2 or more LXC containers In-Reply-To: <31A56859-CC60-44EB-BF60-8967D2AD8321@icloud.com> References: <31A56859-CC60-44EB-BF60-8967D2AD8321@icloud.com> Message-ID: There are layers of duplication that and synchronization that can be added, but as is obvious with some examination, they all make it as slow, or slower than remote storage. I think you'll be very pleasantly surprised how well an NFS data store performs over 1Gb links. On Fri, Dec 2, 2016 at 5:02 AM, Marcel van Leeuwen wrote: > Hi, > > I have a problem at the moment and i?ve not yet figured out how to solve > this. > > Can I share local storage and make it accessible to 2 or more LXC > containers? Of course this can be done with remote network storage but that > will be slow till I have a 10gbit switch? > > Cheers, > > Marcel > _______________________________________________ > pve-user mailing list > pve-user at pve.proxmox.com > http://pve.proxmox.com/cgi-bin/mailman/listinfo/pve-user > -- Patrick Conant Sr Linux Engineer 7-Eleven Stores 2021 S. MacArthur Oklahoma City, Oklahoma 73128 Cell 316-409-2424 <405-682-5711> pconant at OK7-Eleven.com [image: 7-Eleven Stores] From regis.houssin at inodbox.com Sat Dec 3 14:30:53 2016 From: regis.houssin at inodbox.com (=?UTF-8?Q?R=c3=a9gis_Houssin?=) Date: Sat, 3 Dec 2016 14:30:53 +0100 Subject: [PVE-User] New VM created after 4.3 upgrade not start ! In-Reply-To: References: Message-ID: <685d49a9-af72-0fcd-2723-7822778ca846@inodbox.com> Hello Ok I found it! The update crashed the configuration of drbdmanage. replace: storage-plugin = drbdmanage.storage.lvm_thinlv.LvmThinLv by: storage-plugin = drbdmanage.storage.lvm.Lvm * Modify config with drbdmanage modify-config * Choose a storage plugin, and insert it in the config like storage-plugin = drbdmanage.storage.lvm.Lvm Le 30/11/2016 ? 
10:42, R?gis Houssin a ?crit : > Hi, > > after upgrade proxmox with the latest 4.3, I have an error message when > starting a new VM : > (the VMs created before the update works fine) > >> kvm: -drive file=/dev/drbd/by-res/vm-502-disk-1/0,if=none,id=drive-virtio0,cache=writethrough,format=raw,aio=threads,detect-zeroes=on: Could not open '/dev/drbd/by-res/vm-502-disk-1/0': No such file or directory >> TASK ERROR: start failed: command '/usr/bin/kvm -id 502 -chardev 'socket,id=qmp,path=/var/run/qemu-server/502.qmp,server,nowait' -mon 'chardev=qmp,mode=control' -pidfile /var/run/qemu-server/502.pid -daemonize -smbios 'type=1,uuid=7eab7942-fcaf-48b6-94ac-bad24087e609' -name srv1.happylibre.fr -smp '4,sockets=1,cores=4,maxcpus=4' -nodefaults -boot 'menu=on,strict=on,reboot-timeout=1000,splash=/usr/share/qemu-server/bootsplash.jpg' -vga cirrus -vnc unix:/var/run/qemu-server/502.vnc,x509,password -cpu kvm64,+lahf_lm,+sep,+kvm_pv_unhalt,+kvm_pv_eoi,enforce -m 8192 -k fr -device 'pci-bridge,id=pci.1,chassis_nr=1,bus=pci.0,addr=0x1e' -device 'pci-bridge,id=pci.2,chassis_nr=2,bus=pci.0,addr=0x1f' -device 'piix3-usb-uhci,id=uhci,bus=pci.0,addr=0x1.0x2' -device 'usb-tablet,id=tablet,bus=uhci.0,port=1' -device 'virtio-balloon-pci,id=balloon0,bus=pci.0,addr=0x3' -iscsi 'initiator-name=iqn.1993-08.org.debian:01:3681fcbb6821' -drive 'file=/var/lib/vz/template/iso/debian-8.4.0-amd64-netinst.iso,if=none,id=drive-ide2,media=cdrom,aio=threads' -device 'ide-cd,bus=ide.1,unit=0,drive=drive-ide2,id=ide2,bootindex=200' -drive 'file=/dev/drbd/by-res/vm-502-disk-1/0,if=none,id=drive-virtio0,cache=writethrough,format=raw,aio=threads,detect-zeroes=on' -device 'virtio-blk-pci,drive=drive-virtio0,id=virtio0,bus=pci.0,addr=0xa,bootindex=100' -netdev 'type=tap,id=net0,ifname=tap502i0,script=/var/lib/qemu-server/pve-bridge,downscript=/var/lib/qemu-server/pve-bridgedown,vhost=on' -device 'virtio-net-pci,mac=5A:73:0D:7E:E9:C5,netdev=net0,bus=pci.0,addr=0x12,id=net0,bootindex=300'' failed: exit code 1 > > the volume and resource "vm-502-disk-1" is ok, but it does not appear > with "drbdsetup show", and it does not appear with "drbd-overview" !! > > what is the problem please ? > > thanks > > Cordialement, Cordialement, -- R?gis Houssin --------------------------------------------------------- iNodbox (Cap-Networks) 5, rue Corneille 01000 BOURG EN BRESSE FRANCE VoIP: +33 1 83 62 40 03 GSM: +33 6 33 02 07 97 Email: regis.houssin at inodbox.com Web: https://www.inodbox.com/ Development: https://git.framasoft.org/u/inodbox/ Translation: https://www.transifex.com/inodbox/ --------------------------------------------------------- From hermann at qwer.tk Mon Dec 5 16:36:46 2016 From: hermann at qwer.tk (Hermann Himmelbauer) Date: Mon, 5 Dec 2016 16:36:46 +0100 Subject: [PVE-User] Ceph upgrade from 94.3 - recommendations? In-Reply-To: <549136967.3727214.1480147489744.JavaMail.zimbra@oxygem.tv> References: <08167395-a642-2200-5585-8b259fbdca84@qwer.tk> <549136967.3727214.1480147489744.JavaMail.zimbra@oxygem.tv> Message-ID: <49d3d96a-fdd2-70f9-83b7-7c60d8e34191@qwer.tk> Hi, O.k., great - any recommendation how to do that? Is there some Proxmox-related command to do so or should I just follow the ceph manual? Best Regards, Hermann Am 26.11.2016 um 09:04 schrieb Alexandre DERUMIER: > Sure, > you can always upgrade to last minor version. 
(0.94.X) > > Only jewel is not yet compatible because of a bug, but it'll be fixed in next jewel release (10.2.4) > > ----- Mail original ----- > De: "Hermann Himmelbauer" > ?: "proxmoxve" > Envoy?: Vendredi 25 Novembre 2016 13:43:07 > Objet: [PVE-User] Ceph upgrade from 94.3 - recommendations? > > Hi, > I recently upgraded the Proxomox community version to the latest version > and wonder if a ceph upgrade is recommended, too? > > Currently my ceph version is 0.94.3 - and I see that there are upgrades > to 0.94.9 on the ceph site, does anyone know how to do such an upgrade > on proxmox? Is it risky? > > Best Regards, > Hermann > -- hermann at qwer.tk PGP/GPG: 299893C7 (on keyservers) From IMMO.WETZEL at adtran.com Wed Dec 7 13:43:08 2016 From: IMMO.WETZEL at adtran.com (IMMO WETZEL) Date: Wed, 7 Dec 2016 12:43:08 +0000 Subject: [PVE-User] vm description for vm #a seems to be used for all new created vms Message-ID: Hi, it looks like that if the documentation for a vm is set via API call this description is afterwards used for all following created VMs if they are created via GUI. Can someone please verify this. If so I can write the Bugzilla request. Immo From f.gruenbichler at proxmox.com Wed Dec 7 13:56:05 2016 From: f.gruenbichler at proxmox.com (Fabian =?iso-8859-1?Q?Gr=FCnbichler?=) Date: Wed, 7 Dec 2016 13:56:05 +0100 Subject: [PVE-User] Ceph upgrade from 94.3 - recommendations? In-Reply-To: <1fe94657-bc59-92e7-dd16-af9cdbfa24cd@coppint.com> References: <08167395-a642-2200-5585-8b259fbdca84@qwer.tk> <549136967.3727214.1480147489744.JavaMail.zimbra@oxygem.tv> <1fe94657-bc59-92e7-dd16-af9cdbfa24cd@coppint.com> Message-ID: <20161207125605.2lzv6q7izgxwb6im@nora.maurer-it.com> On Wed, Dec 07, 2016 at 12:31:43PM +0100, Florent B wrote: > Jewel 10.2.4 is released today > (https://raw.githubusercontent.com/ceph/ceph/master/doc/release-notes.rst), > is the bug fixed ? (and which one is it ?) > http://tracker.ceph.com/issues/16255 , should be fixed. please note that testing and integrating the 10.2.4 release will still take a bit, but hopefully not too long. From d.csapak at proxmox.com Wed Dec 7 14:32:28 2016 From: d.csapak at proxmox.com (Dominik Csapak) Date: Wed, 7 Dec 2016 14:32:28 +0100 Subject: [PVE-User] vm description for vm #a seems to be used for all new created vms In-Reply-To: References: Message-ID: On 12/07/2016 01:43 PM, IMMO WETZEL wrote: > Hi, > > it looks like that if the documentation for a vm is set via API call this description is afterwards used for all following created VMs if they are created via GUI. > Can someone please verify this. If so I can write the Bugzilla request. > hi, no i cannot reproduce that, how do call the api exactly? From IMMO.WETZEL at adtran.com Wed Dec 7 15:03:18 2016 From: IMMO.WETZEL at adtran.com (IMMO WETZEL) Date: Wed, 7 Dec 2016 14:03:18 +0000 Subject: [PVE-User] vm description for vm #a seems to be used for all new created vms In-Reply-To: References: Message-ID: Its a python call... Anyhow I'll cleanup the server and start testing again Immo -----Original Message----- From: pve-user [mailto:pve-user-bounces at pve.proxmox.com] On Behalf Of Dominik Csapak Sent: Wednesday, December 07, 2016 2:32 PM To: pve-user at pve.proxmox.com Subject: Re: [PVE-User] vm description for vm #a seems to be used for all new created vms On 12/07/2016 01:43 PM, IMMO WETZEL wrote: > Hi, > > it looks like that if the documentation for a vm is set via API call this description is afterwards used for all following created VMs if they are created via GUI. 
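For comparison, a description set through the config API is scoped to the single VM it is addressed to; a minimal sketch of such a call (node name and the vmids are placeholders):

    # via the API / pvesh
    pvesh set /nodes/NODENAME/qemu/100/config -description "managed by deployment script"
    # or locally on the node hosting the VM
    qm set 100 -description "managed by deployment script"
    # check that a newly created VM did not pick it up
    qm config 105 | grep '^description'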
> Can someone please verify this. If so I can write the Bugzilla request. > hi, no i cannot reproduce that, how do call the api exactly? _______________________________________________ pve-user mailing list pve-user at pve.proxmox.com http://pve.proxmox.com/cgi-bin/mailman/listinfo/pve-user From gilberto.nunes32 at gmail.com Wed Dec 7 17:03:01 2016 From: gilberto.nunes32 at gmail.com (Gilberto Nunes) Date: Wed, 7 Dec 2016 14:03:01 -0200 Subject: [PVE-User] Critical medium error Message-ID: Hi list I get this issue from an external USB hard disk: raps: atop[87646] trap divide error ip:40780a sp:7fff3074a928 error:0 in atop[400000+26000] [8244487.641127] sd 988:0:0:0: [sdf] tag#0 FAILED Result: hostbyte=DID_OK driverbyte=DRIVER_SENSE [8244487.641130] sd 988:0:0:0: [sdf] tag#0 Sense Key : Medium Error [current] [8244487.641132] sd 988:0:0:0: [sdf] tag#0 Add. Sense: Unrecovered read error [8244487.641135] sd 988:0:0:0: [sdf] tag#0 CDB: Read(16) 88 00 00 00 00 00 58 06 50 a0 00 00 00 20 00 00 [8244487.641136] blk_update_request: critical medium error, dev sdf, sector 1476808864 [8244518.664247] usb 2-1.2: reset high-speed USB device number 102 using ehci-pci [8260518.890913] sd 988:0:0:0: [sdf] Very big device. Trying to use READ CAPACITY(16). As consequence I get a extremely slow vm access. So I detect that this External Driver has some problem. After disconnect this External Driver, everything is go smoothly... Is that right??? An external driver could be the cause of this slow access? Anybody has some experience with this??? Thanks a lot -- Gilberto Ferreira +55 (47) 9676-7530 Skype: gilberto.nunes36 From e.kasper at proxmox.com Wed Dec 7 18:38:38 2016 From: e.kasper at proxmox.com (Emmanuel Kasper) Date: Wed, 7 Dec 2016 18:38:38 +0100 Subject: [PVE-User] Critical medium error In-Reply-To: References: Message-ID: On 12/07/2016 05:03 PM, Gilberto Nunes wrote: > Hi list > > I get this issue from an external USB hard disk: > > raps: atop[87646] trap divide error ip:40780a sp:7fff3074a928 error:0 in > atop[400000+26000] > [8244487.641127] sd 988:0:0:0: [sdf] tag#0 FAILED Result: hostbyte=DID_OK > driverbyte=DRIVER_SENSE > [8244487.641130] sd 988:0:0:0: [sdf] tag#0 Sense Key : Medium Error > [current] > [8244487.641132] sd 988:0:0:0: [sdf] tag#0 Add. Sense: Unrecovered read > error > [8244487.641135] sd 988:0:0:0: [sdf] tag#0 CDB: Read(16) 88 00 00 00 00 00 > 58 06 50 a0 00 00 00 20 00 00 > [8244487.641136] blk_update_request: critical medium error, dev sdf, sector > 1476808864 > [8244518.664247] usb 2-1.2: reset high-speed USB device number 102 using > ehci-pci > [8260518.890913] sd 988:0:0:0: [sdf] Very big device. Trying to use READ > CAPACITY(16). > > > As consequence I get a extremely slow vm access. > > So I detect that this External Driver has some problem. After disconnect > this External Driver, everything is go smoothly... Most probably the hard drive or the cable is dead. All processes trying to access the device will hang waiting for I/O, so yes it will slow down the whole system ... Emmanuel From lindsay.mathieson at gmail.com Wed Dec 7 23:17:02 2016 From: lindsay.mathieson at gmail.com (Lindsay Mathieson) Date: Thu, 8 Dec 2016 08:17:02 +1000 Subject: [PVE-User] Critical medium error In-Reply-To: References: Message-ID: <6afb6d70-896d-e0a2-16df-f81d72069e23@gmail.com> On 8/12/2016 3:38 AM, Emmanuel Kasper wrote: > Most probably the hard drive or the cable is dead. > All processes trying to access the device will hang waiting for I/O, so > yes it will slow down the whole system ... 
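A quick way to confirm a dying disk before pulling it, assuming the device is still /dev/sdf as in the log above (smartmontools ships with current PVE; many USB bridges only expose SMART data with an explicit -d sat):

    smartctl -H /dev/sdf            # overall health verdict
    smartctl -a /dev/sdf            # full attributes: reallocated/pending sectors, error log
    smartctl -a -d sat /dev/sdf     # retry through the SAT layer if the bridge hides SMART
    dmesg | grep -E 'sdf|Medium Error' | tail -n 50    # kernel-side view of the resets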
> Emmanuel Yah. Seriously don't recommend using a USB drive for hosting ... -- Lindsay Mathieson From IMMO.WETZEL at adtran.com Thu Dec 8 16:52:16 2016 From: IMMO.WETZEL at adtran.com (IMMO WETZEL) Date: Thu, 8 Dec 2016 15:52:16 +0000 Subject: [PVE-User] vm description for vm #a seems to be used for all new created vms In-Reply-To: References: Message-ID: Thanks Dominik Funny thing, after reboot server and start from a fresh system everything looks fine.... Immo -----Original Message----- From: pve-user [mailto:pve-user-bounces at pve.proxmox.com] On Behalf Of Dominik Csapak Sent: Wednesday, December 07, 2016 2:32 PM To: pve-user at pve.proxmox.com Subject: Re: [PVE-User] vm description for vm #a seems to be used for all new created vms On 12/07/2016 01:43 PM, IMMO WETZEL wrote: > Hi, > > it looks like that if the documentation for a vm is set via API call this description is afterwards used for all following created VMs if they are created via GUI. > Can someone please verify this. If so I can write the Bugzilla request. > hi, no i cannot reproduce that, how do call the api exactly? _______________________________________________ pve-user mailing list pve-user at pve.proxmox.com http://pve.proxmox.com/cgi-bin/mailman/listinfo/pve-user From gilberto.nunes32 at gmail.com Mon Dec 12 12:42:35 2016 From: gilberto.nunes32 at gmail.com (Gilberto Nunes) Date: Mon, 12 Dec 2016 09:42:35 -0200 Subject: [PVE-User] VM KVM slow down.... Message-ID: Hi folks I have a VM KVM running CentOS 7. This is the only one VM running in this server. The Physical Server is a Dell PowerEdge R410 with Intel(R) Xeon(R) CPU E5-2620 v3 @ 2.40GHz x 12 cpus... The server has 64 GB of Ram, SAS disk for the PVE installation. The VM file are in a GlusterFS storage with the same machine configuration. Between PVE and Storage we have a gigaethernet bond make it with 3 gigaethernet interface connect through three network cable cat 6. inside the VM, I have Zimbra installed... Sometime, the access to this VM are extremaly slow.... I heard that I need low down the number of VCPU when work with Java intensivaly.... Somebody can has some feedback about this??? I am really needing help to solve this slow issue.... Thanks a lot and forgive me regard my bad English... Thanks -- Gilberto Ferreira +55 (47) 9676-7530 Skype: gilberto.nunes36 From martin at proxmox.com Tue Dec 13 14:02:40 2016 From: martin at proxmox.com (Martin Maurer) Date: Tue, 13 Dec 2016 14:02:40 +0100 Subject: [PVE-User] Proxmox VE 4.4 released! Message-ID: <9b7d20a7-b41b-1e3b-d7f1-6660c3735e37@proxmox.com> Hi all! We are really excited to announce the final release of our Proxmox VE 4.4! Most visible change is the new cluster and Ceph dashboard with improved Ceph Server toolkit. For LXC, we support unprivileged containers, CPU core limits and the new restart migration. We also updated the LXC templates for Debian, Ubuntu, CentOS, Fedora, Alpine and Arch. The whole HA stack and the GUI are improved on several places and using Proxmox VE High Availability is now more userfriendly. In clusters, you can define a dedicated migration network, quite useful if you heavily use (live) migrations. 
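For an existing 4.x installation the upgrade itself is the usual apt run; a minimal sketch, assuming one of the standard PVE repositories (enterprise or no-subscription) is already configured, and doing one node at a time after migrating or stopping its guests (see the upgrade link further down in the announcement):

    apt-get update
    apt-get dist-upgrade
    pveversion -v    # confirm the new package set afterwards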
Watch our short introduction video - What's new in Proxmox VE 4.4 https://www.proxmox.com/en/training/video-tutorials/item/what-s-new-in-proxmox-ve-4-4 Release notes https://pve.proxmox.com/wiki/Roadmap#Proxmox_VE_4.4 ISO Download https://www.proxmox.com/en/downloads Alternate ISO download: https://download.proxmox.com/iso/ Upgrading https://pve.proxmox.com/wiki/Downloads Bugtracker https://bugzilla.proxmox.com A big THANK-YOU to our active community for all feedback, testing, bug reporting and patch submissions. -- Best Regards, Martin Maurer martin at proxmox.com http://www.proxmox.com From davel at upilab.com Tue Dec 13 14:57:21 2016 From: davel at upilab.com (David Lawley) Date: Tue, 13 Dec 2016 08:57:21 -0500 Subject: [PVE-User] kernel: kvm: zapping shadow pages for mmio generation wraparound Message-ID: <26f7d078-1e5f-cd44-1d5f-7bfd1fca6919@upilab.com> kernel: kvm: zapping shadow pages for mmio generation wraparound I have noticed these errors in syslog. Most research has indicated its benign. But is there is something I am missing as far as each VM is concerned would be nice to know for sure. All VMs are ubuntu or debian Its been going on awhile so if its just noise so be it, and not related to any current update. really not sure when I first saw it. And appears to only present itself when a VM is started. updated this am to (its a simple single node BTW) proxmox-ve: 4.4-76 (running kernel: 4.4.35-1-pve) pve-manager: 4.4-1 (running version: 4.4-1/eb2d6f1e) pve-kernel-4.4.35-1-pve: 4.4.35-76 pve-kernel-4.4.21-1-pve: 4.4.21-71 pve-kernel-4.4.15-1-pve: 4.4.15-60 pve-kernel-4.4.16-1-pve: 4.4.16-64 pve-kernel-4.4.24-1-pve: 4.4.24-72 pve-kernel-4.4.19-1-pve: 4.4.19-66 lvm2: 2.02.116-pve3 corosync-pve: 2.4.0-1 libqb0: 1.0-1 pve-cluster: 4.0-48 qemu-server: 4.0-101 pve-firmware: 1.1-10 libpve-common-perl: 4.0-83 libpve-access-control: 4.0-19 libpve-storage-perl: 4.0-70 pve-libspice-server1: 0.12.8-1 vncterm: 1.2-1 pve-docs: 4.4-1 pve-qemu-kvm: 2.7.0-9 pve-container: 1.0-88 pve-firewall: 2.0-33 pve-ha-manager: 1.0-38 ksm-control-daemon: 1.2-1 glusterfs-client: 3.5.2-2+deb8u2 lxc-pve: 2.0.6-2 lxcfs: 2.0.5-pve1 criu: 1.6.0-1 novnc-pve: 0.5-8 smartmontools: 6.5+svn4324-1~pve80 fence-agents-pve: 4.0.20-1 From contact at makz.me Wed Dec 14 11:51:16 2016 From: contact at makz.me (contact at makz.me) Date: Wed, 14 Dec 2016 10:51:16 +0000 Subject: [PVE-User] Windows & Gluster 3.8 Message-ID: <7fdbbhhrucscnmhs1y690a02z-0@mailer.nylas.com> Hello, I've installed gluster 3.8.6 on my pve cluster because the gluster used by proxmox too old. It worked great with pve4.3 (except crash when snapshot is used) Today i've upgraded to 4.4, snapshot now works well but since i've updated, all my windows VM are broken, when windows try to write on disk the vm crash without error, even when i try to reinstall a windows, when i partition or format the vm instantly crash. Someone can help me ? Thank you ! From lemonnierk at ulrar.net Wed Dec 14 11:54:47 2016 From: lemonnierk at ulrar.net (Kevin Lemonnier) Date: Wed, 14 Dec 2016 11:54:47 +0100 Subject: [PVE-User] Windows & Gluster 3.8 In-Reply-To: <7fdbbhhrucscnmhs1y690a02z-0@mailer.nylas.com> References: <7fdbbhhrucscnmhs1y690a02z-0@mailer.nylas.com> Message-ID: <20161214105446.GD21299@luwin.ulrar.net> > > I've installed gluster 3.8.6 on my pve cluster because the gluster used by > proxmox too old. > I don't think proxmox comes with any gluster, it's the debian one and as often with debian packages, it is indeed very old. 
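To see exactly which GlusterFS bits a node is running (the Debian jessie packages versus a newer upstream repository), something like:

    dpkg -l | grep -i gluster    # the libgfapi client library qemu uses usually comes in via glusterfs-common
    glusterfs --version          # FUSE client / common package
    gluster --version            # CLI on the storage nodes

Keeping the client side on the hypervisors and the server side on the bricks at matching versions avoids a whole class of odd errors.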
> > Today i've upgraded to 4.4, snapshot now works well but since i've updated, > all my windows VM are broken, when windows try to write on disk the vm crash > without error, even when i try to reinstall a windows, when i partition or > format the vm instantly crash. > > > > Someone can help me ? > Not sure here's the best place for this, you might want to try the gluster-users list instead. I'd advise attaching your logs (/var/log/gluster) since they'll ask for that anyway :) Ideally your client logs too, but being a Proxmox user I know how hard that is to get for proxmox VMs .. That would be a very nice addition to proxmox by the way, a checkbox or something to get the VM output in a file, for debugging libgfapi it would help a lot. Currently the only way I am aware of to get those logs is starting the VM by hand in a shell, bypassing the web interface. -- Kevin Lemonnier PGP Fingerprint : 89A5 2283 04A0 E6E9 0111 -------------- next part -------------- A non-text attachment was scrubbed... Name: signature.asc Type: application/pgp-signature Size: 801 bytes Desc: Digital signature URL: From daniel at linux-nerd.de Wed Dec 14 12:14:42 2016 From: daniel at linux-nerd.de (Daniel) Date: Wed, 14 Dec 2016 12:14:42 +0100 Subject: [PVE-User] Ceph Disk Usage/Storage Message-ID: <25453F57-3042-4E13-BEBA-00F7BCD6810A@linux-nerd.de> Hi there, i created a Ceph File-System with 3x 400GB In my config i said 3/2 that means that one of that disks are only for an faulty issue (like raid5) 3 HDDs Max and 2 HDDs Minimum In my System-Overview u see that i have 1.2TB Free-Space which cant be correct. This is what the CLI command shows me: POOLS: NAME ID USED %USED MAX AVAIL OBJECTS ceph 2 0 0 441G 0 But as i understand it correctly max Avail GB must be round about 800GB Cheers Daniel From lindsay.mathieson at gmail.com Wed Dec 14 12:55:41 2016 From: lindsay.mathieson at gmail.com (Lindsay Mathieson) Date: Wed, 14 Dec 2016 21:55:41 +1000 Subject: [PVE-User] Windows & Gluster 3.8 In-Reply-To: <7fdbbhhrucscnmhs1y690a02z-0@mailer.nylas.com> References: <7fdbbhhrucscnmhs1y690a02z-0@mailer.nylas.com> Message-ID: On 14/12/2016 8:51 PM, contact at makz.me wrote: > Today i've upgraded to 4.4, snapshot now works well but since i've updated, > all my windows VM are broken, when windows try to write on disk the vm crash > without error, even when i try to reinstall a windows, when i partition or > format the vm instantly crash. > > Hmmm - I upgraded to PVE 4.4 today (rolling upgrade, 3 nodes) and am using Gluster 3.8.4, all my windows VM's are running fine + snapshots ok. Sounds like you a gluster volume in trouble. What virtual disk interface are you using with the windows VM'? ide? virtio? scsi+virtio controller? can you post: gluster volume info gluster volume status gluster volume heal heal info -- Lindsay Mathieson From w.link at proxmox.com Wed Dec 14 13:00:16 2016 From: w.link at proxmox.com (Wolfgang Link) Date: Wed, 14 Dec 2016 13:00:16 +0100 Subject: [PVE-User] Ceph Disk Usage/Storage In-Reply-To: <25453F57-3042-4E13-BEBA-00F7BCD6810A@linux-nerd.de> References: <25453F57-3042-4E13-BEBA-00F7BCD6810A@linux-nerd.de> Message-ID: <58513450.6010608@proxmox.com> You pool size is 3*400 GB so 1.2GB is correct, but your config say 3/2. This means you have 3 copies of every pg(Placement Group) and min 2 pg are needed to operate correct. This means if you write 1GB you lose 3GB of free storage. 
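Spelled out: with 3 x 400 GB of raw space and size=3, every write is stored three times, so the usable space is roughly raw divided by the replica count, 1200 GB / 3, about 400 GB, which is in the same ballpark as the 441G MAX AVAIL shown above (modulo GB/GiB rounding and overhead). A 2/1 pool would give roughly 600 GB usable but keeps serving I/O with only a single surviving copy during failures. To inspect a pool (pool name 'ceph' as above):

    ceph osd pool get ceph size       # replica count
    ceph osd pool get ceph min_size   # minimum replicas needed to keep serving I/O
    ceph df                           # raw space vs. per-pool MAX AVAIL
    ceph osd df                       # per-OSD utilisation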
On 12/14/2016 12:14 PM, Daniel wrote: > Hi there, > > i created a Ceph File-System with 3x 400GB > In my config i said 3/2 that means that one of that disks are only for an faulty issue (like raid5) > 3 HDDs Max and 2 HDDs Minimum > > In my System-Overview u see that i have 1.2TB Free-Space which cant be correct. > > This is what the CLI command shows me: > > POOLS: > NAME ID USED %USED MAX AVAIL OBJECTS > ceph 2 0 0 441G 0 > > But as i understand it correctly max Avail GB must be round about 800GB > > Cheers > > Daniel > > > > _______________________________________________ > pve-user mailing list > pve-user at pve.proxmox.com > http://pve.proxmox.com/cgi-bin/mailman/listinfo/pve-user > From contact at makz.me Wed Dec 14 13:05:33 2016 From: contact at makz.me (contact at makz.me) Date: Wed, 14 Dec 2016 12:05:33 +0000 Subject: [PVE-User] Windows & Gluster 3.8 In-Reply-To: References: <7fdbbhhrucscnmhs1y690a02z-0@mailer.nylas.com> Message-ID: I've tried with ide / sata & scsi, for an existing VM it crash at boot screen (windows 7 logo), for new install it crash when i click next at the disk selector root at hvs1:/var/log/glusterfs# **gluster volume info** Volume Name: vm Type: Replicate Volume ID: 76ff16d4-7bd3-4070-b39d-8a173c6292c3 Status: Started Snapshot Count: 0 Number of Bricks: 1 x 3 = 3 Transport-type: tcp Bricks: Brick1: hvs1-gluster:/gluster/vm/brick Brick2: hvs2-gluster:/gluster/vm/brick Brick3: hvsquorum-gluster:/gluster/vm/brick root at hvs1:/var/log/glusterfs# **gluster volume status** Status of volume: vm Gluster process TCP Port RDMA Port Online Pid \----------------------------------------------------------------------------- - Brick hvs1-gluster:/gluster/vm/brick 49153 0 Y 3565 Brick hvs2-gluster:/gluster/vm/brick 49153 0 Y 3465 Brick hvsquorum-gluster:/gluster/vm/brick 49153 0 Y 2696 NFS Server on localhost N/A N/A N N/A Self-heal Daemon on localhost N/A N/A Y 3586 NFS Server on hvsquorum-gluster N/A N/A N N/A Self-heal Daemon on hvsquorum-gluster N/A N/A Y 4190 NFS Server on hvs2-gluster N/A N/A N N/A Self-heal Daemon on hvs2-gluster N/A N/A Y 6212 Task Status of Volume vm \----------------------------------------------------------------------------- - There are no active volume tasks root at hvs1:/var/log/glusterfs# **gluster volume heal vm info** Brick hvs1-gluster:/gluster/vm/brick Status: Connected Number of entries: 0 Brick hvs2-gluster:/gluster/vm/brick Status: Connected Number of entries: 0 Brick hvsquorum-gluster:/gluster/vm/brick Status: Connected Number of entries: 0 **Here's the kvm log** [2016-12-14 11:08:29.210641] I [MSGID: 104045] [glfs-master.c:91:notify] 0-gfapi: New graph 68767331-2e73-7276-2e62-62772e62652d (0) coming up [2016-12-14 11:08:29.210689] I [MSGID: 114020] [client.c:2356:notify] 0-vm- client-0: parent translators are ready, attempting connect on transport [2016-12-14 11:08:29.211020] I [MSGID: 114020] [client.c:2356:notify] 0-vm- client-1: parent translators are ready, attempting connect on transport [2016-12-14 11:08:29.211272] I [rpc-clnt.c:1947:rpc_clnt_reconfig] 0-vm- client-0: changing port to 49153 (from 0) [2016-12-14 11:08:29.211285] I [MSGID: 114020] [client.c:2356:notify] 0-vm- client-2: parent translators are ready, attempting connect on transport [2016-12-14 11:08:29.211838] I [MSGID: 114057] [client- handshake.c:1446:select_server_supported_programs] 0-vm-client-0: Using Program GlusterFS 3.3, Num (1298437), Version (330) [2016-12-14 11:08:29.211910] I [rpc-clnt.c:1947:rpc_clnt_reconfig] 0-vm- client-1: changing port to 49153 
(from 0) [2016-12-14 11:08:29.211980] I [rpc-clnt.c:1947:rpc_clnt_reconfig] 0-vm- client-2: changing port to 49153 (from 0) [2016-12-14 11:08:29.212227] I [MSGID: 114046] [client- handshake.c:1222:client_setvolume_cbk] 0-vm-client-0: Connected to vm- client-0, attached to remote volume '/gluster/vm/brick'. [2016-12-14 11:08:29.212239] I [MSGID: 114047] [client- handshake.c:1233:client_setvolume_cbk] 0-vm-client-0: Server and Client lk- version numbers are not same, reopening the fds [2016-12-14 11:08:29.212296] I [MSGID: 108005] [afr-common.c:4298:afr_notify] 0-vm-replicate-0: Subvolume 'vm-client-0' came back up; going online. [2016-12-14 11:08:29.212316] I [MSGID: 114035] [client- handshake.c:201:client_set_lk_version_cbk] 0-vm-client-0: Server lk version = 1 [2016-12-14 11:08:29.212426] I [MSGID: 114057] [client- handshake.c:1446:select_server_supported_programs] 0-vm-client-1: Using Program GlusterFS 3.3, Num (1298437), Version (330) [2016-12-14 11:08:29.212590] I [MSGID: 114057] [client- handshake.c:1446:select_server_supported_programs] 0-vm-client-2: Using Program GlusterFS 3.3, Num (1298437), Version (330) [2016-12-14 11:08:29.212874] I [MSGID: 114046] [client- handshake.c:1222:client_setvolume_cbk] 0-vm-client-2: Connected to vm- client-2, attached to remote volume '/gluster/vm/brick'. [2016-12-14 11:08:29.212886] I [MSGID: 114047] [client- handshake.c:1233:client_setvolume_cbk] 0-vm-client-2: Server and Client lk- version numbers are not same, reopening the fds [2016-12-14 11:08:29.212983] I [MSGID: 114046] [client- handshake.c:1222:client_setvolume_cbk] 0-vm-client-1: Connected to vm- client-1, attached to remote volume '/gluster/vm/brick'. [2016-12-14 11:08:29.212992] I [MSGID: 114047] [client- handshake.c:1233:client_setvolume_cbk] 0-vm-client-1: Server and Client lk- version numbers are not same, reopening the fds [2016-12-14 11:08:29.213042] I [MSGID: 114035] [client- handshake.c:201:client_set_lk_version_cbk] 0-vm-client-2: Server lk version = 1 [2016-12-14 11:08:29.227393] I [MSGID: 114035] [client- handshake.c:201:client_set_lk_version_cbk] 0-vm-client-1: Server lk version = 1 [2016-12-14 11:08:29.228239] I [MSGID: 108031] [afr- common.c:2068:afr_local_discovery_cbk] 0-vm-replicate-0: selecting local read_child vm-client-0 [2016-12-14 11:08:29.228832] I [MSGID: 104041] [glfs- resolve.c:885:__glfs_active_subvol] 0-vm: switched to graph 68767331-2e73-7276-2e62-62772e62652d (0) [2016-12-14 11:08:29.232505] W [MSGID: 114031] [client-rpc- fops.c:2210:client3_3_seek_cbk] 0-vm-client-0: remote operation failed [No such device or address] kvm: block/gluster.c:1182: find_allocation: Assertion `offs >= start' failed. **<\- This is when it crash** Aborted On Dec 14 2016, at 12:55 pm, Lindsay Mathieson wrote: > On 14/12/2016 8:51 PM, contact at makz.me wrote: > Today i've upgraded to 4.4, snapshot now works well but since i've updated, > all my windows VM are broken, when windows try to write on disk the vm crash > without error, even when i try to reinstall a windows, when i partition or > format the vm instantly crash. > > > > Hmmm - I upgraded to PVE 4.4 today (rolling upgrade, 3 nodes) and am using Gluster 3.8.4, all my windows VM's are running fine + snapshots ok. Sounds like you a gluster volume in trouble. > > What virtual disk interface are you using with the windows VM'? ide? virtio? scsi+virtio controller? 
> > can you post: > > gluster volume info > > gluster volume status > > gluster volume heal heal info > > \-- Lindsay Mathieson > > _______________________________________________ pve-user mailing list pve-user at pve.proxmox.com From bart at bizway.nl Wed Dec 14 13:06:57 2016 From: bart at bizway.nl (Bart Lageweg | Bizway) Date: Wed, 14 Dec 2016 12:06:57 +0000 Subject: [PVE-User] Windows & Gluster 3.8 In-Reply-To: References: <7fdbbhhrucscnmhs1y690a02z-0@mailer.nylas.com> Message-ID: <2539bf8026844fdc9426d2f3a55a27fb@bizway.nl> Moving VM image to local and start VM? (to start the VM en debug on gluster?) -----Oorspronkelijk bericht----- Van: pve-user [mailto:pve-user-bounces at pve.proxmox.com] Namens contact at makz.me Verzonden: woensdag 14 december 2016 13:06 Aan: PVE User List CC: PVE User List Onderwerp: Re: [PVE-User] Windows & Gluster 3.8 I've tried with ide / sata & scsi, for an existing VM it crash at boot screen (windows 7 logo), for new install it crash when i click next at the disk selector root at hvs1:/var/log/glusterfs# **gluster volume info** Volume Name: vm Type: Replicate Volume ID: 76ff16d4-7bd3-4070-b39d-8a173c6292c3 Status: Started Snapshot Count: 0 Number of Bricks: 1 x 3 = 3 Transport-type: tcp Bricks: Brick1: hvs1-gluster:/gluster/vm/brick Brick2: hvs2-gluster:/gluster/vm/brick Brick3: hvsquorum-gluster:/gluster/vm/brick root at hvs1:/var/log/glusterfs# **gluster volume status** Status of volume: vm Gluster process TCP Port RDMA Port Online Pid \----------------------------------------------------------------------------- - Brick hvs1-gluster:/gluster/vm/brick 49153 0 Y 3565 Brick hvs2-gluster:/gluster/vm/brick 49153 0 Y 3465 Brick hvsquorum-gluster:/gluster/vm/brick 49153 0 Y 2696 NFS Server on localhost N/A N/A N N/A Self-heal Daemon on localhost N/A N/A Y 3586 NFS Server on hvsquorum-gluster N/A N/A N N/A Self-heal Daemon on hvsquorum-gluster N/A N/A Y 4190 NFS Server on hvs2-gluster N/A N/A N N/A Self-heal Daemon on hvs2-gluster N/A N/A Y 6212 Task Status of Volume vm \----------------------------------------------------------------------------- - There are no active volume tasks root at hvs1:/var/log/glusterfs# **gluster volume heal vm info** Brick hvs1-gluster:/gluster/vm/brick Status: Connected Number of entries: 0 Brick hvs2-gluster:/gluster/vm/brick Status: Connected Number of entries: 0 Brick hvsquorum-gluster:/gluster/vm/brick Status: Connected Number of entries: 0 **Here's the kvm log** [2016-12-14 11:08:29.210641] I [MSGID: 104045] [glfs-master.c:91:notify] 0-gfapi: New graph 68767331-2e73-7276-2e62-62772e62652d (0) coming up [2016-12-14 11:08:29.210689] I [MSGID: 114020] [client.c:2356:notify] 0-vm- client-0: parent translators are ready, attempting connect on transport [2016-12-14 11:08:29.211020] I [MSGID: 114020] [client.c:2356:notify] 0-vm- client-1: parent translators are ready, attempting connect on transport [2016-12-14 11:08:29.211272] I [rpc-clnt.c:1947:rpc_clnt_reconfig] 0-vm- client-0: changing port to 49153 (from 0) [2016-12-14 11:08:29.211285] I [MSGID: 114020] [client.c:2356:notify] 0-vm- client-2: parent translators are ready, attempting connect on transport [2016-12-14 11:08:29.211838] I [MSGID: 114057] [client- handshake.c:1446:select_server_supported_programs] 0-vm-client-0: Using Program GlusterFS 3.3, Num (1298437), Version (330) [2016-12-14 11:08:29.211910] I [rpc-clnt.c:1947:rpc_clnt_reconfig] 0-vm- client-1: changing port to 49153 (from 0) [2016-12-14 11:08:29.211980] I [rpc-clnt.c:1947:rpc_clnt_reconfig] 0-vm- client-2: 
changing port to 49153 (from 0) [2016-12-14 11:08:29.212227] I [MSGID: 114046] [client- handshake.c:1222:client_setvolume_cbk] 0-vm-client-0: Connected to vm- client-0, attached to remote volume '/gluster/vm/brick'. [2016-12-14 11:08:29.212239] I [MSGID: 114047] [client- handshake.c:1233:client_setvolume_cbk] 0-vm-client-0: Server and Client lk- version numbers are not same, reopening the fds [2016-12-14 11:08:29.212296] I [MSGID: 108005] [afr-common.c:4298:afr_notify] 0-vm-replicate-0: Subvolume 'vm-client-0' came back up; going online. [2016-12-14 11:08:29.212316] I [MSGID: 114035] [client- handshake.c:201:client_set_lk_version_cbk] 0-vm-client-0: Server lk version = 1 [2016-12-14 11:08:29.212426] I [MSGID: 114057] [client- handshake.c:1446:select_server_supported_programs] 0-vm-client-1: Using Program GlusterFS 3.3, Num (1298437), Version (330) [2016-12-14 11:08:29.212590] I [MSGID: 114057] [client- handshake.c:1446:select_server_supported_programs] 0-vm-client-2: Using Program GlusterFS 3.3, Num (1298437), Version (330) [2016-12-14 11:08:29.212874] I [MSGID: 114046] [client- handshake.c:1222:client_setvolume_cbk] 0-vm-client-2: Connected to vm- client-2, attached to remote volume '/gluster/vm/brick'. [2016-12-14 11:08:29.212886] I [MSGID: 114047] [client- handshake.c:1233:client_setvolume_cbk] 0-vm-client-2: Server and Client lk- version numbers are not same, reopening the fds [2016-12-14 11:08:29.212983] I [MSGID: 114046] [client- handshake.c:1222:client_setvolume_cbk] 0-vm-client-1: Connected to vm- client-1, attached to remote volume '/gluster/vm/brick'. [2016-12-14 11:08:29.212992] I [MSGID: 114047] [client- handshake.c:1233:client_setvolume_cbk] 0-vm-client-1: Server and Client lk- version numbers are not same, reopening the fds [2016-12-14 11:08:29.213042] I [MSGID: 114035] [client- handshake.c:201:client_set_lk_version_cbk] 0-vm-client-2: Server lk version = 1 [2016-12-14 11:08:29.227393] I [MSGID: 114035] [client- handshake.c:201:client_set_lk_version_cbk] 0-vm-client-1: Server lk version = 1 [2016-12-14 11:08:29.228239] I [MSGID: 108031] [afr- common.c:2068:afr_local_discovery_cbk] 0-vm-replicate-0: selecting local read_child vm-client-0 [2016-12-14 11:08:29.228832] I [MSGID: 104041] [glfs- resolve.c:885:__glfs_active_subvol] 0-vm: switched to graph 68767331-2e73-7276-2e62-62772e62652d (0) [2016-12-14 11:08:29.232505] W [MSGID: 114031] [client-rpc- fops.c:2210:client3_3_seek_cbk] 0-vm-client-0: remote operation failed [No such device or address] kvm: block/gluster.c:1182: find_allocation: Assertion `offs >= start' failed. **<\- This is when it crash** Aborted On Dec 14 2016, at 12:55 pm, Lindsay Mathieson wrote: > On 14/12/2016 8:51 PM, contact at makz.me wrote: > Today i've upgraded to 4.4, snapshot now works well but since i've > updated, all my windows VM are broken, when windows try to write on > disk the vm crash without error, even when i try to reinstall a > windows, when i partition or format the vm instantly crash. > > > > Hmmm - I upgraded to PVE 4.4 today (rolling upgrade, 3 nodes) and am using Gluster 3.8.4, all my windows VM's are running fine + snapshots ok. Sounds like you a gluster volume in trouble. > > What virtual disk interface are you using with the windows VM'? ide? virtio? scsi+virtio controller? 
> > can you post: > > gluster volume info > > gluster volume status > > gluster volume heal heal info > > \-- Lindsay Mathieson > > _______________________________________________ pve-user mailing list pve-user at pve.proxmox.com _______________________________________________ pve-user mailing list pve-user at pve.proxmox.com http://pve.proxmox.com/cgi-bin/mailman/listinfo/pve-user From daniel at linux-nerd.de Wed Dec 14 13:21:42 2016 From: daniel at linux-nerd.de (Daniel) Date: Wed, 14 Dec 2016 13:21:42 +0100 Subject: [PVE-User] Ceph Disk Usage/Storage In-Reply-To: <58513450.6010608@proxmox.com> References: <25453F57-3042-4E13-BEBA-00F7BCD6810A@linux-nerd.de> <58513450.6010608@proxmox.com> Message-ID: Ahh ok, so when i setup 2/1 it means 2 Copies. right? > Am 14.12.2016 um 13:00 schrieb Wolfgang Link : > > You pool size is 3*400 GB so 1.2GB is correct, but your config say 3/2. > This means you have 3 copies of every pg(Placement Group) and min 2 pg > are needed to operate correct. > > This means if you write 1GB you lose 3GB of free storage. > > > On 12/14/2016 12:14 PM, Daniel wrote: >> Hi there, >> >> i created a Ceph File-System with 3x 400GB >> In my config i said 3/2 that means that one of that disks are only for an faulty issue (like raid5) >> 3 HDDs Max and 2 HDDs Minimum >> >> In my System-Overview u see that i have 1.2TB Free-Space which cant be correct. >> >> This is what the CLI command shows me: >> >> POOLS: >> NAME ID USED %USED MAX AVAIL OBJECTS >> ceph 2 0 0 441G 0 >> >> But as i understand it correctly max Avail GB must be round about 800GB >> >> Cheers >> >> Daniel >> >> >> >> _______________________________________________ >> pve-user mailing list >> pve-user at pve.proxmox.com >> http://pve.proxmox.com/cgi-bin/mailman/listinfo/pve-user >> > > _______________________________________________ > pve-user mailing list > pve-user at pve.proxmox.com > http://pve.proxmox.com/cgi-bin/mailman/listinfo/pve-user From lindsay.mathieson at gmail.com Wed Dec 14 13:25:56 2016 From: lindsay.mathieson at gmail.com (Lindsay Mathieson) Date: Wed, 14 Dec 2016 22:25:56 +1000 Subject: [PVE-User] Windows & Gluster 3.8 In-Reply-To: References: <7fdbbhhrucscnmhs1y690a02z-0@mailer.nylas.com> Message-ID: <75f102d4-6525-7fd0-d892-8da8bb610f18@gmail.com> When host VM's on gluster you should be setting the following: performance.readdir-ahead: on cluster.data-self-heal: on cluster.quorum-type: auto cluster.server-quorum-type: server performance.strict-write-ordering: off performance.stat-prefetch: on performance.quick-read: off performance.read-ahead: off performance.io-cache: off cluster.eager-lock: enable network.remote-dio: enable cluster.granular-entry-heal: yes cluster.locking-scheme: granular And possibly consider downgrading to 3.8.4? 
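In case it saves someone typing, those options can be applied to the volume from this thread (named 'vm') as below; already-running qemu/gfapi clients generally only pick changed options up after the guest is stopped and started again:

    vol=vm
    gluster volume set $vol performance.readdir-ahead on
    gluster volume set $vol cluster.data-self-heal on
    gluster volume set $vol cluster.quorum-type auto
    gluster volume set $vol cluster.server-quorum-type server
    gluster volume set $vol performance.strict-write-ordering off
    gluster volume set $vol performance.stat-prefetch on
    gluster volume set $vol performance.quick-read off
    gluster volume set $vol performance.read-ahead off
    gluster volume set $vol performance.io-cache off
    gluster volume set $vol cluster.eager-lock enable
    gluster volume set $vol network.remote-dio enable
    # on some releases granular entry heal is toggled via
    # "gluster volume heal vm granular-entry-heal enable" instead
    gluster volume set $vol cluster.granular-entry-heal yes
    gluster volume set $vol cluster.locking-scheme granular
    gluster volume info $vol    # the reconfigured options show up here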
On 14/12/2016 10:05 PM, contact at makz.me wrote: > I've tried with ide / sata & scsi, for an existing VM it crash at boot screen > (windows 7 logo), for new install it crash when i click next at the disk > selector > > > > root at hvs1:/var/log/glusterfs# **gluster volume info** > > Volume Name: vm > Type: Replicate > Volume ID: 76ff16d4-7bd3-4070-b39d-8a173c6292c3 > Status: Started > Snapshot Count: 0 > Number of Bricks: 1 x 3 = 3 > Transport-type: tcp > Bricks: > Brick1: hvs1-gluster:/gluster/vm/brick > Brick2: hvs2-gluster:/gluster/vm/brick > Brick3: hvsquorum-gluster:/gluster/vm/brick > root at hvs1:/var/log/glusterfs# **gluster volume status** > Status of volume: vm > Gluster process TCP Port RDMA Port Online Pid > \----------------------------------------------------------------------------- > - > Brick hvs1-gluster:/gluster/vm/brick 49153 0 Y 3565 > Brick hvs2-gluster:/gluster/vm/brick 49153 0 Y 3465 > Brick hvsquorum-gluster:/gluster/vm/brick 49153 0 Y 2696 > NFS Server on localhost N/A N/A N N/A > Self-heal Daemon on localhost N/A N/A Y 3586 > NFS Server on hvsquorum-gluster N/A N/A N N/A > Self-heal Daemon on hvsquorum-gluster N/A N/A Y 4190 > NFS Server on hvs2-gluster N/A N/A N N/A > Self-heal Daemon on hvs2-gluster N/A N/A Y 6212 > > Task Status of Volume vm > \----------------------------------------------------------------------------- > - > There are no active volume tasks > > root at hvs1:/var/log/glusterfs# **gluster volume heal vm info** > Brick hvs1-gluster:/gluster/vm/brick > Status: Connected > Number of entries: 0 > > Brick hvs2-gluster:/gluster/vm/brick > Status: Connected > Number of entries: 0 > > Brick hvsquorum-gluster:/gluster/vm/brick > Status: Connected > Number of entries: 0 > > > > **Here's the kvm log** > > > > [2016-12-14 11:08:29.210641] I [MSGID: 104045] [glfs-master.c:91:notify] > 0-gfapi: New graph 68767331-2e73-7276-2e62-62772e62652d (0) coming up > [2016-12-14 11:08:29.210689] I [MSGID: 114020] [client.c:2356:notify] 0-vm- > client-0: parent translators are ready, attempting connect on transport > [2016-12-14 11:08:29.211020] I [MSGID: 114020] [client.c:2356:notify] 0-vm- > client-1: parent translators are ready, attempting connect on transport > [2016-12-14 11:08:29.211272] I [rpc-clnt.c:1947:rpc_clnt_reconfig] 0-vm- > client-0: changing port to 49153 (from 0) > [2016-12-14 11:08:29.211285] I [MSGID: 114020] [client.c:2356:notify] 0-vm- > client-2: parent translators are ready, attempting connect on transport > [2016-12-14 11:08:29.211838] I [MSGID: 114057] [client- > handshake.c:1446:select_server_supported_programs] 0-vm-client-0: Using > Program GlusterFS 3.3, Num (1298437), Version (330) > [2016-12-14 11:08:29.211910] I [rpc-clnt.c:1947:rpc_clnt_reconfig] 0-vm- > client-1: changing port to 49153 (from 0) > [2016-12-14 11:08:29.211980] I [rpc-clnt.c:1947:rpc_clnt_reconfig] 0-vm- > client-2: changing port to 49153 (from 0) > [2016-12-14 11:08:29.212227] I [MSGID: 114046] [client- > handshake.c:1222:client_setvolume_cbk] 0-vm-client-0: Connected to vm- > client-0, attached to remote volume '/gluster/vm/brick'. > [2016-12-14 11:08:29.212239] I [MSGID: 114047] [client- > handshake.c:1233:client_setvolume_cbk] 0-vm-client-0: Server and Client lk- > version numbers are not same, reopening the fds > [2016-12-14 11:08:29.212296] I [MSGID: 108005] [afr-common.c:4298:afr_notify] > 0-vm-replicate-0: Subvolume 'vm-client-0' came back up; going online. 
> [2016-12-14 11:08:29.212316] I [MSGID: 114035] [client- > handshake.c:201:client_set_lk_version_cbk] 0-vm-client-0: Server lk version = > 1 > [2016-12-14 11:08:29.212426] I [MSGID: 114057] [client- > handshake.c:1446:select_server_supported_programs] 0-vm-client-1: Using > Program GlusterFS 3.3, Num (1298437), Version (330) > [2016-12-14 11:08:29.212590] I [MSGID: 114057] [client- > handshake.c:1446:select_server_supported_programs] 0-vm-client-2: Using > Program GlusterFS 3.3, Num (1298437), Version (330) > [2016-12-14 11:08:29.212874] I [MSGID: 114046] [client- > handshake.c:1222:client_setvolume_cbk] 0-vm-client-2: Connected to vm- > client-2, attached to remote volume '/gluster/vm/brick'. > [2016-12-14 11:08:29.212886] I [MSGID: 114047] [client- > handshake.c:1233:client_setvolume_cbk] 0-vm-client-2: Server and Client lk- > version numbers are not same, reopening the fds > [2016-12-14 11:08:29.212983] I [MSGID: 114046] [client- > handshake.c:1222:client_setvolume_cbk] 0-vm-client-1: Connected to vm- > client-1, attached to remote volume '/gluster/vm/brick'. > [2016-12-14 11:08:29.212992] I [MSGID: 114047] [client- > handshake.c:1233:client_setvolume_cbk] 0-vm-client-1: Server and Client lk- > version numbers are not same, reopening the fds > [2016-12-14 11:08:29.213042] I [MSGID: 114035] [client- > handshake.c:201:client_set_lk_version_cbk] 0-vm-client-2: Server lk version = > 1 > [2016-12-14 11:08:29.227393] I [MSGID: 114035] [client- > handshake.c:201:client_set_lk_version_cbk] 0-vm-client-1: Server lk version = > 1 > [2016-12-14 11:08:29.228239] I [MSGID: 108031] [afr- > common.c:2068:afr_local_discovery_cbk] 0-vm-replicate-0: selecting local > read_child vm-client-0 > [2016-12-14 11:08:29.228832] I [MSGID: 104041] [glfs- > resolve.c:885:__glfs_active_subvol] 0-vm: switched to graph > 68767331-2e73-7276-2e62-62772e62652d (0) > [2016-12-14 11:08:29.232505] W [MSGID: 114031] [client-rpc- > fops.c:2210:client3_3_seek_cbk] 0-vm-client-0: remote operation failed [No > such device or address] > kvm: block/gluster.c:1182: find_allocation: Assertion `offs >= start' failed. > **<\- This is when it crash** > Aborted > > > > > On Dec 14 2016, at 12:55 pm, Lindsay Mathieson > wrote: > >> On 14/12/2016 8:51 PM, contact at makz.me wrote: >> Today i've upgraded to 4.4, snapshot now works well but since i've updated, >> all my windows VM are broken, when windows try to write on disk the vm crash >> without error, even when i try to reinstall a windows, when i partition or >> format the vm instantly crash. >> >> >> Hmmm - I upgraded to PVE 4.4 today (rolling upgrade, 3 nodes) and am > using Gluster 3.8.4, all my windows VM's are running fine + snapshots > ok. Sounds like you a gluster volume in trouble. > >> > What virtual disk interface are you using with the windows VM'? ide? > virtio? scsi+virtio controller? 
> >> can you post: >> gluster volume info >> > gluster volume status > >> gluster volume heal heal info >> > \-- > Lindsay Mathieson > >> _______________________________________________ > pve-user mailing list > pve-user at pve.proxmox.com > > > _______________________________________________ > pve-user mailing list > pve-user at pve.proxmox.com > http://pve.proxmox.com/cgi-bin/mailman/listinfo/pve-user -- Lindsay Mathieson From w.bumiller at proxmox.com Thu Dec 15 09:23:30 2016 From: w.bumiller at proxmox.com (Wolfgang Bumiller) Date: Thu, 15 Dec 2016 09:23:30 +0100 (CET) Subject: [PVE-User] Windows & Gluster 3.8 In-Reply-To: <75f102d4-6525-7fd0-d892-8da8bb610f18@gmail.com> References: <7fdbbhhrucscnmhs1y690a02z-0@mailer.nylas.com> <75f102d4-6525-7fd0-d892-8da8bb610f18@gmail.com> Message-ID: <1041423255.15.1481790210328@webmail.proxmox.com> > On December 14, 2016 at 1:25 PM Lindsay Mathieson wrote: > > > When host VM's on gluster you should be setting the following: > > (...) > > And possibly consider downgrading to 3.8.4? Unfortunately I'll have to confirm that there are a few bugs in versions prior and after 3.8.4 which are easily triggered with qemu. Though I just saw 3.8.7 is available by now which should also contain the fixes. Seems to work in my local tests. Would be nice if some more people could test it. From lindsay.mathieson at gmail.com Thu Dec 15 09:52:34 2016 From: lindsay.mathieson at gmail.com (Lindsay Mathieson) Date: Thu, 15 Dec 2016 18:52:34 +1000 Subject: [PVE-User] Windows & Gluster 3.8 In-Reply-To: <1041423255.15.1481790210328@webmail.proxmox.com> References: <7fdbbhhrucscnmhs1y690a02z-0@mailer.nylas.com> <75f102d4-6525-7fd0-d892-8da8bb610f18@gmail.com> <1041423255.15.1481790210328@webmail.proxmox.com> Message-ID: On 15 December 2016 at 18:23, Wolfgang Bumiller wrote: > Though I just saw 3.8.7 is available by now which should also contain the > fixes. Seems to work in my local tests. Would be nice if some more people > could test it. This weekend probably -- Lindsay From d.csapak at proxmox.com Thu Dec 15 10:07:09 2016 From: d.csapak at proxmox.com (Dominik Csapak) Date: Thu, 15 Dec 2016 10:07:09 +0100 Subject: [PVE-User] Windows & Gluster 3.8 In-Reply-To: <1041423255.15.1481790210328@webmail.proxmox.com> References: <7fdbbhhrucscnmhs1y690a02z-0@mailer.nylas.com> <75f102d4-6525-7fd0-d892-8da8bb610f18@gmail.com> <1041423255.15.1481790210328@webmail.proxmox.com> Message-ID: <8f06f902-e76b-39ac-db12-29b9cc7b2439@proxmox.com> On 12/15/2016 09:23 AM, Wolfgang Bumiller wrote: >> On December 14, 2016 at 1:25 PM Lindsay Mathieson wrote: >> >> >> When host VM's on gluster you should be setting the following: >> >> (...) >> >> And possibly consider downgrading to 3.8.4? > > Unfortunately I'll have to confirm that there are a few bugs in > versions prior and after 3.8.4 which are easily triggered with qemu. > > Though I just saw 3.8.7 is available by now which should also contain the > fixes. Seems to work in my local tests. Would be nice if some more people > could test it. 
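One low-risk way to exercise the same qemu gluster driver outside a real guest is qemu-img over a libgfapi URL; the hostname 'hvs1-gluster' and volume 'vm' are taken from the earlier posts, and the image path must point to an existing directory on the volume:

    qemu-img create -f raw gluster://hvs1-gluster/vm/images/999/smoke.raw 1G
    qemu-img info -f raw gluster://hvs1-gluster/vm/images/999/smoke.raw
    # block-status query; this should go through the find_allocation path that asserted above
    qemu-img map -f raw gluster://hvs1-gluster/vm/images/999/smoke.raw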
i tested a bit here with
proxmox 4.4-1
glusterfs 3.8.7-1
bricks: 3x1
3 hosts

what i tested and worked:
vm create & install (debian 8) with qcow2
little usage (install some packages, copied some files)
snapshot and rollback
clone offline
linked clones of templates (also across hosts)

what did not work reliably:
online clone (the source vm sometime simply stops, have to investigate if this is another gluster or qemu bug)

what i did not test:
migration, raw and vmdk
different replicas

From IMMO.WETZEL at adtran.com Thu Dec 15 11:35:01 2016
From: IMMO.WETZEL at adtran.com (IMMO WETZEL)
Date: Thu, 15 Dec 2016 10:35:01 +0000
Subject: [PVE-User] fast way to get all vm names via pvesh?
Message-ID:

Hi

Thats my current script to get all vm names from the cluster. Afterwards I check the new name against the list to prevent errors.

#!/usr/bin/env bash
nodes=$(pvesh get /nodes/ 2>/dev/null | sed -n -E '/\"id\"/ s/.*:\s\"(.*)\".*/\1/p' | sed -n -E 's/node/nodes/p' )
vms=$(for i in $nodes ; do vms=$(pvesh get $i/qemu/ 2>/dev/null | sed -n -E '/vmid/ s/.*:\s(.*[^\s]).*/\/qemu\/\1/p') ; for q in $vms ; do echo $i$q ; done ; done )
for i in $vms ; do pvesh get $i/config 2>/dev/null | sed -n -E '/\"name\"/ s/.*:\s\"(.*[^\"])\".*/\1/p' | xargs echo "vm $i has name" ; done

But its so SLOOOOOWWWWW

Any better ideas ?

Mit freundlichen Grüßen / With kind regards
Immo Wetzel

From mark at tuxis.nl Thu Dec 15 11:41:36 2016
From: mark at tuxis.nl (Mark Schouten)
Date: Thu, 15 Dec 2016 11:41:36 +0100
Subject: [PVE-User] fast way to get all vm names via pvesh?
In-Reply-To:
Message-ID: <34779576-19318@kerio.tuxis.nl>

pve:/> get /cluster/resources -type=vm

Met vriendelijke groeten,

--
Kerio Operator in de Cloud
https://www.kerioindecloud.nl/ Mark Schouten | Tuxis Internet Engineering KvK:?61527076?| http://www.tuxis.nl/ T: 0318 200208 | info at tuxis.nl Van: IMMO WETZEL Aan: "pve-user at pve.proxmox.com" Verzonden: 15-12-2016 11:35 Onderwerp: [PVE-User] fast way to get all vm names via pvesh? Hi Thats my current script to get all vm names from the cluster. Afterwards I check the new name against the list to prevent errors. #!/usr/bin/env bash nodes=$(pvesh get ?/nodes/ 2>/dev/null ?| sed -n -E '/\"id\"/ s/.*:\s\"(.*)\".*/\1/p' | sed -n -E 's/node/nodes/p' ) vms=$(for i in $nodes ; do vms=$(pvesh get $i/qemu/ 2>/dev/null | sed -n -E '/vmid/ s/.*:\s(.*[^\s]).*/\/qemu\/\1/p') ; for q in $vms ; do echo $i$q ; done ?; done ) for i in $vms ; do pvesh get $i/config 2>/dev/null | sed -n -E '/\"name\"/ s/.*:\s\"(.*[^\"])\".*/\1/p' | xargs echo "vm $i has name" ?; done But its so SLOOOOOWWWWW Any better ideas ? Mit freundlichen Gr??en / With kind regards Immo Wetzel _______________________________________________ pve-user mailing list pve-user at pve.proxmox.com http://pve.proxmox.com/cgi-bin/mailman/listinfo/pve-user _______________________________________________ pve-user mailing list pve-user at pve.proxmox.com http://pve.proxmox.com/cgi-bin/mailman/listinfo/pve-user From dietmar at proxmox.com Thu Dec 15 11:52:24 2016 From: dietmar at proxmox.com (Dietmar Maurer) Date: Thu, 15 Dec 2016 11:52:24 +0100 (CET) Subject: [PVE-User] fast way to get all vm names via pvesh? In-Reply-To: References: Message-ID: <982687437.36.1481799144642@webmail.proxmox.com> > Thats my current script to get all vm names from the cluster. Afterwards I > check the new name against the list to prevent errors. > > #!/usr/bin/env bash > nodes=$(pvesh get /nodes/ 2>/dev/null | sed -n -E '/\"id\"/ > s/.*:\s\"(.*)\".*/\1/p' | sed -n -E 's/node/nodes/p' ) > vms=$(for i in $nodes ; do vms=$(pvesh get $i/qemu/ 2>/dev/null | sed -n -E > '/vmid/ s/.*:\s(.*[^\s]).*/\/qemu\/\1/p') ; for q in $vms ; do echo $i$q ; > done ; done ) > for i in $vms ; do pvesh get $i/config 2>/dev/null | sed -n -E '/\"name\"/ > s/.*:\s\"(.*[^\"])\".*/\1/p' | xargs echo "vm $i has name" ; done > > But its so SLOOOOOWWWWW > > Any better ideas ? Maybe the following helps? # pvesh get /cluster/resources --type vm From gaio at sv.lnf.it Thu Dec 15 12:48:53 2016 From: gaio at sv.lnf.it (Marco Gaiarin) Date: Thu, 15 Dec 2016 12:48:53 +0100 Subject: [PVE-User] Ceph: Some trouble creating OSD with journal on a sotware raid device... In-Reply-To: <20161013101318.GG3646@sv.lnf.it> References: <20161013101318.GG3646@sv.lnf.it> Message-ID: <20161215114853.GA8761@sv.lnf.it> Sorry, i came back to this topic because i've done some more tests. Seems that 'pveceph' tool have some trouble creating OSD with journal on ''nonstandard'' partition, for examply on a MD device. A command like: pveceph createosd /dev/sde --journal_dev /dev/md4 fail mysteriously (OSD are added to the cluster, out and down, but on the node even the service get not created, eg, there's nothing to restart). Disks get partitioned/formatted (/dev/sde, but also /dev/md4). If instead i create on the device a GPT partition, for example of type Linux, i can do: pveceph createosd /dev/sde --journal_dev /dev/md4p1 and creation of the OSD work flawlessy, with only a single warning emitted: WARNING:ceph-disk:Journal /dev/md4p1 was not prepared with ceph-disk. Symlinking directly. But journal work as expected. I make a note. Thanks. -- dott. 
Marco Gaiarin GNUPG Key ID: 240A3D66 Associazione ``La Nostra Famiglia'' http://www.lanostrafamiglia.it/ Polo FVG - Via della Bont?, 7 - 33078 - San Vito al Tagliamento (PN) marco.gaiarin(at)lanostrafamiglia.it t +39-0434-842711 f +39-0434-842797 Dona il 5 PER MILLE a LA NOSTRA FAMIGLIA! http://www.lanostrafamiglia.it/25/index.php/component/k2/item/123 (cf 00307430132, categoria ONLUS oppure RICERCA SANITARIA) From t.lamprecht at proxmox.com Thu Dec 15 13:33:30 2016 From: t.lamprecht at proxmox.com (Thomas Lamprecht) Date: Thu, 15 Dec 2016 13:33:30 +0100 Subject: [PVE-User] fast way to get all vm names via pvesh? In-Reply-To: References: Message-ID: On 12/15/2016 11:35 AM, IMMO WETZEL wrote: > Hi > > Thats my current script to get all vm names from the cluster. Afterwards I check the new name against the list to prevent errors. > > #!/usr/bin/env bash > nodes=$(pvesh get /nodes/ 2>/dev/null | sed -n -E '/\"id\"/ s/.*:\s\"(.*)\".*/\1/p' | sed -n -E 's/node/nodes/p' ) > vms=$(for i in $nodes ; do vms=$(pvesh get $i/qemu/ 2>/dev/null | sed -n -E '/vmid/ s/.*:\s(.*[^\s]).*/\/qemu\/\1/p') ; for q in $vms ; do echo $i$q ; done ; done ) > for i in $vms ; do pvesh get $i/config 2>/dev/null | sed -n -E '/\"name\"/ s/.*:\s\"(.*[^\"])\".*/\1/p' | xargs echo "vm $i has name" ; done > > But its so SLOOOOOWWWWW > > Any better ideas ? By factor 10 faster than a pvesh call would be the following "one liner": # shopt -s globstar; for i in /etc/pve/nodes/**/{qemu-server,lxc}/*; do grep -oP '(?<=name:)\s*(\S+)' "$i"; done shopt -s globstar is for '**' only and could be circumvented if necessary. From davel at upilab.com Thu Dec 15 13:36:45 2016 From: davel at upilab.com (David Lawley) Date: Thu, 15 Dec 2016 07:36:45 -0500 Subject: [PVE-User] fast way to get all vm names via pvesh? In-Reply-To: References: Message-ID: <8d87fe0d-15cc-3117-eb36-9dcf35468f6c@upilab.com> no more often than I need it, this works well for me. thanks for sharing On 12/15/2016 5:35 AM, IMMO WETZEL wrote: > #!/usr/bin/env bash > nodes=$(pvesh get /nodes/ 2>/dev/null | sed -n -E '/\"id\"/ s/.*:\s\"(.*)\".*/\1/p' | sed -n -E 's/node/nodes/p' ) > vms=$(for i in $nodes ; do vms=$(pvesh get $i/qemu/ 2>/dev/null | sed -n -E '/vmid/ s/.*:\s(.*[^\s]).*/\/qemu\/\1/p') ; for q in $vms ; do echo $i$q ; done ; done ) > for i in $vms ; do pvesh get $i/config 2>/dev/null | sed -n -E '/\"name\"/ s/.*:\s\"(.*[^\"])\".*/\1/p' | xargs echo "vm $i has name" ; done From yannis.milios at gmail.com Thu Dec 15 18:39:41 2016 From: yannis.milios at gmail.com (Yannis Milios) Date: Thu, 15 Dec 2016 17:39:41 +0000 Subject: [PVE-User] PVE+SPICE smartcard protocol redirection support? Message-ID: Hello, I'm sorry if this has been asked again in the past, but may I ask if PVE SPICE implementation supports smartcard passthrough and if yes how can be enabled? For usb card readers usb redirection works but for built in ones (laptops) smartcard protocol redirection is needed. I'm currently evaluating PVE + SPICE as a possible VDI solution and smartcard redirection is mandatory for the setup. Thanks, Yannis From bc at iptel.co Fri Dec 16 09:58:52 2016 From: bc at iptel.co (Brian ::) Date: Fri, 16 Dec 2016 08:58:52 +0000 Subject: [PVE-User] Ceph: Some trouble creating OSD with journal on a sotware raid device... In-Reply-To: <20161215114853.GA8761@sv.lnf.it> References: <20161013101318.GG3646@sv.lnf.it> <20161215114853.GA8761@sv.lnf.it> Message-ID: This is probably by design.. 
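For reference, the workaround quoted below comes down to putting a GPT label and a single partition on the MD device first, and then pointing the journal at that partition instead of the raw device. A minimal sketch using the device names from the quoted message (the parted calls are an assumption, the original does not say which partitioning tool was used; note that mklabel wipes whatever partition table is already on /dev/md4):

  # parted -s /dev/md4 mklabel gpt
  # parted -s /dev/md4 mkpart journal 0% 100%
  # pveceph createosd /dev/sde --journal_dev /dev/md4p1

The ceph-disk warning about the journal not being prepared with ceph-disk ("Symlinking directly") is expected with this approach, as noted below.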
On Thu, Dec 15, 2016 at 11:48 AM, Marco Gaiarin wrote: > > Sorry, i came back to this topic because i've done some more tests. > > Seems that 'pveceph' tool have some trouble creating OSD with journal > on ''nonstandard'' partition, for examply on a MD device. > > A command like: > > pveceph createosd /dev/sde --journal_dev /dev/md4 > > fail mysteriously (OSD are added to the cluster, out and down, but on > the node even the service get not created, eg, there's nothing to > restart). > Disks get partitioned/formatted (/dev/sde, but also /dev/md4). > > > If instead i create on the device a GPT partition, for example of type > Linux, i can do: > > pveceph createosd /dev/sde --journal_dev /dev/md4p1 > > and creation of the OSD work flawlessy, with only a single warning > emitted: > > WARNING:ceph-disk:Journal /dev/md4p1 was not prepared with ceph-disk. Symlinking directly. > > But journal work as expected. > > > I make a note. Thanks. > > -- > dott. Marco Gaiarin GNUPG Key ID: 240A3D66 > Associazione ``La Nostra Famiglia'' http://www.lanostrafamiglia.it/ > Polo FVG - Via della Bont?, 7 - 33078 - San Vito al Tagliamento (PN) > marco.gaiarin(at)lanostrafamiglia.it t +39-0434-842711 f +39-0434-842797 > > Dona il 5 PER MILLE a LA NOSTRA FAMIGLIA! > http://www.lanostrafamiglia.it/25/index.php/component/k2/item/123 > (cf 00307430132, categoria ONLUS oppure RICERCA SANITARIA) > _______________________________________________ > pve-user mailing list > pve-user at pve.proxmox.com > http://pve.proxmox.com/cgi-bin/mailman/listinfo/pve-user From gaio at sv.lnf.it Fri Dec 16 10:14:45 2016 From: gaio at sv.lnf.it (Marco Gaiarin) Date: Fri, 16 Dec 2016 10:14:45 +0100 Subject: [PVE-User] Again, Ceph: default timeout for osd? Message-ID: <20161216091445.GA4063@sv.lnf.it> I've done some tests in the past, but probably without noting that, because the test system was... a test system, so mostly offloaded. Yesterday i've had to reboot a ceph node, that was MON and with some OSD. I've set the flags: 2016-12-15 17:01:29.139923 mon.0 10.27.251.7:6789/0 1213541 : cluster [INF] HEALTH_WARN; nodown,noout flag(s) set and then i've reboot a node. 
Immediately a mon election start: 2016-12-15 17:02:55.923980 mon.3 10.27.251.11:6789/0 861 : cluster [INF] mon.2 calling new monitor election 2016-12-15 17:02:55.924373 mon.4 10.27.251.12:6789/0 932 : cluster [INF] mon.3 calling new monitor election 2016-12-15 17:02:55.935396 mon.2 10.27.251.9:6789/0 767 : cluster [INF] mon.4 calling new monitor election 2016-12-15 17:02:55.937804 mon.1 10.27.251.8:6789/0 1037 : cluster [INF] mon.1 calling new monitor election 2016-12-15 17:03:00.963259 mon.1 10.27.251.8:6789/0 1038 : cluster [INF] mon.1 at 1 won leader election with quorum 1,2,3,4 2016-12-15 17:03:00.974493 mon.1 10.27.251.8:6789/0 1039 : cluster [INF] HEALTH_WARN; nodown,noout flag(s) set; 1 mons down, quorum 1,2,3,4 1,4,2,3 2016-12-15 17:03:00.993133 mon.1 10.27.251.8:6789/0 1040 : cluster [INF] monmap e5: 5 mons at {0=10.27.251.7:6789/0,1=10.27.251.8:6789/0,2=10.27.251.11:6789/0,3=10.27.251.12:6789/0,4=10.27.251.9:6789/0} 2016-12-15 17:03:00.993751 mon.1 10.27.251.8:6789/0 1042 : cluster [INF] mdsmap e1: 0/0/0 up 2016-12-15 17:03:00.994296 mon.1 10.27.251.8:6789/0 1043 : cluster [INF] osdmap e1457: 10 osds: 10 up, 10 in but after that i've started to log row like that: 2016-12-15 17:03:19.951569 osd.8 10.27.251.9:6808/2444 77 : cluster [WRN] 2 slow requests, 2 included below; oldest blocked for > 30.707671 secs 2016-12-15 17:03:19.951577 osd.8 10.27.251.9:6808/2444 78 : cluster [WRN] slow request 30.707671 seconds old, received at 2016-12-15 17:02:49.243826: osd_op(client.7866004.0:21967625 rbd_data.21daf62ae8944a.0000000000000e0e [set-alloc-hint object_size 4194304 write_size 4194304,write 2002944~4096] 1.1a150784 ack+ondisk+write+known_if_redirected e1457) currently waiting for subops from 1 2016-12-15 17:03:19.951582 osd.8 10.27.251.9:6808/2444 79 : cluster [WRN] slow request 30.295238 seconds old, received at 2016-12-15 17:02:49.656259: osd_op(client.7865380.0:25563538 rbd_data.4384f22ae8944a.0000000000004347 [set-alloc-hint object_size 4194304 write_size 4194304,write 1953792~4096] 1.c2cdeca ack+ondisk+write+known_if_redirected e1457) currently waiting for subops from 0 2016-12-15 17:03:21.415662 mon.1 10.27.251.8:6789/0 1053 : cluster [INF] pgmap v3604380: 768 pgs: 768 active+clean; 984 GB data, 1964 GB used, 12932 GB / 14896 GB avail; 1336 B/s wr, 0 op/s until the node came back. In the time the server reboot, VMs get irresponsive, with load go sky high. Initially i've NOT noted that the mon election start immediately, but OSD where not marked out/down. So, after reading docs and logs, i've understood that: 1) clearly, ceph cannot mark an OSD down in miliseconds, so if an OSD go down, it is normal that io stalls until the system recognize that the osd is down and redirect access elsewhere. 2) setting the 'nodown,noout flag(s)' i ''lock'' not the ability of ceph to recognize an OSD down/out, but the effect of that (eg, rebalancing). 3) the default timeout of setting OSD out/down are, for me, absolutely far from a reasonable value. The example config (/usr/share/doc/ceph/sample.ceph.conf.gz) say: # The number of seconds Ceph waits before marking a Ceph OSD # Daemon "down" and "out" if it doesn't respond. # Type: 32-bit Integer # (Default: 300) ;mon osd down out interval = 300 # The grace period in seconds before declaring unresponsive Ceph OSD # Daemons "down". # Type: 32-bit Integer # (Default: 900) ;mon osd report timeout = 300 (the same in http://docs.ceph.com/docs/jewel/rados/configuration/mon-osd-interaction/). 
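As a side note, the values a running cluster actually uses can be read from a monitor's admin socket, and many of them can be changed at runtime without editing ceph.conf or restarting daemons. A rough sketch for a Jewel-era cluster like this one (the mon IDs here are numeric, as in the logs above); whether a given option really takes effect when injected depends on the option and the release:

  # ceph daemon mon.0 config show | grep -E 'mon_osd_(down_out_interval|report_timeout)'
  # ceph tell mon.* injectargs '--mon_osd_down_out_interval 300'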
So, i've to wait 5 minutes, at the best options, to get an OSD really marked down. In 5 miutes, the server get back from a reboot, so i've never had an OSD down... but with all the VMs in stall! Seems to me a totally unreasonable ''timeout''. I think a reasonable value could be 5-15 seconds, but i'm confused and so i'm seeking feedback. Thanks. -- dott. Marco Gaiarin GNUPG Key ID: 240A3D66 Associazione ``La Nostra Famiglia'' http://www.lanostrafamiglia.it/ Polo FVG - Via della Bont?, 7 - 33078 - San Vito al Tagliamento (PN) marco.gaiarin(at)lanostrafamiglia.it t +39-0434-842711 f +39-0434-842797 Dona il 5 PER MILLE a LA NOSTRA FAMIGLIA! http://www.lanostrafamiglia.it/25/index.php/component/k2/item/123 (cf 00307430132, categoria ONLUS oppure RICERCA SANITARIA) From aderumier at odiso.com Fri Dec 16 10:47:12 2016 From: aderumier at odiso.com (Alexandre DERUMIER) Date: Fri, 16 Dec 2016 10:47:12 +0100 (CET) Subject: [PVE-User] Again, Ceph: default timeout for osd? In-Reply-To: <20161216091445.GA4063@sv.lnf.it> References: <20161216091445.GA4063@sv.lnf.it> Message-ID: <14416315.4581108.1481881632874.JavaMail.zimbra@oxygem.tv> >>mon osd down out interval This is the time between when a monitor marks an OSD "down" (not currently serving data) and "out" (not considered *responsible* for data by the cluster). IO will resume once the OSD is down (assuming the PG has its minimum number of live replicas); it's just that data will be re-replicated to other nodes once an OSD is marked "out". osd should go down in around 30s max. (in this time, the cluster will be stale).. but not 5min. I think this can be tunable, don't remember the value. (in ceph kraken, they have done optimisation for this detection https://github.com/ceph/ceph/pull/8558) ----- Mail original ----- De: "Marco Gaiarin" ?: "proxmoxve" Envoy?: Vendredi 16 D?cembre 2016 10:14:45 Objet: [PVE-User] Again, Ceph: default timeout for osd? I've done some tests in the past, but probably without noting that, because the test system was... a test system, so mostly offloaded. Yesterday i've had to reboot a ceph node, that was MON and with some OSD. I've set the flags: 2016-12-15 17:01:29.139923 mon.0 10.27.251.7:6789/0 1213541 : cluster [INF] HEALTH_WARN; nodown,noout flag(s) set and then i've reboot a node. 
Immediately a mon election start: 2016-12-15 17:02:55.923980 mon.3 10.27.251.11:6789/0 861 : cluster [INF] mon.2 calling new monitor election 2016-12-15 17:02:55.924373 mon.4 10.27.251.12:6789/0 932 : cluster [INF] mon.3 calling new monitor election 2016-12-15 17:02:55.935396 mon.2 10.27.251.9:6789/0 767 : cluster [INF] mon.4 calling new monitor election 2016-12-15 17:02:55.937804 mon.1 10.27.251.8:6789/0 1037 : cluster [INF] mon.1 calling new monitor election 2016-12-15 17:03:00.963259 mon.1 10.27.251.8:6789/0 1038 : cluster [INF] mon.1 at 1 won leader election with quorum 1,2,3,4 2016-12-15 17:03:00.974493 mon.1 10.27.251.8:6789/0 1039 : cluster [INF] HEALTH_WARN; nodown,noout flag(s) set; 1 mons down, quorum 1,2,3,4 1,4,2,3 2016-12-15 17:03:00.993133 mon.1 10.27.251.8:6789/0 1040 : cluster [INF] monmap e5: 5 mons at {0=10.27.251.7:6789/0,1=10.27.251.8:6789/0,2=10.27.251.11:6789/0,3=10.27.251.12:6789/0,4=10.27.251.9:6789/0} 2016-12-15 17:03:00.993751 mon.1 10.27.251.8:6789/0 1042 : cluster [INF] mdsmap e1: 0/0/0 up 2016-12-15 17:03:00.994296 mon.1 10.27.251.8:6789/0 1043 : cluster [INF] osdmap e1457: 10 osds: 10 up, 10 in but after that i've started to log row like that: 2016-12-15 17:03:19.951569 osd.8 10.27.251.9:6808/2444 77 : cluster [WRN] 2 slow requests, 2 included below; oldest blocked for > 30.707671 secs 2016-12-15 17:03:19.951577 osd.8 10.27.251.9:6808/2444 78 : cluster [WRN] slow request 30.707671 seconds old, received at 2016-12-15 17:02:49.243826: osd_op(client.7866004.0:21967625 rbd_data.21daf62ae8944a.0000000000000e0e [set-alloc-hint object_size 4194304 write_size 4194304,write 2002944~4096] 1.1a150784 ack+ondisk+write+known_if_redirected e1457) currently waiting for subops from 1 2016-12-15 17:03:19.951582 osd.8 10.27.251.9:6808/2444 79 : cluster [WRN] slow request 30.295238 seconds old, received at 2016-12-15 17:02:49.656259: osd_op(client.7865380.0:25563538 rbd_data.4384f22ae8944a.0000000000004347 [set-alloc-hint object_size 4194304 write_size 4194304,write 1953792~4096] 1.c2cdeca ack+ondisk+write+known_if_redirected e1457) currently waiting for subops from 0 2016-12-15 17:03:21.415662 mon.1 10.27.251.8:6789/0 1053 : cluster [INF] pgmap v3604380: 768 pgs: 768 active+clean; 984 GB data, 1964 GB used, 12932 GB / 14896 GB avail; 1336 B/s wr, 0 op/s until the node came back. In the time the server reboot, VMs get irresponsive, with load go sky high. Initially i've NOT noted that the mon election start immediately, but OSD where not marked out/down. So, after reading docs and logs, i've understood that: 1) clearly, ceph cannot mark an OSD down in miliseconds, so if an OSD go down, it is normal that io stalls until the system recognize that the osd is down and redirect access elsewhere. 2) setting the 'nodown,noout flag(s)' i ''lock'' not the ability of ceph to recognize an OSD down/out, but the effect of that (eg, rebalancing). 3) the default timeout of setting OSD out/down are, for me, absolutely far from a reasonable value. The example config (/usr/share/doc/ceph/sample.ceph.conf.gz) say: # The number of seconds Ceph waits before marking a Ceph OSD # Daemon "down" and "out" if it doesn't respond. # Type: 32-bit Integer # (Default: 300) ;mon osd down out interval = 300 # The grace period in seconds before declaring unresponsive Ceph OSD # Daemons "down". # Type: 32-bit Integer # (Default: 900) ;mon osd report timeout = 300 (the same in http://docs.ceph.com/docs/jewel/rados/configuration/mon-osd-interaction/). 
So, i've to wait 5 minutes, at the best options, to get an OSD really marked down. In 5 miutes, the server get back from a reboot, so i've never had an OSD down... but with all the VMs in stall! Seems to me a totally unreasonable ''timeout''. I think a reasonable value could be 5-15 seconds, but i'm confused and so i'm seeking feedback. Thanks. -- dott. Marco Gaiarin GNUPG Key ID: 240A3D66 Associazione ``La Nostra Famiglia'' http://www.lanostrafamiglia.it/ Polo FVG - Via della Bont?, 7 - 33078 - San Vito al Tagliamento (PN) marco.gaiarin(at)lanostrafamiglia.it t +39-0434-842711 f +39-0434-842797 Dona il 5 PER MILLE a LA NOSTRA FAMIGLIA! http://www.lanostrafamiglia.it/25/index.php/component/k2/item/123 (cf 00307430132, categoria ONLUS oppure RICERCA SANITARIA) _______________________________________________ pve-user mailing list pve-user at pve.proxmox.com http://pve.proxmox.com/cgi-bin/mailman/listinfo/pve-user From contact at makz.me Fri Dec 16 11:03:14 2016 From: contact at makz.me (Maxence Sartiaux) Date: Fri, 16 Dec 2016 10:03:14 +0000 Subject: [PVE-User] Windows & Gluster 3.8 In-Reply-To: <1041423255.15.1481790210328@webmail.proxmox.com> References: <7fdbbhhrucscnmhs1y690a02z-0@mailer.nylas.com> <75f102d4-6525-7fd0-d892-8da8bb610f18@gmail.com> <1041423255.15.1481790210328@webmail.proxmox.com> Message-ID: <2h89jtmekq9xiur4r2l5p5lqz-0@mailer.nylas.com> Hello, I've found my problem, i don't know why, gluster started to store data on my arbiter brick and the parition was full. I've recreated the brick and now all VM run fine. Btw i've upgraded to 3.8.7, if i run some troubles, i'll keep you informed. Another little question, why in the log of kvm, i found traces of gluster client 3.3 ? [2016-12-14 11:08:29.212590] I [MSGID: 114057] [client- handshake.c:1446:select_server_supported_programs] 0-vm-client-2: Using Program GlusterFS 3.3, Num (1298437), Version (330) Thank you. On Dec 15 2016, at 9:23 am, Wolfgang Bumiller wrote: > > On December 14, 2016 at 1:25 PM Lindsay Mathieson wrote: > > > When host VM's on gluster you should be setting the following: > > (...) > > And possibly consider downgrading to 3.8.4? > > Unfortunately I'll have to confirm that there are a few bugs in versions prior and after 3.8.4 which are easily triggered with qemu. > > Though I just saw 3.8.7 is available by now which should also contain the fixes. Seems to work in my local tests. Would be nice if some more people could test it. > > _______________________________________________ pve-user mailing list pve-user at pve.proxmox.com From adocampo at dltec.net Fri Dec 16 12:33:11 2016 From: adocampo at dltec.net (Angel Docampo) Date: Fri, 16 Dec 2016 12:33:11 +0100 Subject: [PVE-User] Windows & Gluster 3.8 In-Reply-To: <2h89jtmekq9xiur4r2l5p5lqz-0@mailer.nylas.com> References: <7fdbbhhrucscnmhs1y690a02z-0@mailer.nylas.com> <75f102d4-6525-7fd0-d892-8da8bb610f18@gmail.com> <1041423255.15.1481790210328@webmail.proxmox.com> <2h89jtmekq9xiur4r2l5p5lqz-0@mailer.nylas.com> Message-ID: On 16/12/16 11:03, Maxence Sartiaux wrote: > [2016-12-14 11:08:29.212590] I [MSGID: 114057] [client- > handshake.c:1446:select_server_supported_programs] 0-vm-client-2: Using > Program GlusterFS 3.3, Num (1298437), Version (330) AFAIK, this is unrelevant, some developer put the version on the log and never was updated. -- *Angel Docampo * *Datalab Tecnologia, s.a.* Castillejos, 352 - 08025 Barcelona Tel. 93 476 69 14 - Ext: 114 Mob. 
670.299.381 From gaio at sv.lnf.it Fri Dec 16 12:47:31 2016 From: gaio at sv.lnf.it (Marco Gaiarin) Date: Fri, 16 Dec 2016 12:47:31 +0100 Subject: [PVE-User] Again, Ceph: default timeout for osd? In-Reply-To: <14416315.4581108.1481881632874.JavaMail.zimbra@oxygem.tv> References: <20161216091445.GA4063@sv.lnf.it> <14416315.4581108.1481881632874.JavaMail.zimbra@oxygem.tv> Message-ID: <20161216114731.GC4063@sv.lnf.it> Mandi! Alexandre DERUMIER In chel di` si favelave... > >>mon osd down out interval > This is the time between when a monitor marks an OSD "down" (not > currently serving data) and "out" (not considered *responsible* for > data by the cluster). IO will resume once the OSD is down (assuming > the PG has its minimum number of live replicas); it's just that data > will be re-replicated to other nodes once an OSD is marked "out". Seems clear to me. I try to make an example to be sure. If i set: mon osd report timeout = 15 mon osd down out interval = 300 happen: a) after 15 seconds, irresponsive OSD get 'down', so IO resume b) after 5 minutes, the OSD get marked 'out', and so rebalancing start. I've still a doubt. If i set 'ceph osd set nodown', simply i put the first timeout to 'never'? Explained as above, could be... and so it is my fault that i've set the 'nodown'... Wait... http://lists.ceph.com/pipermail/ceph-users-ceph.com/2015-June/002438.html ok, seems that 'noout' flag is the right thing to do. 'nodown' have to be used only in 'bouncing' situation. If simply i need to stop rebalancing, it suffices to set 'noout'. > osd should go down in around 30s max. (in this time, the cluster will be stale).. > but not 5min. My experience say no. And if the parameter is 'mon osd report timeout', also the docs say '300'. Seems to me a total unreasonable value... > (in ceph kraken, they have done optimisation for this detection > https://github.com/ceph/ceph/pull/8558) Interesting. This is not my case, anyway, because i've rebooted all the server. -- dott. Marco Gaiarin GNUPG Key ID: 240A3D66 Associazione ``La Nostra Famiglia'' http://www.lanostrafamiglia.it/ Polo FVG - Via della Bont?, 7 - 33078 - San Vito al Tagliamento (PN) marco.gaiarin(at)lanostrafamiglia.it t +39-0434-842711 f +39-0434-842797 Dona il 5 PER MILLE a LA NOSTRA FAMIGLIA! http://www.lanostrafamiglia.it/25/index.php/component/k2/item/123 (cf 00307430132, categoria ONLUS oppure RICERCA SANITARIA) From elacunza at binovo.es Fri Dec 16 12:49:06 2016 From: elacunza at binovo.es (Eneko Lacunza) Date: Fri, 16 Dec 2016 12:49:06 +0100 Subject: [PVE-User] Cross CPD HA Message-ID: Hi all, We are doing a preliminary study for a VMWare installation migration to Proxmox. Currently, customer has 2 CPDs in HA, so that if the main CPD goes down, all VMs are restarted in backup CPD. Storage is SAN and storage data is replicated using SAN capabilities. What can be done with Proxmox to get the same HA capabilities? Would it be better to have 2 independent clusters, one in each CPD, or a unique cluster with two failure domains? I have some worries about quorum for a single-cluster, but I don't think Proxmox has anything prepared for cross-cluster HA? Thanks Eneko -- Zuzendari Teknikoa / Director T?cnico Binovo IT Human Project, S.L. Telf. 943493611 943324914 Astigarraga bidea 2, planta 6 dcha., ofi. 3-2; 20180 Oiartzun (Gipuzkoa) www.binovo.es From jeff at palmerit.net Fri Dec 16 15:22:40 2016 From: jeff at palmerit.net (Jeff Palmer) Date: Fri, 16 Dec 2016 09:22:40 -0500 Subject: [PVE-User] Again, Ceph: default timeout for osd? 
In-Reply-To: <20161216114731.GC4063@sv.lnf.it> References: <20161216091445.GA4063@sv.lnf.it> <14416315.4581108.1481881632874.JavaMail.zimbra@oxygem.tv> <20161216114731.GC4063@sv.lnf.it> Message-ID: Marco, I think you've answered the nodown/noout question. As for the "total unreasonable value" for the default.. In my experience the "defaults" become the defaults in 1 of 2 primary ways. The upstream vendor (ceph in this case) has a default that they select based on their expected typical use case, and the downstream vendor didn't override it, OR the downstream changes it to match the typical expected use-case Either way, as with most things in the unix world, the defaults aren't for everyone, which is why you can tune them. If the defaults aren't suitable for you, feel free to change them in your environment. On Fri, Dec 16, 2016 at 6:47 AM, Marco Gaiarin wrote: > Mandi! Alexandre DERUMIER > In chel di` si favelave... > >> >>mon osd down out interval >> This is the time between when a monitor marks an OSD "down" (not >> currently serving data) and "out" (not considered *responsible* for >> data by the cluster). IO will resume once the OSD is down (assuming >> the PG has its minimum number of live replicas); it's just that data >> will be re-replicated to other nodes once an OSD is marked "out". > > Seems clear to me. I try to make an example to be sure. > > If i set: > mon osd report timeout = 15 > mon osd down out interval = 300 > > happen: > > a) after 15 seconds, irresponsive OSD get 'down', so IO resume > > b) after 5 minutes, the OSD get marked 'out', and so rebalancing > start. > > I've still a doubt. If i set 'ceph osd set nodown', simply i put the > first timeout to 'never'? Explained as above, could be... and so it is > my fault that i've set the 'nodown'... > > Wait... > http://lists.ceph.com/pipermail/ceph-users-ceph.com/2015-June/002438.html > > ok, seems that 'noout' flag is the right thing to do. 'nodown' have to > be used only in 'bouncing' situation. > > If simply i need to stop rebalancing, it suffices to set 'noout'. > > >> osd should go down in around 30s max. (in this time, the cluster will be stale).. >> but not 5min. > > My experience say no. And if the parameter is 'mon osd report timeout', > also the docs say '300'. > Seems to me a total unreasonable value... > > >> (in ceph kraken, they have done optimisation for this detection >> https://github.com/ceph/ceph/pull/8558) > > Interesting. This is not my case, anyway, because i've rebooted all the > server. > > -- > dott. Marco Gaiarin GNUPG Key ID: 240A3D66 > Associazione ``La Nostra Famiglia'' http://www.lanostrafamiglia.it/ > Polo FVG - Via della Bont?, 7 - 33078 - San Vito al Tagliamento (PN) > marco.gaiarin(at)lanostrafamiglia.it t +39-0434-842711 f +39-0434-842797 > > Dona il 5 PER MILLE a LA NOSTRA FAMIGLIA! > http://www.lanostrafamiglia.it/25/index.php/component/k2/item/123 > (cf 00307430132, categoria ONLUS oppure RICERCA SANITARIA) > _______________________________________________ > pve-user mailing list > pve-user at pve.proxmox.com > http://pve.proxmox.com/cgi-bin/mailman/listinfo/pve-user -- Jeff Palmer https://PalmerIT.net From tom at yoda.pw Sat Dec 17 23:07:48 2016 From: tom at yoda.pw (Tom) Date: Sat, 17 Dec 2016 22:07:48 +0000 Subject: [PVE-User] Cluster is functioning properly but showing all nodes as OFFLINE on web GUI Message-ID: Hello all, I'm sure you've all come across this issue before. 
I just managed to setup a Proxmox 4.4.1 cluster (pve-manager/4.4-1/eb2d6f1e (running kernel: 4.4.35-1-pve)) and everything seems to be working perfectly. There is quorum, all nodes see each other, they communicate fine (omping), the only storage is 'local' (/var/lib/vz) and no errors are being thrown (in syslog, etc). My problem comes to the web GUI, where all nodes show offline (see: http://i.imgur.com/OIpgX5w.png ). On the Datacenter -> Summary page, proxmox shows everything as okay (see: http://i.imgur.com/hp99UxA.png ) but the nodes as 'offline' and it reflects below (see: http://i.imgur.com/kSaYkLc.png ) VM's are running perfectly on the 'kappa' node as I had them all on there before, but as the nodes are showing offline I cannot create a new VM as it tells me the specified node is offline. If I navigate to the 'zeus' web panel, everything above is reflected apart from it showing the 'zeus' server IP address as apposed to the 'kappa' IP address (see: http://i.imgur.com/b9C0ZE6.png ) (note: both IP's start in 3 but they *are* different, just blurred) I've been through google, restarted pvestatd, pve-cluster, corosync, etc multiple times but no avail. Does anyone have any solutions/pointers? Thanks in advance. From dietmar at proxmox.com Sun Dec 18 09:02:36 2016 From: dietmar at proxmox.com (Dietmar Maurer) Date: Sun, 18 Dec 2016 09:02:36 +0100 (CET) Subject: [PVE-User] Cluster is functioning properly but showing all nodes as OFFLINE on web GUI In-Reply-To: References: Message-ID: <1928017076.1.1482048156987@webmail.proxmox.com> > Does anyone have any solutions/pointers? And "pvesm status" runs without any delay? # pvesm status Or is there a storage which hangs? From tom at yoda.pw Sun Dec 18 10:04:52 2016 From: tom at yoda.pw (Tom) Date: Sun, 18 Dec 2016 09:04:52 +0000 Subject: [PVE-User] Cluster is functioning properly but showing all nodes as OFFLINE on web GUI In-Reply-To: <1928017076.1.1482048156987@webmail.proxmox.com> References: <1928017076.1.1482048156987@webmail.proxmox.com> Message-ID: pvecm status runs fine showing everything is okay, and only storage thats there is the local /var/lib/vz Thanks On Sun, 18 Dec 2016 at 08:02, Dietmar Maurer wrote: > > Does anyone have any solutions/pointers? > > > > And "pvesm status" runs without any delay? > > > > # pvesm status > > > > Or is there a storage which hangs? > > > > _______________________________________________ > > pve-user mailing list > > pve-user at pve.proxmox.com > > http://pve.proxmox.com/cgi-bin/mailman/listinfo/pve-user > > From subchee at gmail.com Sun Dec 18 13:45:22 2016 From: subchee at gmail.com (Szabolcs F.) Date: Sun, 18 Dec 2016 13:45:22 +0100 Subject: [PVE-User] Cluster is functioning properly but showing all nodes as OFFLINE on web GUI In-Reply-To: References: <1928017076.1.1482048156987@webmail.proxmox.com> Message-ID: Hi, I've had a similar issue. Someone kindly suggested me to set the 'token' value to 4000 in the corosync.cnf. /etc/pve/corosync.conf totem { cluster_name: xxxxx config_version: 35 ip_version: ipv4 version: 2 token: 4000 interface { bindnetaddr: X.X.X.X ringnumber: 0 } } Then do this on all nodes: killall -9 corosync /etc/init.d/pve-cluster restart service pveproxy restart This solved the similar problem for me and my cluster of 12 nodes is working properly ever since. 
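One detail worth adding if you try that edit: /etc/pve/corosync.conf is the cluster-wide copy managed by pmxcfs, and as far as I know the change is only pushed out to /etc/corosync/corosync.conf on the nodes when config_version inside the totem { } block is incremented (35 to 36 in the snippet above). After bumping it and adding token: 4000, something like the following on each node should be equivalent to the killall/init.d sequence above; a sketch, not an official procedure:

  # systemctl restart corosync
  # systemctl restart pve-cluster
  # systemctl restart pveproxy pvestatd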
On Sun, Dec 18, 2016 at 10:04 AM, Tom wrote: > pvecm status runs fine showing everything is okay, and only storage thats > there is the local /var/lib/vz > > Thanks > > > On Sun, 18 Dec 2016 at 08:02, Dietmar Maurer wrote: > > > > Does anyone have any solutions/pointers? > > > > > > > > And "pvesm status" runs without any delay? > > > > > > > > # pvesm status > > > > > > > > Or is there a storage which hangs? > > > > > > > > _______________________________________________ > > > > pve-user mailing list > > > > pve-user at pve.proxmox.com > > > > http://pve.proxmox.com/cgi-bin/mailman/listinfo/pve-user > > > > > _______________________________________________ > pve-user mailing list > pve-user at pve.proxmox.com > http://pve.proxmox.com/cgi-bin/mailman/listinfo/pve-user > From dietmar at proxmox.com Sun Dec 18 13:59:26 2016 From: dietmar at proxmox.com (Dietmar Maurer) Date: Sun, 18 Dec 2016 13:59:26 +0100 (CET) Subject: [PVE-User] Cluster is functioning properly but showing all nodes as OFFLINE on web GUI In-Reply-To: References: <1928017076.1.1482048156987@webmail.proxmox.com> Message-ID: <882207347.9.1482065967297@webmail.proxmox.com> > On December 18, 2016 at 10:04 AM Tom wrote: > > > pvecm status runs fine showing everything is okay, and only storage thats > there is the local /var/lib/vz I asked for the output of # pvesm status Also, please make sure the system time is correct on all hosts. From dietmar at proxmox.com Sun Dec 18 14:04:51 2016 From: dietmar at proxmox.com (Dietmar Maurer) Date: Sun, 18 Dec 2016 14:04:51 +0100 (CET) Subject: [PVE-User] Cluster is functioning properly but showing all nodes as OFFLINE on web GUI In-Reply-To: References: <1928017076.1.1482048156987@webmail.proxmox.com> Message-ID: <189103426.11.1482066291386@webmail.proxmox.com> > I've had a similar issue. Someone kindly suggested me to set the 'token' > value to 4000 in the corosync.cnf. Tom already told us that the corosync cluster status is OK, so why do you think this would help? The cluster works already. From tom at yoda.pw Sun Dec 18 14:07:35 2016 From: tom at yoda.pw (Tom) Date: Sun, 18 Dec 2016 13:07:35 +0000 Subject: [PVE-User] Cluster is functioning properly but showing all nodes as OFFLINE on web GUI In-Reply-To: <882207347.9.1482065967297@webmail.proxmox.com> References: <1928017076.1.1482048156987@webmail.proxmox.com> <882207347.9.1482065967297@webmail.proxmox.com> Message-ID: Tried adding the token thing to the config, no change. ------- Apologies, outputs: root at kappa:~# pvesm status local dir 1 5660670480 3887284884 1488081380 72.82% root at zeus:~# pvesm status local dir 1 5660763656 26200924 5349253804 0.99% Date and time are fully synced via ntp. Thanks On Sun, 18 Dec 2016 at 12:59, Dietmar Maurer wrote: > > > > > > On December 18, 2016 at 10:04 AM Tom wrote: > > > > > > > > > pvecm status runs fine showing everything is okay, and only storage thats > > > there is the local /var/lib/vz > > > > I asked for the output of > > > > # pvesm status > > > > Also, please make sure the system time is correct on all hosts. 
> > > > _______________________________________________ > > pve-user mailing list > > pve-user at pve.proxmox.com > > http://pve.proxmox.com/cgi-bin/mailman/listinfo/pve-user > > From dietmar at proxmox.com Sun Dec 18 14:49:02 2016 From: dietmar at proxmox.com (Dietmar Maurer) Date: Sun, 18 Dec 2016 14:49:02 +0100 (CET) Subject: [PVE-User] Cluster is functioning properly but showing all nodes as OFFLINE on web GUI In-Reply-To: References: <1928017076.1.1482048156987@webmail.proxmox.com> <882207347.9.1482065967297@webmail.proxmox.com> Message-ID: <1782707044.18.1482068942602@webmail.proxmox.com> > Tried adding the token thing to the config, no change. Please revert that change. It makes no sense to fix things which are already working ;-) Is there any hint in /var/log/syslog? From tom at yoda.pw Sun Dec 18 15:10:06 2016 From: tom at yoda.pw (Tom) Date: Sun, 18 Dec 2016 14:10:06 +0000 Subject: [PVE-User] Cluster is functioning properly but showing all nodes as OFFLINE on web GUI In-Reply-To: <1782707044.18.1482068942602@webmail.proxmox.com> References: <1928017076.1.1482048156987@webmail.proxmox.com> <882207347.9.1482065967297@webmail.proxmox.com> <1782707044.18.1482068942602@webmail.proxmox.com> Message-ID: Change reverted. A friend pointed this out to me: Dec 18 13:53:53 kappa corosync[7047]: [VOTEQ ] flags: quorate: Yes Leaving: No WFA Status: No First: No Qdevice: No QdeviceAlive: No QdeviceCastVote: No QdeviceMasterWins: No Unsure though, I have pastebinned the syslog from zeus from startup to when the cluster tick goes to green below: https://paste.ee/p/ybuBv#op2dyxxtu9MPFRVTDFuQT02pS69K5iFb Thanks On 18 December 2016 at 13:49, Dietmar Maurer wrote: > > Tried adding the token thing to the config, no change. > > Please revert that change. It makes no sense to fix things which are > already working ;-) > > Is there any hint in /var/log/syslog? > > _______________________________________________ > pve-user mailing list > pve-user at pve.proxmox.com > http://pve.proxmox.com/cgi-bin/mailman/listinfo/pve-user > From dietmar at proxmox.com Sun Dec 18 15:36:15 2016 From: dietmar at proxmox.com (Dietmar Maurer) Date: Sun, 18 Dec 2016 15:36:15 +0100 (CET) Subject: [PVE-User] Cluster is functioning properly but showing all nodes as OFFLINE on web GUI In-Reply-To: References: <1928017076.1.1482048156987@webmail.proxmox.com> <882207347.9.1482065967297@webmail.proxmox.com> <1782707044.18.1482068942602@webmail.proxmox.com> Message-ID: <1404792058.20.1482071775574@webmail.proxmox.com> What is the output of # pveversion -v > On December 18, 2016 at 3:10 PM Tom wrote: > > > Change reverted. > > A friend pointed this out to me: Dec 18 13:53:53 kappa corosync[7047]: > [VOTEQ ] flags: quorate: Yes Leaving: No WFA Status: No First: No Qdevice: > No QdeviceAlive: No QdeviceCastVote: No QdeviceMasterWins: No From tom at yoda.pw Sun Dec 18 15:52:18 2016 From: tom at yoda.pw (Tom) Date: Sun, 18 Dec 2016 14:52:18 +0000 Subject: [PVE-User] Cluster is functioning properly but showing all nodes as OFFLINE on web GUI In-Reply-To: References: <1928017076.1.1482048156987@webmail.proxmox.com> <882207347.9.1482065967297@webmail.proxmox.com> <1782707044.18.1482068942602@webmail.proxmox.com> Message-ID: Alright, finally managed to fix it. The problem was to do with hosts, as I was using a local 10.10.10.x VPN (I am using two OVH nodes), because I cannot multicast over the public network. 
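The /etc/hosts fix described next boils down to making every node resolve the other nodes' names to their VPN addresses instead of the public ones, so that cluster traffic runs over the link where multicast actually works. A hypothetical sketch (the node names are the ones from this thread, the 10.10.10.x addresses are made up, only the subnet is mentioned above):

  10.10.10.1  kappa
  10.10.10.2  zeus

added to /etc/hosts on every node.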
By adding a few more entries in /etc/hosts (thanks to step 4 of this guide: https://forum.ovh.co.uk/showthread.php?7071-Poor-man-s-Proxmox-cluster-with-NAT ) and giving zeus a kick, they are now clustering perfectly and both showing online. To show it better, here's a screenshot from before: https://i.imgur.com/kSaYkLc.png - you can see the IP for kappa starts with 3 which is my public IPv4. Once fixed, it's now showing the following: http://i.imgur.com/Wbi7zqG.png which is now 100% working Thank you all for your guidance :) On 18 December 2016 at 14:10, Tom wrote: > Change reverted. > > A friend pointed this out to me: Dec 18 13:53:53 kappa corosync[7047]: > [VOTEQ ] flags: quorate: Yes Leaving: No WFA Status: No First: No Qdevice: > No QdeviceAlive: No QdeviceCastVote: No QdeviceMasterWins: No > > Unsure though, I have pastebinned the syslog from zeus from startup to > when the cluster tick goes to green below: > https://paste.ee/p/ybuBv#op2dyxxtu9MPFRVTDFuQT02pS69K5iFb > > Thanks > > > On 18 December 2016 at 13:49, Dietmar Maurer wrote: > >> > Tried adding the token thing to the config, no change. >> >> Please revert that change. It makes no sense to fix things which are >> already working ;-) >> >> Is there any hint in /var/log/syslog? >> >> _______________________________________________ >> pve-user mailing list >> pve-user at pve.proxmox.com >> http://pve.proxmox.com/cgi-bin/mailman/listinfo/pve-user >> > > From gaio at sv.lnf.it Mon Dec 19 12:25:53 2016 From: gaio at sv.lnf.it (Marco Gaiarin) Date: Mon, 19 Dec 2016 12:25:53 +0100 Subject: [PVE-User] Again, Ceph: default timeout for osd? In-Reply-To: References: <20161216091445.GA4063@sv.lnf.it> <14416315.4581108.1481881632874.JavaMail.zimbra@oxygem.tv> <20161216114731.GC4063@sv.lnf.it> Message-ID: <20161219112553.GN3745@sv.lnf.it> Mandi! Jeff Palmer In chel di` si favelave... > I think you've answered the nodown/noout question. I was not able to get it back, but i'm sure i've found docs, that seems to me valuable, that suggest to set both noout and nodown flags. Anyway, i've just wrote 100 times on the blackboard ?don't set the 'nodown' flag?... ;-) > As for the "total unreasonable value" for the default.. In my > experience the "defaults" become the defaults in 1 of 2 primary ways. It did not seem polemic. Simpy looking at the docs (link in my past email) seems to me that really default value for 'mon osd report timeout' is 300 seconds, that speaking of I/O seems to me really high. Or, at least, if there's a reason for a such default value, i'm not able to see it... Sorry, thanks. -- dott. Marco Gaiarin GNUPG Key ID: 240A3D66 Associazione ``La Nostra Famiglia'' http://www.lanostrafamiglia.it/ Polo FVG - Via della Bont?, 7 - 33078 - San Vito al Tagliamento (PN) marco.gaiarin(at)lanostrafamiglia.it t +39-0434-842711 f +39-0434-842797 Dona il 5 PER MILLE a LA NOSTRA FAMIGLIA! http://www.lanostrafamiglia.it/25/index.php/component/k2/item/123 (cf 00307430132, categoria ONLUS oppure RICERCA SANITARIA) From gaio at sv.lnf.it Mon Dec 19 12:51:23 2016 From: gaio at sv.lnf.it (Marco Gaiarin) Date: Mon, 19 Dec 2016 12:51:23 +0100 Subject: [PVE-User] Reload ceph.conf? Message-ID: <20161219115123.GO3745@sv.lnf.it> Still another ceph question, i suppose a simple one. AFAIK ceph-deploy is ''incompatible'' with PVE, meaning that PVE put ceph.conf in sync with all node, and so a tool like ceph-deploy is not needed. 
https://pve.proxmox.com/wiki/Ceph_Server#Why_do_we_need_a_new_command_line_tool_.28pveceph.29.3F But a thing i've not clear: if i modify /etc/ceph/ceph.conf (linked to /etc/pve/ceph.conf) there's some way to ''reload'' config file, apart of course restarting mons and osds? Thanks. -- dott. Marco Gaiarin GNUPG Key ID: 240A3D66 Associazione ``La Nostra Famiglia'' http://www.lanostrafamiglia.it/ Polo FVG - Via della Bont?, 7 - 33078 - San Vito al Tagliamento (PN) marco.gaiarin(at)lanostrafamiglia.it t +39-0434-842711 f +39-0434-842797 Dona il 5 PER MILLE a LA NOSTRA FAMIGLIA! http://www.lanostrafamiglia.it/25/index.php/component/k2/item/123 (cf 00307430132, categoria ONLUS oppure RICERCA SANITARIA) From lindsay.mathieson at gmail.com Mon Dec 19 14:43:22 2016 From: lindsay.mathieson at gmail.com (Lindsay Mathieson) Date: Mon, 19 Dec 2016 23:43:22 +1000 Subject: [PVE-User] Windows & Gluster 3.8 In-Reply-To: <2h89jtmekq9xiur4r2l5p5lqz-0@mailer.nylas.com> References: <7fdbbhhrucscnmhs1y690a02z-0@mailer.nylas.com> <75f102d4-6525-7fd0-d892-8da8bb610f18@gmail.com> <1041423255.15.1481790210328@webmail.proxmox.com> <2h89jtmekq9xiur4r2l5p5lqz-0@mailer.nylas.com> Message-ID: On 16/12/2016 8:03 PM, Maxence Sartiaux wrote: > Btw i've upgraded to 3.8.7, if i run some troubles, i'll keep you > informed. Yah, I've been running 3.8.7 since Sunday, no issues here either. Live Migrations, Snapshots etc all working. -- Lindsay Mathieson From e.kasper at proxmox.com Tue Dec 20 12:01:39 2016 From: e.kasper at proxmox.com (Emmanuel Kasper) Date: Tue, 20 Dec 2016 12:01:39 +0100 Subject: [PVE-User] Cross CPD HA In-Reply-To: References: Message-ID: <04ef353c-30ab-0426-8982-230c10550bc1@proxmox.com> On 12/16/2016 12:49 PM, Eneko Lacunza wrote: > Hi all, > > We are doing a preliminary study for a VMWare installation migration to > Proxmox. > > Currently, customer has 2 CPDs in HA, so that if the main CPD goes down, > all VMs are restarted in backup CPD. Storage is SAN and storage data is > replicated using SAN capabilities. > > What can be done with Proxmox to get the same HA capabilities? Would it > be better to have 2 independent clusters, one in each CPD, or a unique > cluster with two failure domains? > > I have some worries about quorum for a single-cluster, but I don't think > Proxmox has anything prepared for cross-cluster HA? > > Thanks > Eneko > Hi Eneko What is a CPD ? Is that a cluster ? 'VmWare CPD' in google informs me that I should not join it: https://www.glassdoor.com/Reviews/Employee-Review-VMware-RVW2481385.htm But I guess you mean something different. Emmanuel From gankarloo at gmail.com Tue Dec 20 12:28:23 2016 From: gankarloo at gmail.com (Gustaf Ankarloo) Date: Tue, 20 Dec 2016 12:28:23 +0100 Subject: [PVE-User] Cross CPD HA In-Reply-To: <04ef353c-30ab-0426-8982-230c10550bc1@proxmox.com> References: <04ef353c-30ab-0426-8982-230c10550bc1@proxmox.com> Message-ID: I guess he means Centro de Proceso de Datos (Spanish: Data Processing Center) On Dec 20, 2016 12:15, "Emmanuel Kasper" wrote: > On 12/16/2016 12:49 PM, Eneko Lacunza wrote: > > Hi all, > > > > We are doing a preliminary study for a VMWare installation migration to > > Proxmox. > > > > Currently, customer has 2 CPDs in HA, so that if the main CPD goes down, > > all VMs are restarted in backup CPD. Storage is SAN and storage data is > > replicated using SAN capabilities. > > > > What can be done with Proxmox to get the same HA capabilities? 
Would it > > be better to have 2 independent clusters, one in each CPD, or a unique > > cluster with two failure domains? > > > > I have some worries about quorum for a single-cluster, but I don't think > > Proxmox has anything prepared for cross-cluster HA? > > > > Thanks > > Eneko > > > > Hi Eneko > What is a CPD ? Is that a cluster ? > 'VmWare CPD' in google informs me that I should not join it: > https://www.glassdoor.com/Reviews/Employee-Review-VMware-RVW2481385.htm > > But I guess you mean something different. > > Emmanuel > > _______________________________________________ > pve-user mailing list > pve-user at pve.proxmox.com > http://pve.proxmox.com/cgi-bin/mailman/listinfo/pve-user > From elacunza at binovo.es Tue Dec 20 12:43:59 2016 From: elacunza at binovo.es (Eneko Lacunza) Date: Tue, 20 Dec 2016 12:43:59 +0100 Subject: [PVE-User] Cross datacenter HA In-Reply-To: References: <04ef353c-30ab-0426-8982-230c10550bc1@proxmox.com> Message-ID: <7c624781-b389-2d60-179b-c35da2e44495@binovo.es> Hi, Sure, I meant datacenter, sorry; I didn't realize the acronym was in spanish :) So let me rewrite the question :-) We are doing a preliminary study for a VMWare installation migration to Proxmox (14 hosts total, 7 in each datacenter). Currently, customer has 2 datacenters in HA, so that if the main datacenter goes down, all VMs are restarted in backup datacenter. Storage is SAN and storage data is replicated using SAN capabilities. What can be done with Proxmox to get the same HA capabilities? Would it be better to have 2 independent clusters, one in each datacenter, or a unique cluster with two failure domains? I have some worries about quorum for a single-cluster, but I don't think Proxmox has anything prepared for cross-cluster HA? Thanks a lot Eneko El 20/12/16 a las 12:28, Gustaf Ankarloo escribi?: > I guess he means Centro de Proceso de Datos (Spanish: Data Processing > Center) > > On Dec 20, 2016 12:15, "Emmanuel Kasper" wrote: > >> On 12/16/2016 12:49 PM, Eneko Lacunza wrote: >>> Hi all, >>> >>> We are doing a preliminary study for a VMWare installation migration to >>> Proxmox. >>> >>> Currently, customer has 2 CPDs in HA, so that if the main CPD goes down, >>> all VMs are restarted in backup CPD. Storage is SAN and storage data is >>> replicated using SAN capabilities. >>> >>> What can be done with Proxmox to get the same HA capabilities? Would it >>> be better to have 2 independent clusters, one in each CPD, or a unique >>> cluster with two failure domains? >>> >>> I have some worries about quorum for a single-cluster, but I don't think >>> Proxmox has anything prepared for cross-cluster HA? >>> >>> Thanks >>> Eneko >>> >> Hi Eneko >> What is a CPD ? Is that a cluster ? >> 'VmWare CPD' in google informs me that I should not join it: >> https://www.glassdoor.com/Reviews/Employee-Review-VMware-RVW2481385.htm >> >> But I guess you mean something different. >> >> Emmanuel >> >> _______________________________________________ >> pve-user mailing list >> pve-user at pve.proxmox.com >> http://pve.proxmox.com/cgi-bin/mailman/listinfo/pve-user >> > _______________________________________________ > pve-user mailing list > pve-user at pve.proxmox.com > http://pve.proxmox.com/cgi-bin/mailman/listinfo/pve-user -- Zuzendari Teknikoa / Director T?cnico Binovo IT Human Project, S.L. Telf. 943493611 943324914 Astigarraga bidea 2, planta 6 dcha., ofi. 
3-2; 20180 Oiartzun (Gipuzkoa) www.binovo.es From IMMO.WETZEL at adtran.com Tue Dec 20 13:21:52 2016 From: IMMO.WETZEL at adtran.com (IMMO WETZEL) Date: Tue, 20 Dec 2016 12:21:52 +0000 Subject: [PVE-User] pvesh set config description multiline - any hint ? Message-ID: How can I set multiline config descriptions ? root at node01:~# pvesh set /nodes/node04/qemu/315/config -description "line1\nline2" ? Description shows a n not a line break ? update VM 315: -description line1nline2 ? 200 OK root at node01:~# pvesh set /nodes/node04/qemu/315/config -description "line1 \n line2" ? 400 too many arguments root at node01:~# pvesh set /nodes/node04/qemu/315/config -description "line1 \\n line2" ? Description shows a \n not a line break ? update VM 315: -description line1\nline2 ? 200 OK Some with single quotes Mit freundlichen Gr??en / With kind regards Immo From t.lamprecht at proxmox.com Tue Dec 20 13:32:39 2016 From: t.lamprecht at proxmox.com (Thomas Lamprecht) Date: Tue, 20 Dec 2016 13:32:39 +0100 Subject: [PVE-User] pvesh set config description multiline - any hint ? In-Reply-To: References: Message-ID: <43994f4c-27fd-c011-a60b-497708f5f2f1@proxmox.com> On 12/20/2016 01:21 PM, IMMO WETZEL wrote: > How can I set multiline config descriptions ? > > root at node01:~# pvesh set /nodes/node04/qemu/315/config -description "line1\nline2" > > ? Description shows a n not a line break > > ? update VM 315: -description line1nline2 > > ? 200 OK > > > root at node01:~# pvesh set /nodes/node04/qemu/315/config -description "line1 \n line2" > > ? 400 too many arguments > > root at node01:~# pvesh set /nodes/node04/qemu/315/config -description "line1 \\n line2" > > ? Description shows a \n not a line break > > ? update VM 315: -description line1\nline2 > > ? 200 OK > > Some with single quotes The description parameter is url-encoded, see https://en.wikipedia.org/wiki/Percent-encoding pvesh set /nodes/node04/qemu/315/config -description 'line1%0D%0Aline2' produces your desired effect :) From f.gruenbichler at proxmox.com Tue Dec 20 14:09:00 2016 From: f.gruenbichler at proxmox.com (Fabian =?iso-8859-1?Q?Gr=FCnbichler?=) Date: Tue, 20 Dec 2016 14:09:00 +0100 Subject: [PVE-User] pvesh set config description multiline - any hint ? In-Reply-To: <43994f4c-27fd-c011-a60b-497708f5f2f1@proxmox.com> References: <43994f4c-27fd-c011-a60b-497708f5f2f1@proxmox.com> Message-ID: <20161220130900.cnhqfmzeyzbgpafj@nora.maurer-it.com> On Tue, Dec 20, 2016 at 01:32:39PM +0100, Thomas Lamprecht wrote: > On 12/20/2016 01:21 PM, IMMO WETZEL wrote: > > How can I set multiline config descriptions ? > > > > root at node01:~# pvesh set /nodes/node04/qemu/315/config -description "line1\nline2" > > > > ? Description shows a n not a line break > > > > ? update VM 315: -description line1nline2 > > > > ? 200 OK > > > > > > root at node01:~# pvesh set /nodes/node04/qemu/315/config -description "line1 \n line2" > > > > ? 400 too many arguments > > > > root at node01:~# pvesh set /nodes/node04/qemu/315/config -description "line1 \\n line2" > > > > ? Description shows a \n not a line break > > > > ? update VM 315: -description line1\nline2 > > > > ? 
200 OK > > > > Some with single quotes > > The description parameter is url-encoded, see > https://en.wikipedia.org/wiki/Percent-encoding > > pvesh set /nodes/node04/qemu/315/config -description 'line1%0D%0Aline2' > > produces your desired effect :) pvesh set /nodes/NODE/qemu/VMID/config -description 'line1 line2 line3' works as well (YMMV depending on your shell ;) ) From mjoigny at neteven.com Tue Dec 20 17:57:58 2016 From: mjoigny at neteven.com (Michael JOIGNY) Date: Tue, 20 Dec 2016 17:57:58 +0100 Subject: [PVE-User] Checking nfs mount point inside lxc container Message-ID: Hi everybody, I would like to write a bash script to check my nfs mount point (content) inside my lxc container. My nfs mount point are mounted on the host with fstab and all my lxc container have an entry like "mp0: /xxx,mp=/xxx" in their /etc/pve/lxc/[id].conf. I found that the files's container are located at /var/lib/vz/images/[id] on the host but the nfs folder (xxx) is empty while files are reachable from the container. My configuration : * Host : o pve-kernel 4.4.15-1-pve o lxcfs 2.0.5-pve1 * Container config file : o arch: i386 cpulimit: 4 cpuunits: 1024 hostname: xxx memory: 1024 mp0: /xxx,mp=/xxx nameserver: xxx net0: yyy net1: xxx onboot: 1 ostype: debian rootfs: local:221/vm-221-disk-2.subvol,size=0T searchdomain: yyy swap: 512 If you have an idea, please let me know. Kind regards. Michael. -- From e.kasper at proxmox.com Thu Dec 29 11:09:48 2016 From: e.kasper at proxmox.com (Emmanuel Kasper) Date: Thu, 29 Dec 2016 11:09:48 +0100 Subject: [PVE-User] New node not showing in datacenter In-Reply-To: <968ccf82-341f-afed-21f5-f87cc2e63228@coppint.com> References: <968ccf82-341f-afed-21f5-f87cc2e63228@coppint.com> Message-ID: <1ff11790-3207-1650-8849-710b15fec873@proxmox.com> On 12/29/2016 10:57 AM, Florent B wrote: > Hi everyone, > > Today I added a new node to my PVE cluster (every node up-to-date). > > When I connect to one of my 3 old nodes, the new node is not displayed > (at all, not in the list) in "datacenter" list in GUI. > > If I connect to new node GUI, all nodes are OK. > > "pvecm status" is OK everywhere, and "/etc/pve/.members" too. > > What could be the problem ? To rule out first a possible GUI problem: * does the following command executed on one of you "old" nodes return the whole list ? pvesh get cluster/config/nodes From e.kasper at proxmox.com Thu Dec 29 11:23:49 2016 From: e.kasper at proxmox.com (Emmanuel Kasper) Date: Thu, 29 Dec 2016 11:23:49 +0100 Subject: [PVE-User] Cross datacenter HA In-Reply-To: <7c624781-b389-2d60-179b-c35da2e44495@binovo.es> References: <04ef353c-30ab-0426-8982-230c10550bc1@proxmox.com> <7c624781-b389-2d60-179b-c35da2e44495@binovo.es> Message-ID: On 12/20/2016 12:43 PM, Eneko Lacunza wrote: > Hi, > > Sure, I meant datacenter, sorry; I didn't realize the acronym was in > spanish :) > > So let me rewrite the question :-) > > We are doing a preliminary study for a VMWare installation migration to > Proxmox (14 hosts total, 7 in each datacenter). > > Currently, customer has 2 datacenters in HA, so that if the main > datacenter goes down, all VMs are restarted in backup datacenter. > Storage is SAN and storage data is replicated using SAN capabilities. > > What can be done with Proxmox to get the same HA capabilities? Would it > be better to have 2 independent clusters, one in each datacenter, or a > unique cluster with two failure domains? 
From e.kasper at proxmox.com  Thu Dec 29 11:09:48 2016
From: e.kasper at proxmox.com (Emmanuel Kasper)
Date: Thu, 29 Dec 2016 11:09:48 +0100
Subject: [PVE-User] New node not showing in datacenter
In-Reply-To: <968ccf82-341f-afed-21f5-f87cc2e63228@coppint.com>
References: <968ccf82-341f-afed-21f5-f87cc2e63228@coppint.com>
Message-ID: <1ff11790-3207-1650-8849-710b15fec873@proxmox.com>

On 12/29/2016 10:57 AM, Florent B wrote:
> Hi everyone,
>
> Today I added a new node to my PVE cluster (every node up-to-date).
>
> When I connect to one of my 3 old nodes, the new node is not displayed
> (at all, not in the list) in the "datacenter" list in the GUI.
>
> If I connect to the new node's GUI, all nodes are OK.
>
> "pvecm status" is OK everywhere, and "/etc/pve/.members" too.
>
> What could be the problem ?

To rule out a possible GUI problem first:

* does the following command, executed on one of your "old" nodes, return
  the whole list?

pvesh get cluster/config/nodes

From e.kasper at proxmox.com  Thu Dec 29 11:23:49 2016
From: e.kasper at proxmox.com (Emmanuel Kasper)
Date: Thu, 29 Dec 2016 11:23:49 +0100
Subject: [PVE-User] Cross datacenter HA
In-Reply-To: <7c624781-b389-2d60-179b-c35da2e44495@binovo.es>
References: <04ef353c-30ab-0426-8982-230c10550bc1@proxmox.com> <7c624781-b389-2d60-179b-c35da2e44495@binovo.es>
Message-ID: 

On 12/20/2016 12:43 PM, Eneko Lacunza wrote:
> Hi,
>
> Sure, I meant datacenter, sorry; I didn't realize the acronym was in
> Spanish :)
>
> So let me rewrite the question :-)
>
> We are doing a preliminary study for a VMware installation migration to
> Proxmox (14 hosts total, 7 in each datacenter).
>
> Currently, the customer has 2 datacenters in HA, so that if the main
> datacenter goes down, all VMs are restarted in the backup datacenter.
> Storage is SAN, and storage data is replicated using SAN capabilities.
>
> What can be done with Proxmox to get the same HA capabilities? Would it
> be better to have 2 independent clusters, one in each datacenter, or a
> unique cluster with two failure domains?
>
> I have some worries about quorum for a single cluster, but I don't think
> Proxmox has anything prepared for cross-cluster HA?
>
> Thanks a lot
> Eneko

Hi Eneko

Proxmox clustering uses Corosync for the cluster connection, and corosync
needs a low-latency link between nodes, in the 5 ms range.
If you have a direct *highly reliable* fiber link between your two DCs /
CPDs, this might work. See this latency calculator:
http://wintelguy.com/wanlat.html
In that case you could build a multi-site cluster.

You could also set up HA between your two DCs at the application level,
for instance using a bunch of stateless application servers connecting
to a DB, where you only need to set up HA at the DB level (hint:
streaming replication if using PostgreSQL).
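[A rough way to sanity-check whether an inter-DC link is anywhere near that range is a long-running ping between the prospective cluster nodes. This is only a sketch, not from the thread: the addresses are placeholders, and ICMP round-trip time on the current path is just an approximation of what corosync would see.]

#!/bin/bash
# Sketch: measure average round-trip time to the nodes in the remote DC.
# Run from a node in the local DC; addresses below are placeholders.
for peer in 192.0.2.11 192.0.2.12; do
    echo "== $peer =="
    ping -c 50 -q "$peer" | tail -n 2   # summary: packet loss + rtt min/avg/max/mdev
done
# For a stretched Proxmox cluster the average should stay well below ~5 ms.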
From elacunza at binovo.es  Thu Dec 29 11:38:59 2016
From: elacunza at binovo.es (Eneko Lacunza)
Date: Thu, 29 Dec 2016 11:38:59 +0100
Subject: [PVE-User] Cross datacenter HA
In-Reply-To: 
References: <04ef353c-30ab-0426-8982-230c10550bc1@proxmox.com> <7c624781-b389-2d60-179b-c35da2e44495@binovo.es>
Message-ID: <4359fa04-4b56-0392-158c-56019bb20e9d@binovo.es>

Hi Emmanuel,

On 29/12/16 at 11:23, Emmanuel Kasper wrote:
> On 12/20/2016 12:43 PM, Eneko Lacunza wrote:
>> Hi,
>>
>> Sure, I meant datacenter, sorry; I didn't realize the acronym was in
>> Spanish :)
>>
>> So let me rewrite the question :-)
>>
>> We are doing a preliminary study for a VMware installation migration to
>> Proxmox (14 hosts total, 7 in each datacenter).
>>
>> Currently, the customer has 2 datacenters in HA, so that if the main
>> datacenter goes down, all VMs are restarted in the backup datacenter.
>> Storage is SAN, and storage data is replicated using SAN capabilities.
>>
>> What can be done with Proxmox to get the same HA capabilities? Would it
>> be better to have 2 independent clusters, one in each datacenter, or a
>> unique cluster with two failure domains?
>>
>> I have some worries about quorum for a single cluster, but I don't think
>> Proxmox has anything prepared for cross-cluster HA?
>>
>> Thanks a lot
>> Eneko
> Proxmox clustering uses Corosync for the cluster connection, and corosync
> needs a low-latency link between nodes, in the 5 ms range.
> If you have a direct *highly reliable* fiber link between your two DCs /
> CPDs, this might work. See this latency calculator:
> http://wintelguy.com/wanlat.html
> In that case you could build a multi-site cluster.
>
> You could also set up HA between your two DCs at the application level,
> for instance using a bunch of stateless application servers connecting
> to a DB, where you only need to set up HA at the DB level (hint:
> streaming replication if using PostgreSQL).

Thanks for the hints!

Eneko

--
Zuzendari Teknikoa / Director Técnico
Binovo IT Human Project, S.L.
Telf. 943493611 943324914
Astigarraga bidea 2, planta 6 dcha., ofi. 3-2; 20180 Oiartzun (Gipuzkoa)
www.binovo.es

From lemonnierk at ulrar.net  Thu Dec 29 11:46:04 2016
From: lemonnierk at ulrar.net (Kevin Lemonnier)
Date: Thu, 29 Dec 2016 11:46:04 +0100
Subject: [PVE-User] Cross datacenter HA
In-Reply-To: <7c624781-b389-2d60-179b-c35da2e44495@binovo.es>
References: <04ef353c-30ab-0426-8982-230c10550bc1@proxmox.com> <7c624781-b389-2d60-179b-c35da2e44495@binovo.es>
Message-ID: <20161229104604.GC24100@luwin.ulrar.net>

> What can be done with Proxmox to get the same HA capabilities? Would it
> be better to have 2 independent clusters, one in each datacenter, or a
> unique cluster with two failure domains?
>

We've done that for a while early this year; it works fine.
We had about 10 ms between the datacenters with no trouble at all.
We stopped because GlusterFS with 10 ms between the bricks is horribly,
horribly slow, but that's just an implementation detail; you don't seem
to be using it anyway.

--
Kevin Lemonnier
PGP Fingerprint : 89A5 2283 04A0 E6E9 0111

From mir at miras.org  Thu Dec 29 11:50:24 2016
From: mir at miras.org (Michael Rasmussen)
Date: Thu, 29 Dec 2016 11:50:24 +0100
Subject: [PVE-User] Cross datacenter HA
In-Reply-To: 
References: <04ef353c-30ab-0426-8982-230c10550bc1@proxmox.com> <7c624781-b389-2d60-179b-c35da2e44495@binovo.es>
Message-ID: <20161229115024.401ae790@sleipner.datanom.net>

On Thu, 29 Dec 2016 11:23:49 +0100 Emmanuel Kasper wrote:

> You could also set up HA between your two DCs at the application level,
> for instance using a bunch of stateless application servers connecting
> to a DB, where you only need to set up HA at the DB level (hint:
> streaming replication if using PostgreSQL).
>
A word of caution: if you use Postgres streaming replication in
synchronous mode, you must have at least 2 slaves, since a write in this
mode requires a minimum of 2 commits to finish. If you only have one
slave, your cluster will hang whenever the slave or the master is down,
which effectively means you have a single point of failure. Adding an
extra slave makes you resilient to one node being down.

--
Hilsen/Regards
Michael Rasmussen

Get my public GnuPG keys:
michael rasmussen cc
http://pgp.mit.edu:11371/pks/lookup?op=get&search=0xD3C9A00E
mir datanom net
http://pgp.mit.edu:11371/pks/lookup?op=get&search=0xE501F51C
mir miras org
http://pgp.mit.edu:11371/pks/lookup?op=get&search=0xE3E80917
--------------------------------------------------------------
/usr/games/fortune -es says:
Most of the fear that spoils our life comes from attacking difficulties
before we get to them.
                -- Dr. Frank Crane
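[Whether standbys are actually attached, and which one is currently synchronous, can be checked on the master. This is only a sketch, not from the thread: it assumes a PostgreSQL 9.1+ master and that it is run as a role allowed to read pg_stat_replication (e.g. postgres).]

#!/bin/bash
# Sketch: list the standbys the master currently sees and their sync state.
psql -x -c "SELECT application_name, client_addr, state, sync_state
            FROM pg_stat_replication;"
# With synchronous replication and the setup described above, you would
# expect at least two rows, one of them with sync_state = 'sync'.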
From jfranco at maila.net.br  Thu Dec 29 12:41:47 2016
From: jfranco at maila.net.br (Jean R. Franco)
Date: Thu, 29 Dec 2016 11:41:47 +0000 (UTC)
Subject: [PVE-User] New node not showing in datacenter
In-Reply-To: <1ff11790-3207-1650-8849-710b15fec873@proxmox.com>
References: <968ccf82-341f-afed-21f5-f87cc2e63228@coppint.com> <1ff11790-3207-1650-8849-710b15fec873@proxmox.com>
Message-ID: <2006641882.587496.1483011707870.JavaMail.zimbra@maila.net.br>

Hi,

I'm having the same problem; the commands below all return fine.
It only affects the GUI on the old nodes; if I log in on the new node it
works. I rebooted the whole cluster, but that didn't fix it.

It only happens with the newest version:

proxmox-ve: 4.4-76 (running kernel: 4.4.35-1-pve)
pve-manager: 4.4-1 (running version: 4.4-1/eb2d6f1e)
pve-kernel-4.4.35-1-pve: 4.4.35-76
pve-kernel-3.19.8-1-pve: 3.19.8-3
lvm2: 2.02.116-pve3
corosync-pve: 2.4.0-1
libqb0: 1.0-1
pve-cluster: 4.0-48
qemu-server: 4.0-101
pve-firmware: 1.1-10
libpve-common-perl: 4.0-83
libpve-access-control: 4.0-19
libpve-storage-perl: 4.0-70
pve-libspice-server1: 0.12.8-1
vncterm: 1.2-1
pve-docs: 4.4-1
pve-qemu-kvm: 2.7.0-9
pve-container: 1.0-88
pve-firewall: 2.0-33
pve-ha-manager: 1.0-38
ksm-control-daemon: 1.2-1
glusterfs-client: 3.5.2-2+deb8u2
lxc-pve: 2.0.6-2
lxcfs: 2.0.5-pve1
criu: 1.6.0-1
novnc-pve: 0.5-8
smartmontools: 6.5+svn4324-1~pve80
zfsutils: 0.6.5.8-pve13~bpo80

Thanks,

----- Original Message -----
From: "Emmanuel Kasper"
To: "PVE User List"
Sent: Thursday, 29 December 2016 8:09:48
Subject: Re: [PVE-User] New node not showing in datacenter

On 12/29/2016 10:57 AM, Florent B wrote:
> Hi everyone,
>
> Today I added a new node to my PVE cluster (every node up-to-date).
>
> When I connect to one of my 3 old nodes, the new node is not displayed
> (at all, not in the list) in the "datacenter" list in the GUI.
>
> If I connect to the new node's GUI, all nodes are OK.
>
> "pvecm status" is OK everywhere, and "/etc/pve/.members" too.
>
> What could be the problem ?

To rule out a possible GUI problem first:

* does the following command, executed on one of your "old" nodes, return
  the whole list?

pvesh get cluster/config/nodes

_______________________________________________
pve-user mailing list
pve-user at pve.proxmox.com
http://pve.proxmox.com/cgi-bin/mailman/listinfo/pve-user

From elacunza at binovo.es  Thu Dec 29 14:55:40 2016
From: elacunza at binovo.es (Eneko Lacunza)
Date: Thu, 29 Dec 2016 14:55:40 +0100
Subject: [PVE-User] Cross datacenter HA
In-Reply-To: <20161229104604.GC24100@luwin.ulrar.net>
References: <04ef353c-30ab-0426-8982-230c10550bc1@proxmox.com> <7c624781-b389-2d60-179b-c35da2e44495@binovo.es> <20161229104604.GC24100@luwin.ulrar.net>
Message-ID: <075323dd-32b2-25c3-62a6-afc938ef3e00@binovo.es>

Thanks for the report :)

On 29/12/16 at 11:46, Kevin Lemonnier wrote:
>> What can be done with Proxmox to get the same HA capabilities? Would it
>> be better to have 2 independent clusters, one in each datacenter, or a
>> unique cluster with two failure domains?
>>
> We've done that for a while early this year; it works fine.
> We had about 10 ms between the datacenters with no trouble at all.
> We stopped because GlusterFS with 10 ms between the bricks is horribly,
> horribly slow, but that's just an implementation detail; you don't seem
> to be using it anyway.
>
> _______________________________________________
> pve-user mailing list
> pve-user at pve.proxmox.com
> http://pve.proxmox.com/cgi-bin/mailman/listinfo/pve-user

--
Zuzendari Teknikoa / Director Técnico
Binovo IT Human Project, S.L.
Telf. 943493611 943324914
Astigarraga bidea 2, planta 6 dcha., ofi. 3-2; 20180 Oiartzun (Gipuzkoa)
www.binovo.es