From yannick.ml at palanque.name Thu Nov 1 16:05:16 2018
From: yannick.ml at palanque.name (Yannick Palanque)
Date: Thu, 1 Nov 2018 16:05:16 +0100
Subject: [PVE-User] View firewall rule with pvesh
Message-ID: <20181101160516.0c875ece@kafka>

Hello,

I run PVE 5.2. I hope it isn't a silly question, but I searched for quite a while.

I want to view a firewall rule with the API and pvesh.
With
pvesh get /nodes/toto/qemu/107/firewall/rules/
I can see that I have 6 firewall rules.
I can easily delete a rule with
pvesh delete /nodes/toto/qemu/107/firewall/rules/7

I thought that I could also view a particular rule, but I can't get more information than:

pvesh get /nodes/toto/qemu/107/firewall/rules/5
┌─────┬───────┐
│ key │ value │
├─────┼───────┤
│ pos │ 5     │
└─────┴───────┘

The API viewer still describes this call as "Get single rule data".
Is there a way to view the settings for a particular rule?

And a remark: it's written that pvesh "can be run interactively" (entering a PVE shell), but it doesn't work ("ERROR: no command specified").

Yannick

From dietmar at proxmox.com Thu Nov 1 18:26:47 2018
From: dietmar at proxmox.com (Dietmar Maurer)
Date: Thu, 1 Nov 2018 18:26:47 +0100 (CET)
Subject: [PVE-User] View firewall rule with pvesh
In-Reply-To: <20181101160516.0c875ece@kafka>
References: <20181101160516.0c875ece@kafka>
Message-ID: <1037631417.5.1541093208517@webmail.proxmox.com>

> I thought that I could also view a particular rule, but I can't get more
> information than:
> pvesh get /nodes/toto/qemu/107/firewall/rules/5
> ┌─────┬───────┐
> │ key │ value │
> ├─────┼───────┤
> │ pos │ 5     │
> └─────┴───────┘
>
> The API viewer still describes this call as "Get single rule data".
> Is there a way to view the settings for a particular rule?

The default formatter does not show all values (bug), but you can get the full information with:

# pvesh get /nodes/toto/qemu/107/firewall/rules/5 --output-format json

(you can use output format 'json', 'yaml', or 'json-pretty')

From uwe.sauter.de at gmail.com Tue Nov 6 12:56:23 2018
From: uwe.sauter.de at gmail.com (Uwe Sauter)
Date: Tue, 6 Nov 2018 12:56:23 +0100
Subject: [PVE-User] Quick question regarding node removal
Message-ID: <03c23533-dbb1-a61e-57d0-41398a016157@gmail.com>

Hi,

in the documentation to pvecm [1] it says:

At this point you must power off hp4 and make sure that it will not power on again (in the network) as it is.

Important:
As said above, it is critical to power off the node before removal, and make sure that it will never power on again (in the existing cluster network) as it is. If you power on the node as it is, your cluster will be screwed up and it could be difficult to restore a clean cluster state.

Am I right to assume that this is due to the configuration on the node which is to be removed? If I reinstall that node I can reuse hostname and IP addresses?

Thanks,

Uwe

[1] https://pve.proxmox.com/pve-docs/chapter-pvecm.html#_remove_a_cluster_node

From t.lamprecht at proxmox.com Tue Nov 6 14:50:02 2018
From: t.lamprecht at proxmox.com (Thomas Lamprecht)
Date: Tue, 6 Nov 2018 14:50:02 +0100
Subject: [PVE-User] Quick question regarding node removal
In-Reply-To: <03c23533-dbb1-a61e-57d0-41398a016157@gmail.com>
References: <03c23533-dbb1-a61e-57d0-41398a016157@gmail.com>
Message-ID: <1d6c0e1a-3b91-da40-2c9f-9d4f7ee05712@proxmox.com>

Hi,

On 11/6/18 12:56 PM, Uwe Sauter wrote:
> Hi,
>
> in the documentation to pvecm [1] it says:
>
>
> At this point you must power off hp4 and make sure that it will not power on again (in the network) as it is.
> Important: > As said above, it is critical to power off the node before removal, and make sure that it will never power on again (in the > existing cluster network) as it is. If you power on the node as it is, your cluster will be screwed up and it could be difficult > to restore a clean cluster state. > > > Am I right to assume that this is due to the configuration on the node which is to be removed? If I reinstall that node I can > reuse hostname and IP addresses? Yes, exactly. It's more for the reason that the removed node still thinks that it is part of the cluster and has still access to the cluster communication (through the '/etc/corosync/authkey'). So re-installing works fine. You could also separate it without re-installing, see: https://pve.proxmox.com/pve-docs/chapter-pvecm.html#pvecm_separate_node_without_reinstall Here I recommend that you test this first (if the target is anything production related) - e.g. in a virtual PVE cluster (PVE in VMs). cheers, Thomas From gilberto.nunes32 at gmail.com Tue Nov 6 15:03:00 2018 From: gilberto.nunes32 at gmail.com (Gilberto Nunes) Date: Tue, 6 Nov 2018 12:03:00 -0200 Subject: [PVE-User] NIC invertion after reboot Message-ID: Hi there... I am using this in /etc/default/grub: net.ifnames=0 and biosdevname=0 in order to use eth0, instead eno1, and so on... Today the server was rebooted and after that occur a invertion of the NIC... So vmbr3 which supposed to be eth2, turns vmbr4, something like that, because I wasn't present in the place, when this event happened... My question is if this invertion ( or swaping ) are regard the grub line inserted... Thanks a lot --- Gilberto Nunes Ferreira (47) 3025-5907 (47) 99676-7530 - Whatsapp / Telegram Skype: gilberto.nunes36 From a.antreich at proxmox.com Tue Nov 6 15:25:04 2018 From: a.antreich at proxmox.com (Alwin Antreich) Date: Tue, 6 Nov 2018 15:25:04 +0100 Subject: [PVE-User] NIC invertion after reboot In-Reply-To: References: Message-ID: <20181106142504.752xduo2wybfsgsz@dona.proxmox.com> Hi Gilberto, On Tue, Nov 06, 2018 at 12:03:00PM -0200, Gilberto Nunes wrote: > Hi there... > I am using this in /etc/default/grub: > net.ifnames=0 and biosdevname=0 > in order to use eth0, instead eno1, and so on... > Today the server was rebooted and after that occur a invertion of the > NIC... > So vmbr3 which supposed to be eth2, turns vmbr4, something like that, > because I wasn't present in the place, when this event happened... > My question is if this invertion ( or swaping ) are regard the grub line > inserted... > Thanks a lot Yes, exactly. This sets the old naming behaviour that is, naming the first device visible first. This is why there was the change to a more predictable naming scheme. -- Cheers, Alwin From luiscoralle at fi.uncoma.edu.ar Tue Nov 6 20:24:09 2018 From: luiscoralle at fi.uncoma.edu.ar (Luis G. Coralle) Date: Tue, 6 Nov 2018 16:24:09 -0300 Subject: [PVE-User] NIC invertion after reboot In-Reply-To: References: Message-ID: Hi, you can set if names in this file: root at pve1:~# cat /etc/udev/rules.d/70-persistent-net.rules El mar., 6 de nov. de 2018 a la(s) 11:04, Gilberto Nunes ( gilberto.nunes32 at gmail.com) escribi?: > Hi there... > I am using this in /etc/default/grub: > net.ifnames=0 and biosdevname=0 > in order to use eth0, instead eno1, and so on... > Today the server was rebooted and after that occur a invertion of the > NIC... > So vmbr3 which supposed to be eth2, turns vmbr4, something like that, > because I wasn't present in the place, when this event happened... 
> My question is if this invertion ( or swaping ) are regard the grub line > inserted... > Thanks a lot > > --- > Gilberto Nunes Ferreira > > (47) 3025-5907 > (47) 99676-7530 - Whatsapp / Telegram > > Skype: gilberto.nunes36 > _______________________________________________ > pve-user mailing list > pve-user at pve.proxmox.com > https://pve.proxmox.com/cgi-bin/mailman/listinfo/pve-user > -- Luis G. Coralle Secretar?a de TIC Facultad de Inform?tica Universidad Nacional del Comahue (+54) 299-4490300 Int 647 From sir_Misiek1 at o2.pl Wed Nov 7 15:41:45 2018 From: sir_Misiek1 at o2.pl (lord_Niedzwiedz) Date: Wed, 7 Nov 2018 15:41:45 +0100 Subject: [PVE-User] NVMe - RAID Z - Proxmox In-Reply-To: <786a619c-d1d7-498d-2b2b-6ba659aaffd6@o2.pl> References: <3b730159-aa65-8177-474b-9d711b5eb906@o2.pl> <20181029152931.zmqmqkmrdri2m625@dona.proxmox.com> <786a619c-d1d7-498d-2b2b-6ba659aaffd6@o2.pl> Message-ID: ??? ??? Good morning, After a longer (unfortunately) fight, I was able to install on your Proxmox server on RAIDZ from 1 to 3. To do this I had to change some settings in the BIOS (the system did not want to start without the "legacy" option). https://help.komandor.pl/Wymiana/iKVM_capture.jpg https://help.komandor.pl/Wymiana/iKVM_capture1.jpg https://help.komandor.pl/Wymiana/iKVM_capture2.jpg And manually edit the MBR on each of the disks. root at gandalf8:~# sfdisk -d /dev/nvme0n1 ??? ??? (drives from nvme0n1 to nvme5n1) label: gpt label-id: AD03123E-3D5D-4FD2-A7F3-9B6247F88CEA device: / dev / nvme0n1 unit: sectors first-lba: 34??? ??? (this must be necesary set) last-lba: 976773134 /dev/nvme0n1p1: start = 34, size = 2014, type = 21686148-6449-6E6F-744E-656564454649, uuid = CA6B9D71-BFEC-4EC2-8DB5-BB79B426D20D cfdisk - the first sector must begin: ??? ??? Start 34 End 2047 Example for disk 500GB. 10007K??? ?? ?? ????? 2014S??? ??? ??? ??? ??? ????? BIOS boot 465.8G??? ??? ??? ??? 976754702S??? ??? ??? ??? Solaris / usr & Apple ZFS 8M ??? ??? ??? ??? ??? ? 16385S??? ??? ??? ??? ??? ??? Solaris reserved 1 And so for 5 drives. Without this, the proxmox crashed at the installation every time. W dniu 31.10.2018 o?15:24, lord_Niedzwiedz pisze: > I upgrade bios/firmware server motherboard Supermicro. > I set everything in the BIOS to "legacy" mode. > Not only in the boot menu. > Supermicro -> Bios -> Advence -> PCIe/PCI/Pnp Configuration > (everything on legacy). > > Weird, because in my PC (processor Ryzen), everything is on UEFI > (windows 10 and Fedora works perfecly). > > Its work now. > But.... > > 1) On sysrescue-cd one nvme disk speed = 2600MB/s. > RAID 5/6? = max 3600MB/s (on 4-5 drives). > Why not N*2600 - 2600 MB/s? ??!! > > *2) *I create RAID 1??? or??? RAID 10.??? It works.* > **But Proxmox is displayed ***a message * with RAID Z1-2.** > **https://help.komandor.pl/Wymiana/iKVM_capture.jpg** > * > 3) I make install Proxmox on one m2 disk (lvm) - boot system. > I have 5 disks. > I can of course install proxmox on 1 disk (or on raid 1, two disks). > The question is how to add other disks? > Is it worth creating a RAID Z from the other 3-4 disks? > What configuration would you recommend? > >>>>> Im trying to install Proxmox on 4 NVMe drives. >>>>> One on the motherboard, two on the PCIe. >>>>> >>>>> Proxmox see everything at the installation. >>>>> I give the option zfs (RAIDZ-1). >>>>> >>>>> And I get a mistake error at the end. >>>>> "unable to create zfs root pool" >>>> GRUB is not yet working with ZFS on EFI. 
>>>> Try to switch to legacy boot in
>>>> BIOS if possible or use LVM for the installation.
>>>>
>>>>> Attached pictures (1-5) .jpg.
>>>>> https://help.komandor.pl/Wymiana/1.jpg
>>>>> https://help.komandor.pl/Wymiana/2.jpg
>>>>> https://help.komandor.pl/Wymiana/3.jpg
>>>>> https://help.komandor.pl/Wymiana/4.jpg
>>>>> https://help.komandor.pl/Wymiana/5.jpg
>>>>>
>>
>
> _______________________________________________
> pve-user mailing list
> pve-user at pve.proxmox.com
> https://pve.proxmox.com/cgi-bin/mailman/listinfo/pve-user

From sir_Misiek1 at o2.pl Wed Nov 7 15:47:51 2018
From: sir_Misiek1 at o2.pl (lord_Niedzwiedz)
Date: Wed, 7 Nov 2018 15:47:51 +0100
Subject: [PVE-User] NVMe - RAID faster than 4GB/s
In-Reply-To:
References: <3b730159-aa65-8177-474b-9d711b5eb906@o2.pl> <20181029152931.zmqmqkmrdri2m625@dona.proxmox.com> <786a619c-d1d7-498d-2b2b-6ba659aaffd6@o2.pl>
Message-ID: <223c162c-d7ba-5a3a-12fc-12a2c3dcd82e@o2.pl>

How can I get an AMD processor to do RAID faster than 4GB/s? Hardware? Software?
One disk works at 3GB/s. I have 5.

From sir_Misiek1 at o2.pl Wed Nov 7 15:51:37 2018
From: sir_Misiek1 at o2.pl (lord_Niedzwiedz)
Date: Wed, 7 Nov 2018 15:51:37 +0100
Subject: [PVE-User] hdparm and fio in ProxMox raidZ
In-Reply-To: <223c162c-d7ba-5a3a-12fc-12a2c3dcd82e@o2.pl>
References: <3b730159-aa65-8177-474b-9d711b5eb906@o2.pl> <20181029152931.zmqmqkmrdri2m625@dona.proxmox.com> <786a619c-d1d7-498d-2b2b-6ba659aaffd6@o2.pl> <223c162c-d7ba-5a3a-12fc-12a2c3dcd82e@o2.pl>
Message-ID:

I have a RAIDZ partition in Proxmox. How can I test its speed?
With LVM I do:

hdparm -tT /dev/nvme0n1
hdparm -tT /dev/mapper/pve-root
fio --filename=/dev/mapper/pve-root --direct=1 --rw=read --bs=1m --size=2G --numjobs=200 --runtime=60 --group_reporting --name=file1

But with RAIDZ I don't have /dev/mapper/pve-root.

From uwe.sauter.de at gmail.com Wed Nov 7 21:01:09 2018
From: uwe.sauter.de at gmail.com (Uwe Sauter)
Date: Wed, 7 Nov 2018 21:01:09 +0100
Subject: [PVE-User] Request for backport of Ceph bugfix from 12.2.9
Message-ID:

Hi,

I'm trying to manually migrate VM images with snapshots from pool "vms" to pool "vdisks" but it fails:

# rbd export --export-format 2 vms/vm-102-disk-2 - | rbd import --export-format 2 - vdisks/vm-102-disk-2
rbd: import header failed.
rbd: import failed: (22) Invalid argument
Exporting image: 0% complete...failed.
rbd: export error: (32) Broken pipe

This is a bug in 12.2.8 [1] and has been fixed in this PR [2].

Would it be possible to get this backported as it is not recommended to upgrade to 12.2.9?

Regards,

Uwe

[1] http://tracker.ceph.com/issues/34536
[2] https://github.com/ceph/ceph/pull/23835

From raul.alonso at tpartner.net Thu Nov 8 10:31:20 2018
From: raul.alonso at tpartner.net (Raul Alonso)
Date: Thu, 8 Nov 2018 10:31:20 +0100
Subject: [PVE-User] IO delay 3-5% VMs freezed
Message-ID: <0f1801d47745$ced632e0$6c8298a0$@tpartner.net>

Hello,

I have a 3-node cluster, PVE version pve-manager/5.1-41/0b958203 (running kernel: 4.13.13-2-pve) with ceph. Each node has 2x 300GB SAS (RAID 1) for the system and 3x 600GB SAS 15krpm disks for ceph; ceph has been created on a 10Gb network card.

The OSD config is:

osd journal size = 5120
osd pool default min size = 2
osd pool default size = 3

We are observing that sometimes the IO delay goes up to 3-5%; in these cases the VMs are blocked for a few seconds.

Any idea why the IO delay rises and the VMs block?

Regards,

Raul.
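For stalls like the ones described above, a handful of read-only commands usually shows whether the freezes coincide with high OSD commit/apply latency or with one saturated disk. This is only a generic sketch for a Luminous-era cluster, not a confirmed diagnosis; iostat comes from the sysstat package, which may not be installed by default:

# ceph -s
# ceph health detail
# ceph osd perf           (per-OSD commit/apply latency)
# ceph osd pool stats     (client and recovery I/O per pool)
# iostat -x 5             (run on each node to spot a saturated OSD or journal disk)

If the latency spikes line up with deep scrubs or recovery traffic, the scrub and recovery tunables (e.g. osd_max_scrubs, osd_recovery_max_active) are worth reviewing.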
From a.antreich at proxmox.com Thu Nov 8 13:43:24 2018 From: a.antreich at proxmox.com (Alwin Antreich) Date: Thu, 8 Nov 2018 13:43:24 +0100 Subject: [PVE-User] Request for backport of Ceph bugfix from 12.2.9 In-Reply-To: References: Message-ID: <20181108124324.4pnvhbngaqcsuaks@dona.proxmox.com> Hello Uwe, On Wed, Nov 07, 2018 at 09:01:09PM +0100, Uwe Sauter wrote: > Hi, > > I'm trying to manually migrate VM images with snapshots from pool "vms" to pool "vdisks" but it fails: > > # rbd export --export-format 2 vms/vm-102-disk-2 - | rbd import --export-format 2 - vdisks/vm-102-disk-2 > rbd: import header failed. > rbd: import failed: (22) Invalid argument > Exporting image: 0% complete...failed. > rbd: export error: (32) Broken pipe > > This is a bug in 12.2.8 [1] and has been fixed in this PR [2]. > > Would it be possible to get this backported as it is not recommended to upgrade to 12.2.9? Possible yes, but it looks like that the Ceph version 12.2.10 may be soon released, including this fix. https://www.spinics.net/lists/ceph-users/msg49112.html For now I would wait with backporting, as we would need to test a backported 12.2.8 as well as we will with a new 12.2.10. > > > Regards, > > Uwe > > > [1] http://tracker.ceph.com/issues/34536 > [2] https://github.com/ceph/ceph/pull/23835 -- Cheers, Alwin From t.lamprecht at proxmox.com Thu Nov 8 16:38:34 2018 From: t.lamprecht at proxmox.com (Thomas Lamprecht) Date: Thu, 8 Nov 2018 16:38:34 +0100 Subject: [PVE-User] Request for backport of Ceph bugfix from 12.2.9 In-Reply-To: <20181108124324.4pnvhbngaqcsuaks@dona.proxmox.com> References: <20181108124324.4pnvhbngaqcsuaks@dona.proxmox.com> Message-ID: <670d2729-695c-d90f-2e06-7568f0b2b7d7@proxmox.com> On 11/8/18 1:43 PM, Alwin Antreich wrote: > On Wed, Nov 07, 2018 at 09:01:09PM +0100, Uwe Sauter wrote: >> This is a bug in 12.2.8 [1] and has been fixed in this PR [2]. >> >> Would it be possible to get this backported as it is not recommended to upgrade to 12.2.9? > Possible yes, but it looks like that the Ceph version 12.2.10 may be > soon released, including this fix. > https://www.spinics.net/lists/ceph-users/msg49112.html > > For now I would wait with backporting, as we would need to test a > backported 12.2.8 as well as we will with a new 12.2.10. This is a minimal proposed change which looks just right, so much testing may not be needed, i.e., just the change part once - at least if it applies cleanly. But as a ceph rollout is quite a bit of work besides that I agree with Alwin that it probably makes sense to wait at the soon arriving 12.2.10 cheers, Thomas From uwe.sauter.de at gmail.com Thu Nov 8 17:32:21 2018 From: uwe.sauter.de at gmail.com (Uwe Sauter) Date: Thu, 8 Nov 2018 17:32:21 +0100 Subject: [PVE-User] Request for backport of Ceph bugfix from 12.2.9 In-Reply-To: <670d2729-695c-d90f-2e06-7568f0b2b7d7@proxmox.com> References: <20181108124324.4pnvhbngaqcsuaks@dona.proxmox.com> <670d2729-695c-d90f-2e06-7568f0b2b7d7@proxmox.com> Message-ID: Hi all, thanks for looking into this. With help from the ceph-users list I was able to migrate my images. So no need anymore. Best, Uwe Am 08.11.18 um 16:38 schrieb Thomas Lamprecht: > On 11/8/18 1:43 PM, Alwin Antreich wrote: >> On Wed, Nov 07, 2018 at 09:01:09PM +0100, Uwe Sauter wrote: >>> This is a bug in 12.2.8 [1] and has been fixed in this PR [2]. >>> >>> Would it be possible to get this backported as it is not recommended to upgrade to 12.2.9? 
>> Possible yes, but it looks like that the Ceph version 12.2.10 may be
>> soon released, including this fix.
>> https://www.spinics.net/lists/ceph-users/msg49112.html
>>
>> For now I would wait with backporting, as we would need to test a
>> backported 12.2.8 as well as we will with a new 12.2.10.
>
> This is a minimal proposed change which looks just right, so much testing
> may not be needed, i.e., just the change part once - at least if it
> applies cleanly.
>
> But as a ceph rollout is quite a bit of work besides that I agree with
> Alwin that it probably makes sense to wait at the soon arriving 12.2.10
>
> cheers,
> Thomas
>

From gilberto.nunes32 at gmail.com Thu Nov 8 18:22:48 2018
From: gilberto.nunes32 at gmail.com (Gilberto Nunes)
Date: Thu, 8 Nov 2018 15:22:48 -0200
Subject: [PVE-User] LVM issue
Message-ID:

Hi there

I have some problem with LVM here.
The disk on the server has 1.6 TB, but I see just 16GB ?

proxmox01:~# lvs
  LV   VG  Attr       LSize  Pool Origin Data%  Meta%  Move Log Cpy%Sync Convert
  data pve twi-a-tz--  1.49t             0.00   0.04
  root pve -wi-ao---- 96.00g
  swap pve -wi-ao----  8.00g

proxmox01:~# pvs
  PV        VG  Fmt  Attr PSize PFree
  /dev/sda3 pve lvm2 a--  1.63t 16.00g

proxmox01:~# vgs
  VG  #PV #LV #SN Attr   VSize VFree
  pve   1   3   0 wz--n- 1.63t 16.00g

proxmox01:~# df -h
Filesystem            Size  Used Avail Use% Mounted on
udev                   95G     0   95G   0% /dev
tmpfs                  19G   53M   19G   1% /run
/dev/mapper/pve-root   94G  1.9G   88G   3% /
tmpfs                  95G   66M   95G   1% /dev/shm
tmpfs                 5.0M     0  5.0M   0% /run/lock
tmpfs                  95G     0   95G   0% /sys/fs/cgroup
/dev/fuse              30M   48K   30M   1% /etc/pve
tmpfs                  19G     0   19G   0% /run/user/0

proxmox01:~# cat /etc/pve/storage.cfg
dir: local
        path /var/lib/vz
        content images,iso,rootdir,vztmpl
        maxfiles 0

What is wrong in this case?

Thanks
---
Gilberto Nunes Ferreira

(47) 3025-5907
(47) 99676-7530 - Whatsapp / Telegram

Skype: gilberto.nunes36

From uwe.sauter.de at gmail.com Thu Nov 8 20:02:01 2018
From: uwe.sauter.de at gmail.com (Uwe Sauter)
Date: Thu, 8 Nov 2018 20:02:01 +0100
Subject: [PVE-User] LVM issue
In-Reply-To:
References:
Message-ID:

Hi,

first problem is that you seem to be using some client that replaces verbose text with links to facebook. Could you please resend your mail as a plain text message (no HTML)? This should also take care of the formatting (currently there is no monospace font, which makes it much harder to find the right column in the output).

Second: I think you refer to these lines?

> PV        VG  Fmt  Attr PSize PFree
> /dev/sda3 pve lvm2 a--  1.63t 16.00g

This just says that the physical volume has 16G of unallocated space left. The rest is already taken by the three LVs.

If I got you wrong, please give more details about what you think the issue is.

Regards,

Uwe

Am 08.11.18 um 18:22 schrieb Gilberto Nunes:
> Hi there
>
> I have some problem with LVM here.
> The disk on the server has 1.6 TB, but I see just 16GB ?
> > proxmox01:~# lvs > LV VG Attr LSize Pool Origin Data% Meta% Move Log Cpy%Sync Convert > data pve twi-a-tz-- 1.49t 0.00 0.04 > root pve -wi-ao---- 96.00g > swap pve -wi-ao---- 8.00g > proxmox01:~# pvs > PV VG Fmt Attr PSize PFree > /dev/sda3 pve lvm2 a-- 1.63t 16.00g > proxmox01:~# vgs > VG #PV #LV > #SN > Attr VSize VFree > pve 1 3 0 wz--n- 1.63t 16.00g > proxmox01:~# df -h > Filesystem Size Used Avail Use% Mounted on > udev 95G 0 95G 0% /dev > tmpfs 19G 53M 19G 1% /run > /dev/mapper/pve-root 94G 1.9G 88G 3% / > tmpfs 95G 66M 95G 1% /dev/shm > tmpfs 5.0M 0 5.0M 0% /run/lock > tmpfs 95G 0 95G 0% /sys/fs/cgroup > /dev/fuse 30M 48K 30M 1% /etc/pve > tmpfs 19G 0 19G 0% /run/user/0 > proxmox01:~#cat > /etc/pve/storage.cfg > dir: local > path /var/lib/vz > content images,iso,rootdir,vztmpl > maxfiles 0 > > > Whats is wrong in this case? > > > Thanks > --- > Gilberto Nunes Ferreira > > (47) 3025-5907 > (47) 99676-7530 - Whatsapp / Telegram > > Skype: gilberto.nunes36 > _______________________________________________ > pve-user mailing list > pve-user at pve.proxmox.com > https://pve.proxmox.com/cgi-bin/mailman/listinfo/pve-user > From mark at tuxis.nl Mon Nov 12 14:45:41 2018 From: mark at tuxis.nl (Mark Schouten) Date: Mon, 12 Nov 2018 14:45:41 +0100 Subject: [PVE-User] Risks for using writeback on Ceph RBD Message-ID: Hi, We've noticed some performance wins on using writeback for Ceph RBD devices, but I'm wondering how we should project risks on using writeback. Writeback isn't very unsafe, but what are the risks in case of powerloss of a host? Thanks, -- Mark Schouten | Tuxis Internet Engineering KvK: 61527076 | http://www.tuxis.nl/ T: 0318 200208 | info at tuxis.nl From aderumier at odiso.com Tue Nov 13 06:44:03 2018 From: aderumier at odiso.com (Alexandre DERUMIER) Date: Tue, 13 Nov 2018 06:44:03 +0100 (CET) Subject: [PVE-User] Risks for using writeback on Ceph RBD In-Reply-To: References: Message-ID: <126320698.586454.1542087843702.JavaMail.zimbra@oxygem.tv> Like all writeback, you'll lost datas in memory before the fsync of the filesystem, but not corruption of the filesystem. (writeback = rbd_cache=true) note that rbd writeback help only for sequential of small writes (aggregate in 1big transaction to send to ceph). Also, read latency is bigger when writeback is enable. ----- Mail original ----- De: "Mark Schouten" ?: "proxmoxve" Envoy?: Lundi 12 Novembre 2018 14:45:41 Objet: [PVE-User] Risks for using writeback on Ceph RBD Hi, We've noticed some performance wins on using writeback for Ceph RBD devices, but I'm wondering how we should project risks on using writeback. Writeback isn't very unsafe, but what are the risks in case of powerloss of a host? Thanks, -- Mark Schouten | Tuxis Internet Engineering KvK: 61527076 | http://www.tuxis.nl/ T: 0318 200208 | info at tuxis.nl _______________________________________________ pve-user mailing list pve-user at pve.proxmox.com https://pve.proxmox.com/cgi-bin/mailman/listinfo/pve-user From kauffman at cs.uchicago.edu Tue Nov 13 18:54:57 2018 From: kauffman at cs.uchicago.edu (Phil Kauffman) Date: Tue, 13 Nov 2018 11:54:57 -0600 Subject: [PVE-User] zfs inappropriate ioctl for device Message-ID: <1dccbf3a-d058-3385-ecdb-e3309df60fa4@cs.uchicago.edu> I use my proxmox node at home to host VM's but also for my main ZFS array. I use a tool called 'znapzend' to mirror various datasets to other servers. Recently I noticed the following issue while using the 'znapzendzetup' command. 
'znapzendzetup' will show you the backup plans you have set for any dataset. I was hoping someone could help me troubleshoot this issue. It feels very much like this one (https://bugs.launchpad.net/ubuntu/+source/zfs-linux/+bug/1763067), but I'm not sure. # znapzendzetup list # ... it just hangs here OK. so strace it... # strace znapzendzetup list Here are the last few lines of the above strace read(6, "", 4) = 0 close(6) = 0 ioctl(4, TCGETS, 0x7ffef62d8970) = -1 ENOTTY (Inappropriate ioctl for device) lseek(4, 0, SEEK_CUR) = -1 ESPIPE (Illegal seek) fcntl(4, F_SETFD, FD_CLOEXEC) = 0 read(4, "org.znapzend:dst_a_plan\t60minute"..., 8192) = 577 --- SIGCHLD {si_signo=SIGCHLD, si_code=CLD_EXITED, si_pid=28633, si_uid=0, si_status=0, si_utime=0, si_stime=0} --- rt_sigreturn({mask=[]}) = 0 read(4, "", 8192) = 0 close(4) = 0 wait4(28633, [{WIFEXITED(s) && WEXITSTATUS(s) == 0}], 0, NULL) = 28633 stat("/etc/localtime", {st_mode=S_IFREG|0644, st_size=3585, ...}) = 0 pipe([4, 5]) = 0 pipe([6, 7]) = 0 clone(child_stack=NULL, flags=CLONE_CHILD_CLEARTID|CLONE_CHILD_SETTID|SIGCHLD, child_tidptr=0x7fb932193590) = 28634 close(7) = 0 close(5) = 0 read(6, "", 4) = 0 close(6) = 0 ioctl(4, TCGETS, 0x7ffef62d8970) = -1 ENOTTY (Inappropriate ioctl for device) lseek(4, 0, SEEK_CUR) = -1 ESPIPE (Illegal seek) fcntl(4, F_SETFD, FD_CLOEXEC) = 0 read(4, "tank/audio\n", 8192) = 11 read(4, "", 8192) = 0 --- SIGCHLD {si_signo=SIGCHLD, si_code=CLD_EXITED, si_pid=28634, si_uid=0, si_status=0, si_utime=0, si_stime=0} --- rt_sigreturn({mask=[]}) = 0 close(4) = 0 wait4(28634, [{WIFEXITED(s) && WEXITSTATUS(s) == 0}], 0, NULL) = 28634 pipe([4, 5]) = 0 pipe([6, 7]) = 0 clone(child_stack=NULL, flags=CLONE_CHILD_CLEARTID|CLONE_CHILD_SETTID|SIGCHLD, child_tidptr=0x7fb932193590) = 28635 close(7) = 0 close(5) = 0 read(6, "", 4) = 0 close(6) = 0 ioctl(4, TCGETS, 0x7ffef62d8970) = -1 ENOTTY (Inappropriate ioctl for device) lseek(4, 0, SEEK_CUR) = -1 ESPIPE (Illegal seek) fcntl(4, F_SETFD, FD_CLOEXEC) = 0 read(4, # uname -a Linux luna 4.15.18-8-pve #1 SMP PVE 4.15.18-28 (Tue, 30 Oct 2018 14:27:50 +0100) x86_64 GNU/Linux # dmesg| grep ZFS [ 7.059616] ZFS: Loaded module v0.7.11-1, ZFS pool version 5000, ZFS filesystem version 5 # zpool status pool: proxmox state: ONLINE status: Some supported features are not enabled on the pool. The pool can still be used, but some features are unavailable. action: Enable all features using 'zpool upgrade'. Once this is done, the pool may no longer be accessible by software that does not support the features. See zpool-features(5) for details. scan: scrub repaired 0B in 0h54m with 0 errors on Sun Nov 11 01:18:13 2018 config: NAME STATE READ WRITE CKSUM proxmox ONLINE 0 0 0 mirror-0 ONLINE 0 0 0 ONLINE 0 0 0 ONLINE 0 0 0 errors: No known data errors pool: tank state: ONLINE scan: scrub repaired 0B in 17h32m with 0 errors on Sun Nov 11 17:57:00 2018 config: NAME STATE READ WRITE CKSUM tank ONLINE 0 0 0 raidz2-0 ONLINE 0 0 0 ONLINE 0 0 0 ONLINE 0 0 0 ONLINE 0 0 0 ONLINE 0 0 0 Cheers, Phil From gaio at sv.lnf.it Wed Nov 14 18:40:54 2018 From: gaio at sv.lnf.it (Marco Gaiarin) Date: Wed, 14 Nov 2018 18:40:54 +0100 Subject: [PVE-User] Filesystem corruption on a VM? 
In-Reply-To: <20181023085858.GG4474@sv.lnf.it> References: <20181023085858.GG4474@sv.lnf.it> Message-ID: <20181114174054.GA18133@sv.lnf.it> I come back on this: > In a PVE 4.4 cluster i continue to get FS errors like: > Oct 22 20:51:10 vdmsv1 kernel: [268329.890910] EXT4-fs error (device sda6): ext4_mb_generate_buddy:758: group 932, block bitmap and bg descriptor inconsistent: 30722 vs 32768 free clusters > and > Oct 23 09:43:16 vdmsv1 kernel: [314655.032561] EXT4-fs error (device sdb1): ext4_validate_block_bitmap:384: comm kworker/u8:2: bg 12: bad block bitmap checksum > Oct 23 09:43:16 vdmsv1 kernel: [314655.034265] EXT4-fs (sdb1): Delayed block allocation failed for inode 2632026 at logical offset 2048 with max blocks 1640 with error 74 > Oct 23 09:43:16 vdmsv1 kernel: [314655.034335] EXT4-fs (sdb1): This should not happen!! Data will be lost > Host run 4.4.134-1-pve kernel, and guest is a debian stretch > (4.9.0-8-amd64), and in the same cluster, but also in other clusters, i > have other stretch VMs running in the same host kernel, without > troubles. > Googling around lead me to old jessie bugs (kernels 3.16): > https://bugs.launchpad.net/ubuntu/+source/linux/+bug/1423672 > https://bugs.debian.org/cgi-bin/bugreport.cgi?bug=818502#22 Seems the bug is really this. I've increased the RAM of the problematic VM, and FS corruption deasppeared. Effectively, all other VMs have plently of free ram, this was a bit full. I know that PVE 4.4 is EOL, but still i'm seeking feedback. For example, is a 'host' kernel bug, or a 'guest' kernel bug? Thanks. -- dott. Marco Gaiarin GNUPG Key ID: 240A3D66 Associazione ``La Nostra Famiglia'' http://www.lanostrafamiglia.it/ Polo FVG - Via della Bont?, 7 - 33078 - San Vito al Tagliamento (PN) marco.gaiarin(at)lanostrafamiglia.it t +39-0434-842711 f +39-0434-842797 Dona il 5 PER MILLE a LA NOSTRA FAMIGLIA! http://www.lanostrafamiglia.it/index.php/it/sostienici/5x1000 (cf 00307430132, categoria ONLUS oppure RICERCA SANITARIA) From luiscoralle at fi.uncoma.edu.ar Wed Nov 14 20:03:45 2018 From: luiscoralle at fi.uncoma.edu.ar (Luis G. Coralle) Date: Wed, 14 Nov 2018 16:03:45 -0300 Subject: [PVE-User] Filesystem corruption on a VM? In-Reply-To: <20181114174054.GA18133@sv.lnf.it> References: <20181023085858.GG4474@sv.lnf.it> <20181114174054.GA18133@sv.lnf.it> Message-ID: Hi, I have a lot of VM ( debian 8 and debian 9 ) with 512 MB of RAM on PVE 4.4-24 version and have not problem. Have you enough free space on the storage? How much ram memory do you have on PVE? El mi?., 14 de nov. de 2018 a la(s) 14:41, Marco Gaiarin (gaio at sv.lnf.it) escribi?: > > I come back on this: > > > In a PVE 4.4 cluster i continue to get FS errors like: > > Oct 22 20:51:10 vdmsv1 kernel: [268329.890910] EXT4-fs error (device > sda6): ext4_mb_generate_buddy:758: group 932, block bitmap and bg > descriptor inconsistent: 30722 vs 32768 free clusters > > and > > Oct 23 09:43:16 vdmsv1 kernel: [314655.032561] EXT4-fs error (device > sdb1): ext4_validate_block_bitmap:384: comm kworker/u8:2: bg 12: bad block > bitmap checksum > > Oct 23 09:43:16 vdmsv1 kernel: [314655.034265] EXT4-fs (sdb1): Delayed > block allocation failed for inode 2632026 at logical offset 2048 with max > blocks 1640 with error 74 > > Oct 23 09:43:16 vdmsv1 kernel: [314655.034335] EXT4-fs (sdb1): This > should not happen!! 
Data will be lost > > Host run 4.4.134-1-pve kernel, and guest is a debian stretch > > (4.9.0-8-amd64), and in the same cluster, but also in other clusters, i > > have other stretch VMs running in the same host kernel, without > > troubles. > > > Googling around lead me to old jessie bugs (kernels 3.16): > > https://bugs.launchpad.net/ubuntu/+source/linux/+bug/1423672 > > https://bugs.debian.org/cgi-bin/bugreport.cgi?bug=818502#22 > > Seems the bug is really this. I've increased the RAM of the problematic > VM, and FS corruption deasppeared. > > > Effectively, all other VMs have plently of free ram, this was a bit > full. > > > I know that PVE 4.4 is EOL, but still i'm seeking feedback. For > example, is a 'host' kernel bug, or a 'guest' kernel bug? > > > Thanks. > > -- > dott. Marco Gaiarin GNUPG Key ID: > 240A3D66 > Associazione ``La Nostra Famiglia'' > http://www.lanostrafamiglia.it/ > Polo FVG - Via della Bont?, 7 - 33078 - San Vito al Tagliamento > (PN) > marco.gaiarin(at)lanostrafamiglia.it t +39-0434-842711 f > +39-0434-842797 > > Dona il 5 PER MILLE a LA NOSTRA FAMIGLIA! > http://www.lanostrafamiglia.it/index.php/it/sostienici/5x1000 > (cf 00307430132, categoria ONLUS oppure RICERCA SANITARIA) > _______________________________________________ > pve-user mailing list > pve-user at pve.proxmox.com > https://pve.proxmox.com/cgi-bin/mailman/listinfo/pve-user > -- Luis G. Coralle Secretar?a de TIC Facultad de Inform?tica Universidad Nacional del Comahue (+54) 299-4490300 Int 647 From gaio at sv.lnf.it Thu Nov 15 11:56:42 2018 From: gaio at sv.lnf.it (Marco Gaiarin) Date: Thu, 15 Nov 2018 11:56:42 +0100 Subject: [PVE-User] Filesystem corruption on a VM? In-Reply-To: References: <20181023085858.GG4474@sv.lnf.it> <20181114174054.GA18133@sv.lnf.it> Message-ID: <20181115105642.GD2709@sv.lnf.it> Mandi! Luis G. Coralle In chel di` si favelave... > Hi, I have a lot of VM ( debian 8 and debian 9 ) with 512 MB of RAM on PVE > 4.4-24 version and have not problem. ...i have a second cluster, but with ceph storage, not iSCSI/SAN, with simlar VM, but no troubles at all. True. > Have you enough free space on the storage? Now, yes. As just stated, i've had a temporary fill of SAN space (something on my trim tasks, or on the SAN, goes wrong) but now all are back as normal. > How much ram memory do you have on PVE? Nodes have 64GB of RAM, 52% full. -- dott. Marco Gaiarin GNUPG Key ID: 240A3D66 Associazione ``La Nostra Famiglia'' http://www.lanostrafamiglia.it/ Polo FVG - Via della Bont?, 7 - 33078 - San Vito al Tagliamento (PN) marco.gaiarin(at)lanostrafamiglia.it t +39-0434-842711 f +39-0434-842797 Dona il 5 PER MILLE a LA NOSTRA FAMIGLIA! http://www.lanostrafamiglia.it/index.php/it/sostienici/5x1000 (cf 00307430132, categoria ONLUS oppure RICERCA SANITARIA) From mark at tuxis.nl Thu Nov 15 12:13:06 2018 From: mark at tuxis.nl (Mark Schouten) Date: Thu, 15 Nov 2018 12:13:06 +0100 Subject: [PVE-User] Filesystem corruption on a VM? In-Reply-To: <20181115105642.GD2709@sv.lnf.it> References: <20181023085858.GG4474@sv.lnf.it> <20181114174054.GA18133@sv.lnf.it> <20181115105642.GD2709@sv.lnf.it> Message-ID: Obviously, a misbehaving SAN is a much better explanation for filesystemcorruption.. Mark From: Marco Gaiarin (gaio at sv.lnf.it) Date: 15-11-2018 11:57 To: pve-user at pve.proxmox.com Subject: Re: [PVE-User] Filesystem corruption on a VM? Mandi! Luis G. Coralle ?In chel di` si favelave... > Hi, I have a lot of VM ( debian 8 and debian 9 ) with 512 MB of RAM on PVE > 4.4-24 version and have not problem. 
...i have a second cluster, but with ceph storage, not iSCSI/SAN, with simlar VM, but no troubles at all. True. > Have you enough free space on the storage? Now, yes. As just stated, i've had a temporary fill of SAN space (something on my trim tasks, or on the SAN, goes wrong) but now all are back as normal. > How much ram memory do you have on PVE? Nodes have 64GB of RAM, 52% full. -- dott. Marco Gaiarin ? ? ? ? ? ? ? ? ? ? ? ? ? ?GNUPG Key ID: 240A3D66 ?Associazione ``La Nostra Famiglia'' ? ? ? ? ?http://www.lanostrafamiglia.it/ ?Polo FVG ? - ? Via della Bont?, 7 - 33078 ? - ? San Vito al Tagliamento (PN) ?marco.gaiarin(at)lanostrafamiglia.it ? t +39-0434-842711 ? f +39-0434-842797 ? ? ? ? ? Dona il 5 PER MILLE a LA NOSTRA FAMIGLIA! ? ? ?http://www.lanostrafamiglia.it/index.php/it/sostienici/5x1000 ? ? ?(cf 00307430132, categoria ONLUS oppure RICERCA SANITARIA) _______________________________________________ pve-user mailing list pve-user at pve.proxmox.com https://pve.proxmox.com/cgi-bin/mailman/listinfo/pve-user -- Mark Schouten ?| Tuxis Internet Engineering KvK: 61527076 ?|?http://www.tuxis.nl/ T: 0318 200208 |?info at tuxis.nl ? From gaio at sv.lnf.it Thu Nov 15 12:35:50 2018 From: gaio at sv.lnf.it (Marco Gaiarin) Date: Thu, 15 Nov 2018 12:35:50 +0100 Subject: [PVE-User] Filesystem corruption on a VM? In-Reply-To: References: <20181023085858.GG4474@sv.lnf.it> <20181114174054.GA18133@sv.lnf.it> <20181115105642.GD2709@sv.lnf.it> Message-ID: <20181115113550.GE2709@sv.lnf.it> Mandi! Mark Schouten In chel di` si favelave... > Obviously, a misbehaving SAN is a much better explanation for filesystemcorruption.. Sure, but: a) errors start a bit befose the SAN trouble b) this is the only VM/LXC that have troubles c) i've tried to unmount, reformat and remount a disk/partition (was the squid spool) and errors come back again. It is really strange... -- dott. Marco Gaiarin GNUPG Key ID: 240A3D66 Associazione ``La Nostra Famiglia'' http://www.lanostrafamiglia.it/ Polo FVG - Via della Bont?, 7 - 33078 - San Vito al Tagliamento (PN) marco.gaiarin(at)lanostrafamiglia.it t +39-0434-842711 f +39-0434-842797 Dona il 5 PER MILLE a LA NOSTRA FAMIGLIA! http://www.lanostrafamiglia.it/index.php/it/sostienici/5x1000 (cf 00307430132, categoria ONLUS oppure RICERCA SANITARIA) From daniel at firewall-services.com Thu Nov 15 12:38:09 2018 From: daniel at firewall-services.com (Daniel Berteaud) Date: Thu, 15 Nov 2018 12:38:09 +0100 Subject: [PVE-User] Filesystem corruption on a VM? In-Reply-To: <20181115113550.GE2709@sv.lnf.it> References: <20181023085858.GG4474@sv.lnf.it> <20181114174054.GA18133@sv.lnf.it> <20181115105642.GD2709@sv.lnf.it> <20181115113550.GE2709@sv.lnf.it> Message-ID: <231749ab-a148-e10a-f6b6-6f9fc1c92e6d@firewall-services.com> Le 15/11/2018 ? 12:35, Marco Gaiarin a ?crit?: > Mandi! Mark Schouten > In chel di` si favelave... > >> Obviously, a misbehaving SAN is a much better explanation for filesystemcorruption.. > Sure, but: > > a) errors start a bit befose the SAN trouble > > b) this is the only VM/LXC that have troubles > > c) i've tried to unmount, reformat and remount a disk/partition (was > the squid spool) and errors come back again. > > > It is really strange... Not that strange. It's expected to have FS corruption if they resides on a thin provisionned volume, which itself has no space left. Lucky you only had one FS corrupted. ++ -- Logo FWS *Daniel Berteaud* FIREWALL-SERVICES SAS. Soci?t? 
de Services en Logiciels Libres Tel : 05 56 64 15 32 Matrix: @dani:fws.fr /www.firewall-services.com/ From gaio at sv.lnf.it Thu Nov 15 12:49:39 2018 From: gaio at sv.lnf.it (Marco Gaiarin) Date: Thu, 15 Nov 2018 12:49:39 +0100 Subject: [PVE-User] Filesystem corruption on a VM? In-Reply-To: <231749ab-a148-e10a-f6b6-6f9fc1c92e6d@firewall-services.com> References: <20181023085858.GG4474@sv.lnf.it> <20181114174054.GA18133@sv.lnf.it> <20181115105642.GD2709@sv.lnf.it> <20181115113550.GE2709@sv.lnf.it> <231749ab-a148-e10a-f6b6-6f9fc1c92e6d@firewall-services.com> Message-ID: <20181115114939.GG2709@sv.lnf.it> Mandi! Daniel Berteaud In chel di` si favelave... > Not that strange. It's expected to have FS corruption if they resides on > a thin provisionned volume, which itself has no space left. Lucky you > only had one FS corrupted. ...but currently space is OK (really: space on VM images pool was never on shortage, was the 'DATA' pool...), and i've many time done 'e2fsck' on filesystem (as stated, i've also reformatted one...) and errors pop up back again... -- dott. Marco Gaiarin GNUPG Key ID: 240A3D66 Associazione ``La Nostra Famiglia'' http://www.lanostrafamiglia.it/ Polo FVG - Via della Bont?, 7 - 33078 - San Vito al Tagliamento (PN) marco.gaiarin(at)lanostrafamiglia.it t +39-0434-842711 f +39-0434-842797 Dona il 5 PER MILLE a LA NOSTRA FAMIGLIA! http://www.lanostrafamiglia.it/index.php/it/sostienici/5x1000 (cf 00307430132, categoria ONLUS oppure RICERCA SANITARIA) From daniel at firewall-services.com Thu Nov 15 12:58:16 2018 From: daniel at firewall-services.com (Daniel Berteaud) Date: Thu, 15 Nov 2018 12:58:16 +0100 Subject: [PVE-User] Filesystem corruption on a VM? In-Reply-To: <20181115114939.GG2709@sv.lnf.it> References: <20181023085858.GG4474@sv.lnf.it> <20181114174054.GA18133@sv.lnf.it> <20181115105642.GD2709@sv.lnf.it> <20181115113550.GE2709@sv.lnf.it> <231749ab-a148-e10a-f6b6-6f9fc1c92e6d@firewall-services.com> <20181115114939.GG2709@sv.lnf.it> Message-ID: Le 15/11/2018 ? 12:49, Marco Gaiarin a ?crit?: > Mandi! Daniel Berteaud > In chel di` si favelave... > >> Not that strange. It's expected to have FS corruption if they resides on >> a thin provisionned volume, which itself has no space left. Lucky you >> only had one FS corrupted. > ...but currently space is OK (really: space on VM images pool was never on > shortage, was the 'DATA' pool...), and i've many time done 'e2fsck' on > filesystem (as stated, i've also reformatted one...) and errors pop up back > again... If at one time, the storage pool went out of space, then the FS is most likely corrupted. Fixing the space issue will prevent further corruption, but won't fix the already corrupted FS. You said > As just stated, i've had a temporary fill of SAN space I don't know what this SAN hosted. Anyway, If errors come back after reformating the volume, then you still have something not fixed. Please tell us how are things configured, what kind of storage it's using, which layers are involved etc... (thin prov, iSCSI, LVM on top etc...) -- Logo FWS *Daniel Berteaud* FIREWALL-SERVICES SAS. Soci?t? de Services en Logiciels Libres Tel : 05 56 64 15 32 Matrix: @dani:fws.fr /www.firewall-services.com/ From gbr at majentis.com Thu Nov 15 13:10:03 2018 From: gbr at majentis.com (Gerald Brandt) Date: Thu, 15 Nov 2018 06:10:03 -0600 Subject: [PVE-User] Filesystem corruption on a VM? 
In-Reply-To: <20181023085858.GG4474@sv.lnf.it> References: <20181023085858.GG4474@sv.lnf.it> Message-ID: On 2018-10-23 3:58 a.m., Marco Gaiarin wrote: > In a PVE 4.4 cluster i continue to get FS errors like: > > Oct 22 20:51:10 vdmsv1 kernel: [268329.890910] EXT4-fs error (device sda6): ext4_mb_generate_buddy:758: group 932, block bitmap and bg descriptor inconsistent: 30722 vs 32768 free clusters > > and > > Oct 23 09:43:16 vdmsv1 kernel: [314655.032561] EXT4-fs error (device sdb1): ext4_validate_block_bitmap:384: comm kworker/u8:2: bg 12: bad block bitmap checksum > Oct 23 09:43:16 vdmsv1 kernel: [314655.034265] EXT4-fs (sdb1): Delayed block allocation failed for inode 2632026 at logical offset 2048 with max blocks 1640 with error 74 > Oct 23 09:43:16 vdmsv1 kernel: [314655.034335] EXT4-fs (sdb1): This should not happen!! Data will be lost > > Host run 4.4.134-1-pve kernel, and guest is a debian stretch > (4.9.0-8-amd64), and in the same cluster, but also in other clusters, i > have other stretch VMs running in the same host kernel, without > troubles. > > Googling around lead me to old jessie bugs (kernels 3.16): > > https://bugs.launchpad.net/ubuntu/+source/linux/+bug/1423672 > https://bugs.debian.org/cgi-bin/bugreport.cgi?bug=818502#22 > > or things i make it hard to correlate with: > > https://access.redhat.com/solutions/155873 > > > Someone have some hints?! Thanks. > I've only had filesystem corruption when using XFS in a VM. Gerald From gaio at sv.lnf.it Thu Nov 15 14:24:20 2018 From: gaio at sv.lnf.it (Marco Gaiarin) Date: Thu, 15 Nov 2018 14:24:20 +0100 Subject: [PVE-User] Filesystem corruption on a VM? In-Reply-To: References: <20181023085858.GG4474@sv.lnf.it> <20181114174054.GA18133@sv.lnf.it> <20181115105642.GD2709@sv.lnf.it> <20181115113550.GE2709@sv.lnf.it> <231749ab-a148-e10a-f6b6-6f9fc1c92e6d@firewall-services.com> <20181115114939.GG2709@sv.lnf.it> Message-ID: <20181115132420.GI2709@sv.lnf.it> Mandi! Daniel Berteaud In chel di` si favelave... > If at one time, the storage pool went out of space, then the FS is most > likely corrupted. Fixing the space issue will prevent further > corruption, but won't fix the already corrupted FS. You said But *I* fix every day FS corruption! Every night i reboot the VMs that have: fsck.mode=force as grub boot parameters. In logs, i can se that FS get fixed. Nov 13 23:44:20 vdmsv1 kernel: [ 0.000000] Kernel command line: BOOT_IMAGE=/boot/vmlinuz-4.9.0-8-amd64 root=UUID=587fe965-e914-4c0b-a497-a0c71c7e0301 ro quiet fsck.mode=force Nov 13 23:44:20 vdmsv1 systemd-fsck[644]: /dev/sda6: 15062/8495104 files (3.0% non-contiguous), 1687411/33949952 blocks Nov 13 23:44:20 vdmsv1 systemd-fsck[647]: /dev/sdb1: 113267/6553600 files (1.9% non-contiguous), 1590050/26214144 blocks > Anyway, If errors come back after reformating the volume, then you still > have something not fixed. Reading the Ubuntu, Debian and RH bugs in my initial posts, seems to me that this is not the case. The trouble seems exactly the same: same errors, same partial fix incrementing the available RAM to the VM. > Please tell us how are things configured, what > kind of storage it's using, which layers are involved etc... (thin prov, > iSCSI, LVM on top etc...) HS MSA 1040 SAN, exporting iSCSI volumes via LVM. The 'thin' part is on the SAN side, eg no thin-LVM, no ZFS on top of it, ... 
Another error popup now: Nov 15 13:44:44 vdmsv1 kernel: [136834.664486] EXT4-fs error (device sda6): ext4_mb_generate_buddy:759: group 957, block bitmap and bg descriptor inconsistent: 32747 vs 32768 free clusters Nov 15 13:44:44 vdmsv1 kernel: [136834.671565] EXT4-fs error (device sda6): ext4_mb_generate_buddy:759: group 958, block bitmap and bg descriptor inconsistent: 32765 vs 32768 free clusters Nov 15 13:44:44 vdmsv1 kernel: [136834.813465] JBD2: Spotted dirty metadata buffer (dev = sda6, blocknr = 0). There's a risk of filesystem corruption in case of system crash. increasing the VM ram from 8 to 12 GB lead to a 1,5 day interval between errors, while before errors was every 'less than a day'. This night another 4GB of RAM, another stop and start, ... -- dott. Marco Gaiarin GNUPG Key ID: 240A3D66 Associazione ``La Nostra Famiglia'' http://www.lanostrafamiglia.it/ Polo FVG - Via della Bont?, 7 - 33078 - San Vito al Tagliamento (PN) marco.gaiarin(at)lanostrafamiglia.it t +39-0434-842711 f +39-0434-842797 Dona il 5 PER MILLE a LA NOSTRA FAMIGLIA! http://www.lanostrafamiglia.it/index.php/it/sostienici/5x1000 (cf 00307430132, categoria ONLUS oppure RICERCA SANITARIA) From gaio at sv.lnf.it Thu Nov 15 14:25:08 2018 From: gaio at sv.lnf.it (Marco Gaiarin) Date: Thu, 15 Nov 2018 14:25:08 +0100 Subject: [PVE-User] Filesystem corruption on a VM? In-Reply-To: References: <20181023085858.GG4474@sv.lnf.it> Message-ID: <20181115132508.GJ2709@sv.lnf.it> Mandi! Gerald Brandt In chel di` si favelave... > I've only had filesystem corruption when using XFS in a VM. The same VM have two XFS filesystem, that never get corrupted. ;( -- dott. Marco Gaiarin GNUPG Key ID: 240A3D66 Associazione ``La Nostra Famiglia'' http://www.lanostrafamiglia.it/ Polo FVG - Via della Bont?, 7 - 33078 - San Vito al Tagliamento (PN) marco.gaiarin(at)lanostrafamiglia.it t +39-0434-842711 f +39-0434-842797 Dona il 5 PER MILLE a LA NOSTRA FAMIGLIA! http://www.lanostrafamiglia.it/index.php/it/sostienici/5x1000 (cf 00307430132, categoria ONLUS oppure RICERCA SANITARIA) From daniel at firewall-services.com Thu Nov 15 14:32:11 2018 From: daniel at firewall-services.com (Daniel Berteaud) Date: Thu, 15 Nov 2018 14:32:11 +0100 Subject: [PVE-User] Filesystem corruption on a VM? In-Reply-To: References: <20181023085858.GG4474@sv.lnf.it> Message-ID: <98a38e23-55a5-aa5d-f8f4-0c1b650e1103@firewall-services.com> Le 15/11/2018 ? 13:10, Gerald Brandt a ?crit?: > I've only had filesystem corruption when using XFS in a VM. In my experience, XFS has been more reliable, and robust. But anyway, 99.9% of the time, FS corruption is caused by one of the underlying layers ++ -- Logo FWS *Daniel Berteaud* FIREWALL-SERVICES SAS. Soci?t? de Services en Logiciels Libres Tel : 05 56 64 15 32 Matrix: @dani:fws.fr /www.firewall-services.com/ From gbr at majentis.com Thu Nov 15 14:37:27 2018 From: gbr at majentis.com (Gerald Brandt) Date: Thu, 15 Nov 2018 07:37:27 -0600 Subject: [PVE-User] Filesystem corruption on a VM? In-Reply-To: <98a38e23-55a5-aa5d-f8f4-0c1b650e1103@firewall-services.com> References: <20181023085858.GG4474@sv.lnf.it> <98a38e23-55a5-aa5d-f8f4-0c1b650e1103@firewall-services.com> Message-ID: Interesting. My XFS VM was corrupted every night when I did a snapshot backup. I switched to a shutdown backup and the issue went away. Gerald On 2018-11-15 7:32 a.m., Daniel Berteaud wrote: > Le 15/11/2018 ? 13:10, Gerald Brandt a ?crit?: >> I've only had filesystem corruption when using XFS in a VM. 
> > In my experience, XFS has been more reliable, and robust. But anyway, > 99.9% of the time, FS corruption is caused by one of the underlying layers > > > ++ > From gaio at sv.lnf.it Thu Nov 15 15:56:00 2018 From: gaio at sv.lnf.it (Marco Gaiarin) Date: Thu, 15 Nov 2018 15:56:00 +0100 Subject: [PVE-User] Filesystem corruption on a VM? In-Reply-To: <98a38e23-55a5-aa5d-f8f4-0c1b650e1103@firewall-services.com> References: <20181023085858.GG4474@sv.lnf.it> <98a38e23-55a5-aa5d-f8f4-0c1b650e1103@firewall-services.com> Message-ID: <20181115145600.GM2709@sv.lnf.it> Mandi! Daniel Berteaud In chel di` si favelave... > In my experience, XFS has been more reliable, and robust. But anyway, > 99.9% of the time, FS corruption is caused by one of the underlying layers ...but the 'underlying layers' is the same of half a dozen other VM/LXC, that have to trouble at all... -- dott. Marco Gaiarin GNUPG Key ID: 240A3D66 Associazione ``La Nostra Famiglia'' http://www.lanostrafamiglia.it/ Polo FVG - Via della Bont?, 7 - 33078 - San Vito al Tagliamento (PN) marco.gaiarin(at)lanostrafamiglia.it t +39-0434-842711 f +39-0434-842797 Dona il 5 PER MILLE a LA NOSTRA FAMIGLIA! http://www.lanostrafamiglia.it/index.php/it/sostienici/5x1000 (cf 00307430132, categoria ONLUS oppure RICERCA SANITARIA) From daniel at firewall-services.com Thu Nov 15 16:18:45 2018 From: daniel at firewall-services.com (Daniel Berteaud) Date: Thu, 15 Nov 2018 16:18:45 +0100 Subject: [PVE-User] Filesystem corruption on a VM? In-Reply-To: <20181115132420.GI2709@sv.lnf.it> References: <20181023085858.GG4474@sv.lnf.it> <20181114174054.GA18133@sv.lnf.it> <20181115105642.GD2709@sv.lnf.it> <20181115113550.GE2709@sv.lnf.it> <231749ab-a148-e10a-f6b6-6f9fc1c92e6d@firewall-services.com> <20181115114939.GG2709@sv.lnf.it> <20181115132420.GI2709@sv.lnf.it> Message-ID: Le 15/11/2018 ? 14:24, Marco Gaiarin a ?crit?: > Mandi! Daniel Berteaud > In chel di` si favelave... > >> If at one time, the storage pool went out of space, then the FS is most >> likely corrupted. Fixing the space issue will prevent further >> corruption, but won't fix the already corrupted FS. You said > But *I* fix every day FS corruption! Every night i reboot the VMs that > have: > fsck.mode=force Then probably the issue is somewhere on the underlying block on your SAN. You should destroy and recreate the image. ++ -- Logo FWS *Daniel Berteaud* FIREWALL-SERVICES SAS. Soci?t? de Services en Logiciels Libres Tel : 05 56 64 15 32 Matrix: @dani:fws.fr /www.firewall-services.com/ From ken.woods at alaska.gov Thu Nov 15 17:24:21 2018 From: ken.woods at alaska.gov (Woods, Ken A (DNR)) Date: Thu, 15 Nov 2018 16:24:21 +0000 Subject: [PVE-User] Filesystem corruption on a VM? In-Reply-To: <98a38e23-55a5-aa5d-f8f4-0c1b650e1103@firewall-services.com> References: <20181023085858.GG4474@sv.lnf.it> , <98a38e23-55a5-aa5d-f8f4-0c1b650e1103@firewall-services.com> Message-ID: <05B095CD-B773-4CC8-80CF-DD98E3FF6F7D@alaska.gov> > On Nov 15, 2018, at 04:32, Daniel Berteaud wrote: > >> Le 15/11/2018 ? 13:10, Gerald Brandt a ?crit : >> I've only had filesystem corruption when using XFS in a VM. > > > In my experience, XFS has been more reliable, and robust. But anyway, > 99.9% of the time, FS corruption is caused by one of the underlying layers Gerald?What is /dev/sda6/ ? I?m thinking it?s not healthy. Move the image to another device and see if the problem continues. > > > ++ > > -- > > Logo FWS > > *Daniel Berteaud* > > FIREWALL-SERVICES SAS. > Soci?t? 
de Services en Logiciels Libres > Tel : 05 56 64 15 32 > Matrix: @dani:fws.fr > /https://urldefense.proofpoint.com/v2/url?u=http-3A__www.firewall-2Dservices.com_&d=DwIGaQ&c=teXCf5DW4bHgLDM-H5_GmQ&r=THf3d3FQjCY5FQHo3goSprNAh9vsOWPUM7J0jwvvVwM&m=QGMkIpgehPOKsmLfNw6PIROaQjqtjXbSMlpBj5QrMj4&s=upSZV4QynZA1V5Ni9r86nH7oUVIuBMr-WOErXOiVuoM&e= > > _______________________________________________ > pve-user mailing list > pve-user at pve.proxmox.com > https://urldefense.proofpoint.com/v2/url?u=https-3A__pve.proxmox.com_cgi-2Dbin_mailman_listinfo_pve-2Duser&d=DwIGaQ&c=teXCf5DW4bHgLDM-H5_GmQ&r=THf3d3FQjCY5FQHo3goSprNAh9vsOWPUM7J0jwvvVwM&m=QGMkIpgehPOKsmLfNw6PIROaQjqtjXbSMlpBj5QrMj4&s=PevpVpqhrRp4m8QYDfbsjX6Uv1vbWGlL3dHiZLjiZpM&e= From ken.woods at alaska.gov Thu Nov 15 17:25:46 2018 From: ken.woods at alaska.gov (Woods, Ken A (DNR)) Date: Thu, 15 Nov 2018 16:25:46 +0000 Subject: [PVE-User] Filesystem corruption on a VM? In-Reply-To: <05B095CD-B773-4CC8-80CF-DD98E3FF6F7D@alaska.gov> References: <20181023085858.GG4474@sv.lnf.it> , <98a38e23-55a5-aa5d-f8f4-0c1b650e1103@firewall-services.com>, <05B095CD-B773-4CC8-80CF-DD98E3FF6F7D@alaska.gov> Message-ID: > On Nov 15, 2018, at 07:24, Woods, Ken A (DNR) wrote: > > >>> On Nov 15, 2018, at 04:32, Daniel Berteaud wrote: >>> >>> Le 15/11/2018 ? 13:10, Gerald Brandt a ?crit : >>> I've only had filesystem corruption when using XFS in a VM. >> >> >> In my experience, XFS has been more reliable, and robust. But anyway, >> 99.9% of the time, FS corruption is caused by one of the underlying layers > > Gerald?What is /dev/sda6/ ? s/Marco/Gerald > I?m thinking it?s not healthy. Move the image to another device and see if the problem continues. > >> >> >> ++ >> >> -- >> >> Logo FWS >> >> *Daniel Berteaud* >> >> FIREWALL-SERVICES SAS. >> Soci?t? de Services en Logiciels Libres >> Tel : 05 56 64 15 32 >> Matrix: @dani:fws.fr >> /https://urldefense.proofpoint.com/v2/url?u=http-3A__www.firewall-2Dservices.com_&d=DwIGaQ&c=teXCf5DW4bHgLDM-H5_GmQ&r=THf3d3FQjCY5FQHo3goSprNAh9vsOWPUM7J0jwvvVwM&m=QGMkIpgehPOKsmLfNw6PIROaQjqtjXbSMlpBj5QrMj4&s=upSZV4QynZA1V5Ni9r86nH7oUVIuBMr-WOErXOiVuoM&e= >> >> _______________________________________________ >> pve-user mailing list >> pve-user at pve.proxmox.com >> https://urldefense.proofpoint.com/v2/url?u=https-3A__pve.proxmox.com_cgi-2Dbin_mailman_listinfo_pve-2Duser&d=DwIGaQ&c=teXCf5DW4bHgLDM-H5_GmQ&r=THf3d3FQjCY5FQHo3goSprNAh9vsOWPUM7J0jwvvVwM&m=QGMkIpgehPOKsmLfNw6PIROaQjqtjXbSMlpBj5QrMj4&s=PevpVpqhrRp4m8QYDfbsjX6Uv1vbWGlL3dHiZLjiZpM&e= From gaio at sv.lnf.it Fri Nov 16 11:32:32 2018 From: gaio at sv.lnf.it (Marco Gaiarin) Date: Fri, 16 Nov 2018 11:32:32 +0100 Subject: [PVE-User] Filesystem corruption on a VM? In-Reply-To: References: <20181114174054.GA18133@sv.lnf.it> <20181115105642.GD2709@sv.lnf.it> <20181115113550.GE2709@sv.lnf.it> <231749ab-a148-e10a-f6b6-6f9fc1c92e6d@firewall-services.com> <20181115114939.GG2709@sv.lnf.it> <20181115132420.GI2709@sv.lnf.it> Message-ID: <20181116103232.GJ5638@sv.lnf.it> Mandi! Daniel Berteaud In chel di` si favelave... > Then probably the issue is somewhere on the underlying block on your > SAN. You should destroy and recreate the image. OK. Because the disks that expose the trouble is: 1) the one that contain / 2) the one that contain /var/cache/squid, and so is 'disposable'. Can i simply stop, backup the VM and recreate it back? Or can be risky, i can take with me some FS corruption? -- dott. 
Marco Gaiarin GNUPG Key ID: 240A3D66 Associazione ``La Nostra Famiglia'' http://www.lanostrafamiglia.it/ Polo FVG - Via della Bont?, 7 - 33078 - San Vito al Tagliamento (PN) marco.gaiarin(at)lanostrafamiglia.it t +39-0434-842711 f +39-0434-842797 Dona il 5 PER MILLE a LA NOSTRA FAMIGLIA! http://www.lanostrafamiglia.it/index.php/it/sostienici/5x1000 (cf 00307430132, categoria ONLUS oppure RICERCA SANITARIA) From tonci at suma-informatika.hr Sun Nov 18 15:22:39 2018 From: tonci at suma-informatika.hr (=?UTF-8?B?VG9uxI1pIFN0aXBpxI1ldmnEhw==?=) Date: Sun, 18 Nov 2018 15:22:39 +0100 Subject: [PVE-User] backup - long time gap between two vm archives Message-ID: <8a33ccc3-da21-9578-0550-f27f850bfbc2@suma-informatika.hr> Hello to all, Suddenly scheduled backup became? slower and I noticed that big time gap between two archives and it starts right after this line : INFO: status: 99% (111706374144/112751280128), sparse 7% (8948170752), duration 5679, read/write 10/10 MB/s INFO: status: 100% (112751280128/112751280128), sparse 7% (8956145664), duration 5692, read/write 80/79 MB/s INFO: transferred 112751 MB in 5692 seconds (19 MB/s) INFO: archive file size: 71.37GB INFO: delete old backup '/mnt/pve/rn314/dump/vzdump-qemu-2155-2018_10_21-00_01_30.vma.gz' ...? min 20 min pause Either deleting old archive lasts too long or new backup just won't start ... Is there anything we can do on prox side ? Thank you very much in advance BR Tonci -- /srda?an pozdrav / best regards / Ton?i Stipi?evi?, dipl. ing. elektr. /direktor / manager/** ** d.o.o. ltd. *podr?ka / upravljanje **IT*/?sustavima za male i srednje tvrtke/ /Small & Medium Business /*IT*//*support / management* Badali?eva 27 / 10000 Zagreb / Hrvatska ? Croatia url: www.suma-informatika.hr mob: +385 91 1234003 fax: +385 1? 5560007 From gaio at sv.lnf.it Mon Nov 19 15:08:22 2018 From: gaio at sv.lnf.it (Marco Gaiarin) Date: Mon, 19 Nov 2018 15:08:22 +0100 Subject: [PVE-User] Filesystem corruption on a VM? In-Reply-To: <20181115132420.GI2709@sv.lnf.it> References: <20181023085858.GG4474@sv.lnf.it> <20181114174054.GA18133@sv.lnf.it> <20181115105642.GD2709@sv.lnf.it> <20181115113550.GE2709@sv.lnf.it> <231749ab-a148-e10a-f6b6-6f9fc1c92e6d@firewall-services.com> <20181115114939.GG2709@sv.lnf.it> <20181115132420.GI2709@sv.lnf.it> Message-ID: <20181119140822.GG2916@sv.lnf.it> > This night another 4GB of RAM, another stop and start, ... OK, with 16GB of ram 5 days passed without FS errors. Also, the other VM, same stretch kernel, roughly same configuration, start to expose same errors: Nov 18 10:12:21 vdmsv2 kernel: [584252.496880] EXT4-fs error (device sda6): ext4_mb_generate_buddy:758: group 104, block bitmap and bg descriptor inconsistent: 2048 vs 32768 free clusters Nov 18 10:12:21 vdmsv2 kernel: [584252.590564] JBD2: Spotted dirty metadata buffer (dev = sda6, blocknr = 0). There's a risk of filesystem corruption in case of system crash. Note that this VM was built *AFTER* my SAN glitches happens. -- dott. Marco Gaiarin GNUPG Key ID: 240A3D66 Associazione ``La Nostra Famiglia'' http://www.lanostrafamiglia.it/ Polo FVG - Via della Bont?, 7 - 33078 - San Vito al Tagliamento (PN) marco.gaiarin(at)lanostrafamiglia.it t +39-0434-842711 f +39-0434-842797 Dona il 5 PER MILLE a LA NOSTRA FAMIGLIA! 
http://www.lanostrafamiglia.it/index.php/it/sostienici/5x1000 (cf 00307430132, categoria ONLUS oppure RICERCA SANITARIA) From frank.thommen at uni-heidelberg.de Thu Nov 22 19:29:56 2018 From: frank.thommen at uni-heidelberg.de (Frank Thommen) Date: Thu, 22 Nov 2018 19:29:56 +0100 Subject: [PVE-User] Cluster network via directly connected interfaces? Message-ID: <974ecccb-3491-2d83-4b38-d14a4f760098@uni-heidelberg.de> Please excuse, if this is too basic, but after reading https://pve.proxmox.com/wiki/Cluster_Manager I wondered, if the cluster/corosync network could be built by directly connected network interfaces. I.e not like this: +-------+ | pve01 |----------+ +-------+ | | +-------+ +----------------+ | pve02 |-----| network switch | +-------+ +----------------+ | +-------+ | | pve03 |----------+ +-------+ but like this: +-------+ | pve01 |---+ +-------+ | | | +-------+ | | pve02 | | +-------+ | | | +-------+ | | pve03 |---+ +-------+ (all connections 1Gbit, there are currently not plans to extend over three nodes) I can't see any drawback in that solution. It would remove one layer of hardware dependency and potential spof (the switch). If we don't trust the interfaces, we might be able to configure a second network with the three remaining interfaces. Is such a "direct-connection" topology feasible? Recommended? Strictly not recommended? I am currently just planning and thinking and there is no cluster (or even a PROXMOX server) in place. Cheers frank From mark at tuxis.nl Thu Nov 22 19:34:04 2018 From: mark at tuxis.nl (=?UTF-8?B?TWFyayBTY2hvdXRlbg==?=) Date: Thu, 22 Nov 2018 19:34:04 +0100 Subject: [PVE-User] Cluster network via directly connected interfaces? In-Reply-To: <974ecccb-3491-2d83-4b38-d14a4f760098@uni-heidelberg.de> References: <974ecccb-3491-2d83-4b38-d14a4f760098@uni-heidelberg.de> Message-ID: <9C62538C-EC05-4609-89F1-5B2D2A1DD6B5@tuxis.nl> Other than limited throughput, I can?t think of a problem. But limited throughput might cause unforeseen situations. Mark Schouten > Op 22 nov. 2018 om 19:30 heeft Frank Thommen het volgende geschreven: > > Please excuse, if this is too basic, but after reading https://pve.proxmox.com/wiki/Cluster_Manager I wondered, if the cluster/corosync network could be built by directly connected network interfaces. I.e not like this: > > +-------+ > | pve01 |----------+ > +-------+ | > | > +-------+ +----------------+ > | pve02 |-----| network switch | > +-------+ +----------------+ > | > +-------+ | > | pve03 |----------+ > +-------+ > > > but like this: > > +-------+ > | pve01 |---+ > +-------+ | > | | > +-------+ | > | pve02 | | > +-------+ | > | | > +-------+ | > | pve03 |---+ > +-------+ > > (all connections 1Gbit, there are currently not plans to extend over three nodes) > > I can't see any drawback in that solution. It would remove one layer of hardware dependency and potential spof (the switch). If we don't trust the interfaces, we might be able to configure a second network with the three remaining interfaces. > > Is such a "direct-connection" topology feasible? Recommended? Strictly not recommended? > > I am currently just planning and thinking and there is no cluster (or even a PROXMOX server) in place. 
> > Cheers > frank > _______________________________________________ > pve-user mailing list > pve-user at pve.proxmox.com > https://pve.proxmox.com/cgi-bin/mailman/listinfo/pve-user From frank.thommen at uni-heidelberg.de Thu Nov 22 19:37:29 2018 From: frank.thommen at uni-heidelberg.de (Frank Thommen) Date: Thu, 22 Nov 2018 19:37:29 +0100 Subject: [PVE-User] Cluster network via directly connected interfaces? In-Reply-To: <9C62538C-EC05-4609-89F1-5B2D2A1DD6B5@tuxis.nl> References: <974ecccb-3491-2d83-4b38-d14a4f760098@uni-heidelberg.de> <9C62538C-EC05-4609-89F1-5B2D2A1DD6B5@tuxis.nl> Message-ID: But the throughput would be higher when using a switch, would it? It's still just 1Gbit frank On 11/22/2018 07:34 PM, Mark Schouten wrote: > Other than limited throughput, I can?t think of a problem. But limited throughput might cause unforeseen situations. > > Mark Schouten > >> Op 22 nov. 2018 om 19:30 heeft Frank Thommen het volgende geschreven: >> >> Please excuse, if this is too basic, but after reading https://pve.proxmox.com/wiki/Cluster_Manager I wondered, if the cluster/corosync network could be built by directly connected network interfaces. I.e not like this: >> >> +-------+ >> | pve01 |----------+ >> +-------+ | >> | >> +-------+ +----------------+ >> | pve02 |-----| network switch | >> +-------+ +----------------+ >> | >> +-------+ | >> | pve03 |----------+ >> +-------+ >> >> >> but like this: >> >> +-------+ >> | pve01 |---+ >> +-------+ | >> | | >> +-------+ | >> | pve02 | | >> +-------+ | >> | | >> +-------+ | >> | pve03 |---+ >> +-------+ >> >> (all connections 1Gbit, there are currently not plans to extend over three nodes) >> >> I can't see any drawback in that solution. It would remove one layer of hardware dependency and potential spof (the switch). If we don't trust the interfaces, we might be able to configure a second network with the three remaining interfaces. >> >> Is such a "direct-connection" topology feasible? Recommended? Strictly not recommended? >> >> I am currently just planning and thinking and there is no cluster (or even a PROXMOX server) in place. >> >> Cheers >> frank >> _______________________________________________ >> pve-user mailing list >> pve-user at pve.proxmox.com >> https://pve.proxmox.com/cgi-bin/mailman/listinfo/pve-user > _______________________________________________ > pve-user mailing list > pve-user at pve.proxmox.com > https://pve.proxmox.com/cgi-bin/mailman/listinfo/pve-user > -- Frank Thommen | HD-HuB / DKFZ Heidelberg | frank.thommen at uni-heidelberg.de | MMK: +49-6221-54-3637 (Mo-Mi, Fr) | IPMB: +49-6221-54-5823 (Do) From frank.thommen at uni-heidelberg.de Thu Nov 22 19:42:19 2018 From: frank.thommen at uni-heidelberg.de (Frank Thommen) Date: Thu, 22 Nov 2018 19:42:19 +0100 Subject: [PVE-User] Cluster network via directly connected interfaces? In-Reply-To: References: <974ecccb-3491-2d83-4b38-d14a4f760098@uni-heidelberg.de> <9C62538C-EC05-4609-89F1-5B2D2A1DD6B5@tuxis.nl> Message-ID: <3f4ef213-24c5-b444-3c3c-2e5dc3b4cafc@uni-heidelberg.de> What I /really/ meant was "but the throughput would /not/ be higher when using a switch"... On 11/22/2018 07:37 PM, Frank Thommen wrote: > But the throughput would be higher when using a switch, would it?? It's > still just 1Gbit > > frank > > > On 11/22/2018 07:34 PM, Mark Schouten wrote: >> Other than limited throughput, I can?t think of a problem. But limited >> throughput might cause unforeseen situations. >> >> Mark Schouten >> >>> Op 22 nov. 
2018 om 19:30 heeft Frank Thommen >>> het volgende geschreven: >>> >>> Please excuse, if this is too basic, but after reading >>> https://pve.proxmox.com/wiki/Cluster_Manager I wondered, if the >>> cluster/corosync network could be built by directly connected network >>> interfaces.? I.e not like this: >>> >>> +-------+ >>> | pve01 |----------+ >>> +-------+????????? | >>> ??????????????????? | >>> +-------+???? +----------------+ >>> | pve02 |-----| network switch | >>> +-------+???? +----------------+ >>> ??????????????????? | >>> +-------+????????? | >>> | pve03 |----------+ >>> +-------+ >>> >>> >>> but like this: >>> >>> +-------+ >>> | pve01 |---+ >>> +-------+?? | >>> ???? |?????? | >>> +-------+?? | >>> | pve02 |?? | >>> +-------+?? | >>> ???? |?????? | >>> +-------+?? | >>> | pve03 |---+ >>> +-------+ >>> >>> (all connections 1Gbit, there are currently not plans to extend over >>> three nodes) >>> >>> I can't see any drawback in that solution.? It would remove one layer >>> of hardware dependency and potential spof (the switch).? If we don't >>> trust the interfaces, we might be able to configure a second network >>> with the three remaining interfaces. >>> >>> Is such a "direct-connection" topology feasible?? Recommended? >>> Strictly not recommended? >>> >>> I am currently just planning and thinking and there is no cluster (or >>> even a PROXMOX server) in place. >>> >>> Cheers >>> frank >>> _______________________________________________ >>> pve-user mailing list >>> pve-user at pve.proxmox.com >>> https://pve.proxmox.com/cgi-bin/mailman/listinfo/pve-user >> _______________________________________________ >> pve-user mailing list >> pve-user at pve.proxmox.com >> https://pve.proxmox.com/cgi-bin/mailman/listinfo/pve-user >> > -- Frank Thommen | HD-HuB / DKFZ Heidelberg | frank.thommen at uni-heidelberg.de | MMK: +49-6221-54-3637 (Mo-Mi, Fr) | IPMB: +49-6221-54-5823 (Do) From uwe.sauter.de at gmail.com Thu Nov 22 19:51:14 2018 From: uwe.sauter.de at gmail.com (Uwe Sauter) Date: Thu, 22 Nov 2018 19:51:14 +0100 Subject: [PVE-User] Cluster network via directly connected interfaces? In-Reply-To: <3f4ef213-24c5-b444-3c3c-2e5dc3b4cafc@uni-heidelberg.de> References: <974ecccb-3491-2d83-4b38-d14a4f760098@uni-heidelberg.de> <9C62538C-EC05-4609-89F1-5B2D2A1DD6B5@tuxis.nl> <3f4ef213-24c5-b444-3c3c-2e5dc3b4cafc@uni-heidelberg.de> Message-ID: <7b804260-f70f-8e0a-b3c5-e989da30d4bd@gmail.com> FYI: I had such a thing working. What you need to keep in mind is that you should configure both interfaces per host on the same (software) bridge and keep STP on? that way when you loose the link from node A to node B the traffic will be going through node C. +--------------------+ | | | Node A br0 | | / \ | | eth0 eth1 | +------/-----------\-+ / \ +----/------+ +-----\----+ | eth1 | | eth0 | | / | | \ | | br0--eth0-----eth1--br0 | | Node B | | Node C | +-----------+ +----------+ Am 22.11.18 um 19:42 schrieb Frank Thommen: > What I /really/ meant was "but the throughput would /not/ be higher when using a switch"... > > > On 11/22/2018 07:37 PM, Frank Thommen wrote: >> But the throughput would be higher when using a switch, would it?? It's still just 1Gbit >> >> frank >> >> >> On 11/22/2018 07:34 PM, Mark Schouten wrote: >>> Other than limited throughput, I can?t think of a problem. But limited throughput might cause unforeseen situations. >>> >>> Mark Schouten >>> >>>> Op 22 nov. 
2018 om 19:30 heeft Frank Thommen het volgende geschreven: >>>> >>>> Please excuse, if this is too basic, but after reading https://pve.proxmox.com/wiki/Cluster_Manager I wondered, if >>>> the cluster/corosync network could be built by directly connected network interfaces.? I.e not like this: >>>> >>>> +-------+ >>>> | pve01 |----------+ >>>> +-------+????????? | >>>> ??????????????????? | >>>> +-------+???? +----------------+ >>>> | pve02 |-----| network switch | >>>> +-------+???? +----------------+ >>>> ??????????????????? | >>>> +-------+????????? | >>>> | pve03 |----------+ >>>> +-------+ >>>> >>>> >>>> but like this: >>>> >>>> +-------+ >>>> | pve01 |---+ >>>> +-------+?? | >>>> ???? |?????? | >>>> +-------+?? | >>>> | pve02 |?? | >>>> +-------+?? | >>>> ???? |?????? | >>>> +-------+?? | >>>> | pve03 |---+ >>>> +-------+ >>>> >>>> (all connections 1Gbit, there are currently not plans to extend over three nodes) >>>> >>>> I can't see any drawback in that solution.? It would remove one layer of hardware dependency and potential spof (the >>>> switch).? If we don't trust the interfaces, we might be able to configure a second network with the three remaining >>>> interfaces. >>>> >>>> Is such a "direct-connection" topology feasible?? Recommended? Strictly not recommended? >>>> >>>> I am currently just planning and thinking and there is no cluster (or even a PROXMOX server) in place. >>>> >>>> Cheers >>>> frank >>>> _______________________________________________ >>>> pve-user mailing list >>>> pve-user at pve.proxmox.com >>>> https://pve.proxmox.com/cgi-bin/mailman/listinfo/pve-user >>> _______________________________________________ >>> pve-user mailing list >>> pve-user at pve.proxmox.com >>> https://pve.proxmox.com/cgi-bin/mailman/listinfo/pve-user >>> >> > From frank.thommen at uni-heidelberg.de Thu Nov 22 19:55:28 2018 From: frank.thommen at uni-heidelberg.de (Frank Thommen) Date: Thu, 22 Nov 2018 19:55:28 +0100 Subject: [PVE-User] Cluster network via directly connected interfaces? In-Reply-To: <7b804260-f70f-8e0a-b3c5-e989da30d4bd@gmail.com> References: <974ecccb-3491-2d83-4b38-d14a4f760098@uni-heidelberg.de> <9C62538C-EC05-4609-89F1-5B2D2A1DD6B5@tuxis.nl> <3f4ef213-24c5-b444-3c3c-2e5dc3b4cafc@uni-heidelberg.de> <7b804260-f70f-8e0a-b3c5-e989da30d4bd@gmail.com> Message-ID: Good point. Thanks a lot frank On 11/22/2018 07:51 PM, Uwe Sauter wrote: > FYI: > > I had such a thing working. What you need to keep in mind is that you > should configure both interfaces per host on the same (software) bridge > and keep STP on? that way when you loose the link from node A to node B > the traffic will be going through node C. > > +--------------------+ > |??????????????????? | > | Node A?? br0?????? | > |???????? /?? \????? | > |?????? eth0?? eth1? | > +------/-----------\-+ > ????? /???????????? \ > +----/------+? +-----\----+ > |? eth1???? |? |??? eth0? | > |? /??????? |? |?????? \? | > | br0--eth0-----eth1--br0 | > |?? Node B? |? |? Node C? | > +-----------+? +----------+ > > > > > Am 22.11.18 um 19:42 schrieb Frank Thommen: >> What I /really/ meant was "but the throughput would /not/ be higher >> when using a switch"... >> >> >> On 11/22/2018 07:37 PM, Frank Thommen wrote: >>> But the throughput would be higher when using a switch, would it? >>> It's still just 1Gbit >>> >>> frank >>> >>> >>> On 11/22/2018 07:34 PM, Mark Schouten wrote: >>>> Other than limited throughput, I can?t think of a problem. But >>>> limited throughput might cause unforeseen situations. 
>>>> >>>> Mark Schouten >>>> >>>>> Op 22 nov. 2018 om 19:30 heeft Frank Thommen >>>>> het volgende geschreven: >>>>> >>>>> Please excuse, if this is too basic, but after reading >>>>> https://pve.proxmox.com/wiki/Cluster_Manager I wondered, if the >>>>> cluster/corosync network could be built by directly connected >>>>> network interfaces.? I.e not like this: >>>>> >>>>> +-------+ >>>>> | pve01 |----------+ >>>>> +-------+????????? | >>>>> ??????????????????? | >>>>> +-------+???? +----------------+ >>>>> | pve02 |-----| network switch | >>>>> +-------+???? +----------------+ >>>>> ??????????????????? | >>>>> +-------+????????? | >>>>> | pve03 |----------+ >>>>> +-------+ >>>>> >>>>> >>>>> but like this: >>>>> >>>>> +-------+ >>>>> | pve01 |---+ >>>>> +-------+?? | >>>>> ???? |?????? | >>>>> +-------+?? | >>>>> | pve02 |?? | >>>>> +-------+?? | >>>>> ???? |?????? | >>>>> +-------+?? | >>>>> | pve03 |---+ >>>>> +-------+ >>>>> >>>>> (all connections 1Gbit, there are currently not plans to extend >>>>> over three nodes) >>>>> >>>>> I can't see any drawback in that solution.? It would remove one >>>>> layer of hardware dependency and potential spof (the switch).? If >>>>> we don't trust the interfaces, we might be able to configure a >>>>> second network with the three remaining interfaces. >>>>> >>>>> Is such a "direct-connection" topology feasible?? Recommended? >>>>> Strictly not recommended? >>>>> >>>>> I am currently just planning and thinking and there is no cluster >>>>> (or even a PROXMOX server) in place. >>>>> >>>>> Cheers >>>>> frank >>>>> _______________________________________________ >>>>> pve-user mailing list >>>>> pve-user at pve.proxmox.com >>>>> https://pve.proxmox.com/cgi-bin/mailman/listinfo/pve-user >>>> _______________________________________________ >>>> pve-user mailing list >>>> pve-user at pve.proxmox.com >>>> https://pve.proxmox.com/cgi-bin/mailman/listinfo/pve-user >>>> >>> >> > _______________________________________________ > pve-user mailing list > pve-user at pve.proxmox.com > https://pve.proxmox.com/cgi-bin/mailman/listinfo/pve-user From uwe.sauter.de at gmail.com Thu Nov 22 20:12:56 2018 From: uwe.sauter.de at gmail.com (Uwe Sauter) Date: Thu, 22 Nov 2018 20:12:56 +0100 Subject: [PVE-User] Cluster network via directly connected interfaces? In-Reply-To: References: <974ecccb-3491-2d83-4b38-d14a4f760098@uni-heidelberg.de> <9C62538C-EC05-4609-89F1-5B2D2A1DD6B5@tuxis.nl> <3f4ef213-24c5-b444-3c3c-2e5dc3b4cafc@uni-heidelberg.de> <7b804260-f70f-8e0a-b3c5-e989da30d4bd@gmail.com> Message-ID: <31538c50-c9e0-118b-6005-86a5a7eaf818@gmail.com> And one other thing. I don't think that multicast traffic is possible in this solution so you need to configure corosync to do unicast UDP. Make this change after creating the cluster on the first node but before joining any other nodes. Easiest point in time for that change. /etc/pve/corosync.conf totem { [?] config_version: +=1 ######### means: increment by one fore every change transport: udpu } And, as you already mentioned, having such a setup won't scale. Three nodes is the only size where this is sensible to do. Do you plan to use Ceph? Am 22.11.18 um 19:55 schrieb Frank Thommen: > Good point.? Thanks a lot > frank > > > On 11/22/2018 07:51 PM, Uwe Sauter wrote: >> FYI: >> >> I had such a thing working. What you need to keep in mind is that you should configure both interfaces per host on the >> same (software) bridge and keep STP on? 
that way when you loose the link from node A to node B the traffic will be >> going through node C. >> >> +--------------------+ >> |??????????????????? | >> | Node A?? br0?????? | >> |???????? /?? \????? | >> |?????? eth0?? eth1? | >> +------/-----------\-+ >> ?????? /???????????? \ >> +----/------+? +-----\----+ >> |? eth1???? |? |??? eth0? | >> |? /??????? |? |?????? \? | >> | br0--eth0-----eth1--br0 | >> |?? Node B? |? |? Node C? | >> +-----------+? +----------+ >> >> >> >> >> Am 22.11.18 um 19:42 schrieb Frank Thommen: >>> What I /really/ meant was "but the throughput would /not/ be higher when using a switch"... >>> >>> >>> On 11/22/2018 07:37 PM, Frank Thommen wrote: >>>> But the throughput would be higher when using a switch, would it? It's still just 1Gbit >>>> >>>> frank >>>> >>>> >>>> On 11/22/2018 07:34 PM, Mark Schouten wrote: >>>>> Other than limited throughput, I can?t think of a problem. But limited throughput might cause unforeseen situations. >>>>> >>>>> Mark Schouten >>>>> >>>>>> Op 22 nov. 2018 om 19:30 heeft Frank Thommen het volgende geschreven: >>>>>> >>>>>> Please excuse, if this is too basic, but after reading https://pve.proxmox.com/wiki/Cluster_Manager I wondered, if >>>>>> the cluster/corosync network could be built by directly connected network interfaces.? I.e not like this: >>>>>> >>>>>> +-------+ >>>>>> | pve01 |----------+ >>>>>> +-------+????????? | >>>>>> ??????????????????? | >>>>>> +-------+???? +----------------+ >>>>>> | pve02 |-----| network switch | >>>>>> +-------+???? +----------------+ >>>>>> ??????????????????? | >>>>>> +-------+????????? | >>>>>> | pve03 |----------+ >>>>>> +-------+ >>>>>> >>>>>> >>>>>> but like this: >>>>>> >>>>>> +-------+ >>>>>> | pve01 |---+ >>>>>> +-------+?? | >>>>>> ???? |?????? | >>>>>> +-------+?? | >>>>>> | pve02 |?? | >>>>>> +-------+?? | >>>>>> ???? |?????? | >>>>>> +-------+?? | >>>>>> | pve03 |---+ >>>>>> +-------+ >>>>>> >>>>>> (all connections 1Gbit, there are currently not plans to extend over three nodes) >>>>>> >>>>>> I can't see any drawback in that solution.? It would remove one layer of hardware dependency and potential spof >>>>>> (the switch).? If we don't trust the interfaces, we might be able to configure a second network with the three >>>>>> remaining interfaces. >>>>>> >>>>>> Is such a "direct-connection" topology feasible?? Recommended? Strictly not recommended? >>>>>> >>>>>> I am currently just planning and thinking and there is no cluster (or even a PROXMOX server) in place. >>>>>> >>>>>> Cheers >>>>>> frank >>>>>> _______________________________________________ >>>>>> pve-user mailing list >>>>>> pve-user at pve.proxmox.com >>>>>> https://pve.proxmox.com/cgi-bin/mailman/listinfo/pve-user >>>>> _______________________________________________ >>>>> pve-user mailing list >>>>> pve-user at pve.proxmox.com >>>>> https://pve.proxmox.com/cgi-bin/mailman/listinfo/pve-user >>>>> >>>> >>> >> _______________________________________________ >> pve-user mailing list >> pve-user at pve.proxmox.com >> https://pve.proxmox.com/cgi-bin/mailman/listinfo/pve-user > _______________________________________________ > pve-user mailing list > pve-user at pve.proxmox.com > https://pve.proxmox.com/cgi-bin/mailman/listinfo/pve-user From t.lamprecht at proxmox.com Thu Nov 22 21:06:30 2018 From: t.lamprecht at proxmox.com (Thomas Lamprecht) Date: Thu, 22 Nov 2018 21:06:30 +0100 Subject: [PVE-User] Cluster network via directly connected interfaces? 
In-Reply-To: <974ecccb-3491-2d83-4b38-d14a4f760098@uni-heidelberg.de> References: <974ecccb-3491-2d83-4b38-d14a4f760098@uni-heidelberg.de> Message-ID: <5560d5c8-d15d-e03e-683a-085d5acdb3a3@proxmox.com> On 11/22/18 7:29 PM, Frank Thommen wrote: > Please excuse, if this is too basic, but after reading https://pve.proxmox.com/wiki/Cluster_Manager I wondered, if the cluster/corosync network could be built by directly connected network interfaces.? I.e not like this: > > ?+-------+ > ?| pve01 |----------+ > ?+-------+????????? | > ??????????????????? | > ?+-------+???? +----------------+ > ?| pve02 |-----| network switch | > ?+-------+???? +----------------+ > ??????????????????? | > ?+-------+????????? | > ?| pve03 |----------+ > ?+-------+ > > > but like this: > > ?+-------+ > ?| pve01 |---+ > ?+-------+?? | > ???? |?????? | > ?+-------+?? | > ?| pve02 |?? | > ?+-------+?? | > ???? |?????? | > ?+-------+?? | > ?| pve03 |---+ > ?+-------+ > > (all connections 1Gbit, there are currently not plans to extend over three nodes) > > I can't see any drawback in that solution.? It would remove one layer of hardware dependency and potential spof (the switch).? If we don't trust the interfaces, we might be able to configure a second network with the three remaining interfaces. > > Is such a "direct-connection" topology feasible?? Recommended? Strictly not recommended? full mesh is certainly not bad. for cluster network (corosync) latency is the key, bandwidth isn't really much needed. So this surely not bad. We use also some 10g (or 40G, not sure) full mesh for a ceph cluster network - you safe a not to cheap switch and get full bandwidth and good latency. The limiting factor is that this gets quite complex for bigger clusters, but besides that it doesn't really has any drawbacks for inter cluster connects, AFAICT. For multicast you need to try, as Uwe said, I'm currently not sure, it could work as Linux can route multicast just fine (mrouter) but I don't remember exactly anymore - sorry. But if you try it it'd be great if you report back. Else unicast is in those cluster sizes always an option - you really shouldn't have a problem as long as you do not put storage traffic together with corosync (cluster) on the same net (corosync gets to much latency spikes then). > > I am currently just planning and thinking and there is no cluster (or even a PROXMOX server) in place. > > Cheers > frank From smr at kmi.com Fri Nov 23 09:13:52 2018 From: smr at kmi.com (Stefan M. Radman) Date: Fri, 23 Nov 2018 08:13:52 +0000 Subject: [PVE-User] Cluster network via directly connected interfaces? In-Reply-To: <5560d5c8-d15d-e03e-683a-085d5acdb3a3@proxmox.com> References: <974ecccb-3491-2d83-4b38-d14a4f760098@uni-heidelberg.de> <5560d5c8-d15d-e03e-683a-085d5acdb3a3@proxmox.com> Message-ID: <6D56C3A3-3130-4540-934D-3E629BE09278@kmi.com> You might want to have a look at https://pve.proxmox.com/wiki/Full_Mesh_Network_for_Ceph_Server This is what (I think) Thomas Lamprecht is referring to and it should also be usable for corosync. The advantage of this configuration over the bridged solution used by Uwe Sauter is the zero convergence time of the topology. A bridged solution using the standard Linux bridge might break your corosync ring for a long time (20-50 seconds) during STP state transitions. Disclaimer: I have tried neither of the two solutions. 
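For illustration, a routed variant of such a three-node full mesh might look roughly like this on one node (interface names and addresses below are invented placeholders, not copied from the wiki article; the article above has the authoritative recipe). Each node keeps a single address, and a pair of /32 routes decides which direct cable reaches which peer, so no bridge and no spanning tree is involved:

# /etc/network/interfaces fragment on pve01 (hypothetical names and addresses)
auto ens19
iface ens19 inet static
        address 10.15.15.1
        netmask 255.255.255.0
        # direct cable to pve02 (10.15.15.2)
        up ip route add 10.15.15.2/32 dev ens19
        down ip route del 10.15.15.2/32

auto ens20
iface ens20 inet static
        address 10.15.15.1
        netmask 255.255.255.0
        # direct cable to pve03 (10.15.15.3)
        up ip route add 10.15.15.3/32 dev ens20
        down ip route del 10.15.15.3/32

The other two nodes get the mirrored configuration, and corosync (or Ceph) can then simply use the 10.15.15.x addresses.
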
Cheers Stefan From lists at merit.unu.edu Fri Nov 23 10:24:28 2018 From: lists at merit.unu.edu (lists) Date: Fri, 23 Nov 2018 10:24:28 +0100 Subject: [PVE-User] Cluster network via directly connected interfaces? In-Reply-To: <6D56C3A3-3130-4540-934D-3E629BE09278@kmi.com> References: <974ecccb-3491-2d83-4b38-d14a4f760098@uni-heidelberg.de> <5560d5c8-d15d-e03e-683a-085d5acdb3a3@proxmox.com> <6D56C3A3-3130-4540-934D-3E629BE09278@kmi.com> Message-ID: <264e14f6-c0dc-af5b-31b7-62cf0ec2e497@merit.unu.edu> Hi, On 23-11-2018 9:13, Stefan M. Radman wrote: > You might want to have a look at > https://pve.proxmox.com/wiki/Full_Mesh_Network_for_Ceph_Server We are running that config (method 2) and we have never noticed any multicast issues. MJ From ronny+pve-user at aasen.cx Fri Nov 23 12:00:18 2018 From: ronny+pve-user at aasen.cx (Ronny Aasen) Date: Fri, 23 Nov 2018 12:00:18 +0100 Subject: [PVE-User] Cluster network via directly connected interfaces? In-Reply-To: <974ecccb-3491-2d83-4b38-d14a4f760098@uni-heidelberg.de> References: <974ecccb-3491-2d83-4b38-d14a4f760098@uni-heidelberg.de> Message-ID: <61080886-2008-af66-4da8-1c044a44fea1@aasen.cx> Personally if i was to try and experiment with something non-default I would try to use ospf+bfd either with bird or quagga. -you get quick failovers due to bfd. -you can equal cost multipath links to utillize multiple ports between servers. -All links are active, so you do not have a "passive" link, as you have with STP -and there is no needless duplication of data, so you do not get the 50% bandwith loss of a broadcast bond. -you need to use corosync with targeted udp towards spesific loopback addresses. -traffic goes shortest path. so allways towards the correct server. - you can very easily expand beyond 3 nodes if you have enough ports. Or move the ospf domain onto a switch if needed. this also easily converts to a multiple switch config to maintain HA and no SPOF Happy experimentation! mvh Ronny Aasen On 11/22/18 7:29 PM, Frank Thommen wrote: > Please excuse, if this is too basic, but after reading > https://pve.proxmox.com/wiki/Cluster_Manager I wondered, if the > cluster/corosync network could be built by directly connected network > interfaces.? I.e not like this: > > ?+-------+ > ?| pve01 |----------+ > ?+-------+????????? | > ??????????????????? | > ?+-------+???? +----------------+ > ?| pve02 |-----| network switch | > ?+-------+???? +----------------+ > ??????????????????? | > ?+-------+????????? | > ?| pve03 |----------+ > ?+-------+ > > > but like this: > > ?+-------+ > ?| pve01 |---+ > ?+-------+?? | > ???? |?????? | > ?+-------+?? | > ?| pve02 |?? | > ?+-------+?? | > ???? |?????? | > ?+-------+?? | > ?| pve03 |---+ > ?+-------+ > > (all connections 1Gbit, there are currently not plans to extend over > three nodes) > > I can't see any drawback in that solution.? It would remove one layer of > hardware dependency and potential spof (the switch).? If we don't trust > the interfaces, we might be able to configure a second network with the > three remaining interfaces. > > Is such a "direct-connection" topology feasible?? Recommended? Strictly > not recommended? > > I am currently just planning and thinking and there is no cluster (or > even a PROXMOX server) in place. > > Cheers > frank > _______________________________________________ > pve-user mailing list > pve-user at pve.proxmox.com > https://pve.proxmox.com/cgi-bin/mailman/listinfo/pve-user From smr at kmi.com Fri Nov 23 12:30:51 2018 From: smr at kmi.com (Stefan M. 
Radman) Date: Fri, 23 Nov 2018 11:30:51 +0000 Subject: [PVE-User] Cluster network via directly connected interfaces? In-Reply-To: <61080886-2008-af66-4da8-1c044a44fea1@aasen.cx> References: <974ecccb-3491-2d83-4b38-d14a4f760098@uni-heidelberg.de> <61080886-2008-af66-4da8-1c044a44fea1@aasen.cx> Message-ID: <6B794028-BE60-4588-B574-86A6079707F7@kmi.com> Hi Ronny That's the first time I hear of a routing protocol in the corosync context. Doesn't that add a whole lot of complexity in the setup? Would it work with corosync multicast? Stefan > On Nov 23, 2018, at 12:00 PM, Ronny Aasen wrote: > > Personally if i was to try and experiment with something non-default I would try to use ospf+bfd either with bird or quagga. > > -you get quick failovers due to bfd. > -you can equal cost multipath links to utillize multiple ports between servers. > -All links are active, so you do not have a "passive" link, as you have with STP > -and there is no needless duplication of data, so you do not get the 50% bandwith loss of a broadcast bond. > -you need to use corosync with targeted udp towards spesific loopback addresses. > -traffic goes shortest path. so allways towards the correct server. > - you can very easily expand beyond 3 nodes if you have enough ports. Or move the ospf domain onto a switch if needed. this also easily converts to a multiple switch config to maintain HA and no SPOF > > Happy experimentation! > > mvh > Ronny Aasen > > > > > > On 11/22/18 7:29 PM, Frank Thommen wrote: >> Please excuse, if this is too basic, but after reading https://pve.proxmox.com/wiki/Cluster_Manager I wondered, if the cluster/corosync network could be built by directly connected network interfaces. I.e not like this: >> +-------+ >> | pve01 |----------+ >> +-------+ | >> | >> +-------+ +----------------+ >> | pve02 |-----| network switch | >> +-------+ +----------------+ >> | >> +-------+ | >> | pve03 |----------+ >> +-------+ >> but like this: >> +-------+ >> | pve01 |---+ >> +-------+ | >> | | >> +-------+ | >> | pve02 | | >> +-------+ | >> | | >> +-------+ | >> | pve03 |---+ >> +-------+ >> (all connections 1Gbit, there are currently not plans to extend over three nodes) >> I can't see any drawback in that solution. It would remove one layer of hardware dependency and potential spof (the switch). If we don't trust the interfaces, we might be able to configure a second network with the three remaining interfaces. >> Is such a "direct-connection" topology feasible? Recommended? Strictly not recommended? >> I am currently just planning and thinking and there is no cluster (or even a PROXMOX server) in place. >> Cheers >> frank >> _______________________________________________ >> pve-user mailing list >> pve-user at pve.proxmox.com >> https://pve.proxmox.com/cgi-bin/mailman/listinfo/pve-user > > _______________________________________________ > pve-user mailing list > pve-user at pve.proxmox.com > https://pve.proxmox.com/cgi-bin/mailman/listinfo/pve-user From ronny+pve-user at aasen.cx Fri Nov 23 13:55:41 2018 From: ronny+pve-user at aasen.cx (Ronny Aasen) Date: Fri, 23 Nov 2018 13:55:41 +0100 Subject: [PVE-User] Cluster network via directly connected interfaces? In-Reply-To: <6B794028-BE60-4588-B574-86A6079707F7@kmi.com> References: <974ecccb-3491-2d83-4b38-d14a4f760098@uni-heidelberg.de> <61080886-2008-af66-4da8-1c044a44fea1@aasen.cx> <6B794028-BE60-4588-B574-86A6079707F7@kmi.com> Message-ID: keep in mind that this is just a mental experiment... but i think more standardized then the spanning tree or bond0 hack. 
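Roughly, the moving parts could look like this; everything below (addresses, interface names, the FRR/Quagga-style OSPF stanza) is only an illustrative assumption, not a tested recipe:

# each node gets a stable loopback address that corosync binds to
ip addr add 10.255.255.1/32 dev lo        # pve01 here; .2 / .3 on the other nodes

# FRR/Quagga-style OSPF sketch: advertise the loopback and the two direct links
router ospf
 ospf router-id 10.255.255.1
 network 10.255.255.1/32 area 0
 network 10.0.12.0/30 area 0
 network 10.0.13.0/30 area 0
 ! BFD on the point-to-point links would provide the fast failover

# /etc/pve/corosync.conf: unicast transport, ring addresses = the loopbacks
nodelist {
  node {
    name: pve01
    nodeid: 1
    quorum_votes: 1
    ring0_addr: 10.255.255.1
  }
  # ... pve02 and pve03 accordingly
}
totem {
  # ...
  transport: udpu
}
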
That being said. I am absolutely testing this when i get some available hardware :) if you could do it with ipv6 it would probably be less complexity. a single loopback and the bird routing daemon... DONE! with ipv4 is is a mess as usually. you would also need /30 (or/31 ptp) link networks on every single link. multicast would not work. unless you used some sort of multicast proxy or routing daemon. have no idea how that would work. All my clusters are small enough so i use ip targeted corosync with udp Ronny Aasen On 11/23/18 12:30 PM, Stefan M. Radman wrote: > Hi Ronny > > That's the first time I hear of a routing protocol in the corosync context. > Doesn't that add a whole lot of complexity in the setup? > Would it work with corosync multicast? > > Stefan > >> On Nov 23, 2018, at 12:00 PM, Ronny Aasen wrote: >> >> Personally if i was to try and experiment with something non-default I would try to use ospf+bfd either with bird or quagga. >> >> -you get quick failovers due to bfd. >> -you can equal cost multipath links to utillize multiple ports between servers. >> -All links are active, so you do not have a "passive" link, as you have with STP >> -and there is no needless duplication of data, so you do not get the 50% bandwith loss of a broadcast bond. >> -you need to use corosync with targeted udp towards spesific loopback addresses. >> -traffic goes shortest path. so allways towards the correct server. >> - you can very easily expand beyond 3 nodes if you have enough ports. Or move the ospf domain onto a switch if needed. this also easily converts to a multiple switch config to maintain HA and no SPOF >> >> Happy experimentation! >> >> mvh >> Ronny Aasen >> >> >> >> >> >> On 11/22/18 7:29 PM, Frank Thommen wrote: >>> Please excuse, if this is too basic, but after reading https://pve.proxmox.com/wiki/Cluster_Manager I wondered, if the cluster/corosync network could be built by directly connected network interfaces. I.e not like this: >>> +-------+ >>> | pve01 |----------+ >>> +-------+ | >>> | >>> +-------+ +----------------+ >>> | pve02 |-----| network switch | >>> +-------+ +----------------+ >>> | >>> +-------+ | >>> | pve03 |----------+ >>> +-------+ >>> but like this: >>> +-------+ >>> | pve01 |---+ >>> +-------+ | >>> | | >>> +-------+ | >>> | pve02 | | >>> +-------+ | >>> | | >>> +-------+ | >>> | pve03 |---+ >>> +-------+ >>> (all connections 1Gbit, there are currently not plans to extend over three nodes) >>> I can't see any drawback in that solution. It would remove one layer of hardware dependency and potential spof (the switch). If we don't trust the interfaces, we might be able to configure a second network with the three remaining interfaces. >>> Is such a "direct-connection" topology feasible? Recommended? Strictly not recommended? >>> I am currently just planning and thinking and there is no cluster (or even a PROXMOX server) in place. 
>>> Cheers >>> frank >>> _______________________________________________ >>> pve-user mailing list >>> pve-user at pve.proxmox.com >>> https://pve.proxmox.com/cgi-bin/mailman/listinfo/pve-user >> >> _______________________________________________ >> pve-user mailing list >> pve-user at pve.proxmox.com >> https://pve.proxmox.com/cgi-bin/mailman/listinfo/pve-user > From axel50397 at gmail.com Sun Nov 25 15:20:44 2018 From: axel50397 at gmail.com (Adnan RIHAN) Date: Sun, 25 Nov 2018 06:20:44 -0800 Subject: [PVE-User] Local interface on Promox server Message-ID: Hi there, I?m using Proxmox for years on a remote server, and have recently installed it in my company (so, in the same LAN). In our server room, we have 2 servers (both Proxmox) and 2 Synology NAS, all these servers can only be managed by another client using a web browser. We don?t have any client machine in the server room, so when we fix something in the room (cables, routing, etc?), we need to go out and check the VMs on another machine outside the room, sometimes making us come back, etc? I know VMs can be controlled by command line using qemu, but is there another way to locally control the machines on the Proxmox server, except by installing a desktop manager and pointing the web browser on localhost:8006? Is it even safe to do that? We have a KVM in our bay, we can physically access the machines, is there maybe a way to physically be connected to a VM (as if we were physically connected to a Windows VM for instance)? Thanks for your help. -- Regards, Adnan RIHAN GPG: 5675-62BA (https://keybase.io/max13/key.asc) ? If you are not using GPG/PGP but want to send me an encrypted e-mail: https://encrypt.to/0x567562BA. From yannis.milios at gmail.com Sun Nov 25 16:25:45 2018 From: yannis.milios at gmail.com (Yannis Milios) Date: Sun, 25 Nov 2018 15:25:45 +0000 Subject: [PVE-User] Local interface on Promox server In-Reply-To: References: Message-ID: We don?t have any client machine in the server room, so > when we fix something in the room (cables, routing, etc?), we need to > go out and check the VMs on another machine outside the room, > sometimes making us come back, etc? > Is it really that difficult to get a laptop in the server room to manage the servers? > I know VMs can be controlled by command line using qemu, but is there > another way to locally control the machines on the Proxmox server, > except by installing a desktop manager and pointing the web browser on > localhost:8006? Is it even safe to do that? > Personally I would avoid installing a full Desktop environment on the PVE hosts. Apart from adding unnecessary load, it can also expand the attack surface on the servers. If you insist though, I would recommend a simple Window Manager instead, something like Fluxbox for example. We have a KVM in our bay, we can physically access the machines, is > there maybe a way to physically be connected to a VM (as if we were > physically connected to a Windows VM for instance)? > None that I'm aware of, but sounds like you are trying to over complicate things... :) Yannis From uwe.sauter.de at gmail.com Sun Nov 25 17:43:21 2018 From: uwe.sauter.de at gmail.com (Uwe Sauter) Date: Sun, 25 Nov 2018 17:43:21 +0100 Subject: [PVE-User] Local interface on Promox server In-Reply-To: References: Message-ID: <0084a32c-b397-5d42-1d84-3efea9d09545@gmail.com> You could use qm terminal to connect to the serial console. Ctrl + o will quit the session. You need to configure your VMs to provide a serial console, e.g. 
by adding "console=tty0 console=ttyS0,115200n8" to GRUB_CMDLINE_LINUX_DEFAULT in /etc/default/grub and running "grub-mkconfig -o /boot/grub/grub.cfg". Am 25.11.18 um 16:25 schrieb Yannis Milios: > We don?t have any client machine in the server room, so >> when we fix something in the room (cables, routing, etc?), we need to >> go out and check the VMs on another machine outside the room, >> sometimes making us come back, etc? >> > > Is it really that difficult to get a laptop in the server room to manage > the servers? > > >> I know VMs can be controlled by command line using qemu, but is there >> another way to locally control the machines on the Proxmox server, >> except by installing a desktop manager and pointing the web browser on >> localhost:8006? Is it even safe to do that? >> > > Personally I would avoid installing a full Desktop environment on the PVE > hosts. Apart from adding unnecessary load, it can also > expand the attack surface on the servers. If you insist though, I would > recommend a simple Window Manager instead, > something like Fluxbox for example. > > We have a KVM in our bay, we can physically access the machines, is >> there maybe a way to physically be connected to a VM (as if we were >> physically connected to a Windows VM for instance)? >> > > None that I'm aware of, but sounds like you are trying to over complicate > things... :) > > Yannis > _______________________________________________ > pve-user mailing list > pve-user at pve.proxmox.com > https://pve.proxmox.com/cgi-bin/mailman/listinfo/pve-user > From joachim at tingvold.com Mon Nov 26 10:54:33 2018 From: joachim at tingvold.com (Joachim Tingvold) Date: Mon, 26 Nov 2018 10:54:33 +0100 Subject: [PVE-User] Host interface down; take down bridge/VM-interfaces? Message-ID: Hi, Is there a ?built in? way to bring down a bridge (or individual VM-interfaces) if/when the physical interface on the host goes down? (without having to resort to custom trigger scripts) Relevant for both containers and VMs, but my current use case is for VMs specifically. Single physical interface on host. Using OVS, so either through that, or through KVM in some way? I could do PCI passthrough, but I was hoping to avoid that. -- Joachim From sir_Misiek1 at o2.pl Mon Nov 26 13:57:57 2018 From: sir_Misiek1 at o2.pl (lord_Niedzwiedz) Date: Mon, 26 Nov 2018 13:57:57 +0100 Subject: [PVE-User] Proxmox - CT problem In-Reply-To: References: Message-ID: ??? ??? Hi, I have a debian-9-turnkey-symfony_15.0-1_amd64 container. Which worked half a year well. Now, every now and then, the mysql disappears into me. How is this possible ? I do not touch or change anything. Any auto updates inside? The idea of what this may be caused. After restoring the base version, everything is ok, for a day, two and again it sits? ;-/ Linux walls 4.15.18-4-pve #1 SMP PVE 4.15.18-23 (Thu, 30 Aug 2018 13:04:08 +0200) x86_64 You have mail. root at walls ~# /etc/init.d/mysql restart [....] Restarting mysql (via systemctl): mysql.serviceFailed to restart mysql.service: Unit mysql.service not found. ?failed! root at walls ~# service mysqld restart Failed to restart mysqld.service: Unit mysqld.service not found. From d.csapak at proxmox.com Mon Nov 26 14:03:01 2018 From: d.csapak at proxmox.com (Dominik Csapak) Date: Mon, 26 Nov 2018 14:03:01 +0100 Subject: [PVE-User] Proxmox - CT problem In-Reply-To: References: Message-ID: On 11/26/18 1:57 PM, lord_Niedzwiedz wrote: > ??? ??? Hi, > I have a debian-9-turnkey-symfony_15.0-1_amd64 container. > Which worked half a year well. 
> Now, every now and then, the mysql disappears into me. > How is this possible ? > I do not touch or change anything. > Any auto updates inside? > The idea of what this may be caused. > After restoring the base version, everything is ok, for a day, two and > again it sits? ;-/ > > Linux walls 4.15.18-4-pve #1 SMP PVE 4.15.18-23 (Thu, 30 Aug 2018 > 13:04:08 +0200) x86_64 > You have mail. > root at walls ~# /etc/init.d/mysql restart > [....] Restarting mysql (via systemctl): mysql.serviceFailed to restart > mysql.service: Unit mysql.service not found. > ?failed! > root at walls ~# service mysqld restart > Failed to restart mysqld.service: Unit mysqld.service not found. > _______________________________________________ > pve-user mailing list > pve-user at pve.proxmox.com > https://pve.proxmox.com/cgi-bin/mailman/listinfo/pve-user first, please write a new message to the mailing list instead of answering to an existing thread with a new topic second, it seems there was an issue with mysql and turnkeylinux https://www.turnkeylinux.org/blog/debian-secupdate-breaks-lamp-server From drbidwell at gmail.com Mon Nov 26 20:21:40 2018 From: drbidwell at gmail.com (Daniel BIdwell) Date: Mon, 26 Nov 2018 14:21:40 -0500 Subject: [PVE-User] Anyone using ubuntu juju with proxmox? Message-ID: <9d7bcafc493bb1bb9f5d12737359c651d2aba2b8.camel@gmail.com> Does anyone setup proxmox as the cloud infrastructure for using Canonical juju? I am currently using it with VMWare VSphere and would like to try it with proxmox. -- Daniel BIdwell From axel50397 at gmail.com Tue Nov 27 03:39:23 2018 From: axel50397 at gmail.com (Adnan RIHAN) Date: Mon, 26 Nov 2018 18:39:23 -0800 Subject: [PVE-User] Local interface on Promox server Message-ID: Yannis Milios yannis.milios at gmail.com wrote: > Is it really that difficult to get a laptop in the server room to manage > the servers? Well? You would be surprised ;) It?s not ??THAT?? complicated, except that currently we are in a country in crisis and buying a laptop only for that is currently overkill. We had a tech machine there, but had to use it for a point of sale. > Personally I would avoid installing a full Desktop environment on the PVE > hosts No choice. If it?s discouraged, then I won?t insist. Uwe Sauter uwe.sauter.de at gmail.com?wrote: > You could use > > qm terminal > > to connect to the serial console I?m not really looking for connecting a serial console, but would have liked an access to the desktop manager of a VM, as if it was possible to redirect the keyboard/video/mouse of my KVM console to an actual VM. As Yannis said, it seems to overcomplicate things. Then we will wait a budget to buy a mini-pc and install it with the desktop we want. BTW, I take advantage of the discussion about the serial port. In Congo it?s hard to find an original serial-to-usb cable to plug our PBX to one of our VMs. Prolific made an update and all our cables are not working anymore. This was the only way to easily plug our RS232 PBX to a VM. While waiting to find an original converter, is there a way to redirect a serial port (/dev/ttyS0) to a Windows VM, creating a COM port connected to the linux serial port? -- Regards, Adnan RIHAN GPG: 5675-62BA (https://keybase.io/max13/key.asc) ? If you are not using GPG/PGP but want to send me an encrypted e-mail: https://encrypt.to/0x567562BA. 
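One thing that might already help here: Proxmox can pass a host serial device straight through to a guest, where it then shows up as a COM port (typically COM1) inside Windows. A minimal sketch, with VM ID 100 as a placeholder:

# on the PVE host: hand the host's first serial port to the VM
qm set 100 -serial0 /dev/ttyS0
# stop and start the VM afterwards so the new device is picked up

Whether the PBX software is happy talking to the emulated port is of course another question, but it avoids the USB converter entirely.
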
From ulrich.huber at heureka.co.at Tue Nov 27 08:30:40 2018 From: ulrich.huber at heureka.co.at (Ulrich Huber) Date: Tue, 27 Nov 2018 08:30:40 +0100 Subject: [PVE-User] Local interface on Promox server In-Reply-To: References: Message-ID: <04fb01d48623$197c2250$4c7466f0$@heureka.co.at> -----Urspr?ngliche Nachricht----- Von: pve-user [mailto:pve-user-bounces at pve.proxmox.com] Im Auftrag von Adnan RIHAN Gesendet: Dienstag, 27. November 2018 03:39 An: pve-user at pve.proxmox.com Betreff: Re: [PVE-User] Local interface on Promox server BTW, I take advantage of the discussion about the serial port. In Congo it?s hard to find an original serial-to-usb cable to plug our PBX to one of our VMs. Prolific made an update and all our cables are not working anymore. This was the only way to easily plug our RS232 PBX to a VM. While waiting to find an original converter, is there a way to redirect a serial port (/dev/ttyS0) to a Windows VM, creating a COM port connected to the linux serial port? [Ulrich Huber] Try the solution proposed on https://stackoverflow.com/questions/22624653/create-a-virtual-serial-port-connection-over-tcp It?s sharing the seriell port on the host-side via tcp/ip and connecting to it from your windows-guest. Same solution we use here with some AIT-devices, we share them via iscsi and connect from the guest.... -- Regards, Adnan RIHAN GPG: 5675-62BA (https://keybase.io/max13/key.asc) ? If you are not using GPG/PGP but want to send me an encrypted e-mail: https://encrypt.to/0x567562BA. _______________________________________________ pve-user mailing list pve-user at pve.proxmox.com https://pve.proxmox.com/cgi-bin/mailman/listinfo/pve-user From tonci at suma-informatika.hr Tue Nov 27 08:44:46 2018 From: tonci at suma-informatika.hr (=?UTF-8?B?VG9uxI1pIFN0aXBpxI1ldmnEhw==?=) Date: Tue, 27 Nov 2018 08:44:46 +0100 Subject: [PVE-User] cluster panic In-Reply-To: <6db4f34b-db9f-1001-777b-d4ee43a5b56f@suma-informatika.hr> References: <6db4f34b-db9f-1001-777b-d4ee43a5b56f@suma-informatika.hr> Message-ID: Hi? to all, I've just upgraded my lab-3node-HA-cluster from 5.2-10? to 5.2-12? and cluster got down. No node sees the other one? . Is there any way to troubleshoot this ? ************ Nov 27 08:42:09 pvesuma01 pvesr[32648]: trying to acquire cfs lock 'file-replication_cfg' ... Nov 27 08:42:10 pvesuma01 pvesr[32648]: error with cfs lock 'file-replication_cfg': no quorum! Nov 27 08:42:10 pvesuma01 systemd[1]: pvesr.service: Main process exited, code=exited, status=13/n/a Nov 27 08:42:10 pvesuma01 systemd[1]: Failed to start Proxmox VE replication runner. Nov 27 08:42:10 pvesuma01 systemd[1]: pvesr.service: Unit entered failed state. ************ Thank you very much in advance and BR Tonci // > > > > > From mityapetuhov at gmail.com Tue Nov 27 08:47:34 2018 From: mityapetuhov at gmail.com (Dmitry Petuhov) Date: Tue, 27 Nov 2018 10:47:34 +0300 Subject: [PVE-User] cluster panic In-Reply-To: References: <6db4f34b-db9f-1001-777b-d4ee43a5b56f@suma-informatika.hr> Message-ID: <7ae395e8-0635-6772-1907-a08d6937521b@gmail.com> Check that corosync is running on all nodes. And check cluster status with pvecm status 27.11.2018 10:44, Ton?i Stipi?evi? ?????: > Hi to all, > > I've just upgraded my lab-3node-HA-cluster from 5.2-10? to 5.2-12 and > cluster got down. No node sees the other one? . > > Is there any way to troubleshoot this ? > > > ************ > > > Nov 27 08:42:09 pvesuma01 pvesr[32648]: trying to acquire cfs lock > 'file-replication_cfg' ... 
> Nov 27 08:42:10 pvesuma01 pvesr[32648]: error with cfs lock > 'file-replication_cfg': no quorum! > Nov 27 08:42:10 pvesuma01 systemd[1]: pvesr.service: Main process > exited, code=exited, status=13/n/a > Nov 27 08:42:10 pvesuma01 systemd[1]: Failed to start Proxmox VE > replication runner. > Nov 27 08:42:10 pvesuma01 systemd[1]: pvesr.service: Unit entered > failed state. > > ************ > > > Thank you very much in advance and > > BR > > Tonci > > // >> >> >> >> >> > _______________________________________________ > pve-user mailing list > pve-user at pve.proxmox.com > https://pve.proxmox.com/cgi-bin/mailman/listinfo/pve-user From ronny+pve-user at aasen.cx Tue Nov 27 10:05:01 2018 From: ronny+pve-user at aasen.cx (Ronny Aasen) Date: Tue, 27 Nov 2018 10:05:01 +0100 Subject: [PVE-User] Local interface on Promox server In-Reply-To: References: Message-ID: On 27.11.2018 03:39, Adnan RIHAN wrote: > Yannis Milios yannis.milios at gmail.com wrote: >> Is it really that difficult to get a laptop in the server room to manage >> the servers? > > Well? You would be surprised ;) > > It?s not ??THAT?? complicated, except that currently we are in a > country in crisis and buying a laptop only for that is currently > overkill. We had a tech machine there, but had to use it for a point > of sale. > >> Personally I would avoid installing a full Desktop environment on the PVE >> hosts > > No choice. If it?s discouraged, then I won?t insist. > > Uwe Sauter uwe.sauter.de at gmail.com?wrote: >> You could use >> >> qm terminal >> >> to connect to the serial console > > I?m not really looking for connecting a serial console, but would have > liked an access to the desktop manager of a VM, as if it was possible > to redirect the keyboard/video/mouse of my KVM console to an actual > VM. > > As Yannis said, it seems to overcomplicate things. Then we will wait a > budget to buy a mini-pc and install it with the desktop we want. > > BTW, I take advantage of the discussion about the serial port. In > Congo it?s hard to find an original serial-to-usb cable to plug our > PBX to one of our VMs. Prolific made an update and all our cables are > not working anymore. This was the only way to easily plug our RS232 > PBX to a VM. > > While waiting to find an original converter, is there a way to > redirect a serial port (/dev/ttyS0) to a Windows VM, creating a COM > port connected to the linux serial port? > i use one of these. https://www.moxa.com/product/NPort_5650.htm The device gets an ip addres in a dedicated lan for serial port servers. an tcp/ip port is mapped to a serial port. so you can have many vm's with 1 serial each, or multiple serial ports on some vm's there is downloadable software for linux and windows. or you can use a opensource standardized tool on linux called socat. you can get them in various sizes depending on need. optinally you can make your own using a software called ser2net if you have a raspberrypi or similar micro machine. kind regards Ronny Aasen From tonci at suma-informatika.hr Tue Nov 27 10:52:59 2018 From: tonci at suma-informatika.hr (=?UTF-8?B?VG9uxI1pIFN0aXBpxI1ldmnEhw==?=) Date: Tue, 27 Nov 2018 10:52:59 +0100 Subject: [PVE-User] cluster panic In-Reply-To: References: <6db4f34b-db9f-1001-777b-d4ee43a5b56f@suma-informatika.hr> Message-ID: <556cc8eb-bd9d-7336-5d0a-f48c652464e7@suma-informatika.hr> No, corosync is not working on two nodes : Job for corosync.service failed because a timeout was exceeded. See "systemctl status corosync.service" and "journalctl -xe" for details. 
TASK ERROR: command 'systemctl start corosync' failed: exit code 1 orts TolUSNA Nov 27 10:47:01 pvesuma03 pvesr[16526]: trying to acquire cfs lock 'file-replication_cfg' ... Nov 27 10:47:02 pvesuma03 pmxcfs[2598]: [quorum] crit: quorum_initialize failed: 2 Nov 27 10:47:02 pvesuma03 pmxcfs[2598]: [confdb] crit: cmap_initialize failed: 2 Nov 27 10:47:02 pvesuma03 pmxcfs[2598]: [dcdb] crit: cpg_initialize failed: 2 Nov 27 10:47:02 pvesuma03 pmxcfs[2598]: [status] crit: cpg_initialize failed: 2 Nov 27 10:47:02 pvesuma03 pvesr[16526]: trying to acquire cfs lock 'file-replication_cfg' ... Nov 27 10:47:03 pvesuma03 pvesr[16526]: trying to acquire cfs lock 'file-replication_cfg' ... Nov 27 10:47:04 pvesuma03 pvesr[16526]: trying to acquire cfs lock 'file-replication_cfg' ... Nov 27 10:47:05 pvesuma03 pvesr[16526]: trying to acquire cfs lock 'file-replication_cfg' ... Nov 27 10:47:06 pvesuma03 pvesr[16526]: trying to acquire cfs lock 'file-replication_cfg' ... Nov 27 10:47:07 pvesuma03 pvesr[16526]: trying to acquire cfs lock 'file-replication_cfg' ... Nov 27 10:47:08 pvesuma03 pmxcfs[2598]: [quorum] crit: quorum_initialize failed: 2 Nov 27 10:47:08 pvesuma03 pmxcfs[2598]: [confdb] crit: cmap_initialize failed: 2 Nov 27 10:47:08 pvesuma03 pmxcfs[2598]: [dcdb] crit: cpg_initialize failed: 2 Nov 27 10:47:08 pvesuma03 pmxcfs[2598]: [status] crit: cpg_initialize failed: 2 Nov 27 10:47:08 pvesuma03 pvesr[16526]: trying to acquire cfs lock 'file-replication_cfg' ... Nov 27 10:47:09 pvesuma03 pvesr[16526]: trying to acquire cfs lock 'file-replication_cfg' ... Nov 27 10:47:10 pvesuma03 pvesr[16526]: error with cfs lock 'file-replication_cfg': no quorum! Nov 27 10:47:10 pvesuma03 systemd[1]: pvesr.service: Main process exited, code=exited, status=13/n/a Nov 27 10:47:10 pvesuma03 systemd[1]: Failed to start Proxmox VE replication runner. >> >> >> >> >> From tonci at suma-informatika.hr Tue Nov 27 11:41:27 2018 From: tonci at suma-informatika.hr (=?UTF-8?B?VG9uxI1pIFN0aXBpxI1ldmnEhw==?=) Date: Tue, 27 Nov 2018 11:41:27 +0100 Subject: [PVE-User] cluster panic In-Reply-To: <556cc8eb-bd9d-7336-5d0a-f48c652464e7@suma-informatika.hr> References: <6db4f34b-db9f-1001-777b-d4ee43a5b56f@suma-informatika.hr> <556cc8eb-bd9d-7336-5d0a-f48c652464e7@suma-informatika.hr> Message-ID: <64e187e9-5fa8-adb4-1339-7a3d05b85cdf@suma-informatika.hr> Tnx for help this thread solved everything https://forum.proxmox.com/threads/after-upgrade-to-5-2-11-corosync-does-not-come-up.49075/ // On 27. 11. 2018. 10:52, Ton?i Stipi?evi? wrote: > > No, corosync is not working on two nodes : > > Job for corosync.service failed because a timeout was exceeded. > See "systemctl status corosync.service" and "journalctl -xe" for details. > > TASK ERROR: command 'systemctl start corosync' failed: exit code 1 > > orts TolUSNA > Nov 27 10:47:01 pvesuma03 pvesr[16526]: trying to acquire cfs lock > 'file-replication_cfg' ... > Nov 27 10:47:02 pvesuma03 pmxcfs[2598]: [quorum] crit: > quorum_initialize failed: 2 > Nov 27 10:47:02 pvesuma03 pmxcfs[2598]: [confdb] crit: cmap_initialize > failed: 2 > Nov 27 10:47:02 pvesuma03 pmxcfs[2598]: [dcdb] crit: cpg_initialize > failed: 2 > Nov 27 10:47:02 pvesuma03 pmxcfs[2598]: [status] crit: cpg_initialize > failed: 2 > Nov 27 10:47:02 pvesuma03 pvesr[16526]: trying to acquire cfs lock > 'file-replication_cfg' ... > Nov 27 10:47:03 pvesuma03 pvesr[16526]: trying to acquire cfs lock > 'file-replication_cfg' ... > Nov 27 10:47:04 pvesuma03 pvesr[16526]: trying to acquire cfs lock > 'file-replication_cfg' ... 
> Nov 27 10:47:05 pvesuma03 pvesr[16526]: trying to acquire cfs lock > 'file-replication_cfg' ... > Nov 27 10:47:06 pvesuma03 pvesr[16526]: trying to acquire cfs lock > 'file-replication_cfg' ... > Nov 27 10:47:07 pvesuma03 pvesr[16526]: trying to acquire cfs lock > 'file-replication_cfg' ... > Nov 27 10:47:08 pvesuma03 pmxcfs[2598]: [quorum] crit: > quorum_initialize failed: 2 > Nov 27 10:47:08 pvesuma03 pmxcfs[2598]: [confdb] crit: cmap_initialize > failed: 2 > Nov 27 10:47:08 pvesuma03 pmxcfs[2598]: [dcdb] crit: cpg_initialize > failed: 2 > Nov 27 10:47:08 pvesuma03 pmxcfs[2598]: [status] crit: cpg_initialize > failed: 2 > Nov 27 10:47:08 pvesuma03 pvesr[16526]: trying to acquire cfs lock > 'file-replication_cfg' ... > Nov 27 10:47:09 pvesuma03 pvesr[16526]: trying to acquire cfs lock > 'file-replication_cfg' ... > Nov 27 10:47:10 pvesuma03 pvesr[16526]: error with cfs lock > 'file-replication_cfg': no quorum! > Nov 27 10:47:10 pvesuma03 systemd[1]: pvesr.service: Main process > exited, code=exited, status=13/n/a > Nov 27 10:47:10 pvesuma03 systemd[1]: Failed to start Proxmox VE > replication runner. > > >>> >>> >>> >>> >>> From luiscoralle at fi.uncoma.edu.ar Tue Nov 27 12:43:34 2018 From: luiscoralle at fi.uncoma.edu.ar (Luis G. Coralle) Date: Tue, 27 Nov 2018 08:43:34 -0300 Subject: [PVE-User] Local interface on Promox server In-Reply-To: References: Message-ID: And mount a vpn? I manage a remote proxmox via vpn El dom., 25 de nov. de 2018 a la(s) 11:20, Adnan RIHAN (axel50397 at gmail.com) escribi?: > Hi there, > > I?m using Proxmox for years on a remote server, and have recently > installed it in my company (so, in the same LAN). > > In our server room, we have 2 servers (both Proxmox) and 2 Synology > NAS, all these servers can only be managed by another client using a > web browser. We don?t have any client machine in the server room, so > when we fix something in the room (cables, routing, etc?), we need to > go out and check the VMs on another machine outside the room, > sometimes making us come back, etc? > > I know VMs can be controlled by command line using qemu, but is there > another way to locally control the machines on the Proxmox server, > except by installing a desktop manager and pointing the web browser on > localhost:8006? Is it even safe to do that? > > We have a KVM in our bay, we can physically access the machines, is > there maybe a way to physically be connected to a VM (as if we were > physically connected to a Windows VM for instance)? > > Thanks for your help. > -- > Regards, Adnan RIHAN > > GPG: 5675-62BA (https://keybase.io/max13/key.asc) > ? If you are not using GPG/PGP but want to send me an encrypted > e-mail: https://encrypt.to/0x567562BA. > _______________________________________________ > pve-user mailing list > pve-user at pve.proxmox.com > https://pve.proxmox.com/cgi-bin/mailman/listinfo/pve-user > -- Luis G. Coralle Secretar?a de TIC Facultad de Inform?tica Universidad Nacional del Comahue (+54) 299-4490300 Int 647 From mark at tuxis.nl Tue Nov 27 12:53:49 2018 From: mark at tuxis.nl (Mark Schouten) Date: Tue, 27 Nov 2018 12:53:49 +0100 Subject: [PVE-User] Migrating from LVM to ZFS Message-ID: <7e9d88b6cb7dc143c505eabb7a2a2e28@tuxis.nl> Hi, one of my colleagues mistakenly installed a Proxmox node with LVM instead of ZFS, and I want to fix that without reinstalling. I tested the following steps, which seem to be working as it should. But maybe somebody can think of something that I forgot. So I thought I'd share it here. Feel free to comment! 
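(A small sanity check before the steps below: make sure the disk about to be wiped really is the unused one; the device names here just follow this example.)

# sda should carry the existing LVM PV, sdb should be empty
lsblk -o NAME,SIZE,TYPE,MOUNTPOINT
pvs
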
/dev/sdb is the unused device, /dev/sda is the currently in-use device.

root at proxmoxlvmzfs:~# apt install parted
root at proxmoxlvmzfs:~# parted -s /dev/sdb mktable gpt
root at proxmoxlvmzfs:~# parted -s /dev/sdb mkpart extended 34s 2047s
root at proxmoxlvmzfs:~# parted -s /dev/sdb mkpart extended 2048s 100%
root at proxmoxlvmzfs:~# parted -s /dev/sdb set 1 bios_grub on
root at proxmoxlvmzfs:~# zpool create -f rpool /dev/sdb2
root at proxmoxlvmzfs:~# zfs create rpool/ROOT
root at proxmoxlvmzfs:~# zfs create rpool/ROOT/pve-1
root at proxmoxlvmzfs:~# zfs create rpool/data
root at proxmoxlvmzfs:~# zfs create -V 8G rpool/swap
root at proxmoxlvmzfs:~# mkswap /dev/zvol/rpool/swap
root at proxmoxlvmzfs:~# cd /rpool/ROOT/pve-1
root at proxmoxlvmzfs:/rpool/ROOT/pve-1# rsync -avx / ./
root at proxmoxlvmzfs:/rpool/ROOT/pve-1# mount --bind /proc proc
root at proxmoxlvmzfs:/rpool/ROOT/pve-1# mount --bind /dev dev
root at proxmoxlvmzfs:/rpool/ROOT/pve-1# mount --bind /sys sys
root at proxmoxlvmzfs:/rpool/ROOT/pve-1# swapoff -a
root at proxmoxlvmzfs:/rpool/ROOT/pve-1# chroot .

================ fstab fix ================
Change swap partition to /dev/zvol/rpool/swap
Remove / mount entry
================ fstab fix ================

================ grub fix ================
In /etc/default/grub, set:
GRUB_CMDLINE_LINUX="root=ZFS=rpool/ROOT/pve-1 boot=zfs"
================ grub fix ================

root at proxmoxlvmzfs:/# zpool set bootfs=rpool/ROOT/pve-1 rpool
root at proxmoxlvmzfs:/# grub-install /dev/sda
root at proxmoxlvmzfs:/# grub-install /dev/sdb
root at proxmoxlvmzfs:/# update-grub
root at proxmoxlvmzfs:/# zfs set mountpoint=/ rpool/ROOT/pve-1

Reboot

root at proxmoxlvmzfs:~# lvchange -an pve
root at proxmoxlvmzfs:~# sgdisk /dev/sdb -R /dev/sda
root at proxmoxlvmzfs:~# sgdisk -G /dev/sda
root at proxmoxlvmzfs:~# zpool attach rpool /dev/sdb2 /dev/sda2

--
Mark Schouten
Tuxis, Ede, https://www.tuxis.nl
T: +31 318 200208

From gilberto.nunes32 at gmail.com Wed Nov 28 20:20:02 2018
From: gilberto.nunes32 at gmail.com (Gilberto Nunes)
Date: Wed, 28 Nov 2018 17:20:02 -0200
Subject: [PVE-User] PVE Cluster and iSCSI
Message-ID:

Hi there....

Is there any problem to use PVE cluster with iSCSI Direct or not ( I mean
shared)?

Thanks
---
Gilberto Nunes Ferreira

(47) 3025-5907
(47) 99676-7530 - Whatsapp / Telegram

Skype: gilberto.nunes36

From smr at kmi.com Wed Nov 28 22:09:38 2018
From: smr at kmi.com (Stefan M. Radman)
Date: Wed, 28 Nov 2018 21:09:38 +0000
Subject: [PVE-User] PVE Cluster and iSCSI
In-Reply-To:
References:
Message-ID:

I am running a 3 node PVE cluster connected to shared storage with LVM.
Two of the nodes are connected via 2x4GFC (direct-attach) and one via
2x1GbE iSCSI (switched).

The only issue I experienced in the past was a failure to activate LVM
volumes after a reboot on the iSCSI node.

root at node03:~# systemctl --failed
UNIT                          LOAD   ACTIVE SUB    DESCRIPTION
● lvm2-activation-net.service loaded failed failed Activation of LVM2 logical volumes

It seems that the problem was caused by some delay in the iSCSI
initialization that prevented activation of the logical volumes. After a
node reboot I would have to go in and restart the service manually.
Haven't seen this after the last update, so it might be gone.

root at node03:~# pveversion
pve-manager/5.2-12/ba196e4b (running kernel: 4.15.18-9-pve)

Other than that: no problems. Just make sure your iSCSI SAN is well
designed (flow control, jumbo frames, isolation, multipathing, ...).

Stefan

On Nov 28, 2018, at 8:20 PM, Gilberto Nunes wrote:

Hi there....

Is there any problem to use PVE cluster with iSCSI Direct or not ( I mean
shared)?

Thanks
---
Gilberto Nunes Ferreira

(47) 3025-5907
(47) 99676-7530 - Whatsapp / Telegram

Skype: gilberto.nunes36
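A hedged recovery sketch for a failed activation unit like the one Stefan describes; "vg_iscsi" is only a placeholder for whatever volume group lives on the iSCSI LUN:

root at node03:~# systemctl restart lvm2-activation-net.service   # re-run the activation once the iSCSI session is up
root at node03:~# vgchange -ay vg_iscsi                           # or activate the volume group directly
root at node03:~# lvs -o vg_name,lv_name,lv_active vg_iscsi       # verify the logical volumes are active again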
From mark at openvs.co.uk Thu Nov 29 01:23:41 2018
From: mark at openvs.co.uk (Mark Adams)
Date: Thu, 29 Nov 2018 00:23:41 +0000
Subject: [PVE-User] PVE Cluster and iSCSI
In-Reply-To:
References:
Message-ID:

As long as you have access to the iSCSI storage from all nodes in the
cluster then why not?

On Wed, 28 Nov 2018 at 19:20, Gilberto Nunes wrote:

> Hi there....
>
> Is there any problem to use PVE cluster with iSCSI Direct or not ( I mean
> shared)?
>
> Thanks
> ---
> Gilberto Nunes Ferreira
>
> (47) 3025-5907
> (47) 99676-7530 - Whatsapp / Telegram
>
> Skype: gilberto.nunes36

From gilberto.nunes32 at gmail.com Thu Nov 29 02:05:44 2018
From: gilberto.nunes32 at gmail.com (Gilberto Nunes)
Date: Wed, 28 Nov 2018 23:05:44 -0200
Subject: [PVE-User] PVE Cluster and iSCSI
In-Reply-To:
References:
Message-ID:

Yes... but does it work only with LVM over iSCSI, or can I access it
directly from all nodes?
---
Gilberto Nunes Ferreira

(47) 3025-5907
(47) 99676-7530 - Whatsapp / Telegram

Skype: gilberto.nunes36

On Wed, 28 Nov 2018 at 22:24, Mark Adams wrote:

> As long as you have access to the iSCSI storage from all nodes in the
> cluster then why not?

From mark at openvs.co.uk Thu Nov 29 02:13:45 2018
From: mark at openvs.co.uk (Mark Adams)
Date: Thu, 29 Nov 2018 01:13:45 +0000
Subject: [PVE-User] PVE Cluster and iSCSI
In-Reply-To:
References:
Message-ID:

Are you using some iSCSI setup that manages the LUNs independently for
each VM? Then take a look at this link:

https://pve.proxmox.com/wiki/Storage:_User_Mode_iSCSI

There has to be some method for creating the VM partitions - this is why
LVM is preferred as the option if you give it an entire iSCSI target.

On Thu, 29 Nov 2018 at 01:06, Gilberto Nunes wrote:

> Yes... but does it work only with LVM over iSCSI, or can I access it
> directly from all nodes?
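To make the "give LVM an entire iSCSI target" option concrete, here is a hedged sketch of one way to set it up; the portal address, IQN, device and storage names are invented placeholders, not values taken from the thread:

root at node01:~# pvesm add iscsi san1 --portal 192.168.1.50 --target iqn.2018-11.example.com:lun1 --content none   # every node logs in to the target
root at node01:~# lsblk                         # identify the LUN device on one node, e.g. /dev/sdc
root at node01:~# pvcreate /dev/sdc             # initialise the LUN for LVM (run once, on one node only)
root at node01:~# vgcreate vg_san1 /dev/sdc
root at node01:~# pvesm add lvm vm-san1 --vgname vg_san1 --shared 1 --content images,rootdir   # shared LVM storage visible to all nodes

Each node then carves its VM disks as logical volumes out of the same volume group, which is why cluster-wide access to the target matters.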
From sleemburg at it-functions.nl Fri Nov 30 16:51:50 2018
From: sleemburg at it-functions.nl (Stephan Leemburg)
Date: Fri, 30 Nov 2018 16:51:50 +0100
Subject: [PVE-User] lxc hang situation
Message-ID: <20181130155150.zpwsfn3v7mz5o2yz@daruma.hachimitsu.nl>

Hi @proxmox,

Since some months we have been experiencing frequent 'hang' situations on
our Proxmox nodes. Today, again, such a situation occurred, so we took some
time to look at the situation at hand.

The situation 'started' when we did a

pct start 1310

This did not return. Looking at the process list showed that we had this
situation:

21462 ?        Ss     0:00 /usr/bin/lxc-start -n 1310
21619 ?        Z      0:00  \_ [lxc-start]
21758 ?        Ss     0:00 [lxc monitor] /var/lib/lxc 1310
24681 ?        D      0:00  \_ [lxc monitor] /var/lib/lxc 1310

When looking at the wait-channel, the namespaces and the stack of 24681 we
noticed that it was blocked in [<0>] copy_net_ns+0x

After some more searching, we found with

grep copy_net_ns /proc/[0-9]*/stack

that there were two more processes also blocked on copy_net_ns. These were
two ionclean processes in other containers. Killing them (with -9) showed
that the restarted ionclean processes immediately blocked again on
copy_net_ns.

The system on which Proxmox is running has 2 Intel(R) Xeon(R) E5-2690 v4
CPUs with 14 cores and 28 threads each. With hyper-threading this shows up
in Proxmox as 56 CPUs, so real concurrency is possible.

The problem looks like a race condition on some resource, but killing (with
-9) all the processes that are hanging on copy_net_ns does not make the
kernel release the contended resource. After killing all the processes
blocked on copy_net_ns, and with no process having a stack showing
copy_net_ns, starting a new container immediately blocks again on
copy_net_ns. So only a reboot (as far as we know) solves this.

We played around with ip li set netns on the veth devices, etc., but we
could not get the machine out of this situation in any way other than a
reboot.

Based on all this we found that https://github.com/lxc/lxd/issues/4468
says this problem should be solved in kernel 4.17.

We run the latest Proxmox enterprise updates on this machine and its kernel
is PVE 4.15.18-30 (Thu, 15 Nov 2018 13:32:46 +0100).

As the kernel is Ubuntu-based, would it be possible to start using the
Ubuntu 18.10 kernel, which is 4.18, to get around this problem?

--
Kind regards,
Stephan Leemburg
IT Functions
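For anyone who wants to check whether a node is stuck in the same way, a small diagnostic sketch along the lines of what Stephan describes; the host name is only an example:

root at pve01:~# uname -r && pveversion                    # confirm the running kernel and PVE version
root at pve01:~# grep -l copy_net_ns /proc/[0-9]*/stack    # list the stack files of all tasks blocked in copy_net_ns
root at pve01:~# for s in $(grep -ls copy_net_ns /proc/[0-9]*/stack); do
>   p=${s#/proc/}; p=${p%/stack}
>   ps -o pid,stat,wchan:25,cmd -p "$p"                  # show the D-state tasks and the wait channel they sit in
> done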