[pve-devel] [PATCH] rbd : free_image : retry if rbd has watchers

Mon Dec 8 04:42:23 CET 2014

Hi Alexandre,

We are still experiencing corruption when performing live disk moves, even
with this patch in place.

Is there anything we can do to help pinpoint the cause of this?

On Wed, Nov 12, 2014 at 7:39 PM, Alexandre DERUMIER <aderumier at odiso.com>
wrote:

> Ok, Great!
>
> Thanks for testing.
>
> (@cc pve-devel)
>
> ----- Forwarded message -----
>
> From: "Andrew Thrift" <andrew at networklabs.co.nz>
> To: "Alexandre DERUMIER" <aderumier at odiso.com>
> Sent: Wednesday, 12 November 2014 05:10:40
> Subject: Re: [pve-devel] [PATCH] rbd : free_image : retry if rbd has watchers
>
>
> Hi Alexandre,
>
>
> Initial testing looks promising.
>
>
> I have tested migrating disks that have active writes on them and it
> worked well. All files had matching md5 sums.
>
>
> I will test with 4K writes tomorrow.
>
>
> On Fri, Nov 7, 2014 at 10:42 PM, Andrew Thrift <andrew at networklabs.co.nz>
> wrote:
>
>
>
> Thanks Alexandre,
>
>
> I will try these first thing Monday.
>
>
> Have a good weekend!
>
>
>
>
>
>
>
>
> On Fri, Nov 7, 2014 at 10:29 PM, Alexandre DERUMIER <aderumier at odiso.com>
> wrote:
>
> <blockquote>
> >>Do you know why online Disk Moves could be causing this corruption? We
> >>have had to stop using it, because corrupting a customer's DB server would
> >>not be a good thing.... :(
>
> Can you try the 2 patches I have sent? I think they should fix the problem.
>
>
> ----- Original message -----
>
> From: "Andrew Thrift" < andrew at networklabs.co.nz >
> To: "Alexandre DERUMIER" < aderumier at odiso.com >
> Sent: Thursday, 6 November 2014 21:18:19
> Subject: Re: [pve-devel] [PATCH] rbd : free_image : retry if rbd has watchers
>
>
>
>
> Hi Alexandre,
>
>
> This is not specifically related to this patch, but using Disk Move while
> the VM is online results in corruption for us almost every time we use it.
>
>
> We are using PVE 3.3 with RBD storage. Typically we are moving from one RBD
> pool to another. We seem to get corruption whether the block copy completes
> or fails.
>
>
> We are primarily running Windows guest OSes with virtio or virtio-scsi
> disks.
>
>
> Our Ceph cluster has 84 spinning disks and 7x Intel S3700 journals.
> Networking to all devices is 2x 10 Gigabit bonded, and performance is
> generally very good.
>
>
>
>
> Do you know why online Disk Moves could be causing this corruption? We
> have had to stop using it, because corrupting a customer's DB server would
> not be a good thing.... :(
>
>
>
>
> On Fri, Nov 7, 2014 at 5:00 AM, Alexandre DERUMIER <aderumier at odiso.com>
> wrote:
>
>
> I'll resend a V2 tomorrow.
>
> ----- Original message -----
>
> From: "Dietmar Maurer" < dietmar at proxmox.com >
> To: "Alexandre DERUMIER" < aderumier at odiso.com >
> Cc: pve-devel at pve.proxmox.com
> Sent: Thursday, 6 November 2014 16:08:38
> Subject: RE: [pve-devel] [PATCH] rbd : free_image : retry if rbd has watchers
>
>
>
> > >>And what happens if we get other errors?
> >
> > Currently it's retrying until $i > ~0,
> >
> > but we could add a die directly if $err !~ "image still has watchers".
>
> Yes, I think that would be better.
> _______________________________________________
> pve-devel mailing list
> pve-devel at pve.proxmox.com
> http://pve.proxmox.com/cgi-bin/mailman/listinfo/pve-devel
>
>
>
>
> </blockquote>
>