[pve-devel] Blacklisting HP hardware watchdog timer module ?

Alexandre DERUMIER aderumier at odiso.com
Thu Dec 3 18:24:40 CET 2015


I just found a strange bug with ipmi_watchdog, related to Dell OpenManage.

At boot, the timeout is correctly set to 10s:

root@kvmtest1 ~ # ipmitool mc watchdog get
Watchdog Timer Use:     SMS/OS (0x44)
Watchdog Timer Is:      Started/Running
Watchdog Timer Actions: Hard Reset (0x01)
Pre-timeout interval:   0 seconds
Timer Expiration Flags: 0x10
Initial Countdown:      10 sec
Present Countdown:      9 sec


But after a few minutes (5-10 min), I see it at 480s:

# ipmitool mc watchdog get
Watchdog Timer Use:     SMS/OS (0xc4)
Watchdog Timer Is:      Started/Running
Watchdog Timer Actions: No action (0x00)
Pre-timeout interval:   0 seconds
Timer Expiration Flags: 0x10
Initial Countdown:      480 sec
Present Countdown:      479 sec


In Dell OpenManage, I see a reset configuration option set to 480s.

(I think it's the OpenManage service that overwrites the value.)

I'll add a note to the wiki about this too.
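
To catch this kind of silent overwrite, something like the following could be run periodically. It is only a rough sketch: the function names are made up for illustration, and the expected values (10s, Hard Reset) are simply taken from the boot-time output above. The two sample strings are the ipmitool outputs shown earlier in this mail.

```python
import re

# Sample outputs copied from the two `ipmitool mc watchdog get` runs above.
AT_BOOT = """\
Watchdog Timer Use:     SMS/OS (0x44)
Watchdog Timer Is:      Started/Running
Watchdog Timer Actions: Hard Reset (0x01)
Pre-timeout interval:   0 seconds
Timer Expiration Flags: 0x10
Initial Countdown:      10 sec
Present Countdown:      9 sec
"""

LATER = """\
Watchdog Timer Use:     SMS/OS (0xc4)
Watchdog Timer Is:      Started/Running
Watchdog Timer Actions: No action (0x00)
Pre-timeout interval:   0 seconds
Timer Expiration Flags: 0x10
Initial Countdown:      480 sec
Present Countdown:      479 sec
"""


def parse_watchdog(output):
    """Turn the 'Key: value' lines printed by ipmitool into a dict."""
    fields = {}
    for line in output.splitlines():
        key, sep, value = line.partition(":")
        if sep:  # skip lines that have no colon
            fields[key.strip()] = value.strip()
    return fields


def countdown_seconds(value):
    """Extract the integer from a value such as '10 sec'."""
    match = re.match(r"(\d+)\s*sec", value)
    if match is None:
        raise ValueError("unexpected countdown value: %r" % value)
    return int(match.group(1))


def check_watchdog(output, expected_timeout=10, expected_action="Hard Reset"):
    """Return a list of human-readable problems (empty if all is as expected)."""
    fields = parse_watchdog(output)
    problems = []
    timeout = countdown_seconds(fields["Initial Countdown"])
    if timeout != expected_timeout:
        problems.append(
            "initial countdown is %ds, expected %ds" % (timeout, expected_timeout)
        )
    action = fields["Watchdog Timer Actions"]
    if not action.startswith(expected_action):
        problems.append(
            "timer action is %r, expected %r" % (action, expected_action)
        )
    return problems


if __name__ == "__main__":
    for name, sample in (("at boot", AT_BOOT), ("later", LATER)):
        print(name, check_watchdog(sample) or "OK")
```

On a real node, the sample string would be replaced by the live output, for example via subprocess.run(["ipmitool", "mc", "watchdog", "get"], capture_output=True, text=True, check=True).stdout.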


----- Original Message -----
From: "aderumier" <aderumier at odiso.com>
To: "dietmar" <dietmar at proxmox.com>
Cc: "pve-devel" <pve-devel at pve.proxmox.com>
Sent: Thursday, 3 December 2015 17:48:14
Subject: Re: [pve-devel] Blacklisting HP hardware watchdog timer module ?

>>The timeout must be 60 seconds!! Never change that. 
>> 
>>We set the timeout to 60s when we start watchdog-mux. 
Ah, OK. I thought we needed to define it manually.

What is the difference between these two timeouts?

+ int watchdog_timeout = 10; 
+ int client_watchdog_timeout = 60; 


ipmitool gives me 10s, so it seems to work fine :)
# ipmitool mc watchdog get 
Initial Countdown: 10 sec 




> Another question: I did some tests two weeks ago with a customer,
> and I think I had a problem when the node rebooted too fast
> (pve-ha-manager sees the node down, but it comes back up before the VMs were
> migrated).
> Is this a known bug?

>>What bug exactly? 
I don't remember exactly, but the LRM or CRM was stuck because the node (and its VMs) had rebooted too fast.

Sorry, I don't have access to the customer's logs.



----- Original Message -----
From: "dietmar" <dietmar at proxmox.com>
To: "aderumier" <aderumier at odiso.com>
Cc: "pve-devel" <pve-devel at pve.proxmox.com>
Sent: Thursday, 3 December 2015 17:28:55
Subject: Re: [pve-devel] Blacklisting HP hardware watchdog timer module ?

> BTW, what is the best timeout for the watchdog?
> I think pve-ha-manager waits around 1 min before migrating VMs?
> If so, should the watchdog timeout be lower?

The timeout must be 60 seconds!! Never change that. 

We set the timeout to 60s when we start watchdog-mux. 

> Another question: I did some tests two weeks ago with a customer,
> and I think I had a problem when the node rebooted too fast
> (pve-ha-manager sees the node down, but it comes back up before the VMs were
> migrated).
> Is this a known bug?

What bug exactly? 