[pve-devel] applied: Re: [RFC ha-manager] explicitly sync journal when disabling watchdog updates

Thomas Lamprecht t.lamprecht at proxmox.com
Wed Jun 21 05:55:02 CEST 2017


Just for the record: this got applied to master by Dietmar

On 05/23/2017 02:35 PM, Thomas Lamprecht wrote:
> Without syncing, the journal could lose logs for a small interval
> (ca. 10-60 seconds), but these last seconds are really interesting for
> analyzing the cause of a triggered watchdog.
>
> Also, without this the
>> "client did not stop watchdog - disable watchdog updates"
> message often wasn't flushed to persistent storage, so some users had
> a hard time figuring out why the machine reset.
>
> Use the '--sync' switch of journalctl which - to quote its man page -
> "guarantees that any log messages written before its invocation are
> safely stored on disk at the time it returns."
>
> Use execl to call `journalctl --sync` in a child process and skip any
> error checks or recovery, as we will be reset anyway. This is just a
> best-effort attempt to log the situation more consistently; if it
> fails, we cannot really do anything about it anyhow.
>
> We call the function at two points:
> a) if we exit with active connections, here the watchdog will be
>     triggered soon and we want to ensure that this is logged.
> b) if a client closes the connection without sending the magic close
>     byte, here the watchdog would trigger while we hang in epoll at
>     the beginning of the loop, so sync the log here also.
>
> Signed-off-by: Thomas Lamprecht <t.lamprecht at proxmox.com>
> ---
>   src/watchdog-mux.c | 20 +++++++++++++++++++-
>   1 file changed, 19 insertions(+), 1 deletion(-)
>
> diff --git a/src/watchdog-mux.c b/src/watchdog-mux.c
> index 7367077..a10187e 100644
> --- a/src/watchdog-mux.c
> +++ b/src/watchdog-mux.c
> @@ -27,6 +27,8 @@
>   
>   #define WATCHDOG_DEV "/dev/watchdog"
>   
> +#define JOURNALCTL_BIN "/bin/journalctl"
> +
>   int watchdog_fd = -1;
>   int watchdog_timeout = 10;
>   int client_watchdog_timeout = 60;
> @@ -98,7 +100,21 @@ watchdog_close(void)
>   
>       watchdog_fd = -1;
>   }
> -
> +
> +static void
> +sync_journal_unsafe(void)
> +{
> +
> +    pid_t child = fork();
> +
> +    // do not care about fork error or collecting the childs exit status,
> +    // we are resetting soon anyway and just want to sync out the journal
> +    if (child == 0) {
> +	execl(JOURNALCTL_BIN, JOURNALCTL_BIN, "--sync", NULL);
> +	exit(-1);
> +    }
> +}
> +
>   int
>   main(void)
>   {
> @@ -327,6 +343,7 @@ main(void)
>   
>                           if (!wd_client->magic_close) {
>                               fprintf(stderr, "client did not stop watchdog - disable watchdog updates\n");
> +                            sync_journal_unsafe();
>                               update_watchdog = 0;
>                           } else {
>                               free_client(wd_client);
> @@ -346,6 +363,7 @@ main(void)
>       int active_count = active_client_count();
>       if (active_count > 0) {
>           fprintf(stderr, "exit watchdog-mux with active connections\n");
> +        sync_journal_unsafe();
>       } else {
>           fprintf(stderr, "clean exit\n");
>           watchdog_close();
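
For readers who want to try the fork-and-exec pattern from the patch
outside of watchdog-mux, here is a minimal standalone sketch. It is not
the applied code: the helper name `spawn_nowait` is hypothetical, it
uses execv instead of execl so the command can be passed in, and it
calls _exit(127) rather than exit(-1) in the failed-exec path so the
child does not run the parent's atexit handlers or flush inherited
stdio buffers. Treat these choices as assumptions, not as what the
patch does.

```c
#include <sys/types.h>
#include <sys/wait.h>   /* only needed by callers that reap the child */
#include <unistd.h>

/*
 * Hypothetical helper (not part of watchdog-mux): fork and exec `path`
 * with `argv`, without waiting for the child to finish.
 * Returns the child's pid in the parent, or -1 if fork() failed.
 */
pid_t
spawn_nowait(const char *path, char *const argv[])
{
    pid_t child = fork();

    if (child == 0) {
        execv(path, argv);
        /* only reached if execv() failed */
        _exit(127);
    }

    /* parent: deliberately no waitpid() here, the caller decides */
    return child;
}
```

With `char *const args[] = { "/bin/journalctl", "--sync", NULL };` a
call like `spawn_nowait("/bin/journalctl", args)` would roughly mirror
what sync_journal_unsafe() does. A long-running caller that is not
about to be reset would normally also reap the child with waitpid() to
avoid leaving a zombie behind.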




