[pve-devel] [RFC ha-manager] explicitly sync journal when disabling watchdog updates

Thomas Lamprecht t.lamprecht at proxmox.com
Tue May 23 14:35:38 CEST 2017


Without syncing, the journal could lose the logs of a short interval
(roughly 10-60 seconds), but these last seconds are exactly the ones
that matter when analyzing the cause of a triggered watchdog.

Also, without this, the
> "client did not stop watchdog - disable watchdog updates"
message was often not flushed to persistent storage, so some users had
a hard time figuring out why the machine was reset.

Use the '--sync' switch of journalctl, which, to quote its man page,
"guarantees that any log messages written before its invocation are
safely stored on disk at the time it returns."
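
That guarantee applies at the moment journalctl returns, so it only
holds for a caller that waits for the child, which this patch
deliberately does not do. For comparison, a blocking variant could look
like the following minimal sketch; the helper name run_and_wait() is
hypothetical and not part of watchdog-mux:

```c
#include <errno.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical helper: run `path` with the NULL-terminated argv in a
 * child process and block until it exits. Returns the child's exit
 * code, or -1 if fork()/waitpid() failed or the child was killed by a
 * signal. */
static int
run_and_wait(const char *path, char *const argv[])
{
    pid_t child = fork();
    if (child < 0)
        return -1;

    if (child == 0) {
        execv(path, argv);
        /* only reached if execv() failed; 127 mirrors the shell's
         * "command not found" convention */
        _exit(127);
    }

    int status;
    /* retry if interrupted by a signal before the child exits */
    while (waitpid(child, &status, 0) < 0) {
        if (errno != EINTR)
            return -1;
    }

    return WIFEXITED(status) ? WEXITSTATUS(status) : -1;
}
```

With `char *const argv[] = { JOURNALCTL_BIN, "--sync", NULL };`, a call
to `run_and_wait(JOURNALCTL_BIN, argv)` would return only after
journalctl has finished, at the cost of blocking the caller for as long
as the sync takes.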

Use fork() and execl() to run `journalctl --sync` in a child process.
Do not bother with error checks or recovery, as we will be reset
anyway. This is just a best-effort attempt to log the situation more
consistently; if it fails, there is nothing we can do about it.
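
Stripped of the watchdog-mux specifics, this fire-and-forget pattern
can be sketched as below. spawn_detached() is a hypothetical name used
only for illustration. It calls _exit() rather than exit() after a
failed exec, so the child does not flush stdio buffers or run atexit
handlers inherited from the parent, and it terminates the execl()
argument list with (char *) NULL, as exec(3) asks for with these
variadic functions. Since the parent never calls waitpid(), the
finished child remains a zombie until the parent itself exits, which is
acceptable when a reset is imminent.

```c
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical helper mirroring the pattern of sync_journal_unsafe():
 * start "path arg" in a child process and return immediately.
 * Returns the child's PID, or -1 if fork() failed. Errors in the
 * child are not reported and the child is never reaped. */
static pid_t
spawn_detached(const char *path, const char *arg)
{
    pid_t child = fork();

    if (child == 0) {
        /* child: replace ourselves with the target binary */
        execl(path, path, arg, (char *) NULL);
        /* only reached if execl() failed; leave without touching
         * stdio buffers inherited from the parent */
        _exit(127);
    }

    /* parent (or fork failure): return without waiting */
    return child;
}
```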

We call the function at two points:
a) if we exit with active connections: here the watchdog will be
   triggered soon and we want to ensure that this is logged.
b) if a client closes the connection without sending the magic close
   byte: here the watchdog would trigger while we hang in epoll at
   the beginning of the loop, so sync the log here as well.

Signed-off-by: Thomas Lamprecht <t.lamprecht at proxmox.com>
---
 src/watchdog-mux.c | 20 +++++++++++++++++++-
 1 file changed, 19 insertions(+), 1 deletion(-)

diff --git a/src/watchdog-mux.c b/src/watchdog-mux.c
index 7367077..a10187e 100644
--- a/src/watchdog-mux.c
+++ b/src/watchdog-mux.c
@@ -27,6 +27,8 @@
 
 #define WATCHDOG_DEV "/dev/watchdog"
 
+#define JOURNALCTL_BIN "/bin/journalctl"
+
 int watchdog_fd = -1;
 int watchdog_timeout = 10;
 int client_watchdog_timeout = 60;
@@ -98,7 +100,21 @@ watchdog_close(void)
 
     watchdog_fd = -1;
 }
- 
+
+static void
+sync_journal_unsafe(void)
+{
+
+    pid_t child = fork();
+
+    // do not care about fork errors or collecting the child's exit status,
+    // we are resetting soon anyway and just want to sync out the journal
+    if (child == 0) {
+	execl(JOURNALCTL_BIN, JOURNALCTL_BIN, "--sync", (char *) NULL);
+	_exit(EXIT_FAILURE);
+    }
+}
+
 int 
 main(void)
 {
@@ -327,6 +343,7 @@ main(void)
 
                         if (!wd_client->magic_close) {
                             fprintf(stderr, "client did not stop watchdog - disable watchdog updates\n");
+                            sync_journal_unsafe();
                             update_watchdog = 0;
                         } else {
                             free_client(wd_client);
@@ -346,6 +363,7 @@ main(void)
     int active_count = active_client_count();
     if (active_count > 0) {
         fprintf(stderr, "exit watchdog-mux with active connections\n");
+        sync_journal_unsafe();
     } else {
         fprintf(stderr, "clean exit\n");
         watchdog_close();
-- 
2.11.0




