Systemd timeout stopping boinc-client

Message boards : Questions and problems : Systemd timeout stopping boinc-client

Section8
Joined: 25 Jul 23
Posts: 5
Message 112358 - Posted: 25 Jul 2023, 12:16:19 UTC

Hello,
I am running BOINC 7.22.1 on Arch Linux, with tasks from Einstein@Home.

My boinc-client is running as a systemd service. Whenever I reboot or shutdown my system, this systemd message appears: "A stop job is running for Berkeley Open Infrastructure Network Computing Client", along with a timer that counts down 60 seconds before the shutdown continues.

If I enter "systemctl stop boinc-client.service" in a terminal, I don't see the message or the timer, but the command blocks for 60 seconds. However, if I have the BOINC Manager open on the Tasks tab when I enter the stop command, the tasks all disappear immediately.

I have mitigated this by setting "TimeoutStopSec=10" in the systemd boinc-client.service unit file, but I would like to get rid of this timeout. Has anyone else seen anything like this?
ID: 112358
Richard Haselgrove
Volunteer tester
Help desk expert
Joined: 5 Oct 06
Posts: 5082
United Kingdom
Message 112360 - Posted: 25 Jul 2023, 14:37:42 UTC - in response to Message 112358.  

BOINC (internally) has two modes for stopping the client:

The "polite" one, where it issues a 'request' to the client to shut itself down. That allows any exit dialogs to be shown, all output files to be flushed to disk and closed, log files likewise, and so on. Only then does the client report back to the calling program that all is complete, and that it's safe to continue with the closedown without risk of data loss or damage.

The other mode is much ruder (it has been described by a researcher/system admin as "terminate with extreme prejudice"). It forces the client to stop immediately, without regard to what it's working on. I would keep this one for emergencies only.
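A generic POSIX illustration of the two modes (a Python sketch, not BOINC's own shutdown code): a process that catches SIGTERM gets the chance to flush its output before exiting, whereas SIGKILL cannot be caught, so nothing gets written. The child program and its checkpoint file are hypothetical stand-ins for a science app.

```python
import os
import signal
import subprocess
import sys
import tempfile

# Hypothetical stand-in for a science app: on SIGTERM it writes ("flushes")
# a checkpoint file and exits with status 0.
CHILD = r"""
import signal, sys, time
out_path = sys.argv[1]

def on_term(signum, frame):
    with open(out_path, "w") as f:
        f.write("checkpoint flushed\n")
    sys.exit(0)

signal.signal(signal.SIGTERM, on_term)
print("ready", flush=True)  # tell the parent the handler is installed
while True:
    time.sleep(0.1)
"""


def stop_child(sig):
    """Start the child, send it `sig`, return (exit status, checkpoint written?)."""
    with tempfile.TemporaryDirectory() as tmp:
        out_path = os.path.join(tmp, "checkpoint.txt")
        proc = subprocess.Popen([sys.executable, "-c", CHILD, out_path],
                                stdout=subprocess.PIPE, text=True)
        proc.stdout.readline()  # wait for "ready" before signalling
        proc.send_signal(sig)
        status = proc.wait(timeout=10)  # negative value = killed by that signal
        proc.stdout.close()
        return status, os.path.exists(out_path)


if __name__ == "__main__":
    print("polite (SIGTERM):", stop_child(signal.SIGTERM))
    print("forced (SIGKILL):", stop_child(signal.SIGKILL))
```

On Linux, the SIGTERM run should report exit status 0 with the checkpoint present, and the SIGKILL run -9 with no checkpoint.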

Having said that, it's unusual for the 'polite' call to delay things by as much as a minute. It might be a 'feature' (aka 'bug') of a science project you're running: while it's closing down, the client goes through a similar call-and-response with each of the science apps it has started. BOINC will only shut itself down after every science app has reported a safe closedown. If just one of the science apps is badly behaved and doesn't hear the call to close, that might have the effect you're describing.
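The reasoning that one unresponsive app can hold up the whole closedown can be written as a toy model. It is a simplification under assumptions: each app's confirmation time is known in advance, None stands for an app that never answers, and `timeout` plays the role of the service's stop timeout (systemd's TimeoutStopSec). The function name is hypothetical.

```python
from typing import Optional, Sequence, Tuple


def stop_outcome(confirm_secs: Sequence[Optional[float]],
                 timeout: float) -> Tuple[float, str]:
    """Toy model of a polite shutdown.

    confirm_secs: seconds each science app takes to confirm it has closed
    down; None means the app never answers the quit request.
    timeout: the service manager's stop timeout (e.g. TimeoutStopSec).
    Returns (seconds until the stop finishes, "clean" or "killed").
    """
    if any(t is None for t in confirm_secs):
        return timeout, "killed"          # one silent app stalls everything
    slowest = max(confirm_secs, default=0.0)
    if slowest > timeout:
        return timeout, "killed"          # even cooperative apps can be too slow
    return slowest, "clean"               # the client waits for the slowest app


if __name__ == "__main__":
    print(stop_outcome([0.4, 0.9, 1.2], timeout=60))   # all apps cooperate
    print(stop_outcome([0.4, None, 1.2], timeout=60))  # one ignores the request
```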
ID: 112360
Keith Myers
Volunteer tester
Help desk expert
Joined: 17 Nov 16
Posts: 869
United States
Message 112370 - Posted: 26 Jul 2023, 2:23:02 UTC

I see this with CPU apps on a few projects. After exiting the client, I have to wait approx. 15-30 seconds for all the Universe tasks to finish up and disappear from the system monitor. Then I can feel confident that all the apps and tasks have written out their result files.
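Rather than watching the system monitor, the same check can be scripted. A Linux-only Python sketch that polls /proc/<pid>/comm until no process name matches a given prefix; the kernel truncates these names to 15 characters, which is presumably also why systemd logs show names like hsgamma_FGRP5_1. The prefix in the example is an assumption to adapt to whichever projects you run, and the function names are hypothetical.

```python
import os
import time
from typing import List


def pids_with_prefix(prefix: str) -> List[int]:
    """PIDs whose kernel process name (/proc/<pid>/comm) starts with prefix.

    The kernel truncates these names to 15 characters. Exited-but-unreaped
    (zombie) processes still appear here until their parent reaps them.
    """
    pids = []
    for entry in os.listdir("/proc"):
        if not entry.isdigit():
            continue  # skip non-process entries such as /proc/self
        try:
            with open(f"/proc/{entry}/comm") as f:
                name = f.read().strip()
        except OSError:
            continue  # the process vanished while we were scanning
        if name.startswith(prefix):
            pids.append(int(entry))
    return pids


def wait_until_gone(prefix: str, timeout: float = 60.0,
                    poll: float = 0.5) -> bool:
    """Poll until no matching process is left; False if timeout expires first."""
    deadline = time.monotonic() + timeout
    while pids_with_prefix(prefix):
        if time.monotonic() >= deadline:
            return False
        time.sleep(poll)
    return True


if __name__ == "__main__":
    # Assumed prefix for the Einstein@Home apps seen in this thread; adjust.
    print("all gone:", wait_until_gone("hsgamma_", timeout=30))
```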
ID: 112370
Section8
Joined: 25 Jul 23
Posts: 5
Message 112374 - Posted: 26 Jul 2023, 11:39:00 UTC

Thank you for the responses. I'm not sure if this proves anything, but for troubleshooting: if I first remove that "TimeoutStopSec" setting from the boinc unit file, then do Activity->Suspend in the BOINC Manager, and then issue the systemctl stop command, it still blocks for 60 seconds.

I would like to look at the boinc logs after the systemctl stop to see if they show anything useful, but in the BOINC Manager, Tools->Event Log is grayed out after the systemctl stop.

I only recently started running Arch Linux. For years, in my previous distro (Gentoo), I ran BOINC Einstein@Home tasks but didn't control the client with systemd. I'm not sure how the boinc-client was terminated when I shut down my PC, but I never noticed any delay. Maybe the BOINC tasks were terminated "with prejudice".
ID: 112374
Dave
Help desk expert
Joined: 28 Jun 10
Posts: 2538
United Kingdom
Message 112375 - Posted: 26 Jul 2023, 13:11:57 UTC

stdoutdae.txt will let you see the information that would be in the event log. I open it with the kate text editor and have it auto-reload. I don't know which other text editors will do that. I have been using it recently when the event log was refusing to display with my Windows client running under WINE.
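For anyone without an auto-reloading editor, a minimal Python sketch that follows the file much like `tail -f`. The path is an assumption (/var/lib/boinc is where the Arch package is believed to keep its data; other distributions and self-built clients differ), reading it may need elevated permissions, and the function names are hypothetical.

```python
import os
import time
from typing import List, Tuple

# Assumption: BOINC data directory of the Arch package; adjust for your setup.
DEFAULT_LOG = "/var/lib/boinc/stdoutdae.txt"


def read_new_lines(path: str, offset: int) -> Tuple[List[str], int]:
    """Return complete lines appended since byte `offset`, plus the new offset.

    A trailing partial line is left for the next call. If the file has
    shrunk (truncated or replaced), reading restarts from the beginning.
    """
    if os.path.getsize(path) < offset:
        offset = 0
    with open(path, "rb") as f:
        f.seek(offset)
        data = f.read()
    end = data.rfind(b"\n") + 1  # 0 if no complete line yet
    lines = data[:end].decode("utf-8", errors="replace").splitlines()
    return lines, offset + end


def follow(path: str = DEFAULT_LOG, poll: float = 1.0) -> None:
    """Print new log lines as they appear, starting at the current end."""
    offset = os.path.getsize(path)
    while True:
        lines, offset = read_new_lines(path, offset)
        for line in lines:
            print(line)
        time.sleep(poll)


if __name__ == "__main__":
    follow()
```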
ID: 112375
Richard Haselgrove
Volunteer tester
Help desk expert
Joined: 5 Oct 06
Posts: 5082
United Kingdom
Message 112385 - Posted: 26 Jul 2023, 21:14:39 UTC

If he's using systemd, the old logs will be in the system journal, not a text file.

Like many Linux functions, there's a hugely complex set of options, too many to list here.

I refer to somewhere like https://man7.org/linux/man-pages/man1/journalctl.1.html - you need to filter it down to the boinc unit, and choose a tight time-frame. But it's all there.
ID: 112385
Dave
Help desk expert
Joined: 28 Jun 10
Posts: 2538
United Kingdom
Message 112386 - Posted: 27 Jul 2023, 5:32:57 UTC - in response to Message 112385.  

Richard Haselgrove wrote: "If he's using systemd, the old logs will be in the system journal, not a text file."
I have both on my Linux machine. (That is on both the native Linux client and the Windows one running under WINE.)
ID: 112386
Dave
Help desk expert
Joined: 28 Jun 10
Posts: 2538
United Kingdom
Message 112389 - Posted: 27 Jul 2023, 10:22:47 UTC - in response to Message 112386.  

Dave wrote: "I have both on my Linux machine. (That is on both the native Linux client and the Windows one running under WINE.)"
Just wondering if that is anything to do with compiling my own client rather than using the packaged one?
ID: 112389
Richard Haselgrove
Volunteer tester
Help desk expert
Joined: 5 Oct 06
Posts: 5082
United Kingdom
Message 112390 - Posted: 27 Jul 2023, 11:44:18 UTC - in response to Message 112389.  

Dave wrote: "I have both on my Linux machine. (That is on both the native Linux client and the Windows one running under WINE.) Just wondering if that is anything to do with compiling my own client rather than using the packaged one?"
Quite likely. My initial installation/setup was done from Gianfranco's PPA, which I think sets up the full package of hooks into the host operating system. I've done binary upgrades since then, but just by replacing individual files - not fiddling with the package structure.
ID: 112390
Section8
Joined: 25 Jul 23
Posts: 5
Message 112404 - Posted: 29 Jul 2023, 17:49:50 UTC

Sorry for the late reply. Below is my boinc-client systemd log from shutting down yesterday. I have the unit timeout set to 10 seconds instead of 60 seconds. I don't see anything useful here, but I notice that the BOINC Manager Event Log Options offer a lot of log messages I could enable if they would help (currently I have the defaults, file_xfer, task, and sched_ops, enabled).
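The Event Log Options dialog saves its choices as <log_flags> entries in cc_config.xml in the BOINC data directory. A Python sketch that switches on extra flags while keeping whatever else the file already holds; the path is assumed, the function names are hypothetical, and task_debug is just one example of a debug flag that adds task start/stop detail, so check the BOINC client configuration documentation for the full list before picking flags.

```python
import xml.etree.ElementTree as ET
from pathlib import Path
from typing import Iterable

# Assumed location of the BOINC data directory on Arch; adjust for your setup.
CC_CONFIG = Path("/var/lib/boinc/cc_config.xml")


def enable_log_flags(xml_text: str, flags: Iterable[str]) -> str:
    """Return cc_config XML with each named log flag set to 1,
    leaving any other elements already in the document untouched."""
    root = (ET.fromstring(xml_text) if xml_text.strip()
            else ET.Element("cc_config"))
    log_flags = root.find("log_flags")
    if log_flags is None:
        log_flags = ET.SubElement(root, "log_flags")
    for name in flags:
        flag = log_flags.find(name)
        if flag is None:
            flag = ET.SubElement(log_flags, name)
        flag.text = "1"
    return ET.tostring(root, encoding="unicode")


def update_cc_config(path: Path, flags: Iterable[str]) -> None:
    """Read cc_config.xml (if present), enable the flags, write it back."""
    text = path.read_text() if path.exists() else ""
    path.write_text(enable_log_flags(text, flags))


if __name__ == "__main__":
    update_cc_config(CC_CONFIG, ["task_debug"])
```

Once the file is saved, `boinccmd --read_cc_config` (or Options->Read config files in the Manager) asks the running client to reload it. ElementTree drops XML comments when parsing, so any comments in an existing cc_config.xml would be lost.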

Jul 28 21:29:21 officepc systemd[1]: Stopping Berkeley Open Infrastructure Network Computing Client...
Jul 28 21:29:21 officepc boinc[776]: 28-Jul-2023 21:29:21 [---] Received signal 15
Jul 28 21:29:22 officepc boinc[776]: 28-Jul-2023 21:29:22 [---] Exiting
Jul 28 21:29:31 officepc systemd[1]: boinc-client.service: State 'stop-sigterm' timed out. Killing.
Jul 28 21:29:31 officepc systemd[1]: boinc-client.service: Killing process 776 (boinc) with signal SIGKILL.
Jul 28 21:29:31 officepc systemd[1]: boinc-client.service: Killing process 1873 (hsgamma_FGRP5_1) with signal SIGKILL.
Jul 28 21:29:31 officepc systemd[1]: boinc-client.service: Killing process 1874 (hsgamma_FGRP5_1) with signal SIGKILL.
Jul 28 21:29:31 officepc systemd[1]: boinc-client.service: Killing process 1875 (hsgamma_FGRP5_1) with signal SIGKILL.
Jul 28 21:29:31 officepc systemd[1]: boinc-client.service: Killing process 1876 (hsgamma_FGRP5_1) with signal SIGKILL.
Jul 28 21:29:31 officepc systemd[1]: boinc-client.service: Killing process 1877 (hsgamma_FGRP5_1) with signal SIGKILL.
Jul 28 21:29:31 officepc systemd[1]: boinc-client.service: Killing process 1878 (hsgamma_FGRP5_1) with signal SIGKILL.
Jul 28 21:29:31 officepc systemd[1]: boinc-client.service: Main process exited, code=killed, status=9/KILL
Jul 28 21:29:31 officepc systemd[1]: boinc-client.service: Failed with result 'timeout'.
Jul 28 21:29:31 officepc systemd[1]: Stopped Berkeley Open Infrastructure Network Computing Client.
Jul 28 21:29:31 officepc systemd[1]: boinc-client.service: Consumed 2h 59min 12.546s CPU time.
ID: 112404
Richard Haselgrove
Volunteer tester
Help desk expert
Joined: 5 Oct 06
Posts: 5082
United Kingdom
Message 112406 - Posted: 29 Jul 2023, 19:57:22 UTC - in response to Message 112404.  

To me, that looks as if systemd tries first with SIGTERM - I think that's the 'polite' one - and waits 10 seconds. But BOINC doesn't respond, so it moves on to the aggressive one - SIGKILL.
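That reading matches systemd's default stop sequence. A single-process Python sketch of the escalation, with `timeout` standing in for TimeoutStopSec; the real service manager, under the default KillMode=control-group, signals every process in the unit's control group, which is why the log above lists the hsgamma_FGRP5_1 processes alongside boinc. The function name and the SIGTERM-ignoring child are hypothetical.

```python
import signal
import subprocess
import sys


def stop_like_systemd(proc: subprocess.Popen, timeout: float) -> str:
    """Single-process imitation of systemd's default stop sequence:
    SIGTERM, wait up to `timeout` seconds (cf. TimeoutStopSec), then SIGKILL."""
    proc.send_signal(signal.SIGTERM)
    try:
        proc.wait(timeout=timeout)
        return "stopped"
    except subprocess.TimeoutExpired:
        proc.kill()  # SIGKILL on POSIX; cannot be caught or ignored
        proc.wait()
        return "timed out, killed"


if __name__ == "__main__":
    # Hypothetical stuck process: ignores SIGTERM and just keeps sleeping.
    code = ("import signal, time; signal.signal(signal.SIGTERM, signal.SIG_IGN); "
            "print('ready', flush=True); time.sleep(60)")
    child = subprocess.Popen([sys.executable, "-c", code],
                             stdout=subprocess.PIPE, text=True)
    child.stdout.readline()  # make sure SIGTERM is already being ignored
    print(stop_like_systemd(child, timeout=2), child.returncode)
```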

Is hsgamma_FGRP5_1 one of the Einstein@Home applications? They're usually pretty well behaved. It might be worth trying again with a longer delay - see if any more of the process becomes visible in the log.
ID: 112406
Section8
Joined: 25 Jul 23
Posts: 5
Message 112411 - Posted: 30 Jul 2023, 21:08:03 UTC - in response to Message 112406.  

Yes, those are Einstein@Home. When I first started running boinc on this system, a couple of weeks ago, that was a 60 second delay instead of 10 seconds. I figured out I could set "TimeoutStopSec=10" in the systemd unit to at least reduce the delay at shutdown/reboot.
ID: 112411
hadron
Joined: 5 Sep 22
Posts: 29
Canada
Message 112482 - Posted: 6 Aug 2023, 19:47:10 UTC - in response to Message 112385.  

Richard Haselgrove wrote: "If he's using systemd, the old logs will be in the system journal, not a text file."

BOINC still logs to stdoutdae.txt.
ID: 112482
brezzsent
Joined: 21 Aug 23
Posts: 1
Message 112569 - Posted: 21 Aug 2023, 11:30:12 UTC

I would like to look at the boinc logs after the systemctl stop to see if they show anything useful, but in the boinc manager, Tools->Event Log is grayed out after the systemctl stop.
ID: 112569
Richard Haselgrove
Volunteer tester
Help desk expert
Joined: 5 Oct 06
Posts: 5082
United Kingdom
Message 112570 - Posted: 21 Aug 2023, 12:47:41 UTC - in response to Message 112569.  

brezzsent wrote: "I would like to look at the boinc logs after the systemctl stop to see if they show anything useful, but in the boinc manager, Tools->Event Log is grayed out after the systemctl stop."
You can look at the 'back numbers' from the Event Log with a command like:

journalctl -b --unit=boinc-client
As it stands, that will go back as far as the last reboot and show you everything since then; you may want to redirect the output to a file. Journalctl, like all Linux commands, has an enormous number of other options, but that should be enough to get you started.
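Because the stop job runs at the end of a boot, its messages belong to the previous boot, which a bare -b (current boot only) does not include; journalctl selects the previous boot with --boot=-1 (or -b -1). A Python sketch that builds such a command for a unit, a boot offset and an optional time window; the helper name is hypothetical, and earlier boots are only available if the journal is stored persistently (see Storage= in journald.conf).

```python
import subprocess
from typing import List, Optional


def journal_cmd(unit: str = "boinc-client", boot: Optional[int] = 0,
                since: Optional[str] = None,
                until: Optional[str] = None) -> List[str]:
    """Build a journalctl command line for one systemd unit.

    boot=0 is the current boot (what a bare -b shows), -1 the previous
    boot, where the messages from a shutdown end up; None = all boots.
    since/until take journalctl timestamps such as "2023-07-28 21:29:00".
    """
    cmd = ["journalctl", "--no-pager", f"--unit={unit}"]
    if boot is not None:
        cmd.append(f"--boot={boot}")
    if since:
        cmd.append(f"--since={since}")
    if until:
        cmd.append(f"--until={until}")
    return cmd


if __name__ == "__main__":
    # Show the boinc-client messages from the previous boot, shutdown included.
    subprocess.run(journal_cmd(boot=-1), check=True)
```

Passing since/until values such as "2023-07-28 21:29:00" and "2023-07-28 21:30:00" narrows the output to the shutdown itself, the tight time-frame Richard recommends.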
ID: 112570

Copyright © 2024 University of California.
Permission is granted to copy, distribute and/or modify this document under the terms of the GNU Free Documentation License, Version 1.2 or any later version published by the Free Software Foundation.