How to automatically restart a linux background process if it fails?
In certain conditions, myprocess can fail and return. Is there any (standard) way to detect its failure and restart it automatically?
Sure, but it varies based on distribution. Pretty much all of them provide some kind of service manager.
9 Answers
The easiest way would be to add it to /etc/inittab, which is designed to do this sort of thing:
respawn: If the process does not exist, start the process. Do not wait for its termination (continue scanning the /etc/inittab file). Restart the process when it dies. If the process exists, do nothing and continue scanning the /etc/inittab file.
For example, you could do this:
# Run my stuff
myprocess:2345:respawn:/bin/myprocess
Note that /etc/inittab works (or even exists) only if you have a sysvinit-based init system; with upstart and with systemd it doesn't. You would have to install either busybox (a very primitive shell that makes sysadmin recovery tasks painful, but it can substitute for a sysvinit-compatible init) or sysvinit (a fossil). In a Docker container, only the first is not painful.
Buildroot has three possible init systems, so there are three ways to do this:
BusyBox init
With this, one adds an entry to /etc/inittab.
Note that BusyBox init has an idiosyncratic /etc/inittab format. The second field is meaningless, and the first field is not an ID but a device basename.
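A respawn entry in the BusyBox format would therefore look like the following; the leading fields are left empty so BusyBox uses the init process's own console, and the path is illustrative:

```
# /etc/inittab (BusyBox format: <device>:<ignored>:<action>:<command>)
::respawn:/bin/myprocess
```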
Linux «System V» init
Again, one adds an entry to /etc/inittab.
myprocess:2345:respawn:/bin/myprocess
systemd
One writes a unit file in, say, /etc/systemd/system/myprocess.service:
[Unit]
Description=My Process

[Service]
ExecStart=/bin/myprocess
Restart=always

[Install]
WantedBy=multi-user.target
Enable this to autostart at bootup with:
systemctl enable myprocess.service
systemctl start myprocess.service
Further reading
But when you use this inittab approach, your process is no longer accessible via the 'service' interface, right? I.e. you can't do service mything start or service mything stop any more. Is there a way to have the best of both, i.e. an uncrashable sysvinit service that is also usable via 'service'?
Running systemd, so the obvious choice I would have is #3 above. Yet for some unknown reason, my nohup ffmpeg . & daemon won't save its files when launched this way, so I have to start it manually for it to work. Out of 4 similar daemons, rarely do all four keep running for 24 hours. Usually one fails for unknown reason(s), sometimes two, meaning I really need this auto-restart. I'm upvoting one of the options below unless I find it doesn't work.
What about creating a subshell with a loop that constantly calls the same process?
If it ends, the next iteration of the loop goes on and starts it again.
(while true; do /bin/myprocess; done) &
If the subshell dies, it's over, though. The only possibility in that case would be to create another process (I'll call it the necromancer) that checks whether your process is alive and starts it if it isn't, and to run this necromancer from cron, so that the check happens regularly.
Next step would be wondering what could happen if cron dies, but at some point you should feel safe and stop worrying.
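A minimal sketch of such a necromancer, meant to be run from cron every minute; the NAME and CMD values are illustrative placeholders, not anything from the question:

```shell
#!/bin/sh
# necromancer.sh -- run from cron, e.g.:  * * * * * /usr/local/bin/necromancer.sh
# NAME and CMD are illustrative; adjust to your own process.
NAME="myprocess"
CMD="/bin/myprocess"

ensure_running() {
    # pgrep -x matches the exact process name; start only if nothing matches
    if ! pgrep -x "$1" > /dev/null 2>&1; then
        shift
        "$@" &
    fi
}

ensure_running "$NAME" "$CMD"
```

Note that pgrep matches by name, so this inherits the false-positive risks of name-based process matching discussed elsewhere in this thread.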
You could make use of Monit . It’s really easy to use and quite flexible. See for example this configuration for restarting the Tomcat process on failure.
check process tomcat with pidfile /var/run/tomcat.pid
    start program = "/etc/init.d/tomcat start"
    stop program  = "/etc/init.d/tomcat stop"
    if failed port 8080 type tcp then restart
It also has a lot of configuration examples for many use cases.
start)
    restarter -c /bin/myprocess &
    ;;
stop)
    pkill -f myprocess
    ;;
On newer systems, use systemd, which solves all those trivial issues.
If you're not a super user or root, and your Linux system has Docker installed, you can create a Docker image of your process and use Docker to restart your process if it dies or the system is rebooted.
version: "3"
services:
  lserver:
    image: your_docker_image:latest
    ports:
      - 8080:8080    # just use 8080 as an example
    restart: always  # this is where your process can be guaranteed to restart
To start your docker container, run docker-compose up -d.
I find this an easy way to get auto-restart for my own process when I am not a super user of the system.
For a sample of how to create a docker image, here is a quick example:
FROM alpine:3.5
RUN apk update && apk upgrade && rm -rf /var/cache/apk/*
WORKDIR /app
COPY my-process-server /app
RUN ln -s /app/my-process-server /usr/local/bin/my-process-server
EXPOSE 8080
CMD ["my-process-server"]
He is asking about init.d scripts. Adding docker on top is really not the tool to auto restart a daemon. It’s like automating a computer power cycle if the daemon stops.
init.d scripts aren’t available to non-root users. This one is prefaced by being a possibility for those who do not have that access. A cron job to run testing scripts would work as well though.
In my case, as a quick-fix, I modified and used the solution of @Trylks to wrap the program I was launching. I wanted it to end only on clean exit.
Should run in most shells:
#!/bin/sh
echo ""
echo "Use: $0 ./program"
echo ""
#eg="/usr/bin/apt update"
echo "Executing $1 ..."
EXIT_CODE=1
(while [ $EXIT_CODE -gt 0 ]; do
    $1    # loops on error code: greater-than 0
    EXIT_CODE=$?
done) &
(Edit): Sometimes programs hang without quitting, for no apparent reason. (Yes, of course there’s always a reason but it can take a lot of time and effort to find it, particularly if it’s not your own code.)
The problem I had was that the process (a Python server) was hanging from time to time, so I needed to regularly kill and restart it. I did it with a cron task that runs every couple of hours. Here’s the shell script:
#!/bin/sh
# This cron script restarts the server

# Look for a running instance of myapp.py
p=$(ps -eaf | grep "[m]yapp.py")

# Get the second item: the process number
n=$(echo $p | awk '{print $2}')

# If it's not empty, kill the process
if [ "$n" ]
then
    kill $n
fi

# Start a new instance
python3 myapp.py
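The matching crontab entry would look like this, assuming the script above is saved as /usr/local/bin/restart-myapp.sh (a placeholder path) and "every couple of hours" means every two hours:

```
# edit with: crontab -e
0 */2 * * * /usr/local/bin/restart-myapp.sh
```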
How to restart (or reset) a running process in linux
I have two Linux systems communicating over sockets (Desktop and ARM-based development board). I want to restart (or reset) my client application (running on a development board) when server sends a particular predefined message. I don’t want to restart (reboot) Linux, I just want that client application restart itself automatically. I am unable to understand how it should be done.
5 Answers
Make your client exec /proc/self/exe when it receives that particular message. You don't need to know where the executable actually resides in the file system, and you can reuse main()'s argv to construct a new argument vector.
#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>

int main(int argc, char **argv)
{
    char buf[32] = {0};
    char *exec_argv[] = { argv[0], buf, 0 };
    int count = argc > 1 ? atoi(argv[1]) : 0;

    printf("Running: %s %d\n", argv[0], count);
    snprintf(buf, sizeof(buf), "%d", count + 1);
    sleep(1);
    execv("/proc/self/exe", exec_argv);
    /* NOT REACHED */
    return 0;
}
This restart.c runs like this:
$ gcc restart.c
$ ./a.out 3
Running: ./a.out 3
Running: ./a.out 4
Running: ./a.out 5
The normal way to do this is to let your program exit and use a monitoring system to restart it. The init program offers such a monitoring system. There are many different init programs (SysVinit, BusyBox, Systemd, etc.), with completely different configuration mechanisms (always a configuration file to write, but the location and the syntax of the file differ), so look up the documentation of the one you're using.

Configure init to launch your program at boot time or upon explicit request, and to restart it if it dies. There are also fancier monitoring programs, but you don't sound like you need them.

This approach has many advantages over having the program restart itself: it's standard, so you can restart a bunch of services without having to care how they're made; and it works even if the program dies due to a bug.
There’s a standard mechanism to tell a process to exit: signals. Send your program a TERM signal. If your program needs to perform any cleanup, write a signal handler. That doesn’t preclude having a program-specific command to make it shut down if you have an administrative channel to send it commands like this.
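In a shell script, such a handler is a trap. The sketch below (the worker loop and messages are purely illustrative) exits with status 0 on TERM, so whatever supervises the process can tell a requested shutdown from a crash:

```shell
#!/bin/sh
# Sketch: a long-running job that cleans up and exits 0 on SIGTERM,
# so its supervisor can distinguish a requested shutdown from a crash.
(
    trap 'echo "worker: cleaning up"; exit 0' TERM
    while true; do sleep 1; done   # stand-in for real work
) &
worker=$!

sleep 1
kill -TERM "$worker"    # politely ask the worker to stop
wait "$worker"
status=$?
echo "worker exit status: $status"
```

The trap runs once the current foreground command (here, sleep) returns, so shutdown may lag the signal by up to a second in this sketch.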
How do I write a bash script to restart a process if it dies?
How do I write a bash script that will check whether a process is running and, if not, start it? Roughly the following pseudocode (or maybe it should do something like ps | grep?):
# keepalivescript.sh
if processidfile exists:
  if processid is running:
    exit, all ok

run checkqueue.py
write processid to processidfile
# crontab */5 * * * * /path/to/keepalivescript.sh
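One literal reading of that pseudocode, using kill -0 as the liveness test; the pidfile path is an assumption, checkqueue.py comes from the question, and the answers in this thread explain why PID files are fragile:

```shell
#!/bin/sh
# keepalivescript.sh -- sketch of the pseudocode above; paths are placeholders.
PIDFILE=/tmp/checkqueue.pid

# If the pidfile exists and that PID is alive, there is nothing to do.
if [ -f "$PIDFILE" ] && kill -0 "$(cat "$PIDFILE")" 2>/dev/null; then
    exit 0
fi

# Otherwise (re)start the process and record its PID.
checkqueue.py &
echo $! > "$PIDFILE"
```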
Just to add this for 2017: use supervisord. crontab is not meant for this kind of task. A bash script is terrible at surfacing the real error. stackoverflow.com/questions/9301494/…
How about using inittab and respawn instead of other non-system solutions? See superuser.com/a/507835/116705
10 Answers
Avoid PID-files, crons, or anything else that tries to evaluate processes that aren’t their children.
There is a very good reason why in UNIX you can ONLY wait on your children. Any method (ps parsing, pgrep, storing a PID, …) that tries to work around that is flawed and has gaping holes in it. Just say no.
Instead you need the process that monitors your process to be the process’ parent. What does this mean? It means only the process that starts your process can reliably wait for it to end. In bash, this is absolutely trivial.
until myserver; do
    echo "Server 'myserver' crashed with exit code $?. Respawning.." >&2
    sleep 1
done
The above piece of bash code runs myserver in an until loop. The first line starts myserver and waits for it to end. When it ends, until checks its exit status. If the exit status is 0 , it means it ended gracefully (which means you asked it to shut down somehow, and it did so successfully). In that case we don’t want to restart it (we just asked it to shut down!). If the exit status is not 0 , until will run the loop body, which emits an error message on STDERR and restarts the loop (back to line 1) after 1 second.
Why do we wait a second? Because if something’s wrong with the startup sequence of myserver and it crashes immediately, you’ll have a very intensive loop of constant restarting and crashing on your hands. The sleep 1 takes away the strain from that.
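The restart behaviour is easy to see with a stand-in "server" that fails twice before exiting cleanly; the run_flaky function is purely illustrative, and the sleep 1 from the real loop is omitted so the demo finishes immediately:

```shell
#!/bin/sh
# Stand-in for myserver: "crashes" (non-zero exit) twice, then exits cleanly.
count=0
run_flaky() {
    count=$((count + 1))
    [ "$count" -ge 3 ]    # fail on the first two runs
}

tries=0
until run_flaky; do
    tries=$((tries + 1))
    echo "Server crashed. Respawning.. (restart $tries)" >&2
done
echo "clean exit after $tries restarts"
```

The loop body runs exactly twice, once per simulated crash, and stops as soon as the command exits with status 0.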
Now all you need to do is start this bash script (asynchronously, probably), and it will monitor myserver and restart it as necessary. If you want to start the monitor on boot (making the server «survive» reboots), you can schedule it in your user's cron(1) with an @reboot rule. Open your cron rules with crontab -e.
Then add a rule to start your monitor script:
@reboot /usr/local/bin/myservermonitor
Alternatively; look at inittab(5) and /etc/inittab. You can add a line in there to have myserver start at a certain init level and be respawned automatically.
Let me add some information on why not to use PID files. While they are very popular, they are also very flawed, and there's no reason not to just do it the correct way.
- PID recycling (killing the wrong process):
- /etc/init.d/foo start: start foo, write foo's PID to /var/run/foo.pid
- A while later: foo dies somehow.
- A while later: any random process that starts (call it bar) takes a random PID, imagine it taking foo's old PID.
- You notice foo's gone: /etc/init.d/foo restart reads /var/run/foo.pid, checks to see if it's still alive, finds bar, thinks it's foo, kills it, starts a new foo.
- PID files go stale. You need over-complicated (or should I say, non-trivial) logic to check whether the PID file is stale, and any such logic is again vulnerable to 1.
- What if you don’t even have write access or are in a read-only environment?
- It’s pointless overcomplication; see how simple my example above is. No need to complicate that, at all.
By the way; even worse than PID files is parsing ps ! Don’t ever do this.
- ps is very unportable. While you find it on almost every UNIX system, its arguments vary greatly if you want non-standard output. And standard output is ONLY for human consumption, not for scripted parsing!
- Parsing ps leads to a LOT of false positives. Take the ps aux | grep PID example, and now imagine someone starting a process with a number somewhere as an argument that happens to be the same as the PID you started your daemon with! Imagine two people starting an X session and you grepping for X to kill yours. It's just all kinds of bad.
If you don’t want to manage the process yourself; there are some perfectly good systems out there that will act as monitor for your processes. Look into runit, for example.