How to automatically restart a Linux background process if it fails?
In certain conditions, myprocess can fail and return. Is there any (standard) way to detect its failure and restart it automatically?
Sure, but it varies based on distribution. Pretty much all of them provide some kind of service manager.
The easiest way would be to add it to /etc/inittab, which is designed to do this sort of thing:
respawn If the process does not exist, start the process. Do not wait for its termination (continue scanning the /etc/inittab file). Restart the process when it dies. If the process exists, do nothing and continue scanning the /etc/inittab file.
For example, you could do this:
# Run my stuff
myprocess:2345:respawn:/bin/myprocess
Note, /etc/inittab works (or even exists) if and only if you have a sysvinit-based init system. With upstart and with systemd it doesn’t. You have to install either busybox (very primitive shell making the sysadm recover tasks painful, but it can substitute a sysvinit-compatible initd) or sysvinit (it is a fossil). In a docker container, only the first is not painful.
Buildroot has three possible init systems, so there are three ways to do this:
BusyBox init
With this, one adds an entry to /etc/inittab.
Note that BusyBox init has an idiosyncratic /etc/inittab format. The second field is meaningless, and the first field is not an ID but a device basename.
Linux «System V» init
Again, one adds an entry to /etc/inittab.
myprocess:2345:respawn:/bin/myprocess
systemd
One writes a unit file in, say, /etc/systemd/system/myprocess.service:
[Unit]
Description=My Process

[Service]
ExecStart=/bin/myprocess
Restart=always

[Install]
WantedBy=multi-user.target
Enable this to autostart at bootup with:
systemctl enable myprocess.service
Start it immediately with:
systemctl start myprocess.service
But when you use the inittab approach, your process is no longer accessible via the ‘service’ interface, right? I.e. you can’t run service mything start or service mything stop anymore. Is there a way to have the best of both, i.e. an uncrashable sysvinit service that is also usable via ‘service’?
Running systemd, so the obvious choice I would have is #3 above. Yet for some unknown reason, my nohup ffmpeg . & daemon won’t save its files when launched this way, so I have to start it manually for it to work. Out of 4 similar daemons, rarely do all four continue 24 hours. Usually one fails for unknown reason[s] and sometime two, meaning I really need this auto-restart. I’m upvoting one of the options below until I should find it doesn’t work.
What about creating a subshell with a loop that calls constantly the same process?
If it ends, the next iteration of the loop goes on and starts it again.
(while true; do
    /bin/myprocess
done) &
If the subshell dies, it’s over though. The only possibility in that case would be to create another process (I’ll call it necromancer) that checks whether yourprocess is alive and starts it if it isn’t, and to run this necromancer with cron, so that the check happens regularly.
Next step would be wondering what could happen if cron dies, but at some point you should feel safe and stop worrying.
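The cron-driven "necromancer" described above could look roughly like the following sketch. It is only an illustration: the function name ensure_running, the pidfile location and the crontab line in the comments are assumptions, not part of the original answer. It records the PID of the process it starts in a pidfile and uses kill -0 (which sends no signal, it only tests whether the PID exists) to decide whether a restart is needed.

```shell
#!/bin/sh
# necromancer sketch: start a command only if the PID recorded in a
# pidfile is no longer alive. All names and paths here are hypothetical.

# ensure_running PIDFILE COMMAND [ARGS...]
ensure_running() {
    pidfile=$1
    shift
    # kill -0 delivers no signal; it only checks that the PID exists
    if [ -f "$pidfile" ] && kill -0 "$(cat "$pidfile")" 2>/dev/null; then
        return 0    # still alive, nothing to do
    fi
    # Not running: start it in the background, detached from our output
    nohup "$@" >/dev/null 2>&1 &
    echo $! > "$pidfile"
}

# Example use from a script run every minute by cron, e.g. a crontab line
#   * * * * * /usr/local/bin/necromancer.sh
# with the script ending in:
# ensure_running /tmp/myprocess.pid /bin/myprocess
```

One caveat of the pidfile approach is that a PID can in principle be reused by an unrelated process after a reboot or a long uptime, so a stricter check might also compare the process name.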
You could make use of Monit. It’s really easy to use and quite flexible. See for example this configuration for restarting the Tomcat process on failure.
check process tomcat with pidfile /var/run/tomcat.pid
    start program = "/etc/init.d/tomcat start"
    stop program = "/etc/init.d/tomcat stop"
    if failed port 8080 type tcp then restart
It also has a lot of configuration examples for many use cases.
start)
    restarter -c /bin/myprocess &
stop)
    pkill -f myprocess
On newer systems use systemd which solves all those trivial issues
If you’re not a super user or root, and if your Linux system has Docker installed, then you can create a docker image of your process, using docker to restart your process if the system is rebooted.
version: "3"
services:
  lserver:
    image: your_docker_image:latest
    ports:
      - 8080:8080  # just use 8080 as an example
    restart: always  # this is where your process can be guaranteed to restart
To start your docker container, run docker-compose up -d in the directory containing this file.
I find it’s easy to handle my-own-process with auto-restart if I am not a super user of the system.
Here is a quick example of a Dockerfile for creating such a docker image:
FROM alpine:3.5
RUN apk update && apk upgrade && rm -rf /var/cache/apk/*
WORKDIR /app
COPY my-process-server /app
RUN ln -s /app/my-process-server /usr/local/bin/my-process-server
EXPOSE 8080
CMD ["my-process-server"]
He is asking about init.d scripts. Adding docker on top is really not the tool to auto restart a daemon. It’s like automating a computer power cycle if the daemon stops.
init.d scripts aren’t available to non-root users. This one is prefaced by being a possibility for those who do not have that access. A cron job to run testing scripts would work as well though.
In my case, as a quick-fix, I modified and used the solution of @Trylks to wrap the program I was launching. I wanted it to end only on clean exit.
Should run in most shells:
#!/bin/sh

echo ""
echo "Use: $0 ./program"
echo ""

#eg="/usr/bin/apt update"

echo "Executing $1 . "

EXIT_CODE=1
(while [ $EXIT_CODE -gt 0 ]; do
    $1
    # loops on error code: greater-than 0
    EXIT_CODE=$?
done) &
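A wrapper like the one above restarts immediately and forever, which can spin the CPU if the program fails right at startup. As a hedged variation (not the original author's script), the sketch below adds a pause between attempts and a cap on the number of restarts. The function name run_until_clean and its parameters are made up for this illustration.

```shell
#!/bin/sh
# Restart a command until it exits cleanly, with a delay between
# attempts and an upper bound on restarts. Names are hypothetical.

# run_until_clean MAX_RESTARTS DELAY_SECONDS COMMAND [ARGS...]
run_until_clean() {
    max=$1
    delay=$2
    shift 2
    attempts=0
    while :; do
        if "$@"; then
            return 0            # clean exit (status 0): stop looping
        else
            code=$?             # non-zero status of the failed run
        fi
        attempts=$((attempts + 1))
        if [ "$attempts" -gt "$max" ]; then
            echo "giving up after $max restarts (last exit code $code)" >&2
            return "$code"
        fi
        sleep "$delay"          # back off before the next attempt
    done
}

# Example: at most 5 restarts, 2 seconds apart, in the background
# run_until_clean 5 2 /bin/myprocess &
```

Unlike the original wrapper, this version passes the command and its arguments as separate words ("$@"), so arguments containing spaces survive intact.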
(Edit): Sometimes programs hang without quitting, for no apparent reason. (Yes, of course there’s always a reason but it can take a lot of time and effort to find it, particularly if it’s not your own code.)
The problem I had was that the process (a Python server) was hanging from time to time, so I needed to regularly kill and restart it. I did it with a cron task that runs every couple of hours. Here’s the shell script:
#!/bin/sh

# This cron script restarts the server
# Look for a running instance of myapp.py
p=$(ps -eaf | grep "[m]yapp.py")
# Get the second item; the process number
n=$(echo $p | awk '{print $2}')
# If it's not empty, kill the process
if [ "$n" ]
then
    kill $n
fi
# Start a new instance
python3 myapp.py
How to start a stopped process in Linux
I have a stopped process in Linux at a given terminal. Now I am at another terminal. How do I start that process? What kill signal would I send? I own that process.
You can issue a kill -CONT pid, which will do what you want as long as the other terminal session is still around. If the other session is dead it might not have anywhere to put the output.
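To see the effect of kill -CONT from a second terminal, one option on Linux is to look at the State line in /proc/PID/status, where T means stopped. The sketch below is only an illustration under that assumption; the helper names proc_state and resume_if_stopped are invented here.

```shell
#!/bin/sh
# Inspect and resume a stopped process via /proc (Linux-specific).
# Function names are hypothetical helpers for this example.

# proc_state PID: print the one-letter state (T = stopped, S = sleeping, ...)
proc_state() {
    awk '/^State:/ { print $2 }' "/proc/$1/status"
}

# resume_if_stopped PID: send SIGCONT only if the process is stopped
resume_if_stopped() {
    if [ "$(proc_state "$1")" = "T" ]; then
        kill -CONT "$1"
    fi
}

# Example (replace 4711 with the real PID):
# proc_state 4711
# resume_if_stopped 4711
```

Sending SIGCONT to a process that is not stopped is harmless, so the state check is mainly useful for confirming what happened.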
In addition to @Dave’s answer, there is an advanced method to redirect input and output file descriptors of a running program using GDB.
A FreeBSD example for an arbitrary shell script with PID 4711:
> gdb /bin/sh 4711
.
Attaching to program: /bin/sh, process 4711
.
(gdb) p close(1)
$1 = 0
(gdb) p creat("/tmp/testout.txt",0644)
$2 = 1
(gdb) p close(2)
$3 = 0
(gdb) p dup2(1,2)
$4 = 2
EDIT — explanation: this closes filehandle 1, then opens a file, which reuses 1. Then it closes filehandle 2 and duplicates filehandle 1 to 2.
Now this process’ stdout and stderr go to the indicated file and are readable from there. If stdin is required, you need to p close(0) and then attach an input file, a pipe, or something similar.
For the time being, I could not find a method to remotely disown this process from the controlling terminal, which means that when the terminal exits, this process receives SIGHUP signal.
Note: If you do have/gain access to the other terminal, you can disown -a so that this process will continue to run after the terminal closes.
How can I resume a stopped job in Linux?
This is actually a fairly normal workflow for Vim: if you want to keep your commands in your bash history, you hit Ctrl-z, type your commands, and then resume. Obviously you can also run commands without leaving Vim via the :! ed command.
The command fg is what you want to use. You can also give it a job number if there is more than one stopped job.
The general job control commands in Linux are:
- jobs — list the current jobs
- fg — resume the job that’s next in the queue
- fg %[number] — resume job [number]
- bg — Push the next job in the queue into the background
- bg %[number] — Push the job [number] into the background
- kill %[number] — Kill the job numbered [number]
- kill -[signal] %[number] — Send the signal [signal] to job number [number]
- disown %[number] — disown the process(no more terminal will be owner), so command will be alive even after closing the terminal.
That’s pretty much all of them. Note the % in front of the job number in the commands: this is what tells kill you’re talking about jobs and not processes.
Why use «%» character. Is it required to be prepended before the job number or Is it a unix convention to specify the int type ?
You can also type %<process_name>; i.e., you hit Ctrl-Z in emacs, then you can type %emacs in the console and bring it back to the foreground.
Just to add to the other answers, bash lets you skip the fg if you specify a job number.
For example, these are equivalent and resume the latest job:
%
%+
fg
% is awesome, thanks! As a touch-typist, I find fg very irritating (same finger). But then, so is cd .
in my case when i tried to use fg i see the stopped process appears and disappears quickly and just
If you didn’t launch it from the current terminal, use ps aux | grep <process_name> to find the process number (pid), then resume it with:
kill -CONT <pid>
(Despite the name, kill is simply a tool to send a signal to the process, allowing processes to communicate with each other. A «kill signal» is only one of many standard signals.)
Bonus tip: wrap the first character of the process name with [] to prevent the grep command itself appearing in the results. e.g. to find emacs process, use ps aux | grep [e]macs
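Why the bracket trick works can be shown without touching real processes. The sketch below feeds two simulated lines of ps aux output (the user name and PIDs are made up) through grep: the regular expression [e]macs matches the text emacs, but the grep process's own command line contains the literal text [e]macs, which that expression does not match.

```shell
#!/bin/sh
# Simulated `ps aux` output; the user and PIDs are hypothetical.
sample='alice  101  emacs notes.txt
alice  102  grep [e]macs'

# Plain pattern: counts both the editor and the grep command line
printf '%s\n' "$sample" | grep -c 'emacs'

# Bracketed pattern: counts only the editor line
printf '%s\n' "$sample" | grep -c '[e]macs'
```

Quoting the pattern, as in grep '[e]macs', is a little safer than the unquoted form, because an unquoted [e]macs is a shell glob and would be expanded to emacs if a file with that name exists in the current directory.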