Multithreading in Bash
I would like to introduce a multithreading feature in my shell script. I have a script which calls the function read_cfg() with different arguments. Each of these function calls is independent. Would it be possible to run these function calls (not scripts) in parallel? Please let me know how we can achieve that.
This is not multithreading — it’s multiprocessing. Each instance is run in a distinct process, copied from the original with fork() . These processes — unlike threads — have their own file descriptor tables, and their memory is copy-on-write (so when they change a variable’s value, the parent process doesn’t see it).
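A minimal sketch of the copy-on-write point above, assuming Bash. The names `counter` and `increment` are purely illustrative: the background job changes its own copy of the variable, and the parent never sees that change.

```shell
#!/bin/bash
# A variable changed inside a background job is NOT visible to the parent,
# because the job runs in a forked copy of the shell.
counter=1

increment() {
  counter=$((counter + 1))
  echo "child sees counter=$counter"
}

increment &   # runs in a separate (forked) process
wait          # wait for that process to finish

echo "parent sees counter=$counter"   # still 1 in the parent
```

Threads would share `counter`; separate processes do not, so results have to be passed back explicitly (for example through files, pipes, or exit statuses).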
Sure, just add & after the command:
read_cfg cfgA & read_cfg cfgB & read_cfg cfgC & wait
All those jobs will then run in the background simultaneously. The optional wait command then blocks until all the jobs have finished.
Each command will run in a separate process, so it’s technically not «multithreading», but I believe it solves your problem.
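A runnable sketch of this answer, assuming Bash. The body of `read_cfg` below is a hypothetical stand-in (the question does not show the real one); it just sleeps for a second to simulate work, so the three calls finish in about one second in total rather than three.

```shell
#!/bin/bash
# Hypothetical stand-in for the question's read_cfg(); the real body is not shown.
read_cfg() {
  sleep 1                      # simulate some work
  echo "finished reading $1"
}

# Start each independent call in the background...
read_cfg cfgA &
read_cfg cfgB &
read_cfg cfgC &

# ...and block until all of them have exited.
wait
echo "all done"
```

The three "finished reading" lines may appear in any order, since the calls run concurrently; "all done" always comes last because of wait.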
You should read up on the difference between a process and a thread. What you propose is not multithreading; it involves a separate process for every command.
@TomTom: I certainly know the difference between processes and threads. If you look past the OP’s choice of words, I believe he is simply asking whether it’s possible to run the commands in parallel (which it is). I added a note about this to clarify.
You can run several copies of your script in parallel, each copy for different input data, e.g. to process all *.cfg files on 4 cores:
ls *.cfg | xargs -P 4 -n 1 read_cfg.sh
The read_cfg.sh script takes just one parameter per invocation (as enforced by -n 1).
Just a note: unless read_cfg.sh is in a directory on your PATH, you should specify its full path (or ./read_cfg.sh), or xargs will say it can’t find the file.
Better to use printf '%s\0' *.cfg | xargs -0 …; that way this works with filenames containing spaces, unprintable characters, etc. See also Why you shouldn’t parse the output of ls(1).
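The question asks about functions, but xargs can only start executables, not shell functions. One possible sketch, assuming Bash, combines the NUL-separated input above with Bash’s export -f so that a child bash can call the function. The `read_cfg` body here is a hypothetical placeholder, and -P 4 mirrors the four-core example in the answer.

```shell
#!/bin/bash
# Hypothetical read_cfg stand-in; replace with the real function.
read_cfg() {
  echo "processing: $1"
}
# xargs starts new processes, which cannot see this shell's functions.
# export -f (a Bash feature) passes the function to child bash processes.
export -f read_cfg

# NUL-separate the file names so spaces or newlines in names are safe,
# then run up to 4 jobs at a time, one file name per job.
# In bash -c, "_" fills $0 and the file name becomes $1.
printf '%s\0' *.cfg | xargs -0 -P 4 -n 1 bash -c 'read_cfg "$1"' _
```

If no .cfg files exist, the glob is passed through literally as *.cfg, so a real script may want to check for that first.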
Bash job control involves multiple processes, not multiple threads.
You can execute a command in background with the & suffix.
You can wait for completion of a background command with the wait command.
You can execute multiple commands in parallel by connecting them with a pipe (|). This also provides a synchronization mechanism, since the stdout of the command on the left of | is connected to the stdin of the command on the right.
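A small sketch of that synchronization, assuming Bash and a sleep that accepts fractional seconds (GNU coreutils and BSD do; POSIX does not require it). The `producer` and `consumer` names are illustrative. Both sides of the pipe run at the same time in separate processes, and the consumer blocks on read until the producer writes the next line.

```shell
#!/bin/bash
# The producer emits a line every half second.
producer() {
  for i in 1 2 3; do
    echo "item $i"
    sleep 0.5
  done
}

# The consumer handles each line as soon as it arrives, waiting on
# read while the pipe is empty and stopping when the producer exits.
consumer() {
  while read -r line; do
    echo "consumed: $line"
  done
}

producer | consumer
```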
How to Use Multi-Threaded Processing in Bash Scripts
Roel Van de Paar
Writer
Roel has 25 years of experience in IT & business, 9 years of leading teams, and 5 years in hiring & building teams. He worked for companies like Oracle, Volvo, Sun, Percona, Siemens, Karat, and now MariaDB in various senior, principal, lead, and managerial roles.
Multi-threaded programming has always been of interest to developers as a way to increase application performance and optimize resource usage. This guide will introduce you to the basics of multi-threaded coding in Bash.
What Is Multi-threaded Programming?
A picture is worth a thousand words, and this holds when it comes to showing the difference between single (1) thread programming and multi-threaded (>1) programming in Bash:
Our first multi-threaded programming setup, a mini one-liner script, could not have been simpler: in the first line, we sleep for one second using the sleep 1 command. As far as the user is concerned, a single thread executed a single one-second sleep.
In the second line, we have two one-second sleep commands. We join them with the & operator, which not only acts as a separator between the two sleep commands but also tells Bash to start the first command in the background, in a separate process.
Normally, one would terminate a command with a semicolon (;). Doing so executes the command and only then proceeds to the next command after the semicolon. For example, executing sleep 1; sleep 1 takes just over two seconds: one second for the first command, one second for the second, and a tiny amount of system overhead for each of the two commands.
However, instead of terminating a command with a semicolon, one can use other control operators that Bash recognizes, such as &, && and ||. The && syntax is unrelated to multi-threaded programming; it simply executes the second command only if the first command succeeded. The || operator is the opposite of && and executes the second command only if the first command failed.
Returning to multi-threaded programming, using & as our command terminator starts a background process executing the command preceding it. Bash then immediately proceeds to execute the next command in the current shell, leaving the background process to run by itself. In the output of the command, we can see the background process being started (as indicated by [1] 445317, where 445317 is the Process ID, or PID, of the just-started background process and [1] is its job number, showing that this is our first background job) and subsequently finishing (as indicated by [1]+ Done sleep 1).
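A compact sketch of the four operators discussed above, assuming Bash; the echoed messages are illustrative only.

```shell
#!/bin/bash
# ; runs commands one after another, unconditionally.
echo "first"; echo "second"

# && runs the right-hand command only if the left-hand one succeeded.
true && echo "runs because true succeeded"
false && echo "never printed"

# || runs the right-hand command only if the left-hand one failed.
false || echo "runs because false failed"

# & starts the left-hand command in the background and moves on at once.
sleep 1 &
echo "background PID: $!"   # $! holds the PID of the most recent background job
wait                        # block until the background sleep has finished
```

In a script (unlike an interactive shell), Bash does not print the [1] 445317 and Done lines, which is why the sketch echoes $! itself.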
If you would like to view an additional example of background process handling, please see our Bash Automation and Scripting Basics (Part 3) article. Additionally, Bash Process Termination Hacks may be of interest.
time sleep 1; echo 'done'
time $(sleep 1 & sleep 1); echo 'done'
Here we run our sleep process under time, and we can see that our single-threaded command ran for 1.003 seconds before our command-line prompt was returned.
However, the second example took about the same time (1.005 seconds) even though we were executing two one-second sleeps, each in its own process, because they did not run consecutively. Again, we used a background process for the first sleep command, leading to (semi-)parallel, i.e., multi-threaded, execution.
We also used a subshell wrapper ($(…), a command substitution, which runs its contents in a subshell) around our two sleep commands to group them together under time. As we can see, our done output appears after 1.005 seconds, so the two sleep 1 commands must have run simultaneously. Interestingly, the overall processing time increased only slightly (0.002 seconds), which is easily explained by the time required to start a subshell and to launch a background process.
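As an alternative to the $(…) wrapper, here is a sketch (assuming Bash) that groups the commands with { …; } and calls wait explicitly. It avoids the command substitution and makes the wait for the background sleep explicit, rather than relying on the substitution only returning once every process holding its output pipe open has exited.

```shell
#!/bin/bash
# Run two one-second sleeps in parallel inside a command group and let
# `time` measure the whole group. `wait` keeps the group from finishing
# before the backgrounded sleep does.
time {
  sleep 1 &
  sleep 1
  wait
}
```

The reported real time should again be just over one second.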
Multi-threaded (and Background) Process Management
In Bash, multi-threaded coding normally means starting background processes from a main one-liner or full Bash script; in essence, one may think of multi-threaded coding in Bash as starting several background jobs. Once one starts coding this way, it quickly becomes clear that such jobs usually require some handling. For example, take a fictitious script, rest.sh, that starts five concurrent sleep processes:
#!/bin/bash
sleep 10 &
sleep 600 &
sleep 1200 &
sleep 1800 &
sleep 3600 &
When we start the script (after making it executable using chmod +x rest.sh), we see no output! Even if we then run jobs (the command that lists background jobs in progress), there is no output. Why?
The reason is that the shell used to start this script (i.e., the current shell) is not the shell that executed the sleep commands and placed them into the background. That was done by the separate (sub)shell process started when ./rest.sh was executed, and jobs only lists the jobs of the shell in which it runs. It helps to start thinking of each such (sub)shell as an independent unit of execution in its own right.
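A quick way to see this, sketched under the assumption of Bash, with bash -c standing in for running ./rest.sh: the child shell can list its own background job, while the calling shell cannot.

```shell
#!/bin/bash
# A separate bash process (standing in for ./rest.sh) starts a background
# job and lists it: the job shows up in that shell's own job table.
bash -c 'sleep 2 & jobs'

# Back in the calling shell, jobs prints nothing, because that background
# job was never part of this shell's job table.
jobs
```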
Let’s change our script by adding jobs at the end of it. This ensures that jobs runs within the (sub)shell where it is relevant, i.e., the same one in which the sleep processes were started.
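The modified script is not reproduced in the text. A sketch of what it might look like is below; it uses jobs -l, where the -l option adds each job’s PID to the listing (plain jobs shows only the job number, state, and command).

```shell
#!/bin/bash
# rest.sh (sketch): start five background sleeps...
sleep 10 &
sleep 600 &
sleep 1200 &
sleep 1800 &
sleep 3600 &

# ...then run jobs in the same shell that started them.
# -l adds each job's PID to the listing.
jobs -l
```

The sleeps keep running after the script exits; the listed PIDs can be passed to kill to stop them.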
This time we can see the list of background processes being started, thanks to the jobs command at the end of the script. We can also see their PIDs (Process Identifiers). These PIDs are very important when it comes to handling and managing background processes.
Another way to obtain the background Process Identifier is to query for it immediately after placing a program/process into the background:
#!/bin/bash
sleep 10 & echo $!
sleep 600 & echo $!
sleep 1200 & echo $!
sleep 1800 & echo $!
sleep 3600 & echo $!
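Capturing $! is most useful when the PIDs are stored for later. A minimal sketch, assuming Bash (for arrays), that saves each PID in a hypothetical pids array and then waits on each one to collect its exit status. The subshell that exits with status 3 is an illustrative failing job.

```shell
#!/bin/bash
# Start several background jobs, saving each PID from $! immediately.
pids=()
for seconds in 1 2 3; do
  sleep "$seconds" &
  pids+=("$!")
done

# An illustrative job that fails with exit status 3.
( sleep 1; exit 3 ) &
pids+=("$!")

# wait <pid> blocks until that specific job ends and returns its exit status.
statuses=()
for pid in "${pids[@]}"; do
  if wait "$pid"; then
    status=0
  else
    status=$?   # in the else branch, $? is the status returned by wait
  fi
  statuses+=("$status")
  echo "PID $pid exited with status $status"
done
```

Waiting per PID, rather than with a bare wait, is what makes the individual exit statuses available; the same PIDs can also be passed to kill to stop a job early.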