ubuntu/linux bash: traverse directory and subdirectories to work with files
Let me start off with what I need: the program is given a directory; it will then examine all the files in that directory (this part works) and do stuff to the files (waiting until it can find all the files for this part). Then it will look for subdirectories and re-run itself for each subdirectory. The directory I'm testing with looks like this:
desktop/test_files/ (starting directory)
desktop/test_files/folder 1/
desktop/test_files/folder 1/folder 2/
desktop/test_files/folder 1/folder 2/folder 3/
$ ./x007_shorter.sh Desktop/test_files/
Desktop/test_files/"folder 1"/
Desktop/test_files/folder 1/"folder 2"/
ls: cannot access */: No such file or directory
Desktop/test_files/folder 1/folder 2/"folder 3"/
./x007_shorter.sh: line 4: cd: ./folder 3/: No such file or directory
ls: cannot access */: No such file or directory
#!/bin/bash
function findir {
  newDir=$1
  eval cd $newDir
  ARRAY=( $(ls -d */) )
  declare -a diry
  count=0
  a=0
  while [ $a -lt $]; do
    diry[$count]="$"
    noSpace=true
    while [ true ]; do
      if [[ $ == */* ]] ; then
        if [ $noSpace = false ]; then
          diry[$count]="$-1))}\"/"
        fi
        break
        noSpace=true
      fi
      let "a=$a+1"
      if [ $noSpace = false ]; then
        diry[$count]="$ $"
      else
        diry[$count]="\"$ $"
      fi
      noSpace=false
    done
    let "count=$count+1"
    let "a=$a+1"
  done
  for a in `seq 1 $`; do
    eval cd .$newDir
    # list "$"
    where=`pwd`
    # eval cd $newDir
    #findir "$"
    #findir "$where$"
    #Right option won, echo "$ Vs $where/$"
    echo "$where/$"
    findir "./$"
  done
}

function list {
  input_file_directory=$1
  eval cd $input_file_directory
  ARRAY=( $(find . -maxdepth 1 -type f -print) )
  declare -a files
  count=0
  a=0
  while [ $a -lt $]; do
    files[$count]="$"
    while [ true ]; do
      if [[ $ == ./* ]] ; then
        break
      fi
      if [[ "$" == "" ]] ; then
        break
      fi
      let "a=$a+1"
      files[$count]="$ $"
    done
    let "count=$count+1"
    let "a=$a+1"
  done
  where=`pwd`
  for a in `seq 1 $`; do
    echo "$where$"
    # going to work on each file, just echoing file till lists all files
  done
}

clear
dar=""
if [[ $1 = "" ]]; then
  read -p "Please enter a directory for me to scan" newdir
  dar=$newdir
  list $newdir
  findir $newdir
else
  dar=$1
  list $1
  findir $1
fi
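For reference, the traversal the script above is trying to implement can be sketched much more compactly with quoted expansions and a recursive function. This is only a rough sketch; the echo stands in for whatever per-file work the question leaves open:

#!/bin/bash
findir() {
    local dir=$1 entry
    for entry in "$dir"/*; do
        if [ -f "$entry" ]; then
            echo "file: $entry"        # placeholder for the real per-file work
        elif [ -d "$entry" ]; then
            findir "$entry"            # recurse into the subdirectory
        fi
    done
}
findir "${1:-.}"

Quoting "$dir" and "$entry" is what keeps names like "folder 1" from being split on the space.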
Execute command on all files in a directory
Could somebody please provide the code to do the following: assume there is a directory of files, all of which need to be run through a program. The program outputs the results to standard out. I need a script that will go into a directory, execute the command on each file, and concatenate the output into one big output file. For instance, to run the command on one file:
$ cmd [option] [filename] > results.out
I would like to add to the question. Can it be done using xargs? e.g., ls
It can, but you probably don't want to use ls to drive xargs. If cmd is at all competently written, perhaps you can simply do cmd .
10 Answers
The following bash code will pass $file to the command, where $file will take the value of every file in /dir:
for file in /dir/*
do
  cmd [option] "$file" >> results.out
done
el@defiant ~/foo $ touch foo.txt bar.txt baz.txt
el@defiant ~/foo $ for i in *.txt; do echo "hello $i"; done
hello bar.txt
hello baz.txt
hello foo.txt
If no files exist in /dir/, then the loop still runs once with a value of '*' for $file, which may be undesirable. To avoid this, enable nullglob for the duration of the loop. Add this line before the loop: shopt -s nullglob, and this line after the loop: shopt -u nullglob # revert nullglob back to its normal default state.
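Putting those two lines around the loop from the answer above (cmd and [option] are still just placeholders), a sketch would look like:

shopt -s nullglob
for file in /dir/*
do
  cmd [option] "$file" >> results.out
done
shopt -u nullglob   # revert nullglob back to its default (unset) state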
If the output file is the same on every iteration, it's much more efficient to redirect outside the loop: done > results.out (and then you can probably overwrite instead of append, as I have assumed here).
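For example, a minimal sketch of that variant, opening results.out once for the whole loop instead of appending on every iteration:

for file in /dir/*
do
  cmd [option] "$file"
done > results.out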
"Be careful using this command for a huge amount of files in a dir. Use find -exec instead." But why?
find /some/directory -maxdepth 1 -type f -exec cmd option {} \; > results.out
- -maxdepth 1 argument prevents find from recursively descending into any subdirectories. (If you want such nested directories to get processed, you can omit this.)
- -type f specifies that only plain files will be processed.
- -exec cmd option {} tells it to run cmd with the specified option for each file found, with the filename substituted for {}
- \; denotes the end of the command.
- Finally, the output from all the individual cmd executions is redirected to results.out
However, if you care about the order in which the files are processed, you might be better off writing a loop. I think find processes the files in inode order (though I could be wrong about that), which may not be what you want.
This is the correct way to process files. Using a for loop is error-prone for many reasons. Sorting can also be done with other commands such as stat and sort, which of course depends on what the sorting criteria are.
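As a sketch of that idea, assuming GNU find, sort, and xargs, a deterministic (lexicographic) processing order can be imposed by sorting the NUL-delimited names before handing them to the command:

find /some/directory -maxdepth 1 -type f -print0 | sort -z | xargs -0 -I{} cmd option {} > results.out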
If I wanted to run two commands, how would I link them after the -exec option? Do I have to wrap them in single quotes or something?
find is always the best option because you can filter by file name pattern with the -name option, and you can do it in a single command.
@frei the answer to your question is here: stackoverflow.com/a/6043896/1243247 but basically just add -exec options: find . -name "*.txt" -exec echo {} \; -exec grep banana {} \;
I'm doing this on my Raspberry Pi from the command line by running:
While this answer is probably the "right" way to do this in a production environment, for day-to-day usage convenience, this one-liner wins!
If one wants to use the modified filename as an argument (e.g. for the name of the output file), you can add anything after the $i part and you will have a new string. An example for an imaginary command ppp -i raw.txt -o processed.txt would be: for i in *; do ppp -i "$i" -o "$i changed"; done. This will run the ppp command on every file, and the resulting file for each execution will be named like the input file with " changed" appended at the end.
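A related sketch, if you also want to manipulate the extension rather than just append text (ppp is still the imaginary command from the comment above):

for i in *.txt; do ppp -i "$i" -o "${i%.txt}_processed.txt"; done

Here "${i%.txt}" strips the .txt suffix before the new suffix is added.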
ls | xargs -L 1 -d '\n' your-desired-command
Using xargs is nice because it allows you to run your-desired-command in parallel if you add the -P 8 flag (up to 8 processes at the same time).
For macOS, the -d option isn’t available. You can fix it by brew install findutils first and then use gxargs instead of xargs
The accepted/high-voted answers are great, but they are missing a few nitty-gritty details. This post covers how to better handle the cases where the shell path-name expansion (glob) fails, where filenames contain embedded newlines or dash symbols, and how to move the command output redirection out of the for-loop when writing the results to a file.
When running the shell glob expansion using *, the expansion can fail if there are no files present in the directory, and the un-expanded glob string will then be passed to the command, which could have undesirable results. The bash shell provides an extended shell option for this, nullglob. So, inside the directory containing your files, the loop basically becomes as follows:
shopt -s nullglob
for file in ./*; do
    cmdToRun [option] -- "$file"
done
This lets you safely skip the for loop when the expression ./* doesn't return any files (if the directory is empty).
or in a POSIX-compliant way (nullglob is bash-specific):
for file in ./*; do
    [ -f "$file" ] || continue
    cmdToRun [option] -- "$file"
done
This version enters the loop even when the expansion fails; the condition [ -f "$file" ] then checks whether the un-expanded string ./* is a valid filename in that directory, which it won't be. So on that condition failure, continue skips back to the for loop, which won't run any further.
Also note the usage of -- just before passing the file name argument. This is needed because, as noted previously, shell filenames can contain dashes anywhere in the filename. Some shell commands interpret those and treat them as command options when the name is not quoted properly, and execute the command thinking a flag was provided.
The -- signals the end of the command line options, which means the command shouldn't parse any strings beyond this point as command flags, but only as filenames.
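A small illustration of why this matters, using a hypothetical file whose name looks like an option:

touch -- '-f'    # create a file literally named -f
rm -f            # -f is parsed as an option, so nothing is removed
rm -- -f         # -- ends option parsing, so the file named -f is removed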
Double-quoting the filenames properly solves the cases where the names contain glob characters or white-space. But *nix filenames can also contain newlines. So we delimit filenames with the only character that cannot be part of a valid filename: the null byte (\0). Since bash internally uses C-style strings, in which the null byte indicates the end of a string, it is the right candidate for this.
So, using the shell's printf to delimit files with this NUL byte, together with the -d option of the read command, we can do the following:
( shopt -s nullglob; printf '%s\0' ./* ) | while read -rd '' file; do
    cmdToRun [option] -- "$file"
done
The nullglob and the printf are wrapped in (..), which means they are run in a sub-shell (child shell), so that the nullglob option does not carry over to the parent shell once the command exits. The -d '' option of the read command is not POSIX compliant, so a bash shell is needed for this. Using the find command, this can be done as:
while IFS= read -r -d '' file; do
    cmdToRun [option] -- "$file"
done < <(find -maxdepth 1 -type f -print0)
For find implementations that don't support -print0 (other than the GNU and the FreeBSD implementations), this can be emulated using printf
find . -maxdepth 1 -type f -exec printf '%s\0' {} \; | xargs -0 cmdToRun [option] --
Another important fix is to move the redirection out of the for-loop to cut down on file I/O. When the redirection is inside the loop, the shell has to make system calls twice for each iteration of the for-loop: once to open and once to close the file descriptor associated with the file. This becomes a performance bottleneck when running a large number of iterations. The recommended approach is to move it outside the loop.
Extending the above code with these fixes, you could do:
( shopt -s nullglob; printf '%s\0' ./* ) | while read -rd '' file; do
    cmdToRun [option] -- "$file"
done > results.out
which will basically write the output of your command for each file to stdout and, when the loop ends, open the target file once to write the stdout contents and save them. The equivalent find version of the same would be:
while IFS= read -r -d '' file; do
    cmdToRun [option] -- "$file"
done < <(find -maxdepth 1 -type f -print0) > results.out
How to loop over files in directory and change path and add suffix to filename
I need to write a script that starts my program with different arguments. I start my program with: ./MyProgram.exe Data/data1.txt [Logs/data1_Log.txt]. Here is the pseudocode for what I want to do:
for each filename in /Data
do
    for int i = 0, i = 3, i++
        ./MyProgram.exe Data/filename.txt Logs/filename_Log.txt
    end for
end for
How can I create the second argument from the first one, so it looks like dataABCD_Log1.txt and start my program?
@LéaGris The proposed duplicate seems less stellar, especially as one of the answers there still advocates looping over ls output. These seem different enough that I have not nominated that as a duplicate of this, either.
6 Answers
A couple of notes first: when you use Data/data1.txt as an argument, should it really be /Data/data1.txt (with a leading slash)? Also, should the outer loop scan only for .txt files, or all files in /Data? Here's an answer, assuming /Data/data1.txt and .txt files only:
#!/bin/bash
for filename in /Data/*.txt; do
    for ((i=0; i<=3; i++)); do
        ./MyProgram.exe "$filename" "Logs/$(basename "$filename" .txt)_Log$i.txt"
    done
done
- /Data/*.txt expands to the paths of the text files in /Data (including the /Data/ part)
- $(...) runs a shell command and inserts its output at that point in the command line
- basename somepath .txt outputs the base part of somepath, with .txt removed from the end (e.g. /Data/file.txt -> file )
If you needed to run MyProgram with Data/file.txt instead of /Data/file.txt, use "${filename#/}" to remove the leading slash. On the other hand, if it's really Data, not /Data, that you want to scan, just use for filename in Data/*.txt.
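A quick illustration of that parameter expansion, using a hypothetical path:

filename=/Data/file.txt
echo "${filename#/}"    # prints Data/file.txt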
How to perform grep operation on all files in a directory?
I'm working with xenserver, and I want to perform a command on each file that is in a directory, grepping some stuff out of the output of the command and appending it to a file. I'm clear on the command I want to use and how to grep out the string(s) I need. But what I'm not clear on is how to have it perform this command on each file, moving to the next, until no more files are found.
5 Answers
In Linux, I normally use a command like the one sketched below the flag list to recursively grep for a particular text within a directory:
- r = recursive, i.e., search subdirectories within the current directory
- n = to print the line numbers to stdout
- i = case insensitive search
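Assuming GNU grep, the flags described above correspond to a command along these lines, which searches case-insensitively under the current directory and prints matching line numbers:

grep -rni "search string" .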
grep $PATTERN * would be sufficient. By default, grep skips all subdirectories. However, if you want to grep through them, grep -r $PATTERN * is what you want.
@Tomáš Zato, just supply all your file patterns instead of *: grep $PATTERN *.cpp *.h. If you need more specific rules for which files should be grepped, use the find command (check Rob's answer).
@Chris it's possible you don't have *.scss files in the current directory but somewhere deeper in subdirs, so grep does not look in all the files you wanted. You should use the --include option to tell grep to look recursively for files that match specific patterns: grep -r x --include '*.scss' . (note the quotes; they prevent the pattern from being expanded by the shell). Or just use find (see Rob's answer).
You want grep -s so you don't get a warning for each subdirectory that grep skips. You should probably double-quote "$PATTERN" here.
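Combining those two suggestions with the non-recursive form from the answer above, a sketch would be:

grep -s "$PATTERN" *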