Concatenating multiple text files into a single file in Bash
What is the quickest and most pragmatic way to combine all *.txt files in a directory into one large text file? Currently I'm using Windows with Cygwin, so I have access to Bash. A Windows shell command would be nice too, but I doubt there is one.
12 Answers
This appends the output to all.txt
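The command behind this answer, as a minimal runnable sketch (the scratch directory and the file names a.txt/b.txt are illustrative; all.txt is the output name used throughout this discussion):

```shell
cd "$(mktemp -d)"        # scratch directory for the demo
printf 'one\n' > a.txt
printf 'two\n' > b.txt
cat *.txt >> all.txt     # the answer: append every .txt into all.txt
cat all.txt              # prints: one, two
```

Note that on a second run the glob *.txt now matches all.txt itself, which is exactly the self-inclusion problem the comments below describe.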
You may run into a problem where it cats all.txt into all.txt. I have this problem with grep sometimes; not sure if cat has the same behavior.
@rmeador yes, that is true: if all.txt already exists, you will have this problem. It is solved by giving the output file a different extension, or by writing all.txt to a different folder.
Just remember, for all the solutions given so far, the shell decides the order in which the files are concatenated. For Bash, IIRC, that's alphabetical order. If the order is important, you should either name the files appropriately (01file.txt, 02file.txt, etc.) or specify each file in the order you want it concatenated:
$ cat file1 file2 file3 file4 file5 file6 > out.txt
The Windows shell command type can do this:
The type command also writes the file names to stderr, which is not captured by the > redirect operator (but will show up on the console).
Just be aware that if you put the output file in the same directory as the original files, the wildcard will match it too, so you get duplication: the output file's own contents are combined back into it.
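A sketch of the type approach in Windows cmd (all.txt is an assumed output name; redirecting stderr to nul suppresses the file-name noise mentioned above):

```bat
:: Concatenate every .txt file; contents go to stdout, names to stderr.
type *.txt > all.txt 2>nul
```

Per the caveat above, run this with the output file in a different directory, or exclude it from the wildcard.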
You can use Windows shell copy to concatenate files.
To append files, specify a single file for destination, but multiple files for source (using wildcards or file1+file2+file3 format).
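A sketch of both copy forms described above (Windows cmd; the file names are hypothetical). The /b switch selects binary mode, which avoids the SUB (Ctrl+Z, 0x1A) end-of-file marker that ASCII-mode copy appends:

```bat
:: Explicit file1+file2+file3 form:
copy /b file1.txt+file2.txt+file3.txt all.txt
:: Wildcard form; multiple sources, single destination means concatenate:
copy /b *.txt all.txt
```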
This is, IMHO, the cleanest solution, with basically no side effects that beginners could trip over; unfortunately it does not get the appreciation it deserves 🙁
Worked pretty well, except that at the very end of my file I got a weird SUB control character. Deleting it is easy programmatically, but I'm not sure why it happened.
Be careful, because none of these methods work with a large number of files. Personally, I used this line:
for i in $(ls | grep ".txt");do cat $i >> output.txt;done
EDIT: As someone said in the comments, you can replace $(ls | grep ".txt") with $(ls *.txt)
EDIT: thanks to @gnourf_gnourf's expertise, using a glob is the correct way to iterate over files in a directory. Consequently, blasphemous expressions like $(ls | grep ".txt") must be replaced by *.txt (see the article here).
Good solution:
for i in *.txt; do cat "$i" >> output.txt; done
Mandatory ParsingLs link, together with a downvote (and you deserve more than one downvote, because ls | grep is a seriously bad antipattern).
Got an upvote from me because it allows for arbitrary testing/operations by file name prior to output, and it's quick and easy and good for practice. (In my case I wanted: for i in *; do echo -e "\n$i:\n"; cat "$i"; done )
find . -type f -name '*.txt' -exec cat {} + >> output.txt
Since OP says the files are in the same directory, you may need to add -maxdepth 1 to the find command.
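Combining that suggestion with the find answer above, a sketch (the scratch directory and file names are illustrative; excluding output.txt by name keeps re-runs from folding the output back into itself):

```shell
cd "$(mktemp -d)"
printf 'one\n' > a.txt
printf 'two\n' > b.txt
# Current directory only, regular files only, and never the output itself:
find . -maxdepth 1 -type f -name '*.txt' ! -name 'output.txt' \
    -exec cat {} + > output.txt
sort output.txt    # find's order is unspecified, so sort for display
```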
This should be the correct answer. It will work properly in a shell script. Here is a similar method if you want output sorted: sort -u --output="$OUTPUT_FILE" --files0-from=- < <(find "$DIRECTORY_NAME" -maxdepth 1 -type f -name '*.txt' -print0)
This is a very flexible approach relying on all the strengths of find. My favourite! Surely cat *.txt > all.txt does the job within the same directory (as pointed out above). To me, however, becoming comfortably fluent with find has been a very good habit. Today the files are all in one folder; tomorrow they have multiple file endings across nested directory hierarchies. Don't overcomplicate, but also, do make friends with find. 🙂
How to merge all (text) files in a directory into one?
This is technically what cat («concatenate») is supposed to do, even though most people just use it for outputting files to stdout. If you give it multiple filenames it will output them all sequentially, and then you can redirect that into a new file; in the case of all files just use ./* (or /path/to/directory/* if you’re not in the directory already) and your shell will expand it to all the filenames (excluding hidden ones by default).
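For example, a minimal sketch (the scratch directory and file names are illustrative; writing the result one level up means the glob can never match the output file):

```shell
dir=$(mktemp -d)            # scratch directory for the demo
printf 'one\n' > "$dir/a.txt"
printf 'two\n' > "$dir/b.txt"
cd "$dir"
cat ./* > ../merged-file    # ./* expands to every non-hidden file here
cat ../merged-file          # prints: one, two
```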
Make sure you don't use the csh or tcsh shells for that (they expand the glob after opening the merged-file for output), and that merged-file doesn't exist beforehand, or you'll likely end up with an infinite loop that fills up the filesystem.
The list of files is sorted lexically. If using zsh, you can change the order (to numeric order, or by age, size, etc.) with glob qualifiers.
To include files in sub-directories, use:
find . ! -path ./merged-file -type f -exec cat {} + > merged-file
Though beware: the list of files is not sorted, and hidden files are included. -type f here restricts the match to regular files only, as it's unlikely you'll want to include other types of files. With GNU find, you can change it to -xtype f to also include symlinks to regular files.
With zsh, cat ./**/*(-.) > merged-file would do the same ((-.) achieving the equivalent of -xtype f) but give you a sorted list and exclude hidden files (add the D qualifier to bring them back). zargs can be used there to work around "argument list too long" errors.
How to join text files?
I have saved many documents as txt. I want to print them together so first I want them together in a single file. The order doesn’t matter in this case. I want a solution that does not involve typing the names of the files to be merged, but one that would just merge all txt files within the folder. Can I do it with a command or some GUI? I looked here. Don’t know how to use join .
8 Answers
Use cat with output redirection. Syntax: cat file [file ...] > joined-file.
Example with just two files (you can have many more):
$ echo "some text in a file" > file1
$ echo "another file with some text" > file2
$ cat file1 file2 > mergedfiles
$ cat mergedfiles
some text in a file
another file with some text
In case you have «many documents», make use of shell globbing (patterns):
cat input-files-dir/* > joined-file
This will join all files in that directory into joined-file in the current directory (which prevents the pattern from matching the output file itself). It is totally independent of cat and output redirection; it's just Bash providing all the files as arguments to cat.
File types
It will just glue (join) the files together, as you would do with paper and tape. It does not care whether the actual file format can handle this. It will work for text files, but not for PDFs, ODTs, etc. Well, it will glue them together, but the result is no longer a valid PDF/ODT.
Order of joining
As phoibos pointed out the shell globbing will result in alphabetical order of file names. This is how Bash and shell globbing works.
Addendum about input file is output file error
When the pattern for the input files matches the very same file that is being used as output, this will cause an error. It's a safety feature. Example: cat *.txt > out.txt, run a second time, will cause this.
- Choose a more specific pattern to match the actual input files, not matching the output name. Example: input files pattern *.txt with output file output.out will not collide.
- Work in different directories. In the example above I’ve used a separate input-files-dir directory to place all files in, and output to the current working directory. This makes it impossible to get this error.
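The first remedy above, sketched (the scratch directory and file names are illustrative; the key point is that the input pattern *.txt can never match the output name output.out):

```shell
cd "$(mktemp -d)"
printf 'one\n' > a.txt
printf 'two\n' > b.txt
cat *.txt > output.out   # first run
cat *.txt > output.out   # second run is safe: no "input file is output file"
```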