Joining text files in Linux

join multiple files

I am using the standard join command to join two sorted files based on column 1. The command is simply:

join file1 file2 > output_file

But how do I join 3 or more files using the same technique? The following command gave me an empty file:

join file1 file2 file3 > output_file

I think sed can help me, but I am not sure how.


NAME
    join - join lines of two files on a common field

SYNOPSIS
    join [OPTION]... FILE1 FILE2

It only works with two files.

If you need to join three, first join the first two, then join the result with the third:

join file1 file2 | join - file3 > output 

That joins the three files without creating an intermediate temporary file. The - tells join to read its first input (FILE1) from standard input, here the output of the first join.
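To see the pipeline in action, here is a small self-contained run. The file names match the answer, but their contents are made up for illustration; both inputs of each join must be sorted on the join field (field 1 by default).

```shell
# Sketch: join three sorted files with a pipeline instead of a temp file.
# The sample contents below are hypothetical.
export LC_ALL=C                      # same byte order for sorting and for join
dir=$(mktemp -d) || exit 1
trap 'rm -rf "$dir"' EXIT            # remove the scratch directory on exit
cd "$dir" || exit 1

printf 'a 1\nb 2\nc 3\n' > file1
printf 'a x\nb y\nc z\n' > file2
printf 'a red\nc blue\n' > file3

# "-" makes the second join read its FILE1 from the first join's output.
join file1 file2 | join - file3 > output
cat output
```

Key b disappears from the result because plain join prints only lines whose key occurs in both of its inputs.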

One can join multiple files (N >= 2) by recursively constructing a pipeline of joins:

#!/bin/sh
# multijoin - join multiple files

join_rec() {
    if [ $# -eq 1 ]; then
        join - "$1"
    else
        f=$1; shift
        join - "$f" | join_rec "$@"
    fi
}

if [ $# -le 2 ]; then
    join "$@"
else
    f1=$1; f2=$2; shift 2
    join "$f1" "$f2" | join_rec "$@"
fi

Definitely my favourite answer! However, I replaced the join_rec function's body with this, so as to eliminate the need for the second if:

f1=$1; f2=$2; shift 2; if [ $# -gt 0 ]; then join "$f1" "$f2" | join_rec - "$@"; else join "$f1" "$f2"; fi

The call would then look like join_rec "$@".
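The commenter's variant can be tried as a plain function. This sketch writes it out with line breaks (in sh, then and else cannot be followed directly by ;) and runs it on four hypothetical sorted files.

```shell
export LC_ALL=C
dir=$(mktemp -d) || exit 1
trap 'rm -rf "$dir"' EXIT
cd "$dir" || exit 1

# join_rec FILE1 FILE2 [FILE...]
# Joins FILE1 and FILE2, then pipes the result into a recursive call whose
# first file is "-" (standard input), until no files are left.
join_rec() {
    f1=$1; f2=$2; shift 2
    if [ $# -gt 0 ]; then
        join "$f1" "$f2" | join_rec - "$@"
    else
        join "$f1" "$f2"
    fi
}

# Hypothetical sample data, sorted on field 1.
printf 'a 1\nb 2\nc 3\n' > file1
printf 'a x\nb y\nc z\n' > file2
printf 'a red\nb green\nc blue\n' > file3
printf 'a p\nc q\n' > file4

join_rec file1 file2 file3 file4 > output
cat output
```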

I know this is an old question, but for future reference: if the files you want to join follow a naming pattern like the one in the question (file1 file2 file3 ... fileN), you can simply concatenate them with this command:

cat file* > output

Here, output will contain the files one after another, in alphabetical order of their names.
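Concatenating with a glob such as file* follows the shell's sort order, which is character by character rather than numeric. So un-padded numbers come out in a surprising order. A quick sketch with three hypothetical files:

```shell
export LC_ALL=C                      # byte-order sorting for the glob
dir=$(mktemp -d) || exit 1
trap 'rm -rf "$dir"' EXIT
cd "$dir" || exit 1

for n in 1 2 10; do
    echo "part $n" > "file$n"
done

# file* expands to: file1 file10 file2 (string order, not numeric order)
cat file* > output
cat output
```

Zero-padding the names (file01, file02, ..., file10) makes the string order match the numeric order.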

This works superbly for text files. What about binary files that have been split using other commands, packages or software?

Well, there you probably have some header in every file that indicates what kind of file it is, so this is not sufficient. You should search other SO questions for this; I am sure someone has solved it already.

I created a function for this. The first argument is the output file; the remaining arguments are the files to be joined.

function multijoin() {
    out=$1
    shift 1
    awk '{print $1}' "$1" > "$out"
    for f in "$@"; do join "$out" "$f" > tmp; mv tmp "$out"; done
}
multijoin output_file file* 
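Below is the same loop-and-temp-file idea with every variable quoted and the scratch file named after the output. Seeding the output with awk '{print $1}' (only the keys of the first file) is an assumption about what the original function contained. The sample files are hypothetical.

```shell
export LC_ALL=C
dir=$(mktemp -d) || exit 1
trap 'rm -rf "$dir"' EXIT
cd "$dir" || exit 1

# multijoin OUTPUT FILE... : join all FILEs on field 1 into OUTPUT.
multijoin() {
    out=$1; shift
    # Seed OUTPUT with just the keys (field 1) of the first input file,
    # so joining that file again below does not duplicate its columns.
    awk '{print $1}' "$1" > "$out" || return 1
    for f in "$@"; do
        # join cannot write to a file it is reading, so go through a temp file.
        join "$out" "$f" > "$out.tmp" && mv "$out.tmp" "$out" || return 1
    done
}

printf 'a 1\nb 2\nc 3\n' > file1
printf 'a x\nb y\nc z\n' > file2
printf 'a red\nc blue\n' > file3

multijoin output file1 file2 file3
cat output
```

As with multijoin output_file file*, pick an output name that the input glob cannot match.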

While this is a somewhat old question, this is how you can do it with a single awk:

awk -v j=<field_number> '{key=$j; $j=""}          # get key and delete field j
     (NR==FNR){order[FNR]=key}                   # store the key-order
     {entry[key]=entry[key] OFS $0}              # update key-entry
     END { for(i=1;i<=FNR;++i) { key=order[i]; print key entry[key] } }  # print
    ' file1 ... filen

About this script:
  • all files must have the same number of lines;
  • the order of the output is the order of the first file;
  • the files do not need to be sorted on field <field_number>;
  • <field_number> must be a valid integer.
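Here is a runnable variant of the single-awk idea, with the field number fixed at 1 for the example. It makes two editorial changes that are not the answer author's. First, the remaining fields are rebuilt without the emptied key field, which avoids doubled separators. Second, the key order is counted while reading the first file instead of relying on FNR in END. Like the original, it assumes every key occurs in every file. The inputs are hypothetical and deliberately unsorted.

```shell
dir=$(mktemp -d) || exit 1
trap 'rm -rf "$dir"' EXIT
cd "$dir" || exit 1

printf 'b 2\na 1\n' > file1          # deliberately not sorted
printf 'a x\nb y\n' > file2

awk -v j=1 '
    {   # key = field j; rest = all other fields, each preceded by OFS
        key = $j; rest = ""
        for (k = 1; k <= NF; k++) if (k != j) rest = rest OFS $k
    }
    NR == FNR { order[++n] = key }     # remember key order of the first file
    { entry[key] = entry[key] rest }   # append the fields from this file
    END { for (i = 1; i <= n; i++) print order[i] entry[order[i]] }
' file1 file2 > output
cat output
```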

The man page of join states that it only works with two files. So you need to create an intermediate file, which you delete afterwards, e.g.:

join file1 file2 > temp
join temp file3 > output
rm temp

join joins lines of two files on a common field. If you want to join more, do it in pairs: join the first two files, then join the result with the third file, and so on.

Assuming you have four files A.txt, B.txt, C.txt and D.txt as:

~$ cat A.txt
x1 2
x2 3
x4 5
x5 8
~$ cat B.txt
x1 5
x2 7
x3 4
x4 6
~$ cat C.txt
x2 1
x3 1
x4 1
x5 1
~$ cat D.txt
x1 1

firstOutput='0,1.2'; secondOutput='2.2'; myoutput="$firstOutput,$secondOutput"; outputCount=3
join -a 1 -a 2 -e 0 -o "$myoutput" A.txt B.txt > tmp.tmp
for f in C.txt D.txt; do
    firstOutput="$firstOutput,1.$outputCount"
    myoutput="$firstOutput,$secondOutput"
    join -a 1 -a 2 -e 0 -o "$myoutput" tmp.tmp "$f" > tempf
    mv tempf tmp.tmp
    outputCount=$(($outputCount+1))
done
mv tmp.tmp files_join.txt

~$ cat files_join.txt
x1 2 5 0 1
x2 3 7 1 0
x3 0 4 1 0
x4 5 6 1 0
x5 8 0 1 0
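The same full outer join can be wrapped in a function that builds the -o field list from the number of files. In the script above, -a 1 -a 2 keeps unpaired lines from both inputs and -e 0 fills their missing fields with 0. outer_multijoin is a hypothetical name. Like the answer, it assumes that each file has exactly two fields (key and value), that each is sorted on the key, and that 0 is a suitable filler.

```shell
export LC_ALL=C
dir=$(mktemp -d) || exit 1
trap 'rm -rf "$dir"' EXIT
cd "$dir" || exit 1

# outer_multijoin FILE1 FILE2 [FILE...]: full outer join on field 1.
outer_multijoin() {
    acc="$1.acc"                       # hypothetical scratch file name
    cp "$1" "$acc" || return 1
    shift
    cols=2                             # fields per line in the accumulator
    for f in "$@"; do
        # Build "0,1.2,...,1.$cols": the key plus all accumulated values.
        fmt=0; i=2
        while [ "$i" -le "$cols" ]; do
            fmt="$fmt,1.$i"; i=$((i + 1))
        done
        # Append 2.2, the value from the new file.
        join -a 1 -a 2 -e 0 -o "$fmt,2.2" "$acc" "$f" > "$acc.new" &&
            mv "$acc.new" "$acc" || return 1
        cols=$((cols + 1))
    done
    cat "$acc" && rm -f "$acc"
}

printf 'x1 2\nx2 3\nx4 5\nx5 8\n' > A.txt
printf 'x1 5\nx2 7\nx3 4\nx4 6\n' > B.txt
printf 'x2 1\nx3 1\nx4 1\nx5 1\n' > C.txt
printf 'x1 1\n' > D.txt

outer_multijoin A.txt B.txt C.txt D.txt > files_join.txt
cat files_join.txt
```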


How to Join or Merge Text Files in Linux

The Linux cat command is one of the most versatile tools you can use to create, view, and even combine files on the Linux command line.

In this article, we take a detour and explore how you can join two or more text files in Linux using the cat command. cat (short for “concatenate”) is one of the most commonly used commands in Linux and other UNIX-like operating systems; it concatenates files and prints the result on standard output.

It is not only used to view files but can also be used to create files together with the redirection character.

View Contents of File in Linux

Suppose you have three text files: sample1.txt, sample2.txt, and sample3.txt.

To view the contents of these files without opening them in an editor, use the cat command as shown (remember to replace sample1.txt, sample2.txt and sample3.txt with the names of the files you wish to combine):

$ cat sample1.txt sample2.txt sample3.txt

This prints the contents of the files one after another, in the order in which they appear on the command line.


Join Contents of Three Files in Linux

To join the three files into one text file, we will use the output redirection operator (>) to redirect output from all the files to a new file. In this example, we have redirected content from all three files to sample4.txt.

$ cat sample1.txt sample2.txt sample3.txt > sample4.txt

The new file now contains the content of all the text files, which you can verify by running the following command:

$ cat sample4.txt
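The whole sequence can be checked in a scratch directory. The one-line contents of the three sample files are hypothetical:

```shell
dir=$(mktemp -d) || exit 1
trap 'rm -rf "$dir"' EXIT
cd "$dir" || exit 1

printf 'one\n' > sample1.txt
printf 'two\n' > sample2.txt
printf 'three\n' > sample3.txt

# Concatenate in the order given on the command line; > creates or truncates.
cat sample1.txt sample2.txt sample3.txt > sample4.txt
cat sample4.txt
```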



CAUTION: The sample4.txt file is overwritten if it already exists. Therefore, proceed with caution when using the redirection operator.
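Besides proceeding with caution, you can ask the shell to refuse. With the POSIX noclobber option (set -C), a plain > fails instead of truncating an existing file, and >| overrides it explicitly when you really mean to overwrite. Here is a sketch with hypothetical file contents:

```shell
dir=$(mktemp -d) || exit 1
trap 'rm -rf "$dir"' EXIT
cd "$dir" || exit 1

printf 'keep me\n' > sample4.txt
printf 'one\n' > sample1.txt

set -C                              # noclobber: > will not overwrite files
if cat sample1.txt > sample4.txt; then
    echo "overwritten"
else
    echo "refused"                  # the shell reports an error; file untouched
fi
set +C                              # restore the default behaviour
cat sample4.txt
```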

A safer option is to append the content of the files to an already existing file, which keeps its pre-existing content. To do this, use the double redirection operator (>>) followed by the name of the file you want to append the content to.

The previous command can be modified as follows:

$ cat sample1.txt sample2.txt sample3.txt >> sample4.txt

This ensures that the existing file is not overwritten. Instead, the content of the other files is simply appended to it.
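Here is a small check of the difference, with hypothetical contents. The existing line survives and the three files follow it:

```shell
dir=$(mktemp -d) || exit 1
trap 'rm -rf "$dir"' EXIT
cd "$dir" || exit 1

printf 'existing\n' > sample4.txt
printf 'one\n' > sample1.txt
printf 'two\n' > sample2.txt
printf 'three\n' > sample3.txt

# >> opens sample4.txt for appending, so its current content is kept.
cat sample1.txt sample2.txt sample3.txt >> sample4.txt
cat sample4.txt
```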


Alternatively, to append text typed at the keyboard, type cat followed by the double redirection operator and then the name of the file. After pressing ENTER, type the content you want to add, press ENTER again, and then press Ctrl+D to save the changes:

$ cat >> sample4.txt
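When the text comes from a script rather than the keyboard, a here-document can stand in for the interactive typing. The lines before the EOF marker play the role of what you would type before Ctrl+D. The file name and text are hypothetical:

```shell
dir=$(mktemp -d) || exit 1
trap 'rm -rf "$dir"' EXIT
cd "$dir" || exit 1

printf 'first line\n' > sample4.txt

# Everything up to the EOF marker is fed to cat, which appends it.
cat >> sample4.txt <<'EOF'
added line 1
added line 2
EOF
cat sample4.txt
```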


Merge Contents of Files Using Sed Command

Alternatively, you can use the popular sed (stream editor) to join or merge the content of two or more files on the command line, using its r command, which instructs sed to read the file provided as an argument. If you list several files, sed reads all of them and prints their content as one combined output.

$ sed r sample1.txt sample2.txt sample3.txt
$ sed r sample1.txt sample2.txt sample3.txt > sample4.txt
$ cat sample4.txt
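An empty sed script shows the underlying behaviour most directly. sed reads every input file in the order given and, with no commands, prints each line unchanged. This sketch uses '' instead of r, since the handling of r without a file name may vary between sed implementations. The sample files are hypothetical.

```shell
dir=$(mktemp -d) || exit 1
trap 'rm -rf "$dir"' EXIT
cd "$dir" || exit 1

printf 'one\n' > sample1.txt
printf 'two\n' > sample2.txt
printf 'three\n' > sample3.txt

# Empty script: sed auto-prints every line of every input file in turn.
sed '' sample1.txt sample2.txt sample3.txt > sample4.txt
cat sample4.txt
```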

That was a short guide on how you can join two or more text files on Linux. Any additional ideas you might have up your sleeve? Do let us know in the comment section.


How to join text files?

I have saved many documents as txt files. I want to print them together, so first I want them in a single file. The order doesn't matter in this case. I want a solution that does not involve typing the names of the files to be merged, but one that would just merge all txt files within the folder. Can I do it with a command or some GUI? I looked here, but I don't know how to use join.


Use cat with output redirection. Syntax: cat file [file] [[file] ...] > joined-file

Example with just two files (you can have many more):

$ echo "some text in a file" > file1
$ echo "another file with some text" > file2
$ cat file1 file2 > mergedfiles
$ cat mergedfiles
some text in a file
another file with some text

In case you have "many documents", make use of shell globbing (patterns):

cat input-files-dir/* > joined-file 

This joins all files in that directory into joined-file in the current directory (which prevents the pattern from matching the output file itself). Globbing is completely independent of cat and output redirection: it's just Bash passing all the matching files as arguments to cat.
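Here is a sketch of the separate-directory layout, with a hypothetical input-files-dir holding three documents:

```shell
dir=$(mktemp -d) || exit 1
trap 'rm -rf "$dir"' EXIT
cd "$dir" || exit 1

mkdir input-files-dir
printf 'alpha\n' > input-files-dir/a.txt
printf 'beta\n'  > input-files-dir/b.txt
printf 'gamma\n' > input-files-dir/c.txt

# The glob expands to the files in sorted order; joined-file lives outside
# input-files-dir, so it can never be one of the inputs.
cat input-files-dir/* > joined-file
cat joined-file
```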


File types

It will just glue (join) files together, as you would with paper and tape. It does not care whether the actual file format can handle this. It works for text files, but not for PDFs, ODTs, etc. Well, it will glue them together, but the result is no longer a valid PDF/ODT.

Order of joining

As phoibos pointed out, shell globbing produces the file names in alphabetical order. This is how Bash and shell globbing work.

Addendum about input file is output file error

When the input file pattern matches the output file itself, cat stops with this error. It's a safety feature. Example: running cat *.txt > out.txt a second time causes it, because out.txt now matches *.txt. Two ways to avoid it:

  • Choose a more specific pattern to match the actual input files, not matching the output name. Example: input files pattern *.txt with output file output.out will not collide.
  • Work in different directories. In the example above I’ve used a separate input-files-dir directory to place all files in, and output to the current working directory. This makes it impossible to get this error.
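The first workaround can be checked by running the merge twice. Because output.out does not match *.txt, the second run reads the same inputs and produces the same result. The input file names are hypothetical.

```shell
dir=$(mktemp -d) || exit 1
trap 'rm -rf "$dir"' EXIT
cd "$dir" || exit 1

printf 'one\n' > a.txt
printf 'two\n' > b.txt

cat -- *.txt > output.out
first=$(cat output.out)
cat -- *.txt > output.out            # a second run does not pick up output.out
second=$(cat output.out)
echo "$second"
```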


How to merge all (text) files in a directory into one?

This is technically what cat ("concatenate") is supposed to do, even though most people just use it for printing files to stdout. If you give it multiple file names, it outputs them all sequentially, and you can redirect that into a new file. For all files, just use ./* (or /path/to/directory/* if you're not in the directory already), and your shell will expand it to all the file names (excluding hidden ones by default).

Make sure you don't use the csh or tcsh shells for that, as they expand the glob after opening merged-file for output. Also make sure that merged-file doesn't exist beforehand. Otherwise you'll likely end up with an infinite loop that fills up the filesystem.

The list of files is sorted lexically. If you use zsh, you can change the order (to numeric, or by age, size, ...) with glob qualifiers.

To include files in sub-directories, use:

find . ! -path ./merged-file -type f -exec cat {} + > merged-file

Beware, though, that the list of files is not sorted and that hidden files are included. -type f here restricts the search to regular files only, as it's unlikely you'll want to include other types of files. With GNU find, you can change it to -xtype f to also include symlinks to regular files.

In zsh, a recursive glob (./**/*) with the (-.) qualifier would do the same ((-.) is the equivalent of -xtype f), but gives you a sorted list and excludes hidden files (add the D qualifier to bring them back). zargs can be used there to work around "argument list too long" errors.
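If you need the recursive find version in a predictable order without zsh, one option is to sort the NUL-separated file list before passing it to cat. This assumes GNU find, sort and xargs, since -print0, -z and -0 are not in every implementation. The directory layout is hypothetical.

```shell
export LC_ALL=C                       # byte-order sort of the path names
dir=$(mktemp -d) || exit 1
trap 'rm -rf "$dir"' EXIT
cd "$dir" || exit 1

mkdir sub
printf 'first\n'  > a.txt
printf 'top\n'    > b.txt
printf 'nested\n' > sub/a.txt

# List regular files except the output, sort the paths, then cat them all.
find . ! -path ./merged-file -type f -print0 | sort -z |
    xargs -0 cat > merged-file
cat merged-file
```

NUL separators keep file names containing spaces or newlines intact. If the list is long, xargs may start cat several times, but the outputs still appear in sorted order.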

