Linux compare directories recursively

How can I compare two directories recursively and check if one of the directories contains the other?

I have two directories that contain some common files. I want to know whether one directory contains the same files as the other. I found a script on the net, but I need to improve it so that it works recursively.

    #!/bin/bash
    # cmp_dir - program to compare two directories

    # Check for required arguments
    if [ $# -ne 2 ]; then
        echo "usage: $0 directory_1 directory_2" 1>&2
        exit 1
    fi

    # Make sure both arguments are directories
    if [ ! -d $1 ]; then
        echo "$1 is not a directory!" 1>&2
        exit 1
    fi

    if [ ! -d $2 ]; then
        echo "$2 is not a directory!" 1>&2
        exit 1
    fi

    # Process each file in directory_1, comparing it to directory_2
    missing=0
    for filename in $1/*; do
        fn=$(basename "$filename")
        if [ -f "$filename" ]; then
            if [ ! -f "$2/$fn" ]; then
                echo "$fn is missing from $2"
                missing=$((missing + 1))
            fi
        fi
    done
    echo "$missing files missing"

3 Answers

    #!/bin/bash
    # cmp_dir - program to compare two directories

    # Check for required arguments
    if [ $# -ne 2 ]; then
        echo "usage: $0 directory_1 directory_2" 1>&2
        exit 1
    fi

    # Make sure both arguments are directories
    if [ ! -d "$1" ]; then
        echo "$1 is not a directory!" 1>&2
        exit 1
    fi

    if [ ! -d "$2" ]; then
        echo "$2 is not a directory!" 1>&2
        exit 1
    fi

    # Process each file in directory_1, comparing it to directory_2
    missing=0
    while IFS= read -r -d $'\0' filename
    do
        # Strip the leading directory_1 prefix, keeping the subdirectory path
        fn=${filename#"$1"}
        if [ ! -f "$2/$fn" ]; then
            echo "$fn is missing from $2"
            missing=$((missing + 1))
        fi
    done < <(find "$1" -type f -print0)
    echo "$missing files missing"

Note that I have added double-quotes around $1 and $2 at various places above to protect them from shell expansion (word splitting and globbing). Without the double-quotes, directory names with spaces or other difficult characters would cause errors.

    while IFS= read -r -d $'\0' filename
    do
        fn=${filename#"$1"}
        if [ ! -f "$2/$fn" ]; then
            echo "$fn is missing from $2"
            missing=$((missing + 1))
        fi
    done < <(find "$1" -type f -print0)

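This uses find to descend recursively into directory $1 and list the files it contains. The construction while IFS= read -r -d $'\0' filename; do ... done < <(find "$1" -type f -print0) is safe against all file names, including names that contain spaces or newlines, because find -print0 separates the names with NUL characters and read -d $'\0' splits only on those.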
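To see why the NUL-delimited loop matters, here is a minimal sketch. The helper name count_files and the demo file names are made up for illustration; it assumes bash and a find that supports -print0.

```bash
#!/bin/bash
# count_files is a hypothetical helper: it counts the regular files under
# a directory using the NUL-delimited read loop.
count_files() {
    local n=0 f
    while IFS= read -r -d '' f; do
        n=$((n + 1))
    done < <(find "$1" -type f -print0)
    echo "$n"
}

# Demo tree with awkward names: a space and an embedded newline.
demo=$(mktemp -d)
touch "$demo/plain.txt" "$demo/with space.txt" "$demo/with"$'\n'"newline.txt"

count_files "$demo"             # prints 3
find "$demo" -type f | wc -l    # prints 4: the newline splits one name in two
rm -rf "$demo"
```

A line-based loop over the plain find output would see four "files" here, which is exactly the kind of error the -print0 / read -d pairing avoids.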

basename is no longer used because we are looking at files within subdirectories and need to keep the subdirectory part of the path. So, in place of the call to basename, the line fn=${filename#"$1"} is used. This just removes from filename the prefix containing directory $1.
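As a quick illustration of the prefix removal, the sketch below applies the ${var#pattern} expansion to a made-up path. The directory and file names are hypothetical.

```bash
#!/bin/bash
# Hypothetical values, used only for illustration.
dir="photos/2023"
filename="photos/2023/trip/day 1/img.jpg"

# ${var#pattern} removes the shortest match of pattern from the start of var.
# Quoting "$dir" makes the prefix literal, so characters such as * or [ in a
# directory name are not treated as glob patterns.
rel=${filename#"$dir"}
echo "$rel"             # prints /trip/day 1/img.jpg

# basename, by contrast, throws the subdirectories away.
basename "$filename"    # prints img.jpg
```

The leading slash left in the result is harmless, because "$2/$fn" then contains a double slash, which the kernel treats like a single one.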

Problem 2

Suppose that we match files by name but regardless of directory. In other words, if the first directory contains a file a/b/c/some.txt, we will consider it present in the second directory if a file named some.txt exists in any subdirectory of the second directory. To do this, replace the loop above with:

    while IFS= read -r -d $'\0' filename
    do
        fn=$(basename "$filename")
        if ! find "$2" -name "$fn" | grep -q . ; then
            echo "$fn is missing from $2"
            missing=$((missing + 1))
        fi
    done < <(find "$1" -type f -print0)

I prepared two directories, cmp1 and cmp2, and put a number of identical files in both of them. When I ran the script above, it reported the result as expected, but when I moved some files to a subfolder in cmp2, it reported the moved files as missing from cmp2.


@kenn Yes. If you want it to search cmp2 for a matching file name anywhere within cmp2 , that requires a different approach.

FSlint is a small GUI application that helps you identify and clean your system of redundant files.

Installing FSlint

Install FSlint from the Ubuntu Software Center, or from the command line as follows:

sudo apt-get install fslint 

(On my system, installing FSlint did not pull in additional dependencies. Specifically, fslint depends on findutils, python, and python-glade2, which should all be on your system already. You can remove FSlint using the Software Center or by typing sudo apt-get autoremove --purge fslint in a terminal).

Searching for Files

Launch FSlint from the Unity Dash.

Here is a screen-shot of the main screen. There are many advanced features, but basic usage of the application is relatively straightforward.

Click the Add button at the top left to add all the directories you would like to check. Obviously, you can remove directories using the Remove button.


Make sure the recurse? check-box at the right is selected. Then click the Find button. (Any errors, such as file permission issues, will be printed at the bottom of the FSlint window).

FSlint will list all of the duplicate files, their directory locations, and the file date. FSlint also presents you with the number of bytes wasted due to the redundant files.
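For readers who prefer the command line, the idea of grouping files by content can be sketched as follows. This is not how FSlint itself is implemented; find_duplicates is a hypothetical helper, and the sketch assumes GNU coreutils (sha256sum and the uniq options -w and --all-repeated).

```bash
#!/bin/bash
# find_duplicates is a hypothetical helper: it prints groups of files with
# identical content found under the given directories, one blank line
# between groups.
# Assumption: file names contain no newlines or backslashes, because
# sha256sum escapes those and the 64-character hash column would shift.
find_duplicates() {
    find "$@" -type f -exec sha256sum {} + |
        sort |
        uniq -w 64 --all-repeated=separate
}

# Usage (the directory names are placeholders):
#   find_duplicates ~/Documents ~/Backup
```

Each output line is a checksum followed by a path. Files in the same group share a checksum, so they almost certainly have the same content.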

Removing Duplicates

Now you can select multiple files using the Shift or Ctrl keys and left mouse button. If you want to select multiple files automatically, click on the Select button and you will be given options such as selecting files based on date, or entering wild card selection criteria.

If you need to use the list of selected files outside of FSlint (perhaps as input to your own script) click on the Save button to save a text file.

Finally you can delete the selected files using the Delete button, or you can merge the selected files using the Merge button. Note that the Merge feature removes the unselected files from your system and creates hard links to the corresponding selected files. You would use this feature if you wanted to keep your existing file structure, but wanted to free up some space on your system.


Additional Features & Documentation

FSlint has other powerful features which are accessible from the tabs in the left pane. I have found Name clashes to be useful where there are files that have the same name, but are different (perhaps because you saved a newer version of a file in a different directory).

There is also an Advanced search parameters tab at the top of the FSlint window that allows you to exclude certain directories in your search, or filter your results using parameters.


How to Find Difference Between Two Directories Using Diff and Meld Tools

In an earlier article, we reviewed the 9 best file comparison and difference (diff) tools for Linux. In this article, we describe how to find the difference between two directories in Linux.


Normally, to compare two files in Linux, we use diff, a simple, original Unix command-line tool that shows the difference between two files. It compares files line by line, is easy to use, and comes pre-installed on most if not all Linux distributions.

The question is, how do we get the difference between two directories in Linux? Here, we want to know which files and subdirectories are common to the two directories, and which are present in one directory but not in the other.

The conventional syntax for running diff is as follows:

$ diff [OPTION]… FILES
$ diff options dir1 dir2

By default, its output is ordered alphabetically by file/subdirectory name as shown in the screenshot below. In this command, the -q switch tells diff to report only when files differ.

$ diff -q directory-1/ directory-2/

Difference Between Two Directories

By default, diff doesn’t descend into subdirectories, but we can use the -r switch to compare the subdirectories as well, like this.

$ diff -qr directory-1/ directory-2/
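To try this out safely, the sketch below builds two small throwaway trees and compares them. The file names are made up for illustration, and the exact wording of the report depends on your diff implementation.

```bash
#!/bin/bash
# Build two small hypothetical trees in a temporary directory.
work=$(mktemp -d)
mkdir -p "$work/directory-1/sub" "$work/directory-2/sub"
echo "same"  > "$work/directory-1/common.txt"
echo "same"  > "$work/directory-2/common.txt"
echo "old"   > "$work/directory-1/sub/changed.txt"
echo "new"   > "$work/directory-2/sub/changed.txt"
echo "extra" > "$work/directory-1/sub/only-here.txt"

# -q: only report whether files differ; -r: recurse into subdirectories.
report=$(cd "$work" && diff -qr directory-1/ directory-2/)
status=$?

echo "$report"
echo "exit status: $status"    # 1, because differences were found
rm -rf "$work"
```

The report should mention sub/changed.txt as differing and sub/only-here.txt as present on one side only, while common.txt is not listed. diff exits with 0 when the trees match, 1 when they differ, and 2 on trouble, which makes it easy to use in scripts.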

Using Meld Visual Diff and Merge Tool

For those who prefer using the mouse, there is a graphical option called meld (a visual diff and merge tool for the GNOME desktop). You can install it as follows.

$ sudo apt install meld    [Debian/Ubuntu systems]
$ sudo yum install meld    [RHEL/CentOS systems]
$ sudo dnf install meld    [Fedora 22+]

Once you have installed it, search for “meld” in the Ubuntu Dash or Linux Mint Menu, or in the Activities Overview on a Fedora or CentOS desktop, and launch it.

You will see the Meld interface below, where you can choose file or directory comparison as well as version control view. Click on directory comparison and move to the next interface.

Meld Comparison Tool

Select the directories you want to compare. Note that you can add a third directory by checking the option “3-way Comparison”.

Select Comparison Directories

Once you have selected the directories, click on “Compare”.

Listing Difference Between Directories

In this article, we described how to find the difference between two directories in Linux. If you know any other command-line or GUI method, don’t forget to share it in the comment section below.


Recursively compare two directories with diff -r without output on broken links

I am using diff -r a b to recursively compare directories a and b. It often happens though that there are some broken links (the same broken links in both a and b directories and pointing to the same, non-existing targets). diff then outputs error messages for those cases and exits with a non-zero exit code, however I would like it to stay silent, and exit with 0 as the directories are the same in my book. How can I do that?

Do you still want the symlinks compared (and identified as equivalent but broken), or is it acceptable to ignore all symlinks when doing this diff?

compared and identified as equivalent, I don't care if they are broken. I am just trying to verify that my rsync worked.

3 Answers

For version 3.3 or later of diff , you should use the --no-dereference option, as described in Pete Harlan's answer.
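A small experiment shows the difference. This sketch assumes GNU diff 3.3 or later; the directory layout and the link target /nonexistent/target are made up for illustration.

```bash
#!/bin/bash
# Two hypothetical trees: identical content plus the same broken symlink.
work=$(mktemp -d)
mkdir "$work/a" "$work/b"
echo "data" > "$work/a/file.txt"
echo "data" > "$work/b/file.txt"
ln -s /nonexistent/target "$work/a/broken"
ln -s /nonexistent/target "$work/b/broken"

# Default behaviour: diff follows the links, cannot open the targets,
# prints an error (discarded here) and exits with a non-zero status.
diff -r "$work/a" "$work/b" 2>/dev/null
default_status=$?

# With --no-dereference the links themselves are compared, not their targets.
diff -r --no-dereference "$work/a" "$work/b"
nodereference_status=$?

echo "default: $default_status, --no-dereference: $nodereference_status"
rm -rf "$work"
```

With --no-dereference the two trees compare as equal and the exit status is 0, which is the behaviour asked for in the question.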


Unfortunately, older versions of diff don't support ignoring symlinks:

Some files are neither directories nor regular files: they are unusual files like symbolic links, device special files, named pipes, and sockets. Currently, diff treats symbolic links like regular files; it treats other special files like regular files if they are specified at the top level, but simply reports their presence when comparing directories. This means that patch cannot represent changes to such files. For example, if you change which file a symbolic link points to, diff outputs the difference between the two files, instead of the change to the symbolic link.

diff should optionally report changes to special files specially, and patch should be extended to understand these extensions.

If all you want is to verify an rsync (and presumably fix what's missing), then you could just run the rsync command a second time. If you don't want to do that, then check-summing the directory may be sufficient.
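One way to checksum a directory is sketched below. tree_checksum is a hypothetical helper, it assumes sha256sum from GNU coreutils is available, and it only looks at regular files, so symlinks (broken or not) are ignored.

```bash
#!/bin/bash
# tree_checksum is a hypothetical helper: it prints a single digest that
# covers the relative path and content of every regular file under $1.
tree_checksum() {
    (
        cd "$1" &&
        find . -type f -exec sha256sum {} + |
            LC_ALL=C sort |          # fixed order, independent of find's order
            sha256sum |
            cut -d ' ' -f 1
    )
}

# Usage, with a and b as in the question:
#   if [ "$(tree_checksum a)" = "$(tree_checksum b)" ]; then
#       echo "a and b contain the same regular files"
#   fi
```

Because the paths are taken relative to each directory, two trees with the same files and contents produce the same digest regardless of where they live.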

If you really want to do this with diff , then you can use find to skip the symlinks, and run diff on each file individually. Pass your directories a and b in as arguments:

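    #!/bin/bash
    # Compare every regular file under $1 with its counterpart under $2.
    # Symlinks are skipped entirely (-type f never matches a symlink).
    find "$1" -type f -print0 |
    while IFS= read -r -d '' f
    do
        # -q suppresses the details of the differences
        diff -q "$f" "$2/${f#"$1"}"
    done

Or, as a one-liner for directories a and b:

    find a -type f -print0 | while IFS= read -r -d '' f; do diff -q "$f" "b/${f#a/}"; done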

This will identify files that differ in content, or files which are in a but not in b.

  • Since we are skipping symlinks entirely, this won't notice if symlinks present in a are missing from b. If you require that, you would need a second find pass to identify all the symlinks and then explicitly check for their existence in b.
  • Extra files in b will not be identified, since the list is constructed from the contents of a. This probably isn't a problem for your rsync scenario.
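The second pass for symlinks mentioned above could look roughly like this. It is only a sketch: check_symlinks is a hypothetical helper name, and it assumes bash plus a find that supports -print0.

```bash
#!/bin/bash
# check_symlinks is a hypothetical helper: for every symlink under $1 it
# checks that $2 has a symlink at the same relative path with the same
# target. Whether the target exists is deliberately not checked.
check_symlinks() {
    local src=${1%/} dst=${2%/} link rel
    while IFS= read -r -d '' link; do
        rel=${link#"$src"/}
        if [ ! -L "$dst/$rel" ]; then
            echo "$rel: symlink missing from $dst"
        elif [ "$(readlink "$link")" != "$(readlink "$dst/$rel")" ]; then
            echo "$rel: symlink target differs"
        fi
    done < <(find "$src" -type l -print0)
}

# Usage, with a and b as in the question:
#   check_symlinks a b
```

No output means every symlink in a has an equivalent symlink in b, including broken ones.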


Recursively compare directories with summary on different contents without examining file contents' differences

I want to compare the contents of two directories, recursively, showing which files are missing from one or the other, and which files have different content. But I don't want output on the differences within the files, just whether they are different or not. There won't be any links to worry about. I hope this isn't a duplicate, I've trawled through examples and can't find an answer to this. Thanks

2 Answers

Usually, diff -rq already does a good job.

Unfortunately, diff -rq doesn't account for Unicode equivalence in file names. This causes problems because my external SSD from SanDisk stores Unicode file names as different byte sequences than my Mac laptop does.

For example, my MacBook has the following files:

    tmp/Česky.txt
    tmp/Česky/README.txt

My SSD has the following identical files:

    /Volumes/MySSD/Česky.txt
    /Volumes/MySSD/Česky/README.txt

The names look identical in a browser, but on my machine they are actually different, and Python shows different byte sequences for them:

    # python3
    >>> 'Česky'.encode()
    b'C\xcc\x8cesky'
    >>> 'Česky'.encode()
    b'\xc4\x8cesky'

When I run diff -rq tmp/ /Volumes/MySSD/ , I get:

    Only in /Volumes/MySSD/: Česky
    Only in /Volumes/MySSD/: Česky.txt
    Only in tmp/: Česky
    Only in tmp/: Česky.txt

Even though I just copied these files from the laptop to the external drive! The contents of the files are identical.

To avoid this issue, you can use this Python script for comparing directories.
