- Create checksum sha256 of all files and directories?
- TL;DR
- Intro
- Piping output to file, compare with diff
- Fixing order of files found with find by piping to sort
- Other sha/md5 sums
- A complete 1 line command to compare 2 directories with 1 shasum output
- Problem relating to diff output and paths used
- Learn How to Generate and Verify Files with MD5 Checksum in Linux
Create checksum sha256 of all files and directories?
I need to create a list of checksums of the files inside a directory, including any subdirectories. The command that I try to execute is the following:
sha256sum -b *
-b = read in binary mode. * = a shell wildcard that expands to every name in the current directory (except hidden ones), so files of every extension are included.
sha256sum: test0: Is a directory
e3d748fdf10adca15c96d77a38aa0447fa87af9c297cb0b75e314cc313367daf *test1.txt
db0c7a354881fe2dd1b45642a68f6a971c7421e8fdffe56ffa7c740111e07274 *test2.txt
Instead of reporting that test0 is a directory, it should also generate checksums for its contents. Do you recommend always using -b for every type of file? In what cases should -t be used? Is it possible to exclude certain file types from the verification, without having to list all the files I want to include? What command should I execute? I looked for help but did not find anything relevant.
You can use find to find all files in the directory tree and let it run sha256sum on each of them. The following command line creates checksums for the files in the current directory and its subdirectories.
find . -type f -exec sha256sum {} \;
I don’t use the options -b and -t, but if you wish, you can use -b for all files. The only difference I notice is the asterisk in front of each file name.
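For comparison, here is a minimal Python sketch of what the find command above does. It assumes Python 3.6+. It walks the tree with os.walk, hashes every regular file in binary mode and produces lines in sha256sum's "<hash>  <path>" format. The function names are hypothetical, not part of any tool. So is the optional exclude list of glob patterns, which shows one way to skip unwanted file types as asked in the question.

```python
import fnmatch
import hashlib
import os
from typing import Iterable, List


def sha256_file(path: str, chunk_size: int = 65536) -> str:
    """Hash a file's raw bytes in chunks so large files need little memory."""
    h = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()


def checksum_lines(root: str = ".", exclude: Iterable[str] = ()) -> List[str]:
    """Return "<hash>  <path>" lines for every regular file under root,
    skipping files whose name matches one of the glob patterns in exclude."""
    patterns = list(exclude)
    lines = []
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            # Like find -type f: regular files only, symbolic links skipped.
            if os.path.islink(path) or not os.path.isfile(path):
                continue
            if any(fnmatch.fnmatch(name, pattern) for pattern in patterns):
                continue
            lines.append(f"{sha256_file(path)}  {path}")
    return lines
```

For example, print("\n".join(checksum_lines(".", exclude=["*.tmp", "*.log"]))) lists checksums for the current tree while skipping .tmp and .log files. With find itself, the analogous approach is a negated name test before -exec, such as find . -type f ! -name '*.tmp' -exec sha256sum {} \;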
Excellent! And why do we need find instead of an option built into sha256sum itself? Is this common?
Now, I do not understand the use of the curly braces {} well. I was reading a bit more and found that «it can be used as a placeholder for each file that the search command locates». What does that mean? Does it refer to the coloring of the text, or something else? I tried inserting a path (/test) in its place and it was accepted. This confuses me even more. I am just curious to learn more about the parameters used.
Using find is a good way to find files in subdirectories (sha256sum itself has no option to descend into directories). With the -exec option, find can run a command with the parameter {}. find replaces the placeholder {} with the path of each file it finds, so in your case sha256sum runs on each file, one after another.
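To make the placeholder concrete, the following Python sketch models what -exec ... \; does with the list of paths find produces. expand_exec is a hypothetical helper. The model only substitutes arguments that are exactly {}, whereas GNU find also replaces {} inside longer arguments. The sketch also shows why a fixed path such as /test is accepted: find simply runs the same command, unchanged, once per file found.

```python
from typing import Iterable, List


def expand_exec(template: List[str], found_paths: Iterable[str]) -> List[List[str]]:
    """Model of find's `-exec COMMAND ... ;`: build one command line per
    found path, replacing every argument that is exactly "{}" with it."""
    commands = []
    for path in found_paths:
        commands.append([path if arg == "{}" else arg for arg in template])
    return commands


# With {} each file gets its own sha256sum invocation...
for argv in expand_exec(["sha256sum", "{}"], ["./test1.txt", "./test0/a.txt"]):
    print(" ".join(argv))
# ...while a fixed argument such as /test is passed unchanged every time.
for argv in expand_exec(["sha256sum", "/test"], ["./test1.txt", "./test0/a.txt"]):
    print(" ".join(argv))
```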
Thank you so much for everything. One clarification, based on tests I was doing: if you use this command, do not add the -b option unless you are willing to edit the output afterwards, because when I ran sha256sum -c it could not find the paths of the files. Still, I wonder whether there is any real difference between using -b or not.
@MarianoM the -b flag means to open the file in binary mode. This only makes a difference on systems where binary and text modes differ. For example, Windows uses \r\n for line endings, and text mode converts that to \n. On Linux, binary and text modes are the same.
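Here is a small Python illustration of the difference, assuming Python 3. On every platform, Python's own text mode translates CRLF line endings to LF when reading, which is the same kind of conversion a Windows C library applies in text mode. This models the effect only, not sha256sum's internals, and digest_binary and digest_text are hypothetical names.

```python
import hashlib


def digest_binary(path: str) -> str:
    """SHA-256 of the bytes exactly as stored on disk (binary mode)."""
    with open(path, "rb") as f:
        return hashlib.sha256(f.read()).hexdigest()


def digest_text(path: str) -> str:
    """SHA-256 after reading in text mode; Python's default newline handling
    converts CRLF (and lone CR) line endings to LF while reading."""
    with open(path, "r", encoding="utf-8") as f:
        return hashlib.sha256(f.read().encode("utf-8")).hexdigest()
```

A file with CRLF endings gets different digests from the two functions, while a file that only uses LF endings gets the same digest either way.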
TL;DR
cd /path/to/working/directory
sha256sum <(find . -type f -exec sha256sum {} \; | sort)
Intro
A more complete answer to the one above, which fixes the problem with find "finding" files in different orders on different systems.
Piping output to file, compare with diff
First, you probably want to redirect the output to a file so you can compare it with diff. For this you would use
find . -type f -exec sha256sum {} \; > file1.lst
Then on your other system
find . -type f -exec sha256sum {} \; > file2.lst
rsync file2.lst user@host:/home/user/file2.lst
ssh user@host diff file1.lst file2.lst # might not match due to order
Fixing order of files found with find by piping to sort
Here I am assuming you are doing something similar to what I needed this for: copying files from one system to another over a network and verifying the integrity of those files.
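What I found was that the order in which find lists files can vary between two systems, even when both run Debian. find reports directory entries in whatever order the filesystem returns them, which is not sorted and can depend on the filesystem type and on how the files were created.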
Therefore, one needs to sort the output in the text files.
sort file1.lst > file1sorted.lst
sort file2.lst > file2sorted.lst
diff file1.lst file2.lst # bad
diff file1sorted.lst file2sorted.lst # ok
You can do the find and sort all in one line, while redirecting the output to a file.
find . -type f -exec sha256sum {} \; | sort > file1.lst
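If you would rather not depend on sort order at all, you can compare the listings by path instead. This Python sketch assumes both files use sha256sum's default "<hash>  <path>" format (or the binary-mode "<hash> *<path>"). It ignores the escaping sha256sum applies to unusual file names. parse_listing and compare_listings are hypothetical helpers.

```python
from typing import Dict, Iterable, List, Tuple


def parse_listing(lines: Iterable[str]) -> Dict[str, str]:
    """Map path -> hash from sha256sum-style lines."""
    result = {}
    for line in lines:
        line = line.rstrip("\n")
        if not line:
            continue
        digest, rest = line.split(" ", 1)
        # rest starts with the mode marker: " " (text) or "*" (binary)
        path = rest[1:] if rest.startswith((" ", "*")) else rest
        result[path] = digest
    return result


def compare_listings(a: Iterable[str], b: Iterable[str]) -> List[Tuple[str, str]]:
    """Return (path, problem) pairs; an empty list means the listings match,
    whatever order their lines were in."""
    left, right = parse_listing(a), parse_listing(b)
    problems = []
    for path in sorted(left.keys() | right.keys()):
        if path not in right:
            problems.append((path, "missing in second listing"))
        elif path not in left:
            problems.append((path, "missing in first listing"))
        elif left[path] != right[path]:
            problems.append((path, "checksum differs"))
    return problems
```

For example, with open("file1.lst") as a, open("file2.lst") as b: print(compare_listings(a, b)) prints an empty list when every path has the same checksum on both systems.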
Other sha/md5 sums
If you want a longer digest, you can use the 512-bit version:
find . -type f -exec sha512sum {} \; | sort > file1.lst
Alternatively, if 256 bits is overkill for what you are doing, use the 128-bit MD5:
find . -type f -exec md5sum {} \; | sort > file1.lst
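Python's hashlib exposes the same algorithms by name, so a single helper can cover sha256sum, sha512sum and md5sum. This is a sketch assuming Python 3, and file_digest is a hypothetical name. The loop at the end prints each algorithm's digest size, which is why the three commands produce 64, 128 and 32 hexadecimal characters per file.

```python
import hashlib


def file_digest(path: str, algorithm: str = "sha256", chunk_size: int = 65536) -> str:
    """Hash a file with any hashlib algorithm name, e.g. "sha256", "sha512", "md5"."""
    h = hashlib.new(algorithm)
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()


# Digest sizes: SHA-256 is 32 bytes, SHA-512 is 64 bytes, MD5 is 16 bytes;
# each byte is shown as two hexadecimal characters.
for name in ("sha256", "sha512", "md5"):
    h = hashlib.new(name)
    print(name, h.digest_size * 8, "bits,", len(h.hexdigest()), "hex characters")
```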
A complete 1 line command to compare 2 directories with 1 shasum output
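Now, if you have many files and do not want to save the output to a file, you can hash the sorted list of checksums itself, which gives a single checksum for the whole directory tree. To do this, use
sha256sum <(find . -type f -exec sha256sum {} \; | sort)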
The pipe to sort is required so that the output is sorted before the final sha256sum is computed. Without it, if find lists the files in a different order, the overall checksum will differ even though every per-file checksum is correct.
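Here is a Python sketch of the same idea, assuming Python 3. It builds the per-file lines, sorts them and hashes the result. Lines are kept as bytes and sorted bytewise, which corresponds to running sort under LC_ALL=C. With another locale, or with file names that sha256sum would escape (for instance names containing newlines), the value may differ from the shell pipeline. tree_digest is a hypothetical name.

```python
import hashlib
import os


def _file_sha256(path: str) -> str:
    h = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(65536), b""):
            h.update(chunk)
    return h.hexdigest()


def tree_digest(root: str = ".") -> str:
    """Single SHA-256 over the sorted per-file checksum lines under root."""
    lines = []
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            if os.path.islink(path) or not os.path.isfile(path):
                continue  # find -type f skips symlinks and special files
            # Same layout as a sha256sum output line: "<hash>  <path>\n"
            lines.append(_file_sha256(path).encode("ascii") + b"  " + os.fsencode(path) + b"\n")
    lines.sort()  # bytewise, like sort under LC_ALL=C; removes walk-order dependence
    return hashlib.sha256(b"".join(lines)).hexdigest()
```

Run from the directory you would otherwise cd into, print(tree_digest(".")) gives one value to compare between the two systems.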
Problem relating to diff output and paths used
Suppose the subdirectories and files you are interested in shasumming sit under a chain of directories A/B/C. If each of A, B and C contains only one subfolder, you might end up accidentally running your shasum command in the wrong directory, resulting in the following:
sort1.txt:
sha256sum1 ./A/B/C/file1
sort2.txt:
sha256sum2 ./B/C/file1
Even if sha256sum1 = sha256sum2, diff will report the files as different, because they really are different: the paths have different base directories.
Here is a short Python 3 script that checks the sums line by line, comparing only the hash field, which solves this problem.
#!/usr/bin/env python3
file1_name = "sort1.txt"
file2_name = "sort2.txt"

# Read both sorted checksum lists (each line: "<hash> <path>")
with open(file1_name, 'r') as file1, open(file2_name, 'r') as file2:
    file1_lines = file1.readlines()
    file2_lines = file2.readlines()

if len(file1_lines) == len(file2_lines):
    print("line numbers ok")
    for line1, line2 in zip(file1_lines, file2_lines):
        # Compare only the hash field, so differing base paths are ignored
        shasum1 = line1.split(' ')[0]
        shasum2 = line2.split(' ')[0]
        if shasum1 != shasum2:
            print("shasum error: ", line1)
else:
    print("Error: file ", file1_name, " number of lines != ", file2_name, " number of lines")
print("done")
I initially wanted to write a shell script to do this, but I got bored trying to figure out how, so I went back to Python.
This makes me think that writing the entire thing in Python would actually have been easier, apart from the find command.
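Following that thought, here is a sketch that does the entire comparison in Python. It assumes both trees are reachable from one machine, for example after copying or over a network mount. os.walk stands in for find, and each file is keyed by its path relative to the root you pass. The base-directory problem described above therefore does not arise, as long as each root points at the top of the copied data. hash_tree and compare_trees are hypothetical names.

```python
import hashlib
import os
from typing import Dict, List


def hash_tree(root: str) -> Dict[str, str]:
    """Map each regular file's path, relative to root, to its SHA-256."""
    result = {}
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            if os.path.islink(path) or not os.path.isfile(path):
                continue
            h = hashlib.sha256()
            with open(path, "rb") as f:
                for chunk in iter(lambda: f.read(65536), b""):
                    h.update(chunk)
            result[os.path.relpath(path, root)] = h.hexdigest()
    return result


def compare_trees(root1: str, root2: str) -> List[str]:
    """Return human-readable problems; an empty list means the trees match."""
    t1, t2 = hash_tree(root1), hash_tree(root2)
    problems = []
    for rel in sorted(t1.keys() | t2.keys()):
        if rel not in t2:
            problems.append(f"only in {root1}: {rel}")
        elif rel not in t1:
            problems.append(f"only in {root2}: {rel}")
        elif t1[rel] != t2[rel]:
            problems.append(f"shasum error: {rel}")
    return problems
```

If the trees live on different systems, one option is to run hash_tree on each side and compare the two dictionaries in the same way.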
Learn How to Generate and Verify Files with MD5 Checksum in Linux
A checksum is a value computed from a block of data that can be used later to detect errors introduced into the data during storage or transmission. MD5 (Message Digest 5) sums can be used as checksums to verify files or strings in a Linux file system.
MD5 sums are 128-bit values, normally displayed as strings of 32 hexadecimal characters (digits and the letters a to f), that result from running the MD5 algorithm on a specific file. The MD5 algorithm is a popular hash function that generates a 128-bit message digest, referred to as a hash value. For a given file, this value is the same on any machine, no matter how many times it is generated.
It is extremely unlikely that two distinct files accidentally produce the same hash. Therefore, you can use md5sum to check digital data integrity by confirming that a file or ISO you downloaded is a bit-for-bit copy of the remote file or ISO.
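Here is a sketch of that download check in Python, assuming Python 3; matches_published is a hypothetical helper. The file is hashed in 1 MiB chunks so large ISO images need not fit in memory, and the result is compared with the hex digest published by the distributor.

```python
import hashlib


def matches_published(path: str, expected_hex: str, algorithm: str = "md5") -> bool:
    """True if the file's digest equals a published hex digest.
    Case and surrounding whitespace in expected_hex are ignored."""
    h = hashlib.new(algorithm)
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(1 << 20), b""):  # 1 MiB at a time
            h.update(chunk)
    return h.hexdigest() == expected_hex.strip().lower()
```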
In Linux, the md5sum program computes and checks MD5 hash values of files. It is part of the GNU Core Utilities (coreutils) package, so it comes preinstalled on most, if not all, Linux distributions.
Below are the contents of /etc/group, saved as groups.csv.
root:x:0:
daemon:x:1:
bin:x:2:
sys:x:3:
adm:x:4:syslog,aaronkilik
tty:x:5:
disk:x:6:
lp:x:7:
mail:x:8:
news:x:9:
uucp:x:10:
man:x:12:
proxy:x:13:
kmem:x:15:
dialout:x:20:
fax:x:21:
voice:x:22:
cdrom:x:24:aaronkilik
floppy:x:25:
tape:x:26:
sudo:x:27:aaronkilik
audio:x:29:pulse
dip:x:30:aaronkilik
The md5sum command below generates a hash value for the file:
$ md5sum groups.csv
bc527343c7ffc103111f3a694b004e2f groups.csv
Now alter the contents of the file by removing the first line, root:x:0:, then run the command a second time and observe the hash value:
$ md5sum groups.csv
46798b5cfca45c46a84b7419f8b74735 groups.csv
You will notice that the hash value has changed, indicating that the contents of the file were altered.
Now put back the first line of the file, root:x:0:, save a copy of the file as groups_list.txt, and run the command below to generate its hash value:
$ md5sum groups_list.txt
bc527343c7ffc103111f3a694b004e2f groups_list.txt
As the output shows, the hash value is the same as the original: the file has a different name but its original content.
Important: an MD5 sum depends only on the file's content, not on its name.
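The same experiment can be reproduced in Python as a minimal sketch. The three-line sample content is made up and only stands in for groups.csv, and md5_of is a hypothetical helper. Identical bytes under two names give identical hashes, while dropping the first line changes the hash.

```python
import hashlib
import os
import tempfile


def md5_of(path: str) -> str:
    """MD5 of a file's bytes; the file name plays no part in it."""
    with open(path, "rb") as f:
        return hashlib.md5(f.read()).hexdigest()


# Made-up sample content standing in for groups.csv.
content = b"root:x:0:\ndaemon:x:1:\nbin:x:2:\n"

with tempfile.TemporaryDirectory() as d:
    csv_path = os.path.join(d, "groups.csv")
    copy_path = os.path.join(d, "groups_list.txt")
    for path in (csv_path, copy_path):
        with open(path, "wb") as f:
            f.write(content)  # same bytes, different names
    print(md5_of(csv_path) == md5_of(copy_path))  # True: the name is irrelevant

    with open(csv_path, "wb") as f:
        f.write(content.split(b"\n", 1)[1])  # drop the first line, root:x:0:
    print(md5_of(csv_path) == md5_of(copy_path))  # False: the content changed
```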
The file groups_list.txt is a duplicate of groups.csv, so try generating the hash values of both files at the same time, as follows.
You will see that they have equal hash values, because they have exactly the same content.
$ md5sum groups_list.txt groups.csv
bc527343c7ffc103111f3a694b004e2f groups_list.txt
bc527343c7ffc103111f3a694b004e2f groups.csv
You can redirect the hash values of one or more files into a text file, to store them or share them with others. For the two files above, issue the command below to save the generated hash values to a text file for later use:
$ md5sum groups_list.txt groups.csv > myfiles.md5
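Here is a Python sketch of that redirection, assuming Python 3. write_md5_list, a hypothetical helper, writes one "<md5>  <name>" line per file. This is the two-space text-mode format md5sum uses, so md5sum -c should be able to read the result. File names containing newlines or backslashes, which md5sum escapes, are not handled.

```python
import hashlib
from typing import Iterable


def write_md5_list(paths: Iterable[str], out_path: str) -> None:
    """Write "<md5>  <name>" lines, one per file, to out_path."""
    with open(out_path, "w", encoding="utf-8") as out:
        for path in paths:
            h = hashlib.md5()
            with open(path, "rb") as f:
                for chunk in iter(lambda: f.read(65536), b""):
                    h.update(chunk)
            out.write(f"{h.hexdigest()}  {path}\n")
```

For example, write_md5_list(["groups_list.txt", "groups.csv"], "myfiles.md5") plays the role of the command above.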
To check that the files have not been modified since you created the checksums, run the next command. You should see the name of each file followed by “OK”.
The -c or --check option tells md5sum to read MD5 sums from the given file and check them.
$ md5sum -c myfiles.md5
groups_list.txt: OK
groups.csv: OK
Remember that after creating the checksum file, you cannot rename the files. If you do, verifying them fails with a “No such file or directory” error, because md5sum -c looks for the original names.
$ mv groups_list.txt new.txt
$ mv groups.csv file.txt
$ md5sum -c myfiles.md5
md5sum: groups_list.txt: No such file or directory
groups_list.txt: FAILED open or read
md5sum: groups.csv: No such file or directory
groups.csv: FAILED open or read
md5sum: WARNING: 2 listed files could not be read
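The verification step can also be sketched in Python. check_md5_list is a hypothetical helper that imitates the OK, FAILED and FAILED open or read messages of md5sum -c. Like md5sum, it opens relative names from the current directory, which is why renamed files cannot be found.

```python
import hashlib
from typing import List


def check_md5_list(list_path: str) -> List[str]:
    """Re-hash every file named in an md5sum-style list and report its status."""
    report = []
    with open(list_path, encoding="utf-8") as lst:
        for line in lst:
            line = line.rstrip("\n")
            if not line:
                continue
            expected, rest = line.split(" ", 1)
            name = rest[1:]  # drop the mode marker: " " (text) or "*" (binary)
            h = hashlib.md5()
            try:
                with open(name, "rb") as f:
                    for chunk in iter(lambda: f.read(65536), b""):
                        h.update(chunk)
            except OSError:  # e.g. the file was renamed or deleted
                report.append(f"{name}: FAILED open or read")
                continue
            status = "OK" if h.hexdigest() == expected.lower() else "FAILED"
            report.append(f"{name}: {status}")
    return report
```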
The same concept works for strings. In the commands below, echo's -n option means do not output the trailing newline:
$ echo -n "Tecmint How-Tos" | md5sum -
afc7cb02baab440a6e64de1a5b0d0f1b -
$ echo -n "Tecmint How-To" | md5sum -
65136cb527bff5ed8615bd1959b0a248 -
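In Python, the role of -n is played by whether a newline is part of the hashed bytes. This sketch assumes UTF-8 text, and md5_of_text is a hypothetical helper. For the ASCII string used above, md5_of_text("Tecmint How-Tos") should give the same value as echo -n "Tecmint How-Tos" | md5sum.

```python
import hashlib


def md5_of_text(text: str, trailing_newline: bool = False) -> str:
    """MD5 of a string's UTF-8 bytes. trailing_newline=True mimics plain
    echo, which appends a newline; False mimics echo -n."""
    data = text.encode("utf-8") + (b"\n" if trailing_newline else b"")
    return hashlib.md5(data).hexdigest()


print(md5_of_text("Tecmint How-Tos"))
print(md5_of_text("Tecmint How-Tos", trailing_newline=True))  # differs: the newline is hashed too
```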
In this guide, I showed you how to generate hash values for files and create a checksum file for later verification of file integrity in Linux. Although security vulnerabilities have been found in the MD5 algorithm, MD5 hashes remain useful, especially if you trust the party that creates them.
Verifying files is therefore an important aspect of file handling on your systems to avoid downloading, storing or sharing corrupted files.