- How can I calculate an MD5 checksum of a directory?
- 16 Answers
- Verifying the integrity of a downloaded file
- How to Use the md5sum Command in Linux
- The md5sum Command with Examples
- Read in Binary Mode
- Read in Text Mode
- Create a BSD-Style Checksum
- Validate md5 Checksum with a File
- Validate Multiple Files
- Display Only Modified Files
- Generate Status Only
- Check Improperly Formatted Checksum Lines
- Skip Reporting Status for Missing Files
- Show Help and Version Information
How can I calculate an MD5 checksum of a directory?
I need to calculate a summary MD5 checksum for all files of a particular type (*.py, for example) placed under a directory and all its sub-directories. What is the best way to do that? The proposed solutions are very nice, but this is not exactly what I need. I’m looking for a solution that gives a single summary checksum which will uniquely identify the directory as a whole, including the content of all its subdirectories.
Why would you have two directory trees that may or may not be "the same" that you want to uniquely identify? Does file create/modify/access time matter? Is version control what you really need?
What really matters in my case is that the whole directory tree content is the same, which AFAIK means the following: 1) the content of every file under the directory tree is unchanged, 2) no new file was added to the directory tree, 3) no file was deleted.
16 Answers
Create a tar archive on the fly and pipe it to md5sum:
This produces a single MD5 hash value that should be unique to your file and sub-directory setup. No files are created on disk.
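The command itself is missing from this copy of the answer. A minimal sketch of the idea, assuming GNU or BSD tar, could look like the following. The temporary directory and file are placeholders created only so the example runs, and the -C option is my own choice to keep the directory's name out of the archive.

```shell
# Placeholder directory, created only so the sketch is runnable.
dir=$(mktemp -d)
printf 'print("hello")\n' > "$dir/a.py"

# Write an uncompressed archive to stdout (-f -) and hash the stream.
# -C changes into the directory first, so its own name is not archived.
tree_sum=$(tar -cf - -C "$dir" . | md5sum | awk '{print $1}')
echo "$tree_sum"
```

Nothing is written to disk apart from the placeholder file. The archive stream still carries timestamps, owners and permissions, so the value reflects more than file content.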
@CharlesB with a single check-sum you never know which file is different. The question was about a single check-sum for a directory.
ls -alR dir | md5sum. This is even better: no compression, just a read. It is unique because the listing contains the modification time and size of each file 😉
@Daps0l — there is no compression in my command. You need to add z for gzip, or j for bzip2. I’ve done neither.
Take care that doing this would integrate the timestamp of the files and other stuff in the checksum computation, not only the content of the files
This is cute, but it doesn’t really work. There’s no guarantee that tarring the same set of files twice, or on two different computers, will yield exactly the same result.
find /path/to/dir/ -type f -name "*.py" -exec md5sum {} + | awk '{print $1}' | sort | md5sum
The find command lists all the files that end in .py. The MD5 hash value is computed for each .py file. AWK is used to pick off the MD5 hash values (ignoring the filenames, which may not be unique). The MD5 hash values are sorted. The MD5 hash value of this sorted list is then returned.
I’ve tested this by copying a test directory to ~/pybin2.
I renamed some of the files in ~/pybin2.
The find ... md5sum command returns the same output for both directories:
2bcf49a4d19ef9abd284311108d626f1 -
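The test described above can be reproduced with a sketch along these lines. The directories, file names and the content_sum helper are all hypothetical stand-ins for the ones used in the answer.

```shell
# Hypothetical test tree; the names are placeholders.
src=$(mktemp -d)
dst=$(mktemp -d)
printf 'a = 1\n' > "$src/one.py"
printf 'b = 2\n' > "$src/two.py"

# Copy the files, then rename one of them in the copy.
cp "$src"/*.py "$dst"/
mv "$dst/one.py" "$dst/renamed.py"

# Hash of the sorted per-file hashes; awk drops the file names.
content_sum() {
    find "$1" -type f -name "*.py" -exec md5sum {} + | awk '{print $1}' | sort | md5sum
}

sum_src=$(content_sum "$src")
sum_dst=$(content_sum "$dst")
echo "$sum_src"
echo "$sum_dst"   # same as the line above, because only a name changed
```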
To take into account the file layout (paths), so the checksum changes if a file is renamed or moved, the command can be simplified:
find /path/to/dir/ -type f -name "*.py" -exec md5sum {} + | md5sum
On macOS, which ships md5 instead of md5sum:
find /path/to/dir/ -type f -name "*.py" -exec md5 {} + | md5
Note that the same checksum will be generated if a file gets renamed. So this doesn’t truly fit a "checksum which will uniquely identify the directory as a whole" if you consider file layout part of the signature.
You could slightly change the command line to prefix each file checksum with the name of the file (or even better, the relative path of the file from /path/to/dir/) so that it is taken into account in the final checksum.
@zim2001: Yes, it could be altered, but as I understood the problem (especially due to the OP’s comment under the question), the OP wanted any two directories to be considered equal if the contents of the files were identical regardless of filename or even relative path.
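For anyone who does want layout included, the suggestion about relative paths could be sketched as below. This is my reading of the comment, not code from the thread. The layout_sum name and the test files are made up, and running find from inside the directory is an assumption that keeps the absolute location out of the result.

```shell
# Checksum that covers relative paths as well as file content.
# Sorting on the path field makes the result independent of find's traversal order.
layout_sum() {
    (cd "$1" && find . -type f -name "*.py" -exec md5sum {} + | LC_ALL=C sort -k 2 | md5sum)
}

a=$(mktemp -d)
b=$(mktemp -d)
printf 'x = 1\n' > "$a/mod.py"
printf 'x = 1\n' > "$b/mod.py"

same_a=$(layout_sum "$a")
same_b=$(layout_sum "$b")      # identical layout and content: equal to same_a
mv "$b/mod.py" "$b/other.py"
renamed_b=$(layout_sum "$b")   # one file renamed: differs from same_a
```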
- tar processes directory entries in the order in which they are stored in the filesystem, and there is no way to change this order. This can yield completely different results if you have the "same" directory in different places, and I know of no way to fix this (tar cannot "sort" its input files in a particular order).
- I usually care about whether groupid and ownerid numbers are the same, not necessarily whether the string representations of group/owner are the same. This is in line with what, for example, rsync -a --delete does: it synchronizes virtually everything (minus xattrs and ACLs), but it syncs owner and group based on their ID, not on their string representation. So if you synced to a different system that doesn’t necessarily have the same users/groups, you should add the --numeric-owner flag to tar.
- tar will include the name of the directory you’re checking itself, just something to be aware of.
As long as there is no fix for the first problem (or unless you’re sure it does not affect you), I would not use this approach.
The proposed find-based solutions are also no good because they only include files, not directories, which becomes an issue if the checksum should take empty directories into account.
Finally, most suggested solutions don’t sort consistently, because the collation might be different across systems.
This is the solution I came up with:
dir=/path/to/dir; (find "$dir" -type f -exec md5sum {} +; find "$dir" -type d) | LC_ALL=C sort | md5sum
Notes about this solution:
- The LC_ALL=C is to ensure reliable sorting order across systems
- This doesn’t differentiate between a directory "named\nwithanewline" and two directories "named" and "withanewline", but the chance of that occurring seems very small. One usually fixes this with the -print0 flag for find, but since there’s other stuff going on here, I can only see solutions that would make the command more complicated than it’s worth.
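A quick way to see the empty-directory behaviour is to wrap the command in a function and run it before and after adding a directory. The sketch below is a variation of mine rather than the exact command: the tree_sum name is hypothetical, and it runs find from inside the directory so that two copies stored in different places hash the same.

```shell
# Files contribute "hash  path" lines, directories contribute bare paths.
tree_sum() {
    (
        cd "$1" || exit 1
        {
            find . -type f -exec md5sum {} +
            find . -type d
        } | LC_ALL=C sort | md5sum
    )
}

t=$(mktemp -d)
printf 'data\n' > "$t/file.txt"
before=$(tree_sum "$t")
mkdir "$t/empty"
after=$(tree_sum "$t")   # differs from $before: the empty directory is counted
echo "$before"
echo "$after"
```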
PS: one of my systems uses a limited busybox find which supports neither the -exec nor the -print0 flag, and which also appends ‘/’ to directory names, while findutils find doesn’t seem to, so on this machine I need to run:
dir=/path/to/dir; (find "$dir" -type f | while IFS= read -r f; do md5sum "$f"; done; find "$dir" -type d | sed 's#/$##') | LC_ALL=C sort | md5sum
Luckily, I have no files/directories with newlines in their names, so this is not an issue on that system.
Verifying the integrity of a downloaded file
On Ubuntu and other Linux distributions you can also use the graphical program GtkHash, which can be installed with the command:
sudo apt-get install gtkhash
On Windows, use the HashCalc program. It can be downloaded from the official site: slavasoft.com
As a result, the program should display the checksum (a string of letters and digits), looking roughly like this:
463e4e1561df2d0a4e944e91fcef63fd
Compare it with the checksum published on the official site.
If the checksums match, the file can be used; if they do not match, download the file again.
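The same comparison can be done on the command line. The sketch below assumes md5sum is available. The file is a placeholder standing in for a real download, and the expected value is simply the MD5 of that placeholder's content rather than a checksum from any real site.

```shell
# Placeholder file standing in for a downloaded file.
file=$(mktemp)
printf 'hello\n' > "$file"

# Checksum as it would be copied from the official site (here: MD5 of "hello\n").
expected="b1946ac92492d2347c6235b4d2611184"

# md5sum prints "hash  filename"; keep only the hash.
actual=$(md5sum "$file" | awk '{print $1}')

if [ "$actual" = "$expected" ]; then
    result="checksum matches"
else
    result="checksum differs, download the file again"
fi
echo "$result"
```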
When you download a system image, you should always verify the checksum. In other cases, for example when downloading video or music, it is up to you.
© 2018 Ubuntu-ru, the Russian-speaking Ubuntu Linux community.
© 2012 Canonical Ltd. Ubuntu and Canonical are registered trademarks of Canonical Ltd.
How to Use the md5sum Command in Linux
When you download a file from the internet, it is a good safety practice to check whether you received the original version. Comparing checksums you received from the file creator with the ones you obtain by checking the file yourself is a reliable way to confirm your download’s integrity.
The md5sum command in Linux helps create, read, and check file checksums.
In this tutorial, you will learn how to use the md5sum command to validate the files you receive.
The md5sum Command with Examples
When used on a file without any options, the md5sum command displays the file’s hash value alongside the filename. The syntax is:
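The general form is md5sum [filename]. A minimal sketch, using a throwaway example.txt created just for the demonstration:

```shell
# Throwaway working directory and sample file (placeholder names).
workdir=$(mktemp -d)
printf 'hello\n' > "$workdir/example.txt"

# Prints the 32-character hash, two spaces, then the file name.
basic_out=$(cd "$workdir" && md5sum example.txt)
echo "$basic_out"
```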
After obtaining the hash value, compare it with the MD5 value provided by the file creator.
Note: While md5sum is a reliable method for testing whether the file you received has been compromised, it is useful only if you know that the website you downloaded it from is secure. If hackers gain access to the website, they can change both the file and its checksum, making it appear as if the file you are downloading is safe.
Read in Binary Mode
To read the file in binary mode, use the -b option (--binary):
The * character before the file name means that md5sum read it in binary mode.
Read in Text Mode
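Use the -t option (--text) to read the file in text mode:
Text mode is the default mode for reading files with md5sum. On GNU/Linux, which does not distinguish between text and binary files, the mode does not affect the checksum itself.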
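To compare the two modes side by side, something like the following sketch can be used. It assumes GNU md5sum on Linux, and the sample file is a placeholder.

```shell
workdir=$(mktemp -d)
printf 'hello\n' > "$workdir/example.txt"

# Same file, read once in binary mode and once in text mode.
bin_out=$(cd "$workdir" && md5sum -b example.txt)
txt_out=$(cd "$workdir" && md5sum -t example.txt)
echo "$bin_out"
echo "$txt_out"
```

On Linux the hash field is the same in both lines; only the flag character in front of the file name may differ.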
Create a BSD-Style Checksum
Using the --tag option outputs the hash value in the BSD-style format:
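A short sketch of the BSD-style output, assuming GNU md5sum and a placeholder file:

```shell
workdir=$(mktemp -d)
printf 'hello\n' > "$workdir/example.txt"

# BSD-style line: algorithm name, file name in parentheses, then the hash.
tag_out=$(cd "$workdir" && md5sum --tag example.txt)
echo "$tag_out"   # form: MD5 (example.txt) = <hash>
```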
Validate md5 Checksum with a File
To check a file by comparing its hash value with the value provided in a hash file, use the -c option.
1. As an example, create a hash file containing the md5sum output:
md5sum [filename] > [file-containing-hashes]
2. Use the following syntax to compare the hash value from the file you created against the current hash value of the .txt file:
md5sum -c [file-containing-hashes]
3. If you change the contents of the file and repeat the check, a warning message is displayed:
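The three steps above can be tried end to end with a sketch like this one. The file names example.txt and hashes.md5 are placeholders.

```shell
workdir=$(mktemp -d)
cd "$workdir" || exit 1
printf 'original\n' > example.txt

# Step 1: store the current hash in a hash file.
md5sum example.txt > hashes.md5

# Step 2: verify; prints "example.txt: OK" and returns 0.
first_status=0
md5sum -c hashes.md5 || first_status=$?

# Step 3: change the file and verify again; prints FAILED plus a warning.
printf 'changed\n' > example.txt
second_status=0
md5sum -c hashes.md5 || second_status=$?

echo "before change: $first_status, after change: $second_status"
```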
Validate Multiple Files
Use the same md5sum -c procedure to check the integrity of multiple files:
md5sum [filename1] [filename2] [filename3] > [file-containing-hashes]
In the following example, the contents of example2.txt have changed, resulting in a warning message from md5sum:
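A sketch of that scenario, with three placeholder files of which only the second is modified after the hashes are recorded:

```shell
workdir=$(mktemp -d)
cd "$workdir" || exit 1
for n in 1 2 3; do
    printf 'file %s\n' "$n" > "example$n.txt"
done
md5sum example1.txt example2.txt example3.txt > hashes.md5

# Modify only the second file.
printf 'edited\n' >> example2.txt

# Capture both the per-file results and the warning printed on stderr.
multi_status=0
multi_out=$(md5sum -c hashes.md5 2>&1) || multi_status=$?
echo "$multi_out"
```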
Display Only Modified Files
The --quiet option displays only the files whose hash value has changed. It skips the output for validated files.
md5sum --quiet -c [file-containing-hashes]
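For illustration, here is a sketch with two placeholder files, one of which is changed after its hash is recorded:

```shell
workdir=$(mktemp -d)
cd "$workdir" || exit 1
printf 'one\n' > example1.txt
printf 'two\n' > example2.txt
md5sum example1.txt example2.txt > hashes.md5
printf 'edited\n' >> example2.txt

# Only the failed file (and the summary warning) is reported; OK lines are suppressed.
quiet_out=$(md5sum --quiet -c hashes.md5 2>&1) || true
echo "$quiet_out"
```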
Generate Status Only
The md5sum command with the --status option does not produce any output but returns 0 if there are no changes and 1 if it detects changes. This option is useful for scripting, where there is no need for standard output.
The example script below illustrates the use of the --status option:
#!/bin/bash
md5sum --status -c hashfile
Status=$?
echo "File check status is: $Status"
exit $Status
When the script executes, it shows status 1, meaning that md5sum detected the change made earlier in example2.txt.
Check Improperly Formatted Checksum Lines
Add the --strict option to exit non-zero for improperly formatted hash values:
md5sum --strict -c [file-containing-hashes]
The example shows the output of md5sum --strict when you put invalid characters in the first line of the file containing hashes:
To display which line has an invalid hash, use -w (--warn):
md5sum -w -c [file-containing-hashes]
The example above shows the -w option displaying that the improperly formatted MD5 checksum line is line 1 of the file.
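The behaviour of both options can be reproduced with a sketch like the one below. It assumes GNU md5sum. The file names are placeholders and the first line of the hash file is deliberately malformed.

```shell
workdir=$(mktemp -d)
cd "$workdir" || exit 1
printf 'one\n' > example1.txt
printf 'two\n' > example2.txt

# Line 1 is not a valid hex digest; line 2 is a normal md5sum line.
{
    echo 'not-a-hash  example1.txt'
    md5sum example2.txt
} > hashes.md5

# Without --strict the malformed line only produces a warning.
lenient_status=0
md5sum -c hashes.md5 || lenient_status=$?

# With --strict the malformed line makes the whole check fail.
strict_status=0
md5sum --strict -c hashes.md5 || strict_status=$?

# -w reports the number of the offending line.
warn_out=$(md5sum -w -c hashes.md5 2>&1) || true
echo "$warn_out"
```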
Skip Reporting Status for Missing Files
By default, md5sum shows warnings about the files it cannot find on the system. To override this behavior, use the --ignore-missing option:
md5sum --ignore-missing -c [file-containing-hashes]
In the example below, example1.txt was deleted before running the md5sum command. The output ignores the deleted file:
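A sketch of that situation, assuming a GNU md5sum recent enough to support --ignore-missing and using placeholder files:

```shell
workdir=$(mktemp -d)
cd "$workdir" || exit 1
printf 'one\n' > example1.txt
printf 'two\n' > example2.txt
md5sum example1.txt example2.txt > hashes.md5

# Delete one of the listed files.
rm example1.txt

# Default behaviour: the missing file is reported and the check fails.
default_status=0
md5sum -c hashes.md5 || default_status=$?

# With --ignore-missing the deleted file is skipped silently.
ignore_status=0
ignore_out=$(md5sum --ignore-missing -c hashes.md5 2>&1) || ignore_status=$?
echo "$ignore_out"
```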
Show Help and Version Information
To get the official help for the md5sum command, type:
To check md5sum version, type:
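The two commands are missing from this copy of the tutorial. With GNU md5sum they would presumably be the standard --help and --version options:

```shell
# Print the built-in usage summary.
md5sum --help

# Print version information; the first line names the tool and its version.
version_line=$(md5sum --version | head -n 1)
echo "$version_line"
```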
After completing this tutorial, you should know how to properly use the md5sum command to create, print, or check MD5 checksums.
Marko Aleksić is a Technical Writer at phoenixNAP. His innate curiosity regarding all things IT, combined with over a decade long background in writing, teaching and working in IT-related fields, led him to technical writing, where he has an opportunity to employ his skills and make technology less daunting to everyone.