Linux: md5 of all files

md5 all files in a directory tree

I would like to create a bash script that makes an MD5 checksum of every file in a directory. I want to be able to type the script name in the CLI, followed by the path to the directory I want to hash, and have it work. I’m sure there are many ways to accomplish this. Currently I have:

#!/bin/bash
for file in "$1" ; do
    md5 >> "${1}__checksums.md5"
done

This just hangs and does not work. Perhaps I should use find? One caveat: the directories I want to hash will have files with different extensions and may not always have this exact same tree structure. I want something that will work in these different situations as well.

8 Answers

Using md5deep

md5deep -r path/to/dir > sums.md5 

Using find and md5sum

find relative/path/to/dir -type f -exec md5sum {} + > sums.md5

Be aware that when you verify your MD5 sums with md5sum -c sums.md5, you need to run it from the same directory in which you generated the sums.md5 file. This is because find outputs paths relative to your current location, and those paths are what gets written to sums.md5.

If this is a problem, you can make relative/path/to/dir absolute (e.g. by putting $PWD/ in front of your path). This way you can check sums.md5 from any location. The disadvantage is that sums.md5 now contains absolute paths, which makes it bigger.
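A minimal sketch of the relative-path behaviour described above, assuming GNU md5sum and a throwaway directory created with mktemp. The names demo, tree, a.txt and sub/b.txt are made up for illustration.

```shell
#!/bin/bash
# Hypothetical demo tree; every name here is illustrative only.
demo=$(mktemp -d)
mkdir -p "$demo/tree/sub"
echo "alpha" > "$demo/tree/a.txt"
echo "beta"  > "$demo/tree/sub/b.txt"

# Generate the list from inside $demo, so the stored paths are relative to it.
(cd "$demo" && find tree -type f -exec md5sum {} + > sums.md5)

# Checking from the same directory succeeds ...
same_dir_status=0
(cd "$demo" && md5sum -c sums.md5 > /dev/null 2>&1) || same_dir_status=$?

# ... while checking from another directory fails, because tree/a.txt
# cannot be found relative to that location.
other_dir_status=0
(cd / && md5sum -c "$demo/sums.md5" > /dev/null 2>&1) || other_dir_status=$?

echo "same directory: $same_dir_status, other directory: $other_dir_status"
```

A zero status means every listed file was found and matched; the non-zero status in the second case comes from files that could not be opened, not from changed contents.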

You can put this function to your .bashrc file (located in your $HOME directory):

function md5sums {
  if [ "$#" -lt 1 ]; then
    echo -e "At least one parameter is expected\n" \
            "Usage: md5sums [OPTIONS] dir"
  else
    local OUTPUT="checksums.md5"
    local CHECK=false
    local MD5SUM_OPTIONS=""

    # parse options; the last argument is the directory
    while [[ $# -gt 1 ]]; do
      local key="$1"
      case $key in
        -c|--check)
          CHECK=true
          ;;
        -o|--output)
          OUTPUT=$2
          shift
          ;;
        *)
          MD5SUM_OPTIONS="$MD5SUM_OPTIONS $1"
          ;;
      esac
      shift
    done
    local DIR=$1

    if [ -d "$DIR" ]; then  # if $DIR directory exists
      cd "$DIR"  # change to $DIR directory
      if [ "$CHECK" = true ]; then  # if -c or --check option specified
        md5sum --check $MD5SUM_OPTIONS "$OUTPUT"  # check MD5 sums in $OUTPUT file
      else
        # Calculate MD5 sums for files in current directory and subdirectories,
        # excluding $OUTPUT file, and save result in $OUTPUT file
        find . -type f ! -name "$OUTPUT" -exec md5sum $MD5SUM_OPTIONS {} + > "$OUTPUT"
      fi
      cd - > /dev/null  # change to previous directory
    else
      cd "$DIR"  # if $DIR doesn't exist, change to it to generate localized error message
    fi
  fi
}

After you run source ~/.bashrc, you can use md5sums like a normal command:

md5sums path/to/dir

will generate a checksums.md5 file in the path/to/dir directory, containing MD5 sums of all files in this directory and its subdirectories. Use:

md5sums -c path/to/dir

to check sums from the path/to/dir/checksums.md5 file.

Note that path/to/dir can be relative or absolute; md5sums will work fine either way. The resulting checksums.md5 file always contains paths relative to path/to/dir. You can use a different file name than the default checksums.md5 by supplying the -o or --output option. All options other than -c, --check, -o and --output are passed to md5sum.

The first half of the md5sums function definition is responsible for parsing options. See this answer for more information about it. The second half contains explanatory comments.

Note: In my experience, the -exec command {} + variant runs faster than the -exec command {} \; variant, since + passes many file names to a single invocation of the command instead of starting one process per file.
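A small sketch that makes the difference between the two -exec forms visible by counting how many times the command is started. The demo directory and its three file names are made up for illustration; sh -c 'echo run' stands in for md5sum so that each invocation prints exactly one line.

```shell
#!/bin/bash
# Hypothetical demo directory with three empty files.
demo=$(mktemp -d)
touch "$demo/one" "$demo/two" "$demo/three"

# Every line printed by the stand-in command corresponds to one invocation.
# With \; the command is started once per file.
per_file_runs=$(find "$demo" -type f -exec sh -c 'echo run' sh {} \; | wc -l)

# With + find appends as many file names as fit to a single invocation.
batched_runs=$(find "$demo" -type f -exec sh -c 'echo run' sh {} + | wc -l)

echo "per-file variant: $per_file_runs invocations"
echo "batched variant:  $batched_runs invocations"
```

With thousands of small files, the per-file variant spends much of its time starting processes, which is consistent with the speed difference noted above.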

find /path/you/need -type f -exec md5sum {} \; > checksums.md5

Update#1: Improved the command based on @twalberg’s recommendation to handle white spaces in file names.

Update#2: Improved based on @jil’s suggestion, to remove unnecessary xargs call and use -exec option of find instead.

Update#3: @Blake a naive implementation of your script would look something like this:

#!/bin/bash
# Usage: checksumchecker.sh <path>
find "$1" -type f -exec md5sum {} \; > "$1"__checksums.md5

I would recommend find /path -type f -print0 | xargs -0 md5sum , to deal with file names that otherwise might get unintentionally split due to whitespace.
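To illustrate the whitespace problem mentioned in this comment, here is a sketch assuming GNU or BSD xargs and a made-up file name containing spaces.

```shell
#!/bin/bash
# Hypothetical demo directory holding one file whose name contains spaces.
demo=$(mktemp -d)
echo "data" > "$demo/file with spaces.txt"

# Plain xargs splits its input on whitespace, so md5sum is handed three
# names that do not exist and the pipeline reports failure.
plain_status=0
find "$demo" -type f | xargs md5sum > /dev/null 2>&1 || plain_status=$?

# With NUL-delimited names (-print0 / -0) the file name arrives intact.
safe_output=$(find "$demo" -type f -print0 | xargs -0 md5sum)

echo "plain xargs exit status: $plain_status"
echo "$safe_output"
```

find -exec md5sum {} + avoids the problem as well, because find passes each name to md5sum as a separate argument without going through a text pipe.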

Thank you @taskalman. «You can build the path and output file name from $1 if we put it in your script. Note that you will have to handle the slashes in your path parameter to make it part of the filename in your script.» Can you explain this a bit further? I don’t quite understand what you mean.

@Blake sorry, I misunderstood the logic; you can ignore that part, I removed it from the answer. I didn’t realize you want to save the file __checksums.md5 in the root of your search path.

@taskalman I still want to use the $1 so that I don’t have to type out a find command each time, rather just a script name, say «makechecksum», and drop the directory to be hashed into the CLI.

Updated Answer

If you like the answer below, or any of the others, you can make a function that does the command for you. So, to test it, type the following into Terminal to declare a function:

function sumthem(){ find "$1" -type f -print0 | parallel -0 -X md5 > checksums.md5; }
sumthem /Users/somebody/somewhere 

If that works how you like, you can add that line to the end of your «bash profile» and the function will be declared and available whenever you are logged in. Your «bash profile» is probably in $HOME/.profile

Original Answer

Why not get all your CPU cores working in parallel for you?

find . -type f -print0 | parallel -0 -X md5sum 

This finds all the files ( -type f ) in the current directory ( . ) and prints them with a null byte at the end. These are then passed into GNU Parallel, which is told that the filenames end with a null byte ( -0 ), that it should pass as many files as possible to each job ( -X ) to avoid creating a new process for each file, and that it should md5sum the files.
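If GNU Parallel is not installed, a similar effect can be sketched with xargs -P, which is a different tool swapped in here rather than the method of the answer above. The helper name parallel_md5 and the batch size of 32 are assumptions chosen for illustration.

```shell
#!/bin/bash
# parallel_md5 is a hypothetical helper name, not part of any tool.
# It hashes every regular file under a directory with several md5sum
# processes running at the same time.
parallel_md5() {
    local dir=$1
    local jobs
    # Number of online CPU cores; fall back to 2 if getconf cannot tell.
    jobs=$(getconf _NPROCESSORS_ONLN 2>/dev/null || echo 2)
    # -n 32: at most 32 files per md5sum process
    # -P:    run up to $jobs md5sum processes concurrently
    # The final sort orders lines by file name, because the order in which
    # the parallel processes finish is not deterministic.
    find "$dir" -type f -print0 | xargs -0 -n 32 -P "$jobs" md5sum | sort -k 2
}
```

Usage would be parallel_md5 /path/to/dir > checksums.md5. One caveat: several processes write to the same output, so with very large batches lines could in principle interleave; keeping the batches small reduces that risk.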

This approach will pay the largest bonus, in terms of speed, with big images like Photoshop files.

Yes, this would work, but I would have to continually re-type the whole command. I want a script where the directory to be hashed is a variable and the name of the file output is based on that variable. I don’t want to have to type out a find command each time — just a script name, say «makechecksum», and drop the directory to be hashed into the CLI.

#!/bin/bash
shopt -s globstar
md5sum "$1"/** > "${1}__checksums.md5"

Explanation: shopt -s globstar (manual) enables the ** recursive glob wildcard. It means that "$1"/** will expand to a list of everything found recursively under the directory given as parameter $1. That list includes directories as well as files, so md5sum will print an "Is a directory" error for each directory it is handed. The script then simply calls md5sum with this list as parameters, and > "${1}__checksums.md5" redirects the output to the file.

Your answer may be correct, but it would be a lot more helpful if you could explain what it does or how it works.

@Blake Be aware that this won’t hash hidden files. If you want hidden files to not be ignored, then activate dotglob option: shopt -s dotglob [source].
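A sketch that combines globstar with the dotglob option from this comment and skips directories, assuming bash 4 or later. The function name hash_tree is hypothetical.

```shell
#!/bin/bash
# hash_tree is a hypothetical helper; it prints one md5sum line per regular
# file under the given directory, hidden files included.
# The body runs in a subshell ( ... ), so the shopt settings do not leak
# into the calling shell.
hash_tree() (
    shopt -s globstar dotglob nullglob
    for path in "$1"/**; do
        # ** also matches directories; hash regular files only.
        if [ -f "$path" ]; then
            md5sum "$path"
        fi
    done
)
```

Usage would be hash_tree /path/to/dir > checksums.md5. Calling md5sum once per file is slower than passing the whole list at once, but it avoids the "Is a directory" errors and any limit on the length of the argument list.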

@jil this is just hanging for me: #!/usr/bin/env bash shopt -s globstar for file in "$1"/** ; do md5deep -br >> "${1}__checksums.md5" done

md5deep -r $your_directory | awk | sort | md5sum | awk

Please edit your answer to include some explanation. Code-only answers do very little to educate future SO readers. Your answer is in the moderation queue for being low-quality.

Use find command to list all files in directory tree, then use xargs to provide input to md5sum command

find dirname -type f | xargs md5sum > checksums.md5 

In case you prefer to have separate checksum files in every directory, rather than a single file, you can

  • find all subdirectories
  • keep only those which actually contain files (not only other subdirs)
  • cd to each of them and create a checksums.md5 file inside that directory

Here is an example script which does that:

#!/bin/bash
# Do separate md5 files in each subdirectory

md5_filename=checksums.md5

dir="$1"
[ -z "$dir" ] && dir="."

# Check OS to select md5 command
if [[ "$OSTYPE" == "linux-gnu"* ]]; then
    is_linux=1
    md5cmd="md5sum"
elif [[ "$OSTYPE" == "darwin"* ]]; then
    md5cmd="md5 -r"
else
    echo "Error: unknown OS '$OSTYPE'. Don't know correct md5 command."
    exit 1
fi

# go to base directory after saving where we started
start_dir="$PWD"
cd "$dir" || exit 1

# if we're in a symlink cd to the real path
if [ ! "$dir" = "$(pwd -P)" ]; then
    dir="$(pwd -P)"
    cd "$dir" || exit 1
fi

if [ "$PWD" = "/" ]; then
    echo "Refusing to do it on system root '$PWD'" >&2
    exit 1
fi

# Find all folders to process
declare -a subdirs=()
declare -a wanted=()

# find all non-hidden subdirectories (not if the name begins with "." like ".Trashes", ".Spotlight-V100", etc.)
while IFS= read -r; do subdirs+=("$PWD/$REPLY"); done < <(find . -type d -not -name ".*" | LC_ALL=C sort)

# count files and if there are any, add dir to "wanted" array
echo "Counting files and sizes to process ..."
for d in "$dir" "${subdirs[@]}"; do  # include "$dir" itself, not only its subdirs
    files_here=0
    while IFS= read -r ; do
        (( files_here += 1 ))
    done < <(find "$d" -maxdepth 1 -type f -not -name "*.md5")
    (( files_here )) && wanted+=("$d")
done

echo "Found ${#wanted[@]} folders to process:"
printf " * %s\n" "${wanted[@]}"

if [ "${#wanted[@]}" = 0 ]; then
    echo "Nothing to do. Exiting."
    exit 0
fi

for d in "${wanted[@]}"; do
    cd "$d"
    find . -maxdepth 1 -type f -not -name "$md5_filename" -print0 \
        | LC_ALL=C sort -z \
        | while IFS= read -rd '' f; do
            $md5cmd "$f" | tee -a "$md5_filename"
        done
    cd "$dir"
done

cd "$start_dir"

(This is actually a very simplified version of this «md5dirs» script on Github. The original is quite specific and more complex, making it less illustrative as an example, and more difficult to adapt to other different needs.)

Checksum for an entire folder in the Linux terminal?

I need to verify the checksums of a bunch of files uploaded to a server; they are all in one folder. The algorithm does not really matter as long as I can verify them; the less resource-intensive and the faster, the better. Can a checksum be computed directly on the folder? How? For example, the folder is at /home/user/thisfolderforchksumm. I typed :~/thisfolderforchksumm$ md5sum and maybe it works, but it takes a long time to print anything, and I only put two files in there as a test (out of the 1000 I need). Or can I get a checksum for the whole group of files in this folder at once, with a single command? As an alternative, does FileZilla have a feature for comparing specifically by hash? (I know about plain directory comparison.) Thanks.

md5sum without arguments waits for data on standard input. It will never finish until you press Ctrl+D. Run md5sum * instead.
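A short sketch of the standard-input behaviour described in this comment, assuming GNU md5sum. The string hello is an arbitrary example input.

```shell
#!/bin/bash
# With no file arguments md5sum hashes whatever arrives on standard input.
# Here the input comes from a pipe, so md5sum sees end-of-file immediately
# after the five bytes of "hello" and does not wait for the keyboard.
stdin_sum=$(printf 'hello' | md5sum)

# The trailing "-" in the output stands for standard input.
echo "$stdin_sum"    # 5d41402abc4b2a76b9719d911017c592  -
```

Run interactively with nothing piped in, the same command simply waits for typed input, which is why it appears to hang.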

OK, thanks, but md5sum * prints the sums of all the files in the folder, each one separately, when run from inside it (':~/thisfolderforchksumm$ md5sum *'). How do I get one for the whole folder? Whatever I type, it just complains that it is a directory. How do I specify the folder?

From inside the folder: cat * | md5sum. However, the order of the files matters to it, and I do not know in what order the shell expands *.

OK, thanks! But if I need an exact copy of all the files inside, and they are all stored in the same sequence, then in theory the checksum should not differ (what difference does it make how it computes them?), right?
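To illustrate why the order matters, here is a sketch assuming GNU md5sum, with two made-up files a and b in a temporary directory.

```shell
#!/bin/bash
# Hypothetical demo directory with two small files.
demo=$(mktemp -d)
printf 'first'  > "$demo/a"
printf 'second' > "$demo/b"

# The same two files concatenated in a different order give a different digest.
sum_ab=$(cat "$demo/a" "$demo/b" | md5sum)
sum_ba=$(cat "$demo/b" "$demo/a" | md5sum)

# Hashing each file separately and sorting the resulting lines removes the
# dependence on the order in which the files were listed.
list_ab=$(cd "$demo" && md5sum a b | sort | md5sum)
list_ba=$(cd "$demo" && md5sum b a | sort | md5sum)

echo "concatenated, a then b: $sum_ab"
echo "concatenated, b then a: $sum_ba"
echo "sorted list, either order: $list_ab"
```

Concatenation has a second weakness: it cannot tell where one file ends and the next begins, so files containing "ab" and "c" hash the same as files containing "a" and "bc". A list of per-file sums does not have that problem.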

1 Answer

You can read the contents of a file (fopen() + fread() and so on), but what exactly should happen when "reading the directory itself" (1) is, as far as I know, not described in any standard (well, unless Plan 9 came up with something of the sort for "reading a directory").

If you are deliberately not using something like rsync to copy files/directories and/or to verify that the copy is up to date, then, to avoid comparing two lists of checksums (which, by the way, is very easy to do with diff), you can compute the checksums of the lists themselves:

Any other similar program can be used in place of sum: cksum, md*sum, sha*sum, and so on.

The intermediate call to sort, which sorts the list of checksums, is there just in case: in theory the shell could produce the list of the same files (when expanding the * metacharacter) in a different order on different occasions.

If the directory contains nested directories with files, and their checksums are needed as well, you can use the find + xargs combination:

$ find -type f | xargs sum | sort | sum 
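A sketch of the same idea wrapped in a function, so that two directory trees can be compared by a single value. It swaps md5sum in for sum and uses find -exec instead of xargs so that file names with spaces are handled; the function name tree_digest is hypothetical.

```shell
#!/bin/bash
# tree_digest is a hypothetical helper: it prints one MD5 digest that
# summarises the names and contents of all regular files under a directory.
# The body runs in a subshell ( ... ), so the cd does not affect the caller.
tree_digest() (
    cd "$1" || exit 1
    # Per-file sums with paths relative to the directory, sorted so the
    # result does not depend on the order in which find visits the files.
    find . -type f -exec md5sum {} + | LC_ALL=C sort | md5sum | cut -d ' ' -f 1
)
```

Two copies match when tree_digest /local/copy and tree_digest /remote/copy print the same value. When they differ, saving the sorted per-file lists on both sides and running diff on them shows which files are responsible.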
