How to compare binary files to check if they are the same?
What is the easiest way (using a graphical tool or command line on Ubuntu Linux) to know if two binary files are the same or not (except for the time stamps)? I do not need to actually extract the difference. I just need to know whether they are the same or not.
The man page for cmp specifically says it does a byte-by-byte comparison, so that is my default for two binary files. diff works line by line and gives you the same yes/no answer, though not the same output on standard output. If the lines are long, perhaps because the files are not text files, I would prefer cmp. diff has the advantage that you can compare directories, with -r for recursion, thereby comparing multiple files in one command.
The standard unix diff will show if the files are the same or not:
[me@host ~]$ diff 1.bin 2.bin
Binary files 1.bin and 2.bin differ
If there is no output from the command, it means that the files have no differences.
diff seems to have problems with really large files. I got a diff: memory exhausted when comparing two 13G files.
Interesting output. diff is telling you they are "binary" files. Since all files can be considered to be binary, that's a strange assertion.
You can report identical files with the option: diff -s 1.bin 2.bin or diff --report-identical-files 1.bin 2.bin. This shows: Files 1.bin and 2.bin are identical
I have two executables. I know they are different because I compiled and ran them, but all options of diff and cmp given here judge them identical. Why?
I found Visual Binary Diff was what I was looking for, available on:
sudo apt install vbindiff
Nice. I /thought/ I only wanted to know whether the files differed; but being able to see the exact differences easily was a lot more useful. It tended to segfault when I got to the end of the file, but never mind, it still worked.
This should be the accepted answer, as it's far superior to the bland and unhelpful output of the canonical diff command.
Use the cmp command. It will either exit cleanly if the files are binary equal, or print where the first difference occurs and exit.
Does cmp stop when it finds the first difference and display it, or does it go through to the end of the files?
cmp has a "silent" mode: -s, --quiet, --silent: suppress all normal output. I haven't tested it yet, but I think it will stop at the first difference if there is one.
I checked it right now for cmp (GNU diffutils) 3.7 . As already stated in the answer, cmp stops at the first difference and specifies it like this: file1 file2 differ: char 14, line 1 .
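For use in a script, the silent mode discussed above can be combined with the exit status. The following is only a minimal sketch; the function name files_identical is a hypothetical helper, not part of cmp:

```shell
#!/bin/sh
# files_identical is a hypothetical helper name chosen for this sketch.
# cmp -s prints nothing; its exit status is 0 when the files are
# byte-for-byte identical, 1 when they differ, and 2 on trouble
# (for example, a file that cannot be read).
files_identical() {
    cmp -s -- "$1" "$2"
}
```

It could then be used as: if files_identical 1.bin 2.bin; then echo same; else echo different; fi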
I ended up using hexdump to convert the binary files to their hex representation and then opened them in meld / kompare / any other diff tool. Unlike you, I was after the differences in the files.
hexdump tmp/Circle_24.png > tmp/hex1.txt
hexdump /tmp/Circle_24.png > tmp/hex2.txt
meld tmp/hex1.txt tmp/hex2.txt
Use hexdump -v -e '/1 "%02x\n"' if you want to diff and see exactly which bytes were inserted or removed.
Meld also works with binary files when they aren’t converted to hex first. It shows hex values for things which aren’t in the char set, otherwise normal chars, which is useful with binary files that also contain some ascii text. Many do, at least begin with a magic string.
Use sha1 to generate checksum:
sha1 [FILENAME1]
sha1 [FILENAME2]
If you only had a checksum for one of the files, this would be useful, but if you have both files on disk this is unnecessary. diff and cmp will both tell you if they differ without any extra effort.
SHA1 already has one public collision (shattered.io) and probably some non-public ones as well. One collision can be used to generate countless colliding files. Please use SHA2 for hashing instead.
You can use the MD5 hash function to check whether two files are the same. You cannot see the differences at a low level this way, but it is a quick way to compare two files.
If both MD5 hashes (the command output) are the same, then the two files are the same.
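As a rough illustration of the SHA2 suggestion, here is a sketch that compares SHA-256 digests with sha256sum from GNU coreutils. The function name same_sha256 is hypothetical, and when both files are on disk cmp is still the more direct tool:

```shell
#!/bin/sh
# same_sha256 is a hypothetical helper name chosen for this sketch.
# Returns 0 if both files have the same SHA-256 digest, 1 if the
# digests differ, 2 if either file is not readable.
same_sha256() {
    [ -r "$1" ] && [ -r "$2" ] || return 2
    # Reading from stdin keeps the file name out of the output,
    # so only the digest field needs to be compared.
    h1=$(sha256sum < "$1" | cut -d' ' -f1)
    h2=$(sha256sum < "$2" | cut -d' ' -f1)
    [ "$h1" = "$h2" ]
}
```

This is mostly useful when only a digest of one file is available, for example one published next to a download.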
Can you explain your down votes please? SHA1 has 4 upvotes, and if the OP thinks there’s a chance the two files could be the same or similar, the chances of a collision are slight and not worthy of down voting MD5 but up voting SHA1 other than because you heard you should hash your passwords with SHA1 instead of MD5 (that’s a different problem).
not sure about the reason but a pure cmp will be more efficient than computing any hash function of files and comparing them (at least for only 2 files)
if the two files are large and on the same disk (not ssd), the md5 or sha* variant might be faster because the disks can read the two files sequentially which saves lots of head movements
I downvoted because you posted a minor variant of an earlier (bad) solution, when it should have been a comment.
Try diff -s
Short answer: run diff with the -s switch.
Long answer: read on below.
Here’s an example. Let’s start by creating two files with random binary contents:
$ dd if=/dev/random bs=1k count=1 of=test1.bin
1+0 records in
1+0 records out
1024 bytes (1,0 kB, 1,0 KiB) copied, 0,0100332 s, 102 kB/s
$ dd if=/dev/random bs=1k count=1 of=test2.bin
1+0 records in
1+0 records out
1024 bytes (1,0 kB, 1,0 KiB) copied, 0,0102889 s, 99,5 kB/s
Now let’s make a copy of the first file:
$ cp test1.bin copyoftest1.bin
Now test1.bin and test2.bin should be different:
$ diff test1.bin test2.bin
Binary files test1.bin and test2.bin differ
... and test1.bin and copyoftest1.bin should be identical:
$ diff test1.bin copyoftest1.bin
But wait! Why is there no output?
The answer is: this is by design. There is no output on identical files.
But there are different error codes:
$ diff test1.bin test2.bin
Binary files test1.bin and test2.bin differ
$ echo $?
1
$ diff test1.bin copyoftest1.bin
$ echo $?
0
Fortunately, you don't have to check error codes each and every time, because you can use the -s (or --report-identical-files) switch to make diff more verbose:
$ diff -s test1.bin copyoftest1.bin
Files test1.bin and copyoftest1.bin are identical
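If you do want to branch on the exit codes shown above, a small wrapper can turn them into words. This is only a sketch: the function name compare_report is hypothetical, and it relies on diff returning 0 for identical files, 1 for differing files, and 2 for trouble such as a missing file:

```shell
#!/bin/sh
# compare_report is a hypothetical helper name chosen for this sketch.
# It prints one word describing how diff judged the two files.
compare_report() {
    status=0
    # -q suppresses the details; only the exit status is used here.
    diff -q -- "$1" "$2" >/dev/null 2>&1 || status=$?
    case $status in
        0) echo "identical" ;;
        1) echo "different" ;;
        *) echo "error" ;;
    esac
}
```

With the files from the example, compare_report test1.bin copyoftest1.bin would be expected to print identical, and compare_report test1.bin test2.bin to print different.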
Use the cmp command. Refer to Binary Files and Forcing Text Comparisons for more information.
-b doesn't compare files in "binary mode". The manual actually says: "With GNU cmp, you can also use the -b or --print-bytes option to show the ASCII representation of those bytes." This is exactly what I found using the URL to the manual that you provided.
Victor Yarema, I don't know what you mean by "binary mode". cmp is inherently a binary comparison in my opinion. The -b option merely prints the first byte that is different.
Diff with the following options would do a binary comparison to check just if the files are different at all and it’d output if the files are the same as well:
If you are comparing two files with the same name in different directories, you can use this form instead:
For finding flash memory defects, I had to write this script, which shows all 1K blocks that contain differences (not only the first one, as cmp -b does):
#!/bin/sh
f1=testinput.dat
f2=testoutput.dat
size=$(stat -c%s $f1)
i=0
while [ $i -lt $size ]; do
  if ! r="`cmp -n 1024 -i $i -b $f1 $f2`"; then
    printf "%8x: %s\n" $i "$r"
  fi
  i=$(expr $i + 1024)
done
2d400: testinput.dat testoutput.dat differ: byte 3, line 1 is 200 M-^@ 240 M- 
2dc00: testinput.dat testoutput.dat differ: byte 8, line 1 is 327 M-W 127 W
4d000: testinput.dat testoutput.dat differ: byte 37, line 1 is 270 M-8 260 M-0
4d400: testinput.dat testoutput.dat differ: byte 19, line 1 is 46 & 44 $
Disclaimer: I hacked the script in 5 min. It doesn’t support command line arguments nor does it support spaces in file names
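A different route to a similar per-block overview is cmp -l, which lists every differing byte in a single pass, with the result grouped into 1K blocks by awk. This is a sketch of an alternative, not the script above: the function name diff_blocks is hypothetical, and it prints a count of differing bytes per block rather than the first differing byte.

```shell
#!/bin/sh
# diff_blocks is a hypothetical helper name chosen for this sketch.
# cmp -l prints one line per differing byte: the 1-based byte number
# in decimal, followed by the two byte values in octal.
diff_blocks() {
    # cmp exits with 1 when the files differ, which is expected here.
    { cmp -l -- "$1" "$2" || true; } |
        awk '{ n[int(($1 - 1) / 1024)]++ }
             END { for (b in n) printf "%8x: %d differing byte(s)\n", b * 1024, n[b] }' |
        LC_ALL=C sort
}
```

Because the arguments are quoted, this variant also accepts file names containing spaces. The offsets are printed in hexadecimal, as in the output above.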
@unseen_rider I can’t help you this way. The script is ok. Please post your debug output to pastebin.com. You can see here what I mean: pastebin.com/8trgyF4A. Also, please tell me the output of readlink -f $(which sh)
If the md5sum is the same, the binaries are the same.
md5sum new*
89c60189c3fa7ab5c96ae121ec43bd4a new.txt
89c60189c3fa7ab5c96ae121ec43bd4a new1.txt
root@TinyDistro:~# cat new*
aa55 aa55 0000 8010 7738
aa55 aa55 0000 8010 7738
root@TinyDistro:~# cat new*
aa55 aa55 000 8010 7738
aa55 aa55 0000 8010 7738
root@TinyDistro:~# md5sum new*
4a7f86919d4ac00c6206e11fca462c6f new.txt
89c60189c3fa7ab5c96ae121ec43bd4a new1.txt
You would have to change the MD5 hash to SHA2 for this advice to be practical. These days anyone's laptop can generate an MD5 collision and, based on this single collision (2 files of the same size, same prefix and same MD5), generate an infinite number of colliding files (same prefix, different colliding block, same suffix).
Radiff2 is a tool designed to compare binary files, similar to how regular diff compares text files.
Try radiff2 which is a part of radare2 disassembler. For instance, with this command:
radiff2 -x file1.bin file2.bin
You get pretty formatted two columns output where differences are highlighted.
My favourite ones using xxd hex-dumper from the vim package :
1) using vimdiff (part of vim)
#!/bin/bash
FILE1="$1"
FILE2="$2"
vimdiff <( xxd "$FILE1" ) <( xxd "$FILE2" )
#!/bin/bash
FILE1="$1"
FILE2="$2"
diff -W 140 -y <( xxd "$FILE1" ) <( xxd "$FILE2" ) | colordiff | less -R -p ' \| '
wxHexEditor
wxHexEditor is both free and able to diff large files up to 2^64 bytes (16 EiB). It has a GUI, is cross-platform, and has lots of features.
To get it for free, choose one of the following options:
Below is the same suggestion as above. But with details if you're interested in those.
Strength
• Hexadecimal (hex) editor, which is helpful for reverse engineering.
• Cross-platform. Linux, Mac OS, Windows
• Easy to use Graphical User Interface (GUI)
• Supports very large files up to 2^64 bytes (16 EiB)
• Compare two large files side by side (diff). Optionally list and search all differences.
• Does not create temporary files, so it uses very little storage space.
• Multilingual: 15 languages
• Open source. Its code is publicly available for review and contributions on GitHub at https://github.com/EUA/wxHexEditor or on SourceForge at https://sourceforge.net/p/wxhexeditor/code/
• Licensed under the GNU General Public License version 2, a free-software license. https://github.com/EUA/wxHexEditor/blob/master/LICENSE
Challenge
• Confusion between the two code repositories. At the time of this writing, August 2021, the GitHub repository at https://github.com/EUA/wxHexEditor seems to be more recent, as it was last updated in 2021. In comparison, the SourceForge repository at https://sourceforge.net/projects/wxhexeditor/ was last updated on December 31st, 2017.
Show Your Support
• If you enjoy this application, show your support to the authors & contributors with: