Linux binary file diff

How do I compare binary files in Linux?

Is there a way in Linux to compare two binary files and print, for every differing byte, its offset and the two byte values in hex? I know about cmp -l, but it prints offsets in decimal and bytes in octal, which I would like to avoid.

FreeBSD's cmp has an -x flag ("heXadecimal") which, in conjunction with -l, produces output formatted exactly as specified in the question: cmp -xl file1.bin file2.bin


This will print the offset and bytes in hex:

cmp -l file1.bin file2.bin | gawk '{printf "%08X %02X %02X\n", $1, strtonum(0$2), strtonum(0$3)}'

Or do $1-1 to have the first printed offset start at 0.

cmp -l file1.bin file2.bin | gawk '{printf "%08X %02X %02X\n", $1-1, strtonum(0$2), strtonum(0$3)}'

Unfortunately, strtonum() is specific to GAWK, so for other versions of awk—e.g., mawk—you will need to use an octal-to-decimal conversion function. For example,

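cmp -l file1.bin file2.bin | mawk 'function oct2dec(oct, dec) {for (i = 1; i <= length(oct); i++) {dec *= 8; dec += substr(oct, i, 1)}; return dec} {printf "%08X %02X %02X\n", $1, oct2dec($2), oct2dec($3)}'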

Broken out for readability:

cmp -l file1.bin file2.bin |
    mawk 'function oct2dec(oct,    dec) {
              for (i = 1; i <= length(oct); i++) {
                  dec *= 8;
                  dec += substr(oct, i, 1)
              };
              return dec
          }
          {
              printf "%08X %02X %02X\n", $1, oct2dec($2), oct2dec($3)
          }'
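The same conversion can be wrapped in a small function that relies only on POSIX awk features. This is a minimal sketch, not the answer author's exact code: the name hexcmp is made up for illustration, the loop variable is declared local, and the $1 - 1 adjustment is applied so that offsets start at 0.

```shell
# hexcmp (hypothetical name): print "offset byte1 byte2" in hex for every differing byte.
hexcmp() {
    cmp -l "$1" "$2" | awk '
        # Convert a string of octal digits (as printed by cmp -l) to a number.
        function oct2dec(oct,    dec, i) {
            dec = 0
            for (i = 1; i <= length(oct); i++)
                dec = dec * 8 + substr(oct, i, 1)
            return dec
        }
        # cmp -l counts bytes from 1; subtract 1 for zero-based offsets.
        { printf "%08X %02X %02X\n", $1 - 1, oct2dec($2), oct2dec($3) }'
}

# Demo with two 4-byte files that differ at offsets 1 and 3.
demo_dir=$(mktemp -d)
printf 'ABCD' > "$demo_dir/a.bin"
printf 'AXCZ' > "$demo_dir/b.bin"
hexcmp "$demo_dir/a.bin" "$demo_dir/b.bin"
# 00000001 42 58
# 00000003 44 5A
rm -rf "$demo_dir"
```

Because the pipeline ends in awk, the function's exit status is awk's, not cmp's, so it cannot be used to test whether the files differ.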

@gertvdijk: strtonum is specific to GAWK. I believe Ubuntu previously used GAWK as the default, but switched at some point to mawk . In any case, GAWK can be installed and set to the default (see also man update-alternatives ). See my updated answer for a solution that doesn’t require strtonum .

% xxd b1 > b1.hex
% xxd b2 > b2.hex

In Bash: diff <(xxd b1) <(xxd b2) but the output format of this (or yours) is nowhere near what the OP asked for.

Nice. I’m on an embedded system that uses BusyBox and there is no cmp , but hexdump + diff works like a charm.

This worked great for me (with opendiff on OS X instead of vimdiff ) — the default view xxd provides keeps the diff engine on track comparing byte-by-byte. With plain (raw) hex simply column-fit with fold , diff would try to fold/group random stuff in the files I was comparing.

This command does not work well when bytes are added or removed, as every line that follows will be misaligned and seen as modified by diff. The solution is to put 1 byte per line and remove the address column, as proposed by John Lawrence Aspden and me.

diff + xxd

Try diff in combination with zsh/bash process substitution:

diff -y <(xxd foo1.bin) <(xxd foo2.bin)

  • -y shows you differences side-by-side (optional).
  • xxd is a CLI tool that creates a hexdump of a binary file.
  • Add -W200 to diff for wider output (of 200 characters per line).
  • For colors, use colordiff as shown below.

colordiff + xxd

If you have colordiff, it can colorize the diff output, e.g.:

diff -y <(xxd foo1.bin) <(xxd foo2.bin) | colordiff

Otherwise, install it via: sudo apt-get install colordiff

vimdiff + xxd

You can also use vimdiff, e.g.:

vimdiff <(xxd foo1.bin) <(xxd foo2.bin)

If you don’t have colordiff, this will do the same thing without colors: diff -y <(xxd foo1.bin) <(xxd foo2.bin)

If you just want to know whether both files are actually the same, you can use the -q or --brief switch, which will only show output when the files differ.

My favorite solution, helped me a lot! With the option --suppress-common-lines, only the lines that differ are displayed.
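Where xxd or process substitution is not available (plain POSIX sh, for instance), the same idea can be sketched with od and temporary files. Assumptions to note: this relies on GNU diff for -y, -W and --suppress-common-lines; od -Ax -tx1 -v stands in for xxd here; and hexdiff_changed is a hypothetical helper name.

```shell
# hexdiff_changed (hypothetical name): side-by-side hex diff, changed rows only.
hexdiff_changed() {
    hd_dir=$(mktemp -d) || return 2
    # -Ax: hex offsets, -tx1: one hex byte per column, -v: never abbreviate repeats
    od -Ax -tx1 -v "$1" > "$hd_dir/a.hex"
    od -Ax -tx1 -v "$2" > "$hd_dir/b.hex"
    diff -y -W 160 --suppress-common-lines "$hd_dir/a.hex" "$hd_dir/b.hex"
    hd_rc=$?
    rm -rf "$hd_dir"
    return "$hd_rc"    # 0 = identical, 1 = different (diff's own convention)
}

# Demo: two 32-byte files that differ only in the byte at offset 0x10.
demo_dir=$(mktemp -d)
printf '%032d' 0 > "$demo_dir/a.bin"
printf '%016d%s%015d' 0 X 0 > "$demo_dir/b.bin"
hexdiff_changed "$demo_dir/a.bin" "$demo_dir/b.bin" || true   # diff exits 1 when inputs differ
rm -rf "$demo_dir"
```

As with the xxd version, this only handles changed bytes well; an inserted or removed byte shifts every following row.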

There’s a tool called DHEX which may do the job, and there’s another tool called VBinDiff.

For a strictly command-line approach, try jojodiff.

DHEX is awesome if comparing binaries is what you want to do. Feed it two files and it takes you right to a comparative view, highlighting the differences, with an easy way to move to the next difference. It also works with large terminals, which is very useful on widescreen monitors.


I prefer VBinDiff. DHEX is using CPU even when idling, I think it’s redrawing all the time or something. VBinDiff doesn’t work with wide terminals though. But the addresses become weird with wide terminals anyway, since you have more than 16 bytes per row.

@DanielBeauyat compressed files will be completely different after you encounter the first different byte. The output is not likely to be useful.

@1111161171159459134 jdiff is part of a "suite" of programs to sync and patch the differences found by jdiff. But, as Mark Ransom said, that would generally not be wise on compressed files; the exception is "synchronizable" compressed formats (like that produced by gzip --rsyncable), in which small differences in the uncompressed files should have a limited effect on the compressed file.

Method that works for byte addition / deletion

Generate a test case with a single removal of byte 64:

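for i in `seq 128`; do printf "%02x" "$i"; done | xxd -r -p > file1
for i in `seq 128`; do if [ "$i" -ne 64 ]; then printf "%02x" "$i"; fi; done | xxd -r -p > file2

Compare them one byte per line:

diff <(od -An -tx1 -w1 -v file1) <(od -An -tx1 -w1 -v file2)

Output:

64d63
<  40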

If you also want to see the ASCII version of the character:

bdiff() (
  f() (
    od -An -tx1c -w1 -v "$1" | paste -d '' - -
  )
  diff <(f "$1") <(f "$2")
)
bdiff file1 file2

I prefer od over xxd because:

  • -An removes the address column. This is important otherwise all lines would differ after a byte addition / removal.
  • -w1 puts one byte per line, so that diff can consume it. It is crucial to have one byte per line, or else every line after a deletion would become out of phase and differ. Unfortunately, this is not POSIX, but present in GNU.
  • -tx1 is the representation you want, change to any possible value, as long as you keep 1 byte per line.
  • -v prevents asterisk repetition abbreviation * which might interfere with the diff
  • paste -d '' - - joins every two lines. We need it because the hex and ASCII go into separate adjacent lines. Taken from: https://stackoverflow.com/questions/8987257/concatenating-every-other-line-with-the-next
  • we use parentheses () to define bdiff instead of braces {} to limit the scope of the inner function f, see also: https://stackoverflow.com/questions/8426077/how-to-define-a-function-inside-another-function-in-bash
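Since -w1 is a GNU extension, here is a hedged, portable variant of the same technique that uses only POSIX od options, with tr and sed to put one byte per line, and temporary files instead of process substitution. The names hexlines and bdiff_posix are hypothetical, and this sketch shows hex only (no ASCII column).

```shell
# hexlines (hypothetical name): dump a file as one hex byte per line, no addresses.
hexlines() {
    # tr turns every run of spaces into a single newline; sed drops empty lines.
    od -An -tx1 -v "$1" | tr -s ' ' '\n' | sed '/^$/d'
}

# bdiff_posix (hypothetical name): byte-wise diff that survives insertions and deletions.
bdiff_posix() {
    bd_dir=$(mktemp -d) || return 2
    hexlines "$1" > "$bd_dir/a"
    hexlines "$2" > "$bd_dir/b"
    diff "$bd_dir/a" "$bd_dir/b"
    bd_rc=$?
    rm -rf "$bd_dir"
    return "$bd_rc"
}

# Demo: file2 is file1 with the third byte ('c', 0x63) removed.
demo_dir=$(mktemp -d)
printf 'abcdef' > "$demo_dir/file1"
printf 'abdef'  > "$demo_dir/file2"
bdiff_posix "$demo_dir/file1" "$demo_dir/file2" || true   # diff exits 1 when inputs differ
# 3d2
# < 63
rm -rf "$demo_dir"
```

The line numbers in diff's output are 1-based byte positions, so "3d2" means the byte at offset 2 of file1 is missing from file2.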

The good side of this method is that od is extremely powerful. In particular, it lets one compare longer-than-a-byte objects, e.g. 32-bit floats. Example: diff -u <(od -tf4 -w1 fileA.bin) <(od -tf4 -w1 fileB.bin) .
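A rough sketch of comparing 4-byte floats one value per line follows. It avoids -w altogether and splits od's output with tr instead. Assumptions: floatlines and floatdiff are hypothetical names, the demo bytes are little-endian IEEE 754 values, and the exact way od prints each float varies between od versions, so no expected output is shown.

```shell
# floatlines (hypothetical name): print each 4-byte float of a file on its own line.
floatlines() {
    od -An -tf4 -v "$1" | tr -s ' ' '\n' | sed '/^$/d'
}

# floatdiff (hypothetical name): unified diff of two files interpreted as 32-bit floats.
floatdiff() {
    fd_dir=$(mktemp -d) || return 2
    floatlines "$1" > "$fd_dir/a"
    floatlines "$2" > "$fd_dir/b"
    diff -u "$fd_dir/a" "$fd_dir/b"
    fd_rc=$?
    rm -rf "$fd_dir"
    return "$fd_rc"
}

# Demo: fileA holds 1.0 and 2.0, fileB holds 1.0 and 3.0 (little-endian bytes).
demo_dir=$(mktemp -d)
printf '\000\000\200\077\000\000\000\100' > "$demo_dir/fileA.bin"
printf '\000\000\200\077\000\000\100\100' > "$demo_dir/fileB.bin"
floatdiff "$demo_dir/fileA.bin" "$demo_dir/fileB.bin" || true   # diff exits 1 when inputs differ
rm -rf "$demo_dir"
```

Comparing the printed values rather than the raw bytes means the diff is in terms of numbers, at the cost of depending on how od formats them.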

When using hexdumps and a text diff to compare binary files, especially with xxd, additions and removals of bytes become shifts in addressing, which can make them difficult to see. This method tells xxd not to output addresses (-p) and to output only one byte per line (-c1), which in turn shows exactly which bytes were changed, added, or removed: diff <(xxd -c1 -p first.bin) <(xxd -c1 -p second.bin). You can find the addresses later by searching for the interesting sequences of bytes in a more "normal" hexdump (the output of xxd first.bin).

I'd recommend hexdump for dumping binary files to textual format and kdiff3 for diff viewing.

hexdump myfile1.bin > myfile1.hex
hexdump myfile2.bin > myfile2.hex
kdiff3 myfile1.hex myfile2.hex

Even simpler in Bash: kdiff3 <(hexdump myfile1.bin) <(hexdump myfile2.bin), with no need to create the files myfile1.hex and myfile2.hex.

The firmware analysis tool binwalk also has this as a feature through its -W / --hexdump command line option which offers options such as to only show the differing bytes:

-W, --hexdump    Perform a hexdump / diff of a file or files
-G, --green      Only show lines containing bytes that are the same among all files
-i, --red        Only show lines containing bytes that are different among all files
-U, --blue       Only show lines containing bytes that are different among some files
-w, --terse      Diff all files, but only display a hex dump of the first file

In OP's example when doing binwalk -W file1.bin file2.bin :


binwalk -W file1.bin file2.bin

hexdiff is a program designed to do exactly what you're looking for.

It displays the hex (and 7-bit ASCII) of the two files one above the other, with any differences highlighted. Look at man hexdiff for the commands to move around in the file, and a simple q will quit.

But it does a pretty bad job when it comes to the comparing part. If you insert some bytes into a file, it will mark all the bytes that follow as changed.

It may not strictly answer the question, but I use this for diffing binaries:

It prints both files out as hex and ASCII values, one byte per line, and then uses Vim's diff facility to render them visually.

Below is a Perl script, colorbindiff, which performs a binary diff that takes into account not only byte changes but also byte additions and deletions, as a text diff does (many of the solutions proposed here only handle byte changes). It's also available on GitHub.

It displays the results side by side with colors, which greatly facilitates analysis.

perl colorbindiff.pl FILE1 FILE2 
#!/usr/bin/perl
#########################################################################
#
# VBINDIFF.PL : A side-by-side visual diff for binary files.
#               Consult usage subroutine below for help.
#
# Copyright (C) 2020 Jerome Lelasseux jl@jjazzlab.com
#
# This program is free software: you can redistribute it and/or modify
# it under the terms of the GNU General Public License as published by
# the Free Software Foundation, either version 3 of the License, or
# (at your option) any later version.
#
# This program is distributed in the hope that it will be useful,
# but WITHOUT ANY WARRANTY; without even the implied warranty of
# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
# GNU General Public License for more details.
#
# You should have received a copy of the GNU General Public License
# along with this program. If not, see <https://www.gnu.org/licenses/>.
#
#########################################################################

use warnings;
use strict;
use Term::ANSIColor qw(colorstrip colored);
use Getopt::Long qw(GetOptions);
use File::Temp qw(tempfile);
use constant BLANK => "..";
use constant BUFSIZE => 64 * 1024;    # 64kB

sub usage
{
    print "USAGE: $0 [OPTIONS] FILE1 FILE2\n";
    print "Show a side-by-side binary comparison of FILE1 and FILE2. Show byte modifications but also additions and deletions, whatever the number of changed bytes. Rely on the 'diff' external command such as found on Linux or Cygwin. The algorithm is not suited for large and very different files.\n";
    print "Author: Jerome Lelasseux \@2021\n";
    print "OPTIONS: \n";
    print " --cols=N       : display N columns of bytes. Default is 16.\n";
    print " --no-color     : don't colorize output. Needed if you view the output in an editor.\n";
    print " --no-marker    : don't use the change markers (+ for addition, - for deletion, * for modified).\n";
    print " --no-ascii     : don't show the ascii columns.\n";
    print " --only-changes : only display lines with changes.\n";
    exit;
}

# Command line arguments
my $maxCols  = 16;
my $noColor  = 0;
my $noMarker = 0;
my $noAscii  = 0;
my $noCommon = 0;
GetOptions(
    'cols=i'       => \$maxCols,
    'no-ascii'     => \$noAscii,
    'no-color'     => \$noColor,
    'no-marker'    => \$noMarker,
    'only-changes' => \$noCommon
) or usage();
usage() unless ($#ARGV == 1);
my ($file1, $file2) = (@ARGV);

# Convert input files into hex lists
my $fileHex1 = createHexListFile($file1);
my $fileHex2 = createHexListFile($file2);

# Process diff -y output to get an easy-to-read side-by-side view
my $colIndex      = 0;
my $oldPtr        = 0;
my $newPtr        = 0;
my $oldLineBuffer = sprintf("0x%04X  ", 0);
my $newLineBuffer = sprintf("0x%04X  ", 0);
my $oldCharBuffer;
my $newCharBuffer;
my $isDeleting      = 0;
my $isAdding        = 0;
my $isUnchangedLine = 1;

open(my $fh, '-|', qq(diff -y $fileHex1 $fileHex2)) or die $!;
while (<$fh>)
{
    # Parse line by line the output of the 'diff -y' on the 2 hex list files.
    # We expect:
    # "xx | yy" for a modified byte
    # "   > yy" for an added byte
    # "xx <"    for a deleted byte
    # "xx   xx" for identical bytes
    my ($oldByte, $newByte);
    my ($oldChar, $newChar);
    if (/\|/)
    {
        # Changed
        if ($isDeleting || $isAdding)
        {
            printLine($colIndex);
        }
        $isAdding        = 0;
        $isDeleting      = 0;
        $isUnchangedLine = 0;
        /([a-fA-F0-9]+)([^a-fA-F0-9]+)([a-fA-F0-9]+)/;
        $oldByte = formatByte($1, 3);
        $oldChar = toPrintableChar($1, 3);
        $newByte = formatByte($3, 3);
        $newChar = toPrintableChar($3, 3);
        $oldPtr++;
        $newPtr++;
    }
    elsif (/</)
    {
        # Deleted in new
        if ($isAdding)
        {
            printLine($colIndex);
        }
        $isAdding        = 0;
        $isDeleting      = 1;
        $isUnchangedLine = 0;
        /([a-fA-F0-9]+)/;
        $oldByte = formatByte($1, 2);
        $oldChar = toPrintableChar($1, 2);
        $newByte = formatByte(BLANK, 2);
        $newChar = colorize(".", 2);
        $oldPtr++;
    }
    elsif (/>/)
    {
        # Added in new
        if ($isDeleting)
        {
            printLine($colIndex);
        }
        $isAdding        = 1;
        $isDeleting      = 0;
        $isUnchangedLine = 0;
        /([a-fA-F0-9]+)/;
        $oldByte = formatByte(BLANK, 1);
        $oldChar = colorize(".", 1);
        $newByte = formatByte($1, 1);
        $newChar = toPrintableChar($1, 1);
        $newPtr++;
    }
    else
    {
        # Unchanged
        if ($isDeleting || $isAdding)
        {
            printLine($colIndex);
        }
        $isDeleting = 0;
        $isAdding   = 0;
        /([a-fA-F0-9]+)([^a-fA-F0-9]+)([a-fA-F0-9]+)/;
        $oldByte = formatByte($1, 0);
        $oldChar = toPrintableChar($1, 0);
        $newByte = formatByte($3, 0);
        $newChar = toPrintableChar($3, 0);
        $oldPtr++;
        $newPtr++;
    }

    # Append the bytes to the old and new buffers
    $oldLineBuffer .= $oldByte;
    $oldCharBuffer .= $oldChar;
    $newLineBuffer .= $newByte;
    $newCharBuffer .= $newChar;
    $colIndex++;
    if ($colIndex == $maxCols)
    {
        printLine();
    }
}
printLine($colIndex);    # Possible remaining line

#================================================================
# subroutines
#================================================================

# $1 a string representing a data byte
# $2 0=unchanged, 1=added, 2=deleted, 3=changed
# return the formatted string (color/marker)
sub formatByte
{
    my ($byte, $type) = @_;
    my $res;
    if (!$noMarker)
    {
        if ($type == 0 || $byte eq BLANK)
        {
            $res = "  " . $byte;    # Unchanged or blank
        }
        elsif ($type == 1)
        {
            $res = " +" . $byte;    # Added
        }
        elsif ($type == 2)
        {
            $res = " -" . $byte;    # Deleted
        }
        elsif ($type == 3)
        {
            $res = " *" . $byte;    # Changed
        }
        else
        {
            die "Error";
        }
    }
    else
    {
        $res = " " . $byte;
    }
    $res = colorize($res, $type);
    return $res;
}

# $1 a string
# $2 0=unchanged, 1=added, 2=deleted, 3=changed
# return the colorized string according to $2
sub colorize
{
    my ($res, $type) = @_;
    if (!$noColor)
    {
        if ($type == 0)
        {
            # Unchanged: leave as-is
        }
        elsif ($type == 1)
        {
            $res = colored($res, 'bright_green');    # Added
        }
        elsif ($type == 2)
        {
            $res = colored($res, 'bright_red');      # Deleted
        }
        elsif ($type == 3)
        {
            $res = colored($res, 'bright_cyan');     # Changed
        }
        else
        {
            die "Error";
        }
    }
    return $res;
}

# Print the buffered line
sub printLine
{
    if (length($oldLineBuffer) <= 10)
    {
        return;    # No data to display
    }
    if (!$isUnchangedLine)
    {
        # Colorize and add a marker to the address of each line if some bytes are changed/added/deleted
        my $prefix = substr($oldLineBuffer, 0, 6) . ($noMarker ? " " : "*");
        $prefix = colored($prefix, 'magenta') unless $noColor;
        $oldLineBuffer =~ s/^.{7}/$prefix/;
        $prefix = substr($newLineBuffer, 0, 6) . ($noMarker ? " " : "*");
        $prefix = colored($prefix, 'magenta') unless $noColor;
        $newLineBuffer =~ s/^.{7}/$prefix/;
    }
    my $oldCBuf     = $noAscii ? "" : $oldCharBuffer;
    my $newCBuf     = $noAscii ? "" : $newCharBuffer;
    my $spacerChars = $noAscii ? "" : (" " x ($maxCols - $colIndex));
    my $spacerData  = ($noMarker ? "   " : "    ") x ($maxCols - $colIndex);
    if (!($noCommon && $isUnchangedLine))
    {
        print "${oldLineBuffer}${spacerData} ${oldCBuf}${spacerChars}  ${newLineBuffer}${spacerData} ${newCBuf}\n";
    }

    # Reset buffers and counters
    $oldLineBuffer   = sprintf("0x%04X  ", $oldPtr);
    $newLineBuffer   = sprintf("0x%04X  ", $newPtr);
    $oldCharBuffer   = "";
    $newCharBuffer   = "";
    $colIndex        = 0;
    $isUnchangedLine = 1;
}

# Convert a hex byte string into a printable char, or '.'.
# $1 = hex str such as A0
# $2 0=unchanged, 1=added, 2=deleted, 3=changed
# Return the corresponding char, possibly colorized
sub toPrintableChar
{
    my ($hexByte, $type) = @_;
    my $char = chr(hex($hexByte));
    $char = ($char =~ /[[:print:]]/) ? $char : ".";
    return colorize($char, $type);
}

# Convert file $1 into a text file with 1 hex byte per line.
# $1=input file name
# Return the output file name
sub createHexListFile
{
    my ($inFileName) = @_;
    my $buffer;
    my $in_fh;
    open($in_fh, "<:raw", $inFileName) or die "$0: cannot open $inFileName for reading: $!";

    # Temp file holding the hex list; removed automatically when the script exits
    my ($out_fh, $filename) = tempfile(UNLINK => 1);
    while (read($in_fh, $buffer, BUFSIZE))
    {
        # One two-digit hex string per input byte
        foreach my $hex (unpack("(H2)*", $buffer))
        {
            print $out_fh "$hex\n";
        }
    }
    close($out_fh) or die "$0: cannot write $filename: $!";
    close($in_fh);
    return $filename;
}
