Compressing data in Linux

What Is the Best Compression Tool in Linux?

In this article, we will compare all the best and most popular Linux compression tools. This will include benchmark tests to see which compression method performs the best, and we’ll also weigh the pros and cons of compatibility and other areas. Compression methods covered will be gzip, xz, bzip2, 7zip, zip, rar, and zstd (Zstandard).

Linux gives us a lot of options when we need to compress files. While that’s definitely a good thing, it can lead to confusion about which one should be used. Let’s start by comparing each method across a few key areas.

Compression Benchmark Test

Although compression ratio should not be the only determining factor when deciding on which tool to use, it will definitely play a big role.

For our benchmark test, we’ll try compressing a copy of the 2002 video game Age of Mythology with a variety of tools. Older video games like AOM make for a good test, since compression methods weren’t up to par with today’s technology and video games contain a wide range of file formats, like audio, video, images, binary files, text, etc. The total size of this video game installation is 1350 MB.

Default Compression Results

Here are the results of our compression test when we use each tool’s default compression level. You can see the resulting compressed size, time elapsed, and the precise commands we used to perform the compression.

Compression  Size    Time Elapsed  Command
gzip         955 MB  1:45          tar cfz AOM.tar.gz AOM/
xz           856 MB  16:06         tar cfJ AOM.tar.xz AOM/
bzip2        943 MB  5:36          tar cfj AOM.tar.bz2 AOM/
7zip         851 MB  10:59         7z a AOM.7z AOM/
zip          956 MB  1:41          zip -r AOM.zip AOM/
rar          877 MB  6:37          rar a AOM.rar AOM/*
zstd         934 MB  0:43          tar --zstd -cf AOM.tar.zst AOM/

Our test directory has been compressed with multiple tools
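
To make the default-level results easier to compare, here is a small sketch that turns the sizes from the table into a percentage of space saved. The numbers are copied from the table and the 1350 MB original size; the function names are only illustrative.

```python
# Sizes in MB, copied from the default-compression results in this article.
ORIGINAL_MB = 1350
DEFAULT_MB = {
    "gzip": 955, "xz": 856, "bzip2": 943, "7zip": 851,
    "zip": 956, "rar": 877, "zstd": 934,
}


def space_saved_percent(original_mb, compressed_mb):
    """Return the percentage of the original size removed by compression."""
    return 100.0 * (original_mb - compressed_mb) / original_mb


def rank_by_saving(sizes, original_mb=ORIGINAL_MB):
    """Return (tool, percent_saved) pairs, largest saving first."""
    pairs = [(tool, space_saved_percent(original_mb, size))
             for tool, size in sizes.items()]
    return sorted(pairs, key=lambda pair: pair[1], reverse=True)


if __name__ == "__main__":
    for tool, saved in rank_by_saving(DEFAULT_MB):
        print(f"{tool:6s} {saved:5.1f}% saved")
```

On these figures 7zip removes roughly 37% of the original size and gzip roughly 29%, so the spread between the best and worst tool is smaller than the spread in elapsed time.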

Highest Compression Results

And here are the results when we use each tool’s maximum compression level. A higher compression level usually results in some minor space savings, but can take the tool a lot longer to perform the job. The commands we use below are utilizing the absolute maximum compression level for each tool.

Compression  Size    Time Elapsed  Command
gzip         954 MB  2:10          tar cf - AOM/ | gzip -9 - > AOM.tar.gz
xz           847 MB  27:32         tar cf - AOM/ | xz -9e - > AOM.tar.xz
bzip2        943 MB  5:42          tar cf - AOM/ | bzip2 -9 - > AOM.tar.bz2
7zip         845 MB  16:41         7z a -mx=9 AOM.7z AOM/
zip          955 MB  2:05          zip -9 -r AOM.zip AOM/
rar          876 MB  6:31          rar a -m5 AOM.rar AOM/*
zstd         873 MB  22:19         tar -I 'zstd --ultra -22' -cf AOM.tar.zst AOM/
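
The trade-off between the two runs can be put into numbers. This sketch takes the size and time pairs from both result tables and reports, per tool, how many megabytes the maximum level saved and how many extra seconds it cost. It assumes the "Time Elapsed" column is in minutes:seconds; the helper names are hypothetical.

```python
def to_seconds(mmss):
    """Convert a 'minutes:seconds' string such as '16:06' to seconds."""
    minutes, seconds = mmss.split(":")
    return int(minutes) * 60 + int(seconds)


# (size in MB, elapsed time) at the default level, then at the maximum level.
RESULTS = {
    "gzip":  ((955, "1:45"),  (954, "2:10")),
    "xz":    ((856, "16:06"), (847, "27:32")),
    "bzip2": ((943, "5:36"),  (943, "5:42")),
    "7zip":  ((851, "10:59"), (845, "16:41")),
    "zip":   ((956, "1:41"),  (955, "2:05")),
    "rar":   ((877, "6:37"),  (876, "6:31")),
    "zstd":  ((934, "0:43"),  (873, "22:19")),
}


def max_level_tradeoff(tool):
    """Return (MB saved, extra seconds) when going from default to maximum."""
    (default_mb, default_time), (max_mb, max_time) = RESULTS[tool]
    return default_mb - max_mb, to_seconds(max_time) - to_seconds(default_time)


if __name__ == "__main__":
    for tool in RESULTS:
        saved_mb, extra_s = max_level_tradeoff(tool)
        print(f"{tool:6s} {saved_mb:3d} MB saved for {extra_s:5d} extra seconds")
```

Only zstd gains a substantial amount (61 MB), and it pays for that with about 1296 extra seconds; most other tools gain 1 MB or less.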

And the Winner Is…

According to our benchmark test:

For compression ratio, the best compression tool on Linux is 7zip.

For compression speed, the best compression tool on Linux is Zstandard (zstd).

Potential for Varying Results

Keep in mind that you should take these benchmark results with a grain of salt. Depending on the type of files you’re compressing, and the hardware of your PC or server, you could get very different results in compression ratio and speed. This benchmark test works well as a very general measurement of the compression tools listed, but every situation is going to be different. If in doubt, try out a few of them yourself – that’s why we’ve given you the commands for each compression tool.

Note also that we tested only the default and maximum compression level for each tool. Those aren’t the only two options: you could use some value in between, or even a lower compression level so the files compress very quickly.
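
If you would rather measure than guess, a rough benchmark is easy to script. The sketch below uses only Python's standard library, so it covers the gzip, bzip2, and xz algorithms but not zstd, 7zip, or rar. It times in-memory compression at a few levels; the sample data and function names are placeholders, and results on your own files will differ.

```python
import bz2
import gzip
import lzma
import time

# Each entry maps a name to a function taking (data, level) and returning bytes.
CODECS = {
    "gzip": lambda data, level: gzip.compress(data, compresslevel=level),
    "bzip2": lambda data, level: bz2.compress(data, compresslevel=level),
    "xz": lambda data, level: lzma.compress(data, preset=level),
}


def benchmark(data, levels=(1, 6, 9)):
    """Compress data with every codec at every level; return result rows."""
    rows = []
    for name, compress in CODECS.items():
        for level in levels:
            start = time.perf_counter()
            size = len(compress(data, level))
            elapsed = time.perf_counter() - start
            rows.append((name, level, size, elapsed))
    return rows


if __name__ == "__main__":
    # Placeholder input; replace with the bytes of a file you care about.
    sample = b"The quick brown fox jumps over the lazy dog.\n" * 20000
    for name, level, size, elapsed in benchmark(sample):
        print(f"{name:5s} level {level}: {size:8d} bytes in {elapsed:.3f} s")
```

Highly repetitive sample text like this compresses far better than a game installation would, so treat the output as a way to compare levels against each other rather than as a prediction.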

Compatibility

Compression ratio and speed aren’t the only concern. Not always, anyway.

On Linux systems, tar is the usual format for archives. Compression is then added to the tar file, resulting in extensions like .tar.gz, .tar.bz2, and .tar.xz. The tar format is able to combine files into a single archive, while preserving all of the Linux file permissions. Its compatibility with Linux file systems is why it’s preferred on Linux.
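
As a quick illustration of permissions travelling inside a tar archive, the following sketch builds a small .tar.gz in memory with Python's tarfile module and reads the stored mode back. The file name and mode are made up for the example.

```python
import io
import tarfile


def make_tar_gz(name, payload, mode):
    """Build an in-memory .tar.gz holding one file with the given mode bits."""
    buffer = io.BytesIO()
    with tarfile.open(fileobj=buffer, mode="w:gz") as archive:
        info = tarfile.TarInfo(name=name)
        info.size = len(payload)
        info.mode = mode  # permission bits recorded in the tar header
        archive.addfile(info, io.BytesIO(payload))
    return buffer.getvalue()


def stored_mode(archive_bytes, name):
    """Return the permission bits recorded for one member of a .tar.gz."""
    with tarfile.open(fileobj=io.BytesIO(archive_bytes), mode="r:gz") as archive:
        return archive.getmember(name).mode


if __name__ == "__main__":
    blob = make_tar_gz("run.sh", b"#!/bin/sh\necho hi\n", 0o750)
    print(oct(stored_mode(blob, "run.sh")))
```

The compression layer (gzip here) never sees the metadata; it only compresses the tar stream, which is why the same permissions survive whether the archive ends in .gz, .bz2, or .xz.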

On other operating systems, like Windows, the .zip format is much more common. Zip files are usually pretty painless to open on Linux, but tar files don’t always enjoy the same privilege on Windows. Zip files also don’t reliably preserve Linux file ownership and permissions the way tar does.

Why does this matter? Well, depending on what you’re doing with your compressed archive, you may need to take the file type into consideration. For example, it’s better to share zip files with Windows users. If you’re sharing the archive with Linux users, then it won’t matter as much. Users of both systems usually need extra software if they’re going to extract the contents of a 7z, rar, or zstd file.

Remember your target audience when you compress files, and think about whether or not the users will have an easy time extracting files from the archive. Of course, if these files are for your eyes only, then this may not matter at all.

Conclusion

After taking benchmark results and compatibility into consideration, the answer to “which compression tool is best?” is simply “it depends.” Are you in a hurry? Does every last megabyte count? Can users easily open your archive? It’s always going to depend on these factors. Using the information in this guide should help you make the right choice, but the “right choice” may change in different situations.


1 thought on “What Is the Best Compression Tool in Linux?”

Great article!
I’m using GNU tar v1.30 in AlmaLinux 8 so I need to use tar --use-compress-program zstd -cf directory.tar.zst directory/ instead of tar --zstd.
Regards,
Mauricio


XZ data compression in Linux

XZ is a general purpose data compression format with high compression ratio and relatively fast decompression. The primary compression algorithm (filter) is LZMA2. Additional filters can be used to improve compression ratio even further. E.g. Branch/Call/Jump (BCJ) filters improve compression ratio of executable data.
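
A filter chain of this kind can be tried from userspace. The sketch below uses Python's lzma module (a binding to liblzma from XZ Utils, not the kernel's XZ Embedded) to compress with an x86 BCJ filter in front of LZMA2 and with LZMA2 alone. The function names are illustrative, and the usage example simply reads the first 1 MiB of the running Python interpreter as a convenient piece of executable data.

```python
import lzma
import sys


def xz_with_x86_bcj(data, preset=6):
    """Compress data as .xz using an x86 BCJ filter followed by LZMA2."""
    filters = [
        {"id": lzma.FILTER_X86},                      # BCJ filter for x86 code
        {"id": lzma.FILTER_LZMA2, "preset": preset},  # main compression filter
    ]
    return lzma.compress(data, format=lzma.FORMAT_XZ, filters=filters)


def xz_plain(data, preset=6):
    """Compress data as .xz using LZMA2 only."""
    return lzma.compress(data, format=lzma.FORMAT_XZ, preset=preset)


if __name__ == "__main__":
    # Assumption: the interpreter binary stands in for "executable data".
    with open(sys.executable, "rb") as binary:
        code = binary.read(1 << 20)
    print("LZMA2 only:", len(xz_plain(code)), "bytes")
    print("BCJ + LZMA2:", len(xz_with_x86_bcj(code)), "bytes")
```

The x86 filter is only expected to help on x86 machine code; on other architectures or on non-executable data it may make no difference or slightly hurt, so the comparison is worth running rather than assuming.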

The XZ decompressor in Linux is called XZ Embedded. It supports the LZMA2 filter and optionally also BCJ filters. CRC32 is supported for integrity checking. The home page of XZ Embedded is at , where you can find the latest version and also information about using the code outside the Linux kernel.

For userspace, XZ Utils provide a zlib-like compression library and a gzip-like command line tool. XZ Utils can be downloaded from .

The xz_dec module provides XZ decompressor with single-call (buffer to buffer) and multi-call (stateful) APIs. The usage of the xz_dec module is documented in include/linux/xz.h.
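
The difference between the two API styles can be illustrated in userspace. This is only an analogy written with Python's lzma module; it is not the xz_dec interface, which is a C API documented in include/linux/xz.h. The first function decodes buffer to buffer in one call, the second keeps decoder state between calls and consumes the input in small chunks.

```python
import lzma


def decompress_single_call(blob):
    """Buffer to buffer: the whole stream goes in, the whole output comes out."""
    return lzma.decompress(blob, format=lzma.FORMAT_XZ)


def decompress_multi_call(blob, chunk_size=4096):
    """Stateful: feed the stream piece by piece, collecting output as it appears."""
    decoder = lzma.LZMADecompressor(format=lzma.FORMAT_XZ)
    output = []
    for offset in range(0, len(blob), chunk_size):
        if decoder.eof:
            break  # end of stream reached; ignore anything that follows
        output.append(decoder.decompress(blob[offset:offset + chunk_size]))
    if not decoder.eof:
        raise ValueError("truncated .xz stream")
    return b"".join(output)


if __name__ == "__main__":
    original = b"kernel image bytes " * 5000
    packed = lzma.compress(original)
    assert decompress_single_call(packed) == original
    assert decompress_multi_call(packed, chunk_size=64) == original
    print("both styles produced identical output")
```

The stateful style is the one that suits input arriving in pieces, at the cost of keeping decoder state (including the dictionary) alive between calls.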

The xz_dec_test module is for testing xz_dec. xz_dec_test is not useful unless you are hacking the XZ decompressor. xz_dec_test allocates a char device major dynamically to which one can write .xz files from userspace. The decompressed output is thrown away. Keep an eye on dmesg to see diagnostics printed by xz_dec_test. See the xz_dec_test source code for the details.

For decompressing the kernel image, initramfs, and initrd, there is a wrapper function in lib/decompress_unxz.c. Its API is the same as in other decompress_*.c files, which is defined in include/linux/decompress/generic.h.

scripts/xz_wrap.sh is a wrapper for the xz command line tool found from XZ Utils. The wrapper sets compression options to values suitable for compressing the kernel image.

For kernel makefiles, two commands are provided for use with $(call if_needed). The kernel image should be compressed with $(call if_needed,xzkern) which will use a BCJ filter and a big LZMA2 dictionary. It will also append a four-byte trailer containing the uncompressed size of the file, which is needed by the boot code. Other things should be compressed with $(call if_needed,xzmisc) which will use no BCJ filter and 1 MiB LZMA2 dictionary.

Notes on compression options

Since the XZ Embedded supports only streams with no integrity check or CRC32, make sure that you don’t use some other integrity check type when encoding files that are supposed to be decoded by the kernel. With liblzma, you need to use either LZMA_CHECK_NONE or LZMA_CHECK_CRC32 when encoding. With the xz command line tool, use --check=none or --check=crc32.
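
One way to confirm which check type an existing file carries is to read it from the stream header. The sketch below relies on the .xz header layout (six magic bytes, then two stream-flag bytes whose second byte holds the check ID in its low four bits) and uses the constants from Python's lzma module. The function names are hypothetical helpers, not part of any library.

```python
import lzma

XZ_MAGIC = b"\xfd7zXZ\x00"
# Check types that the text above says XZ Embedded accepts.
KERNEL_OK_CHECKS = {lzma.CHECK_NONE, lzma.CHECK_CRC32}


def xz_check_type(blob):
    """Return the integrity check ID stored in the .xz stream header."""
    if len(blob) < 8 or blob[:6] != XZ_MAGIC:
        raise ValueError("not an .xz stream")
    return blob[7] & 0x0F  # low four bits of the second stream-flags byte


def kernel_compatible_check(blob):
    """True if the stream uses no integrity check or CRC32."""
    return xz_check_type(blob) in KERNEL_OK_CHECKS


if __name__ == "__main__":
    default_stream = lzma.compress(b"example")
    crc32_stream = lzma.compress(b"example", check=lzma.CHECK_CRC32)
    print("default check ok for kernel:", kernel_compatible_check(default_stream))
    print("crc32 check ok for kernel:", kernel_compatible_check(crc32_stream))
```

Python's lzma module defaults to CRC64 for .xz output, which is why the check has to be requested explicitly when the result is meant for the kernel decoder.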


Using CRC32 is strongly recommended unless there is some other layer which will verify the integrity of the uncompressed data anyway. Double checking the integrity would probably be a waste of CPU cycles. Note that the headers will always have a CRC32 which will be validated by the decoder; you can only change the integrity check type (or disable it) for the actual uncompressed data.

In userspace, LZMA2 is typically used with dictionary sizes of several megabytes. The decoder needs to have the dictionary in RAM, thus big dictionaries cannot be used for files that are intended to be decoded by the kernel. 1 MiB is probably the maximum reasonable dictionary size for in-kernel use (maybe more is OK for initramfs). The presets in XZ Utils may not be optimal when creating files for the kernel, so don’t hesitate to use custom settings. Example:

xz --check=crc32 --lzma2=dict=512KiB inputfile
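
The same settings can be expressed through liblzma's Python binding, which also makes the memory argument visible. In this sketch the encoder uses CRC32 and a 512 KiB LZMA2 dictionary, and a second helper tries to decode under a memory limit. The 4 MiB limit, the preset value, and the function names are assumptions chosen for illustration; this is userspace code, not the kernel decoder.

```python
import lzma


def compress_for_kernel(data, dict_size=512 * 1024):
    """Compress as .xz with CRC32 and a small LZMA2 dictionary."""
    filters = [{"id": lzma.FILTER_LZMA2, "preset": 6, "dict_size": dict_size}]
    return lzma.compress(data, format=lzma.FORMAT_XZ,
                         check=lzma.CHECK_CRC32, filters=filters)


def decodes_within(blob, memlimit):
    """True if the stream can be decoded without exceeding memlimit bytes."""
    decoder = lzma.LZMADecompressor(format=lzma.FORMAT_XZ, memlimit=memlimit)
    try:
        decoder.decompress(blob)
    except lzma.LZMAError:
        return False
    return True


if __name__ == "__main__":
    payload = b"initramfs contents " * 10000
    limit = 4 * 1024 * 1024  # illustrative decoder memory budget
    small_dict = compress_for_kernel(payload)
    default_preset = lzma.compress(payload)  # default preset, larger dictionary
    print("512 KiB dictionary fits:", decodes_within(small_dict, limit))
    print("default preset fits:", decodes_within(default_preset, limit))
```

The decoder sizes its buffer from the dictionary size recorded in the stream, not from the amount of data, so even a tiny file compressed with a large-dictionary preset demands the full dictionary in RAM.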

An exception to the above dictionary size limitation is when the decoder is used in single-call mode. Decompressing the kernel itself is an example of this situation. In single-call mode, the memory usage doesn’t depend on the dictionary size, and it is perfectly fine to use a big dictionary: for maximum compression, the dictionary should be at least as big as the uncompressed data itself.

Future plans

Creating a limited XZ encoder may be considered if people think it is useful. LZMA2 is slower to compress than e.g. Deflate or LZO even at the fastest settings, so it isn’t clear whether an LZMA2 encoder is wanted in the kernel.

Support for limited random-access reading is planned for the decompression code. I don’t know if it could have any use in the kernel, but I know that it would be useful in some embedded projects outside the Linux kernel.

Conformance to the .xz file format specification

There are a couple of corner cases where things have been simplified at the expense of detecting errors as early as possible. These should not matter in practice at all, since they don’t cause security issues. But it is good to know this if testing the code e.g. with the test files from XZ Utils.

Reporting bugs

Before reporting a bug, please check that it’s not fixed already at upstream. See to get the latest code.

Report bugs to or visit #tukaani on Freenode and talk to Larhzu. I don’t actively read LKML or other kernel-related mailing lists, so if there’s something I should know, you should email to me personally or use IRC.

Don’t bother Igor Pavlov with questions about the XZ implementation in the kernel or about XZ Utils. While these two implementations include essential code that is directly based on Igor Pavlov’s code, these implementations aren’t maintained nor supported by him.
