How to specify level of compression when using tar -zcvf?
Is there a way to specify the compression level here? I want to use the best compression possible even if it takes more time to compress.
6 Answers
GZIP=-9 tar cvzf file.tar.gz /path/to/directory
This assumes you’re using bash. In general, set the GZIP environment variable to "-9" and run tar normally.
Also, if you really want the best compression, don’t use gzip. Use lzma or 7z.
And when using gzip (which is a good idea for various reasons anyway), consider using the pigz program instead of gzip.
pigz is "parallel gzip", which uses all your cores for gzip compression. You can watch top and see it using anywhere between 200% and 400% CPU.
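A minimal sketch of the pigz suggestion, in case it helps. The scratch directory and sample data are made up for the demo, and the script falls back to plain gzip when pigz is not installed, so treat it as an illustration rather than a recommended setup:

```shell
#!/bin/sh
# Sketch: pipe tar through pigz when it is installed, otherwise through gzip.
set -eu

work=$(mktemp -d)                      # hypothetical scratch directory
trap 'rm -rf "$work"' EXIT             # clean up when the script exits

mkdir "$work/data"
# Generate some compressible sample text.
i=0
while [ "$i" -lt 2000 ]; do
    echo "line $i: the quick brown fox jumps over the lazy dog"
    i=$((i + 1))
done > "$work/data/sample.txt"

# Both tools accept -9 and write gzip-format output.
if command -v pigz >/dev/null 2>&1; then
    compressor=pigz
else
    compressor=gzip
fi

tar -cf - -C "$work" data | "$compressor" -9 > "$work/data.tar.gz"

# The result is an ordinary .tar.gz, so plain gzip can verify it.
gzip -t "$work/data.tar.gz"
echo "compressed with $compressor: $(wc -c < "$work/data.tar.gz") bytes"
```

Either way the output can be unpacked with a normal tar -xzf.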
From the gzip man page on Ubuntu 16.04: "On Vax/VMS, the name of the environment variable is GZIP_OPT, to avoid a conflict with the symbol set for invocation of the program." For sh, csh, and MSDOS it should still just be GZIP.
This is what I get when I try to set the GZIP environment variable to -9: gzip: warning: GZIP environment variable is deprecated; use an alias or script
Instead of using tar’s gzip flag, gzip the archive yourself after tar has created it; that way you can specify the compression level for the gzip program:
tar -cvf files.tar /path/to/file0 /path/to/file1 ; gzip -9 files.tar
Or you can use:
tar cvf - /path/to/file0 /path/to/file1 | gzip -9 - > files.tar.gz
The -9 in the gzip command line tells gzip to use the maximum possible compression level (default is -6).
Edit: Fixed pipe command line based on @depesz comment.
In addition to the previous comment: from "man tar", section Environment: "TAPE: Device or file to use for the archive if --file is not specified. If this environment variable is unset, use stdin or stdout instead."
And we can reduce "gzip -9 -" to "gzip -9". From "man gzip", section Description: "If no files are specified, or if a file name is "-", the standard input is compressed to the standard output."
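To see what the level actually buys you with the pipe form, here is a rough sketch that writes the same tree at levels 1, 6 and 9 and prints the sizes. The sample data and file names are invented for the demo; the numbers will differ for your own files:

```shell
#!/bin/sh
# Sketch: compare gzip levels using the tar | gzip pipeline.
set -eu

work=$(mktemp -d)                      # hypothetical scratch directory
trap 'rm -rf "$work"' EXIT

mkdir "$work/data"
i=0
while [ "$i" -lt 5000 ]; do
    echo "record $i: $((i * 7919 % 1000)) some moderately repetitive payload"
    i=$((i + 1))
done > "$work/data/sample.txt"

for level in 1 6 9; do
    # -f - sends the archive to stdout; gzip reads stdin when given no file.
    tar -cf - -C "$work" data | gzip "-$level" > "$work/data-$level.tar.gz"
    echo "gzip -$level: $(wc -c < "$work/data-$level.tar.gz") bytes"
done
```

Each output is a regular .tar.gz; only the time spent and the size differ.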
Modern versions of tar support the xz archive format (GNU tar, since 1.22 in 2009, Busybox since 1.17.0 in 2010).
It’s based on lzma2, kind of like a 7-Zip version of gz. This gives better compression if you are ok with the requirement of needing xz support.
tar -Jcvf file.tar.xz /path/to/directory
I just found out here (basically a dupe of this question, but on the Unix Stack Exchange) that there is also an XZ_OPT=-9 environment variable to control the xz compression level, similar to the GZIP one in the other post.
XZ_OPT=-9 tar -Jcvf file.tar.xz /path/to/directory
No. xz -1 significantly beats bz2 at levels 1 to 9 in both compression ratio and compression/decompression speed. bz2 is the most awful format among the popular formats. In short, if you ever use bz2, try xz -1 instead and you’re done.
@YumeYao FYI this is not always true, I just tried to compress some data with both bzip2 and xz at -9 compression level, xz gives me a 38M compressed size whereas bzip2 gives me 36M.
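Since the winner apparently depends on the data, the simplest thing is to measure on your own files. A rough sketch, assuming gzip, bzip2 and xz may or may not be installed (missing ones are skipped) and using invented sample data and file names:

```shell
#!/bin/sh
# Sketch: compress one tar file with several tools at -9 and compare sizes.
set -eu

work=$(mktemp -d)                      # hypothetical scratch directory
trap 'rm -rf "$work"' EXIT

mkdir "$work/data"
i=0
while [ "$i" -lt 5000 ]; do
    echo "entry $i: value=$((i * 31 % 977)) status=ok"
    i=$((i + 1))
done > "$work/data/sample.txt"

tar -cf "$work/data.tar" -C "$work" data
echo "uncompressed tar: $(wc -c < "$work/data.tar") bytes"

for tool in gzip bzip2 xz; do
    if command -v "$tool" >/dev/null 2>&1; then
        # All three accept -9 (level) and -c (write to stdout).
        # The output name is just the tool name, for the demo only.
        "$tool" -9 -c "$work/data.tar" > "$work/data.tar.$tool"
        echo "$tool -9: $(wc -c < "$work/data.tar.$tool") bytes"
    else
        echo "$tool: not installed, skipped"
    fi
done
```

Replace the generated sample with a tar of your real data to get numbers that mean something.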
tar cv /path/to/directory | gzip --best > file.tar.gz
This is Matrix Mole’s second solution, but slightly shortened:
When calling tar, option f states that the output is a file. Setting it to - (stdout) makes tar write its output to stdout, which is the default behavior without both f and -.
And as stated by the gzip man page, if no files are specified gzip will compress from standard input. There is no need for - in the gzip call.
Option --best (equivalent to -9) sets the highest compression level.
How to obtain maximum compression with .tar.gz? [duplicate]
The way I understand the use of tar + gzip is that tar is normally used to consolidate a group of files into a single file, and gzip is then used to compress that file. I recently learned that tar can also compress. Because I do not fully understand how compression works at its core, I have (possibly ridiculous) concerns that sending a pre-compressed .tar to gzip might prevent gzip from compressing as well as it otherwise could, and things of that nature. My question is essentially: what combination of args/compression methods should I use to create the absolute smallest tar.gz, and what does the command line look like for that?
Compressing already compressed files may reduce their size, or it may make the archive bigger. It all depends on the type of data and any compression being used.
What @Keltari said. Compression rates and ratios are highly dependent on what it is you are compressing, which is also why there are different compression algorithms and methods.
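A quick way to check the "already compressed" point for yourself. This is only a sketch: the sample is a hex dump of random bytes (chosen so the first pass has something to remove), and the file names are made up:

```shell
#!/bin/sh
# Sketch: gzip some text once, then gzip the result again, and compare sizes.
set -eu

work=$(mktemp -d)                      # hypothetical scratch directory
trap 'rm -rf "$work"' EXIT

# Hex dump of random bytes: plain text with plenty of redundancy.
head -c 100000 /dev/urandom | od -An -tx1 > "$work/sample.txt"

gzip -9 < "$work/sample.txt" > "$work/once.gz"
# Second pass reads the already compressed stream from stdin.
gzip -9 < "$work/once.gz" > "$work/twice.gz"

# tr strips the padding some wc implementations add.
orig=$(wc -c < "$work/sample.txt" | tr -d ' ')
once=$(wc -c < "$work/once.gz" | tr -d ' ')
twice=$(wc -c < "$work/twice.gz" | tr -d ' ')
echo "original: $orig  gzip once: $once  gzip twice: $twice"
```

On data like this the second pass has nothing left to squeeze and only adds its own header and framing, which is why re-compressing a .tar.gz is usually not worth it.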
4 Answers
Or, you can tell tar to use maximum compression this way:
export GZIP=-9
tar cvzf file.tar.gz /path/to/directory
Additionally, to keep your envvars clutter-free, you can do this:
env GZIP=-9 tar cvzf file.tar.gz /path/to/directory
gzip: warning: GZIP environment variable is deprecated; use an alias or script
tar: Exiting with failure status due to previous errors
To work around this warning, use tar cvf file.tar.gz /path/to/directory -I "gzip --best". -I specifies the compression program and options.
@stj
tar: -I: Cannot stat: No such file or directory
tar: gzip --best: Cannot stat: No such file or directory
tar: Error exit delayed from previous errors.
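One possible reading of the error above is that this particular tar did not treat -I as "compression program". A hedged sketch: it assumes a reasonably recent GNU tar, where -I (--use-compress-program) accepts a command with options, puts the option before the paths, and falls back to a plain pipe on anything else. The scratch directory and file names are invented for the demo:

```shell
#!/bin/sh
# Sketch: use -I 'gzip --best' on GNU tar, otherwise pipe through gzip.
set -eu

work=$(mktemp -d)                      # hypothetical scratch directory
trap 'rm -rf "$work"' EXIT

mkdir "$work/data"
echo "hello hello hello hello" > "$work/data/sample.txt"

if tar --version 2>/dev/null | grep -q 'GNU tar'; then
    # Options first, paths last; the quoted string is the compressor command.
    tar -I 'gzip --best' -cf "$work/data.tar.gz" -C "$work" data
else
    # Assumption: on non-GNU tar, -I may mean something else, so use a pipe.
    tar -cf - -C "$work" data | gzip --best > "$work/data.tar.gz"
fi

gzip -t "$work/data.tar.gz"
tar -tzf "$work/data.tar.gz"
```

Neither branch touches the GZIP environment variable, so the deprecation warning does not come up.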
As you stated, "tar can also compress", which implies that tar does not always compress data by itself. It does so only when used with the z option, and even then not by itself but by passing the tarred data through gzip.
Instead, as noted in this answer, you can pipe the two commands, tar and gzip, so that you can explicitly specify the compression level for the gzip command and achieve the smallest output size.
tar cvf - /path/to/directory | gzip -9 - > file.tar.gz
Here -9 specifies the maximum possible compression level.
I had an issue where it was not recursive and complained that it would be an empty archive. Since the command is split, it’s hard to find out how to properly force recursion, given that it is already tar’s default. MY BAD: I had incorrectly specified it starting like this: tar -cvf /path
tar -z will create a compressed .tar.gz archive but this is not the only compression method supported by (all versions of) tar . For instance, tar -j will create a tar.bz2 , and there are some other compression methods supported as well.
Usually neither gzip nor tar can create "the absolute smallest tar.gz". There are many compression utilities that can compress to the gz format. I have written a bash script, "gz99", that tries gzip, 7z and advdef to get the smallest file. To use it to create the smallest possible file, run:
tar c path/to/data | gz99 file.gz
The advdef utility from AdvanceCOMP usually gives the smallest file, but is also buggy (the gz99 utility checks that it hasn’t corrupted the file before accepting the output of advdef ). To use advdef directly, create file.tar.gz however you feel like. Then run:
This will create a standard gz file that can be read by gzip and tar as normal, just a tiny bit smaller. This is about the best you can do with the gz format.
Since you only recently learnt that tar can compress, and didn’t say why you wanted the smallest ".tar.gz" file, you may be unaware that there are more efficient formats that can be used with tar files, such as xz. Generally, switching to a different format gives a much bigger improvement in compression than fiddling around with gzip options. The main disadvantage of xz is that it isn’t as common as gzip, so the people you send the file to might have to install a new package. It also tends to be a bit slower, particularly when compressing. If this doesn’t matter to you, and you really want the smallest tar file, try:
tar cv path/to/data | xz -9 > file.tar.xz
Modern versions of tar, for example on Ubuntu 13.10, automatically detect compressed files. So even if you use xz compression you can still decompress as usual:
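A small sketch of that auto-detection, assuming GNU tar or bsdtar (other implementations may not detect the format). It uses xz when it is installed and gzip otherwise; the scratch directory and file names are hypothetical:

```shell
#!/bin/sh
# Sketch: create a compressed archive, then list and extract it without -z or -J.
set -eu

work=$(mktemp -d)                      # hypothetical scratch directory
trap 'rm -rf "$work"' EXIT

mkdir "$work/data" "$work/out"
echo "sample contents" > "$work/data/sample.txt"

if command -v xz >/dev/null 2>&1; then
    archive="$work/archive.tar.xz"
    tar -cf - -C "$work" data | xz -9 > "$archive"
else
    archive="$work/archive.tar.gz"
    tar -cf - -C "$work" data | gzip -9 > "$archive"
fi

# No compression flag here: tar inspects the file and picks the decompressor.
tar -tf "$archive"
tar -xf "$archive" -C "$work/out"
cat "$work/out/data/sample.txt"
```

The same two commands work whichever compressor produced the archive.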
To give a quick idea how these compression utilities compare, consider the effect of compressing patch-3.1.1 from the linux kernel:
utility            cpu     format  size (bytes)
gzip -9            0.02s   gz      105,628
advdef -2          0.07s   gz      102,619
7z -mx=9 -tgzip    0.42s   gz      102,297
advdef -3          0.55s   gz      102,290
advdef -4          0.75s   gz      101,956
xz -9              0.03s   xz       91,064
xz -3e             0.15s   xz       90,996
In this trivial example, we see that to get the smallest gz we need advdef (though 7z -tgzip is almost as good and a lot less buggy). We also see that switching to xz gains us much more space than trying to squeeze the most out of the old gz format, without compression taking too long.