Linux grep in gzip

Linux grep a string within a ZIP file containing multiple files — zgrep or zipgrep?

From time to time, you might need to search for certain strings within a zip archive that contains multiple files on a Linux system. If you have never done this, you might ask: what tools should you use? Are there existing commands that can be used instead of writing your own script to unzip the archive and search?

Fortunately the answer is yes. There are commands like zgrep and zipgrep. What’s the difference between them then?

When I got a request like the one above to grep a string within an archive, I didn’t know which command to use in which scenario. I first tried zgrep because I wasn’t aware of zipgrep.

Using zgrep to search a zip file doesn’t really work; it has two issues:

For example, I created two zip files, each containing the same two plain text files in a different order. aaa.log contained the string “jli” I was looking for; the other one, bbb.log, didn’t.

test_grep1.zip had aaa.log as the first one while test_grep.zip had bbb.log as the first file within the archive.

zip test_grep1.zip aaa.log bbb.log
zip test_grep.zip bbb.log aaa.log

root@jlitest:/var/log# unzip -l test_grep.zip
Archive:  test_grep.zip
  Length      Date    Time    Name
---------  ---------- -----   ----
   376908  03-24-2022 20:52   bbb.log
        8  03-24-2022 20:52   aaa.log
---------                     -------
   376916                     2 files

root@jlitest:/var/log# unzip -l test_grep1.zip
Archive:  test_grep1.zip
  Length      Date    Time    Name
---------  ---------- -----   ----
        8  03-24-2022 20:52   aaa.log
   376908  03-24-2022 20:52   bbb.log
---------                     -------
   376916                     2 files

When I searched for the string “jli” in test_grep1.zip, zgrep found it. But when I searched test_grep.zip, it didn’t.

root@jlitest:/var/log# zgrep jli test_grep1.zip
jli
jli
root@jlitest:/var/log# zgrep jli test_grep.zip

Then I took a closer look at zgrep. It is actually just a shell script wrapping grep and gzip (using gzip’s “-c” and “-d” options to decompress to stdout), as its man page states: “zgrep - a wrapper around a grep program that decompresses files as needed”

root@jlitest:/var/log# which zgrep
alias zgrep='zgrep --color=auto'
        /usr/bin/zgrep

No wonder it doesn’t work well with zip files. As you can see from the following example:

root@jlitest:/var/log# gzip -d -c test_grep1.zip
jli
jli
gzip: test_grep1.zip has more than one entry--rest ignored

It will stop after the first file because it only expects 1 compressed file.

Then I realized there is another command, zipgrep (a shell script again), which wraps egrep and unzip: exactly what I was looking for.

Its man page says “zipgrep: Use unzip and egrep to search the specified members of a zip archive for a string or pattern.”

root@jlitest:/var/log# zipgrep jli test_grep1.zip
aaa.log:jli
aaa.log:jli
root@jlitest:/var/log# zipgrep jli test_grep.zip
aaa.log:jli
aaa.log:jli

Again the concept of zipgrep is to unzip files within an archive (zip) file to stdout and egrep patterns from there. It uses unzip’s “-p” option to extract files (only the file data) to pipe (stdout).

unzip has another option, “-c”, which also extracts files to stdout/screen. It is similar to “-p” but prints the name of each file as well.

So you can use simple commands to implement zipgrep.

Using “-p”, no file name is printed.

root@jlitest:/var/log# unzip -p test_grep1.zip|egrep "extracting|inflating|jli"
jli
jli

Using “-c”, the file name is printed.

 extracting: aaa.log
jli
jli
  inflating: bbb.log
root@jlitest:/var/log# unzip -c test_grep.zip|egrep "extracting|inflating|jli"
  inflating: bbb.log
 extracting: aaa.log
jli
jli
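The same idea can be sketched as a small shell function. This is not how the real zipgrep script is written; it is a minimal illustration that assumes zip and unzip are installed, that grep is GNU grep (for --label), and that member names contain no newlines or unzip wildcard characters. The function name zip_search and the demo files are made up for this example.

```shell
#!/bin/sh
# Minimal sketch of the zipgrep idea: stream each archive member through grep.
# zip_search is a hypothetical helper name, not part of any package.
zip_search() {
    pattern=$1
    archive=$2
    # unzip -Z1 (zipinfo mode) lists the member names, one per line.
    unzip -Z1 "$archive" | while IFS= read -r member; do
        # unzip -p writes only the data of one member to stdout;
        # grep -H --label prefixes each match with the member name (GNU grep).
        # "|| :" because a member without matches is not an error here.
        unzip -p "$archive" "$member" </dev/null |
            grep -H --label="$member" -e "$pattern" || :
    done
}

# Demo in a scratch directory (skipped when zip/unzip are not installed).
if command -v zip >/dev/null 2>&1 && command -v unzip >/dev/null 2>&1; then
    demo=$(mktemp -d)
    printf 'jli\njli\n' > "$demo/aaa.log"
    printf 'nothing here\n' > "$demo/bbb.log"
    (cd "$demo" && zip -q test_grep.zip bbb.log aaa.log)
    zip_search jli "$demo/test_grep.zip"
fi
```

Because every member is read in turn, the match is found even though aaa.log is the second file in the archive.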

Extracting files to stdout is also useful for another scenario: what if you only want to search for a string in one specific file within a zip file? In that case you don’t need zipgrep to scan every file in the archive, which is what it does by default (it also accepts member names after the archive name). For example, if you have a file named aaa.log within logs.zip and you want to search only aaa.log for the string “jli”, you can just use the “-p” option of unzip, as shown in the above example:

root@joetest:~/Service_Tools# unzip -p logs.zip aaa.log|grep jli
jli

Have fun searching zip files now!

Analyze Gzip Log Files in Linux Without Extracting Them

Learn to read and analyze gzipped compressed log files on a Linux box without extracting them first with the help of the lesser known Z commands.

The garbled output shown when trying to view a gzipped log file with the regular cat command

On Linux servers, logs are often compressed in gzip format to save disk space. If you are investigating an issue and have to deal with gzip-compressed logs, the usual workflow is to extract the .gz log files first and then use commands such as cat, less and grep to read and analyze them.

Why? With regular text files, you can use cat to view the content, grep to search it, or less to read it without flooding your screen. Compressed files cannot be used with these regular Linux commands.

But extracting the compressed log files first and then analyzing them takes more time and disk space. You extract all the required files one by one, analyze them, and then remove the extracted files when you don’t need them anymore. There is a better way. Use Z commands!

Dealing with Gzip compressed files without extracting them

  • zcat: cat to view compressed file
  • zgrep: grep to search inside the compressed file
  • zless for less, zmore for more: to view the file in pages
  • zdiff: diff to see the difference between two compressed files

Don’t worry too much. You don’t have to learn new command syntax. These Z commands work pretty much the same as their regular counterpart for the most popular options.

Viewing compressed files with zcat

If you use the cat command, you can replace it with zcat. zcat is used in exactly the same manner as cat. For example, running zcat logfile.gz displays all the contents of logfile.gz without even extracting it.

You can pipe the output of zcat into the regular less and more commands to see it in pages, for example zcat logfile.gz | less.

If you don’t know if the file is compressed (i.e., files without .gz extension), you can use zcat with option -f. This will display the content of the file irrespective of whether it is gzipped or not.
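A small sketch of the -f behaviour, assuming GNU gzip's zcat on Linux. The file names plain.log and packed.log are invented for the demo; packed.log holds gzip data but deliberately has no .gz extension.

```shell
#!/bin/sh
# Two files with the same content: one plain text, one gzipped.
demo=$(mktemp -d)
printf 'line one\nline two\n' > "$demo/plain.log"
# Write the gzipped copy under a name that does not hint at compression.
gzip -c "$demo/plain.log" > "$demo/packed.log"

# With -f, zcat decompresses gzip data and passes anything else through as-is,
# so both commands print the same two lines.
zcat -f "$demo/plain.log"
zcat -f "$demo/packed.log"
```

Without -f, zcat would refuse plain.log because it is not in gzip format.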

Reading compressed files with zless and zmore

Same as less and more, you can use zless and zmore to read the content of the compressed files without decompressing the files. All the keyboard shortcuts of less and more work the same.

Searching inside compressed files with zgrep

Grep is a hell of a powerful command and I think, one of the most used Linux commands. zgrep is the Z counterpart of grep that allows you to search inside gzipped compressed files without extracting them.

You can use it with all the regular grep options. For example:

zgrep -i keyword_search logfile.gz
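To try a few of the regular grep options on a gzipped file, here is a minimal sketch. The log lines are made up for the demo, and logfile.gz is created in a scratch directory so nothing real is touched.

```shell
#!/bin/sh
# Build a small gzipped log to search.
demo=$(mktemp -d)
printf 'INFO start\nERROR disk full\nerror retry\nINFO done\n' |
    gzip > "$demo/logfile.gz"

# -i: ignore case, -n: prefix each match with its line number
zgrep -i -n error "$demo/logfile.gz"

# -c: print only the number of matching lines
zgrep -i -c error "$demo/logfile.gz"
```

The options are passed through to grep, so they behave the same way as on an uncompressed file.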

Comparing compressed files with zdiff

While this might not be that useful on huge log files, you can use zdiff to see the difference between compressed files, in the same way as you use the diff command.

zdiff logfile1.gz logfile2.gz
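A quick way to see zdiff in action is with two tiny compressed files that differ in one line. The file contents below are invented for the demo, and it assumes diff is installed alongside gzip.

```shell
#!/bin/sh
# Two gzipped files that differ in their second line.
demo=$(mktemp -d)
printf 'alpha\nbeta\n' | gzip > "$demo/logfile1.gz"
printf 'alpha\ngamma\n' | gzip > "$demo/logfile2.gz"

# Like diff, zdiff exits with status 1 when the contents differ,
# so "|| true" keeps the script going if it runs under "set -e".
zdiff "$demo/logfile1.gz" "$demo/logfile2.gz" || true
```

The output is in the usual diff format, with lines from the first file marked by < and lines from the second by >.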

Speaking of diff, you may want to look at Meld GUI diff tool.

Summary

Command   Use                                                       Example
zcat      cat to view compressed file                               zcat
zgrep     grep to search inside the compressed file                 zgrep -i
zless     less to view the compressed file in pages                 zless
zmore     more to view the compressed file in pages                 zmore
zdiff     diff to see the difference between two compressed files   zdiff

Now you know how to work with gzipped files.

The Z commands are awesome! And I know that many people get their ‘Eureka moment’ when they first learn about them.

What about you? Did you find these z commands useful? The comment section is all yours.

How do I grep recursively through .gz files?

I am using a script that regularly downloads my Gmail messages and compresses the raw .eml files into .gz files. The script creates a folder for each day, and then compresses every message into its own file. I would like a way to search through this archive for a “string”. Grep alone doesn’t appear to do it. I also tried SearchMonkey.

If you want to grep recursively in all .eml.gz files in the current directory, you can use:

find . -name \*.eml.gz -print0 | xargs -0 zgrep "STRING" 

You have to escape the first * so that the shell does not interpret it. -print0 tells find to print a null character after each file name it finds; xargs -0 reads those null-separated names from standard input and passes them as arguments to the command that follows; zgrep works like grep, but uncompresses the file first.

The -print0 and -0 options are necessary if there might be space characters in the paths; there’s no reason other than complexity not to use them.

zgrep actually seems faster than grep run on uncompressed files. It must be because compressed files can be read off the HD and decompressed faster than reading an uncompressed file from the HD.

@JaimeM. xargs splits its input on blanks (whitespace) by default. Sure, file names almost never have newlines in them, but spaces are not unheard of (even if most UNIXy types frown on them). That said, you can avoid worrying about whitespace even more simply: find . -name '*.eml.gz' -exec zgrep "STRING" {} + That passes many arguments per launch just as xargs does, with the safety of -print0 / -0, and all without the overhead of an extra process launch and piping, and fairly concisely. -exec with + is POSIX specified, so it should be on most semi-recent UNIX-like systems to my knowledge.
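Both forms can be compared on a throwaway archive. This is only a sketch: the folder and message names are invented, and one folder name contains spaces on purpose to exercise the whitespace handling discussed above.

```shell
#!/bin/sh
# Scratch mail archive: one folder per day, one .eml.gz per message.
demo=$(mktemp -d)
mkdir -p "$demo/2018-04-17" "$demo/2018 04 18"
printf 'Subject: hello\nSTRING is here\n' | gzip > "$demo/2018-04-17/msg1.eml.gz"
printf 'Subject: other\nnothing\n' | gzip > "$demo/2018 04 18/msg2.eml.gz"
printf 'Subject: again\nSTRING again\n' | gzip > "$demo/2018 04 18/msg3.eml.gz"

# -l prints only the names of the files that contain a match.
via_xargs=$(cd "$demo" &&
    find . -name '*.eml.gz' -print0 | xargs -0 zgrep -l "STRING" | sort)
via_exec=$(cd "$demo" &&
    find . -name '*.eml.gz' -exec zgrep -l "STRING" {} + | sort)

printf '%s\n' "$via_xargs"
[ "$via_xargs" = "$via_exec" ] && echo "both forms list the same files"
```

Dropping -l prints the matching lines themselves, each prefixed with the path of the file it came from.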

@Jared Is there a way to do a wildcard search only knowing the beginning of the file pattern? For example, I have .gz files that have date/time stamps at the end of them: ABCLog04_18_18_2_21.gz. Is there a way to recursively look for files beginning with ABC*? I tried replacing \*.eml.gz in your example above with ABCLog* and get an error about file format:

find: paths must precede expression: ABCLog-2018-03-12-10-16-1.log.gz
Usage: find [-H] [-L] [-P] [-Olevel] [-D help|tree|search|stat|rates|opt|exec] [path...] [expression]
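One likely cause of that error, offered as an assumption since the exact command is not shown: the ABCLog* pattern was left unquoted, so the shell expanded it into the matching file names in the current directory before find ever ran. Quoting the pattern avoids that. The sketch below uses invented file names and the search string "needle" in a scratch directory.

```shell
#!/bin/sh
# Scratch tree with matching files at two depths and one non-matching name.
demo=$(mktemp -d)
mkdir -p "$demo/sub"
printf 'needle\n' | gzip > "$demo/ABCLog04_18_18_2_21.gz"
printf 'needle\n' | gzip > "$demo/sub/ABCLog04_19_18_3_05.gz"
printf 'needle\n' | gzip > "$demo/sub/XYZLog04_19_18_3_05.gz"

# The quotes keep the shell from expanding ABCLog* itself; find receives the
# pattern unchanged and matches it against file names at every depth.
matches=$(cd "$demo" &&
    find . -name 'ABCLog*' -exec zgrep -l "needle" {} + | sort)
printf '%s\n' "$matches"
```

Escaping the asterisk as ABCLog\* has the same effect as the single quotes.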
