- Linux grep a string within a ZIP file containing multiple files — zgrep or zipgrep?
- Analyze Gzip Log Files in Linux Without Extracting Them
- Dealing with Gzip compressed files without extracting them
- Viewing compressed files with zcat
- Reading compressed files with zless and zmore
- Searching inside compressed files with zgrep
- Comparing compressed files with zdiff
- Summary
- How do I grep recursively through .gz files?
- 6 Answers
Linux grep a string within a ZIP file containing multiple files — zgrep or zipgrep?
From time to time, you might need to search certain strings within an archive file in the zip format which contains multiple files on a Linux system. If you have never done this, you might ask — what tools to use? Are there existing commands which can be used instead of writing your own script to unzip the archive and search?
Fortunately the answer is yes. There are commands like zgrep and zipgrep. What’s the difference between them then?
When I got a request like the one above to grep a string within an archive, I didn’t know which command to use or which scenario each one suits. I first tried zgrep because I wasn’t aware of zipgrep.
Using zgrep to search a zip file doesn’t really work, as the following test shows.
For example, I created two zip files, each containing the same two plain text files but in a different order: aaa.log contained the string “jli” I was looking for, and the other one, bbb.log, didn’t.
test_grep1.zip had aaa.log as the first file within the archive, while test_grep.zip had bbb.log as the first file.
zip test_grep1.zip aaa.log bbb.log
zip test_grep.zip bbb.log aaa.log
root@jlitest:/var/log# unzip -l test_grep.zip
Archive:  test_grep.zip
  Length      Date    Time    Name
---------  ---------- -----   ----
   376908  03-24-2022 20:52   bbb.log
        8  03-24-2022 20:52   aaa.log
---------                     -------
   376916                     2 files
root@jlitest:/var/log# unzip -l test_grep1.zip
Archive:  test_grep1.zip
  Length      Date    Time    Name
---------  ---------- -----   ----
        8  03-24-2022 20:52   aaa.log
   376908  03-24-2022 20:52   bbb.log
---------                     -------
   376916                     2 files
Searching for the string “jli” in test_grep1.zip with zgrep found it, but the same search in test_grep.zip found nothing.
root@jlitest:/var/log# zgrep jli test_grep1.zip
jli
jli
root@jlitest:/var/log# zgrep jli test_grep.zip
Then I took a closer look at zgrep. It is actually just a shell script wrapping grep and gzip (using gzip’s “-c” and “-d” options to decompress to stdout). As its man page states: “zgrep - a wrapper around a grep program that decompresses files as needed”.
root@jlitest:/var/log# which zgrep
alias zgrep='zgrep --color=auto'
        /usr/bin/zgrep
No wonder it doesn’t work well with zip files. As you can see from the following example:
root@jlitest:/var/log# gzip -d -c test_grep1.zip
jli
jli
gzip: test_grep1.zip has more than one entry--rest ignored
gzip stops after the first entry because it expects only one compressed file per archive.
Then I realized there is another command, zipgrep (a shell script again), which wraps egrep and unzip. That was exactly what I needed for my task.
Its man page says “zipgrep: Use unzip and egrep to search the specified members of a zip archive for a string or pattern.”
root@jlitest:/var/log# zipgrep jli test_grep1.zip
aaa.log:jli
aaa.log:jli
root@jlitest:/var/log# zipgrep jli test_grep.zip
aaa.log:jli
aaa.log:jli
Again the concept of zipgrep is to unzip files within an archive (zip) file to stdout and egrep patterns from there. It uses unzip’s “-p” option to extract files (only the file data) to pipe (stdout).
unzip has another option, “-c”, which extracts files to stdout/screen. It is similar to “-p”, but the name of each file is printed as well.
So you can use simple commands to implement zipgrep.
Using “-p”, no file name is printed.
root@jlitest:/var/log# unzip -p test_grep1.zip|egrep "extracting|inflating|jli"
jli
jli
Using “-c”, the file name is printed.
root@jlitest:/var/log# unzip -c test_grep1.zip|egrep "extracting|inflating|jli"
 extracting: aaa.log
jli
jli
  inflating: bbb.log
root@jlitest:/var/log# unzip -c test_grep.zip|egrep "extracting|inflating|jli"
  inflating: bbb.log
 extracting: aaa.log
jli
jli
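As a rough illustration of the idea above, here is a minimal sketch of a zipgrep-like helper built only from unzip and grep. The function name zip_search is made up for this example and is not part of any package. The sketch assumes Info-ZIP's unzip (for the -Z1 listing mode) and member names that contain no newlines or wildcard characters such as * ? [.

```shell
#!/bin/sh
# zip_search PATTERN ARCHIVE
# Hypothetical helper for illustration only; assumes Info-ZIP unzip.
zip_search() {
    pattern=$1
    archive=$2
    # unzip -Z1 lists the member names only, one per line.
    unzip -Z1 "$archive" | while IFS= read -r member; do
        # unzip -p writes just the member's data to stdout.
        # </dev/null keeps unzip from touching the member list on stdin.
        unzip -p "$archive" "$member" </dev/null |
            grep -E -- "$pattern" |
            while IFS= read -r line; do
                # Prefix every matching line with the member it came from.
                printf '%s:%s\n' "$member" "$line"
            done
    done
}

# Example call (archive name taken from the test above):
#   zip_search jli test_grep.zip
```

Unlike the plain “unzip -p | egrep” pipeline, this version decompresses one member at a time, so each match can be labelled with its file name, at the cost of reopening the archive once per member.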
Extracting files to stdout is also useful in another scenario: what if you only want to search for a string in one specific file within a zip file? In this case you don’t want to use zipgrep, because it will scan every file in the archive. For example, suppose you have a file named aaa.log within logs.zip and you want to search just aaa.log for the string “jli”. As shown in the example above, you can use unzip’s “-p” option and name the member to extract:
root@joetest:~/Service_Tools# unzip -p logs.zip aaa.log|grep jli
jli
Have fun searching your zip files!
Analyze Gzip Log Files in Linux Without Extracting Them
Learn to read and analyze gzipped compressed log files on a Linux box without extracting them first with the help of the lesser known Z commands.
On Linux servers, logs are often compressed in gzip format to save disk space. If you are investigating an issue and have to deal with gzip-compressed logs, the usual workflow is to extract the .gz log files first and then use commands such as cat, less and grep to read and analyze them. Why? With regular text files you can use cat to view the content, grep to search it, or less to read it without flooding your screen, but these regular Linux commands do not work on compressed files.
Extracting the compressed log files first and then analyzing them takes more time and disk space. You extract all the required files one by one, analyze them, and then remove the extracted files when you don’t need them anymore. There is a better way. Use Z commands!
Dealing with Gzip compressed files without extracting them
- zcat: cat to view compressed file
- zgrep: grep to search inside the compressed file
- zless for less, zmore for more: to view the file in pages
- zdiff: diff to see the difference between two compressed files
Don’t worry too much. You don’t have to learn new command syntax. These Z commands work pretty much the same as their regular counterparts for the most popular options.
Viewing compressed files with zcat
If you use the cat command, you can replace it with zcat. zcat is used in exactly the same manner as cat. For example:
zcat logfile.gz
This will display all the contents of logfile.gz without extracting it.
You can pipe zcat into the regular less and more commands to see the output in pages:
zcat logfile.gz | less
zcat logfile.gz | more
If you don’t know if the file is compressed (i.e., files without .gz extension), you can use zcat with option -f. This will display the content of the file irrespective of whether it is gzipped or not.
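To make the behavior of -f concrete, here is a small sketch that can be run in a scratch directory. The file names plain.log and packed.log.gz are made up for the example, and it assumes the GNU gzip tools that most Linux distributions ship.

```shell
#!/bin/sh
# Work in a throwaway directory so no real logs are touched.
workdir=$(mktemp -d)
printf 'first line\nsecond line\n' > "$workdir/plain.log"
# gzip -c writes the compressed data to stdout and leaves plain.log as-is.
gzip -c "$workdir/plain.log" > "$workdir/packed.log.gz"

# zcat prints the decompressed content; the .gz file itself stays compressed.
zcat "$workdir/packed.log.gz"

# With -f, data that is not gzip-compressed is passed through unchanged,
# so the same command works on both files.
zcat -f "$workdir/plain.log"
zcat -f "$workdir/packed.log.gz"
```

All three commands should print the same two lines. Remove the scratch directory with rm -r "$workdir" when you are done.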
Reading compressed files with zless and zmore
Same as less and more, you can use zless and zmore to read the content of the compressed files without decompressing the files. All the keyboard shortcuts of less and more work the same.
Searching inside compressed files with zgrep
Grep is a hell of a powerful command and I think, one of the most used Linux commands. zgrep is the Z counterpart of grep that allows you to search inside gzipped compressed files without extracting them.
You can use it with all the regular grep options. For example:
zgrep -i keyword_search logfile.gz
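A few more grep options in action, as a hedged sketch: the log content and the file name app.log.gz below are invented for the example, and GNU zgrep is assumed.

```shell
#!/bin/sh
workdir=$(mktemp -d)
# Create a small gzipped log to search in.
printf 'INFO start\nError: disk full\nerror: retrying\nINFO done\n' |
    gzip > "$workdir/app.log.gz"

# -i ignores case, so both "Error" and "error" match.
zgrep -i error "$workdir/app.log.gz"

# -c prints the number of matching lines instead of the lines themselves.
zgrep -i -c error "$workdir/app.log.gz"

# -n prefixes each match with its line number in the decompressed text.
zgrep -n INFO "$workdir/app.log.gz"
```

The line numbers reported by -n refer to the decompressed text, which makes them usable with zless or zcat afterwards.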
Comparing compressed files with zdiff
While this might not be that useful on huge log files, you can use zdiff to see the difference between compressed files, in the same way as you use the diff command.
zdiff logfile1.gz logfile2.gz
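Here is a small sketch of zdiff on two throwaway files. The names old.log.gz and new.log.gz are made up, and the note about the exit status assumes a reasonably current GNU gzip.

```shell
#!/bin/sh
workdir=$(mktemp -d)
printf 'alpha\nbeta\ngamma\n' | gzip > "$workdir/old.log.gz"
printf 'alpha\nBETA\ngamma\n' | gzip > "$workdir/new.log.gz"

# zdiff decompresses both inputs and hands them to diff.
# Like diff, it is expected to exit non-zero when the contents differ,
# so "|| true" keeps a script running under "set -e" from stopping here.
zdiff "$workdir/old.log.gz" "$workdir/new.log.gz" || true

# Options such as -u (unified format) are passed through to diff.
zdiff -u "$workdir/old.log.gz" "$workdir/new.log.gz" || true
```

Only the second line differs between the two files, so that is the only change either command should report.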
Speaking of diff, you may want to look at Meld GUI diff tool.
Summary
Command | Use | Example |
---|---|---|
zcat | cat to view compressed file | zcat logfile.gz |
zgrep | grep to search inside the compressed file | zgrep -i keyword_search logfile.gz |
zless | less to view the compressed file in pages | zless logfile.gz |
zmore | more to view the compressed file in pages | zmore logfile.gz |
zdiff | diff to see the difference between two compressed files | zdiff logfile1.gz logfile2.gz |
Now you know how to work with gzipped files.
The Z commands are awesome! And I know that many people get their ‘Eureka moment’ when they first learn about them.
How do I grep recursively through .gz files?
I am using a script that regularly downloads my Gmail messages and compresses the raw .eml files into .gz files. The script creates a folder for each day and compresses every message into its own file. I would like a way to search through this archive for a string. Grep alone doesn’t appear to do it. I also tried SearchMonkey.
6 Answers
If you want to grep recursively in all .eml.gz files in the current directory, you can use:
find . -name \*.eml.gz -print0 | xargs -0 zgrep "STRING"
You have to escape the * so that the shell does not interpret it. -print0 tells find to print a null character after each file name it finds; xargs -0 reads those null-separated names from standard input and passes them as arguments to the command that follows; zgrep works like grep, but uncompresses the file first.
-print0 and -0 are necessary if there might be space characters in the paths; there’s no reason other than complexity not to use them.
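To see why the null-delimited form matters, here is a minimal sketch that builds a tiny sample tree with a space in one directory name. All directory names, file names and message contents are invented for the example; GNU or BSD find and xargs are assumed for -print0 and -0.

```shell
#!/bin/sh
# Build a small sample archive; one path contains spaces on purpose.
workdir=$(mktemp -d)
mkdir -p "$workdir/2018-03-12" "$workdir/2018 03 13"
printf 'Subject: invoice\n' | gzip > "$workdir/2018-03-12/msg1.eml.gz"
printf 'Subject: hello\n' | gzip > "$workdir/2018 03 13/msg2.eml.gz"
printf 'Subject: invoice reminder\n' | gzip > "$workdir/2018 03 13/msg3.eml.gz"

# -print0 / -0 delimit file names with NUL bytes, so the spaces in
# "2018 03 13" do not split the path into several arguments.
# zgrep -l prints only the names of the files that contain a match.
find "$workdir" -name '*.eml.gz' -print0 | xargs -0 zgrep -l "invoice"
```

This should list msg1.eml.gz and msg3.eml.gz with their full paths. Without -print0 and -0, xargs would split the second directory name at its spaces and zgrep would be handed paths that do not exist.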
zgrep actually seems faster than grep run on uncompressed files. It must be because compressed files can be read off the HD and decompressed faster than reading an uncompressed file from the HD.
@JaimeM. xargs splits its input on blanks (whitespace) by default. Sure, file names almost never have newlines in them, but spaces are not unheard of (even if most UNIXy types frown on them). That said, you can stop worrying about whitespace even more simply:
find . -name '*.eml.gz' -exec zgrep "STRING" {} +
That gives you the same many-arguments-per-launch behavior as xargs and the safety of -print0 / -0, all without the overhead of an extra process launch and a pipe, and fairly concisely. -exec with + is specified by POSIX, so to my knowledge it should be available on most semi-recent UNIX-like systems.
@Jared Is there a way to do a wildcard search only knowing the beginning of the file pattern? For example, I have .gz files that have date/time stamps at the end of them. ABCLog04_18_18_2_21.gz Is there a way to recursively look for files beginning with ABC*. I tried replacing \*.eml.gz in your example above with ABCLog* and get an error about file format.: find: paths must precede expression: ABCLog-2018-03-12-10-16-1.log.gz Usage: find [-H] [-L] [-P] [-Olevel] [-D help|tree|search|stat|rates|opt|exec] [path. ] [expression]
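The “paths must precede expression” error quoted above is what GNU find typically prints when the shell expands an unquoted pattern such as ABCLog* into matching file names before find ever runs. A minimal sketch of the quoted form follows; the file names are made up, modeled on the ones in the comment.

```shell
#!/bin/sh
workdir=$(mktemp -d)
mkdir -p "$workdir/logs/old"
printf 'timeout on node 4\n' | gzip > "$workdir/logs/ABCLog04_18_18_2_21.gz"
printf 'all good\n' | gzip > "$workdir/logs/old/ABCLog04_17_18_9_02.gz"
printf 'timeout elsewhere\n' | gzip > "$workdir/logs/XYZLog04_18_18_2_21.gz"

# Quoting 'ABCLog*' hands the pattern to find untouched, so find (not the
# shell) matches it against file names at every directory level.
# -exec ... {} + batches many files into each zgrep call.
find "$workdir/logs" -name 'ABCLog*' -exec zgrep -l "timeout" {} +
```

Only the ABCLog file that contains “timeout” should be listed. The XYZLog file also matches the string but is filtered out by -name, and escaping the star as ABCLog\* would work just as well as the single quotes.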