Is it possible in unix to search inside zip files
I have hundreds of directories, and within those I have a few zip files. Those zip files contain images named abc.jpg. The zip files may be in any folder or subfolder, so it's difficult to extract them all in one place. I just want to collect those image files. Is this possible?
You can use zip -sf foo.zip | grep abc.jpg to determine whether an archive contains abc.jpg; that should help. I don't have time to work out the complete command now, but I'll try later if nobody else has answered.
5 Answers
I once needed something similar to find class files in a bunch of zip files. Here it is:
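#!/bin/bash

function process() {
  while read -r line; do
    # "Archive: foo.zip" header printed by unzip -l: remember the archive name.
    if [[ "$line" =~ ^Archive:[[:space:]]*(.*) ]] ; then
      ar="${BASH_REMATCH[1]}"
      #echo "$ar"
    # Listing line whose path ends in abc.jpg: print "archive: path".
    elif [[ "$line" =~ ([^ ]*abc\.jpg)$ ]] ; then
      echo "${ar}: ${BASH_REMATCH[1]}"
    fi
  done
}

find . -iname '*.zip' -exec unzip -l '{}' \; | process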
Now you only need to add one line to extract the files and maybe move them. I’m not sure exactly what you want to do, so I’ll leave that to you.
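One way the missing extraction step could look is sketched below. It is not part of the answer above: the function name collect_from_zips and the abc_N.jpg numbering scheme are assumptions made for illustration, and it relies on Info-ZIP's unzip (-Z1 lists member paths one per line, -p writes a member to standard output).

```bash
#!/bin/bash
# collect_from_zips TOPDIR DESTDIR   (hypothetical helper, illustration only)
# Copy every member named abc.jpg from every zip file under TOPDIR into
# DESTDIR as abc_1.jpg, abc_2.jpg, ... and print how many were copied.
collect_from_zips() {
    local top="$1" dest="$2" zip member n=0
    mkdir -p "$dest" || return 1
    # -print0 with read -d '' keeps archive paths containing spaces intact.
    while IFS= read -r -d '' zip; do
        # unzip -Z1 prints one member path per line.
        while IFS= read -r member; do
            case "$member" in
                abc.jpg|*/abc.jpg)
                    n=$((n + 1))
                    # Numbering the copies avoids clashes between archives.
                    # Note: unzip treats the member name as a wildcard pattern.
                    unzip -p "$zip" "$member" > "$dest/abc_$n.jpg"
                    ;;
            esac
        done < <(unzip -Z1 "$zip" 2>/dev/null)
    done < <(find "$top" -type f -iname '*.zip' -print0)
    echo "$n"
}
```

For example, collect_from_zips . /destination/directory would gather the images from everything below the current directory.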
If your unix variant supports FUSE (Linux, *BSD, OSX, Solaris all do), mount AVFS to access archives transparently. The command mountavfs creates a view of the whole filesystem, rooted at ~/.avfs , in which archive files have an associated directory that contains the directories and files in the archive. For example, if you have foo.zip in the current directory, then the following command is roughly equivalent to unzip -l foo.zip :
mountavfs   # needs to be done once and for all
find ~/.avfs$PWD/foo.zip\# -ls
So, to loop over all images contained in a zip file under the current directory and copy them to /destination/directory (with a prompt in case of clash):
find ~/.avfs"$PWD" -name '*.zip' -exec sh -c '
    find "$0#" -name "*.jpg" -exec cp -ip {} "$1" \;
' {} /destination/directory \;
In zsh:
cp -ip ~/.avfs$PWD/**/*.zip(e\''REPLY=($REPLY\#/**/*.jpg(N))'\') /destination/directory
Deconstruction: ~/.avfs$PWD/**/*.zip expands to the AVFS view of the zip files under the current directory. The glob qualifier e is used to modify the output of the glob: …/*.zip(e\''REPLY=$REPLY\#'\') would just append a # to each match. REPLY=($REPLY\#/**/*.jpg(N)) transforms each match into the array of .jpg files in the .zip# directory.
Some reasons not to use FUSE if you don't have to: portability (some OSs don't have FUSE) and maintainability (not everybody knows FUSE).
I assume you have a recent version of Bash (4.0 or later, which is needed for globstar), so you should be able to use this:
shopt -s globstar
for path in topdir/**/*.zip
do
    # unzip member patterns are shell-style wildcards, not regular expressions
    unzip "$path" '*abc.jpg'
done
Similar to Kim's answer, but slightly modified. Just use sed:
find . -name '*.zip' -exec unzip -l '{}' \; | sed -n -e '/^Archive/{h;d;}' -e '/abc\.jpg$/{x;p;x;p;}'
Let’s do this! Tragically, existing answers are deficient in various obvious ways – including those both here and at a popular duplicate.
The accepted answer, for example, is Bash-specific (that’s bad) and hardcodes the desired search pattern into a one-off 10-line shell function (that’s even badder). The next most upvoted answer leverages FUSE-based pseudo-filesystems (that’s patently insane). Likewise, the most upvoted answer at the aforementioned duplicate yields ambiguous, non-human-readable output (just. ugh).
I am Jack’s wizened disapproval.
Working Code or It Didn’t Happen
A new contender has entered the ring:
# str find_in_zip(str regex, str zip_filename1, ...)
#
# Find all paths contained in any zip-formatted archives with the passed
# filenames such that the relative pathnames of these paths in these
# archives match the passed extended regular expression.
function find_in_zip() {
    (( $# >= 2 )) || {
        echo 'Expected one extended regular expression and one or more zip filenames.' 1>&2
        return 1
    }

    # Localize and remove the passed regex from the argument list.
    local regex="${1}" zip_filename
    shift

    # For each passed zip filename...
    for zip_filename in "${@}"; do
        # Print the name of this filename for disambiguation.
        echo "${zip_filename}:"

        # Print all paths in this file matching this regex.
        command unzip -l "${zip_filename}" |
            command grep --extended-regexp --color=always "${regex}"
    # Page the above output for readability.
    done | less --RAW-CONTROL-CHARS
}
For usability, this function is called with the exact same signature as grep . Namely, this function first accepts the regular expression to be searched for and then a variadic sequence of one or more zip filenames.
Likewise, this function has been tested under both Bash and zsh. Add the above code to either ~/.bashrc or ~/.zshrc and great zipfile glory shall be yours, ideally with set -e enabled for sanity and strictness.
Examples or It Didn’t Happen
To demonstrate, let’s find the set of all classes embedded in I2P JAR files installed under Gentoo Linux whose names begin with exactly seven uppercase characters followed by one lowercase character – just ’cause:
$ find_in_zip '/[A-Z]{7}[a-z]' /usr/share/i2p/lib/*.jar
/usr/share/i2p/lib/addressbook.jar:
/usr/share/i2p/lib/BOB.jar:
/usr/share/i2p/lib/commons-el.jar:
/usr/share/i2p/lib/desktopgui.jar:
/usr/share/i2p/lib/i2p.jar:
      568  01-16-2020 00:20   freenet/support/CPUInformation/AMDCPUInfo.class
      236  01-16-2020 00:20   freenet/support/CPUInformation/VIACPUInfo.class
/usr/share/i2p/lib/i2psnark.jar:
/usr/share/i2p/lib/i2ptunnel.jar:
/usr/share/i2p/lib/jasper-compiler.jar:
/usr/share/i2p/lib/jasper-runtime.jar:
/usr/share/i2p/lib/jetty-continuation.jar:
/usr/share/i2p/lib/jetty-deploy.jar:
/usr/share/i2p/lib/jetty-http.jar:
/usr/share/i2p/lib/jetty-i2p.jar:
/usr/share/i2p/lib/jetty-io.jar:
/usr/share/i2p/lib/jetty-java5-threadpool.jar:
/usr/share/i2p/lib/jetty-rewrite-handler.jar:
/usr/share/i2p/lib/jetty-security.jar:
/usr/share/i2p/lib/jetty-servlet.jar:
/usr/share/i2p/lib/jetty-servlets.jar:
/usr/share/i2p/lib/jetty-sslengine.jar:
/usr/share/i2p/lib/jetty-start.jar:
/usr/share/i2p/lib/jetty-util.jar:
/usr/share/i2p/lib/jetty-webapp.jar:
/usr/share/i2p/lib/jetty-xml.jar:
/usr/share/i2p/lib/jstl.jar:
/usr/share/i2p/lib/mstreaming.jar:
/usr/share/i2p/lib/org.mortbay.jetty.jar:
/usr/share/i2p/lib/org.mortbay.jmx.jar:
/usr/share/i2p/lib/routerconsole.jar:
/usr/share/i2p/lib/router.jar:
     5598  01-16-2020 00:20   org/cybergarage/upnp/ssdp/HTTPMUSocket.class
/usr/share/i2p/lib/sam.jar:
/usr/share/i2p/lib/standard.jar:
/usr/share/i2p/lib/streaming.jar:
/usr/share/i2p/lib/systray.jar:
You... probably wouldn't want to do that by hand.
How to find a list of files and zip them into a single zip file?
I want to collect all the files I have used in a project. I am using the find command, and I want it to find a list of files and pass the result to the zip command to create a single zip file containing all the matched files. This is just a convenience, if it is possible. However, my attempt does not work:
find /lmms/samples/ -name warp01*,JR_effect2k*,clean_low_key*,q_kick_2*,sticky_q_kick*,upright_bass*,pizzi*,chorded_perc*,Tr77_kick*,Tr77_tom1*,Tr77_cym*,hihat_008a*,Hat_o.ds,Hat_c.ds,Kickhard.ds,Tr77_snare* -exec zip {} ~/Desktop/files.zip
find: missing argument to `-exec'
PS. After fixing some errors pointed out in the below answers and following their guidelines, I have reformatted the code as below:
find ~/lmms/samples/ (-name warp01* -o -name JR_effect2k* -or -name clean_low_key* -or -name q_kick_2* -or -name sticky_q_kick* -or -name upright_bass*-or -name pizzi* -or -name chorded_perc* -or -name Tr77_kick* -or -name Tr77_tom1* -or -name Tr77_cym* -or -name hihat_008a* -or -name Hat_o.ds -or -name Hat_c.ds -or -name Kickhard.ds -or -name Tr77_snare*) -exec zip -add ~/Desktop/files.zip {} +
It still fails with the message bash: syntax error near unexpected token `('. Removing the parentheses eliminates the error, but then only one file seems to be added to the archive, and, surprisingly, I cannot find that archive on my Desktop.
find ~/lmms/samples/ -name warp01* -o -name JR_effect2k* -or -name clean_low_key* -or -name q_kick_2* -or -name sticky_q_kick* -or -name upright_bass*-or -name pizzi* -or -name chorded_perc* -or -name Tr77_kick* -or -name Tr77_tom1* -or -name Tr77_cym* -or -name hihat_008a* -or -name Hat_o.ds -or -name Hat_c.ds -or -name Kickhard.ds -or -name Tr77_snare* -exec zip ~/Desktop/files.zip {} +
  adding: home/john/lmms/samples/drumsynth/tr77/Tr77_snare.ds (deflated 49%)
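Three things commonly go wrong in commands like the ones above: unescaped parentheses are shell syntax, unquoted patterns such as warp01* may be expanded by the shell before find sees them, and, because the implicit "and" binds tighter than -o, a trailing -exec applies only to the last -name test unless the alternatives are grouped. The following is a minimal sketch of one way around all three; the helper name find_any_name is an assumption made for illustration, not something from the question.

```bash
#!/bin/bash
# find_any_name DIR PATTERN...   (hypothetical helper, illustration only)
# Print every regular file under DIR whose name matches any PATTERN.
find_any_name() {
    if [ "$#" -lt 2 ]; then
        echo 'usage: find_any_name DIR PATTERN...' >&2
        return 1
    fi
    local dir="$1" pat
    local expr=()
    shift
    for pat in "$@"; do
        # Join the individual -name tests with -o ("or").
        if [ "${#expr[@]}" -gt 0 ]; then
            expr+=(-o)
        fi
        expr+=(-name "$pat")
    done
    # The escaped parentheses group the alternatives, so -type f and -print
    # apply to every pattern rather than only to the last one.
    find "$dir" -type f \( "${expr[@]}" \) -print
}
```

Usage could then be along the lines of find_any_name ~/lmms/samples 'warp01*' 'Hat_o.ds' 'Tr77_snare*' | zip ~/Desktop/files.zip -@, with every pattern quoted. Since zip -@ reads one name per line, this assumes no filename contains a newline.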
Special use of linux find command together with zip
In each subfolder there will be files and more subfolders with files/folders. I want to zip all the files in the subfolders called 'archive' and ignore all other folders in the structure.
If I use the command: find * -type d -name 'archive'
What I want is output like:
And so on, so that I can use the command: find * -type d -name 'archive' [with some more/other options] | zip all_archive_files.zip -@
3 Answers
You can match on the whole path using -path or -regex, for example:
find . -regex './1+/archive/.*' -type f -exec zip all.zip {} \;
This is crude, but you could:
find . -type d -name 'archive' -exec find {} \; | zip stuff.zip -@
Caveat — OK, but why not uncommon? In about 40 years of programming, I don’t remember having encountered a case, where someone by accident would have created a filename with embedded newline. Of course, this can happen, and one should be aware of this possibility, but I would call it a pretty rare situation.
I said the caveat itself was common, not filenames with newlines in them. In other words it’s super common to write obvious but subtly wrong scripts like this.
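The newline caveat discussed in these comments can be made concrete with a small sketch. The two helper names below are assumptions made for illustration; the point is only that a line-oriented consumer sees one extra "file" for each newline embedded in a name, while NUL-delimited output (find -print0, available in GNU and BSD find) does not.

```bash
#!/bin/bash
# Count regular files under a directory by counting output lines.
# A filename containing a newline is counted more than once.
count_by_line() {
    find "$1" -type f | wc -l | tr -d ' '
}

# Count regular files by counting the NUL terminators written by -print0.
# NUL cannot occur in a filename, so every file is counted exactly once.
count_by_nul() {
    find "$1" -type f -print0 | tr -cd '\0' | wc -c | tr -d ' '
}
```

In a directory holding c.txt and a file whose name contains one newline, count_by_line reports 3 while count_by_nul reports 2. Anything that reads names one per line, including zip -@, is exposed to the same miscount.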
The normal way to run zip takes filenames from the command line and with the -r flag the command will recurse into directories by itself. Consider
find . -name 'archive' -type d -exec zip -r all_archive_files.zip {} +
The -exec option will run zip -r all_archive_files.zip {} where {} is replaced by the list of all the paths that find found. Put echo between -exec and zip to see what it will do:
$ find . -name 'archive' -type d
411/archive
412/archive
488/archive
512/archive
$ find . -name 'archive' -type d -exec echo zip -r all_archive_files.zip {} +
zip -r all_archive_files.zip 411/archive 412/archive 488/archive 512/archive
Find and zip the individual files and remove the original
I have a directory with lots of individual .txt files. My goal is to find the individual files in the directory, zip each one individually under the same name (excluding .txt), and remove the original file. It is very easy to do this with gzip, like below:
Is the zip package installed on the system that you are using? Do you just want to operate on the files in the current directory or also in the subdirectories (if they exist)? Are there files in that directory other than the ones ending in .txt?
@NasirRiley yes it is installed. Only current directory. Yes there are a mixture of files in the directory
2 Answers
The following command will zip each .txt file and remove the original file:
find . -maxdepth 1 -type f -name '*.txt' -exec zip -Tm {}.zip {} \;
$ ls -1
a.txt.zip
b.txt.zip
c.txt.zip
d.jpg
Note: we used the -T option to test the integrity of the archive before removing the input file. This is recommended in the zip man page for the -m option:
-m, --move
Move the specified files into the zip archive; actually, this deletes the target directories/files after making the specified zip archive. If a directory becomes empty after removal of the files, the directory is also removed. No deletions are done until zip has created the archive without error. This is useful for conserving disk space, but is potentially dangerous so it is recommended to use it in combination with -T to test the archive before removing all input files.
Note that the .txt part is still present in the filename. This is how gzip behaves as well.
To remove the .txt part:
If you don’t want the .txt part to remain in the filename, the following command will achieve this:
find . -maxdepth 1 -name '*.txt' -type f -exec bash -c \
'zip -Tm "${1%.*}".zip "$1"' inline-bash {} \;
Note: The order of predicates to the find command invocation above avoids applying -type f (which potentially involves an expensive lstat() system call) on those files whose names don’t match the pattern *.txt . (ref)
Note: We provided inline-bash as the first argument to our inline script. This has two benefits:
- $0 within our inline script will be set to inline-bash . (Recall that "$0 expands to the name of the shell or shell script", per the Bash manual.) For an inline script executed with -c , using inline-bash , or similar, is logical for this purpose and results in more meaningful error messages than if we chose _ , another popular choice.
- Positional parameters to our script will start at 1 as usual.
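The two points above, together with the suffix stripping needed to drop .txt from the archive name, can be checked in isolation without touching any files. This is a minimal sketch; the function names show_args and zip_name_for are assumptions made for illustration.

```bash
#!/bin/bash
# Show how bash -c assigns arguments: the first word after the script
# becomes $0, and the next one becomes $1.
show_args() {
    bash -c 'printf "%s|%s\n" "$0" "$1"' inline-bash "$1"
}

# Derive the archive name for a .txt file: ${1%.txt} removes a trailing
# ".txt" (if present) before ".zip" is appended.
zip_name_for() {
    printf '%s\n' "${1%.txt}.zip"
}
```

Here show_args ./a.txt prints inline-bash|./a.txt, and zip_name_for ./a.txt prints ./a.zip.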