Bash: Check if file contains other file contents
So I’m trying to append the contents of one file to another file, if it’s not already included. This is how I try:
- I’m on OSX 10.11.5 (but a solution for Linux / cross-platform could also be relevant both for me at home or for someone else reading this)
- My choice of using catAndAppend over cat $file1 >> $file2 is to handle cases where sudo is needed and separate the appended stuff from what’s already there by adding newlines as needed.
- I don’t wish to append if file $1 is anywhere in file $2 (not only at the beginning or the end)
- For info, here’s one of the files $1 contents that I tried against:
alias ls='ls -a'
alias mkdir="mkdir -pv"
alias wget="wget -c"
alias histg="history | grep"
alias echopath='echo $PATH | tr -s ":" "\n"'
alias myip="curl -sSL http://ipecho.net/plain | xargs echo"
alias webpic="mogrify -resize 690\> *.png"
alias cddog='cd ~/dev/go/src/github.com/dogtools/dog'
alias xp='cd ~/dev/go/src/experiments'
- but I will need to use it with other files containing var exports, code, commands, configs, any kind of text basically
catAndAppend works with permission-denied files, and adds a newline after what's appended to accommodate future appends to the same file. That's why I use something custom over cat "$1" >> "$2"
Also, @BigDataLearner, if grep `cat $1` $2 still seems to fail (it always goes into the else case and appends whatever the file already contained).
Did you try executing your grep line by itself? Get that to work from the command line before you worry about the other stuff. Grep normally uses words, phrases, or regular expressions as its search target and would not try to open file1 to get all of the "words" inside of it. You might be able to munge the output of comm to see that two files are identical. Good luck.
4 Answers
Don’t append if file $1 is anywhere in file $2 :
catAndAppendIfMissing(){
    f1=$(wc -c < "$1")
    diff -y <(od -An -tx1 -w1 -v "$1") <(od -An -tx1 -w1 -v "$2") | \
    rev | cut -f2 | uniq -c | grep -v '[>|]' | numgrep /${f1}../ | \
    grep -q -m1 '.+*' || cat "$1" >> "$2"; }
- Count chars in file $1 using wc .
- Use od to produce a one-byte-per-line hex dump of both files and, using a bashism (process substitution), obtain a diff of the two dumps, which is piped to:
- rev, then cut the 2nd field, and do a uniq count of the consecutive lines that have blanks instead of '>'s.
- If one of those counts is equal to or greater than $f1, file $1 is already somewhere in file $2 and nothing is appended; otherwise cat appends it. This could be checked with variables, but numgrep was convenient and helps avoid variables.
Notes. Good: works with binary files too. Bad: inefficient, since od reads the whole of both files, and diff reads the whole of od's output. If file1 was a one-line string that was in the first line of a 1TB file2, much time would be wasted.
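The byte-level idea above can also be sketched without diff or numgrep. This is only a minimal sketch, not the answerer's method: it dumps both files with od, joins the bytes of each into one comma-separated string, and uses a shell case pattern as the substring test. The names hex_dump, file_contains and append_if_missing are made up for illustration, and holding both dumps in shell variables assumes small files such as dotfiles.

```shell
# hex_dump FILE: print FILE as one line of comma-separated hex bytes,
# e.g. ",61,62,0a," (spaces and newlines from od are squeezed into commas)
hex_dump() {
    od -An -v -tx1 "$1" | tr -s ' \n' ',,'
}

# file_contains NEEDLE HAYSTACK: succeed if the bytes of NEEDLE occur
# contiguously anywhere in HAYSTACK
file_contains() {
    needle=$(hex_dump "$1")
    haystack=$(hex_dump "$2")
    case "$haystack" in
        *"$needle"*) return 0 ;;
        *) return 1 ;;
    esac
}

# append_if_missing SRC DEST: append SRC to DEST unless it is already there
append_if_missing() {
    file_contains "$1" "$2" || cat "$1" >> "$2"
}
```

Because every byte is written as two hex digits followed by a comma, a match can only start on a byte boundary. Something like append_if_missing ~/.aliases ~/.bashrc would be the intended call; the sudo handling from catAndAppend is left out here.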
(Old version). Don’t append if file $1 is already appended to file $2 :
- Get file lengths with wc , store in $f1 and $f2 .
- If the first file is longer than the second file (or, if it is shorter, cmp shows that the first file isn't already appended to the second file), then append it to the second file with cat. Otherwise return with an error code.
This will only work if $2 is not smaller than $1. And you probably want to change it to: cmp . || cat "$1" >> "$2"
This works if the string I’m looking for is at the end of the file. If it’s anywhere else in the file it’s going to append it again
@n-marshall, the Q being somewhat append oriented, it gave the impression that the goal was solely to avoid double appending. Please consider improving the Q’s wording to clarify the point that $f1 shouldn’t occur anywhere in $f2 .
I finally got the time to try your new solution, but I'm now getting an error:
./common/configs/.shell-functions: line 65: syntax error near unexpected token `('
./common/configs/.shell-functions: line 65: ` diff -y <(od -An -tx1 -w1 -v "$1") <(od -An -tx1 -w1 -v "$2") | \'
I could definitely use some help, as I'm not able to understand that code (yes, even with the explanations! :) I don't know how you came up with that solution).
It probably isn’t worth trying to conditionally update the file; just source each file to make sure all the aliases are defined, then unconditionally store the output of alias to the file you would otherwise be appending to.
source "$1"   # Original aliases
source "$2"   # New aliases
alias > "$1"  # Combined aliases
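To see what that produces without touching real dotfiles, here is a small sketch under a few assumptions: the two alias files are throwaway files in a temp directory (old_aliases and new_aliases are made-up names), "." is used as the portable spelling of source, and everything runs in a subshell so the current shell's aliases stay untouched.

```shell
tmp=$(mktemp -d)
printf "alias ls='ls -a'\n" > "$tmp/old_aliases"
printf "alias ls='ls -a'\nalias wget='wget -c'\n" > "$tmp/new_aliases"

# Subshell: define the aliases from both files, then dump the merged set.
(
    unalias -a                  # start from an empty alias table
    . "$tmp/old_aliases"        # original aliases
    . "$tmp/new_aliases"        # new aliases (duplicates simply overwrite)
    alias > "$tmp/combined"     # every alias is listed once
)
cat "$tmp/combined"
```

In bash each line of the combined file starts with the word alias, so it can be sourced again later. Other shells may print a slightly different format.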
This file containing only aliases is just one example of a file I want to deal with. I have other files containing exports, functions, and configuration. Basically, I'm trying to put all the config files of my system on git.
I just chose this example because it has non-trivial things to deal with, like newlines (\n and a literal newline in the file) and all kinds of quotation marks.
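file1_Content=$(cat "$1")
if grep -F -e "$file1_Content" "$2"

file1_Content=$(cat "$1")
# -F: treat the text literally; -e: allow a pattern that starts with "-"
# note: grep tests each line of a multi-line pattern separately
grep -qF -e "$file1_Content" "$2"
if [ $? -eq 0 ]; then
    echo "found"
else
    : # catAndAppend
fi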
I really can't get the syntax if grep `cat $1` $2 to work. I get an output like grep: invalid option -- '
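For what it's worth, the "invalid option" error comes from the unquoted `cat $1`: the file's text is split into separate words, and any word starting with "-" (such as -a in ls -a) is read by grep as an option. Quoting the text and passing it with -F -e avoids that, but it still does not answer the original question, because grep treats every line of a multi-line pattern as its own pattern. A minimal sketch with made-up sample files:

```shell
tmp=$(mktemp -d)
printf 'alias ls=ls -a\nalias wget=wget -c\n' > "$tmp/file1"
printf 'alias wget=wget -c\n' > "$tmp/file2"    # holds only ONE line of file1

pattern=$(cat "$tmp/file1")

# -q: no output, -F: fixed strings, -e: pattern may start with "-"
if grep -qF -e "$pattern" "$tmp/file2"; then
    result="match"
else
    result="no match"
fi
echo "$result"    # prints "match" although file1 as a whole is not in file2
```

So a successful grep here only means that at least one line of file1 occurs in file2, not that the whole block does.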
So I did my homework and came up with a solution which (almost) fits the bill, with the only difference that it’s done in python instead of bash. My python script is then called from bash.
import mmap, os, sys

def isFile1InFile2(file1Path, file2Path):
    # Read both files as bytes: mmap.find() needs a bytes-like needle.
    with open(file1Path, 'rb') as file1:
        file1Contents = file1.read()
    with open(file2Path, 'rb') as file2:
        file2Access = mmap.mmap(file2.fileno(), 0, access=mmap.ACCESS_READ)
        return file2Access.find(file1Contents) != -1

def appendIfMissing(source, dest):
    destFullPath = os.path.expanduser(dest)
    # catAndAppend is the bash function defined in .shell-functions
    command = ' '.join(['source', './common/configs/.shell-functions', '&&', 'catAndAppend', source, destFullPath])
    if os.path.isfile(destFullPath):
        if isFile1InFile2(source, destFullPath):
            print("Source's contents found in dest file, no need to append")
        else:
            print("Source's contents cannot be found in dest file, appending.")
            # append source file to destfile
            os.system(command)
    else:
        print("destfile not a file yet, copying sourcefile to destfile.")
        # copy source file to destfile
        print(command)
        os.system(command)

if len(sys.argv) != 3:
    sys.exit('[ERROR] appendIfMissing.py, line 31: number of arguments passed is not 3')
else:
    appendIfMissing(sys.argv[1], sys.argv[2])
And then to call it from bash:
With the bash function (the one called from python) staying the same:
createFileIfMissing() {
    # create file if it doesn't exist, with the right permission
    [[ -e "$1" ]] || touch "$1" 2>/dev/null || sudo touch "$1"
}
addNewLineToFile() {
    if [[ ! -e "$1" || -w "$1" ]]; then
        printf "\n" >> "$1"
    else
        sudo bash -c 'printf "\n" >> "$1"' _ "$1"
    fi
}
catAndAppend() {
    createFileIfMissing "$2"
    # append stuff to it, using sudo only when the file is not writable
    if [[ -w "$2" ]]; then
        cat "$1" >> "$2"
    else
        sudo bash -c 'cat "$1" >> "$2"' _ "$1" "$2"
    fi
    addNewLineToFile "$2"
}
- It’s not bash. I asked for a bash solution in my question (but really all I care about is to have a solution)
- It’s not bash. And as it’s intended for system setup scripts, I’d have to install python first for this to work. But I want that eventually installed anyway.
- It’s more readable / maintainable / customizable than bash IMHO (being a newbie to both of those languages, python is more intuitive to start with)
- It’s cross platform
Check if a file contains a certain pattern?
Now, if I want to check whether test.txt contains certain keywords, say 'apache' or 'fast', how do I go about it? Can we use an if statement here, and if yes, how?
2 Answers
First of all, the shebang line at the top is not what you want for your script. expect as a shell has limited uses, and this is not one of them.
The first line should be something like
ps -ef > test.txt
grep -e fast -e apache test.txt
This will print all the lines containing either of these words.
Or you can skip the step of writing to a file and do it in one line, like this:
ps -ef | grep -e fast -e apache
EDIT for conditional check:
ps -ef | grep -e fast -e apache | grep -v grep > /dev/null
result=$?
if [ $result -eq 0 ]
then
    echo "Found one or more occurrences of 'apache' and/or 'fast'"
else
    echo "Searched strings were not found"
fi
I think you want to drop test.txt from your last option. Also, your command will find itself, which is probably not what's desired. pgrep could probably be used instead, though I'm not sure I've used it to match both usernames and commands (e.g., the apache user and the apache process), if that is indeed the intent of the OP.
Thanks. Can we use a conditional statement to check whether a string is present and print/echo yes/no?
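A yes/no check can also be written with grep -q directly in the if, since grep's exit status already says whether anything matched. This is just a sketch: the sample test.txt content is invented and contains_any is a made-up helper name.

```shell
tmp=$(mktemp -d)
printf 'root 1 /usr/sbin/apache2 -k start\nuser 2 vim notes.txt\n' > "$tmp/test.txt"

# contains_any FILE: succeed if some line of FILE contains "fast" or "apache"
contains_any() {
    grep -q -e fast -e apache "$1"
}

if contains_any "$tmp/test.txt"; then
    answer="yes"
else
    answer="no"
fi
echo "$answer"
```

With -q grep prints nothing and stops at the first match, so there is no need to redirect output or save $? in a variable.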
You can declare an array here
#!/bin/bash
string=('fast' 'apache')
ps -ef > test.txt
for i in "${string[@]}"
do
    grep "$i" test.txt
done
Or you can do it directly in the ps line to save only those processes
#!/bin/bash
string=('fast' 'apache')
for i in "${string[@]}"
do
    ps -ef | grep "$i" > "ps_output_of_$i.txt"
done
Site design / logo © 2023 Stack Exchange Inc; user contributions licensed under CC BY-SA . rev 2023.7.14.43533
Find files containing a given text
In bash I want to return the file name (and the path to the file) for every file of type .php|.html|.js containing the case-insensitive string "document.cookie" | "setcookie". How would I do that?
6 Answers
egrep -ir --include=*.{php,html,js} "(document.cookie|setcookie)" .
The r flag means to search recursively (search subdirectories). The i flag means case insensitive.
If you just want file names add the l (lowercase L ) flag:
egrep -lir --include=*.{php,html,js} "(document.cookie|setcookie)" .
That didn't seem to work for me (at least not on Mac). It just hangs. egrep -lir --include=* "repo" egrep: warning: recursive search of stdin
You forgot to add the path to search. The path is '.' in the above example. In your case, the script is waiting for the input to search on stdin. Try: egrep -lir --include=* "repo" / (or any other path)
Try something like grep -r -n -i --include="*.html" --include="*.php" --include="*.js" searchstringhere .
the -i makes it case insensitive
the . at the end means you want to start from your current directory; this could be substituted with any directory.
the -r means do this recursively, right down the directory tree
the -n prints the line number for matches.
the --include restricts the search to matching file names or extensions. Wildcards are accepted.
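To try these flags safely, here is a small sketch that builds a throwaway directory tree and searches it. The file names and contents are invented, and it assumes a grep that supports -r and --include (GNU and BSD grep do; they are not in POSIX). One --include per extension is used because brace expansion such as *.{php,html,js} is a bash feature.

```shell
tmp=$(mktemp -d)
mkdir -p "$tmp/site/js"
printf '<?php setcookie("a", "b"); ?>\n' > "$tmp/site/index.php"
printf 'var c = Document.Cookie;\n' > "$tmp/site/js/app.js"
printf 'setcookie in a text file\n' > "$tmp/site/notes.txt"

# -r: recurse, -l: file names only, -i: ignore case, -E: extended regex
matches=$(grep -rliE --include='*.php' --include='*.html' --include='*.js' \
    'document\.cookie|setcookie' "$tmp/site" | sort)
printf '%s\n' "$matches"
```

This should list index.php and js/app.js but not notes.txt, which matches the text but not any of the --include patterns.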
find them and grep for the string:
This will find all files of your 3 types in /starting/path and grep for the regular expression ‘(document\.cookie|setcookie)’ . Split over 2 lines with the backslash just for readability.
find /starting/path -type f \( -name "*.php" -o -name "*.html" -o -name "*.js" \) | \
xargs egrep -i '(document\.cookie|setcookie)'
Thanks @Michael Berkowski. This way is faster, by more than 5 to 8 times, than egrep -ir --include=file.foo "(foo|bar)" /dir on a ~500 GB directory.
Sounds like a perfect job for grep or perhaps ack
Or this wonderful construction:
find . -type f \( -name '*.php' -o -name '*.html' -o -name '*.js' \) -exec grep "document.cookie\|setcookie" /dev/null {} \;
@MichaelBerkowski: You can use it like this to deal with whitespace in filenames: find . -type f -print0 | xargs -0 -I {} grep "search_string" {} . Of course, the other options can be added as well.
find . -type f \( -name '*php' -o -name '*js' -o -name '*html' \) |\
xargs grep -liE 'document\.cookie|setcookie'
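Another way to survive whitespace in file names, sketched here with invented sample files, is to let find run grep itself with -exec ... {} +, so no file name ever passes through a pipe. The parentheses matter: without them -type f would apply only to the first -name test.

```shell
tmp=$(mktemp -d)
mkdir -p "$tmp/my site"
printf 'x = document.cookie;\n' > "$tmp/my site/main page.js"
printf 'nothing here\n' > "$tmp/my site/other.html"
printf 'setcookie()\n' > "$tmp/my site/readme.md"

# {} + hands many file names to a single grep call, each as its own argument
found=$(find "$tmp" -type f \( -name '*.php' -o -name '*.html' -o -name '*.js' \) \
    -exec grep -liE 'document\.cookie|setcookie' {} +)
printf '%s\n' "$found"
```

Only "main page.js" is reported: other.html has no match and readme.md is filtered out by the -name tests.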
Just to include one more alternative, you could also use this:
find "/starting/path" -type f -regextype posix-extended -regex "^.*\.(php|html|js)$" -exec grep -EH '(document\.cookie|setcookie)' {} \;
- -regextype posix-extended tells find what kind of regex to expect.
- -regex "^.*\.(php|html|js)$" tells find the regex that file names must match.
- -exec grep -EH '(document\.cookie|setcookie)' {} \; tells find to run the command (with its options and arguments) specified between the -exec option and the \; for each file it finds, where {} marks where the file path goes in this command. Within that command:
- The -E option tells grep to use extended regex (to support the parentheses), and
- The -H option tells grep to print file paths before the matches.
And, given this, if you only want file paths, you may use:
find "/starting/path" -type f -regextype posix-extended -regex "^.*\.(php|html|js)$" -exec grep -EH '(document\.cookie|setcookie)' {} \; | sed -r 's/(^.*):.*$/\1/' | sort -u
- | [pipe] sends the output of find to the next command after it (which is sed, then sort).
- The -r option tells sed to use extended regex.
- s/HI/BYE/ tells sed to replace the first occurrence (per line) of "HI" with "BYE", and
- s/(^.*):.*$/\1/ tells it to replace the regex (^.*):.*$ (meaning a group [stuff enclosed by ()] including everything [.* = zero or more of any character] from the beginning of the line [^] up to a ':' followed by anything until the end of the line [$]) with the first group [\1] of the matched regex. Because .* is greedy, the group runs to the last ':' on the line, so a matched line that itself contains ':' is not cut back to the bare file path.
- The -u option tells sort to remove duplicate entries (take sort -u as optional).
This is FAR from being the most elegant way. As I said, my intention is to increase the range of possibilities (and also to give more complete explanations of some tools you could use).
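To see what the sed expression actually keeps, here is a small sketch run on one invented line of grep -H output. It uses sed -E, which GNU and BSD sed both accept as the portable spelling of -r, and compares the greedy group with a variant that stops at the first colon (assuming the file path itself contains no colon).

```shell
line='./site/index.php:  var c = document.cookie; // see a:b'

# Greedy (^.*) runs to the LAST colon on the line.
greedy=$(printf '%s\n' "$line" | sed -E 's/(^.*):.*$/\1/')

# [^:]* cannot cross a colon, so the group stops at the FIRST one.
first=$(printf '%s\n' "$line" | sed -E 's/^([^:]*):.*$/\1/')

printf '%s\n%s\n' "$greedy" "$first"
```

The first result still carries part of the matched line, while the second is just ./site/index.php. When only file names are wanted, grep's -l flag mentioned in an earlier answer avoids the post-processing entirely.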