How can I remove all comments from a file?
Your text and your example contradict each other. You write about lines being commented out, but judging by the last line you clearly mean parts of lines. Also, the first line with a comment is deleted including its EOL, and the second might be, but that is unclear because it is the last line. Please rephrase 'lines commented out' to be exact and disambiguate your examples.
One way to remove all comments is to use grep with the -o option:
- -o : prints only matched part of the line
- first ^ : beginning of the line
- [^#]* : any character except # repeated zero or more times
Note that empty lines will be removed too, but lines with only spaces will stay.
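A minimal sketch of a command matching that description (the wrapper name strip_with_grep and the sample input are made up for illustration):

```shell
# Hypothetical helper: print only the leading run of non-# characters of each line.
strip_with_grep() {
    grep -o '^[^#]*'
}

# Sample input: a trailing comment, a full-line comment, an empty line,
# a line of spaces, and a line without any comment.
printf '%s\n' 'echo hello # greet' '# full-line comment' '' '   ' 'ls -l' | strip_with_grep
```

Because grep -o prints nothing for an empty match, the full-line comment and the empty line both disappear, while the line of spaces is kept.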
It should be noted that this is NOT a general method for shell scripts; for example, the line somvar='I am a long complicated string ## with special characters' # and I am a comment will not be handled correctly.
How did this get 40 upvotes and become selected as the best answer? It doesn’t even handle the simple case print "#tag" # Print a hashtag.
I believe sed can do a much better job of this than grep . Something like this:
Explanation
- sed will by default look at your file line by line and print each line after possibly applying the transformations in the quotes. ( sed '' your_file will just print all the lines unchanged).
- Here we’re giving sed two commands to perform on each line (they’re separated by a semicolon).
- The first command says: /^[[:blank:]]*#/d . In English, that means if the line matches a hash at its beginning (preceded by any number of leading blanks), delete that line (it will not be printed).
- The second command is: s/#.*// . In English that is, substitute a hash mark followed by as many things as you can find (till the end of the line, that is) with nothing (nothing is the empty space between the final two // ).
- In summary, this will run through your file deleting lines that consist entirely of comments and any lines left after that will have the comments stricken out of them.
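The two commands explained above, put together as a small sketch (strip_comments is a hypothetical wrapper name and the sample lines are invented):

```shell
# Hypothetical helper: first delete comment-only lines, then cut trailing comments.
strip_comments() {
    sed '/^[[:blank:]]*#/d;s/#.*//' "$@"
}

printf '%s\n' '# header comment' 'x=1 # set x' '   # indented comment' 'echo done' | strip_comments
```

With no arguments the helper filters standard input; with a file name it reads that file. The blank that preceded a removed trailing comment stays in the output.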
It will also delete anything found after a hash inside a string, no? E.g. mystring="Hello I am a #hash" will become mystring="Hello I am a (the closing quote is removed as well).
@javadba, yes, but at that point you might as well use a full parser. What’s going to be using this data that can understand quotes and variable assignments but can’t handle comments? (This is why many config files such as crontab only allow full-line comments, with or without leading whitespace, but do not allow trailing comments on a line. The logic is MUCH simpler. Use only the first of the two Sed instructions in this answer for a crontab comment stripper.)
Great answer, this looks like a great balance of utility vs. complexity for a wide array of general use-cases, but in the case that you know ahead of time that you only need to delete lines starting directly with # (in column 1), is there any benefit to sed over grep -v "^#" ?
A small enhancement: sed '/^[[:blank:]]*#/d;s/[[:blank:]]*#.*//' your_file . Your original command leaves some trailing blanks in a line. My enhancement gets rid of those, too.
As others have pointed out, sed and other text-based tools won’t work well if any parts of a script look like comments but actually aren’t. For example, you could find a # inside a string, or the rather common $# and $ .
I wrote a shell formatter called shfmt, which has a feature to minify code. That includes removing comments, among other things:
$ cat foo.sh
echo $# # inline comment
# lone comment
echo '# this is not a comment'
[mvdan@carbon:12] [0] [/home/mvdan]
$ shfmt -mn foo.sh
echo $#
echo '# this is not a comment'
The parser and printer are Go packages, so if you’d like a custom solution, it should be fairly easy to write a 20-line Go program to remove comments in the exact way that you want.
Delete all comments in a file using sed
How would you delete all comments (defined with #) from a file using sed, while respecting a '#' that sits inside a string? This helped out a lot except for the string portion.
If you are talking about comments in a shell script, you need to worry about a lot more than strings. For example, there is no comment in echo foo#bar or echo $
If # always means comment, and can appear anywhere on a line (like after some code):
If you want to change it in place, add the -i switch:
This will delete from any # to the end of the line, ignoring any context. If you use # anywhere where it’s not a comment (like in a string), it will delete that too.
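A minimal sketch of that context-free deletion, assuming a hypothetical wrapper name strip_from_hash and invented sample lines; the second line shows the damage to a string:

```shell
# Hypothetical helper: delete from the first # to the end of every line.
strip_from_hash() {
    sed 's/#.*$//' "$@"
}

printf '%s\n' 'total=5 # running sum' 'msg="issue #42"' | strip_from_hash
```

For in-place editing, GNU sed takes -i with no argument, while BSD/macOS sed expects a backup suffix right after -i (an empty one, -i '', for no backup).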
If comments can only start at the beginning of a line, do something like this:
If they may be preceded by whitespace, but nothing else, do:
These two will be a little safer because they likely won’t delete valid usage of # in your code, such as in strings.
There’s not really a nice way of detecting whether something is in a string. I’d use the last two if that would satisfy the constraints of your language.
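The two line-anchored variants could look roughly like this; the helper names and the sample text are assumptions made for the sketch:

```shell
# Hypothetical helpers for the two line-anchored variants.
delete_hash_lines() {          # comment must start in column 1
    sed '/^#/d' "$@"
}
delete_indented_hash_lines() { # comment may follow leading whitespace
    sed '/^[[:space:]]*#/d' "$@"
}

sample=$(printf '%s\n' '# top comment' '    # indented comment' 'msg="issue #42"')
printf '%s\n' "$sample" | delete_hash_lines
printf '%s\n' "$sample" | delete_indented_hash_lines
```

Neither variant touches the # inside the string, because only whole lines are ever deleted.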
The problem with detecting whether you’re in a string is that regular expressions can’t do everything. There are a few problems:
- Strings can likely span lines
- A regular expression can’t tell the difference between apostrophes and single quotes
- A regular expression can’t match nested quotes (these cases will confuse the regex):
# "hello there"
# hello there"
"# hello there"
If double quotes are the only way strings are defined, double quotes will never appear in a comment, and strings cannot span multiple lines, try something like this:
That’s a lot of pre-conditions, but if they all hold, you’re in business. Otherwise, I’m afraid you’re SOL, and you’d be better off writing it in something like Python, where you can do more advanced logic.
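Under exactly those pre-conditions, one possible sketch is to treat a # as the start of a comment only when no double quote follows it on the line (strip_outside_double_quotes is a hypothetical name, the sample input is invented):

```shell
# Hypothetical helper: remove a # and everything after it, but only if the
# rest of the line contains no double quote.
strip_outside_double_quotes() {
    sed 's/#[^"]*$//' "$@"
}

printf '%s\n' 'name = "a # b" # trailing note' 'plain = 1 # note' | strip_outside_double_quotes
```

The # inside "a # b" is followed by a closing quote, so the match cannot start there and only the trailing comment is removed.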
Requiring a space before the # is a good idea. Do it in two steps, lines starting with # and lines containing (space#), like this: sed -e 's/^[ \t]*#[^!].*$//g' -e 's/[ \t]#.*$//g' . That will avoid most other script uses for #: echo "$ <#a>$# $ $(( 16#11 ))" .
This might work for you (GNU sed):
sed '/#/!b;s/^/\n/;ta;:a;s/\n$//;t;s/\n\(\("[^"]*"\)\|\('\''[^'\'']*'\''\)\)/\1\n/;ta;s/\n\([^#]\)/\1\n/;ta;s/\n.*//' file
- /#/!b if the line does not contain a # bail out
- s/^/\n/ insert a unique marker ( \n )
- ta;:a jump to a loop label (resets the substitute true/false flag)
- s/\n$//;t if marker at the end of the line, remove and bail out
- s/\n\(\("[^"]*"\)\|\('\''[^'\'']*'\''\)\)/\1\n/;ta if the string following the marker is a quoted one, bump the marker forward of it and loop.
- s/\n\([^#]\)/\1\n/;ta if the character following the marker is not a # , bump the marker forward of it and loop.
- s/\n.*// the remainder of the line is comment, remove the marker and the rest of line.
This code does not handle an escaped double quote inside a double-quoted string. Nor does it handle expansion commands like $
Since the asker provided no sample input, I will assume a couple of cases, and that the input file is a Bash script, because bash is a tag on the question.
Case 1: entire line is the comment
The following should be sufficient in most cases:
It matches any line that has zero or more leading whitespace characters (space, tab, or a few others, see man isspace ), followed by a # , and deletes the line with the d command.
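Lines like these will be deleted:
# comment started from beginning.
    # any number of white-space characters before
# or 'quote' in "here"
A line in which # appears only after other content, for example inside a quoted string, will not be deleted, which is the desired result.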
Case 2: comment after actual code
if [[ $foo == "#bar" ]]; then # comment here
The comment part can be removed by
[^\"'] is used to prevent confusion with quoted strings; however, it also means that comments containing a quotation mark ' or " will not be removed.
No, this specifically fails for the specific problem stated in the question; namely, it doesn’t leave quoted strings alone.
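A sketch covering both cases, written with POSIX character classes. The exact expressions are an assumption reconstructed from the description above, not necessarily the answer's own commands, and strip_bash_comments is a hypothetical name:

```shell
# Case 1: delete lines whose first non-whitespace character is #.
# Case 2: cut a trailing comment, but only when no quote character follows the #.
strip_bash_comments() {
    sed -e '/^[[:space:]]*#/d' -e "s/[[:space:]]*#[^\"']*\$//" "$@"
}

printf '%s\n' \
    'if [[ $foo == "#bar" ]]; then # comment here' \
    '   # whole-line comment' \
    "echo hi # it's a comment" | strip_bash_comments
```

The third sample line comes back unchanged, which illustrates the limitation: a comment that contains a quote character is not removed.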
Or the best, a complete file to test with expected output. We are not psychic, we can’t guess. In my answer, it does handle quoted strings.
@tripleee, I just realized you are not the asker. My code does leave # in a string or quoted string alone, at least in a single-line string. As the asker did not specify exactly what the string is, my answer stands correct.
@tripleee still, I want to see what quoted string you were talking about, please comment with a sample case.
To remove comment lines (lines whose first non-whitespace character is # ) but not shebang lines (lines whose first characters are #! ):
The first argument to sed is a string containing a sed program consisting of two delete-line commands of the form / regex /d . Commands are separated by ; . The first command deletes comment lines but not shebang lines. The second command deletes any remaining empty comment lines. It does not handle trailing comments.
The last argument to sed is a file to use as input. In Bash, you can also operate on a string variable like this:
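# test.sh
S0=$(cat << HERE
#!/usr/bin/env bash
# comment
  # indented comment
echo 'FOO' # trailing comment
# last line is an empty, indented comment
  #
HERE
)
printf "\nBEFORE removal:\n\n${S0}\n\n"
S1=$(sed '/^[[:space:]]*#[^!]/d; /#$/d' <<< "${S0}")
printf "\nAFTER removal:\n\n${S1}\n\n"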
$ bash test.sh

BEFORE removal:

#!/usr/bin/env bash
# comment
  # indented comment
echo 'FOO' # trailing comment
# last line is an empty, indented comment
  #

AFTER removal:

#!/usr/bin/env bash
echo 'FOO' # trailing comment
Supposing "being in a string" means "occurs between a pair of quotes, either single or double", the question can be rephrased as "remove everything after the first unquoted #". You can define the quoted strings, in turn, as anything between two quotes, excepting backslashed quotes. As a minor refinement, replace the entire line with everything up through just before the first unquoted #.
So we get something like [^\"'#] for the trivial case: a piece of string which is neither a comment sign, nor a backslash, nor an opening quote. Then we can accept a backslash followed by anything: \\. (that’s not a literal dot; that’s a literal backslash, followed by a dot metacharacter which matches any character).
Then we can allow zero or more repetitions of a quoted string. In order to accept either single or double quotes, allow zero or more of each. A quoted string shall be defined as an opening quote, followed by zero or more of either a backslashed arbitrary character, or any character except the closing quote: "\(\\.\|[^\"]\)*" or similarly for single-quoted strings '\(\\.\|[^\']\)*' .
Piecing all of this together, your sed script could look something like this:
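As an illustration of how those pieces can be combined, here is a sketch that is my own assembly, not the answer's original script. It assumes a sed that supports -E (extended regular expressions, so grouping and alternation need no backslashes) and reads the script from a quoted here-document so that no shell quoting is involved; comment_script and strip_unquoted_comments are hypothetical names.

```shell
# The sed script, read verbatim from a quoted here-document.
# Group 1 is the longest prefix made of: ordinary characters, backslash
# escapes, double-quoted strings, and single-quoted strings.
read -r comment_script <<'EOF'
s/^(([^\\"'#]|\\.|"([^\\"]|\\.)*"|'([^\\']|\\.)*')*)#.*/\1/
EOF

# Hypothetical helper: keep everything before the first unquoted #.
strip_unquoted_comments() {
    sed -E "$comment_script" "$@"
}

strip_unquoted_comments <<'EOF'
echo "a # b" 'c # d' # real comment
x="say \"hi # there\"" # note
no comment on this line
EOF
```

Lines with an unbalanced quote do not match and are passed through unchanged, and shell constructs such as $# are still mistaken for comments.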
But because it needs to be quoted, and both single and double quotes are included in the string, we need one more complication. Recall that the shell allows you to glue together adjacent strings: "foo"'bar' is replaced with foobar (foo in double quotes, and bar in single quotes). Thus you can include single quotes by putting them in double quotes adjacent to your single-quoted string: '"foo"'"'" is "foo" in single quotes next to ' in double quotes, thus "foo"' ; and "' can be expressed as '"' adjacent to "'" . And so a string containing both kinds of quotes, foo"'bar , can be quoted with 'foo"' adjacent to "'bar" or, perhaps more realistically for this case, 'foo"' adjacent to "'" adjacent to another single-quoted string 'bar' , yielding 'foo"'"'"'bar' .
This was tested on Linux; on other platforms, the sed dialect may be slightly different. For example, you may need to omit the backslashes before the grouping and alternation operators.
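The gluing rule is easy to check in isolation. A tiny sketch, with glued as a made-up variable name:

```shell
# 'foo"' is single-quoted, "'" is double-quoted, 'bar' is single-quoted;
# the shell concatenates the three adjacent pieces into a single word.
glued='foo"'"'"'bar'
printf '%s\n' "$glued"
```

The printed value is eight characters long: foo, a double quote, a single quote, then bar.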
Alas, if you may have multi-line quoted strings, this will not work; sed , by design, only examines one input line at a time. You could build a complex script which collects multiple lines into memory, but by then, switching to e.g. Perl starts to make a lot of sense.