- How to remove the first word of a line in Bash — comparing awk vs. cut vs. sed
- Remove the first word of a line with awk
- Remove the first word of a line with sed
- Remove the first word of a line with cut
- awk vs. sed vs. cut: Who’s the winner?
- Remove first character of a string in Bash
- method 1) using bash
- method 2) using cut
- method 3) using sed
- method 4) using awk
How to remove the first word of a line in Bash — comparing awk vs. cut vs. sed
Many roads lead to Rome, and the same applies to text manipulation in Bash. This short article compares how awk, cut and sed can remove the first word of a line.
The line itself is an output from another command — but it doesn’t matter if the output comes from a file with content or from another command’s stdout. As I’m currently working on fixing issue 9 of check_netio, I was looking for a generic way to remove the first word of a line:
root@linux:~# cat /proc/net/netstat
TcpExt: SyncookiesSent SyncookiesRecv SyncookiesFailed EmbryonicRsts PruneCalled RcvPruned OfoPruned OutOfWindowIcmps LockDroppedIcmps ArpFilter TW TWRecycled TWKilled PAWSActive PAWSEstab DelayedACKs DelayedACKLocked DelayedACKLost ListenOverflows ListenDrops TCPHPHits TCPPureAcks TCPHPAcks TCPRenoRecovery TCPSackRecovery TCPSACKReneging TCPSACKReorder TCPRenoReorder TCPTSReorder TCPFullUndo TCPPartialUndo TCPDSACKUndo TCPLossUndo TCPLostRetransmit TCPRenoFailures TCPSackFailures TCPLossFailures TCPFastRetrans TCPSlowStartRetrans TCPTimeouts TCPLossProbes TCPLossProbeRecovery TCPRenoRecoveryFail TCPSackRecoveryFail TCPRcvCollapsed TCPDSACKOldSent TCPDSACKOfoSent TCPDSACKRecv TCPDSACKOfoRecv TCPAbortOnData TCPAbortOnClose TCPAbortOnMemory TCPAbortOnTimeout TCPAbortOnLinger TCPAbortFailed TCPMemoryPressures TCPMemoryPressuresChrono TCPSACKDiscard TCPDSACKIgnoredOld TCPDSACKIgnoredNoUndo TCPSpuriousRTOs TCPMD5NotFound TCPMD5Unexpected TCPMD5Failure TCPSackShifted TCPSackMerged TCPSackShiftFallback TCPBacklogDrop PFMemallocDrop TCPMinTTLDrop TCPDeferAcceptDrop IPReversePathFilter TCPTimeWaitOverflow TCPReqQFullDoCookies TCPReqQFullDrop TCPRetransFail TCPRcvCoalesce TCPOFOQueue TCPOFODrop TCPOFOMerge TCPChallengeACK TCPSYNChallenge TCPFastOpenActive TCPFastOpenActiveFail TCPFastOpenPassive TCPFastOpenPassiveFail TCPFastOpenListenOverflow TCPFastOpenCookieReqd TCPFastOpenBlackhole TCPSpuriousRtxHostQueues BusyPollRxPackets TCPAutoCorking TCPFromZeroWindowAdv TCPToZeroWindowAdv TCPWantZeroWindowAdv TCPSynRetrans TCPOrigDataSent TCPHystartTrainDetect TCPHystartTrainCwnd TCPHystartDelayDetect TCPHystartDelayCwnd TCPACKSkippedSynRecv TCPACKSkippedPAWS TCPACKSkippedSeq TCPACKSkippedFinWait2 TCPACKSkippedTimeWait TCPACKSkippedChallenge TCPWinProbe TCPKeepAlive TCPMTUPFail TCPMTUPSuccess TCPWqueueTooBig
TcpExt: 0 0 0 105 0 0 0 0 0 0 2202215 0 0 0 6 211917 2726 446896 0 3 5747904 26883443 3968155 0 9122 0 704 0 0 0 0 83 333 7 0 21 2 9728 192 1702 12938 3042 0 72 0 446953 9 7961 11 518150 10 0 88 0 0 0 0 0 11 5637 52 0 0 0 22379 10366 21027 0 0 0 0 0 0 0 0 0 858733 631 0 8 0 0 0 0 0 0 0 0 0 1 0 1343155 0 0 0 2199 62739777 3573 69078 88 3242 3 0 8 0 0 0 7 0 0 0 0
IpExt: InNoRoutes InTruncatedPkts InMcastPkts OutMcastPkts InBcastPkts OutBcastPkts InOctets OutOctets InMcastOctets OutMcastOctets InBcastOctets OutBcastOctets InCsumErrors InNoECTPkts InECT1Pkts InECT0Pkts InCEPkts ReasmOverlaps
IpExt: 0 0 21 8885283 227435 0 27399435386 27248510872 812 390952368 49094631 0 0 106478999 0 462174 0 0
Note that the lines start with an informational "TcpExt:" or "IpExt:". These need to be removed. Generally speaking: the first word of each line needs to be removed.
Remove the first word of a line with awk
When working with awk, the obvious approach is to print the fields manually and leave out the first field/word, such as:
root@linux:~# echo "first second third fourth fifth" | awk '{ print $2" "$3" "$4" "$5 }'
second third fourth fifth
But obviously this method only works if you know the exact number of words/columns in a line and you really like to type.
A better way is to use a for loop and tell awk where to start:
root@linux:~# echo "first second third fourth fifth" | awk '{ for (i=2; i<=NF; i++) printf "%s%s", $i, (i<NF ? " " : "\n") }'
second third fourth fifth
The for loop starts with the second field (i=2) and continues through all the fields until NF is reached. NF is a built-in awk variable holding the number of fields in the current line, so $NF is the last field (the word "fifth" in this case). I agree, it looks complicated, but it can be used generally across all kinds of files or output, no matter the length of a line.
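The loop can be wrapped in a small function to check it against lines of different lengths. This is only a minimal sketch: the function name drop_first_word_awk and the shortened sample lines are made up for illustration.

```shell
# Hypothetical helper: print every field except the first,
# separated by single spaces.
drop_first_word_awk() {
    awk '{
        for (i = 2; i <= NF; i++) {
            # NF is the number of fields; no separator after the last one.
            printf "%s%s", $i, (i < NF ? " " : "")
        }
        printf "\n"
    }'
}

# Shortened sample in the style of /proc/net/netstat (values are made up).
printf '%s\n' 'TcpExt: SyncookiesSent SyncookiesRecv' 'IpExt: 0 0 21' | drop_first_word_awk
```

Because awk splits on runs of whitespace and the loop joins the fields with a single space, the original spacing between words is not preserved.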
Remove the first word of a line with sed
sed is another powerful command which comes with more functions than anyone would think of. The problem: using these functions is sometimes pretty "weird" and complicated, depending on what one wants to achieve (well, awk is not much better in this regard). However, for this particular use case of removing the first word of a line, the sed command is pretty easy:
root@linux:~# echo "first second third fourth fifth" | sed "s/^[^ ]* //"
second third fourth fifth
Basically sed is told here to use a substitution (= search and replace) function and to look for "anything but a space" at the beginning of the line, followed by one space. The "anything but" is defined by a special bracket expression: [^ ]. From the sed documentation:
A bracket expression is a list of characters enclosed by ‘[’ and ‘]’. It matches any single character in that list; if the first character of the list is the caret ‘^’, then it matches any character not in the list.
This means the substitution is applied to everything up to and including the first space. In this case that is the first word at the beginning of the line, which is replaced with nothing.
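The bracket expression [^ ] only knows about the literal space character. If the input may be separated by tabs or start with blanks, a variant using POSIX character classes could look like the following sketch (the function name drop_first_word_sed is made up for illustration):

```shell
# Hypothetical helper: strip optional leading whitespace, the first word,
# and the whitespace that follows it.
drop_first_word_sed() {
    sed 's/^[[:space:]]*[^[:space:]]*[[:space:]]*//'
}

# Tab after the first word.
printf 'TcpExt:\t0 0 0 105\n' | drop_first_word_sed

# Leading blanks before the first word.
printf '  first second third\n' | drop_first_word_sed
```

Note one difference in behaviour: this variant turns a line consisting of a single word into an empty line, whereas "s/^[^ ]* //" leaves such a line unchanged because it requires a space to match.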
Remove the first word of a line with cut
Hearing the command's name "cut", one would think this is the obvious command to use. Simply cut the first word off, right? And yes, it basically is that simple. There are two ways to achieve this with cut:
root@linux:~# echo "first second third fourth fifth" | cut -d ' ' -f 2-
second third fourth fifth
In the above example, cut is told to use a space as field delimiter, -d ' ' (to separate the words), and to print fields 2 and later (-f 2-).
The other method is to "reverse" the cut command by saying it should print everything except the first field. This can be achieved by using the additional parameter --complement:
root@linux:~# echo "first second third fourth fifth" | cut -d ' ' -f 1 --complement
second third fourth fifth
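cut treats every single delimiter character as a field boundary and passes lines without a delimiter through unchanged, which matters for less regular input. The sample strings below are made up to show these edge cases; note also that --complement is a GNU extension and may not exist in other cut implementations.

```shell
# One space between words: works as expected.
printf 'first second third\n' | cut -d ' ' -f 2-

# Two spaces after the first word: the second field is empty,
# so the output keeps one leading space.
printf 'first  second third\n' | cut -d ' ' -f 2-

# A line without any space is printed unchanged ...
printf 'single\n' | cut -d ' ' -f 2-

# ... unless -s tells cut to skip lines that contain no delimiter.
printf 'single\n' | cut -s -d ' ' -f 2-
```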
awk vs. sed vs. cut: Who’s the winner?
That’s the nice part: Every command is a winner. The goal was achieved and every admin or developer should use the command one prefers. But if there’s a measurement to declare a winner, it’s the time factor.
On a Debian 9 (Stretch) system with a current system load of almost 0, the different commands were run alongside the time command.
ck@linux:~$ time echo "first second third fourth fifth" | awk '{ for (i=2; i<=NF; i++) printf "%s%s", $i, (i<NF ? " " : "\n") }'; \
> time echo "first second third fourth fifth" | sed "s/^[^ ]* //"; \
> time echo "first second third fourth fifth" | cut -d ' ' -f 2-; \
> time echo "first second third fourth fifth" | cut -d ' ' -f 1 --complement
second third fourth fifth
real 0m0.004s
user 0m0.000s
sys 0m0.000s
second third fourth fifth
real 0m0.005s
user 0m0.000s
sys 0m0.000s
second third fourth fifth
real 0m0.003s
user 0m0.000s
sys 0m0.000s
second third fourth fifth
real 0m0.003s
user 0m0.000s
sys 0m0.000s
The same commands were run ten times with a random sleep time in between. The measured real times (in seconds) result in the following table:
Run | awk | sed | cut | cut reverse |
1 | 0.004 | 0.005 | 0.003 | 0.003 |
2 | 0.004 | 0.005 | 0.003 | 0.002 |
3 | 0.004 | 0.005 | 0.004 | 0.002 |
4 | 0.004 | 0.005 | 0.004 | 0.003 |
5 | 0.004 | 0.005 | 0.004 | 0.003 |
6 | 0.004 | 0.005 | 0.004 | 0.003 |
7 | 0.004 | 0.005 | 0.003 | 0.002 |
8 | 0.004 | 0.005 | 0.003 | 0.002 |
9 | 0.004 | 0.004 | 0.003 | 0.003 |
10 | 0.004 | 0.005 | 0.004 | 0.003 |
Avg | 0.0040 | 0.0049 | 0.0035 | 0.0026 |
I'm actually quite surprised, but according to the command runtime the winner is clearly the "reversed" cut command! sed, on the other hand, is clearly the slowest command.
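With a single five-word line, most of the measured time is likely process startup. A rough sketch of how the same comparison could be repeated on a larger input follows; the line count of 20000 and the temporary file are arbitrary assumptions, and the awk program is just one possible form of the for loop described earlier.

```shell
#!/usr/bin/env bash
# Build a larger test input; the line count is an arbitrary choice.
input=$(mktemp)
awk 'BEGIN { for (n = 0; n < 20000; n++) print "first second third fourth fifth" }' > "$input"

# bash's time keyword reports on stderr; the command output is discarded.
time awk '{ for (i = 2; i <= NF; i++) printf "%s%s", $i, (i < NF ? " " : "\n") }' "$input" > /dev/null
time sed 's/^[^ ]* //' "$input" > /dev/null
time cut -d ' ' -f 2- "$input" > /dev/null
time cut -d ' ' -f 1 --complement "$input" > /dev/null   # --complement is a GNU extension

# Clean up the temporary file.
rm -f "$input"
```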
Comments (newest first)
Yassine Chaouche from Algiers wrote on Aug 25th, 2022:
But how to do it in pure bash?
ck from Switzerland wrote on Sep 4th, 2020:
That is correct, cut does not do any (regex) parsing. And the program itself is also much smaller (hence quicker startup):
claudio@nas:~$ du /usr/bin/cut
44 /usr/bin/cut
claudio@nas:~$ ls -la /usr/bin/awk
lrwxrwxrwx 1 root root 21 Sep 14 2018 /usr/bin/awk -> /etc/alternatives/awk
claudio@nas:~$ file /etc/alternatives/awk
/etc/alternatives/awk: symbolic link to /usr/bin/mawk
claudio@nas:~$ du /usr/bin/mawk
120 /usr/bin/mawk
claudio@nas:~$ du /bin/sed
104 /bin/sed
However comparing the commands with the output of /proc/net/netstat does not show a larger difference (the output of netstat is still small): 0.005s for awk, 0.005s for sed, 0.003 for both cut. But a much larger file would most likely show a larger time difference, agreed.
Michael Heiniger wrote on Sep 4th, 2020:
It’s actually not surprising that the cut command wins, it does less work. It just searches one well-defined character on each line and omits anything before the first one. It does not have to apply a regex for each character.
Also the startup time of the command has to be considered. There is not much parsing in cut, while in sed and awk it first needs to parse the command you pass.
It would have been a bit more representative if you piped in a copy of your netstat than just 5 strings.
© 2008-2023 by Claudio Kuenzler.
Remove first character of a string in Bash
Starting at character number 1 of myString (character 0 being the left-most character), return the remainder of the string. The double quotes allow for spaces in the string. For more information on that aspect, look at $IFS.
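The description matches bash's substring expansion ${myString:1}. A minimal sketch of it, using a made-up string with repeated spaces to show why the quotes matter when the result is used:

```shell
#!/usr/bin/env bash
myString="x  two  spaces"

# Substring expansion: keep everything from offset 1 onward.
rest="${myString:1}"

# Quoted, the runs of two spaces survive.
printf '[%s]\n' "$rest"

# Unquoted, word splitting on $IFS collapses them.
echo $rest
```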
This answer is fastest because it is pure bash. The others run an external process for every string. This is not important at all if you have only a handful of strings to process, but it can be important if you have a high number of strings and little work to do (on average, on each string).
This is also more general and may therefore be applied to many more cases. Should be the accepted answer.
while read line
do
    echo $line | cut -c2- | md5sum
done

./g.sh < directory_listing.txt
For those wondering what this syntax means: the -c2- argument can be read as "return characters (-c) from the second one to the end of the line (2-)". Other examples are -c2-5 for a range or -c3 for a single character. To cut counting from the end, use rev | cut .. | rev
remove first n characters from a line or string
method 1) using bash
method 2) using cut
method 3) using sed
method 4) using awk
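The examples for the four methods are not shown here. As a rough sketch of what each method could look like, assuming a variable str holding the string and n the number of characters to remove (both names are made up):

```shell
#!/usr/bin/env bash
str="abcdefgh"
n=3

# method 1) bash substring expansion
echo "${str:$n}"

# method 2) cut: print from character n+1 to the end of the line
echo "$str" | cut -c "$((n + 1))-"

# method 3) sed: delete the first n characters (BRE interval expression)
echo "$str" | sed "s/^.\{$n\}//"

# method 4) awk: substr() positions start at 1
echo "$str" | awk -v n="$n" '{ print substr($0, n + 1) }'
```

Only the first method avoids starting an external process.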
There is a very easy way to achieve this.
Suppose that we don't want the prefix "i-" from the variable:
$ ROLE_TAG=role
$ INSTANCE_ID=i-123456789
You just need to add '#'+[your_exclusion_pattern], e.g.:
$ echo $MYHOSTNAME
role-123456789
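The assignment that produces MYHOSTNAME is not shown. One form that is consistent with the printed result, assuming the prefix is removed with the ${var#pattern} expansion, would be:

```shell
ROLE_TAG=role
INSTANCE_ID=i-123456789

# ${var#pattern} removes the shortest match of pattern from the start of the value.
MYHOSTNAME="$ROLE_TAG-${INSTANCE_ID#i-}"

echo "$MYHOSTNAME"
```

If the value does not start with "i-", the expansion leaves it untouched.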
A different approach uses sed, which has the benefit that it can handle input that doesn't start with a dot. Also, you won't run into problems with echo appending a newline to the output, which would cause md5sum to report a bogus result.
#!/bin/bash
while read line
do
    echo -n $line | sed 's/^.//' | md5sum
done < input
$ echo "a" | md5sum
60b725f10c9c85c70d97880dfe8191b3 -
$ echo -n "a" | md5sum
0cc175b9c0f1b6a831c399e269772661 -
Testing on Ubuntu 18.04.4 LTS, bash 4.4.20:
$ lsb_release -a
No LSB modules are available.
Distributor ID: Ubuntu
Description: Ubuntu 18.04.4 LTS
Release: 18.04
Codename: bionic
$ echo $BASH_VERSION
4.4.20(1)-release
$ myString="./r/g4/f1.JPG"
$ myString="${myString:1}"
$ echo $myString
/r/g4/f1.JPG
Set the field separator to the path separator and read everything except the stuff before the first slash into $name :
while IFS=/ read junk name
do
    echo $name
done < directory_listing.txt
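To see what ends up in each variable, the loop can be tried on a couple of sample lines. The sketch below feeds made-up file names through a here-document instead of directory_listing.txt and adds -r and quoting as a precaution:

```shell
# Sample lines stand in for directory_listing.txt (file names are made up).
while IFS=/ read -r junk name
do
    # junk holds the text before the first slash; name holds the rest.
    echo "$name"
done <<'EOF'
./r/g4/f1.JPG
./r/g4/f2.JPG
EOF
```

Since the first slash acts as the separator, it is dropped along with the dot, so the first line comes out as r/g4/f1.JPG rather than /r/g4/f1.JPG.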
You can do the entire thing like this:
Really, I'm thinking you can make this a lot more efficient, but I don't know what is generating your list. If you can pipe it from that, or run that command through a heredoc to keep its output sane, you can do this whole job streamed, probably.
OK, you say it's from an "ls dump." Well, here's something a little flexible:
% ls_dump() < >sed 's@^.\(.*\)$@md5sum \1@' > `ls $` >> _EOF_ > > % ls_dump -all -args -you /would/normally/give/ls
I think this calls only a single subshell in total. It should be pretty good, but in my opinion, find . -exec md5sum {} + is probably safer, faster, and all you need.
OK, so now I will actually answer the question. To remove the first character of a string in any POSIX compatible shell you need only look to parameter expansion like:
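The example that belongs to this sentence is not shown. A minimal sketch of the parameter expansion it describes, assuming a variable named string (the name and value are only for illustration):

```shell
string="./r/g4/f1.JPG"

# "?" matches exactly one character, so "#?" strips the first character.
echo "${string#?}"
```

Unlike the ${string:1} form, this expansion is specified by POSIX and does not depend on bash.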