Cut linux multiple delimiters

How to specify more spaces for the delimiter using cut?

Is there any way to specify a field delimiter of one or more spaces with the cut command (like " "+)? For example, in the following string I want to extract the value '3744'. What field delimiter should I specify?

$ ps axu | grep jboss
jboss     2574  0.0  0.0   3744  1092 ?        S    Aug17   0:00 /bin/sh /usr/java/jboss/bin/run.sh -c example.com -b 0.0.0.0

cut -d' ' is not what I want, because it only matches a single space. awk is not what I am looking for either. How can I do this with cut? Thanks.

Not directly relevant to the actual question being asked but instead of ps + grep you could use pgrep which is available in most modern distros. It will return the result exactly in the form you need it.

These days I just use hck as a drop in cut replacement. By default it splits on all whitespace, like awk. And the key feature is that you can specify a delimiter with -d like cut, but unlike cut that delimiter can be a regex! No more needing to pre-process with tr -s before passing to cut. You can find hck here: github.com/sstadick/hck


Actually awk is exactly the tool you should be looking into:
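A minimal sketch of that awk approach, using a sample line from the question in place of live ps output (the names sample and vsz_of_jboss are only illustrative):

```shell
# Sample ps line from the question; the runs of spaces are what defeat cut -d' '.
sample='jboss     2574  0.0  0.0   3744  1092 ?        S    Aug17   0:00 /bin/sh /usr/java/jboss/bin/run.sh -c example.com -b 0.0.0.0'

# awk splits on runs of blanks by default, so the fifth field is the wanted value.
vsz_of_jboss() {
    grep '[j]boss' | awk '{print $5}'
}

printf '%s\n' "$sample" | vsz_of_jboss
```

Against a live system the same filter would be fed from ps axu instead of printf.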

or you can ditch the grep altogether since awk knows about regular expressions:
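As a rough sketch of the awk-only variant, assuming the fifth column is still the one wanted; the two sample lines and the function name sample_ps are made up for illustration:

```shell
# Two simulated ps lines: the jboss process and the awk process that searches for it.
sample_ps() {
    printf '%s\n' \
        'jboss     2574  0.0  0.0   3744  1092 ?        S    Aug17   0:00 /bin/sh /usr/java/jboss/bin/run.sh' \
        'root      9001  0.0  0.0   4368   680 pts/0    S+   10:01   0:00 awk /[j]boss/ {print $5}'
}

# The /regex/ pattern selects lines, the action prints the fifth field.
sample_ps | awk '/[j]boss/ {print $5}'
```

Only the first line matches, because the awk command line contains the text [j]boss rather than jboss.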

But if, for some bizarre reason, you really can’t use awk , there are other simpler things you can do, like collapse all whitespace to a single space first:

ps axu | grep '[j]boss' | sed 's/\s\s*/ /g' | cut -d' ' -f5 

That grep trick, by the way, is a neat way to only get the jboss processes and not the grep jboss one (ditto for the awk variant as well).

The grep process will have a literal grep [j]boss in its process command so will not be caught by the grep itself, which is looking for the character class [j] followed by boss .
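The effect can be checked without a running jboss by feeding grep some simulated ps lines (the PIDs and the variable name real are invented for this sketch):

```shell
real='jboss  2574 /bin/sh /usr/java/jboss/bin/run.sh'

# Plain pattern: the grep process appears as "grep jboss" and matches itself.
printf '%s\n' "$real" 'me     3001 grep jboss' | grep -c 'jboss'

# Bracket pattern: the grep process appears as "grep [j]boss", which the regex does not match.
printf '%s\n' "$real" 'me     3002 grep [j]boss' | grep -c '[j]boss'
```

The first count is 2 and the second is 1.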

This is a nifty way to avoid the | grep xyz | grep -v grep paradigm that some people use.

I keep learning and forgetting the grep trick. Thanks for my most recent reminder. Maybe this time it’ll stick. But I wouldn’t bet on it.

@Michael, you should set up a cron job somewhere to mail that tip (and possibly others) to you once a month 🙂

Oliver, sometimes the best answer to "how do I do X with Y?" is "Don’t use Y, use Z instead". Since OP accepted this answer, it’s likely I convinced them of that 🙂

The awk version is probably the best way to go, but you can also use cut if you first squeeze the repeats with tr:

ps axu | grep 'jbos[s]' | tr -s ' ' | cut -d' ' -f5
#        ^^^^^^^^^^^^^^   ^^^^^^^^^   ^^^^^^^^^^^^^
#              |              |             |
#              |              |       get 5th field
#              |              |
#              |       squeeze spaces
#              |
# keep grep itself from appearing in the list

@fedorqui When it comes to printing the nth field through to the end, cut's "-fN-" syntax (for example cut -f5-) is much simpler than awk.


I like to use the tr -s command for this

ps aux | tr -s '[:blank:]' | cut -d' ' -f3

This squeezes all white spaces down to 1 space. This way telling cut to use a space as a delimiter is honored as expected.
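A small before/after sketch on a sample line from the question (the names sample and fifth_field are illustrative only):

```shell
sample='jboss     2574  0.0  0.0   3744  1092 ?'

# Squeeze runs of spaces to one, then let cut split on single spaces.
fifth_field() {
    tr -s ' ' | cut -d' ' -f5
}

# Without tr, field 5 is one of the empty strings between consecutive spaces.
printf '%s\n' "$sample" | cut -d' ' -f5

# With tr -s, field 5 is the value we are after.
printf '%s\n' "$sample" | fifth_field
```

One thing to watch: if a line starts with spaces (for example a right-aligned first column), tr -s leaves one leading space, so cut sees an empty first field and every index shifts by one.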

I think this should be the answer, it is closer to the OP request (asked to use cut). This approach is 5-10% slower than the awk approach (because there is one more pipe to handle with tr), but in general this will be irrelevant.

I am going to nominate tr -s [:blank:] as the best answer.

Why do we want to use cut? It has the magic syntax that says "we want the third field and every field after it, omitting the first two fields":

cat log | tr -s '[:blank:]' | cut -d' ' -f 3-

I do not believe there is an equally short command for awk or Perl's split when we do not know how many fields there will be, i.e. to output the 3rd field through field X.
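To illustrate the open-ended range on a made-up log line (the text and the name from_third are assumptions for this sketch):

```shell
line='Aug17   10:01   host    kernel:   something   happened'

# -f3- means "field 3 through the last field", however many fields there are.
from_third() {
    tr -s ' ' | cut -d' ' -f3-
}

printf '%s\n' "$line" | from_third
```

The kept fields come out separated by single spaces, since cut reuses the input delimiter on output.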

Shorter/simpler solution: use cuts (cut on steroids I wrote)

ps axu | grep '[j]boss' | cuts 4 

Note that cuts field indexes are zero-based so 5th field is specified as 4

And even shorter (not using cut at all) is:

One way around this is to go:

$ ps axu | grep jboss | sed 's/\s\+/ /g' | cut -d' ' -f3

to replace multiple consecutive spaces with a single one.

\s is a GNU sed extension. On OS X you can pass the -E flag to sed to enable extended regular expressions, then use [[:space:]] in place of \s, like so: sed -E 's/[[:space:]]+/ /g'
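A quick sketch of the [[:space:]] form on input that mixes tabs and spaces (the sample text and the name collapse_ws are invented for illustration):

```shell
# Collapse every run of whitespace (spaces or tabs) into a single space.
collapse_ws() {
    sed -E 's/[[:space:]]+/ /g'
}

# The format string contains a tab plus several spaces between the fields.
printf 'jboss \t 2574     3744\n' | collapse_ws | cut -d' ' -f3
```

Unlike tr -s ' ', this also turns tabs into spaces, so cut -d' ' sees one consistent delimiter.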

Personally, I tend to use awk for jobs like this. For example:

ps axu | grep jboss | grep -v grep | awk '{print $5}'

That can be compressed down to ps axu | awk '/[j]boss/ {print $5}'.

Isn’t awk slower (especially when there are some superfluous other processes) than sed / grep / cut?

As an alternative, there is always perl:

ps aux | perl -lane 'print $F[3]' 

Or, if you want to get all fields starting at field #3 (as stated in one of the answers above):

ps aux | perl -lane 'print "@F[3 .. $#F]"'

This does not work with the output of lsof. I tried lsof | perl -lane 'print $F[5]' and it sometimes gets the 5th column, sometimes the 6th.

I think the question just was how to use delimiters that might contain a varying number of spaces. For this purpose the answer was correct.

If you want to pick columns from a ps output, any reason to not use -o?

ps ax -o pid,vsz
ps ax -o pid,cmd

Minimum column width allocated, no padding, only single space field separator.

ps ax --no-headers -o pid:1,vsz:1,cmd
3443 24600 -bash
8419 0 [xfsalloc]
8420 0 [xfs_mru_cache]
8602 489316 /usr/sbin/apache2 -k start
12821 497240 /usr/sbin/apache2 -k start
12824 497132 /usr/sbin/apache2 -k start

Pid and vsz given 10 char width, 1 space field separator.

ps ax --no-headers -o pid:10,vsz:10,cmd
      3443      24600 -bash
      8419          0 [xfsalloc]
      8420          0 [xfs_mru_cache]
      8602     489316 /usr/sbin/apache2 -k start
     12821     497240 /usr/sbin/apache2 -k start
     12824     497132 /usr/sbin/apache2 -k start

oldpid=12824
echo "PID: ${oldpid}"
echo "Command: $(ps -ho cmd ${oldpid})"

Another way if you must use cut command

ps axu | grep '[j]boss' | awk '$1=$1' | cut -d' ' -f5

In Solaris, replace awk with nawk or /usr/xpg4/bin/awk
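Why this works: assigning a field to itself makes awk rebuild the line with its output separator, a single space by default. A small sketch on sample text (the name normalize is illustrative); it uses the { $1 = $1 } 1 spelling, because in the bare '$1=$1' form the assigned value also serves as the condition, so a line whose first field is 0 or empty is silently dropped:

```shell
# Rebuild each line with single spaces between fields, then always print it.
normalize() {
    awk '{ $1 = $1 } 1'
}

printf '%s\n' 'jboss     2574  0.0  0.0   3744' | normalize | cut -d' ' -f5

# The bare-pattern form prints nothing here, because the assigned value 0 is false.
printf '%s\n' '0     2574  0.0' | awk '$1=$1'

# The action form keeps the line.
printf '%s\n' '0     2574  0.0' | normalize
```

For ps output the first column is a user name, so the shorter form is usually fine there.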


I still like the way Perl handles fields with white space.
First field is $F[0].

$ ps axu | grep dbus | perl -lane 'print $F[4]' 

My approach is to store the PID to a file in /tmp, and to find the right process using the -S option for ssh . That might be a misuse but works for me.

#!/bin/bash
TARGET_REDIS=$    # (value elided)
PROXY="proxy.somewhere.com"
LOCAL_PORT=$      # (value elided)

if [ "$1" == "stop" ]; then
    kill `cat /tmp/sshTunel${LOCAL_PORT}-pid`
    exit
fi

set -x
ssh -f -i ~/.ssh/aws.pem centos@$PROXY -L $LOCAL_PORT:$TARGET_REDIS:6379 -N -S /tmp/sshTunel$LOCAL_PORT ## AWS DocService dev, DNS alias
# SSH_PID=$! ## Only works with &
# The PID is the second column of ps aux output
SSH_PID=`ps aux | grep sshTunel${LOCAL_PORT} | grep -v grep | awk '{print $2}'`
echo $SSH_PID > /tmp/sshTunel${LOCAL_PORT}-pid

Better approach might be to query for the SSH_PID right before killing it, since the file might be stale and it would kill a wrong process.


Shell linux cut command multiple space delimiter


In shell script, can I specify more than one output delimiter in cut?

cut supports a single delimiter at a time, but you can easily replace this with a simple sed or Awk script.

Coincidentally the cat is useless.

$ sed -r 's/.(.).(.)(.)(.)/\1|\2.\3|\4./' file.txt
bcd|klm.no|pq.rst

Two cuts (suggested by Gilez's comment to OP):

cut -c2-4,11-15,16-20 --output-delimiter='|' file.txt | \
cut -c1-7,8-13,14-15 --output-delimiter='.'

Given such regular input, tr can also "answer" this with no pipe:

Note: tr can’t insert, so it sounds impossible. Generally it is impossible, but the cheat is to include the answer in SET2 and squeeze out the rest.

Space as delimiter in cut command: to take the 2nd value (after a space character), use awk '{print $2}' or tr -s ' ' | cut -d ' ' -f 2

Cut command with delimiter Control-A

^A is character number 1 in the ASCII table, a.k.a. the Start of Heading character. If you’re using bash, you can write it with ANSI-C quoting, for example cut -f 2-8 -d $'\x01'.

Or use printf (can be builtin or an external binary):

CTRL_A=$(printf '\x01')
cut -f 2-8 -d "$CTRL_A"

You can also verify your output with hexdump :
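A sketch of such a check, using od (a POSIX tool) rather than hexdump and the octal escape \001, which every printf understands; the sample records and the name ctrl_a are made up:

```shell
# Build the delimiter once; \001 is Control-A in octal.
ctrl_a=$(printf '\001')

# Split a Control-A separated record and take the second field.
printf 'one\001two\001three\n' | cut -d "$ctrl_a" -f 2

# Dump the raw bytes as hex to confirm the delimiter really is byte 01.
printf 'a\001b\n' | od -An -t x1
```

In the hex dump the byte 01 sits between 61 (a) and 62 (b).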

I can’t really understand your question, but would suggest you use tr to change your Control-As into something else more workable and maybe then change them back when you are finished:
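One possible shape for that, assuming the pipe character does not occur in the data (the sample record and both function names are illustrative):

```shell
# Swap Control-A for a printable delimiter, and back again afterwards.
to_pipes()   { tr '\001' '|'; }
from_pipes() { tr '|' '\001'; }

# Work on the data with an ordinary delimiter.
printf 'one\001two\001three\n' | to_pipes | cut -d'|' -f2

# Round trip: fields 2-3 selected, then converted back to Control-A separators.
printf 'one\001two\001three\n' | to_pipes | cut -d'|' -f2-3 | from_pipes | od -An -c
```

Any character that is guaranteed not to appear in the data would do in place of the pipe.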

Using tab delimiter in cut in Unix shell scripting: the default field delimiter for cut is the tab character, so there’s no need to further specify this. If the delimiter is actually a space, use cut -d ' ' -f 1 input.txt. If it turns out that there are multiple tabs and/or spaces, use awk: awk '{ print $1 }' input.txt. The shell loop is not necessary for this operation, regardless …


Linux, How to using cut command with delimiters double quote?

 echo '"4.027.160921.1";' | cut -d'"' -f 2 

Shell — How do I split a string on a delimiter in Bash?, Split string based on delimiter in shell. If you can’t use bash, or if you want to write something that can be used in many different shells, you often can’t use bashisms— and this includes the arrays we’ve been using in the solutions above. However, we don’t need to use arrays to loop over «elements» of a string.

Unix — Need to cut a file which has multiple blanks as delimiter

The job of replacing multiple delimiters with just one is left to tr :

tr translates or deletes characters, and is perfectly suited to prepare your data for cut to work properly.

-s, --squeeze-repeats replace each sequence of a repeated character that is listed in the last specified SET, with a single occurrence of that character 
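When the blanks are a mix of tabs and spaces, giving tr two sets does both jobs at once: tabs are translated to spaces, and the squeeze then applies to the last set, i.e. the spaces. A sketch on invented input (the name blanks_to_space is illustrative):

```shell
# Translate tabs to spaces, then squeeze repeated spaces into one.
blanks_to_space() {
    tr -s '\t' ' '
}

# Fields separated by an arbitrary mix of tabs and spaces.
printf 'alpha \t  beta\t\tgamma\n' | blanks_to_space | cut -d' ' -f2
```

After that every field is separated by exactly one space, which is what cut -d' ' expects.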

It depends on the version or implementation of cut on your machine. Some versions support an option, usually -i , that means ‘ignore blank fields’ or, equivalently, allow multiple separators between fields. If that’s supported, use:

If not (and it is not universal — and maybe not even widespread, since neither GNU nor MacOS X have the option), then using awk is better and more portable.

You need to pipe the output of awk into your loop, though:

awk -F' ' '{ print $2 }' ${Directory}/test_file.txt |
while read readline
do
    read_int=`echo "$readline"`
    cnt_exc=`grep "$read_int" ${Directory}/file1.txt | wc -l`
    if [ $cnt_exc -gt 0 ]
    then
        int_1=0
    else
        int_2=0
    fi
done

The only residual issue is whether the while loop runs in a sub-shell and is therefore not modifying your main shell script's variables, just its own copy of those variables.

With bash, you can use process substitution:

while read readline
do
    read_int=`echo "$readline"`
    cnt_exc=`grep "$read_int" ${Directory}/file1.txt | wc -l`
    if [ $cnt_exc -gt 0 ]
    then
        int_1=0
    else
        int_2=0
    fi
done < <(awk -F' ' '{ print $2 }' ${Directory}/test_file.txt)

This leaves the while loop in the current shell, but arranges for the output of the command to appear as if from a file.

The blank in ${Directory } is not normally legal, unless it is another Bash feature I’ve missed out on; you also had a typo ( Directoty ) in one place.

Other ways of doing the same thing aside, the error in your program is this: You cannot redirect from ( < ) the output of another program. Turn your script around and use a pipe like this:

awk -F' ' '{ print $2 }' ${Directory}/test_file.txt | while read readline

Besides, the use of "readline" as a variable name may or may not get you into problems.

Cut: can we set multiple spaces as the delimiter? As others have stated, cut can’t do it alone (and awk is the best choice, because it’s the only tool required). If you still want to use cut, you can combine it with tr, however: tr -s ' '
