Linux if string length

Unix / Linux — Shell String Operators Example

The following string operators are supported by Bourne Shell.

Assume variable a holds "abc" and variable b holds "efg"; then:

Operator | Description | Example
= | Checks whether the two operands are equal; if they are, the condition is true. | [ "$a" = "$b" ] is false.
!= | Checks whether the two operands differ; if they do, the condition is true. | [ "$a" != "$b" ] is true.
-z | Checks whether the string operand has zero length; if it does, the condition is true. | [ -z "$a" ] is false.
-n | Checks whether the string operand has non-zero length; if it does, the condition is true. | [ -n "$a" ] is true.
str | Checks whether str is not the empty string; if it is empty, the condition is false. | [ "$a" ] is true.

Example

Here is an example which uses all the string operators −

#!/bin/sh

a="abc"
b="efg"

# Operands are quoted so the tests still work when a variable is empty
if [ "$a" = "$b" ]
then
   echo "$a = $b : a is equal to b"
else
   echo "$a = $b: a is not equal to b"
fi

if [ "$a" != "$b" ]
then
   echo "$a != $b : a is not equal to b"
else
   echo "$a != $b: a is equal to b"
fi

if [ -z "$a" ]
then
   echo "-z $a : string length is zero"
else
   echo "-z $a : string length is not zero"
fi

if [ -n "$a" ]
then
   echo "-n $a : string length is not zero"
else
   echo "-n $a : string length is zero"
fi

if [ "$a" ]
then
   echo "$a : string is not empty"
else
   echo "$a : string is empty"
fi

The above script will generate the following result −

abc = efg: a is not equal to b
abc != efg : a is not equal to b
-z abc : string length is not zero
-n abc : string length is not zero
abc : string is not empty

The following points need to be considered while using the operator −

  • There must be spaces between the operators and the operands. For example, [ "$a"="$b" ] is not a comparison: the shell sees a single non-empty word, so the test is always true. It should be written as [ "$a" = "$b" ].
  • The if...then...else...fi statement is a decision-making statement, which is explained in the next chapter.
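
The effect of quoting is easiest to see with an empty variable. The following is a minimal sketch, not part of the original tutorial; the variable name empty is chosen only for this illustration.

```shell
#!/bin/sh
# Sketch: why the operands of [ ] should be quoted.
# "empty" is a hypothetical variable used only for this illustration.
empty=""

# Unquoted: $empty expands to nothing, so the test becomes [ -n ],
# which asks whether the literal word "-n" is non-empty. That is true.
if [ -n $empty ]; then unquoted="true"; else unquoted="false"; fi

# Quoted: the test becomes [ -n "" ], which is false, as intended.
if [ -n "$empty" ]; then quoted="true"; else quoted="false"; fi

echo "unquoted: $unquoted"
echo "quoted:   $quoted"
```

With the quotes in place, -n and -z give opposite answers for the same string, which is what the table describes.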


Verify the length of a variable

I have to verify the length of a variable that was read (my script limits the input to five characters). I'm thinking of something like this:

#!/bin/bash
read string
check=${#string}
echo $check
if [ $check -ge 5 ]; then
    echo "error" ; exit
else
    echo "done"
fi

Your script works correctly in standard POSIX /bin/sh . You should consider changing the shebang line to #!/bin/sh so that it will be more portable and run in environments where bash isn’t available. Plus, /bin/sh might be a more lightweight shell like dash which isn’t burdened with features meant for interactive use.

@Celada, true, though in this case dash's ${#string} would give you the length in number of bytes instead of characters.


#!/bin/bash
read string
if [ ${#string} -ge 5 ]; then
    echo "error" ; exit
else
    echo "done"
fi

And if you don't mind trading some elegance for brevity, you can have a script that is two lines shorter:

#!/bin/bash
read string
[ ${#string} -ge 5 ] && echo "error" || echo "done"

You could use double brackets if you think it is safer. Explanation here.
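
One caveat with the short form: A && B || C is not a strict if/else, because C also runs when B itself fails. A minimal sketch of the same check wrapped in a function with an explicit if follows; check_length and its second parameter are hypothetical names, not from the answers above.

```shell
#!/bin/sh
# Sketch: the length check as a reusable function with an explicit if.
# check_length is a hypothetical name; $2 is the first length that is rejected.
check_length() {
    # ${#1} is the length of the first argument
    if [ "${#1}" -ge "$2" ]; then
        echo "error"
        return 1
    fi
    echo "done"
}

check_length "abcd" 5                       # prints "done"
check_length "abcdef" 5 || echo "rejected"  # prints "error", then "rejected"
```

Returning a status lets the caller decide whether to exit, instead of burying exit inside the check.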

A Bourne-compatible alternative (${#string} is POSIX but not Bourne, not that you're likely to ever come across a Bourne shell these days):

case $string in
  ?????*) echo >&2 Too long; exit 1;;
  *) echo OK
esac

Note that for both ${#string} and ?, whether the unit is bytes or characters depends on the shell. Generally (and it's required by POSIX), it is the number of characters. But for some shells like dash that are not multi-byte aware, it will be bytes instead.

With mksh, you need set -o utf8-mode (in UTF-8 locales) for it to understand multi-byte characters:

$ string=€€€ bash -c 'echo "${#string}"'
3
$ string=€€€ dash -c 'echo "${#string}"'
9
$ string=€€€ mksh -c 'echo "${#string}"'
9
$ string=€€€ mksh -o utf8-mode -c 'echo "${#string}"'
3
$ locale charmap
UTF-8


Length of string in bash

To get the length of a string stored in a variable, say:

size=${#myvar}

To confirm it was properly saved, echo it:

echo "$size"

You can also use it directly in other parameter expansions. For example, in this test I check that $rulename starts with the $RULE_PREFIX prefix: [ "${rulename:0:${#RULE_PREFIX}}" == "$RULE_PREFIX" ]

@lerneradams see Bash reference manual → 3.5.3 Shell Parameter Expansion on ${#parameter}: The length in characters of the expanded value of parameter is substituted.

Edit 2023-02-13: Use of printf %n instead of locales.

UTF-8 string length

In addition to fedorqui’s correct answer, I would like to show the difference between string length and byte length:

myvar='Généralités'
chrlen=${#myvar}
oLang=$LANG oLcAll=$LC_ALL
LANG=C LC_ALL=C
bytlen=${#myvar}
LANG=$oLang LC_ALL=$oLcAll
printf "%s is %d char len, but %d bytes len.\n" "${myvar}" $chrlen $bytlen
Généralités is 11 char len, but 14 bytes len. 

You could even have a look at the stored chars:

myvar='Généralités'
chrlen=${#myvar}
oLang=$LANG oLcAll=$LC_ALL
LANG=C LC_ALL=C
bytlen=${#myvar}
printf -v myreal "%q" "$myvar"
LANG=$oLang LC_ALL=$oLcAll
printf "%s has %d chars, %d bytes: (%s).\n" "${myvar}" $chrlen $bytlen "$myreal"
Généralités has 11 chars, 14 bytes: ($'G\303\251n\303\251ralit\303\251s'). 

Note: following Isabell Cowan's comment, I now set $LC_ALL along with $LANG.

Same, but without having to play with locales

I recently learned about the %n format of the printf builtin:

myvar='Généralités'
chrlen=${#myvar}
printf -v _ %s%n "$myvar" bytlen
printf "%s is %d char len, but %d bytes len.\n" "${myvar}" $chrlen $bytlen
Généralités is 11 char len, but 14 bytes len.

The syntax is a little counter-intuitive, but this is very efficient! (The strU8DiffLen function further down is about twice as fast with printf as the previous version, which used local LANG=C.)

Length of an argument, working sample

Arguments work the same as regular variables:

showStrLen() {
    local -i chrlen=${#1} bytlen
    printf -v _ %s%n "$1" bytlen
    # oLang and oLcAll are the values saved by the earlier snippets
    LANG=$oLang LC_ALL=$oLcAll
    printf "String '%s' is %d bytes, but %d chars len: %q.\n" "$1" $bytlen $chrlen "$1"
}

showStrLen théorème
String 'théorème' is 10 bytes, but 8 chars len: $'th\303\251or\303\250me'

Useful printf correction tool:

for string in Généralités Language Théorème Février "Left: ←" "Yin Yang ☯";do
    printf " - %-14s is %2d char length\n" "'$string'" ${#string}
done
 - 'Généralités' is 11 char length
 - 'Language' is 8 char length
 - 'Théorème' is 8 char length
 - 'Février' is 7 char length
 - 'Left: ←' is 7 char length
 - 'Yin Yang ☯' is 10 char length

For this, here is a little function:
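
The function body is not shown here. A minimal sketch of what it could look like follows, inferred from how the loop that follows uses $? as the difference between byte count and character count, and from the printf %n technique described earlier. Treat it as an assumed reconstruction rather than the author's exact code; it needs bash, since printf -v is a bash feature.

```shell
#!/bin/bash
# Sketch (assumed reconstruction): return, as the exit status,
# the number of bytes minus the number of characters in $1.
strU8DiffLen() {
    local -i bytlen
    # -v _ stores the formatted text in the throwaway variable _ instead of
    # printing it; %n then stores the number of bytes produced so far in bytlen.
    printf -v _ %s%n "$1" bytlen
    # ${#1} is the length in characters under the current locale.
    return $(( bytlen - ${#1} ))
}

strU8DiffLen "Language"
echo "difference: $?"
```

Because the result travels in the exit status, no subshell is forked, but the difference is limited to the range 0 to 255.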

for string in Généralités Language Théorème Février "Left: ←" "Yin Yang ☯";do
    strU8DiffLen "$string"
    printf " - %-$((14+$?))s is %2d chars length, but uses %2d bytes\n" \
        "'$string'" ${#string} $((${#string}+$?))
done
 - 'Généralités' is 11 chars length, but uses 14 bytes
 - 'Language' is 8 chars length, but uses 8 bytes
 - 'Théorème' is 8 chars length, but uses 10 bytes
 - 'Février' is 7 chars length, but uses 8 bytes
 - 'Left: ←' is 7 chars length, but uses 9 bytes
 - 'Yin Yang ☯' is 10 chars length, but uses 12 bytes

Unfortunately, this is not perfect!

Some odd UTF-8 behaviour remains, such as double-width characters, zero-width characters, reverse displacement and others, which cannot be handled as simply.


Have a look at diffU8test.sh or diffU8test.sh.txt for more limitations.

@F.Hauri But it nonetheless follows that on some systems your solution will not work, because it leaves LC_ALL alone. It might work fine on default installs of Debian and its derivatives, but on others (like Arch Linux) it will fail to give the correct byte length of the string.

@F8ER In order to prevent forks. For example: replacing return with echo, adding OFF=$(strU8DiffLen ...) and replacing ? with OFF in the last sample takes 10 ms on my host, whereas the published version does the job in 1 ms (10x faster!).

I wanted the simplest case; in the end, this is the result:

echo -n 'Tell me the length of this sentence.' | wc -m
36

Sorry mate 🙁 This is bash... the cursed hammer that sees everything as a nail, particularly your thumb. 'Tell me the length of this sentence.' contains 36 characters. echo '' | wc -m => 1. You'd need to use -n: echo -n '' | wc -m => 0, in which case it's a good solution 🙂

MYSTRING="abc123"
MYLENGTH=$(printf "%s" "$MYSTRING" | wc -c)

  • wc -c or wc --bytes gives a byte count: a non-ASCII Unicode character counts as 2, 3 or more bytes.
  • wc -m or wc --chars gives a character count: each Unicode character counts as one, however many bytes it occupies.

This handles something like mylen=$(printf "%s" "$HOME/.ssh" | wc -c), whereas the accepted solution fails unless you first assign myvar=$HOME/.ssh.

This isn't any better than ${#var}. You still need LC_ALL / LANG set to a UTF-8 locale, otherwise -m will return the byte count.
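
The wc approach can be wrapped in two small helpers. This is a minimal sketch; byte_len and char_len are hypothetical names, and the arithmetic expansion is there only to strip the leading padding that some wc implementations print.

```shell
#!/bin/sh
# Sketch: byte and character counts via wc, without a trailing newline.
# byte_len and char_len are hypothetical helper names.
byte_len() {
    # printf '%s' emits the string with no trailing newline;
    # $(( ... )) normalizes output such as "       6" to "6".
    echo "$(( $(printf '%s' "$1" | wc -c) ))"
}

char_len() {
    # wc -m counts characters according to the current locale,
    # so in a non-UTF-8 locale it degrades to a byte count.
    echo "$(( $(printf '%s' "$1" | wc -m) ))"
}

byte_len "abc123"        # prints 6
char_len "pure ENGLISH"  # prints 12
```

Each call forks a subshell and a wc process, so this is slower than ${#var}; its advantage is that it accepts any expression directly.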

In response to the post starting:

If you want to use this with command line or function arguments.

There might be the case where you just want to check for a zero length argument and have no need to store a variable. I believe you can use this sort of syntax:

if [ -z "$1" ]; then
    : # zero-length argument
else
    : # non-zero length
fi

See GNU and wooledge for a more complete list of Bash conditional expressions.

If you want to use this with command line or function arguments, make sure you use size=${#1} instead of size=${#$1}. The second one may be more instinctual but is incorrect syntax.

Part of the problem with "you can't do ..." is that, that syntax being invalid, it's unclear what a reader should interpret it to mean. size=${#1} is certainly valid.


It isn’t. # isn’t replacing the $ — the $ outside the braces is still the expansion operator. The # is the length operator, as always.

I’ve fixed this answer since it is a useful tip but not an exception to the rule — it follows the rule exactly, as pointed out by @CharlesDuffy
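
A minimal sketch of the tip applied to function arguments follows; first_len and arg_lengths are hypothetical names used only for illustration.

```shell
#!/bin/sh
# Sketch: the length operator applied to positional parameters.
# first_len and arg_lengths are hypothetical function names.
first_len() {
    # ${#1}: the $ starts the expansion, # asks for the length, 1 names the parameter
    echo "${#1}"
}

arg_lengths() {
    for arg in "$@"; do
        # a named variable works exactly like a positional parameter
        echo "${#arg}"
    done
}

first_len "hello"          # prints 5
arg_lengths "ab" "" "xyz"  # prints 2, 0 and 3 on separate lines
```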

Using your example provided

#KISS (Keep it simple stupid)
size=${#myvar}
echo $size

@Angel The question was about setting a variable to the output of the length command, and this question answers that.

Here are a couple of ways to calculate the length of a variable:

echo ${#VAR}
echo -n "$VAR" | wc -m
echo -n "$VAR" | wc -c
printf '%s' "$VAR" | wc -m
expr length "$VAR"
expr "$VAR" : '.*'

To store the result in another variable, wrap any of the commands above in backquotes and assign it:

otherVar=`echo -n "$VAR" | wc -m`
echo $otherVar

I know that the question and answers are old, but today I faced this task for the first time. I usually used the ${#var} form, but it fails with Unicode: most of the text I process with bash is in Cyrillic. Based on @atesin's answer, I wrote a short function (which could be shortened further) that may be useful for scripting. The task that led me to this question was showing a message of variable length in a pseudo-graphics box. So, here it is:

$ cat draw_border.sh
#!/bin/sh
#based on https://stackoverflow.com/questions/17368067/length-of-string-in-bash
border()
{
local BPAR="$1"
local BPLEN=`echo $BPAR|wc -m`
local OUTLINE=\|\ "$1"\ \|
# line below based on https://www.cyberciti.biz/faq/repeat-a-character-in-bash-script-under-linux-unix/
# comment of Bit Twiddler Jun 5, 2021 @ 8:47
# Assumption: the /dev/zero source, the tr mapping and the three echo commands
# are a completion chosen to match the output shown.
local OUTBORDER=\+`head -c $(($BPLEN+1)) /dev/zero | tr '\0' '-'`\+
echo "$OUTBORDER"
echo "$OUTLINE"
echo "$OUTBORDER"
}
border "Généralités"
border 'А вот еще одна '$LESSCLOSE' '
border "pure ENGLISH"

And what this sample produces:

$ draw_border.sh
+-------------+
| Généralités |
+-------------+
+----------------------------------+
| А вот еще одна /usr/bin/lesspipe |
+----------------------------------+
+--------------+
| pure ENGLISH |
+--------------+

The first example (in French?) was taken from someone's example above. The second one combines Cyrillic and the value of some variable. The third one is self-explanatory: only characters from the first half of the ASCII table.

I used echo $BPAR|wc -m instead of printf ... so as not to depend on whether printf is built in.

Above I saw discussion of the trailing newline and the -n parameter for echo. I did not use it, so I add only one to $BPLEN. Had I used -n, I would have to add 2.

To explain the difference between wc -m and wc -c , see the same script with only one minor change: -m was replaced with -c

$ draw_border.sh
+----------------+
| Généralités |
+----------------+
+---------------------------------------------+
| А вот еще одна /usr/bin/lesspipe |
+---------------------------------------------+
+--------------+
| pure ENGLISH |
+--------------+

Accented Latin characters and most Cyrillic characters take two bytes, so the drawn horizontals are longer than the real length of the message. I hope this saves someone some time 🙂

p.s. Russian text says «here is one more»

#!/bin/sh
#based on https://stackoverflow.com/questions/17368067/length-of-string-in-bash
border()
{
# line below based on https://www.cyberciti.biz/faq/repeat-a-character-in-bash-script-under-linux-unix/
# comment of Bit Twiddler Jun 5, 2021 @ 8:47
# Assumption: the /dev/zero source, the tr mapping and the three echo commands
# are a completion chosen to match the output shown earlier.
local OUTBORDER=\+`head -c $(( $(echo "$1"|wc -m) +1)) /dev/zero | tr '\0' '-'`\+
echo "$OUTBORDER"
echo \| "$1" \|
echo "$OUTBORDER"
}
border "Généralités"
border 'А вот еще одна '$LESSCLOSE' '
border "pure ENGLISH"

So as not to clutter the code with repeated drawing of OUTBORDER, I moved the construction of OUTBORDER into a separate command.
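
A variant that avoids both /dev/zero and the newline correction can be sketched as follows. border_box is a hypothetical name, and the sketch assumes a printf that accepts the %*s width form, as the bash and dash builtins do.

```shell
#!/bin/sh
# Sketch: the same box, with the border built from printf padding.
# border_box is a hypothetical name; wc -m counts characters in the current locale.
border_box() {
    # printf '%s' adds no newline, so no off-by-one correction is needed
    len=$(( $(printf '%s' "$1" | wc -m) ))
    # %*s pads an empty string to len+2 spaces; tr turns them into dashes
    dashes=$(printf '%*s' "$((len + 2))" '' | tr ' ' '-')
    printf '+%s+\n| %s |\n+%s+\n' "$dashes" "$1" "$dashes"
}

border_box "pure ENGLISH"
```

As with the original, the border only lines up for multi-byte text when the locale is UTF-8, since that is what makes wc -m count characters rather than bytes.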
