Linux shell get string

Extract substring in Bash

Given a filename in the form someletters_12345_moreleters.ext , I want to extract the 5 digits and put them into a variable. So to emphasize the point, I have a filename with x number of characters then a five digit sequence surrounded by a single underscore on either side then another set of x number of characters. I want to take the 5 digit number and put that into a variable. I am very interested in the number of different ways that this can be accomplished.

Most of the answers don’t seem to answer your question because the question is ambiguous. "I have a filename with x number of characters then a five digit sequence surrounded by a single underscore on either side then another set of x number of characters". By that definition abc_12345_def_67890_ghi_def is a valid input. What do you want to happen? Let’s assume there is only one 5 digit sequence. You still have abc_def_12345_ghi_jkl or 1234567_12345_1234567 or 12345d_12345_12345e as valid input based on your definition of input and most of the answers below will not handle this.

This question has an example input that’s too specific. Because of that, it got a lot of specific answers for this particular case (digits only, same _ delimiter, input that contains the target string only once etc.). The best (most generic and fastest) answer has, after 10 years, only 7 upvotes, while other limited answers have hundreds. Makes me lose faith in developers 😞

Clickbait title. The meaning of substring function is well established and means getting a part by numerical positions. All the other things (indexOf, regex) are about search. A 3-month older question that asks precisely about substring in bash was answered the same, but without "substring" in the title. Not misleading, but not properly named. Results: the answer about the built-in function in the most voted question is buried 5 screens down with activity sorting; the older and more precise question is marked duplicate. stackoverflow.com/questions/219402/…

Well I’ll note that I’m well aware of the efficiency and practicality of regexes in variable expansions, but I came here because I forgot how to get the ith-through-jth index substring of a bash string variable. And based on the upvote count on answers this is why most of us came here. It doesn’t really matter that the OP’s specific question turned out to more elegantly be answered by the regex implementation.

26 Answers

If x is constant (that is, the prefix always has the same length), the following parameter expansion performs substring extraction:

b=${a:12:5}

where 12 is the offset (zero-based) and 5 is the length.
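As a quick check of the offset arithmetic, here is a minimal sketch using the filename from the question. The variable names a and b are only illustrative. The prefix someletters is 11 characters long and the underscore sits at index 11, so the digits start at offset 12.

```bash
#!/usr/bin/env bash
# Filename taken from the question.
a='someletters_12345_moreleters.ext'

# Take 5 characters starting at zero-based offset 12.
b=${a:12:5}

echo "$b"   # prints 12345
```

This only works while the prefix length never changes; for a variable-length prefix one of the delimiter-based approaches is a better fit.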

If the underscores around the digits are the only ones in the input, you can strip off the prefix and suffix (respectively) in two steps:

tmp=${a#*_}   # remove prefix ending in "_"
b=${tmp%_*}   # remove suffix starting with "_"

If there are other underscores, it’s probably feasible anyway, albeit more tricky. If anyone knows how to perform both expansions in a single expression, I’d like to know too.
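For the case with extra underscores, one possible sketch is below. It assumes the digits are always the field between the first and second underscore; the input string is a made-up variation on the question's filename. The only change from the two-step approach is using %% (longest suffix match) instead of % in the second step.

```bash
#!/usr/bin/env bash
# Hypothetical input with an extra underscore after the digits.
a='someletters_12345_more_leters.ext'

tmp=${a#*_}     # shortest prefix match: drop everything up to the first "_"
b=${tmp%%_*}    # longest suffix match: drop from the first remaining "_" onward

echo "$b"   # prints 12345
```

If the extra underscores come before the digits instead, this assumption no longer holds and a regex match is probably the safer route.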


Both solutions presented are pure bash, with no process spawning involved, hence very fast.

@jonnyB, Some time in the past that worked. I am told by my coworkers it stopped, and they changed it to be a sed command or something. Looking at it in the history, I was running it in a sh script, which was probably dash. At this point I can’t get it to work anymore.

JB, you should clarify that "12" is the offset (zero-based) and "5" is the length. Also, +1 for @gontard ‘s link that lays it all out!

While running this inside a script as "sh run.sh", one might get a "Bad substitution" error, because sh is not necessarily bash. To avoid that, change permissions for run.sh (chmod +x run.sh) and then run the script as "./run.sh" (this assumes the script's shebang line points to bash).
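One way to make that failure mode more obvious is a guard near the top of the script. This is only a sketch: it relies on BASH_VERSION being set by bash and normally unset in other sh implementations, and the error message wording is made up.

```bash
#!/usr/bin/env bash
# BASH_VERSION is set by bash itself; other shells normally leave it unset.
if [ -z "${BASH_VERSION:-}" ]; then
    echo "this script needs bash; run it as ./run.sh or bash run.sh" >&2
    exit 1
fi

a='someletters_12345_moreleters.ext'
b=${a:12:5}   # bash substring expansion
echo "$b"
```

Running the file explicitly as bash run.sh sidesteps the question of which shell sh points to.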

The offset param can be negative too, BTW. You just have to take care not to glue it to the colon, or bash will interpret it as a :- “Use Default Values” substitution. So ${a: -12:5} yields the 5 characters 12 characters from the end, and ${a: -12:-5} the 7 characters between end-12 and end-5.
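A small sketch of the negative-offset forms, using a made-up 16-character string so the positions are easy to read off. The negative length form needs a reasonably recent bash (4.2 or later, as far as I know).

```bash
#!/usr/bin/env bash
s='0123456789abcdef'        # 16 characters, illustrative only

from_end=${s: -12:5}        # start 12 from the end, take 5 characters
between=${s: -12:-5}        # from end-12 up to (not including) end-5
default=${s:-12}            # no space: "Use Default Values", s is set so s is used

echo "$from_end"   # prints 45678
echo "$between"    # prints 456789a
echo "$default"    # prints 0123456789abcdef
```

Writing the offset in parentheses, as in ${s:(-12):5}, is another way to keep it away from the colon.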

echo 'someletters_12345_moreleters.ext' | cut -d'_' -f 2 
INPUT='someletters_12345_moreleters.ext'
SUBSTRING=$(echo $INPUT| cut -d'_' -f 2)
echo $SUBSTRING

You should properly use double quotes around the arguments to echo unless you know for sure that the variables cannot contain irregular whitespace or shell metacharacters. See further stackoverflow.com/questions/10067266/…
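Following that advice, a quoted variant of the cut approach might look like the sketch below. Using printf instead of echo is my own choice here, to avoid surprises with inputs that start with a dash or contain backslashes.

```bash
#!/usr/bin/env bash
INPUT='someletters_12345_moreleters.ext'

# Quote the variable so whitespace and glob characters pass through untouched.
SUBSTRING=$(printf '%s\n' "$INPUT" | cut -d '_' -f 2)

echo "$SUBSTRING"   # prints 12345
```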

just try to use cut -c startIndx-stopIndx

The problem is that the input is dynamic, since I also use a pipe to get it, so it’s basically: git log --oneline | head -1 | cut -c 9-(end -1)

This can be done with cut if you break it into two parts, as line=`git log --oneline | head -1` && echo $line | cut -c 9-$((${#line}-1)), but in this particular case it might be better to use sed, as git log --oneline | head -1 | sed -e 's/^[a-z0-9]* //g'
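To show the "compute the end position first" idea without needing a git repository, here is a sketch with a made-up log line standing in for the output of git log --oneline | head -1. It assumes a 7-character abbreviated hash followed by one space, so the subject starts at character 9.

```bash
#!/usr/bin/env bash
# Made-up stand-in for: git log --oneline | head -1
line='a1b2c3d Fix the thing!'

# cut counts characters from 1; stop one character before the end.
end=$(( ${#line} - 1 ))
subject=$(printf '%s\n' "$line" | cut -c "9-$end")

echo "$subject"   # prints: Fix the thing
```

Abbreviated hashes are not always 7 characters, which is one reason the sed variant that strips the leading hash by pattern is more robust.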

Generic solution where the number can be anywhere in the filename, using the first of such sequences:

number=$(echo "$filename" | egrep -o '[[:digit:]]{5}' | head -n1)
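A runnable sketch of the same idea, written with grep -E since newer GNU grep releases warn that egrep is obsolescent. The filename is one of the awkward inputs mentioned in the comments on the question; -o is a GNU/BSD extension rather than POSIX.

```bash
#!/usr/bin/env bash
filename='abc_def_12345_ghi_jkl.ext'

# -o prints each match on its own line; head keeps only the first one.
number=$(printf '%s\n' "$filename" | grep -E -o '[[:digit:]]{5}' | head -n1)

echo "$number"   # prints 12345
```

Note that this pattern is not anchored: a longer run such as 1234567 would yield its first five digits.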

Another solution to extract exactly a part of a variable:

number=${filename:offset:length}

If your filename always has the format stuff_digits_..., you can use awk:

number=$(echo "$filename" | awk -F _ '{ print $2 }')
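As a self-contained sketch of the awk variant, with the filename from the question filled in:

```bash
#!/usr/bin/env bash
filename='someletters_12345_moreleters.ext'

# -F _ splits each input line on underscores; $2 is the second field.
number=$(printf '%s\n' "$filename" | awk -F _ '{ print $2 }')

echo "$number"   # prints 12345
```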

Yet another solution: to remove everything except digits, use

number=$(echo "$filename" | tr -cd '[[:digit:]]') 
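One caveat worth seeing in action: tr keeps every digit in the name, not just the five-digit run. The sketch below uses a made-up filename with an extra digit in the prefix, and the plain '[:digit:]' class.

```bash
#!/usr/bin/env bash
# Hypothetical filename with a stray digit before the sequence we want.
filename='v2_someletters_12345.ext'

# -c complements the set, -d deletes: everything that is not a digit goes away.
number=$(printf '%s' "$filename" | tr -cd '[:digit:]')

echo "$number"   # prints 212345, not 12345
```

So this approach only fits when the five digits are the only digits in the name.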

My requirement was to remove a few characters at the end: fileName="filename_timelog.log"; number=${fileName:0:-12}; echo $number O/P: filename

echo $filename | is itself broken: it should be echo "$filename" | ... See "I just assigned a variable, but echo $variable shows something else!". Or, for a bash-only, more efficient approach (at least, more efficient if your TMPDIR is stored on tmpfs, as is conventional on modern distros), use a here-string: <<<"$filename" egrep -o '[[:digit:]]{5}' | head -n1

FN=someletters_12345_moreleters.ext
[[ ${FN} =~ _([[:digit:]]{5})_ ]] && NUM=${BASH_REMATCH[1]}

Regular Expressions (RE): _([[:digit:]]{5})_

  • _ are literals to demarcate/anchor matching boundaries for the string being matched
  • () create a capture group
  • [[:digit:]] is a character class, I think it speaks for itself
  • {5} means exactly five of the prior character, class (as in this example), or group must match

In English, you can think of it behaving like this: the FN string is iterated character by character until we see an _ , at which point the capture group is opened and we attempt to match five digits. If that matching is successful to this point, the capture group saves the five digits traversed. If the next character is an _ , the condition is successful, the capture group is made available in BASH_REMATCH , and the next NUM= statement can execute. If any part of the matching fails, saved details are disposed of and character by character processing continues after the _ . E.g. if FN were _1 _12 _123 _1234 _12345_ , there would be four false starts before it found a match.
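The behaviour described above can be sketched as a small function. The name extract_five_digits is made up for illustration; the pattern is the one from this answer.

```bash
#!/usr/bin/env bash
# Print the first five-digit run that has an underscore on both sides.
# Returns non-zero when the name contains no such run.
extract_five_digits() {
    local name=$1
    if [[ $name =~ _([[:digit:]]{5})_ ]]; then
        printf '%s\n' "${BASH_REMATCH[1]}"   # index 1 is the first capture group
    else
        return 1
    fi
}

NUM=$(extract_five_digits 'someletters_12345_moreleters.ext')
echo "$NUM"   # prints 12345
```

Because both underscores are part of the pattern, a name like abc_1234567_def does not match at all, which is usually what you want here.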

This is the most generic answer indeed, and should be the accepted one. It works for a regular expression, not just a string of characters at a fixed position, or between the same delimiter (which enables cut ). It also doesn’t rely on executing an external command.

This is great! I adapted this to use different start/stop delimiters (replace the _) and variable length numbers (. for {5}) for my situation. Can someone break down this black magic and explain it?

@UrsineRaven Personal preference. I generally prefer using POSIX class-names because I think they make regular expressions more readable.

In case someone wants more rigorous information, you can also search for it in man bash like this:

$ man bash [press return key]
/substring [press return key]
[press "n" key]
[press "n" key]
[press "n" key]
[press "n" key]

${parameter:offset}
${parameter:offset:length}
Substring Expansion. Expands to up to length characters of parameter starting at the character specified by offset. If length is omitted, expands to the substring of parameter starting at the character specified by offset. length and offset are arithmetic expressions (see ARITHMETIC EVALUATION below). If offset evaluates to a number less than zero, the value is used as an offset from the end of the value of parameter. Arithmetic expressions starting with a - must be separated by whitespace from the preceding : to be distinguished from the Use Default Values expansion. If length evaluates to a number less than zero, and parameter is not @ and not an indexed or associative array, it is interpreted as an offset from the end of the value of parameter rather than a number of characters, and the expansion is the characters between the two offsets. If parameter is @, the result is length positional parameters beginning at offset. If parameter is an indexed array name subscripted by @ or *, the result is the length members of the array beginning with ${parameter[offset]}. A negative offset is taken relative to one greater than the maximum index of the specified array. Substring expansion applied to an associative array produces undefined results. Note that a negative offset must be separated from the colon by at least one space to avoid being confused with the :- expansion. Substring indexing is zero-based unless the positional parameters are used, in which case the indexing starts at 1 by default. If offset is 0, and the positional parameters are used, $0 is prefixed to the list.
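The array and positional-parameter cases from the man page are easy to miss, so here is a short sketch of them. The values are made up, and the default IFS is assumed when the results are joined for printing.

```bash
#!/usr/bin/env bash
# Positional parameters: indexing starts at 1.
set -- a b c d
picked=("${@:2:2}")          # two parameters starting at $2

# Indexed array: indexing starts at 0.
arr=(zero one two three)
slice=("${arr[@]:1:2}")      # two elements starting at index 1
last=("${arr[@]: -1}")       # negative offset counts from the end (note the space)

echo "${picked[*]}"   # prints: b c
echo "${slice[*]}"    # prints: one two
echo "${last[*]}"     # prints: three
```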


Get string from file

In my Packages file I have multiple packages. I’m able to check whether a string is in the file, and if so, I would like to get the version of the file.

Package: depictiontest
Version: 1.0
Filename: ./debs/com.icr8zy.depictiontest.deb
Size: 810
Description: Do not install. Testing Depiction.
Name: Depiction Test

So the above is one of many similar-looking package records. Each time I detect that the package exists, I would like to get its Version. Is there any possible way? BTW, this is what I use to check if the file exists.

if grep -q "$filename" /location/Packages; then
    #file exists
    #get file version
fi

EDIT: Sorry but maybe i wasn't clear in explaining myself, I would already have the Name of the package and would like to extract the Version of that package only. I do not need a loop to get all the Names and Versions. Hope this clears it. 🙂

5 Answers

How do you extract the file name in the first place? Why not parse the whole file, then filter out nonexistent file names?

awk '/^Package:/ {p=$2} /^Version:/ {v=$2} /^Filename:/ {f=$2} /^$/ {print p, v, f}' Packages |
while read p v f; do
    test -e "$f" || continue
    echo "$p $v"
done

This is not robust with e.g. file names with spaces, but Packages files don't have file names with spaces anyway. (Your example filename is nonstandard, though; let's assume it's no worse than this.)

You want to make sure there's an empty line at the end of Packages, or force it with { sed '${/^$/d;}' Packages; echo; } | awk ...

Edit: This assumes a fairly well-formed Packages file, with an empty line between records. If a record lacks one of these fields, the output will repeat the value from the previous record - that's nasty. If there are multiple adjacent empty lines, it will output the same package twice. Etc. If you want robust parsing, I'd switch to Perl or Python, or use a standard Debian tool (I'm sure there must be one).
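For the single-package case described in the question's edit, a narrower awk sketch could look like this. The function name package_version, the temporary sample file, and the com.example.othertest record are all made up for illustration. It assumes the Version: line comes after the Package: line within each record, as in the example.

```bash
#!/usr/bin/env bash
# Build a small sample Packages file (second record is from the question).
packages_file=$(mktemp)
cat > "$packages_file" <<'EOF'
Package: othertest
Version: 2.3
Filename: ./debs/com.example.othertest.deb

Package: depictiontest
Version: 1.0
Filename: ./debs/com.icr8zy.depictiontest.deb
Size: 810
EOF

# Print the Version of the record whose Package field equals $1, read from file $2.
package_version() {
    awk -v pkg="$1" '
        $1 == "Package:" { wanted = ($2 == pkg) }           # entering a new record
        wanted && $1 == "Version:" { print $2; exit }       # first Version in that record
    ' "$2"
}

version=$(package_version depictiontest "$packages_file")
echo "$version"   # prints 1.0
rm -f "$packages_file"
```

Comparing the whole Package field, rather than grepping for the name anywhere in the file, avoids false hits when one package name is a substring of another.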
