Linux sed: digits only

sed extracting group of digits

How can I match a number when I don’t know in advance how many digits it has? For example, it could be 2344 instead of 65.

6 Answers

$ echo "This is an example: 65 apples" | sed -r 's/^[^0-9]*([0-9]+).*/\1/'
65

+1, but beware that not every sed supports -r; without it you cannot use the ‘+’ quantifier and must escape the parentheses.
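A minimal sketch of that caveat, assuming only POSIX basic regular expressions are available: the `+` quantifier is spelled out as `[0-9][0-9]*` and the group parentheses are escaped. The function name `first_number_bre` is hypothetical and not from the thread.

```shell
# Hypothetical helper: print the first run of digits on each input line,
# using only POSIX BRE syntax (no -r / -E, no \+).
first_number_bre() {
    sed 's/^[^0-9]*\([0-9][0-9]*\).*/\1/'
}

printf '%s\n' 'This is an example: 65 apples' | first_number_bre
```

The same function handles a longer number such as 2344 without changes, since `[0-9][0-9]*` matches one or more digits.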

Why doesn’t a regex like ([0-9]*) apple (sprunge.us/feGV) work in sed? It works just fine in Python.

So ^[^0-9]* corresponds to every non-digit at the start of the line, and [0-9]+ to one or more digits, right?

@AbhijeetRastogi: Since we are using substitution we need to account for the entire line. Any part of the line not accounted for will be part of the output. This won’t be the case if you are using pattern search (not substitution) as in your Python case.
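A small sketch of this point, assuming a sed that accepts `-E` (used here in place of `-r`; GNU sed accepts both). The variable names are illustrative only.

```shell
line='This is an example: 65 apples'

# Only "65 apples" is matched, so the unmatched prefix survives in the output.
partial=$(printf '%s\n' "$line" | sed -E 's/([0-9]+) apples/\1/')

# The whole line is matched, so only the captured digits remain.
whole=$(printf '%s\n' "$line" | sed -E 's/^[^0-9]*([0-9]+).*/\1/')

printf '[%s]\n' "$partial" "$whole"
```

The first substitution leaves `This is an example: 65`, the second leaves just `65`.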

It’s because your first .* is greedy, and your [0-9]* allows 0 or more digits. Hence the .* gobbles up as much as it can (including the digits) and the [0-9]* matches nothing.

echo "This is an example: 65 apples" | sed -n 's/.*\b\([0-9]\+\) apples/\1/p'

where I forced the [0-9] to match at least one digit, and also added a word boundary before the digits so the whole number is matched.
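The effect of the greedy .* can be seen side by side in the sketch below. It assumes a sed with `-E`, and it swaps the GNU-specific `\b` for a required non-digit (`[^0-9]`), which is a different but portable way to stop the .* before the number. The variable names are illustrative.

```shell
line='This is an example: 65 apples'

# Zero-or-more capture: the greedy .* takes the digits, the group is empty.
star=$(printf '%s\n' "$line" | sed -E 's/.*([0-9]*) apples/\1/')

# One-or-more capture, no boundary: .* leaves only the last digit.
plus=$(printf '%s\n' "$line" | sed -E 's/.*([0-9]+) apples/\1/')

# A required non-digit before the group stops .* ahead of the whole number.
anchored=$(printf '%s\n' "$line" | sed -E 's/.*[^0-9]([0-9]+) apples/\1/')

printf 'star=[%s] plus=[%s] anchored=[%s]\n' "$star" "$plus" "$anchored"
```

This prints `star=[] plus=[5] anchored=[65]`.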

However, it’s easier to use grep , where you match just the number:

echo "This is an example: 65 apples" | grep -P -o '[0-9]+(?= +apples)'

The -P means «perl regex» (so I don’t have to worry about escaping the ‘+’).

The -o means «only print the matches».

The (?= +apples) means match the digits followed by the word apples.


sed: delete digits in the middle of a word

Actually, it can. Even without -r. But then you have to escape it.

What PCRE? It is ERE. sed _without -r_ does support \+, but non-portably (it is not BRE, and --posix will disallow it).

If you need +, it is better to use -r; the reason is explained above.

GNU sed by itself (without --posix) is non-portable. I once tried to explain to a Mac user why BSD sed would not work for him without a pile of workarounds, and offered two options: a proper one with GNU sed -r, and a more or less universal one with GNU sed without -r and a pile of backslashes. In the end he rejected both my options and those of the other sed gurus, and preferred to use a Python fan's wall of code.


GNU sed by itself (without --posix) is non-portable. I once tried to explain to a Mac user,

GNU sed is of course non-portable, but 95% of its features do NOT work with --posix. Better to use Python then.

No, he was told that he needed the GNU one, and he even seems to have installed it. But I am not sure how and from where he invoked it, because he was embedding it in something like a makefile in his Xcode.

Deleted (14.05.13 12:43:34 MSK)
Last edited: fargred 14.05.13 12:44:51 MSK (1 edit in total)

I meant potential problems with _ and non-Latin letters.

I only now noticed that the OP's PCRE is beside the point.

GNU sed by itself (without --posix) is non-portable

Who is arguing with that? The point was that specifying -r explicitly is better than \+. Or do you think it is better to write /bin/sh in a script full of bashisms? :)

[:alpha:] was invented for you, and yet you come in with \B, which does not fit the requirement.

I have no idea how sed behaves on Macs.

Damn, that is not why it is wrong. \B exists and works in sed

$ echo "aaa123bbb" | sed 's/\B/!!/g'
a!!a!!a!!1!!2!!3!!b!!b!!b

What were you trying to achieve with it, anyway?

$ echo "aaa123bbb z123 z123x ф666Ы БСЛ10" | sed -r 's/([[:alpha:]])([[:digit:]]+)([[:alpha:]])/\1\3/g'
aaabbb z123 zx фЫ БСЛ10

Eh, \B is a non-word-boundary. A "word" is letters/digits/underscore. That is, it is the gap between \w\w or the gap between \W\W.

Don't tell that to me, tell it to whoever called this a `word boundary` 🙂

Don't tell that to me, tell it to whoever called this a `word boundary`

You did not read to the end. Never mind, read it from beginning to end:

`\w' Matches any "word" character. A "word" character is any letter or digit or the underscore character.
`\W' Matches any "non-word" character.
`\b' Matches a word boundary; that is it matches if the character to the left is a "word" character and the character to the right is a "non-word" character, or vice-versa.
`\B' Matches everywhere but on a word boundary; that is it matches if the character to the left and the character to the right are either both "word" characters or both "non-word" characters.
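A quick sketch of the difference between the two escapes, assuming GNU sed (both `\b` and `\B` are GNU extensions and may be missing elsewhere). The sample text and variable names are illustrative.

```shell
# Assumes GNU sed: \b and \B are extensions, not POSIX.
text='z123 z123x'

# Mark every word boundary: start and end of each word.
boundaries=$(printf '%s\n' "$text" | sed 's/\b/|/g')

# Mark every position that is NOT a word boundary: the gaps inside words.
inside=$(printf '%s\n' "$text" | sed 's/\B/|/g')

printf '%s\n' "$boundaries" "$inside"
```

Digits count as "word" characters, so neither escape can tell a letter from a digit; that is why the thread reaches for [[:alpha:]] and [[:digit:]] instead.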


Extract numbers from a string using sed and regular expressions

Another question for the sed experts. I have a string representing a pathname that will have two numbers in it. An example is:

./pentaray_run2/Trace_220560.dat 

I need to extract the second of these numbers, i.e. 220560. I have (with some help from the forums) been able to extract all the numbers together (i.e. 2220560) with:


But what I’m after is the second number!! Any help much appreciated. PS the number I’m after is always the second number in the string.

4 Answers

kent$ echo "./pentaray_run2/Trace_220560.dat"|sed -r 's/.*_([0-9]*)\..*/\1/g'
220560

Great, works a treat. I guess the _ in there means to look for numbers only after the underscore? In this instance I can always expect an underscore, so this will work. Which actual bit of the expression does that, is it .*_ ? Stack Overflow really is such a fantastic resource; I have been puzzling at this for hours. Out of interest, do you think there is a way to use the \1 at the end, perhaps to extract all numbers (contiguous digits) as substrings and ask for the second one? This could be useful to me and others in the future.
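One possible way to do what this comment asks is sketched below. It is not from the thread: it assumes a grep that supports `-o` (GNU and BSD grep do), and `nth_number` is a hypothetical helper name.

```shell
# Hypothetical helper: print the n-th run of digits found in a string.
#   $1 = the string, $2 = which number to pick (1-based)
nth_number() {
    # grep -o prints each digit run on its own line; sed -n 'Np' picks line N.
    printf '%s\n' "$1" | grep -oE '[0-9]+' | sed -n "${2}p"
}

nth_number './pentaray_run2/Trace_220560.dat' 2
```

For the example path, the digit runs are 2 and 220560, so asking for the second one gives 220560.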

You can extract the last numbers with this:

It is easier to think this backwards:

  1. From the end of the string, match zero or more non-digit characters
  2. Match (and capture) one or more digit characters
  3. Match at least one non-digit character
  4. Match all the characters to the start of the string

Part 3 of the match is where the «magic» happens, but it also limits your matches to have at least a non-digit before the number (ie. you can’t match a string with only one number that is at the start of the string, although there is a simple workaround of inserting a non-digit to the start of the string).

The magic is to counter-act the left-to-right greediness of the .* (part 4). Without part 3, part 4 would consume all it can, which includes the numbers, but with it, matching makes sure that it stops in order to allow at least a non-digit followed by a digit to be consumed by parts 1 and 2, allowing the number to be captured.
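The four parts described above can be sketched as a single substitution. This is a reconstruction under assumptions, not necessarily the answer's original command: it uses ERE syntax via `sed -E`, and the function names `last_number` and `last_number_anywhere` are hypothetical.

```shell
# Reading the pattern right to left, as the explanation does:
#   [^0-9]*$   part 1: zero or more non-digits up to the end of the string
#   ([0-9]+)   part 2: the captured number
#   [^0-9]     part 3: at least one non-digit
#   .*         part 4: everything back to the start of the string
last_number() {
    sed -E 's/.*[^0-9]([0-9]+)[^0-9]*$/\1/'
}

# The workaround mentioned above for a number at the very start of the
# string: insert a non-digit (here a space) first, then apply the same pattern.
last_number_anywhere() {
    sed -E 's/^/ /; s/.*[^0-9]([0-9]+)[^0-9]*$/\1/'
}

printf '%s\n' './pentaray_run2/Trace_220560.dat' | last_number
printf '%s\n' '42 apples' | last_number_anywhere
```

Without the inserted space, `42 apples` would pass through unchanged, because part 3 has no non-digit to match before the number.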


sed extract digits

The problem is that the . in .* will match digits as well as non-digits, and it keeps on matching as long as it can — that is as long as there’s one digit left unconsumed that can match the [0-9].

Instead of extracting digits, just delete non-digits:

echo hgdfjg678gfdg kjg45nn | sed 's/[^0-9]//g' 
echo hgdfjg678gfdg kjg45nn | tr -d -c 0-9 

In sed, you probably want to replace non-digits with a single space, so that the original groupings of digits are preserved.

Note that sed 's/[^0-9]//g' will not strip newline characters (important when you are filtering multi-line input), whereas tr -d -c 0-9 will.
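Both remarks can be checked with a short sketch. The two-line sample input (`a1`, `b2`) and the variable names are made up for illustration.

```shell
input='hgdfjg678gfdg kjg45nn'

# Replace each run of non-digits with one space to keep the groups apart.
spaced=$(printf '%s\n' "$input" | sed 's/[^0-9][^0-9]*/ /g')

# sed works line by line, so the newline between the two lines survives...
sed_lines=$(printf 'a1\nb2\n' | sed 's/[^0-9]//g')

# ...while tr treats the newline as just another non-digit and deletes it.
tr_lines=$(printf 'a1\nb2\n' | tr -d -c 0-9)

printf '[%s]\n' "$spaced" "$sed_lines" "$tr_lines"
```

Here `spaced` is ` 678 45 ` (with a leading and a trailing space), `sed_lines` keeps `1` and `2` on separate lines, and `tr_lines` is `12`.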


You may use grep with option -o for this:

$ echo hgdfjg678gfdg kjg45nn | grep -E -o "[0-9]+"
678
45
$ echo hgdfjg678gfdg kjg45nn | tr -d 'a-z'
678 45

.* in sed is greedy, and there is no non-greedy option AFAIK.
(You must use [^0-9]* in this case to get the effect of non-greedy matching. But this works only once, so you will get only 678, without 45.)
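A minimal sketch of that limitation, assuming a sed with `-E`; the variable name is illustrative.

```shell
# [^0-9]* cannot run past the first digit, so the first number is captured,
# but the trailing .* then swallows the rest of the line, including 45.
first_only=$(printf '%s\n' 'hgdfjg678gfdg kjg45nn' | sed -E 's/[^0-9]*([0-9]+).*/\1/')

printf '%s\n' "$first_only"
```

Only 678 is printed; the second number is consumed by the trailing .* and never reaches the output.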

If you must use only sed, it will not be easy to get the result.
I recommend using GNU grep:

$ echo hgdfjg678gfdg kjg45nn | grep -oP '\d+'
678
45

If you really want to stick to sed , this would be one of many possible answers.

$ echo hgdfjg678gfdg kjg45nn | \
  sed -e 's/\([0-9]\)\([^0-9]\)/\1\n\2/g' | \
  sed -n 's/[^0-9]*\([0-9]\+\).*/\1/p'
678
45


sed, capture only the number

I cannot think of a one-liner, so this is my approach:

while read -r line
do
    grep -Po '(?<=A=)\d+' <<< "$line" || echo "0"
done < file

I am using the look-behind grep to get any number after A= . In case there is none, the || (else) will print a 0 .

This prepends A=0 to the line before substituting.

Ah, I like your solution better. Is the \< thing making sure you get the last 'A='? What does it actually do?

@cluracan, We always get the last A= because .* is greedy. That's why we prepend A=0 instead of appending.
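The prepend trick discussed here can be sketched as follows. This is an illustration under assumptions, not the answer's original command: it uses `sed -E`, leaves out the `\<` mentioned in the comment (a GNU extension), and the helper name `last_a_or_zero` is hypothetical.

```shell
# Hypothetical helper: print the number after the last "A=", or 0 if none.
# Step 1 puts a fallback "A=0 " at the FRONT of the line.
# Step 2 relies on the greedy .* to find the LAST "A=<digits>", so a real
# value later in the line wins over the prepended fallback.
last_a_or_zero() {
    sed -E 's/^/A=0 /; s/.*A=([0-9]+).*/\1/'
}

printf '%s\n' \
    'some text A=10 some text' \
    'some more text A more text' \
    'some other text A=30 other text' | last_a_or_zero
```

Appending the fallback instead would make the greedy .* always pick `A=0`, which is the reason given above for prepending.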

kent$ echo "some text A=10 some text
some more text A more text
some other text A=30 other text"|awk -F'A=' 'NF==1{print 0; next} {sub(/[^0-9].*/, "", $2); print $2}'
10
0
30

This might work for you (GNU sed):

Look for the string A= followed by one or more numbers and if it occurs replace the whole line by the back reference. Otherwise replace the whole of the line by 0 .

+1 this is the only one that is 100% sed, is accurate enough and actually works (and is also quite short). By the way, what does the ;t; between the two replacements do? (Oh, and you can even scrap one of the [0-9] from the pattern.) 🙂

@thom t checks to see if the last substitute command (if any) was a success and if so bails out. As there is no goto place name following the t command it ends the processing of the line. If the substitution was not a success (or no substitute has been done) it replaces the whole line with a 0 .
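A sketch of the substitute / t / substitute sequence described here, written as an assumption about the answer's shape rather than its exact text. It uses `sed -E` and separate `-e` options, which avoids relying on how a given sed parses `;` after `t`. The helper name `a_or_zero` is hypothetical.

```shell
# Hypothetical helper: print the number after "A=", or 0 if there is none.
a_or_zero() {
    # 1st -e: replace the whole line with the digits that follow A=
    # 2nd -e: t with no label ends processing of this line if that succeeded
    # 3rd -e: otherwise replace the whole line with 0
    sed -E -e 's/.*A=([0-9]+).*/\1/' -e t -e 's/.*/0/'
}

printf '%s\n' \
    'some text A=10 some text' \
    'some more text A more text' \
    'some other text A=30 other text' | a_or_zero
```

Without the `t`, the last substitution would run on every line and turn the extracted numbers into 0 as well.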

