Regex in sed linux

Regular Expression in Linux command sed

The short answer is that sed is being greedy. Since you wrote .*, it grabs the first two 'apk' occurrences as part of .* and only treats the last 'apk' it can reach as the end of the match.


First, to enable the regular expressions you're familiar with in sed, you need to use the -r switch (sed -r ...):

echo $all_apk_file | sed -r 's/(.*apk )/TEST/g' # returns TESTy m.apk 

Look at what it returns: TESTy m.apk . This is because the .* is greedy, so it matches as much as it possibly can. That is, the .* matches a 1 2.apk x , and you’ve said you want to replace .*apk , being a 1 2.apk x.apk with ‘TEST’, resulting in TESTy m.apk (note the following space after the ‘.apk’ in your regular expression, which is why the match doesn’t extend all the way to the last ‘.apk’, which has no space following it).
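The greedy behaviour described above can be reproduced with a short sketch. The value of all_apk_file is an assumption inferred from the outputs quoted in this thread, and -E is used as the portable spelling of GNU sed's -r:

```shell
# Assumed sample value; the question's real variable is not shown here.
all_apk_file="a 1 2.apk x.apk y m.apk"

# .* is greedy, so the match extends to the LAST "apk" followed by a space.
greedy=$(printf '%s\n' "$all_apk_file" | sed -E 's/(.*apk )/TEST/')
printf '%s\n' "$greedy"    # TESTy m.apk
```

The final m.apk survives only because no space follows it.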

Usually one could change the .* to .*? to make it non-greedy, but this behaviour is not supported in sed.

So, to fix it you just have to make your regex more restrictive.

It is hard to tell what you want to do. Remove the first three words, where the third ends in '.apk', and replace them with 'TEST'? In that case, one could use the regular expression:

[a-z0-9]+ +[a-z0-9]+ +[a-z0-9]+\.apk

in combination with the 'i' switch (case insensitive).

You will have to give your logic for deciding what to remove (first three words, any number of words up to the first ‘.apk’ word, etc) in order for us to help you further with the regex.

Secondly, you’ve put the ‘g’ switch in your regex. This means that all matching patterns will be replaced, and you seem to only want the first to be replaced. So remove the ‘g’ switch.
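To make the effect of the 'g' switch concrete, here is a small sketch on a made-up string (the sample value is hypothetical, not taken from the question):

```shell
# Hypothetical input with three occurrences of "apk".
sample="one.apk two.apk three.apk"

# Without g, only the first match on the line is replaced.
first_only=$(printf '%s\n' "$sample" | sed 's/apk/X/')
# With g, every non-overlapping match on the line is replaced.
every_match=$(printf '%s\n' "$sample" | sed 's/apk/X/g')

printf '%s\n' "$first_only"    # one.X two.apk three.apk
printf '%s\n' "$every_match"   # one.X two.X three.X
```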

Finally, all of these in combination:

echo $all_apk_file | sed -r 's/[a-z0-9]+ +[a-z0-9]+ +[a-z0-9]+\.apk/TEST/i' # TEST x.apk y m.apk 

This can be done in perl with echo $all_apk_file | perl -pe 's/^(.*?\.apk)/TEST/' if switching to perl is an option.

echo "$all_apk_file" | sed 's/apk/\n/;s/.*\n/TEST/'
TEST x.apk y m.apk

As to why your regexp did not work see @mathematical.coffee and @Jonathan Leffler’s excellent explanations.

s/apk/\n/ is synonymous with s/apk/\n/1, which means replace the first occurrence of apk with \n. As sed uses \n as a record separator, we know that it cannot occur in any initial string passed to the sed commands. With these two facts under our belts we can split strings.


N.B. If you wanted to replace up to the second apk, then s/apk/\n/2 would fit the bill. Of course, for the last occurrence of apk, .*apk comes into play.
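A sketch of the numbered variant, assuming GNU sed (which turns \n in the replacement into a newline) and the same sample value inferred from the outputs in this thread:

```shell
# Assumed sample value, as used elsewhere in the thread.
all_apk_file="a 1 2.apk x.apk y m.apk"

# Mark the SECOND "apk" with a newline, then replace everything up to it.
up_to_second=$(printf '%s\n' "$all_apk_file" | sed 's/apk/\n/2;s/.*\n/TEST/')
printf '%s\n' "$up_to_second"    # TEST y m.apk
```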

One part of the problem is that in regular sed , the () and <> are ordinary characters in patterns until escaped with backslashes. Since there are no parentheses in the variable’s value, the regex never matches. With GNU sed , you can also enable extended regular expressions with the -r flag. If you fix that problem, you will then run into the problem that .* is greedy, and the g modifier actually doesn’t change anything:

$ echo $all_apk_file | sed 's/\(.*apk \)/TEST/g'
TESTy m.apk
$ echo $all_apk_file | sed -r 's/(.*apk )/TEST/g'
TESTy m.apk
$ echo $all_apk_file | sed -r 's/(.*apk )/TEST/'
TESTy m.apk
$

It only stops there because there isn’t a space after m.apk in the echoed value of the variable.

The issue now is: what is it that you want replaced? It sounds like 'everything up to and including the first occurrence of apk at the end of a word'. This is probably most easily done with trailing context or non-greedy matching as found in Perl regular expressions. If switching to Perl is an option, do so. If not, it is not trivial in normal sed regular expressions.

$ echo $all_apk_file | sed 's/^[^.]* [^.][^.]*\.apk /TEST /'
TEST x.apk y m.apk
$

This looks for anything without dots in it, followed by a blank, followed by no dots again, and .apk ; this means that the first dot allowed is the one in 2.apk . It works for the sample data; it would not work if the variable contained:

all_apk_file="a 1.2 2.apk m.apk y.apk 37" 

You’ll need to tune this to meet your requirements.
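As a sketch of what such tuning might look like, the following runs the dot-based pattern on the problematic value and then tries one possible alternative that treats words as runs of non-space characters. The alternative is only an illustration of the "first three words" reading of the question, not a general solution:

```shell
all_apk_file="a 1.2 2.apk m.apk y.apk 37"

# The dot-based pattern finds no match, so the line passes through unchanged.
dot_based=$(printf '%s\n' "$all_apk_file" | sed 's/^[^.]* [^.][^.]*\.apk /TEST /')
printf '%s\n' "$dot_based"     # a 1.2 2.apk m.apk y.apk 37

# Illustrative alternative: two words, then a third word ending in .apk.
word_based=$(printf '%s\n' "$all_apk_file" | sed 's/^[^ ]* [^ ]* [^ ]*\.apk /TEST /')
printf '%s\n' "$word_based"    # TEST m.apk y.apk 37
```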


How to Use Sed Command with Regex

The sed command has a long list of supported operations that ease the process of editing text files. It lets users apply expressions that are commonly used in programming languages; one of the core supported kinds is the Regular Expression (regex).

A regex is a pattern built from a string of characters, and that pattern is used to match or locate text inside text files. Regex is widely used in programming languages such as Python, Perl, and Java, and it is also supported by command line programs such as grep, by several text editors, and by the stream editor sed.


Although simple searching can be performed with the sed command alone, using regex with sed enables advanced matching in text files. A regex works through the special characters it contains; these characters tell the sed command what to match. In this article, we will demonstrate the use of regex with the sed command, followed by examples that show regex in action.

How to use regex in sed

This section is the core part of the article and contains a detailed explanation of Regular Expressions in the context of sed. Let's start.

Matching the word

If you want to find a word that exactly matches some characters, you must specify the exact characters that make up the word. For instance, we have a text file named "laptops.txt" that contains a list of laptop manufacturers:

Let's get the content of the file by using the command mentioned below:

The following command will help to get the word "ACER":
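A minimal sketch of these two steps, assuming a hypothetical laptops.txt with one manufacturer per line (the file content is made up for illustration) and the usual sed -n '/pattern/p' idiom for printing matching lines:

```shell
# Build a throwaway laptops.txt with assumed content.
workdir=$(mktemp -d)
printf '%s\n' ACER HP DELL ASUS LENOVO > "$workdir/laptops.txt"

# Show the whole file.
cat "$workdir/laptops.txt"

# -n suppresses automatic printing; p prints only lines matching /ACER/.
acer_lines=$(sed -n '/ACER/p' "$workdir/laptops.txt")
printf '%s\n' "$acer_lines"    # ACER

rm -r "$workdir"
```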

Matching all words that start with a specific character

This regex support contains multiple actions that are described in this section:

If you want to match words that start and end with specific characters, you can place the "*" sign between those characters. Note that "*" means "zero or more of the preceding character", so a pattern such as A*R matches a run of zero or more "A"s followed by a single "R"; it therefore also matches an "R" with no "A" in front of it. For instance, the command written below will print every line in which an "R" appears, optionally preceded by one or more "A"s:
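The difference is easiest to see side by side. The sketch below uses a hypothetical word list and compares A*R with two stricter patterns; which one is appropriate depends on what "starts with A and ends with R" is meant to cover:

```shell
workdir=$(mktemp -d)
printf '%s\n' ACER AAAR AR HP DELL RAZER > "$workdir/laptops.txt"

# A*R: zero or more A's, then R. Any line containing an R matches.
any_r=$(sed -n '/A*R/p' "$workdir/laptops.txt")

# AA*R: at least one A immediately followed by R.
a_run_then_r=$(sed -n '/AA*R/p' "$workdir/laptops.txt")

# ^A.*R$: the line starts with A and ends with R, anything in between.
starts_a_ends_r=$(sed -n '/^A.*R$/p' "$workdir/laptops.txt")

printf '%s\n' "$any_r" "---" "$a_run_then_r" "---" "$starts_a_ends_r"
rm -r "$workdir"
```

Here A*R prints ACER, AAAR, AR and RAZER; AA*R prints AAAR and AR; ^A.*R$ prints ACER, AAAR and AR.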

To match a word that ends with a specific character, or that contains only the specified characters, the command written below will display the words that contain the character "P", including the exact word "HP":
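As a hedged illustration, assuming the pattern in question is something like H*P (the exact command is not shown above) and a hypothetical word list:

```shell
workdir=$(mktemp -d)
printf '%s\n' ACER HP DELL APPLE LENOVO > "$workdir/laptops.txt"

# H*P: zero or more H's followed by P, so any line containing P matches.
contains_p=$(sed -n '/H*P/p' "$workdir/laptops.txt")

# ^HP$: the whole line must be exactly HP.
exactly_hp=$(sed -n '/^HP$/p' "$workdir/laptops.txt")

printf '%s\n' "$contains_p" "---" "$exactly_hp"
rm -r "$workdir"
```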

Matching the words with specific character

You can also get the words that contain any one of a set of characters with the help of the sed command. For instance, the command mentioned below will find the words that contain one of the characters "A", "H" or "D":
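A bracket expression is the usual way to express "one of these characters". A short sketch on a hypothetical word list:

```shell
workdir=$(mktemp -d)
printf '%s\n' ACER HP DELL ASUS LENOVO > "$workdir/laptops.txt"

# [AHD] matches a single character that is A, H, or D.
has_a_h_or_d=$(sed -n '/[AHD]/p' "$workdir/laptops.txt")
printf '%s\n' "$has_a_h_or_d"

rm -r "$workdir"
```

With this sample data every line except LENOVO is printed.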

Matching the string

You can use sed command with regular expressions to print the strings; you can either print all the strings or you can also target a specific string by using the starting or ending character of that string:

We have used "file.txt" as an example in this section; this file contains the following content:

For instance, if you want to print all the strings; the following command will help you in this regard:

If you want to get all the strings that start with a specific character, then you have to use the caret symbol (^) to indicate the starting character of the string.

The command mentioned below will print the strings that start with "@":

Moreover, if you want to get only those strings that end with a specific character, then you have to use "$" after that character. For instance, the command written here will print the strings that end with "#":
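Both anchors can be sketched together. The content of file.txt below is hypothetical, chosen so that one line starts with "@" and another ends with "#":

```shell
workdir=$(mktemp -d)
printf '%s\n' '@first line' 'second line#' 'plain line' > "$workdir/file.txt"

# ^ anchors the match to the start of the line.
starts_with_at=$(sed -n '/^@/p' "$workdir/file.txt")
# $ anchors the match to the end of the line.
ends_with_hash=$(sed -n '/#$/p' "$workdir/file.txt")

printf '%s\n' "$starts_with_at"    # @first line
printf '%s\n' "$ends_with_hash"    # second line#
rm -r "$workdir"
```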

Matching the blank lines

The sed command regex support allows the user to print/delete the empty lines by using “/^$/”; the following command will print the empty lines in “laptops.txt” file:

Or you can delete by replacing “p” with “d” in the above command as displayed below:
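A sketch of both operations on a hypothetical laptops.txt that contains two empty lines. Since printed empty lines are hard to see, the first command counts them instead:

```shell
workdir=$(mktemp -d)
printf '%s\n' ACER '' HP '' DELL > "$workdir/laptops.txt"

# /^$/ matches lines with nothing between start and end; count what p prints.
blank_count=$(sed -n '/^$/p' "$workdir/laptops.txt" | wc -l)

# d deletes the matching lines; everything else is printed as usual.
without_blanks=$(sed '/^$/d' "$workdir/laptops.txt")

printf '%s\n' "$without_blanks"
rm -r "$workdir"
```

Note that /^$/ does not match lines that contain only spaces or tabs.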

Matching the letter case

The sed command allows users to manipulate the words with specific letter case:

For instance, you can print, delete, substitute the letter case words by using sed command:

A text file named as “test.txt” is used in this example, the content of this file is printed by using following command:

Matching the lowercase letters

The following command will print all those words that contain lower case letters in them:
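A minimal sketch, assuming a hypothetical test.txt. LC_ALL=C is set here as a precaution, because in some locales a range such as [a-z] can also match uppercase letters:

```shell
workdir=$(mktemp -d)
printf '%s\n' LINUX ubuntu Debian 12345 > "$workdir/test.txt"

# Print lines containing at least one lowercase ASCII letter.
has_lower=$(LC_ALL=C sed -n '/[a-z]/p' "$workdir/test.txt")
printf '%s\n' "$has_lower"

rm -r "$workdir"
```

With this sample data the output is ubuntu and Debian.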

Matching the uppercase letters

Or you can print the words that contain upper case letters by issuing the following command in the terminal:
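The uppercase case can be sketched the same way, again with a hypothetical test.txt and LC_ALL=C to keep the range limited to ASCII letters:

```shell
workdir=$(mktemp -d)
printf '%s\n' LINUX ubuntu Debian 12345 > "$workdir/test.txt"

# Print lines containing at least one uppercase ASCII letter.
has_upper=$(LC_ALL=C sed -n '/[A-Z]/p' "$workdir/test.txt")
printf '%s\n' "$has_upper"

rm -r "$workdir"
```

With this sample data the output is LINUX and Debian.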

Conclusion

A Regular Expression (regex) is a word or sequence of characters used to find matching text in a file. Regex is supported by several programming languages as well as by Ubuntu commands and programs. The sed command line utility of Ubuntu lets you perform many tedious operations on text files with little effort. We have compiled this guide to show the benefits of combining regex with sed; together they provide advanced matching and searching inside text files. Regular Expressions rely on special matching characters to support tasks such as deleting, printing, substituting, or otherwise managing text inside text files.
