Grep and awk in Linux

Contents

Using grep vs awk
6 Answers
What are the differences among grep, awk & sed? [duplicate]
3 Answers

Using grep vs awk

To capture a particular pattern, either awk or grep can be used. Why should we use one over the other? Which is faster, and why? If I had a log file and wanted to grab a certain pattern, I could do one of the following:

awk '/pattern/' /var/log/messages

grep 'pattern' /var/log/messages

I haven’t done any benchmarking, so I wouldn’t know. Can someone elaborate on this? It would be great to know the inner workings of these two tools.

Precede any command, even a shell script, with the time command to measure how long it takes to run. Example: time ls -l .

6 Answers

grep will most likely be faster:

# time awk '/USAGE/' imapd.log.1 | wc -l
73832
real 0m2.756s
user 0m2.740s
sys 0m0.020s

# time grep 'USAGE' imapd.log.1 | wc -l
73832
real 0m0.110s
user 0m0.100s
sys 0m0.030s

awk is an interpreted programming language, whereas grep is a compiled C program (which is additionally optimized for finding patterns in files).

(Note: I ran both commands twice so that caching would not skew the results.)

More details about interpreted languages are available on Wikipedia.

As Stephane has rightly pointed out in comments, your mileage may vary due to the implementation of the grep and awk you use, the operating system it is on and the character set you are processing.

Without saying what grep or awk implementation you’re using and on what computer architecture, and with which system character set, those timings have little value.

The second command will also use the newly cached version. I don’t doubt that grep is quicker, but not by as much as your numbers show.

Hence running awk, grep, awk, grep and posting the results from the second set of awk and grep 🙂 And FYI, I live in a UTF-8 locale.

Funnily enough, with the BSD tools (on a Mac), awk (31.74s) is slightly faster than sed (33.34s), which is slightly faster than grep (34.21s). GNU awk beats them all at 5.24s; I don’t have GNU grep or sed to test.

grep should be slightly faster because awk does more with each input line than just search for a regexp in it. For example, if a field is referenced in the script (which is not the case here), awk splits each input line into fields based on the field-separator value and populates built-in variables. With what you posted, though, there should be almost no difference. By far the most important difference between grep and awk with respect to matching regexps is that grep searches the whole line for a matching string, while awk can search specific fields and so provide more precision and fewer false matches.
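The whole-line versus per-field distinction can be seen on a tiny input. This is only a sketch: the sample lines and the choice of field 3 as the "message" field are made up for illustration.

```shell
# Hypothetical sample: field 1 is a user name, field 3 starts the message text.
sample=$(mktemp)
printf '%s\n' \
    'alice 200 login ok' \
    'error 200 login ok' \
    'bob 500 error: disk full' > "$sample"

# grep looks anywhere in the line, so the user named "error" also matches.
grep_hits=$(grep -c 'error' "$sample")

# awk can restrict the match to field 3 only.
awk_hits=$(awk '$3 ~ /error/ { n++ } END { print n + 0 }' "$sample")

echo "grep matched $grep_hits lines, awk matched $awk_hits"
rm -f "$sample"
```

Here grep reports one more line than awk, which is the kind of false match that field-aware matching avoids.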

Use the most specific and expressive tool. The tool that best fits your use case is likely to be the fastest.

Searching for lines matching a substring or regexp? Use grep.
Selecting certain columns from a simply-delimited file? Use cut.
Performing pattern-based substitutions or other stuff sed can reasonably do? Use sed.
Need some combination of the above 3, or printf formatting, or general purpose loops and branches? Use awk.
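As a rough illustration of that rule of thumb, here is one small task per tool. The colon-delimited sample data and the variable names are invented for this sketch.

```shell
# Hypothetical colon-delimited sample: name:age:role
data=$(mktemp)
printf '%s\n' 'alice:30:admin' 'bob:25:user' 'carol:35:admin' > "$data"

# grep: select lines matching a regexp.
admins=$(grep 'admin$' "$data")

# cut: select one column from a simply-delimited file.
names=$(cut -d: -f1 "$data")

# sed: pattern-based substitution.
renamed=$(sed 's/admin$/root/' "$data")

# awk: matching, field selection, arithmetic and printf formatting in one pass.
report=$(awk -F: '$3 == "admin" { total += $2; n++ }
                  END { printf "%d admins, mean age %.1f\n", n, total / n }' "$data")

printf '%s\n' "$admins" "$names" "$renamed" "$report"
rm -f "$data"
```

The awk line could replace the other three, but the single-purpose commands are shorter and say more directly what they do.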

+1, except use perl instead of awk. If you need something more complicated than grep/cut/sed, then chances are awk won’t be enough and you need something "full-blown".

@RetroCode: python is more "general purpose" than perl; the equivalent one-liner will probably be much longer.

@sds no, you don’t need perl unless you’re going to do something other than text processing. awk is just fine for the text processing stuff that’s more complicated than grep/cut/sed and as a bonus comes as standard on all UNIX installations, unlike perl.

@RetroCode Because the CPython implementation’s command line tool doesn’t support being used as a filter for newline-terminated records, because it is more verbose, which is undesirable for one-liners, and because all regex modules I have tried with Python are 10 to 100 times slower than Perl’s, which is bad for large data.

When only searching for strings, and speed matters, you should almost always use grep. It’s orders of magnitude faster than awk when it comes to just gross searching.
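When the search term really is a plain string rather than a regexp, grep's standard -F option treats the pattern literally. The sketch below only shows the difference in what matches, on made-up input; whether -F is also faster depends on the grep implementation.

```shell
sample=$(mktemp)
printf '%s\n' 'version 1.5' 'version 125' 'version 1x5' > "$sample"

# As a regexp, "." matches any character, so all three lines match.
regex_hits=$(grep -c '1.5' "$sample")

# With -F the pattern is a fixed string, so only the line containing "1.5" matches.
fixed_hits=$(grep -F -c '1.5' "$sample")

echo "regexp: $regex_hits, fixed string: $fixed_hits"
rm -f "$sample"
```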

UTILITY  OPERATION TYPE    EXECUTION TIME   CHARACTERS PROCESSED PER SECOND
                           (10 ITERATIONS)
-------  ----------------  ---------------  -------------------------------
grep     search only       41 sec.          489.3 million
sed      search & replace  4 min. 4 sec.    82.1 million
awk      search & replace  4 min. 46 sec.   69.8 million
Python   search & replace  4 min. 50 sec.   69.0 million
PHP      search & replace  15 min. 44 sec.  21.2 million

Those are completely bogus numbers. Talk about comparing apples and oranges: it’s like saying you can only find a new car on web site A in 5 secs, whereas you can find a car, negotiate a price, get a loan, and purchase the car on site B in 1 hour, so therefore site A is faster than site B. The article you quoted is completely wrong in its statements of relative execution speed between grep, sed, and awk, and it also says awk has PCRE matching for regular expressions, which is just completely untrue.

While I agree that in theory grep should be faster than awk, in practice YMMV, as that depends a lot on the implementation you use.

Here I compare busybox 1.20.0’s grep and awk, GNU grep 2.14, mawk 1.3.3, and GNU awk 4.0.1 on Debian/Linux 7.0 amd64 (with glibc 2.17) in a UTF-8 locale, on a 240MB file of 2.5M lines of ASCII-only characters.

$ time busybox grep error error | wc -l
331003
busybox grep error error  8.31s user 0.12s system 99% cpu 8.450 total
wc -l  0.07s user 0.11s system 2% cpu 8.448 total

$ time busybox awk /error/ error | wc -l
331003
busybox awk /error/ error  2.39s user 0.84s system 98% cpu 3.265 total
wc -l  0.12s user 1.23s system 41% cpu 3.264 total

$ time grep error error | wc -l
331003
grep error error  0.80s user 0.10s system 99% cpu 0.914 total
wc -l  0.00s user 0.11s system 12% cpu 0.913 total

$ time mawk /error/ error | wc -l
330803
mawk /error/ error  0.54s user 0.13s system 91% cpu 0.732 total
wc -l  0.03s user 0.08s system 14% cpu 0.731 total

$ time gawk /error/ error | wc -l
331003
gawk /error/ error  1.37s user 0.12s system 99% cpu 1.494 total
wc -l  0.04s user 0.07s system 7% cpu 1.492 total

$ time

In the C locale, only GNU grep gets a significant boost and becomes faster than mawk.
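The locale can be switched for a single command by setting LC_ALL in front of it. The sketch below uses a made-up three-line file and only checks that, for ASCII-only data, the match count is the same in both locales; any speed difference would have to be measured on a realistically large file.

```shell
sample=$(mktemp)
printf '%s\n' 'an error occurred' 'all fine' 'another error' > "$sample"

# Match count in whatever locale the shell is currently using.
default_count=$(grep -c error "$sample")

# Same search with the C locale forced for this one command only.
c_locale_count=$(LC_ALL=C grep -c error "$sample")

echo "default locale: $default_count, C locale: $c_locale_count"
rm -f "$sample"
```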

The dataset and the type of regexp may also make a big difference. For regexps, awk should be compared to grep -E, as awk’s regexps are extended REs.

For this dataset, awk could be faster than grep on busybox-based systems or on systems where mawk is the default awk and the default locale is UTF-8 based (IIRC, that used to be the case in Ubuntu).
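The reason for comparing against grep -E shows up as soon as the pattern contains an operator such as "+". A minimal sketch on invented input:

```shell
sample=$(mktemp)
printf '%s\n' 'abc' 'abbbc' 'ab+c' > "$sample"

# Plain grep uses basic REs, where "+" is an ordinary character.
bre=$(grep 'ab+c' "$sample")

# grep -E uses extended REs, where "+" means "one or more of the preceding item".
ere=$(grep -E 'ab+c' "$sample")

# awk patterns are extended REs, so awk agrees with grep -E, not with plain grep.
awk_out=$(awk '/ab+c/' "$sample")

printf 'BRE:\n%s\nERE:\n%s\nawk:\n%s\n' "$bre" "$ere" "$awk_out"
rm -f "$sample"
```

Plain grep selects only the line with the literal plus sign, while grep -E and awk both select the other two lines.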


What are the differences among grep, awk & sed? [duplicate]

I am confused about the differences between grep, awk and sed in terms of their role in Unix/Linux system administration and text processing.

3 Answers

grep: search for specific terms in a file

#usage
$ grep This file.txt
Every line containing "This"
Every line containing "This"
Every line containing "This"
Every line containing "This"

$ cat file.txt
Every line containing "This"
Every line containing "This"
Every line containing "That"
Every line containing "This"
Every line containing "This"

Now, awk and sed are completely different from grep. awk and sed are text processors. Not only can they find what you are looking for in text, they can also remove, add and modify the text (and much more).

awk is mostly used for data extraction and reporting. sed is a stream editor.
Each one of them has its own functionality and specialties.

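$ sed -i 's/cat/dog/g' file.txt # this will replace every occurrence of the characters 'cat' with 'dog', editing file.txt in place (without the g flag, only the first occurrence on each line is replaced)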
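How many occurrences the s command replaces on a line depends on its g flag. A minimal sketch on a made-up one-line input, writing to standard output instead of editing a file:

```shell
# Without g, only the first match on each line is replaced.
first_only=$(printf 'cat and cat\n' | sed 's/cat/dog/')

# With g, every match on the line is replaced.
every=$(printf 'cat and cat\n' | sed 's/cat/dog/g')

printf '%s\n%s\n' "$first_only" "$every"
```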

$ awk '{print $2}' file.txt # this will print the second column of file.txt

Basic awk usage:
Compute sum/average/max/min/etc., whatever you may need.

$ cat file.txt
A 10
B 20
C 60
$ awk 'BEGIN {sum=0; count=0} {sum+=$2; count++} END {print "Average:", sum/count}' file.txt
Average: 30
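The same pattern extends to the other aggregates mentioned above. This is one possible sketch, reusing the A/B/C sample values; the variable names are arbitrary.

```shell
data=$(mktemp)
printf '%s\n' 'A 10' 'B 20' 'C 60' > "$data"

stats=$(awk '
    NR == 1      { min = max = $2 + 0 }   # seed min and max from the first record
                 { sum += $2 }            # running total of column 2
    $2 + 0 < min { min = $2 + 0 }
    $2 + 0 > max { max = $2 + 0 }
    END { printf "sum=%d min=%d max=%d avg=%.1f\n", sum, min, max, sum / NR }
' "$data")

echo "$stats"
rm -f "$data"
```

Adding 0 to the field forces a numeric comparison, so the result does not depend on how a particular awk treats field values that look like numbers.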

I recommend that you read this book: Sed & Awk: 2nd Ed.

It will help you become a proficient sed/awk user in any Unix-like environment.

Grep is useful if you want to quickly search for lines that match in a file. It can also return some other simple information like matching line numbers, match count, and file name lists.
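Those extra pieces of information correspond to the standard options -n, -c and -l. A small sketch, using two throwaway files with invented contents:

```shell
dir=$(mktemp -d)
printf '%s\n' 'ok' 'error one' 'ok' 'error two' > "$dir/a.log"
printf '%s\n' 'ok' 'ok' > "$dir/b.log"

# -n prefixes each matching line with its line number.
line_numbers=$(grep -n 'error' "$dir/a.log")

# -c prints only the number of matching lines.
match_count=$(grep -c 'error' "$dir/a.log")

# -l prints only the names of files that contain at least one match.
files=$(cd "$dir" && grep -l 'error' a.log b.log)

printf '%s\n' "$line_numbers" "$match_count" "$files"
rm -rf "$dir"
```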

Awk is an entire programming language built around reading CSV-style files, processing the records, and optionally printing out a result data set. It can do many things but it is not the easiest tool to use for simple tasks.
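A sketch of that record-processing style, on a made-up comma-separated file. Note the assumption that fields contain no quoted or embedded commas; -F, splits on every comma, which real CSV files may not tolerate.

```shell
csv=$(mktemp)
printf '%s\n' 'name,dept,salary' 'alice,eng,100' 'bob,ops,80' 'carol,eng,120' > "$csv"

totals=$(awk -F, '
    NR > 1 { total[$2] += $3 }                  # skip the header, sum salary per dept
    END    { for (d in total) print d, total[d] }
' "$csv" | sort)                                # for-in order is unspecified, so sort

echo "$totals"
rm -f "$csv"
```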

Sed is useful when you want to make changes to a file based on regular expressions. It allows you to easily match parts of lines, make modifications, and print out results. It’s less expressive than awk, but that makes it somewhat easier to use for simple tasks. It has many more complicated operators you can use (I think it’s even Turing complete), but in general you won’t use those features.
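Matching parts of a line and reusing them in the replacement is done with capture groups. The sketch below uses invented input and basic-RE syntax so that it does not rely on any extension flags.

```shell
input=$(printf '%s\n' '# names' 'Doe, John' 'Roe, Jane')

# First expression: delete comment lines.
# Second expression: capture "Last" and "First", then print them swapped.
swapped=$(printf '%s\n' "$input" |
    sed -e '/^#/d' -e 's/^\([^,]*\), \(.*\)$/\2 \1/')

echo "$swapped"
```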
