Linux count lines grep

Count total number of occurrences using grep

grep -c is useful for finding how many times a string occurs in a file, but it only counts each occurrence once per line. How can I count multiple occurrences per line? I’m looking for something more elegant than:

perl -e '$_ = <>; print scalar ( () = m/needle/g ), "\n"' 

I know grep is specified, but for anyone using ack, the answer is simply ack -ch.

@KyleStrand For me, ack -ch only counted the lines with occurrences, not the number of occurrences.

@MarcKees Looking at the man page, that sounds like the correct behavior. Thanks for pointing that out!


grep’s -o will only output the matches, ignoring lines; wc can count them:

grep -o 'needle' file | wc -l

This will also match ‘needles’ or ‘multineedle’.

To match only single words use one of the following commands:

grep -ow 'needle' file | wc -l
grep -o '\bneedle\b' file | wc -l
grep -o '\<needle\>' file | wc -l
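To make the difference between counting lines and counting occurrences concrete, here is a minimal sketch that runs grep -c, grep -o | wc -l, and grep -ow | wc -l on the same input. The sample file and its contents are hypothetical, and the tr -d ' ' step is only there because some wc implementations (for example on macOS) pad the number with spaces.

```shell
#!/bin/sh
# Hypothetical sample file, created only for this illustration.
dir=$(mktemp -d)
printf 'needle needle\nmultineedle\nhay\nneedles and a needle\n' > "$dir/file"

# grep -c counts matching LINES: a line with two matches counts once.
lines=$(grep -c 'needle' "$dir/file")

# grep -o prints every match on its own line; wc -l counts those lines.
all=$(grep -o 'needle' "$dir/file" | wc -l | tr -d ' ')

# -w keeps only whole-word matches, so 'needles' and 'multineedle' are skipped.
words=$(grep -ow 'needle' "$dir/file" | wc -l | tr -d ' ')

echo "lines=$lines all=$all words=$words"   # prints: lines=3 all=5 words=3
rm -rf "$dir"
```

The gap between 3 and 5 is exactly the multiple-matches-per-line case from the question.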

@Geek \b matches a word boundary, \B matches NOT a word boundary. The answer above would be more correct if it used \b at both ends.

For a count of occurrences per line, combine the grep -n option with uniq -c:

grep -no '\<needle\>' file | uniq -c
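A small sketch of the per-line variant, assuming a hypothetical sample file. It adds a cut step that keeps only the line number, so matches on the same line are grouped even when a regular expression matches different text each time; awk only reformats uniq’s padded output.

```shell
#!/bin/sh
# Hypothetical sample file for illustration.
dir=$(mktemp -d)
printf 'needle needle\nhay\nneedle\n' > "$dir/file"

# grep -no prints "LINE:MATCH" once per match; cut keeps the line number.
# Matches from one line are adjacent, so uniq -c counts them per line.
# awk turns uniq's "COUNT LINE" output into "line LINE: COUNT".
per_line=$(grep -no 'needle' "$dir/file" | cut -d: -f1 | uniq -c |
  awk '{ printf "line %s: %s\n", $2, $1 }')

printf '%s\n' "$per_line"   # line 1: 2, then line 3: 1
rm -rf "$dir"
```

Lines without a match (line 2 here) simply do not appear in the output.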

Doesn’t seem to work on WSL; it reports a smaller number of occurrences on large files. grep 'needle' file -c works in my case.

@JivanPal This was in the context of uniq -c, which sort cannot do. Of course, if you know identical lines will always be adjacent, you don’t need sort at all; they will be adjacent if your pattern is just a static string, but not in the general case.

If you have GNU grep (always on Linux and Cygwin, occasionally elsewhere), you can count the output lines from grep -o: grep -o needle | wc -l.

With Perl, here are a few ways I find more elegant than yours (even after it’s fixed).

perl -lne 'END {print $c} map ++$c, /needle/g'
perl -lne 'END {print $c} $c += s/needle//g'
perl -lne 'END {print $c} ++$c while /needle/g'

With only POSIX tools, one approach, if possible, is to split the input into lines with a single match before passing it to grep. For example, if you’re looking for whole words, then first turn every non-word character into a newline.

# equivalent to grep -ow 'needle' | wc -l
tr -c '[:alnum:]' '[\n*]' | grep -c '^needle$'
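Here is the POSIX tr approach applied to a hypothetical sample file, with comments on what each stage does. One caveat: [:alnum:] does not include the underscore, while grep -w treats it as a word character, so the two can disagree on input such as needle_x.

```shell
#!/bin/sh
# Hypothetical sample file for illustration.
dir=$(mktemp -d)
printf 'needle, needles; a needle.\nmultineedle needle\n' > "$dir/file"

# tr -c complements the first set: every character that is NOT alphanumeric
# (spaces, punctuation, newlines) becomes a newline, leaving one word per line.
# grep -c '^needle$' then counts lines consisting of exactly "needle".
count=$(tr -c '[:alnum:]' '[\n*]' < "$dir/file" | grep -c '^needle$')

echo "$count"   # prints: 3
rm -rf "$dir"
```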

Otherwise, there’s no standard command to do this particular bit of text processing, so you need to turn to sed (if you’re a masochist) or awk.

awk '{ while (match($0, /set/)) { ++c; $0 = substr($0, RSTART + RLENGTH) } } END { print c + 0 }'

sed -n -e 's/set/\n&\n/g' -e 's/^/\n/' -e 's/$/\n/' \
       -e 's/\n[^\n]*\n/\n/g' -e 's/^\n//' -e 's/\n$//' \
       -e '/./p' | wc -l
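A shorter awk variant, offered as an alternative rather than as the answer’s own method: POSIX awk’s gsub() returns the number of substitutions it made, so summing that return value counts every match, including several per line. It assumes the pattern cannot match the empty string; the sample file is hypothetical.

```shell
#!/bin/sh
# Hypothetical sample file for illustration.
dir=$(mktemp -d)
printf 'set reset\nnothing here\nsetset\n' > "$dir/file"

# gsub() deletes each match on the current line and returns how many it
# deleted; adding those up gives the total. "c + 0" prints 0, not an empty
# line, when nothing matched.
count=$(awk '{ c += gsub(/set/, "") } END { print c + 0 }' "$dir/file")

echo "$count"   # prints: 4
rm -rf "$dir"
```

Like grep -o, this counts "set" inside "reset"; add word-boundary logic if you only want whole words.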

Here’s a simpler solution using sed and grep , which works for strings or even by-the-book regular expressions but fails in a few corner cases with anchored patterns (e.g. it finds two occurrences of ^needle or \bneedle in needleneedle ).

sed 's/needle/\n&\n/g' | grep -cx 'needle' 

Note that in the sed substitutions above, I used \n to mean a newline. This is standard in the pattern part, but in the replacement text, for portability, substitute backslash-newline for \n .
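To illustrate the portability note, this sketch writes the sed | grep -cx solution with a backslash followed by a real newline in the replacement instead of \n. The sample input is hypothetical and deliberately includes needleneedle to show that adjacent matches are counted separately.

```shell
#!/bin/sh
# Hypothetical sample file for illustration.
dir=$(mktemp -d)
printf 'needleneedle x needle\nhay\n' > "$dir/file"

# Each match is surrounded by newlines, so it ends up on a line of its own.
# The newline in the replacement is written as backslash + actual newline.
# grep -cx then counts lines that are exactly "needle".
count=$(sed 's/needle/\
&\
/g' "$dir/file" | grep -cx 'needle')

echo "$count"   # prints: 3
rm -rf "$dir"
```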



8 Ways to Count Lines in a File in Linux


Counting lines in a Linux file can be hectic if you don’t know the applicable commands and how to combine them. This tutorial makes the process easy by walking you through eight common commands for counting lines in a file in Linux.

The word count command, wc, is primarily used to count words, as its name suggests. However, it can also count lines and characters.

All you need to do is redirect the file into the command’s standard input and add the -l flag: wc -l < filename.

Apart from wc, you can use the awk, sed, grep, nl, pr, cat, and perl commands. Before looking at them, it helps to understand data streams and piping in Linux.


The concept of Data Streams and Piping

Data streams

Every command you run is connected to three standard streams, each treated as a file: standard input, standard output, and standard error.

Standard input, abbreviated stdin and redirected with <, feeds data to a command. Standard output, abbreviated stdout and redirected with >, receives the command’s result. If an error occurs, the error message goes to standard error, abbreviated stderr and redirected with 2>.

By default, stdin is the keyboard, while stdout and stderr are the (monitor) screen. However, because Linux is flexible and treats everything as a file, we can redirect stdin, stdout, or stderr to suit our needs, as you will see when counting lines with the wc command.
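The three redirections can be tried together in a short sketch. The file names in.txt, out.txt, err.txt, and missing.txt are hypothetical; missing.txt is deliberately never created so that grep has an error to report.

```shell
#!/bin/sh
dir=$(mktemp -d)

# > redirects stdout: printf writes into in.txt instead of the screen.
printf 'one\ntwo\nthree\n' > "$dir/in.txt"

# < redirects stdin and > redirects stdout: wc reads in.txt, writes out.txt.
wc -l < "$dir/in.txt" > "$dir/out.txt"

# 2> redirects stderr: the "No such file" message lands in err.txt.
# grep exits non-zero here; "|| true" keeps a set -e script running.
grep -c 'one' "$dir/missing.txt" 2> "$dir/err.txt" || true

result=$(tr -d ' ' < "$dir/out.txt")
if [ -s "$dir/err.txt" ]; then has_error=yes; else has_error=no; fi
echo "lines=$result stderr captured: $has_error"
rm -rf "$dir"
```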

Before that, you should understand the concept of piping in Linux.

Piping

Piping in Linux, denoted by |, connects the standard output of one command to the standard input of the next, so two or more commands run together. For example, we can cat a file, say index.txt, but instead of printing its contents to the screen, pipe them to the sort command, which outputs the lines in alphabetical order: cat index.txt | sort.
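A minimal sketch of the index.txt example, with made-up file contents, plus the same pipe feeding wc -l to count lines.

```shell
#!/bin/sh
# index.txt and its contents are hypothetical.
dir=$(mktemp -d)
printf 'pear\napple\nfig\n' > "$dir/index.txt"

# | sends cat's stdout straight into sort's stdin; no temporary file needed.
sorted=$(cat "$dir/index.txt" | sort)

# The same idea counts lines: cat's output becomes wc's input.
count=$(cat "$dir/index.txt" | wc -l | tr -d ' ')

printf '%s\n' "$sorted"
echo "$count"
rm -rf "$dir"
```

In practice sort index.txt and wc -l < index.txt do the same job without cat; the pipe is shown only to illustrate the concept.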


Now that you understand the main concepts applied when customizing a file’s input to get the number of lines, let’s see eight ways to count lines in a file in Linux.

Ways to Count Lines in a File in Linux

WC

The wc command prints a file’s line, word, and byte counts, in that order.

Let’s create a file, practice.txt, and add the following lines:

We are counting file lines.
We use the wc, awk, sed, grep, and perl commands.
The process is easy because we can redirect output and pipe commands.
Linux is becoming fun!

Running the wc command on the file, we get the following output:

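Likewise, we can limit the output to a single count with a specific flag (-l for lines, -w for words, -c for bytes) and use input redirection so that only the number is printed, for example wc -l < practice.txt.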
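A sketch of the wc examples, assuming practice.txt holds the four lines above, one sentence per line. It shows why the redirected form is handy: given a filename, wc echoes the name after the count, while with < it only sees a stream and prints just the number.

```shell
#!/bin/sh
dir=$(mktemp -d)
printf '%s\n' \
  'We are counting file lines.' \
  'We use the wc, awk, sed, grep, and perl commands.' \
  'The process is easy because we can redirect output and pipe commands.' \
  'Linux is becoming fun!' > "$dir/practice.txt"

# Given a filename, wc prints the count followed by the name.
wc -l "$dir/practice.txt"

# With input redirection, wc never sees a name, so only the number is printed.
# tr removes the padding some wc implementations add.
lines=$(wc -l < "$dir/practice.txt" | tr -d ' ')
words=$(wc -w < "$dir/practice.txt" | tr -d ' ')

echo "lines=$lines words=$words"   # prints: lines=4 words=31
rm -rf "$dir"
```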


Count the number of lines found by grep

I want to know how many instances of a pattern are found by grep while looking recursively through a directory structure. It seems I should be able to pipe the output of grep through something which would count the lines.


I was able to put the answer together with help from this question. The wc program counts newlines, words, and bytes; its -l option prints only the number of lines. For my application, the following worked nicely to count the number of instances (strictly, the matching lines) of “somePattern”:

$ grep -r "somePattern" directory | wc -l

There’s also grep -c, but it doesn’t do exactly what you require: “Suppress normal output; instead print a count of matching lines for each input file.”

grep -rcZ "some_pattern" | awk -F'\0' '{s+=$NF} END{print s}'

This is likely superior in speed compared to wc -l .

It also works for files with newlines in their names.

Worked perfectly, but if you’re still around, would you mind improving it by explaining step by step what was done there?

For grep: -r searches recursively, -c prints the number of matching lines per file instead of the lines themselves, and -Z separates the filename from that count with a NUL character. For awk: -F '\0' makes NUL the field separator. s+=$NF adds up the values of the last field (NF is the number of fields, so $NF is the last field), and END{print s} prints the total when awk runs out of input and the program ends.
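The sketch below checks, on a hypothetical directory tree, that piping grep -r into wc -l and summing grep -rc counts give the same total, and contrasts both with grep -ro, which counts occurrences. It deliberately swaps the answer’s -Z/NUL separator for ':' so it does not depend on how a particular awk treats a NUL field separator; the count is still the last field even if a filename contains ':'.

```shell
#!/bin/sh
# Hypothetical directory tree for illustration.
dir=$(mktemp -d)
mkdir -p "$dir/a" "$dir/b"
printf 'foo\nfoo bar\nbaz\n' > "$dir/a/x.txt"
printf 'foo foo\n' > "$dir/b/y.txt"
printf 'nothing\n' > "$dir/b/z.txt"

# Matching lines across the tree: grep prints one line per matching line.
via_wc=$(grep -r 'foo' "$dir" | wc -l | tr -d ' ')

# grep -rc prints "FILE:COUNT" for every file (0 when nothing matches);
# awk adds up the last ':'-separated field.
via_awk=$(grep -rc 'foo' "$dir" | awk -F: '{ s += $NF } END { print s + 0 }')

# Individual occurrences rather than lines.
occurrences=$(grep -ro 'foo' "$dir" | wc -l | tr -d ' ')

echo "lines=$via_wc lines=$via_awk occurrences=$occurrences"
rm -rf "$dir"
```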


Grep Count

While grep is most often used for finding strings and neighboring text, it can also count lines and occurrences.

grep count lines

To count the number of lines that a string appears in, use grep with the -c flag:

$ grep -c "string" file

The -c flag counts the lines that match and prints that number instead of the lines themselves. This is useful, for example, when you search log files for entries from a particular IP address, endpoint, or other identifier and only care about how many lines match, not the total number of matches.
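A sketch of the log-file use case with a made-up access.log (the addresses come from the documentation-only ranges). It adds -F, which is not in the text above: without it, each '.' in the address is a regular-expression wildcard that matches any character.

```shell
#!/bin/sh
# Hypothetical log file for illustration.
dir=$(mktemp -d)
printf '%s\n' \
  '203.0.113.7 GET /index.html' \
  '198.51.100.2 GET /about' \
  '203.0.113.7 GET /api?client=203.0.113.7' > "$dir/access.log"

# -c prints the number of matching lines; -F treats the pattern as a fixed
# string so '.' only matches a literal dot.
hits=$(grep -cF '203.0.113.7' "$dir/access.log")

echo "$hits"   # prints: 2 (the third line counts once despite two matches)
rm -rf "$dir"
```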


grep count occurrences

If, however, you want to count every occurrence of a string rather than just the number of lines it appears on, use:

$ grep -o "string" file | wc -l

The -o flag prints each occurrence of the string on its own line, and wc -l counts those lines, giving the total number of occurrences. This is useful when you want to know how many times a particular string occurs in a document, such as the number of times a name, variable, or IP address appears.

If you only want to print the occurrences rather than count them, drop wc:

$ grep -o "string" file

The -o flag prints only the matched part of each line, one match per output line. This is useful when the pattern is a regular expression and you want to see which strings it actually matched.
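Both uses of -o fit in one sketch: print the strings a regular expression matched, then count them. The log file is hypothetical, and the IPv4-like pattern is a simple illustration that does not validate addresses.

```shell
#!/bin/sh
# Hypothetical log file for illustration.
dir=$(mktemp -d)
printf '%s\n' \
  '203.0.113.7 GET /index.html' \
  '198.51.100.2 GET /about' \
  '203.0.113.7 GET /api?client=203.0.113.7' > "$dir/access.log"

# -o prints each match on its own line (two for the third log line);
# -E enables the {n} and {m,n} interval syntax.
matches=$(grep -oE '([0-9]{1,3}\.){3}[0-9]{1,3}' "$dir/access.log")
printf '%s\n' "$matches"

# Counting those lines gives occurrences, not matching lines.
occurrences=$(printf '%s\n' "$matches" | wc -l | tr -d ' ')
echo "$occurrences"   # prints: 4
rm -rf "$dir"
```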

Common “gotchas” when using grep to count

grep uses regular expressions

It is important to know that the “string” following the grep command is interpreted as a regular expression and matched anywhere in a line. This means that simply typing test will also match longer strings containing that word, such as “testing”. To match only the whole word, use word-boundary anchors such as \b or the -w flag. For example:

$ grep -o '\btest\b' file | wc -l
$ grep -ow 'test' file | wc -l

grep counting multiple words

If you want to search for multiple words at the same time, the command becomes more complicated. To print out each word and its count, you can use:

$ grep -o -E 'string1|string2' file | sort | uniq -c

Read more on grep multiple strings to understand how the “or” part of this command came together.
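A sketch of the multiple-word count on a hypothetical app.log, with comments on why sort is needed before uniq -c. The final awk only reorders uniq’s padded "COUNT WORD" output into "WORD COUNT".

```shell
#!/bin/sh
# Hypothetical log file for illustration.
dir=$(mktemp -d)
printf '%s\n' 'error: disk full' 'warning: low memory' \
  'error: retry failed, error logged' > "$dir/app.log"

# -E enables alternation with |; -o prints each matched word on its own line.
# uniq -c only merges ADJACENT duplicates, so sort groups identical words first.
counts=$(grep -oE 'error|warning' "$dir/app.log" | sort | uniq -c |
  awk '{ print $2, $1 }')

printf '%s\n' "$counts"   # error 3, then warning 1
rm -rf "$dir"
```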

grep counting across multiple files

Matching across multiple files and counting the occurrences in each is more complicated still, but the following command prints each file name together with the number of occurrences in it:

$ grep -o "string" file1 file2 | cut -d ':' -f 1 | uniq -c

All of these commands can also be combined with the -i flag to make the match case-insensitive.
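Here is the per-file count on three hypothetical files. It runs in a subshell that changes into the temporary directory so the output shows short names; it assumes file names contain no ':' (cut would split them) and no spaces (the final awk reformatting would).

```shell
#!/bin/sh
# Hypothetical files for illustration.
dir=$(mktemp -d)
printf 'string here\nstring string\n' > "$dir/one.txt"
printf 'no match\n' > "$dir/two.txt"
printf 'a string\n' > "$dir/three.txt"

# With several files, grep prefixes each match with "FILE:".
# cut keeps the file name; uniq -c works without sort because grep
# finishes one file before starting the next. Files with no match vanish.
counts=$(cd "$dir" && grep -o 'string' one.txt two.txt three.txt |
  cut -d: -f1 | uniq -c | awk '{ print $2, $1 }')

printf '%s\n' "$counts"   # one.txt 3, then three.txt 1
rm -rf "$dir"
```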

Use awk or sed for text manipulation

If you want to manipulate text or work with specific fields of a file, you will probably want a more specialized tool such as sed or awk.

Find out more about grep

As always, if you want to find out more about how to use grep, you can use:

$ man grep

which prints the full manual with every option explained, or:

$ grep --help

which prints a short summary of the available options.

