How to use grep command

grep is a command line program that allows us to find all occurrences of given pattern in the text. We can use grep to search command standard output, for example:

$ # grep will print lines containing "eth0" phrase
$ ifconfig | grep eth0
eth0      Link encap:Ethernet  HWaddr xx:xx:xx:xx:xx:xx
$ cat /usr/share/dict/words | grep nerd

We may also use grep to find given pattern in the set of text files:

$ cat > foo.txt <<EOF
in the middle of the line foo is
$ # search for foo in foo.txt file
$ grep foo foo.txt
$ cat > bar.txt <<EOF
$ # search for foo in foo.txt and bar.txt 
$ grep foo {foo,bar}.txt

NOTE: In the last example I used heredoc’s to create foo.txt and bar.txt files.
NOTE: {foo,bar}.txt is expanded by bash into foo.txt bar.txt, this is called brace expansion.

We may also peform recursive searches using -r or -R options (grep will search in every file contained in the specified directory and it’s subdirectories):

$ tree grep_exercises/
  |-- bar.txt
  `-- foo.txt
$ grep -r foo grep_exercises/

grep command in details

General grep invocation has form:

$ grep [options] pattern [files|directories]

grep searches standard input when no files were given or specified files/directories for occurrences of the pattern. When grep finds a line that contains the pattern it prints it to the standard output. When we search multiple files each printed line is prefixed by a filename.

-i (--ignore-case)

By default grep is case sensitive, so pattern foo doesn’t match word FOO, we can perform case insensitive search using -i option:

$ grep bar bar.txt

$ grep -i bar bar.txt
-v (--invert-match)

We can use -v option to negate search, that is to print all lines that don’t contain given pattern:

$ cat bar.txt 

$ grep -v foo bar.txt 
-r and -R

Both options can be used to perform recursive searches, the only difference is that -R will follow symbolic links during search while -r won’t.

When we are performing recursive search often we want to exclude some directory from search e.g. .git directory when we are searching inside GIT repository, --exclude-dir option allows us to do exactly that:

$ # create git repository
$ git init; git add *.txt; git commit -m 'foo';
$ grep -r foo .
./foo.txt:in the middle of the line foo is
Binary file ./.git/index matches

$ grep -r foo --exclude-dir=.git .
./foo.txt:in the middle of the line foo is

We may limit set of files that will be searched during recursive search using --include/--exclude options. Both of these options take shell GLOB as an argument. --include will force grep to check only files that match GLOB. --exclude will force grep to skip any files that don’t match GLOB. For example to search for foo pattern only in .txt files we may use command:

$ grep -r --include='*.txt' .

Another useful option when performing recursive searches is -I, it tells grep to skip binary files when looking for the pattern.

-w (--word-regexp)

Sometime we want to match only whole words e.g. when we search for foo we don’t want matches like foobar or xfoox. -w option will force grep to perform whole word search:

$ grep foo foo.txt
in the middle of the line foo is

$ grep -w foo foo.txt
in the middle of the line foo is
-c (--count)

-c option allow us to count number of occurrences of the pattern inside searched files:

$ grep -c foo foo.txt

$ grep -c foo *.txt
-n (--line-number)

-n option will force grep to prefix each found line with line number:

$ grep -rn foo grep_exercises/
grep_exercises/foo.txt:5:in the middle of the line foo is
-l (--files-with-matches), -L (--files-without-match)

-l option will force grep to print only filenames of files that contain given pattern instead of printing lines:

$ grep -rl foo grep_exercises/

The is also -L option that will print only filenames of files that doesn’t contain specified pattern.

-h (--no-filename), -H (--with-filename)

-h option will prevent printing of filenames when performing multifile search. Similarly -H option will force grep to always print filenames:

$ grep -H foo foo.txt 
foo.txt:in the middle of the line foo is
-A (--after-context), -B (--before-context) and -C (--context)

-A, -B and -C options allow us to print not only lines containing pattern but also specified number of lines after (A) and before (B) them. -C option is a shortcut to set both -A and -B to the same value:

$ grep -B 1 -A 2 -w my /usr/share/dict/words

Basic regular expressions (BRE)

By default grep interpret pattern as a basic regular expression (BRE). BREs are similar to shell GLOBs but are more powerful. Inside BRE we may use the following special characters:

Character/Expression Meaning
. Matches any single character
* Matches zero or more repetitions of previous character or group
[abc] Bracket expression matches single character. [abc] matches any of the characters a, b or c. [0-9] matches any digit. [^abc] matches any character that is not a, b or c. [0-9abcdef] matches any lowercase hex digit. Finally we can use character classes inside bracket expressions e.g. [[:digit:]]
^ Matches beginning of the line
$ Matches end of the line

Any of the special characters used in BRE may be escaped by preceding them with a backslash (\).

OK, now let’s see BRE in action:

$ # first we will find all words that ends with 'cat'
$ grep 'cat$' /usr/share/dict/words

$ # now let's find all 5 letter words in which second letter is
$ # a, e or i
$ grep '^.[aei]...$' /usr/share/dict/words | head

$ # and let's finish by finding all hex numbers in the text
$ # we will assume that hex numbers have format 0xaabbcc
$ cat /proc/cpuinfo | grep '0x[0-9a-f]*'
microcode   : 0x9
microcode   : 0x9

BRE are even more powerful if we use grouping and more advanced operators. Let’s start with grouping, we may create groups by surrounding parts of the pattern with \(...\), then we may apply e.g. * to the entire group. For example to find all words that have even number of characters we may write:

$ grep '^\([a-z][a-z]\)*$' /usr/share/dict/words | head

This works because we are telling grep that we want to find any number of repetitions of pair of characters.

Sometime we search for pattern like phone number that have fixed number of digits in each part, we may use \{...\} operator to specify exact number of repetitions of previous character or group:

Expression Meaning
\{m\} Matches m repetitions of previous character or group e.g. a{3} matches aaa but not aa or abc.
\{m,\} Matches at least m repetitions of previous character or group e.g. a{2} matches aa, aaa but not a.
\{m,n\} Matches at least m and at most n repetitions of of previous character or group.

For example to search for phone number in format XXX-XX-XXXX we will grep:

$ grep '[0-9]\{3\}-[0-9]\{2\}-[0-9]\{4\}' file

The last thing worth know about BRE is the possibility to use backreferences. For example let’s say that we search for all five letters palindromes, we may use \1, \2, etc. placeholders to tell grep that we expect text that was matched by first group, second group etc. in the place of placeholders:

$ grep '^\(.\)\(.\).\2\1$' /usr/share/dict/words | head

NOTE: Sometimes we just want to search using old plain text pattern, we may disable BRE using -F grep option.

I didn’t described all BRE features, if you are interested in details please check the standard. IMHO BRE are ugly and cumbersome to use, in the next section I will quickly describe extended regular expressions that bring full power of regexes to grep.

Extended regular expressions (ERE)

To enable extended regular expressions we must pass -E (--extended-regexp) option to grep. ERE are similar to BRE with the exception that we don’t need to precede operators such as grouping with backslashes. To create a group in ERE we simply use (foo) pattern, and to specify number of matches for that group we use (foo){3} pattern. This looks much more cleaner compared to BRE \(foo\)\{3\}.

If we want to match e.g. left parenthesis we need to escape it using backslash, for example to match main() text we could use main\(\) pattern (or we could use -F option to match text as it is).

In ERE in addition to all special characters from BRE we may use:

Character/Expression Meaning
+ Matches one or more repetitions of previous character or group
? Matches zero or one occurrences of previous character or group
| Alternation - allows to match one of the specified patterns from set e.g. 0(x|X) will match 0x and 0X texts

Now let’s see some examples of ERE in action:

$ # let's find all phone numbers in format XXXX-XX-XXX that may
$ # be optionally prefixed by country code (+XX)
$ grep -E '(\(\+[0-9]{2}\) )?[0-9]{4}-[0-9]{2}-[0-9]{3}' file

$ # let's find all percent values in the text
$ grep -E '[0-9]+%' file

$ # let's find all email addresses, here we use -o option
$ # to print only matched text
$ grep -E -o '[A-Za-z0-9._]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,6}' file

Most modern versions of grep support even more powerful versions of regular expressions that include shortcuts for character classes like \d for [0-9] and \D for [^0-9] and more advanced anchors like \b (matches word start or end). Check your local version of grep to find out what features are available.

That’s all that I wanted to say about grep command. Happy grepping!