AWK - Regular Expressions

AWK handles regular expressions powerfully and efficiently, and many complex text-processing tasks can be solved with a simple regular expression.
This tutorial covers the standard regular expression operators, with a suitable example for each.

Dot

The dot (.) matches any single character. For instance, the following example matches fin, fun, fan, and so on.
[jerry]$ echo -e "cat\nbat\nfun\nfin\nfan" | awk '/f.n/'
On executing the above code, you get the following result:
fun
fin
fan
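To make the "exactly one character" rule concrete, here is a small sketch. It uses printf rather than echo -e for portability, and the variable name dot_matches is only for illustration. The dot needs one character between f and n, so fn (none) and foon (two) are skipped, while a space counts as a character.

```shell
# The dot stands for exactly one character, whatever it is.
dot_matches=$(printf 'fn\nfun\nfoon\nf n\n' | awk '/f.n/')
printf '%s\n' "$dot_matches"
# prints:
# fun
# f n
```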

Start of line

The caret (^) matches the start of a line. For instance, the following example prints all the lines that start with the pattern The.
[jerry]$ echo -e "This\nThat\nThere\nTheir\nthese" | awk '/^The/'
On executing the above code, you get the following result:
There
Their

End of line

The dollar sign ($) matches the end of a line. For instance, the following example prints the lines that end with the letter n.
[jerry]$ echo -e "knife\nknow\nfun\nfin\nfan\nnine" | awk '/n$/'
On executing the above code, you get the following result:
fun
fin
fan
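The start-of-line and end-of-line anchors can also be combined so that the pattern must cover the whole line. The following is a minimal sketch with made-up input; the variable name whole_line is only for illustration.

```shell
# ^ and $ together force the pattern to span the entire line:
# "fans" has a trailing s, and "elfin" does not start with f.
whole_line=$(printf 'fan\nfans\nelfin\nfin\n' | awk '/^f.n$/')
printf '%s\n' "$whole_line"
# prints:
# fan
# fin
```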

Match character set

Square brackets match exactly one character out of the listed set. For instance, the following example matches Call and Tall but not Ball.
[jerry]$ echo -e "Call\nTall\nBall" | awk '/[CT]all/'
On executing the above code, you get the following result:
Call
Tall

Exclusive set

In an exclusive set, a caret placed right after the opening square bracket negates the set, so the expression matches any single character that is not listed. For instance, the following example prints only Ball.
[jerry]$ echo -e "Call\nTall\nBall" | awk '/[^CT]all/'
On executing the above code, you get the following result:
Ball
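Note that an exclusive set still has to match one character; it does not mean "nothing here". The sketch below, with made-up input and an illustrative variable name, shows that a bare all is skipped because no character precedes it, while Small and the lowercase tall match because m and t are not in the set.

```shell
# [^CT] consumes one character that is neither C nor T (case-sensitive).
not_ct=$(printf 'all\nSmall\nTall\ntall\n' | awk '/[^CT]all/')
printf '%s\n' "$not_ct"
# prints:
# Small
# tall
```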

Alternation

A vertical bar (|) allows regular expressions to be logically OR'ed. For instance, the following example prints Call and Ball.
[jerry]$ echo -e "Call\nTall\nBall\nSmall\nShall" | awk '/Call|Ball/'
On executing the above code, you get the following result:
Call
Ball

Zero or one occurrence

The question mark (?) matches zero or one occurrence of the preceding character. For instance, the following example matches Colour as well as Color, because the ? makes the u optional.
[jerry]$ echo -e "Colour\nColor" | awk '/Colou?r/'
On executing the above code, you get the following result:
Colour
Color

Zero or more occurrences

The asterisk (*) matches zero or more occurrences of the preceding character. For instance, the following example matches ca, cat, catt, and so on.
[jerry]$ echo -e "ca\ncat\ncatt" | awk '/cat*/'
On executing the above code, you get the following result:
ca
cat
catt
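Because the t may occur zero times and the pattern is not anchored, /cat*/ actually selects any line that contains ca. The following sketch, using made-up input and illustrative variable names, contrasts the unanchored pattern with an anchored one that accepts only ca followed by any number of t characters.

```shell
# Unanchored: every line containing "ca" matches, since t* may match nothing.
loose=$(printf 'ca\ncab\ncatt\nscatter\n' | awk '/cat*/')
# Anchored: the whole line must be "ca" followed by zero or more "t".
strict=$(printf 'ca\ncab\ncatt\nscatter\n' | awk '/^cat*$/')

printf 'loose:\n%s\n' "$loose"    # ca, cab, catt, scatter
printf 'strict:\n%s\n' "$strict"  # ca, catt
```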

One or more occurrences

The plus sign (+) matches one or more occurrences of the preceding character. For instance, the following example matches lines containing one or more occurrences of the digit 2.
[jerry]$ echo -e "111\n22\n123\n234\n456\n222"  | awk '/2+/'
On executing the above code, you get the following result:
22
123
234
222
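When the pattern is not anchored, /2+/ selects the same lines as /2/, since a single 2 anywhere already satisfies it. The + starts to matter once the pattern is anchored. The sketch below reuses the input from the example above; the variable names are only for illustration.

```shell
# Unanchored: same lines as /2/ would select.
any_twos=$(printf '111\n22\n123\n234\n456\n222\n' | awk '/2+/')
# Anchored: the line must consist of one or more 2s and nothing else.
only_twos=$(printf '111\n22\n123\n234\n456\n222\n' | awk '/^2+$/')

printf 'any:\n%s\n' "$any_twos"    # 22, 123, 234, 222
printf 'only:\n%s\n' "$only_twos"  # 22, 222
```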

Grouping

Parentheses () are used for grouping, and the character | separates the alternatives inside the group. For instance, the following regular expression matches lines containing either Apple Juice or Apple Cake.
[jerry]$ echo -e "Apple Juice\nApple Pie\nApple Tart\nApple Cake" | awk '/Apple (Juice|Cake)/'
On executing the above code, you get the following result:
Apple Juice
Apple Cake
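The parentheses matter because alternation has the lowest precedence. Without them, the pattern reads as "Apple Juice" or "Cake" on its own. The following sketch uses made-up input and illustrative variable names to show the difference.

```shell
# Without grouping: matches "Apple Juice" OR any line containing "Cake".
ungrouped=$(printf 'Apple Juice\nCheese Cake\nApple Cake\n' | awk '/Apple Juice|Cake/')
# With grouping: "Apple " must be followed by Juice or Cake.
grouped=$(printf 'Apple Juice\nCheese Cake\nApple Cake\n' | awk '/Apple (Juice|Cake)/')

printf 'ungrouped:\n%s\n' "$ungrouped"  # Apple Juice, Cheese Cake, Apple Cake
printf 'grouped:\n%s\n' "$grouped"      # Apple Juice, Apple Cake
```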
