AWK - Built-in variables

AWK provides several built-in variables. They play an important role when writing AWK scripts. This chapter illustrates the usage of the built-in variables.



Standard AWK variables

Following are the standard AWK variables:

ARGC

It holds the number of arguments provided at the command line, counting the command name itself.
[jerry]$ awk 'BEGIN {print "Arguments =", ARGC}' One Two Three Four
On executing the above code, you get the following result:
Arguments = 5
Why does AWK report 5 when only four arguments were passed? The count includes the command name (awk), which is stored in ARGV[0]. The ARGV example that follows shows this.

ARGV

It is an array that stores the command-line arguments. The array's valid index range is from 0 to ARGC - 1.
[jerry]$ awk 'BEGIN { for (i = 0; i < ARGC; ++i) { printf "ARGV[%d] = %s\n", i, ARGV[i] } }' one two three four
On executing the above code, you get the following result:
ARGV[0] = awk
ARGV[1] = one
ARGV[2] = two
ARGV[3] = three
ARGV[4] = four

CONVFMT

It represents the format AWK uses when it converts a number to a string, and its default value is %.6g.
[jerry]$ awk 'BEGIN { print "Conversion Format =", CONVFMT }'
On executing the above code, you get the following result:
Conversion Format = %.6g
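The effect of CONVFMT is easiest to see when a number is converted to a string, for example by concatenating it with an empty string. The following is a minimal sketch; the shell variable convfmt_out and the AWK variables x and s are names chosen only for this illustration.

```shell
# Concatenating x with "" forces a number-to-string conversion,
# which uses CONVFMT instead of the default %.6g.
convfmt_out=$(awk 'BEGIN { CONVFMT = "%.2f"; x = 3.14159; s = x ""; print s }')
printf '%s\n' "$convfmt_out"    # prints 3.14
```

Integer values are converted as integers regardless of CONVFMT, so the setting only matters for non-integer numbers.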

ENVIRON

It is an associative array of environment variables.
[jerry]$ awk 'BEGIN { print ENVIRON["USER"] }'
On executing the above code, you get the following result:
jerry
To find the names of other environment variables, use the env command on GNU/Linux.
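Any variable present in the environment can be read the same way. The sketch below sets a variable named GREETING (a hypothetical name used only for this example) for a single AWK invocation and reads it back through ENVIRON.

```shell
# GREETING is placed in the environment of this one awk process only.
env_out=$(GREETING=hello awk 'BEGIN { print ENVIRON["GREETING"] }')
printf '%s\n' "$env_out"    # prints hello
```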

FILENAME

It represents the current file name.
[jerry]$ awk 'END {print FILENAME}' marks.txt
On executing the above code, you get the following result:
marks.txt
Please note that FILENAME is undefined in the BEGIN block, because no file has been opened yet at that point.

FS

It represents the input field separator and its default value is a single space, which makes AWK split fields on runs of whitespace. You can change it with the -F command-line option.
[jerry]$ awk 'BEGIN {print "FS = " FS}' | cat -vte
On executing the above code, you get the following result:
FS =  $
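As a short sketch of changing the separator, the two commands below split a colon-separated line, once with the -F option and once by assigning FS in a BEGIN block. The sample input a:b:c and the shell variable names are made up for this illustration.

```shell
# Both forms select the second colon-separated field.
fs_flag=$(printf 'a:b:c\n' | awk -F: '{ print $2 }')
fs_begin=$(printf 'a:b:c\n' | awk 'BEGIN { FS = ":" } { print $2 }')
printf '%s %s\n' "$fs_flag" "$fs_begin"    # prints b b
```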

NF

It represents the number of fields in the current record. For instance, the example below prints only those lines that contain more than two fields.
[jerry]$ echo -e "One Two\nOne Two Three\nOne Two Three Four" | awk 'NF > 2'
On executing the above code, you get the following result:
One Two Three
One Two Three Four

NR

It represents the number of the current record. For instance, the example below prints a record only if its record number is less than three, that is, the first two records.
[jerry]$ echo -e "One Two\nOne Two Three\nOne Two Three Four" | awk 'NR < 3'
On executing the above code, you get the following result:
One Two
One Two Three

FNR

It is similar to NR, but relative to the current file. It is useful when AWK is operating on multiple files. The value of FNR is reset to 1 at the start of each new file.
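The difference between NR and FNR shows up as soon as two files are read in one run. The following sketch creates two small files in a temporary directory; the file names first.txt and second.txt and their contents are assumptions made for this example.

```shell
# Create two two-line sample files in a temporary directory.
dir=$(mktemp -d)
printf 'a\nb\n' > "$dir/first.txt"
printf 'c\nd\n' > "$dir/second.txt"

# NR keeps counting across files; FNR restarts at 1 for each file.
fnr_out=$(awk '{ print NR, FNR, $0 }' "$dir/first.txt" "$dir/second.txt")
printf '%s\n' "$fnr_out"
rm -r "$dir"
```

The output is four lines: 1 1 a, 2 2 b, 3 1 c and 4 2 d. The first column (NR) keeps growing, while the second column (FNR) starts again at 1 for the second file.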

OFMT

It represents the output format for numbers printed by the print statement, and its default value is %.6g.
[jerry]$ awk 'BEGIN {print "OFMT = " OFMT}'
On executing the above code, you get the following result:
OFMT = %.6g
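To see OFMT take effect, print a non-integer number after changing it. This is a minimal sketch; the value 3.14159 and the shell variable ofmt_out are chosen only for illustration.

```shell
# print formats non-integer numbers with OFMT.
ofmt_out=$(awk 'BEGIN { OFMT = "%.2f"; print 3.14159 }')
printf '%s\n' "$ofmt_out"    # prints 3.14
```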

OFS

It represents the output field separator, which print places between comma-separated arguments. Its default value is a space.
[jerry]$ awk 'BEGIN {print "OFS = " OFS}' | cat -vte
On executing the above code, you get the following result:
OFS =  $
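Changing OFS alters what print puts between its comma-separated arguments. The sketch below uses a hyphen as the separator; the input line and the shell variable ofs_out are made up for this example.

```shell
# The commas in the print statement are replaced by OFS on output.
ofs_out=$(echo "One Two Three" | awk 'BEGIN { OFS = "-" } { print $1, $2, $3 }')
printf '%s\n' "$ofs_out"    # prints One-Two-Three
```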

ORS

It represents the output record separator, which print appends after each record. Its default value is a newline.
[jerry]$ awk 'BEGIN {print "ORS = " ORS}' | cat -vte
On executing the above code, you get the following result:
ORS = $
$
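As an illustration, setting ORS to a semicolon joins all output records onto one line. The two-line input and the shell variable ors_out are assumptions for this sketch.

```shell
# Each print ends with ORS instead of a newline.
ors_out=$(printf 'a\nb\n' | awk 'BEGIN { ORS = ";" } { print }')
printf '%s\n' "$ors_out"    # prints a;b;
```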

RLENGTH

It represents the length of the string matched by the match function. AWK's match function searches the input string for a regular expression. If there is no match, RLENGTH is set to -1.
[jerry]$ awk 'BEGIN { if (match("One Two Three", "re")) { print RLENGTH } }'
On executing the above code, you get the following result:
2

RS

It represents the input record separator and its default value is a newline.
[jerry]$ awk 'BEGIN {print "RS = " RS}' | cat -vte
On executing the above code, you get the following result:
RS = $
$
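Setting RS to another character makes AWK split its input into records at that character instead of at newlines. The following sketch reads semicolon-separated input; the input string is deliberately written without a trailing newline, and the shell variable rs_out is a name chosen for this example.

```shell
# Records are separated by ";" rather than by newlines.
rs_out=$(printf 'a;b;c' | awk 'BEGIN { RS = ";" } { print NR ": " $0 }')
printf '%s\n' "$rs_out"
```

This prints three lines: 1: a, 2: b and 3: c.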

RSTART

It represents the position of the first character of the string matched by the match function. Positions start at 1, and RSTART is 0 if there is no match.
[jerry]$ awk 'BEGIN { if (match("One Two Three", "Thre")) { print RSTART } }'
On executing the above code, you get the following result:
9
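RSTART and RLENGTH are commonly used together with substr to extract the matched text. The sketch below matches a capital T followed by lowercase letters; the pattern and the variable names s and match_out are chosen only for illustration.

```shell
# match() finds the leftmost match, which is "Two" at position 5,
# and substr() cuts it out using RSTART and RLENGTH.
match_out=$(awk 'BEGIN { s = "One Two Three"; if (match(s, /T[a-z]+/)) print substr(s, RSTART, RLENGTH) }')
printf '%s\n' "$match_out"    # prints Two
```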

SUBSEP

It represents the separator character that AWK inserts between the parts of a multi-dimensional array subscript, and its default value is \034.
[jerry]$ awk 'BEGIN { print "SUBSEP = " SUBSEP }' | cat -vte
On executing the above code, you get the following result:
SUBSEP = ^\$
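A subscript such as [1, 2] is stored as a single string made of the parts joined by SUBSEP, so splitting a key on SUBSEP recovers the parts. In the sketch below, the array name grid and the other variable names are hypothetical.

```shell
# The key of grid[1, 2] is "1" SUBSEP "2"; split() recovers both parts.
subsep_out=$(awk 'BEGIN { grid[1, 2] = "x"; for (key in grid) { n = split(key, idx, SUBSEP); print n, idx[1], idx[2] } }')
printf '%s\n' "$subsep_out"    # prints 2 1 2
```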

$0

It represents the entire input record.
[jerry]$ awk '{print $0}' marks.txt
On executing the above code, you get the following result:
1)    Amit     Physics    80
2)    Rahul    Maths      90
3)    Shyam    Biology    87
4)    Kedar    English    85
5)    Hari     History    89

$n

It represents the nth field in the current record, where fields are separated by FS.
[jerry]$ awk '{print $3 "\t" $4}' marks.txt
On executing the above code, you get the following result:
Physics    80
Maths      90
Biology    87
English    85
History    89
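The n in $n may be any numeric expression, not only a literal number. As a short sketch, $NF selects the last field of a record and $(NF - 1) the one before it; the input line and the shell variable last_out are made up for this example.

```shell
# NF is the field count, so $NF is the last field.
last_out=$(echo "One Two Three" | awk '{ print $NF, $(NF - 1) }')
printf '%s\n' "$last_out"    # prints Three Two
```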

GNU AWK specific variables

Following are the GNU AWK specific variables:

ARGIND

It represents the index in ARGV of the file currently being processed. The example below assumes that each of the three files contains a single line.
[jerry]$ awk '{ print "ARGIND   = ", ARGIND; print "Filename = ", ARGV[ARGIND] }' junk1 junk2 junk3
On executing the above code, you get the following result:
ARGIND   =  1
Filename =  junk1
ARGIND   =  2
Filename =  junk2
ARGIND   =  3
Filename =  junk3

BINMODE

It specifies binary mode for all file I/O on non-POSIX systems. Numeric values of 1, 2, or 3 specify that input files, output files, or all files, respectively, should use binary I/O. String values of r or w specify that input files or output files, respectively, should use binary I/O. String values of rw or wr specify that all files should use binary I/O.

ERRNO

It is a string describing the error when a redirection fails for getline, or when a close call fails.
[jerry]$ awk 'BEGIN { ret = getline < "junk.txt"; if (ret == -1) print "Error:", ERRNO }'
On executing the above code, you get the following result:
Error: No such file or directory

FIELDWIDTHS

It is a space-separated list of field widths. When this variable is set, GAWK parses the input into fields of fixed width, instead of using the value of the FS variable as the field separator.
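The following is a minimal sketch of fixed-width parsing. It assumes GAWK is installed under the name gawk, and the widths 3 3 4 and the ten-digit input line are made up for this illustration.

```shell
# FIELDWIDTHS is a GAWK extension, so this sketch calls gawk explicitly
# and does nothing when gawk is not available.
if command -v gawk >/dev/null 2>&1; then
    fw_out=$(printf '1234567890\n' | gawk 'BEGIN { FIELDWIDTHS = "3 3 4" } { print $1, $2, $3 }')
    printf '%s\n' "$fw_out"    # prints 123 456 7890
fi
```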

IGNORECASE

When this variable is set to a nonzero or non-null value, GAWK's regular expression matching and string comparisons become case-insensitive. The following simple example illustrates this:
[jerry]$ awk 'BEGIN{IGNORECASE=1} /amit/' marks.txt
On executing the above code, you get the following result:
1)    Amit     Physics    80

LINT

It provides dynamic control of the --lint option from within a GAWK program. When this variable is set, GAWK prints lint warnings. When it is assigned the string value fatal, lint warnings become fatal errors, exactly like --lint=fatal.
[jerry]$ awk 'BEGIN {LINT=1; a}'
On executing the above code, you get the following result:
awk: cmd. line:1: warning: reference to uninitialized variable `a'
awk: cmd. line:1: warning: statement has no effect

PROCINFO

This is an associative array containing information about the process, such as real and effective UID numbers, process ID number, and so on.
[jerry]$ awk 'BEGIN { print PROCINFO["pid"] }'
On executing the above code, you get the following result:
4316

TEXTDOMAIN

It represents the text domain of the AWK program. It is used to find the localized translations for the program's strings.
[jerry]$ awk 'BEGIN { print TEXTDOMAIN }'
On executing the above code, you get the following result:
messages
The output shows messages, which is the default value of TEXTDOMAIN when the program does not set a text domain of its own.
