Technology: AWK - Built-in Functions

AWK has a number of built-in functions that are always available to the programmer. This tutorial describes AWK's arithmetic, string, time, bit manipulation, and miscellaneous functions with suitable examples:

Arithmetic Functions

AWK has the following built-in arithmetic functions:

atan2(y, x)

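It returns the arctangent of y/x in radians. Unlike a plain arctangent of the ratio, it uses the signs of both y and x to place the angle in the correct quadrant. The following simple example illustrates this: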

[jerry]$ awk 'BEGIN {
  PI = 3.14159265
  x = -10
  y = 10
  result = atan2 (y,x) * 180 / PI;

  printf "The arc tangent for (x=%f, y=%f) is %f degrees\n", x, y, result
}'

On executing the above code, you get the following result:

The arc tangent for (x=-10.000000, y=10.000000) is 135.000000 degrees
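You do not have to hard-code PI. Because atan2 is quadrant-aware, atan2(0, -1) returns pi. The sketch below, which assumes any POSIX-compatible awk, derives PI that way. It then shows two points that have the same ratio y/x = -1 but lie in different quadrants:

```shell
awk 'BEGIN {
  PI = atan2(0, -1)    # the point (-1, 0) lies at an angle of pi radians
  printf "PI = %.6f\n", PI

  # Both points have y/x = -1, but they lie in different quadrants
  printf "atan2(10, -10) = %.1f degrees\n", atan2(10, -10) * 180 / PI
  printf "atan2(-10, 10) = %.1f degrees\n", atan2(-10, 10) * 180 / PI
}'
```

It prints PI = 3.141593, followed by 135.0 and -45.0 degrees.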

cos(expr)

This function returns the cosine of expr, where expr is in radians. The following simple example illustrates this:

[jerry]$ awk 'BEGIN {
  PI = 3.14159265
  param = 60
  result = cos(param * PI / 180.0);

  printf "The cosine of %f degrees is %f.\n", param, result
}'

On executing the above code, you get the following result:

The cosine of 60.000000 degrees is 0.500000.

exp(expr)

This function returns e (the base of natural logarithms) raised to the power expr.

[jerry]$ awk 'BEGIN {
  param = 5
  result = exp(param);

  printf "The exponential value of %f is %f.\n", param, result
}'

On executing the above code, you get the following result:

The exponential value of 5.000000 is 148.413159.

int(expr)

This function truncates expr to an integer by discarding its fractional part (truncation is toward zero). The following simple example illustrates this:

[jerry]$ awk 'BEGIN {
  param = 5.12345
  result = int(param)

  print "Truncated value =", result
}'

On executing the above code, you get the following result:

Truncated value = 5
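Truncation drops the fractional part, so negative values move toward zero rather than down. The following sketch assumes any POSIX awk. It contrasts int() with rounding through printf:

```shell
awk 'BEGIN {
  print "int(5.9)  =", int(5.9)     # fractional part dropped: 5
  print "int(-5.9) =", int(-5.9)    # truncation is toward zero: -5, not -6
  printf "rounded   = %.0f\n", 5.9  # printf rounds to the nearest integer: 6
}'
```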

log(expr)

This function returns the natural logarithm (base e) of expr.

[jerry]$ awk 'BEGIN {
  param = 5.5
  result = log (param)

  printf "log(%f) = %f\n", param, result
}'

On executing the above code, you get the following result:

log(5.500000) = 1.704748

rand

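This function returns a random number N such that 0 <= N < 1. In many implementations, including gawk, rand() produces the same sequence on every run unless srand() is called first. For instance, the example below generates three random numbers: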

[jerry]$ awk 'BEGIN {
  print "Random num1 =" , rand()
  print "Random num2 =" , rand()
  print "Random num3 =" , rand()
}'

On executing the above code, you get the following result:

Random num1 = 0.237788
Random num2 = 0.291066
Random num3 = 0.845814
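Because rand() returns values in [0, 1), you can scale and truncate it to get a random integer in a range. Here is a minimal sketch, assuming any POSIX awk. The range 1 to 6 (a six-sided die) is chosen only for illustration:

```shell
awk 'BEGIN {
  srand()                            # seed from the time of day
  for (i = 1; i <= 5; i++) {
    roll = int(rand() * 6) + 1       # rand() * 6 is in [0, 6), so roll is in 1..6
    printf "%d%s", roll, (i < 5 ? " " : "\n")
  }
}'
```

The output differs from run to run. Some implementations seed from the clock in whole seconds, so two runs started within the same second may print the same rolls.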

sin(expr)

This function returns the sine of expr, where expr is in radians. The following simple example illustrates this:

[jerry]$ awk 'BEGIN {
  PI = 3.14159265
  param = 30.0
  result = sin(param * PI /180)

  printf "The sine of %f degrees is %f.\n", param, result
}'

On executing the above code, you get the following result:

The sine of 30.000000 degrees is 0.500000.

sqrt(expr)

This function returns the square root of expr.

[jerry]$ awk 'BEGIN {
  param = 1024.0
  result = sqrt(param)

  printf "sqrt(%f) = %f\n", param, result
}'

On executing the above code, you get the following result:

sqrt(1024.000000) = 32.000000

srand([expr])

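This function sets expr as the new seed for the random number generator and returns the previous seed. If expr is omitted, the time of day is used as the seed. In the example below, the first call returns the previous seed (1 in this run) and seeds the generator from the clock. The second call returns that clock-based seed and installs 10 as the new seed.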

[jerry]$ awk 'BEGIN {
  param = 10

  printf "srand() = %d\n", srand()
  printf "srand(%d) = %d\n", param, srand(param)
}'

On executing the above code, you get the following result:

srand() = 1
srand(10) = 1417959587
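Seeding with a fixed value makes the sequence reproducible, which is handy for testing. POSIX also specifies that srand() returns the seed it replaces. The sketch below assumes any POSIX awk, and the seed 42 is arbitrary:

```shell
# Two separate runs with the same seed print identical numbers
awk 'BEGIN { srand(42); print rand(), rand() }'
awk 'BEGIN { srand(42); print rand(), rand() }'

# srand() returns the seed it replaces, so this prints: previous seed = 42
awk 'BEGIN { srand(42); print "previous seed =", srand(7) }'
```

The actual random values depend on the awk implementation. Only their repeatability is guaranteed.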

String Functions

AWK has the following built-in String functions:

asort(arr [, d [, how] ])

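This function sorts the values of arr using gawk's normal rules for comparing values. It then replaces the indices of the sorted values with sequential integers starting at 1 and returns the number of elements. If a destination array d is given, arr is first copied into d and only d is sorted, so arr is left unchanged. The optional how string selects the sort order. asort() is a gawk extension.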

[jerry]$ awk 'BEGIN {
 arr[0] = "Three"
 arr[1] = "One"
 arr[2] = "Two"

 print "Array elements before sorting:"
 for (i = 0; i <= 2; i++) {
  print arr[i]
 }

 n = asort(arr)   # returns the element count; indices become 1..n

 print "Array elements after sorting:"
 for (i = 1; i <= n; i++) {
  print arr[i]
 }
}'

On executing the above code, you get the following result:

Array elements before sorting:
Three
One
Two
Array elements after sorting:
One
Three
Two

asorti(arr [, d [, how] ])

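This function behaves like asort(), except that it sorts the indices of arr instead of its values. Afterwards, arr is indexed from 1 to n and its values are the sorted indices. The original values are lost unless a destination array d is supplied.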

[jerry]$ awk 'BEGIN {
 arr["Two"] = 1
 arr["One"] = 2
 arr["Three"] = 3

 n = asorti(arr)   # arr now maps 1..n to the sorted indices

 print "Array indices after sorting:"
 for (i = 1; i <= n; i++) {
  print arr[i]
 }
}'

On executing the above code, you get the following result:

Array indices after sorting:
One
Three
Two

gsub(regex, sub, string)

gsub stands for global substitution. It replaces every match of regex in string with sub and returns the number of substitutions made. The third parameter is optional. If it is omitted, $0 is used.

[jerry]$ awk 'BEGIN {
 str = "Hello, World"

 print "String before replacement = " str

 gsub("World", "Jerry", str)

 print "String after replacement = " str
}'

On executing the above code, you get the following result:

String before replacement = Hello, World
String after replacement = Hello, Jerry
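gsub() returns the number of substitutions it made. In the replacement, an & stands for the matched text. Here is a short sketch that uses only POSIX awk features:

```shell
awk 'BEGIN {
  str = "Hello, World"
  n = gsub(/o/, "0", str)          # replace every "o"; n is the count
  print n " replacements: " str

  fruits = "apple banana"
  gsub(/[a-z]+/, "<&>", fruits)    # & inserts the text that matched
  print fruits
}'
```

It prints "2 replacements: Hell0, W0rld" followed by "<apple> <banana>".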

index(str, sub)

It checks whether sub is a substring of str. If it is, index returns the position where sub starts; otherwise, it returns 0. The first character of str is at position 1.

[jerry]$ awk 'BEGIN {
 str = "One Two Three"
 subs = "Two"

 ret = index(str, subs)

 printf "Substring \"%s\" found at position %d.\n", subs, ret
}'

On executing the above code, you get the following result:

Substring "Two" found at position 5.

length(str)

It returns the number of characters in str. If the argument is omitted, it returns the length of $0.

[jerry]$ awk 'BEGIN {
 str = "Hello, World !!!"

 print "Length =", length(str)
}'

On executing the above code, you get the following result:

Length = 16

match(str, regex)

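It returns the position at which the leftmost, longest match of regex begins in str, or 0 if there is no match. It also sets RSTART to that position and RLENGTH to the length of the match (-1 if there is no match).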

[jerry]$ awk 'BEGIN {
 str = "One Two Three"
 subs = "Two"

 ret = match(str, subs)

 printf "Substring \"%s\" found at position %d.\n", subs, ret
}'

On executing the above code, you get the following result:

Substring "Two" found at position 5.
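match() returns the start position and also sets RSTART and RLENGTH. You can pass these to substr() to extract the matched text. The following sketch assumes any POSIX awk, and the sample string is made up:

```shell
awk 'BEGIN {
  str = "Order 66 shipped in 2014"
  if (match(str, /[0-9]+/)) {        # the leftmost match wins: 66, not 2014
    print "RSTART =", RSTART, "RLENGTH =", RLENGTH
    print "Matched text =", substr(str, RSTART, RLENGTH)
  }
}'
```

It prints "RSTART = 7 RLENGTH = 2" and then "Matched text = 66".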

split(str, arr, regex)

This function splits the string str into fields separated by the regular expression regex and stores the fields in the array arr, starting at index 1. It returns the number of fields. If regex is omitted, FS is used.

[jerry]$ awk 'BEGIN {
 str = "One,Two,Three,Four"

 n = split(str, arr, ",")   # n is the number of fields

 print "Array contains following values"

 for (i = 1; i <= n; i++) {
  print arr[i]
 }
}'

On executing the above code, you get the following result:

Array contains following values
One
Two
Three
Four
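The separator can be any regular expression, and the return value tells you how many fields were produced. In this sketch, which assumes any POSIX awk, every run of digits acts as a separator:

```shell
awk 'BEGIN {
  n = split("a1b22c333d", parts, /[0-9]+/)   # one or more digits separate fields

  joined = parts[1]
  for (i = 2; i <= n; i++)                    # numeric loop keeps field order
    joined = joined "|" parts[i]

  print n " fields: " joined
}'
```

It prints "4 fields: a|b|c|d".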

sprintf(format, expr-list)

This function returns a string constructed from expr-list according to format.

[jerry]$ awk 'BEGIN {
 str = sprintf("%s", "Hello, World !!!")

 print str
}'

On executing the above code, you get the following result:

Hello, World !!!
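sprintf() accepts the same format specifications as printf. This makes it useful for building padded or aligned strings before printing them. Here is a sketch with arbitrary sample values, assuming any POSIX awk:

```shell
awk 'BEGIN {
  # %-8s: left-justify in 8 columns
  # %5.1f: 5 columns wide, 1 decimal place
  # %03d: zero-pad to 3 digits
  line = sprintf("%-8s|%5.1f|%03d", "Amit", 80, 7)
  print "[" line "]"
}'
```

It prints "[Amit    | 80.0|007]".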

strtonum(str)

This function examines str and returns its numeric value. If str begins with a leading 0, it is treated as an octal number. If it begins with 0x or 0X, it is treated as a hexadecimal number. Otherwise, it is treated as a decimal number. strtonum() is a gawk extension.

[jerry]$ awk 'BEGIN {
 print "Decimal num = " strtonum("123")
 print "Octal num = " strtonum("0123")
 print "Hexadecimal num = " strtonum("0x123")
}'

On executing the above code, you get the following result:

Decimal num = 123
Octal num = 83
Hexadecimal num = 291

sub(regex, sub, string)

This function performs a single substitution. It replaces the first match of regex in string with sub and returns 1 if a substitution was made, or 0 otherwise. The third parameter is optional. If it is omitted, $0 is used.

[jerry]$ awk 'BEGIN {
 str = "Hello, World"

 print "String before replacement = " str

 sub("World", "Jerry", str)

 print "String after replacement = " str
}'

On executing the above code, you get the following result:

String before replacement = Hello, World
String after replacement = Hello, Jerry

substr(str, start, l)

This function returns the substring of str that starts at position start and is l characters long. If l is omitted, the rest of str from position start onward is returned.

[jerry]$ awk 'BEGIN {
 str = "Hello, World !!!"
 subs = substr(str, 1, 5)

 print "Substring = " subs
}'

On executing the above code, you get the following result:

Substring = Hello
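When you omit the length argument, substr() returns everything from start to the end of the string. Here is a quick comparison, assuming any POSIX awk:

```shell
awk 'BEGIN {
  str = "Hello, World !!!"
  print substr(str, 8)       # from position 8 to the end
  print substr(str, 8, 5)    # 5 characters starting at position 8
}'
```

It prints "World !!!" and then "World".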

tolower(str)

This function returns a copy of string str with all upper case characters converted to lower case.

[jerry]$ awk 'BEGIN {
 str = "HELLO, WORLD !!!"

 print "Lowercase string = " tolower(str)
}'

On executing the above code, you get the following result:

Lowercase string = hello, world !!!

toupper(str)

This function returns a copy of string str with all lower case characters converted to upper case.

[jerry]$ awk 'BEGIN {
 str = "hello, world !!!"

 print "Uppercase string = " toupper(str)
}'

On executing the above code, you get the following result:

Uppercase string = HELLO, WORLD !!!

Time Functions

AWK has the following built-in time functions. They are gawk extensions and are not part of POSIX awk:

systime

This function returns the current time of the day as the number of seconds since the Epoch (1970-01-01 00:00:00 UTC on POSIX systems).

[jerry]$ awk 'BEGIN {
 print "Number of seconds since the Epoch = " systime()
}'

On executing the above code, you get the following result:

Number of seconds since the Epoch = 1418574432

mktime(datespec)

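This function converts the datespec string into a timestamp of the same form as returned by systime(). datespec has the form "YYYY MM DD HH MM SS" and is interpreted as local time, so the result depends on the time zone. Values outside their normal ranges are normalized. For example, the hour 30 in the example below rolls over into the next day.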

[jerry]$ awk 'BEGIN {
 print "Number of seconds since the Epoch = " mktime("2014 12 14 30 20 10")
}'

On executing the above code, you get the following result:

Number of seconds since the Epoch = 1418604610

strftime([format [, timestamp[, utc-flag]]])

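This function formats timestamp according to the specification in format. If timestamp is omitted, the current time is used. If utc-flag is present and is non-zero or non-null, the time is formatted as UTC instead of local time.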

[jerry]$ awk 'BEGIN {
 print strftime("Time = %m/%d/%Y %H:%M:%S", systime())
}'

On executing the above code, you get the following result:

Time = 12/14/2014 22:08:42

strftime() supports the following format specifications:

Date format specification	Description
%a	The locale’s abbreviated weekday name.
%A	The locale’s full weekday name.
%b	The locale’s abbreviated month name.
%B	The locale’s full month name.
%c	The locale’s appropriate date and time representation. (This is %A %B %d %T %Y in the C locale.)
%C	The century part of the current year. This is the year divided by 100 and truncated to the next lower integer.
%d	The day of the month as a decimal number (01–31).
%D	Equivalent to specifying %m/%d/%y.
%e	The day of the month, padded with a space if it is only one digit.
%F	Equivalent to specifying %Y-%m-%d. This is the ISO 8601 date format.
%g	The year modulo 100 of the ISO 8601 week number, as a decimal number (00–99). For example, January 1, 1993 is in week 53 of 1992. Thus, the year of its ISO 8601 week number is 1992, even though its year is 1993. Similarly, December 31, 1973 is in week 1 of 1974. Thus, the year of its ISO week number is 1974, even though its year is 1973.
%G	The full year of the ISO week number, as a decimal number.
%h	Equivalent to %b.
%H	The hour (24-hour clock) as a decimal number (00–23).
%I	The hour (12-hour clock) as a decimal number (01–12).
%j	The day of the year as a decimal number (001–366).
%m	The month as a decimal number (01–12).
%M	The minute as a decimal number (00–59).
%n	A newline character (ASCII LF).
%p	The locale’s equivalent of the AM/PM designations associated with a 12-hour clock.
%r	The locale’s 12-hour clock time. (This is %I:%M:%S %p in the C locale.)
%R	Equivalent to specifying %H:%M.
%S	The second as a decimal number (00–60).
%t	A TAB character.
%T	Equivalent to specifying %H:%M:%S.
%u	The weekday as a decimal number (1–7). Monday is day one.
%U	The week number of the year (the first Sunday as the first day of week one) as a decimal number (00–53).
%V	The ISO 8601 week number of the year as a decimal number (01–53). Weeks start on Monday, and week one is the first week that has at least four days in the new year.
%w	The weekday as a decimal number (0–6). Sunday is day zero.
%W	The week number of the year (the first Monday as the first day of week one) as a decimal number (00–53).
%x	The locale’s appropriate date representation. (This is %A %B %d %Y in the C locale.)
%X	The locale’s appropriate time representation. (This is %T in the C locale.)
%y	The year modulo 100 as a decimal number (00–99).
%Y	The full year as a decimal number (e.g. 2011).
%z	The time-zone offset in a +HHMM format (e.g., the format necessary to produce RFC 822/RFC 1036 date headers).
%Z	The time zone name or abbreviation; no characters if no time zone is determinable.

Bit Manipulation Functions

AWK has the following built-in bit manipulation functions. They are gawk extensions and are not part of POSIX awk:

and

Performs bitwise AND operation.

[jerry]$ awk 'BEGIN {
 num1 = 10
 num2 = 6

 printf "(%d AND %d) = %d\n", num1, num2, and(num1, num2)
}'

On executing the above code, you get the following result:

(10 AND 6) = 2

compl

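Performs bitwise COMPLEMENT operation. gawk treats the operand as an unsigned integer of limited width, so complementing a small positive number yields a large positive number rather than a negative one. In the run below the width is 53 bits: 2^53 - 1 - 10 = 9007199254740981.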

[jerry]$ awk 'BEGIN {
 num1 = 10

 printf "compl(%d) = %d\n", num1, compl(num1)
}'

On executing the above code, you get the following result:

compl(10) = 9007199254740981

lshift

Performs bitwise LEFT SHIFT operation.

[jerry]$ awk 'BEGIN {
 num1 = 10

 printf "lshift(%d) by 1 = %d\n", num1, lshift(num1, 1)
}'

On executing the above code, you get the following result:

lshift(10) by 1 = 20

rshift

Performs bitwise RIGHT SHIFT operation.

[jerry]$ awk 'BEGIN {
 num1 = 10

 printf "rshift(%d) by 1 = %d\n", num1, rshift(num1, 1)
}'

On executing the above code, you get the following result:

rshift(10) by 1 = 5

or

Performs bitwise OR operation.

[jerry]$ awk 'BEGIN {
 num1 = 10
 num2 = 6

 printf "(%d OR %d) = %d\n", num1, num2, or(num1, num2)
}'

On executing the above code, you get the following result:

(10 OR 6) = 14

xor

Performs bitwise XOR operation.

[jerry]$ awk 'BEGIN {
 num1 = 10
 num2 = 6

 printf "(%d XOR %d) = %d\n", num1, num2, xor(num1, num2)
}'

On executing the above code, you get the following result:

(10 XOR 6) = 12

Miscellaneous Functions

AWK has the following miscellaneous functions:

close(expr)

This function closes a file, pipe, or coprocess.

[jerry]$ awk 'BEGIN {
 cmd = "tr a-z A-Z"
 print "hello, world !!!" |& cmd
 close(cmd, "to")
 cmd |& getline out
 print out;
 close(cmd);
}'

On executing the above code, you get the following result:

HELLO, WORLD !!!

Does the script look cryptic? Let us demystify it.
The first statement, cmd = "tr a-z A-Z", stores the command with which AWK will establish two-way communication.
The print statement sends input to the tr command. The |& operator, a gawk extension, opens a two-way pipe (a coprocess).
The third statement, close(cmd, "to"), closes only the "to" end of the pipe. tr then sees the end of its input, finishes, and writes its output.
The statement cmd |& getline out reads the output of tr into the variable out.
Finally, print displays the result and close(cmd) closes the coprocess.
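close() also matters without coprocesses. When you read a command with cmd | getline, the pipe stays open. Reading it again only returns end of file until you close it. The sketch below assumes any POSIX awk, and echo hello is just a stand-in command. It closes the pipe so that the command runs a second time:

```shell
awk 'BEGIN {
  cmd = "echo hello"
  cmd | getline first     # runs the command and reads its first line
  close(cmd)              # without this, the next getline would return 0 (end of file)
  cmd | getline second    # runs the command again
  print first, second
}'
```

It prints "hello hello".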

delete

delete is a statement rather than a function. It removes an element from an array. The following simple example shows its usage:

[jerry]$ awk 'BEGIN {
 arr[0] = "One"
 arr[1] = "Two"
 arr[2] = "Three"
 arr[3] = "Four"

 print "Array elements before delete operation:"
 for (i = 0; i <= 3; i++) {
  print arr[i]
 }

 delete arr[0]
 delete arr[1]

 print "Array elements after delete operation:"
 for (i = 0; i <= 3; i++) {
  if (i in arr)    # skip deleted indices; arr[i] alone would re-create them
   print arr[i]
 }
}'

On executing the above code, you get the following result:

Array elements before delete operation:
One
Two
Three
Four
Array elements after delete operation:
Three
Four
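After a delete, use the in operator to check whether an element still exists. Merely referring to arr[key] creates the element again with an empty value. Here is a minimal sketch, assuming any POSIX awk:

```shell
awk 'BEGIN {
  arr["a"] = 1
  arr["b"] = 2
  delete arr["a"]

  has_a = ("a" in arr)    # 0: the element was deleted
  has_b = ("b" in arr)    # 1: still present
  print has_a, has_b
}'
```

It prints "0 1".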

exit

The exit statement stops the execution of the script, although any END rules still run first. It accepts an optional expr, which becomes AWK's exit status. The example below shows its usage:

[jerry]$ awk 'BEGIN {
 print "Hello, World !!!"

 exit 10

 print "AWK never executes this statement."
}'

On executing the above code, you get the following result:

Hello, World !!!
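The value passed to exit becomes the exit status of the awk process, which the shell can inspect. exit does not skip END actions. The sketch below assumes a POSIX shell and awk. The || status=$? form captures the status without aborting a script that runs under set -e:

```shell
status=0
printf 'a\nb\nc\n' | awk '
  { print; exit 3 }                 # print the first record, then exit with status 3
  END { print "END still runs" }    # END actions still execute after exit
' || status=$?
echo "Exit status = $status"
```

It prints "a", then "END still runs", then "Exit status = 3".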

fflush

This function flushes any buffers associated with open output file or pipe. Below is the syntax of the function.

fflush([output-expr])

If output-expr is omitted or is the null string (""), gawk 4.0.2 and later flush all open output files and pipes. Earlier gawk versions flushed only standard output when no argument was given.

getline

getline reads the next input record into $0 and updates NF, NR, and FNR. The example below reads and displays the marks.txt file using getline:

[jerry]$ awk '{getline; print $0}' marks.txt

On executing the above code, you get the following result:

2) Rahul Maths 90
4) Kedar English 85
5) Hari History 89

The script worked, but where are the first and third lines? Let us find out.
At startup, AWK reads the first line of marks.txt into $0.
The getline statement then reads the second line into $0, replacing the first.
The print statement prints the second line. The cycle repeats: AWK reads line 3, getline replaces it with line 4, and line 4 is printed.
Finally, AWK reads line 5. getline finds no more input and leaves $0 unchanged, so line 5 is printed.
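getline can also read from a file other than the current input, which is useful in a BEGIN block. The sketch below assumes a system that provides mktemp. It creates a throwaway file and reads it line by line. Comparing the result with > 0 stops the loop both at end of file (0) and on errors such as a missing file (-1). Reading into a variable this way leaves $0, NF, NR, and FNR untouched:

```shell
tmp=$(mktemp)                       # temporary input file for the demo
printf 'alpha\nbeta\n' > "$tmp"

awk -v file="$tmp" 'BEGIN {
  while ((getline line < file) > 0) {   # stops at end of file (0) or on error (-1)
    n++
    print n ": " line
  }
  close(file)
}'

rm -f "$tmp"
```

It prints "1: alpha" and "2: beta".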

next

The next statement changes the flow of the program. It stops processing the current record, reads the next input record, and starts executing the rules again from the first one. For instance, the program below skips every record that matches /Shyam/ and prints all the others.

[jerry]$ awk '{if ($0 ~/Shyam/) next; print $0}' marks.txt

On executing the above code, you get the following result:

1) Amit Physics 80
2) Rahul Maths 90
4) Kedar English 85
5) Hari History 89

nextfile

The nextfile statement changes the flow of the program. It stops processing the current input file and starts a new cycle through the pattern/action statements, beginning with the first record of the next file. For instance, the example below stops processing the first file as soon as a pattern match succeeds.
First, create two files. file1.txt contains the following:

file1:str1
file1:str2
file1:str3
file1:str4

And file2.txt contains the following:

file2:str1
file2:str2
file2:str3
file2:str4

Now let us use the nextfile statement:

[jerry]$ awk '{ if ($0 ~ /file1:str2/) nextfile; print $0 }' file1.txt file2.txt

On executing the above code, you get the following result:

file1:str1
file2:str1
file2:str2
file2:str3
file2:str4

return

The return statement is used within a user-defined function to return a value. Note that the return value of a function is undefined if expr is not provided. The example below shows how return is used.
First, create a functions.awk file containing the following AWK program:

function addition(num1, num2,    result)
{
 # result is listed as an extra parameter so that it is local to the function
 result = num1 + num2

 return result
}

BEGIN {
 res = addition(10, 20)
 print "10 + 20 = " res
}

Run the program with the -f option:

[jerry]$ awk -f functions.awk

On executing the above code, you get the following result:

10 + 20 = 30
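Recursion works naturally with return, because each call gets its own copy of its parameters. Here is a minimal factorial sketch, assuming any POSIX awk. The function name fact is chosen for illustration:

```shell
awk '
function fact(n) {
  if (n <= 1)
    return 1                 # base case stops the recursion
  return n * fact(n - 1)     # each call has its own n
}
BEGIN { print "5! = " fact(5) }'
```

It prints "5! = 120".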

system

This function executes the specified command and returns its exit status. A status of 0 indicates that the command succeeded, and a non-zero value indicates failure. For instance, the example below displays the current date and then the return status of the command.

[jerry]$ awk 'BEGIN { ret = system("date"); print "Return value = " ret }'

On executing the above code, you get the following result:

Sun Dec 21 23:16:07 IST 2014
Return value = 0
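Because system() returns the exit status of the command, it can drive decisions inside awk. This sketch assumes POSIX awk and sh; /no/such/dir is a made-up path. It uses the test command to check whether directories exist:

```shell
awk 'BEGIN {
  n = split("/ /no/such/dir", dirs, " ")
  for (i = 1; i <= n; i++) {
    # test -d exits with 0 when the path is a directory
    if (system("test -d " dirs[i]) == 0)
      print dirs[i] ": directory exists"
    else
      print dirs[i] ": not found"
  }
}'
```

The path is pasted into the shell command unquoted, so this sketch is only safe for paths without spaces or shell metacharacters.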
