Constant Expressions
====================
The simplest type of expression is the "constant", which always has
the same value. There are three types of constants: numeric constants,
string constants, and regular expression constants.
A "numeric constant" stands for a number. This number can be an
integer, a decimal fraction, or a number in scientific (exponential)
notation. Note that all numeric values are represented within `awk' in
double-precision floating point. Here are some examples of numeric
constants, which all have the same value:
     105
     1.05e+2
     1050e-1
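As a quick check of the claim above, the following sketch compares the three spellings inside a `BEGIN' rule. The shell variable name `out' is only illustrative, and the sketch assumes an `awk' on the search path.

```shell
# Compare three spellings of the same number; each comparison prints 1 when equal.
out=$(awk 'BEGIN {
    print (105 == 1.05e+2)    # integer form vs. scientific notation
    print (105 == 1050e-1)    # integer form vs. a negative exponent
    print 1.05e+2             # an integral value prints without an exponent
}')
printf '%s\n' "$out"
```

Both comparisons should print 1, and the last line should print 105.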
A "string constant" consists of a sequence of characters enclosed in
double-quote marks. For example:
     "parrot"
represents the string whose contents are `parrot'. Strings in `gawk'
can be of any length, and they can contain any 8-bit character value,
including ASCII NUL (character code zero). Other `awk' implementations
may have difficulty with some character codes.
Some characters cannot be included literally in a string constant.
You represent them instead with "escape sequences", which are character
sequences beginning with a backslash (`\').
One use of an escape sequence is to include a double-quote character
in a string constant. Since a plain double-quote would end the string,
you must use `\"' to represent a single double-quote character as a
part of the string. The backslash character itself is another
character that cannot be included normally; you write `\\' to put one
backslash in the string. Thus, the string whose contents are the two
characters `"\' must be written `"\"\\"'.
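The two-character string described above can be tried directly. This is a minimal sketch; the names `s' and `out' are illustrative only. The `awk' program is wrapped in single quotes so that the shell passes the backslashes and double quotes through unchanged.

```shell
# Build the string whose contents are a double quote followed by a backslash.
out=$(awk 'BEGIN {
    s = "\"\\"          # \" yields one double quote, \\ yields one backslash
    print s
    print length(s)     # the string holds exactly two characters
}')
printf '%s\n' "$out"
```

The first line of output should be the two characters `"\' and the second should be 2.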
Another use of backslash is to represent unprintable characters such
as newline. While there is nothing to stop you from writing most of
these characters directly in a string constant, they may look ugly.
Here is a table of all the escape sequences used in `awk':
`\\'
     Represents a literal backslash, `\'.
`\a'
     Represents the "alert" character, control-g, ASCII code 7.
`\b'
     Represents a backspace, control-h, ASCII code 8.
`\f'
     Represents a formfeed, control-l, ASCII code 12.
`\n'
     Represents a newline, control-j, ASCII code 10.
`\r'
     Represents a carriage return, control-m, ASCII code 13.
`\t'
     Represents a horizontal tab, control-i, ASCII code 9.
`\v'
     Represents a vertical tab, control-k, ASCII code 11.
`\NNN'
     Represents the octal value NNN, where NNN are one to three digits
     between 0 and 7. For example, the code for the ASCII ESC (escape)
     character is `\033'.
`\xHH...'
     Represents the hexadecimal value HH, where HH are hexadecimal
     digits (`0' through `9' and either `A' through `F' or `a' through
     `f'). Like the same construct in ANSI C, the escape sequence
     continues until the first non-hexadecimal digit is seen. However,
     using more than two hexadecimal digits produces undefined results.
     (The `\x' escape sequence is not allowed in POSIX `awk'.)
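A few of the escape sequences in the table can be exercised as follows. This is only a sketch: the names `out', `n', and `parts' are illustrative, and the `\x' form is left out because, as noted above, POSIX `awk' does not allow it.

```shell
# Exercise the octal and tab escapes from the table.
out=$(awk 'BEGIN {
    print ("\101" == "A")               # octal 101 is decimal 65, the ASCII code for A
    print length("a\tb")                # the tab escape is a single character
    n = split("x\ty\tz", parts, "\t")   # split the string on the tab character
    print n
    print length("\033[0m")             # ESC followed by three ordinary characters
}')
printf '%s\n' "$out"
```

The four lines printed should be 1, 3, 3, and 4.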
A "constant regexp" is a regular expression description enclosed in
slashes, such as `/^beginning and end$/'. Most regexps used in `awk'
programs are constant, but the `~' and `!~' operators can also match
computed or "dynamic" regexps (see How to Use Regular Expressions: Regexp Usage.).
Constant regexps may be used like simple expressions. When a
constant regexp is not on the right-hand side of the `~' or `!~'
operators, it is matched against the whole input record, just as if it
appeared in a pattern; that is, `/foo/' means `($0 ~ /foo/)'
(see Expressions as Patterns: Expression Patterns.).
This means that the two code segments,
     if ($0 ~ /barfly/ || $0 ~ /camelot/)
         print "found"
and
     if (/barfly/ || /camelot/)
         print "found"
are exactly equivalent. One rather bizarre consequence of this rule is
that the following boolean expression is legal, but does not do what
the user intended:
     if (/foo/ ~ $1) print "found foo"
This code is "obviously" testing `$1' for a match against the regexp
`/foo/'. But in fact, the expression `(/foo/ ~ $1)' actually means
`(($0 ~ /foo/) ~ $1)'. In other words, first match the input record
against the regexp `/foo/'. The result is either 1 or 0, depending
on whether the match succeeds or fails. Then match that result,
converted to a string, against the first field in the record, which is
treated as a regexp.
Since it is unlikely that you would ever really wish to make this
kind of test, `gawk' will issue a warning when it sees this construct in
a program.
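The rule and its surprising consequence can both be observed with a two-line input. The sketch below is illustrative only: the sample records and the shell variable names are assumptions, and the troublesome expression is written in its expanded form, `(($0 ~ /foo/) ~ $1)', so that `gawk' does not print its warning.

```shell
input='foo bar
0 bar'

# A bare regexp and the spelled-out test against $0 select the same records.
short=$(printf '%s\n' "$input" | awk '{ if (/foo/) print NR }')
long=$(printf '%s\n' "$input" | awk '{ if ($0 ~ /foo/) print NR }')

# The test that was presumably intended: does the first field match foo?
intended=$(printf '%s\n' "$input" | awk '{ if ($1 ~ /foo/) print NR }')

# What /foo/ ~ $1 really evaluates, written out in full.
actual=$(printf '%s\n' "$input" | awk '{ if (($0 ~ /foo/) ~ $1) print NR }')

printf 'short=%s long=%s intended=%s actual=%s\n' "$short" "$long" "$intended" "$actual"
```

On this input the first three tests select record 1, while the expanded form selects only record 2: there the match against `/foo/' fails and yields 0, and the string `0' matches the first field `0' used as a regexp.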
Another consequence of this rule is that the assignment statement
     matches = /foo/
assigns 1 to the variable `matches' if the current input record
matches `/foo/', and 0 otherwise.
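A short sketch of this assignment, run over two made-up input records (the records and the shell variable `out' are illustrative):

```shell
# Assign the result of a bare regexp to a variable for each record.
out=$(printf 'foo here\nnothing\n' | awk '{
    matches = /foo/      # same as: matches = ($0 ~ /foo/)
    print matches
}')
printf '%s\n' "$out"
```

The first record contains `foo', so 1 is printed for it; 0 is printed for the second.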
Constant regular expressions are also used as the first argument for
the `sub' and `gsub' functions (see Built-in Functions for String
Manipulation: String Functions.).
This feature of the language was never well documented until the
POSIX specification.
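As an illustration of constant regexps passed to `sub' and `gsub', the sketch below edits a copy of the record. The sample text and the names `line', `n', and `out' are assumptions made for the example.

```shell
# Use constant regexps as the first argument of gsub and sub.
out=$(printf 'foo and foo\n' | awk '{
    line = $0
    n = gsub(/foo/, "bar", line)   # replace every match; returns the number replaced
    print n, line
    sub(/bar/, "baz", line)        # replace only the first match
    print line
}')
printf '%s\n' "$out"
```

This should print `2 bar and bar' followed by `baz and bar'.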
You may be wondering when
     $1 ~ /foo/ { ... }
is preferable to
     $1 ~ "foo" { ... }
Since the right-hand sides of both `~' operators are constants, it
is more efficient to use the `/foo/' form: `awk' can note that you have
supplied a regexp and store it internally in a form that makes pattern
matching more efficient. In the second form, `awk' must first convert
the string into this internal form, and then perform the pattern
matching. The first form is also better style; it shows clearly that
you intend a regexp match.
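The two forms select the same records; only the internal handling and the style differ. A minimal sketch with made-up input (the records and variable names are illustrative) shows the matching behavior, though it does not measure the efficiency difference:

```shell
input='food 1
bar 2'

# Constant regexp on the right-hand side of the match operator.
with_regexp=$(printf '%s\n' "$input" | awk '$1 ~ /foo/ { print $2 }')

# String constant, converted to a regexp before matching.
with_string=$(printf '%s\n' "$input" | awk '$1 ~ "foo" { print $2 }')

printf 'regexp form: %s, string form: %s\n' "$with_regexp" "$with_string"
```

Both forms should print 1, the second field of the only record whose first field contains `foo'.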