END {
    for (x = .14; x <= .15; x += .01) {
        print "x is", x;
    }
    for (y = .04; y <= .05; y += .01) {
        print "y is", y;
    }
}
What do you expect its output to be? Well, on quite a few different
machines, it's:
% awk 'END{ for (x=.14; x<=.15; x+=.01) { print "x is", x; }
for (y=.04; y<=.05; y+=.01) { print "y is", y;}}' /dev/null
x is 0.14
y is 0.04
y is 0.05
What explains this? (It's not awk-specific, by the way; try the
equivalent C program, for instance.)
Solution hint: how are numbers represented on a computer? (I am a fan of CS322.)
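To make the hint concrete, here is a minimal Python sketch of the same two loops. It assumes, as on the machines above, that numbers are IEEE 754 double-precision floats; the helper name loop_values is made up for this illustration and is not part of the original puzzle. Decimal is used only to print the exact value each float really stores.

```python
from decimal import Decimal


def loop_values(start, stop, step):
    """Mimic the awk loop: for (v = start; v <= stop; v += step)."""
    values = []
    v = start
    while v <= stop:
        values.append(v)
        v += step
    return values


xs = loop_values(0.14, 0.15, 0.01)
ys = loop_values(0.04, 0.05, 0.01)
print("x values:", xs)
print("y values:", ys)

# Exact binary values behind the decimal literals: the stored sum
# .14 + .01 is slightly larger than the stored .15, so the x loop
# stops after one pass, while .04 + .01 rounds to the stored .05.
print("0.14 + 0.01 =", Decimal(0.14 + 0.01))
print("0.15        =", Decimal(0.15))
print("0.04 + 0.01 =", Decimal(0.04 + 0.01))
print("0.05        =", Decimal(0.05))
```

A common way around this is to loop over integers (14 to 15, say) and divide by 100 inside the loop body, so that no rounding error accumulates in the loop variable.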
Consider the command
echo "the unix" | awk '/[W-Z]/{print "yes"}'
where you might not expect any output.
And yet, on my home Linux machine,
% echo "the unix" | awk '/[W-Z]/{print "yes"}'
yes
Huh?
Diagnostics:
% awk --version | head -1
GNU Awk 3.1.3
Running at Cornell on an entirely different Linux machine:
% echo "the unix" | awk '/[W-Z]/{print "yes"}'
yes
Back home:
% echo "the unix" | awk '/[X-Z]/{print "yes"}'
%
Wha?!!
And then, on a Mac laptop:
% echo "the unix" | awk '/[W-Z]/{print "yes"}'
%
Whoa. Is it a Mac vs. Unix thing?
Solution: A Google search for gawk problem uppercase range yields a promising-snippet page:
http://ftp.wayne.edu/pub/gnu/Manuals/gawk-3.1.0/html_chapter/gawk_4.html
However, I get 404'ed. Luckily, there's still the cached version around (thank you, Google!):
http://216.239.51.104/search?q=cache:JAQwH7uqCbwJ:ftp.wayne.edu/pub/gnu/Manuals/gawk-3.1.0/html_chapter/gawk_4.html+gawk+problem+uppercase+range&hl=en&ct=clnk&cd=18&gl=us
So one can search around for a "clean" version, e.g.
http://www.delorie.com/gnu/docs/gawk/gawk_29.html
which is the gawk user's guide. It says:
Within a character list, a range expression consists of two characters separated by a hyphen. It matches any single character that sorts between the two characters, using the locale's collating sequence and character set. For example, in the default C locale, `[a-dx-z]' is equivalent to `[abcdxyz]'. Many locales sort characters in dictionary order, and in these locales, `[a-dx-z]' is typically not equivalent to `[abcdxyz]'; instead it might be equivalent to `[aBbCcDdxXyYz]', for example. To obtain the traditional interpretation of bracket expressions, you can use the C locale by setting the LC_ALL environment variable to the value `C' [or POSIX is a possible choice, too -- LL.].
Indeed,
% echo "the unix" | awk '/[WXYZ]/{print "yes"}'
%
% echo 3 | awk '{printf("X\nx\nW\nY\nZ\n")}' | sort
W
x
X
Y
Z
%
% setenv LC_COLLATE POSIX
% echo 3 | awk '{printf("X\nx\nW\nY\nZ\n")}' | sort
W
X
Y
Z
x
% echo "the unix" | awk '/[W-Z]/{print "yes"}'
%
And this is the behavior one had hoped for ...
Phew!
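The behavior seen above can be imitated with a toy model in Python. This is only a sketch under stated assumptions: dictionary_key is a made-up stand-in for a dictionary-order collating sequence (each lowercase letter sorting just before its uppercase twin, which agrees with the first sort output above), not the actual collation table of any real locale, and range_matches is a hypothetical helper rather than how gawk implements bracket expressions.

```python
def dictionary_key(ch):
    # Assumed dictionary-style order: compare letters without regard
    # to case first, then put lowercase before uppercase (w W x X ...).
    return (ch.lower(), ch.isupper())


def codepoint_key(ch):
    # C/POSIX-style order: plain character codes, so all of A-Z
    # comes before all of a-z.
    return ord(ch)


def range_matches(text, lo, hi, key):
    """Characters of text that sort between lo and hi under key."""
    return [ch for ch in text if key(lo) <= key(ch) <= key(hi)]


text = "the unix"
print(sorted("XxWYZ", key=dictionary_key))            # like the first sort run
print(sorted("XxWYZ", key=codepoint_key))             # like the run after setenv
print(range_matches(text, "W", "Z", dictionary_key))  # 'x' falls between W and Z
print(range_matches(text, "X", "Z", dictionary_key))  # 'x' sorts before X
print(range_matches(text, "W", "Z", codepoint_key))   # no lowercase in W..Z
```

Under the assumed dictionary order, the lowercase x in "unix" lands inside W-Z but outside X-Z, which is the pattern seen in the transcripts; under character-code order, neither range contains it.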
Back to Lillian Lee's home page.