There's no shame in being a (g)awk user

(Not) bugs

Here's a little awk program:

    END{

      for (x=.14; x<=.15; x+=.01) {
        print "x is", x;
      }

      for (y=.04; y<=.05; y+=.01) {
        print "y is", y;
      }
    }

What do you expect its output to be? Well, on quite a few different machines, it's:

    % awk 'END{  for (x=.14; x<=.15; x+=.01) {    print "x is", x;   }
        for (y=.04; y<=.05; y+=.01) {  print "y is", y;}}' /dev/null
    x is 0.14
    y is 0.04
    y is 0.05

What explains this? (It's not awk-specific, by the way; try the equivalent C program, for instance.)

Solution hint: how are numbers represented on a computer? (I am a fan of CS322.)
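
To follow the hint, here is a minimal sketch (not part of the original puzzle) that makes the stored values visible. The helper names `check_sum` and `count_in_integers` are made up for illustration, and the sketch is written for a Bourne-style shell and assumes your awk does its arithmetic in IEEE double precision, which is the usual case.

```shell
# Hypothetical helper: check_sum START STEP BOUND
# Prints START+STEP and BOUND with 20 decimals, plus how they compare.
check_sum() {
  awk -v a="$1" -v b="$2" -v c="$3" 'BEGIN {
    verdict = (a + b <= c + 0) ? "<=" : ">"
    printf "%.20f %s %.20f\n", a + b, verdict, c
  }'
}

check_sum .14 .01 .15   # the x loop after one step
check_sum .04 .01 .05   # the y loop after one step

# One possible workaround: count in integers, which are represented
# exactly, and divide only when printing.
count_in_integers() {
  awk 'BEGIN {
    for (i = 14; i <= 15; i++) print "x is", i / 100
    for (j = 4; j <= 5; j++) print "y is", j / 100
  }'
}

count_in_integers
```

With double-precision arithmetic, the first comparison should come out as `>` and the second as `<=`, which matches the output above: neither .14 nor .01 is stored exactly, and their rounded sum lands just above the stored value of .15, while .04 + .01 happens to round to the stored value of .05.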

Here's a little awk thing:
```
     echo "the unix" | awk '/[W-Z]/{print "yes"}'
```
where you might not expect any output. And yet, on my home Linux machine,
```
    % echo "the unix" | awk '/[W-Z]/{print "yes"}'
    yes
```
Huh? Diagnostics:
```
    % awk --version | head -1
    GNU Awk 3.1.3
```
Running at Cornell on an entirely different Linux machine:
```
    % echo "the unix" | awk '/[W-Z]/{print "yes"}'
    yes
```
Back home:
```
    % echo "the unix" | awk '/[X-Z]/{print "yes"}'
    %
```
Wha?!! And then, on a Mac laptop:
```
    % echo "the unix" | awk '/[W-Z]/{print "yes"}'
    %
```
Whoa. Is it a Mac vs. Unix thing?

Solution: a Google search for "gawk problem uppercase range" yields a page with a promising snippet:
http://ftp.wayne.edu/pub/gnu/Manuals/gawk-3.1.0/html_chapter/gawk_4.html
However, I get 404'ed. Luckily, there's still the cached version around (thank you, Google!):
http://216.239.51.104/search?q=cache:JAQwH7uqCbwJ:ftp.wayne.edu/pub/gnu/Manuals/gawk-3.1.0/html_chapter/gawk_4.html+gawk+problem+uppercase+range&hl=en&ct=clnk&cd=18&gl=us
So one can search around for a "clean" version, e.g.
http://www.delorie.com/gnu/docs/gawk/gawk_29.html
which is the gawk user's guide. It says:
Within a character list, a range expression consists of two characters separated by a hyphen. It matches any single character that sorts between the two characters, using the locale's collating sequence and character set. For example, in the default C locale, `[a-dx-z]' is equivalent to `[abcdxyz]'. Many locales sort characters in dictionary order, and in these locales, `[a-dx-z]' is typically not equivalent to `[abcdxyz]'; instead it might be equivalent to `[aBbCcDdxXyYz]', for example. To obtain the traditional interpretation of bracket expressions, you can use the C locale by setting the LC_ALL environment variable to the value `C' [or POSIX is a possible choice, too -- LL.].
Indeed,
```
  % echo "the unix" | awk '/[WXYZ]/{print "yes"}'
  %
  % echo 3 | awk '{printf("X\nx\nW\nY\nZ\n")}' | sort
  W
  x
  X
  Y
  Z
  %
  % setenv LC_COLLATE POSIX
  % echo 3 | awk '{printf("X\nx\nW\nY\nZ\n")}' | sort
  W
  X
  Y
  Z
  x
  % echo "the unix" | awk '/[W-Z]/{print "yes"}'
  %
```
And this is the behavior one had hoped for ...
Phew!
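
If you'd rather not change the environment of your whole session, the locale can also be set for a single command. The following is a small sketch under assumptions: it uses Bourne-shell syntax rather than the `setenv` shown above, the helper name `match_w_to_z` is made up for illustration, and it sets `LC_ALL`, which takes precedence over `LC_COLLATE`.

```shell
# Hypothetical helper: match_w_to_z TEXT
# Runs the range test with the C locale applied to this one awk call only.
match_w_to_z() {
  printf '%s\n' "$1" | LC_ALL=C awk '/[W-Z]/ { print "yes" }'
}

match_w_to_z "the unix"        # lowercase only, so no output expected
match_w_to_z "the X11 server"  # contains an uppercase X, so this should print yes
```

In the C locale the range `[W-Z]` covers just the four uppercase letters, so the first call should print nothing, whatever collating sequence the rest of the session uses.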

Back to Lillian Lee's home page.