Java Execution Model: Arrays, Strings, Autoboxing

We look at some important language features and see how object diagrams can help us to understand how they work and to avoid common programming errors involving them.

Names and scope

Names can refer to a variety of things: local variables (including the formal parameters of methods), instance fields, methods, types, classes, and packages. The basic rule for deciding what kind of thing a name refers to is to find the definition of the name with the smallest scope that includes the use of the name. Different kinds of names have different rules for scope. Local variables are in scope from the point of declaration until the end of the block in which they are declared. Method and field names are in scope throughout their class. Class and interface names are in scope throughout the program unless they are nested inside another class, in which case they are in scope throughout the containing class.

If a name is in the scope of two different declarations at once, the outer declaration is said to be shadowed by the inner one. Java considers some shadowing to be illegal. For example, this code will not compile because the variable x is shadowed inside the while loop:

int x = 2;
while (x != 0) {
    int x = 5;
    // both x's in scope here.
}
// only outer x in scope here.

One place where shadowing is allowed, often getting programmers into trouble, is when a local variable shadows an instance field. Shadowing often arises with constructors, because it is tempting to name formal parameters in the same way as instance variables:

class Point {
    int x, y;
    Point(int x, int y) {
        // locals x and y shadow instance variables x and y
        this.x = x;
        this.y = y;
    }
}

As the example shows, there is a way to talk about shadowed instance variables, using the object reference this. The expression this can only be used inside instance methods and constructors (not static methods) and refers to the current object: in this case, the object being constructed. Alternatively, you can give the formal parameters different names (for example, by appending an underscore), and then the instance variables can be used directly, without the "this".
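To see why this kind of shadowing gets programmers into trouble, here is a minimal sketch of the bug that arises when "this" is forgotten. The class names BadPoint, GoodPoint, and ShadowDemo are hypothetical, introduced only for illustration.

```java
// Hypothetical classes for illustration; they are not part of the original notes.
class ShadowDemo {
    public static void main(String[] args) {
        BadPoint b = new BadPoint(3, 4);
        GoodPoint g = new GoodPoint(3, 4);
        System.out.println(b.x + "," + b.y); // prints 0,0
        System.out.println(g.x + "," + g.y); // prints 3,4
    }
}

class BadPoint {
    int x, y;
    BadPoint(int x, int y) {
        // Both sides of each assignment name the parameter, so the
        // fields are never assigned and keep their default value 0.
        x = x;
        y = y;
    }
}

class GoodPoint {
    int x, y;
    GoodPoint(int x, int y) {
        this.x = x; // this.x is the field; plain x is the parameter
        this.y = y;
    }
}
```

The BadPoint constructor compiles without error, which is what makes the mistake easy to miss.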

Arrays

Like objects, arrays in Java are reference types. The type int[] is the type of an array of int. One can have arrays of any type.

Since arrays are reference types, we ordinarily create them with the new expression. Consider the following example:

int[] a = new int[2];
int[] b = new int[] { 10, 20, 30 };

This code creates objects and initializes variables as shown by the following diagram. Notice that an array has a built-in instance variable called length that cannot be assigned to, and that defines the number of elements in the array. If the length is \(n\), then the legal indices into the array are \(0,...,n-1\).

As with objects, the variables a and b contain references to the arrays rather than the arrays themselves. If we wrote an assignment a = b;, they would subsequently refer to the same underlying array; a and b would be aliases. Each array contains a first slot that keeps track of the type of the elements in the array, and each has a single immutable instance variable length that keeps track of the number of elements. The two elements of a are initialized to the default value of type int, which is 0.
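The aliasing described above can be observed directly. The following is a small sketch; the class name ArrayAliasDemo is an assumption made for illustration.

```java
// Hypothetical demo class; shows default values, length, and aliasing.
class ArrayAliasDemo {
    public static void main(String[] args) {
        int[] a = new int[2];               // elements default to 0
        int[] b = new int[] { 10, 20, 30 };
        System.out.println(a[0] + " " + a.length); // prints 0 2
        System.out.println(b.length);              // prints 3

        a = b;      // a and b now refer to the same array object
        a[0] = 99;  // a write through a ...
        System.out.println(b[0]);   // ... is visible through b: prints 99
        System.out.println(a == b); // prints true
    }
}
```

After the assignment a = b, the two-element array that a originally referred to is no longer reachable from either variable.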

When declaring and initializing an array of type T[] for some T, an abbreviated syntax is allowed in which the usual new T[] is omitted. For example, the previous declaration of b could be written more compactly:

int[] b = { 10, 20, 30 };

We can't always initialize arrays completely in the declaration. To initialize large arrays, it is common to use a for loop. For example, suppose we want to initialize an array of Points, where the class Point is defined as:

class Point {
   int x, y;
   Point(int x, int y) {
      this.x = x;
      this.y = y;
   }
}

Here is a loop that does that:

[code listing: for_loop.java]

This code creates an array whose entries are references to a series of newly created Point objects:

[figure: object diagram after loop initialization]

The for loop repeatedly executes its body (the code in braces) until its guard (here i < n) becomes false. It has an interesting syntax. There are three clauses in the parentheses, separated by semicolons. The first clause (here int i = 0) is the loop initializer. It is executed once at the beginning of the loop and may include a variable declaration, as in the example. The second clause is the loop guard. It is evaluated at the beginning of every loop iteration, including the first. The loop terminates immediately if it evaluates to false. The third clause is the increment statement. It is executed at the end of every iteration of the body.
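As an illustration of the three clauses, here is one way such an initialization loop might look. This is a sketch, not the original listing: the method name makePoints, the array name points, and the coordinates (i, 2*i) are all assumptions.

```java
// Hypothetical sketch of a for loop that fills an array with new Point objects.
class ForLoopDemo {
    static Point[] makePoints(int n) {
        Point[] points = new Point[n];        // n slots, each initially null
        for (int i = 0; i < n; i++) {         // initializer; guard; increment
            points[i] = new Point(i, 2 * i);  // body: store a fresh object in slot i
        }
        return points;
    }

    public static void main(String[] args) {
        Point[] pts = makePoints(3);
        for (int i = 0; i < pts.length; i++) {
            System.out.println(pts[i].x + "," + pts[i].y);
        }
        // prints 0,0 then 1,2 then 2,4, one per line
    }
}

class Point {
    int x, y;
    Point(int x, int y) {
        this.x = x;
        this.y = y;
    }
}
```

If n is 0, the guard is false on its first evaluation and the body never runs.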

Two other statements used in conjunction with loops are the break and continue statements. If a break statement in the body of a loop is executed, the loop terminates immediately. The continue statement does not terminate the loop altogether, but causes the current execution of the loop body to end immediately. The increment is performed and the guard is checked, and if the guard is still true, the body is executed again from the top.
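A short sketch of both statements in one loop follows. The method sumUntilZero and its behavior (skip negative entries, stop at the first zero) are invented for illustration.

```java
// Hypothetical example contrasting break and continue.
class BreakContinueDemo {
    // Sums the positive entries of data, stopping at the first zero.
    static int sumUntilZero(int[] data) {
        int sum = 0;
        for (int i = 0; i < data.length; i++) {
            if (data[i] == 0) break;    // leave the loop entirely
            if (data[i] < 0) continue;  // skip the rest of the body; i++ runs next
            sum += data[i];
        }
        return sum;
    }

    public static void main(String[] args) {
        int[] data = { 4, -1, 5, 0, 7 };
        System.out.println(sumUntilZero(data)); // prints 9
    }
}
```

The entry 7 is never examined, because the break at the zero ends the loop before the guard is checked again.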

Multidimensional arrays

A multidimensional array is really an array of references to arrays. For example, consider the following code that creates a two-dimensional array (aka matrix):

int[][] m = {{10, 20, 30}, {40, 50, 60}};

For many purposes, we can think of this code as truly creating a 2D array that could, for example, be used as a matrix in linear algebra computations:

However, the code actually creates three objects; a 2-dimensional array is really an array of arrays:

Java does not enforce that m represents a rectangular matrix. For example, we can change the length of one of its rows:

m[1] = new int[1];

which would make the second row (the one at index 1) have length 1. We can even make the rows alias each other:

m[1] = m[0];
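The effect of these two assignments can be checked with a small program. This is a sketch; the class name JaggedDemo is an assumption.

```java
// Hypothetical demo of a non-rectangular array and of aliased rows.
class JaggedDemo {
    public static void main(String[] args) {
        int[][] m = {{10, 20, 30}, {40, 50, 60}};
        System.out.println(m.length + " " + m[0].length + " " + m[1].length); // prints 2 3 3

        m[1] = new int[1];                // replace row m[1] with a shorter array
        System.out.println(m[1].length);  // prints 1
        System.out.println(m[1][0]);      // prints 0

        m[1] = m[0];                      // both slots now refer to the same row
        m[1][2] = 99;
        System.out.println(m[0][2]);      // prints 99
        System.out.println(m[0] == m[1]); // prints true
    }
}
```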

Arrays as objects

Since arrays are objects, we can put an array value into a variable whose type is declared to be Object:

Object a = new int[] {10, 20};

This also means that we can create an array of objects and store any object, including arrays, into it:

Object[] b = new Object[] {a, a};
b[0] = b; // !

While this code is legal, it is certainly confusing and not a good example of how to write code!
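One consequence worth seeing is that a variable of static type Object cannot be indexed, even when it refers to an array. The sketch below assumes a cast is used to get the array type back; the class name ArrayObjectDemo is hypothetical.

```java
// Hypothetical demo: an array stored in an Object variable is still an array.
class ArrayObjectDemo {
    public static void main(String[] args) {
        Object a = new int[] {10, 20};
        System.out.println(a instanceof int[]); // prints true

        // a[0] would not compile: the static type Object has no indexing.
        int[] back = (int[]) a;                 // the cast recovers the array type
        System.out.println(back.length);        // prints 2
        System.out.println(back == a);          // prints true
    }
}
```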

Strings

Strings in Java are also objects, which leads to some surprises. Evaluating a string literal like "Hello" yields a reference to an object that is an instance of the class String, much as if a constructor of that class had been called. For example, the code on the left has the effect on the right:

String x = "Hello";
String y = x;
String z = y + "World";
String w = y + "World";

The operator + denotes concatenation when applied to strings. It creates new string objects. Notice that the variables z and w are initialized to refer to string objects that have exactly the same state, but are actually different objects. Since strings are immutable (they cannot be changed after they are created), the fact that they are different objects normally does not matter.

Strings support a large selection of useful methods. One such method is charAt, which returns the character at a given position in the string, starting at 0. For example, the expression z.charAt(1) evaluates to the character 'e', and the same is true for w.

Although the objects referenced by z and w represent the same sequence of characters, "HelloWorld", they are different objects and can be distinguished using the == operator; that is, the result of z == w is false. This is because the == operator on reference types simply returns whether the operands reference the same object. This is almost never what we want when comparing two strings. We almost always want to test whether the two strings have the same contents. For this, one should use the String object's equals method. For example, the expression z.equals(w) would evaluate to true.
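The difference between the two comparisons can be confirmed by running the earlier declarations. This is a minimal sketch; the class name StringEqualityDemo is an assumption.

```java
// Hypothetical demo comparing == and equals on the strings from the example.
class StringEqualityDemo {
    public static void main(String[] args) {
        String x = "Hello";
        String y = x;
        String z = y + "World";
        String w = y + "World";
        System.out.println(x == y);      // prints true: same object
        System.out.println(z == w);      // prints false: two distinct objects
        System.out.println(z.equals(w)); // prints true: same characters
        System.out.println(z.charAt(1)); // prints e
    }
}
```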

This also applies to other object values. A comparison using equals is more likely what you want. All objects have an equals method. Think twice before using == on an object.

Based on the discussion of strings above, it is natural to think that strings are very special objects in Java. But they aren't: the only truly special thing about strings is that string objects can be created using the convenient quotation mark syntax. The object diagram above is a bit of a white lie, because strings are actually implemented using arrays of characters. For example, the string "Hello" is really implemented as two objects, as shown in the object diagram on the right. The String object contains an instance variable value that refers to the array of characters making up the string.

Since the entries in the character array never change, you have to work pretty hard to figure out what a string in Java really is, because you can only access strings through the operations of the String class. And that is a Good Thing, because it means that the designers of Java can change the way strings work in future versions of Java without breaking all the existing programs. In fact, the implementation of the String class has changed significantly in the past few versions of Java, so even this object diagram is a white lie.

Autoboxing

Sometimes we want to use a primitive type like int where a reference type is expected, and vice versa. For example, a variable of type Object can refer to any object, but cannot refer directly to a primitive value.

To address this issue, Java introduced a set of eight classes corresponding to the eight primitive types, as shown in the following table.

primitive type    corresponding reference type
int               Integer
boolean           Boolean
short             Short
byte              Byte
char              Character
float             Float
double            Double
long              Long

Each of these classes defines objects that contain a value of the appropriate primitive type, and defines an equals method to compare them.

In addition, Java will automatically "box" primitive values into the corresponding reference type when necessary, and will automatically unbox them in the other direction, too. This feature is called autoboxing. It can have some counterintuitive effects, however. For example, consider this code:

Integer i = 200;
Object l = i;
int j = i;
Object k = j;
System.out.println(i == j); // true
System.out.println(i == l); // true
System.out.println(j == k); // static error: can't compare Object and int.
System.out.println(l == k); // false!
System.out.println(l.equals(k)); // true

There are a couple of surprises here. First, the compiler does not let us compare j and k. The assignment Object k = j boxes j into an Integer object, but the static type of k is Object, so the Java compiler does not know that k can be unboxed into an int.

Another surprise is the test l == k. Since l and k are different objects representing the same number, they compare as unequal. As with strings, we should use the equals method to compare values of type Integer.

Curiously, changing the number 200 to anything between -128 and 127 will cause the code above to report true for l == k. This happens because there is a table of preallocated Integer objects that is used only for small integers. Autoboxing is performed by the method Integer.valueOf, which uses this table when it can and only creates a new object for integers outside that range.
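The effect of the table can be sketched as follows. The class name IntegerCacheDemo is hypothetical. No expected output is given for the == test on 200, because the upper bound of the table can be raised by JVM configuration.

```java
// Hypothetical demo of the small-integer table used by boxing.
class IntegerCacheDemo {
    public static void main(String[] args) {
        Integer small1 = 100, small2 = 100;    // both boxed via Integer.valueOf(100)
        System.out.println(small1 == small2);  // prints true: same table entry

        Integer big1 = 200, big2 = 200;
        // Reference comparison: false under default settings, but the
        // result depends on how large the table is configured to be.
        System.out.println(big1 == big2);
        System.out.println(big1.equals(big2)); // prints true
    }
}
```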

One moral of the story, again, is that we should use the equals method to compare Integer objects. Even though the expression l == k is false, the expression l.equals(k) is true.

Clearly, the assignment j=i is doing more than just an assignment. In fact, it's really executing the following code: j = i.intValue(). The intValue() method extracts the int value from the Integer object. This is an example of syntactic sugar, in which the language permits us to abbreviate how we write code. Conversely, if we assigned i=j, this would be syntactic sugar for i = Integer.valueOf(j), which calls a method that, depending on the value of j, either looks up an appropriate preexisting object in a table or creates a new Integer object. Calls to the valueOf and intValue methods are automatically inserted by the Java compiler to implement boxing and unboxing. Similar methods exist for the other primitive types.
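Written out by hand, the desugared form looks like the sketch below. The class name BoxingDesugarDemo is an assumption. The Double line illustrates the analogous methods for one other primitive type.

```java
// Hypothetical demo: boxing and unboxing written explicitly.
class BoxingDesugarDemo {
    public static void main(String[] args) {
        Integer i = Integer.valueOf(200); // what "Integer i = 200;" stands for
        int j = i.intValue();             // what "int j = i;" stands for
        Integer k = Integer.valueOf(j);   // what "Integer k = j;" stands for
        System.out.println(j);            // prints 200
        System.out.println(i.equals(k));  // prints true

        double d = Double.valueOf(1.5).doubleValue(); // same pattern for double
        System.out.println(d);            // prints 1.5
    }
}
```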