Java Execution Model: Arrays, Strings, Autoboxing

We look at some important language features and see how object diagrams can help us to understand how they work and to avoid common programming errors involving them.

Arrays

Like objects, arrays in Java are boxed values. The type int[] is the type of an array of int, and any type can be substituted for the int to obtain a corresponding array type.

Since arrays are boxes, we ordinarily create them with the new expression. Consider the following example:

int[] a = new int[2];
int[] b = new int[] { 10, 20, 30 };

This code creates objects and initializes variables as shown by the following diagram.

As with objects, the variables a and b contain references to the arrays rather than the arrays themselves. If we wrote an assignment a = b;, they would subsequently refer to the same underlying array; a and b would be aliases. Each array contains a first slot that keeps track of the type of the elements in the array, and each has a single immutable instance variable length that keeps track of the number of elements. The two elements of a are initialized to the default value of type int, which is 0.
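To make the aliasing and default-value behavior concrete, here is a minimal sketch. The class and method names (ArrayAliasDemo, writeThroughAlias, defaultElement) are made up for illustration and are not part of the lecture code.

```java
class ArrayAliasDemo {
    // Returns what a[0] holds after b[0] is changed through the alias.
    static int writeThroughAlias() {
        int[] a = new int[2];
        int[] b = new int[] { 10, 20, 30 };
        a = b;          // a and b now refer to the same array object
        b[0] = 99;      // a write through b ...
        return a[0];    // ... is visible through a
    }

    // Returns the value a fresh int array holds before anything is stored in it.
    static int defaultElement() {
        int[] a = new int[2];
        return a[0];
    }

    public static void main(String[] args) {
        System.out.println(defaultElement());    // prints 0
        System.out.println(writeThroughAlias()); // prints 99
    }
}
```

After the assignment a = b, the original two-element array is no longer reachable from either variable.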

When declaring and initializing an array of type T[] for some T, an abbreviated syntax is allowed in which the usual new T[] is omitted. For example, the previous declaration of b could be written more compactly:

int[] b = { 10, 20, 30 };

In general, we can't construct arrays completely at declaration time. To initialize them, it is common to use loops, and the for loop is usually the most convenient kind. For example, here is a loop that initializes an array of Points, where the class Point is defined as in the previous lecture:

for_loop.java

This code creates an array whose entries are references to a series of newly created Point objects:

object diagram after loop initialization

The for loop repeatedly executes its body (here, in braces) until a condition is false. It has an interesting syntax. There are three clauses in the parentheses, separated by semicolons. The first clause is the loop initializer. It is executed once at the beginning of the loop and may be a variable declaration. The second clause is the loop guard. It is evaluated at the beginning of every loop iteration, and the loop terminates if it evaluates to false. The third clause is the increment statement. It is executed at the end of every loop iteration.
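As a sketch of the three clauses at work, the following loop fills an array with newly created Point objects. It is not the lecture's own for_loop.java: the Point class here is assumed to have the shape of the constructor example in the section on names and scope, and the names ForLoopDemo and makePoints, as well as the coordinates chosen, are hypothetical.

```java
class ForLoopDemo {
    // Assumed shape of Point: two int fields and a two-argument constructor.
    static class Point {
        int x, y;
        Point(int x, int y) {
            this.x = x;
            this.y = y;
        }
    }

    // Builds an array of n distinct Point objects; the coordinates (i, i*i) are arbitrary.
    static Point[] makePoints(int n) {
        Point[] pts = new Point[n];              // every slot starts out as null
        for (int i = 0; i < pts.length; i++) {   // initializer; guard; increment
            pts[i] = new Point(i, i * i);        // a fresh object on each iteration
        }
        return pts;
    }

    public static void main(String[] args) {
        Point[] pts = makePoints(3);
        System.out.println(pts[2].x + "," + pts[2].y); // prints 2,4
    }
}
```

The variable i is declared in the initializer, so it is in scope only inside the loop.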

Another way to exit from a loop is to use the break statement. It immediately terminates the closest enclosing loop. The less frequently used continue statement causes the current loop iteration to end and the next loop iteration to begin immediately (although the increment statement is still executed and the guard is still checked).
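The difference between break and continue can be seen in a small sketch. The method name sumPositivesUntilZero and its stopping rule are invented for this illustration.

```java
class BreakContinueDemo {
    // Sums the positive entries of xs, stopping at the first zero.
    static int sumPositivesUntilZero(int[] xs) {
        int sum = 0;
        for (int i = 0; i < xs.length; i++) {
            if (xs[i] == 0) break;     // leave the loop entirely
            if (xs[i] < 0) continue;   // skip the rest of the body; i++ still runs
            sum += xs[i];
        }
        return sum;
    }

    public static void main(String[] args) {
        // 3 is added, -1 is skipped, 4 is added, 0 ends the loop before 5 is seen.
        System.out.println(sumPositivesUntilZero(new int[] { 3, -1, 4, 0, 5 })); // prints 7
    }
}
```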

Multidimensional arrays

A multidimensional array is really an array of references to arrays. For example, consider the following code that creates a two-dimensional array (aka matrix):

int[][] m = {{10, 20, 30}, {40, 50, 60}};

This code actually creates three objects:

Java does not try very hard to ensure that m continues to represent a nice rectangular matrix. For example, we can change the length of one of its rows:

m[1] = new int[1];

Or we can even make the rows alias each other!

m[1] = m[0];
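Both modifications can be observed directly. The following sketch wraps each one in a method; the names JaggedDemo, rowLengthsAfterResize, and readThroughAliasedRow are hypothetical.

```java
class JaggedDemo {
    // Replaces row 1 with a shorter array and reports both row lengths.
    static int[] rowLengthsAfterResize() {
        int[][] m = {{10, 20, 30}, {40, 50, 60}};
        m[1] = new int[1];                 // row 1 now has a single element
        return new int[] { m[0].length, m[1].length };
    }

    // Makes row 1 an alias of row 0, then writes through row 0 and reads through row 1.
    static int readThroughAliasedRow() {
        int[][] m = {{10, 20, 30}, {40, 50, 60}};
        m[1] = m[0];                       // both rows are now the same array object
        m[0][0] = -1;
        return m[1][0];
    }

    public static void main(String[] args) {
        int[] lengths = rowLengthsAfterResize();
        System.out.println(lengths[0] + " " + lengths[1]); // prints 3 1
        System.out.println(readThroughAliasedRow());       // prints -1
    }
}
```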

Strings

Strings in Java are really objects, which leads to some surprises for programmers. A string literal like "Hello" actually evaluates to a reference to a String object that the Java runtime creates for it. For example, the following code has the effect shown in the accompanying object diagram:

String x = "Hello";
String y = x;
String z = y + "World";
String w = y + "World";

The operator + denotes concatenation when applied to strings, rather than addition. It creates new string objects. Notice that variables z and w are initialized to refer to string objects that have exactly the same state, but are actually different objects. Since strings are immutable (they cannot be changed after they are created), the fact that they are different objects normally does not matter.

Strings support a large selection of useful methods. One such method is charAt, which returns the character at a given position in the string. For example, the expression z.charAt(1) evaluates to the character 'e', and the same is true for w.

The strings referenced by z and w can be distinguished in one way, however. If they are compared using the == operator, the result of z == w is false. This happens because the == operator on boxed values simply returns whether the operands are the same box (that is, the same object). Probably this isn't what we want when we compare two strings!

Therefore, when comparing two objects generally, and strings particularly, you should almost always use the equals method, which returns whether two objects should be considered interchangeable. The expression z.equals(w) evaluates to true, as we'd like. Think twice before you use == on object values.
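The following sketch replays the four declarations from above and compares z and w both ways. The class and method names are illustrative only.

```java
class StringCompareDemo {
    // Compares z and w with ==, which asks whether they are the same object.
    static boolean sameObject() {
        String x = "Hello";
        String y = x;
        String z = y + "World";   // concatenation at run time creates a new object
        String w = y + "World";   // and so does this one
        return z == w;
    }

    // Compares z and w with equals, which asks whether they hold the same characters.
    static boolean sameState() {
        String x = "Hello";
        String y = x;
        String z = y + "World";
        String w = y + "World";
        return z.equals(w);
    }

    public static void main(String[] args) {
        System.out.println(sameObject()); // prints false
        System.out.println(sameState());  // prints true
    }
}
```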

object diagram of string with implementation

Based on the discussion of strings above, it is tempting to think that strings are very special objects in Java. Actually, they aren't: the only special thing about strings is that string objects can be created using the convenient quotation mark syntax. The object diagram above is a bit of a white lie, because strings are actually implemented using arrays of characters. For example, the string "Hello" is really implemented as two objects, as shown in the object diagram on the right. The String object contains an instance variable value that refers to the array of characters making up the string.

Since the entries in the character array never change, and you can only access strings through the operations of the String class, you have to work pretty hard to figure out that this is what a string really is in Java. And that is a Good Thing, because it means that the designers of Java can change the way strings work in future versions of Java without breaking all the existing programs! In fact, the implementation of Strings has changed significantly in the past few versions of Java, so even this object diagram is a white lie.

Autoboxing

Sometimes we want to use an unboxed value like an int where a boxed value is expected. For example, a variable of the type Object can refer to any object, but can't refer directly to a primitive value.

To address this issue, Java introduces a set of classes corresponding to the primitive types. For int there is Integer, for boolean there is Boolean, for double, Double, and so on. Each of these classes defines objects that contain a value of the appropriate primitive type, and defines equals to compare state.

In addition, Java will automatically box primitive values into the corresponding object type when necessary, and will automatically unbox them in the other direction, too. This feature is called autoboxing. It can have some counterintuitive effects, however. For example, consider this code:

Integer i = 200;
Object l = i;
int j = i;
Object k = j;
i == j // true
i == l // true
j == k // static error: can't compare Object and int.
i == k // false!

There are a couple of surprises here: first, the compiler does not let us compare j and k. Autoboxing causes j to be boxed into an Integer object, but the static type of k is Object, so the Java compiler does not know that k can be unboxed into an int.

Another surprise is the last line of code. Since i and k are different objects representing the same number, they compare as unequal. As with strings, we should use the equals method to compare values of type Integer.

Perhaps even more surprisingly, changing the number 200 to anything between -128 and 127 will cause the code above to report true for i == k. This happens because there is a table of Integer objects that is used only for small integers. Autoboxing is performed by the method Integer.valueOf, which uses this table when it can and only resorts to new for larger integers.
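The effect of the table can be checked with a small sketch that repeats the declarations above for an arbitrary starting value. The names IntegerCacheDemo, sameBox, and sameValue are hypothetical. The result for 200 assumes a JVM running with its default settings, since a JVM may be configured to cache a wider range.

```java
class IntegerCacheDemo {
    // Repeats the example above for a given n and compares i and k by identity.
    static boolean sameBox(int n) {
        Integer i = n;   // boxed
        int j = i;       // unboxed
        Object k = j;    // boxed again, possibly into a different object
        return i == k;
    }

    // The same setup, compared with equals instead.
    static boolean sameValue(int n) {
        Integer i = n;
        int j = i;
        Object k = j;
        return i.equals(k);
    }

    public static void main(String[] args) {
        System.out.println(sameBox(100));   // true: 100 is within -128..127
        System.out.println(sameBox(200));   // false on a JVM with default settings
        System.out.println(sameValue(200)); // true
    }
}
```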

One moral of the story, again, is that to compare two Integer objects, we need to use the equals() method on the objects. Even though the expression i==k is false, the expression i.equals(k) is true.

Clearly, the assignment j=i is doing more than just an assignment. In fact, it's really executing the following code: j = i.intValue(). The intValue() method extracts the int value from the Integer object. This is an example of syntactic sugar, in which the language permits us to abbreviate how we write code. Conversely, if we assigned i=j, this would be syntactic sugar for i = Integer.valueOf(j), which calls a method that, depending on the value of j, either looks up an appropriate preexisting object in a table or creates a new Integer object. Calls to the valueOf and intValue methods are automatically inserted by the Java compiler to implement boxing and unboxing. Similar methods exist for the other primitive types.
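Written out by hand, the inserted calls look roughly like the following sketch. Integer.valueOf, intValue, Double.valueOf, and doubleValue are real library methods; the class and method names around them are invented for illustration.

```java
class BoxingDesugarDemo {
    // Boxes and unboxes an int using the calls the compiler would insert.
    static int roundTrip(int j) {
        Integer i = Integer.valueOf(j);  // what "Integer i = j;" abbreviates
        int back = i.intValue();         // what "int back = i;" abbreviates
        return back;
    }

    // The analogous pair of methods for double.
    static double roundTripDouble(double d) {
        Double boxed = Double.valueOf(d);
        return boxed.doubleValue();
    }

    public static void main(String[] args) {
        System.out.println(roundTrip(200));      // prints 200
        System.out.println(roundTripDouble(1.5)); // prints 1.5
    }
}
```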

Names and scope

Names can refer to a variety of things: local variables (including formal parameters), instance variables (aka fields), methods, types, classes, and packages. The basic rule for deciding what kind of thing a name refers to is to find the definition of the name with the smallest scope that includes the use of the name. Different kinds of names have different rules for scope. Local variables are in scope from the point of declaration until the end of the block in which they are declared. Method and field names are in scope throughout their class. Class and interface names are in scope throughout the program unless they are nested inside another class, in which case they are in scope throughout the containing class.

If a name is in the scope of two different declarations at once, the outer declaration is said to be shadowed by the inner one. Java considers some shadowing to be illegal. For example, this code will not compile because the variable x is shadowed inside the while loop:

int x = 2;
while (x != 0) {
    int x = 5;
    // both x's in scope here.
}
// only outer x in scope here.

One place where shadowing is allowed, often getting programmers into trouble, is when a local variable shadows an instance variable. This often arises with constructors, because it is tempting to name formal parameters in the same way as instance variables:

class Point {
    int x, y;
    Point(int x, int y) {
        // locals x and y shadow instance variables x and y
        this.x = x;
        this.y = y;
    }
}

As the example shows, there is a way to talk about shadowed instance variables, using the object reference this. The expression this can only be used inside instance methods and constructors (not static methods) and refers to the current receiver object: in this case, the object being constructed.
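To see how this kind of shadowing gets programmers into trouble, compare a constructor that forgets this with one that uses it. This is a sketch only; BuggyPoint, FixedPoint, and the helper methods are hypothetical names.

```java
class ShadowDemo {
    static class BuggyPoint {
        int x, y;
        BuggyPoint(int x, int y) {
            x = x;   // assigns the parameter to itself; the field x is untouched
            y = y;   // likewise; both fields keep their default value 0
        }
    }

    static class FixedPoint {
        int x, y;
        FixedPoint(int x, int y) {
            this.x = x;   // this.x names the shadowed instance variable
            this.y = y;
        }
    }

    static int buggyX() { return new BuggyPoint(3, 4).x; }
    static int fixedX() { return new FixedPoint(3, 4).x; }

    public static void main(String[] args) {
        System.out.println(buggyX()); // prints 0
        System.out.println(fixedX()); // prints 3
    }
}
```

The buggy version compiles without error, which is why this mistake is easy to miss.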