We look at some important language features and see how object diagrams can help us to understand how they work and to avoid common programming errors involving them.
Names can refer to a variety of things: local variables (including the formal parameters of methods), instance fields, methods, types, classes, and packages. The basic rule for deciding what kind of thing a name refers to is to find the definition of the name with the smallest scope that includes the use of the name. Different kinds of names have different rules for scope. Local variables are in scope from the point of declaration until the end of the block in which they are declared. Method and field names are in scope throughout their class. Class and interface names are in scope throughout the program unless they are nested inside another class, in which case they are in scope throughout the containing class.
If a name is in the scope of two different declarations at once, the outer declaration
is said to be shadowed by the inner one. Java considers some shadowing to be
illegal. For example, this code will not compile because the variable x
is shadowed inside the while loop:
int x = 2;
while (x != 0) {
    int x = 5;
    // both x's in scope here.
}
// only outer x in scope here.
One place where shadowing is allowed, often getting programmers into trouble, is when a local variable shadows an instance field. Shadowing often arises with constructors, because it is tempting to name formal parameters in the same way as instance variables:
class Point {
    int x, y;
    Point(int x, int y) {
        // locals x and y shadow instance variables x and y
        this.x = x;
        this.y = y;
    }
}
As the example shows, there is a way to talk about shadowed instance
variables, using the object reference this. The expression
this can only be used inside instance methods and constructors (not static methods)
and refers to the current object: in this case, the object being
constructed. Alternatively, you can give the formal parameters different names
(for example, by appending an underscore), and then the instance variables can
be used directly, without the "this".
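To see why this kind of shadowing gets programmers into trouble, consider what happens when the "this" is forgotten. The following is a minimal sketch; the class names ShadowDemo, BuggyPoint, and FixedPoint are hypothetical and chosen only for illustration.

```java
// Hypothetical classes illustrating the pitfall; not part of the original notes.
class ShadowDemo {
    public static void main(String[] args) {
        BuggyPoint p = new BuggyPoint(3, 4);
        FixedPoint q = new FixedPoint(3, 4);
        System.out.println(p.x + " " + p.y); // prints "0 0"
        System.out.println(q.x + " " + q.y); // prints "3 4"
    }
}

class BuggyPoint {
    int x, y;
    BuggyPoint(int x, int y) {
        // Both sides name the parameters, so the fields are never assigned
        // and keep their default value 0.
        x = x;
        y = y;
    }
}

class FixedPoint {
    int x, y;
    FixedPoint(int x, int y) {
        this.x = x; // field on the left, parameter on the right
        this.y = y;
    }
}
```

The buggy constructor compiles without error, which is what makes the mistake easy to overlook.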
Like objects, arrays in Java are reference types. The type int[] is the type
of an array of int. One can have arrays of any type.
Since arrays are reference types, we ordinarily create them with the new expression.
Consider the following example:
int[] a = new int[2];
int[] b = new int[] { 10, 20, 30 };
This code creates objects and initializes variables as shown by the following diagram. Notice
that an array has a built-in instance variable called length that cannot be assigned to,
and that defines the number of elements in the array. If the length is \(n\), then the legal
indices into the array are \(0,...,n-1\).
As with objects, the variables a and b contain
references to the arrays rather than the arrays themselves. If we wrote an
assignment a = b;, they would subsequently refer to the same
underlying array; a and b would be aliases.
Each array contains a first slot that keeps track of the
type of the elements in the array, and each has a single immutable instance
variable length that keeps track of the number of elements. The
two elements of a are initialized to the default value of type
int, which is 0.
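The aliasing and default-value behavior described above can be checked directly. This is a small sketch; the class name ArrayAliasDemo and the method writeThroughAlias are made up for the example.

```java
// Hypothetical demo class; the arrays a and b mirror the example in the text.
class ArrayAliasDemo {
    // Returns b[0] after a write made through the alias a.
    static int writeThroughAlias() {
        int[] a = new int[2];               // elements default to 0
        int[] b = new int[] { 10, 20, 30 };
        a = b;                              // a and b now refer to the same array
        a[0] = 99;                          // the write is visible through b too
        return b[0];
    }

    public static void main(String[] args) {
        int[] a = new int[2];
        System.out.println(a.length + " " + a[0] + " " + a[1]); // prints "2 0 0"
        System.out.println(writeThroughAlias());                // prints "99"
    }
}
```

After the assignment a = b, the original two-element array is no longer reachable from either variable.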
When declaring and initializing an array of type T[] for some T, an abbreviated syntax
is allowed in which the usual new T[] is omitted. For example, the previous declaration
of b could be written more compactly:
int[] b = { 10, 20, 30 };
We can't always initialize arrays completely in the declaration. To initialize large arrays,
it is common to use loops, and the for loop is well suited to the task.
For example, suppose we want to initialize an array of Points, where
the class Point is defined as:
class Point {
    int x, y;
    Point(int x, int y) {
        this.x = x;
        this.y = y;
    }
}
Here is a loop that does that:
for_loop.java
This code creates an array whose entries are references to a series of newly
created Point objects:
The for loop repeatedly executes its body (the code in braces)
until its guard (here i < n) becomes false. It has an interesting syntax. There are three
clauses in the parentheses, separated by semicolons. The first clause (here int i = 0) is the
loop initializer. It is executed once at the beginning of the loop and may include a
variable declaration, as in the example. The second clause is the loop guard. It is evaluated at
the beginning of every loop iteration, including the first. The loop terminates immediately if it evaluates
to false. The third clause is the increment statement. It is
executed at the end of every iteration of the body.
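As a minimal sketch of a loop with this shape (not necessarily identical to the listing referred to above), the following fills an array with Point objects. The method name makePoints and the coordinates (i, 2*i) are assumptions made purely for illustration.

```java
// Hypothetical sketch: initializes an array of Points with a for loop.
class ForLoopDemo {
    static class Point {
        int x, y;
        Point(int x, int y) {
            this.x = x;
            this.y = y;
        }
    }

    // Builds an array of n points; the coordinates are an arbitrary choice.
    static Point[] makePoints(int n) {
        Point[] points = new Point[n];     // every entry starts out as null
        for (int i = 0; i < n; i++) {      // initializer; guard; increment
            points[i] = new Point(i, 2 * i);
        }
        return points;
    }

    public static void main(String[] args) {
        Point[] pts = makePoints(3);
        System.out.println(pts.length);                // prints "3"
        System.out.println(pts[2].x + " " + pts[2].y); // prints "2 4"
    }
}
```

If n is 0, the guard is false on its first evaluation and the body never runs.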
Two other statements used in conjunction with loops are the break
and continue statements. If a break statement in the
body of a loop is executed, the loop terminates immediately. The
continue statement does not terminate the loop altogether, but
causes the current execution of the loop body to end immediately.
The increment is performed and the guard is checked, and if the guard is still true,
the body is executed again from the top.
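A short sketch may help to contrast the two statements. The method sumPositivesUntilZero is a made-up example: it skips negative entries with continue and stops at the first zero with break.

```java
// Hypothetical example contrasting break and continue.
class BreakContinueDemo {
    // Sums the positive entries of values, stopping at the first zero.
    static int sumPositivesUntilZero(int[] values) {
        int sum = 0;
        for (int i = 0; i < values.length; i++) {
            if (values[i] == 0) {
                break;      // leave the loop entirely
            }
            if (values[i] < 0) {
                continue;   // skip the rest of the body; i++ still runs
            }
            sum += values[i];
        }
        return sum;
    }

    public static void main(String[] args) {
        int[] data = { 3, -1, 4, 0, 5 };
        System.out.println(sumPositivesUntilZero(data)); // prints "7"
    }
}
```

The entry 5 is never examined, because the break at the zero ends the loop before the guard is checked again.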
A multidimensional array is really an array of references to arrays. For example, consider the following code that creates a two-dimensional array (aka matrix):
int[][] m = {{10, 20, 30}, {40, 50, 60}};
For many purposes, we can think of this code as truly creating a 2D array that could, for example, be used as a matrix in linear algebra computations:
However, the code actually creates three objects; a 2-dimensional array is really an array of arrays:
Java does not enforce that m represents
a rectangular matrix. For example, we can change the length of one of its rows:
m[1] = new int[1];
which would make the second row (the one at index 1) have length 1. We can even make the rows alias each other:
m[1] = m[0];
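The effect of these two assignments can be observed with a small sketch. The class and method names (JaggedArrayDemo, jagged, aliasedRows) are hypothetical.

```java
// Hypothetical demo of non-rectangular and aliased rows.
class JaggedArrayDemo {
    // Replaces the row at index 1 with a shorter array.
    static int[][] jagged() {
        int[][] m = {{10, 20, 30}, {40, 50, 60}};
        m[1] = new int[1];
        return m;
    }

    // Makes both entries of the outer array refer to the same row.
    static int[][] aliasedRows() {
        int[][] m = {{10, 20, 30}, {40, 50, 60}};
        m[1] = m[0];
        m[1][0] = 99;   // also changes m[0][0], since the rows are one object
        return m;
    }

    public static void main(String[] args) {
        int[][] j = jagged();
        System.out.println(j[0].length + " " + j[1].length); // prints "3 1"
        int[][] r = aliasedRows();
        System.out.println(r[0][0]);                         // prints "99"
        System.out.println(r[0] == r[1]);                    // prints "true"
    }
}
```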
Since arrays are objects, we can put an array value into a variable whose type is declared to
be Object:
Object a = new int[] {10, 20};
This also means that we can create an array of objects and store any object, including arrays, into it:
Object[] b = new Object[] {a, a};
b[0] = b; // !
While this code is legal, it is certainly confusing and not a good example of how to write code!
Strings in Java are also objects, which leads to some surprises. A string
literal like "Hello" evaluates to a reference to an object that is an
instance of the class String. For example, the code on the left has the effect on the right:
String x = "Hello";
String y = x;
String z = y + "World";
String w = y + "World";
The operator + denotes concatenation when applied to strings.
It creates new string objects. Notice that the variables z
and w are initialized to refer to string objects that have exactly the same
state, but are actually different objects. Since strings are immutable (they cannot be
changed after they are created), the fact that they are different objects normally does
not matter.
Strings support
a large selection of useful methods. One such method is charAt,
which returns the character at a given position in the string, starting at 0.
For example, the expression z.charAt(1) evaluates to the character
'e', and the same is true for w.
Although the objects referenced by z and w represent the
same sequence of characters, "HelloWorld", they are different objects and
can be distinguished using the == operator; that is, the
result of z == w is false. This is because the
== operator on reference types simply tests whether the operands
reference the same object. This is almost never what we want when comparing two strings.
We almost always want to test whether the two strings contain
the same characters. For this, one should use the String
object's equals method. For example, the expression z.equals(w) would
evaluate to true.
This also applies to other object values. A comparison using equals
is more likely what you want. All objects have an equals method. Think twice before using == on an object.
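The difference between the two comparisons can be seen by running the string example. In this sketch, the class StringEqualityDemo and the helper appendWorld are assumptions introduced for the demonstration.

```java
// Hypothetical demo of == versus equals on strings.
class StringEqualityDemo {
    // Concatenation with a non-constant operand creates a new String object.
    static String appendWorld(String y) {
        return y + "World";
    }

    public static void main(String[] args) {
        String x = "Hello";
        String y = x;
        String z = appendWorld(y);
        String w = appendWorld(y);
        System.out.println(z == w);      // prints "false": different objects
        System.out.println(z.equals(w)); // prints "true": same characters
        System.out.println(z.charAt(1)); // prints "e"
    }
}
```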
Based on the discussion of strings above, it is natural to think that strings
are very special objects in Java. But they aren't: the only truly special thing
about strings is that string objects can be created using the convenient
quotation mark syntax. The object diagram above is a bit of a white lie,
because strings are actually implemented using arrays of characters. For
example, the string "Hello" is really implemented as two
objects as shown in the object diagram on the right. The String object
contains an instance variable value that refers to the array of
characters making up the string.
Since the entries in the character array never change, and you can
only access strings through the operations of the String class, you have to
work pretty hard to figure out how a string in Java is really represented.
And that is a Good Thing, because it means that the designers of Java can
change the way strings work in future versions of Java without breaking all the
existing programs. In fact, the implementation of the String class
has changed significantly in the past few versions of Java, so even this
object diagram is a white lie.
Sometimes we want to use a primitive type like int where a
reference type is expected, and vice versa. For example, a variable of
type Object can refer to any object, but cannot refer directly to a primitive value.
To address this issue, Java introduced a set of eight classes corresponding to the eight primitive types, as shown in the following table.
| primitive type | corresponding reference type |
|---|---|
| int | Integer |
| boolean | Boolean |
| short | Short |
| byte | Byte |
| char | Character |
| float | Float |
| double | Double |
| long | Long |

Values of these reference types are objects, so we should use the equals method to compare them.
In addition, Java will automatically "box" primitive values into the corresponding reference type when necessary, and will automatically unbox them in the other direction, too. This feature is called autoboxing. It can have some counterintuitive effects, however. For example, consider this code:
Integer i = 200;
Object l = i;
int j = i;
Object k = j;
System.out.println(i == j); // true
System.out.println(i == l); // true
System.out.println(j == k); // static error: can't compare Object and int.
System.out.println(l == k); // false!
System.out.println(l.equals(k)); // true
There are a couple of surprises here. First, the compiler does not let us compare
j and k. The initialization of k boxes j into an Integer object, but
the static type of k is Object, so the Java compiler does not
know that k can be unboxed into an int.
Another surprise is the test l == k.
Since l and k are different objects representing the
same number, they compare as unequal. As with strings, we should use the
equals method to compare values of type Integer.
Curiously, changing the number 200 to anything between
-128 and 127 will cause the code above to report true for l == k.
This happens because there is a table of Integer
objects that is used only for small integers. Autoboxing is performed by the
method Integer.valueOf, which uses this table when it can and only
creates a new object for integers outside the table's range.
One moral of the story, again, is that we should use the equals method
to compare Integer objects.
Even though the expression l == k is false,
the expression l.equals(k) is true.
Clearly, the assignment j=i is doing more than just an assignment.
In fact, it's really executing the following code: j = i.intValue().
The intValue() method extracts the int value from the
Integer object. This is an example of syntactic sugar, in
which the language permits us to abbreviate how we write code.
Conversely, if we assigned i=j, this would be syntactic sugar for
i = Integer.valueOf(j), which calls a method that, depending on
the value of j, either looks up an appropriate preexisting
object in a table or creates a new Integer object.
Calls to the valueOf and intValue methods are
automatically inserted by the Java compiler to implement boxing and unboxing.
Similar methods exist for the other primitive types.
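The following sketch writes out by hand the calls that the compiler inserts, and shows the effect of the table of small Integer objects. The class BoxingDemo and its method names are hypothetical. The result for 200 assumes default JVM settings, since the upper bound of the table can be raised by configuration.

```java
// Hypothetical demo: boxing and unboxing written out explicitly.
class BoxingDemo {
    // Explicit form of: Object l = n; Object k = n; return l == k;
    static boolean boxedTwiceSameObject(int n) {
        Object l = Integer.valueOf(n);
        Object k = Integer.valueOf(n);
        return l == k;
    }

    // Same boxing, but compared with equals instead of ==.
    static boolean boxedTwiceEquals(int n) {
        Object l = Integer.valueOf(n);
        Object k = Integer.valueOf(n);
        return l.equals(k);
    }

    // Explicit form of: Integer i = n; int j = i; return j;
    static int roundTrip(int n) {
        Integer i = Integer.valueOf(n);
        int j = i.intValue();
        return j;
    }

    public static void main(String[] args) {
        System.out.println(roundTrip(200));            // prints "200"
        System.out.println(boxedTwiceSameObject(100)); // prints "true"
        // Usually false with default settings, because 200 is outside the table.
        System.out.println(boxedTwiceSameObject(200));
        System.out.println(boxedTwiceEquals(200));     // prints "true"
    }
}
```

Only equals gives an answer that does not depend on whether the value happens to fall inside the table.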