2. Reference Types and Semantics
In our first lecture, we saw the importance of understanding the types of expressions in our Java programs. As a statically typed language, Java requires programmers to explicitly declare the types of all variables, parameters, and method return values, and its compiler checks these types and alerts us to errors before the code is run. We also discussed Java’s primitive types, the basic numerical and boolean types to which the language’s built in arithmetical and logical operations can be applied. While these primitive types play an important role, they are insufficient to provide support for the full suite of features that we expect from a programming language. We will need more types, such as a type for Strings and array types for aggregating data. All other types beyond the eight primitive types we introduced last lecture are reference types and have different syntax and semantics (behavior) in Java. We’ll discuss some of the reference types that Java provides in today’s lecture. In upcoming lectures, we’ll see how we can create and use our own reference types by defining new classes.
Objects
As we discussed last lecture, a type is characterized by the set of possible values it can be assigned (its state) and the ways that it can be used in our program (its behaviors). Non-primitive types can have more complicated states and behaviors which are described in that type’s class.
A class is a unit of code that specifies the blueprint for a non-primitive type, including its state and its available behaviors.
While the state of a primitive type consists of a fixed number of bits representing a single number or character, we’ll see that non-primitive types may store multiple pieces of information and take up different amounts of memory. We call a realization of a non-primitive type an object.
An object is an instance of a non-primitive type.
One example of an object is a String, a unit of text that consists of an ordered sequence of characters. The String class is defined as part of the Java language and described on one of Java’s documentation webpages. Note that class names (so also the names of all reference types) are capitalized.
Since Strings can have different lengths (the number of characters in the String), their size in memory may vary. As such, the compiler cannot pre-determine the amount of space that a String would occupy within a call frame. Instead, it finds a place for the String in a separate region of memory called the heap and accesses this String through a reference to its memory location. This is what gives rise to the name reference types.
Non-primitive types in Java are referred to as reference types. Objects of reference types are allocated on the memory heap. A reference to the object (i.e., the heap memory location where the object is located) is stored in a variable on the runtime stack.
The process for creating an instance of a reference type (i.e., an object) is called construction.
Construction
Suppose that we want to create a String variable to store the user’s first name. We can declare this variable using the same syntax as for primitive types, choosing a name for the variable, `firstName`, and preceding this name with the static type that the variable will hold: `String firstName`. To initialize this variable, we must create a `String` instance to assign to `firstName`. We do this using the `new` keyword and calling a constructor of the `String` class, a method whose name is the same as the reference type (e.g., `String`) and whose parameters are described in the constructor’s documentation. In this case, we can supply a String literal, the sequence of characters of the String we are creating, writing `String firstName = new String("Matt");`.
When we execute this line of code, a lot happens behind the scenes. On the LHS of this assignment, the declaration `String firstName` tells Java to create space on the runtime stack (within the call frame of the current method) for a `String` variable. Since this is a reference type, the size of this variable is 8 bytes, the length of one memory reference. Now, we evaluate the expression on the RHS of the assignment statement, `new String("Matt")`. This `new` expression tells Java to find space for a `String` object on the memory heap. It looks at the `String` class to see how much space is necessary and then calls the `String` constructor to set up the object’s state. The return value of this `new` expression is a memory reference, a representation of where on the heap this new String object is located. This is the information that is stored within the `String` variable `firstName`.
In our memory diagrams, we draw objects using rounded rectangles and label their type above the object. Inside of the rounded rectangle, we include a depiction of the object’s state. For strings, we simply write the String literal that the object holds.
If we suppose that our String object was created at memory address `0x000000004A390600` (a 64-bit hexadecimal number representing a memory location), we’d end up with:
Since keeping track of memory addresses adds unnecessary complication to our diagram (we will never directly interact with these addresses in our code), we will not do this going forward. Instead, we will symbolically represent the reference using an arrow from within the variable box (on the runtime stack) pointing to the edge of the object (on the memory heap).
Since `String`s are such a ubiquitous type in Java programs, they break some of the typical conventions for reference types. In particular, Java provides us the String literal syntax `"string"` that, on its own, constructs a `String` object containing this sequence of characters. Therefore, writing `String firstName = "Matt"` accomplishes the same thing as the above example. We use the more verbose constructor syntax since this is required for most other reference types.
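To make the role of `new` concrete, here is a minimal sketch. It assumes two things not covered above: that applying `==` to two reference variables compares the stored references, and that the `String` instance method `equals()` compares the characters. The class name `ConstructionDemo` and the helper `sameObject()` are hypothetical names chosen for illustration.

```java
class ConstructionDemo {
    /** Returns true when both variables hold a reference to the same heap object. */
    static boolean sameObject(String a, String b) {
        return a == b; // for reference types, == compares the stored references
    }

    public static void main(String[] args) {
        String first = new String("Matt");  // constructs one String object on the heap
        String second = new String("Matt"); // constructs a second, separate object

        System.out.println(sameObject(first, second)); // false: two distinct objects
        System.out.println(first.equals(second));      // true: same sequence of characters
        System.out.println(sameObject(first, first));  // true: the same reference twice
    }
}
```

Each evaluation of a `new` expression allocates its own object, so `first` and `second` hold different references even though the two objects have identical state.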
Instance Methods
The behaviors of reference types are primarily specified by defining instance methods within their classes. If we look at the `String` class documentation, we can find a list of a `String`’s instance methods under the heading “Method Summary”. We highlight a few of these methods here:
Return Type | Method Name/Parameters | Description |
---|---|---|
`char` | `charAt(int index)` | Returns the `char` value at the specified index. |
`int` | `indexOf(int ch)` | Returns the index within this string of the first occurrence of the specified character (a `char` argument is accepted). |
`int` | `length()` | Returns the length of this string. |
`String` | `substring(int beginIndex, int endIndex)` | Returns a string that is a substring of this string. The substring begins at the specified `beginIndex` and extends to the character at index `endIndex - 1`. Thus, the length of the substring is `endIndex - beginIndex`. |
`String` | `toLowerCase()` | Converts all of the characters in this `String` to lower case using the rules of the default locale. |
We invoke an instance method using the syntax `<object variable name>.<instance method name>(<parameters>)`. For example, consider a snippet with three statements, in order: `String firstName = new String("Matt");`, `int len = firstName.length();`, and `System.out.println(len);`. When this code is executed, `4` (the length of the String “Matt”) will be printed to the console. The second line of this code calls the `length` instance method of the `String` object `firstName`. This method returns the number of characters in the `String`, and this `int` return value is assigned to the local variable `len`.
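The other methods in the table are invoked the same way. The following sketch exercises each of them on the String "Matt"; the class name `StringMethodsDemo` and the helper `lowerInitial()` are hypothetical names introduced for illustration.

```java
class StringMethodsDemo {
    /** Returns the first character of name as a lower-case String, e.g. "Matt" -> "m". */
    static String lowerInitial(String name) {
        return name.substring(0, 1).toLowerCase();
    }

    public static void main(String[] args) {
        String firstName = new String("Matt");
        System.out.println(firstName.length());        // 4
        System.out.println(firstName.charAt(0));       // M
        System.out.println(firstName.indexOf('t'));    // 2 (first of the two t's)
        System.out.println(firstName.substring(1, 3)); // at (indices 1 and 2)
        System.out.println(firstName.toLowerCase());   // matt
        System.out.println(lowerInitial(firstName));   // m
    }
}
```

Note that `lowerInitial()` chains two calls: `substring()` returns a `String`, and `toLowerCase()` is then invoked on that returned object.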
As a second example, let’s trace through the execution of the following `main()` method.
`String`s are an example of immutable objects. Once a `String` object is constructed, its contents (the sequence of characters that it stores) will not change. Rather, all of the methods (such as `substring()`) that manipulate the contents of a `String` in some way actually create a new `String` object with different contents. When we start to have multiple objects that are passed around between different methods, it is important that we understand Java’s semantics for dealing with reference types. This is what we will consider next.
Java’s Reference Semantics
Let’s write a simple Java program that counts the number of occurrences of a particular character in some text entered by the user. We’ll specify a (static) method, `countChar()`, with a `String` parameter `str` and a `char` parameter `ch` that iterates over the indices of the String and counts the character. We can achieve this with a `for`-loop and some instance methods of the `String` class. Separately, in the `main()` method, we will write code to query the user for input, call the `countChar()` method, and print the results to the console. Much of this `main()` method is similar to code from last lecture. The full code is given below.
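One way such a program might be written is sketched here. The names `countChar`, `str`, `ch`, `text`, and `sc` come from the surrounding text; the class name `CountChar`, the prompt wording, and the output format are assumptions. The sketch also assumes the user enters at least one character when asked for the character to count.

```java
import java.util.Scanner;

class CountChar {
    /** Returns the number of indices i of str for which str.charAt(i) equals ch. */
    static int countChar(String str, char ch) {
        int count = 0;
        for (int i = 0; i < str.length(); i++) {
            if (str.charAt(i) == ch) {
                count++;
            }
        }
        return count;
    }

    public static void main(String[] args) {
        Scanner sc = new Scanner(System.in);
        System.out.print("Enter some text: ");
        String text = sc.nextLine();
        System.out.print("Enter a character to count: ");
        char ch = sc.nextLine().charAt(0); // assumes a non-empty line was entered
        int count = countChar(text, ch);
        System.out.print("The character '" + ch + "' occurs " + count + " time(s) in the text.");
    }
}
```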
Notice that the `print()` statement that reports the result in `main()` uses the `+` operator in a new way, to concatenate `String`s. Again, the ubiquity of `String`s motivates the inclusion of this shorthand syntax for this operation (one of the few behaviors of reference types that is not achieved through an instance method call). However, we will see soon that the immutability of `String`s can make this a deceptively inefficient operation. The concatenation does not simply add characters to the end of the first `String`; it copies the contents of both argument `String`s into a new, longer `String` object.
Within this code, we are passing the `String` object `text` as an argument to the `countChar()` method. What happens when we do this? Does a new `String` object get created in the `countChar()` method, or is the same `String` object used? Said another way, does Java pass objects by value or by reference? The answer, perhaps a bit unintuitive at first, is that Java always passes arguments by value, meaning it copies the contents of the argument variables from the calling stack frame into the parameter variables in the called stack frame. We already saw that this was the case for primitive types, but the same reasoning applies to reference types. The subtlety is that the contents of a reference type variable is a reference, a memory address. When we copy this address, we are copying the arrow to point to the same object in heap memory. In other words, we have a new arrow that also points to (or aliases) the same object. We animate an invocation of this program (eliding the details of the objects referenced by `args` and `sc` for simplicity) below.
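The consequences of copying the reference can be checked with a small experiment. In this sketch, the class name `PassByValueDemo` and the methods `reassign()` and `identity()` are hypothetical; it also assumes that `==` applied to reference variables compares the stored references.

```java
class PassByValueDemo {
    /** Reassigns its parameter. Only this call frame's copy of the reference changes. */
    static void reassign(String s) {
        s = new String("changed"); // s now points elsewhere; the caller's variable does not
    }

    /** Returns the reference it was given, without constructing anything. */
    static String identity(String s) {
        return s;
    }

    public static void main(String[] args) {
        String text = new String("original");
        reassign(text);
        System.out.println(text);                   // original
        System.out.println(identity(text) == text); // true: the parameter aliased the same object
    }
}
```

The first output shows that the callee cannot redirect the caller's variable; the second shows that no copy of the object was made when the argument was passed.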
The same reference type semantics carry over to assignment statements within a single method call; the effect of an assignment statement is to store the value of the expression on the RHS (an object reference) into the named container (variable) on the LHS. This has the effect of aliasing the object with another reference arrow. Consider the following example. What will be printed at the end of this method? Try drawing out the memory diagram yourself before stepping through the animation.
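As one illustration of aliasing through assignment (a hypothetical example, not the one from the original animation), consider the sketch below. The class name `AliasDemo`, the method `aAfterReassigningB()`, and the sample text are assumptions.

```java
class AliasDemo {
    /** Aliases a with a local variable b, re-points b, and returns what a refers to afterwards. */
    static String aAfterReassigningB(String a) {
        String b = a;        // copies the reference: a and b alias one object
        b = b.toLowerCase(); // constructs a new object; only b's arrow moves
        return a;
    }

    public static void main(String[] args) {
        String a = new String("APPLE");
        String b = a;
        System.out.println(a == b); // true: one object, two arrows

        b = b.toLowerCase();
        System.out.println(a);      // APPLE
        System.out.println(b);      // apple
        System.out.println(a == b); // false: b now references a different object
    }
}
```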
Drawing object diagrams carefully according to the conventions we have seen will help you to trace through code methodically. This is particularly helpful when there is a lot of variable aliasing, which will be the case when object references are passed between method calls.
null
The Java literal `null` is a special value that can be assigned to a variable of any reference type. It indicates the absence of an object reference; a reference type variable that is `null` does not have an arrow pointing to any object on the heap. We indicate `null` variables in our object diagrams by drawing a diagonal slash through the box for that variable.
As we will see, `null` values will often be important for representing and managing the state of more complicated reference types. However, working with reference types which can possibly be `null` presents a new set of complications. In particular, trying to invoke an instance method on a `null` variable will cause a problem. For example, if we declare `String str = null;` and then evaluate `str.length()`, Java will throw a `NullPointerException` since the `String` variable `str` does not reference a `String` object on which the `length()` method can be invoked. Whenever you write code that interacts with reference type variables that may be `null`, it is important to proactively check for this possibility to prevent `NullPointerException`s; this is a type of runtime exception that the compiler will not flag.
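A proactive check can be as simple as comparing the variable against `null` before using it. In this sketch, the class name `NullCheckDemo` and the helper `safeLength()` are hypothetical, and treating `null` as having length 0 is a design choice made only for illustration.

```java
class NullCheckDemo {
    /** Returns the length of str, or 0 when str does not reference any object. */
    static int safeLength(String str) {
        if (str == null) {
            return 0; // nothing to call length() on
        }
        return str.length();
    }

    public static void main(String[] args) {
        String str = null;
        System.out.println(safeLength(str));    // 0
        System.out.println(safeLength("Matt")); // 4
    }
}
```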
Arrays
The next reference types that we will introduce are arrays. Arrays are the foundational data structure in Java, a type that is used to collect and organize other objects or primitive values. Data structures distinguish themselves by the way that they arrange these data. Arrays contain a fixed number of ordered (i.e., indexed) cells, and each cell can store one object or value of a specified type.
An array is a data structure with a fixed capacity that stores elements of a particular type in an ordered, linear arrangement of cells.
The type of data that an array can store is specified in its declaration. Its capacity, the maximum number of elements that it can store, is specified during its initialization. Java uses square brackets, `[` and `]`, as syntax for interacting with arrays. An example of an array initialization is `int[] nums = new int[5];`.
This statement declares a variable `nums` whose static type is `int[]`, an array storing `int`s. The `new` expression on the RHS of this initialization constructs this array by allocating space on the heap. The notation `int[5]` indicates that the capacity of this `int` array should be 5. In other words, the array should have five cells to which we can assign `int` values. When the array is constructed, Java fills in its entries with default values: 0 for primitive numeric types, `false` for primitive `boolean`, and `null` for reference types. We can reflect the result of this array initialization in a memory diagram as follows:
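The default values can be observed directly. This sketch uses `java.util.Arrays.toString()`, a standard library method that is not introduced in the text, to print an array's contents; the class name `ArrayDefaultsDemo` and the helper `freshInts()` are hypothetical.

```java
import java.util.Arrays;

class ArrayDefaultsDemo {
    /** Constructs an int array with the given capacity; every cell starts at 0. */
    static int[] freshInts(int capacity) {
        return new int[capacity];
    }

    public static void main(String[] args) {
        int[] nums = freshInts(5);
        boolean[] flags = new boolean[2];
        String[] names = new String[3];

        System.out.println(Arrays.toString(nums));  // [0, 0, 0, 0, 0]
        System.out.println(Arrays.toString(flags)); // [false, false]
        System.out.println(Arrays.toString(names)); // [null, null, null]
    }
}
```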
length
The capacity of an array, or the number of cells that it contains, can be accessed through its `length` field. A field, or an instance variable, is a variable that is located within an object and is accessed using the member access operator “`.`”. For example, the statement `System.out.println(nums.length);` would print “5” to the console.
We can visualize this field as a variable in the object rounded rectangle of our memory diagram:
However, since the value of `length` is apparent from the other information in the diagram (namely, the number of cells and the cell labels), we will typically omit it. We will talk a lot more about fields when we discuss defining custom types in a few lectures.
Indexing
To utilize arrays in our code, we will need a way to refer to specific cells. This will allow us to both access and reassign the values stored in these cells. We do this using the cell’s indices.
The index is a number used to identify a specific cell in an array.
The cells in an array `a` are indexed with integers starting from `0` (which refers to the leftmost array cell) and increasing sequentially to `a.length-1` (which refers to the rightmost array cell). See the above figure, which labels the cells with indices `0` through `4`. The notation `a[i]`, again utilizing square brackets, refers to the cell with index `i` in the array `a`. When this notation appears on the RHS of an assignment statement, it is an expression that evaluates to the contents of this array cell. When this notation appears on the LHS of an assignment statement, it is interpreted as a variable referring to the cell itself. Let’s practice this indexing notation by tracing through the following example code.
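As a hypothetical practice example (the specific values, the class name `IndexingDemo`, and the helper `sum()` are assumptions), the sketch below uses `a[i]` both to read and to write cells, and uses `length` to bound a loop over all indices.

```java
class IndexingDemo {
    /** Returns the sum of all cells of a, visiting indices 0 through a.length - 1. */
    static int sum(int[] a) {
        int total = 0;
        for (int i = 0; i < a.length; i++) {
            total += a[i]; // a[i] read as an expression
        }
        return total;
    }

    public static void main(String[] args) {
        int[] nums = new int[5]; // all cells start at 0
        nums[0] = 3;             // a[i] on the LHS names the cell itself
        nums[4] = nums[0] + 4;   // reads cell 0, writes 7 into cell 4
        nums[1] = nums[4] * 2;   // writes 14 into cell 1

        System.out.println(nums[1]);   // 14
        System.out.println(sum(nums)); // 24, since 3 + 14 + 0 + 0 + 7 = 24
    }
}
```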
Array Literals
Java provides a shorthand syntax to construct an array object with specific values stored in each of its cells. This is accomplished through the use of an array literal, which lists the cell values in a comma-separated ordered list surrounded by curly brackets. For example, the statement `int[] nums = new int[]{3, 7, -5, 8};` assigns to `nums` a reference to an `int` array with capacity 4 storing the values 3, 7, -5, and 8 (in that order).
When the array literal initializes a variable in its declaration, we can simplify this syntax further, dropping the `new int[]` from the RHS of the assignment: `int[] nums = {3, 7, -5, 8};`. This shorter form is only permitted within a declaration, not when assigning to a variable that was declared earlier.
Arrays of Reference Types
We’ve seen examples so far of declaring arrays of primitive types, but we can similarly define arrays that store reference types. For example, we can declare an array of `String`s, a `String[]`. The memory diagrams for reference type arrays begin to look a bit more complicated as they give us our first example of heap objects containing references to other heap objects.
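A small sketch of a `String[]` follows; it is a hypothetical example, with the class name `StringArrayDemo`, the helper `countNulls()`, and the sample words chosen for illustration. Each cell holds a reference, so cells start as `null` and two cells may alias the same `String` object.

```java
class StringArrayDemo {
    /** Counts the cells of words that do not reference any String object. */
    static int countNulls(String[] words) {
        int count = 0;
        for (int i = 0; i < words.length; i++) {
            if (words[i] == null) {
                count++;
            }
        }
        return count;
    }

    public static void main(String[] args) {
        String[] words = new String[3];  // three cells, each initially null
        words[0] = new String("hello");  // cell 0 references a new String object
        words[2] = words[0];             // cell 2 aliases the same object as cell 0

        System.out.println(countNulls(words));    // 1 (only cell 1 is still null)
        System.out.println(words[0] == words[2]); // true: both cells hold the same reference
        System.out.println(words[2].length());    // 5
    }
}
```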
Multidimensional Arrays
A special case of arrays of reference types are arrays whose cells store references to other arrays, so-called multidimensional arrays. The notation for multidimensional arrays is (IMO) a bit subtle, so it is useful to walk through an example very carefully. Consider an array initialization whose RHS is the expression `new int[2][3]`, assigned to a variable of static type `int[][]`.
How should we think about this multidimensional array? The Java language specifies that we should interpret the dimensions of the RHS expression from left to right, so the object that we have constructed is an array with capacity 2 whose cells hold references to `int[]`s, and each of these cells is initialized to reference a different `int[]` with capacity 3 (and whose cells are initialized to the default `int` value, 0). We end up with the following picture:
Whenever I revisit this "left-to-right" convention, it annoys me at first, as my natural inclination is to interpret `int[2][3]` as `(int[2])[3]`, an array of capacity 3 whose cells hold `int[2]` arrays. However, I will concede that Java's choice is better because it allows for more natural interpretation of the bounds for multidimensional indexing, as we will see below.
We can also use a “nested array literal” to initialize the contents of a multi-dimensional array to particular values. For example, an initialization statement such as `int[][] numGrid = {{1, 2, 3}, {4, 5, 6}};` results in the memory representation
While the memory diagram of a 2D array has this tree-like "arrays pointing to arrays" structure, it is often more natural to think of 2D arrays as grids or matrices of numbers, with the "outer array" representing the rows of this matrix and the "inner arrays" representing the contents of each row. We visualize our previous example as the matrix:
\[\begin{bmatrix} 1 & 2 & 3 \\ 4 & 5 & 6 \end{bmatrix}.\]
Similar to (1D) arrays, we access elements of multidimensional arrays using square bracket syntax. Now, we may require multiple indices to access the “innermost” array elements. We again interpret the indices from left to right. For example, to evaluate `numGrid[1][0]`, we start at the `int[][]` object referenced by `numGrid`, then access the `int[]` object referenced by its cell at index 1, and then access the `int` element stored in that array's cell at index 0. Thus, `numGrid[1][0]` evaluates to `4`. On the other hand, `numGrid[0][1]` evaluates to `2`. Using a single index returns a reference to the “inner array”, so `numGrid[0]` evaluates to the `int[]` containing `{1,2,3}`.
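These index computations can be checked with a short sketch. The variable `numGrid` and its values come from the text; the class name `GridDemo` and the helper `rowSum()` are hypothetical.

```java
class GridDemo {
    /** Returns the sum of the entries in row r of grid. */
    static int rowSum(int[][] grid, int r) {
        int[] row = grid[r]; // a single index yields a reference to the inner array
        int total = 0;
        for (int c = 0; c < row.length; c++) {
            total += row[c];
        }
        return total;
    }

    public static void main(String[] args) {
        int[][] numGrid = {{1, 2, 3}, {4, 5, 6}};
        System.out.println(numGrid[1][0]);      // 4
        System.out.println(numGrid[0][1]);      // 2
        System.out.println(numGrid.length);     // 2 (capacity of the outer array)
        System.out.println(numGrid[0].length);  // 3 (capacity of the first inner array)
        System.out.println(rowSum(numGrid, 1)); // 15
    }
}
```

The first index is bounded by `numGrid.length` and the second by the `length` of the selected inner array, which is the natural reading of the bounds that the left-to-right convention provides.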
We have only looked at examples of rectangular 2D arrays in which the capacities of all "inner" arrays are the same. This is what is naturally constructed using syntax such as `new int[n][m]` for some integers `n` and `m`; however, it is not a requirement. We can also initialize a 2D array using the syntax `new int[n][]`, which constructs the "outer" array with default `null` references. The cells of this outer array can then be assigned to `int[]` arrays with different capacities, giving a ragged array. Ragged multidimensional arrays can also be constructed using array initializers. Both of these ideas are expanded upon in the exercises.
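A minimal sketch of a ragged array, assuming a triangular shape chosen purely for illustration (the class name `RaggedDemo` and the method `triangle()` are hypothetical):

```java
class RaggedDemo {
    /** Builds an outer array of capacity n whose row r is an int[] of capacity r + 1. */
    static int[][] triangle(int n) {
        int[][] rows = new int[n][]; // outer array only; every cell starts as null
        for (int r = 0; r < n; r++) {
            rows[r] = new int[r + 1]; // each row gets its own capacity
        }
        return rows;
    }

    public static void main(String[] args) {
        int[][] t = triangle(3);
        System.out.println(t.length);    // 3
        System.out.println(t[0].length); // 1
        System.out.println(t[2].length); // 3

        int[][] literal = {{1}, {2, 3}, {4, 5, 6}}; // the same shape from an array initializer
        System.out.println(literal[1][1]);          // 3
    }
}
```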
Program Arguments
Recall that the signature of the `main()` entrypoint method for Java programs is `public static void main(String[] args)`. We now have the tools that we need to interpret the parameter of this method; it is an array of `String`s. Specifically, this array gets populated with arguments that are passed to the `java` program on the command line. For example, we can write a simple program `Echo.java` (named after the Linux `echo` command) that prints out the program arguments that it receives, separated by spaces.
Echo.java
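A minimal sketch of such an `Echo` program is shown here. The helper `join()` is a hypothetical name introduced so that the space-separated text is built in one place; the original listing may be organized differently.

```java
class Echo {
    /** Joins the arguments into one String, separated by single spaces. */
    static String join(String[] args) {
        String result = "";
        for (int i = 0; i < args.length; i++) {
            if (i > 0) {
                result += " "; // separator goes between arguments, not before the first
            }
            result += args[i];
        }
        return result;
    }

    public static void main(String[] args) {
        System.out.println(join(args));
    }
}
```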
If we compile this “.java” file, we’ll end up with an “Echo.class” file. An example execution of this Java program (along with the output that it produces) is shown below.
> java Echo hello world
hello world
You can find information about modifying the run configurations in IntelliJ to supply program arguments on our IntelliJ reference page.
Main Takeaways:
- All types beyond the 8 primitive types in Java are reference types. Instances of reference types are called objects.
- Objects are allocated on the memory heap and variables with reference types store the heap memory address where the object is located (i.e., a reference to that object).
- `null` is a special value for reference types that indicates a variable without a reference to an object.
- Instance methods of objects are invoked and fields of objects are accessed using the member access operator "`.`".
- Arrays are a basic data structure consisting of a linearly-ordered arrangement of a fixed number of cells. Java uses special square brackets notation for array initialization and indexing.
- Program arguments are passed into a Java program at the start of execution and appear in the `String[] args` parameter of the `main()` method.
Exercises
During the execution of the following block of code, how many `String` objects are constructed?
What gets printed when this code is executed?
`null`?
When we execute the initialization statement `String[][] wordGrid = new String[8][3];`, how many objects are constructed?
`String`s
Count Uppercase Letters
Find a Substring.
Note: There is a `String` class method with this behavior (try to find it), but it is also good practice to write such a method yourself.
Reverse a String
Double Characters
`String`s on the Heap
`f()`:
`f()`'s call frame and the memory heap after each line of this code snippet is executed. How many `String` objects were constructed during this execution?
`a` after the following code is executed?
Swap Array Elements
Reverse Array
Count Element Occurrences
`c` "knows" is that each of its memory cells will hold a reference to an `int[]` array. It is not concerned with the state of this array (in particular, its `length`), just as a one-dimensional array allows different values to be stored in each of its entries.
`int`s, reassigns one entry of this array, and then prints the array's contents. Determine what is printed by each of the snippets. It may help to draw memory diagrams to trace through the code. Can you explain why the outputs of the two snippets are different?
snippet 1
snippet 2
- At the start, a random number between 1 and 100 (inclusive) is selected by the program (and unknown to the user).
- The application prompts the user to guess a number and uses a `Scanner` to read their input.
- If the guess is too low or too high, this is reported to the user, and they are prompted to guess again.
- If the guess is correct, a congratulatory message is printed that includes the total number of guesses the user made.
Guess a number between 1-100: 46
Your guess is too low.
Guess a number between 1-100: 78
Your guess is too high.
Guess a number between 1-100: 60
Your guess is too high.
Guess a number between 1-100: 53
You found the secret number in 4 guesses. Congratulations!
`main()` method in a file `Add.java` that accepts any number of program arguments and prints their sum. The static method `Integer.parseInt()` will likely be useful. An example execution of this program is shown below.
> java Add 13 40 -56 718
715