M/F 2:30-3:20   
in G01 Gates Hall

CS 1130: Transition to OO Programming

Spring 2016

Type char

Overview

The values of type char are the characters that you can type on your keyboard, plus many more.

Literals of type char are written with single quotes surrounding the character. The last example below is the blank, or space, character, which you get by pressing the space bar on your keyboard.

'a'     'Z'      '<'     ' '
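Here is a minimal sketch that stores the four literals above in char variables and prints a few of them. The class and constant names are hypothetical and chosen only for illustration. Each expression in main could also be typed on its own into the interactions pane.

```java
// Hypothetical class holding the four literals from the text.
class CharLiteralDemo {
    static final char LOWER_A = 'a';
    static final char UPPER_Z = 'Z';
    static final char LESS_THAN = '<';
    static final char SPACE = ' ';  // the blank, typed with the space bar

    public static void main(String[] args) {
        // Printing a char variable prints the single character it holds.
        System.out.println(LOWER_A);     // prints a
        System.out.println(UPPER_Z);     // prints Z
        System.out.println(LESS_THAN);   // prints <
        // Case matters: 'a' and 'A' are different characters.
        System.out.println(LOWER_A == 'A');  // prints false
        // The space is a real character; brackets make it visible.
        System.out.println("[" + SPACE + "]");  // prints [ ]
    }
}
```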

Try typing ''' into the DrJava interactions pane, and you are in for a surprise: it produces an error message. The problem is that the second single quote is ambiguous. Does it end the literal, or does it represent the single-quote character? To get around this ambiguity, Java and many other languages use the escape character \ to write characters such as the single quote:

'\''

The table below lists four characters that are written using the escape character. To see others, look at page 225 of Gries/Gries.

written as   meaning
\'           the single-quote char
\"           the double-quote char
\\           the backslash char
\n           the new-line char
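The following sketch (with a hypothetical class name) stores each escape sequence from the table in a char variable. It also checks that, although each escape is written with two symbols between the quotes, it denotes a single character.

```java
// Hypothetical demo class: each escape sequence from the table is one char.
class EscapeDemo {
    static final char SINGLE_QUOTE = '\'';
    static final char DOUBLE_QUOTE = '\"';  // '"' without the backslash also works
    static final char BACKSLASH = '\\';
    static final char NEW_LINE = '\n';

    public static void main(String[] args) {
        System.out.println(SINGLE_QUOTE);  // prints '
        System.out.println(DOUBLE_QUOTE);  // prints "
        System.out.println(BACKSLASH);     // prints a single backslash
        // Printing the new-line char ends the current line, so this
        // prints "one" and "two" on separate lines.
        System.out.println("one" + NEW_LINE + "two");
        // Each escape is written with two symbols but is one character.
        String joined = "" + SINGLE_QUOTE + DOUBLE_QUOTE + BACKSLASH + NEW_LINE;
        System.out.println(joined.length());  // prints 4
    }
}
```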

You will see how the new-line character is used in the next lecture, on Strings.


Representation of characters: ASCII and Unicode

You don't have to read the information below on the representation of characters and Unicode. However, it will give you a sense of history and help you understand how inclusive type char is with respect to the world's languages.

Representation of characters

In the 1960s (and before, in some places), characters were represented in a 7-bit format known as ASCII (American Standard Code for Information Interchange). This code had representations of lowercase and capital letters, many other characters that you find on your keyboard (e.g. !@#$%^&-+=<>), and many non-printable characters, like tabs and line feeds. For example, the representation for 'A' was the integer 65, or, in binary, 1000001. Since 7 bits allow 2^7 = 128 different values, ASCII defines codes for 128 characters, 33 of which are non-printing (like the tab).
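A Java char is stored as a number, and for its first 128 characters Unicode uses exactly the ASCII codes, so casting a char to int reveals its ASCII code. Here is a minimal sketch; the class name and the helper code are hypothetical.

```java
// Hypothetical class showing that a char is stored as a number.
class AsciiDemo {
    // Returns the numeric code of c (for 'A', the ASCII value 65).
    static int code(char c) {
        return (int) c;  // widening cast from char to int
    }

    public static void main(String[] args) {
        System.out.println(code('A'));                    // prints 65
        System.out.println(Integer.toBinaryString('A'));  // prints 1000001
        // Casting an int back to char goes the other way.
        System.out.println((char) 65);                    // prints A
        System.out.println(code('a'));                    // prints 97
        System.out.println(code('\t'));                   // prints 9 (the non-printing tab)
    }
}
```

Integer.toBinaryString accepts the char argument because Java widens it to an int automatically.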

But there are many other alphabets besides the one we use in the United States: Russian, Finnish, Greek, Japanese, Persian, and so forth, to say nothing of Chinese. A truly international code would allow characters from all these alphabets.

In 1991, the Unicode Consortium was created to develop a 16-bit code that would include all alphabets. The outcome of their work was the Unicode Standard, which is now used in many programming languages, including Java. (The standard has since grown beyond 16 bits. A Java char is still 16 bits wide, so the relatively few characters with codes above FFFF hexadecimal are stored as a pair of chars.) The ultimate reference for Unicode is the website www.unicode.org/. For a simpler and shorter glimpse of Unicode, turn to lesson page 6-5 of the CD ProgramLive.
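Java's char type is 16 bits wide. The small sketch below uses the standard constants Character.SIZE, Character.MIN_VALUE, and Character.MAX_VALUE to show that range. The class name, and the choice of the musical G clef (code 1D11E hexadecimal) as an example of a character too large for one char, are illustrative only.

```java
// Hypothetical class showing the 16-bit range of Java's char type.
class CharSizeDemo {
    // A character whose code, 1D11E (hex), is too big for one 16-bit char:
    // the musical symbol G clef. Java stores it in a String as two chars.
    static String gClef() {
        return new String(Character.toChars(0x1D11E));
    }

    public static void main(String[] args) {
        System.out.println(Character.SIZE);             // prints 16 (bits per char)
        System.out.println((int) Character.MIN_VALUE);  // prints 0
        System.out.println((int) Character.MAX_VALUE);  // prints 65535, which is 2^16 - 1
        String clef = gClef();
        System.out.println(clef.length());                          // prints 2 (chars)
        System.out.println(clef.codePointCount(0, clef.length()));  // prints 1 (character)
    }
}
```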

Writing Unicode representations in Java

In Java, you can write the character 'A' as the character literal '\u0041'. The u indicates that the next four characters give the hexadecimal Unicode representation of the character in question. To show how comprehensive the Unicode standard is, the table below lists some characters along with their representations as Java char literals, their decimal representations, and the languages to which they belong. Some of them might not display properly in some browsers. For fun, type a Unicode literal into DrJava's interactions pane and watch the character itself appear.

character   Java literal   decimal value   language
'A'         '\u0041'       65              Latin, English
'Ճ'         '\u0543'       1347            Armenian
'א'         '\u05D0'       1488            Hebrew
'Ꮬ'         '\u13DC'       5084            Cherokee
'औ'         '\u0914'       2324            Devanagari (Sanskrit)
'グ'        '\u30B0'       12464           Japanese Katakana
'ڰ'         '\u06B0'       1712            Arabic
'ㄘ'        '\u3118'       12568           Chinese Bopomofo
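Rather than typing each literal into the interactions pane one at a time, the sketch below puts the table's literals in an array and checks each one's decimal value. The class and array names are hypothetical. The conversion itself is ordinary base-16 arithmetic; for example, hexadecimal 0543 is 5*256 + 4*16 + 3 = 1347.

```java
// Hypothetical class that checks the rows of the table above.
class UnicodeDemo {
    // The Java literals from the table, in the same order.
    static final char[] CHARS = {'\u0041', '\u0543', '\u05D0', '\u13DC',
                                 '\u0914', '\u30B0', '\u06B0', '\u3118'};
    // The decimal values listed in the table, in the same order.
    static final int[] DECIMALS = {65, 1347, 1488, 5084, 2324, 12464, 1712, 12568};

    public static void main(String[] args) {
        System.out.println('\u0041' == 'A');  // prints true
        for (int i = 0; i < CHARS.length; i++) {
            int value = CHARS[i];  // char widens to int automatically
            // The character itself may print as ? if the console cannot display it.
            System.out.println(CHARS[i] + "  " + value
                    + "  matches table: " + (value == DECIMALS[i]));
        }
    }
}
```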