This page attempts to answer a few of the common questions about PolyJ, particularly questions relating to its design. PolyJ differs in some important ways from other systems and proposals for parametric polymorphism in Java.
If you have any questions about PolyJ that are not answered here, please send them along.
What is PolyJ?
PolyJ is a portable compiler that accepts an extended version of the Java language. It was originally developed by members of the Programming Methodology Group at the MIT Laboratory for Computer Science. The current version was produced at Cornell by Jed Liu and Grant Wang under the direction of Andrew Myers.
PolyJ provides constrained parametric polymorphism. Unlike
some other proposals for adding genericity to Java, it use signature
constraints, which provide flexibility when composing a program. PolyJ
also allows all types to be used as parameters, even basic types like
int
. Another powerful feature is that instantiation types
and parameter types are first-class types that may be used
wherever a type may be used -- particularly, in safe run-time casts and
with "instanceof".
Does PolyJ fully implement Java 1.1?
No. The current implementation of PolyJ supports only Java 1.0. It is our intention to support Java 1.1 in the next major release, which should come out around January '99. See the bugs and deficiencies page for more information about Java 1.1 features.
Is there a definition of the language?
Yes. It is found in our paper from the ACM Symposium on Principles of Programming Languages (POPL '97). There is also a reference manual for PolyJ distributed with the release.
PolyJ and GJ have much in common in their philosophy. They are both backward-compatible Java extensions that compile by using a homogeneous translation to Java. They are both conservative extensions to Java that try to avoid unnecessary changes. Their parameterization mechanisms have similar power. However, there are some differences:
PolyJ advantages:int
) and
array types (e.g., int[]
). In PolyJ, you can write Vector[int]
;
in GJ, the closest you can come is Vector<Integer>
.
We are able to do this without slowing down basic types.
instanceof
and run-time casts work with them. For example,
unlike in GJ you can cast from Object
to Vector<String>
.
T
is a type parameter, the type T[]
is a legal type; it is even safe to cast to a T[]
.
[]
) rather than C++-style angle brackets (<>
).
This is a more robust and easily parsed syntax.
GJ advantages:Vector<T>
is implemented by the Java type
Vector
. We have provided a new implementation of
Vector[T]
with PolyJ.
The compiler is built as an extension to the guavac compiler developed by David Engberg. The guavac compiler is used on many Linux systems. It is written in C++, which also means that PolyJ runs fast on virtually all platforms. We are currently integrating the Java 1.1 upgrades from guavac so we can support the 1.1 features. However, a version of the PolyJ compiler written in PolyJ is in the works.
Is PolyJ available? How stable is it?
PolyJ is available and working. It has been used every semester since Fall '97 for the MIT software engineering course, which helped make it robust. It is now publicly available in source code form. We also hope to release it soon in compiled form for Windows 95/NT.
Can I use PolyJ?
PolyJ is released under the Gnu public license. It works by translating code into standard Java code, so if you can run Java programs, you can run PolyJ programs too. It can handle standard Java .class files, so if you have a working Java system, you should be able to just start using PolyJ instead. You can write programs partly in Java and partly in PolyJ, and use existing Java libraries with PolyJ programs. Integrated Java development environments with their own built-in Java compiler are problematic. However, if the development environment lets you use an outside Java compiler, you can plug in PolyJ instead.
Yes. PolyJ is backward compatible with Java. Every legal Java class, whether in source code form or bytecode form, is a legal PolyJ class too. All of the existing Java infrastructure can be used from PolyJ.
Yes and yes. Unlike in C++, PolyJ libraries do not need to be shipped in source code form. A generic library is usable from PolyJ programs because all the method signature information needed is available within the compiled code. PolyJ classes and libraries can be called from Java code, too: PolyJ compiles to ordinary Java bytecode, so any non-parameterized inferfaces and classes in PolyJ programs can be called from Java programs simply.
Unlike implementations of the C++ template mechanism, the PolyJ compiler generates very little extra code for each instantiation of a parameterized abstraction. It can do this because unlike the C++ template mechanism, PolyJ has constrained genericity, which has the result that an instantiation of a parameterized class can be type-checked without looking at the code of the parameterized class or at the code of the type parameters.
In the translation, the code of a
parameterized class is translated into a similar Java class. For each
instantiation, a small trampoline class is generated to glue the generic
code to its parameters and to allow run-time casting and
instanceof
operations. A trampoline class typically has
only a handful of methods, each of which does nothing but call another
method, so each trampoline is very small and does not bloat the code.
Why are where clauses used to constrain type parameters, rather than an "extends" requirement?
We tried very hard to come up with a workable solution using subtype
constraints, but finally concluded it just doesn't work well. The problem
is that Java, unlike most of the object-oriented languages with parametric
polymorphism, requires explicit declaration of subtype relationships. This
means that in order to create an instantiation type like
Set<Node>
, there must be an explicitly-declared subtype
relationship between Node
and a special parameterized
constraint interface (which is how Pizza/GJ would do it).
That doesn't work at all unless the programmer
has control over Node
, in which case the necessary subtype
relationships can be added as needed.
There are two problems with this. First, Java is supposed to support
development with extensive libraries and separate compilation. It seems
restrictive to require that the programmer have access to
the source code for Node
. Second, we are concerned that there
will be an explosion of these parameterized constraint interfaces, and that
every class will end up declaring that it implements a long list
of them. The explosion will be particularly bad because Java supports
overloading, and different generic classes will require different versions
of the overloaded method. These different versions will lead to different
constraint interfaces that differ only in how they are overloaded.
How do we support instantiation on basic types?
When we translate PolyJ code, any mention of a basic type is passed through unchanged in the Java code. There is no performance penalty for our allowing basic types as type parameters. However, when a basic type is used as a type parameter, manipulations of that type within the code are not as fast as when the type is known statically. This is a consequence of our no-bloat, homogeneous translation.
An important aspect of our translation is the efficient representation of
arrays of parameter types. Suppose a piece of code is parameterized
with respect to some type T, and that it contains references to objects of
type T[]
. If the code is instantiated on a basic type such as
int
or boolean
, the corresponding arrays will
be objects of type int[]
or boolean[]
.
Arrays are efficiently supported in PolyJ.
PolyJ also permits array types to be used as parameters. For example, it
is easy to write a generic sorting package Sorts[A,E]
,
where the parameter A
can be instantiated on arrays, Vectors,
and other types with array-like behavior; the parameter E
is
the element type. Thus, one can write Sorts[int[], int]
and
also Sorts[Vector[T], T]
for any type "T" supporting a
"compareTo" method. To see how this is done, look in the current distribution!
Does PolyJ rely on the underlying Java compiler to catch bugs?
No. Although PolyJ uses Java as a target language, the PolyJ compiler performs complete static checking on the ".pj" source file. It then automatically invokes the Java compiler of your choice to produce ".class" files. Any Java compiler that is compatible with "javac" will work just fine.
Why does PolyJ use square brackets rather than C++-style angle brackets?
Angle brackets make it harder to parse the language. This is a bad thing because when the language is harder to parse, it prevents useful tools from being developed that manipulate source code -- pretty printers, source-to-source translators, extended static checkers. Also, languages that are hard to parse tend to produce very confusing error messages when the user writes a program with a syntax error in it. C++ has suffered from both of these problems. We picked brackets largely because they're easier to parse.
C++ can get away with with angle brackets because unlike Java, it requires forward declarations. When it sees a line of code like
Set<T,U> x;
it needs to know whether "Set" is a type or not, because it affects how the statement is parsed. If "Set" is a variable, this is a silly but perfectly legal statement that evaluates the expressions (Set < T) and (U > x), separated by the C++ comma operator! If "Set" is a type, it is a declaration of a variable "x". The two parse trees are not even similar.
Currently, a single-pass LALR parser can build a useful parse tree for Java. If angle brackets are used for parameterization, the compiler will need an extra phase so that it can pick up the type names and construct the appropriate nested scopes. In fact, it will be even harder to parse than C++ is.
A complex parsing process also leads to user confusion, because compilers can report unexpected error messages when users write code containing a syntax error. Consider the example above. If the compiler doesn't know about "Set" for some reason (and there are several possible reasons why this could happen), it will probably generate an error message that has nothing to do with what is really wrong, such as "Undeclared variable x", or maybe just "Syntax error", an error some C++ compilers are fond of. Actually, it's not easy for the C++ compilers to generate a better error message; the problem is really in the language definition.