The FAQ

This page attempts to answer a few of the common questions about PolyJ, particularly questions relating to its design. PolyJ differs in some important ways from other systems and proposals for parametric polymorphism in Java.

If you have any questions about PolyJ that are not answered here, please send them along.

What is PolyJ?
What version of Java is PolyJ an extension of?
Is there a definition of the language?
How does PolyJ differ from Pizza and Generic Java (GJ)?
How was PolyJ implemented?
Is PolyJ available? How stable is it?
Can I use PolyJ?
Can PolyJ programs use Java libraries?
Can I ship PolyJ libraries as bytecode? Can they be called from Java?
How does PolyJ avoid the code expansion of C++ templates?
Why are where clauses used to constrain type parameters?
How is instantiation on basic types efficiently supported?
Does PolyJ rely on the underlying Java compiler to catch bugs?
Syntax: why does PolyJ use square brackets rather than C++-style angle brackets?

What is PolyJ?

PolyJ is a portable compiler that accepts an extended version of the Java language. It was originally developed by members of the Programming Methodology Group at the MIT Laboratory for Computer Science. The current version was produced at Cornell by Jed Liu and Grant Wang under the direction of Andrew Myers.

PolyJ provides constrained parametric polymorphism. Unlike some other proposals for adding genericity to Java, it use signature constraints, which provide flexibility when composing a program. PolyJ also allows all types to be used as parameters, even basic types like int. Another powerful feature is that instantiation types and parameter types are first-class types that may be used wherever a type may be used -- particularly, in safe run-time casts and with "instanceof".
Does PolyJ fully implement Java 1.1?

No. The current implementation of PolyJ supports only Java 1.0. It is our intention to support Java 1.1 in the next major release, which should come out around January '99. See the bugs and deficiencies page for more information about Java 1.1 features.
Is there a definition of the language?

Yes. It is found in our paper from the ACM Symposium on Principles of Programming Languages (POPL '97). There is also a reference manual for PolyJ distributed with the release.
How does PolyJ differ from Pizza and Generic Java (GJ)?
PolyJ and GJ have much in common in their philosophy. They are both backward-compatible Java extensions that compile by using a homogeneous translation to Java. They are both conservative extensions to Java that try to avoid unnecessary changes. Their parameterization mechanisms have similar power. However, there are some differences:
PolyJ advantages:
- It supports instantiation on basic types (e.g., int) and array types (e.g., int[]). In PolyJ, you can write Vector[int]; in GJ, the closest you can come is Vector<Integer>. We are able to do this without slowing down basic types.
- Instantiation types and parameter types are first class: instanceof and run-time casts work with them. For example, unlike in GJ you can cast from Object to Vector<String>.
- More complete support for arrays. For example, in generic code in which T is a type parameter, the type T[] is a legal type; it is even safe to cast to a T[].
- Static methods and static variables are supported in generic code.
- Signature constraints (where clauses) are used instead of interface constraints. This makes it easier to put programs together from Java or PolyJ components.
- Constraints can require constructors and static methods. For example, generic code might require that a parameter type supply a constructor so new objects of that types can be created.
- The instantiation notation uses square brackets ([]) rather than C++-style angle brackets (<>). This is a more robust and easily parsed syntax.
GJ advantages:
- Because GJ works entirely by erasure, it supports retrofitting of GJ interfaces onto existing Java code. For example, the generic type Vector<T> is implemented by the Java type Vector. We have provided a new implementation of Vector[T] with PolyJ.
- GJ supports polymorphic methods: individual methods of a class that have their own type parameters.
How is PolyJ implemented?
The compiler is built as an extension to the guavac compiler developed by David Engberg. The guavac compiler is used on many Linux systems. It is written in C++, which also means that PolyJ runs fast on virtually all platforms. We are currently integrating the Java 1.1 upgrades from guavac so we can support the 1.1 features. However, a version of the PolyJ compiler written in PolyJ is in the works.
Is PolyJ available? How stable is it?

PolyJ is available and working. It has been used every semester since Fall '97 for the MIT software engineering course, which helped make it robust. It is now publicly available in source code form. We also hope to release it soon in compiled form for Windows 95/NT.
Can I use PolyJ?

PolyJ is released under the Gnu public license. It works by translating code into standard Java code, so if you can run Java programs, you can run PolyJ programs too. It can handle standard Java .class files, so if you have a working Java system, you should be able to just start using PolyJ instead. You can write programs partly in Java and partly in PolyJ, and use existing Java libraries with PolyJ programs. Integrated Java development environments with their own built-in Java compiler are problematic. However, if the development environment lets you use an outside Java compiler, you can plug in PolyJ instead.
Can PolyJ programs use Java libraries?
Yes. PolyJ is backward compatible with Java. Every legal Java class, whether in source code form or bytecode form, is a legal PolyJ class too. All of the existing Java infrastructure can be used from PolyJ.
Can I ship PolyJ libraries as bytecode? Can they be called from Java?
Yes and yes. Unlike in C++, PolyJ libraries do not need to be shipped in source code form. A generic library is usable from PolyJ programs because all the method signature information needed is available within the compiled code. PolyJ classes and libraries can be called from Java code, too: PolyJ compiles to ordinary Java bytecode, so any non-parameterized inferfaces and classes in PolyJ programs can be called from Java programs simply.
How do we avoid the code expansion caused by C++ templates?
Unlike implementations of the C++ template mechanism, the PolyJ compiler generates very little extra code for each instantiation of a parameterized abstraction. It can do this because unlike the C++ template mechanism, PolyJ has constrained genericity, which has the result that an instantiation of a parameterized class can be type-checked without looking at the code of the parameterized class or at the code of the type parameters.

In the translation, the code of a parameterized class is translated into a similar Java class. For each instantiation, a small trampoline class is generated to glue the generic code to its parameters and to allow run-time casting and instanceof operations. A trampoline class typically has only a handful of methods, each of which does nothing but call another method, so each trampoline is very small and does not bloat the code.
Why are where clauses used to constrain type parameters, rather than an "extends" requirement?

We tried very hard to come up with a workable solution using subtype constraints, but finally concluded it just doesn't work well. The problem is that Java, unlike most of the object-oriented languages with parametric polymorphism, requires explicit declaration of subtype relationships. This means that in order to create an instantiation type like Set<Node>, there must be an explicitly-declared subtype relationship between Node and a special parameterized constraint interface (which is how Pizza/GJ would do it). That doesn't work at all unless the programmer has control over Node, in which case the necessary subtype relationships can be added as needed.

There are two problems with this. First, Java is supposed to support development with extensive libraries and separate compilation. It seems restrictive to require that the programmer have access to the source code for Node. Second, we are concerned that there will be an explosion of these parameterized constraint interfaces, and that every class will end up declaring that it implements a long list of them. The explosion will be particularly bad because Java supports overloading, and different generic classes will require different versions of the overloaded method. These different versions will lead to different constraint interfaces that differ only in how they are overloaded.
How do we support instantiation on basic types?

When we translate PolyJ code, any mention of a basic type is passed through unchanged in the Java code. There is no performance penalty for our allowing basic types as type parameters. However, when a basic type is used as a type parameter, manipulations of that type within the code are not as fast as when the type is known statically. This is a consequence of our no-bloat, homogeneous translation.

An important aspect of our translation is the efficient representation of arrays of parameter types. Suppose a piece of code is parameterized with respect to some type T, and that it contains references to objects of type T[]. If the code is instantiated on a basic type such as int or boolean, the corresponding arrays will be objects of type int[] or boolean[]. Arrays are efficiently supported in PolyJ.

PolyJ also permits array types to be used as parameters. For example, it is easy to write a generic sorting package Sorts[A,E], where the parameter A can be instantiated on arrays, Vectors, and other types with array-like behavior; the parameter E is the element type. Thus, one can write Sorts[int[], int] and also Sorts[Vector[T], T] for any type "T" supporting a "compareTo" method. To see how this is done, look in the current distribution!
Does PolyJ rely on the underlying Java compiler to catch bugs?

No. Although PolyJ uses Java as a target language, the PolyJ compiler performs complete static checking on the ".pj" source file. It then automatically invokes the Java compiler of your choice to produce ".class" files. Any Java compiler that is compatible with "javac" will work just fine.
Why does PolyJ use square brackets rather than C++-style angle brackets?

Angle brackets make it harder to parse the language. This is a bad thing because when the language is harder to parse, it prevents useful tools from being developed that manipulate source code -- pretty printers, source-to-source translators, extended static checkers. Also, languages that are hard to parse tend to produce very confusing error messages when the user writes a program with a syntax error in it. C++ has suffered from both of these problems. We picked brackets largely because they're easier to parse.

C++ can get away with with angle brackets because unlike Java, it requires forward declarations. When it sees a line of code like
```
  Set<T,U> x;
```
it needs to know whether "Set" is a type or not, because it affects how the statement is parsed. If "Set" is a variable, this is a silly but perfectly legal statement that evaluates the expressions (Set < T) and (U > x), separated by the C++ comma operator! If "Set" is a type, it is a declaration of a variable "x". The two parse trees are not even similar.
Currently, a single-pass LALR parser can build a useful parse tree for Java. If angle brackets are used for parameterization, the compiler will need an extra phase so that it can pick up the type names and construct the appropriate nested scopes. In fact, it will be even harder to parse than C++ is.
A complex parsing process also leads to user confusion, because compilers can report unexpected error messages when users write code containing a syntax error. Consider the example above. If the compiler doesn't know about "Set" for some reason (and there are several possible reasons why this could happen), it will probably generate an error message that has nothing to do with what is really wrong, such as "Undeclared variable x", or maybe just "Syntax error", an error some C++ compilers are fond of. Actually, it's not easy for the C++ compilers to generate a better error message; the problem is really in the language definition.