
Implementing a new operator by direct bytecode generation

Alexander Hristov

If you feel brave and full of positive energy, in this article we'll implement the ** operator of the previous article using direct bytecode generation. No, really, it isn't that hard, since the code of the compiler is very neatly organized and provides you with a lot of help.

However, you do need at least a superficial knowledge of the JVM. "Superficial knowledge" means that you need not know every single VM instruction, but you should know how methods are called, what the constant pool is, and that the JVM is a stack-based machine (no "registers" here) and what this implies. Knowing how to read the output of javap will certainly be a plus.

The code generation is performed by the Gen class, together with a set of helper classes. The three most important helpers are ByteCodes, Code and Items. We have met ByteCodes before: it contains named constants for all VM opcodes, as well as constants for "virtual" instructions. Code stores the generated code and provides several methods for emitting opcodes. Items represents addressable items, i.e. anything that can be referred to by VM instructions: stack items, local variables, parameters, constant pool members, etc. All four classes are located in the com.sun.tools.javac.jvm package.

As a first step, if you have completed the previous tutorial, the ** operator is already implemented in the de-sugaring phase, and that implementation needs to be removed. Restore the visitBinary method of the Lower class to its original form.

In a visitor method of the Gen class, your job is to generate the code that performs the action described by that node. You must also produce an Item representing the result of the action; visitor methods do this by assigning the item to the result field. If the action was a computation, you'll probably produce a stack item, because a computation usually leaves its result on the stack. Imagine, for example, that you have the expression lhs + rhs, where lhs and rhs are arbitrarily complex numeric expressions. As the "+" node, your job is to do the following:

compute the result of lhs, then push it on the stack
coerce the result into an int 
compute the result of rhs, then push it on the stack
coerce the result into an int 
emit the opcode for the +(int,int) operator. 
return "my result is the item on the top of the stack"

Fortunately, the genExpr helper method does most of these steps for you. It takes an expression tree and its expected type as parameters, generates code (recursively) to compute the result, and then coerces the result to the expected type. So basically, you'd write (pseudocode):

lhsresult = genExpr(lhs, expected type int)
push lhsresult on the stack
rhsresult = genExpr(rhs, expected type int)
push rhsresult on the stack
emit +(int,int)
return "my result is the item on the top of the stack"
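The following toy sketch makes the recursion and coercion concrete. It is not javac's Gen. The Expr records, the text-based instruction list and this genExpr are hypothetical stand-ins, and only int and double are modelled. For simplicity, constants are always pushed with ldc.

```java
import java.util.ArrayList;
import java.util.List;

public class MiniGen {
    // Hypothetical mini-AST: int literals, int local variables and int addition only.
    interface Expr {}
    record IntLit(int value) implements Expr {}
    record Var(int slot) implements Expr {}
    record Add(Expr lhs, Expr rhs) implements Expr {}

    enum Type { INT, DOUBLE }

    // Stand-in for the code buffer: emitted instructions as text.
    final List<String> code = new ArrayList<>();

    // Emits code that leaves the value of tree on the stack, then coerces it
    // to the expected type (roughly the net effect of genExpr(tree, pt).load()).
    void genExpr(Expr tree, Type expected) {
        Type actual = gen(tree);
        if (actual == Type.INT && expected == Type.DOUBLE) {
            code.add("i2d"); // widening conversion emitted by the coercion
        }
    }

    // Emits code for one node and returns the type of the value it pushes.
    private Type gen(Expr tree) {
        if (tree instanceof IntLit lit) {
            code.add("ldc " + lit.value()); // simplified: javac prefers iconst_n/bipush for small values
            return Type.INT;
        } else if (tree instanceof Var v) {
            code.add("iload " + v.slot());
            return Type.INT;
        } else if (tree instanceof Add add) {
            genExpr(add.lhs(), Type.INT); // lhs pushed as an int
            genExpr(add.rhs(), Type.INT); // rhs pushed as an int
            code.add("iadd");             // pops both operands, pushes the sum
            return Type.INT;
        }
        throw new IllegalArgumentException("unknown node: " + tree);
    }

    public static void main(String[] args) {
        MiniGen g = new MiniGen();
        // (x + 1), with x in local variable slot 0, expected as a double
        g.genExpr(new Add(new Var(0), new IntLit(1)), Type.DOUBLE);
        System.out.println(g.code); // [iload 0, ldc 1, iadd, i2d]
    }
}
```

The "+" node never checks what its operands are. It only states the type it expects, and the coercion step adds the conversion when the types differ.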

Now remember that the types known to the compiler are defined as fields of the Symtab class. Inside Gen, the Symtab instance is available through the syms field. So the way to say "I expect the result to be an int" is to reference syms.intType.

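The genExpr method returns an Item. This item does not necessarily reside on the stack, so you must push it explicitly. You do this by invoking the load() method of the returned item, which takes care of the details. For a local variable, it emits the appropriate load instruction (such as iload). For a constant, it emits a constant-pushing instruction (such as iconst_1, bipush or ldc). For an item that is already on the stack, it emits nothing.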

The good thing about this is that you can handle items (more or less) regardless of what they reference. So our pseudocode becomes:

Item lhsResult = genExpr(lhs, syms.intType);
lhsResult.load();

Item rhsResult = genExpr(rhs, syms.intType);
rhsResult.load();

emit +(int,int)
return "my result is the item on the top of the stack"
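The benefit of load() is that the caller never needs to know what an item refers to. Below is a toy model of that design. MiniItems and its StackItem, LocalItem and ImmediateItem classes are hypothetical simplifications of javac's Items, and they record instruction names as text instead of emitting bytes.

```java
import java.util.ArrayList;
import java.util.List;

public class MiniItems {
    // Stand-in for the code buffer: emitted instructions as text.
    final List<String> code = new ArrayList<>();

    // Anything that VM instructions can refer to.
    interface Item {
        Item load(); // emit what is needed to get the value onto the stack
    }

    // The value is already on the operand stack: loading emits nothing.
    final class StackItem implements Item {
        public Item load() { return this; }
    }

    // An int local variable (or parameter) stored in slot reg.
    final class LocalItem implements Item {
        final int reg;
        LocalItem(int reg) { this.reg = reg; }
        public Item load() {
            // iload_0..iload_3 are one-byte shortcuts; other slots take an operand
            code.add(reg <= 3 ? "iload_" + reg : "iload " + reg);
            return new StackItem();
        }
    }

    // An int constant, pushed with the shortest suitable instruction.
    final class ImmediateItem implements Item {
        final int value;
        ImmediateItem(int value) { this.value = value; }
        public Item load() {
            if (value >= -1 && value <= 5) {
                code.add(value == -1 ? "iconst_m1" : "iconst_" + value);
            } else if (value >= Byte.MIN_VALUE && value <= Byte.MAX_VALUE) {
                code.add("bipush " + value);
            } else if (value >= Short.MIN_VALUE && value <= Short.MAX_VALUE) {
                code.add("sipush " + value);
            } else {
                code.add("ldc " + value); // value comes from the constant pool
            }
            return new StackItem();
        }
    }

    public static void main(String[] args) {
        MiniItems m = new MiniItems();
        List<Item> items = List.of(m.new LocalItem(1), m.new ImmediateItem(100), m.new LocalItem(7));
        for (Item item : items) {
            item.load(); // the caller does not care what kind of item it is
        }
        System.out.println(m.code); // [iload_1, bipush 100, iload 7]
    }
}
```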

When you need to emit code directly, you call the appropriate method of the Code class. This will usually be one of the emitopN() methods, where N (0, 1, 2 or 4) is the size in bytes of the instruction's operand. The first parameter is always the opcode; emitop1, emitop2 and emitop4 take the operand as a second parameter. If our operator were the + operator, the rest would be easy: simply emit the iadd instruction:

Item lhsResult = genExpr(lhs, syms.intType);
lhsResult.load();

Item rhsResult = genExpr(rhs, syms.intType);
rhsResult.load();

code.emitop0(iadd);
return "my result is the item on the top of the stack"
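The following hypothetical miniature code buffer shows what N means in practice. MiniCode is not javac's Code class. It only mimics the shape of the emitop methods, appending the opcode byte followed by an N-byte operand in big-endian order, which is how the JVM encodes instruction operands.

```java
import java.io.ByteArrayOutputStream;
import java.util.HexFormat;

public class MiniCode {
    // A few opcode values from the JVM specification.
    static final int BIPUSH = 0x10, SIPUSH = 0x11, IADD = 0x60, GOTO_W = 0xc8;

    private final ByteArrayOutputStream bytes = new ByteArrayOutputStream();

    // No operand: the instruction is just its opcode byte.
    void emitop0(int op) {
        bytes.write(op);
    }

    // One-byte operand.
    void emitop1(int op, int od) {
        bytes.write(op);
        bytes.write(od); // write(int) keeps only the low 8 bits
    }

    // Two-byte operand, high byte first (JVM operands are big-endian).
    void emitop2(int op, int od) {
        bytes.write(op);
        bytes.write(od >> 8);
        bytes.write(od);
    }

    // Four-byte operand, high byte first.
    void emitop4(int op, int od) {
        bytes.write(op);
        bytes.write(od >> 24);
        bytes.write(od >> 16);
        bytes.write(od >> 8);
        bytes.write(od);
    }

    byte[] toByteArray() {
        return bytes.toByteArray();
    }

    public static void main(String[] args) {
        MiniCode c = new MiniCode();
        c.emitop1(BIPUSH, 7);   // push 7
        c.emitop2(SIPUSH, 300); // push 300
        c.emitop0(IADD);        // add them
        System.out.println(HexFormat.ofDelimiter(" ").formatHex(c.toByteArray()));
        // 10 07 11 01 2c 60
    }
}
```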

Finally, we must return an Item that represents what we've done. Since the result remains on the stack, we need a stack item. Whatever the item type, you create it through the items instance variable declared by the Gen class, by invoking the appropriate makeXXXItem() method on it. Since we need a stack item, we'll call makeStackItem(). This method requires us to specify the type of the stack item. The JVM does not really need this type; the compiler needs it to check that the generated code is consistent. For example, the code generated for a node must not leave extra data on the stack when it finishes.

Item lhsResult = genExpr(lhs, syms.intType);
lhsResult.load();

Item rhsResult = genExpr(rhs, syms.intType);
rhsResult.load();

code.emitop0(iadd);
return items.makeStackItem(syms.intType);
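The kind of consistency check that a typed stack item makes possible can be sketched with a small hypothetical class. TypeStack is not a javac class. It only models the idea of a compiler tracking the types that generated code leaves on the operand stack, including the fact that double values occupy two slots.

```java
import java.util.ArrayDeque;
import java.util.Deque;

public class TypeStack {
    enum Type {
        INT(1), DOUBLE(2); // long and double values take two stack slots on the JVM
        final int slots;
        Type(int slots) { this.slots = slots; }
    }

    private final Deque<Type> types = new ArrayDeque<>();
    private int depth;    // current stack size, in slots
    private int maxDepth; // largest size seen (what becomes max_stack)

    void push(Type t) {
        types.push(t);
        depth += t.slots;
        maxDepth = Math.max(maxDepth, depth);
    }

    // Pops a value, failing loudly (and leaving the stack untouched) if the code is inconsistent.
    Type pop(Type expected) {
        Type t = types.peek();
        if (t == null) {
            throw new IllegalStateException("stack underflow, expected " + expected);
        }
        if (t != expected) {
            throw new IllegalStateException("expected " + expected + " but found " + t);
        }
        types.pop();
        depth -= t.slots;
        return t;
    }

    int depth() { return depth; }
    int maxDepth() { return maxDepth; }

    public static void main(String[] args) {
        TypeStack s = new TypeStack();
        s.push(Type.INT);  // lhsResult.load()
        s.push(Type.INT);  // rhsResult.load()
        s.pop(Type.INT);   // iadd consumes two ints...
        s.pop(Type.INT);
        s.push(Type.INT);  // ...and produces one int
        // The node promised a stack item of type int: exactly that must be on the stack.
        s.pop(Type.INT);
        System.out.println(s.depth() + " " + s.maxDepth()); // 0 2
    }
}
```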

If our operator were simply the + operator, we would be done at this point. But of course, we aren't finished: we must invoke Math.pow(). On the JVM, arguments are passed to an invoked method by pushing them on the stack, with the first argument pushed first, and then executing the appropriate invokeXXX instruction. This instruction takes as its operand the constant pool index of a reference to the method to call, so that reference must exist in the constant pool.
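This calling convention can be modelled with a plain Java operand-stack simulation; no bytecode is generated. InvokeStaticDemo and invokePow are hypothetical names used only for illustration. Because the first argument is pushed first, the invocation pops the arguments in reverse order.

```java
import java.util.ArrayDeque;
import java.util.Deque;

public class InvokeStaticDemo {
    // Models the effect of "invokestatic java/lang/Math.pow:(DD)D" on an operand stack.
    static void invokePow(Deque<Double> stack) {
        double b = stack.pop(); // last argument: pushed last, so popped first
        double a = stack.pop(); // first argument: pushed first, so popped last
        stack.push(Math.pow(a, b)); // the return value replaces the arguments
    }

    public static void main(String[] args) {
        Deque<Double> stack = new ArrayDeque<>();
        stack.push(2.0);  // first argument (lhs)
        stack.push(10.0); // second argument (rhs)
        invokePow(stack);
        System.out.println(stack.pop()); // 1024.0
    }
}
```

Swapping the two pushes would compute 10.0 raised to 2.0 instead, which is why the left operand must be generated first.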

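All these details are handled for us by the callMethod() helper method. It takes five parameters:

the position of the tree in the source, used for error reporting
the type of the class that declares the method (here, the Math class)
the name of the method (here, pow)
the list of the method's parameter types (here, two doubles)
whether the method is static (true for Math.pow)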

The only parameter we might have trouble obtaining is a Type representing the Math class. If any part of the compiler needs to know about a specific type, then it is a good idea to declare it with the rest of the compiler-known types - in the Symtab class. Simply add the following lines:

Symtab.java
 
public class Symtab {
  ...
  public final Type inheritedType;
  public final Type proprietaryType;
  public final Type mathType;   // new: the type of java.lang.Math

  /** The symbol representing the length field of an array. */
  public final VarSymbol lengthVar;

  /** The null check operator. */
  public final OperatorSymbol nullcheck;
  ...
  protected Symtab(Context context) throws CompletionFailure {
    context.put(symtabKey, this);
    names = Name.Table.instance(context);
    reader = ClassReader.instance(context);
    ...
    reader.init(this);

    // Enter predefined classes.
    mathType = enterClass("java.lang.Math");   // new
    objectType = enterClass("java.lang.Object");
    classType = enterClass("java.lang.Class");
    stringType = enterClass("java.lang.String");
    ...
  }
  ...
}
 

Now we are ready to modify the visitBinary method of Gen to emit code for our binary operator:

Gen.java
 
  public void visitBinary(JCBinary tree) {
    OperatorSymbol operator = (OperatorSymbol) tree.operator;
    if (operator.opcode == powerOp) {
      // Push both operands, converted to double (e.g. i2d for int operands).
      genExpr(tree.lhs, syms.doubleType).load();
      genExpr(tree.rhs, syms.doubleType).load();
      // Emit the static call to Math.pow(double, double).
      callMethod(
          tree.pos(),
          syms.mathType,
          names.fromString("pow"),
          List.of(syms.doubleType, syms.doubleType),
          true);
      // The double returned by Math.pow is left on the stack.
      result = items.makeStackItem(syms.doubleType);
      return;
    }
    if (operator.opcode == string_add) {
      // Create a string buffer.
    ...

 

(Remember that powerOp is the tag we assigned to our operator in the ByteCodes class)
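The callMethod call in visitBinary resolves Math.pow at compile time from an owner type, a method name and a list of argument types, and its final true selects a static invocation. As a rough runtime analogy only, the same ingredients are enough to find and call the method with java.lang.reflect. This is not what the compiler does, and callStatic is a hypothetical helper.

```java
import java.lang.reflect.Method;
import java.lang.reflect.Modifier;

public class ResolveDemo {
    // Hypothetical helper: finds a method from (site, name, argument types),
    // checks that it is static and invokes it without a receiver.
    static Object callStatic(Class<?> site, String name, Class<?>[] argtypes, Object... args)
            throws ReflectiveOperationException {
        Method m = site.getMethod(name, argtypes); // resolve by owner, name and signature
        if (!Modifier.isStatic(m.getModifiers())) {
            throw new IllegalArgumentException(name + " is not static");
        }
        return m.invoke(null, args); // static call: no receiver object
    }

    public static void main(String[] args) throws ReflectiveOperationException {
        Class<?>[] twoDoubles = {double.class, double.class};
        Object r = callStatic(Math.class, "pow", twoDoubles, 3.0, 2.0);
        System.out.println(r); // 9.0
    }
}
```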

And voilà: we are done :-)

 

 

 

Comments

Dec 17, 2007 at 11:54 Sent by yuan
very good! I found in the java7, all the "Symtab" in visitBinary method should replaced to "syms" such as replace "Symtab.doubleType" with "syms.doubleType".

 
