Page 4
Duplicating an existing operator. Step 1 - The Lexer
Page 5
Index
See this article in english 
Page 6
Duplicating an existing operator. Step 3 - The Code Generation

Duplicating an existing operator. Step 2 - The Parser

Alexander Hristov

Structure of the AST

Next stop is the Parser. Remember that the primary aim of a parser is to construct the AST.

The AST (Abstract Syntax Tree) is a tree representation of the source code that is very convenient for processing. Each source construct usually has a specific node representing it. If you are familiar with the typical expression trees, that represent 2+3 as a tree with a node called "+" and two children called "2" and "3", then the AST is nothing more than a generalization of this concept.

For example, the expression 2+Math.max(5,9) generates the following tree:

Abstract Syntax Tree

In javac, while creating the syntax tree the parser checks the correctness of the source code and reports any syntax errors, and performs string folding. Also, string literals are folded at this stage. Trees that consist of concatenations of string literals (i.e. "A"+"B") are optimized into a single string literal by the foldStrings() method.

In the java compiler, each AST node is an instance of a subclass of com.sun.tools.javac.tree.JCTree. implementing a corresponding interface derived from com.sun.source.Tree. In general, we have the following rules:

The JCTree class

The JCTree class defines three important properties for each node:

tag - This is an integer variable that stores the kind of operation that the node represents. Note that there is not a 1:1 relation between node types and operations. For example, all binary operators (+,-,*, etc) result in the very same BinaryTree node, but of course the operation of + differs from the operation of -, so they would be identified by different values of tag. The set of possible tags is defined at the beginning of the JCTree class in the following fashion:

JCTree.java
 
...
    public static final int OR = NULLCHK + 1;                // ||
    public static final int AND = OR + 1;                    // &&
    public static final int BITOR = AND + 1;                 // |
    public static final int BITXOR = BITOR + 1;              // ^
    public static final int BITAND = BITXOR + 1;             // &
    public static final int EQ = BITAND + 1;                 // ==
    public static final int NE = EQ + 1;                     // !=
    public static final int LT = NE + 1;                     // <
    public static final int GT = LT + 1;                     // >
    public static final int LE = GT + 1;                     // <=
...

 

Why not reuse the Token enumeration? Because the same token can have multiple meanings depending on context. For example, the "-" character generates the SUB Token. However, a-b (binary minus) is not the same as -a (unary minus), and they need to be represented using different node types.

Now, when modifying this list (as we'll do shortly) it's generally wise not to alter order or starting or ending ("boundary") values. i.e.. There are parts of the compiler that make checks like these:

Type.java
 
public boolean isPrimitive() {
  return tag < VOID;
}

 
and even things like this:
TreeInfo.java
 
private Name[] opname = new Name[JCTree.MOD - JCTree.POS + 1];

 

pos - An integer variable that stores the position inside the source code that this node represents. The position is represented as an offset from the beginning of the file, the first one being at offset 0. The special value of -1 is used to represent a nonexistant position.

type - A variable of class Type that stores the Java type of the node (i.e. int, float, String, whatever).

 

Adding the operator to the Parser

All binary operators result in the same node type : JCBinary, which implements the interface BinaryTree. The specific operator represented by the node is contained in the tag variable, so the first thing we need to do is to define a new tag value for our operator. We'll make the following change to JCTree:

JCTree.java
 
...
    public static final int BITOR = AND + 1;                 // |
    public static final int BITXOR = BITOR + 1;              // ^
    public static final int BITAND = BITXOR + 1;             // &
    public static final int EQ = BITAND + 1;                 // ==
public static final int NE = EQ + 1; // != public static final int NE2 = NE + 1; // #
public static final int LT = NE2 + 1; // < public static final int GT = LT + 1; // > public static final int LE = GT + 1; // <= public static final int GE = LE + 1; // >= public static final int SL = GE + 1; // << public static final int SR = SL + 1; // >> ...
 

When the parser needs to convert an operator token to its corresponding tag, it calls the optag() or unoptag() methods (depending on whether the operator is binary or unary). Our operator is binary, so we'll modify the optag method:

Parser.java
 
  /** Return operation tag of binary operator represented by token,
   *  -1 if token is not a binary operator.
   */
  static int optag(Token token) {
    switch (token) {
case POUND: return JCTree.NE2;
case BARBAR: return JCTree.OR; case AMPAMP: return JCTree.AND; ...
 

Precedence levels

Java has 16 precedence levels, and they are all defined in the com.sun.tools.javac.tree.TreeInfo class:

TreeInfo.java
 
  /** Operator precedences values.
   */
  public static final int 
      notExpression = -1, // not an expression
      noPrec = 0, // no enclosing expression
      assignPrec = 1, 
      assignopPrec = 2, 
      condPrec = 3, 
      orPrec = 4,
      andPrec = 5,
      bitorPrec = 6, 
      bitxorPrec = 7, 
      bitandPrec = 8, 
      eqPrec = 9,
      ordPrec = 10,
      shiftPrec = 11, 
      addPrec = 12, 
      mulPrec = 13,
      prefixPrec = 14,
      postfixPrec = 15, 
      precCount = 16;

 

Each operator has a precedence value that is one of the above. The mapping is performed by the opPrec() method in that class. Any new operator needs to have its precedence mapped here. Our # operator will have the same precedence as !=, so we change the method as follows:

TreeInfo.java
 
  public static int opPrec(int op) {
    switch (op) {
    case JCTree.POS:
    case JCTree.NEG:
    case JCTree.NOT:
    case JCTree.COMPL:
    case JCTree.PREINC:
    case JCTree.PREDEC:
      return prefixPrec;
    case JCTree.POSTINC:
    case JCTree.POSTDEC:
    case JCTree.NULLCHK:
      return postfixPrec;
...
    case JCTree.EQ:
case JCTree.NE2:
case JCTree.NE: return eqPrec; case JCTree.LT: case JCTree.GT: ...
 

TreeInfo also contains an array with the names of all operators. The table is filled in the constructor, so we must add our operator there, too:

TreeInfo.java
 
  private TreeInfo(Context context) {
    context.put(treeInfoKey, this);
    Name.Table names = Name.Table.instance(context);
    opname[JCTree.POS - JCTree.POS] = names.fromString("+");
    opname[JCTree.NEG - JCTree.POS] = names.hyphen;
    opname[JCTree.NOT - JCTree.POS] = names.fromString("!");
    opname[JCTree.COMPL - JCTree.POS] = names.fromString("~");
    ...
    opname[JCTree.EQ - JCTree.POS] = names.fromString("==");
    opname[JCTree.NE - JCTree.POS] = names.fromString("!=");
opname[JCTree.NE2 - JCTree.POS] = names.fromString("#");
opname[JCTree.LT - JCTree.POS] = names.fromString("<"); opname[JCTree.GT - JCTree.POS] = names.fromString(">"); ...
 

Soruce code view and Pretty

The com.sun.tools.javac.tree.Pretty class is a class that produces a textual representation of the AST. You can view it as performing the more or less reverse operation of a parser, except that it doesn't keep original ignorable elements (comments, whitespaces, etc.). Any change you make to the language requires that you also alter this class accordingly. In our case, we have to change the operatorName() method that maps tags to operator names. Why the TreeInfo table is not used here is a mystery for me, as this seems totally redundant:

Pretty.java
 
    public String operatorName(int tag) {
        switch(tag) {
            case JCTree.POS:     return "+";
            case JCTree.NEG:     return "-";
            case JCTree.NOT:     return "!";
...
            case JCTree.AND:     return "&&";
            case JCTree.EQ:      return "==";
            case JCTree.NE:      return "!=";
case JCTree.NE2: return "#";
case JCTree.LT: return "<"; case JCTree.GT: return ">"; ...
 

 

Comments

 

Add a Comment

Name (optional)
EMail (optional, will not be displayed)

Text