Duplicating an existing operator. Step 1 - The Lexer

Alexander Hristov

Ok, let's get started. When tackling a complex piece of software, it's wise to try to do just a single thing at a time, and learn a single thing at a time. And the best way to start is by trying to duplicate some existing functionality.

We'll start by defining a new operator, #, that mirrors the behaviour of !=. It will be just another way of testing whether two things are different. You will see how even this "simple" task reveals a great deal of the internal structure of the compiler.

Changing the tokenizer

Since the new operator is also a new token, we first go to the list of tokens, which is contained in com.sun.tools.javac.parser.Token. Open it and add the POUND line shown below to the enumeration:

Token.java
 
@Version("@(#)Token.java  1.24 06/11/11")
public enum Token {
    EOF,
    ERROR,
...
    LTEQ("<="),
    GTEQ(">="),
    BANGEQ("!="),
    POUND("#"),      // new token for the # operator
    AMPAMP("&&"),
    BARBAR("||"),
    PLUSPLUS("++"),
...
 

All tokens are added to a token table called "Keywords" (located in com.sun.tools.javac.parser.Keywords), which provides a mapping service between Tokens and Strings. Tokens are incorporated into the keywords table automatically - you needn't do anything unless your token corresponds to a family of character sequences. For example, the INTLITERAL token corresponds to a whole set of character sequences, so it must be handled specially when mapping it to a String. This is not the case with our new # operator, so we'll leave that class as is for now.
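To make the idea of an automatically populated token table concrete, here is a minimal standalone sketch. It is not the actual Keywords class: the names KeywordTableSketch, Tok and buildTable are hypothetical, and the enum is cut down to four constants. It only illustrates why a token with a single fixed spelling (like POUND) needs no extra work, while a token such as INTLITERAL has no single string to register.

```java
import java.util.HashMap;
import java.util.Map;

public class KeywordTableSketch {
    /** Hypothetical, heavily reduced stand-in for the Token enum. */
    public enum Tok {
        EOF,            // no fixed spelling
        INTLITERAL,     // a whole family of character sequences, no single spelling
        BANGEQ("!="),
        POUND("#");     // the new operator

        public final String name;   // fixed spelling, or null if there is none

        Tok() { this(null); }
        Tok(String name) { this.name = name; }
    }

    /** Builds the spelling-to-token table by walking every enum constant. */
    public static Map<String, Tok> buildTable() {
        Map<String, Tok> table = new HashMap<>();
        for (Tok t : Tok.values()) {
            if (t.name != null) {
                table.put(t.name, t);   // only tokens with one fixed spelling are registered
            }
        }
        return table;
    }

    public static void main(String[] args) {
        Map<String, Tok> table = buildTable();
        System.out.println(table.get("#"));    // POUND was picked up without extra code
        System.out.println(table.get("42"));   // null: literals are not in the table
    }
}
```

In this sketch, adding a constant with a spelling to the enum is all it takes for the table to know about it, which is the property the paragraph above relies on.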

Like most compilers, the Java compiler has a lexer (which reads characters from the input stream and converts them into tokens) and a parser (which builds an abstract syntax tree - AST - from the tokens received from the lexer). The compiler has been designed to allow different lexers to be used. The behaviour of each of these lexers is determined by the com.sun.tools.javac.parser.Lexer interface. The default lexer resides in the com.sun.tools.javac.parser.Scanner class.
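The split between a lexer interface and a parser that consumes it can be sketched in a few lines. Everything below is hypothetical and deliberately tiny: TokenSource stands in for a lexer interface, WhitespaceLexer is one possible implementation, and parseBinary plays the role of a parser that only understands "operand operator operand". None of these names come from the compiler itself.

```java
import java.util.Arrays;
import java.util.Iterator;

public class LexerParserSketch {
    /** Hypothetical stand-in for a lexer interface: hands out one token at a time. */
    public interface TokenSource {
        /** Returns the next token, or null when the input is exhausted. */
        String nextToken();
    }

    /** One possible lexer: splits the input on whitespace. */
    public static class WhitespaceLexer implements TokenSource {
        private final Iterator<String> it;

        public WhitespaceLexer(String input) {
            it = Arrays.asList(input.trim().split("\\s+")).iterator();
        }

        @Override
        public String nextToken() {
            return it.hasNext() ? it.next() : null;
        }
    }

    /** Tiny AST node for "left op right". */
    public static class Binary {
        public final String left, op, right;

        Binary(String left, String op, String right) {
            this.left = left;
            this.op = op;
            this.right = right;
        }

        @Override
        public String toString() {
            return "(" + op + " " + left + " " + right + ")";
        }
    }

    /** The "parser": it depends only on TokenSource, not on a concrete lexer. */
    public static Binary parseBinary(TokenSource lexer) {
        String left = lexer.nextToken();
        String op = lexer.nextToken();
        String right = lexer.nextToken();
        if (left == null || op == null || right == null) {
            throw new IllegalArgumentException("expected: operand operator operand");
        }
        return new Binary(left, op, right);
    }

    public static void main(String[] args) {
        System.out.println(parseBinary(new WhitespaceLexer("a # b")));   // prints (# a b)
    }
}
```

Because parseBinary only sees the interface, a different lexer could be swapped in without touching the parsing code.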

Since the Java lexical structure is pretty straightforward, the Scanner is nothing special. As usual, the majority of the work goes into recognizing identifiers and determining whether a sequence of characters represents a number in one of the many possible notations ("2", "2.0", "2e1", "0x02", "02", etc.). A leading minus sign is not part of the number; it is scanned as a separate operator.

The scanOperator() and isSpecial() methods of Scanner are the ones in charge of recognizing operators in the character stream. Unless you make some pretty radical changes to the language, you don't need to touch scanOperator(). isSpecial(), however, needs to know whether a specific character can be part of an operator. Since we are adding an operator that relies on a character (#) that wasn't previously recognized as possibly belonging to an operator, we must add that case:

Scanner.java
 
/** Return true if ch can be part of an operator.  */
private boolean isSpecial(char ch) {
  switch (ch) {
    case '!': case '%': case '&': case '*': case '?':
    case '+': case '-': case ':': case '<': case '=':
    case '>': case '^': case '|': case '~':
    case '@':
    case '#':
      return true;
    default:
      return false;
  }
}
 

Ok, we are done with the lexer. It now recognizes and tokenizes our operator properly.
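To see how a character test like isSpecial() and an operator table can cooperate, here is a simplified standalone sketch of greedy operator scanning. It is only loosely modelled on the idea behind scanOperator(), not copied from it: the class OperatorScanSketch, the reduced OPERATORS set and the tokenize method are all assumptions made for illustration, and everything that is not an operator is lumped into a plain run of characters.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

public class OperatorScanSketch {
    // Hypothetical, reduced operator table; the real one is derived from the token list.
    static final Set<String> OPERATORS = new HashSet<>(Arrays.asList(
            "!", "=", "==", "!=", "<", "<=", "#"));

    /** Returns true if ch can be part of an operator (reduced set for this sketch). */
    static boolean isSpecial(char ch) {
        switch (ch) {
            case '!': case '=': case '<': case '#':
                return true;
            default:
                return false;
        }
    }

    /** Splits input into operator tokens and runs of other non-space characters. */
    public static List<String> tokenize(String input) {
        List<String> tokens = new ArrayList<>();
        int i = 0;
        while (i < input.length()) {
            char ch = input.charAt(i);
            if (Character.isWhitespace(ch)) { i++; continue; }
            int start = i;
            if (isSpecial(ch)) {
                // Greedy scan: keep extending while the text is still a known operator.
                i++;
                while (i < input.length()
                        && isSpecial(input.charAt(i))
                        && OPERATORS.contains(input.substring(start, i + 1))) {
                    i++;
                }
            } else {
                // Anything else is treated as one run (identifier-like) in this sketch.
                while (i < input.length()
                        && !isSpecial(input.charAt(i))
                        && !Character.isWhitespace(input.charAt(i))) {
                    i++;
                }
            }
            tokens.add(input.substring(start, i));
        }
        return tokens;
    }

    public static void main(String[] args) {
        System.out.println(tokenize("a # b"));    // prints [a, #, b]
        System.out.println(tokenize("a!=b"));     // prints [a, !=, b]
    }
}
```

The greedy loop is why "!=" comes out as one token rather than "!" followed by "=", and why a single-character operator such as # needs nothing more than an entry in the table and a case in the character test.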

 

Comments

15/12/2006 at 05:30, posted by gaurav v bagga
Now there is new class Scanner where you have isSpecial(char c) method

 
