jb -- Parser and Lexer Generation for Java


Overview and User Manual

Last Updated: 9 October 1997
Latest Version: jb-5.4

[Description | Installation | Source Tree | Parser Overview ]
[Lexer Overview | Examples | JDK 1.0.2 Compatibility ]


Description

For a description of the functionality of jb, as well as information about obtaining jb, see jb.html.

Installation

The jb installation process essentially consists of untarring the jb tar file (after any needed decompression) into a directory of your choice. The jb code itself, as opposed to Bison and flex, is entirely in Java, so there should be no reason to recompile to produce new .class files to replace the ones that come with standard installation.

If, for whatever reason, you decide to rebuild the .class files, you may do so as follows.

  1. Go into the regexp directory.
  2. Re-compile the java files by executing the following command: make clean all
  3. Go into the jbf directory.
  4. Re-compile the java files by executing the following command: make clean all Note that compiling subst.java may produce warnings about deprecated features. You can ignore them.

Once the files are installed, and possibly re-compiled, you may go into the examples directory and carry out the processes as described in the Parser Construction and in the Lexer Construction sections.


Source Tree Structure

This source tree contains a number of files and subdirectories.

Parser Overview

The following discussion assumes substantial familiarity with parsing using bison or yacc and with lexing using lex or flex.

Parser Structure

A jb parser consists of the following parts.
YYparse.java
The file YYparse.java is generated from jbf/yyparse.template. It is the primary runtime parse engine, and it contains one class: YYparse. The YYparse class is the primary parse class, and it must be instantiated to create a parser instance. Note that the class name (``YYparse'') is the default. The substitution flags note below indicates how to provide an alternate class name.

The YYparse class methods are as follows.

  • YYparse -- the constructor takes an instance of a lexical scanner (of class YYlex) as an argument.
  • setdebug(boolean) -- turns verbose debug output on or off.
  • setdebug(level) -- turns verbose debug to a specified level.
  • yyparse -- executes the parse engine.
  • yyerror -- invoked when a parse error occurs.
  • yyerror_verbose -- can be invoked from within yyerror to generate a more verbose syntax error message (see the calculator example).

YYtokentypes.java
The file YYtokentypes.java is generated from jbf/yytokentypes.template. It contains one abstract class: YYtokentypes and several constructed definitions.
  • The token-type integer constants shared by the parser and the lexical scanner.
  • A table of names corresponding to the integer token types; this table is normally accessed via the procedure YYtokentypes.tokenname().
  • The non-terminal type integer constants.
  • A table of names corresponding to the integer non-terminal types; this table is normally accessed via the procedure YYtokentypes.nontermname().

YYtoken.java
The file jbf/YYtoken.java (was Token.java) defines an object to contain the lexeme value extracted by YYlex. The YYtoken class contains an integer value typically taken from the token values in class YYtokentype. Additionally, it contains a StringBuffer of the text of the token and a YYlocation instance. This latter contains the line number and character number of the start of that token string.

YYlex.java
The file YYlex.java is the lexical scanner used by the parser, and defines the class YYlex. The parser is passed an instance of this class at construction time. This lexer can be constructed in two ways as described below in section Lexer Overview. As a rule, your lexer will import the YYtokentypes class to get access to the tokentype values. But it could also be passed through jb to insert the Tokentype constants directly (see the idl example below).

The YYlex class methods are as follows.

  • YYlex -- the constructor takes an InputStream as an argument.
  • setdebug(boolean) -- turns verbose debug output on or off.
  • setdebug(level) -- turns verbose debug to a specified level.
  • yylex -- repeatedly called to return an instance of YYtoken.
  • yylexerror -- invoked when a parse error occurs.
  • yyreport -- dump the contents of a YYtoken instance.

jbf/{ParseException.java, YYnode_Stack.java, etc.}
These files contain various support classes used by the parser. Because they are not templates, these files, as well as the Token (sub)classes can be shared by all parsers and lexers via the package named jbf.

Parser Construction Process

We will use the compilation of the calculator example (see examples/calc.y) to show how a parser is ultimately compiled to a .class file. The sequence for building the calculator can be seen by looking in the file examples/Makefile. It essentially consists of the following sequence of actions.

  1. Run bison on the parser specification ``calc.y'' using the following command, for example, to produce file calc.tab.c.
    bison calc.y
  2. Run the jb java program.
    jexec java jbf.jb -yyerror yyverror calc.tab.c \
            ../jbf/yyparse.template YYparse.java \
            ../jbf/yytokentypes.template YYtokentypes.java \
            calc.lex.java YYlex.java
    
    Note that the lexer file (calc.lex.java) will need to have been constructed using the one of the lexer generation methods described in the Lexer Overview. Also, the substitution flag causes the calls to yyerror() to be replaced with a call to yyverror(), which is defined in the suffix section of calc.y.

  3. Compile the parser java files with the following command.
    javac -g Main.java YYparse.java YYlex.java YYtokentypes.java

  4. execute the parser with the following command.
    java Main < calc.cmds
    or
    java Main -debug < calc.cmds
    or, for really voluminous output,
    java Main -debug -l 100 < calc.cmds

The jb Command:

The jb command is a java program for extracting information from a bison produced parser file and inserting it into other files at appropriate places and in appropriate form. Any argument beginning with "--" is assumed to be an option to jb.

The current list of options is as follows.

Additionally, jb will accept flag arguments of the following form.

 -flagname flagvalue 
As it scans its input files, it looks for instances of @flagname@. If it encounters an instance of that flag, then it will substitute the corresponding flagvalue string. If @flagname@ is encountered in the template, and no flag argument is specified, then the flag defaults to an internal value, which is usually the same as the flagname (e.g., @YYparse@ defaults to ``YYparse.'').

The list of flags can be seen by using the --flags argument. A partial list would include the following flags, where the default is in parenthesis if it is different from the flag name.

After options and flags have been extracted, jb interprets its first argument as the input file. Presumably this file was produced as the output from bison. The rest of the arguments to jb are paired together. The first file of each pair is the template file and the second is the name of the file in which to place the result. Each template is read a line at a time and each line is written to the output file. Simultaneously, it is scanned for selected markers (and flags), and when one is encountered, the corresponding information from the input file (suitably modified for java). is inserted into the output file.

The current list of markers is as follows.

Parser Details


Lexer Overview

The following discussion assumes substantial familiarity with parsing using bison or yacc and with lexing using lex or flex.

Two distinct methods for lexer construction are provided. First, there is the Gnu flex system, and second there is a generic, hand-coded lexer.

Flex and jf

There is a program, jf, that corresponds to jb, but for processing the output from flex. As with jb, the ``--flags'' argument can be used to see the list of substitutable flags. A partial list would include the following flags, where the default is in parenthesis if it is different from the flag name.

Additionally, jf supports three flags that indicate optimization levels for lexers. These flags obviously correspond to the same flags (with one less dash) for the flex command.

Note that no other combinations of flex flags is supported. Each of these flags requires a corresponding lexer template. There are three: jbf/yylex.Cem.template, jbf/yylex.Cfe.template, and jbf/yylex.Cfa.template. If you get the wrong combination of flex flags, jf flags, and template, then jf will complain.

Lexer Construction Process

Using the lexer specification examples/calc.lex, flex and jf would be used as follows.

  1. Execute the following command to produce calc.yy.c.
    flex -ocalc.yy.c calc.lex

  2. Execute the following command to produce calc.lex.java
    jexec java jbf.jf --Cem calc.yy.c $JBFPATH/jbf/yylex.Cem.template calc.lex.java
    If the output file, calc.lex.java does contain parser flags or markers (such at @TOKENTYPES@), then it need to be further passed through jb.

Flex-Generated Lexer Details

In order to use flex to generate lexers, you will have to obey some conventions.

As with the jb, and after options and flags have been extracted, jf interprets its first argument as the input file. Presumably this file was produced as the output from bison. The rest of the arguments to jf are paired together. The first file of each pair is the template file and the second is the name of the file in which to place the result. Each template is read a line at a time and each line is written to the output file. Simultaneously, it is scanned for selected markers (and flags), and when one is encountered, the corresponding information from the input file (suitably modified for java) is inserted into the output file.

The current list of markers is as follows.

The Generic Lexer

The second method for producing a lexer is to take the file jbf/yylex.generic, which is written in java, and modify it to recognize the tokens as defined for the language you are parsing. Basically, yylex.generic implements a classical, hand-coded lexer recognizing comments, strings, numbers, identifiers, keywords, and miscellaneous delimiters. The reason one might want to use this rather than the flex lexer is that the generic is smaller and faster than any of the flex lexers. But on the other hand, the generic lexer is harder to maintain.

In order to use the generic lexer, you should copy it to a local file and modify as needed. If you look at the end of it, you will see a routine called "definekeywords", and which is set up to be processed in the examples Makefile to generate either a lexer for the calculator or for java. I do not recommend that you follow this practice; it is only to simplify the examples construction.


Example Grammars

A variety of examples have been provided to show how the system is used.

[Calculator | CORBA IDL | Java ]


Calculator Example

The simplest example is the usual calculator example, where the calculator files are as follows.

If you examine the Makefile, you will see that there are two entries: calc.lex.java and calcx.lex.java If you invoke make with E=calc or E=calcx, then you can produce a lexer using either flex or jbf/yylex.generic, respectively.

CORBA IDL Example

A parser and lexer for the CORBA IDL (Interface Definition Language) is provided as as a second example. It attempts to produce a parse tree using the non-terminal mechanism. Note that this parse tree construction has not been significantly tested yet.

The IDL parser files are as follows.

One interesting thing about this example is the method for accessing token type values in the lexer. Normally, one might do doing something like the following.

return YYtokentypes.INTEGER_LITERAL
Instead, the token definitions are directly inserted into YYlex by inserting @TOKENTYPES@ in idl.lex and processing the lexical output file (idl.lex.java) through jb.

Java Example

A parser and lexer for the Java programming language is provided as as a third example. The grammar and lexicon were taken from the Java specifications at JavaSoft. They were modified to pass thru bison and flex and probably contain many errors. The set of files (java.y, java.lex, etc.) parallel the other examples.

JDK 1.0.2 Compatibility

As of JB Version 5.0, the default is to conform to the JDK 1.1 / JDK 1.1.1 API. But markers have been left in several files to allow for recompilation using JDK 1.0.2. The files are as follows.

At several places within these files, you will see something like the following.
/*JDK1.1.1*/
    target.flush(); target.close(); src.close();
/**/

/*JDK1.0.2
    ftarget.flush(); ftarget.close(); fsrc.close();
*/

For each occurrence of the above pattern, you should uncomment the 1.0.2 code and commment out the 1.1.1 code to produce something like the following.

/*JDK1.1.1
    target.flush(); target.close(); src.close();
*/

/*JDK1.0.2*/
    ftarget.flush(); ftarget.close(); fsrc.close();
/**/