Skip to content

Latest commit

 

History

History

Folders and files

NameName
Last commit message
Last commit date

parent directory

..
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 

README.md

Parser

A parser is a function that parses a stream of tokens and returns a result denoting whether the stream is syntactically valid or not.
The parser function used for the classic compiler is generated by Bison. classical.y contains all the configuration (i.e. the grammar, production rules) that is used by Bison to generate the parser function.
Although the modules of the compiler are written in C++, Bison's C-Language interface is used, as it is more extensively documented. This means the parser function module is a C file.

Build the parser

To generate the parser with Bison run make bison, this command will create classical.tab.c and states.bison. The former is the implementation file of the parser function and the latter is a file containing all states and items of the parser automaton. The states file is particularly handy for debugging, you can use it to examine state items and validate that your grammar defines the language as you expect.
To build the parser run make parser. If compilation fails you need to examine why. Keep in mind that the flag -Werror is passed to the compiler; this flag promotes all warnings to errors and compilation fails when errors occur.

Test the parser

Unit tests are written for the ast/nodes and ast/builder modules. You can run these tests with make test-nodes and make test-builder, respectively.
Integration tests are written for the whole parser module, where its output is examined given a fixed input stream of tokens. You can run the parser integration tests with make test-parser.

Run the parser

Run make run-parser to run the parser. This will by default feed the tests/test.classic file to the parser. You can modify this file as you'd like, to observe the parser's output and experiment with different programs written in classic.

Abstract Syntax Tree

The Abstract Syntax Tree (AST) is a tree consisting of nodes that represent grammar symbols. Non-terminal symbols are nodes with children and terminal symbols are leaf nodes.
In the case of the production rule assign-stm = ids , "=" , exp here's an example of the corresponding AST for the statement:
price = before_tax + tax;

Untitled Diagram drawio(1)(1)

Generic base types

According to the below rules, an assign-stm is a statement, just as a compound-stm or an exp can be.
statement = [ non-empty-stm ]
non-empty-stm = compound-stm | assign-stm | exp
So, statement is a generic representation of a statement and compound-stm, assign-stm and exp are specific statement types. Similarly, exp is a generic representation of an expression.

The concept of generic representation is expressed through generic base type AST node classes. e.g. there is one generic Expression_ class and the more specific ParenthesesExpression_, BinaryOperationExpression_, FunctionCallExpression_, LiteralExpression_ and VariableExpression_ classes.
All AST node classes are suffixed with an underscore. Pointers to those classes do not have suffices, e.g. Expression is equal to Expression_*.
The below class diagram depicts the aforementioned classes.

Untitled Diagram drawio(1)(2)

The generic class Expression_ has a type attribute and a downcast method; these enable the caller to create a specific expression object from the generic. While parsing, objects of the specific classes are created, but what gets pushed to the stack is a generic object; this is accomplished through the upcast method that specific classes implement.
In summary:

  • A generic object is converted to a specific one with the downcast method
  • A specific object is converted to a generic one with the upcast method
  • The AST nodes that are pushed to the stack are generic objects

Implementation

Here's a picture of the parsing process this compiler implements.

Untitled Diagram drawio(1)