
izuzanak/yapgen

yapgen - parser generator

A parser generator for rapid prototyping.

Parser generator features

yapgen provides the following features.

Feature list

  • Generates a parser's lexical and syntactic analyzers from an input file.
  • Lexical symbols are described by regular expressions.
  • The language grammar is described by an SLR(1) grammar.
  • The generated parser can be tested immediately on a source string.
  • Semantic rules of the language can be tested with Lua scripts bound to each rule reduction.
  • A debugged and fine-tuned parser can be generated as C/C++, JavaScript, Rust, PHP, or AWK code.

Motivation

yapgen addresses the need to generate and test parsers quickly.

Rule examples

Example rule files used to generate parsers are located in the build/rules directory.

Inline examples

The following simple examples demonstrate what yapgen can do.

Regular expressions

Examples of basic regular expressions.

oct_int     {'0'.<07>*}
dec_int     {<19>.d*}
hex_int     {'0'.[xX].(<09>+<af>+<AF>).(<09>+<af>+<AF>)*}

if          {"if"}
else        {"else"}

equal       {'='}
plus_equal  {"+="}
minus_equal {"-="}

comment_sl  {"//".!'\n'*.'\n'}
comment_ml  {"/*".(!'*'+('*'.!'/'))*."*/"}
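To cross-check what each expression above accepts, they can be hand-translated to Python's re module. This is a sketch under an assumed reading of the yapgen notation: '.' is concatenation, '+' is alternation, '*' is zero or more repetitions, <xy> is a character range, [..] is a character set, d is a decimal digit, and !c is any character other than c. PATTERNS and matches() are hypothetical names, not part of yapgen.

```python
import re

# Hand-translated equivalents of the yapgen expressions above (assumed notation:
# '.' concatenation, '+' alternation, '*' repetition, <xy> range, d digit, !c "not c").
PATTERNS = {
    "oct_int": r"0[0-7]*",
    "dec_int": r"[1-9][0-9]*",
    "hex_int": r"0[xX][0-9a-fA-F][0-9a-fA-F]*",
    "if": r"if",
    "else": r"else",
    "equal": r"=",
    "plus_equal": r"\+=",
    "minus_equal": r"-=",
    "comment_sl": r"//[^\n]*\n",
    "comment_ml": r"/\*(?:[^*]|\*[^/])*\*/",
}


def matches(name: str, text: str) -> bool:
    """Return True if the whole of `text` is accepted by the pattern `name`."""
    return re.fullmatch(PATTERNS[name], text) is not None
```

For example, matches("hex_int", "0x1F") is True, while matches("oct_int", "09") is False because 9 is outside the <07> range.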

Regular expressions can be used to recognize binary data.

PACKET_ADDRESS     {"/?".(<09>+<az>+<AZ>)*.'!'."\x0d\x0a"}
PACKET_IDENTIFY    {'/'.<AZ>.<AZ>.(<AZ>+<az>).<09>.(|/!\x0d|)*."\x0d\x0a"}
PACKET_ACK_COMMAND {'\x06'.<09>.(<09>+<az>).<09>."\x0d\x0a"}
PACKET_ACK         {'\x06'}
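The packet terminals can be mirrored with byte-level Python regular expressions to see how a lexer would tell them apart. This sketch assumes that (|/!\x0d|) means "any byte except '/', '!' and 0x0d", and that the lexer picks the longest match, with earlier terminals winning ties; both are assumptions about yapgen, not documented behavior. TERMINALS and next_token() are hypothetical names.

```python
import re
from typing import Optional, Tuple

# Byte-level equivalents of the packet terminals above (hand-translated).
# Assumption: (|/!\x0d|) is read as "any byte except '/', '!' and 0x0d".
TERMINALS = [
    ("PACKET_ADDRESS", re.compile(rb"/\?[0-9a-zA-Z]*!\r\n")),
    ("PACKET_IDENTIFY", re.compile(rb"/[A-Z][A-Z][A-Za-z][0-9][^/!\r]*\r\n")),
    ("PACKET_ACK_COMMAND", re.compile(rb"\x06[0-9][0-9a-z][0-9]\r\n")),
    ("PACKET_ACK", re.compile(rb"\x06")),
]


def next_token(data: bytes, pos: int = 0) -> Optional[Tuple[str, bytes]]:
    """Return the longest terminal matching at `pos`; earlier entries win ties."""
    best = None
    for name, regex in TERMINALS:
        m = regex.match(data, pos)
        if m and (best is None or m.end() > best[1].end()):
            best = (name, m)
    if best is None:
        return None
    return best[0], best[1].group()
```

With longest-match selection, a lone 0x06 byte is PACKET_ACK, while 0x06 followed by "0a1\r\n" is recognized as the longer PACKET_ACK_COMMAND.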

Grammar rules

Examples of basic grammar rules. Identifiers enclosed in angle brackets, e.g. <command>, identify nonterminal symbols of the grammar; identifiers without brackets, e.g. if, refer to terminal symbols described by regular expressions.

<command> -> if <condition> <if_else> ->> {}
<if_else> -> <command> ->> {}
<if_else> -> <command> else <command> ->> {}

<command> -> <while_begin> <condition> <command> ->> {}
<while_begin> -> while ->> {}
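The bracket convention makes it easy to separate terminals from nonterminals mechanically. The following hypothetical helper (not part of yapgen) splits each rule line at "->" into a head and body symbols, discards the semantic block after "->>", and classifies symbols by their angle brackets. It assumes the semantic code itself never contains "->>".

```python
from typing import List, Tuple


# Hypothetical helper, not part of yapgen: split a rule line such as
#   <command> -> if <condition> <if_else> ->> {}
# into its head and body symbols, discarding the semantic block after "->>".
def parse_rule(line: str) -> Tuple[str, List[str]]:
    production = line.split("->>", 1)[0]
    head, body = production.split("->", 1)
    return head.strip(), body.split()


def is_nonterminal(symbol: str) -> bool:
    """Symbols enclosed in angle brackets are nonterminals; the rest are terminals."""
    return symbol.startswith("<") and symbol.endswith(">")


GRAMMAR = """
<command> -> if <condition> <if_else> ->> {}
<if_else> -> <command> ->> {}
<if_else> -> <command> else <command> ->> {}

<command> -> <while_begin> <condition> <command> ->> {}
<while_begin> -> while ->> {}
"""

rules = [parse_rule(l) for l in GRAMMAR.splitlines() if l.strip()]
terminals = sorted({s for _, body in rules for s in body if not is_nonterminal(s)})
print(terminals)  # ['else', 'if', 'while']
```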

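Grammar rules can have semantic code (Lua) bound to them. The following action either emits a Graphviz parse tree node or prints the operator, depending on gen_parse_tree.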

<F> -> <F> double_equal <E> ->>
{
  if gen_parse_tree == 1 then
     -- allocate a new graph node for this reduction
     this_idx = node_idx;
     node_idx = node_idx + 1;
     print("   node_"..this_idx.." [label = \"<exp> == <exp>\"]");
     -- link the new node to the two operand nodes popped from the stack
     print("   node_"..this_idx.." -> node_"..table.remove(node_stack));
     print("   node_"..this_idx.." -> node_"..table.remove(node_stack));
     -- push this node so that an enclosing rule can link to it
     table.insert(node_stack,this_idx);
  else
     print(table.concat(tabs,"").."operator binary double_equal");
  end
}
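The node_idx/node_stack bookkeeping in the action above can be mirrored in Python to show how operand nodes are linked during reductions. DotBuilder, leaf() and the digraph wrapper are hypothetical; the leaf action in particular is an assumption about what the rules for operands would do. Because the stack is popped twice, the first edge goes to the right operand.

```python
from typing import List


class DotBuilder:
    """Mimics the node_idx / node_stack bookkeeping of the Lua action above."""

    def __init__(self) -> None:
        self.node_idx = 0
        self.node_stack: List[int] = []
        self.lines: List[str] = []

    def leaf(self, label: str) -> None:
        # Hypothetical operand action: create one node and push it on the stack.
        idx = self.node_idx
        self.node_idx += 1
        self.lines.append(f'   node_{idx} [label = "{label}"]')
        self.node_stack.append(idx)

    def binary(self, label: str) -> None:
        # Same steps as the Lua code: new node, edges to two popped children, push.
        idx = self.node_idx
        self.node_idx += 1
        self.lines.append(f'   node_{idx} [label = "{label}"]')
        self.lines.append(f"   node_{idx} -> node_{self.node_stack.pop()}")
        self.lines.append(f"   node_{idx} -> node_{self.node_stack.pop()}")
        self.node_stack.append(idx)

    def dot(self) -> str:
        # Assumed wrapper; the README does not show where the digraph header is printed.
        return "digraph G {\n" + "\n".join(self.lines) + "\n}"
```

Reducing "a == b" corresponds to leaf("a"), leaf("b"), then binary("<exp> == <exp>"), which leaves a single node on the stack for the enclosing rule.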

Parser rule file

The following is an example of a complete parser rule file.

init_code: {s = {\};}

terminals:
  oct_int_const {'0'.<07>*}
  dec_int_const {<19>.d*}
  hex_int_const {'0'.[xX].(<09>+<af>+<AF>).(<09>+<af>+<AF>)*}

  lr_br    {'('}
  rr_br    {')'}

  plus     {'+'}
  minus    {'-'}
  asterisk {'*'}
  slash    {'/'}
  percent  {'%'}

  _SKIP_   {w.w*}
  _END_    {'\0'}

nonterminals:
  <start> <exp> <C> <B> <A>

rules:
  <start> -> <exp> _END_  ->> {}
  <exp> -> <C>            ->> {print("result: "..s[#s]);}
  <C> -> <C> plus <B>     ->> {s[#s-1] = s[#s-1] + table.remove(s);}
  <C> -> <C> minus <B>    ->> {s[#s-1] = s[#s-1] - table.remove(s);}
  <C> -> <B>              ->> {}
  <B> -> <B> asterisk <A> ->> {s[#s-1] = s[#s-1] * table.remove(s);}
  <B> -> <B> slash <A>    ->> {s[#s-1] = s[#s-1] / table.remove(s);}
  <B> -> <B> percent <A>  ->> {s[#s-1] = s[#s-1] % table.remove(s);}
  <B> -> <A>              ->> {}
  <A> -> lr_br <C> rr_br  ->> {}
  <A> -> oct_int_const    ->> {table.insert(s,tonumber(rule_body(0),8));}
  <A> -> dec_int_const    ->> {table.insert(s,tonumber(rule_body(0),10));}
  <A> -> hex_int_const    ->> {table.insert(s,tonumber(rule_body(0)));}

For the following input string, the parser generated from the rule file above prints this result:

5*(10 + 5) - 0x10
result: 59
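The semantic actions in the rule file evaluate the expression on a value stack s: number reductions push a value, and each binary reduction pops the right operand and combines it into the new top. The sketch below reproduces that stack discipline in Python, but swaps in a recursive-descent parser in place of yapgen's SLR(1) tables, so it illustrates the semantics, not yapgen's parsing algorithm. All names are hypothetical.

```python
import operator
import re
from typing import Callable, Dict, List, Tuple, Union

Number = Union[int, float]

# Python mirror of the terminals; hex is tried before oct so "0x10" is not
# split into "0" and "x10" (a stand-in for yapgen's longest-match lexing).
TOKEN = re.compile(r"\s*(?:(0[xX][0-9a-fA-F]+)|(0[0-7]*)|([1-9][0-9]*)|([-+*/%()]))")

# Lua's / is float division and % is floored modulo; Python's operators match.
OPS: Dict[str, Callable[[Number, Number], Number]] = {
    "+": operator.add, "-": operator.sub,
    "*": operator.mul, "/": operator.truediv, "%": operator.mod,
}


def tokenize(text: str) -> List[Tuple[str, Union[Number, str]]]:
    tokens: List[Tuple[str, Union[Number, str]]] = []
    pos, text = 0, text.rstrip()
    while pos < len(text):
        m = TOKEN.match(text, pos)
        if m is None:
            raise SyntaxError(f"unexpected character at offset {pos}")
        hex_c, oct_c, dec_c, op = m.groups()
        if hex_c is not None:
            tokens.append(("num", int(hex_c, 16)))
        elif oct_c is not None:
            tokens.append(("num", int(oct_c, 8)))
        elif dec_c is not None:
            tokens.append(("num", int(dec_c, 10)))
        else:
            tokens.append(("op", op))
        pos = m.end()
    return tokens


def evaluate(text: str) -> Number:
    """Recursive-descent stand-in; `s` is updated where the Lua actions would run."""
    tokens = tokenize(text)
    s: List[Number] = []  # same role as the Lua table s from init_code
    i = 0

    def peek():
        return tokens[i][1] if i < len(tokens) else None

    def parse_a() -> None:  # <A> -> lr_br <C> rr_br | number
        nonlocal i
        if i >= len(tokens):
            raise SyntaxError("unexpected end of input")
        kind, value = tokens[i]
        i += 1
        if kind == "num":
            s.append(value)  # table.insert(s, tonumber(...))
        elif value == "(":
            parse_c()
            if peek() != ")":
                raise SyntaxError("expected ')'")
            i += 1
        else:
            raise SyntaxError(f"unexpected {value!r}")

    def parse_binary(operand: Callable[[], None], ops: Tuple[str, ...]) -> None:
        nonlocal i  # handles left-recursive <X> -> <X> op <Y> | <Y> as a loop
        operand()
        while peek() in ops:
            op = tokens[i][1]
            i += 1
            operand()
            right = s.pop()                # table.remove(s)
            s[-1] = OPS[op](s[-1], right)  # s[#s-1] = s[#s-1] op right

    def parse_b() -> None:
        parse_binary(parse_a, ("*", "/", "%"))

    def parse_c() -> None:
        parse_binary(parse_b, ("+", "-"))

    parse_c()
    if i != len(tokens):
        raise SyntaxError("trailing input")
    return s[-1]
```

evaluate("5*(10 + 5) - 0x10") returns 59, matching the output above; the left-recursive rules become loops because recursive descent cannot handle left recursion directly.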

Building parser generator

Lua 5.2 or later is required to compile yapgen.

The container generator cont is also required to compile the parser generator.

Linux compilation

Change to the build directory and run:

cd build
cmake -DCMAKE_BUILD_TYPE="Release" ..
make

Usage

yapgen --parser_descr <file>     - create parser from description file
       --parser_save_cc <file>   - save parser source in C to file
       --parser_save_js <file>   - save parser source in JavaScript to file
       --parser_save_rust <file> - save parser source in Rust to file
       --parser_save_awk <file>  - save parser source in AWK to file
       --parser_save_php <file>  - save parser source in PHP to file
       --source <file>           - load and parse source file
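When yapgen is driven from a script, the argument list can be assembled from the options above. This sketch assumes that several options may be combined in one invocation and that their order does not matter; neither is stated above. yapgen_command() and the file names in the example are hypothetical.

```python
from typing import Dict, List, Optional

# Output flags exactly as listed in the usage text above.
SAVE_FLAGS = {
    "cc": "--parser_save_cc",
    "js": "--parser_save_js",
    "rust": "--parser_save_rust",
    "awk": "--parser_save_awk",
    "php": "--parser_save_php",
}


def yapgen_command(descr: str, source: Optional[str] = None,
                   outputs: Optional[Dict[str, str]] = None) -> List[str]:
    """Build a yapgen argument list (assumes options can be combined freely)."""
    cmd = ["yapgen", "--parser_descr", descr]
    for lang, path in (outputs or {}).items():
        cmd += [SAVE_FLAGS[lang], path]
    if source is not None:
        cmd += ["--source", source]
    return cmd
```

The resulting list can be passed to subprocess.run(..., check=True), for example yapgen_command("calc.rules", source="input.txt") to build a parser and immediately test it on a source file.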

Linux example parsers

Example parsers are located in the build/rules directory.