Rapid prototyping parser generator.
Parser generator features:
- Generates the lexical and syntactic analyzers of a parser from an input file.
- Lexical symbols are described by regular expressions.
- The language grammar is described by an SLR(1) grammar.
- The generated parser can be tested immediately on a source string.
- Semantic rules of the language can be tested with Lua scripts that are bound to each rule reduction.
- A debugged and fine-tuned parser can be generated as C/C++, JavaScript, Rust, PHP, or AWK code.
Motivation: the need for fast parser generation and testing.
Example rule files used to generate parsers are located in the directory:
build/rules
A few simple examples demonstrating yapgen's capabilities follow.
Examples of basic regular expressions:
oct_int {'0'.<07>*}
dec_int {<19>.d*}
hex_int {'0'.[xX].(<09>+<af>+<AF>).(<09>+<af>+<AF>)*}
if {"if"}
else {"else"}
equal {'='}
plus_equal {"+="}
minus_equal {"-="}
comment_sl {"//".!'\n'*.'\n'}
comment_ml {"/*".(!'*'+('*'.!'/'))*."*/"}
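For readers more used to conventional regex syntax, some of the terminals above can be approximated with Python's re module. This is only a sketch: the reading of the yapgen notation (`.` as concatenation, `+` as alternation, `*` as repetition, `<07>` as a character range, `d` as a digit class, `!'x'` as any character except x) is inferred from the examples and is an assumption, and the names `PATTERNS` and `matches` are hypothetical.

```python
import re

# Approximate Python equivalents of some yapgen terminals shown above.
# The translation is an assumption inferred from the examples, not a
# statement of how yapgen interprets its own notation.
PATTERNS = {
    "oct_int": r"0[0-7]*",
    "dec_int": r"[1-9][0-9]*",
    "hex_int": r"0[xX][0-9a-fA-F][0-9a-fA-F]*",
    "plus_equal": r"\+=",
    "comment_sl": r"//[^\n]*\n",
    "comment_ml": r"/\*(?:[^*]|\*[^/])*\*/",
}


def matches(name: str, text: str) -> bool:
    """Return True when the whole text matches the named pattern."""
    return re.fullmatch(PATTERNS[name], text) is not None


if __name__ == "__main__":
    samples = [
        ("oct_int", "017"),
        ("dec_int", "42"),
        ("hex_int", "0x1F"),
        ("comment_sl", "// note\n"),
        ("comment_ml", "/* note */"),
    ]
    for name, text in samples:
        print(name, repr(text), matches(name, text))
```

Note that a leading zero separates octal from decimal constants in these patterns: "017" matches oct_int but not dec_int.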
Regular expressions can be used to recognize binary data.
PACKET_ADDRESS {"/?".(<09>+<az>+<AZ>)*.'!'."\x0d\x0a"}
PACKET_IDENTIFY {'/'.<AZ>.<AZ>.(<AZ>+<az>).<09>.(|/!\x0d|)*."\x0d\x0a"}
PACKET_ACK_COMMAND {'\x06'.<09>.(<09>+<az>).<09>."\x0d\x0a"}
PACKET_ACK {'\x06'}
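The same idea carries over to byte strings. The sketch below approximates three of the packet terminals above with Python byte patterns, under the same assumed reading of the yapgen notation. PACKET_IDENTIFY is left out because the meaning of its `|...|` construct is not stated here. The names `PACKET_PATTERNS` and `classify` and the sample packets are hypothetical.

```python
import re

# Byte-level approximations of three packet terminals shown above.
# The translation from yapgen notation is an assumption.
PACKET_PATTERNS = {
    "PACKET_ADDRESS": rb"/\?[0-9a-zA-Z]*!\r\n",
    "PACKET_ACK_COMMAND": rb"\x06[0-9][0-9a-z][0-9]\r\n",
    "PACKET_ACK": rb"\x06",
}


def classify(packet: bytes) -> list:
    """Return the names of all patterns that match the whole packet."""
    return [
        name
        for name, pattern in PACKET_PATTERNS.items()
        if re.fullmatch(pattern, packet) is not None
    ]


if __name__ == "__main__":
    for packet in (b"/?12345678!\r\n", b"\x06050\r\n", b"\x06", b"hello"):
        print(packet, classify(packet))
```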
Example of basic grammar rules. Identifiers enclosed in angle brackets,
e.g. <command>, denote nonterminal symbols of the grammar; identifiers
without brackets, e.g. if, refer to terminal symbols described by regular
expressions.
<command> -> if <condition> <if_else> ->> {}
<if_else> -> <command> ->> {}
<if_else> -> <command> else <command> ->> {}
<command> -> <while_begin> <condition> <command> ->> {}
<while_begin> -> while ->> {}
Grammar rules can have semantic code bound to them.
<F> -> <F> double_equal <E> ->>
{
  if gen_parse_tree == 1 then
    this_idx = node_idx;
    node_idx = node_idx + 1;
    print(" node_"..this_idx.." [label = \"<exp> == <exp>\"]");
    print(" node_"..this_idx.." -> node_"..table.remove(node_stack).."");
    print(" node_"..this_idx.." -> node_"..table.remove(node_stack).."");
    table.insert(node_stack,this_idx);
  else
    print(table.concat(tabs,"").."operator binary double_equal");
  end
}
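The parse-tree branch of the semantic action above follows a common pattern: pop the indices of the two operand nodes from a stack, emit a new node with an edge to each operand, and push the new node's index. A minimal Python sketch of that pattern follows. It is an illustration rather than yapgen code; the function name `reduce_double_equal`, its parameters, and the sample indices are hypothetical.

```python
def reduce_double_equal(node_stack, next_idx, out):
    """Mimic the gen_parse_tree branch of the semantic action above.

    Pops the two most recent child node indices, appends one node line and
    two edge lines to `out`, pushes the new node index, and returns the
    next free node index.
    """
    this_idx = next_idx
    out.append(f' node_{this_idx} [label = "<exp> == <exp>"]')
    # The right operand was pushed last, so it is popped first.
    out.append(f" node_{this_idx} -> node_{node_stack.pop()}")
    out.append(f" node_{this_idx} -> node_{node_stack.pop()}")
    node_stack.append(this_idx)
    return next_idx + 1


if __name__ == "__main__":
    stack = [0, 1]  # hypothetical indices of two already emitted operand nodes
    lines = []
    next_free = reduce_double_equal(stack, 2, lines)
    print("\n".join(lines))
```

After the call, the stack holds only the index of the new node, so an enclosing rule can treat the whole comparison as a single operand.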
An example of a complete parser rules file follows.
init_code: {s = {\};}
terminals:
oct_int_const {'0'.<07>*}
dec_int_const {<19>.d*}
hex_int_const {'0'.[xX].(<09>+<af>+<AF>).(<09>+<af>+<AF>)*}
lr_br {'('}
rr_br {')'}
plus {'+'}
minus {'-'}
asterisk {'*'}
slash {'/'}
percent {'%'}
_SKIP_ {w.w*}
_END_ {'\0'}
nonterminals:
<start> <exp> <C> <B> <A>
rules:
<start> -> <exp> _END_ ->> {}
<exp> -> <C> ->> {print("result: "..s[#s]);}
<C> -> <C> plus <B> ->> {s[#s-1] = s[#s-1] + table.remove(s);}
<C> -> <C> minus <B> ->> {s[#s-1] = s[#s-1] - table.remove(s);}
<C> -> <B> ->> {}
<B> -> <B> asterisk <A> ->> {s[#s-1] = s[#s-1] * table.remove(s);}
<B> -> <B> slash <A> ->> {s[#s-1] = s[#s-1] / table.remove(s);}
<B> -> <B> percent <A> ->> {s[#s-1] = s[#s-1] % table.remove(s);}
<B> -> <A> ->> {}
<A> -> lr_br <C> rr_br ->> {}
<A> -> oct_int_const ->> {table.insert(s,tonumber(rule_body(0),8));}
<A> -> dec_int_const ->> {table.insert(s,tonumber(rule_body(0),10));}
<A> -> hex_int_const ->> {table.insert(s,tonumber(rule_body(0)));}
A parser generated from the rules above produces the following result for the following input string.
5*(10 + 5) - 0x10
result: 59
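To see how the semantic actions arrive at this result, the Python sketch below replays them by hand on a value stack, in the order in which a bottom-up parser would reduce the rules for this input. It is not generated parser code: the reduction sequence is written out manually, and the names `run_actions`, `BINARY_OPS`, and `REDUCTIONS` are hypothetical.

```python
import operator

# Binary operator terminals from the rules file, mapped to Python operators.
BINARY_OPS = {
    "plus": operator.add,
    "minus": operator.sub,
    "asterisk": operator.mul,
    "slash": operator.truediv,
    "percent": operator.mod,
}

# Numeric bases for the three integer constant terminals.
CONST_BASES = {"oct_int_const": 8, "dec_int_const": 10, "hex_int_const": 16}


def run_actions(reductions):
    """Replay semantic actions on a value stack and return the top value."""
    s = []
    for kind, text in reductions:
        if kind in CONST_BASES:
            # Corresponds to table.insert(s, tonumber(...)) in the rules.
            s.append(int(text, CONST_BASES[kind]))
        else:
            # Corresponds to s[#s-1] = s[#s-1] <op> table.remove(s).
            right = s.pop()
            s[-1] = BINARY_OPS[kind](s[-1], right)
    return s[-1]


# Hand-written reduction order for the input 5*(10 + 5) - 0x10.
REDUCTIONS = [
    ("dec_int_const", "5"),
    ("dec_int_const", "10"),
    ("dec_int_const", "5"),
    ("plus", None),
    ("asterisk", None),
    ("hex_int_const", "0x10"),
    ("minus", None),
]

if __name__ == "__main__":
    print("result:", run_actions(REDUCTIONS))
```

The stack evolves as [5], [5, 10], [5, 10, 5], [5, 15], [75], [75, 16], [59], which matches the result shown above.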
Lua 5.2 or later is required to compile yapgen.
The container generator cont is also needed to compile the parser generator.
Enter the build directory build and run the build:
cd build
cmake -DCMAKE_BUILD_TYPE="Release" ..
make
yapgen --parser_descr <file>      - create parser from description file
       --parser_save_cc <file>    - save parser source in C to file
       --parser_save_js <file>    - save parser source in JavaScript to file
       --parser_save_rust <file>  - save parser source in Rust to file
       --parser_save_awk <file>   - save parser source in AWK to file
       --parser_save_php <file>   - save parser source in PHP to file
       --source <file>            - load and parse source file
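When yapgen is driven from a script, the command line can be assembled programmatically. The Python sketch below only builds and prints an argument list from the options listed above; it does not run yapgen. That several options may be combined in one invocation is an assumption, and the function name `yapgen_command` and the file names are hypothetical.

```python
import shlex


def yapgen_command(descr, source=None, save_cc=None):
    """Build a yapgen argument list from options in the usage text above.

    Combining several options in one call is an assumption.
    """
    argv = ["yapgen", "--parser_descr", descr]
    if save_cc is not None:
        argv += ["--parser_save_cc", save_cc]
    if source is not None:
        argv += ["--source", source]
    return argv


if __name__ == "__main__":
    # Hypothetical file names, used for illustration only.
    command = yapgen_command("calc.rules", source="input.txt")
    print(shlex.join(command))
```

The resulting list can be passed to subprocess.run once the yapgen binary is built and on the search path.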
Example parsers are located in the directory build/rules.