Parsing in Perl

Alberto Simões
ambs@cpan.org

YAPC::EU::2006

Alberto Simões Parsing in Perl


What we will talk about

Parsing...

what it is...

the tools to make it!

But not how to do it!

show some examples...

and compare their efficiency.



The Definitions



Parsing

In computer science, parsing is the process of analyzing an input
sequence (read from a file or a keyboard, for example) in order to
determine its grammatical structure with respect to a given formal
grammar. It is formally named syntax analysis. A parser is a
computer program that carries out this task. The name is
analogous with the usage in grammar and linguistics.

Parsing transforms input text into a data structure, usually a tree,
which is suitable for later processing and which captures the
implied hierarchy of the input. Generally, parsers operate in two
stages, first identifying the meaningful tokens in the input, and
then building a parse tree from those tokens.

Wikipedia (August 2006)



The Process

Lexical analysis is the processing of an input sequence of
characters (such as the source code of a computer program)
to produce, as output, a sequence of symbols called “lexical
tokens”, or just “tokens”. For example, lexers for many
programming languages convert the character sequence 123
abc into two tokens: 123 and abc (whitespace is not a token
in most languages). The purpose of producing these tokens is
usually to forward them as input to another program, such as
a parser.

Syntax analysis is a process in compilers that recognizes the
structure of programming languages. It is also known as
parsing.

Wikipedia (August 2006)
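
The lexical step quoted above can be sketched in a few lines of Perl. This is only an illustration, not code from the talk: the sub name tokenize and the token type names (Number, Var, Op) are made up for the example, and it only knows the handful of symbols used later in the calculator test case.

```perl
use strict;
use warnings;

# Split an input string into [type, text] pairs using anchored regexes.
# Token type names are illustrative assumptions, not from the slides.
sub tokenize {
    my ($input) = @_;
    my @tokens;
    for ($input) {
        while (1) {
            /\G\s+/gc;                                # whitespace is not a token
            last if (pos($_) // 0) >= length($_);     # end of input
            if    (/\G(\d+)/gc)    { push @tokens, ['Number', $1] }
            elsif (/\G([a-z]+)/gc) { push @tokens, ['Var',    $1] }
            elsif (/\G([-+=;])/gc) { push @tokens, ['Op',     $1] }
            else { die "Unexpected character at offset " . (pos($_) // 0) . "\n" }
        }
    }
    return @tokens;
}

# The character sequence "123 abc" becomes two tokens.
print join(' ', map { "$_->[0]($_->[1])" } tokenize('123 abc')), "\n";
```

The \G anchor together with the /gc flags makes each match start where the previous one stopped, so the string is scanned once from left to right.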



Approaches

Top-down parsing - A parser can start with the start symbol
and try to transform it to the input. Intuitively, the parser
starts from the largest elements and breaks them down into
incrementally smaller parts. LL parsers are examples of
top-down parsers.

Bottom-up parsing - A parser can start with the input and
attempt to rewrite it to the start symbol. Intuitively, the
parser attempts to locate the most basic elements, then the
elements containing these, and so on. LR parsers are examples
of bottom-up parsers. Another term used for this type of
parser is Shift-Reduce parsing.

Wikipedia (August 2006)



...boring...

Forget Wikipedia!



What is Parsing?

to recognize portions of text:
  detect tokens;
    integers, reals, strings, variables, reserved words, etc.
to analyze a specific token sequence:
  detect syntax;
    define the order in which tokens make sense;
to interpret the sequence and perform an action:
  perform semantic actions;
    execute the code defined; generate code;



So, Regular Expressions?

yes!
  RegExps are good for tokens;
  RegExps are good for regular expressions :-)

no!
  most real grammars can’t be parsed with RegExps;



Then?

Typically:
  flex for lexical analysis
    (re2c for thread safety and reentrancy);
  bison for syntactic analysis
    (lemon for thread safety and reentrancy);
but that is for C;
Perl 5 has lexical analysis (RegExps);
Perl 5 doesn’t have grammar support;
but we have CPAN!
  Parse::RecDescent;
  Parse::Yapp;
  Parse::YALALR;
Perl 6 will have grammar support (Hurray!)
  PGE, the Parrot Grammar Engine;



What I’ve tested

flex + bison;
re2c + lemon;
Parse::RecDescent;
Parse::Yapp;
flex + Parse::Yapp;
Parrot Grammar Engine.

flex + bison and re2c + lemon will appear only at the end, as an
efficiency baseline.



My Test Case (1/2)
a simple calculator;
  sums, subtractions, variables, prints;
BNF:
  Program    ← Statement Program
             | Statement
  Statement  ← Variable '=' Expression ';'
             | 'print' Expression ';'
  Expression ← Expression '-' Expression
             | Expression '+' Expression
             | Variable
             | Number
  Number     ← /\d+/
  Variable   ← /[a-z]+/



My Test Case (2/2)
automatic test generation;
  randomly add, subtract and define variables;
  randomly print variables;
example:
  a = 10;
  a = 150 - a + 350;
  print a;
different test sizes:
  10 lines;
  100 lines;
  1 000 lines;
  10 000 lines;
  100 000 lines;
  1 000 000 lines;
  2 000 000 lines;
  4 000 000 lines;
  6 000 000 lines;
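
The generator itself is not shown in the talk. A rough sketch of what such a script could look like follows; the variable pool (a to e), the value range, the one-in-four share of print statements and the names generate_test and operand are all assumptions made for this illustration.

```perl
use strict;
use warnings;

# Generate $lines random statements for the calculator language.
# Variable pool, value range and print ratio are assumptions.
sub generate_test {
    my ($lines) = @_;
    my @vars = ('a' .. 'e');
    my @out;
    for (1 .. $lines) {
        my $var = $vars[ rand @vars ];
        if (int rand 4) {
            # Assignment: one operand followed by 0 to 2 "+ x" or "- x" parts.
            my $expr = operand(\@vars);
            $expr .= (rand() < 0.5 ? ' + ' : ' - ') . operand(\@vars)
                for 1 .. int rand 3;
            push @out, "$var = $expr;";
        }
        else {
            push @out, "print $var;";
        }
    }
    return @out;
}

# An operand is either a variable name or a small non-negative integer.
sub operand {
    my ($vars) = @_;
    return rand() < 0.5 ? $vars->[ rand @$vars ] : int rand 1000;
}

print "$_\n" for generate_test(10);
```

Variables that are read before being assigned are harmless here, since every implementation in the talk treats an undefined variable as 0.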
Now, the results



Parse::RecDescent ID
Author: Damian Conway
Latest Release: 1.94 (April 9, 2003)
Available from: CPAN



Parse::RecDescent rationale
⇑ full Perl implementation;
⇑ lexical and syntactic analysis mixed in the same code;
⇓ slow;
⇓ only supports LL(1) grammars;



Parse::RecDescent
use strict;
use warnings;
use Parse::RecDescent;

our %VAR;    # calculator variables, filled in by the grammar actions

my $grammar = q{
    Program: Statement(s) /\Z/ { 1 }

    Statement: Var '=' Expression ';'  { $main::VAR{$item[1]} = $item[3]; }
             | /print/ Expression ';'  { print "> $item[2]\n"; }

    # Right-recursive rules: the operators associate to the right
    Expression: Number '+' Expression  { $item[1] + $item[3] }
              | Number '-' Expression  { $item[1] - $item[3] }
              | Var '+' Expression     { ($main::VAR{$item[1]} || 0) + $item[3] }
              | Var '-' Expression     { ($main::VAR{$item[1]} || 0) - $item[3] }
              | Var                    { $main::VAR{$item[1]} || 0; }
              | Number                 { $item[1]; }

    Number: /\d+/

    Var: /[a-z]+/
};

my $parser = Parse::RecDescent->new($grammar);

undef $/;                 # slurp mode: read all of STDIN at once
my $text = <STDIN>;
$parser->Program($text) or die "** Parse Error **\n";



Problems

Unfortunately, the program does not respect the left associativity
of the operators. I couldn’t manage to solve that (I didn’t try hard).

3 - 2 + 1 is parsed as Number(3) - Expression(2 + 1), and so it
evaluates to 0 instead of the correct answer, 2.

Well, I had a cheat version, but it made the test program a lot
slower than it is at the moment.
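
The difference is easy to reproduce without any parser module. The sketch below is an illustration written for this point, not the talk's code and not the "cheat version": eval_right mirrors the right-recursive rule "Number '-' Expression", while eval_left folds the operands from left to right, which is what a rule of the shape "Number (('+' | '-') Number)*" amounts to. Both sub names are hypothetical. Parse::RecDescent also documents a <leftop: ...> directive for this kind of rule, but its speed was not measured in the talk.

```perl
use strict;
use warnings;

# Right-recursive evaluation: left operand, operator, then "the rest".
sub eval_right {
    my @tok  = @_;
    my $left = shift @tok;
    return $left unless @tok;
    my $op   = shift @tok;
    my $rest = eval_right(@tok);
    return $op eq '+' ? $left + $rest : $left - $rest;
}

# Left fold: consume one (operator, number) pair at a time.
sub eval_left {
    my @tok = @_;
    my $acc = shift @tok;
    while (@tok) {
        my ($op, $num) = splice @tok, 0, 2;
        $acc = $op eq '+' ? $acc + $num : $acc - $num;
    }
    return $acc;
}

my @tokens = '3 - 2 + 1' =~ /\d+|[-+]/g;
print "right-recursive: ", eval_right(@tokens), "\n";   # 3 - (2 + 1) = 0
print "left fold:       ", eval_left(@tokens),  "\n";   # (3 - 2) + 1 = 2
```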



Parse::RecDescent timings

test size    time spent
10             0.104 s
100            0.203 s
1 000          1.520 s
10 000        87.310 s



Parse::RecDescent Memory Usage

[Figure: heap usage over time for "perl recdes.pl" (1,778,617,585,999 bytes x ms), test file with 10 000 lines. Vertical axis: bytes, ticks from 0M to 6M. Horizontal axis: time, 0 to 240 000 ms. Series: x809F49D:Perl_safesysmal, x809F54B:Perl_safesysrea, heap-admin.]


Parse::YAPP ID
Author: Francois Desarmenien
Latest Release: 1.05 (Nov 4, 2001)
Available from: CPAN



Parse::YAPP rationale
⇑ full Perl implementation;
⇑ supports bison-like LR grammars;
⇓ you need to specify your own lexical analyzer;
⇓ slow for big input files... if you do not prepare a good lexical analyzer;



Parse::Yapp
%left '+' '-'
%%
Program    : Statement
           | Program Statement
           ;

Statement  : Var '=' Expression ';'       { $main::VAR{$_[1]} = $_[3] }
           | Print Expression ';'         { print "> $_[2]\n" }
           ;

Expression : Expression '-' Expression    { $_[1] - $_[3] }
           | Expression '+' Expression    { $_[1] + $_[3] }
           | Var                          { $main::VAR{$_[1]} || 0 }
           | Number                       { $_[1] }
           ;
%%

our %VAR;
my $p = Calc->new();
undef $/;                  # slurp mode: read all of STDIN at once
my $File = <STDIN>;
$p->YYParse( yylex   => \&yylex,
             yyerror => \&yyerror );



Parse::Yapp
sub yyerror {
    if ($_[0]->YYCurtok) {
        printf STDERR ('Error: a "%s" (%s) was found where %s was expected'."\n",
            $_[0]->YYCurtok, $_[0]->YYCurval, join(", ", $_[0]->YYExpect))
    } else {
        print STDERR "Expecting one of ", join(", ", $_[0]->YYExpect), "\n";
    }
}

sub yylex {
    for ($File) {
        s!^\s+!!;                             # skip whitespace
        return ("", undef) if $_ eq "";       # end of input

        # Tokens
        s!^(\d+)!!    and return ("Number", $1);
        s!^print!!    and return ("Print", "print");
        s!^([a-z]+)!! and return ("Var", $1);
        # Operators ('-' comes first so it is not read as a range)
        s!^([-;+=])!! and return ($1, $1);

        print STDERR "Unexpected symbols: '$File'\n";
        return ("", undef);                   # give up: behave as at end of input
    }
}
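
The lexer above rewrites $File with s!^...!! on every call. One alternative worth trying for big inputs is to leave the string untouched and let \G with /gc remember the scan position. The sketch below is an assumption about what a "good lexical analyzer" could look like, not the talk's code, and it was not part of the benchmark; the sample input and the driver loop are only there to make it runnable without Parse::Yapp. It also adds \b after print, so that a variable such as printer would not be split, which differs slightly from the original.

```perl
use strict;
use warnings;

# Sample input; in the real program this would be the slurped STDIN.
our $File = "a = 150 - a + 350;\nprint a;\n";

# Same token names as the lexer above, but the input is scanned in place.
sub yylex {
    for ($File) {
        /\G\s+/gc;                                            # skip whitespace
        return ('', undef)    if (pos($_) // 0) >= length($_); # end of input
        return ('Number', $1) if /\G(\d+)/gc;
        return ('Print', $1)  if /\G(print)\b/gc;
        return ('Var', $1)    if /\G([a-z]+)/gc;
        return ($1, $1)       if /\G([-;+=])/gc;
        die "Unexpected symbol at offset " . (pos($_) // 0) . "\n";
    }
}

# Drive the lexer by hand, one token per call, as the parser would.
while (1) {
    my ($type, $value) = yylex();
    last if $type eq '';
    print "$type\t$value\n";
}
```

Whether this closes the gap to the flex version is an open question; the timings that follow were taken with the s!^...!! lexer.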
Parse::YAPP timings

test size    Parse::RecDescent    Parse::YAPP
10                0.104 s             0.016 s
100               0.203 s             0.034 s
1 000             1.520 s             0.272 s
10 000           87.310 s             4.972 s
100 000             -             2 253.657 s



Parse::Yapp Memory Usage

[Figure: heap usage over time for "perl Calc.pl" (74,532,562,124 bytes x ms), test file with 10 000 lines. Vertical axis: bytes, ticks from 0k to 1,200k. Horizontal axis: time, 0 to 60 000 ms. Series: x809F49D:Perl_safesysmal, x809F54B:Perl_safesysrea, heap-admin.]


Parse::YAPP + flex ID
Idea by: Alberto Simões
Latest Release: n/a
Available from: The Perl Review v0i3, 2002



Parse::YAPP+flex rationale
⇑ fast and robust for big input files;
⇑ supports bison-like LR grammars;
⇓ gluing Perl and C takes some work;
⇓ you need a C compiler;
⇓ you need to know a little C and flex;



Parse::Yapp + flex: the lexical analyzer

%{
#include <string.h>
#define YY_DECL char *yylex(void)

char buffer[15];   /* holds the token name handed back to Perl */
%}

%%
"print"  { return strcpy(buffer, "Print"); }
[0-9]+   { return strcpy(buffer, "Number"); }
[a-z]+   { return strcpy(buffer, "Var"); }
\n       { }
" "      { }
.        { return strcpy(buffer, yytext); }
%%

int perl_yywrap(void) { return 1; }

char *perl_yylextext(void) { return perl_yytext; }



Parse::Yapp + flex: the syntactic analyzer

%left '+' '-'
%%
Program    : Statement
           | Program Statement
           ;

Statement  : Var '=' Expression ';'       { $main::VAR{$_[1]} = $_[3] }
           | Print Expression ';'         { print "> $_[2]\n"; }
           ;

Expression : Expression '-' Expression    { $_[1] - $_[3] }
           | Expression '+' Expression    { $_[1] + $_[3] }
           | Var                          { $main::VAR{$_[1]} || 0 }
           | Number                       { $_[1] }
           ;
%%
our %VAR;



Parse::Yapp + flex: just that?

NO!
  you need XS glue code;
  you need some Perl glue code;
  you need a decent makefile;

Can you give details?

Check my article “Cooking Perl with Flex” in TPR v0i3, 2002:
http://alfarrabio.di.uminho.pt/~albie/publications/perlflex.pdf



Parse::YAPP + flex timings

test size    RecDescent    YAPP           YAPP + flex
10             0.104 s         0.016 s      0.034 s
100            0.203 s         0.034 s      0.049 s
1 000          1.520 s         0.272 s      0.174 s
10 000        87.310 s         4.972 s      1.168 s
100 000          -         2 253.657 s     12.145 s
1 000 000        -              -         122.377 s
2 000 000        -              -         264.219 s
4 000 000        -              -         530.527 s
6 000 000        -              -         800.705 s



Parse::Yapp + flex Memory Usage

[Figure: heap usage over time for "perl parse.pl" (20,106,601,308 bytes x ms), test file with 10 000 lines. Vertical axis: bytes, ticks from 0k to 600k. Horizontal axis: time, 0 to 22 000 ms. Series: x809F49D:Perl_safesysmal, x809F54B:Perl_safesysrea, x4032CAF:perl_yyalloc, heap-admin.]


Parrot Grammar Engine ID
Author: mostly Patrick Michaud
Latest Release: not released yet
Available from: Parrot releases or Parrot SVN tree



PGE rationale
⇑ built into Perl 6;
⇑ includes constructs to ease the LL(1) constraint;
○ not yet fast... but we are working on it;
⇓ mainly a top-down parser (although bottom-up should also be supported…
⇓ at the moment you need to write semantic actions in PIR;



PGE implementation
grammar Benchmark;

token program { <?statement>+ }

rule statement {
    | print <expression> ;   {{ $I0 = match['expression'];
                                print $I0; print "\n" }}
    | <var> = <expression> ; {{ $P0 = match['expression'];
                                $S0 = match['var']; set_global $S0, $P0 }}
}

rule expression { <value> [ <add> | <sub> ]* {{ $I0 = match['value']
                                                # 25 lines removed...
                                                .return($I0) }}
}
rule add { \+ <value> }
rule sub { \- <value> }

rule value { <number> {{ $I0 = match['number']; .return ($I0) }}
           | <var>    {{ $S0 = match['var'];
                         $P0 = get_global $S0; $I0 = $P0; .return($I0) }}
}
token number { \d+ }
token var { <[a..z]>+ }

PGE timings

test size (lines)   Parse::RecDescent   Parse::Yapp   Parse::Yapp + flex         PGE
               10             0.104 s       0.016 s              0.034 s     0.124 s
              100             0.203 s       0.034 s              0.049 s     0.253 s
            1 000             1.520 s       0.272 s              0.174 s     1.463 s
           10 000            87.310 s       4.972 s              1.168 s    16.189 s
          100 000                   -   2 253.657 s             12.145 s   665.746 s
        1 000 000                   -             -            122.377 s           -
        2 000 000                   -             -            264.219 s           -
        4 000 000                   -             -            530.527 s           -
        6 000 000                   -             -            800.705 s           -



PGE Memory Usage

[Figure: memory usage over time for "../../../../parrot -j main.pir", test file with 10 000 lines. Graph title: 92,090,753,626 bytes x ms. Y axis: bytes, 0M to 8M. X axis: time, 0 to 12 000 ms. Plotted allocation sites: x417A7DF:mem_sys_allocat, x417A73D:mem_sys_allocat, x417A82F:mem__internal_a, heap-admin, x417A880:mem__sys_reallo.]


Remember I had C implementations?

Let’s look into their memory usage.



Timings for C implementations

test size (lines)   Parse::RecDescent   Parse::Yapp   Parse::Yapp + flex         PGE   re2c + lemon   flex + bison
               10             0.104 s       0.016 s              0.034 s     0.124 s        0.001 s        0.001 s
              100             0.203 s       0.034 s              0.049 s     0.253 s        0.001 s        0.001 s
            1 000             1.520 s       0.272 s              0.174 s     1.463 s        0.002 s        0.002 s
           10 000            87.310 s       4.972 s              1.168 s    16.189 s        0.009 s        0.009 s
          100 000                   -   2 253.657 s             12.145 s   665.746 s        0.089 s        0.103 s
        1 000 000                   -             -            122.377 s           -        0.850 s        0.862 s
        2 000 000                   -             -            264.219 s           -        1.896 s        1.891 s
        4 000 000                   -             -            530.527 s           -        4.327 s        3.604 s
        6 000 000                   -             -            800.705 s           -        5.681 s        5.665 s
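
One way to read the table is as cost per input line. The sketch below does that arithmetic for two of the columns; the figures are copied from the table, while the helper name usec_per_line and the choice of rows are only for illustration.

```perl
use strict;
use warnings;

# Timings in seconds copied from the table, keyed by test size in lines.
my %yapp_flex  = (1_000_000 => 122.377, 6_000_000 => 800.705);
my %flex_bison = (1_000_000 => 0.862,   6_000_000 => 5.665);

# Average cost of one input line, in microseconds (hypothetical helper).
sub usec_per_line {
    my ($seconds, $lines) = @_;
    return $seconds / $lines * 1e6;
}

for my $lines (sort { $a <=> $b } keys %yapp_flex) {
    printf "%9d lines: Yapp+flex %.1f us/line, flex+bison %.2f us/line, ratio %.0fx\n",
        $lines,
        usec_per_line($yapp_flex{$lines},  $lines),
        usec_per_line($flex_bison{$lines}, $lines),
        $yapp_flex{$lines} / $flex_bison{$lines};
}
```

On these two rows the per-line cost stays roughly constant for both tools (both scale about linearly), and the C implementation is on the order of 140 times faster.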



flex+bison Memory Usage

[Figure: memory usage over time for "parser", test file with 10 000 lines. Graph title: 16,427,193 bytes x ms. Y axis: bytes, 0k to 60k. X axis: time, 0 to 350 ms. Plotted allocation sites: x80492D9:yyalloc, x40625FE:g_malloc0, x401914F:posix_memalign.]


re2c+lemon Memory Usage

[Figure: memory usage over time for "parser", test file with 10 000 lines. Graph title: 1,418,530 bytes x ms. Y axis: bytes, 0k to 6k. X axis: time, 0 to 300 ms. Plotted allocation sites: x40625FE:g_malloc0, x401914F:posix_memalign, x8048BD2:ParseAlloc, heap-admin.]


Comparing them all



Performance Comparison

[Figure: log-log plot of Time (seconds), 0.001 to 10000, against Test Size (lines), 10 to 1e+07. Series: re2c+lemon, bison+flex, Parse::Yapp + flex, PGE, Parse::Yapp, Parse::RecDescent.]



Thanks!!

Luciano Rocha for the flex + bison and re2c + lemon implementations;
Rúben Fonseca for the PGE idea;
Patrick Michaud and Kevin Tew for the PGE implementation;
and, of course, Larry Wall, Gloria Wall, Leopold Toetsch, Chip
Salzenberg, Allison Randal, Damian Conway, Anna
Kournikova, Francois Desarmenien, Jerry Gay, Will Coleda,
Simon Cozens, Vern Paxson, Jef Poskanzer, Kevin Gong, brian
d foy, Santa Claus, Audrey Tang, José João Almeida,
Batman, Jonathan Scott Duff, Nuno Carvalho, Marty Pauley,
Léon Brocard, Josette Garcia, James Tisdall, José Castro,
Michael Schwern, Pamela Anderson, Andy Lester, Abigail,
Nicholas Clark, Magda Joana Silva, Matt Diephouse, Ilya
Martynov, Wikipedia, Randal Schwartz, Dan Sugalski, Jon
Orwant, Tom Christiansen, Johan Vromans, ...
