Curtis Dunham
Agenda
We will cover techniques for disciplined Perl, briefly introduce finite automata and regular languages, and then continue into regular expressions.
Disciplined Perl
  - warnings
  - strict
Finite Automata
  - Deterministic
  - Nondeterministic
Perl warnings
Perl can give you many types of warnings. For example, it can warn you about:
- Wrong sigil for %hash{$key}
- Troublesome string-to-numeric conversions
- Using a variable only once
- Incorrect uses of barewords
- ... and many, many other things.
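As a small illustration of one of these, a troublesome string-to-numeric conversion, here is a minimal sketch. Installing a $SIG{__WARN__} handler to collect the message (instead of letting Perl print it to STDERR) is only an illustrative choice so the warning can be inspected.

```perl
use strict;
use warnings;

# Collect warning messages instead of printing them to STDERR.
my @caught;
$SIG{__WARN__} = sub { push @caught, $_[0] };

# "abc" is not numeric: Perl treats it as 0 and emits a warning.
my $s = "abc";
my $n = $s + 1;

print "n = $n\n";
print "warning: $caught[0]" if @caught;
```

The message reads along the lines of `Argument "abc" isn't numeric in addition (+)`, and the program keeps running with $n set to 1.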
Use #!/usr/bin/perl -w
This turns on warnings by default for everything Perl loads and runs, including modules. If someone says "-w", they mean warnings.
Enable lexical warnings with use warnings;
This turns on warnings for the current lexical scope (file, block, etc.). You can also turn warnings off within a block or scope using no warnings; See perllexwarn.
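Because use warnings and no warnings are lexically scoped, turning a warning off inside a block affects only that block. A minimal sketch, assuming the uninitialized category is the one being silenced (plain no warnings; switches off every category):

```perl
use strict;
use warnings;

# Collect warnings so we can count them.
my @caught;
$SIG{__WARN__} = sub { push @caught, $_[0] };

my $name;                          # deliberately left undefined

my $s1 = "Hello, " . $name;        # warns: uninitialized value
{
    no warnings 'uninitialized';   # affects only this block
    my $s2 = "Hello, " . $name;    # no warning here
}
my $s3 = "Hello, " . $name;        # warns again: the block has ended

print scalar(@caught), " warnings\n";
```

Only the two concatenations outside the block produce a warning.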
use strict
Enable strict mode with use strict;
This turns on stricture for the current lexical scope (file, block, etc.). Stricture is stricter than warnings: a violation is a fatal error. Perl can be strict about three things:
- References (refs)
- Variables (vars)
- Subroutines (subs)
strict refs
use strict 'refs';
Prohibits the use of symbolic references.
    $ref = \$foo;     # hard reference
    print $$ref;      # no problem
    $ref = "foo";     # so far so good...
    print $$ref;      # symbolic ref: runtime error
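The runtime error can be observed without killing the program by wrapping the symbolic dereference in eval. A minimal sketch; the variable names are illustrative:

```perl
use strict;
use warnings;

our $foo = "hello";

my $ref = \$foo;                       # hard reference
print $$ref, "\n";                     # fine: prints "hello"

$ref = "foo";                          # now just a string naming a variable
my $ok = eval { my $v = $$ref; 1 };    # symbolic dereference under strict refs
print $ok ? "allowed\n" : "runtime error: $@";
```

The error message mentions "strict refs", for example: Can't use string ("foo") as a SCALAR ref while "strict refs" in use.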
strict vars
use strict 'vars';
Forces variables to be declared.
    my $x = 1;     # my constitutes a declaration
    our $y = 1;    # so does our
    $z = 1;        # strict will complain (fatally)
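Because strict 'vars' is checked at compile time, an undeclared variable stops the file from compiling at all. One way to see the error from inside a running program is a string eval, which compiles its code at run time and reports the failure in $@. A minimal sketch:

```perl
use strict;
use warnings;

my  $x = 1;    # declared with my
our $y = 2;    # declared with our
print "x=$x y=$y\n";

# The string is compiled only when eval runs, so the strict 'vars'
# failure lands in $@ instead of aborting this whole file.
my $result = eval q{ use strict 'vars'; $z = 3; 1 };

print defined $result ? "compiled\n" : "compile error: $@";
```

The error reads along the lines of: Global symbol "$z" requires explicit package name.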
strict subs
use strict 'subs';
Restricts the use of barewords that aren't subroutine names. See the strict documentation for the details.
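The effect of strict 'subs' is easiest to see by compiling the same bareword with and without it. A minimal sketch using string evals; the bareword Hello is an arbitrary, hypothetical name chosen because no subroutine by that name exists:

```perl
use strict;
use warnings;

# Without strict 'subs', an unknown bareword is quietly treated as a string.
my $loose = eval q{ no strict 'subs'; my $s = Hello; $s };

# With strict 'subs', the same bareword is a compile-time error.
my $strict = eval q{ use strict 'subs'; my $s = Hello; $s };

print "loose:  $loose\n";
print "strict: ", (defined $strict ? $strict : "error: $@");
```

The second eval fails with a message like: Bareword "Hello" not allowed while "strict subs" in use.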
Documentation review
Where to learn more:
- perlsub: scoping and subroutines
- perlreftut: reference tutorial
- perlref: references
- perllexwarn, warnings: warnings
- strict: use strict
Documentation
More on regular expressions:
- perlrequick: quick start
- perlretut: tutorial
- perlreref: quick reference
- perlre: full reference
Finite Automata
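The terms finite automaton and finite state machine are roughly equivalent. The former, however, has a precise mathematical meaning, whereas the latter is a more generic term. Let's break the terms down.
A finite automaton consists of:
- a finite set of states
- an alphabet of input symbols
- transitions between states, each labeled with an input symbol
- a start state
- a set of accept states
The set of strings the automaton accepts is its language.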
A Finite Automaton
[Figure: state diagram with states start, sign, firstDigit, moreDigits, decimalPt, and finalDigits; transitions labeled [-+], [1-9], [0-9], and \.]
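The diagram can be simulated with a transition table. This is a sketch under stated assumptions: which edge joins which pair of states is reconstructed from the edge labels, and the accept states (firstDigit, moreDigits, decimalPt, finalDigits) are assumed from the equivalent regular expression /[+-]?[1-9][0-9]*(\.[0-9]*)?/. The names %delta, %accept and accepts are hypothetical.

```perl
use strict;
use warnings;

# Transition table (reconstructed, an assumption):
# state => list of [character class, next state].
# A missing transition means reject (an implicit "dead" state).
my %delta = (
    start       => [ [ qr/[-+]/,  'sign'        ], [ qr/[1-9]/, 'firstDigit' ] ],
    sign        => [ [ qr/[1-9]/, 'firstDigit'  ] ],
    firstDigit  => [ [ qr/[0-9]/, 'moreDigits'  ], [ qr/\./,    'decimalPt'  ] ],
    moreDigits  => [ [ qr/[0-9]/, 'moreDigits'  ], [ qr/\./,    'decimalPt'  ] ],
    decimalPt   => [ [ qr/[0-9]/, 'finalDigits' ] ],
    finalDigits => [ [ qr/[0-9]/, 'finalDigits' ] ],
);

# Accept states (assumed from the regular expression).
my %accept = map { $_ => 1 } qw(firstDigit moreDigits decimalPt finalDigits);

# Feed the string to the automaton one character at a time.
sub accepts {
    my ($input) = @_;
    my $state = 'start';
    CHAR: for my $c (split //, $input) {
        for my $edge (@{ $delta{$state} }) {
            my ($class, $next) = @$edge;
            if ($c =~ /\A$class\z/) {
                $state = $next;
                next CHAR;
            }
        }
        return 0;    # no transition on this character: reject
    }
    return $accept{$state} ? 1 : 0;
}

for my $s ('42', '-3.14', '+7.', '007', '', '1.2.3') {
    printf "%-8s %s\n", "'$s'", accepts($s) ? 'accept' : 'reject';
}
```

From every state the outgoing character classes are disjoint, so at most one transition applies to each character; that is what makes this automaton deterministic.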
Regular Languages
A language is regular if and only if some finite automaton recognizes it. Put another way, a finite automaton can recognize only regular languages.
Regular languages are closed under the following operations:
- Concatenation
- Union
- Repetition (Kleene star)
Said another way: the concatenation of two regular languages is always a regular language, the union of two regular languages is always a regular language, and the repetition (Kleene star) of a regular language is always a regular language.
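Each closure property has a direct counterpart when building Perl patterns out of smaller ones with qr//. A minimal sketch; the two small languages and the helper in_language are hypothetical examples, and the \A...\z anchors make a pattern accept a whole string rather than a substring:

```perl
use strict;
use warnings;

# Two small regular languages, written as compiled patterns.
my $A = qr/ab/;    # L(A) = { "ab" }
my $B = qr/c+/;    # L(B) = { "c", "cc", "ccc", ... }

my $concat = qr/$A$B/;      # concatenation: L(A) followed by L(B)
my $union  = qr/$A|$B/;     # union: L(A) or L(B)
my $star   = qr/(?:$A)*/;   # Kleene star: zero or more copies of L(A)

# True if the whole string belongs to the language.
sub in_language { my ($re, $s) = @_; return $s =~ /\A$re\z/ ? 1 : 0 }

for my $s ('', 'ab', 'abcc', 'ccc', 'abab') {
    printf "%-7s concat=%d union=%d star=%d\n", "'$s'",
        in_language($concat, $s), in_language($union, $s), in_language($star, $s);
}
```

Only the star language contains the empty string, since zero repetitions are allowed.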
Concatenation Example
Union Example
Kleene star
Matching a number
Remember the first finite automaton of the lecture? Here it is as a regular expression:
    /[+-]?[1-9][0-9]*(\.[0-9]*)?/
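An automaton reads its entire input, whereas a Perl match succeeds if the pattern matches anywhere in the string. A minimal sketch that checks whole strings by wrapping the pattern in \A and \z (anchors that hold a match to the very start and very end of the string); the helper name is_number is hypothetical:

```perl
use strict;
use warnings;

# The regular expression from the slide.
my $number = qr/[+-]?[1-9][0-9]*(\.[0-9]*)?/;

# \A and \z pin the match to the whole string, as the automaton reads all input.
sub is_number { my ($s) = @_; return $s =~ /\A$number\z/ ? 1 : 0 }

for my $s ('42', '-3.14', '+7.', '007', '', '1.2.3') {
    printf "%-8s %s\n", "'$s'", is_number($s) ? 'match' : 'no match';
}

# Unanchored, the same pattern happily matches part of a string.
if ('x007y' =~ /($number)/) {
    print "unanchored match inside 'x007y': '$1'\n";
}
```

Without anchors the pattern skips past the leading zeros and matches just the 7.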
Recall that regular languages are closed under concatenation, union, and repetition (Kleene star). In regular expressions, we might also use these terms:
- Juxtaposition: ab
- Alternation: a|b
- Repetition: a*
This is the real Perl syntax. Use juxtaposition for concatenation, alternation with the pipe operator for union, and the asterisk (star) for Kleene star.
Atoms
We can build up regular expressions from atoms. Perl regular expressions support a number of atoms, including:
- Literal characters
- Escaped double-quotable characters (\n, \t, etc.)
- . (the wildcard: any character except a newline, unless the /s modifier is used)
- Character classes
(This is not an exhaustive list.)
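A minimal sketch with one pattern per kind of atom; the sample strings are arbitrary:

```perl
use strict;
use warnings;

# Each pattern is built from a single kind of atom.
my @examples = (
    [ 'literal characters', qr/cat/,   'concatenate' ],
    [ 'escaped tab (\t)',   qr/a\tb/,  "a\tb"        ],
    [ 'wildcard .',         qr/a.c/,   'abc'         ],
    [ '. vs. newline',      qr/a.c/,   "a\nc"        ],  # no match without /s
    [ 'character class',    qr/[0-9]/, 'room 101'    ],
);

for my $ex (@examples) {
    my ($label, $re, $str) = @$ex;
    printf "%-20s %s\n", $label, ($str =~ $re ? 'match' : 'no match');
}
```

Every example matches except the wildcard against a newline; adding /s (qr/a.c/s) would let . match it too.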
Juxtaposition
juxtaposition n.
1. an act or instance of placing close together or side by side
2. the state of being close together or side by side
Example: Juxtaposition
/foo/
Example: Alternation
/foo|bar/
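A minimal sketch contrasting the two examples: /foo/ requires f, o, o side by side, while /foo|bar/ accepts either alternative. The helper name check is hypothetical:

```perl
use strict;
use warnings;

# Returns (matches /foo/, matches /foo|bar/) as 1/0 flags.
sub check {
    my ($s) = @_;
    return ( ($s =~ /foo/ ? 1 : 0), ($s =~ /foo|bar/ ? 1 : 0) );
}

for my $s ('food', 'fo o', 'barn', 'baz') {
    my ($juxt, $alt) = check($s);
    printf "%-7s /foo/=%d  /foo|bar/=%d\n", "'$s'", $juxt, $alt;
}
```

'fo o' fails /foo/ because juxtaposition means the three characters must be adjacent.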
Example: Repetition
Repetition intuitively
You can think of * as "zero or more". As in the previous example, "one or more" is quite often what you really want; that is what + means.
    /o*/     # zero or more o's
    /o+/     # one or more o's
    /oo*/    # same as /o+/
*, +, and other operators that express different types of repetition are called quantifiers.
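A common surprise with * is that zero repetitions is a successful match. A minimal sketch against the string 'foo', printing what each pattern matched ($&) and where the match started ($-[0]):

```perl
use strict;
use warnings;

my $s = 'foo';

# /o*/ succeeds immediately: zero o's at offset 0, before the "f".
if ($s =~ /o*/)  { printf "o*  matched '%s' at offset %d\n", $&, $-[0] }

# /o+/ needs at least one o, so the match starts at the first "o".
if ($s =~ /o+/)  { printf "o+  matched '%s' at offset %d\n", $&, $-[0] }

# /oo*/ is one o followed by zero or more o's: the same as /o+/.
if ($s =~ /oo*/) { printf "oo* matched '%s' at offset %d\n", $&, $-[0] }
```

The first line reports an empty match at offset 0, which is usually not what was intended.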
Although the precedence rules of regular expressions are often ignored, they are essential for understanding how your regular expressions work and why certain constructs are often mistakes. From highest to lowest precedence: parentheses (grouping), quantifiers (*, +, ?), juxtaposition, and alternation (|).
Precedence example
Parentheses are used for grouping, but here we can intuitively think of them as a way to override the default precedence, just as in math and most programming languages (that is, they have the highest precedence).
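A minimal illustration, using /fo+/ versus /(fo)+/ as an assumed example: because a quantifier binds tighter than juxtaposition, + in /fo+/ applies only to the o, while the parentheses in /(fo)+/ make it repeat the whole pair.

```perl
use strict;
use warnings;

for my $s ('fofofo', 'fooo') {
    my ($tight)   = $s =~ /(fo+)/;      # + applies to o only
    my ($grouped) = $s =~ /((fo)+)/;    # + applies to the group "fo"
    printf "%-9s /fo+/ matched '%s'   /(fo)+/ matched '%s'\n",
        "'$s'", $tight, $grouped;
}
```

On 'fofofo', /fo+/ stops after the first 'fo' while /(fo)+/ takes all three pairs; on 'fooo' the results are reversed ('fooo' versus 'fo').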
Another quantifier
You know the * and + quantifiers. Sometimes you want to express that something is optional: you might not see it, or you might see it exactly once. For this, we use ?.
    /o*/    # zero or more o's
    /o+/    # one or more o's
    /o?/    # zero or one o
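A minimal sketch of ? making a single character optional; the color/colour spelling is just an assumed example:

```perl
use strict;
use warnings;

# u? means "zero or one u".
my $re = qr/colou?r/;

for my $word ('color', 'colour', 'colouur') {
    printf "%-8s %s\n", $word, ($word =~ $re ? 'match' : 'no match');
}
```

'colouur' does not match: after 'colo', u? can consume at most one u, and the next character must then be r.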
Anchors
In the previous examples, such as /foo/, the string foo could be found and matched anywhere in the target string. Sometimes we want a match to succeed only at a given position within the string.
    /^foo/    # match foo only at the beginning
    /foo$/    # match foo only at the end
              # (or before a newline at the end)
The ^ and $ hold the match to a particular position that (intuitively) doesn't move, which is why we call such a metacharacter an anchor. Special note: ^ and $ do not match characters; they match positions.
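A minimal sketch of both anchors, including the "before newline at the end" case. For comparison it also uses \z, which matches only at the very end of the string, even when the last character is a newline:

```perl
use strict;
use warnings;

for my $s ('food', 'seafood', 'tofu foo', "foo\n") {
    (my $shown = $s) =~ s/\n/\\n/;    # make the newline visible in output
    printf "%-11s ^foo=%-3s foo\$=%-3s foo\\z=%s\n", "'$shown'",
        ($s =~ /^foo/  ? 'yes' : 'no'),
        ($s =~ /foo$/  ? 'yes' : 'no'),
        ($s =~ /foo\z/ ? 'yes' : 'no');
}
```

Only "foo\n" separates $ from \z: $ may match just before that final newline, but \z may not.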
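    /^foo|bar$/      # vs.
    /^(foo|bar)$/
These two regular expressions recognize two very different languages. Because alternation has the lowest precedence, the first means "foo at the beginning, or bar at the end", while the second means "exactly foo or exactly bar" (optionally followed by a final newline).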
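A minimal sketch running both patterns over a few sample strings (chosen arbitrarily) makes the difference concrete:

```perl
use strict;
use warnings;

# Alternation binds loosest, so /^foo|bar$/ behaves like /(?:^foo)|(?:bar$)/.
for my $s ('foo', 'bar', 'football', 'crowbar', 'foobar') {
    printf "%-10s /^foo|bar\$/=%-3s  /^(foo|bar)\$/=%s\n", "'$s'",
        ($s =~ /^foo|bar$/   ? 'yes' : 'no'),
        ($s =~ /^(foo|bar)$/ ? 'yes' : 'no');
}
```

Both patterns accept 'foo' and 'bar', but only the first also accepts 'football', 'crowbar', and 'foobar'.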