CS 105 Perl: Introduction to Regular Expressions

Curtis Dunham

September 28, 2009

Agenda
We will cover techniques for disciplined Perl, briefly introduce finite automata and regular languages, and then continue into regular expressions.

Disciplined Perl
    warnings
    strict
Finite Automata
    Deterministic
    Nondeterministic
Regular Languages
Regular Expressions

Perl warnings

Perl can give you many types of warnings. For example, it can warn you about:
    Wrong sigil for %hash{$key}
    Troublesome string-to-numeric conversions
    Only using a variable once
    Incorrect uses of barewords
... and many, many other things.

Perl warnings: the old way

Use
    #!/usr/bin/perl -w
This turns on warnings by default for everything Perl loads and runs. If someone says "-w", they mean warnings.

Perl warnings: the new way

Enable lexical warnings with
    use warnings;
This turns on warnings for the current compilation unit (file, block, etc.).
You can also turn off warnings within a block/scope using
    no warnings;
See perllexwarn.
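
A minimal sketch of lexical scoping for warnings. The variable names and the $SIG{__WARN__} handler (used here only to count warnings instead of printing them) are illustrative assumptions, not part of the lecture.

```perl
use strict;
use warnings;

# Collect warnings in an array instead of printing them to STDERR.
my @captured;
local $SIG{__WARN__} = sub { push @captured, $_[0] };

my $missing;    # deliberately left undefined

# Warnings are on here, so using the undefined value is reported.
my $loud = "value: " . $missing;

{
    # Inside this block only, the "uninitialized" category is off.
    no warnings 'uninitialized';
    my $quiet = "value: " . $missing;
}

print scalar(@captured), " warning(s) captured\n";
```

Only the concatenation outside the block should be reported; the one inside the no warnings scope stays silent.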

use strict

Enable strict mode with
    use strict;
This turns on stricture (a stricter form of checking than warnings: violations are fatal errors) for the current compilation unit (file, block, etc.).
Perl can be strict about three things:
    References (refs)
    Variables (vars)
    Subroutines (subs)

strict refs

    use strict 'refs';
Prohibits use of symbolic references.
    $ref = \$foo;   # no problem:
    print $$ref;    # hard reference
    $ref = "foo";   # so far so good...
    print $$ref;    # symbolic ref: runtime error
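
A runnable variant of the snippet above, sketched under the assumption that we trap the runtime error with a block eval so the script can report it. The names $hard, $name, and $ok are hypothetical.

```perl
use strict;
use warnings;

our $foo = "hello";

my $hard = \$foo;               # hard reference
print "hard: $$hard\n";         # dereferencing it is fine

my $name = "foo";               # just a string holding a variable name

# Dereferencing a string is a symbolic reference; under strict refs
# this dies at runtime, and eval catches the error in $@.
my $ok = eval { my $value = $$name; 1 };
print $ok ? "symbolic: allowed\n" : "symbolic: died: $@";
```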

strict vars

    use strict 'vars';
Forces variables to be declared.
    my $a = 1;    # "my" constitutes declaration
    our $b = 1;   # so does "our"
    $c = 1;       # strict will complain (fatally)
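
Because strict vars fails at compile time, one way to observe it without aborting the whole script is a string eval, which compiles its argument at runtime under the enclosing strict settings. This is only a sketch; $count and $total are made-up names.

```perl
use strict;
use warnings;

# Declared with "my": the snippet compiles and runs.
my $declared = eval q{ my $count = 1; $count };

# Never declared: compiling the snippet fails under strict vars,
# so eval returns undef and puts the message in $@.
my $undeclared = eval q{ $total = 1; $total };
my $error      = $@;

print "declared:   ", (defined $declared   ? "ok" : "failed"), "\n";
print "undeclared: ", (defined $undeclared ? "ok" : "failed"), "\n";
print $error;
```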

strict subs

    use strict 'subs';
Restricts use of barewords that aren't subroutines. See the documentation for the details.
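
A small sketch of the effect, again using string eval so the compile-time error can be inspected. The bareword Hello is a made-up example and is not defined as a subroutine anywhere.

```perl
use strict;
use warnings;

# With strict subs on, an unknown bareword is a compile-time error.
my $strict_result = eval q{ my $word = Hello; $word };
my $strict_error  = $@;

# With strict subs off, the same bareword is treated as the string "Hello".
my $lax_result = eval q{ no strict 'subs'; my $word = Hello; $word };

print "strict: ", (defined $strict_result ? $strict_result : "failed"), "\n";
print "lax:    ", (defined $lax_result    ? $lax_result    : "failed"), "\n";
```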

Documentation review

Where to learn more:
    perlsub - scoping and subroutines
    perlreftut - reference tutorial
    perlref - references
    perllexwarn, warnings - warnings
    strict - use strict

Documentation

More on regular expressions:
    perlrequick - quick start
    perlretut - tutorial
    perlreref - quick reference
    perlre - full reference

Finite Automata
The terms "finite automaton" and "finite state machine" are roughly equivalent. The former, however, has a precise mathematical meaning, whereas the latter is a more generic term. Let's break the terms down.

Finite
Automaton
    State
    Alphabet
    Transition
    Start state
    Accept states
Language

A Finite Automaton

[Figure: state diagram of a finite automaton. States: start, sign, firstDigit, moreDigits, decimalPt, finalDigits. Transition labels: [-+], [1-9], [0-9], \.]
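
The automaton on this slide can be sketched as a transition table in Perl. The state names come from the diagram; the exact edges and the set of accept states are assumptions, inferred so that the machine agrees with the regular expression /[+-]?[1-9][0-9]*(\.[0-9]*)?/ shown later in the lecture. The accepts function is a hypothetical helper.

```perl
use strict;
use warnings;

# Transition table: state => list of [character class, next state].
my %delta = (
    start       => [ [ qr/[-+]/,  'sign' ],       [ qr/[1-9]/, 'firstDigit' ] ],
    sign        => [ [ qr/[1-9]/, 'firstDigit' ] ],
    firstDigit  => [ [ qr/[0-9]/, 'moreDigits' ], [ qr/\./,    'decimalPt' ] ],
    moreDigits  => [ [ qr/[0-9]/, 'moreDigits' ], [ qr/\./,    'decimalPt' ] ],
    decimalPt   => [ [ qr/[0-9]/, 'finalDigits' ] ],
    finalDigits => [ [ qr/[0-9]/, 'finalDigits' ] ],
);

# Accept states (assumed): any state reached after at least one digit.
my %accept = map { $_ => 1 } qw(firstDigit moreDigits decimalPt finalDigits);

# Run the automaton over the input, one character at a time.
sub accepts {
    my ($input) = @_;
    my $state = 'start';
  CHAR: for my $char (split //, $input) {
        for my $edge (@{ $delta{$state} }) {
            my ($class, $next) = @$edge;
            if ($char =~ /\A$class\z/) {
                $state = $next;
                next CHAR;
            }
        }
        return 0;    # no transition for this character: reject
    }
    return $accept{$state} ? 1 : 0;
}

for my $text ('42', '-3.14', '+7.', '007', '.5') {
    printf "%-6s %s\n", $text, accepts($text) ? 'accept' : 'reject';
}
```

Each state has at most one edge per character, so this sketch is deterministic.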

Regular Languages

A language is regular if it can be recognized by a finite automaton. A finite automaton can only recognize regular languages.

Fundamental Constructions for Regular Languages

Regular languages are closed under the following operations:
    Concatenation
    Union
    Repetition (Kleene star)
Said another way: the concatenation of two regular languages is always a regular language, the union of two regular languages is always a regular language, and the repetition of a regular language is always a regular language.

Concatenation Example

Union Example

Kleene star

Matching a number

Remember the first finite automaton of the lecture? Here it is as a regular expression.
    /[+-]?[1-9][0-9]*(\.[0-9]*)?/
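
A quick way to try the expression out. Note that, as written, it may match anywhere inside a string; the sketch below also wraps it in ^ and $ to test the whole string. The test strings and variable names are illustrative assumptions.

```perl
use strict;
use warnings;

# The regular expression from the slide, stored with qr// for reuse.
my $number = qr/[+-]?[1-9][0-9]*(\.[0-9]*)?/;

for my $text ('42', '-3.14', '+7.', '007', 'abc') {
    my $anywhere = $text =~ /$number/   ? 'yes' : 'no';   # match somewhere inside
    my $whole    = $text =~ /^$number$/ ? 'yes' : 'no';   # match the entire string
    printf "%-6s anywhere: %-3s whole string: %s\n", $text, $anywhere, $whole;
}
```

The string 007 is the interesting case: the unanchored pattern finds the 7 inside it, while the anchored pattern rejects the leading zeros.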

Assembling regular expressions

Recall that regular languages are closed under concatenation, union, and repetition (Kleene star). In regular expressions, we might also use these terms:
    Juxtaposition: ab
    Alternation: a|b
    Repetition: a*
This is the real Perl syntax. Use juxtaposition for concatenation, alternation with the pipe operator for union, and the asterisk (star) for Kleene star.
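
The three constructions can be sketched by assembling larger patterns from two smaller ones. The use of qr// to hold the pieces and the helper whole (which anchors a pattern to the entire string) are assumptions made for illustration.

```perl
use strict;
use warnings;

my $foo = qr/foo/;
my $bar = qr/bar/;

my $concat = qr/$foo$bar/;    # juxtaposition: foo followed by bar
my $union  = qr/$foo|$bar/;   # alternation: foo or bar
my $star   = qr/($foo)*/;     # repetition: zero or more copies of foo

# Does the pattern match the entire string?
sub whole {
    my ($text, $re) = @_;
    return $text =~ /^$re$/ ? 1 : 0;
}

printf "%-8s concat=%d union=%d star=%d\n",
    "'$_'", whole($_, $concat), whole($_, $union), whole($_, $star)
    for 'foobar', 'bar', 'foofoo', '';
```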

Atoms

We can build up regular expressions from atoms. There are a number of atoms we can use in Perl regular expressions:
    Literal characters
    Escaped double-quotable characters (\n, \t, etc.)
    . (wildcard: matches any character except a newline by default)
    Character classes
(This is not an exhaustive list.)
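
One illustrative match per kind of atom. The sample patterns and strings are made up for this sketch.

```perl
use strict;
use warnings;

# Each entry: [kind of atom, pattern, text to search].
my @examples = (
    [ 'literal character', qr/a/,     'cat'    ],
    [ 'escaped character', qr/\t/,    "a\tb"   ],
    [ 'wildcard .',        qr/c.t/,   'cut'    ],
    [ 'character class',   qr/[0-9]/, 'room 7' ],
);

for my $example (@examples) {
    my ($kind, $re, $text) = @$example;
    printf "%-18s %s\n", $kind, $text =~ $re ? 'match' : 'no match';
}
```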

Juxtaposition

juxtaposition n.
    1. an act or instance of placing close together or side by side
    2. the state of being close together or side by side.

Example: Juxtaposition

/foo/

Example: Alternation

/foo|bar/

Example: Repetition

/foo*/

Probably not what you want.

Repetition intuitively

You can think of * as "zero or more". As in the previous slide, "one or more" is quite often what you really want. This is what + means.
    /o*/    # zero or more o's
    /o+/    # one or more o's
    /oo*/   # same as /o+/
*, +, and other operators that express different types of repetition are called quantifiers.
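
A sketch of why "zero or more" can surprise you: /o*/ succeeds on any string, because matching zero o's is always possible. The helper show is hypothetical; it uses Perl's $& variable, which holds the text of the last successful match.

```perl
use strict;
use warnings;

# Report what a pattern matched in a string, if anything.
sub show {
    my ($text, $re) = @_;
    return $text =~ $re ? "matched '$&'" : "no match";
}

for my $text ('xyz', 'zoom') {
    printf "%-5s o*: %-13s o+: %-13s oo*: %s\n",
        $text, show($text, qr/o*/), show($text, qr/o+/), show($text, qr/oo*/);
}
```

In zoom, /o*/ matches the empty string at the very start (before the z), since the leftmost possible match wins.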

Precedence in Regular Expressions

Precedence, from high to low:
    1. Repetition: *, +, and other quantifiers
    2. Juxtaposition
    3. Alternation

While the precedence rules of regular expressions are often ignored, they are very important for understanding how your regular expressions work, and why certain constructs are often mistakes.

Precedence example

    /foo*/     # match fo, foo, fooo, etc.
    # vs.
    /(foo)*/   # match "", foo, foofoo, etc.

Parens are used for grouping, but in this case we can intuitively think of them as a way to override the default precedence: just as in math and most programming languages, they have the highest precedence.
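
To compare the two patterns on whole strings, this sketch wraps each in ^ and $ (an assumption added here so that partial matches do not hide the difference). The hash %pattern and the test strings are illustrative.

```perl
use strict;
use warnings;

my %pattern = (
    'foo*'   => qr/^foo*$/,     # star applies to the last o only
    '(foo)*' => qr/^(foo)*$/,   # star applies to the whole group
);

for my $text ('fo', 'fooo', 'foofoo', '') {
    for my $name (sort keys %pattern) {
        printf "%-10s %-8s %s\n",
            "'$text'", $name, $text =~ $pattern{$name} ? 'match' : 'no match';
    }
}
```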

Another quantier

You know the * and + quantifiers. Sometimes you want to express that something is optional, i.e. you might not see it, or you might see it one time. For this, we would use ?.
    /o*/   # zero or more o's
    /o+/   # one or more o's
    /o?/   # zero or one o
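
A small sketch of an optional character. The pattern and the helper is_color are made-up examples; ^ and $ restrict the match to the whole string.

```perl
use strict;
use warnings;

# The u is optional: zero or one occurrence.
sub is_color {
    my ($text) = @_;
    return $text =~ /^colou?r$/ ? 1 : 0;
}

printf "%-8s %s\n", $_, is_color($_) ? 'match' : 'no match'
    for 'color', 'colour', 'colouur';
```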

Anchors
In the previous examples, such as /foo/, the string foo might be found and successfully matched anywhere in the string. Sometimes we want to match only at a given position within a string.
    /^foo/   # match foo only at the beginning
    /foo$/   # match foo only at the end
             # (or before newline at the end)
The ^ and $ hold the matching to a particular position that (intuitively) doesn't move, which is why we call such a metacharacter an anchor.
Special note: ^ and $ do not match characters.
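
A sketch of both anchors, including the "before newline at the end" case. The helper names and test strings are assumptions for illustration.

```perl
use strict;
use warnings;

sub starts_with_foo { return $_[0] =~ /^foo/ ? 1 : 0 }
sub ends_with_foo   { return $_[0] =~ /foo$/ ? 1 : 0 }

# Each entry: [printable label, actual string].
my @cases = (
    [ 'foobar',    'foobar'   ],
    [ 'barfoo',    'barfoo'   ],
    [ 'barfoo\n',  "barfoo\n" ],   # trailing newline
);

for my $case (@cases) {
    my ($label, $text) = @$case;
    printf "%-10s starts with foo: %d  ends with foo: %d\n",
        $label, starts_with_foo($text), ends_with_foo($text);
}
```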

Another precedence example

    /^foo|bar$/
    # vs.
    /^(foo|bar)$/
These two regular expressions recognize two very different languages.
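
Because alternation has the lowest precedence, the first pattern reads as "starts with foo, or ends with bar", while the second reads as "is exactly foo or exactly bar". A sketch with made-up test strings:

```perl
use strict;
use warnings;

my %pattern = (
    '^foo|bar$'   => qr/^foo|bar$/,     # (^foo) or (bar$)
    '^(foo|bar)$' => qr/^(foo|bar)$/,   # the whole string is foo or bar
);

for my $text ('foo', 'bar', 'food', 'rebar', 'xfoox') {
    for my $name (sort keys %pattern) {
        printf "%-6s %-12s %s\n",
            $text, $name, $text =~ $pattern{$name} ? 'match' : 'no match';
    }
}
```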

Another precedence example (graphically)
