Perl Regular Expression Summary


Meta Characters
\s white space [\f\t\n\r ]: form feed, tab, newline, carriage return, or space
\d any digit [0-9]
\w word character [A-Za-z0-9_]
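
A minimal sketch of the three classes at work. The sample line is made up for illustration; any text with digits, words and white space would do.

```perl
use strict;
use warnings;

# Hypothetical sample line, chosen only to exercise the three classes.
my $line = "order_42\tshipped 2009-12-14";

my @digits = $line =~ /(\d+)/g;    # runs of digits
my @words  = $line =~ /(\w+)/g;    # runs of word characters
my @fields = split /\s+/, $line;   # split on runs of white space

print "digits: @digits\n";   # digits: 42 2009 12 14
print "words:  @words\n";    # words:  order_42 shipped 2009 12 14
print "fields: @fields\n";   # fields: order_42 shipped 2009-12-14
```

Note that the underscore counts as a word character, so "order_42" stays in one piece for \w+ while \d+ picks out only the 42.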


Matches with m//

m/\w*?/  m(\w*?)  m{\w*?}  or  m#http://#  ...
The leading m is optional only when the delimiter is a slash: /\w*?/


Perl's Option Modifiers /s /i /x or "/six"

/s : lets "." match a newline too, so a pattern such as .* can span several lines; without /s, "." stops at '\n'
/i : case (i)nsensitive
/x : ignores white space inside the pattern and allows # comments, which keeps long regular expressions readable

You can combine option modifiers in any order, e.g. /six
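
A small sketch of the three modifiers combined. The text in $text is invented for the example.

```perl
use strict;
use warnings;

my $text = "BEGIN\nsome data\nend";

# Without /s, "." stops at a newline, so this match fails.
my $without_s = $text =~ /begin.*end/i ? 1 : 0;

# s lets "." match newlines, i ignores case, x permits layout and comments.
my $with_six = $text =~ /
    begin   # opening marker, any case
    .*      # anything, including newlines because of s
    end     # closing marker
/six ? 1 : 0;

print "without s: $without_s, with six: $with_six\n";   # without s: 0, with six: 1
```

One thing to watch under /x: a comment inside the pattern must not contain the delimiter character, or the pattern ends early.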


Word Anchors /\bword\b/ or m#\bword\b#

m#\bperl# matches "perlscripting"

m#perl\b# matches "useperl"

m#\bsearch\B# matches "searches", "searching" but not "search" or "researching"
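
A quick way to check anchor behaviour is to run a pattern over a word list. This sketch uses /\bsearch\B/ on four sample words.

```perl
use strict;
use warnings;

my @candidates = qw(search searches searching researching);

# \b needs a word boundary before "search";
# \B needs a non-boundary (another word character) right after it.
my @hits = grep { /\bsearch\B/ } @candidates;

print "matched: @hits\n";   # matched: searches searching
```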



The Binding Operator =~

my $answer = <STDIN> =~ /\byes\b/i;
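
The same idea without waiting on the keyboard, as a sketch. The two reply strings are stand-ins for lines that would come from <STDIN>.

```perl
use strict;
use warnings;

# Stand-ins for user input (hypothetical replies).
my $reply_a = "Yes, please";
my $reply_b = "yesterday";

# =~ binds tighter than =, so each variable receives the match result.
my $answer_a = $reply_a =~ /\byes\b/i;   # true: "Yes" is a whole word
my $answer_b = $reply_b =~ /\byes\b/i;   # false: no boundary after "yes"

print $answer_a ? "a: yes\n" : "a: no\n";   # a: yes
print $answer_b ? "b: yes\n" : "b: no\n";   # b: no
```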



The Match Variables $1, $2, ..., $` $& $'

if (/(\S+); (\S+); (\S+)/) {
    print "$1 $2 $3 \n";
}

Note: sometimes you need to quote '$1' inside print "";


It's always good to copy your match into a variable of your own right away instead of relying on $1, because the next successful match overwrites $1.

if ($statement =~ /(\w+)/) {
my $magicword = $1;
...
}
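
A short sketch of why the copy matters. The sentence in $statement is made up; the second match stands for any later regex in the same block.

```perl
use strict;
use warnings;

my $statement = "abracadabra opens 42 doors";
my $magicword;

if ($statement =~ /(\w+)/) {
    $magicword = $1;           # copy right away
    $statement =~ /(\d+)/;     # a later successful match overwrites $1
    print "\$1 is now $1, saved copy is still $magicword\n";
    # prints: $1 is now 42, saved copy is still abracadabra
}
```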


Parentheses for grouping only - non-capturing parentheses (?:)


#!/usr/bin/perl
use strict;

while (<>) {
    # (?:UNIX) groups without capturing, so the alternation is $1
    if ( m#(?:UNIX).*?(Solaris|AIX|Linux)#i ) {
        print "$1\n";
    }
}


$ cat job.txt
Required skills:
1) 5 to 10 years Perl or Python , shell scripting skills;
2) at least 10 years UNIX e.g. AIX Solaris Linux Administration skills;
3) 5 to 10 years weblogic, tomcat, websphere administration skills;
Preferred skills:
1) UNIX Internal Coding Skills with Python, Perl or C;
2) TCP/IP Internal and Protocols;
3) Understand Java Servlets/JSP/Web Applications;
4) Understand Oracle/Sybase/MySQL;



# $1 is AIX not UNIX because (?:UNIX) is non-capturing.
$ ./98.pl < job.txt
AIX



Named Captures

In addition to capturing parts of a string in $1, $2, $3, ...,
Perl 5.10 and later lets you name captures with (?<label>...). The matches
are stored in the hash %+: the key is the label we used and the value is
the part of the string matched by that group.



#!/usr/bin/perl
use 5.010;

while (<>) {
    # skip the literal "e.g." after UNIX, then name the next three words
    if ( m/(?:UNIX)\s+e\.g\.\s+(?<os1>\w+) (?<os2>\w+) (?<os3>\w+) / ) {
        say "Required Operating Systems are $+{os1} $+{os2} $+{os3}";
    }
}

$ ./912.pl < job.txt
Required Operating Systems are AIX Solaris Linux




*****************************
Now that we have a way to label matches, we also need a way to refer to them for back
references. Previously, we used either \1 or \g{1} for this. With a labeled group, we can
use the label in \g{label}:

use 5.010;
my $names = 'Fred Flintstone and Wilma Flintstone';
if( $names =~ m/(?<last_name>\w+) and \w+ \g{last_name}/ ) {
    say "I saw $+{last_name}";
}

We can do the same thing with another syntax. Instead of using \g{label}, we use
\k<label>:

use 5.010;
my $names = 'Fred Flintstone and Wilma Flintstone';
if( $names =~ m/(?<last_name>\w+) and \w+ \k<last_name>/ ) {
    say "I saw $+{last_name}";
}

******************************

General Quantifiers
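
General quantifiers are the curly-brace forms {n}, {n,} and {n,m}; the familiar *, + and ? are shorthand for {0,}, {1,} and {0,1}. A minimal sketch with made-up sample strings:

```perl
use strict;
use warnings;

my @samples = ("a", "aaa", "aaaaa");

my @exactly_three = grep { /^a{3}$/   } @samples;   # {n}:   exactly n times
my @three_or_more = grep { /^a{3,}$/  } @samples;   # {n,}:  n or more times
my @one_to_three  = grep { /^a{1,3}$/ } @samples;   # {n,m}: between n and m times

print "{3}:   @exactly_three\n";   # {3}:   aaa
print "{3,}:  @three_or_more\n";   # {3,}:  aaa aaaaa
print "{1,3}: @one_to_three\n";    # {1,3}: a aaa
```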





Substitutions with s/// : search and replace

$_ = "green scaly dinosaur";
s/(\w+) (\w+)/$2, $1/; # Now it's "scaly, green dinosaur"
s/^/huge, /; # Now it's "huge, scaly, green dinosaur"
s/,.*een//; # Empty replacement: Now it's "huge dinosaur"
s/green/red/; # Failed match: still "huge dinosaur"
s/\w+$/($`!)$&/; # Now it's "huge (huge !)dinosaur"
s/\s+(!\W+)/$1 /; # Now it's "huge (huge!) dinosaur"
s/huge/gigantic/; # Now it's "gigantic (huge!) dinosaur"


There is a useful Boolean value from s///;
it is true if a substitution was successful;
otherwise, it is false:

$_ = "fred flintstone";
if (s/fred/wilma/) {
print "Successfully replaced fred with wilma!\n";
}
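
With the /g modifier the return value is more than a yes/no: s/// returns the number of replacements it made. A small sketch with an invented string:

```perl
use strict;
use warnings;

my $text = "fred met fred and barney";

# With /g, s/// returns how many replacements it made (false when none).
my $count = ($text =~ s/fred/wilma/g);
print "replaced $count time(s): $text\n";
# replaced 2 time(s): wilma met wilma and barney

# A failed substitution returns false and leaves the string alone.
my $changed = ($text =~ s/dino/rex/) ? "yes" : "no";
print "dino replaced? $changed\n";   # dino replaced? no
```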


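Option Modifiers: /s /i /x or "/six"

s/// takes the same option modifiers as m//:

s{ABC}{}s;

The Binding Operator : =~

As with m//, =~ applies s/// to a variable other than $_, e.g. $string =~ s{ABC}{}s;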


Case Shifting

\U ; # \U escape forces what follows to all upper case
\L ; # \L escape forces lowercase

$_ = "Steve and John are good friends\n";
s/(steve|john)/\U$1/gi;   # -> STEVE and JOHN ...
s/(steve|john)/\L$1/gi;   # -> steve and john ...



You can even stack them up.
Using \u with \L means "all lowercase, but capitalize the
first letter":

s/(fred|barney)/\u\L$1/ig; # $_ is now "I saw Fred with Barney."
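
A runnable sketch of the stacked escapes, plus \E, which ends the effect of \U or \L. The input line is made up, and each substitution works on its own copy so the two results can be compared.

```perl
use strict;
use warnings;

my $line = "I saw BARNEY with fred.";

# \L lowercases the capture, \u then capitalizes its first letter.
(my $names = $line) =~ s/(fred|barney)/\u\L$1/ig;

# \E switches \U off again, so the "!" and the rest stay untouched.
(my $shout = $line) =~ s/(fred|barney)/\U$1\E!/ig;

print "$names\n";   # I saw Barney with Fred.
print "$shout\n";   # I saw BARNEY! with FRED!.
```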

The split and join operators
@fields = split /separator/, $string;
@fields = split /:/, "abc:def::g:h"; # gives ("abc", "def", "", "g", "h")

my $result = join $glue, @pieces;
my $x = join ":", 4, 6, 8, 10, 12; # $x is "4:6:8:10:12"
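
One detail worth seeing in action: split keeps empty fields in the middle of the string but drops them at the end. A sketch with an invented record:

```perl
use strict;
use warnings;

my $record = "abc:def::g:h:::";

# Interior empty fields are kept; trailing empty fields are dropped.
my @fields = split /:/, $record;
print scalar(@fields), " fields\n";   # 5 fields

# join puts the glue only between pieces, so the trailing colons are gone.
my $rejoined = join ":", @fields;
print "$rejoined\n";                  # abc:def::g:h
```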

m// in list context

When a pattern match (m//) is used in a list context, the return value is a list of the
memory variables created in the match, or an empty list if the match failed:
$_ = "Hello there, neighbor!";
my($first, $second, $third) = /(\S+) (\S+), (\S+)/;
print "$second is my $third\n";

my $text = "Fred dropped a 5 ton granite block on Mr. Slate";
my @words = ($text =~ /([a-z]+)/ig);
print "Result: @words\n";
# Result: Fred dropped a ton granite block on Mr Slate
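
When the pattern has two captures and /g, the flat list of pairs can fill a hash directly. A sketch along those lines; the names in $data are sample data.

```perl
use strict;
use warnings;

my $data = "Barney Rubble Fred Flintstone Wilma Flintstone";

# Each match yields a (first, last) pair; the flat list fills the hash.
my %last_name = ($data =~ /(\w+)\s+(\w+)/g);

print "$_ => $last_name{$_}\n" for sort keys %last_name;
# Barney => Rubble
# Fred => Flintstone
# Wilma => Flintstone
```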


Nongreedy Quantifiers "?" : *? +?


curl 'http://www.google.com' | perl -e 'while (<>) { print "$&\n" if m#<style>(.*?)</style>#; }'
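
To see the difference without fetching a page, here is a sketch on a made-up HTML fragment with two style blocks.

```perl
use strict;
use warnings;

# Hypothetical fragment with two <style> blocks.
my $html = "<style>a{}</style><p>text</p><style>b{}</style>";

my ($greedy)    = $html =~ m#<style>(.*)</style>#;    # runs to the last </style>
my ($nongreedy) = $html =~ m#<style>(.*?)</style>#;   # stops at the first </style>

print "greedy:    $greedy\n";      # greedy:    a{}</style><p>text</p><style>b{}
print "nongreedy: $nongreedy\n";   # nongreedy: a{}
```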



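Matching Multiple-Line Text "/m" m=multiple lines

This is where Perl goes beyond classic regular expressions. By default ^ and $ anchor at the
start and end of the whole string. With /m they also match at the start and end of every line
inside the string, so Perl can match multiple lines of text just as easily as single lines.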

#9.15
$_="The only reason I'm using Perl is because \n it is perfect for processing text files \n with the built-in regular expressions in Perl.\n";

# note: this pattern has no ^ or $, so /m changes nothing here;
# \bperl\b simply matches the first "Perl" anywhere in the string
print "Found 'perl' at start of line\n" if /\bperl\b/im;
print "$`:$&:$'\n";

Found 'perl' at start of line
The only reason I'm using :Perl: is because
it is perfect for processing text files
with the built-in regular expressions in Perl.
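
The effect of /m only shows when the pattern contains ^ or $. A minimal sketch, using a made-up three-line string:

```perl
use strict;
use warnings;

my $text = "Perl is handy\nfor text files\nperl has regexes built in\n";

# Without /m, ^ matches only at the very start of the string.
my @without_m = $text =~ /^(\w+)/g;

# With /m, ^ also matches right after every embedded newline.
my @with_m = $text =~ /^(\w+)/gm;

print "without /m: @without_m\n";   # without /m: Perl
print "with /m:    @with_m\n";      # with /m:    Perl for perl
```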


#9.16 read entire google.html file into one variable, then add "__HTML__" in front of each line
my $filename = "/home/shan/perl/google.html";
open my $fh, '<', $filename
    or die "Can't open '$filename': $!";
my $lines = join '', <$fh>;
close $fh;
$lines =~ s/^/__HTML__: /gm;
print $lines;


Updating Many Files:

#!/usr/bin/perl -w
# perl script to process many files
#
# usage : prcfile.pl input-file
#
# it runs the regex operations inside the while loop on input-file
# and saves the original file under the name input-file.org

use strict;

# name your own file extension for the backup copy of the original file
$^I = ".org";

chomp (my $date = `date`);
while (<>) {

    # insert your regex operations here:
    s/^/$date:/;
    s/$/___END___/;

    # update
    print;
}


$ ./prcfile.pl 9.pl

$ more 9.pl
Mon Dec 14 17:09:45 PST 2009:#!/usr/bin/perl___END___
Mon Dec 14 17:09:45 PST 2009:___END___
Mon Dec 14 17:09:45 PST 2009:___END___
Mon Dec 14 17:09:45 PST 2009:___END___
Mon Dec 14 17:09:45 PST 2009:$_="steve and John are good friends\n";___END___
Mon Dec 14 17:09:45 PST 2009:s/(steve|john)/\U$1/gi; #-> STEVE and JOHN ...___END___
Mon Dec 14 17:09:45 PST 2009:print "$_\n";___END___
Mon Dec 14 17:09:45 PST 2009:___END___
Mon Dec 14 17:09:45 PST 2009:$_="steve and John are good friends\n";___END___
Mon Dec 14 17:09:45 PST 2009:s/(steve|john)/\L$1/gi; #-> steve and john ...___END___
Mon Dec 14 17:09:45 PST 2009:print "$_\n";___END___
Mon Dec 14 17:09:45 PST 2009:___END___
Mon Dec 14 17:09:45 PST 2009:#9.15___END___
Mon Dec 14 17:09:45 PST 2009:$_="The only reason I'm using Perl is because \n it is perfect for processing text files \n with the built-in regular expressions in Perl.\n";___END___
Mon Dec 14 17:09:45 PST 2009:___END___
Mon Dec 14 17:09:45 PST 2009:print "Found 'perl' at start of line\n" if /\bperl\b/im;___END___
Mon Dec 14 17:09:45 PST 2009:print "$`:$&:$'\n";___END___
Mon Dec 14 17:09:45 PST 2009:___END___
Mon Dec 14 17:09:45 PST 2009:#9.16___END___
Mon Dec 14 17:09:45 PST 2009:$filename="/home/shan/perl/google.html";___END___
Mon Dec 14 17:09:45 PST 2009:open FILE, $filename___END___
Mon Dec 14 17:09:45 PST 2009:or die "Can't open '$filename': $!";___END___
Mon Dec 14 17:09:45 PST 2009:my $lines = join '', <FILE>;___END___
Mon Dec 14 17:09:45 PST 2009:$lines =~ s/^/__HTML__: /gm;___END___
Mon Dec 14 17:09:45 PST 2009:print $lines;___END___



Or, we can do the same from the command line:

perl -p -i.org -w -e 'chomp (my $date=`date`); s/^/$date:/; s/$/___END___/;' 9.pl


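Here are the explanations of these command line options:

1. "perl -p" : wraps the code in a loop that reads each line into $_, runs the code, then prints $_:

while ($_ = <STDIN>) {
    print $_;
}

$_ is the default variable for both the read and the print, and <> reads the files
named in @ARGV, falling back to STDIN when there are none.

Therefore, the above code can be written as:

while (<>) {
    print;
}

2. "-i.org" : sets $^I to ".org", so each file is edited in place and the original is kept with the .org extension

3. "-w" : turns on warnings

4. "-e 'code'" : the Perl code to execute

5. @ARGV : the remaining arguments (here 9.pl) are the files that <> reads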
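
Putting the options together, the one-liner behaves roughly like the script below. This is a sketch, not what perl generates internally: the scratch file demo.txt and the "> " edit are invented so the example can run on its own without touching a real file.

```perl
use strict;
use warnings;
use File::Temp qw(tempdir);

# Create a throwaway file to edit (hypothetical name and content).
my $dir  = tempdir(CLEANUP => 1);
my $file = "$dir/demo.txt";
open my $out, '>', $file or die "Can't write '$file': $!";
print {$out} "one\ntwo\n";
close $out;

{
    local @ARGV = ($file);   # the file names left on the command line
    local $^I   = ".org";    # what -i.org sets
    while (<>) {             # the loop that -p wraps around the code
        s/^/> /;             # stand-in for the code given to -e
        print;               # -p prints $_ after each pass
    }
}

# Read the edited file back to show the result.
open my $in, '<', $file or die "Can't read '$file': $!";
my $edited = do { local $/; <$in> };
close $in;
print $edited;               # "> one" and "> two", each on its own line
```

While $^I is set, print inside the loop writes to the file being edited instead of the terminal, and the untouched original is kept as demo.txt.org.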