It will try to do two things. First, it will provide a succinct summary of major Perl elements. Second, it will provide perspective and relate features to one another. Thus, it will be a kind of extended and structured checklist, with commentary.
The discussion will be built around answering two questions:
If you are just learning Perl, I urge you to read a book, such as the Schwartz (with Wall) introductory text, Learning Perl, published by O'Reilly & Associates, or their more advanced Programming Perl. You may also wish to look at my on-line Perl/CGI Tutorial.
Read whatever book you choose at least twice. The first time, fast, to get a sense of the whole with respect to both capabilities and scale. The second time, slowly. Work through the concepts, trying things out by writing short, throw-away programs. There's a world of difference between comprehension and generation. It's one thing to understand a programming language to the extent that you can follow someone else's code, but a different level and kind of knowledge is required to write an extensive program. The only way to develop such knowledge is through hands-on experience.
This discussion is likely to be most useful as you work through the book you have selected the second time and afterwards, as you recall a capability but forget the operator or its syntax.
The discussion will include six major sections:
Perl provides three kinds of variables: scalars, arrays, and associative arrays. The initial character of the name identifies the particular type of variable and, hence, its functionality.
$aVar = 4;
$bVar = "a string of words";
$cVar = 4.5;       # a decimal number
$dVar = 3.14e10;   # a floating-point number
@aList = (2, 4, 6, 8);
@bList = @aList;   # creates new array and gives it values of @aList
$aList[0] = 1;     # changes the value of first item from 2 to 1
An associative array is a collection of pairs in which the first element of each pair is a key and the second element is a value associated with that key. Perl uses the percent symbol (%) as the prefix for the name of an associative array as a whole, whereas an individual element is referred to as a scalar, with its key placed in curly braces. Instead of using numbers to index the array, key values are used: $name{"QUERY_STRING"} references the value associated with the key QUERY_STRING. Since the associated value is a scalar, the variable has a $ prefix.
$aAA{"A"} = 1;             # creates first row of assoc. array
$aAA{"B"} = 2;             # creates second row of assoc. array
%bAA = %aAA;               # creates new assoc. array and gives it values of %aAA
$aAA{"A"} = 3;             # changes the value of first item from 1 to 3
%aAA = ("A", 1, "B", 2);   # same as first two stmts., above
$aVar1 = 0xff;   # hex assign. for 255 decimal
$aVar2 = 0377;   # octal assign. for same thing
$aVar1 = 0xff;               # set $aVar1 = 255 decimal
$aVar2 = 'aVar2 = $aVar1';   # set $aVar2 = literal string
$aVar3 = "aVar3 = $aVar1";   # set $aVar3 = variable interpolated string, with $aVar1 replaced by 255
$aVar4 = 'only single quote interpolated characters are \' and \\';
Double quote interpolated characters include:
\n   newline
\a   bell
\\   backslash
\"   double quote
\l   lowercase next letter
\u   uppercase next letter
\L   lowercase letters follow
\U   uppercase letters follow
\E   terminate \L or \U
+     plus
-     minus
*     multiply
/     divide
**    exponentiation
%     modulus                    # e.g., 7 % 3 = 1
==    equal
!=    not equal
<     less than
>     greater than
<=    less than or equal to
>=    greater than or equal to
+=    binary assignment          # e.g., $A += 1;
-=    same, subtraction
*=    same, multiplication
++    autoincrement              # e.g., ++$A; also, $A++
--    autodecrement
.     concatenate
x n   repetition                 # e.g., "A" x 3 => "AAA"
eq    equal
ne    not equal
lt    less than
gt    greater than
le    less than or equal to
ge    greater than or equal to
chop()                              # remove last character in string
index ($string, $substring)         # position of substring in string, zero-based; -1 if not found
index ($string, $substring, $skip)  # skip number of chars
substr($string, $start, $length)    # substring
substr($string, -$start, $length)   # defined from end
substr($string, $start)             # rest of string
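The string operators listed above can be tried out with a short throw-away program. The following is a minimal sketch; the sample string and variable names are made up for illustration.

```perl
# Hypothetical sample data; the variable names are illustrative only.
$string = "programming perl";

$where   = index($string, "perl");     # 12: zero-based position of "perl"
$missing = index($string, "python");   # -1: substring not found
$first   = substr($string, 0, 11);     # "programming"
$rest    = substr($string, 12);        # "perl": from position 12 to the end
$tail    = substr($string, -4, 2);     # "pe": starts 4 characters from the end
$banner  = ("-" x 3) . $rest;          # "---perl": repetition, then concatenation

print "$where $missing $first $rest $tail $banner\n";
```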
Form:
pack("format", $value1, $value2, . . .);
unpack("format", character_string);
Example:
$IP = pack("CCCC", 152, 2, 128, 184);                  # create IP address
($var1, $var2, $var3, $var4) = unpack("CCCC", $IP);    # inverse of the above
Format specifications can be given in context (in quotes) or they can be assigned to a string variable. There are a number of options available. See a standard text or the Perl man page for a complete list. In the example above, the "C" stands for an unsigned character value. One useful format to know is the following, which can be used to construct the address structure needed to bind a socket to a remote host:
$socket_addr_ptrn = 'S n a4 x8';
The "S" denotes an unsigned "short" integer. The "n" is a short integer in network (big-endian) byte order. The "a4" is a four-byte string, padded with nulls if the value supplied is shorter. And the "x8" is eight null bytes of padding.
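As a quick check on the IP address example, the sketch below packs four byte values and unpacks them again. The names @octets, $size, and $dotted are illustrative and do not come from the text.

```perl
# Pack four byte values into a 4-byte string, then reverse the operation.
$IP = pack("CCCC", 152, 2, 128, 184);
@octets = unpack("CCCC", $IP);

$size   = length($IP);           # 4: one byte per "C" field
$dotted = join(".", @octets);    # "152.2.128.184"

print "$size bytes, address $dotted\n";
```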
@aList = (2, 4, 6, 8);            # explicit values
@aList = (1..4);                  # range of values
@aList = (1, "two", 3, "four");   # mixed values
@aList = ();                      # empty list
@bList = @aList;
$aList[0]          # first item in @aList
@aList[0,1]        # slice, first two items in @aList
$aList[$too_big]   # access beyond array bounds returns undef, i.e., 0 or ''
$#aList                          # index of last item
push (@aList, $aNewItem);        # @aList = @aList, $aNewItem
$LastItem = pop (@aList);        # inverse of push
unshift (@aList, $aNewItem);     # @aList = $aNewItem, @aList
$FirstItem = shift (@aList);     # inverse of unshift
@aList = reverse (@aList);       # reverse items
@aList = sort (@aList);          # sort items, alphabetically
chop (@aList);                   # remove last character from each item
@aList = <STDIN>;                # one line of input per item
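The array operators above can be exercised together. This is a small sketch with made-up values; the comments show the state of @aList after each step.

```perl
@aList = (2, 4, 6, 8);

push(@aList, 10);                # (2, 4, 6, 8, 10)
$LastItem  = pop(@aList);        # 10; @aList is (2, 4, 6, 8) again
unshift(@aList, 0);              # (0, 2, 4, 6, 8)
$FirstItem = shift(@aList);      # 0; @aList is (2, 4, 6, 8) again

$lastIndex = $#aList;            # 3: index of the last item
@pair      = @aList[0, 1];       # slice: (2, 4)
@backward  = reverse(@aList);    # (8, 6, 4, 2)

print "@aList | @pair | @backward | $lastIndex\n";
```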
$aAA{"A"} = 1;             # creates first row of assoc. array
$aAA{"B"} = 2;             # creates second row of assoc. array
%aAA = ("A", 1, "B", 2);   # same as first two stmts., above
%bAA = %aAA;               # creates new assoc. array and gives it values of %aAA
$aAA{"A"} = 3;             # changes the value of first item from 1 to 3
keys (%aAA)                # list of keys for %aAA
values (%aAA)              # list of values for %aAA
each (%aAA)                # next key/value pair, as list
delete $aAA{"A"};          # deletes key/value pair referenced
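The following sketch walks an associative array with both keys and each, then deletes an entry. The names $key, $value, $total, and $remaining are illustrative only.

```perl
%aAA = ("A", 1, "B", 2);
$aAA{"C"} = 3;

# Sort the keys, because the order returned by keys() is not predictable.
foreach $key (sort(keys(%aAA))) {
    print "$key => $aAA{$key}\n";
}

# each() returns one key/value pair per call until the array is exhausted.
$total = 0;
while (($key, $value) = each(%aAA)) {
    $total += $value;
}

delete $aAA{"A"};
$remaining = join(",", sort(keys(%aAA)));   # "B,C"
print "total = $total, remaining keys = $remaining\n";
```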
Form: BLOCK
Example:
{
    stmt_1;
    stmt_2;
    stmt_3;
}
Form: if (EXPR) BLOCK
Example:
if (expression) {
    true_stmt_1;
    true_stmt_2;
    true_stmt_3;
}
Form: if (EXPR) BLOCK else BLOCK
Example:
if (expression) {
    true_stmt_1;
    true_stmt_2;
    true_stmt_3;
} else {
    false_stmt_1;
    false_stmt_2;
    false_stmt_3;
}
Form: if (EXPR) BLOCK elsif (EXPR) BLOCK . . . else BLOCK
Example:
if (expression_A) {
    A_true_stmt_1;
    A_true_stmt_2;
    A_true_stmt_3;
} elsif (expression_B) {
    B_true_stmt_1;
    B_true_stmt_2;
    B_true_stmt_3;
} else {
    false_stmt_1;
    false_stmt_2;
    false_stmt_3;
}
Form: LABEL: while (EXPR) BLOCK
The LABEL in this and the following control statements is optional. In addition to serving as a description, it is used by the quasi-goto statements: last, next, and redo. Perl convention calls for labels to be expressed in uppercase to avoid confusion with variables or keywords.
Example:
ALABEL: while (expression) {
    stmt_1;
    stmt_2;
    stmt_3;
}
Form: LABEL: until (EXPR) BLOCK
Example:
ALABEL: until (expression) {    # while not
    stmt_1;
    stmt_2;
    stmt_3;
}
Form: LABEL: for (EXPR; EXPR; EXPR) BLOCK
Example:
ALABEL: for (initial exp; test exp; increment exp) {    # e.g., ($i=1; $i<5; $i++)
    stmt_1;
    stmt_2;
    stmt_3;
}
Form: LABEL: foreach VAR (EXPR) BLOCK
Example:
ALABEL: foreach $i (@aList) {
    stmt_1;
    stmt_2;
    stmt_3;
}
The last operator, as well as the next and redo operators that follow, apply only to loop control structures. They cause execution to jump from where they occur to some other position, defined with respect to the block structure of the encompassing control structure. Thus, they function as limited forms of goto statements.
Last causes control to jump from where it occurs to the first statement following the enclosing block.
Example:
ALABEL: while (expression) {
    stmt_1;
    stmt_2;
    last;
    stmt_3;
}
# last jumps to here
If last occurs within nested control structures, the jump can be made to the end of an outer loop by adding a label to that loop and specifying the label in the last statement.
Example:
ALABEL: while (expression) {
    stmt_1;
    stmt_2;
    BLABEL: while (expression) {
        stmt_a;
        stmt_b;
        last ALABEL;
        stmt_c;
    }
    stmt_3;
}
# last jumps to here
The next operator is similar to last except that execution jumps to the end of the block, but remains inside the block, rather than exiting the block. Thus, iteration continues normally.
Example:
ALABEL: while (expression) {
    stmt_1;
    stmt_2;
    next;
    stmt_3;
    # next jumps to here
}
As with last, next can be used with a label to jump with respect to an outer designated loop.
The redo operator is similar to next except that execution jumps to the top of the block without re-evaluating the control expression.
Example:
ALABEL: while (expression) {
    # redo jumps to here
    stmt_1;
    stmt_2;
    redo;
    stmt_3;
}
As with last and next, redo can be used with a label to jump with respect to an outer designated loop.
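The effect of labels on last and next is easiest to see in a nested loop. The sketch below uses made-up data and a hypothetical label, ROW, on the outer loop; both jumps are made with respect to that outer loop.

```perl
@rows = ("1 2 3", "4 -1 6", "7 8 9");
$sum = 0;

ROW: foreach $row (@rows) {
    foreach $item (split(/ /, $row)) {
        if ($item < 0) {
            next ROW;         # abandon the rest of this row, go on to the next
        }
        if ($item > 7) {
            last ROW;         # leave both loops entirely
        }
        $sum += $item;
    }
}

print "sum = $sum\n";         # adds 1+2+3, then 4, then 7
```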
Functions include two parts: the invocation and definition.
Form:
&name()
Example:
&aFunction()
The definition consists of the keyword sub; followed by the name of the function, without the ampersand prefix; followed by the block of code that is executed when the function is called, enclosed within curly braces.
Example:
sub aFunction {
    stmt_1;
    stmt_2;
    stmt_3;
}
Example:
sub aFunction {
    stmt_1;
    stmt_2;
    $a = $b + $c;
}
In this example, the function will return the value of $a at the time when the function ends. Note: operators such as print return values of 0 or 1, indicating failure or success. Thus, print ($a); as the last statement in a function would result in a return of 0 or 1 for the function, not the value of $a.
Arguments are passed to a function through the special array @_.
Example:
&aFunction ($a, "Literal_string", $b);

sub aFunction {
    foreach $temp (@_) {
        print "$temp \n";
    }
}
Local variables are, by convention, defined at the top of a Perl function. They are defined by the keyword, local, followed by a list of variable names, within parentheses.
Example:
&aFunction ($a, $b);

sub aFunction {
    local ($aLocal, $bLocal);
    $aLocal = $_[0];
    $bLocal = $_[1];
}
In this example, $aLocal and $bLocal will have the same values inside the function as $a and $b have at the time the function was invoked. Changes to either local variable inside the function, however, will not affect the values of $a or $b.
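The pieces of a function (arguments in @_, local variables, and the returned value of the last expression) can be combined in one short sketch. The function name addPair and the variable names are hypothetical.

```perl
sub addPair {
    local ($first, $second) = @_;   # copy the arguments into local variables
    $first = $first + $second;      # changes only the local copy
    $first;                         # value of the last expression is returned
}

$x = 3;
$y = 4;
$result = &addPair($x, $y);

print "result = $result, x is still $x\n";
```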
STDIN is accessed through the angle brackets (<>) operator. When placed in a scalar context, the operator returns the next line; when placed in an array context, it returns the entire file, one line per item in the array.
Examples:
$a = <STDIN>;   # returns next line in file
@a = <STDIN>;   # returns entire file
STDOUT is the default file accessed through a print statement.
STDERR is the file used by the system to which it writes error messages; it is usually mapped to the terminal display.
Files are opened with the open statement. By convention, Perl filehandle names are written in all uppercase, to differentiate them from keywords and function names.
Form:
open (FILEHANDLE, "filename");
Example:
open (INPUT, "index.html");
In the above, the file is opened for read access. It may also be opened for write access and for update. The difference between the two is that write replaces the file contents, whereas update appends new data to the end of the current contents. These two options are indicated by appending either a single or a double greater than (>) symbol to the file name as a prefix:
Form:
open (FILEHANDLE, ">filename");    # write access
open (FILEHANDLE, ">>filename");   # update
Examples:
open (INPUT, ">index.html");
open (INPUT, ">>index.html");
Since Perl will continue operating regardless of whether the open was successful or not, you need to test the open statement. Like other Perl constructs, the open statement returns a true or false value, indicating success or failure. One convenient construct in which this value can be tested and appropriate response taken is with the logical or and die operators. die can be used to deliver a message to STDERR and terminate the Perl program. The following construct can be paraphrased: "open or die."
Form:
open (FILEHANDLE, "filename") || die "Message written to STDERR";
Example:
open (INPUT, "index.html") || die "Error opening file index.html ";
Open files are closed automatically when the program ends or when another open for the same filehandle is encountered. They may also be closed explicitly.
Form:
close (FILEHANDLE);
Example:
close (INPUT);
Form:
<FILEHANDLE>
Example:
while (<INPUT>) {           # read one line at a time until EOF
    chop;                   # remove newline
    print "line = $_ \n";   # print line read using default scalar variable
}
Data is written to a file with the print operator.
Form:
print FILEHANDLE (content);
Example:
print OUTPUT "$next \n"; # outputs contents of $next followed by newline char.
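The open, print, read, and close operations fit together as in the sketch below, which writes a scratch file, appends to it, and reads it back. The file name perl_notes_demo.txt is hypothetical, and the file is removed at the end with unlink.

```perl
$file = "perl_notes_demo.txt";    # hypothetical scratch file name

open(OUTPUT, ">$file") || die "Cannot open $file for writing\n";
print OUTPUT "first line\n";
print OUTPUT "second line\n";
close(OUTPUT);

open(OUTPUT, ">>$file") || die "Cannot open $file for appending\n";
print OUTPUT "third line\n";
close(OUTPUT);

$count = 0;
open(INPUT, $file) || die "Cannot open $file for reading\n";
while (<INPUT>) {
    chop;                         # remove the trailing newline
    $count++;
    $lastLine = $_;
}
close(INPUT);

unlink($file);                    # remove the scratch file
print "$count lines, last = $lastLine\n";
```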
Form:
-SYMBOL # where SYMBOL is a single character designator
See a Perl manual for the complete list; some of the more useful ones include the following:
Examples:
-r   # readable
-w   # writeable
-x   # executable
-o   # owned by user
-e   # exists
-z   # zero content
-s   # nonzero content (size)
-f   # plain file
-d   # directory
-l   # symbolic link
-T   # text file
-B   # binary file
-M   # modification age
-A   # access age
The DBM provides a transparent interface between associative arrays internal to a Perl program and a pair of files that are managed by DBM in which the keys and corresponding values are stored. Thus, when one inserts, changes, or deletes a key and/or value, the system makes the appropriate update to the persistent file version of the array. When accessing, each ( ), which returns both key and value, is a more efficient operator than foreach ( ).
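A file test is written as the operator followed by a file name or filehandle, and it is normally used as a condition. The sketch below applies three of the tests to the current directory; the variable names are illustrative.

```perl
$path = ".";    # the current directory

$exists = (-e $path) ? "yes" : "no";
$isDir  = (-d $path) ? "yes" : "no";
$isFile = (-f $path) ? "yes" : "no";

print "exists: $exists, directory: $isDir, plain file: $isFile\n";
```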
There are only two operators associated with DBM associative arrays: dbmopen and dbmclose. Once such an array has been opened, all interaction with it is conventional.
Form:
dbmopen(%ASSOC_ARRAY, "dbmfile", $mode);
Example:
dbmopen(%AN_ASSOC_ARRAY, "name_address", $mode);
In this example, the array, %AN_ASSOC_ARRAY, can be created and manipulated within the Perl program. Its actual values will be maintained, however, in two files managed by DBM, name_address.dir and name_address.pag. The $mode is a standard UNIX access mode, written as an octal number such as 0755. For details, see the discussion under chmod in the section on System Operators, below.
Form:
dbmclose(%ASSOC_ARRAY);
Example:
dbmclose(%AN_ASSOC_ARRAY);
Perl includes an evaluation component that, given a pattern and a string in which to search for that pattern, determines whether -- and if so, where -- the pattern occurs. These patterns are referred to as regular expressions.
Perl provides a general mechanism for specifying regular expressions. By default, regular expressions are strings that are bounded or delimited by slashes, e.g., /cat/. By default, the string that will be searched is $_. However, the delimiter can be changed to virtually any nonalphanumeric character by preceding the first occurrence of the new delimiter with an m, e.g., m#cat#. In this example, the pound sign (#) becomes the delimiter. And, of course, one can apply the expression to strings other than those contained in the default variable, $_, as will be explained below.
In addition to providing a general mechanism for evaluating regular expressions, Perl provides several operators that perform various manipulations on strings based upon the results of the evaluation. Several of these were introduced in the Perl/CGI Tutorial. They included the substitution and split operators. They will be described in more detail, below.
The discussion will begin by describing the various mechanisms for specifying patterns and then discuss expression-based operators.
The simplest pattern is a literal string, such as /cat/, as discussed in the introduction to this section. Normally, such an expression would appear in some conditional context, such as an if statement.
Example:
if (/cat/) {
    print "cat found in $_\n";
}
Example:
/.at/ # matches "cat," "bat", but not "at"
An explicit category or class of characters can be specified by placing the characters in square brackets.
Example:
/[0123456789]/
Ranges of characters can also be specified:
Examples:
/[0-9]/
/[a-z]/
/[A-Z]/
/[0-9a-zA-Z]/
Several predefined categories are available. These include:
\d   # digits
\w   # word characters (letters, digits, and underscore)
\s   # space
\D   # not digits
\W   # not word characters
\S   # not space
A character class or range can be negated by placing a caret ( ^ ) immediately after the opening square bracket.
Example:
/[^0-9]/ # not a digit
Examples:
/a*t/       # any number of a's followed by t
/a+t/       # one or more a's followed by t
/a?t/       # zero or one a followed by t
/a{2,4}t/   # between 2 and 4 a's followed by t
/a{2,}t/    # 2 or more a's followed by t
/a{2}t/     # exactly 2 a's followed by t
Pattern matching is greedy: when a quantified pattern could match sequences of different lengths starting at the same position, the longest one is chosen. This affects pattern-based operators such as substitution, discussed below.
Other information available in variables include $&, the sequence that matched; $`, everything in the string up to the match; and $', everything in the string beyond the match.
Examples:
/c(.*)t/   # in caaat, \1 is "aaa"; $1 has the same value
           # $& is "caaat", the entire sequence that matched
           # $` and $' are both empty, since nothing precedes or follows the match
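The match variables are easier to tell apart when the match sits in the middle of a longer string. The following sketch uses a made-up string in $_; the names $inside, $whole, $before, and $after are illustrative.

```perl
$_ = "a caaat sat";

if (/c(a+)t/) {
    $inside = $1;     # "aaa": the part matched inside the parentheses
    $whole  = $&;     # "caaat": the entire match
    $before = $`;     # "a ": everything before the match
    $after  = $';     # " sat": everything after the match
}

print "[$before][$whole][$after] inner=$inside\n";
```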
Example:
/\bat/   # matches "at" and "attention", but not "bat"
/at\b/   # matches "at" and "bat", but not "attention"
/at\B/   # matches "attention" but not "at" and "bat"
/^at/    # matches "at $5.00, it is a bargain" but not "where you are at"
/at$/    # matches "where you are at" but not "at $5.00, it is a bargain"
/\$/     # matches "at $5.00, it is a bargain"
Example:
$word = "cat";
/$word/   # matches strings that contain "cat"
If the =~ operator is used to bind the pattern to a string other than $_, the search is performed in the string specified.
Example:
$a =~ /cat/        # does the content of $a contain "cat"?
<STDIN> =~ /cat/   # does the next line of input contain "cat"?
Example:
/cat/i # matches "cat", "CAT", "Cat", etc.
Form:
s/pattern/replacement/
s/pattern/replacement/gi
$var =~ s/pattern/replacement/
In the second version, ( g ) and ( i ) indicate that the replacement should be made for all occurrences and that the match should ignore case. In the third version, the action is performed on the variable indicated, instead of on the default variable, $_.
Examples:
s/cat/dog/         # replaces "cat" with "dog" in $_
s/cat/dog/gi       # same thing, but applies to "CAT", "Cat" wherever they appear
$a =~ s/cat/dog/   # applies the operation to $a
Form:
@var = split(/pattern/, string);
@var = split(/pattern/)
If no string is specified, the operator is applied to $_.
Examples:
@a = split(/cat/, $aString);
@a = split(/cat/);
In the first example, the contents of $aString are split wherever "cat" occurs and the resulting parts are assigned to the array, @a. In the second, the operator applies to the contents of $_.
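The join operator is the inverse of split. It takes a separator string and a list of values, concatenates the values with the separator between each pair, and returns the resulting string.
Form:
$var = join($separator, $item_1, $item_2, . . .);
Example:
$a = join("", "cat", "dog", "bird");   # returns "catdogbird"
$a = join($b, @c);                     # joins the items of @c, separated by $b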
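Substitution, split, and join are often used together on delimited records. The sketch below uses a made-up colon-separated record; the variable names are illustrative.

```perl
$record = "cat:dog:bird";

@fields = split(/:/, $record);    # ("cat", "dog", "bird")
$count  = @fields;                # 3: an array in scalar context gives its length
$joined = join("-", @fields);     # "cat-dog-bird"

$copy = $record;
$copy =~ s/:/, /g;                # "cat, dog, bird"; $record is unchanged

print "$count fields: $joined / $copy\n";
```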
Perl system operators can be divided into two large categories: file/directory operators and process operators. However, while useful, the distinction does not always hold. For example, Perl provides a mechanism whereby operators can be accessed as if they were files, permitting a Perl script to "read" the data they produce or to "write" to them to supply input data.
Form:
chdir ("/path/ . . . /directory");
Example:
chdir ("/afs/unc.cs.edu/home/jbs/public_html/perl");
Note that the path is defined within the namespace of the UNIX file system, not the namespace as configured for a Web server. In the UNC CS environment, users have individual home directories under the UNIX directory, /home. Under each user's home directory is a public_html directory, intended to be used for his or her Web-related materials.
When specifying a path for a Web server, ~login can be used to abbreviate the path to the user's home directory because of the way the server is configured. As a result, the Web server automatically inserts /public_html into the path following ~login. Consequently, for files and directories below public_html, one MUST NOT specify /public_html; otherwise, that directory will be duplicated in the Web path.
When specifying the path for a Perl program, /home/login is used instead of the tilde abbreviation and public_html MUST be included, if it lies along the path. One implication of this difference in the two name spaces is that Perl programs (as well as executables in another language) can reference files outside the subset of the filespace for which the server is configured.
Form:
opendir (DIR_NAME, "/path/ . . . /directory");
Example:
opendir (DIR, "/afs/unc.cs.edu/home/jbs/public_html/perl");
Form:
readdir(DIR_NAME)
Example:
$name = readdir(DIR);   # just one name
@name = readdir(DIR);   # all names
Form:
closedir(DIR_NAME)
Example:
closedir(DIR);
Form:
symlink("path", "LINK_NAME");
Example:
symlink("/afs/unc.cs.edu/home/jbs/public_html/perl", "DIR_PERL");
In this example, the separate directory I use for Perl scripts can be referenced directly from the context (directory) where a Perl script is run by a Web server, such as cgi-bin or wwwc-bin.
Form:
link("path/file", "LINK_NAME");
Example:
link("DIR_PERL/hello.html", "HELLO_HTML.PL");
Note that the path in this example assumes the symlink created in the prior step and that the hard link defined here is to a particular file, not to a directory. In this particular case, the link is NOT actually made because the two files exist on different volumes.
Form:
readlink("LINK_NAME");
Example:
readlink("DIR_PERL");
Form:
unlink("LINK_NAME");
Example:
unlink("HELLO_HTML.PL");
Form:
rename("old_file_name", "new_file_name");
Example:
rename("hello_html", "hello_html.pl");
Form:
mkdir("new_dir_name", mode);
Example:
mkdir("perl_scripts", 0777);
Form:
rmdir("dir_name");
Example:
rmdir("perl_scripts");
Modes, again, are composable octal values that can be found in the man page for chmod. In general a value of "4" indicates read, "2" write, and "1" execute. When the value occurs in the highest order position, it refers to the owner; in the middle position, it refers to the owner's group; and in the low-order position, it refers to others. Thus, 755 gives read and execute permission to everyone, but reserves write access for only the owner.
Form:
chmod(mode, "file_name");
Example:
chmod(0775, "hello_html.pl");
The next group of operators are concerned with UNIX processes.
Process operators range in functionality from the capability to execute a UNIX system operator as is normally done from a shell to the operators used to implement a client/server architecture using forked processes with multiple threads of execution. The discussion will proceed from the simple to the more complex.
When the process executes, it inherits the files of the Perl process from which it is launched. Thus, output produced by the process normally goes to the standard files, such as STDOUT and STDERR, whereas input normally comes from the standard input file, STDIN. However, data can be redirected to/from other files.
In addition to inheriting the files of the parent process, the child also inherits its environment variables. These are available through the %ENV associative array. For a more thorough discussion of environment variables within a Web context, see my CGI/PERL Tutorial.
Form:
system("process");
Example:
system("pwd");
system("xpwd");
In the first example, Perl asks the system to execute the pwd process which, in turn, determines the path of the current location and writes that information to STDOUT, which is normally displayed on the terminal screen; it then returns a value of zero to the Perl program from which it was launched, indicating success.
In the second example, Perl asks the system to execute a nonexistent process. The system operator cannot oblige (fails) and returns a nonzero value (e.g., 65,280).
Form:
`process`
Example:
$A = system("pwd");
$A = `pwd`;
In the first example, the pwd process is launched and $A is given the value 0 if it executes properly or some nonzero value (e.g., 65,280) if it does not.
In the second example, the pwd process is launched, but instead of writing its results to some file (e.g., STDOUT), the result is returned by the backquotes operator and assigned to $A. Thus, in this case, $A ends up with the character string that designates the path to the directory where the Perl script is executed (e.g., /afs/cs.unc.edu/home/jbs/public_html/perl).
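The difference between the two forms can be seen by running the same kind of command both ways. This sketch assumes a UNIX shell with the echo command available; the variable names are illustrative.

```perl
$status = system("echo hello from system");   # output goes to STDOUT
$output = `echo hello from backquotes`;       # output is captured instead
chop($output);                                # remove the trailing newline

print "status = $status, captured = '$output'\n";
```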
The filehandle form of process interaction is based on underlying UNIX pipes; consequently, if the process is to be accessed through a read, a vertical bar (|) must be appended to the right side of the process name; conversely, if it is to be accessed through a write, the vertical bar goes on the left side.
Form:
open(PROC_HANDLE, "process|");
close(PROC_HANDLE);
open(PROC_HANDLE, "|process");
close(PROC_HANDLE);
Example:
open (PWD_HANDLE, "pwd|");
# read and process pwd data
close(PWD_HANDLE);
open (MORE_HANDLE, "|more");
# generate and write data to more
close(MORE_HANDLE);
In the first example, the process, pwd, will be opened for read access in the Perl program. In the second example, the process, more, will be opened for write access.
In both cases, the processes should be closed since they will continue running otherwise. Any I/O directed to or from the processes would, of course, be done between the open and close statements.
Form:
exec "process";
Example:
exec "pwd";
Note: exec causes problems for the Web server when used in the CGI context.
Form:
(fork)
Example:
if (fork) {
    # parent process
} else {
    # child process
}
Form:
exit;
Example:
unless (fork) {    # zero (0) condition
    # child process
    exit;
}
# nonzero condition
# parent process
In this example, the unless functions as an if not construct; hence the false condition in which the child process is defined comes first. Once the code in that condition is completed, if it were not stopped (i.e., by the exit), the child process would continue beyond the curly braces where the parent process code appears.
Form:
wait
Example:
unless (fork) {    # zero (0) condition
    # child process
    exit;
}
# nonzero condition
# parent process
wait;              # parent waits until child completes before continuing
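A complete, runnable version of the fork, exit, and wait pattern might look like the following sketch. The exit status of 3 is an arbitrary value chosen for illustration, and the check for a failed fork is an addition not discussed above.

```perl
$pid = fork;
die "fork failed\n" unless defined($pid);

unless ($pid) {
    # child process: fork returned 0
    exit 3;               # hypothetical exit status, chosen for illustration
}

# parent process: fork returned the child's process ID
$done   = wait;           # block until the child finishes; returns its process ID
$status = $? >> 8;        # the high byte of $? holds the child's exit status

print "child $done exited with status $status\n";
```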