Perl Basics

Introduction

This discussion of Perl Basics is intended to complement, not replace, other Perl resources, such as published texts and reference books or network libraries and discussion groups. How?

It will try to do two things. First, it will provide a succinct summary of major Perl elements. Second, it will provide perspective and relate features to one another. Thus, it will be a kind of extended and structured checklist, with commentary.

The discussion will be built around answering two questions:

What are the things Perl provides you to work with?
What can you do to those things?

If you are just learning Perl, I urge you to read a book, such as the Schwartz (with Wall) introductory text, Learning Perl, published by O'Reilly & Associates, or their more advanced Programming Perl. You may also wish to look at my on-line Perl/CGI Tutorial.

Read whatever book you choose at least twice. The first time, fast, to get a sense of the whole with respect to both capabilities and scale. The second time, slowly. Work through the concepts, trying things out by writing short, throw-away programs. There's a world of difference between comprehension and generation. It's one thing to understand a programming language to the extent that you can follow someone else's code, but a different level and kind of knowledge is required to write an extensive program. The only way to develop such knowledge is through hands-on experience.

This discussion is likely to be most useful during your second, slower pass through the book you have selected, and afterwards, when you recall a capability but forget the operator or its syntax.

The discussion will include six major sections:

Variables and their Basic Operators
Control structures
Functions
Input/Output
Regular Expressions
System Operators

Condensing this material into a small number of categories, I hope, will give you a better sense of the whole, facilitate learning and, later, help you find what you need. With a little extra push, I might have been able to reduce these six categories to four, but that would have overloaded some and distorted others. So, I have compromised at six.

1. Variables and their Basic Operators

1.1 Variables

The things Perl makes available to the programmer to work with are variables. Unlike many other programming languages, Perl does not require separate declaration of variables; they are defined implicitly within some expression, such as an assignment statement.

Perl provides three kinds of variables: scalars, arrays, and associative arrays. The initial character of the name identifies the particular type of variable and, hence, its functionality.

$name

scalar variable, holding a single value that is either a number or a string; Perl converts between the two as needed, and it does not differentiate between integers and reals.

$aVar = 4;
$bVar = "a string of words";
$cVar = 4.5; # a decimal number
$dVar = 3.14e10; # a floating-point number in exponential notation

@name()

array; an ordered, one-dimensional list of scalars. The "at" symbol (@) prefixes the name of an array as a whole, and a list of values is written in parentheses. Individual elements within an array are scalars, so they take the $ prefix, and the index, counting from zero, is placed in square brackets.

@aList = (2, 4, 6, 8);
@bList = @aList; # creates new array and gives it values of @aList
$aList[0] = 1; # changes the value of first item from 2 to 1

%name{}

associative array; a special array, best pictured as a two-column table, ideal for handling attribute/value pairs. The first column of each row holds a key and the second holds the value associated with that key. The percent sign (%) prefixes the name of an associative array as a whole, whereas individual elements are scalars, take the $ prefix, and are indexed by a key placed in curly braces. Instead of using numbers to index the array, keys such as $name{"QUERY_STRING"} are used to reference the value associated with that particular key, i.e., QUERY_STRING. Since the associated value is a scalar, the variable has a $ prefix. The rows are not kept in any particular order.

$aAA{"A"} = 1;  # creates first row of assoc. array
$aAA{"B"} = 2;  # creates second row of assoc. array
%bAA = %aAA;  # creates new assoc. array and gives it values of %aAA
$aAA{"A"} = 3;  # changes the value for key "A" from 1 to 3
%aAA = ("A", 1, "B", 2);  # same as first two stmts., above

1.2 Operators

If variables are the nouns Perl provides, operators are the verbs. Operators access and change the values of variables. Some, such as assignment, apply to all three kinds of variables; however, most are specialized with respect to a particular type. Consequently, operators will be discussed with respect to the three basic types of variables.

1.2.1 Scalar Operators

assignment

see above

hex and octal assignment

$aVar1 = 0xff; # hex assign. for 255 decimal
$aVar2 = 0377; # octal assign. for same thing

single and double quote strings

$aVar1 = 0xff; # set $aVar1 = 255 decimal
$aVar2 = 'aVar2 = $aVar1'; # set $aVar2 = literal string
$aVar3 = "aVar3 = $aVar1"; # set $aVar3 = variable interpolated string, with $aVar1 replaced by 255
$aVar4 = 'in single quotes, the only backslash escapes are \' and \\';
backslash escapes recognized in double quote strings include:

\n  newline
\a  bell
\\  backslash
\"  double quote
\l  lowercase next letter
\u  uppercase next letter
\L  lowercase letters follow
\U  uppercase letters follow
\E  terminate \L or \U
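A short sketch contrasting single- and double-quoted strings. It assumes Perl 5 with use strict and use warnings (which the text above does not require); the variable names are illustrative only.

```perl
use strict;
use warnings;

my $n      = 0xff;                 # 255 decimal
my $single = 'value = $n\n';       # no interpolation: $n and \n stay literal
my $double = "value = $n\n";       # $n becomes 255, \n becomes a newline
my $shout  = "\Uabc\E-def";        # \U uppercases until \E: "ABC-def"
my $proper = "\uperl";             # \u uppercases only the next letter: "Perl"

print $single, "\n";
print $double;
print "$shout $proper\n";
```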

operators for numbers


+   plus
-   minus
*   multiply
/   divide
**  exponentiation
%   modulus  # e.g., 7 % 3 = 1
==  equal
!=  not equal
<   less than
>   greater than
<=  less than or equal to
>=  greater than or equal to
+=  binary assignment  # e.g., $A += 1;
-=  same, subtraction
*=  same, multiplication
++  autoincrement  # e.g., ++$A; also, $A++
--  autodecrement
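The numeric operators can be tried out with a few throw-away lines like the following (a Perl 5 sketch; names are illustrative). Note the difference between the prefix and postfix forms of ++.

```perl
use strict;
use warnings;

my $mod   = 7 % 3;      # 1
my $power = 2 ** 10;    # 1024
my $ratio = 7 / 2;      # 3.5: division is not truncated to an integer
my $count = 5;
$count += 10;           # 15
$count *= 2;            # 30
my $pre  = ++$count;    # increments first: $pre and $count are both 31
my $post = $count++;    # returns the old value: $post is 31, $count becomes 32

print "$mod $power $ratio $pre $post $count\n";
```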

operators for strings


.   concatenate
x n repetition  # e.g., "A" x 3 => "AAA"
eq  equal
ne  not equal
lt  less than
gt  greater than
le  less than or equal to
ge  greater than or equal to
chop()  # remove last character in string
index ($string, $substring)  # position of substring in string, zero-based; -1 if not found
index ($string, $substring, $start)  # begin searching at position $start
substr($string, $start, $length)  # substring
substr($string, -$start, $length)  # negative start counts back from end of string
substr($string, $start)  # rest of string
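A minimal Perl 5 sketch exercising the string operators listed above; the sample strings are arbitrary. The last two lines show why eq/lt and ==/< are separate operator families.

```perl
use strict;
use warnings;

my $s      = "concatenate";
my $joined = "cat" . "alog";          # "catalog"
my $row    = "-" x 5;                 # "-----"
my $first  = index($s, "cat");        # 3 (positions count from zero)
my $later  = index($s, "at", 5);      # search starts at position 5: 8
my $none   = index($s, "dog");        # -1: not found
my $middle = substr($s, 3, 3);        # "cat"
my $tail   = substr($s, -4);          # last four characters: "nate"
my $line   = "hello\n";
chop($line);                          # removes the final character (the newline)
my $text_order = ("10" lt "9") ? 1 : 0;  # 1: strings compare character by character
my $num_order  = (10 < 9) ? 1 : 0;       # 0: numbers compare by value

print "$joined $row $first $later $none $middle $tail $line\n";
```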

conversion between numbers and strings

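Automatic, determined by the operator, if reasonable (e.g., "1.23" as a string converts to 1.23 as a number, and a number used with a string operator such as . converts to a string). Only the leading numeric part of a string is used (e.g., "12abc" converts to 12); if the string does not begin with a number, it converts to zero (0) (e.g., "not_a_number" converts to 0).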
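A small sketch of the automatic conversions. use warnings is deliberately left off here, since with it Perl prints a warning (while still converting) whenever a string that is not entirely numeric is used as a number.

```perl
use strict;
# "use warnings" omitted on purpose: it would warn about "12abc" and "not_a_number".

my $sum    = "1.5" + 1;           # 2.5: the string becomes a number
my $prefix = "12abc" + 0;         # 12: only the leading numeric part counts
my $zero   = "not_a_number" + 0;  # 0: no leading number at all
my $text   = 3 . 4;               # "34": . converts numbers to strings

print "$sum $prefix $zero $text\n";
```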

conversion between packed and unpacked forms

It is often necessary to convert from a character or scalar form to a packed binary representation, and back. A common example is building an IP address data structure. The two operators for doing this are pack and unpack. Pack takes a format specification and a list of values and packs them into a character string; conversely, unpack takes a format and a character string and breaks the string apart, according to the format, and assigns the parts to a list of variables.

Form:


pack("format", $value1, $value2, . . .);
unpack ("format", character_string);

Example:


$IP = pack("CCCC", 152, 2, 128, 184);  # create IP address
($var1, $var2, $var3, $var4) = unpack("CCCC", $IP);  # inverse of the above

Format specifications can be given literally (in quotes) or they can be assigned to a string variable. There are a number of options available. See a standard text or the Perl man page for a complete list. In the example above, the "C" stands for an unsigned character value. One useful format to know is the following, which can be used to construct the Internet socket address structure passed to bind or connect:


$socket_addr_ptrn = 'S n a4 x8';

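The "S" denotes an unsigned "short" integer in the machine's native byte order (here, the address family). The "n" is a short integer in network byte order (the port number). The "a4" is a four-byte string, null-padded if shorter (the packed IP address). And, the "x8" is eight null bytes of padding.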
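The sketch below packs a socket address with this format and unpacks it again. It assumes AF_INET is 2, as on common UNIX systems, and a platform whose address layout matches 'S n a4 x8'; the port number 80 is illustrative. Current Perl code would more often use the Socket module's sockaddr_in than a hand-written format.

```perl
use strict;
use warnings;

my $AF_INET = 2;    # assumption: the usual value of AF_INET on UNIX systems
my $port    = 80;   # illustrative port number

my $ip   = pack("CCCC", 152, 2, 128, 184);            # 4-byte packed IP address
my $addr = pack('S n a4 x8', $AF_INET, $port, $ip);   # 2 + 2 + 4 + 8 = 16 bytes

my ($family, $port_back, $ip_back) = unpack('S n a4', $addr);
my $dotted = join(".", unpack("CCCC", $ip_back));     # back to dotted form

print length($addr), " bytes: family $family, port $port_back, host $dotted\n";
```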

<STDIN> as scalar

Designates the next line of text from standard input, including its trailing newline.

1.2.2 Array Operators

assignment


@aList = (2, 4, 6, 8);  # explicit values
@aList = (1..4);  # range of values
@aList = (1, "two", 3, "four");  # mixed values
@aList = ();  # empty list
@bList = @aList;

access

Individual items in array accessed as scalars.


$aList[0]  # first item in @aList
@aList[0,1]  # slice, first two items in @aList (note the @ prefix)
$aList[$too_big]  # access beyond array bounds returns undef, which acts as 0 or '' depending on context

additional operators


$#aList  # index of last item
push (@aList, $aNewItem);  # same as @aList = (@aList, $aNewItem)
$LastItem = pop (@aList);  # inverse of push
unshift (@aList, $aNewItem);  # same as @aList = ($aNewItem, @aList)
$FirstItem = shift (@aList);  # inverse of unshift
@aList = reverse (@aList);  # reverse items
@aList = sort (@aList);  # sort items in string (ASCII) order, so 10 sorts before 9
chop (@aList);  # remove last character from each item
@aList = <STDIN>; # one line of input per item
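A throw-away Perl 5 sketch of the array operators; the values are arbitrary. The numeric sort uses a comparison block with <=>, a Perl 5 form not listed above.

```perl
use strict;
use warnings;

my @list = (2, 4, 6, 8);
my @pair = @list[0, 1];           # slice: (2, 4)
my $last_index = $#list;          # 3

push(@list, 10);                  # (2, 4, 6, 8, 10)
my $popped = pop(@list);          # 10; @list is (2, 4, 6, 8) again
unshift(@list, 0);                # (0, 2, 4, 6, 8)
my $shifted = shift(@list);       # 0; @list is (2, 4, 6, 8) again
my @backwards = reverse(@list);   # (8, 6, 4, 2)

my @unsorted   = (10, 9, 100);
my @as_strings = sort @unsorted;                  # (10, 100, 9): string order
my @as_numbers = sort { $a <=> $b } @unsorted;    # (9, 10, 100): numeric order

print "@pair | $last_index | @backwards | @as_strings | @as_numbers\n";
```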

1.2.3 Associative Array Operators

assignment


$aAA{"A"} = 1;  # creates first row of assoc. array
$aAA{"B"} = 2;  # creates second row of assoc. array
%aAA = ("A", 1, "B", 2);  # same as first two stmts., above
%bAA = %aAA;  # creates new assoc. array and gives it values of %aAA
$aAA{"A"} = 3;  # changes the value for key "A" from 1 to 3

additional operators


keys (%aAA)  # list of keys for %aAA
values (%aAA)  # list of values for %aAA
each (%aAA)  # next key/value pair, as list
delete $aAA{"A"};  # deletes key/value pair referenced
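A Perl 5 sketch of the associative array operators; the names and ages are illustrative. Because keys come back in no fixed order, the list of keys is sorted before use. exists is a Perl 5 operator not listed above.

```perl
use strict;
use warnings;

my %age = ("alice", 31, "bob", 27);     # key/value pairs, as in the text
$age{"carol"} = 45;                     # adds a third pair

my @names = sort keys(%age);            # keys come back in no fixed order, so sort
my $total = 0;
foreach my $years (values(%age)) {      # 31 + 27 + 45
    $total += $years;
}

my %copy;
while (my ($name, $years) = each(%age)) {   # one pair per call
    $copy{$name} = $years;
}

delete $age{"bob"};                          # removes the pair for "bob"
my $has_bob = exists($age{"bob"}) ? 1 : 0;   # tests whether a key is present

print "@names $total ", scalar(keys(%copy)), " $has_bob\n";
```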

2. Control Structures

Perl is a procedural language in which control flows naturally from the first statement in the program to the last statement unless something interrupts. Some of the things that can interrupt this linear flow are conditional branches and loop structures. Perl offers approximately a dozen such constructs, which are described below. The basic form will be shown for each followed by a partial example.

statement block

Statement blocks provide a mechanism for grouping statements that are to be executed as a result of some expression being evaluated. They are used in all of the control structures discussed below. Statement blocks are designated by enclosing curly braces.

Form: BLOCK

Example:


{
     stmt_1;
     stmt_2;
     stmt_3;
}

if statement

Form: if (EXPR) BLOCK

Example:


if (expression) {
     true_stmt_1;
     true_stmt_2;
     true_stmt_3;
}

if/else statement

Form: if (EXPR) BLOCK else BLOCK

Example:


if (expression) {
     true_stmt_1;
     true_stmt_2;
     true_stmt_3;
} else {
     false_stmt_1;
     false_stmt_2;
     false_stmt_3;
}

if/elsif/else statement

Form: if (EXPR) BLOCK elsif (EXPR) BLOCK . . . else BLOCK  (the keyword is spelled elsif, not elseif)

Example:


if (expression_A) {
     A_true_stmt_1;
     A_true_stmt_2;
     A_true_stmt_3;
} elsif (expression_B) {
     B_true_stmt_1;
     B_true_stmt_2;
     B_true_stmt_3;
} else {
     false_stmt_1;
     false_stmt_2;
     false_stmt_3;
}
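A complete if/elsif/else chain, wrapped in a small function (functions are covered in section 3) so it can be called with several values. The function name classify is hypothetical.

```perl
use strict;
use warnings;

sub classify {
    my ($n) = @_;                 # the number to classify
    if ($n < 0) {
        return "negative";
    } elsif ($n == 0) {           # Perl spells it elsif
        return "zero";
    } else {
        return "positive";
    }
}

my ($neg, $zero, $pos) = (classify(-5), classify(0), classify(7));
print "$neg $zero $pos\n";
```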

while statement

Form: LABEL: while (EXPR) BLOCK

The LABEL in this and the following control statements is optional. In addition to documenting the loop, it serves as a target for the quasi-goto operators: last, next, and redo. Perl convention calls for labels to be written in uppercase to avoid confusion with variables or keywords.

Example:


ALABEL:  while (expression) {
     stmt_1;
     stmt_2;
     stmt_3;
}

until statement

Form: LABEL: until (EXPR) BLOCK

Example:


ALABEL:  until (expression) {  # while not
     stmt_1;
     stmt_2;
     stmt_3;
}

for statement

Form: LABEL: for (EXPR; EXPR; EXPR) BLOCK

Example:


ALABEL:  for (initial exp; test exp; increment exp) {  # e.g.,  ($i=1; $i<5; $i++)
     stmt_1;
     stmt_2;
     stmt_3;
}

foreach statement

Form: LABEL: foreach VAR (EXPR) BLOCK

Example:


ALABEL:  foreach $i (@aList) {
     stmt_1;
     stmt_2;
     stmt_3;
}
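The four loop forms can be compared side by side by summing the same list with each one. This is a Perl 5 sketch (lexical my loop variables); the data is arbitrary.

```perl
use strict;
use warnings;

my @list = (1, 2, 3, 4);
my ($for_sum, $foreach_sum, $while_sum, $until_sum) = (0, 0, 0, 0);

for (my $i = 0; $i <= $#list; $i++) {   # initial; test; increment
    $for_sum += $list[$i];
}

foreach my $item (@list) {              # $item takes each element in turn
    $foreach_sum += $item;
}

my $j = 0;
while ($j < @list) {                    # @list in numeric context: its length
    $while_sum += $list[$j];
    $j++;
}

my $k = 0;
until ($k >= @list) {                   # repeats while the test is false
    $until_sum += $list[$k];
    $k++;
}

print "$for_sum $foreach_sum $while_sum $until_sum\n";
```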

last operator

The last operator, as well as the next and redo operators that follow, apply only to loop control structures. They cause execution to jump from where they occur to some other position, defined with respect to the block structure of the encompassing control structure. Thus, they function as limited forms of goto statements.

Last causes control to jump from where it occurs to the first statement following the enclosing block.

Example:


ALABEL:  while (expression) {
     stmt_1;
     stmt_2;
     last;
     stmt_3;
}
#  last jumps to here

If last occurs within nested control structures, the jump can be made to the end of an outer loop by adding a label to that loop and specifying the label in the last statement.

Example:


ALABEL:  while (expression) {
     stmt_1;
     stmt_2;
     BLABEL:  while (expression) {
          stmt_a;
          stmt_b;
          last ALABEL;
          stmt_c;
     }
     stmt_3;
}
#  last jumps to here

next operator

The next operator is similar to last except that execution jumps to the end of the block but remains inside the loop, rather than exiting it. The control expression is then evaluated again, and iteration continues normally.

Example:


ALABEL:  while (expression) {
     stmt_1;
     stmt_2;
     next;
     stmt_3;
#  next jumps to here
}

As with last, next can be used with a label to jump with respect to an outer designated loop.

redo operator

The redo operator is similar to next except that execution jumps to the top of the block without re-evaluating the control expression.

Example:


ALABEL:  while (expression) {
#  redo jumps to here
     stmt_1;
     stmt_2;
     redo;
     stmt_3;
}

As with last, redo can be used with a label to jump with respect to an outer designated loop.
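A sketch of next and a labeled last in nested loops: the search skips one inner value with next and abandons both loops with last OUTER as soon as a matching pair is found. The lists and the target product 12 are illustrative.

```perl
use strict;
use warnings;

my @xs = (1, 2, 3, 4);
my @ys = (5, 6, 7);
my ($found_x, $found_y);
my $checks = 0;

OUTER: foreach my $x (@xs) {
    foreach my $y (@ys) {
        next if $y == 5;              # skip this $y; continue with the next one
        $checks++;
        if ($x * $y == 12) {
            ($found_x, $found_y) = ($x, $y);
            last OUTER;               # leave both loops at once
        }
    }
}
print "$found_x * $found_y = 12 after $checks checks\n";
```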

3. Functions

Functions are a fundamental part of most programming languages. On the one hand, they often behave like an operator, producing a change in the value of some variable or returning a value that can be assigned to a variable. On the other hand, they also control the flow of execution, transferring control from the point of invocation to the function definition block and back. Thus, whereas functions might be discussed under one of the preceding headings, they will be discussed separately here since they offer capabilities that go beyond conventional operator or flow control structures.

Functions include two parts: the invocation and the definition.

invocation

The function is invoked within the context of some expression. There, it is recognized by the form of its name: an ampersand is placed before the name when the function is called; if the function takes arguments, they are placed within parentheses following the name of the function. (In Perl 5, the ampersand may be omitted when the arguments are enclosed in parentheses.)

Form:


&name()

Example:


&aFunction()

definition

The function is defined through the keyword, sub; followed by the name of the function, without the ampersand prefix; followed by the block of code that is executed when the function is called, enclosed within curly braces.

Example:


sub aFunction {
     stmt_1;
     stmt_2;
     stmt_3;
}

To use functions effectively, we need three additional concepts: return values, arguments, and local variables.

return values

The value returned by a Perl function is the value of the last expression evaluated in the function, unless an explicit return statement supplies a value earlier.

Example:


sub aFunction {
     stmt_1;
     stmt_2;
     $a = $b + $c;
}

In this example, the function will return the value of $a at the time the function ends. Note: operators such as print also return values, true (1) for success and false for failure. Thus, print ($a); as the last statement in a function would make the function return that success or failure value, not the value of $a.

arguments

Arguments are enclosed in parentheses following the name of the function during invocation; thus, they constitute a list. They are available within the function definition block through the predefined (list) variable, @_.

Example:


&aFunction ($a, "Literal_string", $b);

sub aFunction {
     foreach $temp(@_) {
          print "$temp \n";
     }
}
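A slightly fuller function showing arguments arriving in @_, an early explicit return, and a value returned from the last expression. The name average is hypothetical; the call with & shows the older style used in this document.

```perl
use strict;
use warnings;

sub average {
    my $sum = 0;
    foreach my $n (@_) {          # @_ holds the arguments, in order
        $sum += $n;
    }
    return 0 unless @_;           # explicit return: avoid dividing by zero
    $sum / @_;                    # last expression; @_ in numeric context is the count
}

my $avg      = average(2, 4, 9);  # 5
my $with_amp = &average(1, 3);    # 2: call written with the ampersand
my $no_args  = average();         # 0
print "$avg $with_amp $no_args\n";
```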

local variables

Any variables defined within the body of a Perl program are available inside a Perl function as global variables. Consequently, Perl provides an explicit local operator that can be used to limit the scope of variables. Thus, one can define variables that are local to a function so that their use will not produce inadvertent side effects on any global variables that may have the same names. A local variable temporarily replaces the global of the same name while the function (and any function it calls) is running; the original value is restored when the function returns. In Perl 5, the my operator declares variables that are private to the enclosing block and is generally preferred.

Local variables are, by convention, defined at the top of a Perl function. They are defined by the keyword, local, followed by a list of variable names, within parentheses.

Example:


&aFunction ($a, $b);

sub aFunction {
     local ($aLocal, $bLocal);
     $aLocal = $_[0];
     $bLocal = $_[1];
}

$aLocal and $bLocal will have the same values inside the function as $a and $b have at the time the function was invoked. Changes to either local variable inside the function, however, will not affect the values of $a or $b.
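The difference between local and the Perl 5 my operator shows up when a function calls another function: a local value is seen by the called function, a my variable is not. A minimal sketch; all names are illustrative.

```perl
use strict;
use warnings;

our $level = "global";            # a global (package) variable

sub show_level { return $level; } # reads the global $level

sub with_local {
    local $level = "local";       # temporarily replaces the global value
    return show_level();          # the called function sees "local"
}

sub with_my {
    my $level = "my";             # a new private variable, invisible to show_level
    return show_level();          # the called function still sees "global"
}

my $from_local = with_local();
my $from_my    = with_my();
print "$from_local $from_my $level\n";
```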

4. Input/Output

Perl provides basic I/O for both the standard input (keyboard) and output (display) devices and for files in the UNIX file system. More sophisticated I/O is provided through the UNIX DBM library.

4.1 File system I/O

standard files

Perl provides access to the standard files: STDIN, STDOUT, and STDERR.

STDIN is accessed through the angle brackets (<>) operator. When placed in a scalar context, the operator returns the next line; when placed in an array context, it returns all remaining lines, one line per item in the array.

Examples:


$a = <STDIN>;  # returns next line of input
@a = <STDIN>;  # returns all remaining lines

STDOUT is the default file accessed through a print statement.

STDERR is the file to which error messages are written; it is usually mapped to the terminal display.

open file

Files are accessed within a Perl program through filehandles which are bound to filenames within the UNIX file system through an open statement. By convention, Perl filehandle names are written in all uppercase, to differentiate them from keywords and function names.

Form:


open (FILEHANDLE, "filename");

Example:


open (INPUT, "index.html");

In the above, the file is opened for read access. It may also be opened for write access or for append access. The difference between the two is that write replaces the file contents, whereas append adds new data to the end of the current contents. These two options are indicated by prefixing the file name with either a single or a double greater-than symbol (> or >>):

Form:


open (FILEHANDLE, ">filename");   # write access
open (FILEHANDLE, ">>filename");  # append access

Examples:


open (OUTPUT, ">index.html");
open (OUTPUT, ">>index.html");

Since Perl will continue operating regardless of whether the open was successful or not, you need to test the open statement. Like other Perl constructs, the open statement returns a true or false value, indicating success or failure. One convenient construct in which this value can be tested and appropriate response taken is with the logical or and die operators. die can be used to deliver a message to STDERR and terminate the Perl program. The following construct can be paraphrased: "open or die."

Form:


open (FILEHANDLE, "filename") || die "Message written to STDERR";

Example:


open (INPUT, "index.html") || die "Error opening file index.html: $!\n";  # $! holds the system error message

close file

A file is closed implicitly when its filehandle is reused in another open, and when the program ends. Files may also be closed explicitly.

Form:


close (FILEHANDLE);

Example:


close (INPUT);

read file

The file, once opened and associated with a filehandle, can be read with the angle brackets operator (<>), which can be used in a variety of constructs.

Form:


<FILEHANDLE>

Example:


while (<INPUT>) {  # read one line at a time until EOF
     chop;  # remove newline
     print "line = $_\n";  # print line read using default scalar variable
}

write file

Once a file has been opened for either write or append access, data can be sent to that file through the print operator. Note that no comma separates the filehandle from the content.

Form:


print FILEHANDLE (content);

Example:


print OUTPUT "$next \n";  # outputs contents of $next followed by newline char.
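A round trip through a file: write, append, then read back line by line. This sketch uses the three-argument form of open (Perl 5) and the core File::Temp module to get a scratch directory; the file name notes.txt is hypothetical.

```perl
use strict;
use warnings;
use File::Temp qw(tempdir);

my $dir  = tempdir(CLEANUP => 1);         # scratch directory, removed at exit
my $file = "$dir/notes.txt";              # hypothetical file name

open(OUTPUT, '>', $file) || die "Cannot write $file: $!";
print OUTPUT "first line\n";              # write access replaces any old content
close(OUTPUT);

open(OUTPUT, '>>', $file) || die "Cannot append to $file: $!";
print OUTPUT "second line\n";             # added after the existing content
close(OUTPUT);

my @lines;
open(INPUT, '<', $file) || die "Cannot read $file: $!";
while (<INPUT>) {                         # each line arrives in $_
    chomp;                                # remove the trailing newline, if any
    push(@lines, $_);
}
close(INPUT);
print scalar(@lines), " lines read\n";
```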

file tests

There are a number of circumstances where the actions taken by the Perl program should take into account attributes of the file, such as whether or not the file currently exists, whether or not it has content, etc. A number of tests can be performed on files through the dash (-) operator.

Form:


-SYMBOL  # where SYMBOL is a single character designator

See a Perl manual for the complete list; some of the more useful ones include the following:

Examples:


-r  # readable
-w  # writeable
-x  # executable
-o  # owned by user
-e  # exists
-z  # zero content
-s  # nonzero content; returns the size in bytes
-f  # plain file
-d  # directory
-l  # symbolic link
-T  # text file
-B  # binary file
-M  # age since last modification, in days
-A  # age since last access, in days
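A sketch applying several file tests to files created in a scratch directory (core File::Temp module); the file names are hypothetical.

```perl
use strict;
use warnings;
use File::Temp qw(tempdir);

my $dir   = tempdir(CLEANUP => 1);   # scratch directory
my $empty = "$dir/empty.txt";        # hypothetical file names
my $full  = "$dir/full.txt";

open(OUT, '>', $empty) || die "Cannot create $empty: $!";
close(OUT);
open(OUT, '>', $full) || die "Cannot create $full: $!";
print OUT "12345";                   # five bytes, no newline
close(OUT);

my $exists  = (-e $full) ? 1 : 0;                 # 1
my $missing = (-e "$dir/no_such_file") ? 1 : 0;   # 0
my $is_zero = (-z $empty) ? 1 : 0;                # 1: zero length
my $size    = -s $full;                           # 5: size in bytes
my $is_dir  = (-d $dir) ? 1 : 0;                  # 1
my $is_file = (-f $dir) ? 1 : 0;                  # 0: a directory is not a plain file

print "$exists $missing $is_zero $size $is_dir $is_file\n";
```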

4.2 UNIX DBM library I/O

Many UNIX systems include as a standard library a database management utility called DBM. Perl provides an interface to this library.

The DBM provides a transparent interface between associative arrays internal to a Perl program and a pair of files, managed by DBM, in which the keys and corresponding values are stored. Thus, when one inserts, changes, or deletes a key and/or value, the system makes the appropriate update to the persistent file version of the array. When iterating over a large DBM array, each ( ), which returns one key/value pair at a time, is more efficient than foreach ( ) over keys ( ), which must first build the complete list of keys in memory.

There are only two operators associated with DBM associative arrays: dbmopen and dbmclose. Once such an array has been opened, all interaction with it is conventional.

dbmopen

Opens the persistent array, given a file name and an access mode.

Form:


dbmopen(%ASSOC_ARRAY, "dbmfile", $mode);

Example:


dbmopen(%AN_ASSOC_ARRAY, "name_address", $mode);

In this example, the array %AN_ASSOC_ARRAY can be created and manipulated within the Perl program. Its actual values will be maintained, however, in files managed by DBM, typically two files named name_address.dir and name_address.pag. The $mode is a standard UNIX access mode, used if the files must be created; give it as an octal number, such as 0644, rather than as a quoted string. For details, see the discussion under chmod in the section on System Operators, below.

dbmclose

Closes the files.

Form:


dbmclose(%ASSOC_ARRAY);

Example:


dbmclose(%AN_ASSOC_ARRAY);
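A minimal sketch showing that values stored through dbmopen survive a dbmclose and can be read back. It assumes the Perl build has a DBM library available (SDBM_File ships with Perl), writes into a scratch directory from File::Temp, and uses a hypothetical base name and illustrative data.

```perl
use strict;
use warnings;
use File::Temp qw(tempdir);

my $dir = tempdir(CLEANUP => 1);     # scratch directory for the DBM files
my $db  = "$dir/name_address";       # hypothetical DBM base name

my %book;
dbmopen(%book, $db, 0644) || die "Cannot open $db: $!";
$book{"jbs"}   = "123 Main St";      # illustrative keys and values
$book{"alice"} = "Room 101";
dbmclose(%book);                     # the data now lives only in the files

my %again;
dbmopen(%again, $db, 0644) || die "Cannot reopen $db: $!";
my $where = $again{"jbs"};           # read back from the files
my $count = 0;
while (my ($key, $value) = each(%again)) {   # one pair at a time
    $count++;
}
dbmclose(%again);
print "$where; $count entries\n";
```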

5. Regular Expressions and Related Operators

Regular expressions are patterns that describe regular languages: sets of strings that can be recognized by a regular grammar, a restricted type of context-free grammar. Basically, such strings can be parsed left to right, without backtracking, and require only exact symbol matching, matching of a symbol by a category of symbols, or matching of a symbol by a specified number of sequential occurrences of a symbol or category. (Perl's patterns go somewhat beyond this theory; the memory references described below, for example, cannot be expressed by a regular grammar.)

Perl includes an evaluation component that, given a pattern and a string in which to search for that pattern, determines whether -- and if so, where -- the pattern occurs. These patterns are referred to as regular expressions.

Perl provides a general mechanism for specifying regular expressions. By default, regular expressions are strings that are bounded or delimited by slashes, e.g., /cat/. By default, the string that will be searched is $_. However, the delimiter can be changed to virtually any nonalphanumeric character by preceding the first occurrence of the new delimiter with an m, e.g., m#cat#. In this example, the pound sign (#) becomes the delimiter. And, of course, one can apply the expression to strings other than those contained in the default variable, $_, as will be explained below.

In addition to providing a general mechanism for evaluating regular expressions, Perl provides several operators that perform various manipulations on strings based upon the results of the evaluation. Several of these were introduced in the Perl/CGI Tutorial. They included the substitution and split operators. They will be described in more detail, below.

The discussion will begin by describing the various mechanisms for specifying patterns and then discuss expression-based operators.

5.1 Patterns

literals

The simplest form of pattern is a literal string. Thus, one can search for /cat/, as discussed in the introduction to this section. Normally, such an expression would appear in some conditional context, such as an if statement.

Example:


if (/cat/)  {
  print "cat found in $_\n";
}

single-character patterns

In addition to including literal characters, expressions can contain categories of characters. The period ( . ) stands for any single character except newline.

Example:


/.at/  # matches "cat", "bat", but not "at" by itself

An explicit category or class of characters can be specified by placing the characters in square brackets.

Example:


/[0123456789]/

Ranges of characters can also be specified:

Examples:


/[0-9]/
/[a-z]/
/[A-Z]/
/[0-9a-zA-Z]/

Several predefined categories are available. These include:


\d  # digits
\w  # word characters: letters, digits, and underscore
\s  # whitespace: space, tab, newline, etc.
\D  # not digits
\W  # not word characters
\S  # not whitespace

Any character class can be turned into a not condition by placing a caret ( ^ ) immediately after its opening square bracket.

Example:


/[^0-9]/  # not a digit

sequences

In addition to the literals and single category instances discussed above, patterns can include sequences in which a given symbol or category can occur a variable, but specified, number of times. An asterisk ( * ) indicates zero or more occurrences of the preceding character. A plus sign ( + ) indicates one or more occurrences of the preceding character. The question mark ( ? ) indicates zero or one occurrence of the preceding character. The concept of a multiplier implied by these facilities is generalized by placing a minimum and a maximum number of occurrences of the preceding character in curly braces. Specialized forms of the general multiplier exist, as shown in the examples that follow.

Examples:


/a*t/  # any number of a's followed by t
/a+t/  # one or more a's followed by t
/a?t/  # zero or one a followed by t
/a{2,4}t/  # between 2 and 4 a's followed by t
/a{2,}t/  # 2 or more a's followed by t
/a{2}t/  # exactly 2 a's followed by t

Multipliers are greedy: Perl finds the leftmost position at which the pattern can match and, at that position, each multiplier consumes as many characters as it can while still allowing the overall match to succeed. For example, in "cat, hat", /c.*t/ matches the whole string rather than just "cat". This affects pattern-based operators such as substitution, discussed below.
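A sketch testing the multipliers against a few words, plus a greedy versus non-greedy match. It uses grep (not covered in this document) to keep the words a pattern matches, the ^ and $ anchors described below so each pattern must match the whole word, and the Perl 5 non-greedy form *?.

```perl
use strict;
use warnings;

my @words = ("t", "at", "aat", "aaaat");
my @star  = grep { /^a*t$/ }     @words;   # zero or more a's
my @plus  = grep { /^a+t$/ }     @words;   # one or more
my @maybe = grep { /^a?t$/ }     @words;   # zero or one
my @range = grep { /^a{2,3}t$/ } @words;   # two or three

my $text = "cat, hat";
my ($greedy) = $text =~ /(c.*t)/;    # .* takes as much as it can
my ($lazy)   = $text =~ /(c.*?t)/;   # *? takes as little as it can

print "@star | @plus | @maybe | @range | $greedy | $lazy\n";
```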

memory

The portion of the string that matches a pattern can be remembered for later use. This is done by placing the portion to be remembered in parentheses ( () ). Within the pattern itself, the matched segment can be referred to as \1. Multiple segments, specified by multiple pairs of parentheses in the pattern, are available as \1, \2, \3, etc., numbered in the order of their opening parentheses. After the match, in subsequent code, these stored segments are available in the variables $1, $2, $3, etc.

Other information available in variables include $&, the sequence that matched; $`, everything in the string up to the match; and $', everything in the string beyond the match.

Examples:


"my caaat!" =~ /c(.*)t/  # matches "caaat"
  $1 is "aaa" (\1 within the pattern)
  $& is "caaat"
  $` is "my "
  $' is "!"
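A sketch of pattern memory: the match variables after a single match, and \1 used inside a pattern to find doubled letters. The /g modifier in a while loop (a Perl feature not described above) repeats the search from where the previous match ended.

```perl
use strict;
use warnings;

my ($inner, $match, $before, $after);
if ("my caaat!" =~ /c(.*)t/) {
    ($inner, $match, $before, $after) = ($1, $&, $`, $');
}

my @doubles;
my $word = "bookkeeper";
while ($word =~ /(\w)\1/g) {     # \1 must repeat whatever (\w) just matched
    push(@doubles, $1);          # $1 holds that character after each match
}
print "$inner $match [$before] [$after] @doubles\n";
```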

anchors

The pattern that is searched for in the string can be restricted to several specified locations, such as the beginnings and endings of words or the beginning and end of the string. \b indicates a word boundary. \B indicates any place but a word boundary. Caret ( ^ ) restricts the pattern to the beginning of the string. Dollar sign ( $ ) specifies the end of the string. If a literal dollar sign occurs in the pattern, escape it with a backslash.

Example:


/\bat/  # matches "at" and "attention", but not "bat"
/at\b/  # matches "at" and "bat", but not "attention"
/at\B/  # matches "attention" but not "at" and "bat"
/^at/  # matches "at $5.00, it is a bargain" but not "where you are at"
/at$/  # matches "where you are at" but not "at $5.00, it is a bargain"
/\$/  # matches "at $5.00, it is a bargain"

variable interpolation

Scalar variables are interpolated into patterns, just as in double-quoted strings. A dollar sign at the very end of a pattern is read as the end-of-string anchor described above, while a dollar sign followed by a name is read as a variable, so the two uses do not normally conflict.

Example:


$word = "cat";
/$word/  # matches strings that contain "cat"

precedence

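Know that it exists, look it up in a Perl text, and use parentheses to make the intended grouping explicit (e.g., /(ab)+/ repeats "ab", whereas /ab+/ repeats only the "b").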

explicit target string

The ( =~ ) operator takes two arguments: a string on the left and a regular expression pattern on the right. Instead of searching in the string contained in the default variable, $_, the search is performed in the string specified.

Example:


$a =~ /cat/  # does the content of $a contain "cat"?
<STDIN> =~ /cat/  # does the next line of input contain "cat"?

case

Case can be ignored in the search by placing an ( i ) immediately after the last delimiter.

Example:


/cat/i  # matches "cat", "CAT", "Cat", etc.

5.2 Regular expression operators

Regular expression operators include a regular expression as an argument but instead of just looking for the pattern and returning a truth value, they perform some action on the string, such as replacing the matched portion with a specified substring, like the well-known "search and replace" commands in word processing programs.

substitution

Looks for the specified pattern and replaces it with the specified string. By default, it does this for only the first occurrence found in the string. Appending a ( g ) to the end of the expression tells the operator to make the substitution for all occurrences.

Form:


s/pattern/replacement/
s/pattern/replacement/gi
$var =~ s/pattern/replacement/

In the second version, ( g ) and ( i ) indicate that the replacement should be made for all occurrences and that the match should ignore case. In the third version, the action is performed on the variable indicated, instead of on the default variable, $_

Examples:


s/cat/dog/  # replaces "cat" with "dog" in $_
s/cat/dog/gi  # same thing, but applies to "CAT", "Cat" wherever they appear
$a =~ s/cat/dog/  # applies the operation to $a

split( )

Split searches a specified string for every occurrence of a pattern, throws away the matched separators, and returns the substrings between them as a list.

Form:


@var = split(/pattern/, string);
@var = split(/pattern/);

If no string is specified, the operator is applied to $_.

Examples:


@a = split(/cat/, $aString);
@a = split(/cat/);

In the first example, the contents of $aString are split at each "cat" and the resulting parts are assigned to the array, @a. In the second, the operator applies to the contents of $_.
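A sketch of split with a literal separator, with a pattern that absorbs surrounding spaces, and with the default $_ as target. The sample strings are arbitrary.

```perl
use strict;
use warnings;

my @parts  = split(/cat/, "dogcatbirdcatfish");     # every "cat" is a separator
my @fields = split(/\s*,\s*/, "red, green ,blue");  # separators may include spaces

$_ = "one two   three";
my @words = split(/\s+/);                           # no string given: splits $_

print scalar(@parts), " parts; @fields; @words\n";
```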

join( )

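Approximately the opposite of split. Takes a separator string followed by a list of values, concatenates the values with the separator between each pair, and returns the resulting string.

Form:


$var = join($separator, "item_1", $item2, . . .);

Example:


$a = join("", "cat", "dog", "bird");  # returns "catdogbird"
$a = join(":", @aList);  # items of @aList separated by colons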
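A sketch showing that the first argument to join is the separator, and that split with the same separator undoes the join. The values are illustrative.

```perl
use strict;
use warnings;

my $plain = join("", "cat", "dog", "bird");    # empty separator
my $csv   = join(",", "cat", "dog", "bird");   # comma separator
my @back  = split(/,/, $csv);                  # split undoes join
my $again = join(",", @back);                  # same string as $csv
print "$plain $csv $again\n";
```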

6. System Operators

Perl offers a number of operators that mimic or call UNIX system operators available through a shell. Consequently, the discussion here will assume familiarity with corresponding UNIX facilities and will be oriented toward accessing those facilities through Perl. For additional details on system functions, per se, see the appropriate man pages or other UNIX sources.

Perl system operators can be divided into two large categories: file/directory operators and process operators. However, while useful, the distinction does not always hold. For example, Perl provides a mechanism whereby commands can be accessed as if they were files, permitting a Perl script to "read" the output they produce or to "write" to them to supply input data.

6.1 File/directory operators

chdir

Allows a Perl process to change its location to a specified directory within the file system. The function takes a single argument, an expression that evaluates to the path for the desired directory, and returns a true/false value indicating success/failure.

Form:


chdir ("/path/ . . . /directory");

Example:


chdir ("/afs/unc.cs.edu/home/jbs/public_html/perl");

Note that the path is defined within the namespace of the UNIX file system, not the namespace as configured for a Web server. In the UNC CS environment, users have individual home directories under the UNIX directory, /home. Under each user's home directory is a public_html directory, intended to be used for his or her Web-related materials.

When specifying a path for the Web server (that is, in a URL), ~login can be used to abbreviate the path because of the way the server is configured: the server maps ~login to the user's home directory and automatically inserts /public_html into the path following it. Consequently, for files and directories below public_html, one MUST NOT specify /public_html; otherwise, that directory will be duplicated in the Web path.

When specifying the path for a Perl program, /home/login is used instead of the tilde abbreviation, and public_html MUST be included if it lies along the path. One implication of this difference in the two namespaces is that Perl programs (as well as programs written in other languages) can reference files outside the subset of the filespace for which the server is configured.
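A minimal sketch of checking chdir's return value. Instead of a real path, it assumes a scratch directory created with the standard File::Temp module and uses the standard Cwd module's getcwd to report the current directory; "/no/such/directory" is a hypothetical missing path.

```perl
use strict;
use warnings;
use Cwd qw(getcwd);
use File::Temp qw(tempdir);

my $start  = getcwd();                 # remember where we began
my $target = tempdir(CLEANUP => 1);    # stand-in for a real directory path

# chdir returns true on success and false on failure; $! holds the reason
chdir($target) or die "Cannot change to $target: $!\n";
print "Now in: ", getcwd(), "\n";

# A path that does not exist makes chdir return false
my $ok = chdir("/no/such/directory") ? 1 : 0;
print "chdir to a missing directory succeeded? $ok\n";   # prints 0

chdir($start) or die "Cannot return to $start: $!\n";
```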

opendir

Opens a directory so that subsequent operations can read the names of its members, as described below. Takes two arguments: a directory handle that will be used with subsequent readdir operators and the path to the directory to be opened; the operator returns true/false indicating success/failure.

Form:


opendir (DIR_NAME, "/path/ . . . /directory");

Example:


opendir (DIR, "/afs/unc.cs.edu/home/jbs/public_html/perl") or die "Cannot open directory: $!";

readdir

Once a directory is open (using opendir) and a directory handle is established for it, the names of the files and subdirectories within it can be read into a Perl program. Like other read operators, readdir delivers either the next name or all remaining names, depending on whether it occurs in a scalar or list (array) context; in scalar context it returns undef when no names remain. The names normally include the special entries "." and "..".

Form:


readdir(DIR_NAME)

Example:


$name = readdir(DIR);  # just one name
@name = readdir(DIR);  # all names

closedir

Closes a directory that has been opened with opendir. Directories are closed automatically when the Perl program ends, but closedir provides an explicit operator for doing this and promotes "neatness."

Form:


closedir(DIR_NAME)

Example:


closedir(DIR);
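The three directory operators are normally used together. This sketch assumes a scratch directory (via the standard File::Temp module) holding three hypothetical empty files, reads the names once in scalar context and once in list context, and filters out "." and "..".

```perl
use strict;
use warnings;
use File::Temp qw(tempdir);

# Hypothetical setup: a temporary directory holding three empty files
my $dir = tempdir(CLEANUP => 1);
for my $name (qw(a.pl b.pl notes.txt)) {
    open(my $fh, '>', "$dir/$name") or die "Cannot create $name: $!\n";
    close($fh);
}

opendir(DIR, $dir) or die "Cannot open $dir: $!\n";

# Scalar context: one name per call, undef when no names remain
my $count = 0;
while (defined(my $name = readdir(DIR))) {
    $count++;
}

# rewinddir resets the handle so the names can be read again
rewinddir(DIR);

# List context: all names at once; drop the special "." and ".." entries
my @names = sort { $a cmp $b } grep { $_ ne '.' && $_ ne '..' } readdir(DIR);
closedir(DIR);

print "$count entries, counting . and ..\n";
print "Files: @names\n";    # prints: Files: a.pl b.pl notes.txt
```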

symlink

UNIX links provide a mechanism whereby a file or directory that exists in one directory can be referenced in another directory. Two types of links exist. Symbolic links, also called soft links, are more flexible: the file or directory pointed to does not have to exist when the link is created, and there are no restrictions on where it is stored. By contrast, hard links are not normally permitted for directories, and UNIX requires that the file and its linked surrogate reside on the same file system. Because of these restrictions, symbolic links are likely to be more appropriate for most tasks.

Form:


symlink("path", "LINK_NAME");

Example:


symlink("/afs/unc.cs.edu/home/jbs/public_html/perl", "DIR_PERL");

In this example, the separate directory I use for Perl scripts can be referenced directly from the context (directory) where a Perl script is run by a Web server, such as cgi-bin or wwwc-bin.

link

Hard links are created using the link operator. As already stated, hard links cannot be made to directories, and files linked to one another must reside on the same file system. A hard link is simply an additional name (directory entry) for the same underlying file. Consequently, there is no notion of a primary version of a file and its secondary aliases; all names for the underlying file are equal, and none is more fundamental than another. Like symlink, link returns a true/false success/failure code.

Form:


link("path/file", "LINK_NAME");

Example:


link("DIR_PERL/hello.html", "HELLO_HTML.PL");

Note that the path in this example assumes the symlink created in the prior step and that the hard link defined here is to a particular file, not to a directory. In this particular case, the link is NOT actually made because the two files reside on different file systems.

readlink

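Returns the value of a symbolic link, that is, the path the link points to (the same information shown after the arrow in the output of ls -l). If its argument is not a symbolic link, readlink returns undef.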

Form:


readlink("LINK_NAME");

Example:


$target = readlink("DIR_PERL");  # the path DIR_PERL points to
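A runnable sketch of symlink and readlink together. It assumes a scratch directory from the standard File::Temp module in place of the AFS paths above; the "perl" subdirectory and the DIR_PERL link name are illustrative.

```perl
use strict;
use warnings;
use File::Temp qw(tempdir);

# Hypothetical setup: a scratch directory containing a "perl" subdirectory
my $base   = tempdir(CLEANUP => 1);
my $target = "$base/perl";
mkdir($target) or die "mkdir failed: $!\n";

# symlink(OLD, NEW): NEW becomes a symbolic link pointing at OLD
symlink($target, "$base/DIR_PERL") or die "symlink failed: $!\n";

# readlink returns the path stored in the link
my $points_to = readlink("$base/DIR_PERL");
print "DIR_PERL -> $points_to\n";

# A real directory is not a symbolic link, so readlink returns undef
my $not_link = readlink($target);
print defined($not_link) ? "a link\n" : "not a symbolic link\n";

# -l tests for a symbolic link; -d follows the link to the directory
print "is link: ", (-l "$base/DIR_PERL" ? 1 : 0),
      ", is dir: ", (-d "$base/DIR_PERL" ? 1 : 0), "\n";
```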

unlink

Files are removed using the unlink operator. Strictly speaking, unlink removes a name (a directory entry) for a file; for hard-linked files, the underlying file is removed only when the last link to it is removed. Consequently, unlink deletes the specified name but does not affect other links to the same file in other directories. unlink accepts a list of names and returns the number of files successfully removed.

Form:


unlink("LINK_NAME");

Example:


unlink("HELLO_HTML.PL");
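A sketch of how hard links and unlink interact: two names share one file, and removing one name leaves the data reachable through the other. It assumes a scratch directory from the standard File::Temp module (both names therefore live on the same file system); the file names are illustrative.

```perl
use strict;
use warnings;
use File::Temp qw(tempdir);

# Hypothetical setup: one file in a scratch directory
my $dir = tempdir(CLEANUP => 1);
open(my $out, '>', "$dir/hello.html") or die "Cannot create file: $!\n";
print $out "<p>Hello</p>\n";
close($out);

# link(OLD, NEW) adds a second name for the same underlying file
link("$dir/hello.html", "$dir/HELLO_HTML.PL") or die "link failed: $!\n";

# Field 3 of stat is the number of links (names) the file has
my $nlink = (stat("$dir/hello.html"))[3];
print "link count: $nlink\n";    # prints: link count: 2

# unlink removes one name and returns how many names it removed
my $removed = unlink("$dir/hello.html");
print "removed: $removed\n";     # prints: removed: 1

# The data is still reachable through the remaining name
open(my $in, '<', "$dir/HELLO_HTML.PL") or die "Cannot read file: $!\n";
my $content = <$in>;
close($in);
print "still there: $content";
```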

rename

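Files are renamed, or moved to another directory, using rename, which returns a true/false success/failure code. On UNIX, renaming within a single file system is atomic, but rename usually does not work across file systems; in that case the usual approach is to copy the file and then delete the original (the standard File::Copy module's move function does this).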

Form:


rename("old_file_name", "new_file_name");

Example:


rename("hello_html", "hello_html.pl");
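A minimal sketch of rename with error checking, followed by the standard File::Copy module's move, which renames when it can and otherwise copies and then deletes. The scratch directory (via File::Temp) and the file names are assumptions for illustration.

```perl
use strict;
use warnings;
use File::Temp qw(tempdir);
use File::Copy qw(move);

my $dir = tempdir(CLEANUP => 1);
open(my $fh, '>', "$dir/hello_html") or die "Cannot create file: $!\n";
close($fh);

# rename returns true on success, false (with $! set) on failure
rename("$dir/hello_html", "$dir/hello_html.pl")
    or die "rename failed: $!\n";
print "renamed: ", (-e "$dir/hello_html.pl" ? "yes" : "no"), "\n";

# move tries rename first and falls back to copy-then-delete,
# e.g. when the destination is on a different file system
move("$dir/hello_html.pl", "$dir/final.pl") or die "move failed: $!\n";
```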

mkdir

Directories can be created within a Perl program using the mkdir operator. It takes two arguments, a name for the new directory and a mode, and it returns a true/false success/failure code. The mode designates access permissions for the directory and conforms to standard UNIX octal values for such; the permissions actually granted are further restricted by the process's umask. For example, 0777 gives read/write/execute permission to owner, group, and others, whereas 0666 gives read/write permission to everyone, but not execute permission (for a directory, execute permission is what allows the files within it to be accessed). See the man page for chmod for a list of octal codes.

Form:


mkdir("new_dir_name", mode);

Example:


mkdir("perl_scripts", 0777);

rmdir

Removes a directory, but only if the directory is empty, i.e., all of its files have previously been deleted. It returns a true/false success/failure code.

Form:


rmdir("dir_name");

Example:


rmdir("perl_scripts");
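A sketch of mkdir and rmdir together, assuming a scratch parent directory from the standard File::Temp module. It prints the mode actually granted (after the umask) and shows that rmdir fails until the directory is empty; hello.pl is a hypothetical file name.

```perl
use strict;
use warnings;
use File::Temp qw(tempdir);

my $base = tempdir(CLEANUP => 1);
my $dir  = "$base/perl_scripts";

# mkdir returns true on success; the requested mode is reduced by the umask
mkdir($dir, 0777) or die "mkdir failed: $!\n";
my $mode = (stat($dir))[2] & 0777;
printf "created with mode %04o (umask %04o)\n", $mode, umask();

# rmdir fails while the directory still has entries
open(my $fh, '>', "$dir/hello.pl") or die "Cannot create file: $!\n";
close($fh);
my $first_try = rmdir($dir) ? 1 : 0;    # 0: directory not empty
print "rmdir on non-empty directory: $first_try\n";

# Remove the contents first, then the directory itself
unlink("$dir/hello.pl");
my $second_try = rmdir($dir) ? 1 : 0;   # 1: directory was empty
print "rmdir after emptying it: $second_try\n";
```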

chmod

Changes the access permissions on one or more files. It takes a mode followed by a list of one or more files whose permissions are to be changed, and it returns the number of files successfully changed.

Modes, again, are composable octal values that can be found in the man page for chmod. In general, a value of "4" indicates read, "2" write, and "1" execute, and these are added together. When the value occurs in the highest-order position, it refers to the owner; in the middle position, it refers to the owner's group; and in the low-order position, it refers to others. Thus, 755 gives read and execute permission to everyone, but reserves write access for only the owner. In Perl, write the mode with a leading zero (e.g., 0755) so that it is read as an octal number; a plain 755 is treated as decimal.

Form:


chmod(mode, "file_name", . . .);

Example:


chmod(0775, "hello_html.pl");
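A sketch of chmod's return value and the octal-literal pitfall. The scratch directory (via the standard File::Temp module) is an assumption; the file name follows the example above.

```perl
use strict;
use warnings;
use File::Temp qw(tempdir);

my $dir  = tempdir(CLEANUP => 1);
my $file = "$dir/hello_html.pl";
open(my $fh, '>', $file) or die "Cannot create file: $!\n";
close($fh);

# 0755 = owner rwx (4+2+1), group r-x (4+1), others r-x (4+1)
my $changed = chmod(0755, $file);    # returns the number of files changed
my $mode = (stat($file))[2] & 07777;
printf "changed %d file(s); mode is now %04o\n", $changed, $mode;   # 1; 0755

# Pitfall: without the leading zero, 755 is a decimal number (octal 1363)
printf "decimal 755 in octal is %o\n", 755;    # prints 1363
```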

The above are the primary file and directory operators. Several additional ones are available, such as utime and chown, which change the timestamps and ownership of files. They will not be discussed here. See a standard text on Perl for descriptions of them.

The next group of operators are concerned with UNIX processes.

6.2 Process operators

Process operators range in functionality from the capability to execute a UNIX command as is normally done from a shell to the operators used to implement a client/server architecture using forked processes that run concurrently. The discussion will proceed from the simple to the more complex.

system

The simplest form of Perl process operator is the system operator. Just as a UNIX shell launches a new process to carry out a command, so the system operator causes Perl to launch a new process to carry out the indicated command and to wait for it to finish. In its simplest form it takes a single argument, the command to be executed, and it returns a status code. However, unlike many other operators, system returns zero if the command succeeds and a nonzero value if it fails, because the value is the command's exit status (stored in the upper eight bits) rather than a true/false flag.

When the process executes, it inherits the files of the Perl process from which it is launched. Thus, output produced by the process normally goes to the standard files, such as STDOUT and STDERR, whereas input normally comes from the standard input file, STDIN. However, data can be redirected to/from other files.

In addition to inheriting the files of the parent process, the child also inherits its environment variables. These are available through the %ENV associative array. For a more thorough discussion of environment variables within a Web context, see my CGI/PERL Tutorial.

Form:


system("process");

Example:


system("pwd");
system("xpwd");

In the first example, Perl asks the system to execute the pwd command which, in turn, determines the path of the current directory and writes that information to STDOUT, which is normally displayed on the terminal screen; system then returns a value of zero to the Perl program from which pwd was launched, indicating success.

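In the second example, Perl asks the system to execute a nonexistent command. The system operator cannot oblige (fails) and returns a nonzero value (e.g., 65,280, which encodes an exit status of 255 in its upper eight bits; current versions of Perl instead return -1 when the command cannot be started at all and leave the reason in $!).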
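A sketch of reading system's return value, assuming a UNIX-like system with pwd, sh, and true available. The exit code is the status shifted right eight bits; xpwd_no_such_command is a deliberately nonexistent, hypothetical command name.

```perl
use strict;
use warnings;

# system waits for the command and returns its wait status (also left in $?)
my $status = system("pwd");
print "status: $status, exit code: ", $status >> 8, "\n";   # 0 and 0 on success

# A command that runs but exits with a failure code
my $false_status = system("sh", "-c", "exit 3");
print "exit code: ", $false_status >> 8, "\n";           # prints: exit code: 3

# A command that cannot be started; current Perls return -1 and set $!
my $missing = system("xpwd_no_such_command");
if ($missing == -1) {
    print "could not run command: $!\n";
}

# Idiom: because 0 means success, compare with 0 before dying
system("true") == 0 or die "true failed: $?\n";
```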

backquotes

Backquotes provide a means of returning to the Perl process the output that the child process would have written to STDOUT had it been launched through the system operator (output written to STDERR is not captured). This capability is needed, for example, if the Perl script is to be run through the CGI by a Web server and the results are to be written by the parent (Perl) process to STDOUT for transfer back to the Web client and display to the user. Otherwise, what would be available to the Perl program would be just the return code of the system operator.

Form:


`process`

Example:


$A = system("pwd");
$A = `pwd`;

In the first example, the pwd process is launched, its output goes to STDOUT, and $A is given the value 0 if it executes properly or some nonzero value (e.g., 65,280) if it does not.

In the second example, the pwd process is launched, but instead of writing its results to some file (e.g., STDOUT), the result is returned by the backquotes operator and assigned to $A. Thus, in this case, $A ends up with the character string that designates the path to the directory where the Perl script is executed (e.g., /afs/cs.unc.edu/home/jbs/public_html/perl). The string includes the trailing newline, which can be removed with chomp.
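A sketch of backquotes in scalar and list context, assuming a UNIX-like system with a POSIX shell. It also shows that the command's exit status is still placed in $? even though the output is captured.

```perl
use strict;
use warnings;

# Scalar context: all of pwd's standard output, including the newline
my $A = `pwd`;
chomp($A);                              # remove the trailing newline
print "Perl is running in: $A\n";

# List context: one element per output line
my @lines = `echo one; echo two`;
print scalar(@lines), " lines\n";       # prints: 2 lines

# STDOUT is captured, and the exit status is still placed in $?
my $out  = `echo partial; exit 2`;
my $code = $? >> 8;
print "captured: $out";                 # prints: captured: partial
print "exit code: $code\n";             # prints: exit code: 2
```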

filehandles

Since UNIX command processes normally write their output to a file, such as STDOUT, and/or receive their input from standard input (STDIN), they may be "opened" and assigned a filehandle so that subsequent I/O can come from or be directed to the Perl program through conventional read and write facilities. Thus, processes can be launched and subsequently treated as if they were files.

The filehandle form of process interaction is based on underlying UNIX pipes; consequently, if the process is to be accessed through a read, a vertical bar (|) must be appended to the right side of the process name; conversely, if it is to be accessed through a write, the vertical bar goes on the left side.

Form:


open(PROC_HANDLE, "process|");
close(PROC_HANDLE);

open(PROC_HANDLE, "|process");
close(PROC_HANDLE);

Example:


open (PWD_HANDLE, "pwd|");
#  read and process pwd data
close(PWD_HANDLE);

open (MORE_HANDLE, "|more");
#  generate and write data to more
close(MORE_HANDLE);

In the first example, the process, pwd, will be opened for read access in the Perl program. In the second example, the process, more, will be opened for write access.

In both cases, the filehandles should be closed explicitly: closing a piped filehandle flushes any pending output, waits for the process to finish, and places its exit status in $?. Any I/O directed to or from the processes would, of course, be done between the open and close statements.
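A sketch of both pipe directions, assuming a UNIX-like system. The write side uses sort instead of more, an assumption made so the example also runs without a terminal; the fruit names are sample data.

```perl
use strict;
use warnings;

# Read access: the bar on the right means "read what this command writes"
open(PWD_HANDLE, "pwd|") or die "Cannot start pwd: $!\n";
my $here = <PWD_HANDLE>;                 # first (and only) line of output
close(PWD_HANDLE) or warn "pwd failed, status $?\n";
chomp($here);
print "pwd reported: $here\n";

# Write access: the bar on the left means "feed my output to this command"
open(SORT_HANDLE, "|sort") or die "Cannot start sort: $!\n";
print SORT_HANDLE "pear\napple\nfig\n";
# close flushes the data, waits for sort to exit, and sets $?
close(SORT_HANDLE) or warn "sort failed, status $?\n";
my $sort_status = $?;
```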

exec

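The exec operator works much like the system operator except that, rather than launching a separate process and waiting for it, exec replaces the running Perl program with the indicated command. The Perl program therefore ends at that point: exec does not return unless the command cannot be run, in which case it returns false.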

Form:


exec "process";

Example:


exec "pwd";

Note: exec causes problems for the Web server when used in the CGI context.
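Because exec never returns on success, a common pattern is to call it in a forked child so the original program survives. This sketch assumes a UNIX-like system with pwd; the fork and waitpid calls are explained in the sections that follow.

```perl
use strict;
use warnings;

# exec replaces the *child* here, so the parent keeps running
my $pid = fork;
die "fork failed: $!\n" unless defined $pid;

if ($pid == 0) {
    # Child: on success, exec never returns; pwd now occupies this process
    exec("pwd");
    # Reached only if exec itself failed (e.g., command not found)
    die "exec failed: $!\n";
}

waitpid($pid, 0);                 # parent waits for the child (now pwd)
my $code = $? >> 8;
print "pwd exited with code $code\n";
print "this line still runs in the parent\n";
```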

fork

The most sophisticated and most powerful of the process operators is fork. It enables a process to launch a duplicate of itself that runs concurrently with the parent process from which it was launched, and it is often used to implement the server part of client/server designs. The parent and child processes are virtually identical: they run the same code, the child starts with copies of the parent's variables, and both share the files that were open at the time of the fork. Because each process has its own memory, however, a change to a variable in one is not seen by the other. The two are differentiated only by the value the fork operator returns: zero (0) in the child process and the child's process ID (a positive number) in the parent. If the fork fails, fork returns undef. Thus, the fork operator often appears within a conditional statement, such as an if/else or an unless construct.

Form:


(fork)

Example:


if (fork) {
#  parent process
} else {
#  child process
}
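A sketch of fork's three possible return values and of the fact that the child gets its own copy of variables. It assumes a UNIX-like system; $counter is a hypothetical variable, and waitpid, covered with wait below, keeps the parent from finishing first.

```perl
use strict;
use warnings;

# fork returns undef on failure, 0 in the child, and the child's PID in the parent
my $counter = 10;
my $pid = fork;
die "fork failed: $!\n" unless defined $pid;

if ($pid) {
    # Parent process: $pid is the child's process ID (a positive number)
    waitpid($pid, 0);                  # wait so the child does not linger
    print "parent: child $pid finished, counter is still $counter\n";
} else {
    # Child process: it changes only its own copy of $counter
    $counter++;
    print "child: my counter is $counter\n";
    exit 0;                            # end the child here
}
```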

exit

exit causes a process to terminate immediately. Thus, when used within a forked child process, it functions much like a return statement in a subroutine. It can be used to end a forked child process that would otherwise continue running into the code intended for the parent.

Form:


exit;

Example:


unless (fork) {  #  zero (0) returned: child process
#  child process
exit;
}  #  child's PID returned: parent process
#  parent process

In this example, unless functions as an "if not" construct; hence the false condition, in which the child process is defined, comes first. Once the code in that block is completed, if it were not stopped (i.e., by the exit), the child process would continue past the closing curly brace into the code intended for the parent process.

wait

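wait causes the parent process to pause until a child process terminates before continuing. It returns the process ID of the child that finished (or -1 if there are no child processes) and places that child's exit status in $?.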

Form:


wait

Example:


unless (fork) {  #  zero (0) returned: child process
#  child process
exit;
}  #  child's PID returned: parent process
#  parent process
wait;  #  parent waits until child completes before continuing
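A sketch combining fork, exit, and wait, assuming a UNIX-like system. Three hypothetical children exit with different codes, and the parent collects each one's PID and exit status until wait reports that no children remain.

```perl
use strict;
use warnings;

my @kids;
for my $n (1 .. 3) {
    my $pid = fork;
    die "fork failed: $!\n" unless defined $pid;
    unless ($pid) {          # fork returned 0: this is a child
        exit $n;             # each child exits with a different code
    }
    push @kids, $pid;        # parent records the child's PID
}

# wait returns a finished child's PID and sets $?; it returns -1 when none remain
my %code_for;
while ((my $done = wait) != -1) {
    $code_for{$done} = $? >> 8;
}
print "child $_ exited with $code_for{$_}\n" for @kids;
```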

email: jbs@cs.unc.edu

url: http://www.cs.unc.edu/~jbs/
