6. Echo STDIN Variables

The task in step 6 is to parse the data passed as a character string from the server to a CGI program through STDIN. This involves two new ideas: reading data from STDIN and then parsing that data. Reading data from STDIN is required when METHOD=POST, whereas when METHOD=GET, data are passed through the QUERY_STRING environment variable, as seen in step 5. Regardless of whether the data are obtained from an environment variable or read through STDIN, they must be parsed before they can be used by a CGI program since they are compressed into a continuous string, without spaces, and some "special" characters are mapped to other values.

The purpose of parsing is to break the encoded string passed between client and server into attribute/value pairs, to translate the special characters that were encoded back into their original character forms, and to store the translated attribute/value pairs in a convenient data structure, i.e., an associative array.

Parsing strings in CGI, regardless of whether they are passed through STDIN or through QUERY_STRING, requires four steps:

  1. TOKENIZE the string into a list of attribute/value strings
  2. SPLIT each attribute/value string into separate key and value strings
  3. DEPLUS the strings to translate plus signs (+) into spaces
  4. DECODE 3-character hex representations of special characters and translate them back into their original 1-character forms

The order in which these four steps are carried out matters. Consider the following portion of a query and its encoded form:

I vote ++; let's make an offer & try to hire her!

name=john&textarea=I+vote+%2B%2B%3B+let%27s+make+an+offer+%26+try+to+hire+her%21

Decode should be done last. For example, ampersands are used to separate attribute/value strings from one another; consequently, ampersands in the data are translated into 3-character hex values. If these hex values were translated back to ampersands before tokenizing, they would be confused with the ampersands used as delimiters. The order generally recommended is that given above. One exception is to deplus first, while the input data exist as a single string, before tokenizing and splitting; using the example, below, explain why this exception works.
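The effect of decoding too early can be seen with a minimal sketch that uses the encoded example above. The variable names ($encoded, $decode_first, @wrong, @right) are made up for this illustration and are not part of the program in this step.

```perl
use strict;
use warnings;

# The encoded example string from the text above.
my $encoded = 'name=john&textarea=I+vote+%2B%2B%3B+let%27s+make+an+offer+%26+try+to+hire+her%21';

# Wrong order: decode the hex codes first, then tokenize on ampersands.
(my $decode_first = $encoded) =~ s/%(..)/pack("c", hex($1))/ge;
my @wrong = split(/&/, $decode_first);

# Recommended order: tokenize first, while the data ampersand is still %26.
my @right = split(/&/, $encoded);

print scalar(@wrong), " pieces when decoding first\n";
print scalar(@right), " pieces when tokenizing first\n";
```

Decoding first turns %26 into a real ampersand, so the textarea value is cut in two and the count of attribute/value strings comes out wrong.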

Parsing requires some five or six new Perl concepts and operators:

  1. <STDIN>
  2. Translate and squeeze
  3. Split
  4. Associative array assignment
  5. Substitution

Each will be discussed in the context of the parse program, below.

<STDIN> is actually an operator. It returns the next line of input from the file, STDIN, so it does not need to be used with another operator or verb, such as a read. Consequently, the command

$in_string = <STDIN>;

reads the next line of input, which is the entire concatenated string of attributes and values, and places them in the scalar variable, $in_string.
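Because <STDIN> reads up to a newline or the end of the input, some CGI programs instead read exactly the number of bytes the server reports in the CONTENT_LENGTH environment variable. The following is only a sketch of that alternative: it simulates STDIN with an in-memory filehandle, and $fake_body, $fh, and $got are hypothetical names used for the demonstration.

```perl
use strict;
use warnings;

# Stand-in for the POST body a server would supply on STDIN (assumption).
my $fake_body = 'name=john&city=Chapel+Hill';
$ENV{CONTENT_LENGTH} = length $fake_body;

# An in-memory filehandle plays the role of STDIN in this demo.
open(my $fh, '<', \$fake_body) or die "cannot open in-memory handle: $!";

# read() fills $in_string with at most CONTENT_LENGTH bytes.
my $in_string = '';
my $got = read($fh, $in_string, $ENV{CONTENT_LENGTH});
close($fh);

print "read $got bytes: $in_string\n";
```

In a real CGI program, STDIN would take the place of $fh.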

Translate and squeeze. The next section of code translates plus signs (+), used to indicate spaces in the original data, back into spaces; it also squeezes each run of consecutive plus signs into a single space, so that only a single space exists between any two words. The Perl command used for this is tr, for translate. It takes two lists of characters, delimited by slash (/) characters, and translates each instance of a character in the first list into the corresponding character in the second. For example, it can be used to translate all uppercase characters to lowercase, or vice versa. These character lists are not patterns in the regular-expression sense; patterns can be quite complex, and will be discussed in more detail when regular expressions are described.

In the DEPLUS line of the program, the plus sign is preceded by a backslash (\) to indicate that it is the literal character, plus. Within tr the plus sign has no special meaning, so the backslash is not strictly required, but it is harmless. The s at the end of the expression removes, or "squeezes out", multiple instances of the translated character, spaces in this case. Finally, the symbol =~ is actually an operator, called the binding operator. It identifies the variable on the left as the one to which the operator on the right, the tr, is applied. Thus, it works like an assignment statement, although it is not literally that. Had it not been used, the translate would have been applied to an invisible (predefined) variable, called the default variable and denoted $_. It is a somewhat mysterious variable whose value is set implicitly by certain constructs, such as a foreach loop written without a loop variable; often it is the variable to which one would logically apply the next operation.
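A small sketch may help to see what the squeeze option and the default variable do. The sample strings and the names $squeezed, $plain, and $default_result are invented for this illustration; the plus sign is written without a backslash here, which tr also accepts.

```perl
use strict;
use warnings;

my $in_string = 'I+vote++yes';

# With s: a run of plus signs collapses into a single space.
(my $squeezed = $in_string) =~ tr/+/ /s;

# Without s: every plus sign becomes its own space.
(my $plain = $in_string) =~ tr/+/ /;

# Without =~, tr is applied to the default variable, $_.
$_ = 'a+b';
tr/+/ /;
my $default_result = $_;

print "[$squeezed]\n[$plain]\n[$default_result]\n";
```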

Split does what the name implies: it takes a pattern, shown between the slash (/) delimiters, and a character string, and returns a list of the portions of the string that precede and succeed the pattern. Thus, it produces a list of the portions of the string that don't match the pattern and throws away the portions that do match.

In the TOKENIZE step, the input string is split on the pattern, /&/, and the resulting list of attribute/value strings is assigned to an array, @attr_val_strings, indicated by the at-sign (@) prefix of the variable name.
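As a quick illustration of the TOKENIZE step, the sketch below splits a made-up input string (not taken from a real form) on ampersands. Note that the ampersands themselves are thrown away.

```perl
use strict;
use warnings;

# Hypothetical input string with three attribute/value strings.
my $in_string = 'name=john&dept=cs&year=1996';

# split returns the pieces between the ampersands as a list.
my @attr_val_strings = split(/&/, $in_string);

print scalar(@attr_val_strings), " strings\n";
print "$_\n" foreach @attr_val_strings;
```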

In the SPLIT step that follows (an unfortunate choice of labels on my part), each such string is further split into the portions that come before and after an equal (=) sign, with the two strings assigned to a two-element array, @pair. Element $pair[0] is the part that comes before the equal sign, and $pair[1] is the part that comes after. In the next line of code, these two array values are assigned as the associated key and value parts for a row in the associative array, %attr_value. However, since the expression refers to individual elements of the array, each such element is referred to using its scalar prefix ($). Let me paraphrase the line,

$attr_value{$pair[0]} = $pair[1];

Assign the contents of $pair[1], the part of the string that came after the equal sign, as the value element in the row of the associative array, %attr_value, that is indexed by the key, $pair[0], which is the part of the string that came before the equal sign. Since the assignment applies to only a single element in the array, the scalar name, $attr_value, is used.
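The same assignment can be tried on its own. This is a minimal sketch with two invented attribute/value strings; only @pair and %attr_value are names from the program.

```perl
use strict;
use warnings;

my %attr_value;

# Two hypothetical attribute/value strings, as TOKENIZE would produce them.
foreach my $out_str ('name=john', 'dept=cs') {
    my @pair = split(/=/, $out_str);      # $pair[0] is the key, $pair[1] the value
    $attr_value{$pair[0]} = $pair[1];     # one row of the associative array
}

print "name is $attr_value{name}\n";
print "keys: ", join(',', sort keys %attr_value), "\n";
```

Note the braces in $attr_value{name}: they mark the access as a lookup in the associative array, whereas square brackets, as in $pair[0], index an ordinary array.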

Associative array assignment. Just did it!

Substitution. The substitution operator, s, is at the center of the DECODE step. This is the Matterhorn. Once we get over this peak, it's all downhill from there. As with many Perl expressions, there is a great deal of magic packed into this single line of code. That's the beauty of the language, if you like it, or its downfall, if you don't. But it is one of the main characteristics that make Perl what it is. We'll work from the outside in.

The DECODE block of code looks for 3-character sequences that consist of an escape character, %, followed by a 2-character hexadecimal value. Special characters, such as parentheses, spaces, ampersands, and the like, that might interfere with processing the data string, are coded in this way for transfer; they must be translated back to their original forms for processing. That is what DECODE does.

The code to do this begins innocently enough, using the keys operator to return the list of keys from the associative array, %attr_value, and the foreach operator to step through that list, referencing each key in turn through the scalar, $key. In the statements that follow, the substitution magic is done to translate all of the 3-character hex codes back into their original 1-character forms. One substitution is performed on the key and one on the corresponding value element, so each pair requires two substitution operations. One caution: $key holds a copy of the key, so decoding it does not, by itself, rename the row stored in %attr_value; when a key contains encoded characters, the decoded key and its value must be stored in the array again under the new name.
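One detail worth checking when stepping through keys is that the loop variable receives a copy of each key, so changing it leaves the associative array itself untouched. The sketch below uses a made-up key, full%20name, to show this and one possible way to rename the row; it is an illustration under those assumptions, not the only approach.

```perl
use strict;
use warnings;

# Hypothetical row whose key contains an encoded space.
my %attr_value = ('full%20name' => 'john');

# Decoding the loop variable changes only the copy held in $key.
foreach my $key (keys %attr_value) {
    $key =~ s/%(..)/pack("c", hex($1))/ge;
}
my $still_encoded = exists $attr_value{'full%20name'} ? 1 : 0;
my $decoded_early = exists $attr_value{'full name'}   ? 1 : 0;

# One way to rename the row: delete it, decode the key, store it again.
foreach my $key (keys %attr_value) {
    my $value = delete $attr_value{$key};
    $key =~ s/%(..)/pack("c", hex($1))/ge;
    $attr_value{$key} = $value;
}
my $renamed = exists $attr_value{'full name'} ? 1 : 0;

print "before rename: encoded=$still_encoded decoded=$decoded_early\n";
print "after rename:  decoded=$renamed\n";
```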

Now for the assault on the peak. The substitution operator, like the translate operator, takes two parts, delimited by slashes: a pattern and a replacement. It looks for an instance of the pattern in the target string and substitutes the replacement for it. The pattern that is looked for here is %(..). The percent sign is a literal and is looked for explicitly. Each of the two periods (..) matches any single character. The parentheses around the two periods tell Perl to "remember" those two characters so that they can be referred to later, in this context through the variable, $1. Thus, the string, %28, which is the coded representation for a left parenthesis, would be matched by this pattern and the 28 would be assigned as the value to the variable, $1. When such a pattern is found, the operator substitutes what follows, delimited by the slashes, for the 3-character string.

What is substituted here is pack("c",hex($1)). pack takes two arguments, a format control string and a list of values, and creates a single string from those values. The format control string, "c", specifies a single character, and the list of values is the single value, hex($1), which converts the 2-character hex code in $1 into the numeric code of the character to be translated.
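The two functions can be tried separately from the substitution. A minimal sketch, with the variable names $code, $char, and $amp chosen just for this example:

```perl
use strict;
use warnings;

# hex converts a string of hex digits into a number.
my $code = hex('28');

# pack with the "c" format turns that number into a 1-character string.
my $char = pack("c", $code);

# The same two calls combined, as in the DECODE step, for %26.
my $amp = pack("c", hex('26'));

print "$code -> $char\n";
print "26 -> $amp\n";
```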

Note that what follows the pattern here is not literal text but a Perl expression, a call to pack. The final e tells Perl to evaluate that expression and substitute its result in the place where the pattern is found. The g at the end of the expression says that the substitution should be made for all occurrences of the pattern. Finally, the =~ operator directs the substitution to the desired string.
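Putting the pieces together, the sketch below applies the substitution to an invented value string and, for contrast, repeats it without the g option. The names $value, $count, and $first_only are assumptions made for the illustration.

```perl
use strict;
use warnings;

# Hypothetical value that has already been deplussed.
my $value = 'let%27s make an offer %26 try %28soon%29';

# With g and e: every %XX is replaced by the result of the pack call.
# In scalar context, s///g returns the number of substitutions made.
my $count = ($value =~ s/%(..)/pack("c", hex($1))/ge);

# Without g: only the first %XX is replaced.
my $first_only = 'a%21b%21';
$first_only =~ s/%(..)/pack("c", hex($1))/e;

print "$count codes decoded: $value\n";
print "first only: $first_only\n";
```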

To sum up, the DECODE block goes through the associative array of attributes and their corresponding values one row at a time, looks for all instances of special characters -- coded as the escape character, %, followed by a 2-digit hex value -- and replaces each such 3-character sequence with the appropriate single (special) character; it does this, first, for each key in the associative array and, then, for the associated value indexed by that key.

Echo STDIN Variables Program

#!/usr/local/bin/perl
#
#     INPUT data
$in_string = <STDIN>;
#
#     DEPLUS $in_string
$in_string =~ tr/\+/ /s; # translate and squeeze multiple spaces
#
#     TOKENIZE attr/val strings
@attr_val_strings = split (/&/, $in_string);
#
#     SPLIT attr/val strings and put into assoc. array
foreach $out_str (@attr_val_strings) {
  @pair = split (/=/, $out_str);
  $attr_value{$pair[0]} = $pair[1];
  }
#
#     DECODE special characters
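foreach $key (keys %attr_value) {
  $value = delete $attr_value{$key};   # remove the row stored under the encoded key
  $key =~ s/%(..)/pack("c",hex($1))/ge;
  $value =~ s/%(..)/pack("c",hex($1))/ge;
  $attr_value{$key} = $value;          # store the row again under the decoded key
  }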

#     OUTPUT section

#     generate header lines
print "Status: 200 OK\n";
print "content-type: text/html\n\n";

#     GENERATE report, in HTML
print "<HTML>\n";

print "<HEAD>\n";
print "<TITLE>stdin vars.</TITLE>\n";
print "</HEAD>\n";

print "<BODY>\n";
print "<H2>Print CGI STDIN Variables</H2>\n";
print "<HR>\n";
print "<H3>STDIN Variables</H3>\n";
print "<UL>\n";
foreach $key (keys %attr_value) {
  print "<LI>$key = $attr_value{$key}\n";
  }
print "</UL>\n";
print "</BODY>\n";

print "</HTML>\n";


Write and test an Echo STDIN Variables script. You can also execute the program, above: Echo STDIN Variables.