Common Gateway Interface (CGI)

Most queries currently made to WWW servers fetch static data stored in a portion of the file system associated with the server. The CGI interface provides a means for a client to request that an arbitrary program be executed by the server. That program may be run for its side effects, such as updating a database or sending e-mail to someone, but more often it is run in order to return data directly to the client/user in the form of an HTML document generated by the program. The architecture of the CGI with respect to overall Web architecture is shown in the following figure:

[Figure: CGI Architecture]

To use the CGI interface, one must understand four things:

where to place programs, and with what permissions, within the server's file space so that they can be executed as CGI programs
how to send input from the user to a CGI program
how to process user input within the CGI program
how to generate and return information/documents to the user
Each of these topics is discussed below. The version of CGI discussed is CGI/1.1. More information regarding CGI can be found through the NCSA CGI overview page. Through it you can go to a basic introduction, a primer for the novice writer of CGI programs, and a more detailed interface specification.
In this and subsequent discussion of Web programming, Perl will be the language of choice. For a tutorial on CGI/Perl programming, see the Perl/CGI Tutorial. There are several archives of Perl programs. One that emphasizes CGI applications is provided by S. E. Brenner in England. A source for v5 Perl programs is the Whitehead Institute at MIT. You can also get all of the Perl programs included in O'Reilly's Learning Perl text.

1. Mechanics

For the server to locate and execute a program through the CGI, the program is normally placed in a special directory known by the server to contain executable programs. In most environments, this directory is named cgi-bin. In most environments, most users do not have write access to this directory because of potential security exposure, and they must have a system administrator put their programs there for them.
To support experimentation and non-monitored use of CGI programs, some facilities run additional servers. At the author's department, that server is designated wwwx.cs.unc.edu. Unlike conventional configurations, wwwx allows users to place their cgi programs in any directory so long as the filename for a cgi program includes a .cgi suffix -- for example, program1.cgi.
For purposes of this course, you should create two directories under your members subdirectory: a perl directory for your experimentation with perl and a cgi-bin directory for your actual cgi programs. When you create these directories, be sure that anyuser has read and list rights on them and that each program file is executable.

2. Input

There are two basic ways in which data are passed from a server (which, presumably, gets them from the client/user) to the cgi program. The first is through environment variables; the second is through the standard input file, STDIN.
Environment Variables

Environment variables are global variables set by the server that are then inherited by the cgi program. The names of these variables are fixed, and the cgi program must access them through those assigned names. The list of environment variables is the following:

HTTP_USER_AGENT
SERVER_NAME
QUERY_STRING
SERVER_PORT
HTTP_ACCEPT
SERVER_PROTOCOL
PATH_INFO
REMOTE_ADDR
DOCUMENT_ROOT
PATH
PATH_TRANSLATED
GATEWAY_INTERFACE
REQUEST_METHOD
SCRIPT_NAME
SERVER_SOFTWARE
REMOTE_HOST
Values are assigned to environment variables by the server before the cgi program begins execution and, thus, are available to it when it begins. Example values are the following:
Example values

HTTP_USER_AGENT = Mozilla/1.1N (Macintosh; I; 68K)
SERVER_NAME = wwwx.cs.unc.edu
QUERY_STRING = querry-string-added-to-end-of-URL
SERVER_PORT = 80
HTTP_ACCEPT = */*, image/gif, image/x-xbitmap, image/jpeg
SERVER_PROTOCOL = HTTP/1.0
PATH_INFO = /additional/info/added/to/path
REMOTE_ADDR = 152.2.132.132
DOCUMENT_ROOT = /afs/unc/proj/wwwc-f95
PATH = /usr/ucb:/bin:/usr/bin:/usr/afsws/bin:/usr/bin/X11:/usr/local/bin/X11R5:/usr/local/bin/X11:/usr/etc:/usr/local/bin:/usr/5bin:/usr/local/contrib/mod/bin://bin
PATH_TRANSLATED = /afs/unc/proj/wwwc-f95/additional/info/added/to/path
GATEWAY_INTERFACE = CGI/1.1
REQUEST_METHOD = GET
SCRIPT_NAME = /wwwc-bin/smith/cgi_env_vars
SERVER_SOFTWARE = NCSA/1.4.2
REMOTE_HOST = mac-ara-port2
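
A listing like the one above can be produced by a very small CGI program. The following is a minimal sketch, written in Python rather than Perl purely for illustration; the function name format_cgi_environment is an assumption of this example, and variables the server did not set are shown as empty.

```python
#!/usr/bin/env python3
"""Print the CGI environment variables listed above as a plain-text page."""
import os
import sys

# Names taken from the list above; the server sets them before the program starts.
CGI_VARIABLES = [
    "HTTP_USER_AGENT", "SERVER_NAME", "QUERY_STRING", "SERVER_PORT",
    "HTTP_ACCEPT", "SERVER_PROTOCOL", "PATH_INFO", "REMOTE_ADDR",
    "DOCUMENT_ROOT", "PATH", "PATH_TRANSLATED", "GATEWAY_INTERFACE",
    "REQUEST_METHOD", "SCRIPT_NAME", "SERVER_SOFTWARE", "REMOTE_HOST",
]


def format_cgi_environment(environ):
    """Return one 'NAME = value' line per variable; unset ones show as empty."""
    return "\n".join(f"{name} = {environ.get(name, '')}" for name in CGI_VARIABLES)


def main():
    # One header line, then the blank line that ends the headers, then the body.
    sys.stdout.write("Content-type: text/plain\r\n\r\n")
    sys.stdout.write(format_cgi_environment(os.environ) + "\n")


if __name__ == "__main__":
    main()
```

The program only reads the variables through their fixed names; it never needs to know how the server obtained the values.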

Let me call your attention to two values in particular. The variable, QUERY_STRING, is set to whatever string follows a question mark (?) occurring at the end of the URL. Such values are typically sent as a result of a FORM that uses METHOD=GET; the string often represents a query, such as a query to a database, depending on the function of the FORM. You can, of course, manually enter such a string directly in the URL, which is what I did to generate the sample values shown above.
The second variable to note is PATH_INFO. Like QUERY_STRING, it is additional information passed to the cgi program through the URL. In this case, it is additional information added to the URL immediately following the path to the cgi program (and before the QUERY_STRING, should one be present). You have seen this convention used with respect to imagemaps, where the first part of the URL is the path to the imagemap program and what follows is the path to a map file.
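
To make the division of the URL concrete, here is a small sketch (in Python, for illustration only) of the split the server performs before it sets PATH_INFO and QUERY_STRING. The function split_cgi_url and the query string name=value are hypothetical; the script path and extra path are the ones from the example values above.

```python
def split_cgi_url(url_path, script_name):
    """Split the path-and-query part of a URL into (path_info, query_string).

    `script_name` is the path to the CGI program itself, i.e. the value the
    server would report as SCRIPT_NAME.
    """
    # Everything after the first '?' is the query string.
    path, _, query_string = url_path.partition("?")
    if not path.startswith(script_name):
        raise ValueError("URL does not refer to the given script")
    # Whatever follows the script's own path is the extra path information.
    path_info = path[len(script_name):]
    return path_info, query_string


path_info, query = split_cgi_url(
    "/wwwc-bin/smith/cgi_env_vars/additional/info/added/to/path?name=value",
    "/wwwc-bin/smith/cgi_env_vars",
)
print("PATH_INFO =", path_info)
print("QUERY_STRING =", query)
```

A real server does this work itself; the cgi program simply finds the two pieces already separated in its environment.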
Both of these conventions are pretty klugey. A more flexible and cleaner approach is to use STDIN, described below. In the meantime, you can experiment with the different options through this example form.

STDIN

If the method used by a form is POST instead of GET, the data are sent to the cgi program through STDIN. From within your program, you can then read the data and process it accordingly.
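
A minimal sketch of reading form data sent with METHOD=POST follows, written in Python for illustration. It relies on CONTENT_LENGTH, a CGI/1.1 environment variable that does not appear in the list above and that gives the number of bytes waiting on STDIN; the function name read_post_data is an assumption of this example.

```python
import os
import sys


def read_post_data(environ, stream):
    """Return the raw form data sent with METHOD=POST, or '' if there is none."""
    if environ.get("REQUEST_METHOD", "").upper() != "POST":
        return ""
    # Read exactly CONTENT_LENGTH bytes; the server is not required to
    # signal end-of-file after the data.
    length = int(environ.get("CONTENT_LENGTH") or 0)
    return stream.read(length).decode("ascii", errors="replace")


if __name__ == "__main__":
    raw = read_post_data(os.environ, sys.stdin.buffer)
    # Echo the still-encoded data back as plain text.
    sys.stdout.write("Content-type: text/plain\r\n\r\n")
    sys.stdout.write(raw + "\n")
```

The string returned here is still in its encoded form, for example a=1&b=2.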
The data you read from STDIN must be parsed into attribute/value pairs. This will require that you:

split the data by ampersand (&) into attribute=value pairs
if working in Perl, you may wish to place the attribute=value pairs into an associative array by splitting on =
translate pluses (+) to spaces
translate special characters in hex to their regular character form
This decoding and parsing process is discussed in detail in the Perl/CGI Tutorial.
You can experiment with input sent with METHOD=POST using this example form.
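
The parsing steps listed above can be sketched as follows. This is an illustration in Python rather than Perl, with a dictionary standing in for the associative array; the function names and the sample input are hypothetical. Note the order: the data are split on & and = before any decoding, so that an encoded ampersand or equals sign inside a value is not mistaken for a separator.

```python
import re


def decode(text):
    """Undo URL encoding: '+' becomes a space, %XX becomes one character."""
    text = text.replace("+", " ")
    # Each %XX escape is read as a single character code (Latin-1 style).
    return re.sub(r"%([0-9A-Fa-f]{2})", lambda m: chr(int(m.group(1), 16)), text)


def parse_form_data(raw):
    """Parse 'a=1&b=two+words' into a dict of attribute/value pairs."""
    fields = {}
    for pair in raw.split("&"):
        if not pair:
            continue
        # Split on the first '=' only, then decode each side separately.
        name, _, value = pair.partition("=")
        fields[decode(name)] = decode(value)
    return fields


print(parse_form_data("name=Jane+Doe&note=50%25+off%21"))
```

In this sketch a repeated attribute name keeps only its last value. Python's standard library also provides urllib.parse.parse_qs, which performs the same decoding and collects repeated names into lists.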

3. Processing the data

Once you have parsed any input from STDIN, you are ready to process it and any data received through an environment variable. At this point, you are in a conventional programming context and your program can do virtually anything a conventional program written in the language you are using can do.

4. Sending data back to the user/client

When you are ready to send data back to the user, the cgi program can do so by writing the data to STDOUT. Thus, you just generate print statements as if the data were being sent to a terminal or to a printer. However, since you are sending your data to a Web client, your program must speak the vernacular. That means it should be in HTML and the HTML should be preceded by server header lines. This may sound complicated, but it's not.
The server header lines are the following:

Status (just the numeric return code followed by the brief text explanation, since the server will add the protocol version, for example HTTP/1.0)
Content-type, in the usual MIME form
Location, (optional) a URL to be followed and returned to the client
blank line (CRLF)
After the header lines, the cgi program generates its data, but with HTML tags interspersed, just as if you were writing conventional HTML static files. These lines of generated HTML data will then be returned by the server to the client/user where they will be displayed just as if the data were a static page.
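
As a rough illustration of the whole reply, the sketch below (in Python, not Perl) writes a Content-type header, the blank line that ends the headers, and then a small HTML document. The function name build_response and the page text are assumptions of this example; Status and Location are left out, so the server supplies its default status.

```python
import html
import sys


def build_response(title, paragraphs):
    """Return the header line, a blank line, and a small HTML document."""
    # The empty line after the header tells the server the headers are done.
    headers = "Content-type: text/html\r\n\r\n"
    body = [
        f"<html><head><title>{html.escape(title)}</title></head>",
        "<body>",
        f"<h1>{html.escape(title)}</h1>",
    ]
    # Escape the text so characters such as '<' and '&' display literally.
    body += [f"<p>{html.escape(p)}</p>" for p in paragraphs]
    body.append("</body></html>")
    return headers + "\n".join(body) + "\n"


if __name__ == "__main__":
    sys.stdout.write(build_response("Hello", ["Generated by a CGI program."]))
```

Everything after the blank line is ordinary HTML, which is why the client displays it exactly as it would a static page.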