Common Gateway Interface

The Common Gateway Interface, more commonly referred to as CGI, enables a Web server to launch a program in response to a browser query, instead of fetching a file. A program run through CGI can do nearly anything an ordinary program can do; one common CGI application, for example, is database access. Once the data have been generated, the CGI program must construct a response to be returned to the client. Most often this takes the form of generated HTML.
In the sections that follow, the main components and facilities of CGI are discussed. They differ in minor details with respect to the programming language in which the CGI program is written. Since Perl is the most commonly used language for CGI applications, the discussion will follow the conventions used for Perl.



Overview

The place of CGI within the overall Web architecture is shown in the figure below.
CGI Architecture. A Web browser contacts a Web server through the Internet. Instead of fetching a file, the server passes the client's query to a program through the CGI. The CGI program processes the query and passes the resulting data, usually in the form of generated HTML, back through the interface for return to the client.
To use the CGI interface, one must understand four things:
          1. where to place programs, and with what permissions, within the server's file space so that they can be executed as CGI programs
          2. how to send input from the user to a CGI program
          3. how to process user input within the CGI program
          4. how to generate and return information/documents to the user
 
Each of these topics is discussed, below. The version of CGI that will be discussed is CGI/1.1. More information regarding CGI can be found through the NCSA CGI overview page. Through it you can go to a basic introduction, a primer for the novice writer of CGI programs, and a more detailed interface specification.

Mechanics

For the server to locate and execute a program through the CGI, the program is normally placed in a special directory that the server knows to contain executable programs. In most environments, this directory is named cgi-bin. Regular users do not usually have write access to the cgi-bin directory. The primary reason is to give administrators control over the programs placed there, since a program that hangs or misbehaves can degrade, or even bring down, the whole server.
Alternatively, some servers can be configured so that they recognize any file with a .cgi extension as a CGI program; for example, program1.cgi. With this option set, CGI programs can be placed in any directory that is accessible to the server. In the author's environment, a special server has been set up for instructional use so that students may develop and debug their CGI programs without fear of disrupting the production server.
Files that are to be accessed by a CGI program in Unix environments must be accessible to the user account under which the server runs, which in practice often means making them accessible to any user. Setting the proper access permissions is therefore important.
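As a rough illustration of the permission settings described above, the following Python sketch (Python is used here only for illustration; the discussion otherwise follows Perl conventions) marks a program as executable by everyone and a data file as readable by everyone. The modes 755 and 644 and the function names are assumptions chosen for the example; the modes that are appropriate depend on local server policy.

```python
import os
import stat


def make_cgi_executable(path):
    """Owner may read/write/execute; everyone else may read/execute (mode 755)."""
    os.chmod(path, 0o755)


def make_world_readable(path):
    """Owner may read/write; everyone else may only read (mode 644)."""
    os.chmod(path, 0o644)


def mode_string(path):
    """Return the permission bits in ls-style form, e.g. '-rwxr-xr-x'."""
    return stat.filemode(os.stat(path).st_mode)
```

On a Unix system, calling make_cgi_executable on a program file and then mode_string on the same file would report -rwxr-xr-x, the same string that ls -l displays.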

Input

Data to be processed by a CGI program is most often produced by a form that has been displayed on the user's Web browser. That form may include various data fields, checkboxes, and other input options available through the form HTML tag. When the user submits the form, the data specified through the various form fields are consolidated and encoded into a single character string, usually referred to as the query.
The METHOD attribute of a form is set to either GET or POST by the author of the form; if it is omitted, browsers use GET. The method determines how the query is sent to the server. If the method is GET, the query is appended to the end of the URL that designates the server and CGI program, separated by a question mark (?) from the path segment of the URL. This URL is included as part of the header of the HTTP message. If the method is POST, the query is placed in the data portion of the HTTP message.
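To make the encoding step concrete, here is a minimal Python sketch (Python is used only for illustration) of how form fields might be consolidated into a query and, for GET, appended to the URL. The field names, the values, and the URL http://www.example.com/cgi-bin/lookup.cgi are hypothetical; urlencode is the standard library routine for this encoding.

```python
from urllib.parse import urlencode


def encode_query(fields):
    """Consolidate form fields into one URL-encoded query string."""
    return urlencode(fields)


def get_url(program_url, fields):
    """Build the URL a browser would request for a METHOD=GET form."""
    return program_url + "?" + encode_query(fields)


# Hypothetical form contents and CGI program location.
fields = {"name": "Ada Lovelace", "dept": "R&D"}
query = encode_query(fields)
# query is "name=Ada+Lovelace&dept=R%26D"
url = get_url("http://www.example.com/cgi-bin/lookup.cgi", fields)
```

Spaces become plus signs and reserved characters such as & are percent-encoded, so the field separators in the query remain unambiguous. With METHOD=POST the same encoded string would travel in the data portion of the message rather than in the URL.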
Once the HTTP message arrives at the server, the server starts the CGI program designated in the URL included in the HTTP header. It passes the query to the CGI program in one of two ways. If the method is GET, it passes the query through an environment variable. If it is POST, it passes the string through the standard input file.
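The hand-off just described can be imitated in a few lines. The Python sketch below (Python is used only for illustration) plays the role of the server: it starts a tiny stand-in CGI program as a separate process and passes the query either in the QUERY_STRING environment variable or on standard input. The function name run_cgi and the stand-in program are assumptions made for the example; a real server does considerably more.

```python
import os
import subprocess
import sys

# A tiny stand-in CGI program that echoes back the query it received.
CGI_PROGRAM = r'''
import os, sys
if os.environ["REQUEST_METHOD"] == "GET":
    query = os.environ["QUERY_STRING"]
else:
    query = sys.stdin.read(int(os.environ["CONTENT_LENGTH"]))
print("Content-type: text/plain")
print()
print(query)
'''


def run_cgi(method, query):
    """Launch the stand-in program the way a server would, return its output."""
    env = dict(os.environ, REQUEST_METHOD=method)
    if method == "GET":
        env["QUERY_STRING"] = query          # GET: query travels in the environment
        body = ""
    else:
        env["QUERY_STRING"] = ""
        env["CONTENT_LENGTH"] = str(len(query))
        body = query                         # POST: query travels on standard input
    result = subprocess.run(
        [sys.executable, "-c", CGI_PROGRAM],
        env=env, input=body, capture_output=True, text=True, check=True,
    )
    return result.stdout
```

Calling run_cgi("GET", "name=Ada") and run_cgi("POST", "name=Ada") should return the same text, which shows that only the delivery channel differs between the two methods.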

Environment Variables

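Environment variables are global variables, set by the server, that are available to the CGI program. The names of these variables are fixed, and the CGI program must access them through those names. The list of environment variables includes the following:
          SERVER_SOFTWARE, SERVER_NAME, SERVER_PROTOCOL, SERVER_PORT, GATEWAY_INTERFACE
          REQUEST_METHOD, QUERY_STRING, CONTENT_TYPE, CONTENT_LENGTH
          SCRIPT_NAME, PATH_INFO, PATH_TRANSLATED
          REMOTE_HOST, REMOTE_ADDR, REMOTE_USER, AUTH_TYPE
          HTTP_USER_AGENT, HTTP_ACCEPT, and other HTTP_ variables derived from request headers
Example values for environment variables include the following (these are illustrative values, not taken from a particular server):
          GATEWAY_INTERFACE = CGI/1.1
          REQUEST_METHOD = GET
          SERVER_PORT = 80
          QUERY_STRING = name=Smith&dept=sales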
Note the variable, QUERY_STRING. It is set to whatever string follows a question mark (?) occurring at the end of the URL. As noted earlier, this value is set by a form that uses METHOD=GET. However, one can also manually enter such a string directly in the URL.
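A CGI program might retrieve and decode QUERY_STRING along the following lines. This is a minimal Python sketch (Python is used only for illustration); the function name query_from_environment is an assumption, while parse_qs is the standard library routine that splits the string on & and = and undoes the URL encoding.

```python
import os
from urllib.parse import parse_qs


def query_from_environment(environ=os.environ):
    """Decode QUERY_STRING into a dictionary mapping field name to a list of values."""
    raw = environ.get("QUERY_STRING", "")
    # keep_blank_values retains fields the user left empty
    return parse_qs(raw, keep_blank_values=True)


# A mapping can be passed in place of the real environment when experimenting.
fields = query_from_environment({"QUERY_STRING": "name=Ada+Lovelace&dept=R%26D"})
# fields is {"name": ["Ada Lovelace"], "dept": ["R&D"]}
```

Each value is a list because a form may send the same field name more than once, as a group of checkboxes does.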

STDIN

If the method used by a form is POST, instead of GET, input is sent by the server to the CGI program through the standard input file, STDIN. The CGI program must read the data from that file. Once read in, the query string must be decoded and parsed. The details of this are explained in the on-line CGI tutorial.
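Reading a POST query could look roughly like this in Python (used only for illustration). The sketch relies on the CGI/1.1 variable CONTENT_LENGTH, which gives the number of bytes the server has made available on STDIN; the function name read_post_query is an assumption made for the example.

```python
import io
import os
import sys
from urllib.parse import parse_qs


def read_post_query(environ=os.environ, stdin=None):
    """Read CONTENT_LENGTH bytes from standard input and decode the query."""
    stream = stdin if stdin is not None else sys.stdin.buffer
    length = int(environ.get("CONTENT_LENGTH") or 0)
    # Read exactly the advertised number of bytes, not until end of file.
    raw = stream.read(length).decode("utf-8")
    return parse_qs(raw, keep_blank_values=True)


# Experiment with an in-memory stream standing in for STDIN.
body = b"name=Ada+Lovelace&dept=R%26D"
fields = read_post_query({"CONTENT_LENGTH": str(len(body))}, io.BytesIO(body))
# fields is {"name": ["Ada Lovelace"], "dept": ["R&D"]}
```

Reading only CONTENT_LENGTH bytes matters because the server is not obliged to signal end of file after the query, so a program that reads until end of file may wait indefinitely.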

Processing the data

Once the query string has been decoded and parsed, the CGI program is ready to perform whatever action has been requested. At this point, the program can do virtually anything that a conventional program can do. This includes accessing a database, computing the value of a function, or generating HTML strings.
When processing is complete, the CGI program passes its data to the server for delivery back to the browser from which the request originated.
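As one example of the processing step, the Python sketch below (Python is used only for illustration) looks up a value in a small database using a field taken from the decoded query. The staff table, its contents, and the function names are hypothetical; an in-memory SQLite database stands in for whatever database a real application would use.

```python
import sqlite3


def open_demo_database():
    """Create a throwaway in-memory database with a hypothetical staff table."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE staff (name TEXT, dept TEXT)")
    conn.executemany(
        "INSERT INTO staff VALUES (?, ?)",
        [("Ada", "R&D"), ("Grace", "Systems")],
    )
    return conn


def lookup_department(conn, fields):
    """Return the department for the 'name' field of a decoded query, or None."""
    names = fields.get("name", [])
    if not names:
        return None
    # The ? placeholder keeps user input from being interpreted as SQL.
    row = conn.execute(
        "SELECT dept FROM staff WHERE name = ?", (names[0],)
    ).fetchone()
    return row[0] if row else None
```

Here fields is assumed to be a dictionary mapping each field name to a list of values, the shape produced by the standard library's parse_qs. Passing user input through a placeholder rather than pasting it into the SQL text is a sensible habit for any program that accepts queries from the Web.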

Sending data back to the user/client

The CGI program communicates with the user through the server by writing data to the standard output file, STDOUT. The program does this by executing print statements as if the data were being sent to a terminal or to a printer. However, since it is sending data to a Web client, it must speak the vernacular. That means HTML. The HTML must be preceded by at least one header line, most commonly Content-type: text/html, followed by a blank line. The server normally supplies the status line and the remaining HTTP headers, but a program that omits its own header line usually causes the server to report an error.
The data from the CGI program are received by the server and included as the body of an HTTP message it constructs. That message is then returned to the Web browser from which the original request came.
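A response of this kind might be produced as follows. This Python sketch (Python is used only for illustration) emits a Content-type header line and a blank line ahead of the HTML, which is what CGI/1.1 expects of a program that returns a document; the function name build_response and the page contents are assumptions made for the example.

```python
import html
import sys


def build_response(title, paragraphs):
    """Return a complete CGI response: header line, blank line, HTML body."""
    lines = [
        "Content-type: text/html",  # tells the server and browser what follows
        "",                         # blank line ends the header section
        "<html>",
        "<head><title>" + html.escape(title) + "</title></head>",
        "<body>",
        "<h1>" + html.escape(title) + "</h1>",
    ]
    for text in paragraphs:
        # Escape user-supplied text so it cannot be interpreted as markup.
        lines.append("<p>" + html.escape(text) + "</p>")
    lines += ["</body>", "</html>"]
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    # Writing to STDOUT hands the response to the server.
    sys.stdout.write(build_response("Query received", ["Thank you."]))
```

Escaping the text before placing it in the page is a design choice worth keeping whenever the output echoes anything the user typed into a form.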


Additional Reading

If you would like to learn more about CGI, there are several sources you may wish to consult.
The actual specification for CGI, as well as other helpful documents, can be found at the W3C's page on CGI at http://www.w3.org/CGI/. A comprehensive introduction to the subject is Gundavaram, S. (1996), CGI Programming on the World Wide Web, Sebastopol, CA: O'Reilly & Associates. The NCSA offers several online tutorials that can be accessed from their CGI overview page at http://hoohoo.ncsa.uiuc.edu/cgi/.