Thursday, April 21, 2011

Basics of CGI Scripting in Perl and Python: Part 01

Screencasts are available for Sections 1, 2, 3, 4, 5, and 6 of this post.


1. What is CGI?

CGI stands for "Common Gateway Interface." It is a standard interface between web servers and external programs for generating dynamic web content. Dynamic content differs from static content in that it is generated automatically by an application at request time, without manual human intervention. The term 'Common' in the acronym means that CGI is not specific to any operating system or programming language.

CGI is said to have been jointly developed by the National Center for Supercomputing Applications (NCSA) at the University of Illinois at Urbana-Champaign and CERN, the European Organization for Nuclear Research, in the early 1990s. The primary objective of CGI was dynamic content generation. It was quickly adopted as an unofficial web communication standard worldwide because of its simplicity, brevity, and informal specification.

2. Client-Server Interaction
 
A typical sequence of steps for generating dynamic content (e.g., a web page or an XML summary of some event) is as follows: 1) take input from a client; 2) process that input; 3) generate dynamic content; and 4) send the content back to the client. More concretely, a web server executes a CGI program in response to an HTTP request from a web browser. Every HTTP request specifies a request method. For CGI programmers, the most important methods are GET and POST.

When GET is used, the input is sent to the server as part of the URL. The input can be empty, in which case the CGI script specified by the URL runs without any input. When the input is not empty, it is appended to the end of the URL after the ? delimiter. The input proper is a sequence of name-value pairs separated by ampersands (&). Each name-value pair has the form name=value.
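The URL format described above can be sketched in a few lines of Python 3 with the standard urllib.parse module. This is only an illustration: the helper name build_get_url and the example.pl URL are hypothetical and not part of this post's scripts.

```python
from urllib.parse import urlencode


def build_get_url(script_url, params):
    """Append name=value pairs to script_url after the ? delimiter."""
    if not params:
        # Empty input: the script is requested with no query string at all.
        return script_url
    # urlencode joins the pairs with & and percent-encodes unsafe characters.
    return script_url + "?" + urlencode(params)


if __name__ == "__main__":
    base = "http://localhost:8000/cgi-bin/example.pl"  # placeholder URL
    print(build_get_url(base, {}))
    print(build_get_url(base, {"x": 1, "y": 2}))
```

Letting urlencode do the joining also takes care of characters such as spaces and ampersands inside values, which would otherwise corrupt the name=value structure.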

Suppose we have written a Perl CGI script env_vars.pl that displays all CGI global variables that the client and server use to communicate with and discover information about each other. Suppose we also know that the URL for our script is http://localhost:8000/cgi-bin/env_vars.pl. If we enter "http://localhost:8000/cgi-bin/env_vars.pl" in our browser, env_vars.pl runs with no input. If, however, we enter "http://localhost:8000/cgi-bin/env_vars.pl?x=1&y=2", then the input is "x=1&y=2" and consists of two name-value pairs: "x=1" and "y=2". The value of the input is stored in the CGI variable QUERY_STRING.
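On the script side, reading QUERY_STRING might look like the following Python 3 sketch. It assumes only that the server exposes QUERY_STRING as an environment variable, as described above; the function name read_query is hypothetical.

```python
import os
from urllib.parse import parse_qs


def read_query(environ=None):
    """Return the GET input as a dict mapping each name to a list of values."""
    environ = os.environ if environ is None else environ
    # QUERY_STRING holds everything after the ? in the URL; it may be absent or empty.
    query = environ.get("QUERY_STRING", "")
    return parse_qs(query)


if __name__ == "__main__":
    # Simulate the two requests discussed above without a running server.
    print(read_query({"QUERY_STRING": ""}))
    print(read_query({"QUERY_STRING": "x=1&y=2"}))
```

Each value comes back as a list because the same name may legally appear more than once in a query string.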

3. CGI Development Process

The CGI development process consists of three basic steps: 1) deploy a web server; 2) write and debug the CGI scripts offline; and 3) deploy and test the scripts on the server. If you want to learn the basics of CGI scripting and Perl/Python, one option is to use the CGIHTTPServer that comes with Python. This simple server can be put to productive use for rapidly prototyping your ideas and learning CGI/HTTP intricacies. Once you have a prototype, you can port it to the Apache HTTP web server or some other industrial-strength server that will meet your specific computational and security requirements.
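Step 2, debugging offline, can be approximated by running a script directly with the environment variables a server would normally set. The Python 3 sketch below is one possible way to do that for scripts written in Python; the helper run_cgi_offline and the throwaway echo.py script are hypothetical, and a real server sets more variables than the two shown here.

```python
import os
import subprocess
import sys
import tempfile


def run_cgi_offline(script_path, query_string=""):
    """Run a Python CGI script locally and return everything it writes to stdout."""
    env = dict(os.environ)
    # Mimic two of the variables a server would set for a GET request.
    env["REQUEST_METHOD"] = "GET"
    env["QUERY_STRING"] = query_string
    result = subprocess.run(
        [sys.executable, script_path],
        env=env,
        capture_output=True,
        text=True,
        check=True,  # raise if the script exits with an error
    )
    return result.stdout


if __name__ == "__main__":
    # Demo with a throwaway script that echoes its input back as plain text.
    source = (
        "import os\n"
        "print('Content-Type: text/plain')\n"
        "print()\n"
        "print(os.environ.get('QUERY_STRING', ''))\n"
    )
    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, "echo.py")
        with open(path, "w") as f:
            f.write(source)
        print(run_cgi_offline(path, "x=1&y=2"))
```

Seeing the raw headers and body in the terminal makes it easier to catch a missing Content-Type line or blank line before the script ever reaches a server.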

3.1. Deploying Python's CGIHTTPServer

First, create a directory where you want to run the server, for example /home/vladimir/code/python/cgi on Linux/Unix (or its equivalent on Windows). In that directory, create the sub-directory cgi-bin. This is the directory where you will place the CGI scripts that Python's CGIHTTPServer will run. On Linux/Unix, chmod +x each script to make sure that it is executable. To start the server:
  • $ cd /home/vladimir/code/python/cgi (this is the directory where you will run the server and that has cgi-bin as its sub-directory);
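  • $ python -m CGIHTTPServer (this is the Python 2 form; in Python 3 the module was merged into http.server, and the corresponding command is python3 -m http.server --cgi in versions that still provide that option)
    You should see Serving HTTP on 0.0.0.0 port 8000 in your terminal, as shown in Figure 1, which means the server is up and ready to run your CGI scripts.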


    Figure 1. Starting Python's CGIHTTPServer on Ubuntu.
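The directory preparation described in this section can also be scripted. The following Python 3 sketch creates the cgi-bin sub-directory and sets the execute bits, roughly what mkdir and chmod +x do on Linux/Unix. The function names and the demo.py file are hypothetical, and the demo works in a temporary directory rather than in /home/vladimir/code/python/cgi.

```python
import os
import stat
import tempfile


def prepare_cgi_root(root):
    """Create root/cgi-bin if needed and return the path of the cgi-bin directory."""
    cgi_bin = os.path.join(root, "cgi-bin")
    os.makedirs(cgi_bin, exist_ok=True)
    return cgi_bin


def make_executable(script_path):
    """Rough equivalent of chmod +x: add execute bits for user, group, and others."""
    mode = os.stat(script_path).st_mode
    os.chmod(script_path, mode | stat.S_IXUSR | stat.S_IXGRP | stat.S_IXOTH)


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as root:
        cgi_bin = prepare_cgi_root(root)
        script = os.path.join(cgi_bin, "demo.py")
        with open(script, "w") as f:
            f.write("#!/usr/bin/env python3\n")
        make_executable(script)
        print(os.access(script, os.X_OK))
```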


    3.2. Writing CGI Scripts

    Let us write two basic CGI scripts: one in Perl (hello.pl), the other in Python (hello.py). Each script will display a simple greeting message in the browser window. Specifically, the script hello.pl generates the following HTML.

    <html>
    <head><title>Perl Script on Python Server </title></head>


    <body>
    <h2>Hello from Perl Script on Python Server</h2>
    </body>
    </html>


    And here is the HTML generated by hello.py.

    <html>
    <head><title>Python Script on Python Server</title></head>


    <body>
    <h2>Hello from Python Script on Python Server</h2>
    </body>
    </html>
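The source of hello.py is not listed in this section. As a minimal sketch, assuming Python 3 and a python3 interpreter on the PATH, a script that produces the HTML above might look like this. The function names build_page and main are illustrative and are not taken from the original script.

```python
#!/usr/bin/env python3
import sys


def build_page(language):
    """Return the greeting page for a script written in the given language."""
    return (
        "<html>\n"
        "<head><title>{0} Script on Python Server</title></head>\n"
        "<body>\n"
        "<h2>Hello from {0} Script on Python Server</h2>\n"
        "</body>\n"
        "</html>\n"
    ).format(language)


def main(out=None):
    out = sys.stdout if out is None else out
    # A CGI response starts with headers, then a blank line, then the body.
    out.write("Content-Type: text/html\n\n")
    out.write(build_page("Python"))


if __name__ == "__main__":
    main()
```

The blank line after the Content-Type header matters: without it the server cannot tell where the headers end and the page begins.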

    4. Testing CGI Scripts in the Browser

    Place hello.pl and hello.py in the cgi-bin directory and chmod +x (or the Windows equivalent) each script. Open your web browser and enter http://localhost:8000/cgi-bin/hello.pl for the URL. Figure 2 shows what the generated HTML looks like in my Firefox on Ubuntu.


    Figure 2. Output of hello.pl in Firefox on Ubuntu.

    To run hello.py, enter http://localhost:8000/cgi-bin/hello.py for the URL in your browser. Figure 3 shows what the generated web page looks like in my Firefox on Ubuntu.

    Figure 3. Output of hello.py in Firefox on Ubuntu.

    5. Perl's CGI.pm Module

    We can, if we want, write our CGI scripts using only print statements and string manipulation facilities. For simple scripts, such as hello.py, this may not be a problem. For larger CGI programs, things will get tedious fast because of code duplication. To address this problem and help CGI developers share their coding efforts, the Perl community developed CGI.pm, a module for processing HTTP requests and responses, handling form submissions, file uploads, cookies, query strings, HTTP header preparation, and many other CGI-related tasks. One advantage of using CGI.pm is that it has been developed and refined for over a decade and has been deployed on thousands of websites. To use CGI.pm in your code, place this use statement at the beginning of your Perl file:

    use CGI qw( :standard );

    Let us start with a simple CGI script, call it cgi_play.pl, that uses three subroutines from CGI.pm: header(), start_html(), and end_html(). These subroutines produce three parts of a typical response: the HTTP header, the opening of the HTML page, and its closing tags. The script cgi_play.pl generates the following HTML:


    <!DOCTYPE html
     PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN"
      "http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd">
    <html xmlns="http://www.w3.org/1999/xhtml" lang="en-US" xml:lang="en-US">
    <head>
    <title>Sample Title</title>
    <meta http-equiv="Content-Type" content="text/html; charset=iso-8859-1" />
    </head>
    <body>
    
    </body>
    </html>
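To make the three-part structure concrete, here is a rough Python 3 sketch of stand-in helpers with the same names. These are hypothetical illustrations, not CGI.pm's implementation, and they emit a much simpler page than the XHTML shown above.

```python
import html
import sys


def header(content_type="text/html"):
    """Return the HTTP header block, ending with the mandatory blank line."""
    return "Content-Type: {}\r\n\r\n".format(content_type)


def start_html(title):
    """Return the opening of an HTML page, up to and including <body>."""
    return (
        "<html>\n<head>\n<title>{}</title>\n</head>\n<body>\n"
    ).format(html.escape(title))


def end_html():
    """Return the closing tags of the page."""
    return "</body>\n</html>\n"


if __name__ == "__main__":
    # Header first, then the opening of the page, then the closing tags.
    sys.stdout.write(header() + start_html("Sample Title") + end_html())
```

Splitting the response this way keeps the page body, whatever goes between start_html() and end_html(), separate from the boilerplate that every script repeats.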


    Let us write another CGI script that displays all the global CGI environment variables. These variables are part of the CGI protocol and enable browsers and servers to communicate and exchange information.

    The script, called env_vars.pl, generates a two-column HTML table and displays the names of the CGI environment variables in the left column and their values in the right column. An HTML table is marked with <table> and </table> tags. A table consists of rows. A row is marked with <tr> and </tr> tags (tr stands for table row). A row is divided into data cells. A data cell is marked with <td> and </td> tags (td stands for table data). Figure 4 shows what the generated web page looks like in my Firefox on Ubuntu for the URL http://localhost:8000/cgi-bin/env_vars.pl.


    Figure 4. Output of env_vars.pl in Firefox on Ubuntu.
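The table-building logic can be sketched in Python 3 as follows. This is not the source of env_vars.pl. The function name env_table is hypothetical, and the sketch lists every variable in the process environment rather than only the CGI-specific ones.

```python
import html
import os


def env_table(environ):
    """Return a two-column HTML table: variable names on the left, values on the right."""
    rows = []
    for name in sorted(environ):
        # Escape both cells so characters such as < and & display literally.
        rows.append(
            "<tr><td>{}</td><td>{}</td></tr>".format(
                html.escape(name), html.escape(str(environ[name]))
            )
        )
    return "<table>\n" + "\n".join(rows) + "\n</table>"


if __name__ == "__main__":
    print("Content-Type: text/html")
    print()
    print("<html><body>")
    print(env_table(os.environ))
    print("</body></html>")
```

Escaping matters here because values such as QUERY_STRING come straight from the client and may contain markup.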

    6. Writing a Perl GET Script with an Input Parameter

    Let us write a simple CGI script with one input parameter to get more practice with the GET input parameters briefly discussed in Section 2. Our script takes one numeric parameter n from the user and generates the first n elements of the Fibonacci sequence. We will save our script in getfibseq.pl. Figure 5 shows what the generated web page looks like in my Firefox on Ubuntu for the URL http://localhost:8000/cgi-bin/getfibseq.pl?n=5.

    Figure 5. Output of getfibseq.pl in Firefox on Ubuntu.


    If we look at the page source code (View | Page Source in Firefox), we should see something like this.

    <html>
    <head>
    <title>FIBONACCI SEQUENCE</title>
    <meta http-equiv="Content-Type" content="text/html; charset=iso-8859-1" />
    </head>
    <body>
    <h2>First 5 elements of the Fibonacci sequence</h2><br /><br />0 1 1 2 3 
    <br /><br />
    </body>
    </html>
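For comparison, a Python 3 sketch of the same idea is shown below. It is not a translation of getfibseq.pl. The names fib_prefix and parse_n, the fallback to 0 on bad input, and the upper limit of 1000 are all assumptions made for this illustration.

```python
import os
from urllib.parse import parse_qs


def fib_prefix(n):
    """Return the first n Fibonacci numbers, starting from 0."""
    seq = []
    a, b = 0, 1
    for _ in range(n):
        seq.append(a)
        a, b = b, a + b
    return seq


def parse_n(query_string, default=0, limit=1000):
    """Read n from the query string; fall back to default on missing or invalid input."""
    values = parse_qs(query_string).get("n", [])
    try:
        n = int(values[0])
    except (IndexError, ValueError):
        return default
    # Clamp to a sane range so a request cannot ask for unbounded work.
    return max(0, min(n, limit))


if __name__ == "__main__":
    n = parse_n(os.environ.get("QUERY_STRING", ""))
    print("Content-Type: text/html")
    print()
    print("<html><head><title>FIBONACCI SEQUENCE</title></head><body>")
    print("<h2>First {} elements of the Fibonacci sequence</h2>".format(n))
    print(" ".join(str(x) for x in fib_prefix(n)))
    print("</body></html>")
```

Validating n before using it is worth the extra lines, since anything in the URL is under the client's control.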


    Next: Basics of CGI Scripting in Perl and Python: Part 02

    Written and posted by Vladimir Kulyukin

    Comments, bugs to vladimir dot kulyukin at gmail dot com.