The Common Gateway Interface
Published on: Saturday 18th April 1998 By: Jason Nugent
This is the second article in the CGI/Perl series for the JavaScript No Content Web Site. This article will hopefully demystify some of the details of exactly how information is passed back and forth between the browser and the server through the Common Gateway Interface.

Before we delve into our first example, I should mention one of the most important HTML tags for CGI: the <FORM> tag. Rather than discussing all the different types of form <INPUT> items, I am going to concentrate on two specific attributes of the <FORM> tag: the ACTION attribute and the METHOD attribute.

The ACTION attribute

With CGI, this attribute most often points to the location of a CGI program (or "script") that is run on the server when the user clicks the "submit" button and the form is submitted. Typically, these attributes have values like:

    <FORM ACTION="/cgi-bin/script.pl">
The important thing here is the /cgi-bin/script.pl section of the attribute. This tells the browser to submit the information contained in the form to a script called "script.pl", located in the /cgi-bin directory. Most servers are set up with directory mappings, so that /cgi-bin actually points (or maps) to another directory on the server where CGI scripts are allowed to be executed. Generally, a user is only able to execute CGI scripts in a central "bin" directory. This matters because it can be a major security risk if a system administrator allows CGI scripts to run from any directory on the server without knowing what those scripts may do. Only allow this if you completely trust your users, or if your security is so poor in other locations that another hole won't make much difference. That's a joke.

Some Internet Service Providers (ISPs) will let users run their own scripts from inside their public_html directories, in a subdirectory called cgi-bin. For a user with an account on www.somewhere.com, the action for a script might look something like:

    <FORM ACTION="http://www.somewhere.com/~user/cgi-bin/script.pl">
This tells the browser to submit the form information to a script residing on www.somewhere.com, in the user's cgi-bin directory, inside their public_html directory. It's important to make the script executable for everyone. Log in to your server, go into your cgi-bin directory, and type a command such as:

    chmod 755 script.pl
This command gives people the ability to run your script. Without it, they would simply get an error saying that permission was denied.

GET vs POST

The next attribute that we are going to take a look at is the METHOD attribute. This attribute controls how the information is sent to the server. A typical METHOD attribute looks like this:

    METHOD="GET"

or also

    METHOD="POST"
If a METHOD is not specified in the <FORM> tag, it defaults to GET. "What's the difference?" you say. Quite a bit. Let's look at them one at a time. First, though, it's important to remember to give your form items unique and meaningful names. The NAME attribute of each form item does this, and the name is what gets submitted to the CGI script as part of a name=value pair. Remember to set a NAME for each and every item in your form.

GET

GET sends the form information to the server as part of the URL when requesting the CGI script. That's not as bad as it sounds, but an example might help. If you have ever used a large search engine like Webcrawler, you will have noticed how convoluted the URLs look after your search results have been returned. Typically, they end with a question mark followed by the form data, something like ?firstname=Jason.
This one is a bit simple, but do you see the part after the question mark? That is the information that was contained in the form. In this case, the form must have contained something like a text field called "firstname", and the user entered "Jason" as its value. In CGI terminology, the part of the URL after the question mark is called the QUERY_STRING, which consists of a string of name=value pairs separated by ampersands (&). If your form contained two pieces of information, the URL would carry two name=value pairs with an ampersand between them.
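To make that layout concrete, here is a minimal Perl sketch that assembles such a URL from two pairs. The second field ("lastname") and the script location are assumptions made up for illustration, and the values are simple enough that they need no encoding.

```perl
use strict;
use warnings;

# Hypothetical form data: the field names and values are illustrative only.
my @pairs = (
    [ firstname => 'Jason'  ],
    [ lastname  => 'Nugent' ],
);

# Join each name and value with "=", then join the pairs with "&".
my $query_string = join '&', map { join '=', @$_ } @pairs;

# Hypothetical script location; everything after "?" is the QUERY_STRING.
my $url = "http://www.somewhere.com/cgi-bin/script.pl?$query_string";

print "$url\n";
```

Run from the command line, this prints the full URL, with the two pairs sitting after the question mark in the order they were listed.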
There are both advantages and disadvantages to using GET to submit information. Advantages include the ability to "bookmark" search results, since the submitted information is part of the URL, and the ability to create hypertext links that submit information to CGI scripts. The biggest disadvantage of GET is that the QUERY_STRING is limited by the input buffer size of your server. On some servers this is something like 1024 bytes, which means that it is possible to submit too much information and lose some along the way. That is where POST comes in.

POST

With POST, no information is added to the URL when a CGI script is called. Instead, the information is sent in the body of the request, after all your request headers have been sent to the server. The length of the information (in bytes) is also sent to the server, to let the CGI script know how much it has to read in via STDIN (standard input). We'll see how to get at this in a future article.

URL Encoding

I should point out that in both cases, the information sent to the server is "URL encoded" to make it legal. Characters such as spaces are not allowed in a URL, and others such as ampersands and equals signs already have a special meaning in the QUERY_STRING, so they must be encoded in order to be transported across the web. The browser converts a space to a plus sign (+) and the other characters to a percent sign followed by their hexadecimal character code. It is up to the CGI program to convert them back, and up to you to write the code that will do that. Let's look at another URL, this time with a space in it. Let's say you had a field on a form called myname, and you entered "Jason Nugent". Your URL to the script (if you used GET) would end in ?myname=Jason+Nugent.
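Here is a rough Perl sketch of the encoding step the browser performs. The function name form_encode is my own invention, it assumes plain single-byte text, and the set of characters it leaves untouched is a conservative guess rather than a copy of any particular browser's behaviour.

```perl
use strict;
use warnings;

# Encode one form value for use in a query string:
# unsafe characters become %XX (two hex digits), then spaces become "+".
sub form_encode {
    my ($value) = @_;
    # Leave letters, digits, "_", ".", "-" and (for now) spaces alone.
    $value =~ s/([^A-Za-z0-9_.\- ])/sprintf('%%%02X', ord($1))/ge;
    # A space is sent as a plus sign.
    $value =~ tr/ /+/;
    return $value;
}

print 'myname=', form_encode('Jason Nugent'), "\n";
```

A literal plus sign in the input is turned into %2B by the first substitution, so it cannot be confused with an encoded space later on.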
See the + sign in there? Without it, the URL would be invalid. With it, you can safely send it across the web. But then you have to decode it. If you submitted something like ~filename, the tilde would arrive at the script as %7E, giving %7Efilename.
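Going the other way, the decoding that your script has to do can be sketched in a few lines of Perl. The name form_decode is made up for this example, and the next article in the series covers decoding properly, so treat this as a preview under simple assumptions (one value at a time, single-byte characters).

```perl
use strict;
use warnings;

# Decode one URL-encoded form value back into plain text.
sub form_decode {
    my ($value) = @_;
    # A plus sign stands for a space; do this before expanding %XX,
    # so that an encoded plus (%2B) survives as a real "+".
    $value =~ tr/+/ /;
    # Turn each %XX back into the character with that hex code.
    $value =~ s/%([0-9A-Fa-f]{2})/chr(hex($1))/ge;
    return $value;
}

print form_decode('%7Efilename'), "\n";
print form_decode('Jason+Nugent'), "\n";
```

The order of the two steps matters: if the %XX sequences were expanded first, a real plus sign sent as %2B would then be wrongly turned into a space.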
In this case, the tilde must be converted to its hexadecimal equivalent, which is %7E.

A Bit of Perl

So, where are we now? Well, we've managed to send our information across the web to our server, but now we have to do something with it. This is where our script comes in. Since I mentioned in the last article that I was going to be using Perl for these examples, I suppose I'd better get at it. After all, you've read this much, yes? For this article I will just show the beginnings of how a Perl CGI script works, which will keep the article to a readable length.

The first line of every Perl script that is going to run as a CGI script is a special line that points to the Perl interpreter on your server. These lines almost always look like this:

    #!/usr/bin/perl -w

or maybe

    #!/usr/local/bin/perl -w
The #! symbol tells the system that this line contains the location of the Perl interpreter. In these cases, the interpreter is located in /usr/bin or (in the second case) /usr/local/bin. This varies from system to system, and the easiest way to find out is to type a command such as which perl on your server. This will tell you what to use in your script.

The -w flag after the location is a special flag that puts Perl into "warning mode". ALWAYS use this, as it makes the interpreter report suspicious constructs that it would otherwise silently accept, which helps immensely when debugging your scripts. For example, it tells you about variables that have not been initialized, which often means that you have a typo somewhere. Most Perl programmers will not consider a Perl script "professional" unless this is included. I agree. There are also pragmas that you can add to your scripts to make sure they run cleanly, and we will see them in another article.

Printing to the browser

All documents that are sent back to the browser must have an accompanying MIME (Multipurpose Internet Mail Extensions) type. This tells the browser what to do with the information it is receiving. For now, we will say that you are sending back HTML text. The first thing you print, in all cases, must be the MIME type. This will also show you what Perl's print statement looks like. The first print statement in your Perl script should be something along the lines of:

    print "Content-type: text/html\n\n";
"Content-type: text/html" is the MIME type of the document the browser is going to receive. You've told it to expect a text/html document in this case. Notice that there are two \n characters after it, though. These are newline characters. They are REQUIRED, and 90 percent of the time when your script does not work and you've checked the syntax and everything seems to be fine, you've forgotten these. The MIME type is sent back as part of the header information of the document, and a blank line must appear after it to tell Perl that the header is finished. A \n character is how you represent a carriage return. So, the simplest CGI script we have can have that does something might be:
Notice that each statement in Perl is terminated with a semicolon. A form that contains a single submit button and whose ACTION points at this script is enough to try it out: when the button is clicked, the script runs and prints "Hello, World!" in your browser window.
The Next Article

The next article will deal more with the server side: specifically, extracting (decoding) the information sent to the server so that it is usable. Perl techniques to do this will include arrays, hashes, environment variables, a few control structures, and the split function.