Category: LINUX

2008-01-01 22:10:22

Programming CGI in C/C++

Welcome to the world of CGI! I wrote this tutorial because I was extremely disappointed to find very few resources on this subject anywhere on the WWW. If you have any questions, comments, gripes, or corrections, please feel free to e-mail me at .

I am not, and will not claim to be an expert with C, C++ or CGI. For a programmer, there are many ways to achieve the same result, and many are going to be better than mine. The examples on this page are intended to illustrate a point, and I do not claim them to be efficient or even beautiful examples of good coding practice. I have learned quite a bit more over the past year (Updated 06/10/98) and I hope to get some of my information posted whenever I can find the time to do so.

I plan to organize this tutorial better and include information on Dynamic memory allocation, cookies, and update the current code to correct a few bugs and attempt to show better coding practice. I also plan to rid myself of these damned PRE Tags ;) They illustrate my first inexperienced work with HTML. Thanks for your patience.

The very first thing you should know is that I will be using the POST method of sending form data (set with the METHOD attribute of the FORM tag in the HTML code). The other method is GET. I don't use it because GET puts the data in the URL, so the amount of data it can carry is considerably smaller than what you can send with POST. POST passes the data to the CGI program as a stream (in the program, stdin).

I must assume you understand enough C to follow this, as I do not have the time to write a C tutorial to explain it. The data from the GET method can be accessed through an environment variable called QUERY_STRING. An environment variable is a named value that a program inherits from the process that started it (here, the web server sets them before running your CGI) rather than one the program defines itself. A good example of an environment variable is the familiar DOS PATH (which was copied from UNIX).
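If you do want to accept GET requests, here is a minimal sketch, assuming the server follows the usual CGI convention of setting REQUEST_METHOD alongside QUERY_STRING. The helper names cgi_env() and is_get_request() are hypothetical, made up for this example. The point it illustrates is that getenv() returns NULL when a variable is not set, so the sketch substitutes an empty string.

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper: return an environment variable's value, or ""
   when it is unset, so callers never dereference a NULL pointer. */
const char *cgi_env(const char *name)
{
    const char *value = getenv(name);
    return value != NULL ? value : "";
}

/* Hypothetical helper: nonzero when the server reports a GET request. */
int is_get_request(void)
{
    return strcmp(cgi_env("REQUEST_METHOD"), "GET") == 0;
}
```

In main() you could then take the form data from cgi_env("QUERY_STRING") when is_get_request() is true, and read it from stdin otherwise.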

The GCC (GNU Compiler Collection) C compiler can be found on a wide range of UNIX systems, and I suggest you use it. It is an ANSI-compliant compiler that is distributed as free software and comes packaged with many free Unix clones (such as Linux). One of the questions I get asked most frequently (and one I asked myself when I first began) is "Can I compile this on my local machine and upload it to the server?" The answer is generally no. Unless your development machine runs the same version of Unix or Linux, on the same kind of hardware, as your ISP's server, it will not work.

Operating systems are very different from each other, from the way their code compiles to the way it executes. There are many flavors of UNIX, and while a few are compatible, most you may come across are not. Wintel (Intel machines running Windows) is very different from UNIX, and no executable you compile on it will run on a UNIX platform. That can only be achieved with some kind of cross-compiler (I have heard of these, but their price is generally forbidding enough to forget about taking that path). It all boils down to one simple fact: a CGI compiled on the platform it is to be used on will always work better than one you try to compile elsewhere.

Obtaining the data from the server


Now, to get the data from the server you must know about environment variables. You must find out how many bytes of data the server is sending so that you know how much to read. This number is provided in an environment variable called "CONTENT_LENGTH", and the following code excerpt will get the number contained in that variable:

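#include <stdlib.h>

int main(void)
   {
   long contentlength = 0;
   const char *len1 = getenv("CONTENT_LENGTH");
   if (len1 != NULL)   /* unset when run outside a web server */
      contentlength = strtol(len1, NULL, 10);
   return 0;
   }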
That code excerpt gets the number of bytes to read from the server through the environment variable "CONTENT_LENGTH" and places it in a long integer called contentlength. Use a long, because that is what strtol() returns and it is large enough for any form you are likely to receive; a double is the wrong type for a byte count. Also check that getenv() did not return NULL, which it does when the variable is not set (for example, when you run the program from the shell). Now you have to get the data from the server into one LONG string; the following code combines the previous code with the code to read from the server. Just start reading from stdin until the length specified in "CONTENT_LENGTH" is reached. There is no need to tell stdin where the data is, because the server connects the request body to your program's standard input; just start reading from it. The code goes as follows:

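#include <stdio.h>
#include <stdlib.h>

int main(void)
   {
   char *endptr;
   long contentlength = 0;
   size_t got;
   char buff[10000];
   const char *len1 = getenv("CONTENT_LENGTH");
   if (len1 != NULL)
      contentlength = strtol(len1, &endptr, 10);
   /* Refuse anything that would overflow buff (one byte is kept for '\0') */
   if (contentlength < 0 || contentlength >= (long)sizeof(buff))
      return 1;
   got = fread(buff, 1, (size_t)contentlength, stdin);
   buff[got] = '\0';
   return 0;
   }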
You now have the URL-encoded data from the whole form in one '\0'-terminated string, which I named buff[] (as in a data buffer). Once the program has finished reading from the server, your CGI has all of the information the user just submitted to your form.
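A fixed 10000-byte buffer caps how much a form can send. Here is a more defensive sketch of the same two steps, assuming you are happy to allocate the buffer with malloc() and to reject bodies over a limit you choose. The names parse_content_length(), read_body(), and MAX_BODY are hypothetical, made up for this example.

```c
#include <errno.h>
#include <stdio.h>
#include <stdlib.h>

/* Hypothetical upper limit on the request body; pick your own. */
#define MAX_BODY 65536L

/* Convert the CONTENT_LENGTH text to a byte count. Returns -1 if it is
   missing, not a plain decimal number, negative, or larger than MAX_BODY. */
long parse_content_length(const char *s)
{
    char *end;
    long n;

    if (s == NULL || *s == '\0')
        return -1;
    errno = 0;
    n = strtol(s, &end, 10);
    if (errno != 0 || *end != '\0' || n < 0 || n > MAX_BODY)
        return -1;
    return n;
}

/* Read exactly len bytes from in (normally stdin) into a newly allocated,
   '\0'-terminated buffer. Returns NULL on a bad length, a short read, or
   when malloc() fails; the caller must free() the result. */
char *read_body(FILE *in, long len)
{
    char *buf;

    if (len < 0)
        return NULL;
    buf = malloc((size_t)len + 1);
    if (buf == NULL)
        return NULL;
    if (fread(buf, 1, (size_t)len, in) != (size_t)len) {
        free(buf);
        return NULL;
    }
    buf[len] = '\0';
    return buf;
}
```

Typical use would be to pass getenv("CONTENT_LENGTH") to parse_content_length() and, if the result is not -1, hand it to read_body(stdin, ...). Checking for a short read matters because a client can announce more bytes than it actually sends.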

URL ENCODING


Now you must receive a crash course in URL encoding. URL encoding is a way of turning all the data submitted to a form into a single, continuous string. There are NO spaces in URL-encoded data. Your form may be something like the following example; click "View Source" in Netscape or Internet Explorer to see the HTML code behind the form below.

Although it's rather lame, the system this page now runs on no longer allows me to run CGI, so the form won't work, but I'll leave the sample in anyway.

[Sample form: an option field named Var1, a text field named name, a text field named address, and a Submit button]
The user would simply fill in the information and click "Submit". When the form was live, you could erase the defaults, type in your own values, and click the submit button to see a working example of this. Much more fun than a book, huh? The data provided to this script is as follows: the option chosen is Two, the name field has "Odysseus", and the address is "Calypso's island". Here is the URL-encoded data from that:

Var1=Two&name=Odysseus&address=Calypso%27s+island
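To see where that string comes from, here is a sketch of the browser's side: encoding one form value. It assumes the common form-encoding rules (letters, digits, and * - . _ kept as-is, a space becomes +, every other byte becomes % plus two uppercase hex digits). The function name url_encode_value() is hypothetical.

```c
#include <ctype.h>

/* Hypothetical helper: encode one form value the way a browser does for
   application/x-www-form-urlencoded data. dst must have room for
   3 * strlen(src) + 1 bytes (every byte might become %XY). */
void url_encode_value(const char *src, char *dst)
{
    static const char hex[] = "0123456789ABCDEF";

    for (; *src != '\0'; src++) {
        unsigned char c = (unsigned char)*src;

        if (isalnum(c) || c == '*' || c == '-' || c == '.' || c == '_') {
            *dst++ = (char)c;              /* safe characters pass through */
        } else if (c == ' ') {
            *dst++ = '+';                  /* spaces become plus signs */
        } else {
            *dst++ = '%';                  /* everything else becomes %XY */
            *dst++ = hex[c >> 4];
            *dst++ = hex[c & 0x0F];
        }
    }
    *dst = '\0';
}
```

Encoding "Calypso's island" this way gives Calypso%27s+island, because the apostrophe is byte 0x27. An & typed by the user becomes %26 and an = becomes %3D, which is why they cannot be confused with the real delimiters.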

I used this code to write the CGI you used for the form above:

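#include <stdio.h>
#include <stdlib.h>

int main(void)
   {
   long contentlength = 0;
   size_t got;
   char buff[10000];
   const char *len1 = getenv("CONTENT_LENGTH");
   if (len1 != NULL)
      contentlength = strtol(len1, NULL, 10);
   /* Ignore bodies that would not fit in buff */
   if (contentlength < 0 || contentlength >= (long)sizeof(buff))
      contentlength = 0;
   got = fread(buff, 1, (size_t)contentlength, stdin);
   buff[got] = '\0';
   /* Echo the raw URL-encoded data back to the browser */
   printf("Content-type: text/html\n\n%s\n", buff);
   return 0;
   }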
Ok, now you need a brief rundown of how URL encoding works. Each field in the form (named Var1, name, and address) is sent as name=value, and the pairs are separated by an & (ampersand). All spaces, as I pointed out before, have been replaced with +'s (plus signs). Any other character that is not safe in a URL, such as the ' (apostrophe) in "Calypso's island", is converted to a % followed by its two-digit hexadecimal ASCII code (%27). Here is a small chunk of code that decodes the hex escapes and plus signs in the URL-encoded string buff[] and puts the result into a second string called buff2[]. Please take note of the comment "/* Prevent user from altering URL delimiter sequence */". It means that if a user types an & (ampersand) or an = (equals sign) into your form, their encoded forms %26 and %3D are left alone, so they won't mess up the natural delimiting structure of the URL encoding scheme. The downside is that you'll have to pass each separate variable string through another loop similar to this one to change the %26's and %3D's back into their respective characters.

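#include <stdlib.h>
#include <string.h>

/* Decode buff into buff2; buff2 needs room for strlen(buff) + 1 bytes */
void url_decode(const char *buff, char *buff2)
   {
   size_t x, y;
   size_t len = strlen(buff);
   char hexstr[3];

   for (x = 0, y = 0; x < len; x++, y++)
      {
      switch (buff[x])
         {
         /* Convert all + chars to space chars */
         case '+':
         buff2[y] = ' ';
         break;

         /* Convert all %xy hex codes into ASCII chars */
         case '%':

         /* A % too close to the end to hold two hex digits is copied as-is */
         if (x + 2 >= len)
            {
            buff2[y] = buff[x];
            break;
            }

         /* Copy the two bytes following the % and terminate the copy */
         hexstr[0] = buff[x + 1];
         hexstr[1] = buff[x + 2];
         hexstr[2] = '\0';

         /* Skip over the hex */
         x = x + 2;

         /* Prevent user from altering URL delimiter sequence:
            keep %26 (&) and %3D (=) encoded */
         if (strcmp(hexstr, "26") == 0 || strcmp(hexstr, "3D") == 0
             || strcmp(hexstr, "3d") == 0)
            {
            buff2[y++] = '%';
            buff2[y++] = hexstr[0];
            buff2[y] = hexstr[1];   /* the loop's y++ steps past this byte */
            break;
            }

         /* Convert the hex to ASCII */
         buff2[y] = (char)strtol(hexstr, NULL, 16);
         break;

         /* Make an exact copy of anything else */
         default:
         buff2[y] = buff[x];
         break;
         }
      }
   buff2[y] = '\0';
   }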
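An alternative that avoids the second pass is to split the string on & and = first and only then decode each piece completely, since a decoded & can no longer be mistaken for a delimiter at that point. That is a different technique from the loop above, shown here as a minimal sketch. The name url_decode_field() is hypothetical, and malformed escapes are simply copied through.

```c
#include <ctype.h>

/* Value of one hex digit; the caller has already checked isxdigit(). */
static int hexval(int c)
{
    if (c >= '0' && c <= '9')
        return c - '0';
    return tolower(c) - 'a' + 10;
}

/* Hypothetical helper: decode one already-split field in place.
   '+' becomes a space and each valid %XY becomes the byte 0xXY;
   anything else, including a malformed escape, is copied unchanged.
   The result is never longer than the input. */
void url_decode_field(char *s)
{
    char *src = s, *dst = s;

    while (*src != '\0') {
        if (*src == '+') {
            *dst++ = ' ';
            src++;
        } else if (*src == '%' && isxdigit((unsigned char)src[1])
                   && isxdigit((unsigned char)src[2])) {
            *dst++ = (char)(hexval((unsigned char)src[1]) * 16
                            + hexval((unsigned char)src[2]));
            src += 3;
        } else {
            *dst++ = *src++;
        }
    }
    *dst = '\0';
}
```

Decoding in place works because every escape shrinks (three bytes become one), so the write pointer never overtakes the read pointer.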
That's about all there is to URL encoding. Also note that if you put a checkbox on your form and the user does not check it, neither its name nor any marker saying it is unchecked is included in the URL-encoded string. It will appear as if it doesn't exist at all. You must code around this quirk (or stupidity, in my own terms) and find your own solution to the problem.
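One way to live with missing checkboxes is to look fields up by name and treat "not found" as "unchecked". Here is a minimal sketch of such a lookup over the raw URL-encoded string; the name get_field() is hypothetical, and the value it copies out is still encoded, so you would decode it afterwards.

```c
#include <string.h>

/* Hypothetical helper: copy the still-encoded value of field `name` from a
   string like "Var1=Two&name=Odysseus" into out (outsize must be >= 1;
   long values are truncated). Returns 1 if found, 0 if the field is absent,
   which is what an unchecked checkbox looks like. */
int get_field(const char *data, const char *name, char *out, size_t outsize)
{
    size_t namelen = strlen(name);
    const char *p = data;

    while (*p != '\0') {
        const char *amp = strchr(p, '&');
        size_t pairlen = amp != NULL ? (size_t)(amp - p) : strlen(p);

        /* Match "name=" at the start of this name=value pair */
        if (pairlen > namelen && strncmp(p, name, namelen) == 0
            && p[namelen] == '=') {
            size_t vlen = pairlen - namelen - 1;
            if (vlen >= outsize)
                vlen = outsize - 1;
            memcpy(out, p + namelen + 1, vlen);
            out[vlen] = '\0';
            return 1;
        }
        if (amp == NULL)
            break;
        p = amp + 1;
    }
    return 0;
}
```

Requiring the = right after the name keeps a lookup for "nam" from matching the field "name".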

Sending data to the console


The next thing you need to know is how to send data to the console (the web browser). This is useful for sending the user an error message saying that your CGI has had an error and asking them to contact you about fixing it. For this particular example, a simple programming statement should suffice:

printf("Content-type: text/html\n\n This is a sample message.\n");

You must start with "Content-type: text/html\n\n", INCLUDING both \n's: the first ends the header line, and the second produces the blank line that tells the server the headers are finished and the data for the user follows. You can also use HTML code in the message, like so:

printf("Content-type: text/html\n\n<i>This would appear in italics.</i>");
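Putting those pieces together, here is a sketch of a complete error response, assuming you want any text that came from the user to appear literally rather than be parsed as HTML. The function names print_html_escaped() and cgi_error_page() are hypothetical; they take a FILE * so you can pass stdout in the CGI or a temporary file when testing.

```c
#include <stdio.h>

/* Hypothetical helper: write s, replacing the characters HTML treats
   specially so that user text is shown literally instead of as markup. */
void print_html_escaped(FILE *out, const char *s)
{
    for (; *s != '\0'; s++) {
        switch (*s) {
        case '<': fputs("&lt;", out); break;
        case '>': fputs("&gt;", out); break;
        case '&': fputs("&amp;", out); break;
        case '"': fputs("&quot;", out); break;
        default:  fputc(*s, out); break;
        }
    }
}

/* Hypothetical helper: header line, blank line, then a short page. */
void cgi_error_page(FILE *out, const char *msg)
{
    fputs("Content-type: text/html\n\n", out);
    fputs("<html><body><p>Error: ", out);
    print_html_escaped(out, msg);
    fputs("</p></body></html>\n", out);
}
```

In the CGI itself you would call cgi_error_page(stdout, "...") and then exit.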

Redirection


Now, one final thing about the coding of CGI: redirection. If you write your output to an HTML file and want the user to be sent to a page automatically, you would use redirection. This simple statement will redirect the web browser to a hypothetical page called /~whoeveryouare/page.html:

printf("Location: /~whoeveryouare/page.html\n\n");

Again, the two \n's on the end are crucial in executing the command; don't forget them. The location should be either a full URL (starting with http://) or a path on your own server beginning with a /. Another little quirk of CGI that I should report is that you must either redirect the user to another page or print something to the console. The important point here is that the CGI must output SOMETHING. It doesn't matter if you print a message like "Operation complete, please return to..." to their console or simply redirect them back to your homepage; all that is important is that you output some data to the user. Failure to follow this simple principle may cause you hundreds of "500 Internal Server Error" messages until you realize your error. This has happened to me on more than one occasion; bad habits are hard to break!
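Here is a small sketch of the "always output something" rule applied to redirection. It assumes the location might be built from data you did not fully control, so it refuses a location containing a carriage return or line feed (which would end the header early) and falls back to a plain message instead. The name cgi_redirect() is hypothetical.

```c
#include <stdio.h>
#include <string.h>

/* Hypothetical helper: send a Location header if `location` is non-empty
   and contains no CR/LF; otherwise print a plain message. Either way the
   CGI outputs something. Returns 1 if it redirected, 0 if it fell back. */
int cgi_redirect(FILE *out, const char *location)
{
    if (location[0] != '\0' && strpbrk(location, "\r\n") == NULL) {
        fprintf(out, "Location: %s\n\n", location);
        return 1;
    }
    fputs("Content-type: text/html\n\n"
          "Operation complete, please return to the home page.\n", out);
    return 0;
}
```

Calling cgi_redirect(stdout, "/~whoeveryouare/page.html") prints the same two lines as the printf() above.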

How to set it all up under UNIX


The last topic I must cover (before you go searching for something a little easier to understand) is setting it all up on a Unix machine. First you must know how to compile the code. YOU MUST COMPILE THE CODE ON THE MACHINE YOU INTEND TO RUN IT ON!!! This is NOT Perl; it is C, and it must be compiled. A compiled CGI generally runs faster for this very reason: the server does not have to interpret the code every single time it is used. It simply runs the CGI like any other program on the system (just like you run a program in DOS). If you've ever written in QBasic, you know that it may take a while for the code to be parsed by the interpreter. This is the exact same concept. The command to compile a program goes as follows:

gcc -o youroutputcgi.cgi yourscript.c

That will take the source code you wrote (the ".c" file; use g++ instead of gcc for a ".cpp" file) and compile it into a CGI program called "youroutputcgi.cgi". Replace the name with whatever aesthetic name pleases you; just be sure it ends in ".cgi". If it still doesn't seem to work, contact your ISP, because they may have the system configured so that only files with one extension (like .cgi or .exe) can be executed. Also, if you have an ANSI C/C++ compiler locally (on your own computer), you may want to try compiling your program there before compiling it on the UNIX server. You can use this to catch those stupid little errors in your program that make you want to hit your computer (or yourself) repeatedly with your fluffy stress bat. But you must still remember that the executable you make on your machine WILL NOT run on a Unix machine; you have to compile it on the UNIX server.

You MUST already have any output files created, or have write permission set on the directory so that the program can create its own. Whichever you choose, one Unix command will do the trick. To open up full read, write, and execute access to a file or directory, use:

chmod ugo=rwx yourfile.html
or
chmod ugo=rwx yourdirectory

The first gives your output file (if you want to create an HTML document on the fly) full access, and the second gives a directory full access so your program can create output files inside it. The ugo=rwx breakdown is as follows:

u=user - yourself, the file's owner: how much access do you want to your own file :)
g=group - all of the members of your group
o=other - everyone else, including the web server that runs your CGI for the internet surfers you intend the data to reach.

r=read - read access to a file allows a user to read what's in the file
w=write - write access is for files that must be written to (like an output file)
x=execute - you must set execute permission on your program to run it, like the following:

chmod ugo+x yourcgi.cgi
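Behind the letters, each of u, g, and o is one octal digit in which r=4, w=2, and x=1, which is why you will often see numeric forms such as "chmod 755". Here is a sketch of that arithmetic for a single "who=perms" clause; the function name symbolic_to_octal() is hypothetical. A program could pass the resulting bits to the POSIX chmod() function declared in <sys/stat.h>.

```c
/* Hypothetical helper: the permission bits a chmod "who=perms" clause
   such as "ugo=rwx" sets. r=4, w=2, x=1, placed in the owner (u),
   group (g) or other (o) octal digit. Other letters are ignored. */
unsigned int symbolic_to_octal(const char *who, const char *perms)
{
    unsigned int bits = 0, mode = 0;
    const char *p;

    for (p = perms; *p != '\0'; p++) {
        if (*p == 'r')
            bits |= 4;
        else if (*p == 'w')
            bits |= 2;
        else if (*p == 'x')
            bits |= 1;
    }
    for (; *who != '\0'; who++) {
        if (*who == 'u')
            mode |= bits << 6;   /* owner digit */
        else if (*who == 'g')
            mode |= bits << 3;   /* group digit */
        else if (*who == 'o')
            mode |= bits;        /* other digit */
    }
    return mode;
}
```

For example, ugo=rwx works out to 0777, and combining u=rwx with go=rx gives 0755, the usual setting for a CGI program that only its owner may change.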

Give whatever access to the file or directory you deem necessary; after all, you can create a security breach if you open up all of your files to write access. Someone could overwrite YOUR code! Also, you do not have to set the user, group, and other attributes at the same time; you can set just one or two, like so:

chmod o=x yourcgi.cgi
chmod uo=rw guestbook.html

The first one makes the CGI you wrote executable by "other" users, which includes the web server running it for people on the internet, and the second configures a file named "guestbook.html" as readable and writable by yourself and by others. Execute permission is what tells the server that this file is a program instead of a text document, and you MUST at least set it on the CGI itself. Well, that's about all you need to know to get into the exciting world of CGI, and you can use a language you already know, C. The rest of what you do is up to you, as it is now in your hands to develop what YOU want. Now start coding!



I will eventually put a cookie reference in here, but until then try this link: Until I get my page up you will have to live with that one. The actual cookie is simply a small piece of data that the browser stores on the user's computer and that will keep any information you want, so that when they return, your program can read in whatever data was stored. You can use a cookie for just about anything. The browser sends its cookies back with each request, and the server passes them to your program in the environment variable "HTTP_COOKIE".
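As a starting point, a CGI sets a cookie by printing a header line such as Set-Cookie: visits=3 before the blank line that ends its headers. On later visits HTTP_COOKIE holds name=value pairs separated by "; ". Here is a minimal sketch of reading one cookie from that string. The name get_cookie() is hypothetical, and it does no decoding of the value.

```c
#include <string.h>

/* Hypothetical helper: find cookie `name` in an HTTP_COOKIE string such as
   "theme=dark; visits=3" and copy its value into out (outsize must be >= 1;
   long values are truncated). Returns 1 if found, 0 otherwise. */
int get_cookie(const char *cookies, const char *name, char *out, size_t outsize)
{
    size_t namelen = strlen(name);
    const char *p = cookies;

    while (*p != '\0') {
        size_t pairlen;

        while (*p == ' ')              /* skip the space after each ';' */
            p++;
        pairlen = strcspn(p, ";");     /* length of this name=value pair */
        if (pairlen > namelen && strncmp(p, name, namelen) == 0
            && p[namelen] == '=') {
            size_t vlen = pairlen - namelen - 1;
            if (vlen >= outsize)
                vlen = outsize - 1;
            memcpy(out, p + namelen + 1, vlen);
            out[vlen] = '\0';
            return 1;
        }
        p += pairlen;
        if (*p == ';')
            p++;
    }
    return 0;
}
```

Remember that getenv("HTTP_COOKIE") returns NULL when the browser sent no cookies, so check for that before calling the helper.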