A guide to HTML and CGI scripts

By Mike Smith
M.A.Smith@brighton.ac.uk University of Brighton UK.

Contents

* Introduction to HTML
* Simple formatting tags
* Logical formatting tags
* Paragraph and line breaks
* Headings and rulers
* Insertion of in-line images
* A background image
* Creating a list of items
* Hyper text links
* URL
* Tables
* Single cell
* Row(s) of cells
* Heading to a column
* Spanning rows and columns
* Using HTML special characters
    * Literal copy of text
* Inserting an e-mail address
* Form filling
* Simple forms
* Multiple elements
* Multiple lines of input
* Check boxes
* Radio button
* Pop up list
* Reset values
* Image maps
* Hidden form element
* CGI scripts
* Decoding data sent to a CGI script
* Script to record users of web page
* Post vs. get
* Check list


Warning: if you are not using a browser that supports tables,
such as Netscape 1.1 or later, then this page
will probably be very difficult to read.

Preface

WWW (World Wide Web) pages are written using HTML (HyperText Markup Language). HTML tags control in part the representation of the WWW page when viewed with a web browser. Examples of browsers used to view web pages include:

This document describes some of the version 3 features of HTML and extensions recognized by Netscape 1.1 or later browsers.

In describing HTML the aim has been to concentrate on the more widely used and useful features. However, in many ways this still remains a personal selection.

The following symbol is used in this document:

Try it

It is a hypertext link to a web page containing examples of HTML features under discussion. The reader can if they wish modify the text to try out their own ideas of style and formatting.

The best way of using this feature is to select "New window with this Link" (Available on Netscape). Then after trying out the features close the window to continue browsing the original document.




Introduction to HTML

HTML (HyperText Markup Language) is a markup language which consists of tags embedded in the text of a document. The browser reading the document interprets these markup tags to help format the document for subsequent display to a reader. However, many of the decisions about layout are made by the browser. Remember, web browsers are available for a wide variety of computer systems.

The browser thus displays the document with regard to features that the viewer selects either explicitly or implicitly. Factors affecting the layout and presentation include:

The browser ignores extra spaces and new lines between words and markup tags when reading the document. Thus, the following three text fragments will be formatted identically.

Fragment 1:
 The browser will ignore
 new lines and extra
 spaces in the text.

Fragment 2:
 The browser will
 ignore new lines and
 extra spaces in the text.

Fragment 3:
 The browser will
 ignore new lines and
 extra    spaces in the text.

to produce the following:

The browser will ignore new lines and extra spaces in the text.
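
The collapsing of white space can be imitated in a few lines of Python. This is only a sketch of the visible effect, not a description of how any particular browser is implemented; the function name collapse_whitespace is an assumption made for the example.

```python
def collapse_whitespace(text: str) -> str:
    """Replace every run of spaces, tabs and new lines by a single space."""
    # str.split() with no argument splits on any run of white space
    # and discards leading and trailing white space.
    return " ".join(text.split())


fragment_1 = "The browser will ignore\nnew lines and extra\nspaces in the text."
fragment_2 = "The browser will\nignore new lines and\nextra spaces in the text."
fragment_3 = "The browser will\nignore new lines and\nextra    spaces in the text."

# All three fragments print as the same single line.
for fragment in (fragment_1, fragment_2, fragment_3):
    print(collapse_whitespace(fragment))
```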

Try it

The markup language is made up of tags such as <B> which requests text that follows to be in bold type. This bolding is turned off by the inverse markup tag </B>.

In writing a tag, the case of the letters in the tag name is unimportant so that <B> and <b> represent the same tag.

The basic layout of an HTML document and the resultant information displayed by a browser such as Netscape is shown below:

Displayed by browser HTML markup required
An example of a simple web page.
 <!DOCTYPE HTML PUBLIC "-//IETF//DTD HTML//EN">
 <HTML>

  <HEAD>
   <TITLE>Title of the web page </TITLE>
  </HEAD>

  <BODY>
  An example of a simple
  <B>web</B>
  page.
  </BODY>

 </HTML>
 

The tags used are:

<!DOCTYPE HTML PUBLIC "-//IETF//DTD HTML//EN">
Informs the browser that this is an HTML document. This is an SGML tag to identify the version of HTML being used, in this case just HTML. Version 3 browsers (e.g. Netscape) are happy about this pretense that this document is just plain HTML.
<HTML> </HTML>
Defines the extent of the HTML markup text
<HEAD> </HEAD>
Contains descriptions of the HTML page. This meta information is not displayed as part of the web page.
<TITLE> </TITLE>
Describes the title of the page. This description is usually displayed by the browser as the title of the window in which the web page is displayed. This information is also used by some search engines to compile an index of web pages.
<BODY> </BODY>
Delimits the body of the web page. In the body is the text to be displayed as well as HTML markup tags to hint at the format of the text.
<B> </B>
Displays the enclosed text in a bold typeface.

Note:
The tags can be written in upper-case or a mixture of upper- and lower-case or just lower-case. For example, <HTML>, <HtMl> and <html> all represent the same tag.




Simple formatting tags

The following are some of the simple formatting tags available in HTML.

Formatted text HTML markup required
The text is bolded.

 The <B>text</B> is bolded.
 
The text is italicized.

 The <I>text</I> is italicized.
 
The text is in a teletype font.

 The <TT>text</TT> is in a teletype font.
 
The text is 2 sizes larger. The size attribute may also be an absolute value in the range 1 .. 7.

 The <FONT SIZE=+2>text</FONT>
 is 2 sizes larger.
 The size attribute may also
 be an absolute value in the range 1 .. 7.
 
Use the e-mail address
M.A.Smith@brighton.ac.uk
to contact me.

 Use the e-mail address
 <ADDRESS>M.A.Smith@brighton.ac.uk</ADDRESS>
 to contact me.
 

Note:
The formatting is turned off by the inverse of the HTML formatting tag, for example </B> for <B>.

Try it




Logical formatting tags

The following are some of the logical formatting tags in HTML. These should be used to describe a logical unit of your document.

Formatted text HTML markup required
The following is a citation.

 The following is 
<CITE>a citation.</CITE>
Represents computer code

 <CODE>
 Represents computer code
 </CODE>
 
A sequence of literal characters

 A sequence of
 <SAMP>literal characters</SAMP>
 
Note:
This is a blockquote of some text

 Note:<BLOCKQUOTE>This is a
 blockquote of some text
 </BLOCKQUOTE>
 
The following is a definition

 The following is 
<DFN>a definition</DFN>
The following text is emphasized.

 The following <EM>text</EM>
is emphasized.
keyboard characters.

 <KBD>keyboard characters</KBD>
 
The following text is strongly emphasized

 The following <STRONG>text</STRONG>
is strongly emphasized
The following name is a program variable

 The following <VAR>name</VAR>
is a program variable

Try it




Paragraph and line break

A new paragraph is started with the <P> tag, and may optionally be terminated with the inverse paragraph tag </P>. However, it is usual not to specify the inverse paragraph tag </P>.

A line break is created by the <BR> tag, which has no inverse tag.

Formatted text HTML markup required
Last sentence of a paragraph

The first line of a new paragraph.


 Last sentence of a paragraph
 <P>
 The first line of a new paragraph.
 
A line of text.
On a new line.

 A line of text.
 <BR>
 On a new line.
 

Try it




Headings and rulers

A heading in the text is created with a heading tag such as <H1>. There are in fact six heading tags, from <H1> (the largest) to <H6> (the smallest).

Formatted text HTML markup required

An H1 heading


 <H1>An H1 heading</H1>
 

An H3 heading


 <H3>An H3 heading</H3>
 
An H6 heading

 <H6>An H6 heading</H6>
 

A heading causes a line break before and after the heading text. For example:

Formatted text HTML markup required
Just before the heading.

An H4 heading

Just after the heading.
 Just before the heading.
 <H4>An H4 heading</H4>
 Just after the heading.
 

Try it




A horizontal ruler across the page can be created with the <HR> tag. For example:

Formatted text HTML markup required
End of a section
New section

 End of a section<HR>New section
 
Only 40% of width
New section
 Only 40% of width<HR WIDTH=40%>New section
 
The size of the ruler
New section

 The size of the ruler<HR SIZE=10>New section
 

Note:
The size of the ruler is by default specified in pixels.

Try it




Insertion of in-line images

As well as text, images may be inserted into the document. An image may be held in several formats, though the main ones used are GIF and JPEG. Due to limited bandwidth, JPEG with its high compression of picture data and its ability to represent 24 bit colour images is the best to use. Even though the JPEG compression is lossy the degradation of picture quality is not very noticeable to the human eye.

However, if the picture is very small or a graphical image then the GIF format may be the best.

Inserted image HTML markup required
masjpeg
 <IMG SRC="mas_fn50.jpg"
      ALT="mas" ALIGN=TOP>jpeg
 
A gif image with a transparent background.
 A gif image
 <IMG SRC="sdot.gif" ALIGN=TOP>
 with a transparent background
 
mas Text can be made to flow around an image on the left hand side by using the attribute ALIGN=RIGHT.
 <IMG SRC="mas_fn50.jpg"
      ALT="mas" ALIGN=RIGHT>
 Text can be made to flow around
 an image on the left hand side
 by using the attribute
 ALIGN=RIGHT.
 

Note:
There are many graphic editing programs which allow you to specify that one of the colours of a GIF image is transparent when displayed by a browser. For example, LView Pro 1.B2 and Paint Shop Pro 3.11 allow the creation of a transparent colour in a GIF image. However, in order to do this, the image must be saved in the GIF 89a format.

An image may be used as the displayed item for a hypertext link. For example: <A HREF="..."> <IMG SRC="..."> </A>.

The attributes of the <IMG> tag include:

ALT="mas"
Causes "mas" to be displayed if the browser cannot display images or the display of images has been suppressed.
ALIGN=TOP
Causes any following text to be displayed aligned with the top of the picture.
ALIGN=BOTTOM
Causes any following text to be displayed aligned with the bottom of the picture.
ALIGN=MIDDLE
Causes any following text to be displayed aligned with the middle of the picture.
ALIGN=LEFT
Causes the image to be left aligned on the page. Text is flowed around the image on the right hand side.
ALIGN=RIGHT
Causes the image to be right aligned on the page. Text is flowed around the image on the left hand side.

Try it




Insertion of a background image

The document may be given a background image. The selected image is tiled across the document and then the text of the document is written over the image(s). It is thus important to choose a background image that will not be too distracting for the reader. The background image is achieved by adding a background attribute to the BODY markup tag. For example:

<BODY BACKGROUND="backgrd.jpg">




Creating a list of items

There are several types of list. An un-ordered list can be created by the following markup:

Formatted text HTML markup required
  • Item one of list
  • Item two of list

 <UL>
 <LI>Item one of list
 <LI>Item two of list
 </UL>
 

Try it

An ordered list is similar to an un-ordered list, except that each entry is consecutively numbered.

Formatted text HTML markup required
  1. Item one of list
  2. Item two of list

 <OL>
 <LI>Item one of list
 <LI>Item two of list
 </OL>
 

Try it

A definition list allows a list with a backward hanging indent to be created.

Formatted text HTML markup required
Definition tag.
Text of the definition list. Which may stretch over several lines.
Another definition tag.
Text of the definition list.

  <DL>
   <DT>
   Definition tag.
   <DD>
   Text of the definition list.
   Which may stretch over several
   lines.
   <DT>
   Another definition tag.
   <DD>
   Text of the definition list.
  </DL>
 

Note:
It is a common practice to replace the definition tag with an image, or to prefix it with an image.

Try it




Hyper text links

A hyper text link allows a browser of the document to navigate either to a new point in the document or to a different document. A named point in a document is specified with an anchor tag which has the attribute NAME. For example:

Formatted text HTML markup required
Here <A NAME="marker"> Here </A>

Note:
There does not need to be any text to visibly name the anchor point.

To effect a transfer to a named anchor point, the HREF form of the anchor tag is used. For example:

Formatted text HTML markup required
Transfer to anchor <A HREF="#marker"> Transfer to anchor </A>

The # before the name of the hypertext link tells the browser that the link is to a named point in a document. As no document name is specified before the # the hypertext link is to a point in the current document. It is usual for the browser to visibly highlight the hypertext link.

Hypertext links may also link to other documents, in which case the HREF component names the document. If the file is held on another machine then a URL (Uniform Resource Locator) is used to describe the location of the document. For example:

Hyper text link to the file file.html:
 <A HREF="file.html"> Name </A>
Hyper text link to the file file.html held on another machine using an URL:
 <A HREF="http://host/file.html"> Name </A>

To go to a named point in a file the format of the anchor is:

Hyper text link to the file file.html at point mark:
 <A HREF="file.html#mark"> Name </A>
Hyper text link to the file file.html at point mark held on another machine using an URL:
 <A HREF="http://host/file.html#mark"> Name </A>

Try it




URL (Uniform Resource Locator)

The URL is used to specify the location of a file held on a remote machine. This is composed of several distinct components. For example, the URL http://host/file.html is composed of the following components.

http
The protocol that is to be used to access the file. In this case the HyperText Transfer Protocol. There are other protocols, but they are used infrequently.
host
The name of the machine. This can be either a symbolic name such as snowwhite.it.brighton.ac.uk or a numeric IP (Internet Protocol) address such as 193.200.1.1.
file.html
The path name of the file to which the hypertext link is to be made. This is relative to the base directory in which web pages are held. The location of this directory htdocs is defined by the person who has set up the web server.

Unix based web servers allow a convenient shortcut to accessing files placed in the directory public_html in the user's home directory. For example, the file home.html placed in the directory public_html in mas's home directory on the server machine snowwhite.it.brighton.ac.uk, can be accessed with the URL http://snowwhite.it.brighton.ac.uk/~mas/home.html

In creating the URL the characters space, =, +, <, >, %, ", /, and ? should not be used as ordinary characters, since several of them have a special meaning within a URL. If you do need to include these characters then represent them by the % symbol followed by the two-digit hexadecimal value of the character. The space character, however, can also be represented by the character + in the data sent to a CGI script. For example, http://snowwhite.it.brighton.ac.uk/%25.html represents the URL http://snowwhite.it.brighton.ac.uk/%.html.
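
The % escaping of a file name can be tried out with the Python standard library. The following is a minimal sketch, not part of the original guide; the helper name encode_file_name is an assumption made for the example, while quote and unquote are the standard functions of urllib.parse.

```python
from urllib.parse import quote, unquote


def encode_file_name(name: str) -> str:
    """Escape every character that is not a letter, digit or one of _ . - ~"""
    # safe="" asks quote() to escape "/" as well, which it would
    # otherwise leave untouched.
    return quote(name, safe="")


# The file %.html must be written as %25.html inside a URL.
print(encode_file_name("%.html"))
# Decoding reverses the substitution.
print(unquote("%25.html"))
```

In this form of escaping a space becomes %20; the + form of the space character is produced by the separate function quote_plus.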

Other access protocols can be specified by an URL. For example, to access files via the FTP (File Transfer Protocol) an URL of the form:

ftp://ftp.brighton.ac.uk/pub/mas/ada95

is used. This specifies that a link to the file/directory mas/ada95 on the machine ftp.brighton.ac.uk should be made. If mas/ada95 is a directory the viewer is presented with a list of files in the directory, otherwise the file will be displayed/transferred to you.

The full specification for an URL is:

Protocol  Format                                  Notes
HTTP      http://host[:port]/path                 -
FTP       ftp://[username[:password]@]host/path   If no username is specified the user anonymous is used.
Gopher    gopher://host[:port]/[type[item]]       -

Note:
In describing the format of an URL items enclosed in [ ]'s are optional.




Tables

Single cell

A table is created using the <TABLE> markup tag. The simplest table consists of a single data cell. The markup <TD> defines the start of a table data cell. Netscape requires a termination tag </TD>.

The status of the tag </TD> is somewhat unclear. Netscape 1.1 requires the end of table data cell tag </TD> to terminate the cell in certain circumstances. Other browsers sense the end of the cell by the </TABLE> or other formatting tag. The tag </TD> is not defined in the HTML specification. As unrecognized tags are ignored by a browser, putting the tag in does no harm.

Formatted text HTML markup required
Text in a table

 <TABLE BORDER CELLPADDING=2>
 <TD>
  Text in a table
 </TD>
 </TABLE>
 

The following extra components can be added to a table tag:

UNITS=
Either en, relative or pixels. This defines the units that will be used in defining other attributes. The default is pixels.
BORDER
Specifies that a border is to be placed around the table cells. The width of the border is optionally specified with BORDER=n.
CELLPADDING
Specifies the gap to be placed around the table contents.

If the text in the data cell is to be left aligned the additional component ALIGN=LEFT can be added to the <TD> tag. For example:

<TD ALIGN=LEFT>

Other alignments are ALIGN=RIGHT, ALIGN=CENTER.

Try it




Row(s) of cells

A row of cells can be created by repeating the <TD> tag.

Formatted text HTML markup required
Data cell 1 Data cell 2

 <TABLE BORDER CELLPADDING=2>
  <TD> Data cell 1 </TD>
  <TD> Data cell 2 </TD>
 </TABLE>
 

Try it




To form a table of many rows, the markup tag <TR> is inserted where each new row in the table starts.

Formatted text HTML markup required
Data cell 1 Data cell 2
Data cell 3 Data cell 4

 <TABLE BORDER CELLPADDING=2>
  <TD> Data cell 1 </TD>
  <TD> Data cell 2 </TD>
 <TR>
  <TD> Data cell 3 </TD>
  <TD> Data cell 4 </TD>
 </TABLE>
 

Try it




Heading to a column

The tag <TH> may be used instead of <TD> if the cell is a header to a column of cells. For example:

Formatted text HTML markup required
Mnemonic Expansion
HTML Hyper Text
Markup Language

 <TABLE BORDER CELLPADDING=2>
  <TH ALIGN=LEFT> Mnemonic</TH>
  <TH ALIGN=LEFT> Expansion </TH>
 <TR>
  <TD> HTML</TD>
  <TD> Hyper Text<BR>Markup Language</TD>
 </TABLE>
 




Spanning rows and columns

The attributes ROWSPAN and COLSPAN of the HTML tags <TD> and <TH> are used to form data cells which span more than one row or column. For example:

Formatted text HTML markup required
Language Encapsulation
Ada 95 Using Package
C++ Class

 <TABLE BORDER CELLPADDING=2>
  <TH ALIGN=LEFT>Language</TH>
  <TH ALIGN=LEFT COLSPAN=2>Encapsulation</TH>
 <TR>
  <TD> Ada 95</TD>
  <TD ROWSPAN=2>Using</TD>
  <TD> Package</TD>
 <TR>
  <TD> C++</TD>
  <TD> Class</TD>
 </TABLE>
 

Try it




Using HTML special characters

The markup language uses the character < to start a markup tag. The consequence of this is that < cannot be used to represent the less than character directly in a web page. The HTML markup language defines escape sequences of characters to represent such special characters.

The following are some of the character sequences used to represent characters that have a special meaning in the HTML language.

Character   Represent by sequence
<           &lt;
>           &gt;
&           &amp;
"           &quot;

Thus to include <BODY> as part of the text of a document the sequence &lt;BODY&gt; can be written.
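
When a web page is generated by a program, the same substitutions can be made automatically. The sketch below uses the function html.escape from the Python standard library; the wrapper name escape_markup is an assumption made for the example.

```python
import html


def escape_markup(text: str) -> str:
    """Replace <, >, & and the double quote by their HTML escape sequences."""
    # quote=True also replaces quote characters, which matters when the
    # text is placed inside an attribute value.
    return html.escape(text, quote=True)


# Prints the sequence that displays the text <BODY> in a web page.
print(escape_markup("<BODY>"))
```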

Characters not in the normal ASCII character set are also represented by an escape sequence. This is so that they may be typed on computer systems which do not directly support the input of such characters. For example:

Character   Represent by sequence
à           &agrave;
ç           &ccedil;




Literal copy of text

Text can be copied literally by using the <PRE> tag. However, any HTML tags in the text will be processed. For example:

Formatted text HTML markup required

    *
   ***
  *****
 

 <PRE>
    *
   *<B>*</B>*
  *****
 </PRE>
 

The copied text is presented using a non-proportional font.

Try it




Inserting an e-mail address

An e-mail address may be inserted into the document using a special form of the anchor markup tag. A user selecting this link will be presented with a form on which to prepare their message, which in this case will be sent to the e-mail address:

M.A.Smith@brighton.ac.uk

Formatted text HTML markup required
Mail me

 <A HREF="mailto:M.A.Smith@brighton.ac.uk">Mail me</A>
 

Of course your browser has to support this facility.




Form filling

A web page can request input from the user who is browsing the page. After the user has finished filling in the form, the entered data is sent to a CGI (Common Gateway Interface) script for processing. The CGI script returns as its result a text stream representing a web page. This web page contains all the normal text plus markup tags of a conventional web page. The only difference is that it is prefixed with the text:

Content-type: text/html

A form is introduced by the tag <FORM> and terminated by the inverse tag </FORM>. The attributes of the <FORM> tag include:

ACTION="http://host/cgi-bin/script_name"
After the form has been filled in, the entered data is sent to the named CGI script for processing.

The script is confined to the cgi-bin directory or another nominated directory. The location of the cgi-bin directory is defined by the web administrator.

Simple forms

A form to request the user to enter text which is to be sent to the CGI script mas_form is shown below. This is introduced by the <INPUT> tag.

Generated form HTML markup required

 <FORM ACTION="http://host/cgi-bin/mas_form">
 <INPUT TYPE="text" NAME="name"
        SIZE=20 VALUE="Your name">
 </FORM>
 

To activate the CGI program enter textual information into the text box and then press "Enter" on the keyboard. The input data is sent to the CGI script in the form:

name=Your+name

The attributes of the <INPUT> tag include:

NAME="name"
Names the argument which is sent to the CGI script
VALUE="Your name"
The value of the argument.
SIZE=20
The width of the input area.

Try it

In sending the data to the CGI script there are various character mappings of the input data to ease later processing. For example:

Input character   Sent to CGI script
space             +
%                 %25
=                 %3D
&                 %26
Line Feed         %0A
Carriage Return   %0D

Note:
Some input characters are represented by their hexadecimal value, which is indicated by the sequence %HH, where H is a hexadecimal digit.
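
The mapping applied to form data can be reproduced with the function urlencode from the Python standard library. This is a sketch to experiment with rather than a description of what any particular browser does internally; the helper name encode_form and the field names other than name are assumptions made for the example.

```python
from urllib.parse import urlencode


def encode_form(fields) -> str:
    """Encode a list of (name, value) pairs as they are sent to a CGI script."""
    # urlencode() maps a space to + and other special characters to %HH,
    # and joins the individual name=value pairs with &.
    return urlencode(fields)


# The simple form shown earlier, submitted unchanged.
print(encode_form([("name", "Your name")]))
# Characters with a special meaning are sent in their %HH form.
print(encode_form([("sum", "a=b&c%")]))
```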

A form to request a password or any secret text to be entered is:

Generated form HTML markup required
Enter PIN Number

 <FORM ACTION="http://host/cgi-bin/mas_form">
 Enter PIN Number<BR>
 <INPUT TYPE="password" NAME="Password"
        SIZE=20 VALUE="">
 </FORM>
 

Warning: This is not secure, unless the data is encrypted before being sent over the internet. Even if it is encrypted, the encryption may still be broken.

Try it




Multiple elements

More specialized forms can be designed, which contain multiple elements. In these forms an additional tag <INPUT TYPE="submit"> is used to cause the submission of the input data to the CGI script. The attribute TYPE="submit" identifies the type of input action. For example, the <FORM> tag encloses an <INPUT> tag of the form:

<INPUT TYPE="submit" NAME="button" VALUE="Send">

When pressed, this button sends, in addition to any information entered in the form, the additional message button=Send. There may be several of these input tags within a form. The VALUE attribute identifies which <INPUT> tag has been selected.

For example, the form below is composed of several buttons.

Generated form HTML markup required

 <FORM ACTION="http://host/cgi-bin/mas/mas_form">
 <INPUT TYPE="submit" NAME="button" VALUE=" A ">
 <INPUT TYPE="submit" NAME="button" VALUE=" B ">
 </FORM>
 

When button "A" is pressed, a message of the form:

button=+A+

is sent to the CGI script.

Try it




Multiple lines of input text

A form to request the user to input multiple lines of input uses the <TEXTAREA> tag. So that the user can enter multiple lines the <INPUT TYPE="submit"> tag is used to signal when the form has been completed. For example:

Generated form HTML markup required


 <FORM ACTION="http://host/cgi-bin/mas_form">
 <TEXTAREA NAME="feedback" ROWS=5 COLS=20>
 My thoughts so far are:
 </TEXTAREA>
 <BR>
 <INPUT TYPE="submit" NAME="button"
        VALUE="Send">
 </FORM>
 

Note:
As there may be many input lines there is an inverse </TEXTAREA> tag to signify the end of the initial value.

The attributes of the <TEXTAREA> tag include:

ROWS=n
Defines the number of rows of the input area.
COLS=n
Defines the number of columns of the input area.

When there are several elements in a form the data sent to the CGI script is composed of the individual elements concatenated together with an &. For example, when the Send button is pressed and no changes have been made to the form data then information of the following form will be sent to the CGI script (any white space or line breaks around the initial text, which would also be sent in encoded form, are omitted here):

feedback=My+thoughts+so+far+are%3A&button=Send

Try it




Radio button

A form to request the user to select from one of a series of radio buttons uses the <INPUT> tag with an attribute of TYPE="radio". An example of a radio button input form to select the sex of a person is shown below:

Generated form HTML markup required
Male
Female

 <FORM ACTION="http://host/cgi-bin/mas_form">
 <INPUT TYPE="radio" NAME="sex" VALUE="M">Male<BR>
 <INPUT TYPE="radio" NAME="sex" VALUE="W">Female<BR>
 <INPUT TYPE="submit" NAME="button" VALUE="Send">
 </FORM>
 

The optional attribute CHECKED can be added to one of the <INPUT> radio tags to set a default selection. For example:

Generated form HTML markup required
Age
<18
18-65
65+

 <FORM ACTION="http://host/cgi-bin/mas_form">
 Age<BR>
 <INPUT TYPE="radio" NAME="age" VALUE="a">&lt;18<BR>
 <INPUT TYPE="radio" NAME="age" VALUE="b"
        CHECKED>18-65<BR>
 <INPUT TYPE="radio" NAME="age" VALUE="c">65+<BR>
 <INPUT TYPE="submit" NAME="button" VALUE="Send">
 </FORM>
 

Try it




Check boxes

A form to allow the user to select one or more check boxes uses the <INPUT> tag with an attribute of TYPE="checkbox". An example of a checkbox form is shown below:

Generated form HTML markup required
Use
Ada 95
C++
COBOL

 <FORM ACTION="http://host/cgi-bin/mas_form">
 Use<BR>
 <INPUT TYPE="checkbox" NAME="use"
        VALUE="Ada 95" CHECKED>Ada 95<BR>
 <INPUT TYPE="checkbox" NAME="use"
        VALUE="C++" CHECKED>C++<BR>
 <INPUT TYPE="checkbox" NAME="use"
        VALUE="COBOL">COBOL<BR>
 <INPUT TYPE="submit" NAME="button"
        VALUE="Send">
 </FORM>
 

The following extra attributes can be added to the input tag for a check box.

CHECKED
The initial state for this check box is that the box is checked.

Try it




Pop up list

A form to allow the user to select an item from a pop-up list uses the <SELECT> tag. An example of a pop-up list is shown below:

Generated form HTML markup required
Media used is


 <FORM ACTION="http://host/cgi-bin/mas_form">
 Media used is<BR>
 <SELECT NAME="Media">
         <OPTION SELECTED> Disk
         <OPTION> Floppy disk
         <OPTION> DAT tape
 </SELECT>
 <BR>
 <INPUT TYPE="submit" NAME="button"
        VALUE="Send">
 </FORM>
 

The <SELECT> tag encloses the tag:

<OPTION>
Which names a value in the pop-up list.

The OPTION tag may have an attribute of SELECTED to define the initial value of the pop-up list.

Try it




Reset values

The <INPUT> tag with an attribute of TYPE="reset" is used to reset the values in a form back to their default value. For example, the following form may be reset to its initial values by pressing the "reset" button.

Generated form HTML markup required
I like to drink:
Coffee
Tea


 <FORM ACTION="http://host/cgi-bin/mas_form">
 I like to drink:<BR>
 <INPUT TYPE="checkbox" NAME="Like"
        VALUE="Coffee" >Coffee<BR>
 <INPUT TYPE="checkbox" NAME="Like"
        VALUE="Tea">Tea<BR>
 <INPUT TYPE="reset" VALUE="Reset"><BR>
 <INPUT TYPE="submit" NAME="button"
        VALUE="Send">
 </FORM>
 

Try it




Image maps

An image map is an image that when clicked on sends all the data that has been entered into the form plus the x,y co-ordinates of the position clicked on to a CGI script. For example:

Generated form HTML markup required

 <FORM ACTION="http://host/cgi-bin/mas_form">
 <INPUT NAME="image" TYPE="IMAGE"
        SRC="../../pic/mas_fn50.jpg" ALIGN=TOP>
 </FORM>
 

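When a point on the image is selected (clicked), the position is sent to the CGI script as a pair of values named after the NAME attribute; for the form above these are image.x and image.y.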

A common use of an image map is to create customized buttons, or regions in an image that allow a user to navigate to a new document.

Try it




Hidden field in a form

The protocol used to communicate with a CGI script is stateless; that is, no information is remembered about the transaction. To preserve state information for later recovery, a hidden field which holds the state information can be created in a form.

For example, in playing the game of noughts and crosses a CGI script is used to respond with the computer's move as a web page. In this generated web page is a hidden field which contains state information about the current position of the game. When a player responds with a new move, the state information is used by the CGI script to reconstruct the last position.

To make this process secure the state information would not be a record of the moves made, but an encrypted index to where the current position of the game was held on the server.

Generated form HTML markup required
Move

 <FORM ACTION="http://host/cgi-bin/mas_form">
 Move<BR>
 <INPUT TYPE="hidden" NAME="game" VALUE="P123456">
 <INPUT TYPE="text" NAME="move" SIZE=2>
 </FORM>
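
The server side of this idea can be sketched in Python. The sketch is an illustration only: it uses a random key from the standard secrets module rather than the encrypted index described above, it keeps the positions in a dictionary (a real CGI script would need a file or database, since each request runs as a new process), and the names games, save_position, move_form and restore_position are all assumptions made for the example.

```python
import html
import secrets

# Hypothetical server-side store: opaque key -> position of the game.
games = {}


def save_position(position: str) -> str:
    """Store a position and return the key to place in the hidden field."""
    key = secrets.token_hex(8)  # 16 hexadecimal digits, hard to guess
    games[key] = position
    return key


def move_form(key: str) -> str:
    """Return the form markup with the key held in a hidden field."""
    return (
        '<FORM ACTION="http://host/cgi-bin/mas_form">\n'
        "Move<BR>\n"
        f'<INPUT TYPE="hidden" NAME="game" VALUE="{html.escape(key)}">\n'
        '<INPUT TYPE="text" NAME="move" SIZE=2>\n'
        "</FORM>"
    )


def restore_position(key: str):
    """Return the stored position, or None if the key is unknown."""
    return games.get(key)
```

Because only the key travels to the browser and back, the player cannot alter the recorded position by editing the hidden field.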
 

Of course several of these HTML form tags may be combined together to produce a form that requests several pieces of data.

Try it




CGI scripts

A CGI script file can be an executable program or a JCL command file. The script is executed when an anchor tag <A ... > or an image tag <IMG ...> refers to the script file rather than a normal file. The determination of whether this is a CGI script file or just an HTML file is made on the physical placement of the file on the server. Usually this placement is in the cgi-bin directory. However, the exact location of this directory on the server machine is determined by the web administrator. For example, an anchor tag to execute the CGI script dynamic_page on the server machine is:

<A HREF="http://machine/cgi-bin/dynamic_page">Dynamic page</A>

When the web server processes a request to fetch a file, if the requested file is in the server's nominated cgi-bin directory then as long as this file is marked as being executable the script will be run on the server. If the file is not executable then an error will be reported.

The script eventually returns an HTML page or image to be displayed as the result of its execution. When a CGI script file executes it may access environment variables to discover additional information about the process that it is to perform. The first line of the returned data must be one of the following, and it must be followed by a blank line:

Type of returned data   Text
An HTML page            Content-type: text/html
A gif image             Content-type: image/gif

A simple CGI script on a Unix system to return a list of the current users who are logged onto that system is:


#!/bin/sh
cat <<+END+
Content-type: text/html


<HTML>  
<HEAD>  
</HEAD> 
<BODY>  
<H2>Users logged on the server are:</H2>
<PRE>
+END+
who
cat <<+END+
</PRE>
</BODY>
</HTML>
+END+

Note:
The JCL (Job Control Language) command cat << +END+ lists the following text up to but not including +END+ onto the standard output.
The JCL command who lists the current users who are logged onto the system.
Allowing users to create their own CGI scripts can lead to security problems on the server.

The major environment variables that can be accessed by the CGI script when it executes are:

Environment variable    Contains
QUERY_STRING            Data sent to the CGI script by its caller. This may be the output from a form, or other dynamically or statically generated data.
REMOTE_ADDR             The internet address of the host machine making the request.

The C++ program mas_env.cpp, when run, prints many of the environment variables available to a CGI script.
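
As a rough illustration of how a script reads these variables, here is a minimal sketch assuming a C++11 compiler. It is not the code of mas_env.cpp; the function names env_or and env_report are invented for this example.

```cpp
#include <cstdlib>
#include <string>

// Return the value of an environment variable, or `fallback` when the
// server did not set it (std::getenv returns a null pointer in that case).
std::string env_or(const char* name, const std::string& fallback)
{
    const char* value = std::getenv(name);
    return value ? std::string(value) : fallback;
}

// Build a complete CGI response: the Content-type line, a blank line,
// then a small HTML page showing the two variables from the table above.
// env_report is a hypothetical name used only in this sketch.
std::string env_report()
{
    std::string page = "Content-type: text/html\n\n";
    page += "<HTML><BODY><PRE>\n";
    page += "QUERY_STRING = " + env_or("QUERY_STRING", "(not set)") + "\n";
    page += "REMOTE_ADDR  = " + env_or("REMOTE_ADDR", "(not set)") + "\n";
    page += "</PRE></BODY></HTML>\n";
    return page;
}
```

A script would write the result of env_report() to its standard output. Because QUERY_STRING is supplied by the caller, a real script should replace the characters <, > and & in the values before embedding them in HTML.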

CGI scripts can be written in any language. For example, a CGI script to return the contents of the environment variable QUERY_STRING can be written in Ada 95.




Decoding data sent to a CGI script

When a form is used, the information collected in the form is sent to the CGI script for processing. This information is placed in the environment variable QUERY_STRING.

To pass information explicitly to a CGI script through the environment variable QUERY_STRING, a modified form of the anchor tag is used. The data to be sent is appended to the URL that denotes the CGI script, with the character ? separating the URL from the data. For example:

<A HREF="/cgi-bin/script?name=Your+name&action=find"> Link </A>

The data "name=Your+name&action=find" is placed in the environment variable QUERY_STRING and the CGI script named script is executed.
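
A space, or a character such as + or % that has a special meaning in the data, must be encoded before it is appended to the URL. The following is a minimal sketch of that encoding, assuming a C++11 compiler. The names url_encode and make_href are illustrative and are not part of parse.h or parse.cpp.

```cpp
#include <cctype>
#include <string>

// Encode one value for use after the ? in a URL: letters, digits and
// the characters - _ . are kept, a space becomes +, and every other
// character becomes %XX, where XX is its code in hexadecimal.
std::string url_encode(const std::string& text)
{
    static const char hex[] = "0123456789ABCDEF";
    std::string out;
    for (std::string::size_type i = 0; i < text.size(); ++i) {
        const unsigned char c = static_cast<unsigned char>(text[i]);
        if (std::isalnum(c) || c == '-' || c == '_' || c == '.') {
            out += static_cast<char>(c);
        } else if (c == ' ') {
            out += '+';
        } else {
            out += '%';
            out += hex[c >> 4];
            out += hex[c & 0x0F];
        }
    }
    return out;
}

// Join the URL of the script and the already encoded data with a ?.
std::string make_href(const std::string& script_url, const std::string& data)
{
    return script_url + "?" + data;
}
```

With these definitions, url_encode("Your name") gives Your+name, and url_encode("+10%") gives %2B10%25.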

A C++ class, composed of the specification parse.h and the implementation parse.cpp, is used to extract the individual components of QUERY_STRING. The members of this class are:

Method        Responsibility
Parse         Set the string that will be parsed.
set           Set a different string to be parsed.
get_item      Return the string associated with the keyword passed as a parameter. If there is no data, return NULL.
get_item_n    Return the string associated with the keyword passed as a parameter. If there is no data, return the null string.

When using the member functions get_item and get_item_n, the optional second parameter specifies which occurrence of the string associated with a keyword to return. This allows the recovery of information attached to identical keywords. In addition, the returned string will have had substitutions made on it: in the example below, the sequence %2B10%25 is returned as +10% and ~mas/log is returned as /usr/staff/mas/log.

Note:
Defining NO_MAP causes the code for ~username processing not to be included. This allows the code to be compiled on machines that do not support the system function map_uname defined in the header file pwd.h.

For example, if the QUERY_STRING contained:

tag=one&name=mike&action=%2B10%25&tag=two&log=~mas/log&tag=three

Then the following program when compiled and run:

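#include <iostream>
#include <cstdlib>

// The Parse class described above; assumed here to compile as standard C++.
#include "parse.h"
#include "parse.cpp"

int main()
{
  // The web server places the data sent to the script in QUERY_STRING.
  char *query_str = std::getenv("QUERY_STRING");

  Parse list( query_str );

  std::cout << "name  = " << list.get_item_n( "name" ) << "\n";
  std::cout << "action= " << list.get_item_n( "action" ) << "\n";
  std::cout << "log   = " << list.get_item_n( "log", 1, true ) << "\n";

  // Ask for occurrences 1 to 4 of tag; only three exist, so the
  // fourth request returns the null string.
  for ( int i=1; i<=4; i++ )
  {
    std::cout << "tag  (" << i << ") = ";
    std::cout << list.get_item_n( "tag" , i ) << "\n";
  }
  return 0;
}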

would produce the following output:

name  = mike
action= +10%
log   = /usr/staff/mas/log
tag  (1) = one
tag  (2) = two
tag  (3) = three
tag  (4) = 
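
For readers without parse.h and parse.cpp, the decoding step can be approximated as follows. This is a minimal sketch assuming a C++11 compiler, not the implementation of the Parse class. The names url_decode and query_value are invented for this example, and no ~username mapping is attempted.

```cpp
#include <cctype>
#include <cstdlib>
#include <string>

// Decode one URL-encoded component: + becomes a space and %XX becomes
// the character whose hexadecimal code is XX. Malformed escapes are kept.
std::string url_decode(const std::string& text)
{
    std::string out;
    for (std::string::size_type i = 0; i < text.size(); ++i) {
        const char c = text[i];
        if (c == '+') {
            out += ' ';
        } else if (c == '%' && i + 2 < text.size()
                   && std::isxdigit(static_cast<unsigned char>(text[i + 1]))
                   && std::isxdigit(static_cast<unsigned char>(text[i + 2]))) {
            const std::string digits = text.substr(i + 1, 2);
            out += static_cast<char>(std::strtol(digits.c_str(), nullptr, 16));
            i += 2;  // skip the two hexadecimal digits
        } else {
            out += c;
        }
    }
    return out;
}

// Return the decoded value of the n-th occurrence (counting from 1) of
// `key` in a query of the form key=value&key=value, or the empty string
// when there is no such occurrence.
std::string query_value(const std::string& query, const std::string& key,
                        int occurrence = 1)
{
    std::string::size_type start = 0;
    int seen = 0;
    while (start <= query.size()) {
        std::string::size_type end = query.find('&', start);
        if (end == std::string::npos) end = query.size();
        const std::string pair = query.substr(start, end - start);
        const std::string::size_type eq = pair.find('=');
        if (url_decode(pair.substr(0, eq)) == key && ++seen == occurrence) {
            return eq == std::string::npos ? std::string()
                                           : url_decode(pair.substr(eq + 1));
        }
        start = end + 1;
    }
    return std::string();
}
```

Applied to the QUERY_STRING above, query_value(query, "tag", 2) yields two and a request for a fourth tag yields the empty string. Unlike the output shown, the value of log is returned unchanged as ~mas/log.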




Script to record users of web page

By using a URL that denotes a CGI script in an <IMG> tag, additional processing can be performed before the image is delivered. Here, this additional processing records details about the current viewer of the web page. Additional information is sent to the CGI script to specify the exact action to take. For example:

HTML markup required:

<IMG SRC="/cgi-bin/mas_rec?page=HTML&file=log&img=dot.gif"
     ALT="Record not made">

The CGI script mas_rec, written in C++, is sent the following information:

Parameter name    Specifies
file              The name of the file to which the usage information will be appended.
page              A name for the page that will be recorded in the log.
img               The image that will be loaded.
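
The recording step of such a script might look like the following minimal sketch, which assumes a C++11 compiler. It is not the code of mas_rec: the function names and the layout of the log record are assumptions made for this example.

```cpp
#include <fstream>
#include <string>

// Format one log record. The layout (page name, a tab, the address of
// the caller, a newline) is an assumption, not the format used by mas_rec.
std::string log_record(const std::string& page, const std::string& remote_addr)
{
    return page + "\t" + remote_addr + "\n";
}

// Append a record to the log file, creating the file if necessary.
// Returns false when the file cannot be opened or written.
bool append_record(const std::string& path, const std::string& record)
{
    std::ofstream log(path.c_str(), std::ios::app);
    if (!log) return false;
    log << record;
    log.flush();
    return log.good();
}
```

After recording the visit, the script would return the line Content-type: image/gif, a blank line, and then the bytes of the image named by img. Since file and img arrive from the caller, a real script should only accept names that refer to files it is intended to use.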

Of course, for this to work the viewer's browser must load the images on the page. Images may not be loaded if, for example, the viewer has turned off image loading or is using a text-only browser.




Post vs. Get

So far the method used to send information to the CGI script has been GET. When the method GET is used the data sent is placed in the environment variable QUERY_STRING for the CGI script to process.

An alternative method is to use POST. When the method POST is used the data is sent by a separate stream and becomes the standard input to the CGI script. The method used is specified on the <FORM ..> tag using the attribute METHOD="get" or METHOD="post". The default method is GET.

For example:

HTML markup required:


 <FORM METHOD="get"
       ACTION="http://host/cgi-bin/mas_form">
 <INPUT TYPE="text" NAME="name"
        SIZE=20 VALUE="Try it (get)">
 </FORM>
 


 <FORM METHOD="post"
       ACTION="http://host/cgi-bin/mas_form">
 <INPUT TYPE="text" NAME="name"
        SIZE=20 VALUE="Try it (post)">
 </FORM>
 

When the POST method is used, the following environment variables are set:

Environment variable    Contains
CONTENT_LENGTH          The length of the data sent via the standard input to the CGI program.
CONTENT_TYPE            The MIME type of the data.

Try it




Check list

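It is important to make sure that the CGI script is in the server's nominated cgi-bin directory and is marked as executable, as described in the section on CGI scripts.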


Warning: these web pages use HTML table tags extensively. Netscape 1.1 or later supports this facility.
The material in these web page(s) is © M.A.Smith - August 1995 last modified 10 December 95