
For a more user-level view of WebGlimpse, see the WebGlimpse home page.
This document is an overview of the WebGlimpse code; for more specific details, see the comments in the code itself.
It is assumed that the reader knows:
If line searching is done (option lines=1), things become a bit more
complicated. webglimpse calls mfs (also in the cgi-bin directory),
which in turn does some checking and calls getfile.
(Earlier versions of mfs and getfile had a security problem that allowed
someone to sneak past them and grab any world-readable file on
the system. We believe we have solved this problem.)
getfile copies the file line by line (prepending
<pre> if it is not an HTML file) and highlights the line that matches the
query. It also inserts a 'name' anchor at that line so that the browser
jumps to it.
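The copy-and-highlight step can be illustrated with a minimal Python sketch. This is not the getfile code: the function name render_with_highlight, the anchor name "match", the use of <b> for highlighting, the closing </pre>, and the escaping of non-HTML text are all assumptions made for the example.

```python
import html

def render_with_highlight(lines, match_lineno, is_html, anchor="match"):
    """Copy lines, marking the 1-based match_lineno with a name anchor.

    Hypothetical helper: the markup used for highlighting is an assumption.
    """
    out = []
    if not is_html:
        out.append("<pre>")  # plain text is wrapped so line breaks survive
    for lineno, line in enumerate(lines, start=1):
        # Assumption: plain text is escaped, HTML is passed through unchanged.
        text = line if is_html else html.escape(line)
        if lineno == match_lineno:
            # The name anchor is what lets a URL fragment jump to this line.
            text = f'<a name="{anchor}"></a><b>{text}</b>'
        out.append(text)
    if not is_html:
        out.append("</pre>")
    return "\n".join(out)
```

A link ending in #match would then make the browser scroll to the highlighted line.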
webglimpse-fullsearch
This script dynamically creates a large search box with many options.
It remembers the referring page if the "file" option is set
(i.e., if the code in .wgbox.html is written
correctly), and it will show the neighborhood if the option
shownh is set to 1 in the search string
(i.e., http://...?file=...&shownh=1).
This script does not call glimpse itself; it calls webglimpse.
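As a rough illustration of how the two options in the search string might be read, here is a short Python sketch. The function name read_options and the treatment of a missing shownh as "off" are assumptions for the example; the actual script may handle its input differently.

```python
from urllib.parse import parse_qs

def read_options(query_string):
    """Return (referring_file, show_neighborhood) from a CGI query string."""
    params = parse_qs(query_string)
    # "file" names the referring page; None if the option is not set.
    referring_file = params.get("file", [None])[0]
    # Assumption: the neighborhood is shown only when shownh is exactly "1".
    show_neighborhood = params.get("shownh", ["0"])[0] == "1"
    return referring_file, show_neighborhood
```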
Executables
The executables are by far the most complicated part of WebGlimpse.
These allow you to manipulate archives and create/modify the search
boxes.
confarc
This script prompts the user for the archive directory and tries to
read a configuration file (archive.cfg) from that directory; if there
is one, its settings are read in as the defaults. It then prompts the user
for several settings.
The information is then stored in the archive.cfg file.
If this is the first configuration of the archive (i.e., the
configuration file didn't exist before),
it copies over the files from the
distribution directory.
archive.cfg is largely self-explanatory. You can change it at any time
and rerun wgreindex.
makecron
This generates the wgreindex script (see generated files
below). It takes the archive directory as the only argument, reads
the configuration file (archive.cfg)
there, and creates the wgreindex script there.
addsearch
addsearch can operate in two modes: insert and remove.
It inserts the .wgbox.html HTML 'snippet' into, or removes it from,
all the files listed in the .wg_madenh file (see
generated files).
In insert mode, it reads the .wgbox.html file and inserts this file (with the appropriate variable substitutions) into the files before the first <!GH_SEARCH>, </body>, or </html> tag or EOF it sees (in that order). If it sees a <!GH_SEARCH> tag it replaces everything up to the <!GH_END> tag (including both tags). When the box is inserted, the tags are inserted appropriately as well (to delimit the box for removal or replacement).
In remove mode, it looks for the same tags and replaces everything
from <!GH_SEARCH> through <!GH_END> (including both tags) with nothing, removing the box.
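The two modes can be sketched in Python as follows. This is only an illustration of the rules described above, not the addsearch code: the function names are hypothetical, variable substitution in the box is left out, matching </body> and </html> case-insensitively is an assumption, and so is leaving the page untouched when a <!GH_SEARCH> tag has no matching <!GH_END>.

```python
import re

START, END = "<!GH_SEARCH>", "<!GH_END>"

def insert_box(page, box):
    """Insert the box, or replace an existing delimited box, in page."""
    wrapped = f"{START}\n{box}\n{END}"
    start = page.find(START)
    if start != -1:
        end = page.find(END, start)
        if end == -1:
            return page  # assumption: unterminated box, leave page alone
        # Replace the old box, including both delimiter tags.
        return page[:start] + wrapped + page[end + len(END):]
    # No existing box: try </body>, then </html>, then fall back to EOF.
    for tag in ("</body>", "</html>"):
        found = re.search(re.escape(tag), page, re.IGNORECASE)
        if found:
            pos = found.start()
            return page[:pos] + wrapped + "\n" + page[pos:]
    return page + wrapped + "\n"

def remove_box(page):
    """Delete everything from the start tag through the end tag."""
    start = page.find(START)
    if start == -1:
        return page
    end = page.find(END, start)
    if end == -1:
        return page  # assumption: unterminated box, leave page alone
    return page[:start] + page[end + len(END):]
```

Because the delimiter tags are written along with the box, running insert_box a second time replaces the earlier box instead of adding another one.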
makenh
This is by far the most complicated script.
makenh is in charge of traversing the archive,
figuring out which pages to index, which pages to fetch from
remote servers, and how to create the neighborhoods.
The input to makenh is complicated as well.
The diagram below shows which files makenh reads
and which files it produces.
wginstall
This is the Perl installation script. The code is self-explanatory.
See the installation manual for more information.
wginstall.server
This is a 'sub-script' of wginstall, and is used to parse the
configuration file of the http daemon and get the settings for the
server. After it gets the necessary information, it writes the
information to the .wgsiteconf file in the $WEBGLIMPSE_HOME directory.
Change on 5/16... Udi rewrote html2txt in C (now html2txt.c in the /lib directory), and it is faster. By default, the glimpse_filters file in the /dist directory refers to html2txt in the /lib directory, so the filters will be used for HTML files.
(This diagram is slightly out of date -- gettitles and wgmapfile
are no longer used; the information is obtained directly from the
glimpse index.)