
WebGlimpse Installation
After you download the tar file, uncompress it, and untar it,
you will have a directory called webglimpse.
To install WebGlimpse run the wginstall script:
./wginstall [osf|sunos|solaris]
(Most of WebGlimpse is written in Perl.
There are only two relatively small C programs in the distribution,
and we expect them to be ported easily to other platforms.)
Wginstall will ask you a few questions, and will store the results
in the config file. If you stop and run wginstall again, it will
remember the previous settings.
Some notes about the questions:
-
The 'home' directory is the one where all related programs
will be kept. The current directory is the default.
-
The cgi-bin directory created as part of the untar'ing is fine
to use as the default cgi-bin directory. If for some reason you
want to move the CGI programs to another directory, you need to
create that directory first.
-
The script alias (relative url path) is the name as it appears in
your httpd ScriptAlias config. You probably want to add a new name
for WebGlimpse and alias it to the cgi-bin directory given above.
-
WebGlimpse searches for the available versions of perl, glimpse,
glimpseindex, cat, and gunzip.
After you answer the questions, wginstall will compile the two
C programs (ignore the gcc warnings if you can) and will call
wginstall.server to set up the server side.
You are now ready to construct your first archive.
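If wginstall reports that it cannot find one of the programs it searches
for, it may help to check your PATH yourself. The following is a minimal
sketch in Python and is not part of the WebGlimpse distribution; the
function name locate_tools is hypothetical, and the list of programs is
simply the one given in the notes above.

```python
import shutil

# Programs the installation notes say are searched for.
REQUIRED_TOOLS = ["perl", "glimpse", "glimpseindex", "cat", "gunzip"]


def locate_tools(names=REQUIRED_TOOLS):
    """Map each program name to its full path on PATH, or None if absent."""
    return {name: shutil.which(name) for name in names}


if __name__ == "__main__":
    # Print one line per program so missing ones are easy to spot.
    for name, path in locate_tools().items():
        print(f"{name:14s} {path or 'NOT FOUND'}")
```

Any program reported as NOT FOUND probably needs to be installed, or its
directory added to PATH, before you rerun wginstall.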
Building and Indexing an Archive
Run confarc.
Again, you will be asked questions:
-
The directory where the index (and some other files) will reside.
-
Give the part of the URL that maps to the directory above.
Do not end it with / (WebGlimpse will append the file names
it needs to reach files in that directory).
-
The title of the archive
-
How to define the neighborhoods.
WebGlimpse supports search by neighborhoods. It is one of its
distinguishing features. At the moment, we support two types of
definitions for neighborhoods:
- by the number of links followed from the current page (default is 2).
- by treating all files in any nested directory below the current one
as part of the neighborhood (similar to GlimpseHTTP).
We expect to provide more flexible ways to define neighborhoods.
(Feel free to suggest!)
-
WebGlimpse will automatically collect remote pages that are linked
from pages it scans during the indexing process. It will not
follow remote links recursively! In other words, no matter
how the neighborhoods are defined, WebGlimpse will collect only
remote pages that are explicitly linked from the pages it scans.
If you prefer not to follow remote links, this is the time to say so.
-
confarc now constructs some files. The most important ones for
you are .wgfilter-index and .wgfilter-box.
The first file (.wgfilter-index) provides a way to exclude files from
being indexed. The rules are similar to the way Harvest excludes
files from its collection. The default file is pretty straightforward.
It works by pattern matching on the file names.
The second file (.wgfilter-box) provides a way to keep the WebGlimpse
search box from being added to some html files. Same rules.
(Obviously, if a file is not indexed, no search box will be
added to it.)
You probably want to stop confarc at this point, look at the filter
files and set them before continuing with the indexing.
-
Finally comes the moment you've been waiting for: You now enter
the URLs from which WebGlimpse will do the collection and
indexing. WebGlimpse will follow all links (recursively) from
the ones you give (except for remote links).
-
When you are done entering the URLs, you can start the indexing
process.
- Good luck.
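To make the idea of filtering by pattern matching on file names more
concrete, here is a small Python sketch. It is an illustration only, not
the WebGlimpse code. It assumes Harvest-style rules of the form
"Allow <regex>" or "Deny <regex>", tested in order, with the first match
deciding and unmatched files allowed; the sample rules and the names
parse_rules and is_indexed are hypothetical. Check the generated
.wgfilter-index for the real syntax and defaults.

```python
import re

# Hypothetical rules, for illustration only.
SAMPLE_RULES = r"""
Deny \.gif$
Deny ~$
Allow \.html$
"""


def parse_rules(text):
    """Turn 'Allow <regex>' / 'Deny <regex>' lines into (allowed, pattern) pairs."""
    rules = []
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comments
        action, _, pattern = line.partition(" ")
        rules.append((action.lower() == "allow", re.compile(pattern.strip())))
    return rules


def is_indexed(path, rules, default=True):
    """The first rule whose pattern matches decides; otherwise use the default."""
    for allowed, pattern in rules:
        if pattern.search(path):
            return allowed
    return default
```

With these sample rules, images/logo.gif and notes.txt~ would be skipped,
while docs/index.html and a file matching no rule would be indexed.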
confarc uses two main scripts: makenh (which computes neighborhoods
to index) and addsearch (which adds the appropriate search boxes).
Each of them can be run separately now or later.
To remove any of the added search boxes, run rmarc.
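The link-count neighborhood described above can be pictured as a
breadth-first walk that stops after a fixed number of hops. The Python
sketch below is one reading of that description, not the makenh
implementation. The function name neighborhood and the links table
(a stand-in for parsing the HTML of each page) are assumptions made for
the example. Remote pages are collected when a local page links to them,
but their own links are never followed.

```python
from collections import deque
from urllib.parse import urlparse


def neighborhood(start, links, max_hops=2):
    """Return {url: hops} for pages within max_hops links of start.

    links maps a URL to the list of URLs that page links to.
    """
    local_host = urlparse(start).netloc
    hops = {start: 0}
    queue = deque([start])
    while queue:
        page = queue.popleft()
        if hops[page] == max_hops:
            continue  # hop limit reached: do not follow further links
        if urlparse(page).netloc != local_host:
            continue  # remote page: collected, but not expanded
        for target in links.get(page, []):
            if target not in hops:
                hops[target] = hops[page] + 1
                queue.append(target)
    return hops
```

Called with max_hops=2, the default mentioned above, the result holds the
starting page, every page one or two links away from it, and any remote
page linked directly from a local page within that range.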

Written by Udi Manber
glimpse@cs.arizona.edu