Security warning: Please don't use any version older than 1.11!

Installation and configuration of gzip_cnc

Configuration

gzip_cnc's behaviour is specified by a number of parameters. With a little luck the predefined default values may already suffice to run the program, but more likely these values have to be adapted to the realities of your web space.

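Two methods are available for this adaptation: changing the parameter values directly in the program's source code, or setting them via Environment variables in the web server configuration.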

The second method has the advantage that you don't have to repeat previous modifications when switching to a future program version; some users may also find it more comfortable not to have to change foreign program source code (running the risk of inserting errors by mistake). But configuration via Environment variables requires additional permissions that cannot be taken for granted in every web server configuration; for some users this option may simply not be available.

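The following sections of this document describe each parameter together with its default value in the program source and the name of the corresponding Environment variable.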

At the end of the chapter you will find a description of what a configuration using Environment variables should look like and which requirements have to be met for it.

Specifying the compression level

my $gzip_quality         = 9;

This parameter specifies the compression level to be used, thus selecting the compression quality.

The gzip algorithm (no matter which tool implements it) accepts a value from 0 to 9, where lower values compress faster and higher values compress better.

As each file's content is compressed only once by gzip_cnc and subsequently just taken from its cache tree, we can afford the best compression level available here.
(Compare this to mod_gzip which is happy with level 6 to keep the resulting CPU load for the server within certain limits.)

Environment variable: GZIP_CNC_QUALITY.
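
The effect of the compression level can be illustrated with a small Perl sketch. It is not taken from gzip_cnc: it uses the compress() function of Compress::Zlib, which produces a zlib stream rather than a gzip file, and the sample page is invented for the demonstration. It is only meant to show how the result size changes with the level.

```perl
use strict;
use warnings;
use Compress::Zlib;   # provides compress() and uncompress()

# Invented sample data: repetitive markup, similar to a typical HTML page.
my $page = join '',
    map { "<tr><td>row $_</td><td>value $_</td></tr>\n" } 1 .. 500;

# Compress the same content at several levels and record the result sizes.
my %size;
for my $level (0, 1, 6, 9) {
    my $packed = compress($page, $level);
    die "compression failed at level $level" unless defined $packed;
    $size{$level} = length $packed;
    printf "level %d: %6d bytes (original %d)\n",
        $level, $size{$level}, length $page;
}
```

Level 0 only stores the data, so its result is slightly larger than the input; because gzip_cnc compresses each file only once, the extra CPU time of level 9 is spent only on the first request.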

Specifying the path for the compression program gzip

If the module Compress::Zlib is installed on the server then this part of the configuration has no effect: gzip_cnc prefers this Perl module and falls back to an external program only as a substitute.
(gzip_cnc will inform the user during its self test function about the compressing tool selected on this server.)
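
The preference for the Perl module with a fallback to an external program could look roughly like the following sketch. The function select_compressor and its return strings are hypothetical and not part of gzip_cnc; the sketch only shows the general technique of loading a module at run time and checking a program path.

```perl
use strict;
use warnings;

# Hypothetical helper (not gzip_cnc's own code): decide which tool to use.
#   $module    - name of the Perl module to try first
#   $gzip_path - fully qualified path of the external program (fallback)
sub select_compressor {
    my ($module, $gzip_path) = @_;

    # Try to load the module at run time; eval traps the error if it is missing.
    (my $file = "$module.pm") =~ s{::}{/}g;
    return "module $module" if eval { require $file; 1 };

    # Fall back to the external program, but only if it is an executable file.
    return "program $gzip_path"
        if defined $gzip_path && -f $gzip_path && -x _;

    return 'none';
}

print select_compressor('Compress::Zlib', '/usr/bin/gzip'), "\n";
```

If the result were 'none', no compression tool would be usable on the server and the documents could only be served uncompressed.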

But if a system command (on UNIX) or a separate program (on Windows) is in fact required for compression, gzip_cnc does not rely on the Apache web server supplying a sufficient list of program directories via the PATH Environment variable; instead it insists on accessing the compression program directly via its fully qualified path name.

my $gzip_path            = '/usr/bin/gzip';

As these path names are reasonably standardized on UNIX systems, one can be hopeful that the preselected value works without change. If gzip_cnc refuses to compress anything and writes corresponding messages into its log file instead, the required program may well be missing under the path name specified.

gzip_cnc preselects a path that looks reasonable for a gzip command on UNIX servers but cannot guarantee that your server has been set up accordingly. If you have interactive access (telnet, ssh) to the server hosting your web space, you may open a shell of your choice there and check under which path name it locates the gzip command by executing which gzip.

Under Windows the popular ActivePerl interpreter is recommended, as it already ships the Compress::Zlib module. But even under Windows you may use a separate program (gzip.exe) for compressing. (In fact gzip_cnc has been developed and tested this way on a Windows platform.)

Environment variable: GZIP_CNC_PROGRAM.

Specifying the root directory for the cache tree

my $cache_directory      = '';

This parameter specifies the fully qualified path name of the directory inside which gzip_cnc builds its cache directory tree, holding compressed versions of all files that have been requested.

If an empty string has been specified here then gzip_cnc will automatically use the directory .gzip_cnc_cache inside the DOCUMENT_ROOT of the domain in question. In this case the cache files reside within the URL tree as well and are therefore directly accessible via URL; however, they can be handled correctly only if the web server is configured properly, and even then only by sufficiently capable browsers. It is therefore recommended to create the cache directory outside the URL space.

gzip_cnc itself creates all necessary directories for the cache on demand. If this fails (e.g. for lack of write access) then gzip_cnc serves the requested data in uncompressed form and notes this fact in its log file using a corresponding status code.

Environment variable: GZIP_CNC_CACHE.
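
The rule for the default cache location and the creation of directories on demand can be sketched as follows. The helper names cache_root and ensure_directory are hypothetical and the code is not taken from gzip_cnc; it merely follows the behaviour described in this section.

```perl
use strict;
use warnings;
use File::Path qw(make_path);
use File::Spec;

# Hypothetical helper: resolve the root directory of the cache tree.
# An empty setting means ".gzip_cnc_cache" inside the document root.
sub cache_root {
    my ($cache_directory, $document_root) = @_;
    return $cache_directory
        if defined $cache_directory && length $cache_directory;
    return File::Spec->catdir($document_root, '.gzip_cnc_cache');
}

# Hypothetical helper: create a directory (and its parents) on demand.
# Returns 1 on success and 0 on failure, so that the caller can decide
# to serve the data uncompressed instead.
sub ensure_directory {
    my ($dir) = @_;
    return 1 if -d $dir;
    make_path($dir, { error => \my $errors });
    return 0 if $errors && @$errors;
    return -d $dir ? 1 : 0;
}
```

A caller would typically use ensure_directory(cache_root($cache_directory, $ENV{DOCUMENT_ROOT})) before writing a cache file and skip compression if the result is 0.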

Specifying gzip_cnc's own log file

my $logfile_path         = '';

This parameter specifies the fully qualified path name of the log file to be created by gzip_cnc.

If an empty string has been specified here then gzip_cnc will not create a log file.

If a path name has been specified then gzip_cnc attempts to create a log file there (including all directories required for this path name, if necessary) or to extend an already existing file; if this fails (e.g. for lack of write access) then no log messages will be written, and the user receives no explicit warning.

Environment variable: GZIP_CNC_LOGFILE.

Specifying the error document to be used

my $error404_handler     = '';

This parameter specifies the error handling page to be used by gzip_cnc when a nonexistent file is requested.

An Apache handler is activated when a large part of URL access processing has been completed (e.g. at this moment URL translations of all kinds have already been performed: URL rewriting, directory defaults, Content Negotiation, ...). But the Apache server has by no means checked whether the requested file actually exists. So this test has to be performed by gzip_cnc itself.
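
A minimal sketch of such an existence test is shown here. Apache passes the file path of the requested document to an Action script in the CGI variable PATH_TRANSLATED; that gzip_cnc reads exactly this variable, and the helper name status_for_request, are assumptions made for this illustration.

```perl
use strict;
use warnings;

# Hypothetical helper: decide between "200 OK" and "404 Not Found".
#   $env - hash reference with the CGI environment (normally \%ENV)
sub status_for_request {
    my ($env) = @_;
    my $file = $env->{PATH_TRANSLATED};

    # No path at all, or not a readable plain file: treat as missing.
    return 404 unless defined $file && length $file;
    return 404 unless -f $file && -r _;

    return 200;
}

print "Status: ", status_for_request(\%ENV), "\n";
```

Only when the status is 404 would the configured error document come into play.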

If the requested file in fact doesn't exist then the Apache server would normally evaluate the administrator's error handling page definition and serve its content to the client. But note that gzip_cnc isn't an Apache module: it cannot invoke an internal HTTP subrequest in case your error document is a CGI program, an SSI document or a page selected dynamically via Content Negotiation, and it cannot provide the information about which page was originally requested in the same way (as Environment variable) as Apache itself would do. "Houston, we have a problem."

gzip_cnc cannot give you all you might want - but it lets you choose what is most important for you:

Environment variable: GZIP_CNC_404_HANDLER.

Specifying the MIME type to be served

my $mime_type            = 'text/html';

This parameter specifies the MIME type to be sent by gzip_cnc as HTTP header for the responses being served.

Being an Apache handler, gzip_cnc has no access to the Apache configuration that contains the mapping of the requested file's name to a MIME type. Thus an appropriate value has to be defined by the program itself.

For future program versions one could imagine providing a table that maps file name patterns to MIME types. But currently the corresponding effort doesn't seem justifiable:

Environment variable: GZIP_CNC_MIMETYPE.

Sending gzip_cnc's own HTTP headers

my $send_own_headers      = 1;

This parameter selects whether gzip_cnc is entitled to send its own HTTP headers in addition to the other information. These HTTP headers are:

If this value has been set to 0 then these two HTTP headers are not sent. Both Path headers might be helpful for understanding the access procedure but would reveal details of your server's configuration; they are therefore sent only if gzip_cnc has its self test mode enabled as well.

Environment variable: GZIP_CNC_OWNHEADERS.

Activating the self test mode

my $enable_self_test_mode = 1;

This parameter selects whether the self test mode of gzip_cnc is available.

During the installation and testing phase it is reasonable to use this feature to get hints about the validity of the configured parameter values. But the output in self test mode reveals actual absolute path names on the server, so the mode can be deactivated where security considerations make this advisable.

Environment variable: GZIP_CNC_SELFTEST.

Submitting a browser cache validity interval

my $cache_expire_seconds  = 86400;

This parameter specifies for how many seconds a browser is allowed to keep the content of the served page in its cache without checking back with the HTTP server whether the content is still valid.

Strictly speaking, browser caching is not part of gzip_cnc's responsibility. On the other hand no HTTP data transfer can be compressed more effectively than one that hasn't been requested by the browser at all. And especially for small and frequently used pages (like navigation elements when using framesets) it may well help a lot if the browser has been configured to trust the validity interval transmitted by the HTTP headers instead of requesting the same page from the server again and again.

On the other hand, if a very large value has been specified here and URLs have changed, then files no longer in use have to be kept available at least for the defined validity period, as browser caches may still contain references to these files.
But if subsequent changes to the content of existing documents can be ruled out, then even a very large value may be reasonable for this parameter.

If this value has been set to 0 then these two HTTP headers are not sent.

Environment variable: GZIP_CNC_EXPIRES.
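
How a validity interval translates into HTTP headers can be sketched as follows. The sketch assumes the standard HTTP headers Expires and Cache-Control; which two headers gzip_cnc actually sends is not stated here, and the helper names http_date and expiry_headers are hypothetical.

```perl
use strict;
use warnings;

my @DAYS   = qw(Sun Mon Tue Wed Thu Fri Sat);
my @MONTHS = qw(Jan Feb Mar Apr May Jun Jul Aug Sep Oct Nov Dec);

# Format an epoch time as an HTTP date (always GMT, independent of locale).
sub http_date {
    my ($epoch) = @_;
    my ($sec, $min, $hour, $mday, $mon, $year, $wday) = gmtime($epoch);
    return sprintf '%s, %02d %s %04d %02d:%02d:%02d GMT',
        $DAYS[$wday], $mday, $MONTHS[$mon], $year + 1900, $hour, $min, $sec;
}

# Hypothetical helper: build the caching headers for a validity interval.
# Returns an empty list if the interval is 0 (no headers are sent).
sub expiry_headers {
    my ($now, $cache_expire_seconds) = @_;
    return () unless $cache_expire_seconds > 0;
    return (
        'Expires: ' . http_date($now + $cache_expire_seconds),
        "Cache-Control: max-age=$cache_expire_seconds",
    );
}

print "$_\n" for expiry_headers(time, 86400);
```

With the default of 86400 seconds a browser that honours these headers would not ask the server for the same page again for one day.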

Configuration via Environment variables

Instead of changing the program's source code the parameter values may as well be set via Environment variables in the web server configuration.

For this the following prerequisites have to be met:

For setting an Environment variable the Apache directive SetEnv is available.

It is permissible to set these Environment variables in the same .htaccess file that implements the gzip_cnc activation. As an alternative it would suffice to specify these Environment variables only for the URL of the gzip_cnc program; this would reduce the risk of undesirably influencing the behaviour of other CGI applications.

The default values of all parameters correspond to this Apache configuration:

<Files gzip_cnc.pl>
 SetEnv GZIP_CNC_QUALITY      9
 SetEnv GZIP_CNC_PROGRAM      /usr/bin/gzip
 SetEnv GZIP_CNC_CACHE        ""
 SetEnv GZIP_CNC_LOGFILE      ""
 SetEnv GZIP_CNC_404_HANDLER  ""
 SetEnv GZIP_CNC_MIMETYPE     text/html
 SetEnv GZIP_CNC_OWNHEADERS   1
 SetEnv GZIP_CNC_SELFTEST     1
 SetEnv GZIP_CNC_EXPIRES      86400
</Files>

These directives may have to be adapted to the requirements of your installation.
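
On the program side, reading such a setting with a fallback to the built-in default might look like the following sketch. The helper config_value is hypothetical and gzip_cnc's actual code may differ; the variable names and default values are the ones listed in this chapter.

```perl
use strict;
use warnings;

# Hypothetical helper: prefer the Environment variable, else the default.
#   $env     - hash reference with the environment (normally \%ENV)
#   $name    - name of the Environment variable
#   $default - value used when the variable has not been set
sub config_value {
    my ($env, $name, $default) = @_;
    return defined $env->{$name} ? $env->{$name} : $default;
}

my $gzip_quality    = config_value(\%ENV, 'GZIP_CNC_QUALITY', 9);
my $gzip_path       = config_value(\%ENV, 'GZIP_CNC_PROGRAM', '/usr/bin/gzip');
my $cache_directory = config_value(\%ENV, 'GZIP_CNC_CACHE',   '');

print "quality=$gzip_quality program=$gzip_path cache='$cache_directory'\n";
```

In this sketch a variable that has been set to an empty string is kept as an empty string and not replaced by the default.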

Installation of the program as CGI script

gzip_cnc has to be installed as CGI script in your web space.

How this is to be done depends on the characteristics of your web space in so many ways that a detailed description isn't possible at this point.

Therefore only some hints can be given:

Self test of the CGI script

As soon as the CGI script has been installed successfully it may be accessed by the browser via the corresponding URL.

The script will automatically detect that it has not been activated as Apache handler and thus has nothing to compress. In this case the script performs a self test instead and displays the results of these tests to the caller as a web page.

If anything has not been configured correctly there is a good chance the self test function will note it - at a point in time where this webspace's normal operation hasn't been affected yet.

The self test function writes a message to the log file (and thus may implicitly create this file during the first request).

gzip_cnc activation

Embedding the handler

Finally the CGI script has to be embedded into the Apache configuration as handler and be connected with the files to be handled.

This is usually done by a .htaccess file. The directive

  Action text/html /cgi-bin/gzip_cnc.pl

connects the CGI script /cgi-bin/gzip_cnc.pl with all files of the MIME type text/html (whichever files these may be according to the rest of the Apache configuration).

If this connection is to be established for files matching some distinct name pattern only, like *.html, then the embedding of the handler may be formulated conditionally using the syntax

<Files *.html>
  Action text/html /cgi-bin/gzip_cnc.pl
</Files>

For more than one name pattern, like *.htm and *.html, an embedding of the type

<Files ~ \.html?$>
  Action text/html /cgi-bin/gzip_cnc.pl
</Files>

might be used - just like many other forms of embedding described in the Apache documentation in depth.

Coexistence of gzip_cnc and Server Side Includes

It is explicitly pointed out that you must not serve in compressed form any files whose content is to be included into other documents via SSI. The SSI handler would insert the compressed content into an otherwise uncompressed page and serve the result to the browser, which would not be able to understand this data.

Thus it is recommended to use a separate name extension for these files (that contain only parts of HTML documents anyway). gzip_cnc is not able to find out whether a normal request or an Apache internal subrequest is to be handled.
(mod_gzip, being an Apache module, is able to tell these request types from each other and only compresses responses to 'normal' requests.)

Testing the installation

If anything hasn't worked during installation but the CGI script has been activated for the whole webspace by binding all documents to this handler, then in the worst case not a single static HTML page can be served any more until the error has been corrected or this binding has been disabled again (by removing the corresponding entry from the .htaccess file - changes in these files take effect immediately, they don't require a web server restart).

Because of this it is recommended first to set up a test directory with a handful of documents, to create the .htaccess file there as described and to test what happens:

  1. Do the pages still render correctly in the browser?
  2. Is the CGI script indeed being used as handler by the Apache web server, i. e.:
    1. Did it create files within the cache directory?
    2. Did it write messages into the gzip_cnc log file?
    3. Did the size of the served documents shrink considerably, as expected?
      (Some browsers display this size during the transfer; a glance into the Apache log file access_log may reveal this as well.)
    4. Is gzip compressed data now actually being served to the browser?
      (This can be checked by a glance at the HTTP headers and data of the web server's response using some appropriate program - just like the serving of the additional HTTP headers of gzip_cnc.)
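
The last check can be automated with a few lines of Perl. The following is a sketch under assumptions: the helper looks_gzipped is hypothetical, the URL in the usage comment is a placeholder, and the check relies on the Content-Encoding response header together with the two magic bytes at the start of every gzip stream.

```perl
use strict;
use warnings;

# Hypothetical helper: does a response look gzip compressed?
#   $headers - hash reference, lower-cased header names mapped to values
#   $body    - raw response body as received (not decoded)
sub looks_gzipped {
    my ($headers, $body) = @_;
    my $encoding = lc($headers->{'content-encoding'} || '');

    # Every gzip stream starts with the two bytes 0x1f 0x8b.
    my $magic_ok = defined $body
                && length($body) >= 2
                && substr($body, 0, 2) eq "\x1f\x8b";

    return ($encoding eq 'gzip' && $magic_ok) ? 1 : 0;
}

# Usage sketch with HTTP::Tiny; the URL below is only a placeholder:
#   use HTTP::Tiny;
#   my $res = HTTP::Tiny->new->get(
#       'http://www.example.com/test/index.html',
#       { headers => { 'Accept-Encoding' => 'gzip' } },
#   );
#   print looks_gzipped($res->{headers}, $res->{content})
#       ? "compressed\n" : "not compressed\n";
```

The request has to announce Accept-Encoding: gzip explicitly, as in the usage comment; a client that does not announce gzip support should receive the uncompressed document.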

Only after everything has worked as expected should the content of this .htaccess file be transferred into the root directory of the file tree to be handled. From this moment on the handler will take action for the whole directory tree (and populate the cache directory with subdirectories and files little by little).

Because of the experience with version 1.10 I explicitly suggest double-checking the detection of illegal accesses.

(Michael Schröpl, 2004-01-11)