Installation and configuration of gzip_cnc
Configuration
gzip_cnc's behaviour is specified by a number of parameters. With a little luck the predefined default values may already suffice to run the program, but more likely these values have to be adapted to the realities of your web space.
For this adaptation two methods are available:
- Modifying the program. At the start of the gzip_cnc source code there are a couple of variable definitions that contain the corresponding parameter values during runtime of the program. Each variable definition contains the assignment of its default value that may be adapted by modifying the corresponding line of code to your own requirements.
- Adjustment via Environment variables. Each parameter value specified inside the source code may be overridden by setting an Environment variable in the web server configuration; thus the program's behaviour can be adapted to your own requirements without changing the program code.
The second method has the advantage that you don't have to repeat previous modifications when switching to a future program version; some users may also find it more comfortable not to have to change foreign program source code (running the risk of inserting errors by mistake). But configuration via Environment variables requires additional permissions that cannot be taken for granted in every web server configuration; for some users this option may well not be available.
The following sections of this document describe
- each parameter and its effect,
- the name of the corresponding program variable as well as
- the name of the corresponding Environment variable.
At the end of the chapter you will find a description of what a configuration using Environment variables should look like and which requirements have to be met for it.
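The relation between the two methods can be sketched as follows. This is an illustration in Python, not the script's own Perl code; the function name effective_config is hypothetical, and it is an assumption that an Environment variable which is not set simply leaves the default from the source code in place.

```python
import os

# Default values as listed in this chapter, keyed by Environment variable name.
DEFAULTS = {
    "GZIP_CNC_QUALITY": "9",
    "GZIP_CNC_PROGRAM": "/usr/bin/gzip",
    "GZIP_CNC_CACHE": "",
    "GZIP_CNC_LOGFILE": "",
    "GZIP_CNC_404_HANDLER": "",
    "GZIP_CNC_MIMETYPE": "text/html",
    "GZIP_CNC_OWNHEADERS": "1",
    "GZIP_CNC_SELFTEST": "1",
    "GZIP_CNC_EXPIRES": "86400",
}


def effective_config(environ=None):
    """Return the defaults, each overridden by an Environment variable if one is set."""
    environ = os.environ if environ is None else environ
    return {name: environ.get(name, default) for name, default in DEFAULTS.items()}


# Only the compression level is overridden; everything else keeps its default.
config = effective_config({"GZIP_CNC_QUALITY": "6"})
print(config["GZIP_CNC_QUALITY"], config["GZIP_CNC_PROGRAM"])
```

Passing a plain dictionary instead of os.environ keeps the sketch easy to try out without touching the real environment.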
Specifying the compression level
my $gzip_quality = 9;
This parameter specifies the compression level to be used, thus selecting the compression quality.
The gzip algorithm (no matter which tool implements it) accepts a value from 0 to 9, where 0 means the least compression effect but the shortest processing time and 9 means the best compression effect but the longest processing time.
As each file content will be compressed only once by gzip_cnc and subsequently just taken from its cache tree we should be able to afford the best compression level available here.
(Compare this to mod_gzip which is happy with level 6 to keep the resulting CPU load for the server within certain limits.)
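The effect of the compression level can be tried out with a few lines of Python. This sketch uses Python's gzip module on made-up sample data and is not part of gzip_cnc; the helper name compressed_size is hypothetical, and the actual sizes depend on the content being compressed.

```python
import gzip

# Made-up, highly repetitive sample content standing in for an HTML page.
SAMPLE = b"<tr><td>cell</td><td>another cell</td></tr>\n" * 2000


def compressed_size(data, level):
    """Return the size in bytes of data after gzip compression at the given level."""
    compressed = gzip.compress(data, compresslevel=level)
    # Whatever the level, decompression must restore the original content.
    assert gzip.decompress(compressed) == data
    return len(compressed)


print("original size:", len(SAMPLE))
for level in (0, 1, 6, 9):
    print("level", level, "->", compressed_size(SAMPLE, level), "bytes")
```

Level 0 only wraps the data without compressing it, so its output is slightly larger than the input.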
Environment variable: GZIP_CNC_QUALITY.
Specifying the path for the compression program gzip
If the module Compress::Zlib is installed on the server then this part of the configuration has no effect - gzip_cnc tries to use this Perl module with priority and falls back to an external program only as a substitute.
(gzip_cnc will inform the user during its self test function about the compressing tool selected on this server.)
But if a system command (for UNIX) or a separate program (for Windows) is indeed required for compression then gzip_cnc does not rely on the Apache web server being configured to supply a sufficient list of program directories via the PATH Environment variable; it insists on directly accessing this compression program via its fully qualified path name.
my $gzip_path = '/usr/bin/gzip';
As these path names are reasonably standardized in UNIX systems one can be hopeful that the preselected value will work without change. If gzip_cnc refuses to compress anything and writes corresponding messages into its log file instead then the required program may well be missing (under the path name specified).
gzip_cnc has preselected a path that looks reasonable for a gzip command on UNIX servers but cannot guarantee that your server has been installed accordingly. If you have dialog access (telnet, ssh) to the server for your web space you may open a shell (of your choice) there and check under which path name this shell would locate the command gzip, by executing the command which gzip.
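The kind of check described above can be sketched in Python as follows. This is not taken from gzip_cnc; the helper name usable_program is hypothetical, and shutil.which performs roughly the lookup that a shell's which gzip would do.

```python
import os
import shutil


def usable_program(path):
    """Return True if path is fully qualified and names an executable file."""
    return os.path.isabs(path) and os.path.isfile(path) and os.access(path, os.X_OK)


configured = "/usr/bin/gzip"  # the preselected default value from this section
print("configured path usable:", usable_program(configured))

# Search the directories in PATH; the result is None if no gzip command is found.
print("found via PATH:", shutil.which("gzip"))
```

If the first line reports False but the second one prints a path, that path is a candidate value for the parameter.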
Under Windows the use of the popular ActivePerl Perl interpreter is recommended, which already ships the Compress::Zlib module. But even under Windows you may use a separate program (gzip.exe) for compressing. (In fact gzip_cnc has been developed and tested this way on a Windows platform.)
Environment variable: GZIP_CNC_PROGRAM.
Specifying the root directory for the cache tree
my $cache_directory = '';
This parameter specifies the fully qualified path name of the directory inside of which gzip_cnc will build up its cache directory tree for compressed versions of all requested files.
If an empty string has been specified here then gzip_cnc will automatically use the directory .gzip_cnc_cache inside the DOCUMENT_ROOT of the domain in question. In this case the cache files reside within the URL tree as well and therefore are directly accessible via URL but can be handled correctly only in case of a proper configuration of the web server and furthermore only by sufficiently capable browsers. Therefore it is recommended to create the cache directory outside the URL space.
gzip_cnc itself will create all necessary directories for the cache on demand. If this fails (e. g. in case of lacking write access) then gzip_cnc will serve the requested data in uncompressed form and note this fact inside its log file using a corresponding status code.
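The behaviour described above (create directories on demand, fall back to uncompressed serving on failure) can be sketched in Python. The layout of the cache tree and the .gz suffix are assumptions made for this illustration, not the actual naming used by gzip_cnc; the function name cache_path_for is hypothetical as well.

```python
import os
import tempfile


def cache_path_for(cache_root, requested_file):
    """Map a requested file to a path below cache_root, creating directories on demand.

    Returns None if the directories cannot be created; a caller would then
    serve the original file in uncompressed form.
    """
    relative = os.path.normpath(requested_file).lstrip(os.sep)
    target = os.path.join(cache_root, relative + ".gz")
    try:
        os.makedirs(os.path.dirname(target), exist_ok=True)
    except OSError:
        return None
    return target


# Try it out inside a throwaway directory instead of a real cache root.
with tempfile.TemporaryDirectory() as root:
    print(cache_path_for(root, "/var/www/html/docs/index.html"))
```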
Environment variable: GZIP_CNC_CACHE.
Specifying gzip_cnc's own log file
my $logfile_path = '';
This parameter specifies the fully qualified path name of the log file to be created by gzip_cnc.
If an empty string has been specified here then gzip_cnc will not create a log file.
If a path name has been specified then gzip_cnc attempts to create a log file there (and all directories required for this path name as well if necessary) or to extend an already existing file; if this fails (e. g. in case of lacking write access) then no log messages will be created (without any explicit warning for the user).
Environment variable: GZIP_CNC_LOGFILE.
Specifying the error document to be used
my $error404_handler = '';
This parameter specifies the error handling page to be used by gzip_cnc in case of accessing a nonexistent file.
An Apache handler is activated when a large part of URL access processing is completed (e. g. at this moment URL translations of all kinds have already been performed: URL rewriting, directory defaults, Content Negotiation, ...). But the Apache server has by no means checked whether the actually requested file exists at all. So this test has to be performed by gzip_cnc itself.
If in fact the requested file doesn't exist then normally the Apache server would evaluate an Administrator's error handling page definition and serve its content to the client. But note that gzip_cnc isn't an Apache module and cannot invoke an internal HTTP subrequest in case your error document is a CGI program or an SSI document or maybe selected dynamically via Content Negotiation - and it cannot provide the information about which page has been originally requested in the same way (as Environment variable) as Apache itself would do. "Houston, we have a problem."
gzip_cnc cannot give you all you might want - but it lets you choose what is most important for you:
- If it is most important for you to return an HTTP status code 404 in case of a missing page then you cannot use a dynamic error document, only a static one. In this case you specify the fully qualified path name of the file to be served as error document as value for the $error404_handler parameter. gzip_cnc will then open this file, read the content and serve it together with the appropriate HTTP headers for the event.
- If it is most important for you to get control over the error event, i. e. get your dynamic handler invoked, then you cannot send the HTTP status code 404 to the client. In this case you specify the fully qualified URL of the page to be requested via HTTP as error document (starting with http://) as value for the $error404_handler parameter. gzip_cnc will then send an HTTP 302 response causing an HTTP redirection to your error document, and even supply the URL that has been accessed in the original request by assigning it to the CGI parameter url= and appending both to the query string of the error document. Thus your error document can parse the query string and react specifically to the event.
- If all of the above doesn't matter to you, then just leave this parameter value empty. gzip_cnc will then serve its own little error document (hard-coded into the Perl script) which at least tells the user which document has been accessed but not found. This is the least comfortable alternative for your visitors but might still suffice.
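The three alternatives above can be summarized in a short Python sketch. It is an illustration only and not the code of gzip_cnc: the function name not_found_response, the wording of the built-in page and the URL encoding of the url= value are all assumptions.

```python
import html
from urllib.parse import urlencode

# Stand-in for the small hard-coded error document; the real wording differs.
BUILTIN_PAGE = "<html><body><p>Document not found: {url}</p></body></html>"


def not_found_response(handler, requested_url):
    """Return (status, headers, body) for a request to a missing file."""
    if handler.startswith("http://"):
        # Dynamic error document: redirect and hand over the original URL.
        separator = "&" if "?" in handler else "?"
        location = handler + separator + urlencode({"url": requested_url})
        return 302, {"Location": location}, ""
    if handler:
        # Static error document: serve the file content with status 404.
        with open(handler, encoding="utf-8") as error_file:
            return 404, {"Content-Type": "text/html"}, error_file.read()
    # Empty parameter: serve the built-in page naming the missing document.
    body = BUILTIN_PAGE.format(url=html.escape(requested_url))
    return 404, {"Content-Type": "text/html"}, body


status, headers, _ = not_found_response("http://www.example.com/error.php", "/missing/page.html")
print(status, headers["Location"])
```

The sketch makes the trade-off visible: only the first branch can reach a dynamic handler, and only the other two return status 404.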
Environment variable: GZIP_CNC_404_HANDLER.
Specifying the MIME type to be served
my $mime_type = 'text/html';
This parameter specifies the MIME type to be sent by gzip_cnc as HTTP header for the responses being served.
Being an Apache handler, gzip_cnc has no access to the Apache configuration that contains a mapping of the requested file's name to a MIME type. Thus an appropriate value has to be defined by the program itself.
For future program versions one could imagine providing some table that maps file name patterns to MIME types. But currently the corresponding effort doesn't seem justifiable:
- On the one hand there are only few MIME types other than text/html whose serving in compressed form
  - would be possible without risk (CSS and JavaScript are critical due to browser bugs)
  - and appears promising as well (many graphics and multimedia formats are already stored in compressed form)
- on the other hand it isn't really a problem for the user to install separate gzip_cnc copies for each MIME type, adapt this parameter value accordingly within each copy and specify several mappings of file names to handler instances via Apache configuration.
Environment variable: GZIP_CNC_MIMETYPE.
Sending gzip_cnc's own HTTP headers
my $send_own_headers = 1;
This parameter selects whether gzip_cnc is entitled to send its own HTTP headers in addition to the other information. These HTTP headers are:
- X-Gzipcnc-Original-File-Size specifies the original size of the document being served.
- X-Gzipcnc-Version specifies the version number of the gzip_cnc script in use.
- X-Gzipcnc-Path-Translated specifies the resulting file name after all translations performed by the Apache server before invoking gzip_cnc.
- X-Gzipcnc-Path-Info specifies the URL of the original request.
If this value has been set to 0 then these HTTP headers are not sent. And as both Path headers might be helpful for understanding the access procedure but would reveal details about your server's configuration, they are sent only if gzip_cnc has its self test mode enabled as well.
Environment variable: GZIP_CNC_OWNHEADERS.
Activating the self test mode
my $enable_self_test_mode = 1;
This parameter selects whether the self test mode of gzip_cnc is available.
During the installation and testing phase it is reasonable to use this feature to get hints about the validity of the configured parameter values. But the output in self test mode reveals actual absolute path names on the server and therefore may be deactivated if advisable due to security considerations.
Environment variable: GZIP_CNC_SELFTEST.
Submitting a browser cache validity interval
my $cache_expire_seconds = 86400;
This parameter specifies how many seconds a browser is allowed to keep the content of the served page in its cache without testing the validity of the content by check-back at the HTTP server.
Actually browser caching is not a part of gzip_cnc's responsibility. On the other hand no HTTP data transfer can be compressed more effectively than one that hasn't even been requested by the browser at all. And especially for small and frequently used pages (like navigation elements in case of using framesets) it may well help a lot if the browser has been configured to trust the validity interval transmitted by the HTTP headers
- Cache-Control: public,max-age=<seconds> (for HTTP/1.1 clients) and
- Expires: <date specification> (for HTTP/1.0 clients)
instead of requesting the same page from the server again and again.
On the other hand, if a very large value has been specified here and URLs did change then files no longer in use have to be kept available at least for the defined validity period, as the browser caches may still contain references to these files.
But if subsequent changes can be ruled out for the content of existing documents then even a very large value may be reasonable for this parameter.
If this value has been set to 0 then these two HTTP headers are not sent.
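How the two headers follow from the configured number of seconds can be sketched in Python. This is an illustration, not the Perl code of gzip_cnc; the function name cache_headers is hypothetical, and passing the current time as a parameter is a choice made here to keep the example reproducible.

```python
import time
from email.utils import formatdate


def cache_headers(expire_seconds, now):
    """Return the caching headers for a validity interval, or none if it is 0."""
    if expire_seconds <= 0:
        return {}
    return {
        # Relative interval, understood by HTTP/1.1 clients.
        "Cache-Control": "public,max-age={}".format(expire_seconds),
        # Absolute date in HTTP format (always GMT), for HTTP/1.0 clients.
        "Expires": formatdate(now + expire_seconds, usegmt=True),
    }


for name, value in cache_headers(86400, time.time()).items():
    print("{}: {}".format(name, value))
```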
Environment variable: GZIP_CNC_EXPIRES.
Configuration via Environment variables
Instead of changing the program's source code the parameter values may as well be set via Environment variables in the web server configuration.
For this the following prerequisites have to be met:
- The Apache module mod_env must be available. (This is very likely as it is a standard module.)
- The feature of setting Environment variables inside of .htaccess files must be available - which means that the Apache server must be version 1.3.7 (from 1999) at least.
- The web server configuration must allow setting Environment variables within a .htaccess file. (This requires the AllowOverride FileInfo directive in the web server configuration which is already prerequisite for embedding gzip_cnc as handler - without this one the program cannot even be used.)
For setting an Environment variable the Apache directive SetEnv is available.
It is permissible to set these Environment variables in the same .htaccess file that implements the gzip_cnc activation as well. As an alternative it would suffice to specify these Environment variables only for the URL of the gzip_cnc program; this would reduce the risk of undesirably influencing the behaviour of other CGI applications.
The default values of all parameters correspond to this Apache configuration:
<Files gzip_cnc.pl>
SetEnv GZIP_CNC_QUALITY 9
SetEnv GZIP_CNC_PROGRAM /usr/bin/gzip
SetEnv GZIP_CNC_CACHE ""
SetEnv GZIP_CNC_LOGFILE ""
SetEnv GZIP_CNC_404_HANDLER ""
SetEnv GZIP_CNC_MIMETYPE text/html
SetEnv GZIP_CNC_OWNHEADERS 1
SetEnv GZIP_CNC_SELFTEST 1
SetEnv GZIP_CNC_EXPIRES 86400
</Files>
These directives may have to be adapted to the requirements of your installation.
Installation of the program as CGI script
gzip_cnc has to be installed as CGI script in your web space.
How this is to be done depends on the characteristics of your web space in so many ways that a detailed description isn't possible at this point.
Therefore only some hints can be given:
- The path to the Perl interpreter #!/usr/bin/perl in line 1 of the gzip_cnc.pl script may have to be customized.
- In case of an installation via FTP the script file should be transferred to the server in ASCII mode (Perl interpreters on UNIX don't like DOS/Windows line breaks).
- The script file may be renamed at will if
- this were necessary for the CGI installation (some web server configurations require specific name extensions etc.) or
- you are afraid of serving information to visitors about the server's directory structure they are not entitled to (using the self testing mode of the script).
- Some providers configure their FTP server such that the FTP client will show relative path names only (related to the root directory of the userid); in this case it isn't easy to find out the correct absolute path names for cache root directory and log file. In this case the self test mode of gzip_cnc may provide helpful additional information.
- Some providers configure their web server such that CGI scripts are executed under the web server's own userid instead of using the userid that owns their file. This userid must have write access to the cache root directory and the log file; therefore it may be necessary to set both objects on the server writeable for all users (chmod 777).
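The hint about line breaks can be checked before uploading with a small Python sketch. It is an illustration and not part of gzip_cnc; both function names are hypothetical.

```python
def has_dos_line_breaks(data):
    """Return True if the script content contains DOS/Windows line breaks."""
    return b"\r\n" in data


def to_unix_line_breaks(data):
    """Return the content with DOS/Windows line breaks converted to UNIX ones."""
    return data.replace(b"\r\n", b"\n")


# Made-up script content as it might look after being saved on Windows.
script = b"#!/usr/bin/perl\r\nprint \"Content-Type: text/html\\n\\n\";\r\n"
print(has_dos_line_breaks(script))
print(has_dos_line_breaks(to_unix_line_breaks(script)))
```

For a real file the content would be read and written in binary mode so that no further conversion takes place.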
Self test of the CGI script
As soon as the CGI script has been installed successfully it may be accessed by the browser via the corresponding URL.
The script will automatically detect that it has not been activated as Apache handler and thus has nothing to compress. In this case the script performs a self test instead - it
- shows the complete path name of
  - its own script file and
  - the document root directory of this domain,
- displays all specified parameter values and
- checks the availability of
  - the gzip program,
  - the Perl module Compress::Zlib as well as
  - the cache root directory
and displays the results of these tests to the caller as a web page.
If anything has not been configured correctly there is a good chance the self test function will note it - at a point in time where this webspace's normal operation hasn't been affected yet.
The self test function will cause a message output into the log file (and thus possibly the implicit creation of this file during the first request).
gzip_cnc activation
Embedding the handler
Finally the CGI script has to be embedded into the Apache configuration as handler and be connected with the files to be handled.
This is usually done by a .htaccess file. The directive
Action text/html /cgi-bin/gzip_cnc.pl
connects the CGI script /cgi-bin/gzip_cnc.pl with all files of the MIME type text/html (whichever files these may be according to the remaining Apache configuration).
If this connection is to be established only for files matching some distinct name pattern, like *.html, then the embedding of the handler may be formulated conditionally using the syntax
<Files *.html>
Action text/html /cgi-bin/gzip_cnc.pl
</Files>
For more than one name pattern, like *.htm and *.html, an embedding of the type
<Files ~ \.html?$>
Action text/html /cgi-bin/gzip_cnc.pl
</Files>
might be used - just like many other forms of embedding described in the Apache documentation in depth.
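Which file names the regular expression in the last example selects can be tried out in Python. This sketch assumes that Python's re module treats this simple pattern the same way as Apache's regular expression matching does; the file names are made up.

```python
import re

# The pattern from the <Files ~ ...> example: ".htm" or ".html" at the end of the name.
PATTERN = re.compile(r"\.html?$")

for name in ("index.htm", "index.html", "menu.shtml", "notes.html.bak"):
    print(name, "matches" if PATTERN.search(name) else "does not match")
```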
Coexistence of gzip_cnc and Server Side Includes
It is explicitly pointed out that files whose content is to be included into other documents via SSI must not be served in compressed form. The SSI handler would insert the compressed content into an otherwise uncompressed page and serve the result to the browser which would not be able to understand these data.
Thus it is recommended to use a separate name extension for these files (that contain only parts of HTML documents anyway). gzip_cnc is not able to find out whether a normal request or an Apache internal subrequest is to be handled.
(mod_gzip, being an Apache module, is able to tell these request types from each other and only compresses responses to 'normal' requests.)
Testing the installation
If anything hasn't worked during installation but the CGI script has been activated for the whole webspace by binding all documents to this handler then in the worst case not a single static HTML page can be served any more until the error has been corrected or this binding has been disabled again (by removing the corresponding entry from the .htaccess file - changes in these files come into operation immediately, they don't require a web server restart).
Because of this it is recommended to first build up a test directory with a handful of documents, to create the .htaccess file there as described and to test what happens:
- Do the pages still render correctly in the browser?
- Is the CGI script indeed being used as handler by the Apache web server, i. e.:
  - Did it create files within the cache directory?
  - Did it write messages into the gzip_cnc log file?
- Did the size of the served documents shrink a lot as expected?
  (Some browsers display this size during the transfer; a glimpse into the Apache log file access_log may reveal this as well.)
- Are now actually gzip compressed data being served to the browser?
  (This can be checked by a glimpse at HTTP headers and data of the web server's response by the use of some appropriate program - just like the serving of the additional HTTP headers of gzip_cnc.)
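The last check in the list above can be sketched in Python. The function below only inspects headers and data that some other program has already fetched; the function name is_gzip_response is hypothetical, and the sample response is made up for the demonstration.

```python
import gzip

# Every gzip data stream starts with these two bytes.
GZIP_MAGIC = b"\x1f\x8b"


def is_gzip_response(headers, body):
    """Return True if the response declares gzip encoding and the data look like gzip."""
    normalized = {name.lower(): value for name, value in headers.items()}
    declared = normalized.get("content-encoding", "").strip().lower() == "gzip"
    return declared and body[:2] == GZIP_MAGIC


# Made-up response standing in for what the web server returned.
sample_headers = {"Content-Type": "text/html", "Content-Encoding": "gzip"}
sample_body = gzip.compress(b"<html><body>Hello</body></html>")
print(is_gzip_response(sample_headers, sample_body))
```

Such a check only yields a meaningful result if the request announced gzip support via the Accept-Encoding header.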
Only after everything has worked as expected should the content of this .htaccess file be transferred into the root directory of the file tree to be handled. From this moment on the handler will take action for the whole directory tree (and populate the cache directory with subdirectories and files little by little).
Because of the experience with version 1.10 I explicitly suggest double-checking the detection of illegal accesses.
(Michael Schröpl, 2004-01-11)